Wednesday, September 09, 2009

Pattern Recognition Deciphers Damaged Manuscripts

Here is a Reuters item, so interesting -- exciting, actually -- that we reproduce it here at length. This has definite long-term value for the restoration of damaged Tibetan books, of which we unfortunately have numerous examples. The trove recently found in Ladakh jumps immediately to mind, as do fragments from Dunhuang and elsewhere. A major sponsor needs to step up and fund this one while the underlying software is still in the development stage.

BEERSHEBA, Israel (Reuters) - Researchers in Israel say they have developed a computer program that can decipher previously unreadable ancient texts and possibly lead the way to a Google-like search engine for historical documents.

The program uses a pattern recognition algorithm similar to those law enforcement agencies have adopted to identify and compare fingerprints.

But in this case, the program identifies letters, words and even handwriting styles, saving historians and liturgists hours of sitting and studying each manuscript.

By recognizing such patterns, the computer can recreate with high accuracy portions of texts that faded over time or even those written over by later scribes, said Itay Bar-Yosef, one of the researchers from Ben-Gurion University of the Negev.

"The more texts the program analyses, the smarter and more accurate it gets," Bar-Yosef said.

The computer works with digital copies of the texts, assigning number values to each pixel of writing depending on how dark it is. It separates the writing from the background and then identifies individual lines, letters and words.
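To make the steps in that paragraph concrete, here is a minimal sketch of our own, not the Ben-Gurion team's code. It assumes a scan is already available as rows of grayscale values (0 for black ink, 255 for blank parchment), uses a fixed darkness cutoff of 128 to separate writing from background, and finds lines of text by looking for runs of rows that contain ink. The function names, the cutoff, and the toy page are all hypothetical; a real system would need a far more robust method for damaged or stained pages.

```python
def binarize(gray, threshold=128):
    """Mark a pixel as ink (1) when its value is darker than the cutoff, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_text_lines(ink):
    """Return inclusive (first_row, last_row) pairs for runs of rows containing ink."""
    lines = []
    start = None
    for y, row in enumerate(ink):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new line of writing begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # a blank row closes the current line
            start = None
    if start is not None:                  # writing runs to the bottom edge
        lines.append((start, len(ink) - 1))
    return lines


# Toy 6-row "scan" (hypothetical data): two lines of writing separated by a blank row.
page = [
    [255, 255, 255, 255, 255],
    [255,  30,  40, 255, 255],
    [255,  20, 255,  35, 255],
    [255, 255, 255, 255, 255],
    [255,  10,  15,  25, 255],
    [255, 255, 255, 255, 255],
]

print(find_text_lines(binarize(page)))  # [(1, 2), (4, 4)]
```

The same row-scanning idea, applied to columns within each detected line, would give a rough split into words and letters.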

It also analyses the handwriting and writing style, so it can "fill in the blanks" of smeared or faded characters that are otherwise indiscernible, Bar-Yosef said.

The team has focused their work on ancient Hebrew texts, but they say it can be used with other languages, as well.

The team published its work, which is being further developed, most recently in the academic journal Pattern Recognition due out in December but already available online.

A program for all academics could be ready in two years, Bar-Yosef said.

And as libraries across the world move to digitize their collections, they say the program can drive an engine to search instantaneously any digital database of handwritten documents.

Uri Ehrlich, an expert in ancient prayer texts who works with Bar-Yosef's team of computer scientists, said that with the help of the program, years of research could be done within a matter of minutes.

"When enough texts have been digitized, it will manage to combine fragments of books that have been scattered all over the world," Ehrlich said.

1 reader comment:

J.Crow said...

Along the lines of the post, this was emailed to me today. Kind of like the human brain using an algorithm.

Eonverye taht can raed tihs rsaie yuor hnad..

To my 'selected' strange-minded friends:

If you can read the following paragraph, forward it on to your friends and the person that sent it to you with 'yes' in the subject line.


Only great minds can read this
This is weird, but interesting!

fi yuo cna raed tihs, yuo hvae a sgtrane mnid too

Cna yuo raed tihs? Olny 55 plepoe out of 100 can.

i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae.. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh? yaeh and I awlyas tghuhot slpeling was ipmorantt! if you can raed tihs forwrad it