Roland Piquepaille's Technology Trends
How new technologies are modifying our way of life

Monday, July 28, 2003

Not only does this news release from the University of Southern California have a fantastic title, it also has great content. The story is about one of their scientists, Franz Josef Och, whose software ranks very high among translation systems. It opens with a comparison to Archimedes.

"Give me a place to stand on, and I will move the world," said the great Greek scientist Archimedes, after providing a mathematical explanation for the lever.
"Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, a computer scientist in the USC School of Engineering's Information Sciences Institute.

His approach relies on two concepts: gathering huge amounts of data and applying statistical models to it. It ignores grammar rules and dictionaries entirely.

Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.
"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system a parallel corpus, that is, a collection of texts in the foreign language and their translations into English.
"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
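To make "tuning the parameters of a statistical model" on a parallel corpus a bit more concrete, here is a minimal Python sketch. It does not reproduce Och's system. Instead, it uses IBM Model 1, a classic textbook word-translation model trained with expectation-maximization (EM), as a stand-in. The function names, the NULL token and the three-sentence German-English corpus are illustrative assumptions. The scoring function then picks, among a few candidate English sentences, the one most likely to have produced the foreign input. A real system would also weigh a language model and search over vastly more candidates.

```python
import math
from collections import defaultdict

# Hypothetical sketch: IBM Model 1 as a stand-in, NOT Och's actual system.

def train_ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t[(f, e)] = P(f | e) with EM.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # start with every word pair equally likely
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f is linked to e
        total = defaultdict(float)  # expected number of links involving e
        for fs, es in corpus:
            es = ["NULL"] + es  # lets a foreign word align to "nothing"
            for f in fs:
                # E-step: share f's alignment among the English words of its sentence
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: turn the expected counts back into probabilities P(f | e)
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def log_prob_f_given_e(t, fs, es, floor=1e-12):
    """Model 1 log-likelihood of foreign sentence fs given English sentence es,
    dropping the constant sentence-length term."""
    es = ["NULL"] + es
    logp = -len(fs) * math.log(len(es))
    for f in fs:
        # .get avoids inserting zero entries for word pairs never seen in training
        logp += math.log(max(sum(t.get((f, e), 0.0) for e in es), floor))
    return logp


# Toy parallel corpus (illustrative only).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)

# "Translate" by choosing the candidate English sentence with the highest score.
foreign = "das haus".split()
candidates = [["the", "house"], ["the", "book"], ["a", "book"]]
best = max(candidates, key=lambda es: log_prob_f_given_e(t, foreign, es))
print(best)  # ['the', 'house']
```

Nothing in these two functions knows anything about German or English. That echoes Och's point about language-independent components: retraining for a new language pair only means feeding in a different parallel corpus.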

Even though gathering the initial data can take a long time, the translation system learns fast.

"One of the great advantages of the statistical approach," Och explained, "is that most of the work goes into components that are language-independent. As long as you give me enough parallel data to train the system on, you can have a new system in a matter of days, if not hours."
Och's ability to work quickly was put to the test in June 2003, when researchers all over the country (and in England) raced in a "Surprise Language" exercise sponsored by the Defense Advanced Research Projects Agency to create machine translation tools for Hindi texts.

Source: University of Southern California, July 25, 2003

© Copyright 2004 Roland Piquepaille.