Google goes to the top of the language class
Published: 23 Aug 2005 09:20 BST
Google's ambitions to make the Web more international got a slight boost from a US government-run test in which its machine translation software beat out competitors from IBM and academia.
Google scored the highest in Arabic-to-English and Chinese-to-English translation tests conducted by the US National Institute of Standards and Technology (NIST). Each test consisted of translating 100 articles from Agence France Presse and the Xinhua News Agency dated from 1 December, 2004, to 24 January, 2005. The results were posted earlier this month.
Although computerised translations historically have read more like broken English, increased processing power and larger data samples have allowed scientists to improve the accuracy of these systems.
Start-up Language Weaver, for instance, has created software that can translate Al Jazeera broadcasts. Research on the topic is also under way at Carnegie Mellon's Language Technologies Institute and other universities. (Neither Language Weaver nor CMU participated in the recent test.)
Google's machine translation wasn't perfect, but it was well ahead of the competition. On a scale from zero to one, the company's software scored 0.5137 on the Arabic tests and 0.3531 on the Chinese tests. The University of Southern California's Information Sciences Institute came second in both, with 0.4657 on Arabic and 0.3073 on Chinese. IBM scored 0.4646 on Arabic and 0.2571 on Chinese.
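To put "well ahead" in numbers, the short Python sketch below works out the gap between the top two systems in each test, using only the scores quoted above. The dictionary layout, the "USC ISI" label and the function name `lead_over_runner_up` are illustrative assumptions, not part of NIST's published results, and the other participants are left out because their scores are not given here.

```python
# Scores as quoted in the article (scale from zero to one).
# Only the three systems whose scores are reported are included.
scores = {
    "Arabic": {"Google": 0.5137, "USC ISI": 0.4657, "IBM": 0.4646},
    "Chinese": {"Google": 0.3531, "USC ISI": 0.3073, "IBM": 0.2571},
}


def lead_over_runner_up(results):
    """Return (leader, runner_up, absolute_gap, relative_gap) for one test.

    Hypothetical helper for illustration: relative_gap is the absolute
    gap divided by the runner-up's score.
    """
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    (leader, top), (runner_up, second) = ranked[0], ranked[1]
    gap = top - second
    return leader, runner_up, gap, gap / second


for language, results in scores.items():
    leader, runner_up, gap, relative = lead_over_runner_up(results)
    print(f"{language}: {leader} leads {runner_up} by {gap:.4f} ({relative:.1%})")
```

On these figures the leader's margin over second place is a little under 0.05 points in each test, which is a larger share of the runner-up's score in Chinese than in Arabic because the Chinese scores are lower overall.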
Other participants included the University of Edinburgh and Harbin Institute of Technology. Most of the software tested came from research labs, NIST said.
It is likely that Google benefited from its huge store of source material. Generally speaking, computerised translation software improves as more data gets fed to it. Through its search operations, Google has amassed billions of translated Web pages.
Like Yahoo and others, Google is looking toward the developing world for new customers. It includes some machine translation tools on its site, as well as several international editions.
Google could not be immediately reached for comment.