Microsoft seeks to build more useful thesaurus
Published: 24 Feb 2009 11:25 GMT
The Next Generation Writing Assistance project within Microsoft's research unit aims to build a more useful thesaurus by tapping techniques used in machine translation.
Traditional thesauri are good at listing synonyms, but they cannot take context into account, so the user has to pick the right one. That's where machine-translation techniques come in.
"We've taken the actual translation tables... and said: 'If a word in Chinese maps to two different English words, maybe those two words are synonyms, with some probability'," said Chris Brockett, a computational linguist and one of the Microsoft researchers leading the project.
This approach offers two key benefits over using a standard thesaurus: it can handle phrases, as opposed to single words, and it can draw on the context in which a phrase is used.
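The idea Brockett describes can be illustrated with a small sketch. It is not Microsoft's implementation, whose details the article does not give. It assumes a common "pivot" formulation in which the score for a candidate synonym is the probability of translating a word into a foreign phrase and back into a different English word, summed over all foreign phrases. The counts table, the placeholder pivot tokens "zh_1" and "zh_2", and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy translation table: how often each (foreign, English) pair
# was aligned. "zh_1" and "zh_2" are placeholder tokens, not real Chinese words.
ALIGNED_COUNTS = {
    ("zh_1", "big"): 6,
    ("zh_1", "large"): 4,
    ("zh_2", "big"): 2,
    ("zh_2", "important"): 8,
}


def conditional(counts, given_index):
    """Normalise pair counts into p(other side | given side)."""
    totals = defaultdict(float)
    for pair, n in counts.items():
        totals[pair[given_index]] += n
    probs = defaultdict(dict)
    for pair, n in counts.items():
        given = pair[given_index]
        other = pair[1 - given_index]
        probs[given][other] = n / totals[given]
    return probs


def synonym_scores(word, counts):
    """Score candidate synonyms of `word` by pivoting through foreign phrases.

    Assumed formula: score(c) = sum over f of p(f | word) * p(c | f).
    """
    p_f_given_e = conditional(counts, given_index=1)  # p(foreign | English)
    p_e_given_f = conditional(counts, given_index=0)  # p(English | foreign)
    scores = defaultdict(float)
    for foreign, p_f in p_f_given_e.get(word, {}).items():
        for candidate, p_e in p_e_given_f[foreign].items():
            if candidate != word:  # a word is not its own synonym
                scores[candidate] += p_f * p_e
    return dict(scores)


if __name__ == "__main__":
    ranked = sorted(
        synonym_scores("big", ALIGNED_COUNTS).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for candidate, score in ranked:
        print(f"{candidate}: {score:.2f}")
```

With this toy table, "large" scores higher than "important" as a synonym for "big", because the pivot phrase the two words share is the more frequent translation of "big". A real system would also need to weigh several language pairs and the surrounding context, which this sketch leaves out.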
Brockett plans to show off a prototype of the tool next week at TechFest, Microsoft's annual internal science fair. It's one of dozens of projects that will be shown as part of an effort to expose Microsoft's business units to the work being done in Microsoft's research labs.
As is the case with most of the projects that will be displayed at TechFest, the thesaurus effort is still in its infancy.
"We're still working on the algorithms and how much work we give to the language pairs," Brockett said. "We have to get the quality up. There are usability issues that have to be looked into."
Over time, Brockett hopes the technique could be used to rewrite whole sentences. Would-be plagiarists should beware, however. Although the technology could one day reword a whole Wikipedia article for you, it would probably reword the article the same way for everyone else as well. Plagiarism-detection software is also evolving.
The thesaurus technology would be naturally suited to inclusion in Word, which already has a built-in traditional thesaurus.
The technology could also help Microsoft in another key area: search. Search engines are good at finding terms that have just one form, such as names, but they have more difficulty finding expressions that can be phrased in multiple ways.
Credit: Microsoft aims to build a better thesaurus from CNET News