Search engines make some noise
Published: 28 May 2004 11:35 BST
Similarly, some search engines now analyse the text or keywords that describe a multimedia file's content, known as metadata. Singingfish, for example, relies on 70 fields of descriptive material about a file -- such as author, bit rate and file size -- to catalogue it, but the company regularly runs into shortcomings in such information.
Others transcribe portions of the audio or video and then analyse the language for meaning, topics of conversation, and relevance to a search term.
Most ambitiously of all, a handful are bent on searching inside the files to extract meaning and relevance by examining audio and video features directly.
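To make the metadata approach concrete, here is a minimal Python sketch of a search over descriptive fields. It is illustrative only: the `MediaRecord` type, the field names, the example URLs and the `search_metadata` function are all assumptions made for this example, not Singingfish's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRecord:
    """One catalogued file; `metadata` maps a field name to its value (hypothetical schema)."""
    url: str
    metadata: dict = field(default_factory=dict)


def search_metadata(records, term, fields=("title", "author", "description")):
    """Return the records whose chosen metadata fields contain `term` (case-insensitive)."""
    term = term.lower()
    hits = []
    for record in records:
        for name in fields:
            value = record.metadata.get(name)
            # Missing or empty fields can never match.
            if value and term in str(value).lower():
                hits.append(record)
                break
    return hits


# Hypothetical catalogue: the second file has almost no usable metadata.
catalogue = [
    MediaRecord("http://example.com/a.rm",
                {"title": "Health care town hall", "author": "Example News", "bit_rate": 128}),
    MediaRecord("http://example.com/b.rm",
                {"title": "", "author": None, "bit_rate": 64}),
]

results = search_metadata(catalogue, "health")
```

The second record shows the kind of shortcoming described above: whatever the file actually contains, a search that sees only its descriptive fields cannot find it when those fields are empty.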
Filtering the garble
StreamSage is starting to make waves with its audio and video search technology, introduced late last year. After roughly three years of research, the company developed software that uses speech recognition technology to transcribe audio and video. It then uses contextual analysis to understand the language and parse the themes of the content. As a result, it can generate a kind of table of contents for the topics discussed in the files.
The drawback of this method is that 100 percent accuracy is extremely difficult to achieve. In fact, experts say the language-detection technology is typically only 80 percent accurate. Language hurdles such as accents, jargon and dialect can all trip up the technology.
StreamSage introduced a Web site called CampaignSearch.com last week to showcase its technology. The site lets people search audio and video files on the Web for clips from the presidential candidates, including files on sites such as Whitehouse.org, C-SPAN, Voice of America and others.
"It's a timely demonstration of StreamSage's technology," said Seth Murray, the company's president.
For example, the technology can dissect an hour-long speech from Democratic presidential nominee John Kerry to find a segment in which he talks about health care, and earmark those four minutes of the broadcast for access.
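The idea of earmarking one stretch of a long recording can be sketched in a few lines of Python. This is a simplified stand-in, not StreamSage's method: it assumes a transcript already split into timed segments and uses plain keyword matching where the company describes contextual analysis. The `Segment` type, the `find_topic_spans` function, the `max_gap` parameter and the sample transcript are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timed piece of a speech-recognition transcript (hypothetical format)."""
    start: float  # seconds from the beginning of the file
    end: float
    text: str


def find_topic_spans(segments, keywords, max_gap=30.0):
    """Return (start, end) spans covering segments that mention any keyword.

    Matching segments that begin within `max_gap` seconds of the end of the
    previous span are merged into it, so one discussion yields one span.
    """
    keywords = [k.lower() for k in keywords]
    spans = []
    for seg in segments:
        text = seg.text.lower()
        if not any(k in text for k in keywords):
            continue
        if spans and seg.start - spans[-1][1] <= max_gap:
            # Extend the current span to include this segment.
            spans[-1] = (spans[-1][0], seg.end)
        else:
            spans.append((seg.start, seg.end))
    return spans


# Invented transcript for illustration only.
transcript = [
    Segment(0.0, 60.0, "Thank you all for coming tonight."),
    Segment(60.0, 120.0, "Let me talk about health care costs."),
    Segment(120.0, 180.0, "Every family deserves health insurance."),
    Segment(180.0, 240.0, "Now, on to the economy."),
]

spans = find_topic_spans(transcript, ["health care", "health insurance"])
```

With this sample transcript, `spans` contains a single entry running from 60 to 180 seconds, which a player could use as a jump-to point. Because the sketch matches exact words, a transcription error in a keyword would cause it to miss the passage, which is one reason recognition accuracy matters so much to this approach.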





