Search engines make some noise
Published: 28 May 2004 11:35 BST
StreamSage has flown under the radar for the last four years while investing heavily in research and development. Its chief scientist, Tim Sibley, is known for his work in computational linguistics. StreamSage has received funding from research grants, including the National Institute of Standards and Technology's Advanced Technology Program. Harvard University uses StreamSage's technology to allow medical school students to search past lectures on related subjects. AOL is using the technology to provide closed captions for streaming video and audio on AOL Broadband.
NPR is using StreamSage to transcribe its audio programmes as they're broadcast, which helps those programmes get listed in search engines faster. NPR does commission transcripts for many of its programmes, but the traditionally manual process of transcription would be too slow for searches related to timely news. Using speech recognition technology, StreamSage can create text from audio much more quickly and then feed those transcripts to Google and Yahoo.
NPR's Thomas said her outfit eventually replaces the transcripts from StreamSage with those from humans because the human-rendered records much more accurately reflect the audio and video content. StreamSage's results can be garbled.
NPR also licenses technology from Singingfish to meticulously label its audio files with relevant information, or metadata.
In its own first step toward offering multimedia search, Google registers NPR audio files on Google News, its specialty news aggregation service. For example, a search for a headline topic that is discussed on audio-only NPR programmes, such as Talk of the Nation, will turn up a link to the audio programme and to the specific segment that covers the topic.
A Google representative confirmed a relationship with NPR but declined to comment further on the technology. Until now, Google has not listed multimedia files because the company has sought to avoid the legal uncertainties of indexing and linking to copyrighted works that owners may want protected, company executives have said in the past. Beyond the legal concerns, searching audio and video files can be a much harder technical problem than cataloguing the Web.
StreamSage's Murray said he's not worried about potential copyright issues because his company is not housing the information. Rather, StreamSage points people to audio and video around the Web, just as Google or Yahoo does.
Exactly how far search engines can go in linking to multimedia files has yet to be worked out definitively in the courts. Last year the recording industry quietly settled, without any money changing hands, a long-running dispute with MP3Board.com over allegedly illegal links to music files, said MP3Board's attorney, Ira Rothken.
Yahoo also announced a relationship with NPR in February, when it outlined its "content acquisition" program, a systematic effort to include more hard-to-get information in its searchable database.
An NPR affiliate in Boston, WBUR.org, is using similar technology from Hewlett-Packard, called Speechbot. Robin Lubbock, director of new media for WBUR, said the broadcaster is using Speechbot to translate audio into text so employees and visitors can search for content on its own Web site.
Virage, which is now owned by Autonomy, has technology that analyses in-stream audio and video and lets people zip to the part of the stream they want. It can, however, be an expensive enterprise solution.
Jay Webster, chief technology officer of interactive agency Fathom Online, said that for most audio or video broadcasters to get ranked in search engine results, they would have to employ some manual indexing of their own first.
"Where it gets cool is if you could search on any keyword and find it within audio and that audio would come up in search results," Webster said. "But I don't think we're there yet."





