
Open-source search looks for Web niche

Stefanie Olsen CNET News

Published: 18 Aug 2003 17:05 BST


An emerging Web search project is out to keep Google, Yahoo and MSN honest -- and improve the process of finding useful, noncommercial information on the Net.

Called Nutch, the project is developing open-source software for locating documents online. But unlike major search providers, it won't cloak its formulas for matching relevant results to visitors' queries. Rather, it will provide an open window into its calculations, with links to explanations on how it determined each result, according to lead architect Doug Cutting.

"All of the existing search engines have secret methods for deciding which documents are the best documents," said Cutting, whose CV includes research and development stints at Excite, Grand Central and the Palo Alto Research Center. "Search is something that's a basic need for users of the Internet -- it's a valuable tool and yet it's controlled secretly, and that seems like a bad setup. People have the right to know how their search engine works so they can trust it."

Nutch itself has been operating secretly for roughly the last year, as it gathered support from developers and funding from one of the biggest commercial players in search: Overture Services.
 
Two researchers from Overture -- an advertising-supported search service in the process of being acquired by Yahoo -- approached Cutting last year with interest in providing funding for an open-source search system for academic research. Already itching to work on another search engine, Cutting spearheaded the effort from there, bringing on three founding developers, and forming a board of directors that includes Mitch Kapor, founder of Lotus and co-founder of the Electronic Frontier Foundation; and Tim O'Reilly, founder and president of tech book publisher O'Reilly & Associates.

Despite its connection to Overture, the project is not-for-profit and aims to advance search by supplying a technology for experimentation. Academic researchers or developers will be able to download the software and adapt it without having to reinvent the wheel, Cutting said. Foreign governments could use Nutch to develop a noncommercial search site for citizens, rather than licensing a proprietary, ad-supported technology, he said. Or corporate entities could build a for-profit business around the technology.

"If this is Linux, we're hoping there would be Red Hat," Cutting said, drawing a comparison with the open-source operating system and one of the leading companies offering it.

Searching for the next big thing
Search has become a hotbed for innovation in the last year as marketers have poured money into ad campaigns that tie their products to specific search terms. Overture and Google have built billion-dollar businesses around ad-supported search, and all the major portals have recommitted themselves to Web navigation as a result. Top computer scientists at the major portals and some academic researchers are devising ways to improve on search for the Internet and a host of applications.

The industry has also undergone much consolidation in the last year, and only a few companies -- Google, Yahoo and MSN -- are fielding the majority of search traffic worldwide. (Yahoo, for example, last month agreed to spend nearly $1.7bn to buy Overture.) With fewer and fewer players, the industry has little room for checks and balances, industry watchers say. Sites such as Google-watch.org have emerged to try to lend transparency to or raise questions about the company's growing importance in Web search.

Nutch has already taken the wraps off its downloadable software for research, which is suitable for testing by other developers but probably too arcane for the average Web surfer. It is aiming to have a public site by October that will allow people to search 100 million documents to be used as a measure against indexes such as Google.

For example, a Web surfer could pull up search results from Nutch, with transparency to its mathematical calculations, and compare them with those from Google, which does not publicise its formula for calculating search results. Nutch is actively seeking funding for hardware that would support traffic from Web surfers, but for now its systems do not have the capacity to handle an influx of visitors.

Overture would not detail the amount of money it has donated to Nutch. But it said that the effort was part of a desire to better "understand the current issues surrounding search and innovative solutions in that area," said Overture spokeswoman Jennifer Stephens.

Shortly after Overture last year founded its own research group, run by Gary Flake, it invested in the open-source search engine for academic research and to further its own learning, Stephens said. But since Overture acquired AltaVista and Web search technology from Norway-based Fast Search & Transfer, those technologies have come to be the core of its Web search technology and testing. Nutch is an alternative test bed for the company's use, she said.

The engine is written in Java and is based on Lucene, a software library that developers can use to add search to technologies such as email. Nutch builds upon Lucene, also developed in part by Cutting, and uses the technology as its intersearch library and indexing tool. But Nutch is designed to index and crawl the entire Web.
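To make the idea of an index with visible scoring concrete, here is a minimal sketch in Python. It is not Nutch or Lucene code, and the scoring formula (a plain term-frequency times inverse-document-frequency sum) is an assumption chosen for illustration, not the formula either project uses. The function names `build_index` and `search` and the three sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term count}: a toy inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Return (doc_id, score, explanation) tuples, best match first."""
    totals = defaultdict(float)
    explanations = defaultdict(list)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document, so it adds nothing
        # Assumed weighting: rarer terms count for more.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            part = tf * idf
            totals[doc_id] += part
            # Record each term's contribution so the score can be audited.
            explanations[doc_id].append(
                f"{term}: tf={tf} x idf={idf:.3f} = {part:.3f}"
            )
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(doc_id, score, explanations[doc_id]) for doc_id, score in ranked]


# Hypothetical sample documents.
docs = {
    "a": "open source search engine",
    "b": "paid search ads",
    "c": "open window into the calculations",
}
index = build_index(docs)
results = search(index, len(docs), "open search")
for doc_id, score, why in results:
    print(doc_id, round(score, 3), why)
```

Each result carries the per-term arithmetic behind its score, which is the kind of explanation a reader could check by hand and compare against another engine's ranking.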

Cutting is particularly concerned about the effects of advertising-heavy search providers. As the engines become laden with links to products and services, that cargo could sway a search for noncommercial data. He's also concerned about US search companies becoming dominant overseas.

"It would be nice if there were an open-source search engine owned by the world."
