ZDNet UK


Open-source search looks for Web niche

Stefanie Olsen CNET News.com

Published: 18 Aug 2003 17:05 BST


An emerging Web search project is out to keep Google, Yahoo and MSN honest -- and improve the process of finding useful, noncommercial information on the Net.

Called Nutch, the project is developing open-source software for locating documents online. But unlike major search providers, it won't cloak its formulas for matching relevant results to visitors' queries. Rather, it will provide an open window into its calculations, with links to explanations on how it determined each result, according to lead architect Doug Cutting.

"All of the existing search engines have secret methods for deciding which documents are the best documents," said Cutting, whose CV includes research and development stints at Excite, Grand Central and the Palo Alto Research Center. "Search is something that's a basic need for users of the Internet -- it's a valuable tool and yet it's controlled secretly, and that seems like a bad setup. People have the right to know how their search engine works so they can trust it."

Nutch itself has been operating secretly for roughly the last year, as it gathered support from developers and funding from one of the biggest commercial players in search: Overture Services.
 
Two researchers from Overture -- an advertising-supported search service in the process of being acquired by Yahoo -- approached Cutting last year with interest in providing funding for an open-source search system for academic research. Already itching to work on another search engine, Cutting spearheaded the effort from there, bringing on three founding developers, and forming a board of directors that includes Mitch Kapor, founder of Lotus and co-founder of the Electronic Frontier Foundation; and Tim O'Reilly, founder and president of tech book publisher O'Reilly & Associates.

Despite its connection to Overture, the project is not-for-profit and aims to advance search by supplying a technology for experimentation. Academic researchers or developers will be able to download the software and adapt it without having to reinvent the wheel, Cutting said. Foreign governments could use Nutch to develop a noncommercial search site for citizens, rather than licensing a proprietary, ad-supported technology, he said. Or corporate entities could build a for-profit business around the technology.

"If this is Linux, we're hoping there would be Red Hat," Cutting said, drawing a comparison with the open-source operating system and one of the leading companies offering it.

Searching for the next big thing

Search has become a hotbed for innovation in the last year as marketers have poured money into ad campaigns that tie their products to specific search terms. Overture and Google have built billion-dollar businesses around ad-supported search, and all the major portals have recommitted themselves to Web navigation as a result. Top computer scientists at the major portals and some academic researchers are devising ways to improve on search for the Internet and a host of applications.

The industry has also undergone much consolidation in the last year, and only a few companies -- Google, Yahoo and MSN -- are fielding the majority of search traffic worldwide. (Yahoo, for example, last month agreed to spend nearly $1.7bn to buy Overture.) With fewer and fewer players, the industry has little room for checks and balances, industry watchers say. Sites such as Google-watch.org have emerged to try to bring transparency to, or raise questions about, Google's growing importance in Web search.

Nutch has already taken the wraps off its downloadable software for research, which is suitable for testing by other developers but probably too arcane for the average Web surfer. It is aiming to have a public site by October that will let people search an index of 100 million documents, intended as a yardstick against indexes such as Google's.

For example, a Web surfer could pull up search results from Nutch, with transparency to its mathematical calculations, and compare them with those from Google, which does not publicise its formula for calculating search results. Nutch is actively seeking funding for hardware that would support traffic from Web surfers, but for now its systems do not have the capacity to handle an influx of visitors.
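To make the idea of a transparent ranking concrete, here is a minimal Python sketch of a score that can be broken down term by term. It is an illustration only: the article does not describe Nutch's formula, so the TF-IDF style weighting, the `explain_score` function and the toy `DOCS` corpus are all assumptions invented for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration; not data from Nutch.
DOCS = {
    "doc1": "open source search engine for the web",
    "doc2": "web advertising and paid search listings",
    "doc3": "open source operating system",
}


def explain_score(query, doc_id, docs=DOCS):
    """Score one document for a query and return (total, breakdown).

    The breakdown lists, for every query term, the numbers that went
    into the score, so a reader can see why the document ranked as it did.
    The weighting is an assumed TF-IDF variant, not Nutch's actual formula.
    """
    n_docs = len(docs)
    tokenized = {name: text.split() for name, text in docs.items()}
    counts = Counter(tokenized[doc_id])
    breakdown = []
    for term in query.split():
        tf = counts[term]  # occurrences of the term in this document
        df = sum(1 for words in tokenized.values() if term in words)  # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # rarer terms weigh more
        breakdown.append(
            {"term": term, "tf": tf, "df": df, "idf": idf, "score": tf * idf}
        )
    total = sum(part["score"] for part in breakdown)
    return total, breakdown


if __name__ == "__main__":
    total, parts = explain_score("open search", "doc1")
    print("total:", total)
    for part in parts:
        print(part)
```

A search site could publish a breakdown like this next to each result, which is the kind of comparison against a closed formula that the article describes.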

Overture would not detail the amount of money it has donated to Nutch. But the effort is part of a desire to better "understand the current issues surrounding search and innovative solutions in that area," said Overture spokeswoman Jennifer Stephens.

Shortly after Overture last year founded its own research group, run by Gary Flake, it invested in the open-source search engine for academic research and to further its own learning, Stephens said. But since Overture acquired AltaVista and Web search technology from Norway-based Fast Search & Transfer, those technologies have come to be the core of its Web search technology and testing. Nutch is an alternative test bed for the company's use, she said.

The engine is written in Java and is based on Lucene, a software library that developers can use to add search to technologies such as email. Nutch builds upon Lucene, also developed in part by Cutting, and uses the technology as its intersearch library and indexing tool. But Nutch is designed to index and crawl the entire Web.
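For readers unfamiliar with what an indexing library does, the sketch below shows the basic data structure behind text search, an inverted index that maps each word to the documents containing it. It is written in Python for brevity and is not Lucene's API; Lucene is a Java library and is far more sophisticated. The function names `build_index` and `search` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample documents, invented for this example.
    docs = {
        "a": "Open source search for the Web",
        "b": "Search advertising on the Web",
        "c": "Open source operating system",
    }
    index = build_index(docs)
    print(search(index, "open search"))
```

A Web-scale engine adds a crawler that fetches pages to feed into the index, plus ranking on top of the lookup; this sketch covers only the lookup step.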

Cutting is particularly concerned about the effects of advertising-heavy search providers. As the engines become laden with links to products and services, that cargo could sway a search for noncommercial data. He's also concerned about US search companies becoming dominant overseas.

"It would be nice if there were an open-source search engine owned by the world."
