ZDNet UK



The magic that makes Google tick

Matt Loney ZDNet.co.uk

Published: 01 Dec 2004 12:15 GMT


The process
Obviously it would be impractical to run the algorithm once on every page for every query, so Google breaks the problem down.

When a query comes in to the system it is sent off to index servers, which contain an index of the Web. This index is a mapping of each word to each page that contains that word. For instance, the word 'Imperial' will point to a list of documents containing that word, and similarly for 'College'. For a search on 'Imperial College' Google does a Boolean 'AND' operation on the two words to get a list of what Hölzle calls 'word pages'.
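The word-to-page mapping and the Boolean 'AND' described above can be illustrated with a minimal Python sketch. It is not Google's implementation: the sample pages, the document IDs and the function names `build_index` and `boolean_and` are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> page text. Not real crawl data.
PAGES = {
    1: "Imperial College London",
    2: "Imperial measurements such as pints and miles",
    3: "A guide to choosing a college",
    4: "Imperial College student union",
}

def build_index(pages):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, query):
    """Return the sorted document IDs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Look up one list of documents per word, then intersect them.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))

index = build_index(PAGES)
print(boolean_and(index, "Imperial College"))
```

Only pages 1 and 4 contain both words, so those are the two IDs the intersection returns.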

"We also consider additional data, such as where in the page does the word occur: in the title, the footnote, is it in bold or not, and so on."

Each index server indexes only part of the Web, as the whole Web will not fit on a single machine - certainly not the type of machines that Google uses. Google's index of the Web is distributed across many machines, and the query gets sent to many of them - Google calls each one a shard (of the Web). Each one works on its part of the problem.
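As a rough sketch of how a query might be sent to several shards and the partial answers combined, the Python below splits a toy index across a few in-memory objects. The article does not say how Google assigns pages to shards, so the `doc_id % num_shards` rule, the `Shard` class and the function names are assumptions made for illustration.

```python
from collections import defaultdict

class Shard:
    """One hypothetical index server holding the index for part of the Web."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of document IDs

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.index[word].add(doc_id)

    def search(self, words):
        """Boolean AND over this shard's documents only."""
        postings = [self.index.get(word, set()) for word in words]
        return set.intersection(*postings) if postings else set()

def build_shards(docs, num_shards):
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        # Assumed partitioning rule: each document lives on exactly one shard.
        shards[doc_id % num_shards].add(doc_id, text)
    return shards

def scatter_gather(shards, query):
    """Send the query to every shard and merge the partial results."""
    words = query.lower().split()
    hits = set()
    for shard in shards:  # a real system would query the shards in parallel
        hits |= shard.search(words)
    return sorted(hits)
```

Because every document sits on exactly one shard, the merged answer is the same as searching a single large index; the work is simply divided among machines.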

Google computes the top 1000 or so results, and those come back as document IDs rather than text. The next step is to use document servers, which contain a copy of the Web as crawled by Google's spiders. Again the Web is essentially chopped up so that each machine contains one part of the Web. When a match is found, it is sent to the ad server which matches the ads and produces the familiar results page.
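The hand-off from document IDs to document servers could look something like the sketch below. The relevance scores, the `doc_id % n` placement rule and the names `top_k` and `fetch_documents` are hypothetical; the article says only that roughly the top 1000 results come back as IDs and that the stored copy of the Web is split across machines.

```python
import heapq

def top_k(shard_results, k=1000):
    """Merge per-shard (score, doc_id) lists and keep the k best document IDs."""
    merged = (pair for results in shard_results for pair in results)
    return [doc_id for score, doc_id in heapq.nlargest(k, merged)]

def fetch_documents(doc_ids, doc_servers):
    """Look up each document ID on the document server assumed to hold it."""
    n = len(doc_servers)
    # Assumed placement rule: document doc_id is stored on server doc_id % n.
    return [doc_servers[doc_id % n][doc_id] for doc_id in doc_ids]
```

Passing IDs rather than page text between the two stages keeps the data moved per query small; the text is fetched only for the results that will actually be shown.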

Google's business model works because all this is done on cheap hardware, which allows it to run the service free-of-charge to users, and charge only for advertising.
