The magic that makes Google tick
Published: 01 Dec 2004 12:15 GMT
The process
Obviously it would be impractical to run the algorithm on every page for every query, so Google breaks the problem down.
When a query comes in to the system it is sent off to index servers, which contain an index of the Web. This index maps each word to the pages that contain it. For instance, the word 'Imperial' points to a list of documents containing that word, and similarly for 'College'. For a search on 'Imperial College', Google does a Boolean 'AND' operation on the two word lists to get a list of what Hölzle calls 'word pages'.
"We also consider additional data, such as where in the page does the word occur: in the title, the footnote, is it in bold or not, and so on."
Each index server indexes only part of the Web, as the whole Web will not fit on a single machine - certainly not the type of machine that Google uses. Google's index of the Web is distributed across many machines, and the query gets sent to many of them - Google calls each one a shard (of the Web). Each one works on its part of the problem.
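The lookup described above can be illustrated with a small sketch. It is not Google's implementation: the function names, the sample documents and the two-shard split are all assumptions made for illustration, and the ranking signals (title, bold text and so on) are left out entirely.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_shard(index, query):
    """Boolean AND: return IDs of documents containing every query word."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


def search(shards, query):
    """Send the query to every shard and merge the partial results."""
    results = set()
    for index in shards:
        results |= search_shard(index, query)
    return sorted(results)


# Hypothetical documents, split across two shards (assumed layout).
shard_a = build_index({1: "Imperial College London", 2: "Imperial measurement units"})
shard_b = build_index({3: "College admissions guide", 4: "Visiting Imperial College"})

print(search([shard_a, shard_b], "Imperial College"))  # [1, 4]
```

Each shard answers only for the documents it holds, so the shards can work independently and the merge step simply combines their partial answers.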
Google computes the top 1000 or so results, and those come back as document IDs rather than text. The next step is to use document servers, which contain a copy of the Web as crawled by Google's spiders. Again the Web is essentially chopped up so that each machine contains one part of the Web. When a match is found, it is sent to the ad server which matches the ads and produces the familiar results page.
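As a rough sketch of the document-server step, the code below resolves a ranked list of document IDs to stored text held on partitioned servers. The names are hypothetical, and choosing a server by taking the ID modulo the number of servers is an assumption, since the article does not say how Google assigns documents to machines.

```python
def server_for(doc_id, num_servers):
    """Pick the document server for an ID (modulo partitioning is an assumption)."""
    return doc_id % num_servers


def store_documents(docs, num_servers):
    """Spread documents across servers so each holds one part of the collection."""
    servers = [{} for _ in range(num_servers)]
    for doc_id, text in docs.items():
        servers[server_for(doc_id, num_servers)][doc_id] = text
    return servers


def fetch(servers, doc_ids):
    """Resolve ranked document IDs to stored text, preserving their order."""
    return [servers[server_for(d, len(servers))][d] for d in doc_ids]


# Hypothetical crawled documents spread over two document servers.
servers = store_documents(
    {1: "Imperial College London", 2: "Imperial measurement units",
     3: "College admissions guide", 4: "Visiting Imperial College"},
    num_servers=2,
)
print(fetch(servers, [4, 1]))  # ['Visiting Imperial College', 'Imperial College London']
```

Passing IDs rather than text between the two stages keeps the index servers' replies small; the page text is only looked up for the results that will actually be shown.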
Google's business model works because all this is done on cheap hardware, which allows it to run the service free of charge to users and charge only for advertising.