ZDNet UK



The magic that makes Google tick

Matt Loney ZDNet.co.uk

Published: 01 Dec 2004 12:15 GMT


The hardware
"Even though it is a big problem", said Hölzle, "it is tractable, and not just technically but economically too. You can use very cheap hardware, but to do this you have to have the right software."

Google runs its systems on cheap, no-name 1U and 2U servers, so cheap that Google refers to them as PCs. After all, each one has a standard x86 PC processor, a standard IDE hard disk and standard PC reliability, which means it is expected to fail about once in three years.

On a PC at home, that is acceptable for many people (if only because they're used to it), but at the scale Google works at it becomes a real issue: in a cluster of 1,000 PCs you would expect, on average, one to fail every day. "On our scale you cannot deal with this failure by hand," said Hölzle. "We wrote our software to assume that the components will fail and we can just work around it. This software is what makes it work."
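The one-a-day figure follows from simple arithmetic. Assuming each PC fails independently at a constant rate of once every three years (roughly 1,095 days), a cluster of 1,000 machines sees about 1,000 / 1,095, or roughly 0.91, failures a day. A minimal Python sketch of that back-of-envelope estimate (the function name is illustrative):

```python
DAYS_PER_YEAR = 365


def expected_failures_per_day(num_pcs: int, mtbf_years: float) -> float:
    """Expected machine failures per day, assuming independent failures at a
    constant rate of one per machine every `mtbf_years` years."""
    mtbf_days = mtbf_years * DAYS_PER_YEAR
    return num_pcs / mtbf_days


rate = expected_failures_per_day(1_000, 3)
print(f"{rate:.2f} failures per day")  # 0.91 failures per day
```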

One key idea is replication. "This server that contains this shard of the Web, let's have two, or 10," said Hölzle. "This sounds expensive, but if you have a high-volume service you need that replication anyway. So you have replication and redundancy for free. If one [of 10] fails you have a 10 percent reduction in service, so there are no failures so long as the load balancer works. So failure becomes a manageable event."
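A minimal sketch of the replication idea, assuming a shard served by 10 identical replicas and a load balancer that simply spreads queries across whichever replicas are still healthy. The `ReplicaSet` class, its method names and the modulo routing are hypothetical illustrations, not Google's software; the point is that losing one replica out of 10 costs 10 percent of capacity rather than the service itself.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSet:
    """Hypothetical model of one shard served by several identical replicas."""

    shard: str
    replicas: list[str]
    down: set[str] = field(default_factory=set)

    def healthy(self) -> list[str]:
        """Replicas the load balancer may still send queries to."""
        return [r for r in self.replicas if r not in self.down]

    def capacity_fraction(self) -> float:
        """Fraction of full serving capacity that remains."""
        return len(self.healthy()) / len(self.replicas)

    def pick(self, query_id: int) -> str:
        """Spread queries over healthy replicas by simple modulo routing."""
        live = self.healthy()
        if not live:
            raise RuntimeError(f"no healthy replica for {self.shard}")
        return live[query_id % len(live)]


shard = ReplicaSet("web-shard-0", [f"pc{i}" for i in range(10)])
shard.down.add("pc3")             # one of the 10 replicas fails
print(shard.capacity_fraction())  # 0.9
print(shard.pick(3))              # pc4
```

Queries keep being answered as long as at least one replica is healthy; a real load balancer would also weigh load and locality, which this sketch ignores.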

In reality, he said, Google probably has "50 copies of every server". Google replicates servers, sets of servers and entire data centres, added Hölzle, and has not had a complete system failure since February 2000. Back then it had a single data centre, and the main switch failed, shutting the search engine down for an hour. Today the company mirrors everything across multiple independent data centres, and the fault tolerance works across sites, "so if we lose a data centre we can continue elsewhere -- and it happens more often than you would think. Stuff happens and you have to deal with it."
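Why mirroring across sites matters can be checked with a placement map. In the sketch below, the shard and data-centre names and the `surviving_shards` and `unavailable_shards` helpers are all hypothetical; it asks which shards would become unavailable if a whole data centre were lost. Only a shard held in a single site disappears, which is the case that mirroring everything rules out.

```python
# Hypothetical placement map: which data centres hold a copy of each shard.
Placement = dict[str, set[str]]


def surviving_shards(placement: Placement, lost_dc: str) -> set[str]:
    """Return the shards that still have a copy after one data centre is lost."""
    return {shard for shard, sites in placement.items() if sites - {lost_dc}}


def unavailable_shards(placement: Placement, lost_dc: str) -> set[str]:
    """Return the shards whose only copies were in the lost data centre."""
    return set(placement) - surviving_shards(placement, lost_dc)


placement = {
    "shard-a": {"dc-1", "dc-2"},  # mirrored across two sites
    "shard-b": {"dc-2", "dc-3"},  # mirrored across two sites
    "shard-c": {"dc-1"},          # single site, not mirrored
}
print(sorted(unavailable_shards(placement, "dc-1")))  # ['shard-c']
print(sorted(unavailable_shards(placement, "dc-2")))  # []
```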

A new data centre can be up and running in under three days. "Our data centre now is like an iMac," said Schulz. "You have two cables, power and data. All you need is a truck to bring the servers in, and the whole burning-in, operating system installation and configuration process is automated."

Working around the failure of cheap hardware, said Hölzle, is fairly simple. If a connection breaks, that machine is taken to have crashed and no more queries are sent to it. If a query gets no response, that again signals a problem, and the machine can be cut out of the loop.
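The failure-detection rule can be sketched as follows, assuming a hypothetical `send` callable that raises `ConnectionError` when a connection breaks and `TimeoutError` when a machine does not answer in time. The `HealthTracker` class and `fake_send` simulator are illustrations, not Google's code: any such failure takes the machine out of rotation and the query moves on to the next machine. For simplicity the sketch tries machines in a fixed order rather than balancing load.

```python
from typing import Callable


class HealthTracker:
    """Hypothetical sketch: drop a machine from rotation when it fails."""

    def __init__(self, machines: list[str]) -> None:
        self.in_rotation = list(machines)

    def dispatch(self, query: str, send: Callable[[str, str], str]) -> str:
        # Iterate over a copy so machines can be removed while looping.
        for machine in list(self.in_rotation):
            try:
                return send(machine, query)
            except (ConnectionError, TimeoutError):
                # A broken connection or no reply in time is treated as a
                # crash: stop sending queries to this machine.
                self.in_rotation.remove(machine)
        raise RuntimeError("no machines left in rotation")


def fake_send(machine: str, query: str) -> str:
    """Simulated transport: pc1 drops the connection, pc2 never replies."""
    if machine == "pc1":
        raise ConnectionError("connection reset")
    if machine == "pc2":
        raise TimeoutError("no response")
    return f"{machine} answered {query!r}"


tracker = HealthTracker(["pc1", "pc2", "pc3"])
print(tracker.dispatch("cheap servers", fake_send))  # pc3 answered 'cheap servers'
print(tracker.in_rotation)                           # ['pc3']
```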

That takes care of redundancy, but what about scaling? The Web grows every year, as does the number of people using it, and that means more strain on Google's servers.
