The magic that makes Google tick
Published: 01 Dec 2004 12:15 GMT
The scalability
Google has two crucial factors in its favour. First, the whole problem is what Hölzle refers to as embarrassingly parallel, which means that if you double the amount of hardware, you can double performance (or capacity if you prefer -- the important point is that there are no diminishing returns as there would be with less parallel problems).
The second factor in Google's favour is the falling cost of hardware. If the index size doubles, the embarrassingly parallel nature of the problem means that Google can double the number of machines and keep the same response time, so capacity can grow linearly with traffic. "In reality (from a business point of view) we would like to grow less than linear to keep costs down," said Hölzle, "but luckily the hardware keeps getting cheaper."
So every year, as the Web gets bigger and requires more hardware to index, search and return Web pages, hardware gets cheaper, so it "more or less evens out", to use Hölzle's words.
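The "evens out" argument can be illustrated with a little arithmetic. The sketch below is not based on Google's real figures: the starting fleet size, the price per machine, the yearly growth factor and the yearly price decline are all hypothetical values chosen to show how linear growth in machines and falling prices can cancel.

```python
def yearly_hardware_cost(machines, cost_per_machine, growth, price_decline, years):
    """Return the total hardware cost for year 0 through `years`.

    machines         -- starting number of machines (hypothetical)
    cost_per_machine -- starting price per machine (hypothetical)
    growth           -- yearly multiplier on machine count (2.0 = doubles)
    price_decline    -- yearly fractional drop in price (0.5 = halves)
    """
    costs = []
    for _ in range(years + 1):
        costs.append(machines * cost_per_machine)
        machines *= growth                       # linear scaling: more index, more machines
        cost_per_machine *= (1 - price_decline)  # hardware keeps getting cheaper
    return costs


# Assumed scenario: the index doubles each year while prices halve.
flat = yearly_hardware_cost(1000, 1000.0, growth=2.0, price_decline=0.5, years=3)

# Assumed scenario: prices fall by only 30% a year, so total cost rises.
rising = yearly_hardware_cost(1000, 1000.0, growth=2.0, price_decline=0.3, years=3)
```

In the first scenario the total stays flat from year to year; in the second it climbs, which is why growing "less than linear" would be attractive from a business point of view.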
As the scale of the operation increases, it introduces some particular problems that would not be an issue on smaller systems. For instance, Google uses IDE drives for all its storage. They are fast and cheap, but not highly reliable. To help deal with this, Google developed its own file system -- called the Google File System, or GFS -- which assumes an individual unit of storage can go away at any time either because of a crash, a lost disk or just because someone stepped on a cable.
There are no disk arrays within individual PCs; instead Google stores every bit of data in triplicate on three machines on three racks on three data switches to make sure there is no single point of failure between you and the data. "We use this for hundreds of terabytes of data," said Hölzle.
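The placement rule described here (three copies on three machines, three racks and three switches) can be sketched as a simple selection problem. This is only an illustration, not GFS's actual placement algorithm: the `Machine` type, the `place_replicas` function and the machine names are all hypothetical.

```python
from typing import NamedTuple


class Machine(NamedTuple):
    name: str
    rack: str
    switch: str


def place_replicas(machines, copies=3):
    """Greedily pick `copies` machines that share no rack and no switch.

    A greedy pass is enough when each rack hangs off a single switch;
    unusual topologies could need a proper matching algorithm instead.
    """
    chosen = []
    used_racks, used_switches = set(), set()
    for m in machines:
        if m.rack in used_racks or m.switch in used_switches:
            continue  # would share a failure point with an earlier copy
        chosen.append(m)
        used_racks.add(m.rack)
        used_switches.add(m.switch)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough independent machines for the requested copies")


# Hypothetical cluster layout: two machines per rack, one switch per rack.
cluster = [
    Machine("pc-01", "rack-a", "switch-1"),
    Machine("pc-02", "rack-a", "switch-1"),
    Machine("pc-03", "rack-b", "switch-2"),
    Machine("pc-04", "rack-b", "switch-2"),
    Machine("pc-05", "rack-c", "switch-3"),
]
replicas = place_replicas(cluster)
```

With this layout the three copies land on `pc-01`, `pc-03` and `pc-05`, so losing any one machine, rack or switch still leaves two copies reachable.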
Don't expect to see GFS on a desktop near you any time soon -- it is not a general-purpose file system. For instance, a GFS block size is 64MB, compared to the more usual 2KB on a desktop file system. Hölzle said Google has more than 30 clusters running GFS, some as large as 2,000 machines with petabytes of storage. These large clusters can sustain read/write speeds of 2Gbps -- a feat made possible because each PC manages 2Mbps.
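To get a feel for these numbers, the short sketch below works through the arithmetic. It takes the block sizes and throughput figures exactly as quoted above; the 1GB example file, the binary unit sizes and the function names are assumptions made for illustration.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def blocks_needed(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold a file, rounded up."""
    return -(-file_size_bytes // block_size_bytes)


def machines_for_throughput(cluster_mbps, per_machine_mbps):
    """How many machines must contribute to reach the cluster rate."""
    return cluster_mbps / per_machine_mbps


# A hypothetical 1GB file split into 64MB blocks versus 2KB blocks.
gfs_blocks = blocks_needed(1 * GB, 64 * MB)
desktop_blocks = blocks_needed(1 * GB, 2 * KB)

# 2Gbps taken as 2,000Mbps, at 2Mbps per PC.
contributing_machines = machines_for_throughput(2000, 2)
```

The same file needs 16 blocks at the GFS size against 524,288 at the desktop size, which hints at why such large blocks suit very large files. On the quoted figures, about 1,000 PCs at 2Mbps each add up to the 2Gbps cluster rate.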
Once, said Hölzle, "someone disconnected an 80-machine rack from a GFS cluster, and the computation slowed down as the system began to re-replicate and we lost some bandwidth, but it continued to work. This is really important if you have 2,000 machines in a cluster." If you have 2,000 machines, then you can expect to see two failures a day.
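The failure figure is easier to appreciate per machine. The sketch below uses only the two numbers quoted above (2,000 machines, two failures a day); treating failures as independent and evenly spread across machines is a simplifying assumption, and the function names are hypothetical.

```python
def days_between_failures_per_machine(machines, failures_per_day):
    """Average days a single machine runs between failures."""
    return machines / failures_per_day


def chance_of_any_failure_today(machines, failures_per_day):
    """Probability that at least one machine fails in a day,
    assuming each machine fails independently at the same rate."""
    per_machine_rate = failures_per_day / machines
    return 1 - (1 - per_machine_rate) ** machines


mean_days = days_between_failures_per_machine(2000, 2)
p_any = chance_of_any_failure_today(2000, 2)

# The disconnected rack as a share of a 2,000-machine cluster.
rack_share = 80 / 2000
```

On these assumptions each PC fails roughly once every 1,000 days, yet the cluster as a whole sees at least one failure on about 86% of days. The 80-machine rack in Hölzle's anecdote is 4% of a 2,000-machine cluster.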