Net not as interconnected as you think
Published: 15 May 2000 09:03 BST
If you think the World Wide Web is an information superhighway system, think again: The Web's most extensive mapping project shows that Internet traffic tends to flow in a strong one-way direction -- and for most sites, online users would find that "you can't get there from here."
The study, conducted by researchers at IBM, Compaq and AltaVista, is to be presented at scientific conferences next week. It builds on previous research into the structure of the World Wide Web and argues against the widely held impression that the entire Internet is highly interconnected.
The researchers used AltaVista's Web crawler to trace more than 200m Web pages in May and October 1999, following the 1.5bn links embedded in those pages. That sample is just a fraction of the estimated billion-plus pages on the Web, but it dwarfs the 40m pages used for previous studies.
On the basis of their analysis, the researchers set out a "Bow Tie Theory" of Web structure:
- The central core, the knot of the bow tie, represents Web pages that are interconnected so well that you can eventually get from any page in the core to any other page just by following Internet links. Examples of core pages would include the home pages for IBM.com and MSNBC.com, said Nam LaMore, an IBM spokesman. This "strongly connected core" makes up just 30 percent of the entire Web sample.
- Another 24 percent represents "origination pages." These are pages with links that you can eventually follow into the core -- but which cannot be accessed through links from the core. One example is a personal Web page about your pet that includes links to online pet stores.
"You point to them, but no one (in the strongly connected core) is pointing back at you," LaMore said.
- Yet another 24 percent consists of "destination pages" that can be accessed from links in the connected core but do not link back to the core. One example is a research paper buried deep on a university or corporate Web site. Such a page "could be on IBM.com/research/projects/almaden and on and on -- and finally here's where it dumps you," LaMore said.
- The other 22 percent is completely disconnected from the central core: These pages are either "tendrils," connected by links only to pages in one of the other categories; "tubes," which link origination and destination pages without going through the core; or "islands" not linked to the rest of the Internet at all. An example of an "island" would be a group of student or family Web pages that link only to one another.
The only way to find the pages on such an island would be to know the address in advance. Even most search engines would not be able to find the island, unless it had been linked to the rest of the Internet at some point in the past.
Moreover, the researchers found, the proportions for these four categories remained constant between the May and October surveys, even though the number of Web pages grew substantially.
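The four categories can be illustrated on a toy link graph. The sketch below is not the researchers' code: it assumes a small hand-made graph (the page names are hypothetical) and a seed page already known to sit in the core. Pages the seed can both reach and be reached from form the core; pages that can only reach it are origination pages; pages that can only be reached from it are destination pages; everything else falls into the remaining group of tendrils, tubes and islands.

```python
from collections import deque


def reachable(graph, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def reverse(graph):
    """Return a copy of the graph with every link pointing the other way."""
    rev = {}
    for page, targets in graph.items():
        rev.setdefault(page, [])
        for target in targets:
            rev.setdefault(target, []).append(page)
    return rev


def bow_tie(graph, seed):
    """Split pages into bow-tie categories relative to a seed page in the core."""
    forward = reachable(graph, seed)             # pages the seed can get to
    backward = reachable(reverse(graph), seed)   # pages that can get to the seed
    all_pages = set(graph) | {t for ts in graph.values() for t in ts}
    core = forward & backward
    return {
        "core": core,
        "origination": backward - core,
        "destination": forward - core,
        "other": all_pages - forward - backward,
    }


# Hypothetical toy Web: a pet page links into the core, the core links out
# to a research paper, and two family pages link only to each other.
links = {
    "pet_page": ["store"],
    "store": ["portal"],
    "portal": ["store", "paper"],
    "paper": [],
    "family_a": ["family_b"],
    "family_b": ["family_a"],
}

parts = bow_tie(links, "portal")
for name, pages in parts.items():
    print(name, sorted(pages))
```

Here the two family pages land in the "other" group: no chain of links connects them to the core in either direction, which is the situation the article describes for islands. On a real crawl the seed would have to be chosen with care, since starting from a page outside the core gives a different split.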