Publishers are adding one-and-a-half million pages to the World Wide Web every
day, according to analysts at Alexa Internet, of San Francisco, California.
Alexa said it archived 12 terabytes of
data, an amount equivalent to half the contents of the Library of
Congress, to find this and other facts about the growth and
scope of the Internet.
Among the other findings announced by Alexa Internet is the current size
of public content on the World Wide Web, said to be three terabytes,
or three million megabytes. The Web doubles in size every eight months
and has spawned 20 million content areas. However, Web traffic is far
from evenly divided between sites, the report found, with 90 per cent
of the traffic going to 100,000 different host machines and 50 per cent
going to just 900 top sites.
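To put the growth figure in concrete terms, the short sketch below projects
the size of the public Web from the two numbers quoted above (three terabytes
now, doubling every eight months). It assumes the doubling rate stays
constant, which the report does not claim; the function name and its
parameters are illustrative only.

```python
# Illustrative projection only. The 3 TB starting size and the 8-month
# doubling period come from the figures quoted in the article; steady
# exponential growth is an assumption made for this sketch.
def projected_size_tb(months, start_tb=3.0, doubling_months=8.0):
    """Return the projected size in terabytes after `months` months."""
    return start_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (0, 8, 16, 24):
        print(f"{months:2d} months: {projected_size_tb(months):.1f} TB")
    # Doubling every eight months corresponds to growth by a factor of
    # 2 ** (12 / 8) over one year.
    print(f"Yearly growth factor: {2 ** (12 / 8):.2f}")
```

Under that assumption, doubling every eight months works out to slightly
less than a threefold increase per year.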
“We have within the Web the largest library of information ever available to
humankind,” said Brewster Kahle, president and CEO of Alexa Internet.
“There are millions of unique ideas and perspectives represented on the Web
with few clear modes for access. Alexa’s efforts
are focused on finding the most helpful information and making it
available to as many Web users as possible.”
Alexa said it continually gathers Web content and uses it to provide site
statistics and related links to users of its free service. It donates a copy
of each “snapshot” of the World Wide Web to the non-profit Internet
Archive, which preserves it as study material for future generations.
“Alexa’s archival efforts mean they’ve got more to say about the Web in general
than any other Web data providers,” said Chris Shipley, industry analyst and
editor of DEMOLetter. “This means businesses and organisations using
Alexa’s statistics and trend data are tapping a vast
data resource pulled from the most comprehensive archive of documents
‘born digital’–that is, electronic at conception and through
publication–than any currently available source.”
Alexa’s navigation aid appears as a toolbar at the bottom of the user’s
screen. It reports a site’s popularity, the number of links to it, its
affiliations and similar details. It also gives users a list of 10 related
links for each site they visit.
Finally, a by-product of Alexa’s archiving technique is the virtual abolition
of “404 Not Found” messages: when a page is unavailable, the service serves
the most recently archived version instead.
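The fallback idea can be sketched in a few lines. This is not Alexa's
implementation, which the report does not describe. It is a minimal
illustration that assumes a hypothetical in-memory archive mapping each URL
to timestamped snapshots, and a caller-supplied fetch_live function that
returns None when the live page cannot be found.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical archive layout: URL -> list of (timestamp, content) snapshots.
Archive = Dict[str, List[Tuple[int, str]]]


def fetch_with_fallback(
    url: str,
    fetch_live: Callable[[str], Optional[str]],
    archive: Archive,
) -> Tuple[str, Optional[str]]:
    """Return (source, content), preferring the live page.

    If the live fetch yields nothing, fall back to the snapshot with the
    largest timestamp. If no snapshot exists either, report "missing".
    """
    content = fetch_live(url)
    if content is not None:
        return "live", content

    snapshots = archive.get(url, [])
    if not snapshots:
        return "missing", None

    # Pick the most recently archived version of the page.
    _, archived = max(snapshots, key=lambda snap: snap[0])
    return "archive", archived


if __name__ == "__main__":
    demo_archive = {"http://example.com/a": [(1, "old copy"), (2, "new copy")]}
    # Simulate a page that is no longer available on the live Web.
    print(fetch_with_fallback("http://example.com/a", lambda u: None, demo_archive))
```

Returning the source alongside the content lets a caller tell the user that
an archived copy, rather than the live page, is being shown.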