Geoff, do you know about this already?
(from the Merc)
The non-profit Internet Archive has, since 1996, worked to preserve the ephemera of the digital age, capturing copies of millions of Web pages in bimonthly sweeps and maintaining its Wayback Machine, a tool that allows you to see the changes in a page over time. As you would imagine, both the collection and the haul brought in by each new sweep have grown rapidly. The library (tech geek details here, video tour of the Sun "internet in a box" here) now holds about 151 billion archived Web pages in a 3-petabyte (that's 3 million gigabytes) database that is expected to expand at the rate of 100 terabytes a month (that's 100,000 gigabytes). That database is tapped up to 500 times a second by some 200,000 visitors a day.
In addition to old Web pages, the archive also houses collections of texts, audio, video and live concert recordings, including an extensive cache of Grateful Dead shows. In case of disaster, the entire database is mirrored at the New Library of Alexandria in Egypt, where, based on the fate of the Old Library of Alexandria, they are well aware of the importance of back-ups.