Preservation of Digital Information
Web sites - Archiving
Web sites present different, more complex problems for preservation. Not only do individual files need to be preserved, but the relationships and links between them, and their structured indexes, pose further preservation issues.
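To make the point about linkages concrete, here is a minimal sketch using only the Python standard library. It records which URLs each page links to, so that a site's link structure could be archived alongside its files. The tag/attribute choices in `LINK_ATTRS` and the names `LinkCollector`, `extract_links` and `link_graph` are illustrative assumptions, not part of any archiving tool described in this guide.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the URLs a single HTML page points to."""

    # Assumed, non-exhaustive set of link-bearing tags and their URL attributes.
    LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL and drop #fragments
                absolute, _fragment = urldefrag(urljoin(self.page_url, value))
                self.links.add(absolute)


def extract_links(page_url, html):
    """Return the set of absolute URLs that one page links to."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.links


def link_graph(pages):
    """Map each page URL in {url: html} to the set of URLs it links to."""
    return {url: extract_links(url, html) for url, html in pages.items()}
```

For example, `link_graph({"http://www.example.org/index.html": '<a href="about.html">About</a>'})` maps the index page to `{"http://www.example.org/about.html"}`. Resolving every link to an absolute URL makes the graph independent of where each page sits in the directory tree.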
The following article provides a helpful summary of the issues for Web sites, including:
- instability of URLs
- viruses
- the need for change documentation
Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism
D-Lib Magazine, January 2002
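One simple way to support change documentation is to record a checksum for every file in each saved copy of a site and then compare successive copies. The sketch below assumes each snapshot is stored as a local directory; `build_manifest` and `diff_manifests` are hypothetical helper names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(old, new):
    """Summarise which files were added, removed or modified between snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Storing each manifest with its snapshot yields a simple, machine-readable record of what changed between copies, which can then be annotated by hand.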
The article also describes projects that have tried to address the problem of preserving Web sites:
In 2001, the Internet Archive introduced the Wayback Machine, which lets users view snapshots of Web sites as they appeared at various points in the past.
Although this project provides useful snapshots, it does not record:
- Changes in the structure of a Web site
- Database-driven material
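Structural changes can be recorded separately from content by comparing the link graphs of two crawls. This sketch assumes each crawl has already been reduced to a dictionary mapping page URLs to the set of URLs they link to; `diff_structure` is a hypothetical name, not a feature of the Wayback Machine.

```python
def diff_structure(old_graph, new_graph):
    """Compare two link graphs ({page URL: set of linked URLs}) from successive crawls."""
    # Flatten each graph into a set of (source, target) link edges
    old_edges = {(src, dst) for src, dsts in old_graph.items() for dst in dsts}
    new_edges = {(src, dst) for src, dsts in new_graph.items() for dst in dsts}
    return {
        "pages_added": sorted(new_graph.keys() - old_graph.keys()),
        "pages_removed": sorted(old_graph.keys() - new_graph.keys()),
        "links_added": sorted(new_edges - old_edges),
        "links_removed": sorted(old_edges - new_edges),
    }
```

Treating links as edges means a reorganised navigation menu shows up as added and removed links even when no page's text has changed.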
The following paper describes the survey of Web sites at the Smithsonian, and makes some key recommendations for preserving Web sites:
Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices
July 20, 2001
Smithsonian recommendations for design and authoring of Web sites and HTML pages for long-term access:
- Include good Dublin Core meta tags on pages
- HTML markup should be XML compliant
- Avoid the use of a third-party proprietary search engine not under control of the site's Webmaster
- All links within the site should be relative
- Copies of the Web site should be periodically created and maintained
- Major revisions to a Web site should be fully documented
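Several of these recommendations can be spot-checked automatically. The sketch below defines a hypothetical `audit_page` helper. It assumes the common `DC.`-prefixed convention for Dublin Core meta tags (e.g. `<meta name="DC.title" ...>`) and treats "XML compliant" as "parses as well-formed XML". That is only a rough proxy: pages using HTML named entities such as `&nbsp;` will fail the check unless those entities are declared.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageAudit(HTMLParser):
    """Gathers Dublin Core fields and absolute links back into the same site."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.dc_fields = set()
        self.absolute_internal_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Assumption: Dublin Core elements are named "DC.<element>"
            name = attrs.get("name") or ""
            if name.lower().startswith("dc."):
                self.dc_fields.add(name[3:].lower())
        elif tag == "a":
            # A link whose host is the site's own host is internal but not relative
            href = attrs.get("href") or ""
            if urlparse(href).netloc.lower() == self.site_host:
                self.absolute_internal_links.append(href)


def audit_page(html, site_host):
    audit = PageAudit(site_host)
    audit.feed(html)
    audit.close()
    try:
        ET.fromstring(html)  # rough well-formedness check only
        well_formed = True
    except ET.ParseError:
        well_formed = False
    return {
        "dublin_core_fields": sorted(audit.dc_fields),
        "xml_well_formed": well_formed,
        "absolute_internal_links": audit.absolute_internal_links,
    }
```

Running it over every page in a saved copy of a site would list pages lacking Dublin Core metadata, pages that are not well-formed, and internal links that should be made relative.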
An Approach to Managing Internet and Intranet Information for Long Term Access and Accountability: A Paper Prepared by the IM Forum and Intranet Working Group
This guide was produced to provide government-wide guidance on managing records and publications on the Internet and on departmental intranets and extranets.
Listed here are some international projects that have tackled the issues related to preserving the complex information objects created for the Web.
Prototype Web Archiving Projects
Minerva (Mapping the Internet Electronic Resources Virtual Archive), a project of the Library of Congress
This project is described in:
Collecting and Preserving the Web: The Minerva Prototype
(RLG Diginews, Vol 5, No 2) William Y. Arms, et al., Cornell University
Pandora (Preserving and Accessing Networked Documentary Resources of Australia)
National Library of Australia
Archive of selected Australian online publications, including Web sites; the project has also developed a strategy for their long-term preservation.
Collecting and Preserving the Web: Developing and Testing the NEDLIB Harvester, National Library of Finland (RLG Diginews, Vol 5, No 2)