Digital Preservation
Best Practice for Museums
Best Practices Guides: A Typology
After establishing the general issues of digital preservation, we can
proceed to developing a typology for comparing and categorizing current
best practices documents based on those issues. It is possible to identify
eight focus areas addressed in the literature:
Conceptualization of digital preservation issues
Organizational recommendations
Assessment of preservation strategies and recommended methodology
Analysis of storage media and digital formats, including lifespan assessment and recommendations
Metadata standards and practices
Issues of resource discovery, persistent identification and verification of authenticity
Intellectual property rights issues and approaches to rights management
Cost/resource recommendations and forward looking statements
As part of the literature survey, a number of guidance and best
practices documents were identified, as noted in the bibliography, and their
recommendations were categorized using the above typology. Although this survey
is not exhaustive, the goal is to identify key documents that either represent
the advice of organizations leading the digital preservation enterprise or
are commonly recommended in discussions of digital preservation. It is also
important to note that most of the best practice guides are recent and the
recommendations may not have empirical support yet. Indeed, the Arts and
Humanities Data Service notes a lack of information about how standards and
methodology may be applied effectively (AHDS, 2001).
Conceptualization of Digital Preservation Issues
Almost all guides identify the primary problems associated with digital
preservation as media deterioration/obsolescence and technology obsolescence.
In particular, the NINCH guide suggests preservation faces two types of
long-term accessibility challenges: machine accessibility (bit integrity) and
human accessibility (semantic integrity) (NINCH, 2002). Similarly, the Visual
Arts Data Service guide distinguishes between physical reliability and
continued usability (Grout, Purdy and Rymer, 2000). There is also general
consensus on digital archiving or preservation as a continuous activity in
the form of a series of managed activities (RLG/OCLC, 2002) or as a lifecycle
management approach (Jones and Beagrie, 2002). On the other hand, the source
of digital objects is an area of divergence. A number of guides focus on
digitized surrogates of physical objects while others encompass both digitized
and born-digital objects, with the preservation recommendations biased accordingly.
Organizational Recommendations
In the area of organizational recommendations, the need for
institutional policies on digital preservation is paramount. As Howard Besser
argues, "our community needs to develop a concrete set of guidelines that
can be used by people and organizations wishing to make information persist"
(Besser, 2000). NINCH goes even further to suggest that "[i]nsufficient
institutional commitment to long-term preservation can create digital
resources with limited sustainability" (NINCH, 2002). Beyond the creation of
policies, digital preservation needs to be incorporated into the organization
as a whole, as in the case of the National Library of Australia where digital
preservation is "part of existing core operations and systems … [as opposed
to developing] a special or separate undertaking requiring its own
infrastructure" (Gatenby, 2000). Finally, as noted in the literature survey,
most guides reference OAIS as the model for developing a digital preservation
architecture.
Assessment of Preservation Strategies and Recommended Methodology
The guides are relatively uniform in identifying emulation and migration as
the two dominant strategies in digital preservation. However, some guides
identify only these two strategies while others present them as two in a
wider spectrum of possibilities. In most cases, though, there appears to be
an implicit acceptance of migration as the primary strategy, ranging from
presenting migration as the only strategy (Hodge, 2000) to identifying a
list of strategies but explicitly recommending migration (NINCH, 2002).
The Cedars guidance documents are unique in presenting a more nuanced
form of migration that emphasizes retention of the original digital object,
supplemented by migration upon request (Cedars, 2002). In contrast, the
NINCH guide suggests migrating with every version of a format (NINCH, 2002),
an undertaking that requires greater resources over the
long term. One reason given for favoring migration is that
emulation-based approaches are experimental (Besser, 2000); while
emulation may be the best hope for complex digital objects in the future,
there are few institutions with the technological expertise to create
emulators in the short term. In general, there may be no single best
strategy; rather, efforts should focus on refining existing strategies
(Kenney, 2000), with an emphasis on providing a suite of digital
preservation tools.
While there is not a clear answer as to which strategy to choose, there is
agreement that all digital preservation strategies require that the digital
bits be available for future use. To this end, refreshing, the periodic
copying of data onto new media, is identified as
the best practice for long-term machine readability. Refreshing can be
implemented in a number of ways, from immediately moving files onto a common
medium maintained for all digital objects (Cedars, 2002) to more traditional
approaches emphasizing environmental stability and routine maintenance and
migration of the media (Grout, Purdy and Rymer, 2000).
Analysis of Storage Media and Digital Formats
In comparison to strategies, there is less consensus in recommending
specific media or formats. This can be attributed to the need to support
the significant properties of a broad range of artifacts, often requiring
specific formats and media. Even with more general issues such as whether
or not to use a compressed format, there is little consensus. While some
(NLC, 1998) are explicit about holding only non-compressed items, others
(Grout, Purdy and Rymer, 2000) only recommend the format be lossless to
retain maximum fidelity. One argument against compression of any type
(including lossless) for archival files is that it introduces
an additional level of complexity (Besser, 2000), which only serves to
compound the recovery/migration problem in the future. One recommendation
that is widely agreed upon is the use of a standard, non-proprietary format
(NLA, 2002, Kenney, 2000), as these kinds of formats are more likely to have
a preservation path in the future.
One recommendation to handle media issues is to create backups (Cedars,
2002, Kenney, 2000, IMLS, 2001, NINCH, 2002) using more than one kind of
backup software to write the copies so as to safeguard against software bugs.
In this scenario, at least one copy should be maintained in an offsite
location and the media periodically checked as per the refreshing methodology.
Metadata Standards and Practices
There is general recognition of the importance of metadata in an overall
strategy for digital preservation (NINCH, 2002, Hodge, 2000). However, as
no single standard for preservation metadata is widely accepted, many
organizations and projects (Cedars, 2002, Gatenby, 2000) have created their
own schemes for local use. Fortunately, there are enough common factors
between the local schemes that convergence should be possible (RLG/OCLC,
2001) with appropriate crosswalks to convert existing metadata to the
emerging standards.
For resource discovery metadata and structural metadata, there are emerging
standards which have some degree of consensus. Most guides that identify the
need for structural metadata recommend the use of the Metadata Encoding and
Transmission Standard (METS) (RLG/OCLC, 2001, Cedars, 2002, NINCH, 2002)
while Dublin Core is often recommended for resource discovery (Hodge, 2000,
IMLS, 2001, Jones and Beagrie, 2002, Grout, Purdy and Rymer, 2000, Cedars,
2002). However, identifying these standards does not imply endorsement and
the relative newness of METS (2002) means that implementation guides may not
be available for some time.
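To make the resource discovery recommendation more concrete, the following is a minimal sketch of a simple Dublin Core record built with Python's standard library. The element names (title, creator, date, format, identifier) and the namespace belong to the Dublin Core element set; the wrapping "record" element, the helper build_dc_record and every field value are hypothetical, chosen only for illustration and not taken from any of the guides surveyed.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Wrap a mapping of Dublin Core element names to values in XML.

    The enclosing <record> element is an assumption made for this sketch;
    real systems usually embed the dc: elements in their own container.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical description of a digitized museum object.
example = build_dc_record({
    "title": "Ceramic bowl, front view",
    "creator": "Example Museum Imaging Unit",
    "date": "2002-05-14",
    "format": "image/tiff",
    "identifier": "example-museum:1987.12.3",
})

xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

A record of this kind supports resource discovery only; structural and preservation metadata would need to be carried separately, for example in the schemes discussed above.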
Resource Discovery, Persistent Identification and Authenticity
The need for a persistent identifier to track the object is raised by a
number of guides, with proposed systems including PURL (IMLS, 2001, Gatenby,
2000, Hodge, 2000, Kenney, 2000), DOI (IMLS, 2001, Hodge, 2000, Kenney, 2000),
ISBN/ISSN (IMLS, 2001) and local persistent identifiers (Cedars, 2002).
Surprisingly, several guides (NINCH, 2002, Grout, Purdy and Rymer, 2000,
Jones and Beagrie, 2002) make no mention of the need
for a system of persistent identification. Related to the persistent
identifier is the need to ensure that the object has not been corrupted or
altered. Unfortunately, while the need has been identified, little work has
been done except for verification of authenticity at a bit level. For
instance, a distinction between authentication (integrity of the record) and
authenticity (the quality and context of the record) is made (Jones and
Beagrie, 2002), but typically only authentication of the object is recommended,
with the common practice being to calculate a value such as a checksum to ensure
bit-level integrity (Kenney, 2000, IMLS, 2001, Jones and Beagrie, 2002).
It should be noted that even this recommendation raises a standardization
problem: checksums can be computed with a number of different algorithms, and
a stored checksum is useful for verification only if it is recomputed with
the same algorithm.
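The point about algorithms can be illustrated with a short sketch, which assumes that the name of the algorithm is stored alongside the checksum so that later verification recomputes the value in the same way. The function names compute_fixity and verify_fixity and the record layout are hypothetical; only Python's standard hashlib module is used, and SHA-256 is an arbitrary default rather than a recommendation drawn from the guides.

```python
import hashlib


def compute_fixity(path, algorithm="sha256", chunk_size=65536):
    """Return a record holding the algorithm name and the file's hex digest.

    Storing the algorithm with the digest is the assumption that makes
    later verification possible.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"algorithm": algorithm, "digest": digest.hexdigest()}


def verify_fixity(path, record):
    """Recompute the digest with the recorded algorithm and compare."""
    current = compute_fixity(path, record["algorithm"])
    return current["digest"] == record["digest"]
```

In use, a record would be computed when the object is ingested and checked again after each refresh or backup; a result of False from verify_fixity indicates that the bits have changed, not what has changed or why.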
Intellectual Property Rights and Rights Management
The issue of intellectual property rights and rights management is
consistently raised with the primary emphasis on ensuring that institutions
have sufficient rights for digital preservation activities. Current and
proposed changes to copyright laws may prevent preservation by prohibiting
either the copying or the modification of the digital object (Besser, 2000,
Gatenby, 2000) and represent a possible future barrier to preservation.
Ideally, an arrangement needs to be made with the copyright holder, but as
noted in the experience of the National Library of Australia (Gatenby, 2000),
the amount of work this represents can be daunting.
Cost/Resource Recommendations and Forward Looking Statements
Finally, there has been little discussion of how much these
recommendations will cost or even of the overall cost of digital preservation.
The CAMiLEON project does identify types of costs, but the general consensus
is that future costs are currently unknown and cannot be well predicted.