About Digital Preservation

Digital preservation (DP) is a set of activities required to make sure digital objects can be located, rendered, used and understood in the future. This can include managing the object names and locations, updating the storage media, documenting the content and tracking hardware and software changes to make sure objects can still be opened and understood.
  • "Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time." (ALA 2007:2)
  • "The act of maintaining information, in a correct and Independently Understandable form, over the Long Term." (CCSDS 2002: 1-11)
  • "All activities concerning the maintenance and care for/curation of digital or electronic objects, in relation to both storage and access." (Research Councils UK 2008: 6)
  • See also Digital Preservation Europe: http://www.digitalpreservationeurope.eu/what-is-digital-preservation
Digital preservation offers the economic and social benefits associated with the long-term preservation of information, knowledge and know-how for re-use by later generations. However, digital preservation has a major problem: its support structures are fragmented and built on short-lived projects. European stakeholders in digital preservation have come together in the APARSEN project to create a shared vision and framework for a sustainable digital information infrastructure providing permanent access to digitally encoded information.

Digital Preservation State of the Art

Data and information management is essential within any organisation, but it is becoming increasingly challenging given the long and growing time frames over which information has to be retained. Information contained in documents and files created many years ago, as well as in those created today and in the future, is often required for longer than the supported life of the application used to create and render it. This problem shows no sign of abating, and as the complexity and interconnectivity of information grow, the challenge of long-term access becomes greater.

Much pioneering work has been performed by national archives and libraries, large data archives, academia and industry. In order to perform long-term digital preservation, it is necessary to (i) understand the technology of the material being stored, (ii) be able to decide whether this technology is obsolete (and if so, what to do about it) and (iii) perform verifiable actions to remove the causes of this obsolescence (for example, via format migration) or provide new approaches to delivering environments in which the original software can run (for example, via hardware emulation).
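The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production tool: the signature table lists only a few well-known magic numbers, and the OBSOLETE set and the plan_action function are hypothetical stand-ins for a real format registry and a real preservation policy.

```python
# Minimal sketch: identify a format from its leading bytes, then decide what to do.
# SIGNATURES holds a few well-known magic numbers; OBSOLETE is an invented policy.

SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

# Hypothetical policy, chosen arbitrarily for illustration.
OBSOLETE = {"gif"}


def identify(data: bytes) -> str:
    """Step (i): return a format label based on the file's leading bytes."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def plan_action(data: bytes) -> str:
    """Steps (ii) and (iii): decide whether action is needed, and which one."""
    fmt = identify(data)
    if fmt == "unknown":
        return "review"   # nothing automatic can be decided; a person must look
    if fmt in OBSOLETE:
        return "migrate"  # could equally be "emulate", depending on policy
    return "keep"
```

In practice the decision in step (ii) would draw on a maintained registry of formats and their support status rather than a hard-coded set.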

The two key strategies currently recommended are migration and emulation. Migration moves the data to a currently supported format, either within the same format family (for example, Word 2.0 to Word 2007) or in a different one (for example, Word 2.0 to PDF 1.4). Both routes have their challenges: a Word-to-PDF migration may lose hidden text, for example, so every migration has to be validated and any errors identified. Validation means extracting the characteristics of the migrated file and comparing them with those of the original to identify changes. The comparison can be simple (is the page count the same?) or more complex (is the image colour histogram the same?).
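The validation step can be illustrated with a small Python sketch. It assumes that some characterisation tool has already extracted the properties of both files into dictionaries; the property names (page_count, colour_histogram) and the compare_characteristics function are hypothetical, and the histogram is assumed to be a list of bin fractions.

```python
def compare_characteristics(original, migrated, tolerance=0.0):
    """Return a list of differences between two sets of extracted properties.

    Scalar properties (e.g. a page count) must match exactly. List properties
    (e.g. a colour histogram) must have the same length and agree bin by bin
    to within `tolerance`.
    """
    differences = []
    for name, before in original.items():
        if name not in migrated:
            differences.append(f"{name}: missing after migration")
            continue
        after = migrated[name]
        if isinstance(before, list) and isinstance(after, list):
            same = len(before) == len(after) and all(
                abs(x - y) <= tolerance for x, y in zip(before, after)
            )
        else:
            same = before == after
        if not same:
            differences.append(f"{name}: {before!r} -> {after!r}")
    return differences


# Illustrative values only, not measurements from real files.
original = {"page_count": 12, "colour_histogram": [0.5, 0.3, 0.2]}
migrated = {"page_count": 11, "colour_histogram": [0.5, 0.31, 0.19]}
report = compare_characteristics(original, migrated, tolerance=0.02)
```

Here the report flags the changed page count but accepts the small histogram drift, because it falls within the chosen tolerance. Which properties to compare, and how strictly, is a policy decision for each collection.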

When migrating information it is important to move beyond the file view and migrate logical components. The clearest example is a web page, where migrating the image files alone results in broken links. Migrating a file must therefore be followed by updating the other files in the same logical unit of information that depend on it, for example by changing the links in the HTML to point at the new image file. Containers must also be recognised and the files within them migrated and replaced, producing a new copy of the container file. A single original action can therefore trigger a cascade of migrations.
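The web page case can be sketched in Python. The example assumes a simple mapping from old file names to new ones, produced by an earlier image migration; the rewrite_links function and the file names are hypothetical. A regular expression is used only to keep the sketch short; a real tool would use a proper HTML parser and would also handle relative paths and references from stylesheets.

```python
import re

# Matches src="..." or href="..." with either quote style.
LINK_ATTR = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_links(html, renamed):
    """Point links at migrated files; targets not in `renamed` are left alone."""
    def swap(match):
        prefix, quote, target = match.groups()
        return f"{prefix}{quote}{renamed.get(target, target)}{quote}"
    return LINK_ATTR.sub(swap, html)


page = '<p><img src="logo.gif" alt="x"> <a href="page.html">next</a></p>'
updated = rewrite_links(page, {"logo.gif": "logo.png"})
```

Because the HTML file itself has now changed, it becomes a new file in its own right, which is one way the cascade of migrations arises.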

The result is a new “manifestation” of the information being managed. The terminology is important: a manifestation is not a new version, because it is intended to convey the same meaning. A manifestation may combine some files that have changed and some that have not, resulting in a many-to-many relationship between information objects, manifestations and files.
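One possible way to model this relationship is sketched below in Python. The class and field names (StoredFile, Manifestation, InformationObject) are assumptions made for illustration and are not taken from any particular preservation system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StoredFile:
    path: str
    fmt: str


@dataclass
class Manifestation:
    label: str
    files: List[StoredFile] = field(default_factory=list)


@dataclass
class InformationObject:
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)


# A web page whose image was migrated from GIF to PNG.
html_v1 = StoredFile("index.html", "html")
gif = StoredFile("logo.gif", "gif")
css = StoredFile("style.css", "css")
html_v2 = StoredFile("index.v2.html", "html")  # links rewritten to the PNG
png = StoredFile("logo.png", "png")

original = Manifestation("original", [html_v1, gif, css])
migrated = Manifestation("migrated", [html_v2, png, css])
page = InformationObject("Project home page", [original, migrated])

# Files that belong to both manifestations because they did not change.
unchanged = set(original.files) & set(migrated.files)
```

The stylesheet appears in both manifestations without being copied, while the information object keeps both manifestations side by side.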

Emulation is a less developed approach. It is intended to deliver a synthetic hardware environment on which old operating systems and programs (both preserved within the system) can run, so that the original files can be used. Simulating the action of old hardware in software is complex, especially where aspects such as clock speed and interaction with specific hardware are important. Emulation can be useful for active content such as databases, where the value lies as much in the interaction with the data as in the raw data itself. Other examples include computer games, whose consoles may no longer run in the future and so need to be synthesised in software. As might be expected, this is a huge challenge that requires further research before emulation becomes truly useful.

The challenges discussed above extend the OAIS model to include Preservation Action as well as Preservation Planning. It is also important that as many of the actions above as possible are automated, because in very large data stores it becomes too complex to migrate information individually. To that end, all the identification, validation, characterisation and migration tools must be deployable at run time within a configurable system that allows actions to be run with minimum human intervention.
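A configurable, largely unattended workflow of this kind might look like the following Python sketch. Everything in it is an assumption made for illustration: the run_pipeline function, the two example steps, the MIGRATION_TARGETS table and the record layout. Identification by file extension is a deliberate simplification.

```python
# Hypothetical configuration: which formats are migrated, and to what.
MIGRATION_TARGETS = {"doc": "pdf"}


def identify_by_extension(record):
    """Label a record with a format taken from its file extension."""
    name = record["name"]
    if "." not in name:
        raise ValueError("cannot identify format")
    return {**record, "format": name.rsplit(".", 1)[1].lower()}


def apply_migration_policy(record):
    """Rename the record to its target format if the policy lists one."""
    target = MIGRATION_TARGETS.get(record["format"])
    if target is None:
        return record
    stem = record["name"].rsplit(".", 1)[0]
    return {**record, "name": f"{stem}.{target}", "format": target,
            "migrated_from": record["format"]}


def run_pipeline(items, steps):
    """Run every step on every item; failures are queued for human review."""
    done, needs_review = [], []
    for item in items:
        try:
            for step in steps:
                item = step(item)
            done.append(item)
        except Exception as error:
            # `item` is the record as it stood when the failing step was called.
            needs_review.append((item, str(error)))
    return done, needs_review


done, needs_review = run_pipeline(
    [{"name": "report.doc"}, {"name": "notes.pdf"}, {"name": "README"}],
    [identify_by_extension, apply_migration_policy],
)
```

Only the record that could not be identified ends up in the review queue; the others pass through without intervention. Adding a validation step would mean appending one more function to the list of steps.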

-- VeronikaPraendlZika - 2012-10-15

 