Tessella Scenarios (typical customer scenarios)

Scenario 26.1 Born-Digital Government Departmental Records

User Scenario ID 26.1
Author Ashley Hunter, Tessella plc
Background National Archives exist to provide long-term access to a wide variety of government records. Simple cases include keeping the minutes of specific departmental meetings, in PDF or DOC, safe and available for access for a given period of time. More complex data types include digital objects such as databases, websites and CAD files. Each digital information object may have a defined closure period during which access can be granted only to specific authorised individuals; after this period, the record becomes open for wider or even public consumption. The digital records may reach the National Archive only after a specific holding period (typically 10-20 years) has expired within the issuing department's own Content Management System, at which point the status of the material is set to "Archival". This complicates the task of the National Archive, which must ensure that records do not become obsolete before they are even allowed to be transferred out of the issuing department to the Archival Repository. Records transferred to the Archive may then remain closed for significant periods (e.g. 50 years), or until enough time has passed that anyone referenced in the record is likely to be deceased.
Type of digital information MS Word documents (DOC), Excel spreadsheets (XLS), PowerPoint presentations (PPT), Outlook PST files, simple text files (TXT), presentation copies of printed documents (PDF, PDF/A)
Link to sample data Tessella to provide a test set of typical files (not real, made up - likely the DROID test file corpora) for testing purposes
Threat(s) to the data The main purpose of the Archive is to provide long-term search and access of records for the various approved user communities. To this end the Archive must address the following issues. (1) Passive preservation: keeping the digital objects safe from 'bit rot' or 'data decay' on the primary storage media. (2) Complex container and compression formats, or files with embedded digital certificates and signatures, introduce further risk to future access, because access may depend on a tool that knows how to unpack or uncompress the objects, or on a verification process with a third-party entity that may no longer be available. (3) The Archive may want to provide multiple versions of its digital objects, rendered in different formats for different access purposes (a free low-grade manifestation and a commercially available full high-resolution manifestation); Archives can use Active Preservation methods, including file format migration, to provide these manifestation types. (4) Representation Information about the record's provenance and authenticity is provided through arrangement of the digital objects into a hierarchy of collections, with Descriptive Metadata provided according to agreed metadata schemas. Commonly these schemas are bespoke to each Archive, rather than standard schemas such as METS, MODS, EAD or PREMIS.
Designated Community Initially the submitting Government Department and any affiliated departments and organisations will have access to the material. Over time this access will be relaxed for some of the records, leading to wider or even public access for use by policy investigators, historians and the general public.
Preservation Technique (1) The standard approach to defend against bit rot is to keep multiple copies of the AIP on different storage media, and to periodically test the integrity of these copies against the fixity/checksum values recorded at the time of ingest. Where corruption is found, the system should notify the Archive staff so that the corrupted file can be replaced from one of the other AIP copies that has recently passed the integrity test. (2.a) During ingest, unpack and uncompress all digital objects and characterise each extracted object as a digital object in its own right, extracting descriptive, administrative and technical metadata, together with structural information relating it to the other extracted objects. (2.b) Remove digital certificates and signatures from objects where possible and re-assert provenance and authenticity from within the Archival system itself (which must therefore have a discernible and guaranteed provenance and authenticity of its own), whilst maintaining references to the original certification system. (3) Use file migration techniques to provide digital information in alternative file formats. Migration may itself be lossy, so quality assessment and validation must be applied to ensure that the appropriate level of quality is maintained following each format migration. Several migrations may be required over time to provide the Designated Community with the information that they want to access in a format suitable to their needs. (4) Provide methods for translating between metadata schema definitions, via code or XSLT, to enable exchange of networks of representation information. Facilitate further integration with catalogue collection systems through the OAI-PMH exchange protocol.
Usage Archive staff; staff of the submitting government department, via access requests; the public, via public access requests
Success Criteria Digital objects remain available to their designated community through 'Search' and/or 'Browse' functionality, and are directly accessible to these users in forms and ways that are meaningful to them and that can facilitate their re-use where allowed (e.g. secure downloads, scheduled reader-room deliveries, public internet).
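
Preservation technique (1) in the scenario above, checking stored copies against the checksum recorded at ingest and replacing a corrupt copy from one that has just passed, can be sketched as follows. This is a minimal illustration only, not a description of any particular product: the function names, the choice of SHA-256, and the three simulated storage locations are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large AIP files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(copies: list[Path], expected: str) -> tuple[list[Path], list[Path]]:
    """Split the copies into those matching the ingest checksum and those that do not."""
    intact, corrupt = [], []
    for copy in copies:
        ok = copy.is_file() and sha256_of(copy) == expected
        (intact if ok else corrupt).append(copy)
    return intact, corrupt


def repair(corrupt: list[Path], intact: list[Path]) -> list[Path]:
    """Overwrite each corrupt copy from a copy that has just passed the check."""
    if not intact:
        raise RuntimeError("no intact copy left; escalate to Archive staff")
    for bad in corrupt:
        shutil.copyfile(intact[0], bad)
    return corrupt


def demo() -> tuple[int, int, int]:
    """Simulate three storage locations, corrupt one copy, then audit and repair."""
    with tempfile.TemporaryDirectory() as root:
        payload = b"minutes of departmental meeting"
        expected = hashlib.sha256(payload).hexdigest()  # value recorded at ingest
        copies = []
        for name in ("disk_a", "disk_b", "tape_c"):  # hypothetical storage media
            path = Path(root) / name / "record.bin"
            path.parent.mkdir()
            path.write_bytes(payload)
            copies.append(path)
        copies[1].write_bytes(b"minutes of departmental meetinh")  # simulated bit rot
        intact, corrupt = audit_copies(copies, expected)
        repaired = repair(corrupt, intact)  # a real system would notify staff first
        still_bad = audit_copies(copies, expected)[1]
        return len(intact), len(repaired), len(still_bad)


if __name__ == "__main__":
    print(demo())
```

In the sketch the repair is automatic; the scenario itself asks only that staff be notified, so a production workflow would more likely raise an alert and let an operator approve the replacement.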

Scenario 26.2 Digitised Presentation Manifestations / Hybrid Catalogues of Paper based Government Records

User Scenario ID 26.2
Author Ashley Hunter, Tessella plc
Background Collections of Government records may span several years of accessions, during which time the producers of the material moved from creating paper-based records to digital records. To maintain accessibility across the 'divide', archives are digitising the paper manifestations to make them accessible alongside the born-digital material. Typically, the paper-based record is scanned (or photographed) to create a high-resolution Preservation Manifestation. At the same time a lower-grade presentation copy is created, along with other files including OCR'd text where available and specific technical metadata extracted during the digitisation process (page number, camera specification, creator, etc.).
Type of digital information Preservation Manifestation formats include, but are not limited to, TIF, JP2 and RAW (where this is a format specific to the camera manufacturer); Presentation formats include JPG, JP2 and PDF; OCR output is in the form of TXT, RTF or CSV; and metadata is in XML formats.
Link to sample data Tessella to provide a test set of typical files (not real, made up - likely the DROID test file corpora) for testing purposes
Threat(s) to the data The main purpose of the Archive is to provide long-term search and access of records for the various approved user communities. To this end the Archive must address the following issues. (1) Passive preservation: keeping the digital objects safe from 'bit rot' or 'data decay' on the primary storage media. (2) The Archive may want to provide multiple versions of its digital objects, rendered in different formats for different access purposes (a free low-grade manifestation and a commercially available full high-resolution manifestation); Archives can use Active Preservation methods, including file format migration, to provide these manifestation types. (3) Representation Information about the record's provenance and authenticity is provided through arrangement of the digital objects into a hierarchy of collections, with Descriptive Metadata provided according to agreed metadata schemas. Commonly these schemas are bespoke to each Archive, rather than standard schemas such as METS, MODS, EAD or PREMIS.
Designated Community Initially the submitting Government Department and any affiliated departments and organisations will have access to the material. Over time this access will be relaxed for some of the records, leading to wider or even public access for use by policy investigators, historians and the general public.
Preservation Technique (1) The standard approach to defend against bit rot is to keep multiple copies of the AIP on different storage media, and to periodically test the integrity of these copies against the fixity/checksum values recorded at the time of ingest. Where corruption is found, the system should notify the Archive staff so that the corrupted file can be replaced from one of the other AIP copies that has recently passed the integrity test. (2) Use file migration techniques to provide digital information in alternative file formats. Migration may itself be lossy, so quality assessment and validation must be applied to ensure that the appropriate level of quality is maintained following each format migration. Several migrations may be required over time to provide the Designated Community with the information that they want to access in a format suitable to their needs. (3) Provide methods for translating between metadata schema definitions, via code or XSLT, to enable exchange of networks of representation information. Facilitate further integration with catalogue collection systems through the OAI-PMH exchange protocol.
Usage Archive staff; staff of the submitting government department, via access requests; the public, via public access requests
Success Criteria Digital objects remain available to their designated community through 'Search' and/or 'Browse' functionality, and are directly accessible to these users in forms and ways that are meaningful to them and that can facilitate their re-use where allowed (e.g. secure downloads, scheduled reader-room deliveries, public internet).
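
Preservation technique (3) in the scenario above mentions translating between metadata schemas "via code or XSLT". A code-based crosswalk might look like the sketch below, which maps a bespoke digitisation record onto simple Dublin Core elements. The bespoke element names (RecordTitle, Operator, ScanDate and so on), the sample record and the mapping table are hypothetical and invented for illustration; only the Dublin Core namespace URI is a real, published value.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical crosswalk: bespoke element name -> Dublin Core element name.
CROSSWALK = {
    "RecordTitle": "title",
    "Operator": "creator",
    "ScanDate": "date",
    "FileFormat": "format",
    "PieceReference": "identifier",
}


def to_dublin_core(bespoke_xml: str) -> tuple[ET.Element, list[str]]:
    """Map known bespoke elements to Dublin Core; report those with no mapping."""
    source = ET.fromstring(bespoke_xml)
    target = ET.Element("record")
    unmapped = []
    for child in source:
        dc_name = CROSSWALK.get(child.tag)
        if dc_name is None:
            unmapped.append(child.tag)  # surfaced so that loss is visible, not silent
            continue
        element = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
        element.text = (child.text or "").strip()
    return target, unmapped


# Invented sample record, standing in for metadata captured during digitisation.
SAMPLE = """
<ScanRecord>
  <RecordTitle>Committee minutes, page 1</RecordTitle>
  <Operator>Digitisation Unit</Operator>
  <ScanDate>2011-10-25</ScanDate>
  <FileFormat>image/tiff</FileFormat>
  <PieceReference>EXAMPLE/1/1</PieceReference>
  <CameraSpecification>example camera</CameraSpecification>
</ScanRecord>
"""

if __name__ == "__main__":
    ET.register_namespace("dc", DC_NS)
    record, unmapped = to_dublin_core(SAMPLE)
    print(ET.tostring(record, encoding="unicode"))
    print("unmapped:", unmapped)
```

Returning the unmapped element names alongside the result is a deliberate choice: a crosswalk from a bespoke schema to a simpler standard one is usually lossy, and the Archive needs to see what was dropped (here, the camera specification) rather than lose it silently.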

Scenario 26.3 Scientific Datasets (ISIS Neutron and Muon Facility, STFC)

User Scenario ID 26.3
Author Ashley Hunter, Tessella plc
Background ISIS, the Neutron and Muon Source at Rutherford Appleton Laboratory in the UK and part of STFC, wanted to preserve the instrument data generated over its many years of operation. The instrument data has evolved over time as the instruments themselves have been updated and enhanced, resulting in various file formats specific to each instrument. A common data structure, known as ISIS RAW, was developed for these. Ad-hoc files may also be present to describe further metadata about the instrument or its operation, and temporary files (SAV file formats) are generated during the cyclic operation of the instruments for backup purposes. A separate catalogue system maintains representation information relating to the setup of each instrument, why it was used in a specific experiment, and by whom. This remains confidential for a 2-year period so that results can be derived by the initiating investigator before they are made publicly available for use by the wider research community.
Type of digital information Instrument files in SAV, ISIS RAW and NeXus (NXS). Ad-hoc formats include TXT.
Link to sample data Link to publicly available historic instrument data to be added here
Threat(s) to the data The main purpose of the Archive is to provide long-term search and access of records for the various approved user communities. To this end the Archive must address the following issues. (1) Passive preservation: keeping the digital objects safe from 'bit rot' or 'data decay' on the primary storage media. (2) ISIS may want to provide multiple versions of its digital objects, for example by aggregating older ISIS RAW and other ad-hoc files into one combined NXS format. (3) The network of Representation Information about the instrument data is held in the iCAT cataloguing system.
Designated Community Initially the research scientists who commission the work; after the 2-year closure period has passed, all the instrument data becomes publicly accessible.
Preservation Technique (1) The standard approach to defend against bit rot is to keep multiple copies of the AIP on different storage media, and to periodically test the integrity of these copies against the fixity/checksum values recorded at the time of ingest. Where corruption is found, the system should notify the ISIS support staff so that the corrupted file can be replaced from one of the other AIP copies that has recently passed the integrity test. (2) Use file migration techniques to provide digital information in alternative file formats. The MANTID software has been used to provide this migration pathway from the older ISIS RAW data formats to the newer NXS formats. (3) Provide methods for translating between metadata schema definitions, via code or XSLT, to enable exchange of networks of representation information. Facilitate further integration with catalogue collection systems through web services.
Usage ISIS Research staff, Client sponsoring research staff, and wider academic community following release as public access
Success Criteria Research datasets remain available to their designated community through 'Search' and/or 'Browse' functionality, and are directly accessible to these users in forms and formats that are meaningful to them and that can facilitate their re-use when allowed.
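
The 2-year closure period in the scenario above (and the longer closure periods in the government-records scenarios) amounts to a date-based access rule. The sketch below shows one way it could be expressed. It assumes the closure period is counted from the date the data was created, which the scenario does not state, and the function names and the leap-day convention are illustrative choices rather than ISIS or Archive policy.

```python
from datetime import date


def release_date(created: date, closure_years: int) -> date:
    """First day on which the record is open, `closure_years` after creation."""
    try:
        return created.replace(year=created.year + closure_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; open on 1 March
        # (an assumed convention for this sketch).
        return date(created.year + closure_years, 3, 1)


def can_access(created: date, closure_years: int, today: date, authorised: bool) -> bool:
    """Authorised users always have access; everyone else waits for release."""
    return authorised or today >= release_date(created, closure_years)


if __name__ == "__main__":
    run_date = date(2009, 10, 25)  # hypothetical date of an instrument run
    print(release_date(run_date, 2))
    print(can_access(run_date, 2, date(2011, 1, 1), authorised=False))
    print(can_access(run_date, 2, date(2011, 1, 1), authorised=True))
```

The same function covers the longer periods in the other scenarios by changing `closure_years`, for example to 50.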

-- AshHunter - 2011-10-25

Topic revision: r7 - 2011-10-26 - AshHunter