#1 Background Discussion on Known Testbed systems

CASPAR

The aims of the CASPAR testbeds arose from the SMART metrics (similar to APARSEN's). The specific metrics of concern here were:

  • Demonstrate a sound theoretical basis for the approach taken, including the compatibility with the OAIS Reference Model and related standards – all of which have been peer reviewed extensively in the standards process itself and also by practitioners of digital preservation in a great number of areas
  • Provide a practical demonstration by means of what may be regarded as “accelerated lifetime” tests. These should involve demonstrating the ability of the Framework and digital information to survive:
    • environment (including software, hardware) changes
    • changes in the Designated Communities and their Knowledge Bases
The result was a collection of evidence from the domains of science (STFC and ESA), cultural heritage (UNESCO) and contemporary performing arts (CIANT, IRCAM, INA and ULeeds). This evidence is available in D4104.

The testbeds were also linked to the threats which were identified through the PARSE.Insight surveys (thousands of responses from around the world and across domains).

| Threat | STFC | ESA | UNESCO | IRCAM | UnivLeeds | CIANT | INA |
| Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved | X |  | X | X | X | X |  |
| Non-maintainability of essential hardware, software or support environment may make the information inaccessible | X | X | X | X |  | X |  |
| The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity | X |  |  | X |  |  | X |
| Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future |  |  |  |  |  |  | X |
| Loss of ability to identify the location of data | Not addressed |  |  |  |  |  |  |
| The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future | X |  |  |  |  |  |  |
| The ones we trust to look after the digital holdings may let us down | Covered by section 4.3 in D4104 |  |  |  |  |  |  |


There are no specific tools simply to install and use. The CASPAR testbed is really a collection of techniques and examples to follow, which can thereby be applied to any digital object and any preservation technique.

-- DavidGiaretta - 2011-05-12


MIXED

DANS has developed a digital preservation service to migrate binary files into an intermediate XML format. Upon dissemination, the XML files are converted into a current binary file format. This infrastructure is named MIXED. DANS intends to incorporate the MIXED strategy into its "daily" digital archiving procedures. Part of this implementation project is the testing of the plug-ins and services.
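The MIXED approach (obsolete binary format in, durable intermediate XML, current binary format out) can be sketched as a minimal round-trip. This is an illustration only: the element names and the use of CSV as a stand-in for a binary tabular format are assumptions, not the actual MIXED schema or plug-in API.

```python
import csv
import io
import xml.etree.ElementTree as ET

def table_to_xml(csv_text):
    """Ingest-side migration: turn a tabular file (CSV stands in for a
    binary format here) into a self-describing intermediate XML document."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, *body = rows
    root = ET.Element("table")
    for row in body:
        record = ET.SubElement(root, "record")
        for name, value in zip(header, row):
            ET.SubElement(record, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")

def xml_to_table(xml_text):
    """Dissemination-side migration: convert the intermediate XML back
    into a current tabular format."""
    root = ET.fromstring(xml_text)
    records = root.findall("record")
    header = [f.get("name") for f in records[0].findall("field")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for record in records:
        writer.writerow([f.text or "" for f in record.findall("field")])
    return out.getvalue()

original = "id,title\n1,Report A\n2,Report B\n"
roundtrip = xml_to_table(table_to_xml(original))
assert roundtrip == original  # the property a MIXED-style test would check
```

A test of such a service would essentially assert this lossless round-trip property over a corpus of real files in obsolete formats.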


It looks as if this is a migration-based testbed technique: databases and tables to XML and back again. Is this a testbed? If it tests migration to XML and back again, what does it test? Databases with built-in queries or code/formulae?

-- DavidGiaretta - 2011-05-11

For DANS, MIXED is a relatively small component relevant to the day-to-day data management practice of a scientific data archive that curates a considerable number of data files in obsolete formats. The MIXED development project has delivered specifications, plug-ins and services. Whether the testing and incorporation of the service can be considered a testbed is debatable. Issues we are going to test and evaluate are: 1. the provenance metadata relevant for the process; 2. the quality of the plug-ins (if possible based on existing specs such as SIARD or ODF).

Given that DANS has one man-month in WP14, we thought the testing and reporting might be a suitable contribution to the APARSEN project. It would be nice to have more APARSEN partners involved, if possible. But we are open to suggestions to contribute in another way in the WP.

-- ReneVanHorik - 2011-05-12

Sorry for any confusion. MIXED got included on this page as Rene had offered it up for testing in WP14 and this was a convenient place to record it.

We think that MIXED is a digital preservation tool to test, rather than a testbed to test digital preservation tools or techniques in.

-- PaulineSinclair - 2011-05-12


Planets Testbed

The Planets Testbed provided a controlled environment and Corpus of 5000 sample digital objects. It allowed users to carry out experiments to test the suitability of preservation tools and workflows developed in Plato. Results of experiments carried out in the central instance of the Testbed were made publicly available making it possible to benchmark the outcomes of preservation processes.


Digital objects - documents only? Is this about significant properties? If so see PLATO comments

-- DavidGiaretta - 2011-05-11


Not only documents but also images and sound objects, and yes, it focused on significant properties.

-- BarbaraSierman - 2011-08-02


Planets Preservation Planning Tool (PLATO)

Plato is "a decision support tool that implements a solid preservation planning process and integrates services for content characterisation, preservation action and automatic object comparison in a service-oriented architecture to provide maximum support for preservation planning endeavours."

Both PLATO and the Planets Testbed have a common origin in the DELOS and Dutch Testbeds, however they have developed to meet different needs. The PLATO tool performs evaluation of migration tools guided by a given institution’s policies and strategies, and answers questions of the following kind:

  • Is this tool the most convenient one given the personnel constraints of my institution?
  • Does the usage of this tool for conversion of my content match my institution’s policies?
The Planets Testbed is designed to allow individuals to run objective experiments aimed at answering questions that are independent of institutional considerations and exclusively related to the intellectual properties of the digital objects, for example:
  • Which of these two tools preserves line spacing better?
  • How does the performance of this tool scale with the object size or with the number of objects?
N.B. Description of the differences is taken from the Planets/OPF wiki.
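An experiment of the first kind can be sketched as a small harness that runs two migration tools over a corpus and scores how often a measured property survives. Everything here is a placeholder: the two "tools" are stubs and the `line_spacing` property extractor is hypothetical, standing in for a real characterisation service.

```python
# Sketch of a testbed-style objective experiment: which of two tools
# preserves a measured property ("line spacing") more often across a corpus?

def tool_a(obj):
    # Hypothetical migration tool that happens to keep line spacing intact.
    return dict(obj)

def tool_b(obj):
    # Hypothetical migration tool that resets line spacing to a default.
    out = dict(obj)
    out["line_spacing"] = 1.0
    return out

def preservation_rate(tool, corpus, prop):
    """Fraction of corpus objects for which `prop` is unchanged after migration."""
    kept = sum(1 for obj in corpus if tool(obj)[prop] == obj[prop])
    return kept / len(corpus)

# A toy corpus of 30 objects with varying line spacing.
corpus = [{"id": i, "line_spacing": 1.0 + (i % 3) * 0.5} for i in range(30)]

score_a = preservation_rate(tool_a, corpus, "line_spacing")
score_b = preservation_rate(tool_b, corpus, "line_spacing")
assert score_a >= score_b  # tool A preserves the property at least as often
```

Publishing such scores for a shared corpus is what makes the central Testbed instance usable for benchmarking: the experiment depends only on the objects and the measured property, not on any institution's policies.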


In a paper at iPRES I presented the case that these kinds of testbeds are to do with Authenticity. In the revised OAIS this comes under the concept of Transformational Information Properties.

That concept is applicable to data, but it's not clear how this particular tool can be used for anything other than rendered objects such as images and text.

Also note that it does not capture the semantics associated with, e.g., the line spacing.

-- DavidGiaretta - 2011-05-11

It should be possible to define Significant Properties / Transformational Information Properties for self-describing data formats like NXS and HDF4/5 in order to validate the results of a transformation from one format to another. This might include simple tests such as column and row counts to perform some level of automated validation of the transformation process.

-- AshHunter - 2011-06-07
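The automated check described above can be sketched as a comparison of extracted structural properties before and after migration. The property names and the dict-based stand-in for HDF/NeXus metadata extraction are assumptions for illustration; a real implementation would read them from the files via a characterisation tool.

```python
def extract_properties(dataset):
    """Stand-in for a characterisation step: pull structural Significant
    Properties out of a (here dict-based) tabular dataset."""
    return {
        "column_count": len(dataset["columns"]),
        "row_count": len(dataset["rows"]),
        "column_names": list(dataset["columns"]),
    }

def validate_migration(source, target, properties=("column_count", "row_count")):
    """Compare the chosen Transformational Information Properties of the
    source and the migrated dataset; return the list of mismatches."""
    before = extract_properties(source)
    after = extract_properties(target)
    return [p for p in properties if before[p] != after[p]]

raw = {"columns": ["time", "counts"], "rows": [[0, 10], [1, 12]]}
nxs = {"columns": ["time", "counts"], "rows": [[0, 10], [1, 12]]}
assert validate_migration(raw, nxs) == []  # structural properties survived
```

An empty mismatch list gives only a weak, structural assurance; as noted above, it says nothing about whether the semantics of the values survived the transformation.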


Safety Deposit Box (SDB) Test systems

Tessella's SDB includes Preservation Planning functional workflows that incorporate common characterisation tools such as DROID, JHOVE and the Registry (a.k.a. PCR / PRONOM), and a variety of wrapped migration tools from both the commercial and public-domain sectors. Demonstrator systems can be deployed to provide access to these workflows and tools in order to allow assessments of various preservation policies and strategies.


Same comment as for Planets testbed

-- DavidGiaretta - 2011-05-11

SDB primarily uses migration actions to support its preservation activities. This is supported by the automatic gathering of technical metadata including significant properties, which are used to validate the migration actions. Additionally, through the KEEP EU project, we have investigated the feasibility of providing an emulation strategy for accessing digital objects held within SDB. We recognise the importance of recording metadata (Representational Information) for the digital objects that we store within SDB, and so maintain the descriptive and administrative metadata supplied with the digital objects by the Designated Community, as well as gathering technical metadata automatically about the digital objects on ingest.

-- AshHunter - 2011-05-12

Until recently the main usage of SDB has been very document-centric, as we have primarily supplied the system to memory institutions. However, recent work with STFC's ISIS pulsed neutron and muon team has shown that the system is capable of preserving their data objects, and of performing file migrations from the older .RAW format to the newer .NXS (NeXus) file format whilst preserving the metadata supplied about those digital objects.

-- AshHunter - 2011-05-12


SHAMAN Demonstrator

The SHAMAN Demonstrator provides a specific implementation of the flexible and adaptable SHAMAN framework for long-term preservation in the context of memory institutions. Motivated by the scenarios and workflows of the DNB and the other partners, the SHAMAN demonstrator will integrate and evaluate new technologies.

ISP1

The ISP1 software demonstrator prototype addresses the needs emerging from the memory-institution domain. The actors, processes and workflows in this domain have been analysed in order to steer its development. The ISP1 prototype addresses three different scenarios:
  • Indexing and archiving book-like publications in depot libraries
  • Indexing and archiving digitisations
  • Scientific publishing and archiving heterogeneous interlinked material
The implementation of the ISP1 prototype utilises document processing tools (Xeproc), SOA and grid (iRODS) technology to showcase and facilitate the flexibility and adaptability of the SHAMAN framework. The end result is an integrated software demonstrator that is accessible for test purposes via the SHAMAN homepage. It allows users to evaluate the functionality provided by the demonstrator using a test collection consisting of book-like publications.

The functionality provided by the ISP1 demonstrator covers the five phases of a digital object's lifecycle (creation, assembly, archival, adoption and reuse) and mainly focuses on the requirements of the following key user groups:

  • archivists and librarians managing digital collections;
  • digital records managers in the heritage and/or public sector; and
  • managers and administrators of digital libraries and institutional repositories
As a proof-of-concept implementation, the ISP1 demonstrator was limited to the processing of PDF documents and JPG images. However, during evaluation sessions with the key user groups, the data migration, access and authentication, interoperability, HW/SW independence, search capability, etc. of the ISP1 prototype were demonstrated and highly appreciated by the audience.

ISP3

The SHAMAN e-Science Data Acquisition and Harmonisation Testbed is a proof-of-concept prototype of an experimental nature for the acquisition and preservation of e-Science data, with the following aims: (i) capture a large spectrum of e-Science data, such as sensor data, scientific data, experimental data and mathematical simulation data, as well as information about the respective workflows (context information); (ii) define flexible workflow-based data processing models, allowing for the customisation of acquisition and preservation processes; (iii) reuse components, based on a SOA architecture; and (iv) use federated data grids for the storage of data, for infrastructure independence and scalability. For those purposes, three scenarios are addressed:
  • Acquisition and Preservation of Sensor Data, which addresses the capturing and preservation of data originated from monitoring systems used in structural safety of, for example, dams (collaboration with the Portuguese Civil Engineering Laboratory);
  • Acquisition and Preservation of Scientific Workflows, which addresses the capturing and preservation of workflows for the production and execution of experiments (cooperation with the MyGrid project, funded by the JISC/UK);
  • Acquisition and Preservation of Data and Simulations, which addresses the acquisition and preservation of data produced in the context of particle physics experiments and mathematical simulations (cooperation with Laboratory of Instrumentation and Particles, Portugal).
Due to the challenges of acquiring and preserving data in such scenarios, the e-Science Testbed focuses on the phases that precede the archival of data, dealing with the capture of the context surrounding the data creation/use, and with the creation of archival packages containing all the information needed to interpret the data in the future. For that, we will demonstrate how to define and manage relevant flexible processes for data harvest (using SOA-based techniques) combined with metadata registry (http://metadata-stds.org/11179/) services to support the contextual descriptions.

Testbed LDP

Testbed LDP, 2008-2011, is financed by the European Regional Development Fund (ERDF) and the County Council of Norrbotten in Sweden.

The project is a first step towards a flexible testbed, built on loosely coupled modules which are easily exchanged as technology evolves. The testbed is to reflect the entire OAIS model, i.e. cover testing of digital information from delivery to, for example, an archive, through management of the information in the archive, to a future customer gaining access to the information.

-- JohnLindstrom - 2011-05-12

I assume the above refers to the OAIS Functional Model but what about the Information Model?

-- DavidGiaretta - 2011-05-12

Is the Testbed LDP the Testplatform described here: http://www.ltu.se/org/srt/Centrumbildningar/Centrum-for-langsiktigt-digitalt-bevarande-LDB/Vara-projekt/Testplattform?l=en ? This testplatform only seems to be able to test how well individual websites are archived by the system, which doesn't match up with your description.

I'm puzzled as I can only find one reference to "Testbed LDP" on the Luleå University of Technology's website and that's in a webpage that no longer seems to exist (but is still in Google's cache). Can you provide a link to more information about the Testbed LDP, please?

-- PaulineSinclair - 2011-05-18

The whole LTU web site is being re-developed and it seems like the LDB part is now missing...

-- JohnLindstrom - 2011-05-19

The web site is now being launched again...and here is a link to read more on the Testbed/platform: http://www.ltu.se/centres/Centrum-for-langsiktigt-digitalt-bevarande-LDB/Vara-projekt/Testplattform?l=en

-- JohnLindstrom - 2011-05-26

The testbed/platform is aimed at "practical technical tests of existing methods in order to find best practices in digital preservation and access. We are now building a technical platform for these tests, following the OAIS-model. Among others, certain functions from the projects above will be tested at the platform. It will also be possible for other parts to test their solutions on the platform". There is not a lot of open information on how the testbed/platform integrates with the OAIS model. I will look into this. An architecture figure is available at: http://www.ltu.se/centres/Centrum-for-langsiktigt-digitalt-bevarande-LDB/Vara-projekt/Testplattform/Arkitektur-1.52939?l=en

If the testbed/platform is not in line with what is looked for - then we can just remove the entry...

-- JohnLindstrom - 2011-05-26


SCAPE Testbeds

From the DoW of the SCAPE project "The main goal of the Testbeds is to assess the large scale applicability of the SCAPE Preservation Platform and the preservation components developed within the project. Using these software components, it creates test environments for the different application scenarios [ Web Content, Scientific Data Sets, Large Scale Digital Repositories -BS] and complex large scale preservation workflows which will shed new light on existing preservation services and the improved or new components developed within the project." -- BarbaraSierman - 2011-08-02

Kopal Library for Retrieval and Ingest (koLibRI)

koLibRI was originally developed in the kopal project. It is a framework for the integration of a long-term archiving system into the infrastructure of an institution. In kopal, it was used to integrate the IBM Digital Information Archiving System (DIAS) into the DNB infrastructure. In particular, it organizes the creation and ingest of archival packages into DIAS and provides functionalities to retrieve, manage and migrate these packages. koLibRI has been designed with the intention to be re-usable as a whole or in parts within other contexts, too.

In the meantime koLibRI has been enhanced by further digital preservation projects (DP4lib, TextGrid, LuKII, SHAMAN) and was also used as a test environment there. Its modular design and its use of standards like OAIS, METS, URN and LMER enable koLibRI to operate in different environments. Besides this flexibility, koLibRI allows the integration of third-party modules into various workflows.

In the DP4lib project, for example, the File Information Tool Set (FITS) was integrated and tested in koLibRI. FITS uses many different metadata extraction and validation tools and consolidates their results. This has substantially increased the number of supported file formats and the quality of the generated technical metadata. For ingest this means, in short, that koLibRI generates a METS-conformant XML file from the metadata that was delivered with the content object and that was generated by FITS. Then koLibRI bundles the content object and the METS file into an archive file (.zip or .tar) and delivers the resulting Submission Information Package (SIP) to a long-term archiving system such as DIAS.
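The packaging step just described (metadata XML plus content object in one archive file) can be sketched as follows. The METS element layout is heavily simplified and the file names and metadata keys are invented, so this illustrates the pattern, not koLibRI's actual METS profile.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def build_sip(content_name, content_bytes, tech_metadata):
    """Bundle a content object and a (much simplified) METS-like metadata
    file into a zip archive, yielding a Submission Information Package."""
    mets = ET.Element("mets")
    amd_sec = ET.SubElement(mets, "amdSec")
    for key, value in tech_metadata.items():
        # One techMD entry per consolidated technical-metadata item.
        ET.SubElement(amd_sec, "techMD", ID=key).text = str(value)
    file_sec = ET.SubElement(mets, "fileSec")
    ET.SubElement(file_sec, "file", href=content_name)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as sip:
        sip.writestr("mets.xml", ET.tostring(mets, encoding="unicode"))
        sip.writestr(content_name, content_bytes)
    return buffer.getvalue()

# Hypothetical content object and FITS-style technical metadata.
package = build_sip("report.pdf", b"%PDF-1.4 ...", {"format": "PDF 1.4"})
with zipfile.ZipFile(io.BytesIO(package)) as sip:
    assert sorted(sip.namelist()) == ["mets.xml", "report.pdf"]
```

The archiving system on the receiving end can then unpack the SIP and rely on the bundled metadata file to interpret the content object.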

koLibRI has also been used in the context of implementing and testing interoperability between kopal and LOCKSS in the LuKII project. The LuKII installation has focused on the analysis and testing of conversion tools and migration scenarios between kopal and LOCKSS. In this scenario, the koLibRI MigrationManager is used. The MigrationManager is a koLibRI component which manages and executes migrations; both the objects to be migrated and the results of the migration can be described. Depending on those objects and an institution's individual requirements, individual migration workflows can be configured and different migration tools can be used.

Although the projects around koLibRI gathered their experience with library publications as input material, koLibRI is not in principle limited to a special kind of digital information. -- StefanHein - 2012-01-16

#2 Common Functionality Shared by Testbed Systems

In this section, we examine the common functionality that can be derived from analysing the above testbed systems, to provide a general view of the constituent parts of a digital preservation testbed system.


My first impression is that all but CASPAR use migration as the only strategy, together with significant properties/characterisation, which puts these under the heading of Authenticity for rendered objects.

-- DavidGiaretta - 2011-05-11

Even though not explicitly stated above, SHAMAN also uses Multivalent/Fab4 browser for rendering digital objects in diverse formats. Hence media engines are updated/migrated instead of migrating the digital objects themselves. -- HolgerBrocks - 2011-05-25


-- AshHunter - 2011-04-13

-- AshHunter - 2012-05-10

Topic revision: r1 - 2012-05-10 - AshHunter