Upcoming APARSEN results will be made available on this site continuously.


The first part of the report is devoted to the state of the art. We analyse the main international projects in the field, as well as the standards, recommendations and guidelines for keeping and preserving digital objects, with special attention to the management of provenance and authenticity. The state of the art is completed by an extensive reference list and by an appendix where all the major projects in the area, their goals and their results are individually presented. On the whole, the state of the art testifies that significant scientific contributions have been made, and that a good level of theoretical formalization has been achieved in this area, even if a large gap still divides the mostly theoretical results of the scientific community from the actual practices carried out in most repositories. This gap needs to be filled with more concrete guidelines and proposals.
Acting in this direction, we propose a model of the digital object lifecycle, in order to identify the main events that affect authenticity and provenance and to investigate in detail, for each of them, the evidence that has to be gathered in order to adequately document the history of the digital object. The crucial problem to be addressed is, of course, interoperability: along its lifecycle the digital object may go through several changes of custody, and therefore the authenticity evidence needs to be managed and interpreted by systems, both keeping and preservation systems, which may differ from the ones that gathered it. Thus, the authenticity evidence needs to comply with a common standard. Achieving such a standard is quite an ambitious goal, but some basic guidelines can be developed.
The model and the guidelines proposed in this report may be considered a preliminary step in this direction, and a basis from which to derive operational guidelines to improve the current (and often very limited) practices in managing authenticity and provenance in keeping and preservation systems. The report also documents other important activities that have been carried out as part of APARSEN WP24. Interesting original results are presented on provenance interoperability and reasoning (described in detail in ID2401), including a discussion of the mapping between different provenance models and the proposal of a set of relevant reasoning rules for reducing the amount of provenance information that has to be explicitly stored and for making corrections easier. A further discussion is devoted to secure logging mechanisms, a specific aspect of the problem with a significant impact on managing authenticity.
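The idea of reasoning rules that reduce explicitly stored provenance can be illustrated with a minimal sketch. The assumption here (the object names, the single `stored_derivations` map, and the rule itself are illustrative, not taken from ID2401) is that only direct derivation edges are recorded, and longer derivation chains are inferred on demand rather than stored:

```python
# Hypothetical example: only direct derivation edges are stored;
# the full derivation history is inferred by a simple reasoning rule
# (transitivity of derivation), so indirect edges need not be recorded.
stored_derivations = {
    "report_v3": "report_v2",   # each object -> its direct source
    "report_v2": "report_v1",
}

def derivation_chain(obj):
    """Infer the complete derivation history by following direct edges."""
    chain = []
    while obj in stored_derivations:
        obj = stored_derivations[obj]
        chain.append(obj)
    return chain

print(derivation_chain("report_v3"))  # ['report_v2', 'report_v1']
```

A rule like this also makes corrections easier: fixing one wrong direct edge automatically corrects every inferred chain that passes through it.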

In the original definition given in CASPAR, Authenticity Protocols (APs) are the procedures to be followed in order to assess the authenticity of a specific type of Digital Resource (DR). The CASPAR definition is quite general and does not refer to a specific authenticity management model. As part of the activities of APARSEN WP24 we have formalized an authenticity management model based on the principle of performing controls and collecting authenticity evidence in connection with specific events of the DR lifecycle. This makes it possible to trace back all the transformations the DR has undergone since its creation that may have affected its authenticity. The model is complemented by a set of operational guidelines for setting up an Authenticity Management Policy, i.e. for identifying the relevant transformations in the lifecycle and specifying which controls should be performed and which authenticity evidence should be collected in connection with those transformations. To formalize the policy we have indeed resorted to CASPAR's AP definition, adapting and extending it to integrate it into our authenticity management model. In our methodology the AP therefore becomes the procedure to be followed in connection with a given lifecycle event to perform the controls and to collect the Authenticity Evidence Record (AER) as specified by the authenticity management policy. Accordingly, the original content of this deliverable, which was aimed at "implementing and testing an authenticity protocol on a specific domain", has been adapted and extended to encompass the whole scope of the authenticity evidence management guidelines. The aim of the deliverable has therefore become to test the model and the guidelines at the operational level, when dealing with the concrete problem of setting up or improving an LTDP repository in a given environment, so as to arrive at the definition of an adequate authenticity management policy.
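The core of the model, running an AP at a lifecycle event to perform controls and collect evidence, can be sketched as follows. This is a minimal illustration, not the APARSEN specification: the record fields, the single checksum control, and all names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib

@dataclass
class AuthenticityEvidenceRecord:
    """Evidence collected when a lifecycle event occurs (hypothetical schema)."""
    object_id: str
    event: str           # e.g. "ingest", "migration", "change of custody"
    timestamp: str
    checksum: str
    controls_passed: bool

def authenticity_protocol(object_id, event, content, expected_checksum=None):
    """Perform the controls and collect the evidence for one lifecycle event."""
    checksum = hashlib.sha256(content).hexdigest()
    # A single illustrative control: fixity check against a prior checksum.
    ok = expected_checksum is None or checksum == expected_checksum
    return AuthenticityEvidenceRecord(
        object_id=object_id,
        event=event,
        timestamp=datetime.now(timezone.utc).isoformat(),
        checksum=checksum,
        controls_passed=ok,
    )

aer = authenticity_protocol("doc-42", "ingest", b"payload")
print(aer.controls_passed)  # True: no prior checksum to compare against
```

Accumulating one such record per relevant event is what allows the transformations of the DR to be traced back from its creation onward.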
Moreover, instead of concentrating on a single environment, we have decided to extend the analysis to multiple test environments provided by APARSEN partners. Shifting to practical ground and facing the actual problems that arise in the management of a repository has indeed been an important move towards filling the gap that still divides the mostly theoretical results of the scientific community from the actual practices carried out in most repositories, and towards reducing the fragmentation among different approaches that prevents interoperability. The case studies have confirmed the validity of this approach: on the one hand, the guidelines proved easy to apply and well understood in all the test cases; on the other hand, the simple yet rigorous concepts introduced by the model may provide a common ground for managing authenticity evidence and for exchanging it among different systems. In at least one of the case studies, the guidelines have been applied to their full extent, i.e. from the preliminary analysis, to the identification of the relevant lifecycle events, to the detailed specification of the authenticity evidence to be collected, to the formal definition of the authenticity management policy, that is, the specification of the AP. In all cases, referring to the guidelines provided valuable help, both in pointing out weaknesses in the current practices and in suggesting reasonable ways to fix them.

Quality assurance of scientific information is a precondition for, and an integral part of, digital long-term archiving. To make digital long-term archiving succeed, organizations from the fields of science, culture and business cooperate within the EU project APARSEN. The objective of this project is to set up a "long-lived Virtual Centre of Digital Preservation Excellence". Securing permanent access to quality-assured research data in reliable repositories is a central concern of APARSEN. This report documents ideas, developments and discussions concerning the quality assurance of research data. Focus is placed on actions taken by science, e-infrastructures and publishers to assure the quality of research data. Such actions are documented and classified in this report, and future fields of research are then identified on that basis.

This document reports on the work which has been undertaken in support of the European Framework for Audit and Certification of Digital Repositories, which was initiated by the European Commission's unit which funds APARSEN. In negotiation, this work was integrated into the APARSEN project and this document was made part of the D33.1 deliverable. The work undertaken for ISO 16363 and for DIN 31644 is reported separately under the various headings. The DSA (Data Seal of Approval) process took place separately, prior to, and independently of, this project, and is not reported in detail here.




In recent years, the rapid growth of scientific and non-scientific digital data has resulted in an increasing number of digital objects and resources that have to be managed, creating a new set of opportunities and challenges for science and culture in general. The possibility of accessing massive amounts of scientific and cultural data in digital format, the increasing linkage between authors and their publications, and the development of new and much more powerful metrics for assessing the impact of scientific production are only some of the opportunities created by this data-intensive environment. However, this scenario has also led to the emergence of new challenges, such as digital preservation, data integration, quality assessment and provenance. These challenges are magnified in global contexts where resources are distributed across systems and standards, and the movement of data across disciplines and organizations is very intensive. This imposes the need for solutions that allow digital resources to be identified in a global and interoperable way across these boundaries, enabling different systems to communicate and operate together efficiently. One area of interoperability that has scarcely been investigated is that between identifiers, and particularly persistent identifiers (PIs). Since different kinds of identifiers are in use across different stakeholder communities and systems, and multiple identifiers can be available and used within the same system, a reasonable solution is to guarantee interoperability across different identifier systems as well as to develop services common to more than one system. This report investigates the interoperability issues between PIs and proposes a general Interoperability Framework (IF) as a starting point for designing solutions to support interoperability.
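One basic ingredient of such an interoperability layer can be sketched as a common resolution service that dispatches on an identifier's scheme, so that clients need not know each identifier system's own rules. This is an illustrative sketch only; the resolver functions, the registry, and the mapping to resolver URLs are assumptions, not the IF proposed in the report.

```python
# Hypothetical sketch: per-scheme resolvers registered behind one
# common resolve() service, so identifiers from different PI systems
# can be handled uniformly.
def resolve_doi(identifier):
    # "doi:10.1000/xyz" -> "https://doi.org/10.1000/xyz"
    return "https://doi.org/" + identifier[len("doi:"):]

def resolve_urn(identifier):
    # URNs (e.g. urn:nbn:...) are passed whole to a URN resolver.
    return "https://nbn-resolving.org/" + identifier

RESOLVERS = {"doi": resolve_doi, "urn": resolve_urn}

def resolve(identifier):
    """Dispatch to the resolver registered for the identifier's scheme."""
    scheme = identifier.split(":", 1)[0].lower()
    if scheme not in RESOLVERS:
        raise ValueError("no resolver registered for scheme: " + scheme)
    return RESOLVERS[scheme](identifier)

print(resolve("doi:10.1000/xyz"))  # https://doi.org/10.1000/xyz
```

New identifier systems can then join the framework by registering one resolver, without any change to the clients that call `resolve()`.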

-- VeronikaPraendlZika - 2012-10-15
