Common Vision for Digital Preservation

Outline of how the various aspects of the common vision come together

APARSEN’s work programme is structured to allow us to focus on small, specific aspects of digital preservation, so that we can break the work into work packages of fixed length, each with deliverables. The alternative would be to have a small number of work packages, each lasting essentially the whole of the project, with one big deliverable at the end; however, had we proposed this, we would not have been funded.

Nevertheless, it is clear that these various areas are related, and so we group them into “topics”: trust, sustainability, usability and access. This allows us to bring together what we believe are closely related pieces of work under a common topic. However, the topics are themselves related, and so we overlap them in stages so that we can digest them in manageable pieces.

The purpose of this document is to remind ourselves of the strategy which guided the structure of the project, and then to map out in more detail the steps of the integration in terms of:

  • how the topics can fit together (where appropriate)
  • how the work feeds into the VCoE, and
  • where the gaps are for future research.

Before beginning, we remind ourselves that, although APARSEN is large, it cannot address everything. The specific work packages were therefore constructed based on the members’ interests, which are in turn, we believe, guided by a natural prioritisation in terms of the maturity of the research in each area. In other words, the areas we do not address are either sufficiently well understood or so little understood that they must be the subject of further research over the next several years before they can be dealt with.

Our aims are therefore to carry out these focused pieces of research, identifying, or very often putting in place, the common ground, while showing how these fit into the well-accepted pieces of work and pointing the way to the required new areas of research.

The next subsections summarise the way in which the topics fit together. After that there are more detailed descriptions of how the topics are integrated in stages. At this point the Trust topic is complete and Sustainability is almost complete and so quite a lot can be said about these stages of the integration. The other topics and stages of integration are left as placeholders but we should try to put in our best guesses.

Trust

If one is to preserve digitally encoded information then one must be confident that this is going to be done successfully. There are various aspects to this. One must be able to deal adequately with the authenticity of the object, supported by provenance which may come from many sources (WP24). In addition, one must be able to choose the appropriate tools, ones for which there is adequate evidence of their effectiveness for the type of objects which one is trying to preserve (WP14 and WP16). One may also seek third-party views both on the quality of the information (WP26) and on the quality of one’s ability to successfully preserve the information (WP33).

Sustainability

Preserving digitally encoded information requires resources, supplied continuously (although probably unevenly) over a lengthy period. This requires that we have some estimates of the costs (WP32) and also the balancing business cases and related benefits (WP36) which justify those costs. We can seek to reduce costs by looking at storage options (WP23) and, because it may be possible to reduce the costs to any one repository by sharing them, through the use of shared preservation services (WP21).

Usability

Just as one can try to reduce costs, one can also try to increase benefits by increasing the usability of the preserved digitally encoded information (WP25). Related to this is the ability to use large amounts of information, where the issue of scalability comes to the fore (WP27). Two key related aspects must accompany the above considerations: access by those who wish to use the information, and trust in those who look after it.

Access

One must be able to get to the digital object, which requires identifying the object according to various criteria. This is an area we do not address, since there are so many transitory methods and it is the subject of much research. Having selected the object of interest, which normally means obtaining a pointer, one must then resolve this pointer to find the location, for example via an internet address. Thus we must rely on mechanisms for resolving such identifiers which persist over time (WP22). Those who look after the digitally encoded information must also take care that the access rights associated with the object are respected; otherwise the repository itself may be under some legal and financial threat. This requires that the governance and data policies of the repository and the digital rights associated with the data are understood and respected over time (WP31 and WP35).
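The resolution step described above can be illustrated with a minimal sketch. It assumes a purely hypothetical in-memory resolver; the class name `PidResolver`, the identifier `example:dataset-001` and the `example.org` addresses are invented for illustration and do not correspond to any particular identifier system studied in WP22. The point it shows is that the identifier stays fixed while the location it resolves to may change over time.

```python
from dataclasses import dataclass, field


@dataclass
class PidResolver:
    """Toy resolver (hypothetical): persistent identifier -> current location."""

    _locations: dict = field(default_factory=dict)

    def register(self, pid: str, location: str) -> None:
        # A persistent identifier is assigned once and never reused.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = location

    def relocate(self, pid: str, new_location: str) -> None:
        # The object moves; only the resolver's record changes, not the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid: str) -> str:
        # Return the location currently recorded for the identifier.
        return self._locations[pid]


resolver = PidResolver()
resolver.register("example:dataset-001", "https://repo-a.example.org/objects/001")
first = resolver.resolve("example:dataset-001")

# Later, the holdings are handed over to another repository.
resolver.relocate("example:dataset-001", "https://repo-b.example.org/archive/001")
second = resolver.resolve("example:dataset-001")
```

Anyone holding the identifier continues to reach the object after the move, provided that the resolver itself is sustained over time.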

INTEGRATION

As indicated above there is, we believe, a consistent narrative connecting the various work packages providing a rationale for the structure of the project. However the mechanism for integration must be made explicit. By integration we mean that we should make the interconnections between the various concepts explicit so that we can identify gaps and the future research areas. Clearly not everything has a direct overlap.

TRUST INTEGRATION

With Audit and Certification as the central aspect, elements of Authenticity and Provenance integrate to provide key evidence. Similarly, evidence about the effectiveness of tools, while not directly applicable to audits, should nevertheless contribute to evidence that preservation plans are credible. Reputation and data quality are somewhat separate from digital preservation but are important for trust and its effect on the demand for use of the information, and hence for sustainability.

See here for further information about TRUST common vision

Use in the VCoE:

  1. Advice and training on improving trustworthiness with respect to the European Framework
  2. Advice on what evidence to seek when presented with claims about digital preservation
  3. Advice and training on the fundamentals of preservation including potential risks and available solutions
  4. Recommendations on tools to use first when wishing to preserve various types of digitally encoded information, including working with and maintaining evidence about authenticity

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. application of provenance mapping and rules to the tracing of authenticity evidence across large numbers of generations of large numbers of objects
  2. secure logging, which is important for confidence in the evidence which is presented and needs to be converted into a practical method
  3. data quality, which is very far from being generally solved
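To make the secure logging gap more concrete, the sketch below shows one well-known building block, a hash-chained log, in which each entry records the hash of the previous entry so that later alteration of an earlier entry can be detected. This is offered only as an illustration under stated assumptions: the function names and the example events are hypothetical, it is not presented as the method developed in APARSEN, and a practical method would also need to deal with issues such as key management and trusted storage of the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(event: str, prev_hash: str) -> str:
    # Hash the event together with the previous entry's hash.
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: str) -> None:
    # Link the new entry to the hash of the last entry in the log.
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log: list) -> bool:
    # Recompute every hash in order; any edited entry breaks the chain.
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "ingest object A")
append_entry(log, "migrate object A to new format")
intact = verify(log)

# Simulate tampering with an earlier provenance event.
log[0]["event"] = "ingest object B"
tampering_detected = not verify(log)
```

The chain only shows that the recorded sequence has not been altered since it was written; it says nothing about whether the events were recorded truthfully in the first place.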

SUSTAINABILITY INTEGRATION

Analysing sustainability must include costs and benefits. In addition costs will depend on the use of external services and storage costs.

Use in the VCoE:

  1. Advice and training on how to make business cases
  2. Advice on which cost model is applicable to a specific repository, and explanations of cost parameters
  3. Recommendations on services to use when wishing to preserve various types of digitally encoded information
  4. Recommendations on storage solutions

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. More specific cost models for more specific types of repositories
  2. Better quantification of benefits and their evolution over time, and ways to enhance the benefits.
  3. Updates on developments in storage technology

TRUST and SUSTAINABILITY INTEGRATION

  1. The continued funding of a repository should be closely linked to the trust in the ability of a repository to preserve its holdings, and hence to the European Framework for audit and certification if the importance of certification is widely accepted by funders.
  2. The costs which affect sustainability will be affected by the choice of tools and an understanding of associated risks.
  3. Benefits derived from data holdings will have some dependence on the trust which potential users have in the authenticity and quality of those holdings.
  4. The Preservation Services used by a repository will affect the trustworthiness of that repository, in that those services are likely to be under different management. Insofar as those services are likely to rely on their own holdings of information, which need to be preserved, they may also be subject to certification with respect to preservation.

Use in the VCoE:

  1. Improve the advice and training on enhancing the benefits which may be derived from digital holdings, factoring in the advantages of additional trust balanced against the costs, e.g. of audit and the subsequent implementation of improvement plans, which might include improved provenance tracking and other authenticity evidence, and capturing evidence of the quality of the holdings.

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. Quantification of costs of factoring in trust issues.

USABILITY INTEGRATION

Use in the VCoE:

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. ….

TRUST, SUSTAINABILITY and USABILITY INTEGRATION

Use in the VCoE:

  1. .

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. .

ACCESS INTEGRATION

Persistent Identifiers are an essential building block for enabling effective and efficient technical solutions and for supporting the creation of value-added services like:

1) Data and information Access, Search and Navigation

2) Fast, large-scale and decentralized Data Sharing & Reuse

3) Effective Linkage of data and information across repositories

4) Fine-grained Access Control

5) Data and information Quality assessment

6) Reputation assessment & Citation indexes

7) Impact and ROI assessment (reliable research outputs beyond the scope of published literature)

8) Ownership management for data and scholarly content (citability)

Use in the VCoE:

  1. Defining a common agenda among the key stakeholders to ensure that a coordinated ecosystem of identifiers can be built and that an infrastructure for interoperability between existing solutions can be implemented.
  2. Fostering synergies between research communities and the private commercial sector.
  3. Planning interventions to promote awareness, dissemination and education activities aimed at expanding and reinforcing PI knowledge and skills.

Gaps:

There are obvious gaps, requiring further research, in terms of
  1. The Persistent Identifiers landscape is a very fragmented environment where solutions are orchestrated by very few parties. This has led to a lack of consensus and coordination between parties in finding common interoperable solutions. This calls for coordination and agreement on common needs and objectives.
  2. Financial sustainability is only recently becoming an issue in the digital identifier ecosystem, and traditional funding schemes appear to be inadequate to support sustainable solutions. This calls for a business model to guarantee the long-term sustainability of PI solutions.
  3. There is a low level of awareness within many stakeholder communities about the benefits of Persistent Identifiers for the creation of value-added services on top of scientific data and content. This calls for awareness and skills among the relevant stakeholders.

TRUST, SUSTAINABILITY, USABILITY and ACCESS INTEGRATION

Use in the VCoE:

Gaps:

There are obvious gaps, requiring further research, in terms of

Mechanisms for reaching Common Vision for Digital Preservation research

  • Glossary - based on OAIS with extensions, additions etc
    • Position various projects with OAIS terminology
  • Pick up various topics from organisations' research info and place in common terminology - add new ideas to general list
    • Reflect this back to organisations and see if there is agreement
  • Collect gaps identified from tools and services lists and evidence gathering
  • Define a common agenda, which should include:
    • prioritized objectives and measures
    • policies and responsibilities
    • dissemination aspects and strategies
    • interoperability challenges at different levels (not only technical but also political, social, economic...)
    • temporal priorities

Additional Info



  • APARSEN_WP11_integration.jpg
Topic revision: r9 - 2012-12-06 - BarbaraBazzanella
 