UKDA Scenarios

User Scenario ID: 33
Author: Sharon Bolton (UK Data Archive); Contact: Hervé L’Hours (UK Data Archive)
Background: The UK Data Archive is the curator of the largest collection of digital data in the social sciences and humanities in the United Kingdom. The Archive holds several thousand datasets relating to society, both historical and contemporary. When files are converted to standard preservation (and access) formats, a standard approach is required at ingest so that processing staff and, subsequently, users can have confidence in the conversion output.
Type of digital information: A large proportion of our files are quantitative data files, deposited primarily in SPSS format, but we also receive files in STATA, SAS and Excel formats.
Link to sample data: An ESDS Government sample dataset can be made available on request.
Threat(s) to the data: Proprietary statistical software packages manage files in a variety of formats to support both general and software-specific functions. Differences between packages and their formats, combined with the large and complex nature of the datasets, present a significant risk that some information will be lost (through error or truncation) or damaged during conversion, with no record produced of the changes incurred. These are ongoing problems with conversion at ingest, and similar issues arise when a particular format version of a statistical software package becomes obsolete. Threats include: truncation of variable values (to a reduced number of decimal places); truncation of labels (to a reduced number of characters); non-identical feature sets (not all features of format A are included in format B, or some elements of format B are left blank because they are not present in format A); and different approaches to the application of weighting (truncation of decimal places may also occur with weighting variables).
Designated Community: Social scientists working with quantitative data
Usage: Statistical information used for secondary analysis or replication of results
Success Criteria: Processing users will be presented either with confirmation that the outcome of the format conversion is content-identical or with clearly flagged differences between the two files, preferably with verbose explanations. End users will be presented either with content-identical statistics or with detailed explanations of any variation from the originally deposited material, sufficient to replicate any analysis made on that material.
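As an illustration of the success criteria, here is a minimal Python sketch of a post-conversion check. It assumes both the original and the converted file have already been read into plain dictionaries (one entry per variable, holding its label, its list of numeric values with missing values excluded, and any value labels). Reading SPSS, STATA, SAS or Excel files is out of scope, and the `compare_datasets` function and its dictionary layout are hypothetical. An empty report means the two files are content-identical; otherwise each line flags one difference in plain language.

```python
from statistics import fmean


def compare_datasets(original, converted):
    """Compare two datasets held as hypothetical dicts of the form
    {variable_name: {"label": str, "values": [numbers], "value_labels": {code: str}}}.

    Returns a list of human-readable differences; an empty list means the
    converted dataset is content-identical to the original.
    """
    report = []

    # Non-identical feature sets: variables dropped or introduced by conversion.
    for name in sorted(original.keys() - converted.keys()):
        report.append(f"{name}: present in original, missing after conversion")
    for name in sorted(converted.keys() - original.keys()):
        report.append(f"{name}: not in original, added by conversion")

    for name in sorted(original.keys() & converted.keys()):
        a, b = original[name], converted[name]

        # Truncated or otherwise altered variable labels.
        if a["label"] != b["label"]:
            report.append(f"{name}: label changed from {a['label']!r} to {b['label']!r}")

        # Value labels lost or altered (e.g. not supported by the target format).
        a_vl, b_vl = a.get("value_labels", {}), b.get("value_labels", {})
        altered = [code for code in a_vl if b_vl.get(code) != a_vl[code]]
        if altered:
            report.append(f"{name}: {len(altered)} value label(s) lost or altered")

        # A change in case count makes value-by-value comparison meaningless.
        if len(a["values"]) != len(b["values"]):
            report.append(
                f"{name}: case count changed from {len(a['values'])} to {len(b['values'])}"
            )
            continue

        # Exact comparison, so rounding to fewer decimal places is flagged;
        # the means show the effect on a simple summary statistic.
        changed = [i for i, (x, y) in enumerate(zip(a["values"], b["values"])) if x != y]
        if changed:
            max_diff = max(abs(a["values"][i] - b["values"][i]) for i in changed)
            report.append(
                f"{name}: {len(changed)} value(s) differ "
                f"(max absolute difference {max_diff:g}; "
                f"mean {fmean(a['values']):g} -> {fmean(b['values']):g})"
            )

    return report
```

Values are compared exactly because the criterion is content identity; an archive that accepts a documented tolerance for weighting variables could replace `x != y` with a threshold test, at the cost of no longer guaranteeing identical statistics.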

-- HerveLH - 2011-10-25

Topic revision: r2 - 2011-10-26 - DavidGiaretta
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.