STFC Scenarios

STFC Scientific Data - MSST

User Scenario ID S.1
Author David Giaretta STFC/APA
Background Scientific laboratories collect and archive data from various sources. It is important that other scientists can use that data, for example to reprocess it to confirm published results or, probably more frequently, to analyse it in new ways and/or combine it with data from other sources.
Type of digital information MSST radar data
Threat(s) to the data (1) Existing software libraries to access the data may become unusable; (2) the structure of the data, i.e. the format, may be forgotten; (3) the semantics, i.e. the meaning of the individual numbers, may not be understood, e.g. that a number is a temperature measured in degrees C, taken from (...) using a particular type of thermometer, with the raw values converted into degrees C using this (...) calibration curve.
Designated Community Scientists involved in atmospheric physics
Preservation Technique Create a fairly complete Representation Information Network to analyse risks. Create additional Structure and Semantic Representation Information. Save the BADC web site pages describing the measurement instrument. Save the software and associated algorithms.
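A Representation Information Network is, in effect, a dependency graph: each item can be understood only if the items it depends on can be. The minimal sketch below assumes a made-up network for the MSST data (all node names, and the choice of which leaves count as already covered, are illustrative assumptions) and finds the leaves that are neither preserved nor part of the Designated Community's Knowledge Base, i.e. the risks that still need to be addressed.

```python
from collections import deque

# Hypothetical Representation Information (RepInfo) network for the MSST radar
# data: each node lists the RepInfo it depends on in order to be understood.
# Node names are illustrative assumptions, not taken from any real registry.
REPINFO = {
    "msst_radar_files": ["msst_structure", "msst_semantics", "access_software"],
    "msst_structure": ["format_description_text"],
    "msst_semantics": ["instrument_description", "calibration_algorithms"],
    "instrument_description": ["badc_web_pages"],
    "access_software": ["software_source", "runtime_environment"],
    "calibration_algorithms": [],
    "format_description_text": [],
    "badc_web_pages": [],
    "software_source": [],
    "runtime_environment": [],
}

# Leaves assumed to be already preserved or understandable unaided by the
# Designated Community (an assumption made for this sketch).
COVERED = {"format_description_text", "calibration_algorithms", "software_source"}


def at_risk(network, root, covered):
    """Return the uncovered leaf RepInfo nodes reachable from root."""
    seen, risks = set(), []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        children = network.get(node, [])
        # A leaf that nobody has preserved or can understand unaided is a risk.
        if not children and node not in covered:
            risks.append(node)
        queue.extend(children)
    return sorted(risks)


print(at_risk(REPINFO, "msst_radar_files", COVERED))
# ['badc_web_pages', 'runtime_environment']
```

Saving the BADC pages, as the technique above proposes, would correspond to adding badc_web_pages to the covered set.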
Usage  
Success Criteria Ask members of the Designated Community, and of closely related disciplines, whether they would be able to use the data sensibly given the additional Representation Information.

User Scenario ID S.1a
Author David Giaretta STFC/APA
Background The International Ultraviolet Explorer (IUE) was an astronomical satellite which obtained UV spectra of tens of thousands of astronomical objects. The data for one object consists of an image which has one or more spectral orders (each as a band across the image). The raw data is processed through several stages, first correcting photometrically and geometrically and then extracting the spectrum.
The original IUE processing software created files in what is called VICAR format; the VICAR files had binary (i.e. non-text) headers followed by the data and, in the processed files, various quality flags showing where pixels cannot be trusted.
Since the launch of IUE, FITS has become the accepted format for astronomical data.
In practice, it was decided to create the "IUE Final Archive" as a way to ensure that the scientific data collected by IUE would not be lost. In the process, a new and more accurate way of processing the raw data was developed; setting that aside, there were a number of interesting considerations:
* The binary headers encoded instrument temperatures and voltages, whereas FITS headers are essentially text. The temperatures and voltages were therefore converted into physical units (kelvin or volts) and written into the header as numbers in character form. FITS headers take the form NAME = VALUE, where NAME is limited to 8 characters, so a name of at most 8 characters had to be created for each value.
* The quality flags had to be converted into separate images within the FITS file, and the meaning of each quality image pixel value had to be defined (e.g. 1 means the pixel was saturated or has no meaningful data; 2 means the calculated value is affected by reseau marks and so should be regarded with suspicion; ...).
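The header conversion described in the first bullet can be sketched as follows. This is a simplified illustration, not the IUE Final Archive's actual code: the keywords CAMTEMP and HVOLT, the raw readings and the linear gain/offset calibration are all hypothetical, and only real-valued cards are handled. It shows the fixed-format layout of a FITS card (keyword padded to 8 columns, "= " in columns 9 and 10, value right-justified to column 30, optional comment) and enforces the 8-character name limit.

```python
import re

# FITS keywords: 1-8 characters from A-Z, 0-9, hyphen and underscore.
KEYWORD_RE = re.compile(r"^[A-Z0-9_-]{1,8}$")


def raw_to_physical(raw, gain, offset):
    """Hypothetical linear calibration: physical = gain * raw + offset."""
    return gain * raw + offset


def fits_card(keyword, value, comment=""):
    """Build a simplified 80-character FITS header card for a real number."""
    if not KEYWORD_RE.match(keyword):
        raise ValueError(f"invalid FITS keyword: {keyword!r}")
    text = f"{float(value):.10G}"
    if "." not in text and "E" not in text:
        text += ".0"  # make the value unambiguously floating point
    # Keyword in columns 1-8, "= " in 9-10, value right-justified to column 30.
    card = f"{keyword:<8}= {text:>20}"
    if comment:
        card += f" / {comment}"
    if len(card) > 80:
        raise ValueError("card longer than 80 characters")
    return card.ljust(80)  # header cards are always padded to 80 characters


# Hypothetical raw telemetry readings converted to kelvin and volts.
header = [
    fits_card("CAMTEMP", raw_to_physical(1482, 0.1, 143.2), "camera temperature (K)"),
    fits_card("HVOLT", raw_to_physical(512, 0.01, 0.0), "high voltage (V)"),
]
for card in header:
    print(card.rstrip())
```

A descriptive name such as CAMERATEMPERATURE would be rejected, which is why each value needed a new abbreviated name, and why the meaning of those abbreviations has to be recorded elsewhere as Representation Information.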
Type of digital information Astronomical data
Threat(s) to the data 1 Existing software libraries to access the data may become unusable.
2 The structure of the data, i.e. the format, may be forgotten.
3 The semantics, i.e. the meaning of the individual numbers, may not be understood, e.g. that a number is a temperature measured in degrees C, taken from (...) using a particular type of thermometer, with the raw values converted into degrees C using this (...) calibration curve.
Designated Community Astronomers
Preservation Technique Transformation into a new format: FITS.
However, as noted above, even ignoring the processing algorithms this is not a simple transformation. A great deal of new semantics must be passed on to the users, and the relationships between the various components within the new-format file must be explained, in order for the digitally encoded information to be understood and used.
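One piece of new semantics that must travel with the FITS file is how the quality image relates to the data image. The sketch below assumes, which the text does not state, that each condition occupies its own bit so that codes can be combined (3 would then mean both conditions); only the two meanings quoted in the Background are included, and the function names and sample values are hypothetical.

```python
# Meanings of quality-image pixel values, from the two examples in the text.
# Assumption: each condition is a separate bit, so codes can be combined.
QUALITY_BITS = {
    1: "saturated or no meaningful data",
    2: "affected by reseau marks; treat with suspicion",
}


def describe_quality(code):
    """Return the conditions encoded in one quality-image pixel value."""
    return [text for bit, text in QUALITY_BITS.items() if code & bit]


def usable_pixels(data, quality, reject_mask=1):
    """Pair each data pixel with its quality pixel; drop those flagged in reject_mask."""
    return [d for d, q in zip(data, quality) if not (q & reject_mask)]


flux = [10.5, 12.0, 99999.0, 11.2]   # hypothetical extracted fluxes
quality = [0, 2, 1, 3]               # matching quality-image values
print(usable_pixels(flux, quality))  # [10.5, 12.0]
print(describe_quality(3))
```

Without the QUALITY_BITS table, which is exactly the kind of Representation Information the paragraph above refers to, the quality image would be just another array of integers.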
Usage Astronomers access the FITS files and use a variety of astronomical software suites to extract new astronomical information, perhaps combining it with data from other sources.
Success Criteria The success of the preservation activity can be seen from the fact that the IUE data is still used by astronomers, 33 years after the launch of the satellite and 13 years since the satellite was closed down.

User Scenario ID S.1b
Author David Giaretta STFC/APA
Background Astronomical data is often in the form of tables. These vary from simple text files, with a few lines of headers followed by columns of numbers and/or text, to components of FITS files in either text or binary form. The column headings often seem simple, e.g. "VsubJ" for Johnson visual magnitude; however, to interpret the data values accurately one needs to know the filter transmission curves. Moreover, in some cases the names can be misleading.
Type of digital information Tabular data containing data from various sources.
Threat(s) to the data (1) Existing software libraries to access the data may become unusable; (2) the structure of the data, i.e. the format, may be forgotten; (3) the semantics, i.e. the meaning of the individual numbers, may not be understood, e.g. that a number is a temperature measured in degrees C, taken from (...) using a particular type of thermometer, with the raw values converted into degrees C using this (...) calibration curve.
Designated Community Mostly astronomers
Preservation Technique Several preservation techniques have been used.
In many cases the original data format has been transformed to FITS. However, once XML became popular, it was decided to create a table format better suited to exchange via Web Services. This new format was called VOTable. It was also believed that, being XML, it would have some advantages for preservation.
In many cases a Uniform Column Descriptor (UCD) has been assigned to capture the semantics of each column, giving some idea of which columns could sensibly be combined.
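How a UCD helps decide whether columns can be combined can be sketched with a deliberately simplified rule: treat two columns as combinable when their UCDs and units match, whatever their headings say. The Column type and the combinable() rule are assumptions for illustration; the descriptor strings are written in the style of the IVOA UCD vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str  # free-text heading, which may be misleading
    ucd: str   # semantic descriptor, e.g. "phot.mag;em.opt.V"
    unit: str


def combinable(a: Column, b: Column) -> bool:
    """Simplified rule: same UCD and same unit, regardless of the heading."""
    return a.ucd == b.ucd and a.unit == b.unit


vsubj = Column("VsubJ", "phot.mag;em.opt.V", "mag")
vmag = Column("Vmag", "phot.mag;em.opt.V", "mag")
v = Column("V", "spect.dopplerVeloc", "km/s")

print(combinable(vsubj, vmag))  # True: different names, same meaning
print(combinable(vsubj, v))     # False: similar names, different meaning
```

As the Background points out, matching descriptors are not sufficient on their own: two V magnitudes may still have been measured through different filter transmission curves, so a real check would need that Representation Information too.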
Usage The various encodings of the data must be understandable and usable; in particular, the various datasets must be combinable sensibly. One way to achieve this is to virtualise the various encodings behind the Java AbstractTableModel and a variety of specialisations which capture additional information and semantics. This allows data from multiple sources and in multiple formats to be combined.
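The virtualisation idea can be sketched in Python with an abstract interface loosely modelled on the accessors of Java's javax.swing.table.AbstractTableModel (getRowCount, getColumnCount, getColumnName, getValueAt). The two adapters and the column_values() helper are hypothetical; columns are matched by name purely to keep the sketch short, whereas in practice UCDs would be a better guide.

```python
from abc import ABC, abstractmethod


class TableModel(ABC):
    """Minimal Python analogue of the AbstractTableModel accessors."""

    @abstractmethod
    def row_count(self) -> int: ...

    @abstractmethod
    def column_count(self) -> int: ...

    @abstractmethod
    def column_name(self, col: int) -> str: ...

    @abstractmethod
    def value_at(self, row: int, col: int): ...


class TextTable(TableModel):
    """Whitespace-separated text: one header line, then rows of numbers."""

    def __init__(self, text: str):
        lines = [ln for ln in text.strip().splitlines() if ln.strip()]
        self._names = lines[0].split()
        self._rows = [[float(x) for x in ln.split()] for ln in lines[1:]]

    def row_count(self): return len(self._rows)
    def column_count(self): return len(self._names)
    def column_name(self, col): return self._names[col]
    def value_at(self, row, col): return self._rows[row][col]


class RecordTable(TableModel):
    """Rows already decoded into dicts (standing in for, e.g., a VOTable reader)."""

    def __init__(self, records):
        self._records = list(records)
        self._names = list(self._records[0]) if self._records else []

    def row_count(self): return len(self._records)
    def column_count(self): return len(self._names)
    def column_name(self, col): return self._names[col]
    def value_at(self, row, col): return self._records[row][self._names[col]]


def column_values(tables, name):
    """Collect one named column from any mix of TableModel implementations."""
    out = []
    for t in tables:
        names = [t.column_name(c) for c in range(t.column_count())]
        if name in names:
            col = names.index(name)
            out.extend(t.value_at(r, col) for r in range(t.row_count()))
    return out


text = TextTable("""
ra dec Vmag
10.1 -5.2 12.3
11.4 -6.0 13.1
""")
records = RecordTable([{"ra": 12.0, "dec": -4.8, "Vmag": 11.9}])
print(column_values([text, records], "Vmag"))  # [12.3, 13.1, 11.9]
```

The combining code only ever talks to the abstract interface, so a new encoding needs one more adapter rather than changes to every analysis tool.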
Success Criteria Astronomers can use and understand the data that is encoded, and in particular can combine data from various sources.

User Scenario ID S.1d
Author David Giaretta STFC/APA
Background The European Space Agency (ESA) launched a number of satellites which captured data; after processing, these data were stored in the Common Data Format (CDF). CDF is a format which originated at NASA to encode data repeatedly measured on a grid. Its internal format is very complex and was not described anywhere except in the access libraries, which were themselves very complex. Moreover, NASA decided that at some point in the future it would no longer support the CDF access software.
ESA had what was, at that time, a huge amount of data in this format.
In addition, the CDF format had certain limitations, so a number of "conventions" were imposed on the CDF files; as a result, the analysis software had some built-in semantics which the "standard" CDF software did not know about.
Type of digital information Solar Terrestrial Physics (STP) measurements obtained from a number of satellites.
Threat(s) to the data 1 Existing software libraries to access the data may become unusable.
2 The structure of the data, i.e. the format, may be forgotten.
3 The semantics, i.e. the meaning of the individual numbers, may not be understood, e.g. that a number is a temperature measured in degrees C, taken from (...) using a particular type of thermometer, with the raw values converted into degrees C using this (...) calibration curve.
Designated Community STP scientists
Preservation Technique Transforming to another format was a possibility, although the volume and the hidden semantics made this unattractive.
Instead it was decided to ensure the long-term usability of the data by describing it using the EAST language. This first required the CDF software team to write a fairly full description of the CDF internal structures, which could then be described in EAST.
This gave ESA the confidence to keep the data in CDF format for a considerable time. Eventually, changes in storage technology and the emergence of new analysis tools and formats meant that at least some of the data was transformed, but the hidden semantics had to be exposed.
Usage The data was used in a variety of analysis tools.
Success Criteria The ability of scientists to use and combine data.

Contemporary Performing Arts

User Scenario ID S.2
Author David Giaretta STFC/APA
Background New contemporary performing arts compositions must be capable of being re-performed over time.
Type of digital information A musical composition (perhaps as a PDF) plus software (known as patches) which transforms the music, e.g. by adding reverberation, in a complex workflow using proprietary software and hardware. The patches are essentially subroutines which run within the proprietary software.
Link to sample data Provide a link to the samples of data that are available for testing with the Testbeds
Threat(s) to the data The interactions between the music and the computer effects, and their timing, must be maintained. The computer-generated effects must be preserved even if the software patches can no longer be run and the hardware is no longer available.
Designated Community Performers of this type of music plus their musical assistants.
Usage The performer and musical assistant must be able to re-perform the music.
Success Criteria The music must be able to be re-performed to the satisfaction of the composer, if available, or otherwise of the performer.

UNESCO World Heritage Site data

User Scenario ID S.3
Author David Giaretta STFC/APA
Background World Heritage Sites (WHS) are documented using a variety of techniques, including laser scans and satellite observations, captured as a variety of data files. The state of a site at one point in time must be comparable with later measurements in order to determine whether the site has deteriorated.
Type of digital information ESRI shapefiles and other data files.
Link to sample data Provide a link to the samples of data that are available for testing with the Testbeds
Threat(s) to the data The ESRI software needed may become unavailable in future years.
Designated Community UNESCO WHS experts.
Usage The data from one time must be able to be compared with measurements taken with different instruments in order to see if the site has deteriorated.
Success Criteria The measurements are successfully compared, and spot checks show that values extracted from the older data agree with those extracted by the original software.
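A spot check of this kind might look like the sketch below. It assumes the values have already been extracted into two aligned lists, one by the original ESRI software and one by whatever replacement tool is used later (reading shapefiles is out of scope here); the function name, sample size and tolerance are hypothetical choices.

```python
import math
import random


def spot_check(reference, candidate, n_samples=10, rel_tol=1e-6, seed=0):
    """Compare randomly sampled positions of two aligned extractions.

    Returns the sorted indices where the values disagree beyond rel_tol.
    """
    if len(reference) != len(candidate):
        raise ValueError("extractions differ in length")
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    indices = rng.sample(range(len(reference)), min(n_samples, len(reference)))
    return sorted(
        i for i in indices
        if not math.isclose(reference[i], candidate[i], rel_tol=rel_tol)
    )


reference = [float(i) for i in range(100)]  # values from the original software
candidate = list(reference)                 # values from a replacement reader
candidate[42] += 0.5                        # one deliberately wrong value
print(spot_check(reference, candidate, n_samples=100))  # [42]
```

A fixed seed keeps the sampled positions reproducible, so a failed check can be rerun and investigated.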

-- DavidGiaretta - 2011-08-03

Topic revision: r6 - 2011-11-01 - DavidGiaretta
 