Long-term Accessibility and Preservation of published research results and their genesis


Weber, Andreas ; Piesche, Claudia:
Long-term Accessibility and Preservation of published research results and their genesis.
Event: Collaborative Research Centre / Transregio 32: 2nd Data Management Workshop, 28-29 Nov. 2014, Cologne, Germany.
(Event contribution: Workshop, Poster)


Project information

Project funding: Deutsche Forschungsgemeinschaft


As a response to current requirements concerning the long-term preservation of research data from research projects funded, for instance, by the DFG, the discussion about appropriate archiving of research results has gained importance. While portals for globally standardised research data, e.g. climate data, are available, there is currently no provision for the large amount of data resulting from specialised research in individual research foci. In these cases, the requirements for long-term preservation have to be met by local solutions. In addition to the permanent storage of primary data and associated metadata, important steps of the genesis and the transformation process of published research results should also be incorporated into these individual solutions. Thus, the transparency of the research process will be assured.
For such an approach to succeed, some additional demands have to be met. Most importantly, the resulting system has to be attractive to scientists: it should be intuitive and easy for research staff to use, and as much of the relevant metadata as possible should be extracted automatically from the primary research data. Furthermore, the system has to offer reliable long-term preservation with high availability as well as suitable rights management and access control.
These requirements are met by the described approach, developed in the project “INF Project Z2” within SFB 840. The basic aim of the project is to grant access to the research data of the SFB 840 in accordance with the guidelines of the DFG.
The first challenge for the implementation arises from the diversity of the participating research fields (chemistry, physics, and biology). Because the primary research data may be of any kind, for instance the output of a spectrometer or the result of numerical simulations, various metadata formats are needed.
The second challenge originates from the requirement that the results have to be easily reproducible, even when the responsible researcher is no longer available. Therefore, in addition to the findings themselves and the underlying primary data, all the steps of editing and transforming these data have to be documented with appropriate metadata and linked to each other. With the published results as a starting point, it must be possible to browse through the underlying data and associated transformation processes. Thus, the portal goes well beyond the basic task of storing research data and becomes a tool that describes the process of producing research results.
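The idea of browsing from a published result back to its underlying data can be illustrated with a minimal sketch. It is not the portal's actual data model: the `Node` class, the `derived_from` field, the `trace_genesis` function, and the sample entries are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the genesis of a result (hypothetical structure)."""
    node_id: str
    kind: str  # e.g. "primary_data", "processing", "published_result"
    metadata: dict = field(default_factory=dict)
    derived_from: list = field(default_factory=list)  # ids of predecessor nodes


def trace_genesis(nodes, start_id):
    """Return all node ids reachable from start_id via derived_from links.

    Depth-first traversal; each id is listed once, in visiting order.
    """
    seen = []
    stack = [start_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push predecessors in reverse so they are visited in listed order.
        stack.extend(reversed(nodes[current].derived_from))
    return seen


# Illustrative data only: a figure derived from a processing step,
# which in turn was derived from a raw measurement.
nodes = {
    "fig1": Node("fig1", "published_result", {"caption": "example figure"}, ["fit1"]),
    "fit1": Node("fit1", "processing", {"method": "baseline correction"}, ["raw1"]),
    "raw1": Node("raw1", "primary_data", {"instrument": "spectrometer"}),
}

genesis = trace_genesis(nodes, "fig1")
primary = [n for n in genesis if nodes[n].kind == "primary_data"]
```

Starting from the published figure, the traversal visits every linked processing step and ends at the primary data, which is the kind of navigation the requirement describes.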
In the current approach, the processing steps are described as nodes, each holding particular data and associated metadata. First, publications are analysed to identify the research results they contain, i.e. images, tables, or other published data. Next, the genesis of each individual result is reconstructed by defining all the processes undertaken and abstracting identical or similar process steps. Finally, all steps are implemented as object types, each determined by the underlying node type, the associated metadata schema, and the upload process (kind of files, structure, and automated extraction of metadata).
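As a rough sketch of what such an object type might bundle, the following Python example combines a node type, a simplified metadata schema, and an upload step with automated metadata extraction. None of this is MediaTUM's API: `ObjectType`, `extract_key_value_header`, `ingest`, and the assumed file layout (leading `# key: value` header lines) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ObjectType:
    """Hypothetical description of one kind of process step."""
    name: str
    node_type: str
    required_fields: tuple    # simplified stand-in for a metadata schema
    allowed_suffixes: tuple   # kinds of files accepted on upload
    extract: Callable[[str], dict]  # automated metadata extraction


def extract_key_value_header(text):
    """Collect 'key: value' pairs from leading lines that start with '#'."""
    meta = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # header ends at the first data line
        key, sep, value = line[1:].partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    return meta


def ingest(object_type, filename, text):
    """Check the file kind, extract metadata, and report missing fields."""
    if not filename.lower().endswith(object_type.allowed_suffixes):
        raise ValueError(f"{filename} is not accepted for {object_type.name}")
    metadata = object_type.extract(text)
    missing = [f for f in object_type.required_fields if f not in metadata]
    return metadata, missing


# Illustrative object type and file content, both assumed for this example.
SPECTRUM = ObjectType(
    name="spectrum",
    node_type="primary_data",
    required_fields=("instrument", "date"),
    allowed_suffixes=(".txt", ".csv"),
    extract=extract_key_value_header,
)

sample = "# Instrument: spectrometer A\n# Date: 2014-11-28\n400.0 0.12\n"
metadata, missing = ingest(SPECTRUM, "run01.txt", sample)
```

Fields that cannot be extracted automatically are returned in `missing`, so only those would have to be entered by the researcher.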
The implementation is based on the software MediaTUM, developed at the TU Munich (TUM), which allows a flexible definition of metadata and provides appropriate access control. Long-term preservation in accordance with the OAIS standard is provided by the software Rosetta, developed by Ex Libris. Here, we collaborate with the Bavarian State Library (BSB), which provides access to its Rosetta system hosted at the Leibniz-Rechenzentrum in Munich. DOI registration is provided by the TIB Hannover, which makes the results visible in DataCite. Information is exchanged with the TUM, the BSB, and the ETH Zurich.

Keywords: research data management; publications; long-term preservation; metadata
