EUDAT: Research Data Infrastructure and European Open Science Cloud Vision

Research data infrastructures are critical to realizing the European Open Science Cloud (EOSC) vision. EOSC's strategic aim is to give researchers from all disciplines secure, seamless and open access to research data, together with advanced digital services such as data storage, cloud services, data sharing, analytics and high-performance computing. To make this vision work, EOSC must build on existing scientific infrastructure projects and use the services they already offer by connecting national and international infrastructures. Many infrastructure projects are currently under way, such as EUDAT, OpenAIRE and GeRDI. EUDAT is one of the most prominent of these because it works in close collaboration with end users from different scientific fields.

What is EUDAT?

EUDAT is a pan-European data infrastructure that has been operating since 2011. It is a collaborative data infrastructure project comprising both generic and discipline-specific infrastructure providers, who have implemented and now run e-infrastructure services to support research data. EUDAT follows a service-oriented architecture and provides a cross-disciplinary service stack covering the research data management life cycle.

From 3 to 4 April 2017, EUDAT organized a Semantic Working Group Workshop at the 9th RDA Plenary in Barcelona. The aim of the workshop was to introduce the EUDAT service suite, with particular focus on the newly developed pilot service B2Note. The discussion also covered how to improve the discoverability and interoperability of multidisciplinary scientific semantic resources, which are central to the B2Note annotation service.

Service stack of EUDAT

The EUDAT service suite, listed below, primarily supports the findability, accessibility, interoperability and reusability (FAIR) of research data:

  • B2Access: An easy-to-use federated identity management system. It issues users an identifier (EUDAT ID) upon registration in case they do not have a valid Google account or a home organization identity provider.
  • B2Handle: Provides tools for the persistent identification of objects and resources. Persistent identifiers support long-term data management and allow users to identify and cite research data reliably over a long period of time.
  • B2Share: A user-friendly web-based service for publishing, storing, preserving and sharing research data. It is intended for researchers, scientific communities and citizen scientists who want to store and share small-scale research datasets.
  • B2Find: A service for finding and accessing data in EUDAT. Its search index consists of research data metadata harvested from EUDAT data centers and other community repositories. On top of this index, B2Find offers faceted browsing and, in particular, allows users to discover data stored through the B2Share service.
  • B2Note: A recent EUDAT pilot service that allows researchers and scientists to annotate files hosted within the EUDAT collaborative data infrastructure easily and intuitively. B2Note supports three types of annotation: semantic annotations, in which a semantic tag comes from an identified ontology repository; free-text keywords, used when no semantic tag is found; and free-text comments.
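
The three annotation types described for B2Note can be pictured as a small data model. The following Python sketch is illustrative only: the class, field and function names and the example URI are assumptions made for this post, not B2Note's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnnotationKind(Enum):
    SEMANTIC = "semantic"   # tag taken from an ontology repository
    KEYWORD = "keyword"     # free-text keyword, used when no term is found
    COMMENT = "comment"     # free-text comment


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; not B2Note's real data model."""
    target: str                      # identifier of the annotated file
    kind: AnnotationKind
    text: str                        # tag label, keyword or comment text
    term_uri: Optional[str] = None   # ontology term URI, semantic tags only

    def __post_init__(self) -> None:
        # A semantic tag must point at a vocabulary term; the free-text
        # kinds must not, so the two cases cannot be confused.
        if self.kind is AnnotationKind.SEMANTIC and not self.term_uri:
            raise ValueError("semantic annotations need a term_uri")
        if self.kind is not AnnotationKind.SEMANTIC and self.term_uri:
            raise ValueError("free-text annotations must not carry a term_uri")


def make_tag(target: str, label: str, term_uri: Optional[str] = None) -> Annotation:
    """Use a semantic tag when a term was found, else fall back to a keyword."""
    if term_uri:
        return Annotation(target, AnnotationKind.SEMANTIC, label, term_uri)
    return Annotation(target, AnnotationKind.KEYWORD, label)
```

Calling `make_tag("some-file-id", "protein", "http://example.org/onto/Protein")` would yield a semantic annotation, while omitting the URI falls back to a free-text keyword, mirroring the fallback described in the B2Note entry above.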

What is an annotation and why is it relevant to research data management?

An annotation is a descriptive tag or note added to a text or diagram to explain it better. A semantic annotation also serves to describe a digital object better, but its additional information (tags) usually comes from vocabularies and is typically used by machines. A (semantically) annotated document becomes a richer source of information that is easier to find, interpret and reuse. Annotations have been used in various research activities and domains for better knowledge representation and management.

Annotations are also highly relevant to research data management. For example, a dataset may lack an adequate metadata description when it is produced, and once it is stored in a repository its metadata is frozen. From that point on it is difficult to change the metadata, and this is where annotation comes in: it enables users to annotate the dataset again and enrich it with additional information, making it a better information source. EUDAT rolled out the B2Note service specifically to support researchers in this situation.

Challenges

EUDAT deals with research data from multiple disciplines and aims to build a multidisciplinary central index of semantic tags for B2Note. At the moment, however, semantic tags come only from BioPortal, a comprehensive repository of biomedical vocabularies. Because of discoverability and interoperability challenges, it is difficult to find vocabulary repositories that cover other disciplines. To tackle these challenges, a general set of recommendations was discussed in the Semantic Working Group Workshop:

  • Prominent repositories such as EUDAT, BioPortal, AgroPortal and Linked Open Data vocabulary must agree on a minimal set of metadata elements for describing vocabularies. These repositories should also ensure that they only store vocabularies that comply with the agreed metadata elements. This practice will further enrich the description of the vocabularies and improve their discoverability and interoperability.

  • The repositories must adhere to standard practices for providing access to vocabularies, by means of a REST API or a SPARQL endpoint.
  • Currently, many vocabularies that may have been used in discipline-specific cases exist but are not widely known. One proposal discussed in the workshop was to set up a social marketplace where users of these vocabularies can share their experience of using them in their discipline. This may help in choosing the right vocabularies for multidisciplinary research data annotation.
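
The first recommendation, accepting only vocabularies whose description covers an agreed minimal set of metadata elements, could be enforced with a simple check when a vocabulary is submitted. The sketch below is a hypothetical illustration: the element names in `REQUIRED_ELEMENTS` are placeholders, since this post does not list the agreed elements, and the function names are likewise invented for the example.

```python
from typing import Dict, List

# Placeholder element names; the actual agreed set is not given in this post.
REQUIRED_ELEMENTS = ("title", "description", "domain", "license", "access_url")


def missing_elements(description: Dict[str, str]) -> List[str]:
    """Return the required metadata elements that are absent or empty."""
    return [
        name for name in REQUIRED_ELEMENTS
        if not str(description.get(name, "")).strip()
    ]


def is_acceptable(description: Dict[str, str]) -> bool:
    """A vocabulary is accepted only if no required element is missing."""
    return not missing_elements(description)
```

A repository could run `missing_elements` on each submitted vocabulary description and report the gaps back to the submitter instead of silently rejecting the vocabulary.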
Conclusion

In summary, the EUDAT service suite maps well onto EOSC's FAIR Data and Services initiative and can make a concrete contribution to EOSC. A European data infrastructure cannot be constructed from scratch, and initiatives like EUDAT are fundamental to the realization of EOSC.

Further information

  • Communication: European Cloud Initiative – Building a competitive data and knowledge economy in Europe
  • Realising the European Open Science Cloud
  • Author: Dr. Atif Latif (Current areas of work: Linked Open Data, Digital Libraries and Research Data Infrastructure; ZBW – Leibniz Information Centre for Economics)

    The ZBW – Leibniz Information Centre for Economics is the world’s largest research infrastructure for economic literature, online as well as offline.
