Wikidata and Open Science: A Model for Open Data Work

An interview with Dr Timo Borst

Wikidata, the free knowledge base, is the largest collaboratively generated collection of Open Data worldwide. The data it contains are interlinked and freely usable by everyone at any time. The platform currently comprises over 90 million data objects and has around 25,000 active editors, which makes it the most intensively used database within the Wikimedia community.

Wikidata logo, licensed under CC BY-SA 3.0.

The fundamental idea of Wikidata is to create a language-independent factual database that makes encyclopaedic knowledge available in machine-readable form for Wikipedia and other content providers. But what role does Wikidata play in the context of Open Science? We talked with Dr Timo Borst, head of the ZBW department “Innovative Information Systems and Publishing Technologies”, who deals with software development at the ZBW.

What is the importance of Wikidata for research, in your opinion?

As a data hub, or as a generally valid and quasi-“neutral” knowledge base, Wikidata is an excellent port of call for verifiable Open Data. From a data technology perspective, Wikidata forms the basis for Open Science. The “hub” supports open working, even though the database was not primarily developed for research purposes, for example as a repository for research data. Information specialists and scientists can perform “data curation” in the form of shared data maintenance, without first having to go through elaborate editing or release processes.

Particularly in biomedical research, but also in the humanities, there are some interesting initiatives and findings on this. Wikidata has practised the FAIR principles (findability, accessibility, interoperability and reusability of its data) from the very beginning. The database is not so much a reflection of current data-based research (according to its own policy, Wikidata explicitly regards itself as a “secondary database”) as the materialisation of encyclopaedic knowledge, which is also the basis of data-driven research.

In addition, Wikidata identifiers and the identifier systems linked to them make it possible to communicate about specific entities (concepts, persons, places or events) directly via the Web or in corresponding applications. In this sense, Wikidata also offers a model for consistently web-based science communication.
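To illustrate what referring to an entity "directly via the Web" can look like, here is a minimal Python sketch that turns a Wikidata identifier into two web addresses: the concept URI and a machine-readable data URL. The helper names `concept_uri` and `entity_data_url` are hypothetical and not part of any library. The sketch assumes Wikidata's usual URL patterns and makes no network request.

```python
import re

# URL patterns as commonly documented for Wikidata (treated as assumptions here).
CONCEPT_URI_PREFIX = "http://www.wikidata.org/entity/"
ENTITY_DATA_PREFIX = "https://www.wikidata.org/wiki/Special:EntityData/"

# Items start with "Q", properties with "P", followed by a positive integer.
_ID_PATTERN = re.compile(r"^[QP][1-9][0-9]*$")


def _check_id(entity_id: str) -> str:
    """Return the identifier unchanged if it looks like a Q- or P-identifier."""
    if not _ID_PATTERN.match(entity_id):
        raise ValueError(f"not a Wikidata identifier: {entity_id!r}")
    return entity_id


def concept_uri(entity_id: str) -> str:
    """Build the stable URI that identifies the entity itself."""
    return CONCEPT_URI_PREFIX + _check_id(entity_id)


def entity_data_url(entity_id: str, fmt: str = "json") -> str:
    """Build a URL from which the entity's data can be fetched in the given format."""
    return f"{ENTITY_DATA_PREFIX}{_check_id(entity_id)}.{fmt}"
```

For example, `concept_uri("Q42")` yields a URI that can be cited in a paper, a dataset or an application, and `entity_data_url("Q42", "ttl")` points to the same entity's data in a machine-readable format.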

Have you already benefited from Wikidata in your projects or developments?

Yes, definitely. At the ZBW we do information-scientific research and development in the broadest sense by collating, evaluating and processing meta information into specialist information. Here, Wikidata is an extremely comprehensive source of formal metadata, for example in the context of journals. I was recently amazed myself at how many journal titles Wikidata contains and at the number of links that exist to other identifier systems such as those of ISSN, Scopus or OpenCitations. Wikidata is more complete than any publisher’s system and than many other bibliographic aggregators.

Because Wikidata is based neither on one or several content providers nor on a specific project consortium, it contains links from all possible contributing communities. And so far we have only considered the consumption side: Wikidata naturally also offers excellent opportunities to contribute one’s own Open Data, preferably via programs and machine interfaces. Where appropriate, and after an internal proposal and review procedure, it is also possible to introduce new properties, thereby expanding the data schema of Wikidata.

And Wikidata as a jointly enriched data source: What were your experiences here?

We have enriched Wikidata with our data at various points. We have linked descriptors from our STW Thesaurus for Economics with economic concepts available in Wikidata to create, among other things, entry points into our holdings search. In the context of a so-called “data donation”, we have added data and links to thousands of dossiers with digitised newspaper articles on well-known historical personalities, including some economists, who are featured in Wikidata.

Figure: Wikidata in the Linked Open Data Cloud. Databases are shown as circles (with Wikidata indicated as ‘WD’), and grey lines link databases in the network if their data is aligned. Layout by the graphopt algorithm of the igraph package in R. Data from Datasets. This file is licensed under CC BY 4.0.

And we have expanded the information on economic researchers listed in Wikidata with the RePEc ID, which is particularly common in this discipline, and with the Integrated Authority File Identifier (GND-ID). We are currently using the latter ourselves in the context of our EconBiz Author Profiles, although this expanded information is naturally also available to all third parties at any time – because that is ultimately the idea of transparent and collaborative scholarship in the sense of Open Science: making one’s own findings and work results available for re-use by oneself and others.
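Because these identifiers are stored as ordinary Wikidata properties, third parties can query them together. The following Python sketch builds a SPARQL query for persons that carry both a RePEc identifier and a GND-ID, plus a request URL for the public Wikidata Query Service. The property numbers P2428 (RePEc Short-ID) and P227 (GND ID) are assumptions and should be checked on Wikidata before use. The function names are hypothetical, and the code only assembles strings without sending a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed property numbers; verify them on wikidata.org before relying on them.
REPEC_SHORT_ID = "P2428"
GND_ID = "P227"


def build_author_id_query(limit: int = 10) -> str:
    """Return a SPARQL query for persons that have both identifiers."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?person ?repec ?gnd WHERE {\n"
        f"  ?person wdt:{REPEC_SHORT_ID} ?repec ;\n"
        f"          wdt:{GND_ID} ?gnd .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Passing the result of `build_request_url(build_author_id_query(5))` to any HTTP client would, under these assumptions, return up to five persons with both identifiers. The same pattern would apply to other linked identifier systems by swapping in a different property number.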

Examples of the usage scenarios of Wikidata in research:

Life sciences:

Humanities:

Wikidata links of the ZBW:

Examples of Wikidata items linked to STW Thesaurus for Economics descriptors:

Examples of Wikidata items on persons linked to press material (the link to the ZBW data appears in the “identifiers” section as the property “PM20 folder ID”):

Examples of EconBiz author profiles that are supplemented with Wikidata (the link is located at the end of the right infobox on the respective person):

This text has been translated from German.

This article also appeared in the 2020 ZBW Annual Review “Open” (PDF), which highlights developments at the ZBW in areas including Research Data Management, Open Science and organised knowledge.

We were talking to Timo Borst

Dr Timo Borst is head of the Innovative Information Systems and Publishing Technologies department at the ZBW – Leibniz Information Centre for Economics. He researches open bibliographic data and systems that serve their creation, processing, standardisation and linking. These include in-house ZBW applications as well as external data hubs such as Wikidata. He can also be found on LinkedIn, ORCID, ResearchGate and Twitter.
Portrait: ZBW©


The ZBW – Leibniz Information Centre for Economics is the world’s largest research infrastructure for economic literature, online as well as offline.
