Wikidata and Open Science: A Model for Open Data Work
Wikidata is a language-independent factual database belonging to the Wikimedia family which includes the particularly well-known Wikipedia. In an interview, Timo Borst explains the dimensions and particular significance of this database, especially in the context of Open Science.
An interview with Dr Timo Borst
The free knowledge base Wikidata is the largest collaboratively generated collection of Open Data worldwide. The data contained in Wikidata are interlinked and freely usable for everyone at any time. The platform currently comprises over 90 million data objects and there are around 25,000 active editors. Wikidata is therefore the most intensively used database within the Wikimedia community.
The fundamental idea of Wikidata is to create a language-independent factual database that keeps encyclopaedic knowledge available in machine-readable form for Wikipedia and for other content providers. But what role does Wikidata play in the context of Open Science? We talked with Dr Timo Borst, head of the ZBW department “Innovative Information Systems and Publishing Technologies”, who deals with software developments at the ZBW.
What is the importance of Wikidata for research, in your opinion?
As a data hub or generally-valid and quasi “neutral” knowledge base, Wikidata is a superb port of call for verifiable Open Data. From a data technology aspect, Wikidata forms the basis for Open Science. The “hub” supports open working, even if this database was not primarily developed for research purposes, for example in the sense of a repository for research data. Information specialists and scientists can perform “data curation” in the form of shared data maintenance, without first having to go through elaborate editing or release processes.
Particularly in biomedical research, but also in the humanities, there are some interesting initiatives and also findings on this. Wikidata has been practising the FAIR principles regarding the findability, accessibility, interoperability and reusability of its data from the very beginning. This database is not so much a reflection of current data-based research – Wikidata regards itself, according to its own policy, explicitly as a “secondary database” – but much more the materialisation of an encyclopaedic knowledge, which is also the basis of data-driven research.
Added to this is the fact that it is possible to communicate about certain entities (concepts, persons, places or events) directly via the Web and/or in corresponding applications, using Wikidata identifiers and/or the respective linked identifier systems. In this sense, Wikidata also offers the model for a consistently web-based science communication.
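A minimal Python sketch of what addressing an entity "directly via the Web" can look like: it turns a Wikidata identifier into its concept URI and into the URL of its machine-readable record. The function names are hypothetical and chosen for illustration only; Q42 is simply a well-known example item.

```python
import re

# Base addresses used by Wikidata for concept URIs and for entity data export.
ENTITY_BASE = "http://www.wikidata.org/entity/"
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"

# Items start with Q, properties with P, lexemes with L, followed by a number.
_ID_PATTERN = re.compile(r"[QPL][1-9][0-9]*")


def concept_uri(entity_id: str) -> str:
    """Return the stable concept URI for a Wikidata identifier (hypothetical helper)."""
    if not _ID_PATTERN.fullmatch(entity_id):
        raise ValueError(f"not a Wikidata identifier: {entity_id!r}")
    return ENTITY_BASE + entity_id


def entity_data_url(entity_id: str, fmt: str = "json") -> str:
    """Return the URL under which the entity's data can be downloaded."""
    concept_uri(entity_id)  # reuse the validation above
    return f"{ENTITY_DATA_BASE}{entity_id}.{fmt}"


if __name__ == "__main__":
    print(concept_uri("Q42"))
    print(entity_data_url("Q42"))
```

Because the identifier is language-independent, the same URI can be cited in a paper, a dataset or an application regardless of the language in which the entity is labelled.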
Have you already benefited from Wikidata in your projects or developments?
Yes, definitely. At the ZBW we do information-scientific research and development in the broadest sense by collating, evaluating and processing meta information into specialist information. Here, Wikidata is an extremely comprehensive source for formal metadata, for example in the context of journals. I was recently amazed myself at how many journal titles Wikidata contains and at the number of links that exist to other identifier systems such as those of ISSN, Scopus or OpenCitations. In this respect, Wikidata is more complete than any publishing system and than many other bibliographic aggregators.
The fact that Wikidata is based neither on one or several content providers nor on any specific project consortium means that there are links from all possible contributing communities. And so far we have only considered the consumption side: Wikidata naturally also offers excellent opportunities to contribute one’s own Open Data, preferably via programs and machine interfaces. Where appropriate, and after an internal proposal and review procedure, it is also possible to introduce new properties, thereby expanding the data schema of Wikidata.
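On the consumption side, journal metadata of this kind can be retrieved through the public Wikidata Query Service. The following Python sketch lists scientific journals together with their ISSN. It rests on assumptions that should be checked against Wikidata before use: the identifiers Q5633421 (scientific journal), P31 (instance of) and P236 (ISSN), as well as the hypothetical function names and User-Agent string.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_journal_query(limit: int = 10) -> str:
    """Build a SPARQL query for journals that carry an ISSN (IDs are assumptions)."""
    return (
        "SELECT ?journal ?journalLabel ?issn WHERE {\n"
        "  ?journal wdt:P31 wd:Q5633421 ;\n"   # instance of: scientific journal
        "           wdt:P236 ?issn .\n"        # ISSN
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def parse_bindings(result: dict) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON result into a list of {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query: str) -> List[Dict[str, str]]:
    """Send the query to the endpoint and return the flattened rows."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={
            # Placeholder User-Agent; replace with real contact details.
            "User-Agent": "wikidata-journal-example/0.1 (illustrative sketch)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))


if __name__ == "__main__":
    for row in run_query(build_journal_query(5)):
        print(row.get("journalLabel"), row.get("issn"))
```

Keeping query construction and result parsing separate from the network call makes both parts easy to check without contacting the endpoint.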
And Wikidata as a jointly enriched data source: What were your experiences here?
We have enriched Wikidata with our data at various points: we have linked descriptors from our STW Thesaurus for Economics with economic concepts available in Wikidata to create, among other things, entry points in our holdings search. In the context of a so-called “data donation” we have added data and links to thousands of dossiers with digitised newspaper articles on well-known historical personalities, including some economists, who are featured in Wikidata.
And we have expanded the information on economic researchers listed in Wikidata with the RePEc ID, which is particularly common in this discipline, and with the Integrated Authority File Identifier (GND-ID). We are currently using the latter ourselves in the context of our EconBiz Author Profiles, although this expanded information is naturally also available to all third parties at any time – because that is ultimately the idea of transparent and collaborative scholarship in the sense of Open Science: making one’s own findings and work results available for re-use by oneself and others.
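How a third party might re-use these identifier links can be sketched in Python: a SPARQL query selects persons that carry both a RePEc ID and a GND ID, and a small helper turns the result rows into a crosswalk between the two systems. The property numbers P2428 (RePEc Short-ID) and P227 (GND ID) are assumptions to be verified on Wikidata, and the function names and sample identifiers are hypothetical.

```python
from typing import Dict, Iterable


def build_author_id_query(limit: int = 100) -> str:
    """Build a SPARQL query for persons with both identifiers (IDs are assumptions)."""
    return (
        "SELECT ?person ?repec ?gnd WHERE {\n"
        "  ?person wdt:P2428 ?repec ;\n"   # RePEc Short-ID
        "          wdt:P227 ?gnd .\n"      # GND ID
        "}\n"
        f"LIMIT {int(limit)}"
    )


def repec_to_gnd(rows: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Map each RePEc ID to a GND ID; the first value seen for an ID is kept."""
    crosswalk: Dict[str, str] = {}
    for row in rows:
        crosswalk.setdefault(row["repec"], row["gnd"])
    return crosswalk


if __name__ == "__main__":
    # Made-up rows in the shape a query result would have after flattening.
    rows = [
        {"person": "http://www.wikidata.org/entity/Q1", "repec": "pxx1", "gnd": "000000001"},
        {"person": "http://www.wikidata.org/entity/Q2", "repec": "pyy2", "gnd": "000000002"},
    ]
    print(build_author_id_query(10))
    print(repec_to_gnd(rows))
```

The query text can be submitted to the public Wikidata Query Service; the crosswalk then allows records from a RePEc-based source to be joined with records keyed by GND ID.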
Examples of the usage scenarios of Wikidata in research:
Life sciences:
- Science Forum: Wikidata as a knowledge graph for the life sciences
- Expansion of the Wikidata structures for coronavirus research
- Comprehensive data collection on genome information
Humanities:
- Using Wikidata to build an authority list of Holocaust-era ghettos
- Archive guide to the German Colonial Past
Wikidata links of the ZBW:
Examples of links between Wikidata items and STW Thesaurus for Economics descriptors:
Examples of links from Wikidata items on persons to press material (you can find the link to ZBW data in the section “identifiers” as property “PM20 folder ID”):
Examples of EconBiz author profiles that are supplemented with Wikidata (the link is located at the end of the right infobox on the respective person):
This text has been translated from German
This might also interest you:
- Wikimedia 2030: Together with Libraries to the Largest Knowledge Infrastructure in the World
- Digital Library: Simulation of User Interactions to Optimise Literature Searches
- Software development in libraries: From Open Source to “not invented here” (German only)
- GitHub and “social coding”: New forms of software development and distribution as an opportunity (German only)
- Researcher Profiles in EconBiz: Semi-Automated Generation Based on Linked Open Data
- FOLIO Library Management System: Open Source on its way into Everyday Library Life
This article also appeared in the 2020 ZBW Annual Review “Open” (PDF) that highlights developments at the ZBW, among other things: Research Data Management, Open Science and organised knowledge.
Dr Timo Borst is head of the Innovative Information Systems and Publishing Technologies department at the ZBW – Leibniz Information Centre for Economics. He researches open bibliographic data and systems that serve their creation, processing, standardisation and linking. These include in-house ZBW applications as well as external data hubs such as Wikidata. He can also be found on LinkedIn, ORCID, ResearchGate and Twitter.
Portrait: ZBW©