
License: Creative Commons Attribution 4.0 International

Wikidata & KB national library of the Netherlands, an overview

A (non-exhaustive) overview of how Wikidata is used by/in/for both the linked open datasets (thesauri) and public domain heritage collections of the KB, national library of the Netherlands.

Latest update: 21 November 2023

This page is a textual summary of the course Verdieping: Wikidata & de KB for employees of the KB, national library of the Netherlands, held on 14 November 2023, 15:00-16:15.

See also



Intro

Required basic knowledge about Wikidata

See the course Wegwijzer in Wikidata (Introduction to Wikidata), June 6, 2023 (in Dutch)

Course objectives

To provide more understanding about

  1. Why we use Wikidata at KB
  2. How we use Wikidata for KB thesauri & heritage collections
  3. What value this adds for KB

Course layout


BLOCK 1 - What does Wikidata add for the KB?

Open doors

(Captain Obvious mode) For the KB and its services: be findable on Google, and be present on Facebook, Instagram, YouTube and Twitter. --> Summary (an 'open door', i.e. stating the obvious): be present on the large (web-scale) platforms.

Equally obvious, then:

  • Add your collection knowledge to Wikipedia
  • Add your collection images to Wikimedia Commons
  • Add your collection data to Wikidata

Wikidata characteristics

Wikidata is one of the largest and most popular linked open data (LOD) platforms in the world.

Characteristics:

  • Central part of the (web-scale) Wikimedia infrastructure (Wikipedia, Commons, 700+ Wikimedia platforms)
  • Free, public utility for data (no IT costs)
  • Centralized, no data silos, 1 language (w.r.t. SPARQL and API calling)
  • Global scope, (much) broader than KB/library/heritage/Netherlands domain
  • Connection point for 8330+ external databases worldwide
  • Multilingual, language independent, 300+ languages
  • Collaborative --> International community, 25K+ content creators
  • For humans (GUI) and machines (API, SPARQL, JSON, RDF, Python etc.)
  • LOD, the least scary of all LOD platforms --> Understandable & warm, thanks to community!
  • No copyright on data (CC0)
  • Strongly growing, positive outlook & sustainable

Effective result: advantages of scale and community & network effects

Added value of Wikidata for KB

What value does Wikidata add for the KB & its services?

  1. Increased visibility, findability and reusability of our collections
    • Greater public reach of KB collections, worldwide
    • KB data in a cross-domain, global, multilingual context --> Increasing the KB's interoperability with the outside world
    • Community: External expertise, skills, tools and enthusiasm to enrich & connect KB data
  2. New functionalities for our data (and images) --> See block 4
    • Functionalities that we do not or cannot offer in our own KB services
    • Regarding search, data enrichment, data quality control, data visualization, data formats, image metadata and machine interactions
    • Both for our thesauri and heritage collections
    • For people and machines
    • 'KB collections as LEGO'
  3. Toolkit & platform to create and publish new KB LOD
    • Internal KB LOD renewal process is not yet delivering public results
  4. Developing and sharing knowledge & skills related to LOD
    • Both internally and externally
    • Strengthening our cooperation with KB network partners via Wikidata/media

BLOCK 2 - Wikidata & KB thesauri (NTA + DBNLa)

KB datasets (thesauri): http://data.bibliotheken.nl/

Criteria for suitability of KB thesauri for Wikidata

Ergo: Focus on NTA and DBNL authors with regard to the KB thesauri-Wikidata activities.

a) From the NTA to Wikidata

Persons in the NTA with a Wikidata URI:

# Which NTA items have a link to Wikidata?
SELECT  * WHERE {
 ?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
 ?nta rdfs:label ?ntaLabel.  
 ?nta schema:sameAs ?wikidata .
FILTER(regex(?wikidata, 'wikidata', 'i'))
} LIMIT 1000

  • 499K of 2.75M NTA items have a Wikidata link (source)

b) From the DBNLa to Wikidata

Persons in DBNLa with a Wikidata URI (via the NTA)

# Which DBNLa authors have a link to Wikidata?
SELECT *  
WHERE {
 ?dbnl schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/dbnla> .
 ?dbnl rdfs:label  ?dbnlLabel.  
 ?dbnl owl:sameAs ?nta .
 ?nta schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
 ?nta rdfs:label ?ntaLabel.   
 ?nta schema:sameAs ?wikidata .
 FILTER(regex(?wikidata, 'wikidata', 'i'))
} LIMIT 1000

  • 14.5K of 109K DBNLa items have a Wikidata link (source)
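
To put the two counts in proportion, the share of linked items can be worked out from the rounded figures quoted above. This is only a small sketch: the numbers come from this page, while the function name `linked_share` is made up for the illustration.

```python
def linked_share(linked: float, total: float) -> float:
    """Return the percentage of thesaurus items that carry a Wikidata link."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * linked / total


# Rounded counts as quoted on this page (state of November 2023)
nta_share = linked_share(499_000, 2_750_000)
dbnla_share = linked_share(14_500, 109_000)

print(f"NTA:   {nta_share:.1f}% of items have a Wikidata link")    # 18.1%
print(f"DBNLa: {dbnla_share:.1f}% of items have a Wikidata link")  # 13.3%
```

So, on these rounded figures, a little under one in five NTA persons and roughly one in eight DBNLa authors are connected to Wikidata.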

Federated query to retrieve extra data from Wikidata

# Get supplementary data about DBNL author 'acke001' from Wikidata
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
SELECT * 
WHERE {
 ?dbnl schema:mainEntityOfPage/owl:sameAs  <http://data.bibliotheken.nl/doc/dbnla/acke001> .
 ?dbnl rdfs:label ?dbnlLabel.  
 ?dbnl owl:sameAs ?nta .
 ?nta  schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/persons> .
 ?nta rdfs:label ?ntaLabel.   
 ?nta schema:sameAs ?wikidata .
 FILTER(regex(?wikidata, 'wikidata', 'i'))

 SERVICE <https://query.wikidata.org/sparql> {
   ?wikidata wdt:P18 ?imageURL.                 # P18 = image
   ?wikidata wdt:P69 ?educatedAt.               # P69 = educated at
   ?wikidata wdt:P102 ?MemberOfPoliticalParty.  # P102 = member of political party

 }
} 
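
SPARQL endpoints can usually return their results in the W3C SPARQL 1.1 Query Results JSON format. As a minimal sketch of how such a response could be processed in Python, the helper below flattens it into plain rows. The function name `flatten_bindings` and all sample values are placeholders made up for this illustration, not real query output.

```python
def flatten_bindings(results: dict) -> list[dict]:
    """Turn a SPARQL JSON result set into a list of {variable: value} rows."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable is simply missing from a binding when it is unbound in that row
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows


# Made-up response in the W3C JSON layout; every value is a placeholder
sample = {
    "head": {"vars": ["dbnlLabel", "wikidata", "imageURL"]},
    "results": {"bindings": [
        {
            "dbnlLabel": {"type": "literal", "value": "Example Author"},
            "wikidata": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
            "imageURL": {"type": "uri", "value": "http://example.org/portrait.jpg"},
        },
        {
            "dbnlLabel": {"type": "literal", "value": "Author Without Image"},
            "wikidata": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
        },
    ]},
}

rows = flatten_bindings(sample)
for row in rows:
    print(row["dbnlLabel"], "->", row["imageURL"])
```

Note that the three triple patterns inside the SERVICE block of the query are all required, so an author lacking an image, an education statement or a party membership yields no rows at all. Wrapping each pattern in OPTIONAL { ... } would return partial rows instead, which is the case the `None` handling above anticipates.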

Checks are OK:

c) From Wikidata to the NTA - P1006

Persons in Wikidata with an NTA id

SELECT ?item ?itemLabel ?NTAurl
{
  ?item wdt:P1006 ?NTAid.
  BIND(IRI(CONCAT('http://data.bibliotheken.nl/doc/thes/p', ?NTAid)) AS  ?NTAurl)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en"  }
}
LIMIT 1000
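
The query above can be pasted into the Wikidata Query Service web interface, but it can also be sent from a script. The following is a rough sketch using only the Python standard library. It assumes the public endpoint https://query.wikidata.org/sparql; the User-Agent string is a hypothetical placeholder that you should replace with your own contact details.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, with a smaller LIMIT for a quick trial run
QUERY = """
SELECT ?item ?itemLabel ?NTAurl
{
  ?item wdt:P1006 ?NTAid.
  BIND(IRI(CONCAT('http://data.bibliotheken.nl/doc/thes/p', ?NTAid)) AS ?NTAurl)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
LIMIT 10
"""


def build_request(query: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )


# Placeholder User-Agent, for illustration only
request = build_request(QUERY, "wikidata-kb-overview-example/0.1 (replace with your contact)")
print(request.full_url[:80] + "...")

# Uncomment to actually send the request (needs network access):
# import json
# with urllib.request.urlopen(request, timeout=60) as response:
#     data = json.load(response)
#     print(len(data["results"]["bindings"]), "rows")
```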

Insights in the usage of P1006

https://www.wikidata.org/wiki/Property_talk:P1006

  • Wikidata contains 550K links to the NTA: see 'Current uses' at bottom of this page, or via this SPARQL query
  • Map of birthplaces of people with an NTA id: https://w.wiki/7rsT
  • Famous people with an NTA id: https://w.wiki/85si (famous people have extensive Wikidata entries with many statements)

P1006 and data quality

Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the NTA

For example:

Usage of NTA ids in Wikipedia articles

Wikidata: Category:Articles with NTA identifiers

In summary: via Wikidata, the NTA is used as an authority in 100,000 Wikipedia articles in many languages (but not Dutch!).

Summary for NTA/P1006

d) From Wikidata to the DBNLa - P723

Persons in Wikidata with a DBNLa id

SELECT ?item ?itemLabel ?DBNLaUrl
{
  ?item wdt:P723 ?DBNLaId.
  BIND(IRI(CONCAT('http://data.bibliotheken.nl/id/dbnla/', ?DBNLaId)) AS ?DBNLaUrl)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en"  }
}
LIMIT 1000
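
The BIND(IRI(CONCAT(...))) step in the P1006 and P723 queries simply glues a fixed URL prefix onto the identifier stored in Wikidata. Outside SPARQL the same mapping might look like the sketch below. The two prefixes are copied from the queries on this page; the function names and the NTA id `123456789` are made up for the example, while `acke001` is the DBNL author id used earlier on this page.

```python
NTA_PREFIX = "http://data.bibliotheken.nl/doc/thes/p"
DBNLA_PREFIX = "http://data.bibliotheken.nl/id/dbnla/"


def nta_url(nta_id: str) -> str:
    """Build the NTA URL for a P1006 value, as the BIND in the P1006 query does."""
    return NTA_PREFIX + nta_id.strip()


def dbnla_url(dbnla_id: str) -> str:
    """Build the DBNLa URL for a P723 value, as the BIND in the P723 query does."""
    return DBNLA_PREFIX + dbnla_id.strip()


print(nta_url("123456789"))  # made-up id, for illustration only
print(dbnla_url("acke001"))  # DBNL author id used earlier on this page
```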

Insights into the usage of P723

https://www.wikidata.org/wiki/Property_talk:P723

P723 and data quality

Two pages provide insight into the data quality (and possible improvements) of both Wikidata and the DBNLa

Historical metrics of Wikidata and NTA & DBNLa

Historical metrics of the usage of NTA and DBNLa identifiers in Wikidata, and v.v.: https://nl.wikipedia.org/wiki/Wikipedia:GLAM/Koninklijke_Bibliotheek_en_Nationaal_Archief/Resultaten/KPIs/KPI10#Historische_ontwikkeling_van_KPI_10


BLOCK 3 - Intermezzo: Linking Wikimedia Commons with Wikidata

Look at File:Atlas_de_Wit_1698-pl017-Leiden-St_Pancraskerk.jpg on Wikimedia Commons (= Saint Pancras Church in Leiden, now called Hooglandse Kerk).

Regular metadata for KB images

  • Manifest textual and visual KB source references
  • 'Manual' multilingualism of the title in Latin, Dutch and French
  • Source code appears to be structured, but really is unstructured metadata (free text)
  • Tab 'Structured Data'

Structured Data on Commons

Structured Data on Commons (SDoC) is a project to add multilingual structured information from Wikidata to files on Wikimedia Commons that can be understood by humans, with enough consistency that it can also be uniformly processed by machines.

Added value of SDoC

  • Images are linked to Wikidata
  • Images are provided with real structured (and therefore machine-readable) data
  • Linked open data for Commons files is created, files become part of the LOD cloud
  • Not only for images, e.g. see the structured data on this PDF file
  • Files are made searchable via SPARQL
  • For KB: Structured 5* LOD metadata for 31,348 KB files

SPARQL queries for KB images

What is depicted on KB images?

Summary of image search options in Wikimedia Commons

Let's summarize: KB images on Commons are searchable in 3 ways

  1. Via regular metadata (= free text search)
  2. Via structured metadata
  3. By content (What is depicted in KB images?)
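
Option 3, searching by content, relies on the Depicts statements (P180) stored as structured data with each file. As a hedged sketch, such a search could be expressed through the standard MediaWiki search API on Commons using the `haswbstatement` search keyword. The function name is made up, Q146 (house cat) is only an illustrative item, and the search below covers all of Commons rather than KB files alone.

```python
import urllib.parse

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def depicts_search_url(qid: str, limit: int = 10) -> str:
    """Build an API URL that searches Commons files depicting the given Wikidata item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:P180={qid}",  # P180 = depicts
        "srnamespace": 6,                          # 6 = File namespace
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urllib.parse.urlencode(params)}"


# Q146 (house cat) is used purely as an illustration
print(depicts_search_url("Q146"))
```

Adding further search terms to `srsearch` (for instance a category or free-text filter) would be one way to narrow the results to a single collection; which filter suits the KB files best is left open here.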

Tool tip: Hay's Structured Search

The (super handy!) tool Hay's Structured Search offers all three options. It is a visual, multilingual search engine to find images with (and without) structured data in Wikimedia Commons.

In summary: The search functionalities shown (SPARQL, structured search, multilingual search, search by content) are much more advanced than the proprietary KB (image) services such as Het Geheugen!

Adding Depicts tags to KB images yourself

This manual from 2020 explains step by step how to make images from the KB collection more discoverable, visible and reusable by indicating (tagging) which things (entities) can be seen in those images. This is done by connecting Wikidata items to those things. It is available on Wikimedia Commons and Zenodo.

Results as of 1 November 2023


BLOCK 4 - Wikidata & KB heritage collections

Examples of KB heritage collections: Medieval manuscripts - Maps and atlases - Armorials - Alba amicorum - Catchpenny prints - Children's picture books - Flora and fauna books

Criteria for suitability of KB heritage collections for Wikidata

  1. Collection highlights, canonical objects: The most important objects of the KB must be present on Wikidata (and Commons)
  2. Copyright free objects: Public domain = no hassle with copyright
  3. Limited collection size: 10-100s of images are easier to process than 10-100Ks
  4. Visually rich collections: What is depicted on the images, see Block 3
  5. Connectable to other things: Making semantic links between the KB collections and persons, places, events etc. described in Wikidata
  6. Collections consisting of similar, unique objects with narrow, flat, well-defined data models/classes: Similar values for instance of and/or subclass of. Not OK: heterogeneous ephemera.

WikiProject KB Collection highlights (2020-present)

KB collection highlights are part of our national heritage, just like e.g.

Presentation of collection highlights on kb.nl

Collection highlights on (previous) KB website from February 2020

Typical presentation of collection highlights on kb.nl, for instance for Atlas Ortelius 1571

  1. Catalog record --> Metadata
  2. Hi-res flip book --> Images
  3. Contextual article --> Stories, context

This presentation on kb.nl offers limited functionality and few reuse options. It represents an old way of thinking: collection highlights (on kb.nl) are only for reading and viewing, inviting passive consumption. More explanation in this article.

A new paradigm for collection highlights

A new way of thinking:

  • KB collection highlights are building blocks that invite active reuse and creation.
  • Building blocks for tech community: Developers, app builders, tech companies, AIs, digital humanities, data scientists, hackathons, Wikimedia communities, LOD world, NDE, Europeana etc.
  • KB collection highlights as a toolbox of Technical LEGO
    • Contents of this toolbox: e.g. 5-star Linked Open Data - Automatic image recognition (AI) - Semantic tagging - Data dumps & bulk downloads - SPARQL - Images searchable by content - Data visualizations - Python - Machine-readable data - Flexible REST APIs - Manifest legal terms - IIIF - Data as JSON, XML, CSV - Automatic multilingualism - External LOD Identifiers
  • All these building blocks are available in the Wikimedia infrastructure: the combination of Wikidata (for metadata), Wikimedia Commons (for images) and Wikipedia (for contextual stories) - and their associated international communities - providing a coherent technical and social infrastructure to make KB's collection highlights much more visible, findable and reusable.

Wikification of KB collection highlights

Wikifying KB’s collection highlights

E.g. Atlas Ortelius:

  1. Catalog record KB --> Metadata to Q67465742 on Wikidata, with collection = Koninklijke Bibliotheek, and qualifier subject has role = collection highlight
  2. Hi-res flip book KB --> Images to Atlas Ortelius 1571 on Wikimedia Commons
  3. Contextual article KB --> Context to Theatrum Orbis Terrarum on Dutch Wikipedia
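
The collection statement and its qualifier on Q67465742 (step 1) can be read back from Wikidata with a short SPARQL query. Below is a minimal sketch that builds such a query in Python. It assumes that P195 is the property id for collection and P2868 the one for subject has role, so check both on Wikidata before relying on them. The function name is made up for this example.

```python
QUERY_TEMPLATE = """
SELECT ?collection ?collectionLabel ?role ?roleLabel WHERE {
  wd:%(qid)s p:P195 ?statement .            # P195 = collection (assumed property id)
  ?statement ps:P195 ?collection .
  OPTIONAL { ?statement pq:P2868 ?role . }  # P2868 = subject has role (assumed property id)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" }
}
"""


def collection_statement_query(qid: str) -> str:
    """Return a SPARQL query listing the collection statements of one Wikidata item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return QUERY_TEMPLATE % {"qid": qid}


# Q67465742 is the Atlas Ortelius item mentioned in step 1
print(collection_statement_query("Q67465742"))
```

The printed query can be run in the Wikidata Query Service. The qualifier is wrapped in OPTIONAL so that collection statements without a role still show up in the results.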

The WikiProject KB Collection highlights (2020-present) aims to improve the findability, visibility and reusability of KB's collection highlights for both humans and machines by

  • creating and improving the Wikidata descriptions for all digitised KB collection highlights,
  • uploading their public domain images to Wikimedia Commons, reusing data from Wikidata as much as possible to create image metadata
  • creating and improving the Wikipedia articles about them on Dutch and English Wikipedia

Result of the project: all the cool, value-adding functionalities, tools and community capacities of the Wikimedia infrastructure are now available for our KB collection highlights. The party can start!

50 cool new things you can do now with KB's collection highlights

The party can start, let's build cool new things! --> See the article 50 cool new things you can do now with KB's collection highlights

In this series of 5 articles we show the added value of putting images and metadata of digitised collection highlights of the KB, national library of the Netherlands, into the Wikimedia infrastructure. By putting our collection highlights into Wikidata, Wikimedia Commons and Wikipedia, dozens of new functionalities have been added. As a result of Wikifying this collection in 2020, you can now do things with these highlights that were not possible before.

This article has 5 parts:

Examples

  1. All functionalities for KB images regarding SPARQL, structured search, multilingual search, search by content, as explained in Block 3
  2. Gallery of KB collection highlights on Dutch Wikipedia (never mind the new WP layout!)
  3. Persons/roles involved in each collection highlight
  4. Contributors to the Album Jacob Heyblocq
  5. Works by these contributors in DBNL
  6. Works by these contributors elsewhere, via Europeana, as Excel: See for example Govert Flinck on Europeana + this explanation, see Point 48

Contact

Questions or remarks can be sent to Olaf Janssen, Wikimedia coordinator of the KB - [email protected] - @ookgezellig

Reuse and licensing

This overview can be reused freely and openly. It is available under the CC-BY 4.0 license, so attribution is required. Use something like:

Wikidata & KB national library of the Netherlands, an overview, Olaf Janssen & KB national library of the Netherlands, https://github.com/KBNLwikimedia/Wikidata-KB-Overview
