The goal of this project is to compare the content/knowledge of different Wikipedia projects. In particular, we are interested in multilingual Wikipedias and Wikidata.
For example, looking at the University of Amsterdam:
| UvA (Dutch) | UvA (English) | UvA (Wikidata) |
|---|---|---|
You see different content. The goal of this project is to create quantitative measures of these differences.
This is useful in the context of projects we work on at indelab.org, which focus on adding knowledge to knowledge bases like Wikidata.
See for example:
- Prompting as Probing: Using Language Models for Knowledge Base Construction, by Dimitrios Alivanistos, Selene Báez Santamaría, Michael Cochez, Jan-Christoph Kalo, Emile van Krieken, and Thiviyan Thanapalasingam. (GitHub)
- Inductive Entity Representations from Text via Link Prediction, by Daniel Daza, Michael Cochez, and Paul Groth, in The Web Conference 2021. (GitHub)
The results below are for Dutch universities, as defined by the following SPARQL query executed over Wikidata:
```sparql
SELECT ?item
WHERE {
  ?item wdt:P31 wd:Q3918 .
  ?item wdt:P17 wd:Q55 .
  ?nlSite schema:isPartOf <https://nl.wikipedia.org/> .
  ?enSite schema:isPartOf <https://en.wikipedia.org/> .
  ?nlSite schema:about ?item .
  ?enSite schema:about ?item .
}
```
This retrieves all entities of type (wdt:P31) university (wd:Q3918) that have the Netherlands (wd:Q55) as their country (wdt:P17), and that have articles on both the Dutch and the English Wikipedia. We then use Pywikibot to retrieve the Wikipedia pages from the Dutch and English Wikipedias, as well as the representation from Wikidata.
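The retrieval step can be sketched as follows. This is a minimal sketch, not our exact pipeline: it queries the public Wikidata Query Service with the standard library and then uses Pywikibot to fetch the pages; the function names are our own, and the Pywikibot calls assume a working Pywikibot configuration (`user-config.py`).

```python
# Sketch: run the SPARQL query above, then fetch the Dutch/English articles
# and the Wikidata item for each result. Hypothetical helper names.
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # public query service

QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q3918 .
  ?item wdt:P17 wd:Q55 .
  ?nlSite schema:isPartOf <https://nl.wikipedia.org/> .
  ?enSite schema:isPartOf <https://en.wikipedia.org/> .
  ?nlSite schema:about ?item .
  ?enSite schema:about ?item .
}
"""

def qids_from_bindings(bindings):
    """Extract bare Q-identifiers from WDQS JSON result bindings."""
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in bindings]

def run_query(query):
    """Send the query to the Wikidata Query Service; return JSON bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["results"]["bindings"]

def fetch_pages(qid):
    """Fetch the Dutch/English article text and the Wikidata item."""
    import pywikibot  # lazy import; requires a Pywikibot configuration
    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    item = pywikibot.ItemPage(repo, qid)
    item.get()  # load labels, claims, and sitelinks
    nl_page = pywikibot.Page(pywikibot.Site("nl", "wikipedia"),
                             item.getSitelink("nlwiki"))
    en_page = pywikibot.Page(pywikibot.Site("en", "wikipedia"),
                             item.getSitelink("enwiki"))
    return nl_page.text, en_page.text, item

# Usage (performs network requests):
#   for qid in qids_from_bindings(run_query(QUERY)):
#       nl_text, en_text, item = fetch_pages(qid)
```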
Here we used the pretrained small spaCy pipelines for Dutch and English to perform named entity recognition on the article texts.
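One way to turn the NER output into a quantitative measure is set overlap over the recognized entity strings. The sketch below assumes the small spaCy pipelines (`nl_core_news_sm` and `en_core_web_sm`, installed via `python -m spacy download ...`); comparing surface strings across languages is a crude proxy, since the same entity is often named differently in Dutch and English, but it illustrates the kind of measure we are after.

```python
# Sketch: extract entity strings with spaCy and compare the two sets.
# The Jaccard measure is one possible choice of difference metric.

def entity_strings(text, model_name):
    """Run a spaCy pipeline over the text; return lower-cased entity strings."""
    import spacy  # lazy import; the named model must be installed
    nlp = spacy.load(model_name)
    return {ent.text.lower() for ent in nlp(text).ents}

def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Usage (assumes the article texts from the retrieval step):
#   nl_ents = entity_strings(nl_text, "nl_core_news_sm")
#   en_ents = entity_strings(en_text, "en_core_web_sm")
#   print(jaccard(nl_ents, en_ents))
```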