marfox / wikidata-constraints-violation-checker

This project forked from wmde/wikidata-constraints-violation-checker

a tool to analyze constraint violations on Wikidata

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

Wikidata Constraints Violation Checker

The Wikidata Constraints Violation Checker counts the constraint violations on a list of Wikidata Items. The results show which Items most need improvement and give a picture of the data quality in a specific area of Wikidata.

Installation

This script requires at least Python 3.6. In your terminal, run:

git clone https://github.com/wmde/wikidata-constraints-violation-checker.git
cd wikidata-constraints-violation-checker
pip3 install -r requirements.txt

Usage

# To run the script with an input file
python3 checkDataQuality.py -i <inputfile>

# To run the script using randomly generated Item IDs
python3 checkDataQuality.py -r <number of items>

# You can also specify an output filename
python3 checkDataQuality.py -i <inputfile> -o <outputfile>

# Or a batch size
python3 checkDataQuality.py -r <number of items> -b <batch-size>

Arg  Name                     Description
-i   Input file               The path to the file containing the input data
-r   Randomly generate Items  The number of Items to randomly generate
-o   Output file              The path to the file for output
-b   Batch size               The list of Items is broken down into batches of this size for processing. Default value is 10
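
To illustrate what the batch size controls, here is a minimal sketch of splitting a list of Item IDs into batches. It is not taken from checkDataQuality.py: the function name `batched` and the sample IDs are assumptions made for this example, and the script's own batching may differ.

```python
from typing import Iterator, List, Sequence


def batched(item_ids: Sequence[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of item_ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(item_ids), batch_size):
        yield list(item_ids[start:start + batch_size])


# 25 made-up Item IDs, used only to show the batch sizes.
item_ids = [f"Q{n}" for n in range(1, 26)]
batches = list(batched(item_ids, batch_size=10))
print([len(batch) for batch in batches])  # prints [10, 10, 5]
```

The last batch is shorter when the number of Items is not a multiple of the batch size.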

Input Data

The script can read CSV files or generate random Item IDs.

CSV File

Example input file. The first column is used to query for constraint violations:

Q60,New York
Q64,Berlin
Q70,Bern
Q84,London
Q90,Paris
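
As a rough sketch of how the first column of such a file could be extracted, the snippet below uses Python's standard csv module. The helper name `read_item_ids` is hypothetical and the input is an in-memory copy of the example above; the script's actual parsing may differ.

```python
import csv
import io
from typing import List, TextIO


def read_item_ids(handle: TextIO) -> List[str]:
    """Return the first column of every non-empty CSV row."""
    item_ids = []
    for row in csv.reader(handle):
        # Skip blank lines and rows whose first cell is empty.
        if row and row[0].strip():
            item_ids.append(row[0].strip())
    return item_ids


# In-memory stand-in for the example input file.
sample = io.StringIO("Q60,New York\nQ64,Berlin\nQ70,Bern\nQ84,London\nQ90,Paris\n")
item_ids = read_item_ids(sample)
print(item_ids)  # prints ['Q60', 'Q64', 'Q70', 'Q84', 'Q90']
```

Any further columns, such as the labels in the example, are ignored.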

Output Data

The following fields are provided in the output data for Items that are successfully checked.

Field                        Description
QID                          The unique Item identifier
statements                   Total number of statements on the Item
violations_mandatory_level   Number of violations at the mandatory level
violations_normal_level      Number of violations at the normal level
violations_suggestion_level  Number of violations at the suggestion level
violated_statements          Number of statements with violations
total_sitelinks              Number of sitelinks on the Item
wikipedia_sitelinks          Number of sitelinks to Wikipedia
ores_score                   ORES Item quality score, from 1 to 5 (lowest to highest)

Note

Please be aware that some large Items are skipped during the analysis because the constraint check API times out for them.
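
The skipping behaviour can be pictured with the sketch below. It is an illustration under assumptions, not the script's code: `check_all`, `fake_check`, and the use of Python's built-in TimeoutError are all made up for this example, and a real HTTP client may raise its own timeout exception type instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def check_all(
    item_ids: Iterable[str],
    check_item: Callable[[str], int],
) -> Tuple[Dict[str, int], List[str]]:
    """Run check_item on each ID; collect results and the IDs that timed out."""
    results: Dict[str, int] = {}
    skipped: List[str] = []
    for qid in item_ids:
        try:
            results[qid] = check_item(qid)
        except TimeoutError:
            # The Item is left out of the results instead of aborting the run.
            skipped.append(qid)
    return results, skipped


def fake_check(qid: str) -> int:
    """Hypothetical stand-in for a constraint check; pretends Q64 is too large."""
    if qid == "Q64":
        raise TimeoutError("constraint check timed out")
    return 0


results, skipped = check_all(["Q60", "Q64", "Q70"], fake_check)
print(sorted(results))  # prints ['Q60', 'Q70']
print(skipped)          # prints ['Q64']
```

Keeping a list of skipped IDs makes it possible to report how many Items are missing from the output.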
