phucty / mtab_tool

MTab: Entity Search and Table Annotation with Wikidata, Wikipedia, and DBpedia

Home Page: https://mtab.kgraph.jp

License: MIT License

Python 99.67% HTML 0.33%
entity-search mtab table-annotation tabular-data annotation knowledge-graph semantic-labeling multilingual semtab

mtab_tool's Introduction

MTab


MTab: Entity Search and Table Annotation with Knowledge Graphs (Wikidata, Wikipedia and DBpedia)

Demo

API usage

Source code:

Other works:

  • MTab4D: Table Annotation with DBpedia
  • WikiDB: Build a DB (LMDB-style key-value store) from a Wikidata dump, for offline access to Wikidata and fast boolean search

References

Awards:

  • 1st prize at SemTab 2021 (usability track). Results, MTab
  • 1st prize at SemTab 2020 (tabular data to Wikidata matching). Results, MTab
  • 1st prize at SemTab 2019 (tabular data to DBpedia matching). Results, MTab

Citing

If you find the MTab tool useful in your work and want to cite it, please use the following reference:

@inproceedings{2021_mtab4wikidata,
  author    = {Phuc Nguyen and
               Ikuya Yamada and
               Natthawut Kertkeidkachorn and
               Ryutaro Ichise and
               Hideaki Takeda},
  title     = {SemTab 2021: Tabular Data Annotation with MTab Tool},
  booktitle = {SemTab@ISWC 2021},
  series    = {{CEUR} Workshop Proceedings},
  volume    = {3103},
  pages     = {92--101},
  publisher = {CEUR-WS.org},
  year      = {2021},
  url       = {http://ceur-ws.org/Vol-3103/paper8.pdf},
}

Contact

Phuc Nguyen ([email protected])

mtab_tool's Issues

The API does not work now

Hi~
I want to build a system that uses an external knowledge base to provide useful information to end users. Your API looks suitable for my project, and I would like to cite your work. However, the entity search API is not working at the moment. Could you help me figure out why?

Update: Thank you. I see that you are carrying out maintenance!

Return all entity candidates and their confidence scores as the final CEA annotation results

I received a request to return all entity candidates and their confidence scores as the final results of the CEA task.

To get this output, set debug=True in the request.

Please change dir_table to your table location.

debug=True, # return all candidates, and their confidence scores in CEA tasks

The annotation results will look like this file.
mtab_annotation_result.txt

Replicate MTab results on Tough Table dataset

I received a request to replicate the MTab results on the Tough Table (2T) dataset with the MTab API.

The results can be replicated with this script: https://github.com/phucty/mtab_tool/blob/master/run_2t.py
Other resources:
2T dataset: https://github.com/phucty/mtab_tool/blob/master/data/semtab/tables.zip
Annotation Results: https://github.com/phucty/mtab_tool/blob/master/results.zip

The final result is very close to our results in SemTab 2020. (F1: 0.895 vs. 0.907 as reported in our SemTab 2020 report)
Setting:

- search_mode="a" (aggregation search: fuzzy + keyword)
- search_limit=100 (get the top 100 relevant entity candidates)
- search_expensive=True (do not stop early once a good candidate is found)
- chunk_size=200 (split big tables into smaller tables of 200 lines each)
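
The chunk_size setting can be illustrated with a minimal sketch. This is not the code from run_2t.py: the function name chunk_rows is hypothetical, and the source does not say how the header row is handled, so this version only splits the data rows.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")


def chunk_rows(rows: Iterable[Row], chunk_size: int = 200) -> Iterator[List[Row]]:
    """Yield consecutive chunks of at most chunk_size rows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        # islice consumes the next chunk_size rows without loading the whole table.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    table = [[f"cell_{i}_0", f"cell_{i}_1"] for i in range(450)]
    for index, chunk in enumerate(chunk_rows(table, chunk_size=200)):
        print(index, len(chunk))
```

Each chunk would then be sent to the API as its own small table. Because every chunk is annotated without the context of the remaining rows, the annotations can differ from those of a single full-table run.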

Result:

180/180. Z4M8AT89: 100%|██████████████████████████████████████| 667244/667244 [21:53:57<00:00,  8.46it/s]
{
    "ALL": {
        "precision": 0.8954061559997541,
        "recall": 0.8953726073220591,
        "f1": 0.8953893813466541,
        "correct": 597432,
        "gt": 667244,
        "submit": 667219
    }
}
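
The reported numbers are consistent with the usual definitions: precision = correct / submit, recall = correct / gt, and F1 as their harmonic mean. The sketch below recomputes them from the three counts; the function name prf1 is hypothetical, and the actual evaluation script may differ.

```python
from typing import Dict


def prf1(correct: int, gt: int, submit: int) -> Dict[str, float]:
    """Compute precision, recall, and F1 from annotation counts (hypothetical helper).

    correct: submitted annotations that match the ground truth
    gt:      number of ground-truth annotations
    submit:  number of submitted annotations
    """
    precision = correct / submit if submit else 0.0
    recall = correct / gt if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Counts taken from the "ALL" block above.
    print(prf1(correct=597432, gt=667244, submit=667219))
```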

The differences in the results could be explained as:

  1. Wikidata changes over time: the online MTab API is built on a 2021 Wikidata dump, while the 2T dataset results date from 2020. Wikidata changed considerably in between, which accounts for part of the difference in the final results.
  2. Because we have limited resources to host the MTab services, big tables have to be split into chunks before being fed to the MTab API, which could also affect the final results.

API connection refused

Hello~ The API worked fine yesterday, but today the connection is refused.
I tried to annotate a table with curl -X POST -F file=@"0.t0.csv" https://mtab.app/api/v1/mtab, but got curl: (7) Failed to connect to mtab.app port 443: Connection refused. Inference can't be opened either.
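
When reporting a problem like this, it can help to first confirm that the failure is at the TCP level rather than in the request itself. The following is a minimal sketch using only the Python standard library; the function name can_connect is hypothetical, and the check says nothing about whether the API behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (performs a real network call, so it is left commented out):
# print(can_connect("mtab.app", 443))
```

A False result here matches the curl (7) error above and suggests the service is down or unreachable, not that the uploaded table is malformed.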
