treatkgc's Introduction

SICKLE

  1. Data
  • We use three datasets: DBped-P, DB15K, and NELL.
  • Unzip resources/NELL.zip to resources/NELL/
  • Unzip resources/DB15K.zip to resources/DB15K/

Data format:

  • Triples: filename abox_hrt_uri.txt, one triple per line with h, r and t separated by tabs.
  • Ontology: an OWL2 ontology in N-Triples format, containing at least the class hierarchy, disjoint classes, domain/range, etc.
  • The L-method needs literals; please refer to resources/NELL for the data format.
  • ACC needs entity2type.txt; please refer to resources/NELL for the data format.
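As a rough illustration of the triple format described above, the following sketch parses tab-separated h, r, t lines. The names `Triple`, `parse_triples` and `read_triples` are hypothetical helpers written for this example and are not part of the repository; the sketch also assumes one triple per line with no header row.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def parse_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (h, r, t) triples from tab-separated lines, skipping blank lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate empty lines, e.g. a trailing newline at end of file
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        yield Triple(*fields)


def read_triples(path: str) -> List[Triple]:
    """Read a whole triple file (assumed layout: abox_hrt_uri.txt) into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_triples(f))
```

Running `read_triples("resources/NELL/abox_hrt_uri.txt")` on a custom dataset before starting the pipeline is a cheap way to catch lines that are not split into exactly three fields.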
  1. Dependencies
  • Install PyTorch according to your CPU/GPU environment.
  • Install all packages in requirements.txt. Feel free to use conda, pyenv or others.
  • Install Java (JDK) and Maven.
  1. Preparing Konclude reasoner
mkdir java_owlapi/Konclude
unzip resources/packages/Konclude-v0.7.0-1135-Linux-x64-GCC-Static-Qt5.12.10.zip -d java_owlapi
mv java_owlapi/Konclude-v0.7.0-1135-Linux-x64-GCC-Static-Qt5.12.10 java_owlapi/Konclude 
  1. Preparing TBox scanner
cd java_owlapi/TBoxTREAT
mvn clean install
  1. Triple producer
source setup.sh
cd pipeline
python experiments.py --dataset=DB15K --work_dir="../outputs/proDB15K/"  --produce=True --start_acc=True --silver_eval=True --pred_type=False --pipeline=a_m_l --loops=1 --rel_model=complex --inductive=False --parallel=True --schema_aware_sampling=False 

"a_m_l" runs AnyBURL, materialization and blp together. To run a single method, pass a, m or l instead. --inductive=True sets blp to literal embedding; --inductive=False sets blp to pure KG embedding. Please refer to the blp paper for more information.
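To make the --pipeline value concrete, here is a minimal sketch of how a string such as a_m_l could be split into the methods it names. This is not how experiments.py parses the option; `METHODS` and `parse_pipeline` are hypothetical names used only for illustration, and only the letter meanings are taken from the text above.

```python
from typing import List

# Letter meanings follow the README; the dictionary itself is an
# assumption made for this sketch, not code from the repository.
METHODS = {
    "a": "AnyBURL",
    "m": "materialization",
    "l": "blp",
}


def parse_pipeline(spec: str) -> List[str]:
    """Translate a pipeline spec such as 'a_m_l' into ordered method names."""
    steps = []
    for letter in spec.split("_"):
        if letter not in METHODS:
            raise ValueError(f"unknown pipeline step {letter!r} in {spec!r}")
        steps.append(METHODS[letter])
    return steps


if __name__ == "__main__":
    print(parse_pipeline("a_m_l"))
    print(parse_pipeline("m"))
```

The order of the letters is preserved, so a spec like l_a would list blp before AnyBURL in this sketch; whether the real pipeline honors arbitrary orderings is not stated in the README.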

For type prediction:

  • Set --pred_type=True. This will run type prediction after the link prediction step.
  • Or run
cd module_utils
python type_producer.py --dataset=DB15K --work_dir="../outputs/proDB15K/a_m_l/"
  1. For custom datasets:
  • Please refer to the required data format.
  • You need to generate TBox inconsistency justification patterns before you run the pipeline:
scripts/tbox_Scanner.sh  schema_file work_dir

  1. TREAT downstream sampling (For my co-worker's project. You don't need this)
source setup.sh
cd ../../pipeline 
python TREAT_downstream.py

There are a few output files in outputs/treat_downstream/:

  • blp_new_triples.csv: all produced triples with scores.
  • sample_and_score.pt: train sampling file, with format h,r,t,s.
  • valid_hrt.txt: triples that are consistent with the schema.
  • invalid_hrt.txt: triples that are inconsistent with the schema.

The entities and relations in these files are indexed by abox_scanner/abox_utils.py. However, they can be translated to URIs with minor changes in the pipeline.
