
snap-stanford / kgreasoning

Multi-Hop Logical Reasoning in Knowledge Graphs

License: MIT

Languages: Python 96.95%, Shell 3.05%
Topics: knowledge-graph, knowledge-base, embedding, reasoning

kgreasoning's Introduction

KGReasoning

This repo contains several algorithms for multi-hop reasoning on knowledge graphs, including the official PyTorch implementation of Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs.

Models

The repo implements three query-embedding models: GQE (Embedding Logical Queries on Knowledge Graphs), Query2box (Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings), and BetaE (Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs).

KG Data

The KG data (FB15k, FB15k-237, NELL995) used in the BetaE paper and the Query2box paper can be downloaded here. Note that the two papers use the same training queries; the difference is that the valid/test queries in the BetaE paper are capped at a maximum number of answers, which makes them more realistic.

Each folder in the data directory represents a KG and contains the following files (a loading sketch follows the list).

  • train.txt/valid.txt/test.txt: KG edges
  • id2rel/rel2id/ent2id/id2ent.pkl: dicts mapping entities and relations to integer ids and back
  • train-queries/valid-queries/test-queries.pkl: defaultdict(set), each key represents a query structure, and the value represents the instantiated queries
  • train-answers.pkl: defaultdict(set), each key represents a query, and the value represents the answers obtained in the training graph (edges in train.txt)
  • valid-easy-answers/test-easy-answers.pkl: defaultdict(set), each key represents a query, and the value represents the answers obtained in the training graph (edges in train.txt) / valid graph (edges in train.txt+valid.txt)
  • valid-hard-answers/test-hard-answers.pkl: defaultdict(set), each key represents a query, and the value represents the additional answers obtained in the validation graph (edges in train.txt+valid.txt) / test graph (edges in train.txt+valid.txt+test.txt)
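
A minimal loading sketch, assuming the file layout described above (the data_path value is a hypothetical local path):

```python
import pickle

data_path = "data/FB15k-237-betae"  # hypothetical local path

def load_pkl(name):
    with open(f"{data_path}/{name}", "rb") as f:
        return pickle.load(f)

ent2id = load_pkl("ent2id.pkl")                # entity name -> integer id
test_queries = load_pkl("test-queries.pkl")    # query structure -> set of queries
test_easy = load_pkl("test-easy-answers.pkl")  # query -> answers reachable on train+valid edges
test_hard = load_pkl("test-hard-answers.pkl")  # query -> additional answers requiring test edges

# Per the description above, easy and hard answer sets are disjoint:
# a hard answer needs at least one test edge to be reached.
```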

We represent each query structure as a nested tuple, in case we run out of names :) (credits to @michiyasunaga). For example, 1p queries: (e, (r,)); 2i queries: ((e, (r,)), (e, (r,))). Check the code for more details, and see the sketch below.
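
A small sketch of this tuple convention, using the two structures above (the slot names follow the examples here; the concrete ids are hypothetical):

```python
# Structure templates: 'e' marks an anchor-entity slot, 'r' a relation slot.
ONE_P = ('e', ('r',))                    # 1p: one hop from one anchor entity
TWO_I = (('e', ('r',)), ('e', ('r',)))   # 2i: intersection of two 1p branches

# Instantiated queries fill the slots with integer ids, e.g. a concrete
# 1p query could look like (4123, (17,)) -- hypothetical ids.
query_1p = (4123, (17,))
anchor, (relation,) = query_1p
print(f"anchor entity id: {anchor}, relation id: {relation}")
```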

Examples

Please refer to examples.sh for the training scripts of all 3 models on all 3 datasets; a representative command is sketched below.
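
For reference, a BetaE run in examples.sh looks roughly like the following. This is an illustrative sketch: the flags are believed to match the repo's main.py, but the exact hyperparameters differ per model and dataset, so consult examples.sh for the authoritative values.

```shell
CUDA_VISIBLE_DEVICES=0 python main.py --cuda --do_train --do_valid --do_test \
  --data_path data/FB15k-betae -n 128 -b 512 -d 400 -g 60 \
  -lr 0.0001 --max_steps 450001 --cpu_num 1 --geo beta --valid_steps 15000 \
  -betam "(1600,2)"
```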

Citations

If you use this repo, please cite the following paper.

@inproceedings{ren2020beta,
  title={Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs},
  author={Hongyu Ren and Jure Leskovec},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

kgreasoning's People

Contributors

hyren, roks

kgreasoning's Issues

create_queries.py

Thanks for sharing your code!
How do I run the create_queries.py file, and how do I set its parameters?
Also, in the index_dataset function, only train_indexified.txt was generated. Is the program only supposed to generate the train_indexified.txt file?

How are the easy and hard answers generated?

Hi, as I'm generating queries for a new dataset using create_queries.py, I am wondering how exactly the easy and hard answers are generated. Running create_queries.py produces three answer files (fp-answers.pkl, fn-answers.pkl, and tp-answers.pkl), but how do I build the easy and hard answers on top of these three?

data

Excuse me, how is the query data built? For example, the file train-queries.pkl.

KL Divergence

Why not use Jensen-Shannon divergence instead of KL divergence?

Natural Queries Extraction

Thanks for sharing the repo.
I'm trying to find or extract the natural-language queries corresponding to the entity- and relation-encoded queries. Could you share whether they are available or can be extracted? Any reference would be helpful.

Code for generating queries

Hi, thanks for sharing the code for this nice work!

Could you please release the code for generating queries? I want to generate queries with other structures but don't know how to do it.

"&" operator instead of "weighted product of the PDFs"

Hi! Thanks for sharing the source code.
In the paper, you use a "weighted product of the PDFs" to implement the intersection operator. Why not use the "&" operator instead?
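
For context, a simplified sketch of the weighted-product intersection (hedged: in the repo the attention weights come from an MLP over the input embeddings; here they are passed in directly). A weighted product of Beta PDFs with weights summing to 1 is again a Beta PDF whose parameters are the weighted sums of the inputs' parameters:

```python
import torch

def beta_intersection(alphas: torch.Tensor, betas: torch.Tensor,
                      logits: torch.Tensor):
    """Intersect a stack of Beta embeddings.

    alphas, betas: (num_inputs, dim) Beta parameters of the input sets.
    logits: (num_inputs, dim) unnormalized attention scores (hypothetical
    inputs; the repo derives these from a learned MLP).
    """
    w = torch.softmax(logits, dim=0)  # weights sum to 1 over the inputs
    # prod_i Beta(x; a_i, b_i)^{w_i} is proportional to
    # Beta(x; sum_i w_i a_i, sum_i w_i b_i) when sum_i w_i = 1.
    return (w * alphas).sum(dim=0), (w * betas).sum(dim=0)
```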

CUDA error: device-side assert triggered

Hello, I am currently trying to use your code to generate queries and answers on my own dataset. I succeeded in generating the query and answer .pkl files, but when I run training, it yields an error like this:

C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: block: [4,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2020.3.2\plugins\python\helpers\pydev\pydevd.py", line 1477, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2020.3.2\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/ERCLab/Desktop/RESEARCH/Natural language processing/KGReasoning/main.py", line 449, in <module>
    main(parse_args())
  File "C:/Users/ERCLab/Desktop/RESEARCH/Natural language processing/KGReasoning/main.py", line 385, in main
    log = model.train_step(model, optimizer, train_path_iterator, args, step)
  File "C:\Users\ERCLab\Desktop\RESEARCH\Natural language processing\KGReasoning\models.py", line 591, in train_step
    positive_logit, negative_logit, subsampling_weight, _ = model(positive_sample, negative_sample, subsampling_weight, batch_queries_dict, batch_idxs_dict)
  File "C:\Users\ERCLab\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\ERCLab\Desktop\RESEARCH\Natural language processing\KGReasoning\models.py", line 196, in forward
    return self.forward_box(positive_sample, negative_sample, subsampling_weight, batch_queries_dict, batch_idxs_dict)
  File "C:\Users\ERCLab\Desktop\RESEARCH\Natural language processing\KGReasoning\models.py", line 434, in forward_box
    center_embedding, offset_embedding, _ = self.embed_query_box(batch_queries_dict[query_structure], 
  File "C:\Users\ERCLab\Desktop\RESEARCH\Natural language processing\KGReasoning\models.py", line 216, in embed_query_box
    offset_embedding = torch.zeros_like(embedding).cuda()
RuntimeError: CUDA error: device-side assert triggered

I already tried the FB15k dataset and it didn't produce this error, so I cannot figure out what is going wrong.
Could you give me a clue for solving this?
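
For what it's worth, this particular assert (srcIndex < srcSelectDimSize) generally means an embedding lookup received an index outside the table, so one plausible (unconfirmed) cause is an entity or relation id in the generated queries exceeding the entity/relation counts passed to the model. A quick sanity check along these lines (all names hypothetical):

```python
def check_ids(ent2id, rel2id, nentity, nrelation):
    # ent2id/rel2id are the dataset's mapping dicts; nentity/nrelation are
    # the counts the model builds its embedding tables from.
    assert max(ent2id.values()) < nentity, "entity id out of range"
    assert max(rel2id.values()) < nrelation, "relation id out of range"
```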

The link of KG data

Thanks for sharing your code!
The link to the KG data seems to be missing; can you update it?
Thanks a lot!

What's the difference between 'easy' and 'hard' answers?

Hi,

Thanks for releasing this excellent work! The code is very clean and easy to use. One thing that is unclear to me is the dataset: there are easy answers and hard answers for the valid and test queries, and they seem to have no intersection. What is the principle for distinguishing their difficulty, and how do you use both in evaluation (MRR and Hits@1/3/10)?
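
For readers with the same question: the data description in the README above implies the split (easy answers are reachable without test edges; hard answers require them). A common evaluation protocol, sketched here under that assumption rather than as the repo's exact code, ranks only the hard answers while filtering out all other known answers:

```python
def filtered_rank(scores, easy_answers, hard_answers, answer):
    """Rank one hard `answer` against all entities, ignoring other known answers.

    scores: dict of entity id -> model score, higher is better (hypothetical
    interface; the repo scores whole batches of entities at once).
    """
    known = (easy_answers | hard_answers) - {answer}
    better = sum(1 for e, s in scores.items()
                 if e not in known and s > scores[answer])
    return better + 1  # rank 1 is best; feed ranks into MRR / Hits@k
```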
