xwhan / deeppath

code and docs for my EMNLP paper "DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning"

Python 99.35% Shell 0.65%
tensorflow knowledge-graph reasoning emnlp2017

deeppath's Introduction

Deep Reinforcement Learning for Knowledge Graph Reasoning

We study the problem of learning to reason in large-scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on the Freebase and Never-Ending Language Learning datasets.
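To make the reward design concrete, here is a minimal sketch of how the three terms described above (accuracy, efficiency, diversity) could be combined for one sampled path. The weights, the cosine-similarity diversity penalty, and the function signature are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

def path_reward(reached_target, path_length, path_vec, found_path_vecs,
                w_acc=1.0, w_eff=0.1, w_div=0.1):
    """Illustrative combination of the paper's three reward signals.
    Weights and the exact functional forms are made up for this sketch."""
    # Accuracy: did the sampled path actually reach the target entity?
    r_acc = 1.0 if reached_target else -1.0
    # Efficiency: shorter paths score higher.
    r_eff = 1.0 / path_length
    # Diversity: penalize paths similar (cosine) to ones already found.
    if found_path_vecs:
        sims = [np.dot(path_vec, v) / (np.linalg.norm(path_vec) * np.linalg.norm(v))
                for v in found_path_vecs]
        r_div = -float(np.mean(sims))
    else:
        r_div = 0.0
    return w_acc * r_acc + w_eff * r_eff + w_div * r_div
```

A successful short path with no previously found paths scores highest; an unsuccessful path that duplicates an existing one scores lowest.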

Access the dataset

Download the knowledge graph datasets: NELL-995, FB15k-237

How to run our code

  1. Unzip the data and put the data folder in the code directory

  2. run the following scripts within scripts/

    • ./pathfinder.sh ${relation_name} # find the reasoning paths; this is the RL training and it may take some time
    • ./fact_prediction_eval.py ${relation_name} # calculate & print the fact prediction results
    • ./link_prediction_eval.sh ${relation_name} # calculate & print the link prediction results

    Examples (the relation_name can be found in NELL-995/tasks/):

    • ./pathfinder.sh concept_athletehomestadium
    • ./fact_prediction_eval.py concept_athletehomestadium
    • ./link_prediction_eval.sh concept_athletehomestadium
  3. Since we have already put the reasoning paths in the dataset, you can directly run fact_prediction_eval.py or link_prediction_eval.sh to get the final results for each reasoning task.

Format of the dataset

  1. raw.kb: the raw KB data from the NELL system
  2. kb_env_rl.txt: we add the inverse triple of every triple in raw.kb; this file is used as the KG for reasoning
  3. entity2vec.bern/relation2vec.bern: TransE embeddings that represent our RL states; they can be trained using the TransX implementations by thunlp
  4. tasks/: each task is a particular reasoning relation
    • tasks/${relation}/*.vec: trained TransH Embeddings
    • tasks/${relation}/*.vec_D: trained TransD Embeddings
    • tasks/${relation}/*.bern: trained TransR Embeddings
    • tasks/${relation}/*.unif: trained TransE Embeddings
    • tasks/${relation}/transX: triples used to train the KB embeddings
    • tasks/${relation}/train.pairs: train triples in the PRA format
    • tasks/${relation}/test.pairs: test triples in the PRA format
    • tasks/${relation}/path_to_use.txt: reasoning paths found by the RL agent
    • tasks/${relation}/path_stats.txt: path frequency of randomised BFS
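The construction of kb_env_rl.txt described in item 2 above can be sketched as follows. This is an illustrative reconstruction: the "_inv" relation suffix and the triple representation are assumptions, and the repo's actual naming may differ:

```python
def add_inverse_triples(triples):
    """Given (head, relation, tail) triples, also emit the inverse
    (tail, relation_inv, head) for each one, mirroring how kb_env_rl.txt
    extends raw.kb so the agent can traverse edges in both directions."""
    out = []
    for head, rel, tail in triples:
        out.append((head, rel, tail))
        out.append((tail, rel + "_inv", head))
    return out
```

Adding inverse edges doubles the triple count but lets the RL agent walk a relation backwards, which is essential for finding multi-hop paths.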

If you use our code, please cite our paper:

@InProceedings{wenhan_emnlp2017,
  author    = {Xiong, Wenhan and Hoang, Thien and Wang, William Yang},
  title     = {DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning},
  booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {ACL}
}

Acknowledgement

deeppath's People

Contributors

fridex · jimmywangheng · xwhan


deeppath's Issues

Part of the open source

While browsing your code and datasets, I ran into some questions that I could not resolve from the information provided. So I want to ask whether this project is fully open source or not.

rel_net

I cannot understand what rel_net does. If you can reply, I will appreciate it.

Update Some Information about File

Hi xwhan:

Thanks for your work and code. Could you add some information about files such as tasks/${relation}/train_pos and tasks/${relation}/graph.txt? Thanks!

Training is very slow

Hi Wenhan, I tried to train this program, but training is very slow on my machine.
It seems that the program does not use the GPU on my machine.
But I am sure that the TensorFlow install on my system can use GPU acceleration, because I have tested it with other programs.
Do you have any idea why?
Thank you very much!

Help needed

What does teacher in sli_police mean — the function that gets called?

Tensorflow version

Hi, I am trying to run your code to learn some rules. I use Python 2.7 and TensorFlow 1.12.0, and I have some problems that may be caused by the TensorFlow version. Do you still remember which TensorFlow version you used when you wrote this code?
Thanks!

A.bern

It is fine to run this code with NELL-995. However, I got stuck when running the same code on the FB15k-237 dataset. Is there anything in the code I should change when running a different dataset?

I found that NELL-995 already includes a file called "A.bern", while this file does not exist in the FB15k-237 dataset. What exactly is this "A.bern" file used for, and where is it needed? Does the source code handle this dataset difference, or how can I solve this problem and run the code smoothly on FB15k-237?

I would be very grateful if you can answer my question in detail. Looking forward to further idea exchanges and discussions with you.

Few tips about running the code

I ran this code successfully in the following environment within Docker Desktop:
Python 2.7.3 (note that a Python version below 3.0 is necessary because of the many print statements without parentheses, which raise errors on Python 3 and above)
Tensorflow 1.13.1 (I didn't manage to set up the GPU version)

About the dataset:
You should copy the download URL of NELL-995 into your download manager instead of opening it directly in your browser, while FB15k-237 can be downloaded directly.

Reasons for using Docker:

  1. Tensorflow 1.13.1 for Python 2.7 has no distribution for the Windows platform.
  2. You can pull the ready-made Tensorflow 1.13.1 image and easily run a container from it.

about the tasks

Would you please explain every file in tasks/${relation} and describe how they are constructed? The information in the README is not very detailed. Thanks.

generating TransE experiment dataset

Thank you for your code. I am trying to reproduce the per-relation MAP results of the TransE model on NELL-995 as presented in the paper, but without using pretrained embeddings such as task/relation/entity2vec.vec. I wonder how you generated the supporting triples for your train.txt dataset, i.e. all the triples needed to train on task/relation/train_pos.txt. I assume you didn't use only task/relation/train_pos.txt for training.
Thanks

Error in executing link_prediction_eval.sh

Hi Wenhan,

I get this error when running the command ./link_prediction_eval.sh concept_athletehomestadium (on a CPU server with Python 2.7, Keras, and TensorFlow).

Using TensorFlow backend.
11
How many paths used: 11
evaluate.py:42: UserWarning: The nb_epoch argument in fit has been renamed epochs.
model.fit(training_features, train_labels, nb_epoch=300, batch_size=128)
Traceback (most recent call last):
File "evaluate.py", line 260, in
evaluate_logic()
File "evaluate.py", line 108, in evaluate_logic
model = train(kb, kb_inv, named_paths)
File "evaluate.py", line 42, in train
model.fit(training_features, train_labels, nb_epoch=300, batch_size=128)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 952, in fit
batch_size=batch_size)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.py", line 102, in standardize_input_data
str(len(data)) + ' arrays: ' + str(data)[:200] + '...')
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4867 arrays: [array([[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[1],
[1],
[1]]), array([[0],
[0],
[0],
[0],
[0],
...
Was this caused by the Keras library? transR_eval.py, transE_eval.py, and transX_eval.py worked fine.
Could you advise me how to solve this?

Thank you very much.
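For what it's worth, this ValueError typically means model.fit() received a Python list of per-example arrays where newer Keras expects a single stacked array. An untested-against-this-repo workaround is to stack the 4867 arrays of shape (11, 1) from the traceback into one (4867, 11) matrix before calling fit:

```python
import numpy as np

# Stand-in for the list of 4867 arrays of shape (11, 1) shown in the
# traceback (one small column vector per training example).
training_features = [np.zeros((11, 1)) for _ in range(4867)]

# Stack into a single array and drop the trailing singleton axis:
# (4867, 11, 1) -> (4867, 11), i.e. one row per example, one column per path.
stacked = np.squeeze(np.stack(training_features), axis=-1)
```

After this, `model.fit(stacked, train_labels, ...)` should see the single 2-D input it expects, assuming the model's input layer has 11 features.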

Some questions about the details of the processed FB15k-237

Hi Wenhan,
I have learned a lot from your excellent works such as DeepPath and DIVA. However, I have some questions about which 20 relations you actually selected in FB15k-237. Of course, if you can release the processed FB15k-237 with the selected relations, I will be deeply grateful.

Look forward to your reply.

A.vec

hi,
can you explain what A.vec is for and how to generate the A.vec file?

20 freebase tasks

Thank you for your code. I would like to reproduce the experimental results on the Freebase dataset presented in the paper. Would you please let me know which 20 tasks (relations) you used for the experiment? Some of them, such as teamSports and birthPlace, are mentioned in the paper, but not all 20. Thanks.

About the graph format for the PRA comparison experiment

Hello, thank you very much for your code. When reproducing the comparison algorithm PRA, I cannot use your NELL-995 or Freebase datasets to generate an edges file in the PRA program. Can you tell me what format should be used when I run the following instruction in the PRA program:
java -cp pra-src-20140421.jar edu.cmu.pra.data.WKnowledge createEdgeFile NELL.08m.165.cesv.csv 0.8 edges
Thanks again

Do you think a KG can solve this kind of question?

1.
A lives in C. (A, liveInCity, C)
B lives in C. (B, liveInCity, C)
Do A and B live in the same city?

2.
A is 180 cm tall. (A, height, 180)
B is 175 cm tall. (B, height, 175)
Is B taller than A?

Thank you very much.
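Both questions above can in principle be answered by direct triple lookups plus a comparison, without multi-hop path reasoning. A toy sketch using the entities and relations from the examples (the dictionary encoding is an assumption for illustration):

```python
# Toy KG from the two examples, stored as (subject, relation) -> object.
triples = {
    ("A", "liveInCity"): "C",
    ("B", "liveInCity"): "C",
    ("A", "height"): 180,
    ("B", "height"): 175,
}

def same_city(x, y):
    """Question 1: do x and y live in the same city? A pure equality
    check on the objects of two liveInCity triples."""
    return triples[(x, "liveInCity")] == triples[(y, "liveInCity")]

def is_taller(x, y):
    """Question 2: is x taller than y? This needs a numeric comparison,
    which goes beyond plain triple matching and must be handled explicitly."""
    return triples[(x, "height")] > triples[(y, "height")]
```

The second question is the harder one for KG reasoning systems: comparing literal values is outside what embedding or path-ranking methods normally model.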
