
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow

License: MIT License


tensorflow-transx's Introduction

TensorFlow-TransX

This repository is a subproject of THU-OpenSK.

An implementation of TransE [1], TransH [2], TransR [3], and TransD [4] for knowledge representation learning (KRL). The overall framework is based on TensorFlow. We use C++ to implement underlying operations such as data preprocessing and negative sampling. Each specific model is implemented in TensorFlow with Python interfaces, providing a convenient platform for running models on GPUs.

This code will be gradually integrated into the new framework [OpenKE].

Customizing Your Own Model

If you have a new idea and need to implement it, you only need to modify the Python interfaces for your customized model. Reading the code, you will find that changing the class TransXModel is enough to meet your needs.

Evaluation Results

More results for these models can be found at https://github.com/thunlp/KB2E.

Data

Datasets are required in the following format, containing three files:

triple2id.txt: the training file. The first line is the number of training triples; the following lines are all in the format (e1, e2, rel).

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

You can download FB15K and WN18 from [Download], and more datasets can also be found at https://github.com/thunlp/KB2E.
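
As a quick sanity check, here is a minimal Python sketch (not part of this repository) for loading a dataset in this format. It assumes only what is described above: a count on the first line and whitespace-separated fields on the remaining lines; the directory path is illustrative.

def load_id_file(path):
    # entity2id.txt / relation2id.txt: first line is the count,
    # each following line is "name<whitespace>id"
    with open(path) as f:
        total = int(f.readline())
        mapping = {}
        for line in f:
            if not line.strip():
                continue
            name, idx = line.split()
            mapping[name] = int(idx)
    assert len(mapping) == total, "count on the first line does not match"
    return mapping

def load_triples(path):
    # triple2id.txt: first line is the count, then one (e1, e2, rel) id triple per line
    with open(path) as f:
        total = int(f.readline())
        triples = [tuple(int(x) for x in line.split()) for line in f if line.strip()]
    assert len(triples) == total, "count on the first line does not match"
    return triples

entities = load_id_file("./data/FB15K/entity2id.txt")      # illustrative paths
relations = load_id_file("./data/FB15K/relation2id.txt")
train = load_triples("./data/FB15K/triple2id.txt")
print(len(entities), len(relations), len(train))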

Compile

bash make.sh
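
make.sh builds the shared library (init.so) that the Python code loads through ctypes. Below is a hedged sketch of how that library is typically loaded and queried, based on the calls that appear in transX.py and in the issues further down (lib.setInPath, lib.getTripleTotal); the data path is illustrative, and on Python 2 a plain str can be passed instead of bytes.

import ctypes

ll = ctypes.cdll.LoadLibrary
lib = ll("./init.so")                # shared library produced by `bash make.sh`

lib.setInPath(b"./data/FB15K/")      # illustrative training-data directory
print(lib.getTripleTotal())          # number of training triples parsed by the C++ code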

Train

To train models based on random initialization:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		lib.setInPath("your training data path...")
     		self.testFlag = False		# training mode, no evaluation
     		self.loadFromData = False	# start from random initialization
     		...
    
  2. python transX.py

To train models based on pretrained results:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		lib.setInPath("your training data path...")
     		self.testFlag = False		# training mode, no evaluation
     		self.loadFromData = True	# continue from previously saved (pretrained) results
     		...
    
  2. python transX.py

Test

To test your models:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		test_lib.setInPath("your testing data path...")
     		self.testFlag = True		# run evaluation instead of training
     		self.loadFromData = True	# load the trained model before testing
     		...
    
  2. python transX.py

Citation

If you use the code, please kindly cite the papers listed in our references.

Reference

[1] Bordes, Antoine, et al. Translating embeddings for modeling multi-relational data. Proceedings of NIPS, 2013.

[2] Zhen Wang, Jianwen Zhang, et al. Knowledge Graph Embedding by Translating on Hyperplanes. Proceedings of AAAI, 2014.

[3] Yankai Lin, Zhiyuan Liu, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion. Proceedings of AAAI, 2015.

[4] Guoliang Ji, Shizhu He, et al. Knowledge Graph Embedding via Dynamic Mapping Matrix. Proceedings of ACL, 2015.

tensorflow-transx's People

Contributors

guotong1988, helloxcq, thucsthanxu13


tensorflow-transx's Issues

Can you explain entity2id file?

Hi, how and why are the entities represented as numbers initially? Can you please explain?
For entities with text values, how can I generate a similar file?
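
One possible approach (a sketch, not code from this repository): assign an integer id to every distinct entity and relation string and write the three files in the format described in the Data section above. The raw-triple file layout assumed here (head, tail, relation separated by tabs) is hypothetical.

def build_id_files(raw_triples_path, out_dir):
    entities, relations, triples = {}, {}, []
    with open(raw_triples_path) as f:
        for line in f:
            h, t, r = line.rstrip("\n").split("\t")
            for name, table in ((h, entities), (t, entities), (r, relations)):
                if name not in table:
                    table[name] = len(table)        # next free integer id
            triples.append((entities[h], entities[t], relations[r]))

    def dump_mapping(table, path):
        with open(path, "w") as f:
            f.write("%d\n" % len(table))
            for name, idx in table.items():
                f.write("%s\t%d\n" % (name, idx))

    dump_mapping(entities, out_dir + "/entity2id.txt")
    dump_mapping(relations, out_dir + "/relation2id.txt")
    with open(out_dir + "/triple2id.txt", "w") as f:
        f.write("%d\n" % len(triples))
        for h, t, r in triples:
            f.write("%d %d %d\n" % (h, t, r))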

OSError: [WinError 126] The specified module could not be found

Traceback (most recent call last):
  File "C:/KNOWLEDGE GRAPH/ALL KG CODE/KG/TensorFlow-TransX-master/transE.py", line 10, in <module>
    lib = ll("./init.so")
  File "C:\Users\n9346821\AppData\Local\Programs\Python\Python35\Lib\ctypes\__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\n9346821\AppData\Local\Programs\Python\Python35\Lib\ctypes\__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

About regularization of entity embedding and relation embedding

I also tried Fast-TransX, which applies regularization to the entity and relation embeddings. TensorFlow-TransE's results are much worse than Fast-TransE's, which may explain it. So why not put a norm constraint on the entity and relation embeddings in TensorFlow-TransX?

Open-source applications of knowledge representation models?

Thanks to the author. I have a question: I have seen many of these representation and reasoning models, but it is not clear how they are actually applied in practice. Are there any open-source examples of applying knowledge representation?

Triplet Classification

The papers seem to discuss triplet classification during evaluation. Does the code include triplet classification?

Input Files path:

I ran transE.py directly and then got this error:
Input Files path:
Input Files path:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The data path is already set in the config. I don't know why this error occurs.
Thank you!

A problem about the the orthogonal constraint of TransH

Hi, to begin with, thanks a lot for your great effort in developing this project.

I have a problem with the loss function of TransH,

self.loss = tf.reduce_sum(tf.maximum(pos - neg + margin, 0))

which seems to ignore the orthogonality constraint between e_r and w_r from the original paper, or did I miss its implementation?

Looking forward to your reply. Thanks in advance for considering my issue.
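
For reference, a hedged sketch of how the soft orthogonality constraint from the TransH paper could be added to this margin loss (TF 1.x API; w_r and d_r stand for the batch lookups of the hyperplane normals and relation translation vectors in transH.py, and the C / epsilon values are illustrative):

import tensorflow as tf

def transh_loss(pos, neg, w_r, d_r, margin=1.0, C=0.0625, epsilon=0.001):
    # margin-based ranking loss, as in the line quoted above
    margin_loss = tf.reduce_sum(tf.maximum(pos - neg + margin, 0))
    # soft orthogonality constraint: (w_r^T d_r)^2 / ||d_r||^2 <= epsilon^2
    dot = tf.reduce_sum(w_r * d_r, axis=1)
    sq_norm = tf.reduce_sum(d_r * d_r, axis=1)
    ortho = tf.reduce_sum(tf.maximum(dot ** 2 / sq_norm - epsilon ** 2, 0.0))
    return margin_loss + C * ortho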

A question about the evaluation metrics

In entity link prediction, I see that the test code computes the mean rank and Hits@10 metrics separately for left and right. Are the overall metrics in the paper obtained by averaging the left and right results?

TransE missing normalisation?

Hi! Sorry if this is a noob question.
In the paper, the authors mention that 'at each main iteration of the algorithm, the embedding vectors of the entities are first normalized'. Is this missing from TransE? I can't seem to find the code that does it.
Thanks!
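
For what it's worth, a minimal sketch of that per-iteration normalization step (TF 1.x API; ent_embeddings stands for whatever the entity embedding variable is called in transE.py):

import tensorflow as tf

def make_normalize_op(ent_embeddings):
    # Rescale every entity embedding to unit L2 norm; run the returned op
    # once per epoch, before sampling minibatches.
    return tf.assign(ent_embeddings, tf.nn.l2_normalize(ent_embeddings, dim=1))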

Segmentation Fault

Hi, I'm trying this TensorFlow implementation of the Trans methods, and I get a "Segmentation Fault" error every time I try to run any of the scripts.

Code for Test

Hi, thanks for your helpful training code! Could you please release the complete testing code?

How to compile and use it?

Hi,

I compiled the project with "make.sh" and a file init.so was generated. Now, what are the steps to use the lib?

When I call python TransR.py I get the error TabError: inconsistent use of tabs and spaces in indentation. This problem was fixed by reformatting the code in PyCharm.

What is the Python version of the project? How do I use the transX Python files on the FB15K data?

Now, when I execute python TransR, I get the error: "Ambiguous dimension: 4831.42"

I am using FB15K, and triple2id.txt has 483142 triples. How can I fix this problem?

Thanks.

How to generate .so files?

Thanks a lot for your work. Could you please tell me how to generate init.so and test.so? And before running transX.py, what else do I have to do? Thank you so much!

List index not being checked in init function causing segmentation fault

If you also get a core dump error like this:

*** Error in `python': free(): corrupted unsorted chunks: 0x000055a4f72835e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fbbb99567e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fbbb995f37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fbbb996353c]
python(PyArena_Free+0x19)[0x55a4f5966d29]
python(PyRun_FileExFlags+0xb3)[0x55a4f5a82f03]
python(PyRun_SimpleFileExFlags+0x1c4)[0x55a4f5a830f4]
python(Py_Main+0x648)[0x55a4f5a86c28]
python(main+0xee)[0x55a4f594e71e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fbbb98ff830]
python(+0x1c7c98)[0x55a4f5a35c98]
======= Memory map: ========
55a4f586e000-55a4f5b2c000 r-xp 00000000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d2c000-55a4f5d2f000 r--p 002be000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d2f000-55a4f5d92000 rw-p 002c1000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d92000-55a4f5dc3000 rw-p 00000000 00:00 0 
55a4f7241000-55a4f9668000 rw-p 00000000 00:00 0                          [heap]
7fbb64000000-7fbb64021000 rw-p 00000000 00:00 0 
7fbb64021000-7fbb68000000 ---p 00000000 00:00 0 

This error may be caused by a badly generated input file. In my case, relationTotal is 586, but in my train2id.txt some relation ids are larger than 586, which caused an illegal memory access.

10782 13086 920
5507 7281 921
7081 12858 922
9292 1679 923
12781 4198 924
9690 1818 925
11087 3123 926
13193 8794 927
13206 13207 928
13219 8112 929

My suggestion is to add an index check or assertion on the input values to notify the user.

Hope this is useful for others.
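
A hedged sketch of such a check, done in Python before the data ever reaches the C++ code (file names and the head/tail/relation column order follow the Data section of this README; adjust if your training file is named train2id.txt):

def check_ids(data_dir):
    def read_total(name):
        with open(data_dir + name) as f:
            return int(f.readline())

    entity_total = read_total("entity2id.txt")
    relation_total = read_total("relation2id.txt")

    with open(data_dir + "triple2id.txt") as f:
        int(f.readline())                      # declared number of triples
        for line_no, line in enumerate(f, start=2):
            if not line.strip():
                continue
            h, t, r = (int(x) for x in line.split())
            assert 0 <= h < entity_total and 0 <= t < entity_total, \
                "entity id out of range on line %d" % line_no
            assert 0 <= r < relation_total, \
                "relation id out of range on line %d" % line_no

check_ids("./data/FB15K/")                     # illustrative path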

How should the relation in a triple be interpreted?

In general, a relation has three parts. Do you consider the meaning of every part when building the knowledge graph? For example, for the relation "/tv/tv_program/languages", what is the relationship between the two entities and the three parts divided by slashes? The two entities are The O.C. and English. Thanks!

Why does train.py print res = 0 when training the model?

When I run transE.py with self.testFlag = False and self.loadFromData = False, every res printed during training is 0. To find the cause of res = 0, I tried printing config.batch_size and lib.getTripleTotal(), and both are also 0. I found getTripleTotal() in init.cpp; in principle, if fscanf(fin, "%d", &trainList[tripleTotal].h) == 1, then tripleTotal should not be 0, which contradicts what I actually observe. What could be going on?

How to test?

Please, could you provide the code for testing?

Thank you.

How to train transx models from ipython

I want to use the transX models in IPython but don't understand how the training data is given as input or how to call the main function.

I call the following, but the kernel is killed on the first call and then restarts, and the main function does not seem to make progress.

import transR
transR.main()

Evaluation metrics

Can you explain the evaluation metrics?

I understand that the evaluation metrics most likely correspond to Mean Rank and Hits in Top 10 (as described in the TransE paper), but I am not able to fully comprehend the code here.

  1. What is the difference between results and results(type constraints)?
  2. What do left and right mean? I am guessing they are related to head and tail in the triples, but don't understand what they refer to in the evaluation.
  3. Also, the results for 1-1, 1-n, n-1 do not seem to be computed separately. Why is it done only for the n-n category? Is this a bug (since they are attempted to be printed in the test method)?
  4. I also don't get results anywhere close to those reported in the TransE paper on the WN dataset. The mean rank, hits, etc. are all way off. Has anyone verified this?

Thanks.

Why are pos_r_e and neg_r_e multiplied by a matrix in transR.py?

In transR.py, lines 59 and 62:

57: pos_h_e = tf.reshape(tf.batch_matmul(matrix, pos_h_e), [-1, sizeR])
58: pos_t_e = tf.reshape(tf.batch_matmul(matrix, pos_t_e), [-1, sizeR])
59: pos_r_e = tf.reshape(tf.batch_matmul(matrix, pos_r_e), [-1, sizeR])
60: neg_h_e = tf.reshape(tf.batch_matmul(matrix, neg_h_e), [-1, sizeR])
61: neg_t_e = tf.reshape(tf.batch_matmul(matrix, neg_t_e), [-1, sizeR])
62: neg_r_e = tf.reshape(tf.batch_matmul(matrix, neg_r_e), [-1, sizeR])

The relation embeddings are multiplied by rel_matrix, which seems to be wrongly implemented.
pos_r_e has shape [batchSize, sizeR] and matrix has shape [batchSize, sizeE, sizeR], so tf.batch_matmul(matrix, neg_r_e) cannot be computed unless sizeE = sizeR, and this assumption isn't always true.
In TransR's original paper, the relation embedding does not need to be multiplied by any matrix.

The transR.py script will crash if you set hidden_sizeE != hidden_sizeR.
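
A hedged sketch of what the projection step could look like with the fix the issue suggests, i.e. mapping only the entity embeddings through the relation-specific matrix, as in the original TransR paper. Variable names and the long-deprecated tf.batch_matmul call follow the snippet quoted above.

# entities are projected from entity space (sizeE) into relation space (sizeR)
pos_h_e = tf.reshape(tf.batch_matmul(matrix, pos_h_e), [-1, sizeR])
pos_t_e = tf.reshape(tf.batch_matmul(matrix, pos_t_e), [-1, sizeR])
neg_h_e = tf.reshape(tf.batch_matmul(matrix, neg_h_e), [-1, sizeR])
neg_t_e = tf.reshape(tf.batch_matmul(matrix, neg_t_e), [-1, sizeR])
# pos_r_e and neg_r_e already live in relation space ([batchSize, sizeR]) and are left unprojected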

Getting Fatal Python error: Segmentation fault

Hi, when I run the TransE code, I get an "(interrupted by signal 11: SIGSEGV)" error.

  Current thread 0x00007fd87785e740 (most recent call first):
  File "/home/home/PycharmProjects/.../main.py", line 84 in main
  File "/home/home/.local/lib/python3.6/site-packages/absl/app.py", line 251 in _run_main
  File "/home/home/.local/lib/python3.6/site-packages/absl/app.py", line 303 in run
  File "/home/home/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40 in run
  File "/home/home/PycharmProjects/.../main.py", line 176 in <module>
init path...
init path...

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

InvalidArgumentError when running transR.py in test mode

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'model/read_inputs/Placeholder_5' with dtype int32 and shape [?]

I figured out that this is because predict is computed in the test step, and predict depends on matrix, which needs neg_r.

My solution to this problem was to create two different projection matrices, like

pos_matrix = tf.reshape(tf.nn.embedding_lookup(self.rel_matrix, self.pos_r), [-1, sizeR, sizeE])
neg_matrix = tf.reshape(tf.nn.embedding_lookup(self.rel_matrix, self.neg_r), [-1, sizeR, sizeE])

and using these accordingly in the subsequent computations.

Is this the right way?
