
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow

License: MIT License


tensorflow-transx's Introduction

TensorFlow-TransX

This repository is a subproject of THU-OpenSK.

An implementation of TransE [1], TransH [2], TransR [3], and TransD [4] for knowledge representation learning (KRL). The overall framework is based on TensorFlow. We use C++ to implement underlying operations such as data preprocessing and negative sampling. Each specific model is implemented in TensorFlow with Python interfaces, providing a convenient platform for running models on GPUs.

This code will be gradually integrated into the new framework [OpenKE].

Customizing Your Own Model

If you have a new idea and need to implement it, you only need to modify the Python interfaces for your customized model. Reading the code, you will find that changing the class TransXModel is enough to meet your needs.

Evaluation Results

More results for these models can be found at https://github.com/thunlp/KB2E.

Data

Datasets are required in the following format, containing three files:

triple2id.txt: the training file. The first line is the number of training triples; the following lines are all in the format (e1, e2, rel).

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

You can download FB15K and WN18 from [Download], and more datasets can also be found at https://github.com/thunlp/KB2E.
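
As a quick sanity check, here is a minimal Python sketch (not part of this repository) for loading a dataset in this format. It assumes only what is described above: a count on the first line and whitespace-separated fields on the remaining lines; the directory path is illustrative.

def load_id_file(path):
    # entity2id.txt / relation2id.txt: first line is the count,
    # each following line is "name<whitespace>id"
    with open(path) as f:
        total = int(f.readline())
        mapping = {}
        for line in f:
            if not line.strip():
                continue
            name, idx = line.split()
            mapping[name] = int(idx)
    assert len(mapping) == total, "count on the first line does not match"
    return mapping

def load_triples(path):
    # triple2id.txt: first line is the count, then one (e1, e2, rel) id triple per line
    with open(path) as f:
        total = int(f.readline())
        triples = [tuple(int(x) for x in line.split()) for line in f if line.strip()]
    assert len(triples) == total, "count on the first line does not match"
    return triples

entities = load_id_file("./data/FB15K/entity2id.txt")      # illustrative paths
relations = load_id_file("./data/FB15K/relation2id.txt")
train = load_triples("./data/FB15K/triple2id.txt")
print(len(entities), len(relations), len(train))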

Compile

bash make.sh
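
make.sh builds the shared library (init.so) that the Python code loads through ctypes. Below is a hedged sketch of how that library is typically loaded and queried, based on the calls that appear in transX.py and in the issues further down (lib.setInPath, lib.getTripleTotal); the data path is illustrative, and on Python 2 a plain str can be passed instead of bytes.

import ctypes

ll = ctypes.cdll.LoadLibrary
lib = ll("./init.so")                # shared library produced by `bash make.sh`

lib.setInPath(b"./data/FB15K/")      # illustrative training-data directory
print(lib.getTripleTotal())          # number of training triples parsed by the C++ code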

Train

To train models based on random initialization:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		lib.setInPath("your training data path...")
     		self.testFlag = False		# training mode, no evaluation
     		self.loadFromData = False	# start from random initialization
     		...
    
  2. python transX.py

To train models based on pretrained results:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		lib.setInPath("your training data path...")
     		self.testFlag = False		# training mode, no evaluation
     		self.loadFromData = True	# continue from previously saved (pretrained) results
     		...
    
  2. python transX.py

Test

To test your models:

  1. Change class Config in transX.py

     class Config(object):
     
     	def __init__(self):
     		...
     		test_lib.setInPath("your testing data path...")
     		self.testFlag = True		# run evaluation instead of training
     		self.loadFromData = True	# load the trained model before testing
     		...
    
  2. python transX.py

Citation

If you use the code, please kindly cite the papers listed in our references.

Reference

[1] Bordes, Antoine, et al. Translating embeddings for modeling multi-relational data. Proceedings of NIPS, 2013.

[2] Zhen Wang, Jianwen Zhang, et al. Knowledge Graph Embedding by Translating on Hyperplanes. Proceedings of AAAI, 2014.

[3] Yankai Lin, Zhiyuan Liu, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion. Proceedings of AAAI, 2015.

[4] Guoliang Ji, Shizhu He, et al. Knowledge Graph Embedding via Dynamic Mapping Matrix. Proceedings of ACL, 2015.

tensorflow-transx's People

Contributors

guotong1988, helloxcq, thucsthanxu13


tensorflow-transx's Issues

Can you explain entity2id file?

Hi, how and why are the entities represented as numbers initially? Can you please explain?
For entities with text values, how can I generate a similar file?
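
One possible approach (a sketch, not code from this repository): assign an integer id to every distinct entity and relation string and write the three files in the format described in the Data section above. The raw-triple file layout assumed here (head, tail, relation separated by tabs) is hypothetical.

def build_id_files(raw_triples_path, out_dir):
    entities, relations, triples = {}, {}, []
    with open(raw_triples_path) as f:
        for line in f:
            h, t, r = line.rstrip("\n").split("\t")
            for name, table in ((h, entities), (t, entities), (r, relations)):
                if name not in table:
                    table[name] = len(table)        # next free integer id
            triples.append((entities[h], entities[t], relations[r]))

    def dump_mapping(table, path):
        with open(path, "w") as f:
            f.write("%d\n" % len(table))
            for name, idx in table.items():
                f.write("%s\t%d\n" % (name, idx))

    dump_mapping(entities, out_dir + "/entity2id.txt")
    dump_mapping(relations, out_dir + "/relation2id.txt")
    with open(out_dir + "/triple2id.txt", "w") as f:
        f.write("%d\n" % len(triples))
        for h, t, r in triples:
            f.write("%d %d %d\n" % (h, t, r))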

OSError: [WinError 126] The specified module could not be found

Traceback (most recent call last):
  File "C:/KNOWLEDGE GRAPH/ALL KG CODE/KG/TensorFlow-TransX-master/transE.py", line 10, in <module>
    lib = ll("./init.so")
  File "C:\Users\n9346821\AppData\Local\Programs\Python\Python35\Lib\ctypes\__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\n9346821\AppData\Local\Programs\Python\Python35\Lib\ctypes\__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

About regularization of entity embedding and relation embedding

I also tried Fast-TransX, which applies regularization to the entity and relation embeddings. TensorFlow-TransE's results are much worse than Fast-TransE's, which may explain it. So why not put a norm constraint on the entity and relation embeddings in TensorFlow-TransX?

Open-source applications of knowledge representation models?

Thanks to the author. I have a question: I have seen many of these representation and reasoning models, but it is not clear how they are actually applied in practice. Are there any open-source examples of applying knowledge representation?

Triplet Classification

The papers seem to discuss triplet classification during evaluation. Does the code include triplet classification?

Input Files path:

I ran transE.py directly and then got this error:
Input Files path:
Input Files path:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The data path is already set in the config. I don't know why this error occurs.
Thank you!

A problem about the the orthogonal constraint of TransH

Hi, to begin with, thanks a lot for your great effort in developing this project.

I have a problem with the loss function of TransH,

self.loss = tf.reduce_sum(tf.maximum(pos - neg + margin, 0))

which seems to ignore the orthogonality constraint between e_r and w_r from the original paper, or did I miss its implementation?

Looking forward to your reply. Thanks in advance for considering my issue.
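
For reference, a hedged sketch of how the soft orthogonality constraint from the TransH paper could be added to this margin loss (TF 1.x API; w_r and d_r stand for the batch lookups of the hyperplane normals and relation translation vectors in transH.py, and the C / epsilon values are illustrative):

import tensorflow as tf

def transh_loss(pos, neg, w_r, d_r, margin=1.0, C=0.0625, epsilon=0.001):
    # margin-based ranking loss, as in the line quoted above
    margin_loss = tf.reduce_sum(tf.maximum(pos - neg + margin, 0))
    # soft orthogonality constraint: (w_r^T d_r)^2 / ||d_r||^2 <= epsilon^2
    dot = tf.reduce_sum(w_r * d_r, axis=1)
    sq_norm = tf.reduce_sum(d_r * d_r, axis=1)
    ortho = tf.reduce_sum(tf.maximum(dot ** 2 / sq_norm - epsilon ** 2, 0.0))
    return margin_loss + C * ortho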

A question about the evaluation metrics

In entity link prediction, I see that the test code computes the mean rank and Hits@10 metrics separately for left and right. Are the overall metrics in the paper obtained by averaging the left and right results?

TransE missing normalisation?

Hi! Sorry if this is a noob question.
In the paper, the authors mention that 'at each main iteration of the algorithm, the embedding vectors of the entities are first normalized'. Is this missing from TransE? I can't seem to find the code that does it.
Thanks!
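
For what it's worth, a minimal sketch of that per-iteration normalization step (TF 1.x API; ent_embeddings stands for whatever the entity embedding variable is called in transE.py):

import tensorflow as tf

def make_normalize_op(ent_embeddings):
    # Rescale every entity embedding to unit L2 norm; run the returned op
    # once per epoch, before sampling minibatches.
    return tf.assign(ent_embeddings, tf.nn.l2_normalize(ent_embeddings, dim=1))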

Segmentation Fault

Hi, I'm trying this TensorFlow implementation of the Trans methods, and I get a "Segmentation Fault" error every time I try to run any of the scripts.

Code for Test

Hi, thanks for your helpful training code! Could you please release the complete testing code?

How to compile and use it?

Hi,

I compiled the project with "make.sh" and a file init.so was generated. Now, what are the steps to use the lib?

When I call python TransR.py I get the error TabError: inconsistent use of tabs and spaces in indentation. This problem was fixed by reformatting the code in PyCharm.

What is the Python version of the project? How do I use the transX Python files on the FB15K data?

Now, when I execute python TransR, I get the error: "Ambiguous dimension: 4831.42"

I am using FB15K, and triple2id.txt has 483142 triples. How can I fix this problem?

Thanks.

How to generate .so files?

Thanks a lot for your work. Could you please tell me how to generate init.so and test.so? And before running transX.py, what else do I have to do? Thank you so much!

List index not being checked in init function causing segmentation fault

If you also get a core dump error like this:

*** Error in `python': free(): corrupted unsorted chunks: 0x000055a4f72835e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fbbb99567e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fbbb995f37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fbbb996353c]
python(PyArena_Free+0x19)[0x55a4f5966d29]
python(PyRun_FileExFlags+0xb3)[0x55a4f5a82f03]
python(PyRun_SimpleFileExFlags+0x1c4)[0x55a4f5a830f4]
python(Py_Main+0x648)[0x55a4f5a86c28]
python(main+0xee)[0x55a4f594e71e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fbbb98ff830]
python(+0x1c7c98)[0x55a4f5a35c98]
======= Memory map: ========
55a4f586e000-55a4f5b2c000 r-xp 00000000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d2c000-55a4f5d2f000 r--p 002be000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d2f000-55a4f5d92000 rw-p 002c1000 08:02 4765723                    /opt/anaconda/anaconda3/bin/python3.6
55a4f5d92000-55a4f5dc3000 rw-p 00000000 00:00 0 
55a4f7241000-55a4f9668000 rw-p 00000000 00:00 0                          [heap]
7fbb64000000-7fbb64021000 rw-p 00000000 00:00 0 
7fbb64021000-7fbb68000000 ---p 00000000 00:00 0 

This error may be caused by a badly generated input file. In my case, relationTotal is 586, but in my train2id.txt some relation ids are larger than 586, which caused an illegal memory access.

10782 13086 920
5507 7281 921
7081 12858 922
9292 1679 923
12781 4198 924
9690 1818 925
11087 3123 926
13193 8794 927
13206 13207 928
13219 8112 929

My suggestion is to add an index check or assertion on the input values to notify the user.

Hope this is useful for others.
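
A hedged sketch of such a check, done in Python before the data ever reaches the C++ code (file names and the head/tail/relation column order follow the Data section of this README; adjust if your training file is named train2id.txt):

def check_ids(data_dir):
    def read_total(name):
        with open(data_dir + name) as f:
            return int(f.readline())

    entity_total = read_total("entity2id.txt")
    relation_total = read_total("relation2id.txt")

    with open(data_dir + "triple2id.txt") as f:
        int(f.readline())                      # declared number of triples
        for line_no, line in enumerate(f, start=2):
            if not line.strip():
                continue
            h, t, r = (int(x) for x in line.split())
            assert 0 <= h < entity_total and 0 <= t < entity_total, \
                "entity id out of range on line %d" % line_no
            assert 0 <= r < relation_total, \
                "relation id out of range on line %d" % line_no

check_ids("./data/FB15K/")                     # illustrative path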

How should the relation in a triple be interpreted?

In general, a relation has three parts. Do you consider the meaning of every part when building the knowledge graph? For example, for the relation "/tv/tv_program/languages", what is the relationship between the two entities and the three parts divided by slashes? The two entities are The O.C. and English. Thanks!

Why does train.py print res = 0 when training the model?

When I run transE.py with self.testFlag = False and self.loadFromData = False, every res printed during training is 0. To find the cause of res = 0, I tried printing config.batch_size and lib.getTripleTotal(), and both are also 0. I found getTripleTotal() in init.cpp; in principle, if fscanf(fin, "%d", &trainList[tripleTotal].h) == 1, then tripleTotal should not be 0, which contradicts what I actually observe. What could be going on?

How to test?

Please, could you provide the code for testing?

Thank you.

How to train transx models from ipython

I want to use the transX models in IPython but don't understand how the training data is given as input or how to call the main function.

I call the following, but the kernel is killed on the first call and then restarts, and the main function does not seem to make progress.

import transR
transR.main()

Evaluation metrics

Can you explain the evaluation metrics?

I understand that the evaluation metrics most likely correspond to Mean Rank and Hits in Top 10 (as described in the TransE paper), but I am not able to fully comprehend the code here.

  1. What is the difference between results and results(type constraints)?
  2. What do left and right mean? I am guessing they are related to head and tail in the triples, but don't understand what they refer to in the evaluation.
  3. Also, the results for 1-1, 1-n, n-1 do not seem to be computed separately. Why is it done only for the n-n category? Is this a bug (since they are attempted to be printed in the test method)?
  4. I also don't get results anywhere close to those reported in the TransE paper on the WN dataset. The mean rank, hits, etc. are all way off. Has anyone verified this?

Thanks.

Why are pos_r_e and neg_r_e multiplied by a matrix in transR.py?

In transR.py, lines 59 and 62:

57: pos_h_e = tf.reshape(tf.batch_matmul(matrix, pos_h_e), [-1, sizeR])
58: pos_t_e = tf.reshape(tf.batch_matmul(matrix, pos_t_e), [-1, sizeR])
59: pos_r_e = tf.reshape(tf.batch_matmul(matrix, pos_r_e), [-1, sizeR])
60: neg_h_e = tf.reshape(tf.batch_matmul(matrix, neg_h_e), [-1, sizeR])
61: neg_t_e = tf.reshape(tf.batch_matmul(matrix, neg_t_e), [-1, sizeR])
62: neg_r_e = tf.reshape(tf.batch_matmul(matrix, neg_r_e), [-1, sizeR])

The relation embeddings are multiplied by rel_matrix, which seems to be wrongly implemented.
pos_r_e has shape [batchSize, sizeR] and matrix has shape [batchSize, sizeE, sizeR], so tf.batch_matmul(matrix, neg_r_e) cannot be computed unless sizeE = sizeR, and this assumption isn't always true.
In TransR's original paper, the relation embedding does not need to be multiplied by any matrix.

The transR.py script will crash if you set hidden_sizeE != hidden_sizeR.
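
A hedged sketch of what the projection step could look like with the fix the issue suggests, i.e. mapping only the entity embeddings through the relation-specific matrix, as in the original TransR paper. Variable names and the long-deprecated tf.batch_matmul call follow the snippet quoted above.

# entities are projected from entity space (sizeE) into relation space (sizeR)
pos_h_e = tf.reshape(tf.batch_matmul(matrix, pos_h_e), [-1, sizeR])
pos_t_e = tf.reshape(tf.batch_matmul(matrix, pos_t_e), [-1, sizeR])
neg_h_e = tf.reshape(tf.batch_matmul(matrix, neg_h_e), [-1, sizeR])
neg_t_e = tf.reshape(tf.batch_matmul(matrix, neg_t_e), [-1, sizeR])
# pos_r_e and neg_r_e already live in relation space ([batchSize, sizeR]) and are left unprojected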

Getting Fatal Python error: Segmentation fault

Hi, when I run the TransE code, I get an "(interrupted by signal 11: SIGSEGV)" error.

  Current thread 0x00007fd87785e740 (most recent call first):
  File "/home/home/PycharmProjects/.../main.py", line 84 in main
  File "/home/home/.local/lib/python3.6/site-packages/absl/app.py", line 251 in _run_main
  File "/home/home/.local/lib/python3.6/site-packages/absl/app.py", line 303 in run
  File "/home/home/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40 in run
  File "/home/home/PycharmProjects/.../main.py", line 176 in <module>
init path...
init path...

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

InvalidArgumentError when running transR.py in test mode

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'model/read_inputs/Placeholder_5' with dtype int32 and shape [?]

I figured out that this is because predict is computed in the test step, and predict depends on matrix, which needs neg_r.

My solution to this problem was to create two different projection matrices, like

pos_matrix = tf.reshape(tf.nn.embedding_lookup(self.rel_matrix, self.pos_r), [-1, sizeR, sizeE])
neg_matrix = tf.reshape(tf.nn.embedding_lookup(self.rel_matrix, self.neg_r), [-1, sizeR, sizeE])

and using these accordingly in the subsequent computations.

Is this the right way?
