thunlp / kb2e

Stars: 1.4K · Watchers: 74 · Forks: 452 · Size: 65.14 MB

Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE

License: MIT License

C++ 95.54% Makefile 1.09% Python 3.32% Shell 0.06%
Topics: knowledge-embedding

kb2e's People

Contributors

helloxcq, mrlyk423, zibuyu

kb2e's Issues

Questions about Vector Normalization in TransE

Hi,
I'm interested in your research, but I have some questions about vector normalization in Train_TransE.cpp.

Paper "Translating Embeddings for Modeling Multi-relational Data" says:

The optimization is carried out by stochastic gradient descent (in minibatch mode), over the possible h, ℓ and t, with the additional constraints that the L2-norm of the embeddings of the entities is 1 (no regularization or norm constraints are given to the label embeddings ℓ). This constraint is important for our model, as it is for previous embedding-based methods, because it prevents the training process to trivially minimize L by artificially increasing entity embeddings norms.

But I'm confused about two places in Train_TransE.cpp:

1:
Here is the normalization function in Train_TransE.cpp:

double norm(vector<double> &a)
{
    double x = vec_len(a);                     // L2 norm of a
    if (x > 1)                                 // rescale only when the norm exceeds 1
        for (int ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
    return 0;
}

It seems that vectors whose norm is smaller than 1 are never rescaled (a contrasting sketch follows at the end of this issue).

2:
Relation vectors are also normalized, even though the paper states that no norm constraints are applied to the label (relation) embeddings.

Can you give some hints to help me understand your algorithm?
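
On point 1: what norm() implements is a projection onto the unit L2 ball (it scales a vector down only when its norm exceeds 1), whereas the constraint as literally stated in the paper rescales every entity embedding to unit norm. A minimal sketch of the latter, for contrast (vec_len is reimplemented here so the snippet stands alone):

#include <cmath>
#include <vector>
using namespace std;

// Same role as vec_len in Train_TransE.cpp: the L2 norm of a vector.
double vec_len(const vector<double> &a)
{
    double s = 0;
    for (double v : a) s += v * v;
    return sqrt(s);
}

// The paper's stated constraint: rescale every entity embedding to L2 norm
// exactly 1. The repo's norm() instead projects onto the unit ball, leaving
// vectors with norm below 1 untouched.
void normalize_to_unit_sphere(vector<double> &a)
{
    double x = vec_len(a);
    if (x > 0)
        for (size_t ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
}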

Data format

A question, please. The triples in the test data all seem to be one-dimensional? My actual data's triple elements are multi-dimensional; do I need to convert them to one dimension, or take the norm of each vector?

Derivation of the gradient of the Wr vector in TransH

Could you provide the derivation of the gradient of the Wr vector in TransH? I cannot reproduce the result in the code:
A_tmp[rel][ii]+=belta*rate*x*tmp1;
A_tmp[rel][ii]-=belta*rate*x*tmp2;
A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
Thanks!
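
For reference, a sketch of the derivation from the TransH score function; the mapping onto the code's variables (x, sum_x, tmp1, tmp2) is a guess, not confirmed by the authors. Writing $e = h - t$,

$$f_r(h,t) = \left\| (h - w_r^\top h \, w_r) + d_r - (t - w_r^\top t \, w_r) \right\|^2 = \|v\|^2, \qquad v = e - (w_r^\top e)\, w_r + d_r .$$

Since $\partial v_i / \partial w_{r,j} = -\,e_j\, w_{r,i} - (w_r^\top e)\,\delta_{ij}$, the chain rule gives

$$\frac{\partial f_r}{\partial w_r} = -2\left[ (v^\top w_r)\,(h - t) + \big(w_r^\top (h - t)\big)\, v \right].$$

The sum_x pair of updates plausibly implements the $(v^\top w_r)(h - t)$ term per dimension, and the x*(tmp1 - tmp2) pair the $\big(w_r^\top (h - t)\big)\, v$ term; in the L1 setting each component of $v$ is replaced by its sign, which would account for the extra bookkeeping in the code.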

Could you add comments to the code?

Hello. Since everyone's coding ability differs, could you add comments to the code so that more people can understand and improve the algorithms? Thank you very much.

./TransE Segmentation fault (core dumped): what is the cause?

[New LWP 24671]
Core was generated by `./Train_TransE'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
347 vfscanf.c: No such file or directory.
(gdb) where
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
#1 0x00007fa3fe62c457 in ___vfscanf (s=, format=, argptr=argptr@entry=0x7fff7663cbc8)
at vfscanf.c:3066
#2 0x00007fa3fe6337d7 in __fscanf (stream=, format=) at fscanf.c:31
#3 0x0000000000402249 in prepare() ()
#4 0x00000000004019b2 in main ()
(gdb)
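
The backtrace shows fscanf called from prepare() with s=0x0, i.e. a NULL FILE*, which is what fopen returns when an input file cannot be opened; the usual cause is running the binary from a directory where the data files are not present. A minimal sketch of the missing guard (the path is an assumption; the "%s%d" format is taken from the backtrace):

#include <cstdio>
#include <cstdlib>

int main()
{
    // Hypothetical excerpt of prepare(): open a mapping file and fail loudly
    // instead of handing a NULL stream to fscanf (the crash in the backtrace).
    FILE *f = fopen("../data/entity2id.txt", "r");   // path is an assumption
    if (f == NULL) {
        perror("fopen entity2id.txt");
        return EXIT_FAILURE;
    }
    char name[1000];
    int id;
    while (fscanf(f, "%999s%d", name, &id) == 2) {
        // ... record the (name, id) pair ...
    }
    fclose(f);
    return 0;
}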

Datasets

Could you provide the WN11 and FB13 datasets?

Train_TransR.cpp BUG

#236-#237:
norm(entity_tmp[k]);
How is the value of k here related to the number of entities? When train.txt holds more than 100 times as many triples as there are entities, this causes a segmentation fault.
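
One low-cost way to confirm an out-of-range k (a sketch, not the repo's code) is to switch to bounds-checked access, which turns the silent bad read into a diagnosable exception:

norm(entity_tmp.at(k));   // throws std::out_of_range if k >= entity_tmp.size()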

What does the file "n2n.txt" mean?

As the title says: I could not find any documentation for n2n.txt. What is this file, and how is it computed?
Looking forward to your reply. Thanks.

Segmentation fault error

Thanks for releasing the project code and data.
When running the code I got a "Segmentation fault". Why is that? Any ideas?

Magic Integer `1345` in PCRA.py

From PTransE/PCRA.py:

for line in f:
    seg = line.strip().split()
    relation2id[seg[0]] = int(seg[1])
    id2relation[int(seg[1])]=seg[0]
    id2relation[int(seg[1])+1345]="~"+seg[0]
    relation_num+=1
f.close()

What does 1345 in line 26 represent? Is this adding a negative label to the end of the list? Is 1345 some default number of ids?
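
For context, FB15k contains 1,345 relation types, so the offset very likely maps a relation id r to the id of its reverse relation "~r" (PTransE also walks paths backwards along edges). A sketch of that id convention, with the hard-coded count flagged as an assumption:

#include <map>
#include <string>

// Assumption: ids [0, NUM_REL) are forward relations and ids
// [NUM_REL, 2*NUM_REL) their reverses, where NUM_REL = 1345 for FB15k.
const int NUM_REL = 1345;

// Register both directions of one relation under this convention.
void add_relation(std::map<int, std::string> &id2relation,
                  const std::string &name, int id)
{
    id2relation[id] = name;                    // forward edge
    id2relation[id + NUM_REL] = "~" + name;    // reverse edge "~name"
}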

On the training set of the Trans-series models

Hello. I extracted the entity /m/04ztj and all of its relations from FB15K/train.txt, and found several thousand triples with the relation /people/marriage_union_type/unions_of_this_type./people/marriage/spouse (spouse), and several hundred with /people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony. That is clearly impossible. Why does the training set contain so many erroneous relations?

A problem with bfgs() in Train_TransE

Shouldn't the code in bfgs() that saves the entity and relation embedding vectors be outside the epoch loop?

As it is, the files are rewritten on every epoch without preserving intermediate results, which only lengthens training time.
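
A sketch of the suggested restructuring; run_epoch and save_vectors are hypothetical stand-ins for the repo's per-epoch training body and its embedding writer, and the epoch count is an assumption:

const int nepoch = 1000;    // assumed epoch count

void run_epoch()    { /* one pass of SGD over all minibatches */ }
void save_vectors() { /* write entity2vec / relation2vec to disk */ }

int main()
{
    for (int epoch = 0; epoch < nepoch; epoch++) {
        run_epoch();
        if ((epoch + 1) % 100 == 0)
            save_vectors();   // optional periodic checkpoint
    }
    save_vectors();           // the routine write happens once, after the loop
    return 0;
}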

TransR and TransH cannot work

Hi, TransE works on FB15K, but TransR and TransH do not. The error shown is:
Segmentation fault: 11

Meaning of the data

In the entity-to-id mapping file, suppose I have a line of data like /m/027rn 0. What does "/m/027rn" mean?

Normalize `relation_tmp[rel_neg]` in PTransE/Train_TransE_path.cpp

Is there a reason why the section

norm(relation_tmp[rel]);
norm(entity_tmp[e1]);
norm(entity_tmp[e2]);
norm(entity_tmp[j]);

skips normalization of rel_neg? This leads to huge scores in train_kb for the corrupted-relation triple (comparisons like 0.1 + MARGIN > 250), and may result in fewer gradient updates than expected for each label pair, possibly causing underfitting.
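
If the omission is unintentional, the fix the issue implies is a single extra call; a sketch against the quoted block (rel_neg being the corrupted relation id from the surrounding training loop):

norm(relation_tmp[rel]);
norm(relation_tmp[rel_neg]);   // also bound the corrupted relation's norm
norm(entity_tmp[e1]);
norm(entity_tmp[e2]);
norm(entity_tmp[j]);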

PTransE

Has anyone encountered a segmentation fault: 11 when running Test_TransE_path? How did you solve it?

e1_e2.txt

How is the file e1_e2.txt generated, and what is it used for?

On the update of the normal vector in TransH

Dear Mr. Lin,

I have two questions about the update of the normal vector:
1) Why is the normal vector Wr in train_transH updated as follows? (lines 292-296)

for (int ii=0; ii<n; ii++)
{
    A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
    A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
}

2) How does your code guarantee that Wr and r are orthogonal?

Thanks!
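
On question 2: the TransH paper enforces near-orthogonality only through a soft penalty in the loss, not exactly. A common implementation alternative (a sketch, not necessarily what this repo does) is to project out the component of d_r along w_r after each update:

#include <vector>
using namespace std;

// Keep the translation vector d_r orthogonal to the hyperplane normal w_r
// by removing d_r's component along w_r (assumes w_r has unit norm).
void project_orthogonal(vector<double> &d_r, const vector<double> &w_r)
{
    double dot = 0;
    for (size_t i = 0; i < d_r.size(); i++) dot += d_r[i] * w_r[i];
    for (size_t i = 0; i < d_r.size(); i++) d_r[i] -= dot * w_r[i];
}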

Purpose of validate.txt

What is the purpose of the file named in the title?
What would happen to the results if validate.txt were removed? Would using its data for training as well give better results?

A question about Test_TransE

What do the final output lines
cout<<"left:"<<lsum/fb_l.size()<<'\t'<<lp_n/fb_l.size()<<"\t"<<lsum_filter/fb_l.size()<<'\t'<<lp_n_filter/fb_l.size()<<endl;
cout<<"right:"<<rsum/fb_r.size()<<'\t'<<rp_n/fb_r.size()<<'\t'<<rsum_filter/fb_r.size()<<'\t'<<rp_n_filter/fb_r.size()<<endl;
mean?
How can I obtain the paper's results for link prediction, relation classification, triple classification, and the final text-based relation extraction?
Please give detailed steps. Thank you very much!
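
Judging by the variable names, lsum accumulates ranks and lp_n counts top-10 hits, so each line very likely prints mean rank and Hits@10 in the raw and then filtered settings, the standard link-prediction metrics. A sketch of that computation under this assumption:

#include <vector>
using namespace std;

// Given the rank of the correct entity for each test triple, compute the
// assumed meaning of the outputs: mean rank (lsum/size) and Hits@10 (lp_n/size).
void metrics(const vector<int> &ranks, double &mean_rank, double &hits10)
{
    double sum = 0, hits = 0;
    for (int r : ranks) {
        sum += r;
        if (r <= 10) hits += 1;
    }
    mean_rank = sum / ranks.size();
    hits10    = hits / ranks.size();
}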

TransE parameter settings

Hello. The TransE code does not seem to ship with the optimal parameters preset: running it as-is gives results that fall short of the TransE paper, yet running it with the parameters from the original paper, Translating Embeddings for Modeling Multi-relational Data (2013), gives results that are wildly off. How should the parameters be set to reproduce the results of the original paper and of the TransR paper? Below are my results from running the code on WN18 as-is:
18 40943
left:439.644 0.8028 425.411 0.9308
right:468.461 0.8108 455.947 0.9314
And the results after switching to the original paper's parameters (method=unif, k=20, learning rate=0.01, margin=2), which are clearly wrong:
18 40943
left:1351.67 0.1482 1340.44 0.1648
right:1475.97 0.142 1466.25 0.1548

MeanRank never reaches the 200s

Hello. I have run your TransE implementation several times and tried many parameter settings, but the MeanRank never reaches the 200s; the best result is only 400+. We use the WordNet18 data with 1000 epochs of gradient descent, learning rates of [0.01, 0.001], vector dimensions of [20, 100], and margins of [1, 2]. Is there anything in the parameters or the program that still needs changing?
Thanks and best wishes!

Call for e1_e2.txt

Hi, I am interested in your PTransE model. In PTransE, PCRA.py reads e1_e2.txt, but it is not among the data files. What should I do to get e1_e2.txt? Extract e1 and e2 from train.txt? Could you release confidence.txt directly to help me understand your algorithm? I would really appreciate your reply.

entity2id & relation2id mappings

I couldn't find a description of the FB15k dataset so I'm asking this question here.

In the dataset, I saw triples such as:
"/m/08mbj32 /m/0d193h /common/annotation_category/annotations./common/webpage/topic"
(FB15k valid, line 830)

The relation seems to be two relations (/common/annotation_category/annotations and /common/webpage/topic) concatenated into one. Is this how 1-N-1 relations are handled? I found both parts of this specific relation in the relation2id mapping, but there are some other "concatenated" relations for which I cannot find the sub-parts.

If you could clear this up, I'd be really grateful.

Best,
Martin

How to generate e1_e2.txt

Do you mean that e1_e2.txt contains all the top-500 entity pairs mentioned for the entity-prediction task?
Can you explain explicitly how to generate e1_e2.txt? And what do you mean by "all top-500 entity pairs"?

On the gradients of TransH and TransR, and the L1/L2 distances

1. On the choice between L1 and L2:
In the Trans-series algorithms, does the choice of distance affect the final results? From the code shared on your GitHub, it looks like the L1 distance is mainly used.

2. On the partial derivatives in the gradient-descent algorithm:
Following the code on your GitHub, the TransE gradient is clear to me, but I do not quite understand the gradient computations in TransH and TransR.
I have emailed you the details.
Many thanks!

Test_TransE

double tmp = calc_sum(h,l,rel);
The distance computed on this line is never used afterwards. Does it serve some other purpose?
