thunlp / kb2e

Stars: 1.4K · Watchers: 74 · Forks: 452 · Size: 65.14 MB

Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE

License: MIT License

C++ 95.54% Makefile 1.09% Python 3.32% Shell 0.06%
Topics: knowledge-embedding

kb2e's People

Contributors

helloxcq, mrlyk423, zibuyu

kb2e's Issues

Questions about Vector Normalization in TransE

Hi,
I'm interested in your research, but I have some questions about vector normalization in Train_TransE.cpp.

Paper "Translating Embeddings for Modeling Multi-relational Data" says:

The optimization is carried out by stochastic gradient descent (in minibatch mode), over the possible h, ℓ and t, with the additional constraints that the L2-norm of the embeddings of the entities is 1 (no regularization or norm constraints are given to the label embeddings ℓ). This constraint is important for our model, as it is for previous embedding-based methods, because it prevents the training process to trivially minimize L by artificially increasing entity embeddings norms.

But I'm confused about two places in Train_TransE.cpp:

1:
Here is the normalization function in Train_TransE.cpp:

double norm(vector<double> &a)
{
    double x = vec_len(a);                     // L2 norm of a
    if (x > 1)                                 // rescale only when the norm exceeds 1
        for (int ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
    return 0;
}

It seems that vectors whose norm is smaller than 1 are never rescaled (a contrasting sketch follows at the end of this issue).

2:
Relation vectors are also normalized, even though the paper states that no norm constraints are applied to the label (relation) embeddings.

Can you give some hints to help me understand your algorithm?
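
On point 1: what norm() implements is a projection onto the unit L2 ball (it scales a vector down only when its norm exceeds 1), whereas the constraint as literally stated in the paper rescales every entity embedding to unit norm. A minimal sketch of the latter, for contrast (vec_len is reimplemented here so the snippet stands alone):

#include <cmath>
#include <vector>
using namespace std;

// Same role as vec_len in Train_TransE.cpp: the L2 norm of a vector.
double vec_len(const vector<double> &a)
{
    double s = 0;
    for (double v : a) s += v * v;
    return sqrt(s);
}

// The paper's stated constraint: rescale every entity embedding to L2 norm
// exactly 1. The repo's norm() instead projects onto the unit ball, leaving
// vectors with norm below 1 untouched.
void normalize_to_unit_sphere(vector<double> &a)
{
    double x = vec_len(a);
    if (x > 0)
        for (size_t ii = 0; ii < a.size(); ii++)
            a[ii] /= x;
}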

Data format

A question, please. The triples in the test data all seem to be one-dimensional? My actual data's triple elements are multi-dimensional; do I need to convert them to one dimension, or take the norm of each vector?

Derivation of the gradient of the Wr vector in TransH

Could you provide the derivation of the gradient of the Wr vector in TransH? I cannot reproduce the result in the code:
A_tmp[rel][ii]+=belta*rate*x*tmp1;
A_tmp[rel][ii]-=belta*rate*x*tmp2;
A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
Thanks!
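
For reference, a sketch of the derivation from the TransH score function; the mapping onto the code's variables (x, sum_x, tmp1, tmp2) is a guess, not confirmed by the authors. Writing $e = h - t$,

$$f_r(h,t) = \left\| (h - w_r^\top h \, w_r) + d_r - (t - w_r^\top t \, w_r) \right\|^2 = \|v\|^2, \qquad v = e - (w_r^\top e)\, w_r + d_r .$$

Since $\partial v_i / \partial w_{r,j} = -\,e_j\, w_{r,i} - (w_r^\top e)\,\delta_{ij}$, the chain rule gives

$$\frac{\partial f_r}{\partial w_r} = -2\left[ (v^\top w_r)\,(h - t) + \big(w_r^\top (h - t)\big)\, v \right].$$

The sum_x pair of updates plausibly implements the $(v^\top w_r)(h - t)$ term per dimension, and the x*(tmp1 - tmp2) pair the $\big(w_r^\top (h - t)\big)\, v$ term; in the L1 setting each component of $v$ is replaced by its sign, which would account for the extra bookkeeping in the code.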

Could you add comments to the code?

Hello. Since everyone's coding ability differs, could you add comments to the code so that more people can understand and improve the algorithms? Thank you very much.

./TransE Segmentation fault (core dumped): what is the cause?

[New LWP 24671]
Core was generated by `./Train_TransE'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
347 vfscanf.c: No such file or directory.
(gdb) where
#0 _IO_vfscanf_internal (s=0x0, format=0x409414 "%s%d", argptr=argptr@entry=0x7fff7663cbc8, errp=errp@entry=0x0) at vfscanf.c:347
#1 0x00007fa3fe62c457 in ___vfscanf (s=, format=, argptr=argptr@entry=0x7fff7663cbc8)
at vfscanf.c:3066
#2 0x00007fa3fe6337d7 in __fscanf (stream=, format=) at fscanf.c:31
#3 0x0000000000402249 in prepare() ()
#4 0x00000000004019b2 in main ()
(gdb)
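
The backtrace shows fscanf called from prepare() with s=0x0, i.e. a NULL FILE*, which is what fopen returns when an input file cannot be opened; the usual cause is running the binary from a directory where the data files are not present. A minimal sketch of the missing guard (the path is an assumption; the "%s%d" format is taken from the backtrace):

#include <cstdio>
#include <cstdlib>

int main()
{
    // Hypothetical excerpt of prepare(): open a mapping file and fail loudly
    // instead of handing a NULL stream to fscanf (the crash in the backtrace).
    FILE *f = fopen("../data/entity2id.txt", "r");   // path is an assumption
    if (f == NULL) {
        perror("fopen entity2id.txt");
        return EXIT_FAILURE;
    }
    char name[1000];
    int id;
    while (fscanf(f, "%999s%d", name, &id) == 2) {
        // ... record the (name, id) pair ...
    }
    fclose(f);
    return 0;
}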

Datasets

Could you provide the WN11 and FB13 datasets?

Train_TransR.cpp BUG

#236-#237:
norm(entity_tmp[k]);
How is the value of k here related to the number of entities? When train.txt holds more than 100 times as many triples as there are entities, this causes a segmentation fault.
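
One low-cost way to confirm an out-of-range k (a sketch, not the repo's code) is to switch to bounds-checked access, which turns the silent bad read into a diagnosable exception:

norm(entity_tmp.at(k));   // throws std::out_of_range if k >= entity_tmp.size()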

What does the file "n2n.txt" mean?

As the title says: I could not find any documentation for n2n.txt. What is this file, and how is it computed?
Looking forward to your reply. Thanks.

Segmentation fault error

Thanks for releasing the project code and data.
When running the code I got a "Segmentation fault". Why is that? Any ideas?

Magic Integer `1345` in PCRA.py

From PTransE/PCRA.py:

for line in f:
    seg = line.strip().split()
    relation2id[seg[0]] = int(seg[1])
    id2relation[int(seg[1])]=seg[0]
    id2relation[int(seg[1])+1345]="~"+seg[0]
    relation_num+=1
f.close()

What does 1345 in line 26 represent? Is this adding a negative label to the end of the list? Is 1345 some default number of ids?
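
For context, FB15k contains 1,345 relation types, so the offset very likely maps a relation id r to the id of its reverse relation "~r" (PTransE also walks paths backwards along edges). A sketch of that id convention, with the hard-coded count flagged as an assumption:

#include <map>
#include <string>

// Assumption: ids [0, NUM_REL) are forward relations and ids
// [NUM_REL, 2*NUM_REL) their reverses, where NUM_REL = 1345 for FB15k.
const int NUM_REL = 1345;

// Register both directions of one relation under this convention.
void add_relation(std::map<int, std::string> &id2relation,
                  const std::string &name, int id)
{
    id2relation[id] = name;                    // forward edge
    id2relation[id + NUM_REL] = "~" + name;    // reverse edge "~name"
}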

On the training set of the Trans-series models

Hello. I extracted the entity /m/04ztj and all of its relations from FB15K/train.txt, and found several thousand triples with the relation /people/marriage_union_type/unions_of_this_type./people/marriage/spouse (spouse), and several hundred with /people/marriage_union_type/unions_of_this_type./people/marriage/location_of_ceremony. That is clearly impossible. Why does the training set contain so many erroneous relations?

A problem with bfgs() in Train_TransE

Shouldn't the code in bfgs() that saves the entity and relation embedding vectors be outside the epoch loop?

As it is, the files are rewritten on every epoch without preserving intermediate results, which only lengthens training time.
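
A sketch of the suggested restructuring; run_epoch and save_vectors are hypothetical stand-ins for the repo's per-epoch training body and its embedding writer, and the epoch count is an assumption:

const int nepoch = 1000;    // assumed epoch count

void run_epoch()    { /* one pass of SGD over all minibatches */ }
void save_vectors() { /* write entity2vec / relation2vec to disk */ }

int main()
{
    for (int epoch = 0; epoch < nepoch; epoch++) {
        run_epoch();
        if ((epoch + 1) % 100 == 0)
            save_vectors();   // optional periodic checkpoint
    }
    save_vectors();           // the routine write happens once, after the loop
    return 0;
}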

TransR and TransH cannot work

Hi, TransE works on FB15K, but TransR and TransH do not. The error shown is:
Segmentation fault: 11

Meaning of the data

In the entity-to-id mapping file, suppose I have a line of data like /m/027rn 0. What does "/m/027rn" mean?

Normalize `relation_tmp[rel_neg]` in PTransE/Train_TransE_path.cpp

Is there a reason why the section

norm(relation_tmp[rel]);
norm(entity_tmp[e1]);
norm(entity_tmp[e2]);
norm(entity_tmp[j]);

skips normalization of rel_neg? This leads to huge scores in train_kb for the corrupted-relation triple (comparisons like 0.1 + MARGIN > 250), and may result in fewer gradient updates than expected for each label pair, possibly causing underfitting.
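
If the omission is unintentional, the fix the issue implies is a single extra call; a sketch against the quoted block (rel_neg being the corrupted relation id from the surrounding training loop):

norm(relation_tmp[rel]);
norm(relation_tmp[rel_neg]);   // also bound the corrupted relation's norm
norm(entity_tmp[e1]);
norm(entity_tmp[e2]);
norm(entity_tmp[j]);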

PTransE

Has anyone encountered a segmentation fault: 11 when running Test_TransE_path? How did you solve it?

e1_e2.txt

How is the file e1_e2.txt generated, and what is it used for?

On the update of the normal vector in TransH

Dear Mr. Lin,

I have two questions about the update of the normal vector:
1) Why is the normal vector Wr in train_transH updated as follows? (lines 292-296)

for (int ii=0; ii<n; ii++)
{
    A_tmp[rel][ii]+=belta*rate*sum_x*entity_vec[e1][ii];
    A_tmp[rel][ii]-=belta*rate*sum_x*entity_vec[e2][ii];
}

2) How does your code guarantee that Wr and r are orthogonal?

Thanks!
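
On question 2: the TransH paper enforces near-orthogonality only through a soft penalty in the loss, not exactly. A common implementation alternative (a sketch, not necessarily what this repo does) is to project out the component of d_r along w_r after each update:

#include <vector>
using namespace std;

// Keep the translation vector d_r orthogonal to the hyperplane normal w_r
// by removing d_r's component along w_r (assumes w_r has unit norm).
void project_orthogonal(vector<double> &d_r, const vector<double> &w_r)
{
    double dot = 0;
    for (size_t i = 0; i < d_r.size(); i++) dot += d_r[i] * w_r[i];
    for (size_t i = 0; i < d_r.size(); i++) d_r[i] -= dot * w_r[i];
}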

Purpose of validate.txt

What is the purpose of the file named in the title?
What would happen to the results if validate.txt were removed? Would using its data for training as well give better results?

A question about Test_TransE

What do the final output lines
cout<<"left:"<<lsum/fb_l.size()<<'\t'<<lp_n/fb_l.size()<<"\t"<<lsum_filter/fb_l.size()<<'\t'<<lp_n_filter/fb_l.size()<<endl;
cout<<"right:"<<rsum/fb_r.size()<<'\t'<<rp_n/fb_r.size()<<'\t'<<rsum_filter/fb_r.size()<<'\t'<<rp_n_filter/fb_r.size()<<endl;
mean?
How can I obtain the paper's results for link prediction, relation classification, triple classification, and the final text-based relation extraction?
Please give detailed steps. Thank you very much!
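
Judging by the variable names, lsum accumulates ranks and lp_n counts top-10 hits, so each line very likely prints mean rank and Hits@10 in the raw and then filtered settings, the standard link-prediction metrics. A sketch of that computation under this assumption:

#include <vector>
using namespace std;

// Given the rank of the correct entity for each test triple, compute the
// assumed meaning of the outputs: mean rank (lsum/size) and Hits@10 (lp_n/size).
void metrics(const vector<int> &ranks, double &mean_rank, double &hits10)
{
    double sum = 0, hits = 0;
    for (int r : ranks) {
        sum += r;
        if (r <= 10) hits += 1;
    }
    mean_rank = sum / ranks.size();
    hits10    = hits / ranks.size();
}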

TransE parameter settings

Hello. The TransE code does not seem to ship with the optimal parameters preset: running it as-is gives results that fall short of the TransE paper, yet running it with the parameters from the original paper, Translating Embeddings for Modeling Multi-relational Data (2013), gives results that are wildly off. How should the parameters be set to reproduce the results of the original paper and of the TransR paper? Below are my results from running the code on WN18 as-is:
18 40943
left:439.644 0.8028 425.411 0.9308
right:468.461 0.8108 455.947 0.9314
And the results after switching to the original paper's parameters (method=unif, k=20, learning rate=0.01, margin=2), which are clearly wrong:
18 40943
left:1351.67 0.1482 1340.44 0.1648
right:1475.97 0.142 1466.25 0.1548

MeanRank never reaches the 200s

Hello. I have run your TransE implementation several times and tried many parameter settings, but the MeanRank never reaches the 200s; the best result is only 400+. We use the WordNet18 data with 1000 epochs of gradient descent, learning rates of [0.01, 0.001], vector dimensions of [20, 100], and margins of [1, 2]. Is there anything in the parameters or the program that still needs changing?
Thanks and best wishes!

Call for e1_e2.txt

Hi, I am interested in your PTransE model. In PTransE, PCRA.py reads e1_e2.txt, but it is not among the data files. What should I do to get e1_e2.txt? Extract e1 and e2 from train.txt? Could you release confidence.txt directly to help me understand your algorithm? I would really appreciate your reply.

entity2id & relation2id mappings

I couldn't find a description of the FB15k dataset so I'm asking this question here.

In the dataset, I saw triples such as:
"/m/08mbj32 /m/0d193h /common/annotation_category/annotations./common/webpage/topic"
(FB15k valid, line 830)

The relation seems to be two relations (/common/annotation_category/annotations and /common/webpage/topic) concatenated into one. Is this how 1-N-1 relations are handled? I found both parts of this specific relation in the relation2id mapping, but there are some other "concatenated" relations for which I cannot find the sub-parts.

If you could clear this up, I'd be really grateful.

Best,
Martin

How to generate e1_e2.txt

Do you mean that e1_e2.txt contains all the top-500 entity pairs mentioned for the entity-prediction task?
Can you explain explicitly how to generate e1_e2.txt? And what do you mean by "all top-500 entity pairs"?

On the gradients of TransH and TransR, and the L1/L2 distances

1. On the choice between L1 and L2:
In the Trans-series algorithms, does the choice of distance affect the final results? From the code shared on your GitHub, it looks like the L1 distance is mainly used.

2. On the partial derivatives in the gradient-descent algorithm:
Following the code on your GitHub, the TransE gradient is clear to me, but I do not quite understand the gradient computations in TransH and TransR.
I have emailed you the details.
Many thanks!

Test_TransE

double tmp = calc_sum(h,l,rel);
The distance computed on this line is never used afterwards. Does it serve some other purpose?
