bramblexu / knowledge-graph-learning
A curated list of awesome knowledge graph tutorials, projects and communities.
License: MIT License
precise link prediction has two problems: an ill-posed algebraic system, and an over-strict geometric form.
The novelty of this paper is replacing the translation-based principle with a manifold function, which solves the two problems above.
Infers discourse coherence from explicit keywords and implicit relations.
Keywords:
One-sentence summary:
Uses a 2-D matrix-based word-level attention mechanism to solve the problem of representing context words, and a 2-D sentence-level attention mechanism to solve the multi-instance learning problem, i.e., selecting the correct instances.
Resources:
Keywords:
Notes:
Here are two important representation learning problems in DNN-based distantly supervised RE: (1) Problem I: entity pair-targeted context representation learning from an instance; and (2) Problem II: valid instance selection representation learning over multiple instances.
DNN-based distantly supervised RE methods face two problems:
The former can use a word-level attention mechanism to learn a weight distribution on words and then a weighted sentence representation regarding two entities; the latter can employ a sentence-level attention mechanism to learn a weight distribution on multiple instances so that valid sentences with higher weights can be focused and selected, and noisy instances with lower weights are suppressed.
The former uses attention to give larger weights to the entities' words, while the latter uses a sentence-level attention model to give larger weights to valid sentences. (Doesn't this still require annotation? You would need to know which labels are wrong.)
Previous work used RNNs to learn 1-D vectors. This work, building on the structured self-attentive sentence embedding of Lin et al. (2017b), proposes a multi-level structured (2-D) self-attention mechanism (MLSSA) in a bidirectional LSTM (BiLSTM) framework.
For Problem I: a 2-D matrix-based word-level attention mechanism, which contains multiple vectors, each focusing on different aspects of the sentence for better context representation learning.
For Problem II: a 2-D sentence-level attention mechanism for multi-instance learning, which contains multiple vectors, each focusing on different valid instances for better sentence selection.
4.2 Word Embeddings and Relative Position Features
Model diagram:
Results:
Early DL research on RE simply treated RE as a multi-class classification problem.
Reformulating the problem as multi-instance learning lets us build much larger training sets via distant supervision. Multi-instance learning, as used with distant supervision, means one label is attached to a bag of instances rather than to a single instance.
For RE, each entity pair defines a bag containing all sentences that mention that pair; a relation label is then assigned to the whole bag rather than to a single instance.
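As a minimal sketch of that bag construction (assuming a KB of (head, tail) → relation facts and a corpus of entity-pair mentions; all names here are illustrative):

```python
from collections import defaultdict

def build_bags(kb_facts, corpus):
    """kb_facts: {(head, tail): relation}; corpus: iterable of (sentence, head, tail)."""
    bags = defaultdict(list)
    for sentence, head, tail in corpus:
        if (head, tail) in kb_facts:
            # Every sentence mentioning the pair joins the pair's bag; the KB
            # relation labels the whole bag, not any individual sentence.
            bags[(head, tail)].append(sentence)
    return {pair: (kb_facts[pair], sents) for pair, sents in bags.items()}
```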
5.1 Piecewise Convolutional Neural Networks (Zeng et al., 2015)
5.2 Selective Attention over Instances (Lin et al., 2016)
Performs better than PCNN.
5.3 Multi-instance Multi-label CNNs (Jiang et al., 2016)
Deep models are generally better than shallow ones, and attention + PCNN performs best. Oddly, there was no LSTM work on RE at the time.
Relation Extraction: A Survey
To avoid losing information when the graph is split, a Graph State LSTM model is proposed.
Keywords:
For cross-sentence relation extraction tasks, the input is usually converted into a graph, the graph is split into two DAGs, and a DAG-structured LSTM learns each DAG separately.
But information is easily lost during the split, so this work proposes a graph-state LSTM model.
Learns relation embeddings from the samples based on GloVe; such relation embeddings are very helpful for tasks like RE or relation similarity.
One-sentence summary:
Instead of hard labels, uses t − h from KG embeddings in place of relation labels, improving RE performance.
Resources:
Keywords:
Notes:
Assumption: the noise problem in DS mainly comes from underusing the KG information.
Approach: represent labels with relation embeddings (t − h) and entity types instead of hard relation labels.
Approaches to the wrong-label problem fall roughly into the following categories.
This paper mainly tries to avoid hard relation labels: hard labels inevitably introduce some noise, so the label is represented by the t − h embedding instead. (But this is still problematic: the same t − h may express different relationships, so the learned embeddings will still contain noise.)
Our assumption is that each relation r in a KG has one or more sentence patterns that can describe the meaning of r .
In the left sentence Ankara and Turkey are in a capital relation, while on the right Mexico and Guadalajara are in a contains relation. Learned directly, the two relations differ (here the capital relation is the noise; what we want is contains, of which capital is a sub-relation). But through t − h, both pairs come out closer to the contains relation than to the capital relation.
There are two kinds of embeddings:
We use typical KG embedding models such as TransE to pre-train the embedding of entities and relations. We intend to supervise the learning by t − h instead of the hard relation label r.
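A minimal sketch of that soft-label idea, assuming pre-trained TransE entity vectors are available as a dict (names are illustrative, not from the paper):

```python
import numpy as np

def soft_label(head, tail, entity_vecs):
    # Under the TransE assumption h + r ≈ t, the difference t - h approximates
    # the relation vector, so it can replace the hard relation label r as the
    # supervision target for the sentence encoder.
    return entity_vecs[tail] - entity_vecs[head]

entity_vecs = {"Ankara": np.array([0.1, 0.9]), "Turkey": np.array([0.7, 1.0])}
target = soft_label("Ankara", "Turkey", entity_vecs)  # trained against, e.g., an L2 loss
```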
Word Embeddings and Attentions
Instead of encoding sentences directly, we first replace the entity mentions e in the sentences with corresponding entity types type e in the KG, such as PERSON, PLACE, ORGANIZATION, etc. We then pre-train the word embedding by word2vec.
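A sketch of that preprocessing with gensim (the type lookup and corpus are toy stand-ins; gensim ≥ 4 uses the vector_size argument):

```python
from gensim.models import Word2Vec

# Toy stand-ins for the KG type lookup and the tokenized corpus.
entity_types = {"Obama": "PERSON", "Hawaii": "PLACE"}
sentences = [["Obama", "was", "born", "in", "Hawaii"]]

# Replace each entity mention with its KG type before pre-training word2vec.
typed = [[entity_types.get(tok, tok) for tok in sent] for sent in sentences]
model = Word2Vec(sentences=typed, vector_size=10, window=5, min_count=1)
print(model.wv["PERSON"])  # the type token now has its own embedding
```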
Position embedding
Still uses the approach from #13.
Model diagram:
Results:
Embedding the KB with an autoencoder achieves dimensionality reduction, and the resulting embeddings significantly improve the Mean Rank (MR) metric on KBC tasks.
Keywords:
(Finding Nemo, country_of_film, ?)
Learns relations via KB embedding, but this approach has too many parameters to learn, hence the need to reduce the dimensionality of relations. Moreover, the composition of two relations can yield a third relation, $M_1 M_2 = M_3$, where each $M$ is the matrix representing a relation; this is what satisfying the compositional constraints means.
Earlier ways of reducing the dimensionality of relations imposed pre-designed hard constraints on the parameter space, but this handles the compositional constraints poorly: it is hard to compose two relations into a third.
So this paper's approach is training the relation parameters jointly with an autoencoder.
In terms of results, the model's improvement is mainly on the Mean Rank (MR) metric.
It references: Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
For an incomplete triple ⟨h, r, ?⟩ in the test, if h is OOV, we replace it with the most frequent entity that has ever appeared as a head of relation r in the training data. If the gold tail entity is OOV, we use the zero vector for computing the score and the rank of the gold entity.
In short, the most frequent head (or tail) entity of relation r replaces the OOV entity. Also, since 6.7% of the triples in the WN18RR dataset involve OOV entities, the authors removed those entities entirely.
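A small sketch of that OOV replacement rule (data structures are illustrative):

```python
from collections import Counter

def resolve_head(head, relation, train_triples, known_entities):
    # train_triples: list of (head, relation, tail) seen during training.
    if head in known_entities:
        return head
    # Fall back to the entity that most often appeared as the head of this
    # relation in the training data.
    heads = Counter(h for h, r, _ in train_triples if r == relation)
    return heads.most_common(1)[0][0]
```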
This is an article on building a vertical-domain knowledge graph, the vertical domain here being aviation risk. The main steps are:
The knowledge graph can be divided into general field and vertical field according to its knowledge range.
Air accidents are affected by many factors: aircraft condition, crew state, weather, geography, and so on. These factors and the event itself form a very complex network; every event is a large knowledge network system, so a KG is well suited to describing such complex relations.
This article combines two things: the characteristics of the air-accident domain, and a KG construction method for that domain.
The construction process includes domain ontology modeling, instance-to-ontology mapping, visualization analysis and application maintenance.
Following that process, the article falls into three parts: knowledge representation, data mapping, and KG management and application.
The knowledge representation method based on ontology: M. Zhao, Y. Du, H. She, J. Zhang, H. Wang, Y. Chen. Transactions of the Chinese Society for Agricultural Machinery, 47(9), 278(2016)
The article builds a knowledge ontology framework for the air-accident domain.
Figure 2 shows a part of the aviation risk event ontology model.
After the ontology model is built, the next step is storing the knowledge in a database. In this paper, we use Jena, an ontology-parsing Java toolkit, to transform the ontology metadata into the Resource Description Framework (RDF) [20], then store and query knowledge in the form of ⟨subject, predicate, object⟩ triples.
The knowledge source is structured data from ASRS, of fairly high quality. The paper uses a pattern-based data mapping mechanism to convert the structured data, i.e., an RDB2RDF data conversion process.
The W3C has published two mapping-language standards (2012): Direct Mapping and R2RML.
An R2RML mapping defines a logical table from which data is extracted from the relational database. We will define a SQL query of a table in the relational database as a logical table. Each logical table is converted into RDF data by a triples map; that is, each row of instance data in the logical table is mapped to several RDF triples. The R2RML mapping mechanism expression is:
The R2RML mapping has a mapping document consisting of a series of RDF triples, which can be expressed as:
Based on the expression above, the R2RML mapping document can be depicted as in the figure below.
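Since the expression itself did not survive extraction, here is a minimal Python sketch of the row-to-triples idea behind an R2RML triples map, using rdflib (the table, columns and namespace are illustrative, not from the paper):

```python
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/aviation/")

# Stand-in for one logical table (in R2RML this would be a SQL query result);
# each row is mapped to several RDF triples, as a triples map prescribes.
rows = [{"event_id": "E001", "aircraft": "B737", "weather": "fog"}]

g = Graph()
for row in rows:
    subject = URIRef(EX["event/" + row["event_id"]])
    g.add((subject, EX.aircraft, Literal(row["aircraft"])))
    g.add((subject, EX.weather, Literal(row["weather"])))

print(g.serialize(format="turtle"))
```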
The RDF storage pattern includes two types of tuple, from the ontology mapping and the data mapping. We extract the ⟨subject⟩ and ⟨object⟩ as entities in the knowledge graph, and extract the ⟨predicate⟩ elements as the attributes and associations in the knowledge graph, so we can construct the knowledge graph.
Ordinary KG construction requires knowledge fusion, but in the air-accident domain it is unnecessary, because the data source is already processed, structured data. So as data grows, this paper can use data-driven incremental ontology modeling to extend concepts and instances. A concept means a class with the same entities. Since concepts rarely change, what is automated is mainly instance updates.
Applications of the KG in the air-accident domain: (1) intelligent semantic retrieval; (2) decision support.
Some practical tips from the graph-building process.
Qian Zhao1, Qing Li1 and Jingqian Wen2
1 School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China
2 School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
2018/02/21
2017 Asia Conference on Mechanical and Aerospace Engineering (ACMAE 2017)
A very down-to-earth paper: not an innovation in algorithms or models but a summary of engineering practice, its main choice being the air-accident vertical, for which a new ontology framework was designed. The rest is engineering implementation.
Captures implicit relations in text by turning binary relations into unary relations, improving performance on KBP (knowledge base population).
IBM Research AI: the introduction of this paper is very well written and worth reading, but the figures are frankly poor, and the final evaluation metric is a bit subtle. Still, the idea is good; had no one really tried this before?
Keywords:
This work only considers the case where the provided triples are correct. The problem at hand is that many relations appearing in text are implicit: there is no explicit syntactic pattern from which the relation between entities can be judged, which gives most KBP relation extraction methods low recall, since they mostly rely on lexical-syntactic cues between the two entities to identify a relation.
The target relation is presidentOf, and the two entities are TRUMP and UNITED_STATES.
In the first sentence the two entities have an explicit relation.
The second sentence expresses the same relation, but implicitly: background knowledge is needed to infer it, and UNITED_STATES does not even appear.
The state-of-the-art systems are affected by very low performance, close to 16.6% F1, as shown in the latest TAC-KBP evaluation campaigns and in the open KBP evaluation benchmark.
To identify implicit relations in text, this work turns the problem of identifying binary relations into a much larger set of simpler unary relation problems. An example:
For example, to build a Knowledge Base (KB) about presidents in the G8 countries, the presidentOf relation can be expanded to presidentOf:UNITED_STATES, presidentOf:GERMANY, presidentOf:JAPAN, and so on. For all these unary relations, we train a multi-class (and in other cases, multi-label) classifier from all the available training data. This classifier takes textual evidence where only one entity is identified (e.g. ANGELA_MERKEL) and predicts a confidence score for each unary relation. In this way, ANGELA_MERKEL will be assigned to the unary relation presidentOf:GERMANY, which in turn generates the triple (ANGELA_MERKEL, presidentOf:GERMANY).
Presidents are pre-partitioned by country, i.e., presidentOf -> which country becomes `president(country)`, which removes one inference step and makes the relation unary. Only one entity is obtained from the text, and the task is to decide which unary relation that entity belongs to, trained as a multi-class task.
Baseline 是 Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances.
Future research plans:
KBP's goal is to improve recall by identifying implicit information from texts.
Proposes a new dataset that can be used both for relation classification and for validating few-shot learning.
The noise introduced by DS is reduced with few-shot methods, and several SOTA few-shot learning methods are then applied to relation classification. The results show a considerable gap from human judgment remains.
Keywords:
Data obtained by DS can be scarce, and once the sample count drops, model performance drops quickly. For example, in NYT-10, 58% of the relations have fewer than 100 instances. DS datasets also contain a lot of noise, and models have long suffered from the wrong-labeling problem. So training RC models from small amounts of data is worthwhile.
Construction process
we remove relations with fewer than 1000 instances, and randomly keep 1000 instances for the rest of the relations. As a result, we get a candidate set of 122 relations and 122,000 instances.
Annotation is done via Amazon MTurk. Each annotator is asked to judge whether the relation can be deduced from the sentence semantics alone, and to mark an instance as negative if the sentence is incomplete or the mention is falsely linked to the entity.
After the annotation, we remove relations with fewer than 700 positive instances. For the remaining 105 relations, we calculate the inter-annotator agreement for each relation using the free-marginal multirater kappa (Randolph, 2005), and keep the top 100 relations.
In other words, every relation has at least 700 instances.
The final FewRel dataset consists of 100 relations, each with 700 instances. A full list of relations, including their names and descriptions, is provided in Appendix A.2. The average number of tokens in each sentence is 24.99, and there are 124,577 unique tokens in total.
Table 2 provides a comparison of our FewRel dataset to two other popular few-shot classification datasets, Omniglot and mini-ImageNet. Table 3 provides a comparison of FewRel to the previous RC datasets.
In terms of the number of relations, FewRel is the largest.
One-sentence summary:
The main novelty of this work is combining an attention model with position information; it also releases a new dataset.
#147 #148
Resources:
Keywords:
Notes:
According to the annotation manual above, the 2015 slot filling task had 41 relations in total; TACRED adds a new no_relation, making 42. From the TACRED dataset website you can see the relation distribution is very skewed: 79.5% of instances are no_relation. That is probably why, compared with NYT, SemEval and other datasets, the TACRED SOTA F1 is only about 70%.
There are 41 relations plus 1 no_relation; the 41 relations mainly split into per slots and org slots. TACRED cannot be used for DS-noise research, because the dataset went through human editing and has no noise; it mainly serves slot-filling research. Noise research mostly uses the NYT dataset. For long-tail research the main dataset is FewRel, but it ships no annotation-guide document: all we know is it has 100 relations with 700 instances each, the relations carry only ids without explanations, which makes them hard to interpret intuitively, and the data carries only head and tail position information, nothing else. For long tail + DS noise, use the NYT dataset.
Below is content from the paper.
This task involves entity recognition, mention coreference and/or entity linking, and relation extraction, while this work focuses on the most challenging "slot filling" task of filling in the relations between entities in the text. Though called slot filling, it is essentially RC.
We believe machine learning approaches have suffered from two key problems: (1) the models used have been insufficiently tailored to relation extraction, and (2) there has been insufficient annotated data available to satisfy the training of data-hungry models, such as deep learning models.
Two problems:
This work addresses both of these problems. We propose a new, effective neural network sequence model for relation classification. Its architecture is better customized for the slot filling task: the word representations are augmented by extra distributed representations of word position relative to the subject and object of the putative relation. This means that the neural attention model can effectively exploit the combination of semantic similarity-based attention and position-based attention.
Secondly, we markedly improve the availability of supervised training data by using Mechanical Turk crowd annotation to produce a large supervised training dataset (Table 1), suitable for the common relations between people, organizations and locations which are used in the TAC KBP evaluations. We name this dataset the TAC Relation Extraction Dataset (TACRED), and will make it available through the Linguistic Data Consortium (LDC) in order to respect copyrights on the underlying text.
This part first criticizes #13 and two other papers, all built on CNNs, RNNs and combinations of the two. They do fine on the datasets they test on, but poorly on datasets with longer sentences.
Current model architectures have two problems:
So we propose a position-aware attention mechanism over an LSTM network to tackle these challenges. Advantages of this model:
Details of how position is modeled:
Each entity is one non-overlapping consecutive span. Inspired by the position encoding in #13 and in "Natural language processing (almost) from scratch", define a position sequence relative to the subject entity,
where $p_i^s$ is defined as:

$$p_i^s = \begin{cases} i - s_1 & \text{if } i < s_1 \\ 0 & \text{if } s_1 \le i \le s_2 \\ i - s_2 & \text{if } i > s_2 \end{cases}$$

where $s_1$ and $s_2$ are the subject's starting and ending indices. So $p_i^s$ can be viewed as each token $x_i$'s relative distance to the subject entity. In the same way we obtain $p_i^o$, the relative distance to the object entity.
This attention puts $h_i$, $q$, $p_i^s$, $p_i^o$ all together. But $q$ is itself an aggregate of the $h_i$; had the simpler design not underperformed, it probably would not be built this way.
After computing the $a_i$, each sentence is represented as $\sum_i a_i h_i$.
Here the summary vector ($q$) helps the model base this selection on the semantic information of the entire sentence (rather than on each word only), while the position vectors ($p_i^s$ and $p_i^o$) provide important spatial information between each word and the entities. Why is the formula designed this way? $q$ accounts for the information of the whole sentence, while $p_i^s$ and $p_i^o$ inject the spatial information.
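For reference, the attention described above can be sketched as follows (notation follows the note; the paper's exact parameterization may differ slightly):

$$u_i = v^\top \tanh\left(W_h h_i + W_q q + W_s p_i^s + W_o p_i^o\right), \qquad a_i = \frac{\exp(u_i)}{\sum_j \exp(u_j)}, \qquad z = \sum_i a_i h_i$$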
Model diagram:
Results:
Follow-up results on TACRED:
Note that the above are single-model results; with ensembles, C-GCN already reaches 68.2.
Proposes a model that performs well across various types of RC datasets, and compares different models on 6 datasets to examine generalization.
Keywords:
The strong points of this paper are the visualization and the experiments.
Current RC research targets a narrow range of datasets (e.g., comparing on only one or two datasets and then claiming SOTA), so performance on other dataset types is questionable. This work surveys 6 different types of RC datasets and proposes a multi-channel LSTM model (+CNN) exploiting linguistic and architectural features; the model reaches SOTA on two datasets.
A brief introduction to the architecture of automatic question answering systems.
The article classifies QA by knowledge source into two kinds: KBQA, question answering over knowledge bases, and DBQA, question answering over documents (SQuAD).
Analogical reasoning can capture linguistic regularities.
Analogy questions can be automatically solved via vector computation, e.g. "apples − apple + car ≈ cars" for morphological regularities and "king − man + woman ≈ queen" for semantic regularities.
Linguistic regularities vary greatly across languages, and Chinese is a typical analytic language.
Contributions:
propose a Chinese analogical reasoning task based on 68 morphological relations and 28 semantic relations
One-sentence summary:
Improves RE by learning features of dependency trees with a GNN.
Resources:
Keywords:
Notes:
This work is mainly compared against dependency-based neural models.
Traditional feature-based models can represent dependency information as overlapping paths along the trees, but they face two problems: sparse feature spaces and brittleness to lexical variations.
More recent neural models address this problem with distributed representations built from their computation graphs formed along parse trees, i.e., some neural models use graph representations over the graph formed by the parse trees.
Such graph-based representations usually take one of two approaches:
Models of the first kind are hard to parallelize, because aligning trees for efficient batch training is usually non-trivial.
Models of the second kind make a simplifying assumption: models based on the shortest dependency path between the subject and object are computationally more efficient, but this simplifying assumption has major limitations as well.
This work proposes a GNN model specialized for RE that can encode dependency structure, plus a novel path-centric pruning technique to remove irrelevant information from the tree while maximally keeping relevant content.
Contextualized GCN (C-GCN) model, where the input word vectors are first fed into a bi-directional long short-term memory (LSTM) network to generate contextualized representations, which are then used as h(0) in the original model. This BiLSTM contextualization layer is trained jointly with the rest of the network.
A plain GCN does not consider contextual information, and a GCN depends heavily on the parse tree (if parsing is poor, performance degrades too). The proposed C-GCN first feeds the word vectors into a bi-LSTM to generate contextualized representations, i.e., h(0) in the model diagram; the bi-LSTM is learned jointly with the rest of the network.
This part is actually the same idea as my sdp+rdp1 design.
This part splits RE models into two classes: dependency-based models and neural sequence models; the latter is (PA-LSTM) #50.
Model diagram:
Results:
The model in this paper is called Contextualized GCN (C-GCN). We can see that
A survey paper testing how different knowledge base embedding methods perform on relation prediction and relation extraction.
Most work predicts missing entities or finds missing triples; little work predicts relations, since predicting relations is easier than predicting entities — there are far more entities than relations.
https://arxiv.org/abs/1802.02114
2018/06/06
Compares three models: TransE, DistMult, and ComplEx.
KBE models all perform well on relation prediction, but not on relation extraction.
Builds a model for judging semantic relations between words (synonymy, antonymy, hypernymy, meronymy). Different semantic relations need specialized vectors to represent them.
Keywords:
Judges the semantic relations between words: synonymy, antonymy, hypernymy, and meronymy.
The STM architecture is based on the hypothesis that different specializations of input distributional vectors are needed for predicting different lexico-semantic relations.
embedding entities and relationships of multi-relational data in low-dimensional vector spaces.
Multi-relational data refers to directed graphs whose nodes correspond to entities and whose edges correspond to relations.
Application scenarios
This work models KBs (WordNet, Freebase) with the goal of automatically adding new facts, i.e., automatically adding all kinds of relations.
Modeling multi-relational data
For single-relational data, descriptive analysis alone already enables many predictions; the difficulty of multi-relational data is that a locality may involve multiple relations and multiple entities of different kinds. We need a more generic approach that can account for all the patterns and model multi-relational data, capturing all the heterogeneous relationships simultaneously.
Relationships as translations in the embedding space
relationships are represented as translations in the embedding space: (h, l, t)
Two motivations for this model. First, hierarchical relationships are very common in KBs: when representing a node of a tree structure, its embedding should be close to those of its neighboring nodes. Second, the emergence of the word2vec model.
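As a reminder of the standard TransE formulation (not specific to these notes): the translation assumption and the margin-based ranking loss are

$$h + \ell \approx t, \qquad \mathcal{L} = \sum_{(h,\ell,t) \in S} \; \sum_{(h',\ell,t') \in S'} \left[\gamma + d(h + \ell, t) - d(h' + \ell, t')\right]_+$$

where $S'$ contains corrupted triples with the head or tail replaced, $d$ is an L1 or L2 dissimilarity, and $[x]_+ = \max(0, x)$.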
Data: Freebase containing 1M entities, 25k relationships and more than 17M training samples.
Embedding the triples.
WordNet synsets (sets of synonyms). We considered the data version used in [2], which we denote WN in the following. Examples of triplets are (score_NN_1, hypernym, evaluation_NN_1) or (score_NN_2, has_part, musical_notation_NN_1).
WN is composed of senses; its entities are denoted by the concatenation of a word, its part-of-speech tag and a digit indicating which sense it refers to, i.e. score_NN_1 encodes the first meaning of the noun "score".
Relationships are divided into four categories according to their heads and tails: 1-TO-1, 1-TO-MANY, MANY-TO-1, MANY-TO-MANY.
We obtained that FB15k has 26.2% of 1-TO-1 relationships, 22.7% of 1-TO-MANY, 28.3% of MANY-TO-1, and 22.8% of MANY-TO-MANY.
Embeds the different data types in the KB for better KBC performance.
Keywords:
Resources
To assist human fact checkers, providing relevant facts and patterns during fact verification helps the judgment. The facts here are the various paths (relations) from source to target.
Finds truthfulness by traversing the KG. Traversal methods include: random walks (PRA), path enumeration (PredPath), shortest paths (Knowledge Linker), learning from multi-relational data (RESCAL, TransE and its variants), or link prediction in social networks.
In this paper, we propose Knowledge Stream (KS), an unsupervised approach for fact-checking triples based on the idea of treating a KG as a flow network. There are three motivations for this idea:
Not only compares against SOTA but also produces explanations for its predictions.
A concrete example
(David and Goliath (book), author, Malcolm Gladwell). The set of paths here is called the stream of knowledge.
Given a triple (s, p, o), we view knowledge as a certain amount of an abstract commodity that needs to be moved from the subject entity s to the object entity o across the network.
In other words, the exact commodity is not important; the point is to find the various paths between s and o. p is the target predicate, and the p′ are the various discovered paths related to p.
https://arxiv.org/abs/1708.07239
https://github.com/shiralkarprashant/knowledgestream
2017/08/24
Provides multiple paths, i.e., relations, as extra information for the fact checker to judge with.
One innovation is computing the similarity between different relations.
Applied to finance, this can uncover deeper relations between two companies. For example, Berkshire and Buffett above usually have only one simple relation, yet much more information can be found.
That said, mining an existing graph database is fine since many ready-made relations are available; when building a vertical-domain KG, enough relations must be added before path mining becomes possible.
Although this has little to do with knowledge graphs...
This is a survey of bootstrapping techniques used for NLP tasks.
https://www.eecis.udel.edu/~vijay/fall13/snlp/lit-survey/Bootstrapping.pdf
No date can be found, but everything cited is from 2003 or earlier, so it is probably a 2003 article.
Why read this article? I want to add an auto-label feature to doccano, but active learning does not seem to work well (https://arxiv.org/pdf/1807.04801.pdf). So I am looking for auto-labeling: as more samples are manually annotated, a model keeps learning from the labeled data and predicts labels for the remaining unlabeled text.
Some digging shows that in NLP this kind of auto-labeling is called bootstrapping.
Although bootstrapping methods differ across NLP tasks, they basically follow the same recipe:
Bootstrapping Algorithms
While there is great variation between implementations, bootstrapping approaches in natural language processing all follow the same general format:
The core feature of a bootstrapping algorithm is that each iteration is fed the same type of data as its input that it produces as its output. The output of the first iteration is used as the input of the second iteration, and so on. It should not be surprising that choosing the initial set of data (the seeds) from which all other data is “grown” is a critical factor (arguably the most critical factor) in the performance of the algorithm.
The previous iteration's output is fed as the next iteration's input; this idea is actually rather like an RNN.
The list of things in step one: usually words or phrases, but it can be any representation of language (such as regular expressions, tuples, etc.).
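A schematic of that loop in Python (the pattern induction, matching, and scoring functions are placeholders, not from the survey):

```python
def bootstrap(seeds, corpus, extract_patterns, match, score, iterations=5, top_k=10):
    instances = set(seeds)
    for _ in range(iterations):
        # 1. Induce extraction patterns from the current instance set.
        patterns = extract_patterns(instances, corpus)
        # 2. Apply the patterns to the corpus to harvest candidates.
        candidates = {c for p in patterns for c in match(p, corpus)}
        # 3. Keep only the best-scoring candidates to limit semantic drift.
        ranked = sorted(candidates - instances, key=score, reverse=True)
        instances.update(ranked[:top_k])
    return instances
```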
Knowledge Representation Learning (KRL) / Knowledge Embedding (KE)
This is the note of Chapter 5 from Deep Learning in Natural Language Processing
Goal: Embedding the entities and relations in KG.
Recent studies reveal that translation-based representation learning methods are efficient and effective to encode relational facts in KG with low-dimensional representations of both entities and relations, which can alleviate the issue of data sparsity and be further employed to knowledge acquisition, fusion, and inference
Translation-based representation learning methods:
TransE only considers direct relations between entities, hence the following methods that take different relation paths into account.
Relation-path-based methods:
The methods above only consider the structure information in the KG and ignore rich multi-source information such as textual information, type information, and visual information.
Goal: automatically finding unknown relational facts
Relation extraction (RE): Relation extraction aims at extracting relational data from plaintexts. In recent years, as the development of deep learning (Bengio 2009) techniques, neural relation extraction adopts an end-to-end neural network to model the relation extraction task.
The framework of neural relation extraction includes a sentence encoder to capture the semantic meaning of the input sentence and represents it as a sentence vector, and a relation extractor to generate the probability distribution of extracted relations according to sentence vectors.
Neural relation extraction (NRE) has two main tasks including sentence-level NRE and document-level NRE
Sentence-level NRE aims at predicting the semantic relations between the entity (or nominal) pair in a sentence.
Three components:
Four kinds of embeddings are introduced here:
New York is a city of United States, the relative distance from the word city to New York is 3 and to United States is −2.

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.01')
>>> dog.hypernyms()
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
The sentence encoder converts the input embeddings into a vector that represents the sentence.
The methods above are always limited by insufficient training samples. To address this, researchers proposed the distant supervision assumption, using a KG to generate training samples automatically.
The intuition of distant supervision assumption is that all sentences that contain two entities will express their relations in KGs. For example, (New York, city of, United States) is a relational fact in KGs. Distant supervision assumption will regard all sentences that contain these two entities as valid instances for relation city of. It offers a natural way to utilize information from multiple sentences (document-level) rather than single sentence (sentence-level) to decide if a relation holds between two entities.
Therefore, document-level NRE aims to predict the semantic relations between an entity pair using all involved sentences. This is really another name for multi-instance learning.
Document-level NRE has four components:
The document encoder encodes all sentence vectors into a single vector S.
Entity linking studies how to link mentions to a knowledge base. For example, given the sentence "Jobs leaves Apple", if the KB already contains the entity Steve Jobs, then linking "Jobs" to "Steve Jobs" also disambiguates the mention.
The main challenges for entity linking are the name ambiguity problem and the name variation problem.
This part covers the traditional EL pipeline, with no deep learning involved.
Given a document d and a Knowledge Graph KB, an entity linking system links the name mentions in the document as follows, in several steps:
Name Mention Identification
Candidate Entity Selection
Local Compatibility Computation
One main problem of EL is the name ambiguity problem, thus, the key challenge is how to compute the compatibility between a name mention and an entity by effectively using contextual evidences.
The name ambiguity problem is related to the fact that a name may refer to different entities in different contexts. For example, the name Apple can refer to more than 20 entities in Wikipedia, such as fruit Apple, the IT company Apple Inc., and the Apple Bank.
Current EL relies largely on the local compatibility model, i.e., handcrafted features expressing different contextual evidences. These feature-engineering-based approaches have drawbacks:
To overcome these drawbacks, DL-based methods were proposed.
One strength of NNs is learning good representations of the input, such as word vectors.
By encoding all contextual evidences in the continuous vector space which are suitable for entity linking, neural networks avoid the need of designing handcrafted features. In the following, we introduce how to represent different types of contextual evidences in detail.
But this averaging approach does not consider the positions of words (which matter a lot).
An EL system needs to take all the different types of contextual evidence into consideration; the question is how to make good use of more contextual evidence.
Generally, two strategies have been used to model the semantic interactions between different contextual evidences:
Learning local compatibility requires a corresponding local compatibility measure.
We can see that the mention's evidence and the entity's evidence will first be encoded into a continuous feature space using contextual evidence representation neural networks, then compatibility signals between mention and entity will be computed using semantic interaction modeling neural networks, and finally all these signals will be summarized into the local compatibility score.
Mainly targets sentences containing multiple entities and multiple relations, treating the RE problem as a graph problem.
Keywords:
Most RE models assume a sentence contains only one relation, but the entities in a sentence may hold multiple relations.
This sentence has three entities: from "with" we know toefting and teammates are in a PER-SOC relation, and from "in" we know teammates and capital are in a PHYS relation. From these two relations we can infer that toefting and capital are also in a PHYS relation.
This work builds the RE model on an entity graph, with entity mentions as nodes and directed edges between them.
However, for relation extraction from a sentence, related pairs are not predefined and consequently all entity pairs need to be considered to extract relations. In addition, state-of-the-art RE models sometimes depend on external syntactic tools to build the shortest dependency path (SDP) between two entities in a sentence (Xu et al., 2015; Miwa and Bansal, 2016). This dependence on external tools leads to domain dependent models.
In essence, the multiple-relation RE problem is treated as finding shortest paths between two nodes in the graph.
Releases a Chinese dataset for relation classification. On the modeling side, tree-based structure regularization makes the SDP shorter and improves performance.
Keywords
Chinese literary works contain many implicit meanings and authorial feelings that are hard to recognize, and the syntactic signal in literary text is very noisy, so structure regularization is proposed for relation classification on such text.
One-sentence summary:
The first use of AT, in its regularization role, for a joint task.
Resources:
Keywords:
dataset: ACE04, ADE, CoNLL04, DREC. Judging by the datasets, the focus is mainly on extracting names.
Adversarial training (AT)
entity recognition
relation extraction
jointly extracting entities and relations
on several datasets in different contexts, and for different languages (English and Dutch)
Notes:
Goodfellow et al. (2015) proposed adversarial training (AT) (for image recognition) as a regularization method which uses a mixture of clean and adversarial examples to enhance the robustness of the model.
Model diagram:
Results:
Papers to read next:
Proposes a new task, relation summarization, using a query-focused method to judge whether a sentence expresses a relationship.
Keywords:
Application scenarios of this new task
An example:
What is the relationship between "Advanced Integrated Systems" and "United Arab Emirates" in the Paradise Papers?
Asking about the relation of two entities within a document — a bit like QA.
It turns out there is existing work: summarizing relationships (Falke and Gurevych, 2017a).
Tobias Falke and Iryna Gurevych. 2017a. Bringing structure into summaries: Crowdsourcing a benchmark corpus of concept maps. In EMNLP.
This work answers user queries about the connections between two particular terms, without referencing a knowledge graph. (Answering relationship queries via a KG has also been done: Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten De Rijke, and Wouter Weerkamp. 2015. Learning to explain entity relationships in knowledge graphs. In ACL.)
Below are explanations of the two tasks.
Assumption:
The relational summarization problem is somewhat different: we begin with a pair of query terms, (t1) and (t2), and we wish to learn the nature of their relationship. Therefore, any statement which coherently describes any relationship between the two query terms is potentially of interest, even if it does not match prior expectations of what constitutes a relation.
We approach the candidate set generation task as a specialized form of sentence compression: we attempt to predict if a sentence from the text can be coherently compressed to the form (t1) r (t2). Table 2 shows examples of sentences which can and cannot be shortened to this form.
How do we find sentences that carry a relation? The innovation here is casting this as a sentence compression problem: if a sentence can be compressed into the form t1, r, t2, it is assumed to express a relation between t1 and t2. (Could this be used on DS to reduce noise?)
Judging from the results below, it seems to work quite well.
Skipping the entity details; revisit if needed.
This paper is quite inspiring; the specific method can be skipped and revisited later. Two ideas to take away:
KDD 2018 best paper in the ADS (Applied Data Science) Track.
Learns the syntactic information of question text via an SVM tree kernel, then injects that information into a NN model: the SVM labels a large amount of unlabeled data, the NN learns from it, and the labeled data is then used to fine-tune.
A work that strengthens distantly supervised relation extraction by using DRL to turn false positive samples into negative samples.
Keywords:
Knowledge graph construction is a downstream application of relation extraction.
data sparsity issue — it is extremely expensive, and almost impossible, for human annotators to go through a large corpus of millions of sentences to provide a large amount of labeled training instances.
Because human annotation of a corpus of millions of sentences (the data sparsity issue) is impossible, distantly supervised relation extraction became popular.
Precisely because the selected sentences are noisy, much research tries to solve this problem.
But this paper objects to that approach, because such methods select only one example, which is not optimal. (Hmm, rather thin grounds for the objection.) To strengthen robustness, a more systematic solution is needed that exploits more instances, i.e., removing false positives and placing them in the right place.
The goal of this paper is a dynamic selection strategy that removes or retains distantly supervised candidate instances, realized with RL: the agent maximizes classification accuracy by removing false positives.
Moreover, the RL method can serve as a component combined with any RE model, as in the figure below, i.e., boosting other models with RL.
4.4 On the impact of false positive samples
Zeng et al. (2015) and Lin et al. (2016) are both robust models for the wrong labeling problem of distantly supervised relation extraction.
Both models aim to solve the wrong labeling problem, but the false positive phenomenon also includes the case where all the sentences of one entity pair are wrong.
That is, among the samples selected by the distant method, some sentences were selected as though they contained the entity pair and were treated as expressing the entity pair's relation, when they do not actually express it at all.
What the RL agent does is move these false positive samples into the negative samples, so they no longer poison the model's training.
One-sentence summary:
Embeds relations by learning global information, addressing the sample noise problem in DS.
Resources:
Keywords:
Notes:
Embedding textual relations, defined as the shortest dependency path between two entities in the dependency graph of a sentence, to improve relation extraction. The so-called textual relation here is the dependency path between the two entities.
Embedding on DS data has been tried before, but the noise makes the embedding quality poor. Traditional embedding methods are based on local information (presumably context windows), while this work's hypothesis is that global statistics are more robust to noise than local statistics.
Traditional embedding methods are based on local statistics, like the left part of Figure 1. This paper argues that against DS noise, global statistics are more robust than local statistics. By collecting global co-occurrence statistics of textual and KB relations, we will have a more comprehensive view of relation semantics: the semantics of a textual relation can then be represented by its co-occurrence distribution of KB relations.
The explanation of global statistics feels rather forced, though.
You really do need storytelling ability. "Global" actually means the document level (in the taxonomy of Chapter 5 of DL in KG, the document level means all instances obtained from DS). Because the multi-instance set contains both correct and false positive examples, learning over all of these instances can still capture the correct relation information.
Model diagram:
Results:
The precision of the top 1,000 relational facts discovered by the model is improved from 83.9% to 89.3%, a 33.5% decrease in error rate. The results suggest that relation embedding with global statistics can capture complementary information to existing local statistics based models.
It feels like this paper could be combined with some others. It claims to learn global information directly, but the corpus still contains false positives, so one could use the earlier query-based model to improve instance quality before learning the embeddings. A simple combination offers little novelty, though. (Storytelling?)
The first work to use word embeddings together with positional embeddings and apply a CNN to the RE task.
RE models before this were statistical; performance depended heavily on the extracted features, and if feature extraction went wrong, the error propagated onward.
How to extract good features is the question.
Uses a DNN to extract lexical-level and sentence-level features.
Below are excerpts from the paper related to context.
Two kinds of features are used: lexical level features and sentence level features.
Lexical level features are the most direct hint at the relation. Traditional lexical level features usually include the two entities themselves, the entities' types (presumably POS-like tags), and the word sequence between the two entities, all of which depend heavily on existing NLP tools. To reduce the noise this may introduce, this paper uses word embeddings as the most basic feature: L1-L4 in Table 1 all use word embeddings, while L5, the WordNet hypernyms, uses the MVRNN model — the hypernyms of each entity are obtained via MVRNN, and the hypernyms' word embeddings are used. Finally all these different word embeddings are concatenated.
The sentence level features are the word features and position features obtained after the Window Processing step.
There are three main parts: Word Representation, Feature Extraction and Output.
Our experiments directly utilize the trained embeddings provided by Turian et al. (2010).
That embedding can no longer be found.
We select the word embeddings of marked nouns and the context tokens. Moreover, the WordNet hypernyms are adopted as in MVRNN (Socher et al., 2012). All of these features are concatenated into our lexical level features vector.
Features are extracted with a max-pooled CNN.
Each entity pair corresponds to one label y. The window size is 3.
Structure features (e.g., the shortest dependency path between nominals) are very useful, as they encode the dependency information between the two entities. But such features cannot be captured by the CNN above, so position features (PF) were proposed.
In the example above, the distances from "moving" to "people" and "downtown" are 3 and −3. Each relative distance is then mapped to a $d_e$-dimensional vector, yielding $\mathrm{PF}=[\mathbf{d}_1, \mathbf{d}_2]$.
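A minimal PyTorch sketch of such position features (the clipping range and dimension are illustrative):

```python
import torch
import torch.nn as nn

MAX_DIST, D_E = 30, 5  # illustrative clipping range and PF dimension
pf1 = nn.Embedding(2 * MAX_DIST + 1, D_E)  # distances to the first entity
pf2 = nn.Embedding(2 * MAX_DIST + 1, D_E)  # distances to the second entity

def position_features(seq_len, e1_pos, e2_pos):
    idx = torch.arange(seq_len)
    d1 = (idx - e1_pos).clamp(-MAX_DIST, MAX_DIST) + MAX_DIST
    d2 = (idx - e2_pos).clamp(-MAX_DIST, MAX_DIST) + MAX_DIST
    # PF = [d1, d2]: concatenate the two distance embeddings for each token.
    return torch.cat([pf1(d1), pf2(d2)], dim=-1)  # shape (seq_len, 2 * D_E)
```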
W1 is the weight matrix and X is the output of window processing.
Finally, extraction from each sentence yields an n1-dimensional vector (representing the relation), so the result is independent of sentence length.
To learn more complex features, tanh is used as the nonlinear activation function.
The result after the activation function is said to represent higher-level features, e.g., the whole sentence.
The input is a sentence s; after window processing we obtain X. N is the word embedding matrix. The output is the probability of each relation.
Our aim is to maximize this objective function.
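The formula itself did not survive extraction; the objective is presumably the standard conditional log-likelihood over the $T$ training examples:

$$\mathcal{J}(\theta) = \sum_{i=1}^{T} \log p(y_i \mid x_i, \theta)$$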
SGD is used for optimization. N and W are randomly initialized, and X is obtained from the embeddings.
SemEval-2010 Task 8 dataset
macro-averaged F1-scores
position features (PF) are successfully proposed to specify the pairs of nominals to which we expect to assign relation labels. The system obtains a significant improvement when PF are added.
Treats the distant supervision problem as a multi-instance problem to handle the wrong-labeling cases, and introduces PCNN to extract finer-grained segment information.
For the RE task, distant supervision has two problems.
For RE, one challenge is how to produce enough training data, which is why distant supervision was proposed.
But this approach has two big drawbacks:
Figure 3 shows our neural network architecture for distant supervised relation extraction. It illustrates the procedure that handles one instance of a bag. The procedure has four parts: Vector Representation, Convolution, Piecewise Max Pooling and Softmax Output, described in detail below.
Unlike the embedding used in #13 above, word2vec is used here.
Why choose "son" as the midpoint? Because "son" expresses the relation? What if the sentence contains no such relation-bearing word?
Explanation of the convolution:
Convolution is an operation between a vector of weights, w, and a vector of inputs that is treated as a sequence q.
The convolution results go through piecewise max pooling: max pooling is applied separately to the segments determined by the positions of the two entities (see the figure for intuition). Each filter's result yields a 3-dimensional vector from piecewise max pooling, which is then concatenated with the piecewise-max-pooled results of the other filters.
The dimensionality of the final vector is independent of the original sentence length.
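A NumPy sketch of piecewise max pooling for a single filter (entity positions define the three segments; empty segments are zero-filled here for simplicity):

```python
import numpy as np

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    # conv_out: (seq_len,) activations of one convolution filter.
    a, b = sorted((e1_pos, e2_pos))
    segments = [conv_out[: a + 1], conv_out[a + 1 : b + 1], conv_out[b + 1 :]]
    # One max per segment -> a length-3 vector per filter, independent of
    # sentence length; vectors from all filters are then concatenated.
    return np.array([seg.max() if seg.size else 0.0 for seg in segments])
```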
The final softmax output $o_i$ represents the probability of each $relation_i$.
For regularization, dropout is applied on the penultimate layer; the idea is to use a masking operation to switch off some neurons.
Dropout must not be used at test time, i.e., masking is turned off.
To solve the first problem, multi-instance learning is used for PCNN. Remember what was said earlier: design a more effective objective function that accounts for the unlabeled instances inside a bag.
Since the previous objective function was defined per instance, here the objective is redefined per bag.
The overall algorithm flow:
Traditional backpropagation runs over all training instances, whereas multi-instance learning runs over bags. Because a bag contains mislabeled sentences, training on bags naturally absorbs some of that incorrect information too.
After the PCNN is trained, at prediction time a bag is labeled positive only if at least one instance in it is predicted positive.
We evaluate our method on a widely used dataset 4 that was developed by (Riedel et al., 2010) and has also been used by (Hoffmann et al., 2011; Surdeanu et al., 2012). This dataset was generated by aligning Freebase relations with the NYT corpus, with sentences from the years 2005-2006 used as the training corpus and sentences from 2007 used as the testing corpus.
word2vec, with 50 dimensions per word.
Half of the Freebase relations are used for testing.
MIML is a multi-instance multi-label model that was proposed by (Surdeanu et al., 2012). PCNNs+MIL denotes our method.
Thus, the held-out evaluation suffers from false negatives in Freebase. We perform a manual evaluation to eliminate these problems. For the manual evaluation, we choose the entity pairs for which at least one participating entity is not present in Freebase as a candidate. This means that there is no overlap between the held-out and manual candidates.
Both CNNs+MIL and PCNNs+MIL outperform their counterparts CNNs and PCNNs, respectively, thereby demonstrating that incorporation of multi-instance learning into our neural network was successful in solving the wrong label problem.
One-sentence summary:
This survey is more detailed than the earlier one in #10, and rather long at 51 pages.
Resources:
Keywords:
Notes:
Chapter 2 is entirely feature-based and kernel-based supervised methods. But if no POS, NER and similar labels are available beforehand, errors accumulate: if the NER prediction is wrong, the relation judgment will also be wrong. To prevent this, the extraction of entities and relations is handled jointly.
Major motivation behind designing semi-supervised techniques is two-fold: i) to reduce the manual efforts required to create labelled data; and ii) exploit the unlabelled data which is generally easily available without investing much efforts.
Distant Supervision, proposed by Mintz et al. [75], is an alternative paradigm which does not require labelled data. The idea is to use a large semantic database for automatically obtaining relation type labels. Such labels may be noisy, but the huge amount of training data is expected to offset this noise.
There have been several techniques for joint modelling of entity and relation extraction. However, the best reported F-measure on ACE 2004 dataset when gold-standard entities are not given, is still very low at around 48%. This almost 30% lower than the F-measure achieved when gold-standard entity information is assumed. Hence, there is still some scope of improvement here with more sophisticated models.
There has been little work for extracting n-ary relations, i.e. relations involving more than two entity mentions. There is a scope for more useful and principled approaches for this.
Most of the RE research has been carried out for English, followed by Chinese and Arabic, as ACE program released the datasets for these 3 languages. It would be interesting to analyse how effective and language independent are the existing RE techniques. More systematic study is required for languages with poor resources (lack of good NLP pre-processing tools like POS taggers, parsers) and free word order, e.g. Indian languages.
Depth of the NLP processing used in most of the RE techniques, is mainly limited to lexical and syntax (constituency and dependency parsing) and few techniques use light semantic processing. It would be quite fruitful to analyse whether deeper NLP processing such as semantics and discourse level can help in improving RE performance.
Model diagram:
Results:
Papers to read next:
Makes the implicit relations inside noun compounds explicit through paraphrasing.
Keywords:
Noun compounds contain implicit semantic relations; for example, "birthday cake" is a cake eaten on a birthday, while "apple cake" is a cake made of apples. Current noun-compound paraphrasing work is based on corpus co-occurrences that contain explicit relations between the compound's constituents, but such methods do not generalize to unseen noun compounds.
This paper proposes a semi-supervised model for paraphrasing noun compounds: we train the model to predict either a paraphrase expressing the semantic relation of a noun-compound (predicting "[w2] made of [w1]" given "apple cake"), or a missing constituent given a combination of paraphrase and noun-compound (predicting "apple" given "cake made of [w1]").
Given apple cake, the model outputs a paraphrase such as [w2] made of [w1]; or, given a paraphrase cake made of [w1], it predicts that w1 is "apple".
Can such a predictive model handle unseen noun compounds, at least to some degree?
One-sentence summary:
Most models for DS noise ignore the intrinsic connections between relations. By exploiting the hierarchical relations among relations, an attention-based model scores each sentence against different relations, achieving good results on long-tail relations.
Resources:
Keywords:
Notes:
Current methods for the wrong label problem in DS all treat relations independently: for each relation, a separate model selects the instances relevant to that relation from the noisy data. They ignore the semantic connections between relations, which exist in the form of a hierarchy.
Take the KG Freebase as an example: its relations are annotated with hierarchical structures. A relation such as /location/province/capital expresses the relation between a province and its capital and belongs to the location branch; under the same branch there are other relations such as /location/location/contains and /location/country/capital. These relations are semantically very close, which the hierarchy can represent.
To exploit this information, a hierarchical attention scheme is proposed. Rather than using the hierarchical information directly as features, an attention-based model computes, for each instance, an importance score with respect to the corresponding relation. As the figure below shows, the multi-level attention computes scores along the relation hierarchies, e.g., for instances containing the same head and tail entity, it computes each instance's score under different relations. A sketch follows.
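A hedged PyTorch sketch of attention along a relation hierarchy (the dimensions and the per-layer query design are assumptions for illustration, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierAttention(nn.Module):
    def __init__(self, n_layers, n_nodes, dim):
        super().__init__()
        # One query embedding table per layer of the relation hierarchy.
        self.queries = nn.ModuleList([nn.Embedding(n_nodes, dim) for _ in range(n_layers)])

    def forward(self, sent_vecs, relation_path):
        # sent_vecs: (bag_size, dim); relation_path: one node id per layer,
        # e.g. ids for /location -> /location/country -> /location/country/capital.
        reps = []
        for layer, node_id in zip(self.queries, relation_path):
            q = layer(torch.tensor(node_id))
            weights = F.softmax(sent_vecs @ q, dim=0)  # score each instance per layer
            reps.append(weights @ sent_vecs)           # layer-specific bag vector
        return torch.cat(reps)                         # concatenated across layers
```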
An advantage of this model is that it works particularly well for those long-tail relations.
Model diagram:
Results:
One-sentence summary:
Joint learning of NER and RE using MRT as a global loss.
Resources:
Paper info:
Notes:
The usual joint setting still treats NER and RE as two separate models, ignoring the connection between them (does improving NER improve RE?). For example, NER ignores the relation annotations, which are useful for identifying named entities: for a relation like ORG-AFF, it suffices for the NER model to tag the entities as ORG and AFF. (This is problematic, though: most relations in RE are not defined that way.)
Model diagram:
Results:
Papers to read next:
One-sentence summary:
Points out that triplets actually come in several types, and nobody had paid attention to the overlap problem. To solve it, this paper builds on Zheng et al. (2017) and resolves triplet overlap via a copy mechanism. The model is a seq2seq that can extract entities and relations simultaneously.
Problem:
Proposal:
Approach:
Results:
Resources:
Paper info:
Notes:
Recently, with the success of deep learning on many NLP tasks, it has also been applied to relational fact extraction. Zeng et al. (2014); Xu et al. (2015a,b) employed CNN or RNN on relation classification. Miwa and Bansal (2016); Gupta et al. (2016); Zhang et al. (2017) treated the relation extraction task as an end-to-end (end2end) table-filling problem. Zheng et al. (2017) proposed a novel tagging schema and employed a Recurrent Neural Network (RNN) based sequence labeling model to jointly extract entities and relations.
Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint extraction of entities and relations based on a novel tagging scheme. In Proceedings of ACL, pages 1227–1236.
That paper seems worth reading.
(A concrete explanation of the overlap problem, which is classified by relation type. The first, normal case needs no explanation. In the second, EntityPairOverlap example, the entities Sudan and Khartoum share more than one relation, hence the name. In the third case only Aarhus has two relations, hence SingleEntityOverlap.)
Most research only targets the normal case. For example, #203 achieves joint extraction and reduces NER error, but that model can assign only one tag per word, i.e., a word can appear in at most one triplet (so multiple relations cannot be predicted at once).
On the modeling side, a seq2seq is used. The encoder converts a natural language sentence (the source sentence) into a fixed length semantic vector. Then, the decoder reads in this vector and generates triplets directly.
To generate a triplet,
Model diagram:
Results:
Compared directly with #203.
4.5 Detailed Results on Different Sentence Types
We divide the sentences in the NYT test set into 5 subclasses. Each class contains sentences that have 1, 2, 3, 4 or >= 5 triplets. The results are shown in Figure 5.
Papers to read next:
To denoise distant supervision datasets, trains a GAN generator and uses the GAN to separate true positives from false positive samples in the noisy data, improving data quality.
Similar to the earlier paper using reinforcement learning, the aim is to select better training samples, and it can be combined with other models.
#23
Keywords:
However, they overlook the case that all sentences of an entity pair are false positive, which is also a common phenomenon in distant supervision datasets. Under this consideration, an independent and accurate sentence-level noise reduction strategy is the better choice.
DS is useful for RE but suffers from the noisy labeling problem. Some existing DS-based research overlooks false positive samples, so this work uses a GAN to judge the noisy dataset, selecting positive samples and putting false positive samples into the negatives.
Given the discriminator that possesses the decision boundary of the DS dataset (the brown decision boundary in Figure 1), the generator tries to generate true positive samples from the DS positive dataset; then, we assign the generated samples a negative label and the rest of the samples a positive label to challenge the discriminator. Under this adversarial setting, if the generated sample set includes more true positive samples and more false positive samples are left in the rest set, the classification ability of the discriminator will drop faster.
For the NEL task, treats the relations between entities as latent variables to learn, freeing NEL recognition from domain-specific knowledge and labels.
Keywords:
Creates a new scientific-literature dataset and learns the three tasks of identification of entities, relations, and coreference in a multi-task fashion, improving cross-sentence extraction.
Keywords:
Resources:
Turning academic text into a structured KG requires IE methods to identify entities and relationships, but there are several challenges:
Most work handles the three tasks of extracting scientific entities, relations, and coreference resolution separately, but this paper proposes a unified learning model that treats them as a whole. The model is multi-task: it shares parameters on the low-level tasks and uses coreference links across texts for prediction. It builds on earlier end-to-end coreference resolution systems (Lee et al., 2017; He et al., 2018). Unlike usual tagging systems, it can enumerate all possible spans (even overlapped spans), which prevents errors propagating between the tasks during training and uniformly models all spans and span-span relations.
The new dataset SCIERC includes scientific terms, relation categories and coreference links. Finally a KG is built to see how well it works in practice.
The cost of manually creating one triple is roughly 2 to 6 US dollars, while automatic knowledge graph construction lowers the cost by a factor of 15 to 250 (i.e., 1 to 15 cents per triple).
triple: (subject, relation, object). For example, "(cats, play with, yarn)"
This article is very short; it roughly estimates the cost of creating each triple.
The knowledge graph databases compared are of two broad types, manually created and automatically created. Manual creation means producing triples by human labor; automatic creation means writing code that automatically extracts triples from data.
Manual creation (estimated from investment or hourly wages):
cost per assertion = total cost / number of assertions
Automatic creation (estimated from lines of code (LOC) per hour and hourly wages):
Besides cost, data quality also matters.
As can be seen, manually created KGs generally have lower error rates.
One-sentence summary:
Different NLP tasks annotate and design relations differently, and it is a poor design that annotated data cannot be reused once the task changes. This paper therefore proposes a general-purpose annotation framework that can express relations across different NLP tasks.
Resources:
Keywords:
Notes:
This paper mainly targets the design of SRL tasks and has little to do with RE; it also covers event extraction.
Model diagram:
The framework is XML-based, though; judging by current trends, JSON would be better.
Results:
Papers to read next:
Treats both seed selection and noise reduction as ranking tasks.
Bootstrapping for relation extraction: initialized by a small set of example instances called seeds, to represent a particular semantic relation, the bootstrapping system operates iteratively to acquire new instances of a target relation. Selecting "good" seeds is one of the most important steps to reduce semantic drift, which is a typical phenomenon of the bootstrapping process. For each relation a set of seed examples is selected to represent it, and choosing good seeds is critical: with poor seeds, the trained result naturally cannot be good either.
distant supervision: does not require any labels on the text. The assumption of DS is that if two entities participate in a known Freebase relation, any sentence that contains those two entities might express that relation. However, this technique often introduces noise to the generated training data. As a result, DS is still limited by the quality of training data, and noise existing in positively labeled data may affect the performance of supervised learning. Put simply, DS assumes that as long as a sentence contains the two entities, it expresses their fixed relation; in reality, even when two entities co-occur, their relation can vary, so DS introduces noise — the labels themselves can be wrong.
This paper proposes automatic seed selection and noise reduction by treating both as ranking problems under ranking criteria, inspired by the HITS algorithm.
Hub pages and authority pages are the two most basic notions in HITS. An "authority" page is a high-quality page related to some domain or topic: in the search engine domain, the Google and Baidu homepages are that domain's high-quality pages; in the video domain, the Youku and Tudou homepages are.
A "hub" page is one containing many links to high-quality authority pages; the hao123 homepage, for instance, can be regarded as a typical high-quality hub page.
Figure 1 gives a hub-page example: a page maintained by Stanford's computational linguistics group, collecting high-quality resources related to statistical natural language processing, including well-known open-source packages and corpora, and linking out to those resource pages. This page can be regarded as a hub for the field of natural language processing, and correspondingly, the resource pages it points to are mostly high-quality authority pages.
The basic idea of the algorithm: mutual reinforcement.
Basic assumption 1: a good authority page is pointed to by many good hub pages.
Basic assumption 2: a good hub page points to many good authority pages.
In short, it is another method for scoring web pages, in the same family as PageRank.
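A compact sketch of the HITS iteration those two assumptions yield (plain NumPy; A is the link adjacency matrix):

```python
import numpy as np

def hits(A, iterations=50):
    # A[i, j] = 1 if page i links to page j.
    n = A.shape[0]
    hubs, auths = np.ones(n), np.ones(n)
    for _ in range(iterations):
        auths = A.T @ hubs          # a good authority is cited by good hubs
        hubs = A @ auths            # a good hub cites good authorities
        auths /= np.linalg.norm(auths)
        hubs /= np.linalg.norm(hubs)
    return hubs, auths
```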
Formulation as Ranking Tasks: As we can see from the task definitions above, both seed selection and noise reduction are the task of selecting triples from a given collection. Indeed, the two tasks essentially have a similar goal in terms of the ranking-based perspective. We thus formulate them as the task of ranking instances (in seed selection) or triples (in noise reduction), given a set of (possibly noisy) triples. In the seed selection task, we use the k highest ranked instances as the seeds for bootstrapping RE. Likewise, in noise reduction for DS, we only use the k highest ranked triples from the DS-generated data to train a classifier. Note that the value of k in noise reduction may be much larger than in seed selection.
An article investigating whether active learning is really useful for NLP annotation.
https://arxiv.org/abs/1807.04801
12 Jul 2018
Active learning uses a model A, equipped with a score function, to pick out samples "worth" annotating, which are then used to train a separate model S.
Overall, even randomly picking samples to train S may work better than having A pick the samples for S, so the effectiveness of active learning is doubtful.
For the new problem of lifelong learning in relation extraction, proposes an alignment model that reduces the distortion of the embedding space during learning, addressing the forgetting problem in lifelong learning.
Keywords:
Main problems:
Current RE extracts relations from a fixed, given relation set, which is inconvenient for real-world use, since in reality relations keep being added.
So an evolving system that keeps learning from dynamic data is needed; this is the background of lifelong learning / continual learning (isn't this just online learning?). A learning agent learns from a series of tasks, each containing data for different relations.
But it is hard to integrate the new data with the old data and retrain on the combined data.
Lifelong learning:
Learn the tasks incrementally while preventing catastrophic forgetting, because a model forgets previous knowledge while learning a new task.
Solutions:
But this work tried the two approaches above, with poor results.
Current LL methods only constrain the model parameter space or the gradient space, not the feature or embedding space, so the embedding space gets distorted while training on a new task.
So our hypothesis is that this distortion makes the model forget previously learned knowledge.
To solve this problem, this work proposes an alignment model to anchor the sentence embeddings. Specifically, the alignment model treats the saved data from previous tasks as anchor points and minimizes the distortion of the anchor points in the embedding space in the lifelong relation extraction.
The aligned embedding space is then utilized for relation extraction.
Contributions:
Related paper: EMNLP-2018 FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation #37
For the RE task, current bootstrapping learns extractors in an entity-pair-centered way, which is very noisy. To reduce the noise, extractors are learned jointly with both entity-pair- and template-centered methods, increasing the number of positive samples and raising extractor confidence.
Keywords:
The first paragraph below is taken over directly, as it explains bootstrapping very clearly.
Bootstrapping is a semi-supervised RE method: first, entity pairs of positive seeds are found in the corpus, then converted into extraction patterns. The extractor here is a clustered sample set generated from the corpus, so selecting good seeds is crucial: if the seeds are chosen badly, the generated clustered sample set is also of low quality and pollutes the training data.
So evaluating extractors and raising their confidence is what makes the output quality rise. This paper proposes the Joint Bootstrapping Machine (JBM). Research so far has been entity-pair-centered bootstrapping for RE, while JBM can learn extractors from both entity-pair and template seeds occurring in the samples.
Goal (effect) of the model: raise extractor confidence, improving the scores for non-noisy low-confidence extractors, resulting in higher recall.
Strengthens the ability of ordinary KGE methods (TransE, TransH, DistMult) to learn relations by splitting relations into three levels (relation clusters, relations and sub-relations).
Keywords:
One-sentence summary:
A model built from CNNs and a Transformer that learns the relations between entities in long text.
Uses a Transformer; jointly solves the NER and RE tasks using a cross-entropy loss.
Resources:
Keywords:
Notes:
Problem background: ordinary RE predicts whether a relation exists over a very short sentence containing a single entity pair. This ignores the intrinsic links between different mentions, and the relations between entities across different sentences (somewhat like a coreference problem).
For example, in the BioCreative V CDR dataset, more than 30% of the relations cross sentence boundaries.
Even though the two do not appear in the same sentence, we can still tell that azathioprine can cause the side effect fibrosis.
To facilitate efficient full-abstract relation extraction from biological text, we propose Bi-affine Relation Attention Networks (BRANs), a combination of network architecture, multi-instance and multi-task learning designed to extract relations between entities in biological text without requiring explicit mention-level annotation.
BRANs: a neural network combining multi-instance and multi-task learning.
We synthesize convolutions and self-attention, a modification of the Transformer encoder introduced by Vaswani et al. (2017), over sub-word tokens to efficiently incorporate into token representations rich context between distant mention pairs across the entire abstract.
Combines CNN with a Transformer variant and learns embeddings at the sub-word level.
Model diagram:
Results:
Follow-up work: #99