Distant-Supervised Chinese Relation Extraction

License: MIT License

Python 82.72% Jupyter Notebook 17.28%

distant-supervised-chinese-relation-extraction's Issues

Running test_demo: how do I convert the entpair in pred_result to Chinese?

As the title says, when I run test_demo and print the entpair of each pred_result,

I get the following:

b'10001#27142'
10001#27142

b'10001#27142'
10001#27142
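What are 10001 and 27142, and how do I convert them into Chinese text? They look like word IDs. Does 10001 correspond to one entity and 27142 to another? Where can I find the words these IDs map to?
@Xiaolalala Thanks.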
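The question above could be approached with a small decoding helper. The following is only a sketch: it assumes the two numbers are entity IDs joined by '#', which the output suggests but the thread does not confirm, and the lookup table id2entity and its contents are hypothetical placeholders for whatever ID mapping the dataset actually provides.

```python
def decode_entpair(entpair, id2entity):
    """Turn an entpair such as b'10001#27142' into a (head, tail) tuple of names."""
    # The demo prints both a bytes value and its decoded string form.
    if isinstance(entpair, bytes):
        entpair = entpair.decode("utf-8")
    head_id, tail_id = entpair.split("#", 1)
    # Fall back to the raw ID when the lookup table has no entry for it.
    return id2entity.get(head_id, head_id), id2entity.get(tail_id, tail_id)


# Hypothetical lookup table; the real one has to come from the dataset files.
id2entity = {"10001": "北京", "27142": "中国"}
print(decode_entpair(b"10001#27142", id2entity))
```

Falling back to the raw ID keeps the helper usable even when only part of the mapping is available.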

Running the add_relation file raises the following error. What is the cause?

NameError                                 Traceback (most recent call last)
in
      3 for can in data:
      4     # can [sentence, head, tail, segment]
----> 5     if tail not in entities[can[1]].values():
      6         can.append('NA')
      7         processed_data.append(can)

NameError: name 'tail' is not defined
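One possible reading of the traceback, going only by the comment in the snippet, is that tail was meant to be the third element of each candidate. The sketch below rests on that assumption and is not the author's confirmed fix; the function name, the sample data, and the use of entities.get (which avoids a KeyError for unknown heads) are all illustrative choices.

```python
def collect_na_candidates(data, entities):
    """Return the candidates whose tail is not a known value of the head entity."""
    processed = []
    for can in data:
        # Assumed layout, taken from the comment: can = [sentence, head, tail, segment]
        head, tail = can[1], can[2]
        # Assumed layout: entities[head] maps attribute names to attribute values.
        if tail not in entities.get(head, {}).values():
            processed.append(can + ["NA"])
    return processed


# Hypothetical sample data for illustration only.
entities = {"北京": {"首都": "中国"}}
data = [
    ["北京是中国的首都", "北京", "中国", ["北京", "是", "中国", "的", "首都"]],
    ["北京和上海", "北京", "上海", ["北京", "和", "上海"]],
]
print(collect_na_candidates(data, entities))
```

If each element of data is a plain sentence string rather than a four-element list, can[1] is just the second character, which would suggest that the earlier pairing step did not run or did not save its output.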

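Relation direction when building the dataset

The filtered entity set is combined pairwise, and the data is processed into the format [[sentence, entity_head, entity_tail, [sentence_seg]]].

This pairwise combination seems to correspond to the following code in SentenceSegment.py: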

        if len(new_eset)>1:
            for i in new_eset:
                for j in new_eset:
                    if j!=i:
                        new_data.append([d[0], i, j, sen_seg])

Is this (i, j) or (j, i)? Why is there no need to consider the direction of the relation?
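The nested loop in the snippet visits every ordered pair of distinct entities, so both (i, j) and (j, i) end up in new_data as separate candidates. A minimal sketch of the same enumeration with itertools.permutations follows; the function name and sample values are hypothetical, and it assumes the entity collection holds no duplicates (with duplicates, the original j != i test compares values while permutations works by position).

```python
from itertools import permutations


def build_candidates(sentence, entity_set, sen_seg):
    """Emit one candidate per ordered (head, tail) pair of distinct entities."""
    return [
        [sentence, head, tail, sen_seg]
        for head, tail in permutations(entity_set, 2)
    ]


# Hypothetical sample values for illustration only.
for candidate in build_candidates("北京是中国的首都", ["北京", "中国"], ["北京", "是", "中国", "的", "首都"]):
    print(candidate[1], "->", candidate[2])
```

Because each direction is its own candidate, a later labeling step can in principle assign a relation to one direction and a different label to the other; whether the repository does this is not stated in the thread.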

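How do I run a single example? For instance, I just want to get the relation for a sentence like 『温瑞安是曾经的武侠大家』. Is that what the instance mode is for? It does not seem to be implemented.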

KeyError: 'BaiduTAG' is still raised (Python 3.7)

KeyError                                  Traceback (most recent call last)
in
     19         if rel in value:
     20             tp = key.split('_')[0].split('2')
---> 21             if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
     22                 if tp[1]=='oth':
     23                     can.append(name[key])

KeyError: 'BaiduTAG'

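In add_relation, tail is not defined, or the key is wrong

Following the author's README step by step, I reached the labeling part of add_relation and hit the same error reported in the other issue: tail is not defined. I printed can and found that it is a single sentence. In other words, each element of the data list is just one sentence, so I do not understand what the author means by using can[1] as a key into the entities dictionary. Or is my data file wrong to begin with? I hope the author can explain what can represents, and whether data is supposed to be a list whose elements are individual sentences.
In addition:

The author wrote a comment at this point. I tried to guess its meaning but could not work out the intent, because later on

tail is referenced here, which suggests that tail equals some of the values in the entities dictionary. This left me confused. I hope the author can give a hint. Thanks.
@Xiaolalala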

Question

Hello, I get an error when running the code. Which version of TensorFlow are you using?
Traceback (most recent call last):
  File "train_demo.py", line 2, in
    import nrekit.framework
  File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 67, in
    class re_framework:
  File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 124, in re_framework
    optimizer=tf.train.GradientDescentOptimizer,
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'GradientDescentOptimizer'
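The module path tensorflow_core._api.v2 in the traceback indicates a TensorFlow 2.x installation, where tf.train.GradientDescentOptimizer is no longer exposed at the top level but remains available as tf.compat.v1.train.GradientDescentOptimizer. The sketch below shows one way to look the class up under either layout. The helper name is hypothetical, the stand-in objects exist only so the example runs without TensorFlow installed, and the rest of nrekit may well depend on other 1.x-only APIs, so this alone may not make the code run.

```python
from types import SimpleNamespace


def resolve_gradient_descent_optimizer(tf_module):
    """Return the GradientDescentOptimizer class from a 1.x or 2.x style module."""
    train = getattr(tf_module, "train", None)
    optimizer = getattr(train, "GradientDescentOptimizer", None)
    if optimizer is not None:
        return optimizer  # TensorFlow 1.x layout
    # TensorFlow 2.x keeps the 1.x optimizers under the compat.v1 namespace.
    return tf_module.compat.v1.train.GradientDescentOptimizer


# Stand-ins that mimic the two module layouts (not real TensorFlow objects).
fake_tf1 = SimpleNamespace(
    train=SimpleNamespace(GradientDescentOptimizer="tf1-optimizer")
)
fake_tf2 = SimpleNamespace(
    train=SimpleNamespace(),
    compat=SimpleNamespace(
        v1=SimpleNamespace(
            train=SimpleNamespace(GradientDescentOptimizer="tf2-compat-optimizer")
        )
    ),
)
print(resolve_gradient_descent_optimizer(fake_tf1))
print(resolve_gradient_descent_optimizer(fake_tf2))

# With a real installation the call would be:
#     import tensorflow as tf
#     optimizer_cls = resolve_gradient_descent_optimizer(tf)
```

Installing a TensorFlow 1.x release is the other obvious route, but the thread does not say which version the author used.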

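EntityMatcher matches many invalid entities such as single characters

Because EntityMatcher brute-force matches directly against the entity dictionary, its results contain many invalid strings. For the repository author's own example, the match result looks like this:
Sentence: 北京, 简称京, 是中华人民共和国省级行政区、首都、直辖市, 是全国的政治、文化中心。
Matched entities: ['称', '省级行政区', '辖', '直辖', '的', '治', '中心', '中', '区', '民', '政治', '共', '政', '国', '人民', '直辖市', '华人', '化', '北', '中华人民共和国', '行', '是', '直', '中华人民', '级']
That is far more than even word segmentation would produce...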
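One way to cut down this noise is to post-filter the matches. The sketch below is a simple heuristic and not part of EntityMatcher: it drops matches shorter than a minimum length and any match that is a substring of another, longer match. The function name and the default min_length are assumptions, and substring containment is only a rough proxy for overlapping spans in the sentence.

```python
def filter_matches(matches, min_length=2):
    """Keep matches of at least min_length that are not contained in another match."""
    candidates = [m for m in matches if len(m) >= min_length]
    return [
        m
        for m in candidates
        if not any(m != other and m in other for other in candidates)
    ]


matched = ['称', '省级行政区', '辖', '直辖', '的', '治', '中心', '中', '区', '民',
           '政治', '共', '政', '国', '人民', '直辖市', '华人', '化', '北',
           '中华人民共和国', '行', '是', '直', '中华人民', '级']
print(filter_matches(matched))
```

A length threshold also discards legitimate single-character entities, so whether it is acceptable depends on the entity dictionary in use.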

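How did the author obtain the model results?

After running the code, I can only produce the plots; I cannot get the same results as the author.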

Data processing, entities

2. Keep only triples whose elements are all Chinese

    import re

    # Matches strings made up entirely of CJK characters (also matches the empty string).
    all_chinese = re.compile('^[\u4e00-\u9fa5]*$')
    new_data = []
    for triple in data:
        # Keep the triple only if all three of its elements are Chinese.
        if (all_chinese.search(triple[0])
                and all_chinese.search(triple[1])
                and all_chinese.search(triple[2])):
            new_data.append(triple)
    len(new_data)
Doesn't this remove every triple that contains English text, including the BaiduTAG and BaiduCARD ones? Nothing after this step runs correctly.

add_relation.ipynb

     15         if rel in value:
     16             tp = key.split('_')[0].split('2')
---> 17             if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
     18                 if tp[1]=='oth':
     19                     can.append(name[key])

KeyError: 'BaiduTAG'
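The suspicion raised in this issue can be checked directly. The sketch below assumes each triple is ordered (subject, predicate, object), which the thread does not confirm; the function name, the check_predicate flag, and the sample triples are hypothetical. It shows that the all-Chinese pattern rejects any triple whose middle element is BaiduTAG, and that exempting the middle element would keep such triples.

```python
import re

# Same pattern as in the notebook cell quoted in this issue.
all_chinese = re.compile('^[\u4e00-\u9fa5]*$')


def keep_triple(triple, check_predicate=True):
    """Return True if the checked elements consist only of Chinese characters."""
    # Assumed layout: triple = (subject, predicate, object).
    positions = (0, 1, 2) if check_predicate else (0, 2)
    return all(all_chinese.search(triple[p]) is not None for p in positions)


# Hypothetical sample triples for illustration only.
triples = [("北京", "BaiduTAG", "城市"), ("北京", "首都", "中国")]
print([keep_triple(t) for t in triples])
print([keep_triple(t, check_predicate=False) for t in triples])
```

If the BaiduTAG triples are filtered out this way, entities[can[1]] never receives a 'BaiduTAG' key, which would be consistent with the KeyError shown above.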

xiaofei05 / distant-supervised-chinese-relation-extraction
