xiaofei05 / distant-supervised-chinese-relation-extraction
Distant-supervised Chinese relation extraction
License: MIT License
b'10001#27142'
10001#27142
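What are 10001 and 27142, and how can they be converted to Chinese characters? They look like word IDs. Does 10001 correspond to one entity and 27142 to another? Where can I find the words that these IDs map to?
@Xiaolalala Thanks.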
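A minimal sketch of how such a value could be decoded, assuming (the thread does not confirm this) that the two numbers are IDs separated by `#` and that an ID-to-text mapping exists somewhere in the data. The `id2text` dictionary and its placeholder value are hypothetical; the real mapping would have to come from whichever file the dataset ships with.

```python
def parse_id_pair(raw):
    """Split a value such as b'10001#27142' into two integer IDs."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    left, right = raw.split("#")
    return int(left), int(right)


def lookup(pair, id2text):
    """Map each ID to text, marking IDs that are missing from the mapping."""
    return tuple(id2text.get(i, "<unknown:%d>" % i) for i in pair)


# Hypothetical mapping: the placeholder text is not taken from the repository.
id2text = {10001: "entity_a"}

pair = parse_id_pair(b"10001#27142")
print(pair)
print(lookup(pair, id2text))
```

Marking unknown IDs instead of raising makes it easier to see which half of the pair is missing from the mapping.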
NameError Traceback (most recent call last)
in
3 for can in data:
4 # can [sentence, head, tail, segment]
----> 5 if tail not in entities[can[1]].values():
6 can.append('NA')
7 processed_data.append(can)
NameError: name 'tail' is not defined
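One plausible reading, based only on the comment `# can [sentence, head, tail, segment]` in the traceback, is that `tail` was meant to be `can[2]`. The sketch below assumes that, and also assumes `entities` maps a head entity to a dictionary of attribute values; the function name and the sample data are hypothetical. It also rejects candidates that are plain sentences, since a malformed `data` file would otherwise fail later in a less obvious way.

```python
def label_unmatched_as_na(data, entities):
    """Append 'NA' to candidates whose tail is not a known value of the head."""
    processed_data = []
    for can in data:
        # Assumed layout: can == [sentence, head, tail, segment]
        if not isinstance(can, (list, tuple)) or len(can) < 4:
            raise ValueError("expected [sentence, head, tail, segment], got %r" % (can,))
        head, tail = can[1], can[2]  # assumption: tail is the third element
        if tail not in entities.get(head, {}).values():
            labeled = list(can) + ["NA"]
            processed_data.append(labeled)
    return processed_data


# Hypothetical sample data for illustration only.
entities = {"北京": {"别名": "首都"}}
data = [
    ["北京是首都", "北京", "首都", ["北京", "是", "首都"]],
    ["北京和上海", "北京", "上海", ["北京", "和", "上海"]],
]
print(label_unmatched_as_na(data, entities))
```

Whether matched candidates should also be kept in `processed_data` depends on the notebook's original indentation, which the traceback does not preserve.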
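Combine the filtered entity set pairwise and convert the data to the format [[sentence, entity_head, entity_tail, [sentence_seg]]].
This pairwise combination seems to correspond to the following part of SentenceSegment.py:
if len(new_eset) > 1:
    for i in new_eset:
        for j in new_eset:
            if j != i:
                new_data.append([d[0], i, j, sen_seg])
Is this (i, j) or (j, i)? Why does the direction of the relation not need to be considered?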
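For what it is worth, the nested loop quoted above emits every ordered pair, so both (i, j) and (j, i) end up in the candidate list. A small sketch of the same behavior using `itertools.permutations`, assuming the entity collection has no duplicates; the function name and sample values are hypothetical:

```python
from itertools import permutations


def candidate_pairs(sentence, entity_set, sentence_seg):
    """Build one candidate per ordered (head, tail) pair of distinct entities."""
    return [
        [sentence, head, tail, sentence_seg]
        for head, tail in permutations(entity_set, 2)
    ]


# Hypothetical example: two entities yield two candidates, one per direction.
for cand in candidate_pairs("北京是中国的首都", ["北京", "中国"], ["北京", "是", "中国", "的", "首都"]):
    print(cand[1], "->", cand[2])
```

Under this reading the direction is not lost at this step: each direction is a separate candidate, and a later labeling step would have to decide which of them, if any, receives a relation.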
KeyError Traceback (most recent call last)
in
19 if rel in value:
20 tp = key.split('_')[0].split('2')
---> 21 if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
22 if tp[1]=='oth':
23 can.append(name[key])
KeyError: 'BaiduTAG'
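A defensive sketch for this error, assuming `entities` maps an entity name to a dictionary of attributes, as `entities[can[1]]['BaiduTAG']` suggests. The helper name and sample data are hypothetical. It only avoids the crash; it does not explain why the key is missing, which may come from an earlier preprocessing step.

```python
def get_tags(entities, name, key="BaiduTAG"):
    """Return the tag list for an entity, or an empty list if it is absent."""
    return entities.get(name, {}).get(key, [])


# Hypothetical sample data: the second entity has no 'BaiduTAG' entry.
entities = {
    "北京": {"BaiduTAG": ["城市"]},
    "某实体": {"别名": "示例"},
}

for name in ["北京", "某实体", "不存在"]:
    tags = get_tags(entities, name)
    if not tags:
        print(name, "has no BaiduTAG, skipping")
        continue
    print(name, tags)
```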
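Following the author's README step by step, I reached the labeling part of add_relation and hit the same error reported in the other issues: tail is not defined. I printed can
and found that it is a single sentence. In other words, each element of the data list is just a sentence, so I do not understand why the author then uses can[1]
as a key to index the entities dictionary. Or is my data file wrong to begin with? I hope the author can explain what can stands for, or confirm whether data is supposed to be a list whose elements are individual sentences.
In addition,
the author wrote a comment at this point. I tried to guess what it means but could not work out the intent, because later on
this part references tail, which implies that tail is equal to some of the values in the entities dictionary. That left me confused. I would appreciate a hint from the author. Thanks.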
@Xiaolalala
Hello, I get an error when running the code. Which version of tensorflow are you using?
Traceback (most recent call last):
File "train_demo.py", line 2, in
import nrekit.framework
File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 67, in
class re_framework:
File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 124, in re_framework
optimizer=tf.train.GradientDescentOptimizer,
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'GradientDescentOptimizer'
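The traceback shows a TensorFlow 2.x install (`tensorflow_core._api.v2`). `tf.train.GradientDescentOptimizer` is the TensorFlow 1.x location; in 2.x the same class is exposed as `tf.compat.v1.train.GradientDescentOptimizer`. A hedged sketch of a lookup that works with either layout follows. The helper name is hypothetical, and the stand-in objects exist only so the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def resolve_gradient_descent_optimizer(tf):
    """Return the TF1-style GradientDescentOptimizer from either API layout."""
    optimizer = getattr(tf.train, "GradientDescentOptimizer", None)
    if optimizer is not None:
        return optimizer  # TensorFlow 1.x layout
    return tf.compat.v1.train.GradientDescentOptimizer  # TensorFlow 2.x layout


# Stand-ins that mimic the two module layouts (not real TensorFlow objects).
fake_tf1 = SimpleNamespace(train=SimpleNamespace(GradientDescentOptimizer="tf1 class"))
fake_tf2 = SimpleNamespace(
    train=SimpleNamespace(),
    compat=SimpleNamespace(
        v1=SimpleNamespace(train=SimpleNamespace(GradientDescentOptimizer="compat.v1 class"))
    ),
)
print(resolve_gradient_descent_optimizer(fake_tf1))
print(resolve_gradient_descent_optimizer(fake_tf2))
```

With a real install this would be called as `resolve_gradient_descent_optimizer(tf)` after `import tensorflow as tf`. The rest of nrekit may rely on other 1.x-only APIs, so running the project under TensorFlow 1.x could well be the simpler route.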
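Because EntiyMatcher brute-force matches directly against the entity dictionary, its results contain many invalid strings. For the author's example, the matches look like this:
Sentence: 北京, 简称京, 是中华人民共和国省级行政区、首都、直辖市, 是全国的政治、文化中心。
Matched entities: ['称', '省级行政区', '辖', '直辖', '的', '治', '中心', '中', '区', '民', '政治', '共', '政', '国', '人民', '直辖市', '华人', '化', '北', '中华人民共和国', '行', '是', '直', '中华人民', '级']
That is far more than the number of words word segmentation would produce...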
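One possible post-filter, sketched here as a heuristic rather than anything the repository does: drop single-character matches and any match that is a substring of a longer kept match. The function name and the `min_len` parameter are hypothetical.

```python
def filter_matches(matches, min_len=2):
    """Keep matches of at least min_len characters that no longer match contains."""
    kept = []
    # Longest first, so that containing strings are seen before their substrings.
    for m in sorted(set(matches), key=lambda s: (-len(s), s)):
        if len(m) < min_len:
            continue
        if any(m in longer for longer in kept):
            continue
        kept.append(m)
    return kept


matched = ['称', '省级行政区', '辖', '直辖', '的', '治', '中心', '中', '区', '民',
           '政治', '共', '政', '国', '人民', '直辖市', '华人', '化', '北',
           '中华人民共和国', '行', '是', '直', '中华人民', '级']
print(filter_matches(matched))
```

On the list above this leaves five strings: 中华人民共和国, 省级行政区, 直辖市, 中心, 政治. The trade-off is that string containment also drops a short entity that occurs on its own elsewhere in the sentence, so a span-based check would be safer for real data.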
After running it I can only produce the plot; I cannot reproduce the author's results.
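2. Keep only the triples whose elements are all Chinese
In [20]:
import re

# Matches strings made up entirely of Chinese characters (or empty strings).
all_chinese = re.compile('^[\u4e00-\u9fa5]*$')
new_data = []
for triple in data:
    if bool(re.search(all_chinese, triple[0])):
        if bool(re.search(all_chinese, triple[1])):
            if bool(re.search(all_chinese, triple[2])):
                new_data.append(triple)
len(new_data)
This removes every entity that contains English, doesn't it? That includes baidutag and baiducard, and nothing after this step runs correctly.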
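The concern can be checked on a small example. The sketch below assumes triples have the form (head, predicate, value) and that tag information is stored under a predicate such as `BaiduTAG`; both are guesses drawn from this thread, and the whitelist, function names, and sample triples are hypothetical. It compares the strict filter with a relaxed variant that exempts whitelisted predicates.

```python
import re

ALL_CHINESE = re.compile(r'^[\u4e00-\u9fa5]+$')
# Hypothetical whitelist of non-Chinese predicates worth keeping.
KEEP_PREDICATES = {"BaiduTAG", "BaiduCARD"}


def strict_keep(triple):
    """Keep a triple only if all three elements are purely Chinese."""
    return all(ALL_CHINESE.match(part) for part in triple)


def relaxed_keep(triple):
    """Like strict_keep, but the predicate may be a whitelisted name."""
    head, predicate, value = triple
    if predicate in KEEP_PREDICATES:
        return bool(ALL_CHINESE.match(head)) and bool(ALL_CHINESE.match(value))
    return strict_keep(triple)


# Hypothetical sample triples.
triples = [("北京", "BaiduTAG", "城市"), ("北京", "别名", "京"), ("Beijing", "别名", "京")]
print([t for t in triples if strict_keep(t)])
print([t for t in triples if relaxed_keep(t)])
```

The strict filter drops the `BaiduTAG` triple, which would be consistent with the later `KeyError: 'BaiduTAG'` if the tag entries are built from these triples. Whether the author intended the notebook to work that way is not stated.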
add_relation.ipynb
15 if rel in value:
16 tp = key.split('_')[0].split('2')
---> 17 if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
18 if tp[1]=='oth':
19 can.append(name[key])
KeyError: 'BaiduTAG'
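I am using two 1080 Ti GPUs, and even with more than 20 GB of available memory it did not finish running.
tail is not defined
It is basically a horizontal line at y=1.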