
xiaofei05 / distant-supervised-chinese-relation-extraction


Chinese relation extraction based on distant supervision

License: MIT License

Python 82.72% Jupyter Notebook 17.28%

distant-supervised-chinese-relation-extraction's People

Contributors

xiaofei05

distant-supervised-chinese-relation-extraction's Issues

Running test_demo: how do I convert the entpair in pred_result to Chinese?

As the title says, I ran test_demo and read the entpair values from pred_result:
[screenshot]
The results look like this:

b'10001#27142'
10001#27142

b'10001#27142'
10001#27142

b'10001#27142'
10001#27142
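What are these numbers 10001 and 27142, and how do I convert them to Chinese characters? They look like word IDs. Does 10001 correspond to one entity and 27142 to another? Where can I find the words these IDs map to?
@Xiaolalala Thanks.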
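nrekit appears to come from OpenNRE, where, as far as I know, the entpair key is the head entity's ID and the tail entity's ID joined by `#`. Under that assumption, 10001 is the head entity ID and 27142 the tail entity ID, and the names can be recovered from the dataset file that supplied them. The sketch below assumes OpenNRE-style JSON instances whose `head` and `tail` objects carry `word` and `id` fields; `build_id2name` and `decode_entpair` are hypothetical helpers, and the example words are made up.

```python
from typing import Dict, Iterable, Tuple, Union


def build_id2name(instances: Iterable[dict]) -> Dict[str, str]:
    """Map entity IDs to surface words, assuming OpenNRE-style head/tail objects."""
    id2name: Dict[str, str] = {}
    for ins in instances:
        for role in ("head", "tail"):
            ent = ins[role]
            id2name[str(ent["id"])] = ent["word"]
    return id2name


def decode_entpair(entpair: Union[bytes, str], id2name: Dict[str, str]) -> Tuple[str, str]:
    """Turn b'10001#27142' (or '10001#27142') into (head_word, tail_word).

    IDs that have no entry in id2name are returned unchanged.
    """
    if isinstance(entpair, bytes):
        entpair = entpair.decode("utf-8")
    head_id, tail_id = entpair.split("#", 1)
    return id2name.get(head_id, head_id), id2name.get(tail_id, tail_id)


if __name__ == "__main__":
    # Hypothetical instance; in practice, load the same test JSON file test_demo reads.
    data = [{"sentence": "...", "head": {"word": "北京", "id": "10001"},
             "tail": {"word": "中国", "id": "27142"}, "relation": "NA"}]
    print(decode_entpair(b"10001#27142", build_id2name(data)))
```

Building the mapping once from the test file and reusing it for every prediction avoids rescanning the data per entpair.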

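Relation direction when building the dataset

"The filtered entity set is combined pairwise, and the data are processed into the format [[sentence, entity_head, entity_tail, [sentence_seg]]]."

This pairwise combination seems to correspond to the following part of SentenceSegment.py: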

        if len(new_eset)>1:
            for i in new_eset:
                for j in new_eset:
                    if j!=i:
                        new_data.append([d[0], i, j, sen_seg])

Is this (i, j) or (j, i)? Why does the direction of the relation not need to be considered?
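Because both loops run over all of new_eset and only skip `j == i`, the code enumerates ordered pairs: for any two distinct entities A and B it appends both `[sentence, A, B, seg]` and `[sentence, B, A, seg]`. The sketch below checks this on a toy list. The helper names are hypothetical, and the equivalence with `itertools.permutations` holds only when the entities are unique, as they are in a set.

```python
from itertools import permutations
from typing import Hashable, Iterable, List, Tuple


def pairs_nested_loop(eset: Iterable[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """Same pairing logic as the SentenceSegment.py loop, returning (i, j) tuples."""
    items = list(eset)
    pairs = []
    for i in items:
        for j in items:
            if j != i:
                pairs.append((i, j))
    return pairs


def pairs_permutations(eset: Iterable[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """Ordered pairs of distinct positions; equals the loop above for unique entities."""
    return list(permutations(list(eset), 2))


if __name__ == "__main__":
    ents = ["北京", "中国", "首都"]
    # Both orders of every pair appear, e.g. ('北京', '中国') and ('中国', '北京').
    print(pairs_nested_loop(ents))
```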

KeyError: 'BaiduTAG' still occurs (Python 3.7)

KeyError Traceback (most recent call last)
in
     19         if rel in value:
     20             tp = key.split('_')[0].split('2')
---> 21             if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
     22                 if tp[1]=='oth':
     23                     can.append(name[key])

KeyError: 'BaiduTAG'
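The traceback means `entities[can[1]]` was found but has no `'BaiduTAG'` field, so the bracket lookup raises. A minimal defensive sketch, assuming `entities` is a dict of per-entity dicts and that entities without a tag can simply be skipped (`get_baidu_tag` and `missing_tag_entities` are hypothetical helpers, not part of the repository):

```python
from typing import Any, Dict, List, Optional

Entities = Dict[str, Dict[str, Any]]


def get_baidu_tag(entities: Entities, name: str) -> Optional[Any]:
    """Return the entity's 'BaiduTAG' value, or None if the entity or the field is missing."""
    record = entities.get(name)
    if record is None:
        return None
    return record.get("BaiduTAG")


def missing_tag_entities(entities: Entities) -> List[str]:
    """List entity names with no 'BaiduTAG' field, to see how many are affected."""
    return [name for name, record in entities.items() if "BaiduTAG" not in record]


# Inside the add_relation loop, the failing line could then become:
#     tag = get_baidu_tag(entities, can[1])
#     if tag is not None and check(tag, tgs[tp[0]]):
#         ...
```

If `missing_tag_entities` returns most of the dictionary, the field was probably dropped upstream rather than absent for a handful of entities, and skipping would hide the real problem.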

In add_relation, tail is not defined, or the key is wrong

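Following the author's README step by step, I reached the annotation part of the add_relation program and got the same error as in the comments: tail is not defined. I printed can and found that it is a single sentence, which means each element of the data list is a sentence. In that case I don't understand why the code later uses can[1] as a key into the entities dictionary. Or is my data file wrong in the first place? Could the author explain what can represents, and whether data should be a list with one sentence per element?
In addition, the author left a comment at this spot:
[screenshot]
I tried to guess its meaning but could not work out the intent, because later on
[screenshot]
tail is referenced, which suggests that tail is equal to some of the values in the entities dictionary. That is where I get lost. I would appreciate a hint from the author, thanks.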
@Xiaolalala
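If data is meant to hold the records produced by the dataset-building step, in the format [[sentence, entity_head, entity_tail, [sentence_seg]]] quoted in the issue on relation direction, then each can would be a list whose first four elements are those fields, and can[1] would be the head entity used as a key into entities. That is an assumption about what add_relation expects, not something the author has confirmed. A small shape check such as the hypothetical `unpack_record` below fails early with a clear message when can is a bare sentence instead:

```python
from typing import Any, List, Sequence, Tuple


def unpack_record(can: Sequence[Any]) -> Tuple[str, str, str, List[str]]:
    """Split an assumed [sentence, entity_head, entity_tail, sentence_seg, ...] record."""
    # A plain string is also a Sequence (and usually longer than 4), so reject it explicitly.
    if isinstance(can, str) or len(can) < 4:
        raise ValueError(
            f"expected [sentence, head, tail, seg, ...], got {type(can).__name__}: {can!r:.60}"
        )
    # Only the first four fields are read; add_relation appends relation names after them.
    sentence, head, tail, seg = can[:4]
    return sentence, head, tail, list(seg)
```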

Question

Hello, I get an error when running the code. Which TensorFlow version are you using?
Traceback (most recent call last):
File "train_demo.py", line 2, in
import nrekit.framework
File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 67, in
class re_framework:
File "/home/mere/Distant-Supervised-Chinese-Relation-Extraction-master/nrekit/framework.py", line 124, in re_framework
optimizer=tf.train.GradientDescentOptimizer,
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'GradientDescentOptimizer'
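The module path `tensorflow_core._api.v2.train` in the error indicates a TensorFlow 2.x install, where `tf.train.GradientDescentOptimizer` no longer exists; the TF1 class is still available as `tf.compat.v1.train.GradientDescentOptimizer`. Pinning a 1.x release is the least invasive fix. The alternative sketched below, untested against this repository, loads the TF1-compatible API when a 2.x version is detected. `needs_compat_v1` and `load_tensorflow_v1_api` are hypothetical helpers, and other TF1-only parts of nrekit may still fail.

```python
def needs_compat_v1(version: str) -> bool:
    """True if a TensorFlow version string such as '2.1.0' is 2.x or later."""
    major = int(version.split(".")[0])
    return major >= 2


def load_tensorflow_v1_api():
    """Return a module exposing the TF1-style API (tf.train.GradientDescentOptimizer etc.).

    TensorFlow is imported lazily, so this file can be loaded without it installed.
    """
    import tensorflow as tf

    if needs_compat_v1(tf.__version__):
        import tensorflow.compat.v1 as tf1

        tf1.disable_v2_behavior()  # restore graph mode, sessions and placeholders
        return tf1
    return tf
```

In nrekit/framework.py (and any other nrekit module that imports TensorFlow), `import tensorflow as tf` would then become `tf = load_tensorflow_v1_api()`, or simply `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`.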

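EntityMatcher matches many single characters and other invalid entities

Because EntityMatcher matches the entity dictionary by brute force, its output contains many invalid strings. For the original poster's example, the matches are:
Sentence: 北京, 简称京, 是中华人民共和国省级行政区、首都、直辖市, 是全国的政治、文化中心。 (Beijing, abbreviated as 京, is a provincial-level administrative region, the capital, and a directly administered municipality of the People's Republic of China, and the country's political and cultural center.)
Matched entities: ['称', '省级行政区', '辖', '直辖', '的', '治', '中心', '中', '区', '民', '政治', '共', '政', '国', '人民', '直辖市', '华人', '化', '北', '中华人民共和国', '行', '是', '直', '中华人民', '级']
There are even far more matches than words produced by word segmentation...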
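One way to prune such matches, sketched here as an assumption rather than the repository's method, is to keep only dictionary hits of two or more characters whose start and end both fall on word-segmentation boundaries, so fragments like 华人 inside 中华人民共和国 are discarded. `filter_by_segmentation` is a hypothetical helper, and it assumes the tokens concatenate back to the original sentence.

```python
from typing import Iterable, List


def filter_by_segmentation(matched: Iterable[str], tokens: List[str], min_len: int = 2) -> List[str]:
    """Keep matched entities of at least min_len chars that align with token boundaries."""
    sentence = "".join(tokens)
    # Character offsets where some token starts or ends.
    boundaries = {0}
    pos = 0
    for tok in tokens:
        pos += len(tok)
        boundaries.add(pos)

    kept = []
    for ent in matched:
        if len(ent) < min_len:
            continue  # drops single characters such as 的, 北, 中
        start = sentence.find(ent)
        # Accept the entity if any occurrence starts and ends on a boundary.
        while start != -1:
            if start in boundaries and start + len(ent) in boundaries:
                kept.append(ent)
                break
            start = sentence.find(ent, start + 1)
    return kept
```

The trade-off is that legitimate one-character entities (such as the abbreviation 京) are lost, and recall depends on the segmenter agreeing with the entity dictionary; `min_len` can be lowered if that matters more than noise.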

Data processing: entities

2. Keep only triples whose elements are all Chinese


all_chinese = re.compile('^[\u4e00-\u9fa5]*$')
new_data = []
for triple in data:
    if bool(re.search(all_chinese, triple[0])):
        if bool(re.search(all_chinese, triple[1])):
            if bool(re.search(all_chinese, triple[2])):
                new_data.append(triple)
len(new_data)
Doesn't this remove every entity that contains English characters, including BaiduTAG and BaiduCARD? Nothing after this step runs correctly.
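Assuming BaiduTAG and BaiduCARD appear as the predicate (`triple[1]`) of the raw triples, the filter above rejects them, because the pattern only matches strings made entirely of characters in the range U+4E00 to U+9FA5. That would be consistent with the `KeyError: 'BaiduTAG'` reports in the other issues. A sketch that exempts those predicates from the check (`KEEP_PREDICATES` and `filter_triples` are hypothetical names):

```python
import re
from typing import List, Sequence

# Matches strings made only of CJK Unified Ideographs U+4E00..U+9FA5 (empty string included).
ALL_CHINESE = re.compile(r"^[\u4e00-\u9fa5]*$")
# Assumption: these are the predicate names that add_relation reads later.
KEEP_PREDICATES = {"BaiduTAG", "BaiduCARD"}


def is_all_chinese(text: str) -> bool:
    return ALL_CHINESE.match(text) is not None


def filter_triples(data: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Keep all-Chinese triples, plus BaiduTAG/BaiduCARD triples whose subject is Chinese."""
    new_data = []
    for triple in data:
        subj, pred, obj = triple[0], triple[1], triple[2]
        if not is_all_chinese(subj):
            continue
        if pred in KEEP_PREDICATES or (is_all_chinese(pred) and is_all_chinese(obj)):
            new_data.append(triple)
    return new_data
```

The object of a BaiduTAG or BaiduCARD triple is left unchecked here, since card descriptions normally contain punctuation that the all-Chinese pattern would reject.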

add_relation.ipynb

     15         if rel in value:
     16             tp = key.split('_')[0].split('2')
---> 17             if check(entities[can[1]]['BaiduTAG'], tgs[tp[0]]):
     18                 if tp[1]=='oth':
     19                     can.append(name[key])

KeyError: 'BaiduTAG'
