
Annotating your own dataset · about deepke · 2 comments · CLOSED

zjunlp commented on May 12, 2024
Annotating your own dataset

from deepke.

Comments (2)

yuwl798180 commented on May 12, 2024

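Hello. Annotating data for relation extraction is very time-consuming.
I did not label the demo data myself; it is a sample drawn from a Baidu dataset. For details on using that dataset, see the website.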

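As for the labeling process itself:
The usual approach today is distant supervision. You first align known triples against text from the web to get a rough corpus.
For example, to collect sentences for triples of the form school-isLocatedIn-place, you first need a triple such as relation: isLocatedIn (head 清华大学 [Tsinghua University], tail 北京 [Beijing]). You then align this triple against sentences from the web: if a sentence mentions both 清华大学 and 北京, it is assumed by default to be a valid example.
This will inevitably produce mislabeled sentences, because words can be ambiguous and because the head and tail in a sentence are not necessarily describing that relation.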
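The alignment step described above can be sketched roughly as follows. This is a minimal illustration, not deepke's code: the `Triple` type, the `align` function, and the sample sentences are all hypothetical, and plain substring matching stands in for whatever entity matching a real pipeline would use.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A known fact used as the seed for distant supervision (hypothetical type)."""
    head: str
    relation: str
    tail: str


def align(triples: List[Triple], sentences: List[str]) -> List[Dict[str, str]]:
    """Label a sentence with a triple's relation whenever it mentions both entities."""
    samples = []
    for sentence in sentences:
        for t in triples:
            # Distant supervision assumption: co-occurrence implies the relation.
            if t.head in sentence and t.tail in sentence:
                samples.append({
                    "sentence": sentence,
                    "head": t.head,
                    "tail": t.tail,
                    "relation": t.relation,
                })
    return samples


triples = [Triple("清华大学", "isLocatedIn", "北京")]
sentences = [
    "清华大学位于北京。",                      # states the relation
    "清华大学的代表团昨天访问了北京的一家企业。",  # mentions both, but is noisy
    "上海今天下雨。",                          # mentions neither
]

for sample in align(triples, sentences):
    print(sample["relation"], sample["sentence"])
```

The second sentence is matched even though it does not say where the university is located, which is the kind of mislabeling that later manual checking has to catch.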

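Once you have the distantly supervised corpus, the only thing left is manual verification.
In each labeled sentence, mark not only head and tail but preferably also head_offset and tail_offset; these two values pin down the correct entity position. For example, in 南京南站位于南京的东北处 ("Nanjing South Railway Station is in the northeast of Nanjing"), labeling 南京 (Nanjing) requires deciding which occurrence of 南京 is meant.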
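To see why the offsets matter, the sketch below lists every position at which an entity string occurs in a sentence. The helper name `find_offsets` is hypothetical, and offsets are counted in characters as an assumption; the field names `head_offset` and `tail_offset` come from the comment above.

```python
from typing import List


def find_offsets(sentence: str, entity: str) -> List[int]:
    """Return the character offset of every occurrence of entity in sentence."""
    offsets = []
    start = sentence.find(entity)
    while start != -1:
        offsets.append(start)
        # Continue searching one character further to catch later occurrences.
        start = sentence.find(entity, start + 1)
    return offsets


sentence = "南京南站位于南京的东北处。"
print(find_offsets(sentence, "南京南站"))  # → [0]
print(find_offsets(sentence, "南京"))      # → [0, 6]

# The occurrence at 0 lies inside the head 南京南站, so the tail is the one at 6.
record = {
    "sentence": sentence,
    "head": "南京南站", "head_offset": 0,
    "tail": "南京", "tail_offset": 6,
}
```

Without `tail_offset`, a consumer of this record could not tell whether the tail refers to the city or to the first two characters of the station name.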

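The other field is head_type. Whether you label it separately is not very important, since it can be indicated in the relation. Its purpose is for training.
To help the model learn the sentence pattern, the entities are often replaced with their types, and the improvement is very noticeable.
For example, the sentence 张三是《太极》的主演 ("Zhang San is the lead actor of Tai Chi") becomes person是movie的主演 ("person is the lead actor of movie") after replacement. The second sentence clearly expresses the pattern better.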
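A minimal sketch of this type replacement, using offsets to locate each entity, might look like the following. The function name `replace_with_types` and the span format are hypothetical. To reproduce the example output above, the movie span is assumed to include the title marks 《》.

```python
from typing import List, Tuple

# Each span is (offset, entity_text, type_name); this layout is an assumption.
Span = Tuple[int, str, str]


def replace_with_types(sentence: str, spans: List[Span]) -> str:
    """Replace each entity span in sentence with its type name."""
    # Work from right to left so earlier offsets stay valid after each edit.
    for offset, text, type_name in sorted(spans, reverse=True):
        end = offset + len(text)
        if sentence[offset:end] != text:
            raise ValueError(f"{text!r} not found at offset {offset}")
        sentence = sentence[:offset] + type_name + sentence[end:]
    return sentence


spans = [
    (0, "张三", "person"),    # head and its type
    (3, "《太极》", "movie"),  # tail span includes the title marks
]
print(replace_with_types("张三是《太极》的主演", spans))  # → person是movie的主演
```

Checking the text at each offset before replacing guards against stale or miscounted offsets in the annotation file.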

Of course, data labeled by distant supervision can also be used for training directly; there are many papers that do this.


tantingting1012 commented on May 12, 2024

OK, thanks. May I ask when the prediction code for this project will be released?

