kyle-bong / level2_dataannotation_nlp-level2-nlp-04

This project forked from boostcampaitech4lv23nlp1/level2_dataannotation_nlp-level2-nlp-04

level2_dataannotation_nlp-level3-nlp-04 created by GitHub Classroom

level2_dataannotation_nlp-level2-nlp-04's Introduction

Data Annotation for Relation Extraction - Olympic 🏆

Team composition and roles

  • Everyone: defining entities and relations, pilot tagging, and main annotation
  • 김해원: writing the guideline
  • 김혜빈: model tuning
  • 박준형: writing the guideline
  • 양봉석: drawing up the relation map
  • 이예령: computing IAA

1. Project overview

1.1. Relation Extraction

  • Relation extraction is the task of classifying the semantic relation between a pair of entities that appear in a single sentence. The pair consists of a subject entity and an object entity, and the goal is to classify the relation that holds between them.
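As a minimal sketch of what one annotated example could look like under this definition, the snippet below models a sentence with a subject entity, an object entity, and a relation label. The class names, field names, entity types (`PER`, `EVT`), and the relation label `per:participated_in` are all hypothetical and are not taken from the project's guideline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the sentence
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    type: str   # entity type label (hypothetical label set)


@dataclass(frozen=True)
class RelationExample:
    sentence: str
    subject: Entity
    obj: Entity
    relation: str  # relation class to be predicted (hypothetical label)


def make_entity(sentence: str, text: str, type_: str) -> Entity:
    """Locate `text` in `sentence` and build an Entity with its offsets."""
    start = sentence.index(text)  # raises ValueError if the span is absent
    return Entity(text=text, start=start, end=start + len(text), type=type_)


sentence = "Kim Yuna won the gold medal at the 2010 Winter Olympics."
example = RelationExample(
    sentence=sentence,
    subject=make_entity(sentence, "Kim Yuna", "PER"),
    obj=make_entity(sentence, "2010 Winter Olympics", "EVT"),
    relation="per:participated_in",
)
```

A classifier for this task would take the sentence plus the two marked spans as input and predict the `relation` field.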

1.2. Dataset

  • Domain: Olympics
  • Source of sentences: Korean Wikipedia (https://ko.wikipedia.org/ CC BY-SA 3.0)
  • Size: 1,091 sentences in total, excluding sentences in which entities could not be identified
  • Train set, Dev set, Test set: split by stratified sampling so that the class distributions are similar.
  • Dataset evaluation:
    • Inter-annotator agreement (IAA): Fleiss' Kappa = 0.911
    • Model tuning result: F1 score = 95.035, AUPRC = 92.903 when trained with klue/roberta-large
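For readers unfamiliar with the agreement score reported above, here is a minimal standard-library sketch of how Fleiss' kappa can be computed. It is not the project's own IAA script. The input layout (one list of labels per sentence, one label per annotator) and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`: one list of labels per item,
    each list holding one label per annotator (same count per item)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # P_i: fraction of annotator pairs that agree on item i.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(per_item) / n_items

    # p_j: share of all ratings given to category j; chance agreement is sum of p_j^2.
    p_e = sum(
        (sum(cnt[cat] for cnt in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1.0:
        return 1.0  # only one category was ever used; kappa is undefined, treated as 1.0 here
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 sentences, 3 annotators, hypothetical relation labels.
ratings = [
    ["host", "host", "host"],
    ["host", "host", "medal"],
    ["medal", "medal", "medal"],
    ["medal", "medal", "medal"],
]
kappa = fleiss_kappa(ratings)  # 23/35, roughly 0.657
```

Values close to 1 indicate agreement well above chance, which is how the 0.911 figure should be read.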


Distribution of relation classes in the Train set, Dev set, and Test set
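A stratified split of the kind described for this dataset can be sketched with the standard library as follows. The 8:1:1 ratio, the seed, and the function name are assumptions; the README does not state the ratios or the tooling that was actually used.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, dev, test) index lists whose label distributions
    approximately match the distribution of `labels`."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, dev, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_dev = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        dev += idxs[n_train:n_train + n_dev]
        test += idxs[n_train + n_dev:]  # whatever remains goes to test
    return train, dev, test


# Toy labels: an imbalanced two-class set (hypothetical relation names).
labels = ["host"] * 80 + ["medal"] * 20
train_idx, dev_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps rare relation classes represented in every split, which a plain random split does not guarantee.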


2. Tagging

2.1. Workflow

  1. Splitting the raw corpus into sentences
    • We split the raw corpus into sentences with the KSS library, then removed empty parentheses, subheadings, and similar debris.
    • When no entity could be found in a sentence, an annotator manually merged it with the preceding or following sentence.
  2. Defining entity types and relations
    • We selected the entity types and relations that appear frequently in the Olympic domain.
  3. Entity tagging and relation tagging (pilot)
    • We ran a pilot tagging round based on the entities and relations selected in step 2.
  4. Checking IAA and the relation distribution, and revising entity types and relations
    • The pilot IAA was already high at 0.838, but we revised the entity types and relations again to make the relation classes more clearly distinguishable.
  5. Main tagging
    • Based on the entities and relations settled in step 4, each annotator tagged entities and relations for 200 sentences, and then also tagged relations on the sentences handled by the other annotators.
  6. Measuring the final IAA
    • IAA among the 5 annotators: 0.911
  7. Fine-tuning on the completed dataset
    • Model training performance (RbertWithLSTM)

      No.  Model               Test micro-F1  Test AUPRC  Epochs  Batch size
      1    klue/bert-base      90.226         95.930      10      16
      2    klue/roberta-large  95.035         92.903      10      16
      3    klue/bert-base      93.525         96.756      10      32
      4    klue/roberta-large  92.308         93.804      10      32
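The cleanup in step 1 of the workflow (removing empty parentheses and subheadings once the corpus has been split into sentences) could look roughly like the sketch below. The input is assumed to be a list of already split sentences, for instance the output of KSS. Both regular expressions are illustrative guesses: in particular, the "== Title ==" heading pattern is an assumption about how subheadings appeared in the raw text.

```python
import re

# "()" or "( )" left behind after markup was stripped from the source text.
EMPTY_PARENS = re.compile(r"\(\s*\)")
# Hypothetical wiki-style subheading such as "== History ==".
HEADING = re.compile(r"^\s*=+\s*[^=]+?\s*=+\s*$")


def clean_sentences(sentences):
    """Drop subheading lines and strip empty parentheses from each sentence."""
    cleaned = []
    for sent in sentences:
        if HEADING.match(sent):
            continue
        sent = EMPTY_PARENS.sub("", sent)
        sent = re.sub(r"\s{2,}", " ", sent).strip()  # collapse gaps left behind
        if sent:
            cleaned.append(sent)
    return cleaned


raw = [
    "== History ==",
    "The 1988 Summer Olympics ( ) were held in Seoul.",
    "   ",
]
result = clean_sentences(raw)
```

Sentences in which no entity can be found would still need the manual merge described in step 1; this sketch does not attempt it.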


Confusion matrix of the best-performing model, 2. klue/roberta-large (highest test micro-F1)
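To make the two evaluation artifacts concrete, the sketch below builds a confusion matrix and computes micro-averaged F1 from gold and predicted labels using only the standard library. It is an illustration, not the project's evaluation code. The `ignore` parameter is an assumption added to allow a negative class such as `no_relation` to be excluded; the README does not say whether this was done. Scores come back in the 0 to 1 range, whereas the table reports them on a 0 to 100 scale.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are gold labels, columns are predicted labels."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(gold, pred)] for pred in labels] for gold in labels]


def micro_f1(y_true, y_pred, ignore=()):
    """Micro-averaged F1 over all classes except those listed in `ignore`."""
    tp = sum(1 for g, p in zip(y_true, y_pred) if g == p and g not in ignore)
    n_pred = sum(1 for p in y_pred if p not in ignore)
    n_gold = sum(1 for g in y_true if g not in ignore)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy predictions with hypothetical relation labels.
labels = ["host", "medal", "no_relation"]
y_true = ["host", "host", "medal", "no_relation"]
y_pred = ["host", "medal", "medal", "no_relation"]

matrix = confusion_matrix(y_true, y_pred, labels)
score_all = micro_f1(y_true, y_pred)
score_no_neg = micro_f1(y_true, y_pred, ignore=("no_relation",))
```

Off-diagonal cells of `matrix` show which relation classes get confused with each other, which is the main use of the confusion matrix figure.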

2.2. Tools

  • Tagtog

    • Used for entity tagging.
    • We marked the spans of the Subject and Object, then tagged their types.
      • The result was converted to a CSV file and moved to a spreadsheet for relation tagging.
  • Spreadsheet

    • Used for relation tagging.
    • On the Tagtog output, we tagged the relation between the Subject Entity and the Object Entity.
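The hand-off between the two tools (entity-tagged sentences exported to CSV, then opened in a spreadsheet for relation tagging) might be sketched as below. The record layout and column names are hypothetical. Tagtog's real export format is not described in this README, so parsing it is left out and the records are assumed to be plain dictionaries already.

```python
import csv
import io

# Hypothetical column layout; "relation" is left empty for annotators to fill in.
FIELDS = ["sentence", "subject_entity", "subject_type",
          "object_entity", "object_type", "relation"]


def to_csv(records):
    """Serialize entity-tagged records to CSV text with a relation column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({**rec, "relation": rec.get("relation", "")})
    return buffer.getvalue()


records = [{
    "sentence": "In 1988, Seoul hosted the Summer Olympics.",
    "subject_entity": "Seoul",
    "subject_type": "LOC",
    "object_entity": "Summer Olympics",
    "object_type": "EVT",
}]
csv_text = to_csv(records)
```

Using `csv.DictWriter` rather than manual string joins keeps sentences that contain commas or quotes intact when the file is opened in a spreadsheet.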

3. Relation map


4. Guideline


5. Self-evaluation

  • By doing the tagging ourselves, we could discuss the parts that were ambiguous or contentious, agree on rules for them, and write those rules into the guideline. The resulting clear guideline let us build a dataset with a high IAA.
  • Through cross-review, every team member annotated the entire dataset once, and the issues that came up during review were resolved through discussion.
  • We held discussions after pilot tagging to account for class imbalance as much as possible when defining the relations, but regrettably the imbalance could not be fully resolved.
  • Because the distinctions among the entity types and relations we defined were clear, the IAA came out high.
