kmounlp / ner

A technical report on standardizing Korean named-entity definitions and tagging, and a named-entity morpheme corpus built on the basis of that report

License: Other

Python 98.47% Perl 1.53%

ner's Issues

Data errors

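Looking at the data, some records have errors in the second and third fields.
Where the third field, tags, begins with '+', a tag is missing. On closer inspection,
these cases look correctable with roughly the following rules.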

if tags[0] == '+':
    if morphi == '봤':
        morphs = '보+았'
        tags   = 'VV+EP'
    if morphi == '됐':
        morphs = '되+었'
        tags   = 'VV+EP'
    if morphi == '되':
        morphs = '되'
        tags   = 'VV'
    if morphi == '했':
        morphs = '하+였'
        tags   = 'VV+EP'
    if morphi == '했었':
        morphs = '하+였었'
        tags   = 'VV+EP'
    if morphi == '왔':
        morphs = '오+았'
        tags   = 'VV+EP'
    if morphi == '왔었':
        morphs = '오+았었'
        tags   = 'VV+EP'
    if morphi == '와':
        morphs = '오+아'
        tags   = 'VV+EC'
    if morphi == '와야':
        morphs = '오+아야'
        tags   = 'VV+EC'
    if morphi == '와서':
        morphs = '오+아서'
        tags   = 'VV+EC'
    if morphi == '컸':
        morphs = '크+었'
        tags   = 'VA+EP'
    if morphi == '커서':
        morphs = '크+어서'
        tags   = 'VA+EC'
    if morphi == '커':
        morphs = '크+어'
        tags   = 'VA+EC'
    if morphi == '줬':
        morphs = '주+었'
        tags   = 'VV+EP'
    if morphi == '졌':
        morphs = '지+었'
        tags   = 'VX+EP'
    if morphi == '써야':
        morphs = '쓰+어야'
        tags   = 'VV+EC'
    if morphi == '써서':
        morphs = '쓰+어서'
        tags   = 'VV+EC'
    if morphi == '써':
        morphs = '쓰+어'
        tags   = 'VV+EC'
    if morphi == '써도':
        morphs = '쓰+어도'
        tags   = 'VV+EC'
    if morphi == '썼':
        morphs = '쓰+었'
        tags   = 'VV+EP'
    if morphi == '쐈':
        morphs = '쏘+았'
        tags   = 'VV+EP'
    if morphi == '꿨':
        morphs = '꾸+었'
        tags   = 'VV+EP'
    if morphi == '쳤':
        morphs = '치+었'
        tags   = 'VV+EP'
    if morphi == '췄':
        morphs = '추+었'
        tags   = 'VV+EP'
    if morphi == '놨':
        morphs = '노+았'
        tags   = 'VV+EP'
    if morphi == '겠':
        morphs = '겠'
        tags   = 'EP'
    if morphi == '퍼':
        morphs = '푸+어'
        tags   = 'VV+EC'
    if morphi == '뒀':
        morphs = '두+었'
        tags   = 'VV+EP'
    if morphi == '꼈':
        morphs = '끼+었'
        tags   = 'VV+EP'
    if morphi == '떴':
        morphs = '뜨+었'
        tags   = 'VV+EP'
    if morphi == '떠':
        morphs = '뜨+어'
        tags   = 'VV+EC'
...
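
The rule list above could also be written as a lookup table, which may be easier to extend and review. This is only a sketch under assumptions: the names CORRECTIONS and fix_fields are hypothetical, only a handful of the listed rules are copied in (the original list is elided), and the sample inputs at the bottom are made up for illustration rather than taken from the corpus.

```python
# Hypothetical table-driven form of the correction rules quoted in the issue.
# Key: the form compared against `morphi`; value: (corrected morphs, corrected tags).
# Only a few of the listed rules are reproduced here.
CORRECTIONS = {
    "봤": ("보+았", "VV+EP"),
    "됐": ("되+었", "VV+EP"),
    "와서": ("오+아서", "VV+EC"),
    "컸": ("크+었", "VA+EP"),
    "겠": ("겠", "EP"),
}


def fix_fields(morphi, morphs, tags):
    """Return (morphs, tags), corrected only when tags starts with '+'
    and a rule exists for morphi; otherwise return the inputs unchanged."""
    # startswith() also tolerates an empty tags field, where tags[0] would raise.
    if tags.startswith("+") and morphi in CORRECTIONS:
        return CORRECTIONS[morphi]
    return morphs, tags


# Illustrative inputs (assumed, not real corpus records).
print(fix_fields("봤", "봤", "+EP"))
print(fix_fields("학교", "학교", "NNG"))
```

Forms with no rule in the table are returned unchanged, so records that still begin with '+' after this pass can be collected and reviewed by hand.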

Add a license

This data is very useful, but without a license it currently cannot legally be used for anything. Would you be willing to license the data? Standard open licenses for data include the Creative Commons licenses. For more information, see here.

If you're willing, I can submit a pull request with a LICENSE.

Hello!!

I am asking because this corpus has no license policy.
Is it free to use?
