GithubHelp home page GithubHelp logo

cwnsensetagger's Introduction

Word Sense Disambiguaion by Chinese Word Net

Chinese word sense disambiguation has been known to a very difficult problem since Chinese is a complicated language. A word can have dozens or even hundreds of meanings on different occasions. Manually labels the senses of the words is labor-intensive and inefficient.

In this project, we aim to solve this problem by state-of-the-art Bert model. It gives us huge performance gains and can score roughly 82% accuracy on Chinese word sense disambiguation problem.

Prerequest

  • Input should be tokenized first. POS Tagging is preferred but not required.
  • Suppose we have m sentences and each sentence has $n_m$ words.
    • list_of_sentence[ [list_of_word[[target, pos, sense_id, sense] * $n_m$ ] *m ]

    • The following is an example that has 2 sentences, input data should be formed as following

        [[["他","Nh","",""],["由","P","",""],["昏沈","VH","",""],["的","DE","",""],["睡夢","Na","",""],["中","Ng","",""],["醒來","VH","",""],[",","COMMACATEGORY","",""]],
         [["臉","Na","",""],["上","Ncd","",""],["濕涼","VH","",""],["的","DE","",""],["騷動","Nv","",""],["是","SHI","",""],["淚","Na","",""],["。","PERIODCATEGORY","",""]]]
      

How to get sense

  • At Project root directory (same as setup.py)

      pip3 install .
      import CWN_WSD
      data = read_somewhere() #list of sentence, and sentence is composed as list of word
      sense = CWN_WSD.wsd(data)
    
  • example can be found under example folder

Acknowledgement

We thank Po-Wen Chen ([email protected]) and Yu-Yu Wu ([email protected]) for contributions in model development.

cwnsensetagger's People

Contributors

duck105 avatar seantyh avatar yuyuwu5 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.