ksenticnet's Introduction

KSenticNet: Korean sentiment dictionary (Korean sentiment resource)

How to use

  • Just download the 'ksenticnet_kaist.py' file :)

Overview

  • Several Korean sentiment analysis resources already exist, such as KNU SentiLex and KOSAC.
  • However, building sentiment lexicons like these requires a lot of time and human effort.
  • So I decided to make the process easier and automated by combining SenticNet with the KAIST Korean WordNet (KWN).

Example Image

[Image: KSenticNet Example]

  • You can get each word's sentic values, sentiments, polarity value, and semantics.
  • I recommend using it together with a POS tagger (such as Kkma).
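
As a rough illustration of combining a POS tagger with the lexicon, here is a minimal sketch. The entry layout (four sentic values, two sentiment tags, polarity label, polarity value, then semantics), the toy values, and the names `toy_lexicon`, `CONTENT_TAGS`, and `lookup` are all assumptions made for this example; check 'ksenticnet_kaist.py' for the real structure. The tagged tokens are written by hand in the (word, tag) shape that a tagger such as Kkma typically returns.

```python
# Hypothetical miniature lexicon: the values below are invented for illustration.
# Assumed entry layout: 4 sentic values, 2 sentiment tags,
# polarity label, polarity value, then related semantics.
toy_lexicon = {
    "행복": ["0.9", "0.1", "0.0", "0.8", "#joy", "#admiration",
             "positive", "0.85", "기쁨", "만족"],
    "슬픔": ["-0.8", "0.0", "0.1", "-0.6", "#sadness", "#fear",
             "negative", "-0.75", "우울", "눈물"],
}

# Tag prefixes treated as content words (nouns, verbs, adjectives).
CONTENT_TAGS = ("NN", "VV", "VA")


def lookup(tagged_tokens, lexicon):
    """Return lexicon information for content words found in the lexicon."""
    results = {}
    for word, tag in tagged_tokens:
        if tag.startswith(CONTENT_TAGS) and word in lexicon:
            entry = lexicon[word]
            results[word] = {
                "sentics": [float(v) for v in entry[:4]],
                "sentiments": entry[4:6],
                "polarity": entry[6],
                "polarity_value": float(entry[7]),
                "semantics": entry[8:],
            }
    return results


# Hand-written (word, tag) pairs standing in for real POS tagger output.
tagged = [("나", "NP"), ("는", "JX"), ("행복", "NNG"), ("하", "XSV"), ("다", "EFN")]
found = lookup(tagged, toy_lexicon)
print(found["행복"]["polarity"], found["행복"]["polarity_value"])
```

Filtering by tag before the lookup keeps particles and endings from being matched against the lexicon by accident.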

Building Process

[Image: KSenticNet Structure]

Features

  • It follows the main process of CSenticNet.
  • However, it resolves the problem of duplicated sentic values for Korean and English words.

Process

  1. Build an {English word : synsets} dictionary from KWN.
  2. Direct mapping: compare each synset's hypernyms with the semantics of SenticNet words and find matching pairs.
  3. Apply the Lesk algorithm to the SenticNet words that were not matched.
  4. During steps 2 and 3, some synsets receive several different sentic values. Take a weighted average of those sentic values based on AffectNet frequencies.
  5. For each Korean word, assign the sentic value that was assigned to its synset.
  6. During step 5, some synsets contain only one Korean word. For those, use the weighted-average sentic value, as in step 4.
  7. During step 5, some Korean words are assigned several different sentic values, but a weighted average cannot be used because each synset contains multiple Korean words. Instead, compute the average cosine similarity * of each synset for that Korean word and use only the most adequate synset to assign the sentic value.
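
The weighted average in step 4 can be sketched as follows. This is only an illustration under assumptions: the function name `weighted_sentics`, the four-dimensional vectors, and the frequency counts are made up for the example, and the real code may weight or normalize differently.

```python
def weighted_sentics(candidates):
    """Frequency-weighted average of sentic vectors.

    candidates: list of (sentic_vector, frequency) pairs, where each
    sentic_vector is a list of floats of the same length.
    """
    total = sum(freq for _, freq in candidates)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    dims = len(candidates[0][0])
    return [
        sum(vec[i] * freq for vec, freq in candidates) / total
        for i in range(dims)
    ]


# Hypothetical synset that received two different sentic vectors,
# with invented frequencies 3 and 1.
candidates = [
    ([0.8, 0.2, 0.0, 0.6], 3),
    ([0.4, 0.0, 0.2, 0.2], 1),
]
avg = weighted_sentics(candidates)
print(avg)  # approximately [0.7, 0.15, 0.05, 0.5]
```

The more frequent candidate dominates the result, so a rare mismatched mapping has limited influence on the final sentic value.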

* Cosine similarity is computed from tuned Korean embedding vectors. The Korean word vectors are tuned with a Context2Vec-style structure, starting from Facebook's pre-trained fastText vectors. For this structure, I scraped example sentences for the target words from several dictionaries. The pre-trained fastText vectors are modified and adjusted by applying a Bi-LSTM, self-attention, and a Neural Tensor Network. With these tuned vectors, we can compute the cosine similarity between a Korean word and the other Korean words in a synset and use the average similarity as an index of 'adequacy'.
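
A minimal sketch of the 'adequacy' selection is shown below. The two-dimensional vectors, the word and synset identifiers, and the function names `cosine` and `most_adequate_synset` are hypothetical; real tuned embeddings would have many more dimensions and come from the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_adequate_synset(word, synsets, vectors):
    """Pick the synset whose other members are, on average, closest to `word`."""
    best_id, best_score = None, float("-inf")
    for synset_id, members in synsets.items():
        others = [m for m in members if m != word and m in vectors]
        if not others:
            continue  # no other word to compare against
        score = sum(cosine(vectors[word], vectors[m]) for m in others) / len(others)
        if score > best_score:
            best_id, best_score = synset_id, score
    return best_id, best_score


# Toy embeddings (invented) for a target word and the members of two synsets.
vectors = {
    "target": [1.0, 0.0],
    "a1": [1.0, 0.1], "a2": [0.9, 0.2],
    "b1": [0.0, 1.0], "b2": [0.1, 1.0],
}
synsets = {
    "synset_a": ["target", "a1", "a2"],
    "synset_b": ["target", "b1", "b2"],
}
best_id, best_score = most_adequate_synset("target", synsets, vectors)
print(best_id)
```

Only the sentic value of the selected synset would then be assigned to the word; the other candidate synsets are ignored for that word.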

Resources

Results and validation

  • Sentic values could be assigned to 5,465 Korean words.
  • Validated on 1,000 positive and 1,000 negative reviews from the NAVER movie review corpus (simple count after tokenizing with Kkma).
  • Precision: 52.87% | Recall: 85.4% | F1: 65.31%
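
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The helper name `f1_score` is just for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, expressed as fractions.
f1 = f1_score(0.5287, 0.854)
print(f"{f1:.2%}")  # prints 65.31%
```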

ksenticnet's People

Contributors

zzaebok
