imclab / usent

This project forked from nik0spapp/usent

Subjectivity and sentiment classification using polarity lexicons

License: Other

Python 100.00%

Dictionary-based sentiment detection

The attached code implements an unsupervised sentiment classification procedure that was originally used for an opinion mining and retrieval system (1st paper below) and for improving one-class collaborative filtering (2nd paper below). For the 2nd paper, I have included a folder called "TED_comment_annotations" that contains the files from the human study we conducted on TED comment sentiment classification (with 6 human annotators). If you use the code or the human annotations of TED comments in your research, please cite the following papers:

The method combines two bootstrapping procedures, one for subjectivity detection and one for polarity detection (1st and 2nd paper listed below, respectively). The rule-based polarity classifier extends the one presented in the 3rd paper listed below.

  • E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP ’03, 2003.
    http://www.cs.utah.edu/~riloff/pdfs/emnlp03.pdf
  • M. Wiegand and D. Klakow. Bootstrapping supervised machine-learning polarity classifiers with rule-based classification. In Proceedings of the ECAI-Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), 2009.
    http://www.lsv.uni-saarland.de/wassa.pdf
  • T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, 2005.
    http://people.cs.pitt.edu/~wiebe/pubs/papers/emnlp05polarity.pdf

Dependencies

The code for unsupervised sentiment classification requires the Python programming language and the pip package manager. For detailed installation instructions, please refer to the following links:
http://www.python.org/getit/
http://www.pip-installer.org/en/latest/

After installing them, you should be able to install the following packages:

$ pip install nltk  
$ pip install stemmer 
$ pip install numpy
# pickle is part of the Python standard library, so it needs no separate install

After you install nltk you will need some corpora to train the sequential POS tagger (pos.py) and the nltk tokenizer.

$ python 
import nltk 
nltk.download() 

Running the command above opens a graphical interface that lets you manage the corpora available for the nltk library. From the list, select and download the following corpora: tokenizers/punkt/english, wordnet, brown, conll2000 and treebank.
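
If a graphical interface is not available (for example on a remote server), the same data can usually be fetched by passing package identifiers to nltk.download(). The sketch below is a minimal illustration under assumptions: the identifiers are my mapping of the corpus names listed above onto NLTK package ids, the helper name download_required is hypothetical, and newer NLTK releases may name the tokenizer data differently.

```python
# Assumed NLTK package ids for the corpora named in the text above.
REQUIRED_NLTK_PACKAGES = ["punkt", "wordnet", "brown", "conll2000", "treebank"]


def download_required(packages=REQUIRED_NLTK_PACKAGES, downloader=None):
    """Download each package and report which downloads succeeded.

    `downloader` defaults to nltk.download; it is a parameter only so the
    helper can be exercised without network access.
    """
    if downloader is None:
        import nltk  # imported lazily so the module loads without nltk installed
        downloader = nltk.download
    # nltk.download returns a truthy value when the download succeeds
    return {name: bool(downloader(name)) for name in packages}
```

Calling download_required() with no arguments fetches all five packages into the default NLTK data directory and returns a dictionary that maps each package id to a success flag.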

Lastly, the PyML library is needed for the SVM classifier that our code currently uses.
Download it from http://pyml.sourceforge.net/ and then run:

 $ tar zxvf PyML-0.7.11.tar.gz
 $ cd PyML-0.7.11
 $ python setup.py build
 $ python setup.py install 

Processing pipeline

The pipeline currently implemented in sentiment.py is depicted in the following diagram. First, the input text is split into sentences and each sentence is fed to a high-precision subjectivity classifier. If the sentence is classified as subjective, syntactic patterns are learned from it. Otherwise, the sentence is fed to the pattern-based classifier, which outputs its class based on the patterns learned so far. If the pattern-based classifier labels the sentence as subjective, more patterns are learned from it; otherwise the sentence is fed to a high-precision objectivity classifier. If the sentence is classified as objective, it is ignored; otherwise it is fed to the polarity classifier.

Finally, the polarity classifier estimates the numerical sentiment and normalized sentiment values and outputs the result. The instances that the polarity classifier labels with high confidence can be used to train an SVM classifier that further improves classification performance (see paper for further details). In the current version this option is disabled, but it can easily be enabled. Similarly, you can remove components from the pipeline according to your needs (e.g. skip subjectivity classification).
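
The control flow described above can be sketched as follows. This is not the code in sentiment.py: every function and parameter name here is hypothetical, the classifiers are passed in as plain callables, and the sketch assumes that sentences found subjective at either stage are also passed on to the polarity classifier.

```python
def run_pipeline(sentences, is_subjective_hp, is_subjective_by_patterns,
                 is_objective_hp, learn_patterns, polarity):
    """Route each sentence through the stages described in the text.

    All arguments except `sentences` are hypothetical callables standing in
    for the real classifiers; each takes a single sentence string.
    """
    results = []
    for sentence in sentences:
        if is_subjective_hp(sentence):
            # High-precision subjectivity hit: learn patterns from it.
            learn_patterns(sentence)
        elif is_subjective_by_patterns(sentence):
            # Pattern-based hit: learn further patterns from it.
            learn_patterns(sentence)
        elif is_objective_hp(sentence):
            # Confidently objective sentences are ignored.
            continue
        # Everything not discarded reaches the polarity classifier.
        results.append((sentence, polarity(sentence)))
    return results
```

Dropping a stage, such as subjectivity classification, amounts to passing a callable that always returns False for that stage.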

[Figure: diagram of the processing pipeline]

Examples

To estimate the total sentiment and total normalized sentiment (as described in the papers), execute the sentiment.py file and pass the desired block of text as an argument. Make sure that you escape symbols such as '"' and '!'. Apart from command-line execution, you can integrate the library into your own code and use the returned results directly. Below you can find two simple examples for demonstration purposes:

$ python sentiment.py "I have to give much love and respect to Rony. Your work is Amazing\!"

[Screenshot: output of the command above]

$ python sentiment.py "I was blown away by some of the comments here posted by people who is either 
uneducated, ignorant, self-righteous or al-of-the-above. I'm irritated and saddened as I read these 
finger-pointing \"i'm right and you're wrong\" type of posts\!"

[Screenshot: output of the command above]
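
The escaping of '"' and '!' is only needed because the shell interprets those characters. One way to sidestep it is to launch the script from Python with an argument list, which bypasses the shell. The sketch below is an illustration under assumptions: the helper names are hypothetical, the interpreter name "python" and the script path are defaults you may need to adjust, and since the output format of sentiment.py is not specified here, the raw standard output is returned unparsed.

```python
import subprocess


def build_command(text, python="python", script="sentiment.py"):
    """Return the argument list for one run; no shell quoting is applied."""
    return [python, script, text]


def run_sentiment(text, python="python", script="sentiment.py"):
    """Run sentiment.py on `text` and return whatever it prints to stdout."""
    completed = subprocess.run(
        build_command(text, python=python, script=script),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

For example, run_sentiment('Your work is Amazing!') passes the exclamation mark through unchanged, with no backslash needed.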
