plumpmath / speech-acts-classifier

This project forked from gigasquid/speech-acts-classifier

Speech act classifier for text based on Stanford CoreNLP and Weka

License: Eclipse Public License 1.0

speech-acts-classifier

An experiment in parsing natural language and classifying the speech act of a sentence. This matters most when a machine is trying to understand the meaning of a sentence in an environment, such as a chat session, where punctuation is often missing.

This project classifies three speech acts: statements, questions, and expressives. Expressives are speech acts that express a mental state of the speaker, for example "Thanks", "Ok", or "lol".

Parsing and annotation are done with a wrapper around the Stanford CoreNLP library.

Classification uses the Weka Java library. A random forest model was trained on the following sentence features, derived from the part-of-speech (POS) annotations:

  • Sentence length
  • Number of nouns in the sentence (NN, NNS, NNP, NNPS)
  • Whether the sentence ends in a noun or adjective (NN, NNS, NNP, NNPS, JJ, JJR, JJS)
  • Whether the sentence begins with a verb (VB, VBD, VBG, VBP, VBZ)
  • The count of wh-markers, such as who and what (WDT, WRB, WP, WP$)
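As a rough illustration of how the features above could be computed, here is a minimal sketch in plain Clojure. It is not the project's actual code: the function name pos-features, the keyword names, and the choice to measure sentence length in tokens are all assumptions, and the input is a hand-written vector of Penn Treebank tags rather than real CoreNLP output. The verb set uses VBZ, the Penn Treebank tag for third-person singular present verbs.

```clojure
;; Tag groups taken from the feature list; names are hypothetical.
(def noun-tags      #{"NN" "NNS" "NNP" "NNPS"})
(def adjective-tags #{"JJ" "JJR" "JJS"})
(def verb-tags      #{"VB" "VBD" "VBG" "VBP" "VBZ"})
(def wh-tags        #{"WDT" "WRB" "WP" "WP$"})

(defn pos-features
  "Turns a sequence of POS tags (one per token) into a feature map.
  Assumes sentence length means the number of tokens."
  [tags]
  (let [tags      (vec tags)
        first-tag (first tags)
        last-tag  (peek tags)]
    {:sentence-length      (count tags)
     :noun-count           (count (filter noun-tags tags))
     :ends-in-noun-or-adj? (boolean (or (noun-tags last-tag)
                                        (adjective-tags last-tag)))
     :begins-with-verb?    (boolean (verb-tags first-tag))
     :wh-count             (count (filter wh-tags tags))}))

;; Hand-tagged example for "How do you make cheese" (tags are an assumption).
(pos-features ["WRB" "VBP" "PRP" "VB" "NN"])
```

A sentence that starts with a wh-marker and contains few nouns would, under these features, look quite different from a statement such as "I like cheese", which is presumably what lets the model separate the classes without punctuation.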

Training data for statements and questions were scraped from answers.com and then cleaned up by hand. The expressives were entered by hand.

  • ~ 200 statements
  • ~ 200 questions
  • ~ 80 expressives

Summary of the trained model with cross-validation:

Correctly Classified Instances         407               85.3249 %
Incorrectly Classified Instances        70               14.6751 %
Kappa statistic                          0.7658
Mean absolute error                      0.1185
Root mean squared error                  0.2665
Relative absolute error                 28.3497 %
Root relative squared error             58.3073 %
Total Number of Instances              477
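The two percentage lines in the summary follow directly from the instance counts. A small check, with hypothetical names, that only reproduces the arithmetic and does not recompute the kappa or error statistics (those need the per-class predictions, which are not shown here):

```clojure
;; Counts copied from the summary above.
(def correct   407)
(def incorrect 70)
(def total     (+ correct incorrect)) ; 477 instances

(defn percent
  "Share of n in total, as a percentage."
  [n total]
  (* 100.0 (/ n total)))

(percent correct total)   ; about 85.3249
(percent incorrect total) ; about 14.6751
```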

The random forest model was chosen after interactively running the data through different models in the Weka Explorer.

Usage

There are two main ways to use it.

The first is to use the classify-text function in the core namespace. It returns a keyword that is either :question, :statement, or :expressive.

(ns talk
  (:require [speech-acts-classifier.core :as c]))

(c/classify-text "I like cheese")
;; -> :statement

(c/classify-text "How do you make cheese")
;; -> :question

(c/classify-text "Right on")
;; -> :expressive

The second way is even more fun: a super simple chat bot that responds to your text. It first does a quick check to see if the text ends with a question mark. If not, it runs the classifier.

lein run
Hello.  Let's chat.
>> I like cheese
Nice to know.
>> Where do you go to buy your cheese
That is an interesting question.
>> wow
:)
>>
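The chat bot's decision logic, as described above, can be sketched roughly as follows. This is an illustration rather than the project's implementation: the names responses, speech-act, and reply are hypothetical, the mapping from speech act to reply is inferred from the sample session, and the classifier is passed in as a function so the sketch runs without CoreNLP or Weka.

```clojure
(require '[clojure.string :as str])

;; Replies inferred from the sample session; the mapping is an assumption.
(def responses
  {:statement  "Nice to know."
   :question   "That is an interesting question."
   :expressive ":)"})

(defn speech-act
  "Returns :question when the trimmed text ends with a question mark;
  otherwise defers to classify, a function from text to keyword."
  [classify text]
  (if (.endsWith ^String (str/trim text) "?")
    :question
    (classify text)))

(defn reply
  "Looks up the canned reply for the speech act of text."
  [classify text]
  (get responses (speech-act classify text)))

;; Stub classifier standing in for the real one.
(reply (fn [_] :statement) "I like cheese")
```

In the real project the classify argument would presumably be c/classify-text. Checking for the question mark first avoids running the parser and model when the punctuation already settles the answer.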

References

  • Classifying Sentences as Speech Acts in Message Board Posts
  • Automated Speech Act Classification For Online Chat
  • Student Speech Act Classification Using Machine Learning

Further Exploration

License

Copyright © 2015 Carin Meier

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
