spno77 / tweetopinionmining

This project forked from nedicnemanja/tweetopinionmining

Opinion mining/Sentiment analysis of a Tweet vectorized dataset on the topic of cryptocurrencies.

License: MIT License

Makefile 1.12% C++ 98.88%

tweetopinionmining's Introduction

TweetOpinionMining

Opinion mining/Sentiment analysis of a Tweet vectorized dataset on the topic of cryptocurrencies. The goal is to predict User sentiment towards cryptocurrencies which they have never mentioned. This is achieved using 2 distinct methods discussed below.

Every Tweet is preprocessed as a set of tokens. Each token is checked against our Lexicon, where it carries a value in the range [-1,+1]. If it does not exist in the Lexicon, it is simply regarded as a neutral word and its value is 0.

A totalScore is calculated for each Tweet from the values of its tokens.

Each totalScore is then normalised using a formula with a parameter alpha, where alpha=15 by default.
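The two scoring formulas are not reproduced in this text, so the following is only a minimal sketch under stated assumptions. It assumes totalScore is the plain sum of the Lexicon values of a Tweet's tokens, and that the normalisation has the form x / sqrt(x^2 + alpha), which is the shape used by the VADER sentiment tool, whose default alpha is also 15. The `Lexicon` alias and the function names are hypothetical and are not taken from the project's source.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type: maps a token to its sentiment value in [-1, +1].
using Lexicon = std::unordered_map<std::string, double>;

// ASSUMPTION: totalScore is the plain sum of the token values.
// Tokens missing from the Lexicon are neutral and contribute 0.
double totalScore(const std::vector<std::string>& tokens, const Lexicon& lexicon) {
    double sum = 0.0;
    for (const auto& token : tokens) {
        const auto it = lexicon.find(token);
        if (it != lexicon.end()) {
            sum += it->second;
        }
    }
    return sum;
}

// ASSUMPTION: normalisation of the form x / sqrt(x^2 + alpha).
// For alpha > 0 the result always lies strictly between -1 and +1.
double normalise(double score, double alpha = 15.0) {
    assert(alpha > 0.0);
    return score / std::sqrt(score * score + alpha);
}
```

Under these assumptions, a Tweet with a totalScore of 1.0 normalises to 1 / sqrt(16) = 0.25, and very large positive or negative totals approach +1 or -1 without reaching them.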

Another Lexicon exists, containing all the cryptocurrencies and their relevant names, abbreviations etc.

Each user holds a sentiment value towards the cryptocurrencies he/she has mentioned in a Tweet. For example: whenever a User mentions "bitcoin" in a Tweet, that Tweet's totalScore gets added to the bitcoin sentiment value of that user. If that user also mentions ethereum, then the totalScore of that Tweet also gets added to the ethereum sentiment value, etc.
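The accumulation described above could look roughly like the sketch below. The `CryptoLexicon` alias, the `User` struct and `addTweet` are hypothetical names chosen for illustration. The sketch also assumes that a crypto mentioned several times in one Tweet (for example by name and by abbreviation) receives that Tweet's score only once, which the text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical type: maps every name or abbreviation to a crypto index.
using CryptoLexicon = std::unordered_map<std::string, std::size_t>;

// Hypothetical per-user state: one slot per known crypto.
struct User {
    std::vector<double> sentiment;  // accumulated Tweet scores per crypto
    std::vector<bool> mentioned;    // false means "unknown" to this user
};

// Add one Tweet's score to every crypto that the Tweet mentions.
// ASSUMPTION: each crypto is counted at most once per Tweet.
void addTweet(User& user, const std::vector<std::string>& tokens,
              double score, const CryptoLexicon& cryptos) {
    assert(user.sentiment.size() == user.mentioned.size());
    std::unordered_set<std::size_t> seen;
    for (const auto& token : tokens) {
        const auto it = cryptos.find(token);
        if (it == cryptos.end()) {
            continue;  // not a crypto name
        }
        const std::size_t index = it->second;
        assert(index < user.sentiment.size());
        if (seen.insert(index).second) {
            user.sentiment[index] += score;
            user.mentioned[index] = true;
        }
    }
}
```

Keeping a separate `mentioned` flag distinguishes a crypto the user never mentioned from one whose scores happen to sum to 0.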

Rate unknown cryptos using Nearest Neighbor (CosineSimilarity+LocalitySensitiveHashing) based on other users

Find NUM_NN (you can edit this variable) Nearest Neighbors. Calculate the similarity between the target user and each NN using Cosine Similarity. A similarity-weighted formula then gives the sentiment value of the cryptos that are unknown to the user (that is, cryptos the user has not mentioned).

The general gist of the formula is that we extract the average sentiment value for each unknown crypto of our target_user using the sum of the sentiment values that his Nearest Neighbors had for that crypto, adjusted/weighted by the similarity between our target_user and each of his Nearest Neighbors.
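As an illustration of the gist described above, here is a minimal sketch of a similarity-weighted average. It is not the project's exact formula, which is not reproduced in this text. It assumes each user is a dense vector of per-crypto sentiment values and that the weighted sum is divided by the sum of absolute similarities. The names `cosineSimilarity` and `predictSentiment` are hypothetical, and the neighbor list is assumed to have been found already (for example through LSH).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equally sized vectors.
// Convention used here: a zero vector has similarity 0 to everything.
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
        return 0.0;
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// ASSUMPTION: prediction = sum(sim * neighborValue) / sum(|sim|).
// Returns 0 (neutral) when no neighbor has a non-zero similarity.
double predictSentiment(const std::vector<double>& target,
                        const std::vector<std::vector<double>>& neighbors,
                        std::size_t crypto) {
    double weighted = 0.0, totalWeight = 0.0;
    for (const auto& neighbor : neighbors) {
        assert(crypto < neighbor.size());
        const double sim = cosineSimilarity(target, neighbor);
        weighted += sim * neighbor[crypto];
        totalWeight += std::fabs(sim);
    }
    return totalWeight == 0.0 ? 0.0 : weighted / totalWeight;
}
```

Dividing by the sum of absolute similarities keeps the prediction within the range of the neighbors' values, so a single very similar neighbor dominates the estimate while dissimilar ones contribute little.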

Rate unknown cryptos using Cosine-LSH based on tweet Clusters

This time we create Tweet clusters (using ClusteringAlgorithms from my previous project). For each cluster we create a Virtual User. Now we repeat the previously described NearestNeighbor sentiment analysis process for the Virtual Users instead of all the Users. The point is that a Virtual User is much more representative of the average user and thus a more stable and reliable metric.
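The text does not say how a Virtual User is built from a cluster. One plausible reading, sketched below purely as an assumption, is that a Virtual User accumulates the scores of all Tweets in its cluster in the same way a real user accumulates the scores of his own Tweets. The `Tweet` struct and `makeVirtualUser` are hypothetical names, and each Tweet is assumed to already carry its score and the indices of the cryptos it mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, already-processed Tweet.
struct Tweet {
    std::vector<std::size_t> cryptos;  // indices of the cryptos it mentions
    double score;                      // the Tweet's sentiment score
};

// ASSUMPTION: the Virtual User's sentiment towards a crypto is the sum of
// the scores of all Tweets in the cluster that mention that crypto.
std::vector<double> makeVirtualUser(const std::vector<Tweet>& cluster,
                                    std::size_t numCryptos) {
    std::vector<double> sentiment(numCryptos, 0.0);
    for (const auto& tweet : cluster) {
        for (const std::size_t index : tweet.cryptos) {
            assert(index < numCryptos);
            sentiment[index] += tweet.score;
        }
    }
    return sentiment;
}
```

The resulting vectors have the same layout as real users' vectors, so the same neighbor search and weighted prediction can be run against them unchanged.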

Results

The results are written to an output file. For each method the users are enumerated, one user per line. The user's name is the first string; the next 5 strings are the top 5 cryptos recommended for the user using only the NearestNeighbor method, and the last 2 are the top 2 cryptos recommended for the user using the Clustering method.

This project was done as a task for a University project. Although I learned a lot, I wish I had more time to dedicate to it. There is much more that could've been done such as implementing and testing more methods based on User Clustering instead of tweet clustering and better evaluation for each method using 10-fold cross-validation.

tweetopinionmining's People

Contributors

nedicnemanja
