clouded_minds's Issues
Implement Tweet Vectorizer
Tweet vectorizer needs to be implemented. The challenge here is to aggregate similar words into one term while vectorizing to improve results.
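One possible starting point, as a rough sketch only: normalize each token before counting so that variants collapse into one term. The ALIASES table, the normalize and vectorize names, and the repeated-letter rule are all assumptions made for illustration, not part of the existing code.

```python
import re
from collections import Counter

# Hypothetical alias table: maps variant spellings to one canonical term.
ALIASES = {"cfc": "chelsea", "saintsfc": "southampton"}

def normalize(token):
    # Lowercase and drop a leading hashtag or mention marker.
    token = token.lower().lstrip("#@")
    # Collapse runs of 3+ identical characters ("goooal" -> "goal").
    token = re.sub(r"(.)\1{2,}", r"\1", token)
    return ALIASES.get(token, token)

def vectorize(tweet):
    # Bag-of-words counts over normalized tokens.
    tokens = re.findall(r"[#@]?\w+", tweet)
    return Counter(normalize(t) for t in tokens)
```

The repeated-letter rule is crude (it would also turn "coool" into "col"), so a stemmer or a larger alias table may be a better fit once we see real data.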
Invalid JSON due to quotations
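Using strip('"') doesn't remove some of the " (quotation marks) from tweets, which results in invalid JSON. strip only removes characters from the start and end of a string, so quotation marks inside the tweet text are left in place. This needs to be fixed in order for sentiment analysis to work. If it were possible to put the 'r' prefix before the string to denote raw, that would fix the issue, but I don't think that's possible, since the prefix only applies to string literals written in source code.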
This can be seen at "id": 441 in the arsliv.json file.
If this can be prevented in new data, it would save a lot of trouble.
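A minimal sketch of one way to avoid the problem for new data, assuming each record is built in Python before being written out: let json.dumps do the escaping instead of stripping quotes by hand. The tweet_to_json name and the "text" field are assumptions; only "id" is known from the existing files.

```python
import json

def tweet_to_json(tweet_id, text):
    # json.dumps escapes embedded double quotes, backslashes and newlines,
    # so the result is always valid JSON regardless of the tweet text.
    return json.dumps({"id": tweet_id, "text": text})
```

Reading the line back with json.loads returns the original text, quotation marks included.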
PowerPoint slides and update report for presentation on Nov 5
As title suggests.
Sentiment per file
Right now, it's hardcoded to do a single file.
Since JSON files are now in a separate directory, you could iterate through all files in the directory, then use the current filename as the working output name.
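Something along these lines might work. It is only a sketch: the output_names helper and the "_sentiment.json" suffix are made up for illustration, and the directory path is whatever we end up using.

```python
import os

def output_names(json_dir, suffix="_sentiment.json"):
    """Yield (input_path, output_name) for every .json file in json_dir."""
    for name in sorted(os.listdir(json_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".json":
            continue  # ignore anything that is not a JSON file
        yield os.path.join(json_dir, name), base + suffix
```

The sentiment script could then loop over output_names(directory) and run its existing single-file logic once per pair.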
Study ARMA model for time-series analysis
I have currently implemented my own approach to analyze the spikes across games. Is anyone interested in looking into the ARMA model and what it offers for such datasets?
especially: @slgibson233 @Noflen
Split tweets into two halves per game
Each game needs to have tweets separated by team so that we can have tweets from fans from Team A and Team B for sentiment analysis by team. The keywords which are used to collect tweets can be used to separate tweets.
Example: Chelsea vs Southampton (chesou.csv), Config file: config-chesou.txt
Chelsea keywords from config file: Chelsea,CFC
Southampton keywords from config file: Southampton,SaintsFC
Game hashtag (common to both teams): #chesou
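A rough sketch of the split, assuming the tweet text is already available as plain strings and the keywords have been read from the config file. The split_by_team name and the case-insensitive substring matching are assumptions for illustration.

```python
def split_by_team(tweets, team_a_keywords, team_b_keywords):
    """Return (team_a_tweets, team_b_tweets) based on keyword matches."""
    a_keys = [k.lower() for k in team_a_keywords]
    b_keys = [k.lower() for k in team_b_keywords]
    team_a, team_b = [], []
    for text in tweets:
        lowered = text.lower()
        # A tweet mentioning both teams is placed in both lists.
        if any(k in lowered for k in a_keys):
            team_a.append(text)
        if any(k in lowered for k in b_keys):
            team_b.append(text)
    return team_a, team_b
```

Tweets that contain only the common game hashtag end up in neither list here. Whether those should be dropped or kept in a third group is still an open choice.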
Create File Directories
For our Sentiment Analysis (and I believe csv_to_json), the script will error out if the output directories don't exist. Therefore, we need to create them for the existing games and all possible future games.
Should be able to use something akin to:
import os
if not os.path.exists(path):
    os.makedirs(path)  # os.mkdirs does not exist; makedirs also creates parent directories
Data without Language Tagging
A lot of our game data exists without the language tags, like arsliv and chears. When this is passed to Sentiment140, it'll classify but we just end up with a lot of "polarity: 2", which is a lot of noise for our data.
Do we want to throw out these games (meaning we lose most of our games), or tag them as deprecated from the Sentiment system when we do our final presentation/poster?
Lang Availability Breaks on an IndexError
When doing:
lang = line_split[5]
An IndexError will occur. This is due to lines having '\n' whitespace that isn't removed by translate.
The error occurs on about line 5 in front of some ASCII ^- looking characters.
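One way to guard against this, as a sketch only: strip the line and check the field count before indexing. The get_lang name and the comma delimiter are assumptions; index 5 is taken from the snippet above.

```python
def get_lang(line, delimiter=",", lang_index=5):
    """Return the language field, or None if the line has no such field."""
    # strip() removes the trailing '\n' and other surrounding whitespace.
    line_split = line.strip().split(delimiter)
    if len(line_split) <= lang_index:
        return None  # blank or short line: no language tag available
    return line_split[lang_index].strip() or None
```

The caller can then skip, or separately log, any line for which None comes back instead of crashing.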
Clip tweets for given time frame
Many of the tweet files contain unnecessary data from just before and just after the game. There needs to be a script that keeps only the tweets within a given interval from the files we are using and dumps them into a new set of files.
-Write a Python program that takes an input file, a start time, and an end time (GMT), then creates a new file based on those parameters.
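A minimal sketch of such a script, under assumptions that would need checking against our real files: the input is a CSV, the timestamp sits in the first column, and it is formatted like "2014-01-31 15:00:00" in GMT. The names TIME_FORMAT, parse_gmt, clip_rows and clip_file are all hypothetical.

```python
import csv
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout, in GMT

def parse_gmt(value):
    # Parse a timestamp string and mark it as UTC/GMT.
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)

def clip_rows(rows, start, end, time_column=0):
    """Yield rows whose timestamp lies in [start, end] (inclusive)."""
    for row in rows:
        try:
            stamp = parse_gmt(row[time_column])
        except (IndexError, ValueError):
            continue  # skip short or malformed rows, including a header
        if start <= stamp <= end:
            yield row

def clip_file(in_path, out_path, start, end, time_column=0):
    # start and end are strings in TIME_FORMAT.
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        kept = clip_rows(csv.reader(src), parse_gmt(start), parse_gmt(end),
                         time_column)
        csv.writer(dst).writerows(kept)
```

Note that a header row is dropped along with any other row whose timestamp cannot be parsed, so it would need to be copied separately if the output should keep it.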