
elegant-codes / detectdepressionintwitterposts


This project forked from peijoy/detectdepressionintwitterposts


Detect depression in tweets using Keras

License: MIT License


Detect Depression In Twitter Posts

Overview

Team member: Pei-Jo Yang

This is a final project for the course CMSC389A. Mental illnesses such as depression can be life-threatening, with suicide as a possible outcome. In this project, a model combining a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network is built with Keras to determine whether social media users show signs of depression based on their Twitter posts. The model's accuracy is evaluated and compared with that of a binary classification baseline that uses logistic regression. After 5 epochs the model reaches 98.91% accuracy, while the baseline reaches a much lower 83.755%.
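The README does not show how the logistic regression baseline is built. As a rough, dependency-light sketch, assuming a binary bag-of-words representation (an assumption; the notebook may use different features or a library implementation), logistic regression can be fit with plain NumPy gradient descent and scored by accuracy, the metric quoted above. The function names, toy tweets, and hyperparameters (`lr`, `epochs`) are hypothetical.

```python
import numpy as np


def bag_of_words(texts, vocab):
    """Binary features: X[i, j] = 1.0 if vocab[j] appears in texts[i]."""
    index = {word: j for j, word in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            j = index.get(token)
            if j is not None:
                X[i, j] = 1.0
    return X


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(depressive)
        error = p - y                            # gradient of log loss w.r.t. the score
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


def predict(X, w, b):
    """Label 1 (depressive) when the predicted probability exceeds 0.5."""
    return (X @ w + b > 0).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


# Toy data for illustration only; 1 = depressive, 0 = normal.
texts = [
    "i feel so empty and hopeless",
    "great game last night",
    "cannot get out of bed again feeling hopeless",
    "lunch with friends was fun",
]
labels = np.array([1, 0, 1, 0])
vocab = sorted({token for text in texts for token in text.split()})
X = bag_of_words(texts, vocab)
w, b = train_logistic_regression(X, labels)
print(accuracy(labels, predict(X, w, b)))
```

This toy run measures accuracy on the same tweets it was trained on; a real comparison would fit on the training split and report accuracy on held-out tweets.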

Retrieving Test Data

Two kinds of tweets are needed for this project: random tweets that do not indicate depression, and tweets that suggest the user may have depression. The random tweets come from the Kaggle dataset twitter_sentiment. Tweets indicating depression are harder to obtain because there is no public dataset of depressive tweets, so in this project they are retrieved with the Twitter scraping tool TWINT by scraping all tweets containing the keyword "depression" over a one-day span. The scraped tweets may include tweets that do not indicate that the user has depression, such as tweets linking to articles about depression, so they need to be checked manually for better testing results.

A CSV file of scraped tweets is provided. Alternatively, the following command can be used to obtain depressive tweets for this project. Keep in mind that the date in the command should be changed, and that the generated .csv file should be checked manually and moved to the project directory:

python3 twint.py -s depression --since 2018-05-15 -o depressive_tweets_processed.csv --csv
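The scraped tweets are checked by hand. A small pre-filter can shorten that manual pass by flagging likely false positives, such as tweets that link to articles. The sketch below is hypothetical: the heuristics (a URL pattern and a first-person-word check), the function names, and the default CSV column name `tweet` are assumptions, not part of the project, and flagged tweets still need a human decision.

```python
import csv
import re

# Assumed heuristics, not taken from the project: a tweet with a link is often
# sharing an article, and a tweet without first-person words is rarely a
# self-report.
URL_RE = re.compile(r"https?://\S+")
FIRST_PERSON_RE = re.compile(r"\b(i|me|my|myself)\b", re.IGNORECASE)


def load_tweets(path, column="tweet"):
    """Read one text column from a CSV file (the column name is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def needs_review(tweet):
    """Return True if the tweet contains a link or has no first-person words."""
    return bool(URL_RE.search(tweet)) or not FIRST_PERSON_RE.search(tweet)


def split_for_review(tweets):
    """Partition tweets into (likely_keep, flagged_for_review) lists."""
    keep, review = [], []
    for tweet in tweets:
        (review if needs_review(tweet) else keep).append(tweet)
    return keep, review


sample = [
    "I have been fighting depression for years",
    "New study on depression https://example.com/article",
    "Depression awareness month starts today",
]
keep, review = split_for_review(sample)
print(keep)  # ['I have been fighting depression for years']
print(review)
```

Because the filter only sorts tweets into two piles, it never deletes anything; the flagged pile is simply where manual checking should start.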

Test Data Split

Collected tweets are split into training, testing, and validation sets with a ratio of 60%:20%:20%.

Split        Depressive Tweets   Normal Tweets
Training     1384                7146
Validation   462                 2382
Testing      462                 2383
Total        2308                11911
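The counts in the table are consistent with a per-class split in which the test set takes 20% of each class rounded up, and the validation set then takes 25% of the remainder rounded up (25% of the remaining 80% is 20% of the whole). The README does not say how the split was done; the function below is a hypothetical standard-library sketch that reproduces these sizes. scikit-learn's `train_test_split` also rounds a fractional test size up, so calling it twice per class, with `test_size=0.2` and then `test_size=0.25`, would give the same sizes.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle and split into (train, validation, test), roughly 60/20/20.

    The test set gets ceil(n / 5) items; the validation set gets
    ceil(remaining / 4) items; everything else goes to training.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = -(-n // 5)            # integer ceil(n / 5)
    n_val = -(-(n - n_test) // 4)  # integer ceil(remaining / 4)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Placeholder items with the same class sizes as the table.
depressive = [f"d{i}" for i in range(2308)]
normal = [f"n{i}" for i in range(11911)]
for name, tweets in [("depressive", depressive), ("normal", normal)]:
    train, val, test = split_60_20_20(tweets)
    print(name, len(train), len(val), len(test))
# depressive 1384 462 462
# normal 7146 2382 2383
```

Splitting each class separately keeps the depressive-to-normal ratio the same in all three sets.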

Required Libraries

  • ftfy - fixes Unicode that's broken in various ways
  • gensim - enables storing and querying word vectors
  • keras - a high-level neural networks API running on top of TensorFlow
  • matplotlib - a Python 2D plotting library that produces publication-quality figures
  • nltk - Natural Language Toolkit
  • numpy - the fundamental package for scientific computing with Python
  • pandas - provides easy-to-use data structures and data analysis tools for Python
  • sklearn (scikit-learn) - a machine learning library for Python
  • tensorflow - an open source machine learning framework for everyone

In addition, the pretrained vectors for the Word2Vec model are from here.
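The README does not describe how the Word2Vec vectors are fed to the model. A common approach, sketched here under that assumption, is to map tokens to integer ids, pad each tweet to a fixed length, and build a matrix whose row i is the pretrained vector for id i, which could then initialize an embedding layer. The function names are hypothetical; `vectors` can be any mapping from word to vector, and a gensim `KeyedVectors` object loaded with `KeyedVectors.load_word2vec_format(path, binary=True)` supports the `in` and `[]` lookups used here.

```python
import numpy as np


def build_vocab(texts):
    """Map each lowercase token to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_padded_sequences(texts, vocab, max_len):
    """Convert texts to fixed-length id rows, truncating and zero-padding at the end."""
    seqs = np.zeros((len(texts), max_len), dtype=np.int64)
    for i, text in enumerate(texts):
        ids = [vocab[t] for t in text.lower().split() if t in vocab][:max_len]
        seqs[i, :len(ids)] = ids
    return seqs


def embedding_matrix(vocab, vectors, dim):
    """Row i holds the pretrained vector for the word with id i (zeros if missing)."""
    matrix = np.zeros((len(vocab) + 1, dim))
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


texts = ["I feel empty", "feel fine today"]
vocab = build_vocab(texts)
seqs = to_padded_sequences(texts, vocab, max_len=4)
toy_vectors = {"feel": [1.0, 2.0], "empty": [0.5, 0.5]}  # stand-in for real Word2Vec vectors
matrix = embedding_matrix(vocab, toy_vectors, dim=2)
print(seqs)
print(matrix.shape)  # (6, 2)
```

This sketch pads and truncates at the end of each sequence; Keras's `pad_sequences` pads and truncates at the start by default (`padding="pre"`, `truncating="pre"`).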

How to Run

To run DepressionDetectionInTwitter.ipynb, the Jupyter notebook that contains all of the code, run the following command in the project directory and open the notebook from the browser interface:

$ jupyter notebook

Video Demo

Here is a video demo of this project. Enjoy!

License

MIT
