Classifying Partisanship in U.S. Senators' Twitter and Facebook posts

Classifying Partisanship in Politicians’ Social Media Posts

SI 670 (Applied Machine Learning) Final Project

Authors: Maryam Seifeldin (https://github.com/mseifeldin) and Priyanka Shanmugasundaram (https://github.com/prishanmu)

Filtering, encoding, and text-preprocessing code credit to: https://www.kaggle.com/laiquet/neutral-and-partisan-tweets-posts

Background

In the last few decades, much public political discourse has moved onto social media, and more intensely partisan messages from politicians on these platforms have been shown to heavily influence voters' decisions (Hemphill, Culotta & Heston, 2013). We apply machine learning techniques to find a reasonably accurate model for identifying partisan messages in a sample dataset of Congressional politicians' tweets and Facebook posts from Crowdflower/Figure Eight. In addition to the text of each message, we use coded evaluations of each message’s target audience and intention, as well as the identity of the posting politician, their state, and their Congressional house, to predict partisanship.

Data Source

Link: https://www.kaggle.com/crowdflower/political-social-media-posts

The data come from Kaggle’s ‘Political Social Media Posts’ dataset, uploaded by Figure Eight. The dataset contains 5000 Twitter and Facebook posts by U.S. House Representatives and Senators. Each post was coded by audience (national or local constituency), bias (non-partisan vs. partisan), and type of message content (informational, attack on another candidate, appearance announcement, etc.). The rater’s confidence in each coded category is also included.

Our Work

  1. Preprocessing: Audience, bias, source, and type of politician were contrast coded. Message content, each politician’s state, and the poster’s ID were one-hot encoded. Each post was stripped of punctuation, non-words, stopwords (using the NLTK library), and words with a low information load. The cleaned posts were then passed through a TF-IDF vectorizer to extract text features. Uninformative columns and columns with redundant information were dropped. Keeping only posts whose rater confidence was 100% removed about 3% of the posts. All of these variables except bias were used as features for our classifier and clustering.

  2. Classification: We chose ‘bias’ as the target variable for the classifier. We used GridSearchCV to tune four models: Linear SVC, Kernelized SVM, Logistic Regression, and Random Forest Classifier. This gave us the best parameters for each model and the accuracy score obtained with those parameters.

  3. Clustering: To choose the number of clusters, we created an elbow plot, which suggested around 4 clusters. Using the KMeans model, we grouped the data into 4 clusters and compared feature averages across clusters. We then conducted an ANOVA to test whether bias differed significantly between clusters.

  4. Classification Using Deep Learning: This work is in progress and an extension of the final project. Priya is currently trying to build a classifier using deep learning methods.
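
As a rough illustration of steps 1 to 3, here is a minimal sketch with scikit-learn and SciPy. It is not the code from Final_Project.ipynb: the toy posts, the function names, the parameter grid, and the choice of only two of the four models are all assumptions made for the example, and scikit-learn's built-in English stopword list stands in for the NLTK list. It also uses only the TF-IDF text features, leaving out the encoded audience, state, and poster columns.

```python
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy posts invented for illustration; they are NOT from the Kaggle dataset.
TEXTS = [
    "Join me at the town hall in Lansing this Saturday morning",
    "The other party is blocking relief for working families again",
    "My office can help veterans with benefits questions",
    "Their reckless budget would gut health care for seniors",
    "Congratulations to the high school robotics team on the state title",
    "We must stop the opposition's attack on voting rights",
    "Road closures expected downtown during the weekend festival",
    "The majority keeps putting party politics ahead of the country",
    "Reminder that the passport fair is open to all residents",
    "Another failed policy from the other side of the aisle",
    "Visited a local farm today to learn about the harvest",
    "Their leadership has failed families and we will fight back",
]
BIAS = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 0 = neutral, 1 = partisan


def tune_classifiers(texts, bias, cv=3):
    """Grid-search C for two model families over TF-IDF text features."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svc": LinearSVC(),
    }
    results = {}
    for name, estimator in candidates.items():
        pipeline = Pipeline([
            ("tfidf", TfidfVectorizer(stop_words="english")),
            ("clf", estimator),
        ])
        search = GridSearchCV(
            pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=cv, scoring="accuracy"
        )
        search.fit(texts, bias)
        results[name] = (search.best_params_, search.best_score_)
    return results


def elbow_inertias(texts, max_k=6):
    """Return the KMeans inertia for k = 1..max_k (the values an elbow plot shows)."""
    features = TfidfVectorizer(stop_words="english").fit_transform(texts)
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).inertia_
        for k in range(1, max_k + 1)
    ]


def anova_bias_by_cluster(texts, bias, n_clusters):
    """Cluster the posts, then run a one-way ANOVA of bias across the clusters."""
    features = TfidfVectorizer(stop_words="english").fit_transform(texts)
    clusters = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=0
    ).fit_predict(features)
    groups = [
        [b for b, c in zip(bias, clusters) if c == k] for k in range(n_clusters)
    ]
    return f_oneway(*[g for g in groups if g])
```

Calling `tune_classifiers(TEXTS, BIAS)` returns the best parameters and cross-validated accuracy per model, `elbow_inertias(TEXTS)` gives the values to plot against k, and `anova_bias_by_cluster(TEXTS, BIAS, 2)` returns the F statistic and p-value. On data this small the numbers are not meaningful; the sketch only shows how the pieces fit together.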

Files in this Repo

  • Jupyter Notebook: Final_Project.ipynb
  • Jupyter Notebook: Classifying_Partisianship_Using_Deep_Learning.ipynb

Please contact us directly if you would like to see our final poster.
