casmlab / purpletag

Calculate political polarization scores for members of U.S. Congress based on their tweets

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 90.99%, Makefile 4.82%, HTML 4.18%

purpletag's People

Contributors

aronwc, carolgrrr, libbyh

purpletag's Issues

improve error handling for collect script

File "/usr/local/lib/python2.7/dist-packages/purpletag/purpletag_collect.py", line 37, in track_users
for tweet in twutil.collect.track_user_ids(ids):

Add a try/except around this line.
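
A minimal sketch of what that wrapper might look like, assuming the goal is to reconnect rather than exit when the stream raises. `iter_with_retry` and `make_stream` are hypothetical names, not part of purpletag, twutil or TwitterAPI; `make_stream` stands in for a zero-argument callable that opens a fresh stream, and the exception types to retry on are left to the caller.

```python
import logging
import time


def iter_with_retry(make_stream, retry_on=(Exception,), max_retries=5,
                    base_delay=1.0, sleep=time.sleep):
    """Yield items from make_stream(), reopening the stream when it fails.

    make_stream is a hypothetical zero-argument callable that returns a
    fresh iterable each time it is called.
    """
    failures = 0
    while True:
        try:
            for item in make_stream():
                failures = 0  # a successful read resets the backoff
                yield item
            return  # the stream ended normally
        except retry_on as exc:
            failures += 1
            if failures > max_retries:
                raise  # give up after too many consecutive failures
            delay = base_delay * 2 ** (failures - 1)  # exponential backoff
            logging.warning("stream failed (%s); retry %d in %.1fs",
                            exc, failures, delay)
            sleep(delay)
```

In the collect script this could be used as `for tweet in iter_with_retry(lambda: twutil.collect.track_user_ids(ids), retry_on=(TwitterConnectionError,)):`, assuming that is the exception worth retrying on. Backing off between attempts avoids hammering the API while it is unavailable.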

Score MOCs

  • Add a new output to the score command to list polarization scores for each member of congress.

when using track option, stream stalls and dies

Here's the error I got:

Traceback (most recent call last):
  File "/home/libbyh/anaconda3/envs/purpletag/bin/purpletag-collect", line 11, in <module>
    load_entry_point('purpletag==0.1.4', 'console_scripts', 'purpletag-collect')()
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 84, in main
    track_users(ids)
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 40, in track_users
    for tweet in twutil.collect.track_user_ids(ids):
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/TwitterAPI/TwitterAPI.py", line 305, in __iter__
    for item in self._iter_stream():
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/TwitterAPI/TwitterAPI.py", line 280, in _iter_stream
    raise TwitterConnectionError('Twitter stream stalled')
TwitterAPI.TwitterError.TwitterConnectionError: Twitter stream stalled

I'm not sure how long into tracking this happened. The -collect option worked fine.

enable alternative time windows

I'm looking for finer control over historical analysis. We talked about two different ways to alter the way the code currently works.

Right now, when you enter a command like

purpletag parse -t 60 -d 100

purpletag starts with today and works its way backwards 100 days, one at a time. For roughly 4.5GB of JSON data, this takes over an hour per day.

Option 1 - Allow "start date" or similar arg in the command line call

User would issue a command like

purpletag parse -t 60 -s 2017-05-01

and purpletag would parse tweets from the 60 days before May 1, 2017. This option essentially overrides the "start with today" default and replaces it with "start with the day the user specifies".

Option 2 - Reverse the order of the days loop

Change the line that starts the days loop so that it begins at the oldest day instead of the most recent day. This option reverses the order of the existing process, so users could quit the process partway through if they wanted only the older days' data.
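
Both options come down to how the list of days is generated. A rough sketch, not the current purpletag code: `parse_day` and `days_to_parse` are hypothetical helpers, and treating the chosen end day as part of the window is an assumption.

```python
from datetime import date, datetime, timedelta


def parse_day(text):
    """Turn a YYYY-MM-DD string (e.g. from a start-date argument) into a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def days_to_parse(n_days, end=None, oldest_first=False):
    """Return n_days consecutive days ending at `end` (inclusive).

    end=None keeps the existing "start with today" default (Option 1 lets
    the user override it); oldest_first=True reverses the loop (Option 2).
    """
    end = end or date.today()
    days = [end - timedelta(days=i) for i in range(n_days)]  # newest first
    return days[::-1] if oldest_first else days
```

For example, `days_to_parse(3, end=parse_day("2017-05-01"))` gives May 1, April 30, April 29, and adding `oldest_first=True` gives the same days in the opposite order.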

Visualizations

  • Add a command visualize or graph that generates some interesting graphs based on a set of .scores files.

ignore retweets/mentions?

Tracking users on Twitter returns their tweets along with retweets and mentions. If we ignore retweets and mentions, we'd have much less data to deal with; however, it might be nice to have these for some other purpose.
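
One way to get both is to keep collecting everything and filter at parse time. A minimal sketch, assuming tweets are stored as Twitter v1.1 JSON objects, where a retweet carries a `retweeted_status` field and the author is under `user.id`. `is_own_original` is a hypothetical helper, not an existing purpletag function.

```python
def is_own_original(tweet, tracked_ids):
    """True if the tweet was written by a tracked account and is not a retweet."""
    author = tweet.get("user", {}).get("id")
    if author not in tracked_ids:
        # Written by someone else: a mention, a reply, or a retweet
        # of a tracked account's tweet.
        return False
    # Retweets made by the tracked account itself carry this field.
    return "retweeted_status" not in tweet


def own_originals(tweets, tracked_ids):
    """Lazily keep only the tweets that pass is_own_original."""
    tracked = set(tracked_ids)  # set membership keeps the check cheap
    return (t for t in tweets if is_own_original(t, tracked))
```

Filtering this late leaves the raw files untouched, so the retweets and mentions stay available for other purposes.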

fix search_users

  • Crashes after a while (Twitter returns an empty response).
  • Instead, use my old implementation in twutil.

Create historical scores

  • Add a flag to score that simulates X days of historical scores
    • e.g., create .scores files for the past X days

TwitterConnectionError

Using the --track option.
It had been running for less than a day.

WARNING:root:<type 'exceptions.ValueError'> Expecting ':' delimiter: line 1 column 4964 (char 4963)
Traceback (most recent call last):
  File "/home/libbyh/anaconda3/envs/purpletag/bin/purpletag-collect", line 11, in <module>
    load_entry_point('purpletag==0.1.4', 'console_scripts', 'purpletag-collect')()
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 84, in main
    track_users(ids)
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 40, in track_users
    for tweet in twutil.collect.track_user_ids(ids):
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/TwitterAPI/TwitterAPI.py", line 311, in __iter__
    raise TwitterConnectionError(e)
TwitterAPI.TwitterError.TwitterConnectionError: Expecting ':' delimiter: line 1 column 4964 (char 4963)

IncompleteRead TwitterConnectionError

Using the --track option.
It had been running for a couple of days.

WARNING:root:<class 'requests.packages.urllib3.exceptions.ProtocolError'> ('Connection broken: IncompleteRead(0 bytes read, 1 more expected)', IncompleteRead(0 bytes read, 1 more expected))
Traceback (most recent call last):
  File "/home/libbyh/anaconda3/envs/purpletag/bin/purpletag-collect", line 11, in <module>
    load_entry_point('purpletag==0.1.4', 'console_scripts', 'purpletag-collect')()
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 84, in main
    track_users(ids)
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/purpletag-0.1.4-py2.7.egg/purpletag/purpletag_collect.py", line 40, in track_users
    for tweet in twutil.collect.track_user_ids(ids):
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/TwitterAPI/TwitterAPI.py", line 305, in __iter__
    for item in self._iter_stream():
  File "/home/libbyh/anaconda3/envs/purpletag/lib/python2.7/site-packages/TwitterAPI/TwitterAPI.py", line 294, in _iter_stream
    raise TwitterConnectionError(e)
TwitterAPI.TwitterError.TwitterConnectionError: ('Connection broken: IncompleteRead(0 bytes read, 1 more expected)', IncompleteRead(0 bytes read, 1 more expected))
