twarc

twarc is a command line tool for archiving the tweets in a Twitter search result. Twitter search results live for a week or so, and are highly volatile. Results are stored as line-oriented JSON (each line is a complete JSON document), exactly as received from the Twitter API. twarc handles rate limiting and paging through large result sets. It also handles repeated runs of the same query by using the most recent tweet from the previous run to determine when to stop.
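As a rough illustration of the line-oriented format and the "most recent tweet" stopping point, here is a minimal sketch. It is not twarc's actual code: the function names are made up for this example, and it assumes only that each tweet carries a numeric "id" field that grows over time.

```python
import io
import json


def read_tweets(lines):
    """Yield one parsed tweet per non-empty line of line-oriented JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def most_recent_id(lines):
    """Return the largest tweet id seen, or None for an empty archive."""
    ids = [int(tweet["id"]) for tweet in read_tweets(lines)]
    return max(ids) if ids else None


# Two made-up tweets standing in for an archive file from an earlier run.
sample = io.StringIO('{"id": 101, "text": "first"}\n{"id": 205, "text": "second"}\n')
print(most_recent_id(sample))
```

A later run of the same query could stop paging once it reaches a tweet whose id is less than or equal to this value.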

twarc was originally created to save tweets related to Aaron Swartz.

How To Use

  1. pip install -r requirements.txt
  2. cp config.py.example config.py
  3. add Twitter API credentials to config.py
  4. ./twarc.py aaronsw
  5. cat aaronsw.json
  6. :-(

Scrape Mode

If you pass the --scrape option to twarc, it will use search.twitter.com to discover tweet IDs, and then use the Twitter REST API to fetch the JSON for each tweet.

Twitter Search now supports drilling backwards in time, past the week cutoff of the REST API. Since individual tweets are still retrieved with the REST API, rate limits apply--so this is quite a slow process. Still, if you are willing to let it run for a while it can be useful to query for older tweets, until the official search REST API supports a more historical perspective.
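To show why fetching tweets one at a time under a rate limit is slow, here is a hedged sketch of a paced fetch loop. Everything in it is hypothetical: `fetch_all` and the `fetch` callable are illustrative names, and the window numbers are placeholders rather than Twitter's real limits or twarc's own logic.

```python
import time


def fetch_all(tweet_ids, fetch, requests_per_window=180, window_seconds=900,
              sleep=time.sleep):
    """Call fetch(tweet_id) for every id, pausing between calls.

    The pause spreads requests evenly so that no more than
    requests_per_window calls are started in any window_seconds span.
    """
    delay = window_seconds / requests_per_window
    results = []
    for position, tweet_id in enumerate(tweet_ids):
        if position > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(tweet_id))
    return results


def fake_fetch(tweet_id):
    """Stand-in for a real API call; returns a made-up tweet."""
    return {"id": tweet_id, "text": "placeholder"}


# Tiny window so the demonstration finishes quickly.
tweets = fetch_all([1, 2, 3], fake_fetch, requests_per_window=2, window_seconds=0.2)
print(len(tweets))
```

With the placeholder values of 180 requests per 900 seconds, the loop waits 5 seconds between tweets, so a thousand tweet IDs would take well over an hour.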

Utils

In the utils directory there are some simple command line utilities for working with the JSON dumps, such as printing out the archived tweets as text or HTML, extracting the usernames and referenced URLs, and the like. If you create a script that is handy please send me a pull request :-)
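A utility of this kind can be very small. The sketch below is not one of the scripts in the utils directory; it is a hypothetical example that assumes each tweet stores its author under "user" and "screen_name", and it skips any tweet that lacks those fields.

```python
import io
import json


def usernames(lines):
    """Yield the author's screen name for each tweet that has one."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # "user" may be missing or null, so fall back to an empty dict.
        name = (tweet.get("user") or {}).get("screen_name")
        if name:
            yield name


# Made-up archive lines standing in for a real JSON dump.
sample = io.StringIO(
    '{"id": 1, "user": {"screen_name": "edsu"}}\n'
    '{"id": 2, "user": {"screen_name": "nasa"}}\n'
)
for name in usernames(sample):
    print(name)
```

To accept either a file argument or piped input, the same function could be fed from `fileinput.input()` in the standard library instead of the sample data.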

For example, let's say you want to create a wall of tweets that mention 'nasa':

% ./twarc.py nasa
% utils/wall.py nasa-20130306102105.json > nasa.html

If you want the tweets ordered from oldest to latest (tail -r is the BSD/macOS form; on GNU/Linux, tac reverses lines the same way):

% tail -r nasa-20130306102105.json | utils/wall.py > nasa.html

Or if you want to create a word cloud of the tweets you collected about nasa:

% ./twarc.py nasa
% utils/wordcloud.py nasa-20130306102105.json > nasa-wordcloud.html

Or if you want to select only the tweets that look like they were from women, and create a word cloud from them:

% ./twarc.py nasa
% utils/gender.py --gender female nasa-20130306102105.json | utils/wordcloud.py > nasa-female.html

License

  • CC0

Contributors

  • edsu
  • lsblakk