hhromic / python-twitter-toolbox

Twitter Toolbox for Python.

License: Apache License 2.0

Python 100.00%
twitter-api toolbox streaming-api command-line-tool programmer-tool

python-twitter-toolbox's Introduction

Twitter Toolbox for Python

Often we need to interact with the Twitter APIs to grab some data for research purposes or simple curiosity.

The Twitter API is rich and powerful; however, for inexperienced users it can be tedious, cumbersome, and tricky to code against, especially if you just want quick and reliable access to the API's methods!

For all those users who just want zero programming, this Twitter Toolbox might be very handy. And for those users that want more programmatic access, this Toolbox is also suitable and helpful!

All you need to do to easily start working with the Twitter APIs is to:

  1. Sign up for your own Twitter App.
  2. Configure the Toolbox with your generated personal access credentials.
  3. Use the provided command-line tools.
  4. (optional) use the provided higher-level Toolbox API for Python in your own code.

Want to grab the list of followers of user @insight_centre? No problem:

tt-users-get-followers --screen-name insight_centre --output-file followers.ids

Want to turn those user IDs into fully hydrated Twitter User objects? No problem:

tt-users-get-hydrated --user-ids followers.ids --output-file followers.json

Want to receive some real-time Tweets about obama or mentioning @realDonaldTrump? No problem:

tt-streaming-get-filter --track obama @realDonaldTrump --output-file tweets.json

Want to see the text of a real-time sample of Tweets, and you have the jq tool installed? No problem:

tt-streaming-get-sample | jq .text

As seen, you can omit the --output-file argument to get data into your standard output pipe.
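If you would rather do the same filtering in Python than in jq, the idea can be sketched as below. This is only an illustration and assumes the tools write one JSON object per line, which the jq example suggests but the text does not state outright. The helper name iter_tweet_texts is made up for this example and is not part of the Toolbox.

```python
import io
import json


def iter_tweet_texts(lines):
    """Yield the "text" field of each JSON object found in an iterable of lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if "text" in tweet:
            yield tweet["text"]


# Stand-in for a stream of Tweets; real input could be sys.stdin or an output file.
sample = io.StringIO(
    '{"id": 1, "text": "hello"}\n'
    "\n"
    '{"id": 2, "text": "world"}\n'
)
texts = list(iter_tweet_texts(sample))
print(texts)
```

Unlike jq .text, this sketch silently skips objects that have no text field instead of printing null for them.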

Finally, many tools have a bulk processing variant that lets you download data in batches. For example, if you have a list of user IDs stored in a file, you can download the follower IDs for each of them into separate files under a directory using just one command:

tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt

In case of any errors, simply run the command again and it will resume the bulk processing from where it left off.

Installation

You can use pip (or any PyPI-compatible package manager) for installation:

pip install twitter-toolbox

or, if you prefer a local user installation:

pip install --user twitter-toolbox

For Microsoft Windows users, you might need to run pip through the Python interpreter:

python -m pip install twitter-toolbox

Configuration File

The Twitter Toolbox is globally configured through a file named .twtoolbox.cfg in your home directory (please note the leading period), written in Python's simple configuration language.

You can easily create a minimal basic configuration from your Twitter API access credentials using the tt-config command-line tool. Example usage:

$ tt-config
WARNING: this tool will create a **NEW** config file and
  overwrite any existing previous configuration.

Consumer Key ...... : <INPUT YOUR CONSUMER KEY HERE>
Consumer Secret ... : <INPUT YOUR CONSUMER SECRET HERE>
Access Token Key .. : <INPUT YOUR ACCESS TOKEN KEY HERE>
Access Token Secret : <INPUT YOUR ACCESS TOKEN SECRET HERE>

After you input your authentication data, a new minimal configuration file will be created in your home directory (replacing any previous existing file!).

You can further customize this file using the following configuration sections and options:

  • [twitter]: (required) for configuring your own Twitter API's access credentials. Options: consumer_key, consumer_secret, access_token_key, access_token_secret.
  • [search]: for configuring access to the Tweets Search API. Options: limit.
  • [search_users]: for configuring access to the Users Search API. Options: limit.
  • [timeline]: for configuring access to the Users Timeline API. Options: limit.
  • [followers]: for configuring access to the User Followers API. Options: limit.
  • [friends]: for configuring access to the User Friends API. Options: limit.
  • [sample]: for configuring access to the Streaming API's Sample Endpoint. Options: limit.
  • [filter]: for configuring access to the Streaming API's Filter Endpoint. Options: limit.
  • [firehose]: for configuring access to the Streaming API's Firehose Endpoint. Options: limit.

All the limit options specify the maximum number of results (users, Tweets, IDs) you want to download from Twitter, with 0 meaning unlimited. Be very careful with this option: the higher the number, the sooner you will exhaust your API rate limits. It is strongly recommended that you use the Toolbox defaults.
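To make the meaning of limit concrete, here is a minimal sketch of "at most N results, with 0 meaning unlimited". It is not the Toolbox's own code. The function name apply_limit and the ID range are invented for illustration.

```python
from itertools import islice


def apply_limit(results, limit):
    """Return at most `limit` items from `results`; 0 means no cap."""
    if limit < 0:
        raise ValueError("limit must be zero or a positive integer")
    return results if limit == 0 else islice(results, limit)


follower_ids = range(1, 100001)  # made-up IDs for illustration
capped = list(apply_limit(follower_ids, 30000))
uncapped = list(apply_limit(follower_ids, 0))
print(len(capped), len(uncapped))
```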

The following is a full example of a suitable configuration file. You can omit any sections or options for which you want the defaults to be used. The bare minimum is the [twitter] section with your configured API credentials.

[twitter]
consumer_key=YOUR_CONSUMER_KEY_HERE
consumer_secret=YOUR_CONSUMER_SECRET_HERE
access_token_key=YOUR_ACCESS_TOKEN_KEY_HERE
access_token_secret=YOUR_ACCESS_TOKEN_SECRET_HERE

[search]
limit = 0

[search_users]
limit = 1000

[timeline]
limit = 0

[followers]
limit = 30000

[friends]
limit = 30000

[sample]
limit = 0

[filter]
limit = 0

[firehose]
limit = 0

The option values under the [twitter] section must be replaced by your own Twitter App credentials.

If the configuration file, or any section or option in it, is not specified, built-in defaults are used.
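The fallback behaviour can be pictured with a short sketch. It assumes the file can be read with the standard library's configparser module, which matches the format shown above but is an assumption about the Toolbox internals. The values in ASSUMED_DEFAULTS are placeholders chosen for this example, not the Toolbox's real defaults.

```python
import configparser

# Hypothetical fallback values, used only for this illustration.
ASSUMED_DEFAULTS = {"followers": 30000, "search": 0}

CONFIG_TEXT = """
[twitter]
consumer_key=YOUR_CONSUMER_KEY_HERE
consumer_secret=YOUR_CONSUMER_SECRET_HERE
access_token_key=YOUR_ACCESS_TOKEN_KEY_HERE
access_token_secret=YOUR_ACCESS_TOKEN_SECRET_HERE

[search]
limit = 500
"""


def get_limit(parser, section):
    """Read `limit` from a section, falling back when the section or option is missing."""
    return parser.getint(section, "limit", fallback=ASSUMED_DEFAULTS[section])


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

search_limit = get_limit(parser, "search")        # set explicitly in the file
followers_limit = get_limit(parser, "followers")  # section absent, fallback used
print(search_limit, followers_limit)
```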

Tools for the Streaming API

  • tt-streaming-get-sample
  • tt-streaming-get-filter
  • tt-streaming-get-firehose

All tools have an --output-file argument. If omitted, the standard output pipe is used.

Additionally, all tools also have a --resume flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

tt-streaming-get-sample --output-file tweets.json
tt-streaming-get-filter --track obama trump --follow 6456345 --resume
tt-streaming-get-filter --locations -122.75 36.8 -121.75 37.8 -74 40 -73 41
tt-streaming-get-firehose
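The --locations values in the example above read more easily once grouped. Twitter's filter endpoint documents locations as bounding boxes given as longitude/latitude pairs, southwest corner first. The sketch below assumes the tool passes the numbers through in that order. The helper to_bounding_boxes is hypothetical and only illustrates the grouping.

```python
def to_bounding_boxes(coordinates):
    """Group a flat list of numbers into (west, south, east, north) tuples."""
    if len(coordinates) % 4 != 0:
        raise ValueError("expected four numbers per bounding box")
    return [tuple(coordinates[i:i + 4]) for i in range(0, len(coordinates), 4)]


boxes = to_bounding_boxes([-122.75, 36.8, -121.75, 37.8, -74, 40, -73, 41])
for west, south, east, north in boxes:
    print(f"longitude {west}..{east}, latitude {south}..{north}")
```

Read this way, the example asks for two areas: one around San Francisco and one around New York City.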

Tools for Tweets

  • tt-tweets-get-hydrated
  • tt-tweets-get-retweets
  • tt-tweets-get-timeline
  • tt-tweets-search

All tools have an --output-file argument. If omitted, the standard output is used.

Additionally, all tools also have a --resume flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

tt-tweets-get-hydrated --tweet-ids tweet_ids.txt --output-file tweets.json
tt-tweets-get-retweets --tweet-id 64563457564
tt-tweets-get-timeline --screen-name insight_centre
tt-tweets-search --query "twitter api" --resume
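The difference between truncating and resuming comes down to how the output file is opened. The following is a minimal sketch of that idea under the assumption that --resume maps to append mode. The function open_output is invented for this example and is not the Toolbox's implementation.

```python
import os
import tempfile


def open_output(path, resume=False):
    """Open an output file, appending when resume is set and truncating otherwise."""
    return open(path, "a" if resume else "w", encoding="utf-8")


path = os.path.join(tempfile.mkdtemp(), "tweets.json")

with open_output(path) as writer:                # first run: file is created
    writer.write('{"id": 1}\n')
with open_output(path, resume=True) as writer:   # resumed run: data is appended
    writer.write('{"id": 1}\n')                  # the duplicate is kept as-is

with open(path, encoding="utf-8") as reader:
    lines = reader.read().splitlines()
print(lines)
```

The same record ends up in the file twice, which is why you may want to de-duplicate the output yourself after resuming.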

Tools for Twitter Users

  • tt-users-get-hydrated
  • tt-users-get-followers
  • tt-users-get-friends
  • tt-users-search

All tools have an --output-file argument. If omitted, the standard output is used.

Additionally, all tools also have a --resume flag to indicate that you want to append data to an existing output file instead of truncating it. Beware that this option does not de-duplicate existing data.

Example usage:

tt-users-get-hydrated --user-ids user_ids.txt --screen-names screen_names.txt
tt-users-get-followers --user-id 54252345
tt-users-get-friends --screen-name insight_centre --resume
tt-users-search --query "rte" --output-file users.json

Tools for Bulk Processing

  • tt-tweets-bulk-get-retweets
  • tt-tweets-bulk-get-timeline
  • tt-tweets-bulk-search
  • tt-users-bulk-get-followers
  • tt-users-bulk-get-friends
  • tt-users-bulk-search

All tools have an --output-dir argument. The directory is automatically created if not found. Some tools support resuming the bulk processing according to existing files in the output directory.

Example usage:

tt-tweets-bulk-get-retweets --output-dir retweets --tweet-ids tweet_ids.txt
tt-tweets-bulk-get-timeline --output-dir timelines --screen-names screen_names.txt
tt-tweets-bulk-search --output-dir searches --queries queries.txt
tt-users-bulk-get-followers --output-dir followers --user-ids user_ids.txt
tt-users-bulk-get-friends --output-dir friends --screen-names screen_names.txt
tt-users-bulk-search --output-dir searches --queries queries.txt
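One way resuming "according to existing files in the output directory" could work is to skip every input that already has an output file. The sketch below shows that idea only. The one-file-per-ID naming scheme, the .ids suffix, and the function pending_ids are all assumptions made for illustration and may differ from what the tools actually do.

```python
import os
import tempfile


def pending_ids(output_dir, ids, suffix=".ids"):
    """Return the IDs that do not yet have an output file in output_dir."""
    os.makedirs(output_dir, exist_ok=True)  # create the directory if not found
    done = set(os.listdir(output_dir))
    return [i for i in ids if f"{i}{suffix}" not in done]


output_dir = os.path.join(tempfile.mkdtemp(), "followers")
user_ids = [1635345, 645648754, 54252345]

first_run = pending_ids(output_dir, user_ids)
# Simulate a run that finished one user before failing.
with open(os.path.join(output_dir, "1635345.ids"), "w") as handle:
    handle.write("")
second_run = pending_ids(output_dir, user_ids)
print(first_run, second_run)
```

A scheme like this treats a partially written file as finished, so a real implementation would probably write to a temporary name and rename the file once the download completes.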

Toolbox API

The Twitter Toolbox is contained in the twtoolbox module. The command-line tools above are wrappers around the functions listed below. The same semantics apply, including reading the configuration file.

Streaming API

The following functions are available in the streaming submodule:

  • get_sample(writer)
  • get_filter(writer, follow=None, track=None, locations=None)
  • get_firehose(writer)

Example usage:

from twtoolbox import streaming

with open("tweets.json", "w") as writer:
    streaming.get_filter(writer, track=["obama"])

Tweets

The following functions are available in the tweets submodule:

  • get_hydrated(writer, tweet_ids)
  • get_retweets(writer, tweet_id)
  • get_timeline(writer, user_id=None, screen_name=None, since_id=0)
  • search(writer, query, since_id=0)
  • bulk_get_retweets(output_dir, tweet_ids)
  • bulk_get_timeline(output_dir, user_ids=None, screen_names=None)
  • bulk_search(output_dir, queries)

Example usage:

from twtoolbox import tweets

with open("tweets.json", "w") as writer:
    tweets.search(writer, query="twitter api")

tweets.bulk_get_retweets("retweets", [768585599271993344, 768585794458120192])

Users

The following functions are available in the users submodule:

  • get_hydrated(writer, user_ids=None, screen_names=None)
  • get_followers(writer, user_id=None, screen_name=None)
  • get_friends(writer, user_id=None, screen_name=None)
  • search(writer, query)
  • bulk_get_followers(output_dir, user_ids=None, screen_names=None)
  • bulk_get_friends(output_dir, user_ids=None, screen_names=None)
  • bulk_search(output_dir, queries)

Example usage:

from twtoolbox import users

with open("followers.txt", "w") as writer:
    users.get_followers(writer, screen_name="twitter")

users.bulk_get_friends("friends", user_ids=[1635345, 645648754])

License

This software is under the Apache License 2.0.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

python-twitter-toolbox's Issues

Occasional crash while listening to the Streaming API

Hello all,
there is a bug in Tweepy that produces an occasional crash after listening to the Streaming API for a long enough time. The problem was reported upstream in Tweepy some time ago:
tweepy/tweepy#576

I will try to think of a way to circumvent the problem, but the real fix has to come from the Tweepy developers.

Cheers!
Hugo.
