
pnytter's Introduction

pnytter

A Python library for scraping Twitter using one or more Nitter instances.

About Nitter & Pnytter

From Nitter's GitHub repository description: "A free and open source alternative Twitter front-end focused on privacy and performance".

Pnytter is a Python library that performs requests to Nitter instances to fetch data from Twitter, requiring no official API credentials and, theoretically, no rate limits.

Features

This project currently features the following:

  • Supported methods:
    • Get Twitter profile data, by username
    • Get all the Tweets from a profile, by username, within a date range
    • Get a single Tweet's data, by Tweet ID
  • Technical details:
    • Usage of multiple Nitter instances (chosen randomly for each request)
    • Return data using Pydantic objects
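The random instance rotation described above can be sketched as follows. This is a minimal illustration, assuming that an instance added with `times=n` is stored n times in a list and that each request picks one entry uniformly at random, so duplicates act as weights. The `InstancePool` class is hypothetical and not part of Pnytter's API.

```python
import random
from typing import List, Optional


class InstancePool:
    """Hypothetical pool of Nitter instance URLs (not Pnytter's actual class)."""

    def __init__(self, instances: List[str], rng: Optional[random.Random] = None) -> None:
        # Copy the list so the caller's list is never mutated
        self._instances = list(instances)
        # Allow injecting a seeded RNG for reproducible behaviour
        self._rng = rng or random.Random()

    def add_instance(self, url: str, times: int = 1) -> None:
        # Repeating an entry makes it proportionally more likely to be chosen
        if times < 1:
            raise ValueError("times must be >= 1")
        self._instances.extend([url] * times)

    def choose(self) -> str:
        # Uniform choice over the list; repeated entries act as weights
        if not self._instances:
            raise RuntimeError("at least one Nitter instance is required")
        return self._rng.choice(self._instances)

    @property
    def instances(self) -> List[str]:
        # Return a copy to keep the internal list private
        return list(self._instances)
```

With `InstancePool(["https://nitter.net"])` followed by `add_instance("https://nitter.pussthecat.org", times=2)`, the second instance would be picked roughly two times out of three.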

The features are tied to the development of my twitterscraper. New features may be requested through Issues or (preferably) Pull Requests.

Requirements

  • Python >= 3.7
  • Requirements listed in requirements.txt
  • Hosting your own Nitter instance is recommended for intensive use, to avoid overloading the public instances. To avoid incompatibilities, use a Nitter version that matches the release date of the Pnytter version being used.

Installing

The package is available on PyPI.

# Virtual environment recommended
pip install pnytter

Usage

from pnytter import Pnytter
import pprint

# The Pnytter object needs at least 1 Nitter instance to work, but these can be added after initialization
pnytter = Pnytter(
  nitter_instances=["https://nitter.net"]
)

# Method to add instances to a Pnytter object. The `times` kwarg repeats the instance to increase its chances of being used
pnytter.add_instance("https://nitter.pussthecat.org", times=2)



# Find the data from a single user
user = pnytter.find_user("jack")
pprint.pp(user.dict())
# {'id': 12,
#  'username': 'jack',
#  'fullname': 'jack',
#  'biography': '#bitcoin',
#  'verified': True,
#  'joined_datetime': datetime.datetime(2006, 3, 21, 20, 50, tzinfo=datetime.timezone.utc),
#  'stats': {'tweets': 28602,
#            'following': 4573,
#            'followers': 6419102,
#            'likes': 35210},
#  'pictures': {'profile': {'twitter_url': HttpUrl('https://pbs.twimg.com/profile_images/1115644092329758721/AFjOr-K8.jpg', scheme='https', host='pbs.twimg.com', tld='com', host_type='domain', port='443', path='/profile_images/1115644092329758721/AFjOr-K8.jpg'),
#                           'nitter_path': '/pic/pbs.twimg.com%2Fprofile_images%2F1115644092329758721%2FAFjOr-K8.jpg'},
#               'banner': {'twitter_url': HttpUrl('https://pbs.twimg.com/profile_banners/12/1584998840/1500x500', scheme='https', host='pbs.twimg.com', tld='com', host_type='domain', port='443', path='/profile_banners/12/1584998840/1500x500'),
#                          'nitter_path': '/pic/https%3A%2F%2Fpbs.twimg.com%2Fprofile_banners%2F12%2F1584998840%2F1500x500'}}}


# Find user tweets during a date range
tweets = pnytter.get_user_tweets_list("year_progress", filter_from="2022-06-01", filter_to="2022-06-25")
pprint.pp(tweets)
# [TwitterTweet(tweet_id=1539246778041745409, author='year_progress', created_on=datetime.datetime(2022, 6, 21, 14, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓▓░░░░░░░░ 47%', stats=Stats(comments=29, retweets=1066, quotes=113, likes=5497)),
#  TwitterTweet(tweet_id=1537918020118491136, author='year_progress', created_on=datetime.datetime(2022, 6, 17, 22, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓▓░░░░░░░░ 46%', stats=Stats(comments=26, retweets=984, quotes=102, likes=5866)),
#  TwitterTweet(tweet_id=1536589258370297856, author='year_progress', created_on=datetime.datetime(2022, 6, 14, 6, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓▓░░░░░░░░ 45%', stats=Stats(comments=40, retweets=1490, quotes=144, likes=7543)),
#  TwitterTweet(tweet_id=1535275600482816000, author='year_progress', created_on=datetime.datetime(2022, 6, 10, 15, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓▓░░░░░░░░ 44%', stats=Stats(comments=21, retweets=937, quotes=95, likes=5879)),
#  TwitterTweet(tweet_id=1533946844497199104, author='year_progress', created_on=datetime.datetime(2022, 6, 6, 23, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓░░░░░░░░░ 43%', stats=Stats(comments=42, retweets=1090, quotes=121, likes=7327)),
#  TwitterTweet(tweet_id=1532633192020205570, author='year_progress', created_on=datetime.datetime(2022, 6, 3, 8, 0, tzinfo=datetime.timezone.utc), text='▓▓▓▓▓▓░░░░░░░░░ 42%', stats=Stats(comments=31, retweets=1152, quotes=165, likes=7021))]



# Find single tweet
tweet = pnytter.get_tweet(1539246778041745409)
pprint.pp(tweet.dict())
# {'tweet_id': 1539246778041745409,
#  'author': 'year_progress',
#  'created_on': datetime.datetime(2022, 6, 21, 14, 0, tzinfo=datetime.timezone.utc),
#  'text': '▓▓▓▓▓▓▓░░░░░░░░ 47%',
#  'stats': {'comments': 29, 'retweets': 1066, 'quotes': 113, 'likes': 5497}}
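The `filter_from`/`filter_to` arguments of `get_user_tweets_list` take `YYYY-MM-DD` strings. The helpers below are a hypothetical, client-side sketch of applying such a range to timezone-aware tweet timestamps, assuming both bounds are inclusive whole days in UTC; this is not necessarily how Pnytter or Nitter apply the filter.

```python
from datetime import date, datetime, timezone
from typing import Iterable, List


def in_date_range(created_on: datetime, filter_from: str, filter_to: str) -> bool:
    """Return True if created_on falls on a UTC day within [filter_from, filter_to]."""
    start = date.fromisoformat(filter_from)
    end = date.fromisoformat(filter_to)
    # Normalize to UTC before taking the calendar day
    day = created_on.astimezone(timezone.utc).date()
    return start <= day <= end


def filter_by_date(timestamps: Iterable[datetime], filter_from: str, filter_to: str) -> List[datetime]:
    # Keep only the timestamps inside the inclusive date range
    return [t for t in timestamps if in_date_range(t, filter_from, filter_to)]
```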

Known issues

Unfixable

  • Certain tweets are not available in certain regions for legal reasons. The get_tweet method can be forced to query every available Nitter instance until the tweet is found on one of them.
  • Instances running certain versions of Nitter may not be compatible with the current Pnytter codebase. It is recommended to run or use a Nitter instance on an up-to-date version, or on a version matching the release date of the targeted Pnytter version.
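The fallback described in the first item, querying instances one by one until a region-restricted tweet is found, can be sketched as follows. `fetch_from_all` and the `fetch` callable are hypothetical stand-ins for illustration, not Pnytter's API; the actual option that enables this behaviour in `get_tweet` is not named in this README.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def fetch_from_all(instances: Iterable[str], fetch: Callable[[str], Optional[T]]) -> Optional[T]:
    """Try each instance in order; return the first non-None result, or None."""
    seen = set()
    for url in instances:
        # Skip duplicate entries (e.g. instances added with times=n)
        if url in seen:
            continue
        seen.add(url)
        try:
            result = fetch(url)
        except Exception:
            # Treat any failure (network, HTTP, parsing) as "not available here"
            continue
        if result is not None:
            return result
    # No instance could serve the tweet
    return None
```

Deduplicating first means each distinct instance is queried at most once, however heavily it was weighted for random selection.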

Changelog

Versions 0.y.z are expected to be unstable, and the API may change on Minor (y) releases.
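Under this convention, two 0.y.z releases can only be assumed API-compatible when they share the same minor version. A hypothetical helper capturing that rule, assuming plain `major.minor.patch` version strings without pre-release suffixes; the rule used for 1.0.0 and later is an assumption based on standard semantic versioning, not something this changelog states.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    # Split "major.minor.patch" into integers; pre-release tags are not supported
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def api_compatible(installed: str, target: str) -> bool:
    """Return True if code written against `target` should work with `installed`."""
    i_major, i_minor, _ = parse_version(installed)
    t_major, t_minor, _ = parse_version(target)
    if i_major != t_major:
        return False
    if i_major == 0:
        # 0.y.z: the API may change on any minor release
        return i_minor == t_minor
    # Assumption (standard semver): a newer minor release stays backward-compatible
    return i_minor >= t_minor
```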

  • 0.2.1
    • Fix error when a profile does not have a picture and/or banner. Breaking: the TwitterProfile.id and TwitterProfile.Pictures.* fields are now optional, as they may be missing when the profile lacks a picture or banner.
    • Fix error when a profile does not have a biography.
  • 0.1.1
    • Get tweet stats (count of comments, retweets, quotes, likes)
    • Allow configuring Nitter instances after Pnytter initialization
  • 0.0.1
    • Initial release:
      • Get profile by username: id, username, fullname, biography, verified, when joined, stats (count of tweets, following, followers, likes), pictures (profile, banner)
      • Get profile tweets in date range (tweet id, author, when posted, text)
      • Get single tweet
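Since 0.2.1, `TwitterProfile.id` and the `TwitterProfile.Pictures.*` fields may be `None`, so calling code should check them before use. A minimal sketch, using stdlib dataclasses as hypothetical stand-ins for Pnytter's Pydantic models (field names taken from the `find_user` output in the Usage section); `banner_url` is an illustrative helper, not part of Pnytter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Picture:
    # Stand-in for one entry of `pictures` (profile or banner)
    twitter_url: str
    nitter_path: str


@dataclass
class Pictures:
    # Either picture may be missing since 0.2.1
    profile: Optional[Picture] = None
    banner: Optional[Picture] = None


@dataclass
class Profile:
    username: str
    # May be None when the profile lacks a picture/banner
    id: Optional[int] = None
    pictures: Pictures = field(default_factory=Pictures)


def banner_url(profile: Profile) -> Optional[str]:
    # Check for None instead of letting attribute access raise AttributeError
    banner = profile.pictures.banner
    return banner.twitter_url if banner is not None else None
```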

pnytter's People

Contributors

david-lor, pluja


pnytter's Issues

Crash when no picture is available

Scraping a profile without a banner picture or a profile picture crashes. It should return something like False or None if no image is found.

The issue is in TwitterURL.from_nitter_path. It crashes when it receives a None value (NoneType), which happens when, for example, self.soup.find("div", class_="profile-banner").find("a").get("href") on line 81 of nitter_parser/profile.py finds nothing.
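A None-safe conversion from a Nitter picture path to the original Twitter URL could look like the sketch below. It is a hypothetical standalone function, not Pnytter's `TwitterURL.from_nitter_path`, and it assumes the two path shapes seen in the `find_user` output: `/pic/` followed by a percent-encoded URL, with or without the `https://` scheme.

```python
from typing import Optional
from urllib.parse import unquote

NITTER_PIC_PREFIX = "/pic/"


def twitter_url_from_nitter_path(nitter_path: Optional[str]) -> Optional[str]:
    """Decode a Nitter '/pic/...' path into a Twitter media URL, or return None."""
    # Missing element on the page (e.g. no banner): return None instead of crashing
    if not nitter_path:
        return None
    # Unexpected format: also return None rather than guessing
    if not nitter_path.startswith(NITTER_PIC_PREFIX):
        return None
    decoded = unquote(nitter_path[len(NITTER_PIC_PREFIX):])
    # Some paths encode the full URL, others only host + path
    if not decoded.startswith(("http://", "https://")):
        decoded = "https://" + decoded
    return decoded
```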

Some users do not return any tweets

I don't really know what is causing it; I am trying to figure it out. But I have found that some users do not show any tweets when scraped by Pnytter: it just returns an empty list, while on Nitter you can see all of their tweets.

Here is an example account where I am facing the problem: thomasfred2584
