jameskang410 / scraping-netflix

A Python API that scrapes movie information from Netflix. A nice substitute for their now-privatized API. Used by http://www.tomatoflix.com

Python 100.00%

scraping-netflix's Introduction

Scraping Netflix - The Unofficial Way!

A Python class that scrapes information about Netflix movies that are available for streaming. Used by [www.tomatoflix.com][1]

Warning: Netflix's APIs have changed and this package has not yet been updated to follow those changes.

Why?

This project started because I wanted to create [Tomatoflix][1], an interactive website that helps lazy people like me find random Netflix movies to watch. I was surprised to find out that Netflix privatized their API. I took matters into my own hands and decided to forge a Netflix API of my own.

Requirements

  • Python 3
  • Modules:
    • BeautifulSoup
    • Requests
    • Fuzzywuzzy (Not very impressed with this one... Open to fuzzy matching alternatives. But it'll do for now.)

Installation

Clone this repository to your local computer and it should be good to go. Making it installable via pip is a work in progress.

Instructions

from netflix import *

# Insert netflix ID as a raw string
# To find Netflix ID:
# Sign into Netflix > Chrome Developer Tools > Resources > Cookies > www.netflix.com > NetflixId
netflix_id = r'INSERT NETFLIX ID HERE'

movie = Netflix(netflix_id)

# Initialization only has to be done once.
# This method creates JSON files for all of the major genres; later calls pull their data from these files.
movie.initialize()

"""
>Genres were successfully downloaded as JSON files
"""

# search() checks whether the movie is available on Netflix streaming.
# Other methods are chained to search() and return specific information about the movie.
movie.search('Jerry Maguire').duration()
movie.search('Jerry Maguire').netflix_rating()

"""
>Movie was found
>2hr 18m
>3.6 stars
"""

Check out the example.py file. E-mail any specific questions to [email protected]

All Available Functions

[1]: http://www.tomatoflix.com

| __Functions__ | __Return Data Type__ | __Description__ |
| --- | --- | --- |
| initialize(_netflix\_id\_as\_string_) | None | Creates a JSON file for each movie, organized by genre. This method has to be run __only once__ and __should not be run after the JSON files have been pulled successfully__. This will minimize your chances of getting "caught" by Netflix (as if they don't know what we're up to...). |
| all\_titles() | List | Returns a list of every title that's available for streaming on Netflix. Loop through this list to get information about every movie. |
| search(_movie\_string_) | None | Checks if the string is a movie that is currently available on Netflix. Will return one of the following messages: `Movie was found` or `Movie could not be found. Did you mean any of the following movies?`. If the movie is not found, a list of movies that partially matched the search string will be printed to the console. __In order to find specific information about a movie, the algorithm must find a movie match.__ |
| movie\_number() | Int | Returns the Netflix movie ID number. |
| genres() | List | Returns a list of genres the movie belongs to on Netflix. |
| title() | String | Returns the title of the movie. |
| tv\_show() | String | Returns a _"Y"_ if the movie is considered a TV show. Returns a _"N"_ if it is only a movie. |
| synopsis() | String | Returns the synopsis for the movie. |
| year() | Int | Returns the year the movie was made. NOTE: This year does not always match the year listed on other movie websites like Rotten Tomatoes. |
| netflix\_rating() | String | Returns the average Netflix rating for the movie. |
| cert\_rating() | String | Returns the maturity rating for the movie. |
| actors\_list() | List | Returns a list of the prominent actors in the movie. |
| actors\_string() | String | Returns a string of the prominent actors in the movie. |
| url() | String | Returns the non Netflix member friendly URL for the movie. |
| duration() | String | Returns the duration (hours and minutes or number of seasons) of the movie or TV show. |
| box\_art() | String | Returns the URL for the small box art of the movie. |
| large\_box\_art() | String | Returns the URL for the large box art of the movie. NOTE: Because of the different layout of Netflix movie pages, this method does not always work. |
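
The requirements list mentions being open to fuzzy matching alternatives. As a rough sketch only, and not the project's actual implementation, the "Did you mean" suggestions that search() prints could be approximated with the standard library's `difflib`. The `suggest_titles` helper and the sample titles below are hypothetical; in practice the list might come from all\_titles().

```python
import difflib


def suggest_titles(query, titles, limit=5, cutoff=0.6):
    """Return titles that approximately match `query`, best match first."""
    # Compare case-insensitively, but return titles in their original casing.
    by_lowercase = {}
    for title in titles:
        by_lowercase.setdefault(title.lower(), title)
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[match] for match in matches]


# Hypothetical sample data standing in for the output of all_titles().
sample_titles = ["Jerry Maguire", "Jerry Springer", "The Matrix", "Top Gun"]
print(suggest_titles("jery maguire", sample_titles))
```

Raising `cutoff` makes the suggestions stricter; lowering it returns looser partial matches, which may be closer to what the original search() prints.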

scraping-netflix's People

Contributors

ericandrewlewis, jameskang410

scraping-netflix's Issues

No JSON returned

It seems that the 8 alphanumeric character hex string "c88e2062" changes over time (possibly as a commit or release code?), so the JSON GETs no longer work in the initialize() method. 404 errors are returned by the HTTP requests. How did you determine the URL to hit for getting the JSON?

I did only test in-browser and not via the Python program, though.

Tomatoflix no longer up

Tomatoflix, as mentioned in the README, no longer seems to be up (it's now a pregnancy food search tool - and given it says copyright 2015, congratulations on your son/daughter who would have been born by now!). Are you still using this Netflix scraper anywhere online? Does it still work?

AJAX URL Changed

This looks awesome by the way. Thanks for making it.
I tried running it and am getting this error:
Error: Was not able to download JSON data. AJAX URLs may have changed or Netflix ID may be incorrect.
I tried both the NetflixId cookie and the SecureNetflixId cookie, and neither seemed to work. I opened up netflix and clicked on browse and then selected anime, and the Chrome Network tab had this as the URL:
https://www.netflix.com/api/shakti/113b89c9/pathEvaluator?withSize=true&materialize=true&model=harris which is similar but not that similar to the URL in the code:
http://www.netflix.com/api/shakti/c88e2062/wigenre?genreId=%s&full=false&from=0&to=10000

Is this something you are willing to fix?
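
Both URLs quoted in this issue carry an 8-character hex segment after `/api/shakti/`. A minimal sketch, assuming that segment is the only part that needs refreshing, is to pull it out of a URL observed in the browser's Network tab and substitute it into the older genre URL. The helper names and the genre ID below are hypothetical, and whether the `wigenre` endpoint still responds at all is unknown.

```python
import re

# Matches the hex segment after /api/shakti/ as seen in the URLs quoted above.
# That it is always 8 hex characters is an assumption.
BUILD_ID_PATTERN = re.compile(r"/api/shakti/([0-9a-f]{8})/")

# Genre URL from the project's code as quoted above, with the hex segment
# and genre ID turned into placeholders.
GENRE_URL_TEMPLATE = (
    "http://www.netflix.com/api/shakti/{build_id}/wigenre"
    "?genreId={genre_id}&full=false&from=0&to=10000"
)


def extract_build_id(observed_url):
    """Return the hex segment from an observed shakti URL, or None."""
    match = BUILD_ID_PATTERN.search(observed_url)
    return match.group(1) if match else None


def genre_url(build_id, genre_id):
    """Fill the genre URL template with a build ID and a genre ID."""
    return GENRE_URL_TEMPLATE.format(build_id=build_id, genre_id=genre_id)


observed = (
    "https://www.netflix.com/api/shakti/113b89c9/pathEvaluator"
    "?withSize=true&materialize=true&model=harris"
)
build_id = extract_build_id(observed)
print(build_id)
print(genre_url(build_id, 1234))  # 1234 is a placeholder genre ID
```

This only rebuilds the URL string; it does not confirm that the endpoint, its parameters, or the cookie-based authentication still work.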

Issues with search(), and netflix_json

Hi,

Thanks a lot for this project, I could work around and got what I needed.
However, a few features don't work (or I didn't understand how they work).

  1. I can't figure out why I cannot get Netflix.search() to work. I get an error message and a list of suggestions. If I pass one of the suggested titles as a str, it still doesn't work and prompts another suggestion list.
  2. I only got Netflix.initialize() and Netflix.all_titles() to work (and that was sufficient for what I had to do).
  3. Other methods like Netflix.titles(), Netflix.tv_show(), or Netflix.duration() don't work. The error message says self.netflix_json is not defined. From what I read in your code, the remaining methods fail with much the same error.

Hope you could help.

On the other hand, I see the latest commit is about 5 years old. If we'd like to take over the repairs and fix it, or complete the project, what is your license policy? Do you authorize us to just fork and finish it?

Question

I saw the domain is blank. Did this shut down? Did Netflix issue a C&D?

Fan

You have no email, so this is my only way of contacting you. I wrote a scraper similar to this... I didn't make an API out of it, but it did the trick. It worked right up until Netflix started using React text on their site. I'm really curious to figure out how you got around this, because I've been trying to figure it out for a while. I've thought about using browser automation, but that feels like overkill.
