djlight / pdfgrabber

This project is forked from felixfrog/pdfgrabber

Download PDF books from bSmart, Pearson, Booktab/Kitaboo, Scuolabook, Oxford, and many more!

pdfgrabber's Introduction

What is this?

This vendor-agnostic script downloads PDFs (and covers) from different services.

State of features

Each downloader supports a different set of features, and more may be implemented in the future. As of now, every service has a way to download a best-quality PDF, with varying degrees of quality, speed and discretion. Here's a table summarizing all the features.

| service | pdf download | table of contents | pdf size | max logins | page labels | login expire | cover | rendered |
|---|---|---|---|---|---|---|---|---|
| Scuolabook | perfect | yes (very small) | excellent | very restrictive | yes | never | yes | no |
| MyLim | perfect | excellent | excellent | no | not yet/depends on vendor | ? | yes | no |
| Pearson eText | perfect | (depends on vendor)/excellent | big | no | no/depends on vendor | very fast (30 min) | yes | sometimes |
| Pearson Reader+ | yes | good/very good | big | no | yes | very fast (30 min) | yes | no |
| bSmart | yes | yes (very small) | very big (100+ mb) | no | yes | ? | yes | no |
| Mondadori HUB Scuola | yes | yes | very big (100+ mb) | no | yes (disable because glitches) | ? | yes | no |
| MEE2 | yes | yes (very small) | good/excellent | no/1token4ever | no | never | yes (?) | no |
| easyeschool | yes | yes (very small) | excellent | no/1token4ever | no | never | yes | no |
| Zanichelli Booktab | yes | yes | good/average | yes | yes | ? | yes | no |
| Zanichelli Kitaboo | yes | yes | average/big | yes | yes | ? | yes | yes |
| Oxford Learner’s Bookshelf | yes | yes (small) | very big | ? | not yet/no | ? | yes | no |
| Laterza diBooK | yes | yes/(depends on vendor) | excellent | ? | not yet/no | ? | yes | no |
| Raffaello Player | yes | yes (incomplete bc no samples :-( ) | very big | ? | yes | ? | yes | no |

Apps that I am aware of but can't work with because I don't have any books:

  • Raffaello player (done)
  • Appbook (might be a shitty HTML webview like Booktab)

TODO

  • Add anonymous user
  • Add ability to pass options to scripts
  • Add page labels and "perfect" token checks
  • General code quality improvement (better management of exceptions)
  • Use pathlib for better Windows compatibility

Installation

You need Python 3.10+. To install all the requirements, run python3 -m pip install -r requirements.txt. The script has only been tested on macOS; open an issue or pull request if you have tried it on Linux/Windows or run into problems.

Quirks

Kitaboo/Reader+ books

For Kitaboo and RPLUS_EPUB books, the script uses Playwright (Chromium automation) to render the HTML pages. You might want to run playwright install chromium to download Chromium. No further setup is needed.
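
As a rough illustration of what rendering an HTML page with Playwright can look like, here is a minimal sketch using Playwright's synchronous Python API. The function name render_page_to_pdf and the idea of rendering one local HTML file per call are assumptions made for this example; pdfgrabber's own rendering code may be organized quite differently.

```python
from pathlib import Path


def render_page_to_pdf(html_file: Path, pdf_file: Path) -> None:
    """Render a local HTML file to a PDF with headless Chromium (hypothetical helper)."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        page = browser.new_page()
        # as_uri() needs an absolute path, hence resolve().
        page.goto(html_file.resolve().as_uri())
        # page.pdf() is only supported by headless Chromium.
        page.pdf(path=str(pdf_file), print_background=True)
        browser.close()
```

Calling render_page_to_pdf(Path("page.html"), Path("page.pdf")) requires both the playwright package and the Chromium download mentioned above.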

Scuolabook

Scuolabook has a very strict login system: you can have only 2 devices logged in, and you get only 2 deletions per year. This means that you can log in only 4 times a year, and once you hit this limit there is no way of downloading books (at least, the PDF version). pdfgrabber should save the token for you, but you should also keep a copy somewhere safer, such as a text document.
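
If you want to keep your own backup of a token, something along these lines would do. This is only a sketch: the JSON file layout and the helper names save_token and load_token are assumptions for illustration, not how pdfgrabber stores tokens internally.

```python
import json
from pathlib import Path
from typing import Optional


def save_token(store: Path, service: str, token: str) -> None:
    """Add or update one service's token in a small JSON backup file."""
    tokens = json.loads(store.read_text(encoding="utf-8")) if store.exists() else {}
    tokens[service] = token
    store.write_text(json.dumps(tokens, indent=2), encoding="utf-8")


def load_token(store: Path, service: str) -> Optional[str]:
    """Return the saved token for a service, or None if there is none."""
    if not store.exists():
        return None
    return json.loads(store.read_text(encoding="utf-8")).get(service)
```

Since a Scuolabook token never expires but logins are so scarce, keeping the backup file outside the project folder avoids losing it when the folder is deleted.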

Pearson

Pearson has a terrible double-service combo, eText and Reader+. Their login system is a gigantic mess, so the tokens expire very quickly (the ability to refresh them still needs to be implemented). This isn't too big of a problem in practice, since you have unlimited logins.

If your login fails even though you are sure your username and password are right, you need to accept the terms & privacy policy for one of the two services. Just download the Reader+ app and log in once. It will ask you to accept the terms & conditions.

If the download of an eText book gets stuck in a loop where it keeps retrying forever, the problem is an overly strict caching policy on Pearson's servers. It is problematic even for the official app, which just spits out an error. Just try again later ;-)

Both RPLUS_PDF and RPLUS_EPUB books are stored in a password-protected zip file. Python's zipfile module doesn't do the decryption natively (it is implemented in pure Python rather than C), so the "Extracting zip" phase might take a (very) long time for some big books. If you have found a decent workaround, let me know.
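
For reference, extracting a password-protected zip with the standard library looks roughly like the sketch below. The function name extract_protected_zip is made up for this example, and how the password is obtained is left out entirely. This is the slow pure-Python path described above, not a workaround for it.

```python
import zipfile
from pathlib import Path


def extract_protected_zip(archive: Path, dest: Path, password: str) -> list[str]:
    """Extract every member of a password-protected zip and return their names."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # pwd must be bytes; decryption runs in pure Python, so big archives are slow.
        zf.extractall(path=dest, pwd=password.encode("utf-8"))
    return names
```

Note that zipfile only handles the legacy zip encryption scheme, so archives using other encryption methods would need a different tool.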

How to use it

Just run python3 main.py. You first need to create an account by selecting r in the first menu. After you have created an account, select d and the menus will guide you. The output file will be files/<service>/<id>.pdf.
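
If you want to locate a downloaded book from another script, the output path can be built with pathlib, which also keeps it portable across operating systems. The helper name output_path and the example service and id values are hypothetical; only the files/<service>/<id>.pdf layout comes from the description above.

```python
from pathlib import Path


def output_path(service: str, book_id: str, root: Path = Path("files")) -> Path:
    """Build the expected location of a downloaded book: <root>/<service>/<id>.pdf."""
    return root / service / f"{book_id}.pdf"


# Example with made-up values; the separator adapts to the current OS.
book = output_path("bsmart", "1234")
print(book)
```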

Disclaimer

This script is provided "as is", without any kind of warranty. I am not responsible for any harm or nuclear war that this may cause.

pdfgrabber's People

Contributors

felixfrog, bbalduzz