jalexw / pdf-ripper

Use this code to get a PDF from a textbook viewer by automatically looping through your textbook and taking a screenshot of each page.

Home Page: https://jalexw.github.io/pdf-ripper/

License: GNU General Public License v3.0

Python 98.17% Shell 1.83%
pdf pyautogui python

pdf-ripper's Introduction

PDF Ripper

Screenshot of PDF Ripper GUI

Use this code to rip a PDF from a desktop textbook app by automatically looping through your textbook, taking a screenshot of each page, and combining the screenshots into a PDF. This project was created to allow annotating a textbook within an iPad note-taking app.

Installation (for non-techy people)

Standalone ready-to-run executable files. Just download and run. No knowledge of Python required.

Extra Steps on Mac

You need to grant the program permission to record your screen and control your mouse/keyboard. Find these settings at:

  • System Preferences -> Security & Privacy -> Accessibility
  • System Preferences -> Security & Privacy -> Screen Recording

You will likely need to try running the application before PDF-Ripper shows up under these settings.

Installation (from source code -- for techy people)

1. Install Python3

Install Python 3.10 or later. To check whether it is already installed (or that your install succeeded), run the python3 --version command in your Terminal / Command Prompt.
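
If you would rather check from inside Python, the version requirement above can be tested with a short script. This is only a sketch: the helper name meets_minimum is made up for illustration and is not part of PDF Ripper.

```python
import sys

# Minimum version stated in this README.
MINIMUM = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```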

2. Download this repository

Download this repository (i.e. git clone https://github.com/jalexw/pdf-ripper.git). Or hit the big green button and download as a ZIP archive.

3. Install dependencies

Follow either approach 3a or 3b. I prefer the pipenv approach, but the pip approach is likely easier as it should come bundled with your Python and not require

3a. Install pipenv and run the virtual environment defined in the Pipfile

I installed pipenv using Homebrew on Mac with the command brew install pipenv

Set the repository to be your active directory (i.e. cd pdf-ripper), and run pipenv install to download all required packages defined in the Pipfile. If installing dev dependencies (such as pyinstaller for bundling the app as a standalone executable) use the --dev flag.

3b. Install dependencies with pip

Run pip install -r Requirements.txt

If the command pip isn't found, try replacing it with pip3 or python3 -m pip.

4. Running the program

After installation, use pipenv run start to start the application. Use pipenv run dev for additional logging if you're trying to add a feature or troubleshoot.

If you're using pip instead of pipenv, run:

python3 PDF-Ripper.py

Alternatively, with additional logging:

python3 PDF-Ripper.py --dev

Known Errors

  • If you get an error on Mac related to tkinter, see this StackOverflow post. Installing python-tk with Homebrew (brew install python-tk) may fix the problem.
  • On Windows you may need to set up an alias for python3 to use python. Alternatively, change the scripts in the Pipfile to use python instead of python3.
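
To find out whether the tkinter problem applies to your Python before launching the app, you can try importing the module. The sketch below is not part of PDF Ripper; module_available is a hypothetical helper name.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if module_available("tkinter"):
        print("tkinter is available")
    else:
        print("tkinter is missing; on Mac, try: brew install python-tk")
```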

Using the GUI

1. Open your textbook desktop/web application

  • Open your textbook viewing app to the textbook you would like to extract/rip.

2. Resize your browser window to get the screenshots as big as you can

  • Note: Screenshots must be taken on your primary monitor. However, it is convenient to have a second monitor where the GUI doesn't block the textbook. Otherwise, minimize the window after starting the rip and selecting the screenshot area.
  • Resize the textbook to be as large as possible.
  • Play around with zooming in/out.
  • You can open your browser's developer tools (usually 'Command + Option + I' on Mac) to resize the textbook page more precisely. Some textbook providers block right-clicking, so you can't open Developer Tools that way; in that case, open them from the menu in the top-right corner of Google Chrome or with the keyboard shortcut.

3. Start PDF Ripper Application

4. Input settings for PDF ripping

  • Enter the start/finish page here.
  • Enter whether coordinates of screenshot area should be doubled (see this issue for more details). It seems like some computers need to have the coordinates doubled, while others don't. Try both and see what works for you.
  • Choose an output directory to save the PDF in.
  • Choose how long to wait after going to a new page before taking a screenshot. If you have slow internet, give the app time to load each page so your PDF doesn't end up with loading screens in it.
  • Select the region of your screen to capture. Do this by pushing the respective button and then hovering your mouse over the top-left corner and the bottom-right corner of that area.
  • Finally, hover your mouse over the page selection menu at the bottom of the page to tell the script where it should type in new page numbers to go to the next page.
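
To make the region and coordinate-doubling settings concrete, here is a rough sketch of how two hovered corners could be turned into a screenshot region. It assumes that "doubling" means multiplying every coordinate by 2, and region_from_corners is a made-up name; the app's actual logic may differ. The (left, top, width, height) form is what pyautogui.screenshot accepts for its region argument.

```python
def region_from_corners(top_left, bottom_right, double=False):
    """Convert two (x, y) corners into a (left, top, width, height) tuple.

    `double` multiplies every coordinate by 2 (an assumption about what the
    GUI's doubling option does).
    """
    scale = 2 if double else 1
    left, top = top_left[0] * scale, top_left[1] * scale
    right, bottom = bottom_right[0] * scale, bottom_right[1] * scale
    if right <= left or bottom <= top:
        raise ValueError("bottom-right corner must be below and right of top-left")
    return (left, top, right - left, bottom - top)


print(region_from_corners((100, 50), (900, 1050)))               # (100, 50, 800, 1000)
print(region_from_corners((100, 50), (900, 1050), double=True))  # (200, 100, 1600, 2000)
```

If the resulting PDF shows only the top-left quarter of the page, or an area much larger than the page, the other doubling setting is probably the one your computer needs.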

5. The script takes control of your device

  • Hit 'Start Ripping'
  • On Mac, a pop-up should ask you to allow the script to record your screen and use accessibility services (to control the keyboard/mouse). You may need to restart the program after granting permission, so do a test run before actually trying to rip a textbook.
  • The script will take control of your mouse and keyboard while it loops through the pages and takes screenshots for the PDF! Don't touch anything while it works its magic!
  • Screenshots are combined into a PDF file at your chosen output location
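
The steps above can be sketched roughly as follows. This is an illustration, not the project's actual source: the function names are hypothetical, and it assumes that clicking the page box selects the existing page number so the typed number replaces it. The pyautogui and Pillow calls used (click, write, press, screenshot, and Image.save with save_all) are real, but they are imported lazily because they need a display and the OS permissions described earlier.

```python
import time


def rip_pages(start, finish, goto_page, capture, delay, sleep=time.sleep):
    """Visit pages start..finish (inclusive) and return one capture per page."""
    images = []
    for page in range(start, finish + 1):
        goto_page(page)   # e.g. click the page box and type the page number
        sleep(delay)      # give the viewer time to load the page
        images.append(capture())
    return images


def save_as_pdf(images, path):
    """Combine Pillow images into a single PDF, one image per page."""
    if not images:
        raise ValueError("no screenshots to save")
    # Pillow cannot write RGBA images as PDF, so convert each one first.
    pages = [img.convert("RGB") for img in images]
    pages[0].save(path, "PDF", save_all=True, append_images=pages[1:])


def rip_with_pyautogui(start, finish, region, page_box, delay, path):
    """Wire the loop to pyautogui; region is (left, top, width, height)."""
    import pyautogui  # lazy import: requires a display and screen permissions

    def goto_page(page):
        pyautogui.click(*page_box)   # focus the page-number field
        pyautogui.write(str(page))   # type the new page number
        pyautogui.press("enter")

    images = rip_pages(start, finish, goto_page,
                       lambda: pyautogui.screenshot(region=region), delay)
    save_as_pdf(images, path)
```

Keeping the loop separate from the pyautogui calls means it can be tried with stand-in functions before the script is allowed to take over the mouse and keyboard.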

To-do

  • Create GUI from original PDF ripper script
  • Create standalone executables
  • Test on Linux
  • Make text in PDF screen reader friendly
  • Auto-crop page after screenshot to allow extracting textbooks with different sized pages

Contributing

Tested on Mac and Windows. Contributors welcome! Feel free to make a PR :)

pdf-ripper's People

Contributors

jalexw, romonwafa

pdf-ripper's Issues

Only SS the first page

Hi,
I want to first say thanks for making the program and you are doing us a huge favor by making it. I'm not really a techy person so I'm not sure why but the pdf ripper program is only ripping the first page and it looks like this.
image
I was wondering if you knew why this was occurring.

Different Page Sizes

Hi, firstly I would like to thank you for coming up with this code; it is currently running on my Mac.
However, I have an issue: my BibliU book has different-sized pages, so I don't think it will turn out as expected. Do you have any solution for this? I'd appreciate it!

Suggestion: ripper with a longer timer

There have been several instances where the automation rips a page that has yet to be loaded, as the reader needs to load in the next batch of sections, even though a local copy exists in the system. The issue doesn't occur if the user is manually ripping by sections as loaded in the reader, but the issue occurs when the user is ripping from the first to the last page where pages would exceed a few hundred pages.
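
One possible alternative to a longer fixed timer, sketched here purely as an idea and not something the app is known to do, is to keep re-capturing until two consecutive screenshots are identical. The name capture_when_stable and its parameters are hypothetical, and the approach only helps when the loading screen is animated or changes once the page arrives.

```python
import time


def capture_when_stable(capture, poll=0.5, timeout=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Re-capture until two consecutive captures are equal, or until timeout.

    `capture` is any zero-argument function returning a comparable object,
    such as a Pillow image.
    """
    deadline = clock() + timeout
    previous = capture()
    while clock() < deadline:
        sleep(poll)
        current = capture()
        if current == previous:
            return current      # nothing changed between two polls
        previous = current
    return previous             # timed out: fall back to the latest capture
```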


The GUI only saves the first page.

Hi, I am having an issue with the GUI. I don't know if I'm reading something wrong, but it only copies the first page of the book throughout the entire PDF (the first page of my textbook 400 times).

error in setup.py

I hope I am wrong here, but there's a problem with setup.py at line 27:
n_pages has to be str(n_pages) in order to run without the bug.
I'm a complete newbie, so correct me if I am wrong.
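
For context, the actual line 27 is not quoted in this report, so the following is only a guess at the kind of error involved: Python refuses to join a string and an integer, and wrapping the number in str() resolves it. The value and the label text are invented for illustration.

```python
n_pages = 400  # hypothetical value; the real line 27 is not shown in the report

try:
    label = "Pages to rip: " + n_pages   # str + int raises TypeError
except TypeError as err:
    print("fails:", err)

label = "Pages to rip: " + str(n_pages)  # explicit conversion works
print(label)
```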

Using dev tools to take screenshots

I'm not sure if this is possible, but if it is, I have a suggestion. I'm using this to screenshot my pages on BibliU. They make it very hard to accurately resize the page so that the screenshot quality is good, so when I use this app the result always comes out fuzzy. That's not the app's fault. I managed to use the built-in screenshot function in dev tools to screenshot the pages. This makes them very high quality and good enough for a PDF.

I don't know Python, so I don't know if this is possible, but if an option could be added to use this built-in screenshot function, then this app would be able to get around issues like BibliU preventing you from adequately resizing pages. It would also ensure very high quality screenshots even on smaller screens.
