pdf-to-rar-compress's Introduction

Problem

Storing large PDFs (over 2GB) wastes server space and causes problems when downloading with Google Chrome.

Solution

The most efficient way to save server space is to compress the PDFs into RAR archives.

According to a comparison of zip, 7z, and rar (comparison chart), RAR gave the best compression for PDFs, shrinking files by a factor of roughly ten to nearly twenty.

On a Linux server, this can be done with a watchdog script written in Python 3 that uses the patool package.

General idea of the script

  1. When a PDF file appears and is ready, archiving to RAR starts.

Pay attention to the word "ready": a file that is still being written to disk is not ready. You need to wait until the file has been completely written to disk, and only then work with it; otherwise a broken file will be archived, and it cannot be read afterwards.

  2. After the RAR archive has been created successfully, a text file is created as a marker that signals to any external system that the archive is ready.
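
The marker step can be sketched as follows. This is a minimal illustration, not the code from pdf_watchdog.py: the function name `write_marker`, the `.done` suffix, and the marker contents are assumptions made for the example. The marker is written under a temporary name and then renamed, so an external system should never see a half-written marker.

```python
import os
import tempfile


def write_marker(archive_path, suffix=".done"):
    """Create a marker file next to the archive and return its path.

    The suffix and the marker contents are illustrative assumptions.
    """
    marker_path = archive_path + suffix
    directory = os.path.dirname(os.path.abspath(archive_path))
    # Write to a temporary file in the same directory first, then rename it;
    # os.replace is atomic when source and target are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".marker-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(os.path.basename(archive_path) + "\n")
        os.replace(tmp_path, marker_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return marker_path
```

An external system can then poll for the marker (for example `report.rar.done`) instead of guessing whether `report.rar` is complete.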

For example, the external system might be Oracle, and you might want to write the RAR file into a database field. It is important to detect the moment when the file is complete and ready for further processing, because the file may not yet be fully copied into the directory. Linux provides several file system events for this purpose.

We need the following Linux file system events:

  • IN_CREATE
  • IN_CLOSE_WRITE
  • IN_MOVED_TO
  • IN_MOVED_FROM
  • IN_DELETE
  • IN_DELETE_SELF
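
One possible way to react to these events with pyinotify is sketched below. It is an illustration under assumptions, not the contents of pdf_watchdog.py: the names `is_ready_pdf` and `watch` are hypothetical, and the sketch handles only the two events that indicate a finished file, leaving the other events from the list out. pyinotify is imported inside `watch`, so the helper can be tried without it installed.

```python
def is_ready_pdf(event_name, pathname):
    """Return True when an event indicates a completely written PDF.

    IN_CLOSE_WRITE fires when a file opened for writing is closed, and
    IN_MOVED_TO fires when a file is moved into the watched directory.
    IN_CREATE alone is not enough: the file may still be being written.
    """
    ready_events = {"IN_CLOSE_WRITE", "IN_MOVED_TO"}
    return event_name in ready_events and pathname.lower().endswith(".pdf")


def watch(directory, on_ready):
    """Block forever, calling on_ready(path) for each finished PDF."""
    import pyinotify  # third-party package, Linux only

    class Handler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            if is_ready_pdf("IN_CLOSE_WRITE", event.pathname):
                on_ready(event.pathname)

        def process_IN_MOVED_TO(self, event):
            if is_ready_pdf("IN_MOVED_TO", event.pathname):
                on_ready(event.pathname)

    manager = pyinotify.WatchManager()
    mask = pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO
    manager.add_watch(directory, mask)
    pyinotify.Notifier(manager, Handler()).loop()
```

Calling `watch("/some/dir", print)` would print the path of each PDF once it has been closed after writing or moved into the directory; the directory path here is a placeholder.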

Requirements

Pyinotify is a Python module for monitoring filesystem changes. It relies on inotify, a Linux kernel feature merged in kernel 2.6.13. inotify is an event-driven notifier whose notifications are exported from kernel space to user space through three system calls. Pyinotify binds these system calls and builds on them to offer a generic, abstract way to use that functionality.

Follow the official documentation to install pyinotify.

Patool is a library for creating, extracting, and testing archives in many formats, including RAR.

See the patool installation instructions.
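
A minimal sketch of the archiving call, assuming patool is installed together with a `rar` executable that it can find. `patoolib.create_archive` is patool's function for building an archive from a list of files; the helper names `rar_path_for` and `compress_pdf` are hypothetical and do not come from pdf_watchdog.py.

```python
import os


def rar_path_for(pdf_path):
    """Return the archive path: same directory and base name, .rar extension."""
    base, _ext = os.path.splitext(pdf_path)
    return base + ".rar"


def compress_pdf(pdf_path):
    """Archive one PDF into a RAR file next to it and return the archive path."""
    import patoolib  # third-party package "patool"

    archive = rar_path_for(pdf_path)
    # patool picks the archive format from the file extension and
    # raises an error if the archive cannot be created.
    patoolib.create_archive(archive, (pdf_path,))
    return archive
```

Keeping the path logic in a separate function makes it easy to check where the archive will land before any external program is invoked.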

Installation and Running

  1. Clone or download the repository
  2. Put pdf_watchdog.py in any directory you like
  3. Run the script from a terminal

pdf-to-rar-compress's People

Contributors

alexanderkhudoev