
captn3m0 / pystitcher


pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative markdown file as input

Home Page: https://pypi.org/project/pystitcher/

License: MIT License

Python 100.00%
python-pdf pdfmerge pdf-generation pdf-bookmarker

pystitcher's Introduction

pystitcher


pystitcher stitches your PDF files together, generating nice customizable bookmarks for you from a declarative input in the form of a markdown file. It is written in pure Python and uses PyPDF3 for reading and writing PDF files.

Installation

You can install it easily using pipx:

pipx install pystitcher

The wiki has alternative installation instructions.

Description

pystitcher is a command-line tool with very few CLI options:

usage: pystitcher [-h] [--version] [-v] [--cleanup | --no-cleanup] spine.md output.pdf

Stitch PDF files together

positional arguments:
  spine.md              Input markdown file
  output.pdf            Output PDF file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         log more things
  --cleanup, --no-cleanup
                        Delete temporary files (default: True)

Given this input:

existing_bookmarks: remove
title: Complete Guide to the Personal Data Protection Bill
author: Medianama
keywords: privacy, surveillance, personal data protection
subject: Personal Data Protection Bill
# A Complete Guide to the Personal Data Protection Bill

- [Cover](cover.pdf)

# The Bills

- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
- [Personal Data Protection Bill, 2018](https://example.com/2018-bill.pdf)

# Other key reading material

- [Srikrishna Committee Report](2.a.pdf)
- [Dvara Research's Personal Data Protection Bill](2.b.pdf)
- [MP Shashi Tharoor's Data Protection Bill](2.c.pdf)
- [MP Jay Panda's Data Protection Bill](2.d.pdf)
- [SaveOurPrivacy.in bill](2.e.pdf)
- [TRAI recommendations on privacy](2.f1.pdf)
- [Comments on TRAI recommendations on privacy](2.f2.pdf)

pystitcher will generate a PDF with proper bookmarks:

https://i.imgur.com/qPVpZGt.png
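
The mapping from spine to bookmarks can be sketched as follows. This is a minimal illustration, not pystitcher's actual implementation: the `outline` function and both regular expressions are hypothetical, and the sketch assumes each linked file becomes a bookmark one level below the nearest preceding heading.

```python
import re

# Hypothetical patterns for the two constructs used in the spine above.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
LINK = re.compile(r"^\s*-\s*\[(?P<title>[^\]]+)\]\((?P<target>[^)]+)\)")


def outline(spine_text):
    """Return (level, title, target) tuples; headings have target None."""
    entries = []
    level = 0  # depth of the most recent heading
    for line in spine_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            entries.append((level, heading.group(2).strip(), None))
            continue
        link = LINK.match(line)
        if link:
            # Assumption: a linked file nests directly under its heading.
            entries.append((level + 1, link.group("title"), link.group("target")))
    return entries


spine = """# The Bills

- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
"""
for entry in outline(spine):
    print(entry)
```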

And the correct metadata:

Title:          Complete Guide to the Personal Data Protection Bill
Subject:        Personal Data Protection Bill
Keywords:       privacy, surveillance, personal data protection
Author:         Medianama
Creator:        pystitcher/1.0.0
Producer:       pystitcher/1.0.0

Configuration options can be specified as metadata at the top of the file.

| Option | Notes |
| --- | --- |
| fit | Default fit of the bookmark. Can be overridden per bookmark. See the wiki for more details. |
| author | PDF Author |
| keywords | PDF Keywords |
| subject | PDF Subject |
| title | PDF Title. If left unspecified, the first heading (h1) in the document is used. |
| existing_bookmarks | What to do with existing bookmarks in individual files. Options are keep, flatten, and remove. See the docs for more details. |
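
To make the metadata rules concrete, here is a rough sketch of reading the `key: value` header and applying the title fallback. It uses only the standard library and is not how pystitcher itself parses the file; `read_options` and both patterns are hypothetical names introduced for illustration.

```python
import re

# Hypothetical patterns: a "key: value" header line and a level-1 heading.
META_LINE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):\s*(.*)$")
H1 = re.compile(r"^#\s+(.*)$", re.MULTILINE)


def read_options(spine_text):
    """Collect header options; fall back to the first h1 for the title."""
    options = {}
    lines = spine_text.splitlines()
    body_start = len(lines)
    for index, line in enumerate(lines):
        match = META_LINE.match(line)
        if not match:
            body_start = index  # the header ends at the first other line
            break
        options[match.group(1).lower()] = match.group(2).strip()
    if "title" not in options:
        heading = H1.search("\n".join(lines[body_start:]))
        if heading:
            options["title"] = heading.group(1).strip()
    return options


spine = """existing_bookmarks: remove
author: Medianama
# A Complete Guide to the Personal Data Protection Bill

- [Cover](cover.pdf)
"""
print(read_options(spine))
```

Because the sample above omits `title`, the sketch picks up the first heading instead, as the table describes.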

Additionally, PDF links specified in the markdown can carry attributes that alter the PDFs before merging. The attribute below rotates the second PDF file by 90 degrees clockwise before merging:

[Part 1](1.pdf)
[Part 2](2.pdf){: rotate="90"}

The attributes below merge only pages 2 to 5, both inclusive, from the second PDF file:

[Part 1](1.pdf)
[Part 2](2.pdf){: start=2 end=5}

The available attributes are:

| Attribute | Notes |
| --- | --- |
| rotate | Rotate the PDF. Valid values are 90, 180, 270 |
| start | Start page number for PDF page selection |
| end | End page number for PDF page selection |
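
The following sketch shows one way the attribute syntax could be read and turned into a page selection. It is illustrative only: `parse_link` and `page_indices` are hypothetical helpers, and the sketch assumes page numbers are 1-based (the text above only states that the range is inclusive).

```python
import re

# Hypothetical patterns for "[title](path){: key=value ...}".
LINK = re.compile(
    r"\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)(?:\{:\s*(?P<attrs>[^}]*)\})?"
)
ATTR = re.compile(r"(\w+)=(?:\"([^\"]*)\"|(\S+))")


def parse_link(line):
    """Extract title, path, and the rotate/start/end attributes."""
    match = LINK.search(line)
    if match is None:
        raise ValueError(f"no PDF link found in: {line!r}")
    attrs = {}
    for key, quoted, bare in ATTR.findall(match.group("attrs") or ""):
        attrs[key] = quoted or bare
    rotate = int(attrs.get("rotate", 0))
    if rotate not in (0, 90, 180, 270):
        raise ValueError("rotate must be 90, 180 or 270")
    return {
        "title": match.group("title"),
        "path": match.group("path"),
        "rotate": rotate,
        "start": int(attrs["start"]) if "start" in attrs else None,
        "end": int(attrs["end"]) if "end" in attrs else None,
    }


def page_indices(start, end, page_count):
    """Convert an inclusive 1-based range into 0-based page indices."""
    first = 1 if start is None else start
    last = page_count if end is None else end
    return list(range(first - 1, last))


link = parse_link("[Part 2](2.pdf){: start=2 end=5}")
print(link, page_indices(link["start"], link["end"], page_count=10))
```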

Documentation

Additional documentation is maintained on the project wiki on GitHub.

pystitcher's People

Contributors

captn3m0, vonter


pystitcher's Issues

Render Markdown inline

Within the markdown, provide a way to declare pages that get rendered as stand-alone pages as well.

<!-- This only goes in bookmark-->
# Cover

![Cover](cover.pdf)

# Colophon

```
# Hobbit
## There and back again
## By JRR Tolkien
```{: inline=1}

![Foreword](foreword.pdf)

This renders the cover, then a single page containing the three lines above, and then the foreword. The Colophon bookmark therefore links to the rendered text page in the middle.

Automated functional tests

A couple of automated functional tests can be added for the existing markdown inputs in the tests/ directory.

Fix current working directory hack

Currently, we switch our CWD to the markdown file's directory and don't reset it afterwards. Playing around with chdir is bad and causes issues.

Fix this by resolving paths relative to the markdown file's directory instead.
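
A possible shape for the fix, as a sketch only: `resolve_local` is a hypothetical helper, not existing pystitcher code, and it covers local files only (URLs would need separate handling).

```python
from pathlib import Path


def resolve_local(spine_path, target):
    """Resolve a link target against the spine file's directory.

    No os.chdir() is involved, so the caller's working directory
    stays untouched.
    """
    path = Path(target)
    if path.is_absolute():
        return path
    return Path(spine_path).parent / path


print(resolve_local("book/spine.md", "cover.pdf"))
```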

Add PDF page filter

Support [Text](file.pdf){: start=2 end=5} to select only specific pages from the PDF.

Installation instructions: please update the readme front page

Thanks for your work providing this tool. I ended up here looking for an alternative to python stapler.

There's a lot of good and important stuff on the README.

Please add one or two lines at the top of the README with the most important information: how to install it.

It might be obvious to you, a Python developer, but not to a potential end user.
Is it using "pip install xyz"? Will it work with pipx? Are there any "official" packages for Linux distro xyz?

Thanks in advance.

Auto Page numbering support

Want to be able to add page numbers to the generated PDF with font configuration. Use case - Printouts.
Once this is done, easy to add a Table of Contents too. #6

While exploring, I found this as one option (https://github.com/vlad-anisov/numbering2pdf/blob/main/numbering2pdf/numbering2pdf.py). It uses reportlab to generate empty numbered PDFs and merges those pages with the existing pages one by one.

Happy to work on this issue, if you suggest a preferred method (given your research) to implement this.

Python 3.9 required?

Thanks for the great code, it worked well for me putting books back together from chapters in Elsevier.
The only issue I had was that it required using Python 3.9 at a minimum.
I initially had an error under Python 3.7, complaining about line 56 of skeleton.py with reference to argparse.BooleanOptionalAction; I lost the actual error message, but see here for a related report.
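
For context, argparse.BooleanOptionalAction was added in Python 3.9, which would explain the failure on 3.7. Below is a sketch of a parser that mirrors the documented usage and falls back to a pair of flags on older interpreters. `build_parser` is a hypothetical function written for this illustration, not the project's skeleton.py, and the help text for the fallback flags is made up.

```python
import argparse


def build_parser():
    """Build a parser shaped like the usage text in the README."""
    parser = argparse.ArgumentParser(
        prog="pystitcher", description="Stitch PDF files together"
    )
    parser.add_argument("spine", metavar="spine.md", help="Input markdown file")
    parser.add_argument("output", metavar="output.pdf", help="Output PDF file")
    parser.add_argument("-v", "--verbose", action="store_true", help="log more things")
    if hasattr(argparse, "BooleanOptionalAction"):
        # Python 3.9+: one action provides --cleanup and --no-cleanup.
        parser.add_argument(
            "--cleanup",
            action=argparse.BooleanOptionalAction,
            default=True,
            help="Delete temporary files",
        )
    else:
        # Older Pythons: emulate the same pair with two flags.
        group = parser.add_mutually_exclusive_group()
        group.add_argument("--cleanup", dest="cleanup", action="store_true",
                           default=True, help="Delete temporary files")
        group.add_argument("--no-cleanup", dest="cleanup", action="store_false",
                           help="Keep temporary files")
    return parser


args = build_parser().parse_args(["spine.md", "output.pdf", "--no-cleanup"])
print(args.cleanup)
```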

Add PDF rotation filter

Change PDF links to [Bookmark Title](file.pdf){: rotate="90"} and rotate them accordingly.

Local HTML -> PDF rendering

# Title

- [chapter 1](chapter1.html)
- [chapter 2](chapter2.html)

Convert the HTML to PDFs and merge accordingly.

Fetch HTML online and render

# Title

- [chapter 1](https://example.com/chapter1.html)
- [chapter 2](https://example.com/chapter2.html)

Download the source HTML, run it through readability, then render as PDF and merge accordingly.

Support external URLs to fetch PDF

# Title

- [chapter 1](https://example.com/chapter1.pdf)
- [chapter 2](https://example.com/chapter2.pdf)

Download the PDFs, cache them and merge accordingly.
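
One way the download-and-cache step could look, as a sketch under stated assumptions: the cache key is a SHA-256 hash of the URL, and `cache_path`, `fetch_pdf`, and the `download` parameter are hypothetical names. The cache never expires in this version.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.pdf"


def fetch_pdf(url, cache_dir, download=None):
    """Return a local path for url, downloading only on a cache miss.

    download is an optional callable (url -> bytes), handy for tests;
    by default the file is fetched with urllib.
    """
    target = cache_path(url, cache_dir)
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    if download is None:
        with urlopen(url) as response:
            data = response.read()
    else:
        data = download(url)
    target.write_bytes(data)
    return target
```

A second call with the same URL and cache directory returns the stored file without touching the network.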

Add scaling support

It would be nice if there was a way to scale documents to the same page size.
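
As a rough illustration of the arithmetic involved, the sketch below computes a single scale factor that fits one page size inside another. Uniform scaling that preserves the aspect ratio is an assumption here, and `fit_scale` is a hypothetical helper; sizes are in PDF points.

```python
def fit_scale(width, height, target_width, target_height):
    """Largest uniform factor that keeps the page inside the target size."""
    return min(target_width / width, target_height / height)


# US Letter (612 x 792 pt) scaled to fit within A4 (595 x 842 pt):
# the width is the limiting dimension.
print(fit_scale(612, 792, 595, 842))
```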

Specify Zoom level for links in markdown

[Personal Data Protection Bill, 2019](1.a.pdf){: zoom=FitWidth}

Other options:

    Inherit - Inherit zoom
    FitPage - Fit page width+height
    FitWidth - Fit page width
    FitHeight - Fit page height
    ##% - Zoom to ##% eg 50% = 50% zoom
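
The proposed values could be validated along these lines. This is only a sketch of the proposal: `parse_zoom` and the `("Percent", fraction)` return shape are hypothetical.

```python
import re

# The named zoom modes listed in the proposal above.
NAMED_ZOOMS = {"Inherit", "FitPage", "FitWidth", "FitHeight"}


def parse_zoom(value):
    """Return (mode, factor); factor is None for the named modes."""
    if value in NAMED_ZOOMS:
        return (value, None)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)%", value)
    if match:
        # "50%" becomes a zoom factor of 0.5.
        return ("Percent", float(match.group(1)) / 100)
    raise ValueError(f"unsupported zoom value: {value!r}")


print(parse_zoom("FitWidth"), parse_zoom("50%"))
```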
