
rbrito / pkg-pdfbeads


Debian packaging of pdfbeads

License: GNU General Public License v2.0



PDFBeads -- convert scanned images to a single PDF file
Version 1.0 (November 2010)

Copyright (C) 2010 Alexey Kryukov ([email protected]).
All rights reserved.

PDFBeads is a small utility written in Ruby which takes scanned
page images and converts them into a single PDF file. Unlike other
PDF creation tools, PDFBeads attempts to implement the approach
typically used for DjVu books. Its key feature is separating scanned
text (typically black, but indexed images with a small number of
colors are also accepted) from halftone pictures. Each type of
graphical data is encoded into its own layer with a specific
compression method and resolution.
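
As a rough illustration of the layer separation described above, the
following sketch splits a tiny grayscale "page" into a foreground (text)
layer and a background (picture) layer using a 1-bit mask. It is a
conceptual example only and is not taken from PDFBeads: the method name
split_layers, the nested-array pixel representation and the use of nil
for "no data in this layer" are all assumptions made for the example.

```ruby
# Conceptual sketch only, not PDFBeads code.
# pixels: rows of gray values (0..255); mask: rows of 0/1 bits.
# Returns [foreground, background]; nil marks "no data in this layer".
def split_layers(pixels, mask)
  foreground = []
  background = []
  pixels.each_with_index do |row, y|
    fg_row = []
    bg_row = []
    row.each_with_index do |value, x|
      if mask[y][x] == 1
        # Mask bit set: the pixel belongs to the text layer.
        fg_row << value
        bg_row << nil
      else
        # Mask bit clear: the pixel belongs to the picture layer.
        fg_row << nil
        bg_row << value
      end
    end
    foreground << fg_row
    background << bg_row
  end
  [foreground, background]
end

pixels = [[0, 200, 10], [180, 5, 190]]
mask   = [[1, 0, 1], [0, 1, 0]]
fg, bg = split_layers(pixels, mask)
```

Once the layers are separate, each one can be stored at its own
resolution and with its own compression method, which is the point of
the approach.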

The name `PDFBeads' has been selected for the package because
building PDF files from separate images is comparable to threading
beads on a string. It also seems to be a good choice for a Ruby
application.

Here are a few operations you can perform with PDFBeads:

* encode B&W images using either CCITT Group 4 Fax or JBIG2
  compression method (you'll need Adam Langley's jbig2 utility,
  available at https://github.com/agl/jbig2enc/ , for JBIG2
  compression);

* combine halftone or indexed pictures with previously binarized
  text pages, placing them into the background layer. Various
  compression methods of background images (JPEG2000, JPEG or
  PNG-styled deflate compression) are supported;

* split mixed images where binarized text is combined with color
  or grayscale pictures (such pages may be produced with ScanTailor --
  an interactive post-processing tool for scanned pages, available
  at http://scantailor.sourceforge.net) and encode each layer
  separately;

* correctly process indexed images with a limited number of colors,
  encoding each color separately into the foreground layer;

* split color images into background and foreground layers (similar
  to BG44 and FG44 chunks in a DjVu file) according to a given mask;

* create PDF files with TOC and metadata;

* read text from hOCR files and create a hidden text layer in the PDF
  file.
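
To give an idea of what reading text from an hOCR file involves, here is
a minimal sketch that pulls words and their bounding boxes out of hOCR
markup. hOCR stores word coordinates in the title attribute of each
element as `bbox x0 y0 x1 y1`. The sketch uses a plain regular
expression rather than an HTML parser, and it is not how PDFBeads itself
reads hOCR: the helper hocr_words, the HOCR_WORD pattern and the sample
markup are hypothetical, and the pattern assumes single-quoted attributes
with class written before title.

```ruby
# Hypothetical helper, not PDFBeads code.
# Matches <span class='ocrx_word' title='bbox x0 y0 x1 y1 ...'>text</span>.
HOCR_WORD = /<span[^>]*class='ocrx_word'[^>]*title='[^']*bbox (\d+) (\d+) (\d+) (\d+)[^']*'[^>]*>([^<]*)<\/span>/

# Returns an array of hashes, one per recognized word.
def hocr_words(html)
  html.scan(HOCR_WORD).map do |x0, y0, x1, y1, text|
    { text: text.strip, bbox: [x0, y0, x1, y1].map(&:to_i) }
  end
end

sample = <<~HTML
  <span class='ocr_line' title='bbox 100 200 400 240'>
    <span class='ocrx_word' title='bbox 100 200 220 240'>Hello</span>
    <span class='ocrx_word' title='bbox 240 200 400 240; x_wconf 93'>world</span>
  </span>
HTML

words = hocr_words(sample)
words.each { |w| puts "#{w[:text]} at #{w[:bbox].inspect}" }
```

A hidden text layer needs exactly this pairing of text and page
coordinates, so that each word can be placed invisibly over its image.
Real hOCR files vary in attribute order and quoting, so a proper HTML
parser is the safer choice in practice.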

Note that PDFBeads is intended for creating PDF files from previously
processed images, and so it cannot perform some operations (e.g. converting
color or grayscale scans to B&W) which should typically be performed with
a special scan processing application, such as ScanTailor.

PDFBeads requires RMagick (the Ruby bindings for the popular ImageMagick image
processing library). The hpricot extension is not required, but is highly
recommended, as without it PDFBeads cannot read data from hOCR files.
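
An optional dependency of this kind is commonly handled in Ruby by
attempting the require and rescuing LoadError. The sketch below shows
that general idiom; it is not necessarily how PDFBeads performs the
check. The helper library_available? and the constant HOCR_SUPPORT are
hypothetical names introduced for the example.

```ruby
# Hypothetical helper illustrating a common Ruby idiom, not PDFBeads code.
# Returns true if the named library can be loaded, false otherwise.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Treat hpricot as optional: warn and carry on if it is missing.
HOCR_SUPPORT = library_available?('hpricot')
warn 'hpricot not found: hOCR text layers would be unavailable' unless HOCR_SUPPORT
```

A required library such as RMagick would instead be loaded with a plain
require, so that a missing installation fails immediately.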
