boarpig is a toolkit that helps you produce e-texts from pdf/djvu files containing images.
Some features include:
- image extraction and renumbering
- OCR
- browser-based proofreading system
boarpig requires a Linux distribution. It is known to work under WSL on Windows 10.
Steps:
- Install `djvulibre-bin`, `mupdf-tools` and `tesseract-ocr`
- Install Deno (`curl -fsSL https://deno.land/x/install/install.sh | sh`)
- Clone this repository
- Install boarpig using `deno install -fA src/boarpig.ts`. Do this every time you update the repository so that the latest changes are compiled.
- See `boarpig --help` for more information.
- HOWTO contains a complete step-by-step example of converting a djvu/pdf file into a proofread etext.
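
Before running the install step, it can help to confirm that the prerequisites are on your `PATH`. The following is a minimal sketch, not part of boarpig itself: `check_tools` is a hypothetical helper, and the command names are assumptions based on what the listed packages usually provide (`ddjvu` from djvulibre-bin, `mutool` from mupdf-tools, `tesseract` from tesseract-ocr). boarpig may call other binaries from those packages.

```shell
#!/bin/sh
# Hypothetical helper (not part of boarpig): report which commands are missing.
# Prints one "missing: NAME" line per absent command and returns 1 if any are missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed command names: ddjvu (djvulibre-bin), mutool (mupdf-tools),
# tesseract (tesseract-ocr), deno (Deno installer).
if check_tools ddjvu mutool tesseract deno; then
    echo "all prerequisites found"
fi
```

If anything is reported as missing, install the corresponding package from the first step and run the check again.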
boarpig is licensed under the GNU Affero General Public License, Version 3.0 (AGPL).
boarpig is named after the chapter from Saki's Beasts and Super-beasts.