
kiwilan / php-ebook


PHP package to read metadata and extract covers from eBooks, comics and audiobooks.

License: MIT License

book ebook epub php cba cbz epub2 epub3 cb7 cbr


PHP eBook




  • eBooks: .epub, .pdf, .azw, .azw3, .kf8, .kfx, .mobi, .prc, .fb2
  • Comics: .cbz, .cbr, .cb7, .cbt (metadata from github.com/anansi-project)
  • Audiobooks: .mp3, .m4a, .m4b, .flac, .ogg with the external package kiwilan/php-audio (MUST be installed separately)

For details, see Supported formats. Linux, macOS and Windows are supported.

Note

This package favors eBooks in open formats such as .epub (from IDPF) or .cbz (from CBA), which can be parsed with native PHP, so for the best possible experience we recommend converting your eBooks to these formats. If you want to know more about the eBook ecosystem, you can read the documentation.

Warning

For DRM (Digital Rights Management) eBooks, in some cases you can read metadata but not contents (like the HTML files of an EPUB). To use all features, you have to remove the DRM with dedicated software before using this package. For EPUB, you can use calibre with the DeDRM plugin; this guide can help you.

About

This package was built for bookshelves-project/bookshelves, a web app to handle eBooks.

Requirements

  • PHP version >=8.1
  • PHP extensions:
    • zip (native, optional) for .EPUB, .CBZ
    • phar (native, optional) for .CBT
    • rar (optional) for .CBR (p7zip binary can be used instead)
    • imagick (optional) for .PDF cover
    • intl (native, optional) for Transliterator for better slugify
    • fileinfo (native, optional) for better detection of file type
  • Binaries
    • p7zip (optional) binary for .CB7 (can handle .CBR too)
  • Audiobooks: kiwilan/php-audio (optional, must be installed separately)
  • To know more about requirements, see Supported formats

Note

You only have to install these requirements if you want to read metadata for the corresponding formats: for example, to read .cbr files, you need the rar PHP extension or the p7zip binary. All PHP extension and binary requirements are therefore optional.
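To see which of these optional requirements a given server already has, a small stdlib check can help. This is a hypothetical helper, not part of php-ebook: the extension names come from the list above, and looking for a p7zip binary named `7z` on the PATH is an assumption (on Windows the file would be `7z.exe`).

```php
<?php
// Hypothetical helper (not part of php-ebook): reports which optional
// requirements from the list above are available in this environment.
function checkOptionalRequirements(): array
{
    $extensions = [
        'zip' => 'EPUB, CBZ',
        'phar' => 'CBT',
        'rar' => 'CBR',
        'imagick' => 'PDF cover',
        'intl' => 'better slugify',
        'fileinfo' => 'better file type detection',
    ];

    $report = [];
    foreach ($extensions as $extension => $usedFor) {
        $report[$extension] = [
            'available' => extension_loaded($extension),
            'usedFor' => $usedFor,
        ];
    }

    // Assumption: the p7zip binary is installed as `7z`.
    $report['p7zip'] = [
        'available' => binaryInPath('7z'),
        'usedFor' => 'CB7 (and CBR)',
    ];

    return $report;
}

// Returns true if an executable file named $name exists in one of the PATH directories.
function binaryInPath(string $name): bool
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return false;
    }
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        $candidate = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return true;
        }
    }
    return false;
}

foreach (checkOptionalRequirements() as $name => $info) {
    printf("%-8s %-3s %s\n", $name, $info['available'] ? 'yes' : 'no', $info['usedFor']);
}
```

A missing entry only means the matching formats cannot be read; the rest of the package keeps working.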

Warning

Archives are handled by kiwilan/php-archive; for some formats (.cbr and .cb7), the rar PHP extension or the p7zip binary may be necessary. Guides to install these requirements are available in kiwilan/php-archive.

Features

  • Support multiple formats, see Supported formats
  • 🔎 Read metadata from eBooks, comics, and audiobooks
  • 🖼️ Extract covers from eBooks, comics, and audiobooks
  • 🎡 Works with audiobooks if kiwilan/php-audio is installed
  • 📚 Support metadata
    • eBooks: EPUB v2 and v3 from IDPF with calibre:series from Calibre | MOBI from Mobipocket (and derivatives) | FB2 from FictionBook
    • Comics: CBAM (Comic Book Archive Metadata) : ComicInfo.xml format from ComicRack and maintained by anansi-project
    • PDF with smalot/pdfparser
    • Audiobooks: ID3, vorbis and flac tags with kiwilan/php-audio (not included)
  • 🔖 Chapters extraction (EPUB only)
  • 📦 EPUB and CBZ creation supported
  • Works perfectly with kiwilan/php-opds: PHP package to generate OPDS feeds (not included)

Roadmap

  • Better .epub creation support
  • Add .epub metadata update support
  • Add better handling of MOBI files: libmobi and ebook-convert from Calibre (fallback is available)
  • Add support of ebook-convert from Calibre
  • Add support for DJVU: djvulibre
  • Support FB2 archive

Installation

You can install the package via composer:

composer require kiwilan/php-ebook

Usage

Works with eBook files or audiobook* files (for more about formats, see Supported formats).

*: audiobook support requires kiwilan/php-audio, which must be installed separately; see Requirements.

use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');

$ebook->getPath(); // string => path to ebook
$ebook->getFilename(); // string => filename of ebook
$ebook->getExtension(); // string => extension of ebook
$ebook->getTitle(); // string
$ebook->getAuthors(); // BookAuthor[] (`name`: string, `role`: string)
$ebook->getAuthorMain(); // ?BookAuthor => First BookAuthor (`name`: string, `role`: string)
$ebook->getDescription(); // ?string
$ebook->getDescriptionHtml(); // ?string
$ebook->getCopyright(); // ?string
$ebook->getPublisher(); // ?string
$ebook->getIdentifiers(); // BookIdentifier[] (`value`: string, `scheme`: string)
$ebook->getPublishDate(); // ?DateTime
$ebook->getLanguage(); // ?string
$ebook->getTags(); // string[] => `subject` in EPUB, `keywords` in PDF, `genres` in CBA
$ebook->getSeries(); // ?string => `calibre:series` in EPUB, `series` in CBA
$ebook->getVolume(); // ?int => `calibre:series_index` in EPUB, `number` in CBA
$ebook->getCreatedAt(); // ?DateTime => file modified date
$ebook->getSize(); // int => file size in bytes
$ebook->getSizeHumanReadable(); // string => file size in human readable format
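getSizeHumanReadable() returns the size from getSize() as a string. As an illustration only, here is one common way to build such a string from a byte count; the 1024-based units and two-decimal rounding are assumptions, and php-ebook's actual output format may differ.

```php
<?php
// Hypothetical helper: formats a byte count with binary (1024) steps.
// php-ebook's getSizeHumanReadable() may use different units or rounding.
function humanReadableSize(int $bytes, int $precision = 2): string
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];
    $size = (float) max($bytes, 0);
    $unit = 0;

    // Divide until the value fits the current unit or units run out.
    while ($size >= 1024 && $unit < count($units) - 1) {
        $size /= 1024;
        $unit++;
    }

    return round($size, $precision) . ' ' . $units[$unit];
}

echo humanReadableSize(1536), PHP_EOL; // 1.5 KB
```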

For pages count, you can use these methods:

$ebook->getPagesCount(); // ?int => estimated pages count (250 words per page) in `EPUB`, `pageCount` in PDF, `pageCount` in CBA
$ebook->getWordsCount(); // ?int => words count in `EPUB`
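The EPUB page count above is an estimate derived from the word count at 250 words per page. A minimal sketch of that arithmetic, assuming tags are replaced by spaces before counting and that a partial page rounds up; php-ebook's exact counting and rounding rules may differ. Note that str_word_count() does not support multibyte text, so this undercounts many non-Latin languages.

```php
<?php
// Hypothetical sketch, not php-ebook's implementation.
function countWords(string $html): int
{
    // Put a space before each tag so "</p><p>" does not glue two words together.
    $text = strip_tags(str_replace('<', ' <', $html));
    return str_word_count($text);
}

// Estimated pages at $wordsPerPage words per page, rounding a partial page up.
function estimatePagesCount(int $words, int $wordsPerPage = 250): int
{
    return (int) ceil($words / max(1, $wordsPerPage));
}

$words = countWords('<p>Hello</p><p>world</p>');
echo $words, ' words, ', estimatePagesCount($words), ' page(s)', PHP_EOL; // 2 words, 1 page(s)
```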

Note

For performance reasons, with EPUB, pagesCount and wordsCount are only computed on demand. If you inspect the eBook with var_dump, these properties will be null.

Some metadata is stored as-is, without typing, in an extras array, available through the getExtras() method.

$ebook->getExtras(); // array<string, mixed> => additional data for book
$ebook->getExtra(string $key); // mixed => safely extract data from `extras` array

Note

For audiobooks, all metadata is stored in the extras array, so some values duplicate Ebook::class properties. See Formats specifications for more information.

To check whether an eBook is valid before calling read(), you can use the static isValid() method.

use Kiwilan\Ebook\Ebook;

$isValid = Ebook::isValid('path/to/ebook.epub');
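isValid() is the library's own check. For comparison, here is a hypothetical and much cheaper pre-flight test for EPUB files, based on the EPUB OCF rule that the first ZIP entry must be an uncompressed `mimetype` file containing `application/epub+zip`, with no extra field in its header. It only inspects the first 58 bytes, so it can reject obviously broken files but cannot prove that a file is fully readable.

```php
<?php
// Hypothetical pre-flight check, NOT php-ebook's isValid().
// Local ZIP header: 30 fixed bytes, then the entry name ("mimetype", 8 bytes),
// then (with no extra field) the stored content starting at byte 38.
function looksLikeEpub(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    $head = file_get_contents($path, false, null, 0, 58);
    if ($head === false || strlen($head) < 58) {
        return false;
    }

    return substr($head, 0, 4) === "PK\x03\x04"          // ZIP local file header signature
        && substr($head, 30, 8) === 'mimetype'            // first entry name
        && substr($head, 38, 20) === 'application/epub+zip'; // uncompressed content
}
```

Running a check like this before Ebook::read() is one way to skip files that would otherwise end up as bad files.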

To get additional data, you can use these methods:

$ebook->getParser(); // ?EbookParser => Parser with modules
$ebook->getMetaTitle(); // ?MetaTitle, with slug for `title` and `series`
$ebook->getFormat(); // ?EbookFormatEnum => `epub`, `pdf`, `cba`
$ebook->getCover(); // ?EbookCover => cover of book

To access the archive of an eBook, use the getArchive() method. You can find more information about archives in kiwilan/php-archive.

$ebook->getArchive(); // ?BaseArchive => archive of book from `kiwilan/php-archive`

And to test whether some data exists:

$ebook->isArchive(); // bool => `true` if `EPUB`, `CBA`
$ebook->isMobi(); // bool => `true` if Mobipocket derivatives
$ebook->isAudio(); // bool => `true` if `mp3`, `m4a`, `m4b`, `flac`, `ogg`
$ebook->hasCover(); // bool => `true` if cover exists
$ebook->hasMetadata(); // bool => `true` if metadata exists
$ebook->hasSeries(); // bool => `true` if series exists
$ebook->isBadFile(); // bool => `true` if file is not readable

Metadata

Ebook::class exposes a lot of information, but if you want to access raw metadata, the getParser() method is available.

use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');

$parser = $ebook->getParser();

$parser->getModule(); // Module used for parsing, can be any `EbookModule::class`

$parser->getAudiobook(); // `AudiobookModule::class`
$parser->getCba(); // `CbaModule::class`
$parser->getEpub(); // `EpubModule::class`
$parser->getFb2(); // `Fb2Module::class`
$parser->getMobi(); // `MobiModule::class`
$parser->getPdf(); // `PdfModule::class`

$parser->isAudiobook(); // bool
$parser->isCba(); // bool
$parser->isEpub(); // bool
$parser->isFb2(); // bool
$parser->isMobi(); // bool
$parser->isPdf(); // bool

MetaTitle

Only available if the book's title is not null.

use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');
$metaTitle = $ebook->getMetaTitle(); // ?MetaTitle

$metaTitle->getSlug(); // string => slug title, like `lord-of-the-rings-en-01-fellowship-of-the-ring-j-r-r-tolkien-1954-epub`
$metaTitle->getSlugSimple(); // string => slug title simple, like `the-fellowship-of-the-ring`
$metaTitle->getSeriesSlug(); // ?string => slug series title, like `lord-of-the-rings-en-j-r-r-tolkien-epub`
$metaTitle->getSeriesSlugSimple(); // ?string => slug series title simple, like `the-lord-of-the-rings`
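The slug examples above seem to combine several metadata fields. The sketch below rebuilds them from their parts to make that structure visible. The part order (series, language, two-digit volume, title, author, year, extension) and the removal of a leading English article are inferred from these examples only, and `slugify()`, `withoutLeadingArticle()` and `joinSlugParts()` are hypothetical helpers; php-ebook's MetaTitle rules may differ. It uses intl's Transliterator when available, as suggested in Requirements.

```php
<?php
// Hypothetical sketch inferred from the examples, not php-ebook's MetaTitle.
function slugify(string $text): string
{
    // Prefer intl's Transliterator (optional requirement) for accents, else lowercase only.
    if (class_exists(Transliterator::class)) {
        $converted = Transliterator::create('Any-Latin; Latin-ASCII; Lower()')?->transliterate($text);
        $text = is_string($converted) ? $converted : strtolower($text);
    } else {
        $text = strtolower($text);
    }

    // Collapse every run of non-alphanumeric characters into a single dash.
    return trim((string) preg_replace('/[^a-z0-9]+/', '-', $text), '-');
}

// Drops a leading English article ("the-", "a-", "an-") from a slug.
function withoutLeadingArticle(string $slug): string
{
    return (string) preg_replace('/^(the|a|an)-/', '', $slug);
}

// Joins non-empty parts with dashes.
function joinSlugParts(array $parts): string
{
    return implode('-', array_filter($parts, fn ($part) => $part !== null && $part !== ''));
}

$title = 'The Fellowship of the Ring';
$series = 'The Lord of the Rings';
$author = 'J. R. R. Tolkien';

$slug = joinSlugParts([
    withoutLeadingArticle(slugify($series)),
    'en',
    sprintf('%02d', 1),
    withoutLeadingArticle(slugify($title)),
    slugify($author),
    '1954',
    'epub',
]);
$seriesSlug = joinSlugParts([withoutLeadingArticle(slugify($series)), 'en', slugify($author), 'epub']);

echo $slug, PHP_EOL;           // lord-of-the-rings-en-01-fellowship-of-the-ring-j-r-r-tolkien-1954-epub
echo $seriesSlug, PHP_EOL;     // lord-of-the-rings-en-j-r-r-tolkien-epub
echo slugify($title), PHP_EOL; // the-fellowship-of-the-ring
```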

Cover

The cover can be extracted from the eBook.

use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');
$cover = $ebook->getCover(); // ?EbookCover

$cover->getPath(); // ?string => path to cover
$cover->getContents(bool $toBase64 = false); // ?string => content of cover, if `$toBase64` is true, return base64 encoded content
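getContents() returns the cover bytes, but the image format is not stated, so saving every cover as `.png` can be wrong. A hypothetical helper can guess the extension from the first bytes of the raw contents (called without `$toBase64`) and build a data URI for an `<img>` tag. Only JPEG, PNG, GIF and WebP signatures are checked; with the optional fileinfo extension, `(new finfo(FILEINFO_MIME_TYPE))->buffer($bytes)` is a more general alternative.

```php
<?php
// Hypothetical helper: guesses an image extension from magic bytes.
function guessImageExtension(string $bytes): ?string
{
    return match (true) {
        str_starts_with($bytes, "\xFF\xD8\xFF") => 'jpg',
        str_starts_with($bytes, "\x89PNG\r\n\x1A\n") => 'png',
        str_starts_with($bytes, 'GIF87a'), str_starts_with($bytes, 'GIF89a') => 'gif',
        substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WEBP' => 'webp',
        default => null,
    };
}

// Builds a base64 data URI from raw image bytes, or null if the format is unknown.
function toDataUri(string $bytes): ?string
{
    $extension = guessImageExtension($bytes);
    if ($extension === null) {
        return null;
    }
    $mime = $extension === 'jpg' ? 'image/jpeg' : 'image/' . $extension;

    return 'data:' . $mime . ';base64,' . base64_encode($bytes);
}
```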


Formats specifications

Audiobooks

For audiobooks, you have to install kiwilan/php-audio separately.

Ebook::class properties map to Audio::class properties as follows:

| Ebook | Audio |
| --- | --- |
| title | title |
| author | artist |
| description | description |
| publisher | albumArtist |
| series | album |
| volume | trackNumber |
| publishDate | year or creationDate |
| copyright | encodingBy |
| tags | genre |
| language | language |

You can find all metadata in the getExtras() array of Ebook::class.
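To make the audio-to-eBook mapping concrete, here is a hypothetical sketch limited to the one-to-one rows of the mapping (publishDate and copyright are left out). The tag names as array keys, the plain-string values and the `"3/12"` track-number format are assumptions; php-ebook performs this mapping itself through kiwilan/php-audio.

```php
<?php
// Hypothetical sketch: audio tag name => Ebook property name (one-to-one rows only).
const AUDIO_TO_EBOOK = [
    'title' => 'title',
    'artist' => 'author',
    'description' => 'description',
    'albumArtist' => 'publisher',
    'album' => 'series',
    'trackNumber' => 'volume',
    'genre' => 'tags',
    'language' => 'language',
];

function mapAudioTags(array $audioTags): array
{
    $ebook = [];
    foreach (AUDIO_TO_EBOOK as $audioKey => $ebookKey) {
        $value = $audioTags[$audioKey] ?? null;
        if ($value === null || $value === '') {
            continue; // skip missing or empty tags
        }
        $ebook[$ebookKey] = match ($ebookKey) {
            'volume' => (int) $value, // a "3/12" style track number becomes 3
            'tags' => [$value],       // tags are a list of strings
            default => $value,
        };
    }

    return $ebook;
}
```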

EPUB

With EPUB, metadata is extracted from the OPF file and META-INF/container.xml, and you can access these metadata directly. You can also get chapters from the NCX file: the getChapters() method merges NCX and HTML chapters to get the full book chapters with label, source and content.
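To illustrate the first step of that process, the sketch below reads META-INF/container.xml and returns the path of the OPF file from the `full-path` attribute of its first `rootfile` element. It uses PHP's SimpleXML and is a hypothetical standalone example, not php-ebook's parser.

```php
<?php
// Hypothetical sketch with SimpleXML, not php-ebook's parser.
function findOpfPath(string $containerXml): ?string
{
    // Collect libxml errors internally instead of emitting warnings on malformed XML.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($containerXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return null;
    }

    // container.xml uses a default namespace, so XPath needs a prefix for it.
    $xml->registerXPathNamespace('c', 'urn:oasis:names:tc:opendocument:xmlns:container');
    $rootfiles = $xml->xpath('//c:rootfiles/c:rootfile');
    if (!$rootfiles) {
        return null;
    }

    $fullPath = (string) $rootfiles[0]['full-path'];

    return $fullPath !== '' ? $fullPath : null;
}

$container = '<?xml version="1.0" encoding="UTF-8"?><container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container"><rootfiles><rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/></rootfiles></container>';
echo findOpfPath($container), PHP_EOL; // OEBPS/content.opf
```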

use Kiwilan\Ebook\Ebook;

$ebook = Ebook::read('path/to/ebook.epub');

$epub = $ebook->getParser()?->getEpub();

$epub->getContainer(); // ?EpubContainer => {`opfPath`: ?string, `version`: ?string, `xml`: array}
$epub->getOpf(); // ?OpfItem => {`metadata`: array, `manifest`: array, `spine`: array, `guide`: array, `epubVersion`: ?int, `filename`: ?string, `dcTitle`: ?string, `dcCreators`: BookAuthor[], `dcContributors`: BookContributor[], `dcDescription`: ?string, `dcPublisher`: ?string, `dcIdentifiers`: BookIdentifier[], `dcDate`: ?DateTime, `dcSubject`: string[], `dcLanguage`: ?string, `dcRights`: array, `meta`: BookMeta[], `coverPath`: ?string, `contentFile`: string[]}
$epub->getNcx(); // ?NcxItem => {`head`: NcxItemHead[]|null, `docTitle`: ?string, `navPoints`: NcxItemNavPoint[]|null, `version`: ?string, `lang`: ?string}
$epub->getChapters(); // EpubChapter[] => {`label`: string, `source`: string, `content`: string}[]
$epub->getHtml(); // EpubHtml[] => {`filename`: string, `head`: ?string, `body`: ?string}[]
$epub->getFiles(); // string[] => all files in EPUB

Note

For performance reasons, NCX, HTML and chapters are only loaded on demand. If you use var_dump to check metadata, these properties will be null.
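As a rough illustration of what the NCX contributes to chapters, the hypothetical sketch below lists every `navPoint` (label and source) in document order with SimpleXML. It assumes an NCX document is present: EPUB 3 books may rely on a navigation document instead of an NCX, in which case this finds nothing. It is not php-ebook's implementation.

```php
<?php
// Hypothetical sketch: lists NCX navPoints with their label and source.
function readNcxNavPoints(string $ncxXml): array
{
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($ncxXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return [];
    }

    $ns = 'http://www.daisy.org/z3986/2005/ncx/';
    $xml->registerXPathNamespace('n', $ns);

    $chapters = [];
    // "//" also returns nested navPoints, in document order.
    foreach ($xml->xpath('//n:navPoint') ?: [] as $navPoint) {
        $navPoint->registerXPathNamespace('n', $ns); // register again on each context element
        $label = $navPoint->xpath('n:navLabel/n:text');
        $content = $navPoint->xpath('n:content');
        $chapters[] = [
            'label' => $label ? trim((string) $label[0]) : '',
            'source' => $content ? (string) $content[0]['src'] : '',
        ];
    }

    return $chapters;
}

$ncx = '<?xml version="1.0"?><ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"><navMap>'
    . '<navPoint id="p1" playOrder="1"><navLabel><text>Chapter 1</text></navLabel><content src="ch1.xhtml"/>'
    . '<navPoint id="p2" playOrder="2"><navLabel><text>Section 1.1</text></navLabel><content src="ch1.xhtml#s1"/></navPoint>'
    . '</navPoint></navMap></ncx>';

print_r(readNcxNavPoints($ncx));
```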

Creation

You can create an EPUB or CBZ file with the static create() method.

Note

Only EPUB and CBZ are supported for creation.

use Kiwilan\Ebook\Ebook;

$creator = Ebook::create('path/to/ebook.epub');

// Option 1: build manually from strings
$creator->addFromString('mimetype', 'application/epub+zip')
    ->addFromString('META-INF/container.xml', '<?xml version="1.0" encoding="UTF-8" standalone="no" ?><container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container"><rootfiles><rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/></rootfiles></container>')
    ->save();

// Option 2: build from existing files
$creator->addFile('mimetype', 'path/to/mimetype')
    ->addFile('META-INF/container.xml', 'path/to/container.xml')
    ->save();

// Option 3: build from a directory
$creator->addDirectory('./', 'path/to/directory')
    ->save();
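When building an EPUB by hand, keep in mind the EPUB OCF rule that `mimetype` must be the first entry of the ZIP and stored uncompressed. The hypothetical standalone sketch below shows that constraint with PHP's ZipArchive (zip extension required); it is not how php-ebook's creator works internally, and it only writes the container skeleton, so the OPF and content files referenced by container.xml still have to be added.

```php
<?php
// Hypothetical sketch with ZipArchive, not php-ebook's creator.
function createEpubSkeleton(string $path): void
{
    $containerXml = '<?xml version="1.0" encoding="UTF-8"?>'
        . '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        . '<rootfiles><rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/></rootfiles>'
        . '</container>';

    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create {$path}");
    }

    // `mimetype` first, and stored without compression.
    $zip->addFromString('mimetype', 'application/epub+zip');
    $zip->setCompressionName('mimetype', ZipArchive::CM_STORE);
    $zip->addFromString('META-INF/container.xml', $containerXml);

    if (!$zip->close()) {
        throw new RuntimeException("Cannot write {$path}");
    }
}

// createEpubSkeleton('path/to/skeleton.epub');
```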

Supported formats

There are many different formats for eBooks and comics. Here is the support status of each:

| Name | Extensions | Supported | Uses | Support cover | Support series |
| --- | --- | --- | --- | --- | --- |
| EPUB (IDPF) | .epub | ✅ | Native zip | ✅ | ✅ |
| Kindle (Amazon) | .azw, .azw3, .kf8, .kfx | ✅ | Native filesystem | ✅ (see MOBI cover note) | ❌ |
| Mobipocket | .mobi, .prc | ✅ | Native filesystem | ✅ (see MOBI cover note) | ❌ |
| PDF | .pdf | ✅ | smalot/pdfparser (included) | Uses imagick | ❌ |
| iBook (Apple) | .ibooks | ❌ | N/A | N/A | N/A |
| DjVu | .djvu, .djv | ❌ | N/A | N/A | N/A |
| Rich Text Format | .rtf | ❌ | N/A | N/A | N/A |
| FictionBook | .fb2 | ✅ | Native filesystem | ✅ | ✅ |
| Broadband eBooks | .lrf, .lrx | ❌ | N/A | N/A | N/A |
| Palm Media | .pdb | ❌ | N/A | N/A | N/A |
| Comics CBZ | .cbz | ✅ | Native zip | ✅ | ✅ |
| Comics CBR | .cbr | ✅ | rar PHP extension or p7zip binary | ✅ | ✅ |
| Comics CB7 | .cb7 | ✅ | p7zip binary | ✅ | ✅ |
| Comics CBT | .cbt | ✅ | Native phar | ✅ | ✅ |
| Audio | .mp3, .m4a, .m4b, .flac, .ogg | ✅ | If kiwilan/php-audio is installed | Depends on format | ❌ |
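To route files before reading them, the supported extensions can be grouped by family. This hypothetical helper mirrors the supported formats above and groups Kindle extensions with Mobipocket, as the MOBI cover note does; it is not php-ebook's EbookFormatEnum, and unsupported extensions return null.

```php
<?php
// Hypothetical helper: extension => format family, from the supported formats list.
const SUPPORTED_FORMATS = [
    'epub' => 'epub',
    'azw' => 'mobi', 'azw3' => 'mobi', 'kf8' => 'mobi', 'kfx' => 'mobi',
    'mobi' => 'mobi', 'prc' => 'mobi',
    'pdf' => 'pdf',
    'fb2' => 'fb2',
    'cbz' => 'cba', 'cbr' => 'cba', 'cb7' => 'cba', 'cbt' => 'cba',
    'mp3' => 'audio', 'm4a' => 'audio', 'm4b' => 'audio', 'flac' => 'audio', 'ogg' => 'audio',
];

function formatFamily(string $path): ?string
{
    // Case-insensitive lookup of the file extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return SUPPORTED_FORMATS[$extension] ?? null;
}
```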

MOBI cover note

Mobipocket files and derivatives (.mobi, .prc, .azw, .azw3, .kf8, .kfx) can have a cover image embedded in the file. php-ebook's native solution can extract this cover, but the resolution is poor. The best solution is to convert the file to EPUB with calibre.
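The Roadmap lists ebook-convert support as future work, so for now the conversion is a manual step. A hypothetical sketch of calling calibre's `ebook-convert input_file output_file` CLI from PHP, assuming calibre is installed and `ebook-convert` is on the PATH; the resulting EPUB can then be read with Ebook::read().

```php
<?php
// Hypothetical sketch: converts a Mobipocket/Kindle file to EPUB with calibre.
function buildConvertCommand(string $input, string $output): string
{
    // Escape both paths so spaces or quotes cannot break the command.
    return 'ebook-convert ' . escapeshellarg($input) . ' ' . escapeshellarg($output);
}

function convertToEpub(string $input, string $output): bool
{
    exec(buildConvertCommand($input, $output) . ' 2>&1', $lines, $exitCode);

    // Success means calibre exited cleanly and the output file exists.
    return $exitCode === 0 && is_file($output);
}

// convertToEpub('path/to/book.azw3', 'path/to/book.epub');
```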

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Credits

License

The MIT License (MIT). Please see License File for more information.

php-ebook's People

Contributors

dependabot[bot], ewilan-riviere, github-actions[bot], sergiomendolia


php-ebook's Issues

[Bug]: Can't install package with Laravel 8

What happened?

I can't install the package (a screenshot of the error, "Screen Shot 2023-12-06 at 14 03 50", was attached to the issue).

How to reproduce the bug

Run composer require kiwilan/php-ebook

Package Version

2.1.02

PHP Version

8.2

Which operating systems does this happen with?

No response

Notes

No response

[Bug]: Call to a member function filter() on null

What happened?

Hello!

First of all, thanks for this great library!

I'm having an issue when an archive is a "badFile". Currently, the error is caught in Ebook.php, but it is only logged and the script continues.
Later on, the *Metadata.php files use the getArchive()->xx method, and since the archive is null, an error is thrown. Since it's an Error and not an Exception, the execution stops.

I can provide a PR for this, I would just like your preferred method of fixing it.

  • I can throw an Error in getArchive() and type-hint it as not nullable, but I don't have enough visibility on the rest of the code to see if it's a good solution.
  • I could add a new Epub::check($path) to check a file before trying to read it
  • Any other idea?

Thanks!

How to reproduce the bug

Try to read a bad archive :)

Package Version

2.0.12

PHP Version

8.2

Which operating systems does this happen with?

Linux

Notes

No response

[Bug]: getChapters() not getting chapters

public function upload(Request $request)
{
    $request->validate([
        'epub_file' => 'required|mimes:epub'
    ]);

    $filename = 'eb_' . uniqid() . '.epub';
    $path = $request->file('epub_file')->storeAs('epubs', $filename);

    $ebook = Ebook::read(storage_path('app/' . $path));

    // Access metadata and properties of the eBook
    $title = $ebook->getTitle();
    $author = $ebook->getAuthorMain();
    $description = $ebook->getDescription() ?? 'No Description added';
    $coverImage = $ebook->getCover();

    // Store cover image
    $coverPath = null;
    if ($coverImage) {
        $coverFilename = 'cover_' . uniqid() . '.png'; // Assuming cover image format is PNG
        $coverContents = $coverImage->getContents();
        $fullPath = 'covers/' . $coverFilename;
        Storage::put($fullPath, $coverContents);
        $coverPath = $fullPath; // Store the full path including 'storage/app'
    }

    // Create a new book
    $book = Book::create([
        'name' => $title,
        'author' => $author,
        'isbn' => null,
        'book_cover' => $coverPath,
        'description' => $description
    ]);

    // Get chapters from the EPUB
    $epub = $ebook->getParser()?->getEpub();
    $chapters = $epub->getChapters(); // Ensure this line is correctly retrieving chapters
    dd($chapters); // Debug to check if chapters are now correctly retrieved

    // Save chapters to database
    foreach ($chapters as $index => $chapter) {
        Chapter::create([
            'title' => $chapter->label(), // Use label() method to get the chapter label
            'content' => $chapter->content(), // Use content() method to get the chapter content
            'book_id' => $book->id,
            'src' => $chapter->source(), // Use source() method to get the chapter source
        ]);
    }

    return redirect()->route('books.index')->with('success', 'Book uploaded successfully. File name: ' . $filename);
}

What happened?

Upon attempting to upload an EPUB file through the upload function, the process completes successfully without any errors reported. However, upon inspection, it's found that no chapters are created in the database despite the EPUB containing multiple chapters.

How to reproduce the bug

Expected Behavior:
After the file is uploaded, the function should extract chapters from the EPUB file and save them to the database. These chapters should be retrievable and displayed for the user.

Actual Behavior:
After uploading an EPUB file, the function successfully creates a book entry in the database with the correct title, author, cover image, and description. However, no chapters are created or saved in the database, even though the EPUB contains multiple chapters.

Debugging Steps Undertaken:

Checked the $chapters variable after retrieving it from the EPUB file. Used dd($chapters) to ensure that chapters are correctly retrieved. However, the dump shows an empty array, indicating that no chapters are being extracted.
Verified that the EPUB file contains multiple chapters by manually inspecting the file using EPUB reader software.

Package Version

2.3.8

PHP Version

8.2

Which operating systems does this happen with?

Windows

Notes

No response
