aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format

Home Page: https://ebooklib.readthedocs.io/

License: GNU Affero General Public License v3.0

ebooklib's Introduction

About EbookLib

EbookLib is a Python library for managing EPUB2/EPUB3 and Kindle files. It's capable of reading and writing EPUB files programmatically (Kindle support is under development).

The API is designed to be as simple as possible while still making complex things possible. It supports covers, tables of contents, spine, guide, metadata, and more.

EbookLib is used in Booktype from Sourcefabric, as well as sprits-it!, fanfiction2ebook, viserlalune and Telemeta.

Packages of EbookLib for GNU/Linux are available in Debian and Ubuntu.

Sphinx documentation is generated from the templates in the docs/ directory and made available at http://ebooklib.readthedocs.io

Usage

Reading

import ebooklib
from ebooklib import epub

book = epub.read_epub('test.epub')

for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)

Writing

from ebooklib import epub

book = epub.EpubBook()

# set metadata
book.set_identifier("id123456")
book.set_title("Sample book")
book.set_language("en")

book.add_author("Author Authorowski")
book.add_author(
    "Danko Bananko",
    file_as="Gospodin Danko Bananko",
    role="ill",
    uid="coauthor",
)

# create chapter
c1 = epub.EpubHtml(title="Intro", file_name="chap_01.xhtml", lang="hr")
c1.content = (
    "<h1>Intro heading</h1>"
    "<p>Zaba je skocila u baru.</p>"
    '<p><img alt="[ebook logo]" src="static/ebooklib.gif"/><br/></p>'
)

# create image from the local image file (the file is closed after reading)
with open("ebooklib.gif", "rb") as image_file:
    image_content = image_file.read()
img = epub.EpubImage(
    uid="image_1",
    file_name="static/ebooklib.gif",
    media_type="image/gif",
    content=image_content,
)

# add chapter
book.add_item(c1)
# add image
book.add_item(img)

# define Table Of Contents
book.toc = (
    epub.Link("chap_01.xhtml", "Introduction", "intro"),
    (epub.Section("Simple book"), (c1,)),
)

# add default NCX and Nav file
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())

# define CSS style
style = "BODY {color: white;}"
nav_css = epub.EpubItem(
    uid="style_nav",
    file_name="style/nav.css",
    media_type="text/css",
    content=style,
)

# add CSS file
book.add_item(nav_css)

# basic spine
book.spine = ["nav", c1]

# write to the file
epub.write_epub("test.epub", book, {})

License

EbookLib is licensed under the AGPL license.

Authors

The full list of authors is in the AUTHORS.txt file.

ebooklib's Issues

Support for guide

Implement the API and create the guide element in the manifest. The guide is a deprecated feature, but we should still be able to support it.

We should also be able to support the landmarks feature:
http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav-def-types-landmarks

Guide

  1. Table of Contents
  2. List of Illustrations
  3. Start of Content

Example of deprecated guide:

The guide supports the following reference types:

  • cover
  • title-page
  • toc
  • index
  • glossary
  • acknowledgements
  • bibliography
  • colophon
  • copyright-page
  • dedication
  • epigraph
  • foreword
  • loi
  • lot
  • notes
  • preface
  • text
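
As a rough illustration of what creating the guide element could involve, here is a minimal sketch that builds an OPF guide with the standard library. It is not EbookLib's implementation: the build_guide helper and the href values are hypothetical, and only the reference types come from the list above.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF package documents.
OPF_NS = "http://www.idpf.org/2007/opf"


def build_guide(references):
    """Build a <guide> element from (type, title, href) tuples.

    Hypothetical helper for illustration; not part of EbookLib.
    """
    guide = ET.Element("{%s}guide" % OPF_NS)
    for ref_type, title, href in references:
        ET.SubElement(
            guide,
            "{%s}reference" % OPF_NS,
            {"type": ref_type, "title": title, "href": href},
        )
    return guide


# The file names below are made up for the example.
guide = build_guide([
    ("toc", "Table of Contents", "nav.xhtml"),
    ("loi", "List of Illustrations", "illustrations.xhtml"),
    ("text", "Start of Content", "chap_01.xhtml"),
])

# Serialize to a string for inspection.
guide_xml = ET.tostring(guide, encoding="unicode")
```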

Preserve XML declaration when creating XML files

We do not preserve XML declarations when creating XML files. What we should do is pass the xml_declaration option when calling the etree.tostring function.

Example:
tree_str = etree.tostring(tree, pretty_print=True, encoding='utf-8', xml_declaration=True)
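
The effect of the option can be checked without lxml. The sketch below uses the standard library's xml.etree.ElementTree instead, which is an assumption made only to keep the example dependency-free; its tostring accepts xml_declaration on Python 3.8 and later but has no pretty_print argument. The element names are placeholders.

```python
import xml.etree.ElementTree as ET

# A tiny placeholder tree; the element names are arbitrary.
root = ET.Element("package")
ET.SubElement(root, "metadata")

# With UTF-8 output and no explicit option, the declaration is left out.
without_decl = ET.tostring(root, encoding="utf-8")

# Asking for it explicitly prepends the XML declaration.
with_decl = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(without_decl.startswith(b"<?xml"))
print(with_decl.startswith(b"<?xml"))
```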

Cover image item

When an EPUB3 manifest is loaded, the item with cover-image property assigned is not recognized as the cover image.
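
One way to recognize such an item is to look at the properties attribute of each manifest entry. The following is only a sketch using the standard library: the sample OPF fragment and the find_cover_item helper are invented for illustration and do not reflect how EbookLib reads the manifest.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Invented manifest fragment for the example.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="cover" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
    <item id="chap1" href="chap_01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
</package>"""


def find_cover_item(opf_xml):
    """Return (id, href) of the first item marked cover-image, or None."""
    root = ET.fromstring(opf_xml)
    for item in root.iter("{%s}item" % OPF_NS):
        # "properties" is a space-separated list of values.
        if "cover-image" in item.get("properties", "").split():
            return item.get("id"), item.get("href")
    return None


cover = find_cover_item(SAMPLE_OPF)
```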

Basic plugin for filtering non HTML5 content

We need a basic plugin that can filter out most non-HTML5 tags, attributes, and similar content.

Later we will also need to replace unsupported tags with newer syntax, for instance replacing a tag with an element plus CSS, and so on.
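
A very small version of such a filter could look like the sketch below. It assumes short deny-lists of tags and attributes chosen only for the example, and it uses the standard library's html.parser rather than whatever parser the real plugin would use. Text inside a dropped tag is kept; comments and doctypes are discarded.

```python
from html import escape
from html.parser import HTMLParser

# Assumed deny-lists for illustration; a real plugin would need fuller ones.
OBSOLETE_TAGS = {"font", "center", "big", "strike", "tt"}
OBSOLETE_ATTRS = {"align", "bgcolor", "border"}


class Html5Filter(HTMLParser):
    """Re-emit markup while skipping deny-listed tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _emit_start(self, tag, attrs, closing=""):
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name not in OBSOLETE_ATTRS
        )
        self.parts.append("<%s%s%s>" % (tag, kept, closing))

    def handle_starttag(self, tag, attrs):
        if tag not in OBSOLETE_TAGS:
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag not in OBSOLETE_TAGS:
            self._emit_start(tag, attrs, closing="/")

    def handle_endtag(self, tag):
        if tag not in OBSOLETE_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is kept even when its enclosing tag is dropped.
        self.parts.append(escape(data, quote=False))


def filter_html5(markup):
    parser = Html5Filter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


cleaned = filter_html5('<p align="center"><font color="red">Hi</font> there<br/></p>')
```

Dropping tags is the easy half; mapping an old tag to a new element with CSS would need a replacement table on top of this.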

Implement add_item method for EpubHtml

When manipulating a chapter, we should be able to add other items (such as scripts and stylesheets) to it, and EbookLib should create the links automatically.

For instance, you have one style file which you would like to add to other Html files.

style = '''BODY { text-align: justify;}'''

default_css = epub.EpubItem(uid="style_default", file_name="style/default.css", media_type="text/css", content=style)
book.add_item(default_css)

c2 = epub.EpubHtml(title='About this book', file_name='about.xhtml')
c2.content='<h1>About this book</h1><p>Helou, this is my book! There are many books, but this one is mine.</p>'
c2.add_item(default_css)
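
The automatic link mostly comes down to computing a path from the chapter file to the added item. Here is a sketch of that step only, with hypothetical helper names (relative_href, stylesheet_link) that are not EbookLib API; it assumes file names inside the EPUB use forward slashes.

```python
import posixpath


def relative_href(from_file, to_file):
    """Path to to_file as seen from the directory that holds from_file."""
    start = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start)


def stylesheet_link(chapter_file, css_file):
    """Markup a chapter would need in its head to reference css_file."""
    href = relative_href(chapter_file, css_file)
    return '<link href="%s" rel="stylesheet" type="text/css"/>' % href


# Chapter at the top level of the book.
top_level = stylesheet_link("about.xhtml", "style/default.css")

# Chapter in a subdirectory (the "text/" directory is made up).
nested = stylesheet_link("text/about.xhtml", "style/default.css")
```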

Delete temporary directory

Somehow a temporary directory containing an unpacked EPUB ended up in the repository. I wonder who put it there.

Wrong copyright info

I copy-pasted the copyright notice from Booktype. The references to Booktype should be removed from it.

Implement new API for Plugins

API will change a lot, but for now we just need something to start working.

def before_write(self, book):
    "Processing before save"

def after_write(self, book):
    "Processing after save"

def before_read(self, book):
    "Processing before read"

def after_read(self, book):
    "Processing after read"

def item_after_read(self, book, item):
    "Process general item after read."

def item_before_write(self, book, item):
    "Process general item before write."

def html_after_read(self, book, chapter):
    "Processing HTML after read."

def html_before_write(self, book, chapter):
    "Processing HTML before write"

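To show how these hooks might fit together, here is a sketch with a no-op base class, one plugin that overrides two hooks, and a small dispatcher. The hook names come from the list above; BasePlugin, CallLogPlugin, run_hook and the dict standing in for a book are all hypothetical and say nothing about how EbookLib actually calls plugins.

```python
class BasePlugin:
    """No-op defaults so a plugin only overrides the hooks it needs."""

    def before_write(self, book):
        """Processing before save."""

    def after_write(self, book):
        """Processing after save."""

    def before_read(self, book):
        """Processing before read."""

    def after_read(self, book):
        """Processing after read."""

    def item_after_read(self, book, item):
        """Process general item after read."""

    def item_before_write(self, book, item):
        """Process general item before write."""

    def html_after_read(self, book, chapter):
        """Processing HTML after read."""

    def html_before_write(self, book, chapter):
        """Processing HTML before write."""


class CallLogPlugin(BasePlugin):
    """Example plugin that records which hooks were invoked."""

    def __init__(self):
        self.calls = []

    def before_write(self, book):
        self.calls.append("before_write")

    def html_before_write(self, book, chapter):
        self.calls.append("html_before_write:%s" % chapter)


def run_hook(plugins, hook_name, *args):
    """Call hook_name on every plugin that defines it, in order."""
    for plugin in plugins:
        hook = getattr(plugin, hook_name, None)
        if callable(hook):
            hook(*args)


log = CallLogPlugin()
plugins = [BasePlugin(), log]
book = {"chapters": ["chap_01.xhtml"]}  # stand-in for a real book object

run_hook(plugins, "before_write", book)
for chapter in book["chapters"]:
    run_hook(plugins, "html_before_write", book, chapter)
run_hook(plugins, "after_write", book)
```
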
Parse EPUB2 guide

The EPUB2 guide element of the OPF file is not parsed when an EPUB file is loaded.

Faulty navigation points in the NCX

Navigation points in the NCX document that correspond to book sections have an empty URL assigned as their content. This is reported as an error by the epubcheck program.
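
The problem can be detected before running epubcheck by scanning the NCX for navigation points whose content has no src. The sketch below uses the standard library; the sample NCX and the empty_nav_points helper are invented for the example.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Invented NCX fragment: the section has an empty src, the chapter does not.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="sec1">
      <navLabel><text>Simple book</text></navLabel>
      <content src=""/>
      <navPoint id="intro">
        <navLabel><text>Intro</text></navLabel>
        <content src="chap_01.xhtml"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""


def empty_nav_points(ncx_xml):
    """Return ids of navPoints whose own <content> is missing or has no src."""
    ns = {"ncx": NCX_NS}
    root = ET.fromstring(ncx_xml)
    bad = []
    for nav_point in root.iter("{%s}navPoint" % NCX_NS):
        # find() only looks at direct children, so nested points are not mixed up.
        content = nav_point.find("ncx:content", ns)
        if content is None or not content.get("src"):
            bad.append(nav_point.get("id"))
    return bad


faulty = empty_nav_points(SAMPLE_NCX)
```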

Check type of Item in epub

We should be able to check the type of each item in an EPUB file, returning some kind of ID for the different item types (image, HTML, CSS, ...).
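
A simple way to get such an ID is to derive it from the item's media type. In the sketch below the constants are local stand-ins with made-up values; only the name ITEM_IMAGE appears in the README above, and the item_type function is hypothetical.

```python
# Local stand-in constants; the numeric values are made up for this example.
ITEM_UNKNOWN = 0
ITEM_IMAGE = 1
ITEM_STYLE = 2
ITEM_DOCUMENT = 3


def item_type(media_type):
    """Map a manifest media type to one of the stand-in item IDs."""
    if media_type.startswith("image/"):
        return ITEM_IMAGE
    if media_type == "text/css":
        return ITEM_STYLE
    if media_type in ("application/xhtml+xml", "text/html"):
        return ITEM_DOCUMENT
    return ITEM_UNKNOWN


gif_type = item_type("image/gif")
css_type = item_type("text/css")
chapter_type = item_type("application/xhtml+xml")
```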

Have EpubItem for remote resources

EPUB3 supports the remote-resources property for video and audio elements, meaning they can be stored remotely. This applies only to audio and video tags. These items must still be placed in the list of resources. Our EpubItem should be aware of this and not create a local file in EPUB3 in this case.
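
The decision not to create a local file could start from a check on the item's href. This is a sketch only: is_remote and should_write_local_file are hypothetical helpers, and treating only http and https as remote is an assumption.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes count as remote for the example.
REMOTE_SCHEMES = ("http", "https")


def is_remote(href):
    """True when href points outside the EPUB container."""
    return urlsplit(href).scheme in REMOTE_SCHEMES


def should_write_local_file(href):
    """Remote items stay in the manifest but get no file in the archive."""
    return not is_remote(href)


remote_video = is_remote("http://example.com/media/intro.mp4")
local_image = is_remote("static/ebooklib.gif")
```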

Implement document type

It would also be handy to have a document type. We should be able to tell "this is an HTML document", but we should also know whether it is cover.xhtml, nav.xhtml, or just another chapter.

Load navigation document

When reading an EPUB file, if the NCX file is not present the TOC structure should be obtained by parsing the NAV document instead.
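
Reading the TOC from the NAV document means locating the nav element whose epub:type is toc and collecting its links. The sketch below does this with the standard library and returns a flat list, ignoring nesting; the sample document and the read_nav_toc helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Invented NAV document with a toc and a landmarks section.
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol>
        <li><a href="chap_01.xhtml">Introduction</a></li>
      </ol>
    </nav>
    <nav epub:type="landmarks">
      <ol>
        <li><a epub:type="toc" href="nav.xhtml">Table of Contents</a></li>
      </ol>
    </nav>
  </body>
</html>"""


def read_nav_toc(nav_xml):
    """Return a flat list of (title, href) pairs from the toc nav element."""
    root = ET.fromstring(nav_xml)
    for nav in root.iter("{%s}nav" % XHTML_NS):
        if nav.get("{%s}type" % EPUB_NS) == "toc":
            return [
                ("".join(link.itertext()).strip(), link.get("href"))
                for link in nav.iter("{%s}a" % XHTML_NS)
            ]
    return []


toc_entries = read_nav_toc(SAMPLE_NAV)
```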

Do not create new title tag in chapter if it already exists

When creating chapter content we do two things: we copy the existing tags from the original document, and we add a new title tag. We should not add a new title tag if one already exists. Likewise, when no title is defined for the chapter, we should not overwrite an existing title with an empty one.

Add sample files

Add some basic sample files that show how to use the EbookLib library.
