aerkalov / ebooklib



Python E-book library for handling books in EPUB2/EPUB3 format

Home Page: https://ebooklib.readthedocs.io/

License: GNU Affero General Public License v3.0

Python 100.00%

epub python python-library

ebooklib's Introduction

About EbookLib

EbookLib is a Python library for managing EPUB2/EPUB3 and Kindle files. It's capable of reading and writing EPUB files programmatically (Kindle support is under development).

The API is designed to be as simple as possible, while at the same time making complex things possible too. It has support for covers, table of contents, spine, guide, metadata, etc.

EbookLib is used in Booktype from Sourcefabric, as well as sprits-it!, fanfiction2ebook, viserlalune and Telemeta.

Packages of EbookLib for GNU/Linux are available in Debian and Ubuntu.

Sphinx documentation is generated from the templates in the docs/ directory and made available at http://ebooklib.readthedocs.io

Usage

Reading

import ebooklib
from ebooklib import epub

book = epub.read_epub('test.epub')

for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)

Writing

from ebooklib import epub

book = epub.EpubBook()

# set metadata
book.set_identifier("id123456")
book.set_title("Sample book")
book.set_language("en")

book.add_author("Author Authorowski")
book.add_author(
    "Danko Bananko",
    file_as="Gospodin Danko Bananko",
    role="ill",
    uid="coauthor",
)

# create chapter
c1 = epub.EpubHtml(title="Intro", file_name="chap_01.xhtml", lang="hr")
c1.content = (
    "<h1>Intro heading</h1>"
    "<p>Zaba je skocila u baru.</p>"
    '<p><img alt="[ebook logo]" src="static/ebooklib.gif"/><br/></p>'
)

# create image from the local image (the file handle is closed afterwards)
with open("ebooklib.gif", "rb") as f:
    image_content = f.read()
img = epub.EpubImage(
    uid="image_1",
    file_name="static/ebooklib.gif",
    media_type="image/gif",
    content=image_content,
)

# add chapter
book.add_item(c1)
# add image
book.add_item(img)

# define Table Of Contents
book.toc = (
    epub.Link("chap_01.xhtml", "Introduction", "intro"),
    (epub.Section("Simple book"), (c1,)),
)

# add default NCX and Nav file
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())

# define CSS style
style = "BODY {color: white;}"
nav_css = epub.EpubItem(
    uid="style_nav",
    file_name="style/nav.css",
    media_type="text/css",
    content=style,
)

# add CSS file
book.add_item(nav_css)

# basic spine
book.spine = ["nav", c1]

# write to the file
epub.write_epub("test.epub", book, {})

License

EbookLib is licensed under the AGPL license.

Authors

The full list of authors is in the AUTHORS.txt file.


ebooklib's Issues

Support for guide

Implement an API and create the guide element in the package (OPF) file. Guide is a deprecated feature, but we should still be able to support it.

We should also be able to support the landmarks feature:
http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav-def-types-landmarks

Guide

Example of deprecated guide:

The guide supports the following reference types:

cover
title-page
toc
index
glossary
acknowledgements
bibliography
colophon
copyright-page
dedication
epigraph
foreword
loi
lot
notes
preface
text
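To illustrate what an EPUB2 guide looks like in the OPF file, here is a minimal sketch built with the standard library's xml.etree.ElementTree rather than EbookLib. The helper build_guide and its (type, title, href) tuple format are hypothetical; the allowed types are the ones listed above, plus the "other." prefix that the OPF 2.0.1 specification permits for custom types.

```python
import xml.etree.ElementTree as ET

# Namespace of the OPF package document.
OPF_NS = "http://www.idpf.org/2007/opf"

# Reference types accepted in an EPUB2 guide (the list above).
GUIDE_TYPES = {
    "cover", "title-page", "toc", "index", "glossary", "acknowledgements",
    "bibliography", "colophon", "copyright-page", "dedication", "epigraph",
    "foreword", "loi", "lot", "notes", "preface", "text",
}


def build_guide(references):
    """Build a <guide> element from (type, title, href) tuples.

    Hypothetical helper for illustration; not part of EbookLib.
    """
    guide = ET.Element("{%s}guide" % OPF_NS)
    for ref_type, title, href in references:
        # Custom types are allowed when prefixed with "other."
        if ref_type not in GUIDE_TYPES and not ref_type.startswith("other."):
            raise ValueError("unknown guide type: %s" % ref_type)
        ET.SubElement(guide, "{%s}reference" % OPF_NS,
                      type=ref_type, title=title, href=href)
    return guide


guide = build_guide([
    ("cover", "Cover", "cover.xhtml"),
    ("toc", "Table of Contents", "nav.xhtml"),
])
# Serialize with the OPF namespace as the default namespace.
ET.register_namespace("", OPF_NS)
print(ET.tostring(guide, encoding="unicode"))
```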

Preserve XML declaration when creating XML files

We do not preserve XML declarations when creating XML files. What we should do is use the xml_declaration option when calling the etree.tostring function.

Example:
tree_str = etree.tostring(tree, pretty_print=True, encoding='utf-8', xml_declaration=True)
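For comparison, a minimal sketch with the standard library's xml.etree.ElementTree in place of lxml. Its tostring function accepts the same xml_declaration keyword (since Python 3.8), though it has no pretty_print option; the container element here is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Minimal container.xml-like tree, built only for this demonstration.
root = ET.Element("container", version="1.0")

# xml_declaration=True prepends an <?xml ...?> line declaring the encoding.
tree_str = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(tree_str.decode("utf-8"))
```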

A typo in the nav item string representation.

Extend API with methods for filtering data

Implement methods for fetching and filtering data in a book or chapter.

Epubcheck fails for some tag attributes

For instance, epubcheck complains about <p dir="RTL"> because RTL is in uppercase; it expects such attribute values to be in lowercase.
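One possible fix is to normalize the offending values before writing. The sketch below uses the standard library's ElementTree and only handles the dir attribute; the helper name lowercase_dir_attributes is hypothetical, and other enumerated attributes could be treated the same way.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def lowercase_dir_attributes(root):
    """Lowercase every dir attribute value in an XHTML tree, in place.

    Hypothetical helper; returns the number of attributes changed.
    """
    changed = 0
    for element in root.iter():
        value = element.get("dir")
        if value is not None and value != value.lower():
            element.set("dir", value.lower())
            changed += 1
    return changed


doc = ET.fromstring(
    '<html xmlns="%s"><body><p dir="RTL">text</p></body></html>' % XHTML_NS
)
changed = lowercase_dir_attributes(doc)
```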

Fix error in README file

Sample code will not run. Fix error in README file.

Implement different methods for fetching different items from a book

Implement a call like get_link_of_href to fetch an item from a book. The question is whether it should return just one item or more than one. I think returning just one item would be more useful.
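A minimal sketch of the single-item variant. The Item class is a stand-in with only the fields needed here, and get_item_by_href is a hypothetical name; ignoring the URL fragment when matching is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Item:
    """Stand-in for a book item; only the fields this sketch needs."""
    uid: str
    file_name: str


def get_item_by_href(items: Iterable[Item], href: str) -> Optional[Item]:
    """Return the first item whose file name matches href, or None.

    Any fragment ("chap_01.xhtml#intro") is ignored when matching.
    """
    path = href.split("#", 1)[0]
    return next((item for item in items if item.file_name == path), None)


items = [Item("chapter_1", "chap_01.xhtml"), Item("image_1", "static/ebooklib.gif")]
match = get_item_by_href(items, "chap_01.xhtml#intro")
```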

Cover image item

When an EPUB3 manifest is loaded, the item with cover-image property assigned is not recognized as the cover image.
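In EPUB3 the cover image is the manifest item whose properties attribute contains cover-image (properties is a space-separated list). A minimal detection sketch with the standard library's ElementTree; find_cover_image and the sample OPF are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_image(opf_xml):
    """Return the href of the manifest item marked as cover image, or None."""
    root = ET.fromstring(opf_xml)
    for item in root.iterfind("opf:manifest/opf:item", OPF):
        # properties may hold several values, e.g. "cover-image svg".
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")
    return None


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chap_01.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover" href="images/cover.jpg" media-type="image/jpeg"
          properties="cover-image"/>
  </manifest>
</package>"""

cover_href = find_cover_image(SAMPLE_OPF)
```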

Fix typo in setup.py

The EPUB folder name is not configurable.

The default folder name is hard-coded in the container's XML template and does not reflect the name assigned to the book.

Basic plugin for filtering non-HTML5 content

We need a basic plugin that can filter out most non-HTML5 tags, attributes and similar constructs.

Later we would also need to replace unsupported tags with the new syntax. For instance, replace tag with element and CSS, etc.

Make some function names more pythonic + update docs + update examples

writeEPUB and readEPUB should really be write_epub and read_epub.

Implement add_item method for EpubHtml

When manipulating a chapter, we should be able to add other items (such as scripts and stylesheets) to it, and EbookLib should create the links automatically.

For instance, you may have one style file that you would like to add to several HTML files.

from ebooklib import epub

book = epub.EpubBook()
style = '''BODY { text-align: justify;}'''

default_css = epub.EpubItem(
    uid="style_default",
    file_name="style/default.css",
    media_type="text/css",
    content=style,
)
book.add_item(default_css)

c2 = epub.EpubHtml(title='About this book', file_name='about.xhtml')
c2.content = '<h1>About this book</h1><p>Helou, this is my book! There are many books, but this one is mine.</p>'
c2.add_item(default_css)

Put parsing function in the utils module

We use the HTML5 parser in far too many places. Just put it in the utils module.

Do not use ZIP_STORED for every item in zip file

For no apparent reason we use the ZIP_STORED flag for every item in the ZIP file. We should use it only for the mimetype file.
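A minimal sketch of the intended behaviour with the standard library's zipfile module: the mimetype entry is written first and uncompressed, as the OCF specification requires, and everything else is deflated. The helper write_epub_zip is hypothetical and is not EbookLib's writer.

```python
import io
import zipfile


def write_epub_zip(target, files):
    """Write an EPUB container: 'mimetype' first and stored, the rest deflated.

    target may be a path or a binary file object; files maps archive
    names to bytes. Hypothetical helper for illustration.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # Only the mimetype entry is stored without compression.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


buffer = io.BytesIO()
write_epub_zip(buffer, {
    "META-INF/container.xml": b"<container/>",
    "OEBPS/chap_01.xhtml": b"<html/>" * 50,
})
```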

Items in the spine can have the linear flag

We need to support the linear flag in the spine. Ideally, Item would have an option for it, so that we could mark it somehow when defining the spine.

We must not use .wait() to wait for Popen to finish

We must use communicate() instead: with stdout or stderr set to PIPE, wait() can deadlock once the child fills the OS pipe buffer. Tip: read the subprocess documentation and look at the big red warning boxes.
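A minimal sketch of the communicate() pattern. The child process here is just the current Python interpreter printing a large amount of output, chosen so the example runs anywhere; the real code would launch whatever external tool is involved.

```python
import subprocess
import sys

# Child that writes about 1 MB to stdout. With stdout=PIPE, calling
# proc.wait() here could block forever once the pipe buffer fills;
# communicate() reads both pipes while waiting for the process to exit.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('x' * 1000000)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
```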

Remove dependency on the itertools module

There is no need for it. Just use a normal generator expression.

Delete temporary directory

Somehow a temporary directory with an unextracted EPUB ended up in the repository. I wonder who put it there.

EpubCoverHtml should extend EpubHtml and use it for processing HTML

We have a separate cover template and duplicated methods for the same thing. Just extend EpubHtml and use its methods for processing HTML.

Remove print statements from source code

Spurious metadata entry while parsing the OPF file.

When metadata found inside the OPF file is parsed, a bogus entry is read and placed in the book's metadata container.

Cover file should also extend EpubHtml class

It would be best if the cover file also extended EpubHtml. That would make it possible to add other CSS or JavaScript files dynamically through the API.

Handle properties in manifest file when writing to epub

Handle the properties attribute of manifest items when creating an EPUB file.

Wrong copyright info

I copy-pasted the copyright notice from Booktype. References to Booktype should be removed.

Implement new API for Plugins

The API will change a lot, but for now we just need something to start working with.

# Hook methods a plugin can implement (the class name is illustrative).
class BasePlugin(object):
    def before_write(self, book):
        "Processing before save."

    def after_write(self, book):
        "Processing after save."

    def before_read(self, book):
        "Processing before read."

    def after_read(self, book):
        "Processing after read."

    def item_after_read(self, book, item):
        "Process general item after read."

    def item_before_write(self, book, item):
        "Process general item before write."

    def html_after_read(self, book, chapter):
        "Process HTML after read."

    def html_before_write(self, book, chapter):
        "Process HTML before write."

Decode filenames when reading them from zip file

We should unquote filenames when reading them from the ZIP file.
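A minimal sketch, assuming the problem is that hrefs in the OPF are percent-encoded (for example %20 for a space) while the ZIP entry names are not. It uses only the standard library; the archive and href are made up for the demonstration.

```python
import io
import zipfile
from urllib.parse import unquote

# Build a tiny in-memory archive whose entry name contains a space.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("OEBPS/chapter 1.xhtml", b"<html/>")

# The OPF would reference the same file with a percent-encoded href.
href = "OEBPS/chapter%201.xhtml"

with zipfile.ZipFile(buffer) as zf:
    # zf.read(href) would raise KeyError; decode the href first
    # (unquote decodes %XX escapes as UTF-8 by default).
    content = zf.read(unquote(href))
```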

Book title and UID fields not assigned when reading from file

When reading an EPUB from a file, the book title and its UID fields are not updated. As this information is contained in the metadata, make sure that it is also assigned to the corresponding fields.

Mime type is not correctly guessed when adding items

As the title says, the MIME type is not guessed correctly. mimetypes.guess_type can return a string OR a tuple, but we only handle the tuple case. The end result is None as our MIME type.
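In the standard library, mimetypes.guess_type returns a (type, encoding) tuple whose first element is None for unknown extensions. A minimal sketch of a guess that never yields None: it pins a few EPUB-relevant types (so results do not depend on the platform's mime.types files) and falls back to application/octet-stream. The helper guess_media_type and the EPUB_TYPES table are hypothetical.

```python
import mimetypes
import os

# EPUB-relevant media types pinned explicitly (hypothetical table).
EPUB_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".opf": "application/oebps-package+xml",
    ".css": "text/css",
}


def guess_media_type(file_name, default="application/octet-stream"):
    """Return a media type for file_name, never None."""
    ext = os.path.splitext(file_name)[1].lower()
    if ext in EPUB_TYPES:
        return EPUB_TYPES[ext]
    # guess_type returns (type, encoding); type may be None.
    media_type, _encoding = mimetypes.guess_type(file_name)
    return media_type or default
```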

Move common functions to ebooklib.utils

There is some common functionality that should really be in ebooklib.utils. Things like debug, parse, ....

Add copyright, author info and setup.py

Add license info, author info and setup.py file.

Update README file with code samples

Write basic code samples for read/write operations.

Make it work on Python 2.7 / Python 3.3

We will need to change a couple of API calls. This will make the library work on Python 2.7 at minimum.

Head and body elements missing in some cases

If the original document has an empty body with no children, the body and head elements will be missing from the generated content.

Parse EPUB2 guide

The EPUB2 guide element of the OPF file is not parsed when an EPUB file is loaded.

Faulty navigation points in the NCX

Navigation points in the NCX documents that correspond to book sections have an empty URL assigned for content. This is reported as an error by the epubcheck program.

Creating EPUB files does not work in Python 3.3

The problems are string handling in the lxml parse function and the dictionary iteritems method, which does not exist in Python 3.

Navigation xhtml file should behave like other xhtml files

The navigation XHTML file should extend the standard XHTML chapter class. We should also be able to use nav.add_item() to add CSS style definitions. For now, this is just hard-coded.

Add additional item types for audio and video files

Add different item types like ITEM_AUDIO and ITEM_VIDEO.

Check type of Item in epub

We should be able to check the type of items in an EPUB file and return some kind of ID for each kind of item (image, HTML, CSS, ...).

Have EpubItem for remote resources

EPUB3 supports the remote-resources property for video and audio elements, meaning they can be stored somewhere remotely. This applies only to audio and video tags. These items must still be placed in the list of resources. Our EpubItem should be aware of this and not create a local file in the EPUB3 in this case.

Implement document type

It would be handy to also have a Document type. We should be able to know "this is an HTML document", but we should also know whether it is cover.xhtml, nav.xhtml or just another chapter.

Increase version to 0.15

Load navigation document

When reading an EPUB file, if the NCX file is not present the TOC structure should be obtained by parsing the NAV document instead.
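A minimal sketch of reading the TOC from an EPUB3 navigation document with the standard library's ElementTree. It looks for the nav element whose epub:type contains toc and walks the nested ol/li lists; the helper names and the (title, href, children) tuple format are hypothetical, and entries that use a span instead of a link get href None.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def parse_nav_toc(nav_xhtml):
    """Return the TOC of a navigation document as (title, href, children)."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iterfind(".//x:nav", NS):
        if "toc" in nav.get(EPUB_TYPE, "").split():
            ol = nav.find("x:ol", NS)
            return _parse_ol(ol) if ol is not None else []
    return []


def _parse_ol(ol):
    entries = []
    for li in ol.findall("x:li", NS):
        # Section headings may use <span> instead of <a> and have no href.
        label = li.find("x:a", NS)
        if label is None:
            label = li.find("x:span", NS)
        title = "".join(label.itertext()).strip() if label is not None else ""
        href = label.get("href") if label is not None else None
        sub = li.find("x:ol", NS)
        entries.append((title, href, _parse_ol(sub) if sub is not None else []))
    return entries


NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body>
  <nav epub:type="toc">
    <ol>
      <li><a href="chap_01.xhtml">Introduction</a></li>
      <li><span>Simple book</span>
        <ol><li><a href="chap_01.xhtml#intro">Intro</a></li></ol>
      </li>
    </ol>
  </nav>
</body>
</html>"""

toc = parse_nav_toc(NAV)
```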

Plugin which cleans content with Tidy HTML

We need a standard plugin that uses Tidy to clean chapter content before it is saved in the EPUB.

standard tidy
https://github.com/w3c/tidy-html5

Implement API to add files which are not present in the manifest

There is a need to add files which are not present in the manifest (for instance - iTunesMetadata.plist, META-INF/com.apple.ibooks.display-options.xml).

Probably keep it as it is right now, but add an argument: .add_item(item, manifest=False).

Read linear flag from spine structure

Read and store linear value from spine structure.
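A minimal sketch of reading the flag with the standard library's ElementTree. Per the OPF specification, a missing linear attribute means "yes"; the helper read_spine and the sample OPF are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}


def read_spine(opf_xml):
    """Return [(idref, linear), ...] from the OPF spine; linear is a bool."""
    root = ET.fromstring(opf_xml)
    spine = []
    for itemref in root.iterfind("opf:spine/opf:itemref", OPF):
        # A missing linear attribute defaults to "yes".
        linear = itemref.get("linear", "yes") != "no"
        spine.append((itemref.get("idref"), linear))
    return spine


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <spine toc="ncx">
    <itemref idref="nav"/>
    <itemref idref="cover" linear="no"/>
    <itemref idref="chapter_1" linear="yes"/>
  </spine>
</package>"""

spine = read_spine(SAMPLE_OPF)
```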

Do not create new title tag in chapter if it already exists

When creating chapter content we do two things: we copy the old tags from the original document, and we also add a new title tag. We should not add a new title tag if one already exists. Also, we should not overwrite an existing title with an empty one when no title is defined.
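A minimal sketch of that rule with the standard library's ElementTree. The helper ensure_title is hypothetical, and updating an existing title when a non-empty one is given is an assumption about the desired behaviour.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TITLE_TAG = "{%s}title" % XHTML


def ensure_title(head, title):
    """Add or update <title> in head without duplicating or blanking it.

    Hypothetical helper illustrating the rule described in the issue.
    """
    existing = head.find(TITLE_TAG)
    if existing is None:
        # No title yet: create exactly one.
        existing = ET.SubElement(head, TITLE_TAG)
        existing.text = title or ""
    elif title:
        # Only overwrite an existing title when a non-empty one is defined.
        existing.text = title
    return existing
```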

Add sample files

Add some basic sample files. Something to show how to use EbookLib library.

Use six package for Python2/Python3 compatibility

Just use the six package to make it work better on both Python 2 and Python 3.