
epubreader's Introduction


.NET library for reading EPUB files.


Supported EPUB standards:

  • EPUB 2 (2.0, 2.0.1)
  • EPUB 3 (3.0, 3.0.1, 3.1, 3.2, 3.3)

Supported runtimes:

  • .NET Standard >= 1.3 (includes .NET Core >= 1.0 and .NET >= 5)
  • .NET Framework >= 4.6

Download | Documentation | WPF & .NET 7 console demo apps

EpubReader in a nutshell


Demo apps

  • Download WPF demo app (WpfDemo.zip)

    This .NET Framework application demonstrates how to open EPUB books and extract their content using the library.

    The HTML renderer used in this demo app may have difficulty rendering the content of some books if their HTML structure is too complicated.

  • Download .NET 7 console demo app (ConsoleDemo.zip)

    This .NET 7 console application demonstrates how to open EPUB books and retrieve their text content.

Examples

  1. How to extract the table of contents.
  2. How to extract the plain text of the whole book.
  3. How to iterate over all EPUB files in a directory and gather some stats.

Download the latest stable release

epubreader's People

Contributors

fl3pp, frankdrebin893, janek91, krauser123, lellid, rudacs, vers-one


epubreader's Issues

Incorrect EPUB manifest exception

When loading an EPUB produced by Calibre 3.3 (converted from AZW3), I get the following exception:

Unhandled exception: System.AggregateException: One or more errors occurred. ---> System.Exception: Incorrect EPUB manifest: item with href = "My%20converted%20epub%20from%20Calibre_split_001.html" is missing.
   in VersFx.Formats.Text.Epub.Readers.ChapterReader.GetChapters(EpubBookRef bookRef, List`1 navigationPoints)
   in VersFx.Formats.Text.Epub.Readers.ChapterReader.GetChapters(EpubBookRef bookRef)
   in VersFx.Formats.Text.Epub.EpubBookRef.<GetChaptersAsync>b__33_0()
   in System.Threading.Tasks.Task`1.InnerInvoke()
   in System.Threading.Tasks.Task.Execute()
   in VersFx.Formats.Text.Epub.EpubReader.ReadBook(String filePath)

This happens because the href is URL-encoded (note the "%20" sequences), while the keys in Content.Html are decoded file names.

I simply added the following line after the "if" statement in VersFx.Formats.Text.Epub\Readers\ChapterReader.cs:
contentFileName = Uri.UnescapeDataString(contentFileName);

This solved my issue!
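The fix works because Uri.UnescapeDataString turns percent-encoded sequences such as "%20" back into the original characters. A minimal standalone sketch of the same idea, assuming content files are kept in a dictionary keyed by decoded file names (HrefLookup and FindByHref are hypothetical names, not library API):

```csharp
using System;
using System.Collections.Generic;

// HrefLookup is a hypothetical helper for illustration, not part of VersOne.Epub.
public static class HrefLookup
{
    // Finds a content file by an href taken from the manifest or the NCX.
    // The raw href is tried first (a file name may legitimately contain '%'),
    // then its percent-decoded form, e.g. "My%20book.html" -> "My book.html".
    public static string? FindByHref(IReadOnlyDictionary<string, string> contentByFileName, string href)
    {
        if (contentByFileName.TryGetValue(href, out string? content))
        {
            return content;
        }

        string decodedHref = Uri.UnescapeDataString(href);
        return contentByFileName.TryGetValue(decodedHref, out content) ? content : null;
    }
}
```

Trying the raw href first keeps lookups working for file names that really contain a "%" character; the decoded form is used only when that lookup fails.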

Book fails to open via OpenBookAsync (Regression)

Description

I have a book that fails to open after updating this library from v3.1.2 to v3.2.

My Code:

// BookReaderOptions is declared elsewhere as EpubReaderOptions
BookReaderOptions = new()
{
    PackageReaderOptions = new PackageReaderOptions()
    {
        IgnoreMissingToc = true
    }
};
using var epubBook = EpubReader.OpenBook(filePath, BookReaderOptions);

Sample EPUB file

<?xml version="1.0" encoding="utf-8"?>
<package unique-identifier="fanficfare-uid" version="2.0" xmlns="http://www.idpf.org/2007/opf">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:identifier id="fanficfare-uid">fanficfare-uid:www.royalroad.com-u87585-s26294</dc:identifier>
        <dc:title id="id">He Who Fights With Monsters</dc:title>
        <dc:creator opf:role="aut">Shirtaloon (Travis Deverell)</dc:creator>
        <dc:contributor id="id-2">FanFicFare [https://github.com/JimmXinu/FanFicFare]</dc:contributor>
        <dc:language>en</dc:language>
        <dc:date opf:event="publication">2019-07-28</dc:date>
        <dc:date opf:event="creation">2022-09-17</dc:date>
        <dc:date opf:event="modification">2022-09-17</dc:date>
        <meta content="2022-09-17T00:00:30" name="calibre:timestamp" />
        <dc:description>Some description here</dc:description>
        <dc:subject>High Fantasy</dc:subject>
        <dc:subject>Last Update: 2022/09/17</dc:subject>
        <dc:subject>LitRPG</dc:subject>
        <dc:subject>Adventure</dc:subject>
        <dc:publisher>www.royalroad.com</dc:publisher>
        <dc:identifier opf:scheme="URL">https://www.royalroad.com/fiction/26294</dc:identifier>
        <dc:source>https://www.royalroad.com/fiction/26294</dc:source>
    </metadata>
    <manifest>
        <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
        <item href="OEBPS/stylesheet.css" id="style" media-type="text/css" />
        <item href="OEBPS/title_page.xhtml" id="title_page" media-type="application/xhtml+xml" />
        <item href="OEBPS/file0001.xhtml" id="file0001" media-type="application/xhtml+xml" />
        ...
        <item href="OEBPS/file0153.xhtml" id="file0153" media-type="application/xhtml+xml" />
    </manifest>
    <spine toc="ncx">
        <itemref idref="title_page" linear="yes" />
        <itemref idref="file0001" linear="yes" />
        ...
        <itemref idref="file0153" linear="yes" />
    </spine>
</package>

I have validated that all items exist and that the table of contents looks fine.

Stack Trace:

[18:47:11 WRN] [BookService] There was an exception when opening epub book: E:\Books\Wont Open\He_Who_Fights_With_Monsters_-_Shirtaloon.epub
System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at VersOne.Epub.Internal.BookCoverReader.ReadEpub2CoverFromGuide(EpubSchema epubSchema, Dictionary`2 imageContentRefs)
   at VersOne.Epub.Internal.BookCoverReader.ReadEpub2Cover(EpubSchema epubSchema, Dictionary`2 imageContentRefs)
   at VersOne.Epub.Internal.BookCoverReader.ReadBookCover(EpubSchema epubSchema, Dictionary`2 imageContentRefs)
   at VersOne.Epub.Internal.ContentReader.ParseContentMap(EpubBookRef bookRef, ContentReaderOptions contentReaderOptions)
   at VersOne.Epub.EpubReader.<>c__DisplayClass10_0.<OpenBookAsync>b__1()
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at VersOne.Epub.EpubReader.OpenBookAsync(IZipFile zipFile, String filePath, EpubReaderOptions epubReaderOptions)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at VersOne.Epub.EpubReader.OpenBook(String filePath, EpubReaderOptions epubReaderOptions)

Decouple EpubContentFileRef from EpubBookRef

Description

Currently, EpubContentFileRef class (and the classes derived from it) require an instance of the EpubBookRef to be passed to its constructor as an argument. At the same time, EpubBookRef instance contains the Content property which in turn contains collections of EpubContentFileRef instances thus creating a circular dependency.

This approach was chosen a few years ago because a book (EpubBookRef) needs to contain references to its content files (EpubContentFileRef) while a content file needs access to the physical EPUB file, which is only available through the EpubBookRef.EpubFile property. It also needs the content directory path, which is available through the EpubBookRef.Schema.ContentDirectoryPath property.

In order to create an instance of the EpubContentFileRef class, the caller has to pass a partially initialized EpubBookRef instance to the constructor of the EpubContentFileRef class and then use the collection of EpubContentFileRef items to complete the construction of the EpubBookRef class. However, with the addition of nullable reference type annotations (#65) partially initialized instances are no longer possible.

Proposed solution

  1. Replace EpubBookRef argument in the constructor of the EpubContentFileRef class with IZipFile epubFile and string contentDirectoryPath arguments.
  2. Add bool IsDisposed property to the IZipFile interface and implement it in the ZipFile class.
  3. Check the epubFile.IsDisposed property in the ReadXX / GetContentStream methods and throw ObjectDisposedException if the file was already disposed.
  4. Add a clear statement in the documentation that the ownership of the IZipFile epubFile still belongs to the EpubBookRef class and destroying an instance of the EpubContentFileRef class doesn't dispose the file.
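Steps 1 to 4 could look roughly like the sketch below. It uses simplified stand-in types (IArchive in place of IZipFile, ContentFileRef in place of EpubContentFileRef, plus an in-memory archive for testing); their members are assumptions for illustration, not the library's actual API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IZipFile with only the members this sketch needs.
public interface IArchive : IDisposable
{
    bool IsDisposed { get; }                 // step 2 of the proposal
    string ReadAllText(string entryPath);
}

// Hypothetical in-memory archive used to exercise the sketch.
public sealed class InMemoryArchive : IArchive
{
    private readonly Dictionary<string, string> entries;

    public InMemoryArchive(Dictionary<string, string> entries) => this.entries = entries;

    public bool IsDisposed { get; private set; }

    public string ReadAllText(string entryPath)
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(InMemoryArchive));
        return entries[entryPath];
    }

    public void Dispose() => IsDisposed = true;
}

// Step 1: the content file depends only on the archive and the content directory,
// not on the book object. Step 4: it does not own the archive and never disposes it.
public sealed class ContentFileRef
{
    private readonly IArchive epubFile;
    private readonly string contentDirectoryPath;

    public ContentFileRef(IArchive epubFile, string contentDirectoryPath, string key)
    {
        this.epubFile = epubFile ?? throw new ArgumentNullException(nameof(epubFile));
        this.contentDirectoryPath = contentDirectoryPath ?? throw new ArgumentNullException(nameof(contentDirectoryPath));
        Key = key ?? throw new ArgumentNullException(nameof(key));
    }

    public string Key { get; }

    public string ReadContentAsText()
    {
        // Step 3: fail fast once the owner has disposed the EPUB file.
        if (epubFile.IsDisposed) throw new ObjectDisposedException("epubFile", "The EPUB file has already been disposed.");
        string entryPath = contentDirectoryPath.Length == 0 ? Key : contentDirectoryPath + "/" + Key;
        return epubFile.ReadAllText(entryPath);
    }
}
```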

Metadata Items not reflecting property types: title-type and display-seq

epubBook.Schema.Package.Metadata.MetaItems does not seem to include the "title-type" or "display-seq" properties, which limits the ability to use dc:title tags for grouping books into collections/reading lists.

See the following:

<dc:title id="t3">The New French Cuisine Masters</dc:title>
<meta refines="#t3" property="title-type">collection</meta>
<meta refines="#t3" property="display-seq">3</meta>

https://www.w3.org/publishing/epub3/epub-packages.html#sec-title-type

When I run:
epubBook.Schema.Package.Metadata.MetaItems.Select(item => item.Property)

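For reference, resolving such refinements only needs the refines and property attributes of each EPUB 3 meta element. A minimal sketch with System.Xml.Linq (RefinesReader is a hypothetical helper, not library API); it assumes refines values of the form "#id" and keeps only the last value when a property is repeated for the same ID:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RefinesReader
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // Collects <meta refines="#id" property="...">value</meta> elements into
    // refined element ID -> (property -> value), e.g. "t3" -> { "title-type": "collection" }.
    public static Dictionary<string, Dictionary<string, string>> ReadRefinements(XElement metadata)
    {
        var result = new Dictionary<string, Dictionary<string, string>>(StringComparer.Ordinal);
        foreach (XElement meta in metadata.Elements(Opf + "meta"))
        {
            string? refines = (string?)meta.Attribute("refines");
            string? property = (string?)meta.Attribute("property");
            if (refines is null || property is null || !refines.StartsWith("#", StringComparison.Ordinal))
            {
                continue; // EPUB 2 <meta name=... content=...> items have no refines attribute
            }

            string id = refines.Substring(1);
            if (!result.TryGetValue(id, out var properties))
            {
                properties = new Dictionary<string, string>(StringComparer.Ordinal);
                result[id] = properties;
            }

            properties[property] = meta.Value.Trim(); // last value wins for repeated properties
        }

        return result;
    }
}
```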

Internal.ContentReader.ParseContentMap(EpubBookRef bookRef, ContentReaderOptions contentReaderOptions)

Description

I have used DAISY Pipeline 2 to convert DAISY 2.02 and DAISY 3 books to EPUB. When I read the converted books with this library, it crashes (even with all ignore and suppress options enabled) and reports the following exceptions:

System.AggregateException: 'One or more errors occurred. (Object reference not set to an instance of an object.)'

Inner Exception
NullReferenceException: Object reference not set to an instance of an object.

Sample EPUB file

Absolute_Risk_Epub.zip

EPUB specification link

Additional context

The exceptions that my software reports don't provide much information. To find out what is causing the library to crash, someone would have to debug the library's source code with this EPUB file. I couldn't build the source code project myself because I can't find the .NET Framework 4.6 installer for my computer.

Book cover extracting enhancements

Description

The EPUB 2 specification doesn't contain explicit requirements on how a book cover should be represented in the OPF schema file. Instead, it provides only a vague recommendation to use a <guide>/<reference type="cover"> element, citing the Chicago Manual of Style as the source of the list of applicable <reference> element types.

Most EPUB 2 books use the <meta name="cover" content="..." /> element to define the cover, where the value of the content attribute points to the <manifest>/<item> element of the actual cover image. However, some books don't follow this pattern, hence all the hacks and heuristics currently present in the BookCoverReader:

// For non-standard ebooks, we try several other ways...
if (null != coverManifestItem) // we have found the item but there was no corresponding image ...
{
    // some ebooks seem to contain more than one item with Id="cover"
    // thus we test if there is a second item, and whether that is an image....
    coverManifestItem = epubSchema.Package.Manifest.Where(manifestItem => manifestItem.Id.CompareOrdinalIgnoreCase(coverMetaItem.Content)).Skip(1).FirstOrDefault();
    if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
    {
        return coverImageContentFileRef;
    }
}
// we have still not found the item
// 2019-08-20 Hotfix: if coverManifestItem is not found by its Id, then try it with its Href - some ebooks refer to the image directly!
coverManifestItem = epubSchema.Package.Manifest.FirstOrDefault(manifestItem => manifestItem.Href.CompareOrdinalIgnoreCase(coverMetaItem.Content));
if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}
// 2019-08-24 if it is still not found, then try to find an Id named cover
coverManifestItem = epubSchema.Package.Manifest.FirstOrDefault(manifestItem => manifestItem.Id.CompareOrdinalIgnoreCase(coverMetaItem.Name));
if (null != coverManifestItem?.Href && imageContentRefs.TryGetValue(coverManifestItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}
// 2019-08-24 if it is still not found, then try to find it in the guide
var guideItem = epubSchema.Package.Guide.FirstOrDefault(reference => reference.Title.CompareOrdinalIgnoreCase(coverMetaItem.Name));
if (null != guideItem?.Href && imageContentRefs.TryGetValue(guideItem.Href, out coverImageContentFileRef))
{
    return coverImageContentFileRef;
}

EPUB 3 on the other hand does define an explicit requirement for cover images by requesting to specify them via <manifest>/<item properties="cover-image"> elements. EpubReader parses these <manifest>/<item> elements along with their properties attributes correctly but does not currently use this information to obtain a cover image of an EPUB 3 book.

Proposed solution

  1. Remove all hacks from BookCoverReader.
  2. Replace heuristics with more robust algorithms to search for cover images in EPUB 2 books.
  3. Add EPUB 3 cover image support with a fallback to the EPUB 2 cover image extraction implementation if an EPUB 3 cover is not available (to support EPUB 3 books that provide only EPUB 2 covers).
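The EPUB 3 part of step 3 boils down to checking the space-separated properties attribute of each manifest item for the cover-image token. A sketch with a hypothetical, simplified ManifestItem record (not the library's EpubManifestItem); the EPUB 2 strategy is passed in as a delegate because that is the part steps 1 and 2 would redesign:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified manifest item.
public sealed record ManifestItem(string Id, string Href, string MediaType, string? Properties);

public static class CoverLocator
{
    private static readonly char[] Whitespace = { ' ', '\t', '\n', '\r' };

    // EPUB 3: the cover is the manifest item whose "properties" attribute contains the
    // "cover-image" token. If there is none, defer to the caller's EPUB 2 strategy.
    public static ManifestItem? FindCover(
        IReadOnlyList<ManifestItem> manifest,
        Func<IReadOnlyList<ManifestItem>, ManifestItem?> epub2Fallback)
    {
        ManifestItem? epub3Cover = manifest.FirstOrDefault(item =>
            item.Properties != null &&
            item.Properties.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
                .Contains("cover-image", StringComparer.Ordinal));

        return epub3Cover ?? epub2Fallback(manifest);
    }
}
```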

Incorrect EPUB spine: TOC is missing

I am getting this exception, which seems to be related to a bad spine. After looking at your source code, I see that this exception is thrown when STRICTEPUB is set.

Is there a way to change this setting, or do I need to compile the library myself?

Manifest and spine for reference (note that the <spine> element has no toc attribute):

<manifest>
    <item href="page-template.xpgt" id="pt" media-type="application/vnd.adobe.page-template+xml"/>
    <item href="stei_9780140177381_oeb_css_r1.css" id="style" media-type="text/css"/>
    <item href="stei_9780140177381_msr_cvi_r1.jpg" id="coverimagestandard" media-type="image/jpeg"/>
    <item href="stei_9780140177381_msr_cvt_r1.jpg" id="thumbimagestandard" media-type="image/jpeg"/>
    <item href="stei_9780140177381_msr_ppl_r1.jpg" id="PPCthumbnailimage" media-type="image/jpeg"/>
    <item href="stei_9780140177381_oeb_cover_r1.html" id="cover" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_toc_r1.html" id="toc" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    <item href="stei_9780140177381_oeb_fm1_r1.html" id="fm1" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_fm2_r1.html" id="fm2" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_tp_r1.html" id="tp" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_ded_r1.html" id="ded" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_fm3_r1.html" id="fm3" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c01_r1.html" id="c01" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c02_r1.html" id="c02" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c03_r1.html" id="c03" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c04_r1.html" id="c04" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c05_r1.html" id="c05" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c06_r1.html" id="c06" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c07_r1.html" id="c07" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c08_r1.html" id="c08" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c09_r1.html" id="c09" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c10_r1.html" id="c10" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c11_r1.html" id="c11" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c12_r1.html" id="c12" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c13_r1.html" id="c13" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c14_r1.html" id="c14" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c15_r1.html" id="c15" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c16_r1.html" id="c16" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c17_r1.html" id="c17" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c18_r1.html" id="c18" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c19_r1.html" id="c19" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c20_r1.html" id="c20" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c21_r1.html" id="c21" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c22_r1.html" id="c22" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c23_r1.html" id="c23" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c24_r1.html" id="c24" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c25_r1.html" id="c25" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c26_r1.html" id="c26" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c27_r1.html" id="c27" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c28_r1.html" id="c28" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c29_r1.html" id="c29" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c30_r1.html" id="c30" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c31_r1.html" id="c31" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_c32_r1.html" id="c32" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_bm1_r1.html" id="bm1" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_ftn_r1.html" id="ftn" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_cop_r1.html" id="cop" media-type="application/xhtml+xml"/>
    <item href="stei_9780140177381_oeb_001_r1.jpg" id="stei_9780140177381_oeb_001_r1" media-type="image/jpeg"/>
    <item href="stei_9780140177381_oeb_002_r1.jpg" id="stei_9780140177381_oeb_002_r1" media-type="image/jpeg"/>
    <item href="stei_9780140177381_oeb_003_r1.jpg" id="stei_9780140177381_oeb_003_r1" media-type="image/jpeg"/>
    <item href="stei_9780140177381_oeb_004_r1.jpg" id="stei_9780140177381_oeb_004_r1" media-type="image/jpeg"/>
</manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    <itemref idref="fm1"/>
    <itemref idref="fm2"/>
    <itemref idref="tp"/>
    <itemref idref="cop"/>
    <itemref idref="ded"/>
    <itemref idref="fm3"/>
    <itemref idref="c01"/>
    <itemref idref="c02"/>
    <itemref idref="c03"/>
    <itemref idref="c04"/>
    <itemref idref="c05"/>
    <itemref idref="c06"/>
    <itemref idref="c07"/>
    <itemref idref="c08"/>
    <itemref idref="c09"/>
    <itemref idref="c10"/>
    <itemref idref="c11"/>
    <itemref idref="c12"/>
    <itemref idref="c13"/>
    <itemref idref="c14"/>
    <itemref idref="c15"/>
    <itemref idref="c16"/>
    <itemref idref="c17"/>
    <itemref idref="c18"/>
    <itemref idref="c19"/>
    <itemref idref="c20"/>
    <itemref idref="c21"/>
    <itemref idref="c22"/>
    <itemref idref="c23"/>
    <itemref idref="c24"/>
    <itemref idref="c25"/>
    <itemref idref="c26"/>
    <itemref idref="c27"/>
    <itemref idref="c28"/>
    <itemref idref="c29"/>
    <itemref idref="c30"/>
    <itemref idref="c31"/>
    <itemref idref="c32"/>
    <itemref idref="bm1"/>
    <itemref idref="ftn"/>
  </spine>
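One lenient alternative to throwing, sketched here as an assumption rather than the library's behavior: when <spine> has no toc attribute, fall back to the first manifest item whose media type is application/x-dtbncx+xml (toc.ncx in the manifest above). NcxLocator is a hypothetical helper using System.Xml.Linq:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NcxLocator
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";
    private const string NcxMediaType = "application/x-dtbncx+xml";

    // Returns the href of the NCX file, or null if it cannot be found.
    // With a toc attribute, the manifest item with that ID is used; without one,
    // the first manifest item with the NCX media type is used instead of failing.
    public static string? FindNcxHref(XElement package)
    {
        XElement? manifest = package.Element(Opf + "manifest");
        if (manifest is null)
        {
            return null;
        }

        var items = manifest.Elements(Opf + "item");
        string? tocId = (string?)package.Element(Opf + "spine")?.Attribute("toc");
        XElement? ncxItem = tocId is not null
            ? items.FirstOrDefault(item => (string?)item.Attribute("id") == tocId)
            : items.FirstOrDefault(item => (string?)item.Attribute("media-type") == NcxMediaType);

        return (string?)ncxItem?.Attribute("href");
    }
}
```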

Migrate integration tests from using Json.NET to System.Text.Json

Description

The test case data for all integration tests is stored in JSON files. The VersOne.Epub.Test project uses Json.NET (Newtonsoft.Json NuGet package) for serializing and deserializing those JSON files. It turns out there are a few limitations and inconveniences in Json.NET that affect its use in VersOne.Epub.Test:

  • PreserveReferencesHandling option which instructs the JSON serializer to store only a single copy of an object within the JSON file cannot be used with classes that don't have default constructors (i.e. parameterless constructors). Nullable reference type annotations in VersOne.Epub require almost every class to have a non-default constructor which in turn makes it impossible to use the built-in reference tracking in Json.NET.
  • VersOne.Epub.Test needs to have custom serialization rules for some properties (e.g. instead of writing the whole content of a content file into JSON, which would be very wasteful, it writes only the path to the content file within the EPUB archive). The way to handle such scenarios in Json.NET is very complicated and poorly documented.
  • Other minor inconveniences, such as automatically converting string values into DateTime during deserialization if the content of the string looks like a date/time.

The first issue essentially makes it impossible to use strongly-typed serialization and deserialization operations. The only workaround is to parse the JSON file into a generic JObject object and deserialize its content manually. However, such generic parsing and writing operations can be done with System.Text.Json which also performs them more efficiently.

Proposed solution

Replace Json.NET with System.Text.Json for integration tests.
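As an illustration of the second bullet above, a System.Text.Json custom converter can write a content file as just its path inside the EPUB archive. ContentFileSnapshot and ContentFilePathConverter are hypothetical stand-ins for the test project's own classes:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical test-side type: a content file whose text we don't want to store in JSON.
public sealed class ContentFileSnapshot
{
    public ContentFileSnapshot(string filePathInEpub, string content)
    {
        FilePathInEpub = filePathInEpub;
        Content = content;
    }

    public string FilePathInEpub { get; }
    public string Content { get; }
}

// Serializes a content file as a single JSON string (its path). On read, the content is
// left empty; a real test would load it from the EPUB archive using that path.
public sealed class ContentFilePathConverter : JsonConverter<ContentFileSnapshot>
{
    public override ContentFileSnapshot Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        string path = reader.GetString() ?? throw new JsonException("Expected the content file path as a JSON string.");
        return new ContentFileSnapshot(path, content: string.Empty);
    }

    public override void Write(Utf8JsonWriter writer, ContentFileSnapshot value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value.FilePathInEpub);
    }
}
```

The converter can be registered through JsonSerializerOptions.Converters, or applied to a single property with [JsonConverter(typeof(ContentFilePathConverter))].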

Additional context

Json.NET documentation: https://www.newtonsoft.com/json/help/
Json.NET Nuget package: https://www.nuget.org/packages/Newtonsoft.Json/
System.Text.Json documentation: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/overview
Migration guidelines: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/migrate-from-newtonsoft
Performance comparison: https://devblogs.microsoft.com/dotnet/whats-next-for-system-text-json/#performance

Invalid URI: The URI scheme is not valid.

This happens when I run WpfDemo: it cannot register fonts. The error is in BookHtmlContent.cs, line 148:

Uri packageUri = new Uri(fontFile.Key + ":");

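fontFile.Key is "fonts/00001.ttf", so the string passed to the Uri constructor is "fonts/00001.ttf:". Everything before the colon is parsed as the URI scheme, and a scheme may contain only letters, digits, "+", "-" and ".", so the "/" makes it invalid.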
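A possible workaround, assuming the demo only needs each font file to map to a unique, syntactically valid absolute URI: replace every character that is not allowed in a scheme name before appending the colon. FontUriHelper.ToSchemeName is a hypothetical helper:

```csharp
using System;
using System.Text;

public static class FontUriHelper
{
    // Turns an arbitrary font file key (e.g. "fonts/00001.ttf") into a valid URI scheme name:
    // an ASCII letter followed by ASCII letters, digits, '+', '-' or '.'.
    public static string ToSchemeName(string key)
    {
        var builder = new StringBuilder(key.Length + 5);
        foreach (char c in key)
        {
            bool allowed = IsAsciiLetter(c) || (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
            builder.Append(allowed ? c : '-');
        }

        // A scheme must start with a letter, so prefix keys like "00001.ttf".
        if (builder.Length == 0 || !IsAsciiLetter(builder[0]))
        {
            builder.Insert(0, "font-");
        }

        return builder.ToString();
    }

    private static bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

Different keys can collapse to the same scheme name (for example "fonts/a.ttf" and "fonts-a.ttf"), so a complete fix would also need to de-duplicate them.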

How to read epub in async mode?

I'm trying to read an EPUB file in UWP with the following code, but I get this exception:
System.InvalidOperationException: 'Synchronous operations should not be performed on the UI thread. Consider wrapping this method in Task.Run.'

public async void test()
{
    // Verbatim string: in a regular literal, "\t" would be parsed as a tab character.
    EpubBook epubBook = await EpubReader.ReadBookAsync(@"C:\test.epub");
}
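The exception message itself suggests a workaround: start the call on a thread-pool thread with Task.Run, so any synchronous I/O it performs does not happen on the UI thread. A minimal sketch, assuming that is where the synchronous work happens (OffUiThread is a hypothetical helper name):

```csharp
using System;
using System.Threading.Tasks;

public static class OffUiThread
{
    // Starts the operation on a thread-pool thread, so synchronous work it does before
    // its first real await (e.g. blocking file I/O) no longer runs on the UI thread.
    // Task.Run<T>(Func<Task<T>>) unwraps the inner task, so the caller gets a Task<T>.
    // Example: EpubBook book = await OffUiThread.RunAsync(() => EpubReader.ReadBookAsync(filePath));
    public static Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        if (operation is null)
        {
            throw new ArgumentNullException(nameof(operation));
        }

        return Task.Run<T>(operation);
    }
}
```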

Could not install package

Hello

I'm trying to install this package and I get this error:

Severity Code Description Project File Line Suppression State
Error Could not install package 'VersOne.Epub 2.0.1'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.6.1', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author. 0

My project targets .NET Framework 4.6.1.

Thanks for the help

NavigationItem.Link.ContentFileName is returning incorrectly

Description

The library is returning the wrong value for ContentFileName. For this EPUB, it should return "Text/chapter01.xhtml", but it returns "Text/../Text/chapter01.xhtml". I'm not sure where the extra relative path is coming from, given that it's not in the XML.


Code:

var navItems = await book.GetNavigationAsync();
foreach (var navigationItem in navItems)
{
    if (navigationItem.NestedItems.Count > 0)
    {
        var nestedChapters = new List<BookChapterItem>();

        foreach (var nestedChapter in navigationItem.NestedItems)
        {
            if (nestedChapter.Link == null) continue;

            // BUG: nestedChapter.Link.ContentFileName -> Is returning "/Text/../Text/chapter01.xhtml" when it should be "Text/chapter01.xhtml"
            var key = BookService.CleanContentKeys(nestedChapter.Link.ContentFileName);
            if (mappings.ContainsKey(key))
            {
                nestedChapters.Add(new BookChapterItem()
                {
                    Title = nestedChapter.Title,
                    Page = mappings[key],
                    Part = nestedChapter.Link.Anchor ?? string.Empty,
                    Children = new List<BookChapterItem>()
                });
            }
        }

        CreateToCChapter(navigationItem, nestedChapters, chaptersList, mappings);
    }
}

Toc.ncx:

<navPoint id="navPoint5">
  <navLabel>
    <text>Day 0: Backstory and the Bridal Wars</text>
  </navLabel>
  <content src="Text/chapter1.xhtml"/>
</navPoint>
<navPoint id="navPoint6">
  <navLabel>
    <text>Day 1, Morning: The Start of a Slow Life</text>
  </navLabel>
  <content src="Text/chapter2.xhtml"/>
</navPoint>

Manifest:

<manifest>
    <item id="cover" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="frontmatter1.xhtml" href="Text/frontmatter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="frontmatter2.xhtml" href="Text/frontmatter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="toc.xhtml" href="Text/toc.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="prologue.xhtml" href="Text/prologue.xhtml" media-type="application/xhtml+xml"/>
    <item id="prologue2.xhtml" href="Text/prologue2.xhtml" media-type="application/xhtml+xml"/>
    <item id="insert1.xhtml" href="Text/insert1.xhtml" media-type="application/xhtml+xml"/>
    <item id="prologue2_1.xhtml" href="Text/prologue2_1.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter1.xhtml" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
    ...

Sample EPUB file

The file is under copyright

Additional context

This is an EPUB 2 document. I have tested it on v3.1.1 and v3.1.0, and it doesn't work on either version.
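Until such paths are normalized by the library, a caller-side workaround could collapse "." and ".." segments before using ContentFileName as a lookup key. A sketch (ZipPathNormalizer is a hypothetical helper; it assumes "/" separators as used inside EPUB archives and does not decode percent-escapes):

```csharp
using System;
using System.Collections.Generic;

public static class ZipPathNormalizer
{
    // Collapses "." and ".." segments in a '/'-separated archive path, e.g.
    // "Text/../Text/chapter01.xhtml" -> "Text/chapter01.xhtml". Leading and
    // duplicate slashes are dropped; ".." above the archive root is ignored.
    public static string Normalize(string path)
    {
        var segments = new List<string>();
        foreach (string segment in path.Split('/'))
        {
            if (segment.Length == 0 || segment == ".")
            {
                continue;
            }

            if (segment == "..")
            {
                if (segments.Count > 0)
                {
                    segments.RemoveAt(segments.Count - 1);
                }

                continue;
            }

            segments.Add(segment);
        }

        return string.Join("/", segments);
    }
}
```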

Cover extracting issue for EPUB 2 books without a cover and a guide

Description

EpubReader throws a NullReferenceException while trying to extract a cover for an EPUB 2 book if the following conditions are met:

  1. EPUB book doesn't have a metadata/meta element defining a cover.
  2. guide section is not present in the OPF package.

Sample EPUB file

test.epub

EPUB specification link

  1. metadata section: https://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.2
  2. guide section is optional in the OPF package: https://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#AppendixA

Additional context

BookCoverReader class uses the following algorithm to extract covers from EPUB 2 books:

  1. Check if the metadata section has the following meta item: <meta name="cover" content="..." />. If there is an item with the name set to cover, then its content attribute will point to an item in the manifest section representing the cover image.
  2. If there is no such meta item, then look for <reference type="cover" href="..." /> element in the guide section. BookCoverReader throws a NullReferenceException here if the guide section is not present.

It is possible for an EPUB 2 book to have neither a cover image nor a guide section. This issue was not caught before because most EPUB 2 books have a cover image, and those that don't have one include a guide section even though it is not required by the EPUB 2 specification.
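The two-step algorithm described above, with the guide treated as optional, can be sketched with simplified, hypothetical record types (they are not the library's schema classes). A book without a cover then yields null instead of an exception:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified schema types.
public sealed record MetaItem(string Name, string Content);
public sealed record ManifestEntry(string Id, string Href);
public sealed record GuideReference(string Type, string Href);

public static class Epub2CoverFinder
{
    // Returns the href of the cover, or null when the book has no cover.
    public static string? FindCoverHref(
        IReadOnlyList<MetaItem> metaItems,
        IReadOnlyList<ManifestEntry> manifest,
        IReadOnlyList<GuideReference>? guide) // null when <guide> is absent
    {
        // Step 1: <meta name="cover" content="..."/> -> manifest item with that ID.
        MetaItem? coverMeta = metaItems.FirstOrDefault(meta =>
            string.Equals(meta.Name, "cover", StringComparison.OrdinalIgnoreCase));
        if (coverMeta != null)
        {
            ManifestEntry? item = manifest.FirstOrDefault(entry => entry.Id == coverMeta.Content);
            if (item != null)
            {
                return item.Href;
            }
        }

        // Step 2: <reference type="cover" href="..."/>; the null-conditional guards the missing guide.
        GuideReference? coverReference = guide?.FirstOrDefault(reference =>
            string.Equals(reference.Type, "cover", StringComparison.OrdinalIgnoreCase));
        return coverReference?.Href;
    }
}
```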

EPUB Media Overlays support

Description

The EPUB 3.2 specification lets EPUB books declare media overlays, which are essentially embedded audio narrations synchronized with the text content. Reading software can use this information to highlight the words in the text as a narrator speaks them.

This was first implemented by Apple as a non-standard extension for EPUB 2 books (under the name of Read Aloud). Later this feature was added to the EPUB 3 standard.

Only a small number of EPUB 3 books contain media overlays, and very few reader apps actually support them. Nevertheless, it might be useful to have such support in this library.

Proposed solution

  1. Create schema data types that reflect all XML elements in the Media Overlays specification.
  2. Parse all SMIL files found in the EPUB book into these data types.

Note that this enhancement only parses SMIL files but doesn't perform any post-processing for the parsed data. There is a subsequent enhancement #84 which will expose the parsed data as easy to consume narration objects on the EpubBook level.
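To illustrate step 2, here is a sketch that reads every <par> element of a SMIL file into a small record using System.Xml.Linq. SmilParallel and SmilReader are hypothetical names, and the record covers only the text and audio children, not the full Media Overlays schema:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical data type for one <par>: a text fragment synchronized with an audio clip.
public sealed record SmilParallel(string? Id, string TextSrc, string? AudioSrc, string? ClipBegin, string? ClipEnd);

public static class SmilReader
{
    private static readonly XNamespace Smil = "http://www.w3.org/ns/SMIL";

    // Collects every <par> in document order, including those nested in <seq> elements.
    // Clock values (clipBegin/clipEnd) are kept as raw strings; parsing them is a separate step.
    public static List<SmilParallel> ReadParallels(XDocument smilDocument)
    {
        return smilDocument.Descendants(Smil + "par")
            .Select(par =>
            {
                XElement? text = par.Element(Smil + "text");
                XElement? audio = par.Element(Smil + "audio");
                return new SmilParallel(
                    (string?)par.Attribute("id"),
                    (string?)text?.Attribute("src") ?? string.Empty,
                    (string?)audio?.Attribute("src"),
                    (string?)audio?.Attribute("clipBegin"),
                    (string?)audio?.Attribute("clipEnd"));
            })
            .ToList();
    }
}
```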

Additional context

EPUB Media Overlays 3.2 specification: https://www.w3.org/publishing/epub32/epub-mediaoverlays.html

Epub3 Creator roles aren't populating

Description

Creator roles are missing. The example below has two creators, but when they are inspected, EpubReader reports a null Role value for both, whereas the roles should be "aut" and "ill".

<dc:creator id="creator01">Ameko Kaeruda</dc:creator>
<meta property="alternate-script" refines="#creator01" xml:lang="ja">蛙田 アメコ</meta>
<meta property="file-as" refines="#creator01">Kaeruda, Ameko</meta>
<meta property="role" refines="#creator01" scheme="marc:relators">aut</meta>
<dc:creator id="creator02">Sencha</dc:creator>
<meta property="alternate-script" refines="#creator02" xml:lang="ja">せんちゃ</meta>
<meta property="file-as" refines="#creator02">Sencha</meta>
<meta property="role" refines="#creator02" scheme="marc:relators">ill</meta>

EPUB specification link

https://www.w3.org/publishing/epub3/epub-packages.html#sec-role
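For comparison, a role lookup that handles both conventions could look like the sketch below: EPUB 2 stores the role in an opf:role attribute on dc:creator, while EPUB 3 stores it in a <meta property="role" refines="#id"> element, as in the example above. CreatorRoleReader is a hypothetical helper using System.Xml.Linq:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CreatorRoleReader
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns (name, role) pairs for every dc:creator in an OPF <metadata> element.
    public static List<(string Name, string? Role)> ReadCreators(XElement metadata)
    {
        var creators = new List<(string Name, string? Role)>();
        foreach (XElement creator in metadata.Elements(Dc + "creator"))
        {
            // EPUB 2: <dc:creator opf:role="aut">
            string? role = (string?)creator.Attribute(Opf + "role");
            string? id = (string?)creator.Attribute("id");
            if (role is null && id is not null)
            {
                // EPUB 3: <meta property="role" refines="#creator-id">aut</meta>
                role = metadata.Elements(Opf + "meta")
                    .Where(meta => (string?)meta.Attribute("property") == "role"
                                && (string?)meta.Attribute("refines") == "#" + id)
                    .Select(meta => meta.Value.Trim())
                    .FirstOrDefault();
            }

            creators.Add((creator.Value.Trim(), role));
        }

        return creators;
    }
}
```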

EPUBs with wrong cover tag are not read

Description

It seems that something goes wrong when reading the cover tag of this particular EPUB (I cannot share it because it is copyrighted, but the publisher is Packt Publishing). In this case, I think the library should just log a warning and skip the cover image, e.g. treat the book as if it had no cover (similar to #75).

I still have not found the root cause for this particular EPUB (the cover file Images/default_cover.jpeg does exist), but it seems like a non-critical error that should not prevent the whole book from loading.

There was an exception when opening epub book: /books/MyBook.epub
VersOne.Epub.EpubPackageException: Incorrect EPUB manifest: item with ID = "Images/default_cover.jpeg" is missing.
   at VersOne.Epub.Utils.TaskExtensionMethods.ExecuteAndUnwrapAggregateException[T](Task`1 task)
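Judging only from the exception message (the file itself could not be shared), one plausible cause is that the book's `<meta name="cover" content="...">` holds a file path instead of a manifest item ID. A minimal sketch of the lenient behavior proposed above: try the ID, then tolerate a path, and otherwise warn and return null. `ManifestItem`, `CoverLookupSketch`, and the href fallback are hypothetical and not EpubReader's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified manifest item: just the ID and href.
public sealed class ManifestItem
{
    public ManifestItem(string id, string href) { Id = id; Href = href; }
    public string Id { get; }
    public string Href { get; }
}

public static class CoverLookupSketch
{
    // Resolves the content of <meta name="cover" content="..."> to a manifest item.
    // Returns null (after reporting a warning) instead of throwing when nothing matches.
    public static ManifestItem FindCoverItem(string coverMetaContent, IReadOnlyList<ManifestItem> manifest, Action<string> warn)
    {
        if (string.IsNullOrWhiteSpace(coverMetaContent))
        {
            return null; // no cover declared: treat the book as having no cover
        }
        // Expected case: the meta content is a manifest item ID.
        ManifestItem item = manifest.FirstOrDefault(i => i.Id == coverMetaContent);
        if (item == null)
        {
            // Tolerated case (assumption): the meta content is the cover's file path.
            item = manifest.FirstOrDefault(i => string.Equals(i.Href, coverMetaContent, StringComparison.OrdinalIgnoreCase));
        }
        if (item == null)
        {
            warn($"Cover manifest item \"{coverMetaContent}\" not found; ignoring the cover.");
        }
        return item;
    }
}
```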

Remove List<T> inheritance for 6 schema classes

Description

There are 6 schema classes derived from List<T>:

  1. Epub2NcxHead : List<Epub2NcxHeadMeta>
  2. Epub2NcxNavigationMap : List<Epub2NcxNavigationPoint>
  3. Epub2NcxPageList : List<Epub2NcxPageTarget>
  4. EpubGuide : List<EpubGuideReference>
  5. EpubManifest : List<EpubManifestItem>
  6. EpubSpine : List<EpubSpineItemRef>

The inheritance (rather than composition) was chosen to match the XML schema. For example, the <spine> section of the OPF package may look like this:

<spine toc="ncx">
    <itemref id="itemref-1" idref="item-1" />
    <itemref id="itemref-2" idref="item-2" />
</spine>

The EpubSpine class lets the consumer access the child nodes in an intuitive way: spine[0].Id. With composition, it would look like this: spine.Items[0].Id, which would not match the XML schema (since there is no <items> element in <spine>).

However, this also prevents the consumer from using object and collection initializers together. C# syntax allows using either an object initializer:

EpubSpine spine = new EpubSpine()
{
    Toc = "ncx"
};

or a collection initializer:

EpubSpine spine = new EpubSpine()
{
    new EpubSpineItemRef()
    {
        Id = "itemref-1",
        IdRef = "item-1"
    },
    new EpubSpineItemRef()
    {
        Id = "itemref-2",
        IdRef = "item-2"
    }
};

but not both.

Proposed solution

I think switching from inheritance to composition and adding intermediate Items property which doesn't exist in the XML schema is a reasonable price to pay to get the in-place initialization support for those classes.

Additional context

This is going to be a breaking change but hopefully a minor one since only a small set of consumers of this library use the raw schema classes and the fix is relatively simple (replacing spine[0].Id with spine.Items[0].Id).
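With composition, both styles fit in one expression: C# lets an object initializer fill a read-only collection property through a nested collection initializer (`Items = { ... }` calls `Items.Add` for each element). A minimal sketch, assuming `Items` is a get-only `List<EpubSpineItemRef>`; the classes below are simplified stand-ins, not the final library types.

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the composition-based schema classes.
public class EpubSpineItemRef
{
    public string Id { get; set; }
    public string IdRef { get; set; }
}

public class EpubSpine
{
    public string Toc { get; set; }

    // Created up front so that a nested collection initializer can add to it.
    public List<EpubSpineItemRef> Items { get; } = new List<EpubSpineItemRef>();
}

public static class SpineExample
{
    public static EpubSpine Create()
    {
        // Object initializer (Toc) and collection initializer (Items) in one expression.
        return new EpubSpine
        {
            Toc = "ncx",
            Items =
            {
                new EpubSpineItemRef { Id = "itemref-1", IdRef = "item-1" },
                new EpubSpineItemRef { Id = "itemref-2", IdRef = "item-2" }
            }
        };
    }
}
```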

How to read EPUB File from MemoryStream

We have developed an application based on this library, and now we want to read an EPUB file from a MemoryStream. Could you provide a sample that reads a book from memory?
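A common pitfall when building a `MemoryStream` by copying another stream is forgetting to rewind it, so the reader starts at the end and sees no data. A minimal sketch (the helper name is hypothetical) that copies any readable stream into memory and resets its position:

```csharp
using System.IO;

public static class MemoryStreamSketch
{
    // Copies any readable stream (file, network, embedded resource) into memory
    // and rewinds it so that a reader starts from the first byte.
    public static MemoryStream ToSeekableMemoryStream(Stream source)
    {
        var memoryStream = new MemoryStream();
        source.CopyTo(memoryStream);
        // CopyTo leaves Position at the end; without this a reader would see an empty stream.
        memoryStream.Position = 0;
        return memoryStream;
    }
}
```

The resulting stream can then be passed to the `Stream` overload used in other reports on this page, `EpubReader.ReadBook(stream)`. If the book is already in a `byte[]`, `new MemoryStream(bytes)` starts at position 0 and needs no rewind.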

Add nullable reference type annotations

Description

EpubReader currently lacks nullable reference type annotations, mostly because it needs to target .NET Framework which locks the C# compiler version to C# 7.3 while nullable reference types require at least C# 8.0. However, there is a way to specify an explicit C# compiler version in csproj file via <LangVersion>x.x</LangVersion> project property. This should work even for .NET Framework and .NET Standard 1.0, as long as the code doesn't use any runtime features of the newer C# compiler. The only downside of this approach is the lack of nullable annotation attributes which require the project using them to NOT have any targets other than .NET Core >= 3, .NET >= 5, or .NET Standard 2.1. This leads to two main consequences:

  1. EpubReader cannot have any utility methods for nullable reference type annotations. For example, instead of creating a custom assertion method:
    void Assert([DoesNotReturnIf(false)] bool condition, string? message = null)
    {
        if (!condition)
        {
            throw ...
        }
    }
    it will have to copy-paste this check everywhere such assertion is required.
  2. The .NET libraries themselves (on these older targets) don't have any annotations, so even after a String.IsNullOrEmpty(test) check, the C# compiler still treats test as potentially null. The only workaround is to add an explicit if (test != null) { ... } check.

However, these downsides seem like reasonable tradeoffs for having nullable reference type annotations in EpubReader and most importantly, they don't affect the consumers of the library in any negative way.

Proposed solution

Switch to C# 10.0 compiler and add nullable reference type annotations for VersOne.Epub assembly.

Additional context

Documentation: https://learn.microsoft.com/en-us/dotnet/csharp/nullable-references

Remote manifest items support

Description

EPUB 3 standard supports remote manifest items and metadata links (i.e. files referenced by absolute URLs like http://example.com/book/123/font.ttf as opposed to local files like Content/font.ttf which are packaged within the EPUB file). EpubReader doesn't support remote manifest items and treats all absolute URLs as file names within the EPUB file.

Most EPUB books don't contain references to remote resources.

Proposed solution

  1. Replace the implementation of the EpubContentFile / EpubContentFileRef classes and the classes derived from them with the following class hierarchy:

    EpubContentFile
    |-EpubLocalContentFile
    | |-EpubLocalTextContentFile
    | |-EpubLocalByteContentFile
    |-EpubRemoteContentFile
    | |-EpubRemoteTextContentFile
    | |-EpubRemoteByteContentFile
    
    EpubContentFileRef
    |-EpubLocalContentFileRef
    | |-EpubLocalTextContentFileRef
    | |-EpubLocalByteContentFileRef
    |-EpubRemoteContentFileRef
    | |-EpubRemoteTextContentFileRef
    | |-EpubRemoteByteContentFileRef
    
  2. Add ContentLocation property to the base classes with the following type: enum EpubContentLocation { LOCAL, REMOTE }.

  3. Add ContentFileType property to the base classes with the following type: enum EpubContentFileType { TEXT, BYTE_ARRAY }.

  4. Implement EpubContentCollection and EpubContentCollectionRef classes with two properties: Local and Remote which contain local and remote files / file references respectively. Use these classes for Html, Css, Images, Fonts, and AllFiles properties in the EpubContent / EpubContentRef classes.

  5. Implement content downloader for remote content files.

  6. Add ContentDownloaderOptions class to enable / disable downloading remote content and to let the application supply its own content downloader.

  7. Extract the code to load local content and download remote content out of the EpubContentFileRef class into two separate classes: EpubLocalContentLoader and EpubRemoteContentLoader. Pass the reference to the content loader through the constructor parameter in the EpubContentFileRef class.

  8. Even though EPUB specification restricts types of remote resources to just audio, video, and font files, it would be better to relax this restriction in the EpubReader to allow all types of files to be remote resources to make the overall design simpler for the consumer of the library. However, EpubReader should still check that all HTML files in the EPUB spine, as well as the cover image and the EPUB 2 NCX / EPUB 3 navigation documents are local resources since these files are essential for constructing the parsed EPUB schema of the book.
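The proposal hinges on deciding whether a manifest href is local or remote. A minimal sketch of that decision, assuming only absolute `http`/`https` URLs count as remote; `EpubContentLocation` is the enum from step 2, while `ContentLocationSketch.GetLocation` is a hypothetical helper.

```csharp
using System;

// Enum as proposed in step 2 above.
public enum EpubContentLocation { LOCAL, REMOTE }

public static class ContentLocationSketch
{
    // Absolute http(s) URLs are remote; everything else is treated as a path inside the EPUB archive.
    public static EpubContentLocation GetLocation(string href)
    {
        // Checking the scheme matters: on Unix, a rooted path like "/Content/font.ttf"
        // also parses as an absolute (file://) URI.
        if (Uri.TryCreate(href, UriKind.Absolute, out Uri uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return EpubContentLocation.REMOTE;
        }
        return EpubContentLocation.LOCAL;
    }
}
```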

Breaking changes

This solution introduces a breaking change: applications will have to replace book.Content.<property> (where <property> is one of the following properties: Html, Css, Images, Fonts, or AllFiles) with book.Content.<property>.Local (unless the application needs to handle remote items too).

Additionally, if application stores references to content files, then the following type replacement will be required:

  • EpubContentFile → EpubLocalContentFile
  • EpubTextContentFile → EpubLocalTextContentFile
  • EpubByteContentFile → EpubLocalByteContentFile
  • EpubContentFileRef → EpubLocalContentFileRef
  • EpubTextContentFileRef → EpubLocalTextContentFileRef
  • EpubByteContentFileRef → EpubLocalByteContentFileRef

Additional context

Support for XML 1.1 schema files

I am downloading an EPUB file from its URL into a stream and saving it to a local DB as bytes, like this:

Stream stream;
HttpClient client = new HttpClient();
var response = await client.GetAsync(fileUrl);
stream = await response.Content.ReadAsStreamAsync();
epubBook = EpubReader.ReadBook(stream);

//saving to folder
byte[] bytes = await response.Content.ReadAsByteArrayAsync();
string filename = Path.GetFileName(fileUrl);
var folderPath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
var filePath = Path.Combine(folderPath, filename);
File.WriteAllBytes(filePath, bytes);

This works fine for most files, but some of them throw System.AggregateException.

Exception Details

System.AggregateException: One or more errors occurred. (Version number '1.1' is invalid. Line 1, position 16.) ---> System.Xml.XmlException: Version number '1.1' is invalid. Line 1, position 16.
at System.Xml.XmlTextReaderImpl.Throw (System.Exception e) [0x00027] in <0757e7484a1349cca3b4558c721885b2>:0
at System.Xml.XmlTextReaderImpl.Throw (System.String res, System.String arg) [0x00029] in <0757e7484a1349cca3b4558c721885b2>:0
at System.Xml.XmlTextReaderImpl.ParseXmlDeclaration (System.Boolean isTextDecl) [0x0061f] in <0757e7484a1349cca3b4558c721885b2>:0
at System.Xml.XmlTextReaderImpl.Read () [0x000c6] in <0757e7484a1349cca3b4558c721885b2>:0
at System.Xml.Linq.XDocument.Load (System.Xml.XmlReader reader, System.Xml.Linq.LoadOptions options) [0x00016] in <89374192b20a41739cf7c5bb822846fe>:0
at System.Xml.Linq.XDocument.Load (System.IO.Stream stream, System.Xml.Linq.LoadOptions options) [0x0000f] in <89374192b20a41739cf7c5bb822846fe>:0
at System.Xml.Linq.XDocument.Load (System.IO.Stream stream) [0x00000] in <89374192b20a41739cf7c5bb822846fe>:0
at VersOne.Epub.Internal.XmlUtils+<>c__DisplayClass0_0.b__0 () [0x00000] in <7c46dbfe3ebf403389304a938822832e>:0
at System.Threading.Tasks.Task`1[TResult].InnerInvoke () [0x0000f] in <46c2fa109b574c7ea6739f9fe2350976>:0
at System.Threading.Tasks.Task.Execute () [0x00000] in <46c2fa109b574c7ea6739f9fe2350976>:0
--- End of stack trace from previous location where exception was thrown ---

Sample file URLs having this issue:

https://s3.us-east-1.amazonaws.com/catholic-brain/prod/dc/cbrain-app/files/doc-lib/2020/08/10/12/56/42/068/head/9781612781495_EPUB.epub

https://s3.us-east-1.amazonaws.com/catholic-brain/prod/dc/cbrain-app/files/doc-lib/2020/08/10/07/36/06/376/head/9781612781358_EPUB.epub

I am using EpubReader.Cross Nuget for parsing the epub file.

I have uploaded a sample project for easy reference.
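System.Xml only accepts XML version 1.0, which is why `XDocument.Load` rejects `version="1.1"`. One possible workaround, sketched here as an assumption rather than a fix in the library, is to rewrite the declaration to 1.0 before parsing. This is only safe when the document doesn't rely on XML 1.1-specific features (such as additional control characters or NEL line endings); `Xml11Sketch` is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class Xml11Sketch
{
    // Matches version="1.1" (or '1.1') in a leading XML declaration.
    private static readonly Regex Xml11Declaration = new Regex(
        @"^(<\?xml\s+version\s*=\s*[""'])1\.1([""'])");

    // Rewrites the declaration to 1.0 (first match only) and parses the result.
    // Assumption: the document content is also valid XML 1.0.
    public static XDocument ParseLenient(string xml)
    {
        string patched = Xml11Declaration.Replace(xml, "${1}1.0${2}", 1);
        return XDocument.Parse(patched);
    }
}
```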

EPUB 3 support improvements

This is an issue to keep track of the work on the improvements to support EPUB 3 features that are currently not supported.

  • Parsing EPUB 3 navigation document
  • Parsing EPUB 2 & 3 linear reading order (spine-based text content order)
  • General refactoring
  • Changes to the examples to work with the new version of the library
  • Documentation update

Issue parsing ePub files

I have attached 3 ePub files that fail to be parsed by ePubReader.

I found these files in the wild by Google-searching by file type, to build up an ePub test dataset to test ePubReader against.

I have other files that fail too, but for the same reasons as the attached ones (TOC error, etc.).

Good job so far.

Thanks.
childrens-literature.zip
GhV-oeb-page.zip

CF General.zip

Book cover image cannot be displayed.

@versfx Hello! I have a problem displaying a book's cover image. It looks like the EpubBookRef entity doesn't have the necessary MetaItem, as you can see in the picture below. This is strange, because the cover image is displayed when I use my Pocket Book reader. Are there other ways to display it?

metaitem

Get Plain Text of Chapter?

Hi,
I couldn't figure out how to email you, so I hope you'll forgive the question here. I've been trying to get the plain text of each chapter from EPUBs using your library. I've been "foreaching" through the chapters and then using chapter ID values (current and next) to find ranges of relevant elements for each chapter, but I keep getting stuck on the particulars of trimming HTML.

If you can think of a way to do this, or if there is already a function that achieves this, would you be willing to message me or respond here?

Happy to make a small contribution to your favorite charity or paypal account for your time.

Thanks,
Dave Gerding
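Since EPUB chapter files are XHTML, one way to get plain text for one content file at a time without a separate HTML library is to walk the XML tree and emit a line break around block-level elements. A minimal sketch under these assumptions: the chapter is well-formed XHTML without named HTML entities such as `&nbsp;` (which XML doesn't define), and the element list is an illustrative choice. `PlainTextSketch` is hypothetical and not part of EpubReader.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class PlainTextSketch
{
    // Elements that start a new line in the output; everything else is treated as inline.
    private static readonly HashSet<string> BlockElements = new HashSet<string>
    {
        "p", "div", "br", "li", "tr", "blockquote", "section",
        "h1", "h2", "h3", "h4", "h5", "h6"
    };

    public static string ToPlainText(string xhtml)
    {
        XDocument document = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);
        XElement body = document.Descendants().FirstOrDefault(e => e.Name.LocalName == "body");
        if (body == null)
        {
            return string.Empty;
        }
        var builder = new StringBuilder();
        AppendText(body, builder);
        // Collapse spaces inside each line and drop empty lines.
        IEnumerable<string> lines = builder.ToString()
            .Split('\n')
            .Select(line => Regex.Replace(line, @"\s+", " ").Trim())
            .Where(line => line.Length > 0);
        return string.Join("\n", lines);
    }

    private static void AppendText(XNode node, StringBuilder builder)
    {
        if (node is XText text)
        {
            // Line breaks inside source text are just whitespace in HTML.
            builder.Append(Regex.Replace(text.Value, @"\s+", " "));
        }
        else if (node is XElement element)
        {
            string name = element.Name.LocalName;
            if (name == "script" || name == "style")
            {
                return; // not visible text
            }
            bool isBlock = BlockElements.Contains(name);
            if (isBlock) builder.Append('\n');
            foreach (XNode child in element.Nodes())
            {
                AppendText(child, builder);
            }
            if (isBlock) builder.Append('\n');
        }
    }
}
```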

Error when EPUB has no cover

Description

When I try to read an EPUB that has no cover image and no cover tag in content.opf, an exception is thrown and the EPUB cannot be read. If the image exists and is declared in content.opf, everything works; the exception only occurs when content.opf has no cover tag.

Exception

This exception was originally thrown at this call stack:
VersOne.Epub.Internal.BookCoverReader.ReadEpub2CoverFromGuide(VersOne.Epub.EpubSchema, System.Collections.Generic.Dictionary<string, VersOne.Epub.EpubByteContentFileRef>)
VersOne.Epub.Internal.BookCoverReader.ReadEpub2Cover(VersOne.Epub.EpubSchema, System.Collections.Generic.Dictionary<string, VersOne.Epub.EpubByteContentFileRef>)
VersOne.Epub.Internal.BookCoverReader.ReadBookCover(VersOne.Epub.EpubSchema, System.Collections.Generic.Dictionary<string, VersOne.Epub.EpubByteContentFileRef>)
VersOne.Epub.Internal.ContentReader.ParseContentMap(VersOne.Epub.EpubBookRef, VersOne.Epub.Options.ContentReaderOptions)
VersOne.Epub.EpubReader.OpenBookAsync.AnonymousMethod__1()
System.Threading.Tasks.Task<TResult>.InnerInvoke() in Future.cs
System.Threading.Tasks.Task..cctor.AnonymousMethod__272_0(object) in Task.cs
System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, object) in ExecutionContext.cs
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() in ExceptionDispatchInfo.cs

EPUB schema metadata/date/event and metadata/identifier/scheme attributes are not being parsed

Description

opf:event and opf:scheme attributes in the following example are always skipped during the parsing:

<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:opf="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" ...>
  <metadata>
    <dc:date opf:event="...">...</dc:date>
    <dc:identifier ... opf:scheme="...">...</dc:identifier>
    ...

This is due to the fact that those attributes are not part of DC (Dublin Core Metadata Element Set) XML namespace. Instead, they are extra attributes added by the EPUB 2 standard which is why they appear in the opf XML namespace in the example above. EpubReader didn't account for this fact, so the values of those attributes were always null after parsing.
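In LINQ to XML, `element.Attribute("event")` only matches an attribute with no namespace, so an `opf:event` attribute is never found that way; it has to be requested as `opf + "event"`. A minimal sketch of the namespace-aware lookup; the helper name is hypothetical, and the fallback to an unprefixed attribute is an assumption for books that omit the prefix.

```csharp
using System.Xml.Linq;

public static class OpfAttributeSketch
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // EPUB 2 puts "event" and "scheme" in the OPF namespace (opf:event, opf:scheme),
    // so they must be looked up with the namespace-qualified name.
    public static string GetOpfAttribute(XElement element, string localName)
    {
        return (string)element.Attribute(Opf + localName)
            ?? (string)element.Attribute(localName); // fallback for books that omit the prefix
    }
}
```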

Sample EPUB file

test.epub

EPUB specification links

Add explicit .NET Standard 2.0 support

Description

Having .NET Standard 1.3 in the list of target frameworks:

<TargetFrameworks>net46;netstandard1.3</TargetFrameworks>
lets the library be used in projects targeting some older frameworks (e.g. .NET Core 1.0). However, this also leads to an excessive list of unnecessary package dependencies when the library is imported in a project targeting a newer framework (e.g. .NET 6):

Microsoft.NETCore.Platforms.1.1.0
Microsoft.NETCore.Targets.1.1.0
runtime.native.System.4.3.0
runtime.native.System.IO.Compression.4.3.0
System.Buffers.4.3.0
System.Collections.4.3.0
System.Diagnostics.Debug.4.3.0
System.Diagnostics.Tracing.4.3.0
System.Globalization.4.3.0
System.IO.4.3.0
System.IO.Compression.4.3.0
System.Reflection.4.3.0
System.Reflection.Primitives.4.3.0
System.Resources.ResourceManager.4.3.0
System.Runtime.4.3.0
System.Runtime.Extensions.4.3.0
System.Runtime.Handles.4.3.0
System.Runtime.InteropServices.4.3.0
System.Text.Encoding.4.3.0
System.Threading.4.3.0
System.Threading.Tasks.4.3.0

Proposed solution

Adding an explicit .NET Standard 2.0 support should prevent unnecessary package dependencies.

Additional context

From https://docs.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-1-3#which-net-standard-version-to-target:

If you need to support .NET Standard 1.x, we recommend that you also target .NET Standard 2.0. .NET Standard 1.x is distributed as a granular set of NuGet packages, which creates a large package dependency graph and results in developers downloading a lot of packages when building.

Xamarin forms: Issue with epubreader

I am using epubreader NuGet package for parsing .epub files.

My Code:

string fileName = "SampleEPUB.epub";
var assembly = typeof(MainPage).GetTypeInfo().Assembly;
Stream stream = assembly.GetManifestResourceStream($"{assembly.GetName().Name}.{fileName}");
EpubBook epubBook = EpubReader.ReadBook(stream);
foreach (EpubNavigationItem chapter in epubBook.Navigation)
{
	chapterDetails.Add(new ChapterDetails() { title = chapter.Title, htmlData = chapter.HtmlContentFile?.Content, subChapters = chapter.NestedItems });
}

When parsing the epub file like above, I am getting only one chapter. If I click the chapter there is no data.

When we open those EPUB files in Adobe Digital Editions 4.5.11, there are a lot of chapters and a lot of content. I need to parse all the chapters and the TOC in the EPUB file. Please help me find the cause of this issue.

I have added a sample project here with the .epub files for reference.

HTML Parser

Thanks for this library! Very useful in my project.

My question is: Should I build an HTML parser to display the chapter contents once I have parsed the .epub and have the HTML? The platform I am building for is not one with a built-in HTML/web parser. Any suggestions? Or is there a generally used HTML parsing library? Should this be built into the package?

long epub file names >50

Thanks for the library; it usually works pretty well. I have several e-book files that use identifiers longer than 50 characters, and for those the library returns null.

Am I doing something funky, that would make this break? It usually works.
Thanks,
Adam

public string ReadEpubFile(string sPnF)
{
    Epub book = null;

    try
    {
        if (File.Exists(sPnF))
        {
            book = new Epub(sPnF);
        }
    }
    catch (Exception e)
    {
        // Pass the original exception as InnerException so its stack trace is not lost.
        throw new Exception(" public string ReadEpubFile( " + sPnF + " ) :: " + e.Source + " :: " + e.Message, e);
    }

    if (book != null)
    {
        return book.GetContentAsPlainText();
    }
    else
    {
        return "";
    }
}

EPUB 2 NCX navigation list parsing issues

Description

There are two issues with parsing EPUB 2 NCX navigation lists:

  1. Any NCX file with a navigation list causes NullReferenceException to be thrown.
  2. navTarget elements inside navigation lists are always ignored.

Here's a minimal NCX file to reproduce both of those issues:

<?xml version='1.0' encoding='utf-8'?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <head />
  <docTitle />
  <navMap />
  <navList id="navlist-1">
    <navLabel>
      <text>Test label</text>
    </navLabel>
    <navTarget id="navtarget-1">
      <navLabel>
        <text>Test label</text>
      </navLabel>
    </navTarget>
  </navList>
</ncx>

Sample EPUB file

test.epub

EPUB specification links

  1. https://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.4.1
  2. https://daisy.org/activities/standards/daisy/daisy-3/z39-86-2005-r2012-specifications-for-the-digital-talking-book/#NCX
  3. http://www.daisy.org/z3986/2005/ncx-2005-1.dtd

Additional context

Those issues have not been caught earlier because navigation lists are very rare in EPUB books. (They are used for secondary tables of contents, e.g. a list of illustrations, a list of tables, etc.)
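A minimal sketch of null-safe navList parsing with LINQ to XML, written against the minimal NCX file above. The result types are hypothetical simplifications (a complete parser would also need attributes such as `playOrder`, `class`, and the `<content>` element); the point is reading the `navTarget` children and tolerating missing optional elements instead of throwing.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, simplified result types for <navList> and <navTarget>.
public sealed class NcxNavTarget
{
    public string Id { get; set; }
    public List<string> Labels { get; set; }
}

public sealed class NcxNavList
{
    public string Id { get; set; }
    public List<string> Labels { get; set; }
    public List<NcxNavTarget> Targets { get; set; }
}

public static class NcxNavListSketch
{
    private static readonly XNamespace Ncx = "http://www.daisy.org/z3986/2005/ncx/";

    public static List<NcxNavList> ReadNavLists(XDocument ncx)
    {
        return ncx.Root.Elements(Ncx + "navList")
            .Select(navList => new NcxNavList
            {
                Id = (string)navList.Attribute("id"),
                Labels = ReadLabels(navList),
                // Read the <navTarget> children instead of skipping them.
                Targets = navList.Elements(Ncx + "navTarget")
                    .Select(navTarget => new NcxNavTarget
                    {
                        Id = (string)navTarget.Attribute("id"),
                        Labels = ReadLabels(navTarget)
                    })
                    .ToList()
            })
            .ToList();
    }

    // Each <navLabel> carries its text in a <text> child; a missing <text>
    // yields an empty string rather than a NullReferenceException.
    private static List<string> ReadLabels(XElement parent)
    {
        return parent.Elements(Ncx + "navLabel")
            .Select(navLabel => (string)navLabel.Element(Ncx + "text") ?? string.Empty)
            .ToList();
    }
}
```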

cover image filename

How do we get the file name of the cover image? I noticed that in pages that reference it, you can find it among the images by its file name. EpubBook has the cover image bytes, but not its file name or MIME type.

Add missing EPUB 3 attributes for all schema types

Description

Most VersOne.Epub.Schema.* types in EpubReader were designed to conform to EPUB 2, before EPUB 3 specification was officially released. EPUB 3 extended some of the schema XML elements with new attributes in such a way that made it impossible to add them to EpubReader's schema types without breaking backwards compatibility.

For example, the <dc:title> XML element didn't have any attributes in EPUB 2, so the EpubBook.Schema.Package.Metadata.Titles property was typed as List<string>. However, EPUB 3 added id, dir, and xml:lang attributes, which requires replacing the List<string> with something like List<EpubMetadataTitle>, where EpubMetadataTitle will be a new class with all properties parsed from the EPUB 3 attributes. This will obviously be a breaking change for the consumers using this property. Another example is the <dc:language> element, which got an optional id attribute.

Proposed solution

  1. Go through all schema types and add all missing EPUB 3 attributes.
  2. Document all breaking changes and make a detailed instruction explaining what needs to be changed in the consumer code to migrate to the new schema types.
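For the <dc:title> example above, the new type could be filled along these lines. This is a minimal sketch: the property names of `EpubMetadataTitle` and the `TitleSketch` helper are assumptions, not the final API. Note that `xml:lang` lives in the predefined XML namespace, so it is read as `XNamespace.Xml + "lang"`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical shape of the EpubMetadataTitle class mentioned above.
public sealed class EpubMetadataTitle
{
    public string Title { get; set; }
    public string Id { get; set; }            // EPUB 3 id attribute
    public string TextDirection { get; set; } // EPUB 3 dir attribute: "ltr", "rtl" or null
    public string Language { get; set; }      // EPUB 3 xml:lang attribute
}

public static class TitleSketch
{
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    public static List<EpubMetadataTitle> ReadTitles(XElement metadata)
    {
        return metadata.Elements(Dc + "title")
            .Select(title => new EpubMetadataTitle
            {
                Title = title.Value,
                Id = (string)title.Attribute("id"),
                TextDirection = (string)title.Attribute("dir"),
                Language = (string)title.Attribute(XNamespace.Xml + "lang")
            })
            .ToList();
    }
}
```

Under this sketch, a consumer that previously read `Titles[0]` as a string would read `Titles[0].Title` instead; EPUB 2 titles simply leave the new properties null.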

Xamarin Support

Projects such as PCL projects in Xamarin cannot install the NuGet package; adding a reference to the DLL works better.
Please add support.

Load images and CSS files from EPUB archive even if they are missing in EPUB manifest

Description

  1. The file "fb.opf" cannot be found in EpubBook.Content.AllFiles;
  2. When I edit the fb.opf file and add the missing item to its manifest, EpubBook.Content.AllFiles changes accordingly;
  3. Sorry, I haven't read the EPUB spec in detail, but the WPF demo can't display any images from the last chapter because they are not included in the OPF file manifest. I don't think strict reliance on the manifest is appropriate.

Sample EPUB file

test-file.epub.zip

Some pictures in the last chapter of the file
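As a first step toward this, the images and CSS files that exist in the archive but are missing from the manifest can be detected with `System.IO.Compression.ZipArchive`. A minimal sketch under stated assumptions: manifest hrefs are relative to the OPF folder and contain no `../` segments, and the extension list is illustrative. `UnlistedFilesSketch` is hypothetical; how such files would then be exposed in `EpubBook.Content` is left open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class UnlistedFilesSketch
{
    // File types this issue is about (an assumption; extend as needed).
    private static readonly HashSet<string> Extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        ".css", ".jpg", ".jpeg", ".png", ".gif", ".svg"
    };

    // Returns image / CSS entries that exist in the ZIP but are not listed in the manifest.
    // opfDirectory: folder of the OPF file inside the archive, e.g. "OEBPS" ("" if at the root).
    public static List<string> FindUnlistedFiles(ZipArchive archive, string opfDirectory, IEnumerable<string> manifestHrefs)
    {
        string prefix = opfDirectory.Length == 0 ? "" : opfDirectory.TrimEnd('/') + "/";
        // Manifest hrefs are URLs, so decode escapes such as %20 before comparing with entry names.
        var listed = new HashSet<string>(
            manifestHrefs.Select(href => prefix + Uri.UnescapeDataString(href)),
            StringComparer.Ordinal);

        return archive.Entries
            .Select(entry => entry.FullName)
            .Where(path => !path.EndsWith("/")) // skip directory entries
            .Where(path => Extensions.Contains(Path.GetExtension(path)))
            .Where(path => !listed.Contains(path))
            .ToList();
    }
}
```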

.NET Core support

Hello,

Is it possible to use EpubReader in .NET Core applications?
Thanks for your efforts.

Br,
Sergey Sypalo | Blog at http:\sypalo.com

PlayOrder not being read

Hello @vers-one,
I think the newest version of the NuGet package doesn't contain the latest version of the assemblies.

When analyzing the assembly in your latest NuGet package (2.0.5), I can see that the assembly version is still 2.0.4, despite your adjustments in the project files:

[assembly: TargetFramework(".NETStandard,Version=v1.3", FrameworkDisplayName = "")]
[assembly: AssemblyCompany("vers")]
[assembly: AssemblyConfiguration("Release")]
[assembly: AssemblyCopyright("vers, 2015-2018")]
[assembly: AssemblyFileVersion("2.0.4.0")]
[assembly: AssemblyInformationalVersion("2.0.4")]
[assembly: AssemblyProduct("VersOne.Epub")]
[assembly: AssemblyTitle("VersOne.Epub")]
[assembly: AssemblyVersion("2.0.4.0")]

As a result, the changes from my pull request are not included:

    internal class Program
    {
        static void Main(string[] args)
        {
            var ePub = EpubReader.ReadBook(@"C:\Users\Jann Flepp\Downloads\Tom Christiansen - Perl Cookbook.epub");

            var points = GetNavigationPoints(ePub.Schema.Navigation.NavMap).ToArray();

            Console.WriteLine("Any playorder null: " + (points.Any(p => p.PlayOrder == null) ? "true" : "false"));
        }

        private static IEnumerable<EpubNavigationPoint> GetNavigationPoints(IEnumerable<EpubNavigationPoint> map)
        {
            foreach (var point in map)
            {
                yield return point;
                foreach (var subPoint in GetNavigationPoints(point.ChildNavigationPoints))
                {
                    yield return subPoint;
                }
            }
        }
    }

With NuGet Package

<PackageReference Include="VersOne.Epub" Version="2.0.5" />

Any playorder null: true

With Reference to master branch project

<ProjectReference Include="..\VersOne.Epub\VersOne.Epub.csproj" />

Any playorder null: false

Could you verify my assumptions?

Thanks for your help!

Font mime-type missing

Hi, Thanks for the great project. I have some epub books where the ttf fonts have a mime-type of "application/x-font-truetype". I notice this mime-type is not included in the /Readers/ContentReaders.cs file. As a result, these fonts are not included in the EpubContent.Fonts list - they are being classed as EpubContentType.OTHER.

The fonts also appear to have 'ccs/' prefixed to their path (as does the .css file) even though in the epub archive manifest there is no css folder (the .css file is in the 'Styles' folder). Could you explain why this is?
Many thanks,
Will
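A minimal sketch of a more tolerant font classification, assuming a case-insensitive lookup over both the standard font media types and legacy ones such as "application/x-font-truetype". The `ContentKind` enum and the exact list are illustrative assumptions, not the contents of EpubReader's ContentReader.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of content types, mirroring the Fonts / OTHER split described above.
public enum ContentKind { FONT, OTHER }

public static class FontMediaTypeSketch
{
    // Standard and legacy font media types seen in EPUB files (illustrative list).
    private static readonly HashSet<string> FontMediaTypes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "font/ttf", "font/otf", "font/woff", "font/woff2", "font/sfnt",
        "application/font-sfnt", "application/font-woff", "application/vnd.ms-opentype",
        "application/x-font-ttf", "application/x-font-truetype", "application/x-font-opentype"
    };

    public static ContentKind Classify(string mediaType)
    {
        // Media types are case-insensitive; also tolerate surrounding whitespace.
        return mediaType != null && FontMediaTypes.Contains(mediaType.Trim())
            ? ContentKind.FONT
            : ContentKind.OTHER;
    }
}
```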

Blank media type throws error (valid epub)

I have a valid epub, but the content.OPF file contains an item with a blank media type:

I have to be able to process files that pass EPUBCheck, and this one does. Below is the stack trace from the failure. Ideally, EpubReader would gracefully ignore the file. Unfortunately, I cannot provide the file because it contains copyrighted material, but I believe that adding a file with a blank media-type to any EPUB should reproduce the issue.

Thank you!

at VersOne.Epub.Internal.PackageReader.ReadManifest(XElement manifestNode)
at VersOne.Epub.Internal.PackageReader.d__0.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at VersOne.Epub.Internal.SchemaReader.<ReadSchemaAsync>d__0.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at VersOne.Epub.EpubReader.d__10.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at VersOne.Epub.EpubReader.d__9.MoveNext()
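A minimal sketch of the graceful behavior requested above: manifest items with a missing or blank `media-type` are skipped and reported through a warning callback instead of aborting the whole book. `ManifestSketch` and the tuple shape are hypothetical and not EpubReader's `PackageReader`.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ManifestSketch
{
    private static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // Reads (id, href, media-type) for each manifest item, skipping items with a blank
    // media-type and reporting them through the warning callback instead of throwing.
    public static List<(string Id, string Href, string MediaType)> ReadManifest(XElement manifest, Action<string> warn)
    {
        var items = new List<(string Id, string Href, string MediaType)>();
        foreach (XElement item in manifest.Elements(Opf + "item"))
        {
            string id = (string)item.Attribute("id");
            string href = (string)item.Attribute("href");
            string mediaType = (string)item.Attribute("media-type");
            if (string.IsNullOrWhiteSpace(mediaType))
            {
                warn($"Manifest item \"{id}\" ({href}) has no media type and was skipped.");
                continue;
            }
            items.Add((id, href, mediaType));
        }
        return items;
    }
}
```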
