p0n1 / epub_to_audiobook

EPUB to audiobook converter, optimized for Audiobookshelf

License: MIT License

Python 99.04% Dockerfile 0.96%
audiobooks audiobookshelf epub tts chatgpt openai

epub_to_audiobook's People

Contributors

bryksin, haydonryan, ignorantsapient, jczinger, p0n1, phuchoang2603, xtmu

epub_to_audiobook's Issues

[Bug] File overwrite every iteration

Hi, I suspect I have spotted a bug. It does not affect functionality, as the final file is eventually created successfully, but the file is overwritten on every iteration.

I believe this write operation should be moved outside of the for loop, so that a single write happens after the loop once all audio segments are collected, instead of overwriting the file over and over again with one more audio segment each time.
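
A minimal sketch of the suggested change, assuming each audio segment is available as a `bytes` object. The function name `write_segments_once` and its parameters are hypothetical and do not come from the project's code.

```python
from pathlib import Path
from typing import Iterable


def write_segments_once(segments: Iterable[bytes], output_file: Path) -> int:
    """Collect every audio segment first, then write the file a single time."""
    collected = []
    for segment in segments:
        # Only accumulate inside the loop; no file I/O happens here.
        collected.append(segment)
    data = b"".join(collected)
    # Single write after the loop, instead of rewriting the file per segment.
    output_file.write_bytes(data)
    return len(data)
```

The result on disk is the same as before; the difference is that the file is written once rather than once per segment.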

[Feature] Force mode

Can you please add functionality that detects, by filename, whether some chapters have already been processed in the output folder?
In my use case I don't spend much time in one place, so I convert some books in 4-9 runs. It would be nice to save some money by automatically detecting the already converted chapters. If the -f parameter were used, everything in the output folder could be ignored :)
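
One way this could work, sketched under assumptions: the file naming scheme (`0001_Title.mp3`), the function name `chapters_to_convert`, and the `force` parameter standing in for `-f` are all hypothetical, not the project's actual behavior.

```python
from pathlib import Path


def chapters_to_convert(titles, output_dir, force=False, extension=".mp3"):
    """Return (index, title) pairs that still need to be converted.

    Assumes a hypothetical naming scheme: "<4-digit index>_<title><extension>".
    """
    output_dir = Path(output_dir)
    pending = []
    for index, title in enumerate(titles, start=1):
        target = output_dir / f"{index:04d}_{title}{extension}"
        # Skip chapters whose non-empty output file already exists,
        # unless force mode asks to ignore existing output.
        if not force and target.exists() and target.stat().st_size > 0:
            continue
        pending.append((index, title))
    return pending
```

Checking for a non-empty file is a deliberate choice here, so that a zero-byte file left behind by an interrupted run is converted again.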

Error when parsing book with nested chapters

I get an error when trying to parse this book: https://www.kosmas.cz/knihy/257693/ostre-stribro/

/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  warnings.warn(

Could it be connected to nested chapters?

No breaks added between paragraphs when they don't end with a punctuation

When using Edge TTS to read an epub formatted book, the following sorts of paragraph won't be read correctly:

Chapter One

The Chapter Title

This is the first sentence of the chapter.

because it will be read as: "Chapter one the chapter title this is the first sentence of the chapter", as though it's all one sentence with no breaks. This can be especially confusing if there's a heading in a paragraph in the middle of a chapter, something like:

... This is the final sentence of a paragraph.

The Next Section

Here is another sentence.

since it'll be read as "The next section here is another sentence", making it easy to miss that the first half of that sentence was supposed to be a header.

I looked in the source code and the trouble seems to come from epub_book_parser.py, where the second text cleaning step replaces all groups of white space (including newlines) with a single space. So this might affect Azure and OpenAI TTS as well, but I haven't tested it.

At least in the case of Edge TTS, though, it's not sufficient to simply keep a newline in there, because it appears that the edge_tts module automatically replaces newlines with spaces as well. So I think the solution for it needs to include inserting periods where needed.
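
A rough sketch of the "insert periods where needed" idea, assuming the parser can hand over the paragraphs as separate strings before whitespace is collapsed. The function name `join_paragraphs` and the set of characters treated as sentence-ending are assumptions made for illustration.

```python
import re

SENTENCE_ENDINGS = ".!?:;…"
CLOSING_MARKS = "\"'”’)]"


def join_paragraphs(paragraphs):
    """Join paragraphs with spaces, adding a period to any paragraph
    (such as a heading) that does not end with punctuation."""
    fixed = []
    for paragraph in paragraphs:
        # Collapse internal whitespace, as the existing cleaning step does.
        text = re.sub(r"\s+", " ", paragraph).strip()
        if not text:
            continue
        # Look past closing quotes/brackets when checking the last character.
        core = text.rstrip(CLOSING_MARKS)
        if core and core[-1] not in SENTENCE_ENDINGS:
            text += "."
        fixed.append(text)
    return " ".join(fixed)
```

With the example from this issue, the three paragraphs become "Chapter One. The Chapter Title. This is the first sentence of the chapter.", which gives the TTS engine a sentence boundary after each heading.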

An even better solution for Edge TTS would be to insert longer pauses between such paragraphs, though since Microsoft prevents using SSML, it would require using something like this.

Amazon Polly

Thanks for developing this!

Have you considered integrating Amazon Polly? The neural voices are exceptionally good and the possibilities with SSML are unique!

Web Interface

Would it be a big ask if we could get (at some point) a web interface, with Readarr/Calibre integration (as well as Audiobook platforms) so we could even configure automated conversions based on Readarr tags or libraries? Bonus points if we could then have it notify a Readarr instance (or even a different audiobook app) that there's a new audio book for it to scan.

Azure region info

Can you explain a little more how the Azure region code is obtained? This isn't clear at all.

Local TTS support

Amazing work, but it would be even more amazing if we had alternatives to the online, paid-only TTS options, such as PiperTTS, CoquiTTS, Bark, etc. Thanks for the hard work and keep it up!

Audio Is Longer In Duration Than What Is In ABS player

Hello! Thank you for your hard work. I used your program to create an audiobook from an EPUB file that was from my Calibre library.
It worked, but Audiobookshelf reports a certain duration for it, the audio runs past that duration, and the book is then listed as finished. When I try to come back to it, I'm forced to the official end of the audio, and I can't go forward or back using the player without it starting at the official end of the audio again.

[Feature Request] Support for Customizable Audio Formats

Great project, thank you.
Here’s a suggestion: Currently, the default output audio format is an mp3 with a bitrate of 48kbps, which has a relatively poor sound quality. It would be great if you could add support for other audio formats, allowing users to customize it using the parameter “X-Microsoft-OutputFormat”. Thank you once again.
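
A hedged sketch of how a configurable format could be passed through to the Azure Speech REST API. The function name `build_azure_tts_headers` is hypothetical, and the format strings used here (`audio-24khz-48kbitrate-mono-mp3` as the assumed current default, `audio-48khz-192kbitrate-mono-mp3` as a higher-bitrate alternative) should be checked against the Azure documentation before use.

```python
def build_azure_tts_headers(subscription_key,
                            output_format="audio-24khz-48kbitrate-mono-mp3"):
    """Build request headers for an Azure TTS call with a chosen output format.

    The default value mirrors the 48kbps mp3 mentioned in this issue (assumed).
    """
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        # The header named in this feature request.
        "X-Microsoft-OutputFormat": output_format,
    }


# Example: ask for a higher-bitrate mp3 instead of the default.
headers = build_azure_tts_headers("my-key", "audio-48khz-192kbitrate-mono-mp3")
```

The output format would then only need to be exposed as one more command line option and forwarded into this header.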

Better chapter title handling

I'm a bit concerned about the possibility that items labeled as h1, h2 or h3 might not be section titles. However, it's not a big issue, and if there is indeed a problem, we can fix it later.

Originally posted by @p0n1 in #30 (comment)

I ran into a book with single numbers in the h1 tags, so the chapter titles would be just something like 01, 02... I prefer to keep more context in the title so I can know more about each chapter's audio file.

Could we add a --no-prompt option to suppress the prompt (Do you want to continue? (y/n))

Hi there,
Great work here!
I'm working on a project where I would be running this in a pipeline, so no text input is possible. Could we add a --no-prompt option?

2024-01-13 13:34:10 [INFO] Chapters count: 33.
2024-01-13 13:34:10 [INFO] Converting chapters from 1 to 33.
2024-01-13 13:34:10 [INFO] ✨ Total characters in selected book: 554126 ✨
Estimate book voiceover would cost you roughly: $8.88
Do you want to continue? (y/n)

Another minor suggestion: if you pass the --preview option, it still asks you to confirm whether you would like to continue. If you're just previewing, it won't actually do the conversion, so can we skip the prompt, just process and exit?

Thanks! and have a great day.
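
A small sketch of how the two requests above could be wired together with `argparse`. The `--no-prompt` flag is the one proposed in this issue and `--preview` is the option mentioned above; the helper `needs_confirmation` and the parser layout are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of prompt-related flags")
    # Proposed flag: never ask "Do you want to continue? (y/n)".
    parser.add_argument("--no-prompt", action="store_true",
                        help="skip the confirmation prompt")
    parser.add_argument("--preview", action="store_true",
                        help="only show what would be converted")
    return parser


def needs_confirmation(args) -> bool:
    """Ask for confirmation only when converting for real and not told otherwise."""
    return not (args.no_prompt or args.preview)
```

In a pipeline, the tool would be called with `--no-prompt`; with `--preview`, the prompt is skipped because nothing is converted anyway.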

[Feature Request] Small sample of voice

It would be awesome if you had a function to convert a very short epub file with an option to select the different TTS voices. That way we could know what the voice will sound like before converting a full epub.

Or maybe just include a single short-chapter epub in the GitHub repo that can be used as an example.

[Feature Request] Title Options / Filename Options

This is possibly just my OCD but would it be possible to add in some options to set the title of each file to something other than what the script decides?

Example :

  • Chapter 1
  • Book Name - Chapter 1

Would it also be possible to customise the filenames in a similar fashion?

[Errno 2] No such file or directory when running python3 epub_to_audiobook.py -h

When I run python3 epub_to_audiobook.py -h (either in or out of the venv), I get

/Library/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python: can't open file '/Users//epub_to_audiobook.py': [Errno 2] No such file or directory

I'm super confused about why this would be happening. I've deleted and rebuilt the directory a few times and it still does not work. I'm not sure what's going on. Any suggestions? I feel like I'm following the guidance correctly.

This happens with my conversion commands as well.

footnotes

So I just started playing around with TTS over the last week or so, and I have been using Piper to take individual OCR'ed png files downloaded from archive.org and convert them to speech. This pretty much sucks, but since the book I'm working on is not available as an ebook so far as I can tell (Service, John, Lost Chance in China), this is the only way to do it. On the other hand, there are a lot of other non-fiction books (and perhaps some journal articles) that are available as .epub (and certainly as .pdf for the journal articles). For those, a better TTS solution that goes from .epub to .opus (or .flac) directly would be preferable, so I can simplify .epub > [web-based conversion] > .txt > [piper] > [ffmpeg] > .opus. However, one problem that has required a lot of manual processing so far is integrating foot/chapter/endnotes back into the text so that content is not lost.

I don't think this is exactly the right venue for this discussion, but I didn't see, e.g., a Discord channel for this project (happy to discuss it on off-topic @ audiobookshelf discord). I'd like to see what others are thinking about for integrating that content back into non-fiction work (as I have all the fiction I want and then some already on audio).

[Feature Request] Implement rough cost calculation beforehand, with prompt to confirm.

Hi

I was in the middle of writing my own solution when I came across this project by accident, and it already has almost everything implemented.
So I'm planning to use your solution!

Thank you for your work!!!

However, what is missing is cost estimation. When I want to convert a book to audio, I have no idea how big it is or how much it would cost.
It would be nice if every tts_provider implemented a cost estimation function and calculated roughly how much it would cost to convert the selected book.

With a manual command line prompt to confirm before the final conversion, like:

The approximate cost of the book voiceover would be XYZ$ 
Would you agree to proceed? [Y/N]: _

For example, OpenAI set the price of 0.015$ for 1k chars for the simple tts model and doubled it to 0.03$ for the tts-hd model
It should be easy to calculate by the formula: (whole_book_chars / 1k) * selected_tts_model_price
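
The formula above, written out as a short sketch. The prices are the ones quoted in this issue; the dictionary keys `tts-1` and `tts-1-hd` and the function name `estimate_cost` are assumptions for illustration.

```python
# Price in USD per 1,000 characters, as quoted in this issue.
PRICE_PER_1K_CHARS = {
    "tts-1": 0.015,     # the simple model (key name assumed)
    "tts-1-hd": 0.03,   # the HD model at double the price (key name assumed)
}


def estimate_cost(whole_book_chars: int, model: str = "tts-1") -> float:
    """(whole_book_chars / 1k) * selected_tts_model_price"""
    return whole_book_chars / 1000 * PRICE_PER_1K_CHARS[model]


# A book of 500,000 characters with the simple model: 500 * 0.015 USD.
print(f"Approximate cost: ${estimate_cost(500_000):.2f}")
```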

Additional suggestions:
Considering project evolution and further progress, I would suggest:

  1. Reorganise the project from a single file into proper separate classes and packages and move TTS providers and the main interface TTSProvider into a separate Python package to simplify adding more providers
  2. Add the cost_estimation method to the TTSProvider interface
  3. Add more book type support ( *.fb2, *.mobi...) which would require also the creation of separate services implementing a global interface for each book type
  4. Add more providers:
    • AWS has a TTS service called Polly. It supports a standard (mechanical) voice and a new neural voice (which sounds much better), but not all languages are supported (which makes --language an obligatory argument for execution).
    • Google has a TTS service.
    • I'm sure there are many more providers out there, especially considering the AI boom in the industry. However, anyone who decides to contribute would have to implement the TTSProvider interface with the basic standard functionality and place it into an individual Python package.
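
Suggestions 1 and 2 could look roughly like the sketch below. `TTSProvider` and `cost_estimation` are the names used in this issue; the `text_to_speech` method, the `ExampleProvider` class and its flat per-1k price are hypothetical placeholders, not the project's real interface.

```python
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Interface every provider would implement (sketch)."""

    @abstractmethod
    def text_to_speech(self, text: str) -> bytes:
        """Return the audio for the given text."""

    @abstractmethod
    def cost_estimation(self, total_chars: int) -> float:
        """Return the rough cost in USD for converting total_chars characters."""


class ExampleProvider(TTSProvider):
    """Hypothetical provider with a flat price per 1,000 characters."""

    def __init__(self, price_per_1k_chars: float):
        self.price_per_1k_chars = price_per_1k_chars

    def text_to_speech(self, text: str) -> bytes:
        # Placeholder: a real provider would call its TTS service here.
        return b""

    def cost_estimation(self, total_chars: int) -> float:
        return total_chars / 1000 * self.price_per_1k_chars
```

Because the methods are abstract, a new provider that forgets to implement `cost_estimation` cannot be instantiated, which keeps the confirmation prompt usable for every provider.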

P.S. Happy to help with the project, feel free to PM

(question) All chapters end up being 2kb large

Hello,

This is more of a question than a bug report, as I am not sure whether this is due to me doing something wrong...

After a chapter is converted to mp3, it ends up with a size of ~2kb, although during conversion the size is reported correctly (when refreshing the directory). This happened in WSL on both Ubuntu and Arch distros, as well as on a regular Arch Linux install, using a Python virtual env with edge as the TTS provider and the following command:
python main.py book_name.epub book_dir --tts edge --language hr-HR --voice_name hr-HR-SreckoNeural

I had the same issue with an English-language book without providing the language switch, which, if I am correct, uses English by default...

Using conversion with docker image works as expected.

Thanks

I can not convert using OpenAI + Docker

Hello, and thank you for this great tool!! 🙌

I am trying to convert an EPUB to audiobook running the following command:

docker run --rm -v ./:/app -e OPENAI_API_KEY=my-openai-key ghcr.io/p0n1/epub_to_audiobook my_ebook.epub audiobook_output --tts openai

But I am getting this error:

/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
  warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
  warnings.warn(
2024-01-14 19:37:24 [INFO] Chapters count: 14.
2024-01-14 19:37:24 [INFO] Converting chapters from 1 to 14.
2024-01-14 19:37:24 [INFO] ✨ Total characters in selected book: 189325 ✨
Estimate book voiceover would cost you roughly: $2.85
Do you want to continue? (y/n)
Traceback (most recent call last):
  File "/app_src/main.py", line 134, in <module>
    main()
  File "/app_src/main.py", line 130, in main
    AudiobookGenerator(config).run()
  File "/app_src/audiobook_generator/core/audiobook_generator.py", line 70, in run
    confirm_conversion(rough_price)
  File "/app_src/audiobook_generator/core/audiobook_generator.py", line 15, in confirm_conversion
    answer = input()
             ^^^^^^^
EOFError: EOF when reading a line

I can't even select Y or N when prompted to continue. Do you know what I could be doing wrong?
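
The `EOFError` comes from `input()` finding no data to read, which is likely what happens when the container is started without an attached stdin (the `docker run` command above has no `-i -t`). A minimal sketch of a prompt that fails gracefully in that case; the function name `ask_to_continue` and its `read_line` parameter are hypothetical.

```python
def ask_to_continue(read_line=input) -> bool:
    """Ask for confirmation; treat a missing stdin as "no" instead of crashing."""
    print("Do you want to continue? (y/n)")
    try:
        answer = read_line()
    except EOFError:
        # No interactive input is available (for example, no terminal attached).
        print("No input available; run interactively to confirm the conversion.")
        return False
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" is a cautious choice, since answering "yes" automatically could spend money on a paid TTS provider without the user's consent.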

Ability to set options via Environment Variables in Docker Compose?

Hi, this is a great project, thank you for creating it!

I am wondering if it is possible to set any of the configuration flags via Environment variables when using the docker compose file. I'm hoping to set things such as OPENAI_VOICE, OPENAI_MODEL, etc. via variables in the compose file or a .env file.
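
One possible approach, sketched with `argparse`: use the environment variable as the default, so that an explicit command line flag still wins. `OPENAI_VOICE` and `OPENAI_MODEL` are the variable names suggested in this issue; `--voice_name` appears in commands elsewhere on this page, while `--model_name` and the `build_parser` helper are assumptions.

```python
import argparse
import os


def build_parser(environ=None):
    """Build a parser whose defaults come from environment variables."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Sketch of env-based defaults")
    # A flag given on the command line overrides the environment value.
    parser.add_argument("--voice_name", default=env.get("OPENAI_VOICE"))
    parser.add_argument("--model_name", default=env.get("OPENAI_MODEL"))  # flag name assumed
    return parser


# Example: values as they might be set in a compose file or .env file.
args = build_parser({"OPENAI_VOICE": "alloy"}).parse_args([])
```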

Thank you again!

voice

Great piece of software. Thanks! Could you please let me know how to make the voice sound more human? Is there any other option besides azure text-to-speech or any tweak that can make the audio ... audible?

Read in the audiobook in a different language

Thanks for putting together this program, it's been working really nicely for me so far and I've got a feature suggestion for you!

My epub was in English, but I wanted the audiobook to be in Danish. Since I'm somewhat familiar with Python, I added an extra translation API call to gpt4 (turbo would likely do as well) and used its output for the speech generation. It's more expensive, but it worked really nicely in my testing with OpenAI. I was thinking it'd be nice to have it as a built-in option for people who aren't Python-savvy.

Program stuck on y/n Prompt

For some reason it gets stuck after asking for confirmation when running on my desktop. I did the exact same thing on my laptop and it was able to run. I'm not sure what the difference is; no error message is given, only the warnings below.

$ python3 main.py --tts edge --voice_name "en-US-RogerNeural" "C:\Users[Username]\Downloads\Min-Maxing My TRPG Build in Another World_ Volume 8.epub" output_folder
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ebooklib\epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
C:\Users[Username]\epub_to_audiobook\audiobook_generator\tts_providers\base_tts_provider.py:13: RuntimeWarning: coroutine 'EdgeTTSProvider.validate_config' was never awaited
self.validate_config()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-03-06 19:44:28 [INFO] Chapters count: 23.
2024-03-06 19:44:28 [INFO] Converting chapters from 1 to 23.
2024-03-06 19:44:28 [INFO] \u2728 Total characters in selected book: 609648 \u2728
Estimate book voiceover would cost you roughly: $0.00

Do you want to continue? (y/n)
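
The `RuntimeWarning` in the log above says that `validate_config` is a coroutine that was called without being awaited, which means its body never ran. Whether that is related to the prompt hanging is not clear from the log. A minimal sketch of the pattern and one way to run such a method from synchronous code; `ExampleProvider` is a hypothetical stand-in, not the project's `EdgeTTSProvider`.

```python
import asyncio


class ExampleProvider:
    """Hypothetical stand-in for a provider whose validation is async."""

    def __init__(self):
        self.validated = False

    async def validate_config(self) -> None:
        await asyncio.sleep(0)  # placeholder for real asynchronous work
        self.validated = True


def validate_from_sync_code(provider: ExampleProvider) -> bool:
    # Calling provider.validate_config() on its own would only create a
    # coroutine object; asyncio.run() actually executes it to completion.
    asyncio.run(provider.validate_config())
    return provider.validated
```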

Error trying to convert epub to audio: "cannot execute binary file"

This is my first time really messing with something like this, so it's almost definitely my fault something's up. I followed the Windows step-by-step guide, and was able to figure everything out until I got to this point:

$ "C:\Users*NAME*\Downloads*BOOK*.epub" "C:\Users*NAME*\Downloads*EMPTY FOLDER*" --tts azure --voice_name en-US-JennyNeural --language en-US
bash: C:\Users*NAME*\Downloads*BOOK*.epub: cannot execute binary file: Exec format error
(venv)

(all text in bold is stuff I changed to share this)
I'm on Windows 10, using Python 3.12.1 and latest version of Git

LocalAI Support

Split from the issue #9 (comment).

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model":"it-riccardo_fasol-x-low.onnx",
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}'

The LocalAI TTS API (https://localai.io/features/text-to-audio/) was defined even before the release of the OpenAI one. I think it's not fully compatible with the OpenAI TTS API (https://platform.openai.com/docs/guides/text-to-speech) because they use different voices and models.

So changing the base URL of the OpenAI SDK to a LocalAI instance will not work for the TTS feature.

LocalAI supports bark, piper and vall-e-x.

If we can support LocalAI, we can support many good local TTS engines at once.
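
The curl call above can be expressed with the standard library, which shows what a LocalAI provider would need to send. This sketch only builds the request; the function name `build_localai_tts_request` is hypothetical, and how the response should be handled (presumably audio bytes in the body) is an assumption to check against the LocalAI documentation.

```python
import json
import urllib.request


def build_localai_tts_request(text, model, backend="piper",
                              base_url="http://localhost:8080"):
    """Build the same POST request as the curl example, without sending it."""
    payload = {"model": model, "backend": backend, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_localai_tts_request("Ciao, sono Ettore",
                                    "it-riccardo_fasol-x-low.onnx")
# Sending it would be: urllib.request.urlopen(request).read()
```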

[Feature Request] Skip footnotes

It would be great to have a tag to skip both the numerical notes at the end of a sentence as well as the footnotes on the bottom of a page.
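
For the numerical markers, a heuristic sketch is shown below. It assumes markers are one to three digits (optionally in square brackets) glued directly to the punctuation that follows a word; it does not handle the footnote bodies at the bottom of a page, and it can misfire on text such as "v.2". The name `strip_footnote_markers` is hypothetical.

```python
import re

# Digits (or [digits]) directly after "<letter><punctuation>", followed by
# whitespace or the end of the text. Requiring a letter before the
# punctuation avoids touching decimal numbers such as 3.50.
FOOTNOTE_MARKER = re.compile(
    r"(?<=[A-Za-z][.!?,;:])(?:\d{1,3}|\[\d{1,3}\])(?=\s|$)"
)


def strip_footnote_markers(text: str) -> str:
    """Remove inline numeric footnote markers so they are not read aloud."""
    return FOOTNOTE_MARKER.sub("", text)
```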

Feature suggestion: overwrite protection

I like how ffmpeg protects previous output insofar as it prompts to overwrite, but gives you the option of passing -n to not overwrite (very helpful in scripts!) or -y to overwrite.

KeyError: "There is no item named 'page_styles.css' in the archive"

Hi, this tool is very useful, thanks for working on this!

I've encountered a bug with an epub that I'm putting in. Is it a case of a malformed epub?

Thanks

Stack trace:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 102, in <module>
    main()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 98, in main
    AudiobookGenerator(config).run()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\core\audiobook_generator.py", line 37, in run
    book_parser = get_book_parser(self.config)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\base_book_parser.py", line 42, in get_book_parser
    return EpubBookParser(config)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\epub_book_parser.py", line 19, in __init__
    self.book = epub.read_epub(self.config.input_file)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1768, in read_epub
    book = reader.load()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1410, in load
    self._load()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1722, in _load
    self._load_opf_file()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1679, in _load_opf_file
    self._load_manifest()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1555, in _load_manifest
    ei.content = self.read_file(zip_path.join(self.opf_dir, ei.get_name()))
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1417, in read_file
    return self.zf.read(name)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1475, in read
    with self.open(name, "r", pwd) as fp:
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1514, in open
    zinfo = self.getinfo(name)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1441, in getinfo
    raise KeyError(
KeyError: "There is no item named 'page_styles.css' in the archive"
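
The traceback ends in `zipfile` failing to find a file that the epub's manifest refers to, which suggests the manifest lists an item that is not in the archive. A diagnostic sketch using only the standard library, assuming a standard epub layout (`META-INF/container.xml` pointing at the OPF file); the function name `missing_manifest_items` is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def missing_manifest_items(epub_file):
    """List manifest entries that do not exist inside the epub archive."""
    with zipfile.ZipFile(epub_file) as archive:
        names = set(archive.namelist())
        # container.xml tells us where the OPF package file lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(f".//{CONTAINER_NS}rootfile").get("full-path")
        opf_dir = posixpath.dirname(opf_path)
        package = ET.fromstring(archive.read(opf_path))
        missing = []
        for item in package.iter(f"{OPF_NS}item"):
            # Manifest hrefs are relative to the OPF file's directory.
            href = unquote(item.get("href"))
            full_name = posixpath.normpath(posixpath.join(opf_dir, href))
            if full_name not in names:
                missing.append(full_name)
        return missing
```

If this reports `page_styles.css`, the epub itself is inconsistent, and re-exporting it (or removing the stale manifest entry) would be the thing to try.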

Program successfully completes but files do not show up?

Hey y'all,

I apologize if this is a stupid inquiry. I have absolutely no coding experience (although this project is motivating me to learn!). Using docker, I was able to successfully follow the instructions and run the program for 2 different audiobooks. In both cases, the program seemed to complete successfully, but the audio files do not appear in the specified output directory. I used separate directories for each book, making a new folder /Users/paulclancy/Desktop/Azure and then copying this pathname to specify where to upload.

After checking the folder post-completion of the program, no audio files are present. My command and its output (with the middle portion abbreviated) are shown below. Please let me know if there is any obvious solution to this. Thank you!

'paulclancy@Pauls-MacBook-Pro Azure % docker run -i -t --rm -v ./:/app -e MS_TTS_KEY=b5145331f062491e9e53b1d4e3da942d -e MS_TTS_REGION=eastus ghcr.io/p0n1/epub_to_audiobook lying.epub /Users/paulclancy/Desktop/Azure --tts azure
/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
warnings.warn(
2024-02-26 05:28:06 [INFO] Chapters count: 12.
2024-02-26 05:28:06 [INFO] Converting chapters from 1 to 12.
2024-02-26 05:28:06 [INFO] ✨ Total characters in selected book: 126647 ✨
Estimate book voiceover would cost you roughly: $2.03

Do you want to continue? (y/n)
y
2024-02-26 05:28:24 [INFO] Converting chapter 1/12:
2024-02-26 05:33:39 [INFO] Processing chapter-12 <A_NOTE_ON_THE_TYPE_This_book_was_set_in_Minion_a_typeface_d>, chunk 1 of 1
...

2024-02-26 05:33:39 [INFO] Sending request to Azure TTS, data length: 576
2024-02-26 05:33:40 [INFO] Got response from Azure TTS, response length: 172512
paulclancy@Pauls-MacBook-Pro Azure %
