p0n1 / epub_to_audiobook
EPUB to audiobook converter, optimized for Audiobookshelf
License: MIT License
Hi, I suspect I spotted a bug. It does not affect functionality, since the final file is eventually created successfully, but the file is overwritten on every iteration.
I believe this write operation should be moved outside the for loop, so that a single write happens once all audio segments have been collected, instead of overwriting the file over and over with one more audio segment each time.
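A minimal sketch of the proposed change, assuming the segments are plain byte strings. The names build_chapter_audio and synthesize are hypothetical stand-ins and are not taken from the project's code:

```python
import io
from typing import Callable, Iterable


def build_chapter_audio(chunks: Iterable[str],
                        synthesize: Callable[[str], bytes],
                        out: io.BufferedIOBase) -> int:
    """Collect every audio segment first, then write the result once."""
    segments = []
    for chunk in chunks:
        # Only accumulate inside the loop; no file I/O happens here.
        segments.append(synthesize(chunk))

    # Single write after the loop instead of rewriting the file per segment.
    data = b"".join(segments)
    out.write(data)
    return len(data)
```

In real use, out would be the chapter file opened in "wb" mode, and synthesize would wrap the TTS provider call.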
Can you please add functionality that detects, by filename, whether some chapters have already been processed in the output folder?
In my use case I don't spend much time in one place, so I convert some books in 4-9 runs. It would be nice to save some money by automatically detecting the already converted chapters. If the -f parameter were used, everything in the output folder could be ignored :)
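One way this could look, as a rough sketch only. The "0001_Title.mp3" naming scheme, the function name, and the force argument (standing in for the -f parameter suggested above) are all assumptions for illustration:

```python
from pathlib import Path


def chapters_to_convert(titles, output_dir, force=False, ext=".mp3"):
    """Return (index, title) pairs whose output file does not exist yet.

    The file naming scheme "0001_Title.mp3" is an assumption of this sketch.
    """
    out = Path(output_dir)
    pending = []
    for idx, title in enumerate(titles, start=1):
        target = out / f"{idx:04d}_{title}{ext}"
        # With force=True everything already in the folder is ignored.
        # Empty files are treated as unfinished and converted again.
        if force or not target.exists() or target.stat().st_size == 0:
            pending.append((idx, title))
    return pending
```

Only the chapters returned here would be sent to the TTS provider, so an interrupted run could be resumed without paying twice.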
I get an error when trying to parse this book: https://www.kosmas.cz/knihy/257693/ostre-stribro/
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
warnings.warn(
When using Edge TTS to read an epub formatted book, the following sorts of paragraph won't be read correctly:
Chapter One
The Chapter Title
This is the first sentence of the chapter.
because it will be read as: "Chapter one the chapter title this is the first sentence of the chapter", as though it's all one sentence with no breaks. This can be especially confusing if there's a heading in a paragraph in the middle of a chapter, something like:
... This is the final sentence of a paragraph.
The Next Section
Here is another sentence.
since it'll be read as "The next section here is another sentence", making it easy to miss that the first half of that sentence was supposed to be a header.
I looked in the source code and the trouble seems to come from epub_book_parser.py, where the second text cleaning step replaces all groups of white space (including newlines) with a single space. So this might affect Azure and OpenAI TTS as well, but I haven't tested it.
At least in the case of Edge TTS, though, it's not sufficient to simply keep a newline in there, because it appears that the edge_tts module automatically replaces newlines with spaces as well. So I think the solution for it needs to include inserting periods where needed.
An even better solution for Edge TTS would be to insert longer pauses between such paragraphs, though since Microsoft prevents using SSML, it would require using something like this.
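A rough sketch of the "insert periods where needed" idea, assuming the chapter text is available as a list of paragraph strings before whitespace is collapsed. The function name and the set of punctuation treated as sentence-ending are assumptions, not the project's behavior:

```python
import re

# Text that already ends a sentence, optionally followed by closing quotes/brackets.
_TERMINATED = re.compile(r"[.!?:;…][\"'”’)\]]*$")


def join_paragraphs(paragraphs):
    """Join paragraphs with spaces, adding a period to unterminated ones."""
    fixed = []
    for para in paragraphs:
        text = " ".join(para.split())  # collapse whitespace inside the paragraph
        if not text:
            continue
        if not _TERMINATED.search(text):
            text += "."  # headings such as "Chapter One" get an explicit stop
        fixed.append(text)
    return " ".join(fixed)
```

With the example above, the result would be "Chapter One. The Chapter Title. This is the first sentence of the chapter.", which should give the TTS engine a sentence break after each heading.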
Thanks for developing this!
Have you considered integrating Amazon Polly? The neural voices are exceptionally good and the possibilities with SSML are unique!
Would it be a big ask if we could get (at some point) a web interface, with Readarr/Calibre integration (as well as Audiobook platforms) so we could even configure automated conversions based on Readarr tags or libraries? Bonus points if we could then have it notify a Readarr instance (or even a different audiobook app) that there's a new audio book for it to scan.
Hi! Awesome project :)
Any plans to support TTS by OpenAI as well?
Can you explain a little more how the Azure region code is obtained? This isn't clear at all.
Amazing work, but would be even more amazing if we have alternatives to online-paid only TTS options such as PiperTTS, CoquiTTS, Bark...etc Thanks for the hard work and keep it up!
Hello! Thank you for your hard work. I used your program to create an audiobook from an EPUB file that was from my Calibre library.
It worked, but on Audiobookshelf it says it has a certain duration which is then exceeded, and it is listed as finished. So when I try to come back to it I'm forced to go to the official end of the audio, and I can't go forward or back using the player or it will start at the official end of the audio.
Great project, thank you.
Here’s a suggestion: Currently, the default output audio format is an mp3 with a bitrate of 48kbps, which has a relatively poor sound quality. It would be great if you could add support for other audio formats, allowing users to customize it using the parameter “X-Microsoft-OutputFormat”. Thank you once again.
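As a sketch of what a configurable format might look like: the three format strings below are ones I believe are listed in Azure's text-to-speech REST API documentation, but please verify them against the current docs. The "low"/"medium"/"high" labels and the function name are made up for illustration:

```python
# Candidate values for the X-Microsoft-OutputFormat header (check Azure docs).
AZURE_OUTPUT_FORMATS = {
    "low": "audio-24khz-48kbitrate-mono-mp3",
    "medium": "audio-24khz-96kbitrate-mono-mp3",
    "high": "audio-48khz-192kbitrate-mono-mp3",
}


def build_tts_headers(access_token, quality="low"):
    """Build request headers with a user-selected output format."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": AZURE_OUTPUT_FORMATS[quality],
        "User-Agent": "epub_to_audiobook",
    }
```

A command-line option could then map directly onto the quality argument, or accept the raw format string.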
I'm a bit concerned about the possibility that items labeled as h1, h2, or h3 might not be section titles. However, it's not a big issue, and if there is indeed a problem, we can fix it later.
Originally posted by @p0n1 in #30 (comment)
I ran into a book with single numbers in the h1 tag. So, the chapter titles would be just something like 01, 02... I prefer to keep more context/strings in the title so I can know more about each chapter audio file.
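One possible heuristic, sketched under the assumption that the heading and the chapter body are both available as strings. The function name and the thresholds are hypothetical:

```python
import re


def enrich_title(heading, body, extra_words=5, min_len=4):
    """Append the first words of the chapter body to a very short heading."""
    heading = " ".join(heading.split())
    # Keep headings that already carry enough context.
    if heading and not heading.isdigit() and len(heading) >= min_len:
        return heading
    # Purely numeric or very short headings borrow words from the body.
    words = re.findall(r"\w+", body)[:extra_words]
    return " ".join(filter(None, [heading, " ".join(words)]))
```

For a chapter headed 01 that begins "It was a bright cold day in April.", this would give the title "01 It was a bright cold".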
Hi there,
Great work here!
I'm working on a project where I would be running this in a pipeline, so no text input is possible. Could we add a --no-prompt option?
2024-01-13 13:34:10 [INFO] Chapters count: 33.
2024-01-13 13:34:10 [INFO] Converting chapters from 1 to 33.
2024-01-13 13:34:10 [INFO] ✨ Total characters in selected book: 554126 ✨
Estimate book voiceover would cost you roughly: $8.88
Do you want to continue? (y/n)
Another minor suggestion here: if you pass the --preview option, it still asks you to confirm whether you would like to continue. If you're just previewing, it won't actually do the conversion, so can we skip the prompt, just process and exit?
Thanks! and have a great day.
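A small sketch of how both requests might fit together. The --no-prompt flag is the one proposed here and does not exist yet as far as I know; should_convert and build_parser are hypothetical names:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-prompt", action="store_true",
                        help="skip the cost confirmation (proposed flag)")
    parser.add_argument("--preview", action="store_true",
                        help="only show the summary, do not convert")
    return parser


def should_convert(args, ask=input):
    """Decide whether to start the conversion without blocking a pipeline."""
    if args.preview:
        return False  # preview only reports; there is nothing to confirm
    if args.no_prompt:
        return True
    try:
        return ask("Do you want to continue? (y/n) ").strip().lower() == "y"
    except EOFError:
        # No terminal attached, for example a container without -i -t.
        return False
```

Treating a missing terminal as "no" keeps an unattended run from spending money by accident; a pipeline would opt in explicitly with --no-prompt.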
It would be awesome if you had a function to convert a very short epub file with an option to select the different TTS voices. That way we could know what the voice will sound like before converting a full epub.
Or maybe just include a single short-chapter epub in the GitHub repo that can be used as an example.
Is it possible to run this container without a command, and have it automatically convert EPUBs to audio after they are uploaded to the Audiobookshelf directory?
This is possibly just my OCD but would it be possible to add in some options to set the title of each file to something other than what the script decides?
Example :
Would it also be possible to customise the filenames in a similar fashion?
Hi, thank you for this great tool! It's been so useful for me. I've just converted my first book yesterday and realized that some chapters are missing texts. Turned out that when there's an illustration in the epub in the middle of the chapter, the tool only starts converting from after the illustration onwards.
Let me know if I can provide anymore details.
When I run python3 epub_to_audiobook.py -h (either in or out of the venv), I get
/Library/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python: can't open file '/Users//epub_to_audiobook.py': [Errno 2] No such file or directory
I'm super confused about why this would be happening. I've deleted and rebuilt the directory a few times and it still doesn't work. I'm not sure what's going on. Any suggestions? I feel like I'm following the guidance correctly.
This happens with my conversion commands as well.
I just started playing around with TTS over the last week or so, and have been using Piper to convert individual OCR'ed PNG files downloaded from archive.org to speech. This pretty much sucks, but since the book I'm working on (Service, John, Lost Chance in China) is not available as an ebook so far as I can tell, this is the only way to do it. On the other hand, a lot of other non-fiction books (and perhaps some journal articles) are available as .epub (and certainly as .pdf for the journal articles), so a better TTS solution that goes from .epub to .opus (or .flac) directly would be preferable, letting me simplify .epub > [web-based conversion] > .txt > [piper] > [ffmpeg] > .opus. However, one problem that requires a lot of manual processing so far is integrating footnotes, chapter notes, and endnotes back into the text so that content is not lost.
I don't think this is exactly the right venue for this discussion, but I didn't see an e.g., Discord channel for this project (happy to discuss it on off-topic @ audiobookshelf discord), but I'd like to see what others are thinking about for integrating that content back into non-fiction work (as I have all the fiction I want and then some already on audio).
Hi
I was in the middle of writing my solution when by accident came across this project which already has almost everything implemented
So I'm planning to use your solution!
Thank you for your work!!!
However, what is missing is cost estimation. When I want to convert a book to audio, I have no idea how big it is or how much it would cost.
It would be nice if every tts_provider implemented a cost estimation function that calculates roughly how much it would cost to translate the selected book, with a manual command-line prompt to confirm before the final translation, like:
The approximate cost of the book voiceover would be XYZ$
Would you agree to proceed? [Y/N]: _
For example, OpenAI sets the price at $0.015 per 1k chars for the simple tts model and doubles it to $0.03 for the tts-hd model.
It should be easy to calculate by the formula: (whole_book_chars / 1k) * selected_tts_model_price
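The formula above, written out as a small sketch. The prices are the ones quoted in this issue and may have changed since; the dictionary and function names are hypothetical:

```python
# Prices per 1,000 characters as quoted in this issue; they may be outdated.
PRICE_PER_1K_CHARS = {"tts": 0.015, "tts-hd": 0.03}


def estimate_cost(total_chars, model="tts"):
    """(whole_book_chars / 1k) * selected_tts_model_price, rounded to cents."""
    return round(total_chars / 1000 * PRICE_PER_1K_CHARS[model], 2)
```

For a book of 100,000 characters this gives 1.5 for tts and 3.0 for tts-hd.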
Additional suggestions:
Considering project evolution and further progress, I would suggest:
TTSProvider into a separate Python package to simplify adding more providers
cost_estimation method to the TTSProvider interface
*.fb2, *.mobi ...) which would also require the creation of separate services implementing a global interface for each book type
Polly. supports: standard (mechanical) voice and new neural voice (sounds much better), but not all languages are supported (which makes --language an obligatory arg for execution). Price
TTSProvider interface with the basic standard functionality and place it into an individual Python package.
P.S. Happy to help with the project, feel free to PM
Hello,
This is more of a question than a bug report, as I am not sure if this is due to me doing something wrong...
After a chapter is converted to mp3, the file ends up with a size of ~2 KB, although during conversion the size is reported correctly (while refreshing the directory). This happened in WSL on both Ubuntu and Arch distros as well as on a regular Arch Linux install, using a Python virtual env with edge as the TTS provider and the following command:
python main.py book_name.epub book_dir --tts edge --language hr-HR --voice_name hr-HR-SreckoNeural
I had the same issue with an English-language book without providing the language switch, which, if I am correct, uses English by default...
Using conversion with docker image works as expected.
Thanks
Hello, and thank you for this great tool!! 🙌
I am trying to convert an EPUB to audiobook running the following command:
docker run --rm -v ./:/app -e OPENAI_API_KEY=my-openai-key ghcr.io/p0n1/epub_to_audiobook my_ebook.epub audiobook_output --tts openai
But I am getting this error:
/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-01-14 19:37:24 [INFO] Chapters count: 14.
2024-01-14 19:37:24 [INFO] Converting chapters from 1 to 14.
2024-01-14 19:37:24 [INFO] ✨ Total characters in selected book: 189325 ✨
Estimate book voiceover would cost you roughly: $2.85
Do you want to continue? (y/n)
Traceback (most recent call last):
File "/app_src/main.py", line 134, in <module>
main()
File "/app_src/main.py", line 130, in main
AudiobookGenerator(config).run()
File "/app_src/audiobook_generator/core/audiobook_generator.py", line 70, in run
confirm_conversion(rough_price)
File "/app_src/audiobook_generator/core/audiobook_generator.py", line 15, in confirm_conversion
answer = input()
^^^^^^^
EOFError: EOF when reading a line
I can't even select Y or N when prompted about whether I want to continue. Do you know what I could be doing wrong?
Hi, this is a great project, thank you for creating it!
I am wondering if it is possible to set any of the configuration flags via Environment variables when using the docker compose file. I'm hoping to set things such as OPENAI_VOICE, OPENAI_MODEL, etc. via variables in the compose file or a .env file.
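A sketch of one way to do this: use environment variables as defaults for the command-line options, so an explicit flag still wins. OPENAI_VOICE and OPENAI_MODEL are the variable names suggested above; --model_name is an assumed flag name, and I have not checked how the project actually names it:

```python
import argparse
import os


def build_parser(env=os.environ):
    """Command-line flags whose defaults come from environment variables."""
    parser = argparse.ArgumentParser()
    # An explicit flag overrides the value from the compose file or .env file.
    parser.add_argument("--voice_name", default=env.get("OPENAI_VOICE"))
    parser.add_argument("--model_name", default=env.get("OPENAI_MODEL"))
    return parser
```

With OPENAI_VOICE set in the compose file, running without --voice_name would pick up that value, while passing --voice_name on the command line would replace it.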
Thank you again!
Great piece of software. Thanks! Could you please let me know how to make the voice sound more human? Is there any other option besides azure text-to-speech or any tweak that can make the audio ... audible?
Thanks for putting together this program, it's been working really nicely for me so far and I've got a feature suggestion for you!
My epub was in English, but I wanted the audiobook to be Danish. Since I'm somewhat familiar with Python, I added an extra translation API call to gpt4 (turbo would likely do as well) and used its output for the speech generation. It's more expensive, but worked really nicely in my testing with openai. I was thinking it'd be nice to have it as a built-in option for people who aren't Python-savvy.
For some reason it gets stuck after asking for confirmation when running on my desktop. I did the exact same thing on my laptop and it was able to run. I'm not sure what the difference is; no error message is given, only the warnings below.
$ python3 main.py --tts edge --voice_name "en-US-RogerNeural" "C:\Users[Username]\Downloads\Min-Maxing My TRPG Build in Another World_ Volume 8.epub" output_folder
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ebooklib\epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
C:\Users[Username]\epub_to_audiobook\audiobook_generator\tts_providers\base_tts_provider.py:13: RuntimeWarning: coroutine 'EdgeTTSProvider.validate_config' was never awaited
self.validate_config()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-03-06 19:44:28 [INFO] Chapters count: 23.
2024-03-06 19:44:28 [INFO] Converting chapters from 1 to 23.
2024-03-06 19:44:28 [INFO] \u2728 Total characters in selected book: 609648 \u2728
Estimate book voiceover would cost you roughly: $0.00
Do you want to continue? (y/n)
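The RuntimeWarning in the log above says that the coroutine EdgeTTSProvider.validate_config was never awaited. Whether that is related to the hang is not clear from the log, but the general pattern behind the warning can be sketched as follows. The Provider class here is a made-up stand-in, not the project's class:

```python
import asyncio


class Provider:
    async def validate_config(self):
        await asyncio.sleep(0)  # stands in for an async network check
        return True


def validate_sync(provider):
    """Run the async validation to completion from synchronous code."""
    # Calling provider.validate_config() on its own only creates a coroutine
    # object; asyncio.run() actually executes it and returns the result.
    return asyncio.run(provider.validate_config())
```

If an async method is called from a synchronous constructor without something like asyncio.run, its body never runs and Python emits the "never awaited" warning.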
This is my first time really messing with something like this, so it's almost definitely my fault something's up. I followed the Windows step-by-step guide, and was able to figure everything out until I got to this point:
$ "C:\Users*NAME*\Downloads*BOOK*.epub" "C:\Users*NAME*\Downloads*EMPTY FOLDER*" --tts azure --voice_name en-US-JennyNeural --language en-US
bash: C:\Users*NAME*\Downloads*BOOK*.epub: cannot execute binary file: Exec format error
(venv)
(all text in bold is stuff I changed to share this)
I'm on Windows 10, using Python 3.12.1 and latest version of Git
https://github.com/yl4579/StyleTTS2 StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Still experimental but looks promising.
Split from the issue #9 (comment).
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"it-riccardo_fasol-x-low.onnx",
"backend": "piper",
"input": "Ciao, sono Ettore"
}'
The LocalAI TTS API https://localai.io/features/text-to-audio/ was defined even before OpenAI released theirs. I think it's not fully compatible with the OpenAI TTS API https://platform.openai.com/docs/guides/text-to-speech because they use different voices and models.
So changing the base URL of the OpenAI SDK to a LocalAI instance will not work for the TTS feature.
LocalAI supports bark, piper and vall-e-x.
If we can support LocalAI, we can support many good local TTS engines at once.
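For reference, the curl call above could be mirrored in Python roughly like this, using only the standard library. The endpoint and JSON fields are copied from the curl example; the function name is hypothetical, and the request is only built here, not sent:

```python
import json
import urllib.request


def build_localai_tts_request(text, model, backend="piper",
                              base_url="http://localhost:8080"):
    """Build a POST request equivalent to the curl example above."""
    payload = {"model": model, "backend": backend, "input": text}
    return urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to urllib.request.urlopen against a running LocalAI instance should return the generated audio, though I have not verified the response format.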
It would be great to have a tag to skip both the numerical notes at the end of a sentence as well as the footnotes on the bottom of a page.
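For the numerical note markers, a heuristic along these lines might be a starting point. It is only a sketch: it removes 1 to 3 digits that directly follow a letter plus sentence punctuation, and it will miss markers placed after closing quotes or written as superscripts. Footnote bodies at the bottom of a page would need separate handling based on the EPUB markup:

```python
import re

# A letter, then sentence punctuation, then 1-3 digits, then whitespace or end.
_NOTE_MARKER = re.compile(r"(?<=[^\W\d_][.,;:!?])\d{1,3}(?=\s|$)")


def strip_note_markers(text):
    """Remove trailing note numbers such as the 12 in 'true.12 Next'."""
    return _NOTE_MARKER.sub("", text)
```

Requiring a letter before the punctuation keeps numbers like 3.11 intact.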
I like how ffmpeg protects previous output insofar as it prompts to overwrite, but gives you the option of passing -n to not overwrite (very helpful in scripts!) or -y to overwrite.
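A sketch of ffmpeg-style overwrite handling, assuming -y and -n are free to use as flag names in this tool (I have not checked for conflicts). The helper names are hypothetical:

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    # Both flags share one destination; it stays None when neither is given.
    group.add_argument("-y", dest="overwrite", action="store_const", const=True,
                       help="overwrite existing output files")
    group.add_argument("-n", dest="overwrite", action="store_const", const=False,
                       help="never overwrite existing output files")
    return parser


def may_write(path, overwrite, ask=input):
    """Return True if the output file may be (over)written."""
    if not Path(path).exists():
        return True
    if overwrite is not None:
        return overwrite  # -y or -n was given, so do not prompt
    answer = ask(f"File '{path}' already exists. Overwrite? [y/N] ")
    return answer.strip().lower() == "y"
```

In a script, -n makes repeated runs safe, while an interactive run without either flag still gets the prompt.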
Hi, this tool is very useful, thanks for working on this!
I've encountered a bug with an epub that I'm putting in. Is it a case of a malformed epub?
Thanks
Stack trace:
Traceback (most recent call last):
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 102, in <module>
main()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 98, in main
AudiobookGenerator(config).run()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\core\audiobook_generator.py", line 37, in run
book_parser = get_book_parser(self.config)
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\base_book_parser.py", line 42, in get_book_parser
return EpubBookParser(config)
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\epub_book_parser.py", line 19, in __init__
self.book = epub.read_epub(self.config.input_file)
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1768, in read_epub
book = reader.load()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1410, in load
self._load()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1722, in _load
self._load_opf_file()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1679, in _load_opf_file
self._load_manifest()
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1555, in _load_manifest
ei.content = self.read_file(zip_path.join(self.opf_dir, ei.get_name()))
File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1417, in read_file
return self.zf.read(name)
File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1475, in read
with self.open(name, "r", pwd) as fp:
File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1514, in open
zinfo = self.getinfo(name)
File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1441, in getinfo
raise KeyError(
KeyError: "There is no item named 'page_styles.css' in the archive"
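The KeyError at the end of the trace says that something is being read from the archive under the name page_styles.css, but no such entry exists in the zip. To check a given EPUB for this kind of mismatch, a small helper like the following could be used; the function name is hypothetical, and the list of expected names has to come from the book's OPF manifest:

```python
import zipfile


def missing_entries(epub_path, expected_names):
    """Return the expected names that are absent from the EPUB zip archive."""
    with zipfile.ZipFile(epub_path) as zf:
        present = set(zf.namelist())
    return [name for name in expected_names if name not in present]
```

If this reports page_styles.css as missing, the EPUB itself is inconsistent, and re-saving it in an EPUB editor might repair the manifest.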
Hey y'all,
I apologize if this is a stupid inquiry. I have absolutely no coding experience (although this project is motivating me to learn!). Using Docker, I was able to successfully follow the instructions and run the program for 2 different audiobooks. In both cases, the program seemed to complete successfully; however, the audio files do not appear in the specified output directory. I used separate directories for each book, making a new folder /Users/paulclancy/Desktop/Azure and then copying this pathname to specify where to upload.
After checking the folder post-completion of the program, no audio files are present. My code (with the middle portion abbreviated) is shown below. Please let me know if there is any obvious solution to this. Thank you!
'paulclancy@Pauls-MacBook-Pro Azure % docker run -i -t --rm -v ./:/app -e MS_TTS_KEY=b5145331f062491e9e53b1d4e3da942d -e MS_TTS_REGION=eastus ghcr.io/p0n1/epub_to_audiobook lying.epub /Users/paulclancy/Desktop/Azure --tts azure
/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
warnings.warn(
2024-02-26 05:28:06 [INFO] Chapters count: 12.
2024-02-26 05:28:06 [INFO] Converting chapters from 1 to 12.
2024-02-26 05:28:06 [INFO] ✨ Total characters in selected book: 126647 ✨
Estimate book voiceover would cost you roughly: $2.03
Do you want to continue? (y/n)
y
2024-02-26 05:28:24 [INFO] Converting chapter 1/12:
2024-02-26 05:33:39 [INFO] Processing chapter-12 <A_NOTE_ON_THE_TYPE_This_book_was_set_in_Minion_a_typeface_d>, chunk 1 of 1
...
2024-02-26 05:33:39 [INFO] Sending request to Azure TTS, data length: 576
2024-02-26 05:33:40 [INFO] Got response from Azure TTS, response length: 172512
paulclancy@Pauls-MacBook-Pro Azure %
I'm on a Mac (Sonoma, Python 3.9.6) and am able to convert the enclosed example book without any issue. However, other EPUBs that I have collected over the years do not work. They all run, but the warning message keeps showing for several minutes as the attempt count increases.
I attached the full log.
error.txt