theslavicbear / ebook-publisher

32.0 8.0 2.0 505 KB

A Python tool for converting online stories into portable formats

License: MIT License

Python 100.00%

ebook-downloader ebook epub fanfiction fictionpress python converting-online-stories story

ebook-publisher's Issues

Ongoing Chyoa linked chapter issues.

Duplicate chapters may still appear and recurse into their child chapters in Chyoa stories when the linked chapters are on the first chapter
First duplicate chapter will still appear with child links to chapters that won't work

Table of Contents for HTML output

got "IndexError: list index out of range" while downloading from chyoa.com

while downloading this story I got the following error :

it did not stop the program but it froze after a while and did not finish downloading

Add better error handling for Common.GetImage function

Should not stop process on error, should instead retry and skip image.
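A minimal sketch of the retry-then-skip behavior requested here. The function name `get_image_or_skip` and the `fetch` callable are hypothetical stand-ins, not the existing `Common.GetImage` code; `fetch` is assumed to return the image bytes or raise on failure.

```python
import time


def get_image_or_skip(url, fetch, attempts=3, delay=1.0):
    """Call fetch(url) up to `attempts` times; return None instead of raising."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            print(f"Image attempt {attempt}/{attempts} failed for {url}: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return None  # the caller skips this image and carries on with the story
```

A caller would check for `None` and leave the image out of the output rather than abort the whole download.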

Add option for different line endings in txt format
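One possible shape for this option, as a sketch only. The helper name `write_txt` and its `line_ending` parameter are assumptions for illustration; the idea is to normalize the text first and then let the caller choose the ending.

```python
def write_txt(path, text, line_ending="\n"):
    """Write text with a caller-chosen line ending, e.g. "\\n" or "\\r\\n"."""
    # Normalize whatever the source used to plain "\n" first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # newline="" disables Python's own newline translation on write,
    # so exactly the requested ending reaches the file.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(normalized.replace("\n", line_ending))
```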

fanfic programmers discord

Hello! We have a discord for people coding for fanfics here: https://discord.gg/fP37YMU
You're welcome to join.

Add support for custom CSS files for .epub output

Ebooklib has support for adding custom css to epub files and the default style isn't great. It shouldn't take too much fiddling to get this working, perhaps adding a -css file.css argument to pick the style sheet.

chyoa - Does not stop on 502 Error

Hey, I have just used:
Ebook-Publisher.py https://chyoa.com/story/Contagion-63X---Viral-Transformation.26118 -o epub -o html -o txt -i

the problem is that I got an:
==========
Server returned status code 502 for page: https://chyoa.com/chapter/A-redhead-looking-for-something-to-lick.793955
Could not complete request for page: https://chyoa.com/chapter/A-redhead-looking-for-something-to-lick.793955
Server returned status code 502 for page: https://chyoa.com/chapter/A-young-man-at-the-beach-with-his-sister-and-her-boyfriend.793956
Could not complete request for page: https://chyoa.com/chapter/A-young-man-at-the-beach-with-his-sister-and-her-boyfriend.793956
Server returned status code 502 for page: https://chyoa.com/chapter/A-girl-with-a-love-for-popsicles.793959
Could not complete request for page: https://chyoa.com/chapter/A-girl-with-a-love-for-popsicles.793959
Server returned status code 502 for page: https://chyoa.com/chapter/A-couple-hungry-for-sex-barge-in-on-Pamela.793960
Could not complete request for page: https://chyoa.com/chapter/A-couple-hungry-for-sex-barge-in-on-Pamela.793960
======================================================================================================

This appeared partway through the download, and after that it neither stopped nor completed.
It just kept on producing ==== constantly, for over two hours.

Due to this it's not possible to download this one.

Additionally, it takes a lot of time until this error is thrown, this story has way fewer chapters than some others, but it takes several times more time to get to this error.

I have tried to download it a second time, but it only has produced === for half an hour or so now.
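A sketch of how a bounded retry could stop the endless loop described in this report. Everything named here (`request_with_retries`, `PageUnavailable`, the `send` callable) is hypothetical; `send` is assumed to return an object with a `status_code` attribute, as a `requests` response does.

```python
import time


class PageUnavailable(Exception):
    """Raised when a page keeps returning a server error."""


def request_with_retries(url, send, max_attempts=5, base_delay=2.0):
    """Retry 5xx responses with exponential backoff, then give up loudly."""
    for attempt in range(max_attempts):
        response = send(url)
        if response.status_code < 500:
            return response
        if attempt < max_attempts - 1:
            # Wait 2s, 4s, 8s, ... between attempts (with the default base_delay).
            time.sleep(base_delay * (2 ** attempt))
    raise PageUnavailable(f"{url} still failing after {max_attempts} attempts")
```

Raising after a fixed number of attempts lets the caller decide whether to skip the chapter or abort, instead of retrying forever.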

get error related to internet connection even if my connection is fine

I get this error :

It also causes the program to run indefinitely after the error when multi-threading is enabled (it keeps downloading even after exceeding the number of chapters).

My internet connection is working, but maybe it's not always stable? Maybe add something that retries downloading the page after a minute or so.

Add option to turn off verbosity

Use -q argument to suppress any output to standard out including story titles, authors, descriptions, and progress bars.

Make -n check creation date and last update

Hi,
I would like to recommend that -n or some other flag checks when the Story was last updated and then also checks when the corresponding file on the drive was created.

If there was no update since the creation date it should just do nothing.

This would be better than the current behavior, which only checks whether the file exists.
This should be done quite simply, I would do it myself, but I have no idea of Python.
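The check described in this request could look roughly like the sketch below. The name `needs_download` is hypothetical, and the story's last-update time is assumed to be available as a timezone-aware `datetime`. The file's modification time is used instead of its creation time, because creation time is not available consistently across platforms.

```python
import os
from datetime import datetime, timezone


def needs_download(path, last_updated):
    """Return True if the file is missing or older than the story's last update."""
    if not os.path.exists(path):
        return True
    file_time = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return last_updated > file_time
```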

Allow multiple output formats to be selected with -o argument

This will require hefty editing of the crawlers for Chyoa and possibly Nhentai

Allow multiple URLs from command; Remove necessity for -f option to use a file
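Both this request and the repeated `-o` request above map onto standard `argparse` features. The sketch below is an assumption about how the parser might be laid out, not the project's actual parser; the long option name `--output-type` and the `choices` list are illustrative.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="Ebook-Publisher.py")
    # nargs="*" collects any number of positional URLs into a list.
    parser.add_argument("urls", nargs="*", help="one or more story URLs")
    # action="append" lets -o be repeated: -o epub -o html
    parser.add_argument(
        "-o", "--output-type",
        action="append",
        choices=["txt", "html", "epub"],
        help="output format; may be given more than once",
    )
    return parser
```

With this layout, `build_parser().parse_args(["url1", "url2", "-o", "epub", "-o", "html"])` yields `urls` as a two-item list and `output_type` as `["epub", "html"]`.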

help me please

I'm a complete zero in Python. I downloaded the archive with your script, unpacked it, and ran Ebook-Publisher.py. The command prompt opens for a second and closes immediately. What do I need to do so that I can use your script?

Classic Reader plays formatting issue.

There is no line break between the title of a character and their dialog in the output. This seems to only affect plays on Classic Reader

Chyoa stories missing images in Epub format

Chyoa stories can contain images. These should be added and displayed in the Epub output files.

CHYOA problem

Hello, I seem to have an error when trying to download a story from CHYOA. I get a ModuleNotFoundError: No module named 'requests'.

The story I am trying to download is:

https://chyoa.com/story/The-Gamer%2C-Chyoa-edition.12004

The command I am using is:

python3 Ebook-Publisher.py https://chyoa.com/story/The-Gamer%2C-Chyoa-edition.12004 -o epub -d D:\Documents\

Many thanks for your help.

Add multi-threading to speed up working through a list

The bottleneck in Ebook-Publisher is receiving HTML pages from external servers, so it is possible to get huge speed gains either by requesting all of the pages from each story at once (at the cost of increased strain on the servers), or by concurrently working on different stories, provided they are on different domains.
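A sketch of the second approach, one worker per domain, using the standard library's `ThreadPoolExecutor`. The function names and the `download_story` callable are hypothetical; stories on the same domain are still fetched one after another so that no single server sees concurrent requests.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_domain(urls):
    """Map each domain to the list of story URLs hosted on it."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


def download_all(urls, download_story):
    """Run one thread per domain; stay sequential within a domain."""
    groups = group_by_domain(urls)

    def run_domain(domain_urls):
        return [download_story(u) for u in domain_urls]

    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        batches = list(pool.map(run_domain, groups.values()))
    return [result for batch in batches for result in batch]
```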

(Low Priority) Progress bar does not work when length is longer than spaces in terminal

When number of pages/chapters is larger than the number of characters per line in the terminal/console, the progress bar displays poorly.
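One way around this is to scale the bar to the terminal width instead of printing one character per page. The sketch below assumes a hypothetical `render_bar` helper and uses `shutil.get_terminal_size` from the standard library.

```python
import shutil


def render_bar(done, total, width=None):
    """Return a progress bar that always fits in `width` columns."""
    if width is None:
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    label = f" {done}/{total}"
    # Reserve room for the label and the two brackets.
    bar_width = max(1, width - len(label) - 2)
    filled = bar_width * done // total if total else bar_width
    return "[" + "#" * filled + "-" * (bar_width - filled) + "]" + label
```

Printing the result with a leading `\r` and `end=""` would redraw the bar in place, regardless of how many chapters the story has.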

Low priority.

Chyoa collapsible TOC

I really don't want to do this but it should theoretically be possible to build a function to collapse parts of the TOC in html format

Server Returned Status Code 503 for Page

I'm attempting to download some CHYOA stories, but whenever I try to do so, I receive a Status Code 503 message.

Add update function

Hi, I would like to recommend making it possible to update my downloaded stories automatically.

I thought about it like that:
1.) create a default save directory archive/database
2.) Either add a database, a file with a simple key/value pair, being the Name of the story and the download URL or add it to the metadata of the story. Or whatever is easiest for you.
3.) add the command "update"
4.) On "update", it updates all the stories in the database. I don't know which is easier for you: downloading each story completely again or updating the existing file.
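The key/value file from step 2 could be as simple as a JSON document mapping story names to URLs. This is a sketch under that assumption; the function names and the file layout are hypothetical.

```python
import json
import os


def load_library(path):
    """Return the saved {story name: download URL} mapping, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def remember_story(path, title, url):
    """Record a downloaded story so that a later "update" run can revisit it."""
    library = load_library(path)
    library[title] = url
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(library, fh, indent=2, ensure_ascii=False)
    return library
```

An "update" command would then loop over `load_library(path).items()` and re-run the normal download for each URL.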

RecursionError: maximum recursion depth exceeded while calling a Python object

Hello, when using your program to download some stories from CHYOA.com it works fine but on a larger story I get this error:

RecursionError: maximum recursion depth exceeded while calling a Python object

The story I am downloading from has almost 1000 chapters here:

https://chyoa.com/chapter/Vacation-Week-2-%E2%80%93-A-Path-without-Destination.921081

Is it possible to alter the program to download larger stories or to download them in smaller chunks? E.g. for CHYOA it downloads from the last chapter you want and works backwards: could you tell it to stop at a certain point? Then you could download larger stories in chunks, and it seems like something that could easily be done.

Many thanks for your help!
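The recursion limit can be avoided by walking the chain of previous chapters in a loop rather than through nested calls, and the same loop can stop after a fixed number of pages. The sketch below is an illustration only: `collect_previous_pages` and the `get_previous_url` callable are hypothetical, with `get_previous_url` assumed to return the previous chapter's URL or `None` at the start of the story.

```python
def collect_previous_pages(start_url, get_previous_url, max_pages=None):
    """Walk backwards from start_url without recursion; oldest page first."""
    pages = []
    seen = set()  # guards against links that loop back on themselves
    url = start_url
    while url is not None and url not in seen:
        if max_pages is not None and len(pages) >= max_pages:
            break  # chunk limit reached
        seen.add(url)
        pages.append(url)
        url = get_previous_url(url)
    pages.reverse()
    return pages
```

Because the loop keeps its state in a list instead of the call stack, the number of chapters is limited only by memory, and `max_pages` gives the chunked download asked about here.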

Possible hang waiting for threads to complete while multithreading

Low importance. Master thread may hang waiting for all threads to finish. This may be fixed by switching back to a queue system.

Comparison operator typo in Common.py

I'm a beginner and not proficient in pull requests. So, I'm submitting it manually here:

In Common.py line 101 please fix:
if attempts => 5: to if attempts >= 5:

Just a simple typo from the latest commit 469c534

Improve implementation of quiet option

The --quiet option currently works by rerouting standard out to /dev/null. This was a hasty solution to the mass cluttering caused by using the multi-threaded option. Ideally, it would check for the quiet option each time before printing anything so that messages can bypass the quiet option when needed. This is mostly only a problem with the Chyoa script, as it may need to print and accept input from the terminal (currently worked around, but this causes issues in multi-threaded mode).
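A sketch of the per-message check described here. The `Output` class and its `force` flag are hypothetical names; the point is that each print goes through one place that consults the quiet setting, so prompts can still get through.

```python
import sys


class Output:
    """Route all messages through one object that knows about --quiet."""

    def __init__(self, quiet=False, stream=None):
        self.quiet = quiet
        self.stream = stream if stream is not None else sys.stdout

    def say(self, message, force=False):
        # force=True is for messages that must bypass --quiet,
        # such as a prompt that waits for user input.
        if self.quiet and not force:
            return
        print(message, file=self.stream)
```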

Grab specific branches in CHYOA

Would it be possible to add the option to input a chapter and grab all following chapters? This would be really useful for large stories with big divergent branches where you might still want to grab all the sub-branches. Thanks!

Chyoa insert cover image

Filenames contain forbidden characters in Windows, and won't write

Thanks for developing this wonderful tool, but on Windows I'm having trouble downloading stories containing a " (double quote) or ? (question mark) in their title.

I'm assuming this would also be applicable to stories with an * (asterisk), < (less than) or > (greater than) character, as these are forbidden in Windows file names.

Error reads:
OSError: [Errno 22] Invalid argument: '[directory]\[storyname].html'

Additionally, when trying stories with a : (colon) in the title, rather than give an error on writing, the file seems to stop writing prematurely at the colon. This results in a file of 0kb with a file name of the title up to the colon, and no extension.

(eg. Master PC: Jenny's Tale becomes Master PC, rather than Master PC: Jenny's Tale.html)

Thanks.

Example stories:

"Yes" - https://chyoa.com/chapter/Introduction.37982
Master PC: Jenny's Tale - https://chyoa.com/chapter/Introduction.176462
Party Time? - https://chyoa.com/chapter/Introduction.259702
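A sketch of a filename sanitizer covering the characters Windows forbids. The name `safe_filename` and the underscore replacement are assumptions; reserved device names such as CON or NUL are not handled here.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Replace forbidden characters so the title can be used as a file name."""
    cleaned = _FORBIDDEN.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(" .")
    return cleaned or "untitled"
```

Applied to the example titles above, this gives `_Yes_`, `Master PC_ Jenny's Tale` and `Party Time_`, to which the extension can then be appended.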

Wattpad stories should download and insert images.

Story download from chyoa which contains images is not working

Here is the error I am getting while downloading a story from chyoa that contains images:

Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/ebook-publisher/Ebook-Publisher.py", line 187, in MakeClass
formatsft
File "/ebook-publisher/Ebook-Publisher.py", line 32, in <lambda>
'epub':lambda x:MakeEpub(x),
File "/ebook-publisher/Ebook-Publisher.py", line 158, in MakeEpub
zeros = '0' * (len(str(site.isize))-1)
AttributeError: 'Chyoa' object has no attribute 'isize'

Multiple Chyoa stories with customizable names fail when using multithreading

chyoa - Object has no attribute

If I use:
Ebook-Publisher.py https://chyoa.com/story/Healslut-Adventures.29070 -o epub -o html -i

I get the error:
Traceback (most recent call last):
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 295, in <module>
formats[ft](clas)
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 32, in <lambda>
'epub':lambda x:MakeEpub(x),
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 158, in MakeEpub
zeros = '0' * (len(str(site.isize))-1)
AttributeError: 'Chyoa' object has no attribute 'isize'

It is thrown for html and txt.
It is thrown for epub too, however the epub will be created and filled with content unlike the other two.

Link to previous chapter in epubs for better navigation

Standard input

Program should be able to accept standard input in place of positional URL argument.
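A sketch of the fallback this request describes: use the positional URLs when present, otherwise read one URL per line from standard input. The helper name `read_urls` is hypothetical.

```python
import sys


def read_urls(positional, stream=None):
    """Return URLs from the command line, or from stdin if none were given."""
    if positional:
        return list(positional)
    stream = stream if stream is not None else sys.stdin
    # One URL per line; blank lines are ignored.
    return [line.strip() for line in stream if line.strip()]
```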

Large file sizes

Again, thanks for making this tool! I've been looking for something like this for years. One thing I noticed while playing around with the Chyoa scraper was that it doesn't seem to account for branches that reconverge (e.g. multiple chapters have urls pointing to the same chapter). Right now, the scraper seems to treat each URL as a new chapter, causing it to download the same branches over and over again. It's not a big deal with small stories but it seems to get exponentially larger each time one of these branch merges happens (one of the stories I tested ended up having >50 copies of the same chapter with a final filesize of ~120mb).

Without any knowledge of how your code works, I think you might be able to use a dictionary to record whether a URL has been visited already and pair it with the internal address within the file. If you used the original URL as the key, you could use 'in' to quickly check if a new link already has an internal address saved and point a link towards that instead of redownloading the chapter.
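The dictionary idea can be sketched as follows. This is not the scraper's actual code: `crawl` and the `get_links` callable are hypothetical, and the anchor naming is made up for illustration. Each URL is stored once with its internal anchor, and a link to an already-seen URL reuses that anchor instead of triggering another download.

```python
def crawl(start_url, get_links):
    """Visit each chapter once; return visit order and url -> anchor mapping."""
    anchors = {}  # original URL -> internal address within the output file
    order = []
    stack = [start_url]
    while stack:
        url = stack.pop()
        if url in anchors:
            continue  # branch reconverged: point at the existing anchor
        anchors[url] = f"chapter-{len(anchors) + 1}"
        order.append(url)
        # Reverse so the first child link is processed first.
        stack.extend(reversed(get_links(url)))
    return order, anchors
```

For a story where chapters `b` and `c` both link to `d`, `d` is fetched once and both links resolve to `anchors["d"]`.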

Better support for long CHYOA stories (The Gamer)

As far as I can tell it's currently impossible to download The Gamer with Ebook Publisher. If I start from the end, the story is currently 1036 chapters long and fails at roughly chapter 968 due to exceeding the recursion limit. This was marked as "wontfix" in issue #73. If you use the new "--chyoa-force-forwards" option, it creates an ebook file with crazily long chapter file names. The name for chapter 1036 would be over 1000 characters long with the 1.1.1.1.1.... etc. naming method. This breaks the Windows file path limits and also broke Calibre and Sigil when I tried to open the file. The book stops once the file paths get too long.

My suggestion is to have a chapter depth variable that can be passed. This way I could download the book 100 chapters at a time and then merge them together (or just have Vol 1,2,3). This would also prevent file path too long errors.

Progress bar fails on very, very long stories (Chyoa)

Test URL: https://chyoa.com/chapter/Introduction.331257
Over 2500 pages to scrape, so it's heavily limited by IO. That said, there shouldn't be any reason the progress bar fails for this example.

Replace ebooklib with a free alternative

Ebooklib must be installed externally and cannot be included due to its license. By replacing it with a reverse engineered library, we can have all dependencies included and potentially have a more customizable experience.

Support for mcstories.com

It would be great if support for mcstories.com could be added.

Chyoa - UnicodeEncodeError

Hi,
For some stories, I get an error when I request txt, html and epub output. The epub and html files are produced normally, but the text output throws this error and the txt file only contains the story and author name.

Traceback (most recent call last):
File "Ebook-Publisher.py", line 295, in <module>
formats[ft](clas)
File "Ebook-Publisher.py", line 33, in <lambda>
'html':lambda x:MakeHTML(x),
File "Ebook-Publisher.py", line 81, in MakeHTML
published.write('<h2 id = "'+site.depth[i-1]+'">'+site.chapters[i]+'\n</h2>\n'+str(site.rawstoryhtml[i]))
File "C:\myProgramme\Python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u202d' in position 512: character maps to <undefined>
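The traceback ends in `cp1252.py`, which suggests the output file was opened with the Windows default encoding. A likely remedy, sketched here with a hypothetical `write_output` helper rather than the project's own code, is to pass an explicit `encoding="utf-8"` to `open`.

```python
def write_output(path, text):
    """Write story text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as published:
        published.write(text)
```

UTF-8 can represent every Unicode character, including the `\u202d` from the error message, so the write no longer depends on the system code page.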

Add the option to retrieve metadata from www.wlnupdates.com or www.novelupdates.com

Can you add the option to retrieve metadata from www.wlnupdates.com or www.novelupdates.com ? So that all epub metadata fields are compiled when it saves it?

Add GUI for fun

Follows a link that goes back to the beginning infinitely (until max recursion depth) when downloading from chyoa.com

In this story the program follows the link from this chapter infinitely.

Chyoa - Add indents for epub

Hey,
I would like to recommend that you add indents for chyoa stories in epub too like you do in html.
It's way easier to see what chapters belong together this way.

Please refactor code

Code is a mess, please clean up. You know what you need to do

help!

what i give:

C:\Users\bn>Ebook-Publisher.py https://chyoa.com/chapter/namechapter -o epub -o html -d D:\test

what i get:

D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\bs4\element.py:15: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.
warnings.warn(
The Gamer, Chyoa edition.
[' Funatic']
When he turned 18, John Newman received a gift from Gaia the world spirit. Starting now his whole life would become a video game. Follow him as he discovers his new powers and use them for his own purposes. Unlike what happens in the original The Gamer has some other priorities and will develop his powers to have a lot of fun with the ladies around him.
968/1117 86%Traceback (most recent call last):
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 323, in <module>
clas=MakeClass(i)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 195, in MakeClass
site=sitesdomain
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 19, in <lambda>
'chyoa.com':lambda x:Chyoa.Chyoa(x),
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 169, in __init__
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
[Previous line repeated 964 more times]
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 326, in AddPrevPage
page = Common.RequestPage(url)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Common.py", line 139, in RequestPage
response = RequestSend(url, headers)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Common.py", line 133, in RequestSend
response = requests.get(url)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 677, in send
history = [resp for resp in gen]
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 677, in <listcomp>
history = [resp for resp in gen]
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 237, in resolve_redirects
resp = self.send(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1371, in getresponse
response.begin()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 338, in begin
self.headers = self.msg = parse_headers(self.fp)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 237, in parse_headers
return email.parser.Parser(_class=_class).parsestr(hstring)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\parser.py", line 67, in parsestr
return self.parse(StringIO(text), headersonly=headersonly)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\parser.py", line 56, in parse
feedparser.feed(data)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 176, in feed
self._call_parse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 180, in _call_parse
self._parse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 295, in _parsegen
if self._cur.get_content_maintype() == 'message':
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 594, in get_content_maintype
ctype = self.get_content_type()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 578, in get_content_type
value = self.get('content-type', missing)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 471, in get
return self.policy.header_fetch_parse(k, v)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\_policybase.py", line 316, in header_fetch_parse
return self._sanitize_header(name, value)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\_policybase.py", line 287, in _sanitize_header
if _has_surrogates(value):
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\utils.py", line 57, in _has_surrogates
s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object

I don't speak English or programming. HELP

Add HTML output format

HTML output should easily allow the outputted files to be more easily converted to other formats via pandoc or similar programs

--chyoa-update does not work

As the title says.
The update check does not seem to work.
When I download a story completely and run the same download again five minutes later, it is downloaded again. Also, it does not print the message output.

Chyoa - never ending Cycle

Story: Viral Transformation.

When I try to download this story it "downloads" it for roughly 20 minutes, although it's a short story. I have the feeling that it visits the same chapters several times.
After that time, I get errors in between the chapter numbers, and then it keeps downloading for longer. At some point it just stops and does nothing anymore.

I guess that, for some reason, it goes back to older chapters.

ummm, help?
