theslavicbear / ebook-publisher

A Python tool for converting online stories into portable formats

License: MIT License

Python 100.00%
ebook-downloader ebook epub fanfiction fictionpress python converting-online-stories story

ebook-publisher's People

Contributors

theslavicbear

Forkers

ismail-mirza

ebook-publisher's Issues

Ongoing Chyoa linked chapter issues

Duplicate chapters may still appear and recurse into their child chapters in Chyoa stories when the linked chapters are on the first chapter.
The first duplicate chapter will still appear with child links to chapters that won't work.

Add support for custom CSS files for .epub output

Ebooklib has support for adding custom CSS to epub files, and the default style isn't great. It shouldn't take too much fiddling to get this working, perhaps by adding a -css file.css argument to pick the style sheet.

chyoa - Does not stop on 502 Error

Hey, I have just used:
Ebook-Publisher.py https://chyoa.com/story/Contagion-63X---Viral-Transformation.26118 -o epub -o html -o txt -i

The problem is that I got:
==========
Server returned status code 502 for page: https://chyoa.com/chapter/A-redhead-looking-for-something-to-lick.793955
Could not complete request for page: https://chyoa.com/chapter/A-redhead-looking-for-something-to-lick.793955
Server returned status code 502 for page: https://chyoa.com/chapter/A-young-man-at-the-beach-with-his-sister-and-her-boyfriend.793956
Could not complete request for page: https://chyoa.com/chapter/A-young-man-at-the-beach-with-his-sister-and-her-boyfriend.793956
Server returned status code 502 for page: https://chyoa.com/chapter/A-girl-with-a-love-for-popsicles.793959
Could not complete request for page: https://chyoa.com/chapter/A-girl-with-a-love-for-popsicles.793959
Server returned status code 502 for page: https://chyoa.com/chapter/A-couple-hungry-for-sex-barge-in-on-Pamela.793960
Could not complete request for page: https://chyoa.com/chapter/A-couple-hungry-for-sex-barge-in-on-Pamela.793960
======================================================================================================

This appeared in the middle of the download, and after that it neither stopped nor completed.
It just kept printing ==== constantly, for over two hours.

Because of this, it's not possible to download this story.

Additionally, it takes a long time until this error is thrown. This story has far fewer chapters than some others, but it takes several times longer to reach this error.

I have tried to download it a second time, but so far it has only printed === for half an hour or so.

get error related to internet connection even if my connection is fine

I get this error:

image

It also causes the program to run indefinitely after the error when multi-threading is enabled (it keeps downloading even after exceeding the number of chapters).

My internet connection is working, but maybe it is not always stable? Maybe add something to retry downloading the page after a minute or so.
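The retry idea above could look roughly like the sketch below. It is not the project's code: `fetch_with_retry`, the `fetch` callable, the cap of five attempts and the 60-second delay are all assumptions made for illustration. The point is that the loop gives up after a fixed number of attempts instead of running forever.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=5, delay=60.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is reached.

    `fetch` is a hypothetical callable that returns the page body or raises
    an exception on a failed request (for example a 502 response).
    """
    attempts = 0
    while True:
        try:
            return fetch(url)
        except Exception as error:
            attempts += 1
            if attempts >= max_attempts:
                # Give up instead of looping forever.
                raise RuntimeError(
                    f"Could not complete request for page: {url}"
                ) from error
            # Wait before the next attempt.
            sleep(delay)
```

Passing `sleep` as a parameter only makes the helper easy to try out without waiting; a caller would normally leave it at the default.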

Make -n check creation date and last update

Hi,
I would like to suggest that -n, or some other flag, check when the story was last updated and compare that with the creation date of the corresponding file on disk.

If there has been no update since the file was created, it should do nothing.

This would be better than the current behavior, which only checks whether the file exists.
This should be fairly simple to do. I would do it myself, but I don't know Python.
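A minimal sketch of the check described above, under stated assumptions: `needs_download` is a hypothetical helper, and the story's last-update time is assumed to be available already as a timezone-aware `datetime` (how it would be read from the site is not covered). The file's modification time stands in for the creation date, since creation time is not reported consistently across platforms.

```python
import os
from datetime import datetime, timezone


def needs_download(path, last_updated):
    """Return True when the story should be downloaded (again).

    `last_updated` is a timezone-aware datetime for the story's latest
    update; obtaining it from the site is outside this sketch.
    """
    if not os.path.exists(path):
        return True
    # Modification time is used because creation time is not portable.
    saved_at = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return last_updated > saved_at
```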

help me please

I'm a complete beginner in Python. I downloaded the archive with your script, unpacked it, and ran Ebook-Publisher.py. The command prompt opens for a second and closes immediately. What do I need to do so that I can use your script?

Add multi-threading to speed up working through a list

The bottleneck in Ebook-Publisher is receiving HTML pages from external servers, so it is possible to get huge speed gains either by requesting all of the pages from each story at once (which increases strain on the servers), or by working concurrently on different stories, given that they are in different domains.
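The second option (different stories concurrently, one domain at a time) might be sketched as follows. `group_by_domain`, `download_all` and the `download_story` callable are hypothetical names, not part of Ebook-Publisher. Stories on the same domain are handled one after another by a single worker, so no server sees more than one request stream.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_domain(urls):
    """Map each domain to the story URLs hosted on it, preserving order."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


def download_all(urls, download_story, max_workers=4):
    """Run one task per domain; stories on the same domain stay sequential.

    `download_story` is a hypothetical callable that fetches and converts a
    single story.
    """
    def work(domain_urls):
        return [download_story(url) for url in domain_urls]

    groups = group_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, groups.values()))
    # Flatten the per-domain result lists.
    return [item for group in results for item in group]
```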

Chyoa collapsible TOC

I really don't want to do this, but it should theoretically be possible to build a function to collapse parts of the TOC in HTML format.

Add update function

Hi, I would like to suggest making it possible to update my downloaded stories automatically.

I thought about it like this:
1.) Create a default save directory (archive/database).
2.) Either add a database (a file with simple key/value pairs of story name and download URL) or add the URL to the metadata of the story, whichever is easiest for you.
3.) Add the command "update".
4.) Running update updates all the stories in the database. I don't know which is easier for you: downloading each story completely or updating the existing file.
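The key/value file from the steps above could be as small as a JSON document. The sketch below is one possible shape, not an existing feature: the function names, the JSON format and the `download_story` callable are all assumptions.

```python
import json
import os


def load_library(path):
    """Return the saved {story name: URL} mapping, or an empty one."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def remember_story(path, name, url):
    """Record a downloaded story so a later update run can find it."""
    library = load_library(path)
    library[name] = url
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(library, handle, indent=2, ensure_ascii=False)


def update_all(path, download_story):
    """Download every recorded story again via the hypothetical callable."""
    return [download_story(url) for url in load_library(path).values()]
```

This version simply downloads each story again; updating an existing file in place would need per-chapter bookkeeping that is not sketched here.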

RecursionError: maximum recursion depth exceeded while calling a Python object

Hello, when using your program to download some stories from CHYOA.com it works fine but on a larger story I get this error:

RecursionError: maximum recursion depth exceeded while calling a Python object

The story I am downloading from has almost 1000 chapters here:

https://chyoa.com/chapter/Vacation-Week-2-%E2%80%93-A-Path-without-Destination.921081

Is it possible to alter the program to download larger stories, or to download them in smaller chunks? E.g. for CHYOA it downloads from the last chapter you want and works backwards: could you tell it to stop at a certain point? Then you could download larger stories in chunks, and it seems like something that could easily be done.

Many thanks for your help!
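For reference, walking backwards with a loop rather than recursion avoids the recursion limit entirely, and a stop point is then easy to add. This is only a sketch: `collect_backwards`, `get_previous`, `stop_url` and `max_chapters` are hypothetical names, and the real scraper may be structured quite differently.

```python
def collect_backwards(start_url, get_previous, stop_url=None, max_chapters=None):
    """Follow "previous chapter" links with a loop instead of recursion.

    `get_previous` is a hypothetical callable that returns the URL of the
    preceding chapter, or None at the first chapter.
    """
    chain = []
    url = start_url
    while url is not None:
        chain.append(url)
        if url == stop_url:
            break  # reached the requested stopping point
        if max_chapters is not None and len(chain) >= max_chapters:
            break  # chunk is full
        url = get_previous(url)
    # Reverse so the earliest chapter comes first.
    chain.reverse()
    return chain
```

Because nothing is called recursively, the length of the chain is limited by memory, not by Python's recursion depth.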

Comparison operator typo in Common.py

I'm a beginner and not proficient in pull requests. So, I'm submitting it manually here:

In Common.py line 101 please fix:
if attempts => 5: to if attempts >= 5:

Just a simple typo from the latest commit 469c534

Improve implementation of quiet option

The --quiet option currently works by rerouting standard out to /dev/null. This was a hasty solution to the mass cluttering caused by using the multi-threaded option. Ideally, it would check for the quiet option each time before printing anything so that messages can bypass the quiet option when needed. This is mostly only a problem with the Chyoa script, as it may need to print and accept input from the terminal (currently worked around, but this causes issues in multi-threaded mode).
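One way to check the quiet option per message, instead of redirecting standard out, is to route all output through a small helper. The `Console` class below is a hypothetical sketch, not the current implementation; which messages count as important is left to the caller.

```python
import sys


class Console:
    """Route messages through one place so quiet mode is checked per call."""

    def __init__(self, quiet=False, stream=None):
        self.quiet = quiet
        self.stream = stream if stream is not None else sys.stdout

    def info(self, message):
        # Routine progress output is dropped in quiet mode.
        if not self.quiet:
            print(message, file=self.stream)

    def important(self, message):
        # Prompts and errors bypass quiet mode.
        print(message, file=self.stream)
```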

Grab specific branches in CHYOA

Would it be possible to add the option to input a chapter and grab all following chapters? This would be really useful for large stories with big divergent branches where you might still want to grab all the sub-branches. Thanks!

Filenames contain forbidden characters in Windows, and won't write

Thanks for developing this wonderful tool, but on Windows I'm having trouble downloading stories containing a " (double quote) or ? (question mark) in their title.

I'm assuming this would also be applicable to stories with an * (asterisk), < (less than) or > (greater than) character, as these are forbidden in Windows file names.

Error reads:
OSError: [Errno 22] Invalid argument: '[directory]\[storyname].html'

Additionally, when trying stories with a : (colon) in the title, rather than give an error on writing, the file seems to stop writing prematurely at the colon. This results in a file of 0 KB with a file name of the title up to the colon, and no extension.

(e.g. Master PC: Jenny's Tale becomes Master PC, rather than Master PC: Jenny's Tale.html)

Thanks.

Example stories:

"Yes" - https://chyoa.com/chapter/Introduction.37982
Master PC: Jenny's Tale - https://chyoa.com/chapter/Introduction.176462
Party Time? - https://chyoa.com/chapter/Introduction.259702
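A possible fix is to replace the forbidden characters before building the file name. The sketch below is an assumption about how that could be done, not the project's code; `safe_filename` and the underscore replacement are made up for illustration, and reserved device names such as CON are not handled.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control codes.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Return a title that is safe to use as a Windows file name."""
    cleaned = _FORBIDDEN.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    return cleaned or "untitled"
```

With this helper, Master PC: Jenny's Tale would be written as Master PC_ Jenny's Tale, with the extension appended afterwards.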

Story download from chyoa which contains images is not working

Here is the error I am getting while downloading a story from chyoa which contains images:

Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/ebook-publisher/Ebook-Publisher.py", line 187, in MakeClass
formatsft
File "/ebook-publisher/Ebook-Publisher.py", line 32, in <lambda>
'epub':lambda x:MakeEpub(x),
File "/ebook-publisher/Ebook-Publisher.py", line 158, in MakeEpub
zeros = '0' * (len(str(site.isize))-1)
AttributeError: 'Chyoa' object has no attribute 'isize'

chyoa - Object has no attribute

If I use:
Ebook-Publisher.py https://chyoa.com/story/Healslut-Adventures.29070 -o epub -o html -i

I get the error:
Traceback (most recent call last):
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 295, in <module>
formats[ft](clas)
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 32, in <lambda>
'epub':lambda x:MakeEpub(x),
File "E:\Firefox Downloads\Ebook-Publisher-3.1.3\Ebook-Publisher.py", line 158, in MakeEpub
zeros = '0' * (len(str(site.isize))-1)
AttributeError: 'Chyoa' object has no attribute 'isize'

The error is thrown for html and txt.
It is thrown for epub too, but the epub is still created and filled with content, unlike the other two.

Standard input

Program should be able to accept standard input in place of positional URL argument.
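A sketch of how the fallback could work: use the positional arguments when present, otherwise read one URL per line from standard input. `collect_urls` is a hypothetical helper and is not tied to the program's actual argument parsing.

```python
import sys


def collect_urls(arguments, stream=None):
    """Return URLs from the command line, falling back to standard input."""
    if arguments:
        return list(arguments)
    stream = stream if stream is not None else sys.stdin
    # One URL per line; blank lines are ignored.
    return [line.strip() for line in stream if line.strip()]
```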

Large file sizes

Again, thanks for making this tool! I've been looking for something like this for years. One thing I noticed while playing around with the Chyoa scraper was that it doesn't seem to account for branches that reconverge (e.g. multiple chapters have URLs pointing to the same chapter). Right now, the scraper seems to treat each URL as a new chapter, causing it to download the same branches over and over again. It's not a big deal with small stories, but it seems to get exponentially larger each time one of these branch merges happens (one of the stories I tested ended up having >50 copies of the same chapter with a final file size of ~120 MB).

Without any knowledge of how your code works, I think you might be able to use a dictionary to record whether a URL has been visited already and pair it with the internal address within the file. If you used the original URL as the key, you could use 'in' to quickly check if a new link already has an internal address saved and point a link towards that instead of redownloading the chapter.
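The dictionary idea can be sketched as below, written without knowledge of the scraper's internals. `collect_chapters`, the `get_links` callable and the `chapter-N` anchor format are all hypothetical. Each URL is visited once, so reconverging branches (and cycles) no longer multiply the output.

```python
def collect_chapters(start_url, get_links):
    """Visit every reachable chapter once and record its internal anchor.

    `get_links` is a hypothetical callable returning the child chapter URLs
    of a page. The anchor format is also made up for illustration.
    """
    anchors = {}  # URL -> internal address within the output file
    pending = [start_url]
    while pending:
        url = pending.pop()
        if url in anchors:
            # Already downloaded: link to the saved anchor instead.
            continue
        anchors[url] = f"chapter-{len(anchors) + 1}"
        # Reversed so children are visited in their original order.
        pending.extend(reversed(get_links(url)))
    return anchors
```

A link to an already-seen URL can then point at `anchors[url]` rather than triggering another download.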

Better support for long CHYOA stories (The Gamer)

As far as I can tell it's currently impossible to download The Gamer with Ebook Publisher. If I start from the end, the story is currently 1036 chapters long and fails at roughly chapter 968 due to exceeding the recursion limit. This was marked as "wontfix" in issue #73. If you use the new "--chyoa-force-forwards" option it creates an ebook file with extremely long chapter file names. Chapter 1036 would be over 1000 characters long with the 1.1.1.1.1.... etc naming method. This breaks the Windows file path limits and also breaks Calibre and Sigil when I tried to open the file. The book stops once the file paths get too long.

My suggestion is to have a chapter depth variable that can be passed. This way I could download the book 100 chapters at a time and then merge them together (or just have Vol 1,2,3). This would also prevent file path too long errors.
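A depth limit like the one suggested could be sketched as follows. `chapters_within_depth` and `get_links` are hypothetical names, and the start chapter is assumed to have depth 0. Chapters found at exactly `max_depth` could serve as starting points for the next volume.

```python
def chapters_within_depth(start_url, get_links, max_depth):
    """Return (url, depth) pairs no deeper than max_depth below start_url.

    `get_links` is a hypothetical callable returning a page's child chapter
    URLs; the start chapter has depth 0.
    """
    found = []
    seen = {start_url}
    pending = [(start_url, 0)]
    while pending:
        url, depth = pending.pop()
        found.append((url, depth))
        if depth == max_depth:
            continue  # do not descend past the requested depth
        for child in reversed(get_links(url)):
            if child not in seen:
                seen.add(child)
                pending.append((child, depth + 1))
    return found
```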

Replace ebooklib with a free alternative

Ebooklib must be installed externally and cannot be included due to its license. By replacing it with a reverse engineered library, we can have all dependencies included and potentially have a more customizable experience.

Chyoa - UnicodeEncodeError

Hi,
For some stories I get an error when using txt, html and epub output. The epub and html files are produced normally, but the txt output throws this error, and the txt file only contains the story and author name.

Traceback (most recent call last):
File "Ebook-Publisher.py", line 295, in <module>
formats[ft](clas)
File "Ebook-Publisher.py", line 33, in <lambda>
'html':lambda x:MakeHTML(x),
File "Ebook-Publisher.py", line 81, in MakeHTML
published.write('<h2 id = "'+site.depth[i-1]+'">'+site.chapters[i]+'\n</h2>\n'+str(site.rawstoryhtml[i]))
File "C:\myProgramme\Python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u202d' in position 512: character maps to <undefined>
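The traceback shows the text being encoded with cp1252, the default on this Windows setup, which cannot represent '\u202d'. A likely remedy is to open output files with an explicit UTF-8 encoding. The helper below is a hypothetical sketch of that, not the project's actual write path.

```python
def write_text(path, text):
    """Write output as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as published:
        published.write(text)
```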

Chyoa - Add indents for epub

Hey,
I would like to suggest that you add indents for chyoa stories in epub too, like you do in html.
It's way easier to see which chapters belong together this way.

help!

what i give:

C:\Users\bn>Ebook-Publisher.py https://chyoa.com/chapter/namechapter -o epub -o html -d D:\test

what i get:

D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\bs4\element.py:15: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.
warnings.warn(
The Gamer, Chyoa edition.
[' Funatic']
When he turned 18, John Newman received a gift from Gaia the world spirit. Starting now his whole life would become a video game. Follow him as he discovers his new powers and use them for his own purposes. Unlike what happens in the original The Gamer has some other priorities and will develop his powers to have a lot of fun with the ladies around him.
968/1117 86%
Traceback (most recent call last):
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 323, in <module>
clas=MakeClass(i)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 195, in MakeClass
site=sitesdomain
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Ebook-Publisher.py", line 19, in <lambda>
'chyoa.com':lambda x:Chyoa.Chyoa(x),
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 169, in __init__
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 351, in AddPrevPage
self.AddPrevPage(i.get('href'))
[Previous line repeated 964 more times]
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Chyoa.py", line 326, in AddPrevPage
page = Common.RequestPage(url)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Common.py", line 139, in RequestPage
response = RequestSend(url, headers)
File "D:\téléchargements\Ebook-Publisher-master\Ebook-Publisher-master\Site\Common.py", line 133, in RequestSend
response = requests.get(url)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 677, in send
history = [resp for resp in gen]
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 677, in <listcomp>
history = [resp for resp in gen]
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 237, in resolve_redirects
resp = self.send(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1371, in getresponse
response.begin()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 338, in begin
self.headers = self.msg = parse_headers(self.fp)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 237, in parse_headers
return email.parser.Parser(_class=_class).parsestr(hstring)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\parser.py", line 67, in parsestr
return self.parse(StringIO(text), headersonly=headersonly)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\parser.py", line 56, in parse
feedparser.feed(data)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 176, in feed
self._call_parse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 180, in _call_parse
self._parse()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\feedparser.py", line 295, in _parsegen
if self._cur.get_content_maintype() == 'message':
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 594, in get_content_maintype
ctype = self.get_content_type()
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 578, in get_content_type
value = self.get('content-type', missing)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\message.py", line 471, in get
return self.policy.header_fetch_parse(k, v)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\_policybase.py", line 316, in header_fetch_parse
return self._sanitize_header(name, value)
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\_policybase.py", line 287, in _sanitize_header
if _has_surrogates(value):
File "C:\Users\bn\AppData\Local\Programs\Python\Python39\lib\email\utils.py", line 57, in _has_surrogates
s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object

i don't speak english programam. HELP

Add HTML output format

HTML output should make it easier to convert the output files to other formats via pandoc or similar programs.

--chyoa-update does not work

As the title says.
The update check does not seem to work.
When I download a story completely and then again five minutes later, it is downloaded again. Also, it does not print the message output.

Chyoa - never ending Cycle

Story: Viral Transformation.

When I try to download this story, it "downloads" for roughly 20 minutes, although it's a short story. I have the feeling that it visits the same chapters several times.
After that time, I get errors in between the chapter numbers, and then it keeps downloading for longer. At some point it just stops and does nothing anymore.

I guess that for some reason it goes back to older chapters.

ummm, help?
