domenic / worm-scraper

Scrapes the web serial Worm and its sequel Ward into an eBook format

License: Other

JavaScript 98.11% HTML 1.89%

worm epub ward ebook-downloader ebook

worm-scraper's Issues

line 1: use strict: command not found

FYI, on my Mac (10.12.5), after installing Node.js (6.11 LTS), worm-scraper gave this:

$ worm-scraper --help
/usr/local/bin/worm-scraper: line 1: use strict: command not found

The solution was to add a shebang line to worm-scraper, per https://stackoverflow.com/a/34354713.
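The shebang line (#!/usr/bin/env node) has to be the very first line of the script. It makes the file run under Node.js instead of being interpreted by the shell, and shell interpretation is what turns "use strict" into a "command not found" error. Here is a minimal sketch of adding the line only when it is missing. ensureNodeShebang is a hypothetical helper, not part of worm-scraper:

```javascript
"use strict";

const NODE_SHEBANG = "#!/usr/bin/env node";

// Hypothetical helper: return the script source with a Node.js shebang as its
// first line, adding one only if the file has no interpreter line yet.
// Without it, a Unix shell interprets the file itself, and line 1
// ("use strict") is reported as "command not found".
function ensureNodeShebang(source) {
  if (source.startsWith("#!")) {
    return source; // already has an interpreter line; leave it alone
  }
  return `${NODE_SHEBANG}\n${source}`;
}
```

Because the helper leaves an existing interpreter line untouched, it is safe to run more than once.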

Get rid of pseudo trigger warning

Capitalization of "wretch"

"Wretch" is inconsistently capitalized in Ward.

It's mostly un-capitalized through Shadow 5.8. (However, there are exceptions, such as Shadow 5.4.)

Starting in Shadow 5.9, it's mostly capitalized. However, there are many exceptions; eyeballing the search results, I would guess 30 or so.

There doesn't appear to be a pattern. For example, you could imagine that when used as a proper name, it's capitalized, and otherwise it's not. But both "the wretch" and "the Wretch" are often seen, even in later chapters. There is one instance of "Wretch" (no "the") in Pitch 6.6, but it appears to be the exception, and probably should be fixed.

I'm unsure whether this shift in capitalization represents a narratively significant change in how Victoria thinks, an author style update that wasn't back-applied to earlier chapters, or something else.

A few options are:

Always un-capitalize. This generally fits better with English grammar, since it's not a proper name, and is often prefixed with "the".
Always capitalize, i.e., treat "the Wretch" as an entity whose proper name somehow includes a lowercase "the". This might best preserve authorial intent if we assume that "the Wretch" is the intended name throughout, and the author just never went back and corrected earlier instances.
Enforce capitalization after 5.8, fixing the ~30 un-capitalized instances. The idea here would be that the capitalization represents a narratively significant shift in how Victoria thinks of the wretch/Wretch, and we assume that instances left lowercase later in the book were accidents. It seems weird to use capitalization this way (i.e., it seems like it's just going to cause the reader's eyes to stumble each time, instead of making them see the Wretch as more of a named entity), but it's possible this best preserves authorial intent.
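Whichever option is chosen, counting both forms per chapter would make it easier to check the "30 or so" estimate. Option 1 can then be applied with a narrow replacement. This is a minimal sketch with hypothetical helper names. It only rewrites "the Wretch"/"The Wretch", so a bare "Wretch" (such as the Pitch 6.6 instance) is left for manual review:

```javascript
"use strict";

// Hypothetical helper: tally both spellings in one chapter's text.
// The capitalized count includes sentence-initial uses, so treat it as a rough guide.
function countWretch(text) {
  const capitalized = (text.match(/\bWretch\b/g) || []).length;
  const lowercase = (text.match(/\bwretch\b/g) || []).length;
  return { capitalized, lowercase };
}

// Sketch of option 1 (always un-capitalize): only "the Wretch" / "The Wretch"
// is rewritten, keeping the article's own capitalization intact.
function uncapitalizeWretch(text) {
  return text.replace(/\b([Tt]he) Wretch\b/g, "$1 wretch");
}
```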

Trailing spaces before </em>
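One possible cleanup during conversion is to move whitespace that sits just inside a closing </em> to just outside it, so the space is no longer italicized. This is sketched below with a hypothetical function name:

```javascript
"use strict";

// Hypothetical cleanup step: "<em>word </em>next" becomes "<em>word</em> next".
// Any run of whitespace directly before </em> is moved after the closing tag.
function moveTrailingSpacesOutOfEm(html) {
  return html.replace(/(\s+)<\/em>/g, "</em>$1");
}
```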

Error: ENOENT: no such file or directory, rmdir

I tried running worm-scraper download convert scaffold zip, but after downloading the last chapter, it throws the error: Error: ENOENT: no such file or directory, rmdir '/Users/vigneshwar/Desktop/staging/worm/OEBPS/chapters'

Generated epub not compatible with Play Books

The generated epub is not compatible with Google Play Books.

An online epub validator (https://www.ebookit.com/tools/bp/Bo/eBookIt/epub-validator) points out possible errors. There are several on chapter 79, some on 211, 249 and 275.

It also seems to have some problem with the cover and the img tag.

The only two fatal errors are in chapter 79, probably due to some tag that was not closed properly:

FATAL(RSC-016): ./books/Bo/databases/eBookIt/temp_uploads/1609380837.epub/OEBPS/chapters/chapter079.xhtml(210,6): Fatal Error while parsing file: The element type "p" must be terminated by the matching end-tag "</p>".

ERROR(RSC-005): ./books/Bo/databases/eBookIt/temp_uploads/1609380837.epub/OEBPS/chapters/chapter079.xhtml(-1,-1): Error while parsing file: The element type "p" must be terminated by the matching end-tag "</p>".

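Since the validator blames an unclosed <p> in chapter 79, a quick local check could flag such tags before uploading. Below is a rough sketch with a hypothetical function name. It is a regex scan, not a real XML parser: it does not skip comments or CDATA, and it assumes attribute values contain no ">".

```javascript
"use strict";

// Rough tag-balance check over XHTML text. Returns a list of problems,
// e.g. ["unclosed <p>"]; an empty list means no imbalance was found.
function findUnclosedTags(xhtml) {
  const stack = [];
  const problems = [];
  // Matches <name ...>, </name> and self-closing <name .../>.
  const tagPattern = /<(\/?)([a-zA-Z][\w:-]*)[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(xhtml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue; // e.g. <br/>, <img .../>
    if (!closing) {
      stack.push(name);
      continue;
    }
    const index = stack.lastIndexOf(name);
    if (index === -1) {
      problems.push(`stray </${name}>`);
    } else {
      // Everything opened after <name> was never closed.
      for (const unclosed of stack.splice(index).slice(1)) {
        problems.push(`unclosed <${unclosed}>`);
      }
    }
  }
  // Tags still open at the end of the file were never closed either.
  for (const unclosed of stack) {
    problems.push(`unclosed <${unclosed}>`);
  }
  return problems;
}
```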
Issue with running convert.js when no pre-existing "staging" path

When running worm-scraper convert ... without a pre-existing "staging" path (i.e., on a fresh run), it fails and prints this warning:
(node:6172) [DEP0147] DeprecationWarning: In future versions of Node.js, fs.rmdir(path, { recursive: true }) will be removed. Use fs.rm(path, { recursive: true }) instead
This means that, with the recursive: true option, fs.rmdir no longer tolerates a path that does not exist.

The fix is simply to change line 90 of worm-scraper.js to the following:
return fs.rm(chaptersPath, { force: true, recursive: true, maxRetries: 3 });
That made it work for me.
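Here is a self-contained sketch of the same fix. It assumes Node.js 14.14 or later, where fs.rm and fs.promises.rm were added. clearChapters is a hypothetical wrapper, and chaptersPath stands for the staging chapters directory:

```javascript
"use strict";
const fs = require("fs");

// Remove the staging chapters directory and everything in it.
// force: true turns a missing path into a no-op instead of an ENOENT error,
// so a fresh run without a staging directory no longer fails.
function clearChapters(chaptersPath) {
  return fs.promises.rm(chaptersPath, { force: true, recursive: true, maxRetries: 3 });
}
```

The force: true option is what fixes the fresh-run case. Per the Node.js documentation, maxRetries only applies when recursive is true, and only to transient errors such as EBUSY or EPERM.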

Adapt to other books

Not really an issue, since this does exactly what it says on the tin, but how would I go about making this work for the author's other books, like the Worm sequel?

`ENOENT: no such file or directory, rmdir '/Users/brachbach/worm/staging/worm/OEBPS/chapters'` for `worm-scraper convert`

I didn't have that directory at the point at which I ran worm-scraper convert (not sure if I was supposed to somehow have it, or what).

I hacked around it for now by creating the directory manually before running the command.

But I think the script should check whether the directory exists and only attempt to remove it if it does.
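Here is a minimal sketch of that check. It assumes Node.js 14.14 or later for fs.rmSync, and removeChaptersDir is a hypothetical name:

```javascript
"use strict";
const fs = require("fs");

// Remove the chapters directory only if it exists.
// Returns true if something was removed, false on a fresh run.
function removeChaptersDir(chaptersPath) {
  if (!fs.existsSync(chaptersPath)) {
    return false; // nothing staged yet, so nothing to remove
  }
  fs.rmSync(chaptersPath, { recursive: true });
  return true;
}
```

Checking first leaves a small window between the check and the removal. Passing force: true to fs.rm() avoids the need for the check altogether.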

Twig

In attempting to use worm-scraper with Twig, I encountered the following issue:

When I passed the --start-url parameter incorrectly, it began to download Worm by default. After that, it would always download Worm, and I assumed I was still passing the start URL incorrectly.

Clearing the cache folder resolved the problem.

This appears to be related to starting from the latest position in the existing manifest.
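The behavior above suggests that a run resumes from the cached manifest regardless of which start URL was requested. Below is a minimal sketch of a guard. It assumes a hypothetical manifest that records the start URL it was created with. Both this shape and chooseStartUrl are illustrations, not worm-scraper's actual format or code.

```javascript
"use strict";

// Decide where a download run should begin.
// Assumed manifest shape: { startURL, chapters: [{ url }, ...] } or null.
function chooseStartUrl(manifest, requestedStartUrl) {
  const canResume =
    manifest !== null &&
    manifest.startURL === requestedStartUrl &&
    manifest.chapters.length > 0;
  if (!canResume) {
    // A cache left over from a different book (or a different start URL)
    // is ignored, so the requested book is downloaded from its first chapter.
    return requestedStartUrl;
  }
  // Same book as before: resume from the most recently downloaded chapter.
  return manifest.chapters[manifest.chapters.length - 1].url;
}
```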

Typo: "woul" -> "would"

In Worm Extermination 8.4, "it woul be anyone's guess" should be "it would be anyone's guess" (emph mine)

Option to download Teaser (Glow-worm)

Hi, first let me thank you for this great project!

It would be nice to have the option to download the teaser for "Ward" as well.
If you're interested, it could be implemented in one of these ways, using worm-scraper with other parameters:

Download it as a standalone book, like the other two (option example: --book=glow-worm-teaser).
Add it as a chapter at the beginning of "Ward" (option examples: --book=ward-include-teaser or --book=ward --include-teaser).

Let me know what you think about it :)

Invalid Syntax

As the title says. I installed v7.4.0 of Node.js (not the LTS version) on Windows 10 x64 and ran it in an elevated command prompt.

Running worm-scraper --help produces a syntax error (code: 800A03EA).

The same occurs with worm-scraper download convert scaffold zip.

Note that this issue persisted after reinstalling worm-scraper. I also tried cd-ing into the directory where worm-scraper was installed, to no avail.

path error with -book=ward (but worm worked)

~/Downloads ⌚ 10:19:31
$ worm-scraper -book=ward
Downloading https://parahumans.wordpress.com/2013/11/19/interlude-end/... done
Converting raw downloaded HTML to EPUB chapters
All chapters converted in 23.4 seconds
EPUB contents assembled into /usr/lib/node_modules/worm-scraper/scaffolding
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received an instance of Array
    at validateString (internal/validators.js:124:11)
    at Object.resolve (path.js:980:7)
    at /usr/lib/node_modules/worm-scraper/lib/worm-scraper.js:105:58
    at /usr/lib/node_modules/worm-scraper/lib/worm-scraper.js:111:13
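The trace shows path.resolve() receiving an array instead of a string. One plausible explanation assumes the CLI is parsed by yargs, whose wording matches the "Not enough non-option arguments" message in another issue here. With a single dash, -book=ward would be read as the short flags -b, -o, -o and -k=ward. The book would then keep its default, and the log does show Worm's final chapter being downloaded. Meanwhile -o, given twice, would become an array. If so, --book=ward with two dashes should avoid the problem. A defensive check could also turn the TypeError into a clearer message. resolvePathOption is a hypothetical helper, not part of worm-scraper:

```javascript
"use strict";
const path = require("path");

// Validate a path-valued CLI option before resolving it.
// A repeated flag can arrive as an array, which path.resolve() rejects
// with ERR_INVALID_ARG_TYPE.
function resolvePathOption(name, value) {
  if (Array.isArray(value)) {
    throw new Error(`--${name} was given ${value.length} times; pass it once`);
  }
  if (typeof value !== "string" || value === "") {
    throw new Error(`--${name} must be a non-empty path`);
  }
  return path.resolve(value);
}
```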

Cannot find module 'mz/fs'

I ran "worm-scraper --help" and got this error:

module.js:472
    throw err;
    ^

Error: Cannot find module 'mz/fs'
    at Function.Module._resolveFilename (module.js:470:15)
    at Function.Module._load (module.js:418:25)
    at Module.require (module.js:498:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\Program Files\Node\node_modules\worm-scraper\lib\download.js:3:12)
    at Module._compile (module.js:571:32)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:488:32)
    at tryModuleLoad (module.js:447:12)
    at Function.Module._load (module.js:439:3)

Ward build hanging at chapter 16-10

Hello—thank you for making this tool.

After finishing Worm a while ago, I recently wanted to pick up Ward. Unfortunately, my book build is failing at chapter 16-10 with the message below.

I installed using the current Node.js (15.6.0), in case that is relevant.

Downloading https://www.parahumans.net/2019/09/15/from-within-16-10/... TypeError: Cannot read property 'textContent' of null
    at getChapterTitle (C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\download.js:93:55)
    at downloadAllChapters (C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\download.js:49:26)
    at processTicksAndRejections (node:internal/process/task_queues:94:5)
    at async C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\worm-scraper.js:111:7
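The trace shows getChapterTitle reading textContent from an element that was not found, which suggests this chapter's markup differs from what the scraper expects. Below is a minimal sketch of a more defensive lookup. findChapterTitle and the selector list are hypothetical: the selectors are guesses at common WordPress markup, not the ones worm-scraper uses. doc is any object with a DOM-style querySelector(), such as a jsdom document.

```javascript
"use strict";

// Assumed selectors, tried in order; adjust to the site's actual markup.
const TITLE_SELECTORS = ["h1.entry-title", "h1.post-title", "title"];

function findChapterTitle(doc, url) {
  for (const selector of TITLE_SELECTORS) {
    const element = doc.querySelector(selector);
    if (element !== null && element.textContent.trim() !== "") {
      return element.textContent.trim();
    }
  }
  // Fail with a message that names the chapter instead of
  // "Cannot read property 'textContent' of null".
  throw new Error(`Could not find a chapter title in ${url}`);
}
```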

Editing the script to scrape other epubs

Hi, I encountered an issue when I tried to edit the scripts.

TypeError: Failed to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'. I know this comes from the conversion step, but I'm not sure why it failed, since it worked for another epub.
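This message typically appears when removeChild() is passed something that is not a node. Most often that is the null returned by querySelector() when a selector matches nothing, which would fit markup that differs between the two sources. Below is a minimal sketch of a removal step that tolerates missing elements. removeAllMatching is a hypothetical name, and doc is assumed to be a DOM-style document (e.g. from jsdom).

```javascript
"use strict";

// Remove every element matching `selector`; do nothing if there is no match.
// Returns how many elements were removed.
function removeAllMatching(doc, selector) {
  const elements = Array.from(doc.querySelectorAll(selector));
  for (const element of elements) {
    element.remove(); // no parent lookup, so a missing match can't pass null around
  }
  return elements.length;
}
```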

Inconsistent italics in Sifara and Thanda

'Not enough non-option arguments: got 0. need at least 1'

Sorry for the newbie question, but whenever I try to use one of the single-letter options (-s, -c, -b, -o), I get the message 'Not enough non-option arguments: got 0. need at least 1.'

Could you tell me exactly what I need to type to use these options? Thanks.

Staging folder not created

I ran it with -o Downloads, since I had downloaded worm-scraper in users/myusername. After downloading all the files, the program reported Error: ENOENT: no such file or directory, rmdir 'C:\Users\myusername\staging\worm\OEBPS\chapters'. I ran the program again; the last interlude was downloaded, and the same error occurred. I then ran npm -g worm-scraper, and the issue persisted. Finally, I created the directory myself, and everything worked fine. I would suggest using fs to check whether the directory exists and, if it does not, create it.
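Here is a minimal sketch of that suggestion, assuming Node.js 10.12 or later. ensureDir is a hypothetical name. With recursive: true, fs.mkdirSync creates any missing parent directories and does not throw if the directory already exists, so no separate existence check is needed:

```javascript
"use strict";
const fs = require("fs");

// Make sure a directory (and its parents) exists; safe to call repeatedly.
function ensureDir(dirPath) {
  fs.mkdirSync(dirPath, { recursive: true });
  return dirPath;
}
```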

Consistent 'Gray Boy' spelling

In 'Venom 29.8', we get 'Grey Boy'; everywhere else, it's 'Gray Boy', with an 'a'.
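Fixes like this one, and the "woul" typo reported above, could be expressed as a small table of literal replacements applied to each chapter's text. Here is a minimal sketch. The table format and applySubstitutions are hypothetical, not worm-scraper's own mechanism.

```javascript
"use strict";

// Hypothetical substitution table; each entry fixes one known inconsistency.
const SUBSTITUTIONS = [
  { before: "Grey Boy", after: "Gray Boy" },
  { before: "it woul be anyone's guess", after: "it would be anyone's guess" },
];

function applySubstitutions(text, substitutions = SUBSTITUTIONS) {
  let result = text;
  for (const { before, after } of substitutions) {
    // split/join replaces every occurrence without treating `before` as a regex.
    result = result.split(before).join(after);
  }
  return result;
}
```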

`worm-scraper download` hangs after downloading a few chapters

Didn't look into why.

I hacked around it with this fish script:

function retryworm
    # Kill worm-scraper download if it is still running after 10 seconds.
    timeout 10 worm-scraper download
end
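The script above only puts a time limit on a whole run. Suppose the hang is a request that never settles; the reporter didn't investigate, so this is an assumption. Then the same idea could be applied per request inside the scraper: give up on an attempt after a fixed time and try again. withTimeoutRetry is a hypothetical helper. task would be something like a function that downloads one chapter.

```javascript
"use strict";

// Run `task`, abandoning any attempt that takes longer than `ms` milliseconds,
// and retry (on timeout or failure) up to `retries` more times.
async function withTimeoutRetry(task, { ms, retries }) {
  for (let attempt = 0; ; attempt++) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    try {
      return await Promise.race([task(attempt), timeout]);
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: surface the last error
    } finally {
      clearTimeout(timer);
    }
  }
}
```

A timed-out attempt is abandoned rather than cancelled. A task that accepts an AbortSignal could be cancelled properly instead.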

Suggest making this less visible to respect the author's wishes?

This repo is the number one result for "worm ebook". The author has explicitly said he doesn't mind people making their own ebooks but that publicising such methods damages his ability to pursue traditional ebook publishing routes.

Not saying this should be taken down (it's super useful!) but can we at least make it less conspicuous? This guy spends 50+ hours per week writing and has a following of millions, I just wish there was a way he could profit more from his awesome work.
