domenic / worm-scraper
Scrapes the web serial Worm and its sequel Ward into an eBook format
License: Other
FYI, on my Mac (10.12.5), after installing Node.js (6.11 LTS), worm-scraper gave this:
$ worm-scraper --help
/usr/local/bin/worm-scraper: line 1: use strict: command not found
Solution was to add a shebang line in worm-scraper per https://stackoverflow.com/a/34354713
"Wretch" is inconsistently capitalized in Ward.
It's mostly un-capitalized through Shadow 5.8. (However, there are exceptions, such as Shadow 5.4.)
Starting in Shadow 5.9, it's mostly capitalized. However, there are many exceptions; eyeballing the search results, I would guess 30 or so.
There doesn't appear to be a pattern. For example, you could imagine that when used as a proper name, it's capitalized, and otherwise it's not. But both "the wretch" and "the Wretch" are often seen, even in later chapters. There is one instance of "Wretch" (no "the") in Pitch 6.6, but it appears to be the exception, and probably should be fixed.
I'm unsure whether this shift in capitalization represents a narratively-significant change in how Victoria thinks, or an author style update that wasn't back-applied to earlier chapters, or what.
A few options are:
Always un-capitalize. This generally fits better with English grammar, since it's not a proper name, and is often prefixed with "the".
Always capitalize. I.e., treat "the Wretch" as an entity whose proper name somehow includes a lowercase "the". This might best preserve authorial intent if we assume that "the Wretch" is the intended name throughout, and the author just never went back and corrected earlier instances.
Enforce capitalization after 5.8, fixing the ~30 un-capitalized instances. The idea here would be that the capitalization represents a narratively-significant shift in how Victoria thinks of the wretch/Wretch, and we assume that instances where it got left as lowercase later in the book were accidents. It seems weird to use capitalization this way (i.e., it seems like it's just going to cause the reader's eyes to stumble each time, instead of making them see the Wretch as more of a named entity), but it's possible this best preserves authorial intent.
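As a rough illustration of how the three options could be expressed as text substitutions, here is a minimal sketch. The function names, the `policy` values, and the chapter-index parameters are all hypothetical and are not taken from worm-scraper's code; the sketch only touches the word when it follows "the" or "The".

```javascript
// Hypothetical helper: rewrites "the wretch" / "the Wretch" to one chosen form.
// policy is "lower" (always un-capitalize) or "upper" (always capitalize).
function normalizeWretch(text, policy) {
  const replacement = policy === "upper" ? "Wretch" : "wretch";
  // Match the word only after "the"/"The"; the trailing \b skips "wretched".
  return text.replace(
    /\b([Tt]he) [Ww]retch\b/g,
    (match, article) => `${article} ${replacement}`
  );
}

// Third option: capitalize only in chapters after a cutoff.
// chapterIndex and cutoffIndex are illustrative positions in chapter order.
function normalizeWretchByChapter(text, chapterIndex, cutoffIndex) {
  return chapterIndex > cutoffIndex ? normalizeWretch(text, "upper") : text;
}
```

With this shape, the first two options are a single global policy, while the third needs to know where each chapter falls relative to Shadow 5.8.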
I tried running worm-scraper download convert scaffold zip, but after downloading the last chapter, it throws the error: Error: ENOENT: no such file or directory, rmdir '/Users/vigneshwar/Desktop/staging/worm/OEBPS/chapters'
The generated epub is not compatible with Google Play Books.
An online epub validator (https://www.ebookit.com/tools/bp/Bo/eBookIt/epub-validator) points out possible errors. There are several on chapter 79, some on 211, 249 and 275.
It also seems to have some problem with the cover and the img tag.
The only two fatal errors are on chapter 79, probably due to some tag that was not closed properly:
FATAL(RSC-016): ./books/Bo/databases/eBookIt/temp_uploads/1609380837.epub/OEBPS/chapters/chapter079.xhtml(210,6): Fatal Error while parsing file: The element type "p" must be terminated by the matching end-tag "</p>".
ERROR(RSC-005): ./books/Bo/databases/eBookIt/temp_uploads/1609380837.epub/OEBPS/chapters/chapter079.xhtml(-1,-1): Error while parsing file: The element type "p" must be terminated by the matching end-tag "</p>".
When running worm-scraper convert..., if there is no pre-existing "staging" path (i.e., on a fresh run), it gives this error:
(node:6172) [DEP0147] DeprecationWarning: In future versions of Node.js, fs.rmdir(path, { recursive: true }) will be removed. Use fs.rm(path, { recursive: true }) instead
That means fs.rmdir with the recursive: true argument no longer tolerates a non-existent path.
The fix is simply to change line 90 in worm-scraper.js to the following:
return fs.rm(chaptersPath, { force: true, recursive: true, maxRetries: 3 })
And that made it work for me.
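To illustrate why the change helps, here is a small standalone sketch using Node's built-in fs/promises API (not the project's own file layout). The helper name `removeChaptersDir`, the demo function, and the temporary path are made up for this example. The point is that `force: true` turns a missing path into a no-op rather than an ENOENT rejection.

```javascript
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

// Hypothetical helper name; worm-scraper's own function may differ.
async function removeChaptersDir(chaptersPath) {
  // force: true means a path that does not exist is silently ignored.
  await fs.rm(chaptersPath, { force: true, recursive: true, maxRetries: 3 });
}

// Demo: point at a directory that was never created and remove it anyway.
async function demoRemoveMissing() {
  const missing = path.join(
    os.tmpdir(),
    `worm-scraper-demo-missing-${process.pid}`,
    "OEBPS",
    "chapters"
  );
  await removeChaptersDir(missing); // resolves even though nothing is there
  return "ok";
}
```

Note that fs.rm requires Node.js 14.14 or later.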
Not really an issue, since this does exactly what it says on the tin, but how would I go about making this work for the author's other books, like the Worm sequel?
I didn't have that directory at the point at which I ran worm-scraper convert (not sure if I was supposed to somehow have it, or what).
Hacked around it for now by just creating the directory manually before running the command.
But I think the script should check whether the directory exists and only attempt to remove it if it does exist.
In attempting to use worm-scraper with Twig, I encountered the following issue:
When improperly providing the --start-url parameter, it began to download Worm by default. After that, it would always download Worm. I thought that I was continuing to improperly pass the start url.
Clearing the cache folder resolved the problem.
This appears to be related to starting from the latest position in the existing manifest.
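One way to avoid resuming from the wrong serial would be to check the cached manifest against the requested start URL before reusing it. The sketch below is only an illustration: the manifest shape (an array of `{ url, title }` entries in download order) and both function names are assumptions, not worm-scraper's actual data structures.

```javascript
// Assumed manifest shape: an array of { url, title } entries in download order.
// startURL is whatever was passed via --start-url.
function manifestMatchesStart(manifest, startURL) {
  return (
    Array.isArray(manifest) &&
    manifest.length > 0 &&
    manifest[0].url === startURL
  );
}

function chooseResumePoint(manifest, startURL) {
  if (!manifestMatchesStart(manifest, startURL)) {
    // The cache is empty or belongs to a different serial: start over.
    return { manifest: [], nextURL: startURL };
  }
  // Same serial: keep the cache and continue from its last known entry.
  return { manifest, nextURL: manifest[manifest.length - 1].url };
}
```

Under this approach, a stale Worm manifest would be discarded automatically when a different start URL is supplied, instead of requiring the cache folder to be cleared by hand.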
In Worm Extermination 8.4, "it woul be anyone's guess" should be "it would be anyone's guess" (emph mine)
Hi, first let me thank you for this great project!
It would be nice to have the option to download the teaser for "Ward" aswell.
If you're interested, it could be implemented like this, using worm-scraper with other parameters:
Let me know what you think about it :)
As the title says. Installed with v7.4.0 of node.js (not the LTS version) on Windows 10 x64 and ran in an elevated command prompt.
Running worm-scraper --help prompts a syntax error (Code: 800A03EA).
The same occurs with worm-scraper download convert scaffold zip
Note that this issue persisted after a reinstallation of worm-scraper. I also tried cding to the directory where worm-scraper was installed, to no avail.
~/Downloads ⌚ 10:19:31
$ worm-scraper -book=ward
Downloading https://parahumans.wordpress.com/2013/11/19/interlude-end/... done
Converting raw downloaded HTML to EPUB chapters
All chapters converted in 23.4 seconds
EPUB contents assembled into /usr/lib/node_modules/worm-scraper/scaffolding
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received an instance of Array
at validateString (internal/validators.js:124:11)
at Object.resolve (path.js:980:7)
at /usr/lib/node_modules/worm-scraper/lib/worm-scraper.js:105:58
at /usr/lib/node_modules/worm-scraper/lib/worm-scraper.js:111:13
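The report does not say why the path argument was an array. One possibility, offered only as an assumption, is that a path-valued option ended up with more than one value, for example if -book=ward was read as a bundle of single-letter flags. Whatever the cause, a validation step before calling path.resolve would give a clearer message. In this sketch, `resolveOutputDir` and its `out` parameter are hypothetical names:

```javascript
const path = require("path");

// Hypothetical validation step; `out` stands for a parsed path-valued option.
function resolveOutputDir(out) {
  if (typeof out !== "string") {
    const received = Array.isArray(out)
      ? `an array of ${out.length} values`
      : typeof out;
    throw new TypeError(
      `Expected a single output path, but received ${received}. ` +
        "Was an option repeated, or a long option written with one dash?"
    );
  }
  // Only a plain string reaches path.resolve.
  return path.resolve(out);
}
```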
I ran "worm-scraper --help" and got this error:
module.js:472
throw err;
^
Error: Cannot find module 'mz/fs'
at Function.Module._resolveFilename (module.js:470:15)
at Function.Module._load (module.js:418:25)
at Module.require (module.js:498:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (C:\Program Files\Node\node_modules\worm-scraper\lib\download.js:3:12)
at Module._compile (module.js:571:32)
at Object.Module._extensions..js (module.js:580:10)
at Module.load (module.js:488:32)
at tryModuleLoad (module.js:447:12)
at Function.Module._load (module.js:439:3)
Hello—thank you for making this tool.
After finishing Worm a while ago, I recently wanted to pick up Ward. Unfortunately, my book build is failing at chapter 16-10 with the message posted below.
I installed using the current node.js, 15.6.0 in case that is relevant.
Downloading https://www.parahumans.net/2019/09/15/from-within-16-10/... TypeError: Cannot read property 'textContent' of null
at getChapterTitle (C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\download.js:93:55)
at downloadAllChapters (C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\download.js:49:26)
at processTicksAndRejections (node:internal/process/task_queues:94:5)
at async C:\Users\aes\AppData\Roaming\npm\node_modules\worm-scraper\lib\worm-scraper.js:111:7
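The trace shows textContent being read from a null value inside getChapterTitle, which is what happens when a DOM lookup finds no matching element. A guarded lookup would at least report which page lacked the expected element. This is a minimal sketch: the function name `getChapterTitleSafe` and the default selector `h1.entry-title` are assumptions for illustration, not the selector the tool actually uses.

```javascript
// doc is any object with a DOM-style querySelector method.
// The default selector is an illustrative guess, not the project's real one.
function getChapterTitleSafe(doc, url, selector = "h1.entry-title") {
  const element = doc.querySelector(selector);
  if (element === null) {
    // Fail with the URL and selector instead of a bare TypeError.
    throw new Error(`No title element matching "${selector}" found at ${url}`);
  }
  return element.textContent.trim();
}
```

This does not fix the underlying cause (the page may have returned unexpected HTML), but it makes the failure easier to diagnose.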
Hi, I encountered an issue when I tried to edit the scripts:
TypeError: Failed to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'. I know this is from the converting portion, but I'm not sure why it failed, as it worked for another EPUB.
Sorry for the newbie question, but whenever I try to use one of the single letter option commands (-s, -c, -b, -o) I get the message 'Not enough non-option arguments: got 0. need at least 1.'
Could you tell me exactly what I need to type to use these commands? Thanks.
I ran -o Downloads, as I downloaded worm-scraper in users/myusername. The program then said, after downloading all files, Error: ENOENT: no such file or directory, rmdir 'C:\Users\myusername\staging\worm\OEBPS\chapters'. I ran the program again, the last interlude was downloaded, and the same error occurred. I then ran npm -g worm-scraper, and the issue persisted. I then created the directory, and everything worked fine. I would suggest using fs to see if the file exists, and if it does not, create it.
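The suggestion to create the directory when it is missing can be sketched with Node's built-in fs/promises API. The names `ensureDir` and `demoEnsureDir` are hypothetical, and the demo works in a throwaway temporary folder rather than the real staging path. With `recursive: true`, mkdir creates any missing parent directories and does not fail if the directory is already there, so no separate existence check is needed.

```javascript
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

// Hypothetical helper: make sure a directory exists before it is used.
async function ensureDir(dirPath) {
  // recursive: true creates missing parents and tolerates an existing directory.
  await fs.mkdir(dirPath, { recursive: true });
}

// Demo in a temporary folder that mirrors the staging layout from the report.
async function demoEnsureDir() {
  const root = await fs.mkdtemp(path.join(os.tmpdir(), "worm-scraper-demo-"));
  const chapters = path.join(root, "staging", "worm", "OEBPS", "chapters");
  await ensureDir(chapters);
  await ensureDir(chapters); // second call changes nothing
  const isDir = (await fs.stat(chapters)).isDirectory();
  await fs.rm(root, { recursive: true, force: true }); // clean up the demo folder
  return isDir;
}
```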
In 'Venom 29.8', we get 'Grey Boy'; everywhere else, it's 'Gray Boy', with an 'a'.
Didn't look into why.
I hacked around it with this fish script:
function retryworm
    # Stop the download if it runs longer than 10 seconds
    timeout 10 worm-scraper download
end
This repo is the number one result for "worm ebook". The author has explicitly said he doesn't mind people making their own ebooks but that publicising such methods damages his ability to pursue traditional ebook publishing routes.
Not saying this should be taken down (it's super useful!) but can we at least make it less conspicuous? This guy spends 50+ hours per week writing and has a following of millions, I just wish there was a way he could profit more from his awesome work.