Hi, thanks for logging the issue.
> - I could use absolute paths so that my scripts are less fragile.
The `-o` / `--output` path can be absolute, but admittedly the help text makes it sound as if only relative paths are allowed. What it meant was that a relative path is resolved against the current working directory, which incidentally is how paths normally work 😅 so maybe I'll just drop the whole 'relative' part from the help text. Please note that the directory does need to exist, as percollate does not create it for you the way `mkdir -p` would.
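Because the destination directory has to exist beforehand, a script can create it first. A minimal bash sketch; `ensure_output_dir` is a hypothetical helper name, not part of percollate:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of percollate): create the destination
# directory if it is missing, because percollate will not create it itself.
ensure_output_dir() {
  local dir=$1
  # -p creates parent directories as needed and does not fail
  # if the directory already exists.
  mkdir -p -- "$dir" || return 1
}
```

For example: `ensure_output_dir /path/for/file && percollate pdf http://example.com/article.html -o /path/for/file/file.pdf`.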
> - If I only specified a directory, it would use the title of the page as the title of the doc, in the same manner that it does now if you do not use the `-o` option.
Using the `--individual` flag effectively turns the value of `-o` into a prefix, to which the web page titles are appended. So ending the value with a trailing slash (`-o my/destination/`) will create files inside `destination`. However, it could benefit from some cosmetic tweaks (currently it makes filenames start with a hyphen).
> With a bash script, I could `wget` the web page and then use `pup` to put the title into a variable that could be used in percollate as the filename. (There's more to it than that, as I would also want to clean the title to make sure there are no illegal characters and that it does not exceed the character count limit.)
The titles are currently transformed with `slugify`, but they might benefit from stricter rules, e.g. `filenamify` plus truncation. Do you have a hard limit on the filename length, or just a preference?
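For comparison, here is a rough shell sketch of the kind of cleanup discussed above: replace characters that are problematic in filenames, then truncate. It is not how percollate does it (percollate uses `slugify`); `sanitize_title`, the character set, and the 60-character default are all assumptions. In a UTF-8 locale bash counts characters rather than bytes, so 60 characters of at most 4 bytes each stays under the 255-byte name limit of common Linux filesystems such as ext4, with room left for an extension like `.pdf`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of percollate): make a page title safer to
# use as a filename by replacing problematic characters and capping length.
sanitize_title() {
  local title=$1
  local max=${2:-60}        # max length in characters (assumed default)
  local bad='[/\\:*?"<>|]'  # glob bracket set: / \ : * ? " < > |
  # Replace every character in the set with a hyphen.
  title=${title//$bad/-}
  # Keep at most $max characters (characters, not bytes, in a UTF-8 locale).
  title=${title:0:max}
  printf '%s\n' "$title"
}
```

Usage: `name=$(sanitize_title "$title")`, then pass `"$name.pdf"` as part of the `-o` path.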
Appreciate the fast response!
Just a heads up, I am on Ubuntu 20.04 and am using version 4.0 of Percollate.
> The `-o` / `--output` path can be absolute, but admittedly the help text makes it sound as if only relative paths are allowed. It was meant along the lines of when relative, it's relative to the current working directory
Huh. I tried it a few times, but it kept throwing an error. In fact, I just tried it again and it is still throwing errors. Here's what I'm doing:
percollate pdf http://example.com/article.html -o /path/for/file/
I get this error:
[Error: EISDIR: illegal operation on a directory, open '/path/for/file/'] {
errno: -21,
code: 'EISDIR',
syscall: 'open',
path: '/path/for/file/'
}
At first, I suspected a permissions error, because that's the first place you check, but the permissions on my directory are correct. And when I do this, it works:
percollate pdf http://example.com/article.html -o /path/for/file/file.pdf
Then I saw the note about relative paths and figured that was the cause.
> Using the `--individual` flag effectively turns the value of `-o` into a prefix,
percollate pdf http://example.com/article.html -o /path/for/file/ --individual
Now this command works as expected, though, as you say, it adds a hyphen in front of the filename.
Something to note: this does not work:
percollate pdf http://example.com/article.html -o /path/for/file --individual
Notice the missing trailing slash at the end of "/path/for/file". It tries to create this and, at least in my attempts, fails:
/path/for/file-example.com/article.html
Given your description, I see why it works that way, but I did want to point out something that people might miss.
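One way to avoid that pitfall in a script is to normalize the directory so it always ends in exactly one slash before passing it to `-o`. A minimal bash sketch; `with_trailing_slash` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a directory path ends in exactly one "/",
# so that with --individual the -o value acts as a directory prefix instead
# of being glued onto the generated filename.
with_trailing_slash() {
  local dir=$1
  [[ -n $dir ]] || return 1   # refuse an empty path
  # Remove all trailing slashes...
  while [[ $dir == */ ]]; do
    dir=${dir%/}
  done
  # ...then append exactly one ("/" alone becomes "" and then "/" again).
  printf '%s/\n' "$dir"
}
```

Usage: `percollate pdf http://example.com/article.html -o "$(with_trailing_slash /path/for/file)" --individual`.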
> The titles are currently transformed with `slugify`
Sorry, my initial comment was just a thought experiment on how I would accomplish this without percollate. I've been bitten before by creating a filename that was too long, and it was a serious PITA to figure out how to delete that file. So now I am extra cautious about length and illegal characters in output. Sounds like you've got that handled in percollate, though.
Lastly, I thought you might get a kick out of what I am trying to do... Basically, a few websites don't allow percollate to access the entire HTML page, but I've found that if I have singlefile grab the site first and send the result to stdout, percollate can read that HTML from stdin and create a new PDF or EPUB of the entire document. :D Something like this:
singlefile https://medium.com/article-file-name.html | percollate epub - --url=https://medium.com/ -o /path/to/save/ --individual
Anyway, thanks for the quick response. It seems the best way to get what I want with what's already there is to use the `--individual` option and then, at the end, rename the files to remove the leading dash. Since this is happening in a bash script, that should be pretty easy to do.
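The rename step might look something like this. It is a sketch that assumes the generated files sit directly in the output directory and that the only problem is a single leading `-`; `strip_leading_dash` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical post-processing step: remove the leading hyphen that
# percollate's --individual mode currently prepends to filenames when the
# -o value ends in a slash.
strip_leading_dash() {
  local dir=${1:-.}
  local f base target
  for f in "$dir"/-*; do
    [[ -e $f ]] || continue       # the glob matched nothing
    base=${f##*/}                 # filename without the directory part
    target=$dir/${base#-}         # drop exactly one leading "-"
    # Never overwrite an existing file (or the directory itself).
    if [[ -e $target ]]; then
      printf 'skipping %s: %s already exists\n' "$f" "$target" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}
```

Usage: `strip_leading_dash /path/to/save` after the percollate calls have finished.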