alvarcarto / url-to-pdf-api
Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
License: MIT License
Great work here -- it's 2017 and generating PDFs is still unnecessarily complicated. I'm currently using wkhtmltopdf. I'd love to stop using it and use this project as a microservice to handle all my PDF needs. However, the one thing I can't figure out is how to add a footer and/or header to every generated page. The header/footer would need to contain things like a logo, page number, warning, date, invoice number, etc., so it needs to be more customizable than the standard pdf.displayHeaderFooter option allows.
Does anyone have any experience with this? Is there something I'm missing? Thanks again for this awesome project.
Hi, please upgrade Puppeteer to the latest version (1.2.0).
Thank you.
How can I prevent text from overflowing and getting hidden, as happens in the profile paragraph at http://resume.josephrex.me/?
Hi,
I get this error randomly when I try to generate a PDF from my local url-to-pdf.
The server crashes with the following error: Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)
curl prints: curl: (52) Empty reply from server
curl -o test_.pdf -XPOST [email protected] -H"content-type: text/html" http://localhost:9000/api/render\?emulateScreenMedia=false\&goto.waitUntil\=load
This bug only happens AFTER the PDF generation, when browser.close() is called, but I don't know whether it is caused by Puppeteer closing its connection to Chrome or by the connection to one of the page's assets. Because the error happens after the PDF generation, I'm inclined to ignore it, which can be done by adding a handler with process.on('uncaughtException', (error) => {}). I'm not sure that's the correct thing to do, but for now it's the only solution I can provide.
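A slightly narrower variant of the workaround described above might ignore only the ECONNRESET case and still fail loudly on everything else. This is a sketch under that assumption, not a verified fix for the crash; the helper name isIgnorableSocketError is made up for illustration.

```javascript
// Hypothetical helper: true only for the socket reset described above.
function isIgnorableSocketError(error) {
  return Boolean(error) && error.code === 'ECONNRESET';
}

// Assumption: swallowing ECONNRESET is acceptable because it occurs after the
// PDF has already been generated. Any other uncaught error still ends the process.
process.on('uncaughtException', (error) => {
  if (isIgnorableSocketError(error)) {
    console.warn('Ignoring ECONNRESET raised after rendering:', error.message);
    return;
  }
  console.error(error);
  process.exit(1);
});
```

Filtering on error.code keeps unrelated failures visible, which an empty catch-all handler would hide.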
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Test</title>
<!-- Normalize or reset CSS with your favorite library -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/3.0.3/normalize.css">
<!-- Load paper.css for happy printing -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/paper-css/0.2.3/paper.css">
<style>
@page {
size: A4;
}
img {
display: block;
position: absolute;
}
img:nth-of-type(1) {
left: 200px;
top: 200px;
transform: rotate(30deg);
}
img:nth-of-type(2) {
left: 10%;
top: 70%;
transform: rotate(200deg);
}
img:nth-of-type(2) {
float: right;
}
</style>
</head>
<body class="A4">
<section class="sheet">
<h1>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Cum, laboriosam!</h1>
<p>Lorem ipsum dolor sit amet, <u>consectetur</u> adipisicing elit. <em>Officia</em> <strong>aspernatur sed</strong> <i>quis</i> veniam! Itaque fugiat voluptas rerum necessitatibus iste, <b>dolores id eligendi minus! <i>Velit <u>alias</u></i> quos</b> , deleniti optio quod numquam perspiciatis sequi. Hic autem omnis non ipsam odio. Sit nostrum officia, ea officiis corporis tempore ut illum minus placeat repellat similique natus facere iusto aperiam rerum magni inventore in vero error, quisquam nihil dolore culpa optio necessitatibus, dicta? Sit quos enim, id quidem ea amet voluptas vitae odit sequi, ex aliquid commodi illum aperiam odio suscipit reiciendis</p>
<img src="https://placehold.it/400x400" alt="placeholder">
<img src="https://placehold.it/400x400" alt="placeholder">
<img src="https://placehold.it/400x400" alt="placeholder">
<img src="https://placehold.it/400x400" alt="placeholder">
<img src="https://placehold.it/400x400" alt="placeholder">
</section>
<section class="sheet">
<h1>Such wow</h1>
<h2>Such wow</h2>
<h3>Such wow</h3>
<h4>Such wow</h4>
<h5>Such wow</h5>
<h6>Such wow</h6>
<p style="text-align: left">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Molestiae ipsa inventore laborum rem deserunt placeat, praesentium soluta exercitationem corporis at, voluptatibus id atque amet voluptate mollitia nam sunt nisi, excepturi facilis nemo! Maiores deserunt qui, quia soluta culpa accusantium distinctio numquam eaque asperiores maxime suscipit, iusto inventore. Adipisci, quasi corporis!</p>
<p style="text-align: right">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Laborum, suscipit? Officia rem dolorum, quisquam autem expedita ea odio aliquam dicta amet corporis voluptatum ipsam sequi ipsa accusantium enim molestiae nemo, qui, et odit quod corrupti ab? Odio, quisquam voluptatem aperiam totam illum repellendus temporibus harum dolores, laboriosam alias, doloremque et?</p>
<p style="text-align: center">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Sapiente ipsam consectetur omnis ut repellendus, amet commodi minus fugit consequatur recusandae necessitatibus explicabo quasi nostrum eveniet dolores similique eligendi, expedita blanditiis doloremque nemo nobis. Sint aspernatur, mollitia expedita nulla est, rerum aliquam error. Provident saepe similique, dignissimos quia explicabo ab, nihil.</p>
<p style="text-align: justify;">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Sapiente ipsam consectetur omnis ut repellendus, amet commodi minus fugit consequatur recusandae necessitatibus explicabo quasi nostrum eveniet dolores similique eligendi, expedita blanditiis doloremque nemo nobis. Sint aspernatur, mollitia expedita nulla est, rerum aliquam error. Provident saepe similique, dignissimos quia explicabo ab, nihil.</p>
<h1 style="transform: rotate(180deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
<h1 style="transform: rotate(50deg);text-align: center;">AMAZING</h1>
<h1 style="transform: rotate(80deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
<h1 style="transform: rotate(300deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
<h1 style="transform: rotate(260deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
<h1 style="transform: rotate(120deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
<h1 style="transform: rotate(190deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
</section>
</body>
</html>
Hello, has this software been tested to handle 301 redirects?
Cloudflare issues them, and other software doesn't seem to follow them.
Puppeteer await calls do not throw all errors. Some errors can only be caught from the page.on('error', cb) callback. We should be able to report these errors better in the responses. Currently almost all errors except validation errors are 500 Internal Server Error. The only place to see what happened is the application logs.
Some images are not loading correctly. Test case: https://url-to-pdf-api.herokuapp.com/api/render?emulateScreenMedia=false&url=https://medium.com/@e_mad_ehsan/getting-started-with-puppeteer-and-chrome-headless-for-web-scrapping-6bf5979dee3e
When I alter the options in .env they are not used:
export NODE_ENV=development
export PORT=9990
export ALLOW_HTTP=true
When I use them as a prefix for the start command it works just fine:
ALLOW_HTTP=true PORT=9990 npm start
What am I doing wrong?
BTW, very nice piece of software!
As written in README.md, you can pass the value 0 to goto.timeout when performing a request to /api/render. However, this is supported from puppeteer 0.12.0, while this project currently uses 0.11.0.
I've seen that there is a branch for updating to the newest puppeteer, so this is maybe a non-issue (but the README could be updated before we actually merge the new branch)
The following url returns a 500 error, with no useful details.
{
"status": 500,
"statusText": "Internal Server Error",
"messages": [
"Internal Server Error"
]
}
If I try to pass more than one parameter, I get an error on the second parameter.
Example: https://urltopdf2.herokuapp.com/api/render?url=https://server1.outsystemscloud.com/automatedterritoryas/PDFEmail.aspx?Tenantid=109&Territoryid=564
If I browse to the URL, it works with no problem. When I try to use url-to-pdf-api, I get the following error:
{"status":400,"statusText":"Bad Request","errors":[{"field":["Territoryid"],"location":"query","messages":["\"Territoryid\" is not allowed"],"types":["object.allowUnknown"]}]}
Again, if I leave off the last &Territoryid=564 it works without error. Add it back and the error returns.
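The 400 response is consistent with the unescaped & ending the url value, so that Territoryid is read as a parameter of the API itself. A minimal sketch of encoding the target URL before embedding it, using the standard encodeURIComponent function; the helper name buildRenderUrl is made up for illustration.

```javascript
// Hypothetical helper: builds an /api/render request URL for a target page.
function buildRenderUrl(apiBase, targetUrl) {
  // encodeURIComponent escapes "?", "&" and "=" so the whole target URL
  // travels as the single value of the "url" parameter.
  return `${apiBase}/api/render?url=${encodeURIComponent(targetUrl)}`;
}

const target =
  'https://server1.outsystemscloud.com/automatedterritoryas/PDFEmail.aspx?Tenantid=109&Territoryid=564';
const requestUrl = buildRenderUrl('https://urltopdf2.herokuapp.com', target);
console.log(requestUrl);
```

With the target encoded, the API sees exactly one query parameter, url, and the page's own Tenantid and Territoryid parameters stay inside it.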
How do I send an authentication parameter along with the URL?
I'm trying something like this, but it fails:
http://abc.abc.com:9000/api/render?url=http://abc.abc.com&setExtraHTTPHeaders={'header': 'value'}
It's easy to make Chrome display any file:// link. A couple of ways:
Let's figure out whether there are a few ways in Puppeteer to block as many of these as possible. In any case, I'm quite confident that it's not possible to catch all of them. I would definitely recommend serving this API only to "trusted" users, e.g. inside your organization.
I want the content to appear on a separate page, not with other content
I think the following shouldn't be allowed as it might put load on the server:
Some websites (example) use a special lazy-loading strategy.
When users scroll quickly (<300 ms), the images are not loaded. Images load only when users stop to look at the content.
So I think we need another option (scrollInterval) that lets users test and choose the interval.
Related discussion:
puppeteer/puppeteer#338 (comment)
Thanks!
I noticed the default file name is render.pdf, but how can I specify a custom file name?
Deploy as AWS/Azure Function
E.g. Helvetica Neue is not rendered on Heroku. Url: https://url-to-pdf-api.herokuapp.com/api/render?url=https://github.com/kimmobrunfeldt/url-to-pdf-api/issues/5
Hi,
I'm looking for a few helping hands with maintenance. This repo is definitely among my top open source maintenance priorities and I'll continue to be a maintainer as well, but I haven't had enough time to do good maintenance lately. I think it's healthy for any project to have at least 2 people with collaboration rights. If you'd like to join the effort, please respond to this issue and describe your background in open source a bit.
If the height and width parameters are passed while rendering an HTML page, the font size is somehow reduced but the size of the content boxes is not affected.
Is this the expected result? If not, is there any solution (any flag) to make sure the PDF rendering does not affect any applied styles (CSS)?
https://url-to-pdf-api.herokuapp.com/api/render?url=file:///etc/passwd
Actually shows the content of the file... as mentioned by erdbeerkaese, here:
https://news.ycombinator.com/item?id=15408217
This issue gathers a lot of issues with PDF header and footer templates. They are not as flexible as I and apparently many others have thought.
pdf.displayHeaderFooter to true.
Hi,
First of all, thanks for this awesome project. It seems to be really well thought-out, so thank you for your efforts. I also really like the ability to render logged in pages by setting a cookie in the POST request.
Since you are using Puppeteer, which also supports rendering pages to images via "screenshot", it would be possible to render images as well. Is this something you're interested in? We have some users who would like this, for example for dashboards that are displayed on monitors.
Just take a look at https://url-to-pdf-api.herokuapp.com/api/render?url=http://bennettfeely.com/filters&waitFor=1000
The first image (no css) has smooth lines but other images have glitches because of the filters applied to them. If you open the source in Chrome it has no such distortions.
Is it possible to remove links from HTML components rendered into the PDF?
Thanks!
Take Baidu, for example, a search engine in China similar to Google. The Chinese characters on this web page cannot be parsed.
https://url-to-pdf-api.herokuapp.com/api/render?url=http://www.baidu.com
CORS_ORIGIN is missing in config.js, and it is used in app.js:
const corsOpts = {
origin: config.CORS_ORIGIN, //undefined
methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD', 'PATCH'],
};
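One way the missing setting could be supplied is sketched below. The environment variable name mirrors the config key, and the '*' fallback is an assumption made for illustration, not necessarily the project's intended default.

```javascript
// Sketch of a config entry; the '*' fallback is an assumed default.
const config = {
  CORS_ORIGIN: process.env.CORS_ORIGIN || '*',
};

const corsOpts = {
  origin: config.CORS_ORIGIN, // defined now, either from the environment or the fallback
  methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD', 'PATCH'],
};

console.log(corsOpts.origin);
```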
I was having difficulties getting the cookies to be sent with my request, and I think I may have found the problem. This function here is missing the cookies assignment, and therefore the resulting cookies array is always empty.
Am I missing something? Thanks for the library by the way - it's just awesome!
These options make Chrome more vulnerable. See: puppeteer/puppeteer#290
Some requests to the demo Heroku app return:
{
status: 500,
statusText: "Internal Server Error",
messages: [
"Internal Server Error"
]
}
First - thank you so much for creating and working on this project.
I've deployed to Heroku. Most of the time pdf is generated, sometimes there is an error and entire node server crashes.
Here is the log: url_to_pdf_api_error-01-25-2018.log
Is there a good way to debug this problem? Currently it crashes around ~20% of the time. I was running on "hobby" initially, but had same results on 1x and 2x instance types.
Hi folks! Could someone please point me to some documentation on how to do API key authentication. There's mention of it in the README, but no instructions yet, and I didn't see anything relevant in the Puppeteer docs. Any help appreciated!
The Dom Distiller is a nice tool to remove clutter on web pages: https://github.com/chromium/dom-distiller
It would be great to be able to use it.
I use this Puppeteer microservice to generate receipts as PDFs. For each receipt, the width is always the same, but the height changes according to the number of articles in the order.
For now, I'm using the article count to approximate the required height for my receipt. It kind of works, but it's not perfect and it's a dirty way to do it.
Is there a way to tell the Puppeteer API: "Please automatically find the right PDF height, according to the HTML body height, in order to generate a perfectly sized PDF"?
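One possible approach is to measure the rendered document inside the page and pass that height to Puppeteer's page.pdf. The sketch below assumes direct access to a Puppeteer Page object that has already loaded the receipt; the function name renderReceiptPdf is made up, and whether this service exposes an equivalent option is not confirmed here.

```javascript
// Sketch: `page` is assumed to be a Puppeteer Page that has already loaded the receipt.
async function renderReceiptPdf(page, widthPx) {
  // Measure the full height of the rendered document inside the browser.
  const heightPx = await page.evaluate(
    () => document.documentElement.scrollHeight
  );
  // Pass explicit dimensions instead of a paper format such as A4.
  return page.pdf({
    width: `${widthPx}px`,
    height: `${heightPx}px`,
    printBackground: true,
  });
}
```

The measured height reflects the layout at the time of measurement, so print styles or margins may still push content onto a second page; treat the result as a starting point to verify against real receipts.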
In my corporation we have self-signed certs, which cause errors to be thrown. How do I disable SSL?
2017-10-11T20:44:32.919Z - info: [pdf-core.js] Set browser viewport..
2017-10-11T20:44:32.920Z - info: [pdf-core.js] Emulate @media screen..
2017-10-11T20:44:32.921Z - info: [pdf-core.js] Goto url http://google.com ..
2017-10-11T20:44:33.689Z - error: [pdf-core.js] Error when rendering page: Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
2017-10-11T20:44:33.689Z - error: [pdf-core.js] Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
2017-10-11T20:44:33.690Z - info: [pdf-core.js] Closing browser..
2017-10-11T20:44:33.708Z - error: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-11T20:44:33.708Z - error: [error-logger.js] Request parameters:
2017-10-11T20:44:33.709Z - error: [error-logger.js] Request body:
2017-10-11T20:44:33.710Z - error: [error-logger.js] Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7) 'Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID\n at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)\n at <anonymous>\n at process._tickCallback (internal/process/next_tick.js:188:7)'
GET /api/render?url=http://google.com&pdf.margin.top=2cm&pdf.margin.right=2cm&pdf.margin.bottom=2cm&pdf.margin.left=2cm 500 1021.139 ms - -
http://localhost:9000/api/render?url=http://google.com
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request parameters:
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request body:
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Error: Only HTTPS allowed.
GET /api/render?url=https://google.com 403 0.824 ms - 74
I'm interested in running this project on an AWS EC2 instance.
I should have no problems doing that by following these instructions, correct?
https://github.com/alvarcarto/url-to-pdf-api#development
We have an internal site that I'm trying to grab PDFs from on the fly. The app works fine on any public URL, but not on our internal one.
2017-10-12T14:48:19.108Z - info: [pdf-core.js] Set browser viewport..
2017-10-12T14:48:19.109Z - info: [pdf-core.js] Emulate @media screen..
2017-10-12T14:48:19.109Z - info: [pdf-core.js] Goto url https://cef.erwf.nin.asn/ ..
2017-10-12T14:48:21.395Z - error: [pdf-core.js] Error when rendering page: Error: Failed to navigate: https://cef.erwf.nin.asn/
2017-10-12T14:48:21.396Z - error: [pdf-core.js] Error: Failed to navigate: https://cef.erwf.nin.asn/
at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)
at <anonymous>
2017-10-12T14:48:21.396Z - info: [pdf-core.js] Closing browser..
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request parameters:
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request body:
2017-10-12T14:48:21.408Z - error: [error-logger.js] Error: Failed to navigate: https://cef.erwf.nin.asn/
at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)
at <anonymous> 'Error: Failed to navigate: https://cef.erwf.nin.asn/\n at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)\n at <anonymous>'
GET /api/render?url=https://cef.erwf.nin.asn/ 500 2484.461 ms - -
We are using a workaround to render raw HTML. This workaround is needed to wait until all external resources are loaded. See the issue here: puppeteer/puppeteer#728
I suspect that this workaround is causing errors with large HTML documents. In my tests, I found that ~2MB of HTML worked, but 4MB didn't. Tried with a 512MB RAM Heroku server.
I am confused about assigning cookies in the API. Could anyone help me? I have 3 cookies,
e.g. Evnetid = 6235765; sessionid = jshdak; documentID = sjdh. How do I enter these in the API?
I read the documentation and tried to put the values in, but I get an error every time. Could anyone help, please?
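A possible shape for such a request is sketched below. It assumes the API accepts a cookies array in a JSON body (the cookies option appears in the options dump elsewhere in this thread) and that each entry follows the object form used by Puppeteer's page.setCookie, which needs a name, a value, and a url or domain. The domain and target URL here are placeholders.

```javascript
// Cookie names and values copied from the question; the domain is a placeholder.
const cookies = [
  { name: 'Evnetid', value: '6235765', domain: 'example.com' },
  { name: 'sessionid', value: 'jshdak', domain: 'example.com' },
  { name: 'documentID', value: 'sjdh', domain: 'example.com' },
];

// Assumption: the API accepts a JSON body with "url" and a "cookies" array.
const body = JSON.stringify({ url: 'https://example.com/', cookies });
console.log(body);
```

The resulting string would be sent as the body of a POST to /api/render with the header content-type: application/json.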
I think it might be a good idea to restrict URLs to http:// and https:// protocols. The current demo allows file:// type URLs and can therefore be used to read information from the file system.
Try for example https://url-to-pdf-api.herokuapp.com/api/render?url=file:///etc/passwd
There might be issues with other protocols as well. I only tested file:// URLs.
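A minimal sketch of the protocol restriction suggested above, using the standard WHATWG URL parser that Node.js exposes as a global. The function name isAllowedUrl is made up for illustration. It checks only the requested URL, so content loaded by the page itself would need separate handling.

```javascript
// Hypothetical validator: accepts only absolute http: and https: URLs.
function isAllowedUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}

console.log(isAllowedUrl('https://example.com/')); // true
console.log(isAllowedUrl('file:///etc/passwd')); // false
```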
Hi there,
I'm having trouble getting a POST request in Mithril.js to a locally hosted version of this repo to generate a PDF from the URL I pass through. The URL field is undefined on the server side.
This is what my call looks like:
m.request({
  method: "POST",
  url: "http://localhost:9000/api/render",
  headers: {
    "content-type": "application/json",
  },
  data: {
    "url": "http://www.google.com",
  },
})
  .then(function (result) {
    try {
      console.log('Worked');
    } catch (error) {
      console.log('Error:' + error);
    }
  })
  .catch(function (result) {
    console.log('Error: ' + result);
  })
On the server side I output the opts. I get this:
{ cookies: [],
scrollPage: false,
emulateScreenMedia: true,
ignoreHttpsErrors: false,
html: {},
viewport:
{ width: 1600,
height: 1200,
deviceScaleFactor: undefined,
isMobile: undefined,
hasTouch: undefined,
isLandscape: undefined },
goto:
{ waitUntil: 'networkidle',
networkIdleTimeout: 2000,
timeout: undefined,
networkIdleInflight: undefined },
pdf:
{ format: 'A4',
printBackground: true,
scale: undefined,
displayHeaderFooter: undefined,
landscape: undefined,
pageRanges: undefined,
width: undefined,
height: undefined,
margin:
{ top: undefined,
right: undefined,
bottom: undefined,
left: undefined } },
url: undefined,
attachmentName: undefined,
waitFor: undefined }
When I do a curl command it works as expected, html is null and url contains the expected url.
What am I doing wrong? Thanks in advance!
How can I convert my HTML to a grayscale PDF?
Each request starts a new instance with Puppeteer. We should use a pool of e.g. 4 tabs to make rendering as a service more reliable.
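As a first step toward that idea, the number of simultaneous renders can be capped. The sketch below is a plain concurrency limiter, not a real tab pool: it does not reuse browser tabs, it only ensures that no more than a fixed number of render tasks run at once. The names createLimiter and limitRender are made up for illustration.

```javascript
// Hypothetical limiter: runs at most `limit` tasks at once and queues the rest.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  // Starts the next queued task if a slot is free.
  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// At most 4 renders in flight, matching the "4 tabs" figure above.
const limitRender = createLimiter(4);
```

A request handler would wrap its rendering call as limitRender(() => render(options)), where render stands for whatever function performs the Puppeteer work.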
I would not want to pass auth cookies to the hosted version, but locally I would like to pass in a URL and a cookie to be set. This would allow me to generate, locally, PDFs of my authenticated pages.
Thanks. It looks neat.
Becker
I have an issue: whatever font-weight property I set, it is ignored. When I open the HTML in Chrome it looks fine, but when I generate a PDF from it everything has 'regular' font weight. Has anyone experienced this issue? Is there a workaround?