sindresorhus / capture-website

Capture screenshots of websites

License: MIT License

capture-screenshots screenshots website-screenshot-capturer website-screenshot puppeteer nodejs npm-package

capture-website's Introduction

capture-website

Capture screenshots of websites

It uses Puppeteer (Chrome) under the hood.

See capture-website-cli for the command-line tool.

Install

npm install capture-website

Note to Linux users: If you get a sandbox-related error, you need to enable system sandboxing.

Usage

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png');

API

captureWebsite.file(input, outputFilePath, options?)

Capture a screenshot of the given input and save it to the given outputFilePath.

Intermediate directories are created for you if they do not exist.

Returns a Promise<void> that resolves when the screenshot is written.

captureWebsite.buffer(input, options?)

Capture a screenshot of the given input.

Returns a Promise<Buffer> with the screenshot as binary.

captureWebsite.base64(input, options?)

Capture a screenshot of the given input.

Returns a Promise<string> with the screenshot as Base64.
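
The Base64 form is convenient when a screenshot needs to be embedded directly in HTML. The following is a minimal sketch, assuming a hypothetical helper named toDataUrl (it is not part of capture-website) that wraps the returned string in a data URL:

```javascript
// Hypothetical helper, not part of capture-website: wrap a Base64 screenshot
// in a data URL so it can be used as an <img> src.
function toDataUrl(base64, type = 'png') {
	const allowed = ['png', 'jpeg', 'webp']; // Values accepted by the `type` option

	if (!allowed.includes(type)) {
		throw new TypeError(`Unsupported image type: ${type}`);
	}

	return `data:image/${type};base64,${base64}`;
}
```

For instance, toDataUrl(await captureWebsite.base64('https://sindresorhus.com')) would produce a string that can be assigned to an image source. The type passed to the helper should match the type option used for the capture.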

input

Type: string

The URL, file URL, data URL, local file path to the website, or HTML.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'local-file.png');

options

Type: object

inputType

Type: string
Default: 'url'
Values: 'url' | 'html'

Set it to html to treat input as HTML content.

import captureWebsite from 'capture-website';

await captureWebsite.file('<h1>Awesome!</h1>', 'screenshot.png', {
	inputType: 'html'
});

width

Type: number
Default: 1280

Page width.

height

Type: number
Default: 800

Page height.

type

Type: string
Values: 'png' | 'jpeg' | 'webp'
Default: 'png'

Image type.

quality

Type: number
Values: 0...1
Default: 1

Image quality. Only for {type: 'jpeg'} and {type: 'webp'}.

scaleFactor

Type: number
Default: 2

Scale the webpage n times.

The default is what you would get if you captured a normal screenshot on a computer with a retina (High DPI) screen.

emulateDevice

Type: string
Values: Devices (Use the name property)

Make it look like the screenshot was taken on the specified device.

This overrides the width, height, scaleFactor, and userAgent options.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	emulateDevice: 'iPhone X'
});

fullPage

Type: boolean
Default: false

Capture the full scrollable page, not just the viewport.

defaultBackground

Type: boolean
Default: true

Include the default white background.

Disabling this lets you capture screenshots with transparency.

timeout

Type: number (seconds)
Default: 60

The number of seconds before giving up trying to load the page.

Specify 0 to disable the timeout.

delay

Type: number (seconds)
Default: 0

The number of seconds to wait after the page finished loading before capturing the screenshot.

This can be useful if you know the page has animations that you want to finish before the screenshot is captured.

waitForElement

Type: string

Wait for a DOM element matching the given CSS selector to appear in the page and to be visible before capturing the screenshot. It times out after options.timeout seconds.

element

Type: string

Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible. It times out after options.timeout seconds. Any actions performed as part of options.beforeScreenshot occur before this.

hideElements

Type: string[]

Hide DOM elements matching the given CSS selectors.

Can be useful for cleaning up the page.

This sets visibility: hidden on the matched elements.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	hideElements: [
		'#sidebar',
		'img.ad'
	]
});

removeElements

Type: string[]

Remove DOM elements matching the given CSS selectors.

This sets display: none on the matched elements, so it could potentially break the website layout.

clickElement

Type: string

Click the DOM element matching the given CSS selector.

scrollToElement

Type: string | object

Scroll to the DOM element matching the given CSS selector.

element

Type: string

A CSS selector.

offsetFrom

Type: string
Values: 'top' | 'right' | 'bottom' | 'left'

Offset origin.

offset

Type: number

Offset in pixels.
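
The object form of scrollToElement has no example above, so here is a small sketch of how it might be built. The helper name scrollTarget, its default arguments, and the '#map' selector are assumptions made for illustration only:

```javascript
// Hypothetical helper: build the object form of the `scrollToElement` option.
// The default values here are illustrative, not documented defaults.
function scrollTarget(element, offsetFrom = 'top', offset = 0) {
	const sides = ['top', 'right', 'bottom', 'left']; // Values listed for `offsetFrom`

	if (!sides.includes(offsetFrom)) {
		throw new TypeError(`Invalid offsetFrom: ${offsetFrom}`);
	}

	return {element, offsetFrom, offset};
}

// Options object that could be passed as the third argument to captureWebsite.file().
const scrollOptions = {
	scrollToElement: scrollTarget('#map', 'top', 10) // '#map' is a placeholder selector
};
```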

disableAnimations

Type: boolean
Default: false

Disable CSS animations and transitions.

blockAds

Type: boolean
Default: true

Ad blocking.

isJavaScriptEnabled

Type: boolean
Default: true

Whether JavaScript on the website should be executed.

This does not affect the scripts and modules options.

modules

Type: string[]

Inject JavaScript modules into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	modules: [
		'https://sindresorhus.com/remote-file.js',
		'local-file.js',
		`
		document.body.style.backgroundColor = 'red';
		`
	]
});

scripts

Type: string[]

Same as the modules option, but instead injects the code as <script> instead of <script type="module">. Prefer the modules option whenever possible.

styles

Type: string[]

Inject CSS styles into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	styles: [
		'https://sindresorhus.com/remote-file.css',
		'local-file.css',
		`
		body {
			background-color: red;
		}
		`
	]
});

headers

Type: object
Default: {}

Set custom HTTP headers.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	headers: {
		'x-powered-by': 'https://github.com/sindresorhus/capture-website'
	}
});

userAgent

Type: string

Set a custom user agent.

cookies

Type: Array<string | object>

Set cookies in browser string format or object format.

Tip: Go to the website you want a cookie for and copy-paste it from DevTools.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	cookies: [
		// This format is useful for when you copy it from the browser
		'id=unicorn; Expires=Wed, 21 Oct 2018 07:28:00 GMT;',

		// This format is useful for when you have to manually create a cookie
		{
			name: 'id',
			value: 'unicorn',
			expires: Math.round(new Date('2018-10-21').getTime() / 1000)
		}
	]
});

authentication

Type: object

Credentials for HTTP authentication.

username

Type: string

password

Type: string
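
As a rough illustration of the shape of the authentication option, the sketch below merges credentials into an existing options object. The helper name withAuthentication and the credentials are placeholders, not part of the package:

```javascript
// Hypothetical helper: return a copy of `options` with HTTP authentication added.
// The original object is left untouched.
function withAuthentication(options, username, password) {
	return {...options, authentication: {username, password}};
}

// Placeholder credentials; in practice these would likely come from configuration.
const authOptions = withAuthentication({width: 1280}, 'unicorn', 'rainbow');
```

The resulting object could then be passed as the options argument to any of the capture methods.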

beforeScreenshot

Type: Function

The specified function is called right before the screenshot is captured, as well as before any bounding rectangle is calculated as part of options.element. It receives the Puppeteer Page instance as the first argument and the browser instance as the second argument. This gives you a lot of power to do custom stuff. The function can be async.

Note: Make sure to not call page.close() or browser.close().

import captureWebsite from 'capture-website';
import checkSomething from './check-something.js';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	beforeScreenshot: async (page, browser) => {
		await checkSomething();
		await page.click('#activate-button');
		await page.waitForSelector('.finished');
	}
});

debug

Type: boolean
Default: false

Show the browser window so you can see what it's doing, redirect page console output to the terminal, and slow down each Puppeteer operation.

Note: This overrides launchOptions with {headless: false, slowMo: 100}.

darkMode

Type: boolean
Default: false

Emulate preference of dark color scheme (prefers-color-scheme).

inset

Type: object | number
Default: 0

Inset the bounding box of the screenshot.

Accepts an object {top?: number; right?: number; bottom?: number; left?: number} or a number as a shorthand for all directions.

Positive values, for example inset: 10, will decrease the size of the screenshot. Negative values, for example inset: {left: -10}, will increase the size of the screenshot.

Note: This option is ignored if the fullPage option is set to true. It can be combined with the element option.

Note: An error is thrown when the resulting width or height of the screenshot is 0.

Example: Include 10 pixels around the element.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'screenshot.png', {
	element: '.logo',
	inset: -10
});

Example: Ignore 15 pixels from the top of the viewport.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'screenshot.png', {
	inset: {
		top: 15
	}
});
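
The effect of positive and negative inset values on the captured region can be sketched as plain arithmetic. This is only an illustration of the behavior described in this section, under the assumption that the inset is applied to a bounding box of the form {x, y, width, height}; it is not the package's internal code, and applyInset is a hypothetical name:

```javascript
// Illustration only: how an inset would shrink (positive) or grow (negative)
// a bounding box. Not capture-website's actual implementation.
function applyInset(box, inset) {
	// A number is shorthand for the same inset on all four sides.
	const {top = 0, right = 0, bottom = 0, left = 0} = typeof inset === 'number'
		? {top: inset, right: inset, bottom: inset, left: inset}
		: inset;

	const result = {
		x: box.x + left,
		y: box.y + top,
		width: box.width - left - right,
		height: box.height - top - bottom
	};

	// Assumption: treat any non-positive size as an error.
	if (result.width <= 0 || result.height <= 0) {
		throw new Error('Inset leaves no area to capture');
	}

	return result;
}
```

With a 200x80 element at (100, 50), an inset of -10 yields a 220x100 region starting at (90, 40), which matches the "include 10 pixels around the element" example.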

launchOptions

Type: object
Default: {headless: 'new'}

Options passed to puppeteer.launch().

Note: Some of the launch options are overridden by the debug option.

overwrite

Type: boolean
Default: false

Overwrite the destination file if it exists instead of throwing an error.

This option applies only to captureWebsite.file().

preloadFunction

Type: string | Function
Default: undefined

Inject a function to be executed prior to navigation.

This can be useful for altering the JavaScript environment. For example, you could define a global method on the window, overwrite navigator.languages to change the language presented by the browser, or mock Math.random to return a fixed value.
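
A minimal sketch of the Math.random case mentioned above. The function name fixRandom and the value 0.5 are arbitrary choices for illustration. Since the function is meant to run inside the page, it is probably safest to keep it self-contained and not rely on variables from the surrounding Node.js scope:

```javascript
// Intended to run in the page before navigation, via `preloadFunction`.
// Kept self-contained: it only touches globals that exist in the page.
function fixRandom() {
	Math.random = () => 0.5; // Arbitrary fixed value for reproducible output
}

// Options object that could be passed to any of the capture methods.
const preloadOptions = {
	preloadFunction: fixRandom
};
```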

clip

Type: object

Define the screenshot's position and size (clipping region).

The position can be specified through x and y coordinates, which start from the top-left.

This can be useful when you only need a part of the page.

You can also consider using the element option when you have a CSS selector.

Note that clip is mutually exclusive with the element and fullPage options.

x - X-coordinate where the screenshot starts. Type: number
y - Y-coordinate where the screenshot starts. Type: number
width - The width of the screenshot. Type: number
height - The height of the screenshot. Type: number

For example, set the screenshot's width and height to 400 at position (0, 0):

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	clip: {
		x: 0,
		y: 0,
		width: 400,
		height: 400
	}
});

captureWebsite.devices

Type: string[]

Devices supported by the emulateDevice option.

Tips

Capturing multiple screenshots

import captureWebsite from 'capture-website';

const options = {
	width: 1920,
	height: 1000
};

const items = [
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github'],
	// …
];

await Promise.all(items.map(([url, filename]) => {
	return captureWebsite.file(url, `${filename}.png`, options);
}));

Check out filenamify-url if you need to create a filename from the URL.
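
If an extra dependency is not wanted, a filename can also be derived by hand. The sketch below is a simplified, hypothetical stand-in (urlToFilename is not part of this package or of filenamify-url) and makes no claim to handle every edge case:

```javascript
// Hypothetical, simplified stand-in for a package like filenamify-url:
// derive a filesystem-friendly name from a URL's host and path.
function urlToFilename(url, extension = 'png') {
	const {hostname, pathname} = new URL(url);

	const name = `${hostname}${pathname}`
		.replace(/[^a-z\d.-]+/gi, '-') // Replace anything outside a conservative safe set
		.replace(/^-+|-+$/g, ''); // Trim leading and trailing dashes

	return `${name}.${extension}`;
}
```

Note that query strings and fragments are ignored here, so two URLs that differ only in their query would map to the same filename.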

FAQ

I'm getting a sandbox-related error

If you get an error like No usable sandbox! or Running as root without --no-sandbox is not supported, you need to properly set up sandboxing on your Linux instance.

Alternatively, if you completely trust the content, you can disable sandboxing (strongly discouraged):

import captureWebsite from 'capture-website';

await captureWebsite.file('…', '…', {
	launchOptions: {
		args: [
			'--no-sandbox',
			'--disable-setuid-sandbox'
		]
	}
});

How is this different from your Pageres project?

The biggest difference is that Pageres supports capturing multiple screenshots in a single call and it automatically generates the filenames and writes the files. Also, when projects are popular and mature, like Pageres, it becomes harder to make drastic changes. There are many things I would change in Pageres today, but I don't want to risk making lots of breaking changes for such a large userbase before I know whether it will work out or not. So this package is a rethink of how I would have made Pageres had I started it today. I plan to bring some things back to Pageres over time.

Related

capture-website-cli - CLI for this module
pageres - A different take on screenshotting websites

capture-website's Issues

It seems I can't pass a variable as the URL

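import captureWebsite from 'capture-website';

// The URL must be a quoted string and should include the protocol
const url = 'https://google.com';

(async () => {
	await captureWebsite.file(url, 'screenshot.png', {
		launchOptions: {
			args: [
				'--no-sandbox',
				'--disable-setuid-sandbox'
			]
		}
	});
})();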

crashing on PDF files?

I have no idea what to make out of it, but it seems to crash on PDFs

screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                      ^

Error: net::ERR_ABORTED at https://************/wp-content/uploads/2020/10/ABC-Februar-2018.pdf
    at navigate (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:93:5)
    at async FrameManager.navigateFrame (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:417:16)
    at async Page.goto (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:784:16)
    at async captureWebsite (/Users/nabil/Downloads/screenshot/node_modules/capture-website/index.js:233:2)
    at async Object.module.exports.file (/Users/nabil/Downloads/screenshot/node_modules/capture-website/index.js:367:21)
    at async mapper (file:///Users/nabil/Downloads/screenshot/screenshot-complete-domain.mjs:148:5)
    at async /Users/nabil/Downloads/screenshot/node_modules/p-map/index.js:57:22

Add `scrollToElement` option

Useful if you want to capture a part of the screen that contains a certain element.

The value can either be a CSS selector string or an object like:

{
	element: '#foo',
	offsetFrom: 'top', // Accepts `top`, `right`, `bottom`, `left`
	offset: 10 // Pixels
}

The above object is not set in stone. Happy to receive feedback on it.

I'm also not sure how the object value should be implemented for the command-line interface.

IssueHunt Summary

hicom150 has been rewarded.

Backers (Total: $40.00)

issuehunt ($40.00)

Submitted pull Requests

#15 Add scrollToElement option

Tips

Checkout the Issuehunt explorer to discover more funded issues.
Need some help from other developers? Add your repositories on IssueHunt to raise funds.

IssueHunt has been backed by the following sponsors. Become a sponsor

Wrong typing for `Header` option.

Puppeteer's page.setExtraHTTPHeaders() expects a plain object, while capture-website types the headers option as the Fetch API Headers class.
https://github.com/puppeteer/puppeteer/blob/v5.4.1/docs/api.md#pagesetextrahttpheadersheaders

Images are not coming after capture

const captureWebsite = require('capture-website');

(async () => {
	await captureWebsite.file('https://www.lumyna.com', 'screenshot2.png', {
        fullPage: true,
        styles: [
            `
            ._1LYUxrhd {
                display: none;
            }
            ._3q_PoeBO {
                opacity: 0
            }
            `
        ]
	}
    );
})();

Here is my code.

Images from the website are not included in the capture.

Ad blocking

Would be awesome to have some kind of ad blocking to reduce noise in screenshots.

The best solution would be to use an existing ad blocker Chrome extension, like uBlock Origin, but it seems it's only possible to load extensions when not in headless mode ({headless: false}), so the Chrome window would show for a few seconds.

https://gist.github.com/sindresorhus/bca2f7d0c8b31205fa3c9f328d548c70
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#working-with-chrome-extensions

We could potentially make it opt-in and let the user know the downside.

Referer should be configurable

Puppeteer's page.goto options are currently hardcoded:

	await page[isHTMLContent ? 'setContent' : 'goto'](input, {
		timeout: timeoutInSeconds,
		waitUntil: 'networkidle2'
	});

But certain pages, such as news articles from ft.com, display different content depending on the request's Referer header.

I'd suggest exposing page.goto options as a configurable object.

Failed to launch chrome!

Running as root without --no-sandbox is not supported

:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported.

Disable animations?

There's no point in running animations when you just want a screenshot.

We could inject some CSS overrides like this: https://dev.webonomic.nl/how-to-disable-css-transforms-transistions-and-animations

Thoughts?

Capture webpage with device frame

Ref: sindresorhus/pageres#367
Add a device frame when capturing a screenshot.

Wait for lazy loaded elements when using the `fullPage` option

Is there a way to add an option that waits until the website is fully loaded, including lazy-loaded elements whose loading is triggered by scrolling?

Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus

IssueHunt Summary

netrules has been rewarded.

Backers (Total: $60.00)

issuehunt ($60.00)

Submitted pull Requests

#40 Wait for lazy loaded elements when using the fullPage option

Add `{javascript: false}` option

Add an option to disable scripts on the webpage

Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus

IssueHunt Summary

dirathea has been rewarded.

Backers (Total: $60.00)

issuehunt ($60.00)

Submitted pull Requests

#34 javascriptEnabled options

TimeoutError

Using latest version on Ubuntu 18.04

When trying to fetch a URL the script didn't finish running and I had to exit with Ctrl+C. When I did that, the following message was displayed:

TimeoutError: Navigation timeout of 30000 ms exceeded
    at Promise.then (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21)
    at <anonymous>
  -- ASYNC --
    at Frame.<anonymous> (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/helper.js:111:15)
    at Page.goto (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/Page.js:675:49)
    at Page.<anonymous> (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/helper.js:112:23)
    at captureWebsite (/usr/lib/node_modules/capture-website-cli/node_modules/capture-website/index.js:239:51)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:189:7)

The URL in question does exist and isn't particularly large: 4.52 MB with 30 HTTP requests.

Could you advise on how to fix? Happy to help if you point me in the right direction.

Image Size?

Is there an option to size the image after creation that I'm missing?

Google web fonts fail to load

I have tried many times to capture websites that use Google web fonts,
but the resulting .png doesn't show the fonts.

I also get an error along the lines of 'timeout in 60ms'.

I changed index.js line 351 from:
await page.waitForFunction(imagesHaveLoaded, {timeout: 60});
to:
await page.waitForFunction(imagesHaveLoaded, {timeout: timeoutInSeconds});

That works for me.

Some random feedback points

This module does exactly what I was looking for, and I'm enjoying it.
There is a hidden option, options._browser, that allows you to pass the browser instance in. This is incredibly useful for speeding up rendering, since the cost of opening a browser instance is pretty substantial (about 3s in my testing). I've hijacked this to greatly improve performance.
In relation to the above point, _keepAlive is also useful for speeding up future requests.
You're awaiting the page.close method, which is wasted time for anyone who just wants the screenshot. My assumption as the consumer is that the page will close. And even if it didn't for some reason, I still want the screenshot returned to me.

Thanks for the nice work. I ended up making myself a really simple Docker image that just takes in all available options as JSON and returns an image.

Is it possible to cancel the image saving?

Is it possible to cancel the image saving?

(async () => {
    await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
        beforeScreenshot: async (page, browser) => {
            await checkSomething(); // <-- cancel here?
        }
    });
})();
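
One possible approach, sketched here under assumptions rather than offered as a confirmed answer, is to capture to a buffer and decide afterwards whether to write it. captureUnless and shouldSkip are hypothetical names. The captureWebsite object is passed in as a parameter so the sketch does not depend on how the package is imported:

```javascript
// `captureWebsite` is the imported package, passed in as an argument.
// `shouldSkip` is a hypothetical async check that receives the Puppeteer page.
async function captureUnless(captureWebsite, url, shouldSkip) {
	let skip = false;

	const buffer = await captureWebsite.buffer(url, {
		beforeScreenshot: async page => {
			skip = await shouldSkip(page);
		}
	});

	// The screenshot is still taken; it is simply discarded when `skip` is set.
	return skip ? undefined : buffer;
}
```

The caller would then write the buffer to disk only when the result is defined. The trade-off is that the screenshot work is not avoided, only the saving.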

Specifying chrome-aws-lambda build for deployment as Lambda function on Netlify

Hi,

is there any way to use this build of chromium + puppeteer-core?

https://github.com/alixaxel/chrome-aws-lambda

The version of puppeteer that is included is over 250mb (too large to deploy in a lambda function) - but the above build comes in at a small enough footprint.

Please advise.

MaxListenersExceededWarning when trying to resolve more than 10 URLs

When I invoke code:

const urlsArray = []; // More than 10 items
const imagePromises = [];

urlsArray.forEach(item => {
	const imagePath = `./tmp/${item.id}.jpg`;
	const captureWebsitePromise = captureWebsite.file(item.url, imagePath, {
		width: 1280,
		height: 800,
		type: 'jpeg',
		overwrite: true
	});
	imagePromises.push(captureWebsitePromise);
});

Promise.all(imagePromises).then(function (imageResult) {
	// All screenshots have been written
});

I have these errors:

(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGINT listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added to [process]. Use emitter.setMaxListeners() to increase limit

Do you have an idea how to capture more than 10 pages using your module?
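
A common way to handle this kind of situation is to limit how many captures run at once. The sketch below assumes the warnings come from many captures being started simultaneously; that is a guess, not something confirmed in this thread. mapWithConcurrency is a hypothetical helper written for this example:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
	const results = new Array(items.length);
	let next = 0;

	// Each runner repeatedly claims the next unprocessed index.
	async function run() {
		while (next < items.length) {
			const index = next++;
			results[index] = await worker(items[index], index);
		}
	}

	const runners = Array.from({length: Math.min(limit, items.length)}, () => run());
	await Promise.all(runners);
	return results;
}
```

With the code from this report, the worker could be item => captureWebsite.file(item.url, `./tmp/${item.id}.jpg`, {type: 'jpeg', overwrite: true}), with a limit such as 5 (an arbitrary value to tune). If any capture rejects, the returned promise rejects as well.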

UnhandledPromiseRejectionWarning: Error: Page crashed!

Hi,

Thank you for the awesome package. This has been great so far.

I have encountered an issue on a production server: UnhandledPromiseRejectionWarning: Error: Page crashed! which is coming from Puppeteer.

From what I have read, the code below would help mitigate the issue, but I am not sure how to implement this with capture-website:

page.on('error', error => reject(error));

Could you point me in the right direction please?

Capturing website with webgl-canvas only captures initial angle

When using captureWebsite.file() on a page with a 360-degree panorama image on a WebGL canvas, only the initial angle is captured.

No matter what angle I position the panorama in, the output is always generated from the same angle.

Auto-close cookie and GDPR banners

You don't want those in your screenshot.

Relevant: microlinkhq/browserless#20

waitFor is deprecated

Ran across this today when using the delay.

waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.

Full page screenshot

Can we improve the rendering of full-page screenshots, for example by handling fixed elements, elements that lazy-load at a particular scroll position, and maybe infinite scrolling too?

IssueHunt Summary

Backers (Total: $60.00)

issuehunt ($60.00)

Submitted pull Requests

#95 workaround fullPage clipping bug by using clip

Add a version of this library to be included in another app

The basic idea is to allow this to be used as a library that lets an app take a screenshot of itself. For example, it could be used as part of a share sheet. An app might also want to offer users a full-page screenshot option (which, unlike in Firefox, is not available to most users in Chrome without opening DevTools).

beforeScreenshot is run after element bounding rectangle is determined

I noticed that if element and beforeScreenshot are both specified, whatever occurs as part of beforeScreenshot will happen after the bounding rectangle of element has already been determined. So, if for example, the size of the element was modified by actions in the beforeScreenshot function, the new size will not be determined.

It would make sense to me that order of operations should be changed slightly so that beforeScreenshot is executed prior to determining the bounding rectangle of the targeted element.

hideElements and elements options not working

I'm creating a webscraper for a Discord bot and I scrape the webpage displaying players' info on RealmEye (for a game called Realm of the Mad God).

What I'm trying to do, essentially, is to get a character's image with a screenshot (I would just scrape the url of the image, but there's no href attribute, it's built with classes when the webpage loads).

I used hideElements: ["#mys-content"] (hides ads) and elements: ".character" (to frame the picture around the character's image). But when I run the code, it doesn't hide the ads and it just takes a pic of the full page (as seen below)

Here's my code:

const captureWebsite = require('capturewebsite');

[...]

captureWebsite.file("https://www.realmeye.com/player/Vyle", "charRaw.png", {
            elements: ".character",
            hideElements: [
                "#mys-content"
            ],
            fullPage: true
        }).then(() => {
            //Handling image
        });

Allow direct HTML input

It would be useful to be able to just pass a HTML string. The problem is that we can't really overload the url argument to accept a HTML string as it would make the parsing too ambiguous. Well, we could detect <, but then we would fail if the user thought they could just pass in a string like Hello without any tags. Better to make it explicit, I think.

Puppeteer docs: https://github.com/GoogleChrome/puppeteer/blob/v1.12.0/docs/api.md#pagesetcontenthtml-options

We should also add support for this in the CLI tool.

There are two alternatives I can think of:

1. Method

.html() will return an object with a html property and a Symbol we check for internally so we can handle it correctly.

await captureWebsite.file(captureWebsite.html('<h1>🦄</h1>'), 'screenshot.png');

2. Option

await captureWebsite.file('<h1>🦄</h1>', 'screenshot.png', {inputType: 'html'});

I'm happy to consider other solutions too.

Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus

IssueHunt Summary

fisker has been rewarded.

Backers (Total: $40.00)

issuehunt ($40.00)

Submitted pull Requests

#33 Add direct HTML input support

Bundle jQuery?

When I create scripts to capture websites where I would like to do some quick changes, I would like it to be as easy as possible. While the native DOM methods have come a long way, they are still verbose and annoying.

Would be nice if the modules option already had jQuery available as an import, so I could just do:

import $ from './jquery.js';

Happy to consider other ways to handle this.

Add TypeScript definition

I'm happy to receive a PR to add a TypeScript definition.

It should follow https://github.com/sindresorhus/typescript-definition-style-guide (Make sure to read it twice)

Add `darkMode` option

To force websites that support it to be dark.

We can use the new Puppeteer feature for this: https://github.com/GoogleChrome/puppeteer/blob/v2.0.0/docs/api.md#pageemulatemediafeaturesfeatures

We should also force the default to be await page.emulateMediaFeatures([{ name: 'prefers-color-scheme', value: 'light' }]); so people get predictable results.

Should be added to https://github.com/sindresorhus/capture-website-cli too.

Fix the `fullPage` option

It's currently broken. Anyone wanting to work on this can continue where #49 left off.

Problem if page is longer than 8192px

If the rendered website page is longer than 8192 px, the .png file is not right. At 8192 px down, the top of the web page repeats.

The attachment is a heavily shrunken capture of https://api.jquery.com/,
but you can see the repeated header

Optionally connect to running Chrome instance

Thanks for creating this very useful wrapper around puppeteer, Sindre!

Currently, it launches a Chrome instance for every capture via puppeteer.launch(). Would it make sense, if we added an option to let it connect to a running Chrome instance via puppeteer.connect()? I could take a look at implementing this, if there are no obvious (or not so obvious) arguments against it. 😄
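
One way the proposal could look, as a sketch only. The option name `browserWSEndpoint` and the helper `getBrowser` are assumptions for illustration; the Puppeteer module is passed in as a parameter so the branching can be shown without loading the library.

```javascript
// Hypothetical helper: connect to a running Chrome when an endpoint is given,
// otherwise launch a fresh instance as the library does today.
// `puppeteer` is expected to expose connect() and launch(), as Puppeteer does.
async function getBrowser(puppeteer, {browserWSEndpoint, launchOptions = {}} = {}) {
	if (browserWSEndpoint) {
		return puppeteer.connect({browserWSEndpoint});
	}

	return puppeteer.launch(launchOptions);
}
```

If something like this were adopted, a connected browser would probably need `browser.disconnect()` instead of `browser.close()` at the end, so the shared Chrome instance keeps running.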

`setGeolocation` needed

page.setGeolocation({ latitude: 90, longitude: 0 });

Add `clickElement` option

An option that accepts a CSS selector to click could be useful, for example when you need to click away some kind of modal dialog.

I considered supporting multiple elements, but most real-world use-cases would require some delay between the clicks and that's too advanced for such a simple option. If some users need that, they can easily do that with the beforeScreenshot hook.

Should also be implemented for the command-line interface.
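
For the multi-click case mentioned above, a sketch of what users could do in the beforeScreenshot hook. `clickInSequence` is a hypothetical helper, the selectors are placeholders, and the 500 ms default delay is an arbitrary assumption.

```javascript
const sleep = ms => new Promise(resolve => {
	setTimeout(resolve, ms);
});

// Hypothetical helper: clicks each selector in order, pausing after every click.
// `page` is expected to be a Puppeteer Page (only page.click() is used).
async function clickInSequence(page, selectors, delayMs = 500) {
	for (const selector of selectors) {
		await page.click(selector);
		await sleep(delayMs);
	}
}

// Usage with the beforeScreenshot hook:
// await captureWebsite.file('https://example.com', 'screenshot.png', {
// 	beforeScreenshot: async page => {
// 		await clickInSequence(page, ['.cookie-banner button', '.modal .close']);
// 	}
// });
```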

capture base64

Hello,
The value returned by base64 is still a Buffer.
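
A possible workaround on the caller's side, sketched under the assumption that the result is a Node.js Buffer as described. `toBase64` is a hypothetical helper name.

```javascript
// Hypothetical workaround: normalise the result to a Base64 string.
// Strings are returned untouched, so it is safe once the bug is fixed.
function toBase64(result) {
	return Buffer.isBuffer(result) ? result.toString('base64') : result;
}

// const base64 = toBase64(await captureWebsite.base64('https://sindresorhus.com'));
```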

Q: Is there a default value for emulateDevice?

There isn't a default value mentioned for emulateDevice in the options object. Is there one, and if so, what is it?

capture-website does not work with websites that need GPU rendering

I would like to make thumbnails from pages like this one:
https://potree.org/potree/examples/vr_heidentor.html

Is there a way to get it to work?

Friendly regards

Screenshot doesn't capture all lazily loaded page elements

Example: https://medichecks.com/ the banner at the top doesn't get captured.

I've tried various combinations of timeouts, scrolling trickery, all sorts.

The carousel is dynamically added by exponea.com, a marketing platform.

I need to screenshot various sites which use technology like this so a page-specific hack isn't appropriate.

Anyone know why a simple wait timeout doesn't help?

width and height do not work

When I set the width and height options, they don't work.

Add `clip` option

Implementation of puppeteer clip functionality.
Example:

{
    clip: {
        x: 10,
        y: 30,
        height: 300,
        width: 200
    }
}

x - <number> - x-coordinate of top-left corner of clip area;
y - <number> - y-coordinate of top-left corner of clip area;
width - <number> - width of clipping area;
height - <number> - height of clipping area.

Preload script before `goto`

// Register the preload script before navigating, so it runs ahead of the page's own scripts.
if (options.preload) {
	await page.evaluateOnNewDocument(options.preload);
}

await page.goto(...);

Add `inset` option

To be able to reduce the scope/clip of the screenshot.

For example ignore 10px at the top of the page:

{
	inset: {
		top: 10
	}
}

Or select an element and also include 10px around it:

{
	element: '.foo',
	inset: {
		top: -10,
		right: -10,
		bottom: -10,
		left: -10
	}
}

Or the shorthand for all sides:

{
	element: '.foo',
	inset: -10
}

Happy to consider improvements to the option proposal.
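
To make the proposed semantics concrete, here is a sketch of how an inset could be applied to a rectangle (the viewport, or an element's bounding box). `applyInset` is a hypothetical helper, and the `{x, y, width, height}` shape is an assumption borrowed from Puppeteer's clip format.

```javascript
// Hypothetical helper: positive values shrink the rectangle, negative values grow it.
// `inset` is either a number (all sides) or an object with optional top/right/bottom/left.
function applyInset(box, inset = 0) {
	const sides = typeof inset === 'number'
		? {top: inset, right: inset, bottom: inset, left: inset}
		: {top: 0, right: 0, bottom: 0, left: 0, ...inset};

	return {
		x: box.x + sides.left,
		y: box.y + sides.top,
		width: box.width - sides.left - sides.right,
		height: box.height - sides.top - sides.bottom
	};
}

// Ignore 10px at the top of a 1280x800 viewport:
// applyInset({x: 0, y: 0, width: 1280, height: 800}, {top: 10});
```

A real implementation would also need to reject insets that leave a zero or negative width or height.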

IssueHunt Summary

krnik has been rewarded.

Backers (Total: $40.00)

issuehunt ($40.00)

Submitted pull Requests

#62 Add inset option

"Running as root without --no-sandbox" After enabling user namespace cloning.

My code:

captureWebsite.buffer('https://sindresorhus.com', 'screenshot.png').then(buffer => {
  const attachemnt = new Discord.MessageAttachment(buffer);
  message.channel.send(attachemnt)
});

fullPage is not working and causing zombie process

https://github.com/sindresorhus/capture-website/blob/master/index.js#L321
Based on this fullPage statement, I found three problems:

The while loop is never entered. bodyBoundingHeight is an object; the correct value is bodyBoundingHeight.height.
waitForNavigation resolves only when navigating to a new page, so here it never resolves and always ends in a timeout. (L331)
A failure in waitForFunction can leave a zombie process, since execution never reaches page.close(). (L351)

I will open a PR for these items
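
For the third item, a sketch of the general pattern (not the actual patch): run the capture steps inside try/finally so the page and browser are closed even when a wait times out. `withCleanup` is a hypothetical helper name.

```javascript
// Hypothetical wrapper: runs `task` and always closes the page and browser,
// even when `task` throws (for example on a waitForFunction timeout).
async function withCleanup(browser, page, task) {
	try {
		return await task(page);
	} finally {
		await page.close();
		await browser.close();
	}
}
```

The original error still propagates to the caller; only the cleanup is guaranteed.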

Can't seem to get scrollToElement to work

I'm trying to use scrollToElement, but for some reason it doesn't work. I've tried lots of different elements and ways to identify them, but I can't get it to work. Also, I'm sorry if this is the wrong place to post this, but I couldn't get help with this issue anywhere else.

offset and offsetFrom not working?

Hi,

These are the options I have:
const options = { overwrite: true, offset: 100, offsetFrom: 'left', };

It doesn't seem to make a difference; nothing changes.
Is this a known issue?

Allow saving as pdf

Multiple screenshots example fails on .map function

The example provided for capturing multiple screenshots:

const captureWebsite = require('capture-website');

const options = {
	width: 1920,
	height: 1000
};

const items = new Map([
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github'],
	// …
]);

(async () => {
	await Promise.all(items.map(({url, filename}) => {
		return captureWebsite.file(url, `${filename}.png`, options);
	}));
})();

Fails with the following error:

(node:14336) UnhandledPromiseRejectionWarning: TypeError: items.map is not a function
    at /Users/user1/workspace/screenshots/attempt5/app.js:67:27
    at Object.<anonymous> (/Users/user1/workspace/screenshots/attempt5/app.js:70:3)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:743:3)
(node:14336) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:14336) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I don't have a solution yet, but I'll update when I do.
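
A likely cause is that `items` is a `Map`, which has no `.map()` method, and that its entries are `[url, filename]` arrays, not `{url, filename}` objects. A sketch of one fix: spread the Map into an array and destructure each entry as an array. `captureAll` is a hypothetical helper, and `capture` stands in for `captureWebsite.file` so the snippet can run without launching Chrome.

```javascript
const items = new Map([
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github']
]);

// Hypothetical helper: `capture` is any function with the same
// (url, outputFilePath, options) signature as captureWebsite.file.
function captureAll(capture, items, options) {
	// [...items] turns the Map into an array of [url, filename] pairs.
	return Promise.all([...items].map(([url, filename]) => {
		return capture(url, `${filename}.png`, options);
	}));
}

// Real use:
// await captureAll(
// 	(url, path, options) => captureWebsite.file(url, path, options),
// 	items,
// 	{width: 1920, height: 1000}
// );
```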


1. Method