sindresorhus / capture-website

Capture screenshots of websites

License: MIT License

capture-website's Introduction

capture-website

Capture screenshots of websites

It uses Puppeteer (Chrome) under the hood.

See capture-website-cli for the command-line tool.

Install

npm install capture-website

Note to Linux users: If you get a sandbox-related error, you need to enable system sandboxing.

Usage

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png');

API

captureWebsite.file(input, outputFilePath, options?)

Capture a screenshot of the given input and save it to the given outputFilePath.

Intermediate directories are created for you if they do not exist.

Returns a Promise<void> that resolves when the screenshot is written.

captureWebsite.buffer(input, options?)

Capture a screenshot of the given input.

Returns a Promise<Buffer> with the screenshot as binary.

captureWebsite.base64(input, options?)

Capture a screenshot of the given input.

Returns a Promise<string> with the screenshot as Base64.
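As a rough illustration of what the Base64 result can be used for, the sketch below wraps it in a data URL. The helper name toDataUrl is hypothetical and not part of capture-website; it assumes the string returned by captureWebsite.base64() is plain Base64 without any prefix.

```javascript
// Hypothetical helper (not part of capture-website): wrap a Base64
// screenshot string in a data URL so it can be used as an <img> source.
// `type` mirrors the documented `type` option ('png' | 'jpeg' | 'webp').
function toDataUrl(base64, type = 'png') {
	const allowedTypes = ['png', 'jpeg', 'webp'];

	if (!allowedTypes.includes(type)) {
		throw new TypeError(`Unsupported image type: ${type}`);
	}

	return `data:image/${type};base64,${base64}`;
}

// Usage (requires capture-website to be installed):
// const base64 = await captureWebsite.base64('https://sindresorhus.com');
// const html = `<img src="${toDataUrl(base64)}">`;
```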

input

Type: string

The URL, file URL, data URL, local file path to the website, or HTML.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'local-file.png');

options

Type: object

inputType

Type: string
Default: 'url'
Values: 'url' | 'html'

Set it to html to treat input as HTML content.

import captureWebsite from 'capture-website';

await captureWebsite.file('<h1>Awesome!</h1>', 'screenshot.png', {
	inputType: 'html'
});

width

Type: number
Default: 1280

Page width.

height

Type: number
Default: 800

Page height.

type

Type: string
Values: 'png' | 'jpeg' | 'webp'
Default: 'png'

Image type.

quality

Type: number
Values: 0...1
Default: 1

Image quality. Only for {type: 'jpeg'} and {type: 'webp'}.
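Since quality only applies to JPEG and WebP, one possible way to keep the two options consistent is to derive them from the output filename. This is a sketch only: imageOptionsFor is a hypothetical helper and the 0.8 default is an arbitrary assumption, not a library default.

```javascript
// Hypothetical helper (not part of capture-website): derive the `type`
// and `quality` options from the extension of an output filename.
function imageOptionsFor(filename, quality = 0.8) {
	const extension = filename.split('.').pop().toLowerCase();
	const type = extension === 'jpg' ? 'jpeg' : extension;

	if (!['png', 'jpeg', 'webp'].includes(type)) {
		throw new TypeError(`Unsupported extension: ${extension}`);
	}

	// `quality` is documented as JPEG/WebP only, so it is omitted for PNG.
	return type === 'png' ? {type} : {type, quality};
}

// Usage (requires capture-website to be installed):
// await captureWebsite.file(url, 'shot.jpg', imageOptionsFor('shot.jpg', 0.6));
```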

scaleFactor

Type: number
Default: 2

Scale the webpage n times.

The default is what you would get if you captured a normal screenshot on a computer with a retina (High DPI) screen.
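To make the effect of scaleFactor on the image size concrete, here is a small arithmetic sketch. It assumes a plain viewport capture (no fullPage, element, clip, or inset) and that the output pixel size is the viewport size multiplied by the scale factor; outputPixelSize is a hypothetical helper.

```javascript
// Hypothetical helper: estimate the pixel dimensions of a viewport
// screenshot. Defaults mirror the documented option defaults.
function outputPixelSize({width = 1280, height = 800, scaleFactor = 2} = {}) {
	return {
		width: Math.round(width * scaleFactor),
		height: Math.round(height * scaleFactor)
	};
}

// With the defaults this yields {width: 2560, height: 1600}.
// Passing {scaleFactor: 1} yields {width: 1280, height: 800}.
```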

emulateDevice

Type: string
Values: Devices (Use the name property)

Make it look like the screenshot was taken on the specified device.

This overrides the width, height, scaleFactor, and userAgent options.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	emulateDevice: 'iPhone X'
});

fullPage

Type: boolean
Default: false

Capture the full scrollable page, not just the viewport.

defaultBackground

Type: boolean
Default: true

Include the default white background.

Disabling this lets you capture screenshots with transparency.

timeout

Type: number (seconds)
Default: 60

The number of seconds before giving up trying to load the page.

Specify 0 to disable the timeout.

delay

Type: number (seconds)
Default: 0

The number of seconds to wait after the page finished loading before capturing the screenshot.

This can be useful if you know the page has animations that you would like to finish before capturing the screenshot.

waitForElement

Type: string

Wait for a DOM element matching the given CSS selector to appear in the page and to be visible before capturing the screenshot. It times out after options.timeout seconds.

element

Type: string

Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible. It times out after options.timeout seconds. Any actions performed as part of options.beforeScreenshot occur before this.

hideElements

Type: string[]

Hide DOM elements matching the given CSS selectors.

Can be useful for cleaning up the page.

This sets visibility: hidden on the matched elements.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	hideElements: [
		'#sidebar',
		'img.ad'
	]
});

removeElements

Type: string[]

Remove DOM elements matching the given CSS selectors.

This sets display: none on the matched elements, so it could potentially break the website layout.
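The difference between the two options can be pictured as CSS: visibility: hidden keeps the element's box in the layout, while display: none removes it. The sketch below generates roughly equivalent rules that could be passed to the styles option. The helper names and the use of !important are assumptions for illustration, not how the library applies these options internally.

```javascript
// Hypothetical helpers: build CSS rules comparable to what the
// hideElements and removeElements options are documented to set.
function cssFor(selectors, declaration) {
	return selectors
		.map(selector => `${selector} { ${declaration} }`)
		.join('\n');
}

// Keeps the element's space in the layout (like hideElements).
const hideCss = selectors => cssFor(selectors, 'visibility: hidden !important;');

// Takes the element out of the layout (like removeElements).
const removeCss = selectors => cssFor(selectors, 'display: none !important;');

// Usage (requires capture-website to be installed):
// await captureWebsite.file(url, 'screenshot.png', {styles: [hideCss(['#sidebar'])]});
```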

clickElement

Type: string

Click the DOM element matching the given CSS selector.

scrollToElement

Type: string | object

Scroll to the DOM element matching the given CSS selector.

element

Type: string

A CSS selector.

offsetFrom

Type: string
Values: 'top' | 'right' | 'bottom' | 'left'

Offset origin.

offset

Type: number

Offset in pixels.
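Note that offsetFrom and offset are properties of the scrollToElement object, not top-level options. The sketch below builds such an object and validates it. scrollTarget is a hypothetical helper, and the 'top' and 0 fallbacks are assumptions, since no defaults are documented above.

```javascript
// Hypothetical helper: build the object form of `scrollToElement`.
const OFFSET_ORIGINS = ['top', 'right', 'bottom', 'left'];

function scrollTarget(element, offsetFrom = 'top', offset = 0) {
	if (typeof element !== 'string' || element === '') {
		throw new TypeError('element must be a non-empty CSS selector');
	}

	if (!OFFSET_ORIGINS.includes(offsetFrom)) {
		throw new TypeError(`offsetFrom must be one of: ${OFFSET_ORIGINS.join(', ')}`);
	}

	if (!Number.isFinite(offset)) {
		throw new TypeError('offset must be a number of pixels');
	}

	return {element, offsetFrom, offset};
}

// Usage (requires capture-website to be installed):
// await captureWebsite.file(url, 'screenshot.png', {
// 	scrollToElement: scrollTarget('#footer', 'top', 100)
// });
```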

disableAnimations

Type: boolean
Default: false

Disable CSS animations and transitions.

blockAds

Type: boolean
Default: true

Ad blocking.

isJavaScriptEnabled

Type: boolean
Default: true

Whether JavaScript on the website should be executed.

This does not affect the scripts and modules options.

modules

Type: string[]

Inject JavaScript modules into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	modules: [
		'https://sindresorhus.com/remote-file.js',
		'local-file.js',
		`
		document.body.style.backgroundColor = 'red';
		`
	]
});

scripts

Type: string[]

Same as the modules option, but instead injects the code as <script> instead of <script type="module">. Prefer the modules option whenever possible.

styles

Type: string[]

Inject CSS styles into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	styles: [
		'https://sindresorhus.com/remote-file.css',
		'local-file.css',
		`
		body {
			background-color: red;
		}
		`
	]
});

headers

Type: object
Default: {}

Set custom HTTP headers.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	headers: {
		'x-powered-by': 'https://github.com/sindresorhus/capture-website'
	}
});

userAgent

Type: string

Set a custom user agent.

cookies

Type: Array<string | object>

Set cookies in browser string format or object format.

Tip: Go to the website you want a cookie for and copy-paste it from DevTools.

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	cookies: [
		// This format is useful for when you copy it from the browser
		'id=unicorn; Expires=Wed, 21 Oct 2018 07:28:00 GMT;',

		// This format is useful for when you have to manually create a cookie
		{
			name: 'id',
			value: 'unicorn',
			expires: Math.round(new Date('2018-10-21').getTime() / 1000)
		}
	]
});

authentication

Type: object

Credentials for HTTP authentication.

username

Type: string

password

Type: string

beforeScreenshot

Type: Function

The specified function is called right before the screenshot is captured, as well as before any bounding rectangle is calculated as part of options.element. It receives the Puppeteer Page instance as the first argument and the browser instance as the second argument. This gives you a lot of power to do custom stuff. The function can be async.

Note: Make sure to not call page.close() or browser.close().

import captureWebsite from 'capture-website';
import checkSomething from './check-something.js';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	beforeScreenshot: async (page, browser) => {
		await checkSomething();
		await page.click('#activate-button');
		await page.waitForSelector('.finished');
	}
});

debug

Type: boolean
Default: false

Show the browser window so you can see what it's doing, redirect page console output to the terminal, and slow down each Puppeteer operation.

Note: This overrides launchOptions with {headless: false, slowMo: 100}.

darkMode

Type: boolean
Default: false

Emulate preference of dark color scheme (prefers-color-scheme).

inset

Type: object | number
Default: 0

Inset the bounding box of the screenshot.

Accepts an object {top?: number; right?: number; bottom?: number; left?: number} or a number as a shorthand for all directions.

Positive values, for example inset: 10, will decrease the size of the screenshot. Negative values, for example inset: {left: -10}, will increase the size of the screenshot.

Note: This option is ignored if the fullPage option is set to true. Can be combined with the element option.

Note: When the width or height of the screenshot is equal to 0 an error is thrown.
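The arithmetic behind inset can be sketched as follows. This is an illustration under assumptions rather than the library's implementation: applyInset is a hypothetical helper, the box shape {x, y, width, height} is assumed, and treating negative sizes as errors (in addition to the documented zero case) is my own choice.

```javascript
// Hypothetical helper: apply an inset (number or per-side object)
// to a bounding box and return the resulting box.
function applyInset(box, inset = 0) {
	const sides = typeof inset === 'number'
		? {top: inset, right: inset, bottom: inset, left: inset}
		: {top: 0, right: 0, bottom: 0, left: 0, ...inset};

	const result = {
		x: box.x + sides.left,
		y: box.y + sides.top,
		width: box.width - sides.left - sides.right,
		height: box.height - sides.top - sides.bottom
	};

	// Positive insets shrink the box; reject boxes with nothing left.
	if (result.width <= 0 || result.height <= 0) {
		throw new Error('Inset leaves no area to capture');
	}

	return result;
}
```

For a 100 by 50 element at (10, 10), inset: -10 gives a 120 by 70 box at (0, 0), while inset: {top: 15} on a 1280 by 800 viewport gives a 1280 by 785 box starting at y = 15.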

Example: Include 10 pixels around the element.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'screenshot.png', {
	element: '.logo',
	inset: -10
});

Example: Ignore 15 pixels from the top of the viewport.

import captureWebsite from 'capture-website';

await captureWebsite.file('index.html', 'screenshot.png', {
	inset: {
		top: 15
	}
});

launchOptions

Type: object
Default: {headless: 'new'}

Options passed to puppeteer.launch().

Note: Some of the launch options are overridden by the debug option.

overwrite

Type: boolean
Default: false

Overwrite the destination file if it exists instead of throwing an error.

This option applies only to captureWebsite.file().

preloadFunction

Type: string | Function
Default: undefined

Inject a function to be executed prior to navigation.

This can be useful for altering the JavaScript environment. For example, you could define a global method on the window, overwrite navigator.languages to change the language presented by the browser, or mock Math.random to return a fixed value.
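A minimal sketch of the Math.random case mentioned above. The function name mockRandom and the fixed value 0.5 are arbitrary choices. The function is kept self-contained on the assumption that it is run inside the page rather than in Node.js, so it should not rely on variables from the surrounding scope.

```javascript
// Example preload function: make Math.random deterministic.
// It references no outer variables so that it can run on its own in the page.
function mockRandom() {
	Math.random = () => 0.5;
}

// Usage (requires capture-website to be installed):
// await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
// 	preloadFunction: mockRandom
// });
```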

clip

Type: object

Define the screenshot's position and size (clipping region).

The position can be specified through x and y coordinates, which start from the top-left.

This can be useful when you only need a part of the page.

You can also consider using the element option when you have a CSS selector.

Note that clip is mutually exclusive with the element and fullPage options.

  • x - X-coordinate where the screenshot starts. Type: number
  • y - Y-coordinate where the screenshot starts. Type: number
  • width - The width of the screenshot. Type: number
  • height - The height of the screenshot. Type: number
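The constraints on clip described above can be checked up front before calling the library. The following is a sketch: assertClipUsage is a hypothetical helper, and the exact error messages are made up for illustration.

```javascript
// Hypothetical helper: validate the documented constraints on `clip`.
function assertClipUsage(options = {}) {
	if (options.clip === undefined) {
		return;
	}

	// `clip` is documented as mutually exclusive with `element` and `fullPage`.
	if (options.element !== undefined || options.fullPage) {
		throw new Error('clip cannot be combined with element or fullPage');
	}

	// All four fields are documented as numbers.
	for (const key of ['x', 'y', 'width', 'height']) {
		if (typeof options.clip[key] !== 'number') {
			throw new TypeError(`clip.${key} must be a number`);
		}
	}
}
```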

For example, set the screenshot's width and height to 400 at position (0, 0):

import captureWebsite from 'capture-website';

await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
	clip: {
		x: 0,
		y: 0,
		width: 400,
		height: 400
	}
});

captureWebsite.devices

Type: string[]

Devices supported by the emulateDevice option.

Tips

Capturing multiple screenshots

import captureWebsite from 'capture-website';

const options = {
	width: 1920,
	height: 1000
};

const items = [
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github'],
	// …
];

await Promise.all(items.map(([url, filename]) => {
	return captureWebsite.file(url, `${filename}.png`, options);
}));

Check out filenamify-url if you need to create a filename from the URL.
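When the list of pages is long, starting every capture at once can be heavy. One possible approach is to work through the list in batches, sketched below. chunk and captureInBatches are hypothetical helpers, the batch size of 4 is arbitrary, and capture stands in for a function that calls captureWebsite.file().

```javascript
// Hypothetical helper: split an array into consecutive groups of `size`.
function chunk(items, size) {
	const chunks = [];

	for (let index = 0; index < items.length; index += size) {
		chunks.push(items.slice(index, index + size));
	}

	return chunks;
}

// Hypothetical helper: run `capture(url, file)` for each [url, filename]
// pair, waiting for one batch to finish before starting the next.
async function captureInBatches(items, capture, batchSize = 4) {
	for (const batch of chunk(items, batchSize)) {
		await Promise.all(batch.map(([url, filename]) => capture(url, `${filename}.png`)));
	}
}

// Usage (requires capture-website to be installed):
// await captureInBatches(items, (url, file) => captureWebsite.file(url, file, options));
```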

FAQ

I'm getting a sandbox-related error

If you get an error like No usable sandbox! or Running as root without --no-sandbox is not supported, you need to properly set up sandboxing on your Linux instance.

Alternatively, if you completely trust the content, you can disable sandboxing (strongly discouraged):

import captureWebsite from 'capture-website';

await captureWebsite.file('…', '…', {
	launchOptions: {
		args: [
			'--no-sandbox',
			'--disable-setuid-sandbox'
		]
	}
});

How is this different from your Pageres project?

The biggest difference is that Pageres supports capturing multiple screenshots in a single call and it automatically generates the filenames and writes the files. Also, when projects are popular and mature, like Pageres, it becomes harder to make drastic changes. There are many things I would change in Pageres today, but I don't want to risk making lots of breaking changes for such a large userbase before I know whether it will work out or not. So this package is a rethink of how I would have made Pageres had I started it today. I plan to bring some things back to Pageres over time.

capture-website's People

Contributors

bendingbender, brandon93s, cbbfcd, detachhead, dirathea, fisker, guoyunhe, hicom150, iamkhalidbashir, jaulz, jopemachine, kikobeats, krische, mre, richienb, ropel, sgtrusty, sindresorhus, timoschwarzer, vlinder, zpdldhkdl

capture-website's Issues

Hey it seems i cant put variable as url

var url = google.com
(async () => {
	await captureWebsite.file(url, 'screenshot.png', {
		launchOptions: {
			args: [
				'--no-sandbox',
				'--disable-setuid-sandbox'
			],
		}
	});

crashing on PDF files?

I have no idea what to make out of it, but it seems to crash on PDFs

screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                      ^

Error: net::ERR_ABORTED at https://************/wp-content/uploads/2020/10/ABC-Februar-2018.pdf
    at navigate (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:93:5)
    at async FrameManager.navigateFrame (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:417:16)
    at async Page.goto (/Users/nabil/Downloads/screenshot/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:784:16)
    at async captureWebsite (/Users/nabil/Downloads/screenshot/node_modules/capture-website/index.js:233:2)
    at async Object.module.exports.file (/Users/nabil/Downloads/screenshot/node_modules/capture-website/index.js:367:21)
    at async mapper (file:///Users/nabil/Downloads/screenshot/screenshot-complete-domain.mjs:148:5)
    at async /Users/nabil/Downloads/screenshot/node_modules/p-map/index.js:57:22

Add `scrollToElement` option

Useful if you want to capture a part of the screen that contains a certain element.

The value can either be a CSS selector string or an object like:

{
	element: '#foo',
	offsetFrom: 'top', // Accepts `top`, `right`, `bottom`, `left`
	offset: 10 // Pixels
}

The above object is not set in stone. Happy to receive feedback on it.

I'm also not sure how the object value should be implemented for the command-line interface.


IssueHunt Summary

hicom150 has been rewarded.

Backers (Total: $40.00)

Images are not coming after capture

const captureWebsite = require('capture-website');

(async () => {
	await captureWebsite.file('https://www.lumyna.com', 'screenshot2.png', {
        fullPage: true,
        styles: [
            `
            ._1LYUxrhd {
                display: none;
            }
            ._3q_PoeBO {
                opacity: 0
            }
            `
        ]
	}
    );
})();

Here is my code.

Images from the website are not included in the capture.

Ad blocking

Would be awesome to have some kind of ad blocking to reduce noise in screenshots.

The best solution would be to use an existing ad blocker Chrome extension, like uBlock Origin, but it seems it's only possible to load extension when not in headless mode ({headless: false}), so the Chrome window would show for a few seconds.

https://gist.github.com/sindresorhus/bca2f7d0c8b31205fa3c9f328d548c70
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#working-with-chrome-extensions

We could potentially make it opt-in and let the user know the downside.

Referer should be configurable

Puppeteer's page.goto options are currently hardcoded:

	await page[isHTMLContent ? 'setContent' : 'goto'](input, {
		timeout: timeoutInSeconds,
		waitUntil: 'networkidle2'
	});

But certain pages, such as news articles from ft.com, display different content depending on the request's Referer header.

I'd suggest exposing page.goto options as a configurable object.

Failed to launch chrome!

Running as root without --no-sandbox is not supported

:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported.

Wait for lazy loaded elements when using the `fullPage` option

Is there a way to add an option that waits until the website is fully loaded, including lazy-loaded elements whose loading is triggered by scrolling?


Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus


IssueHunt Summary

netrules has been rewarded.

Backers (Total: $60.00)

Add `{javascript: false}` option

Add an option to disable scripts on the webpage


Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus


IssueHunt Summary

dirathea has been rewarded.

Backers (Total: $60.00)

TimeoutError

Using latest version on Ubuntu 18.04

When trying to fetch a URL the script didn't finish running and I had to exit with Ctrl+C. When I did that, the following message was displayed:

TimeoutError: Navigation timeout of 30000 ms exceeded
    at Promise.then (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21)
    at <anonymous>
  -- ASYNC --
    at Frame.<anonymous> (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/helper.js:111:15)
    at Page.goto (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/Page.js:675:49)
    at Page.<anonymous> (/usr/lib/node_modules/capture-website-cli/node_modules/puppeteer/lib/helper.js:112:23)
    at captureWebsite (/usr/lib/node_modules/capture-website-cli/node_modules/capture-website/index.js:239:51)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:189:7)

The URL in question does exist and isn't particularly large: 4.52 MB with 30 HTTP requests.

Could you advise on how to fix? Happy to help if you point me in the right direction.

Image Size?

Is there an option to size the image after creation that I'm missing?

google webfont cannot load

I tried many times to take a screenshot of a site that uses Google web fonts, but the resulting .png didn't show the fonts.

I also ran into an error along the lines of 'timeout in 60ms'.

I changed line 351 of index.js from:
await page.waitForFunction(imagesHaveLoaded, {timeout: 60});
to:
await page.waitForFunction(imagesHaveLoaded, {timeout: timeoutInSeconds});

That works for me.

Some random feedback points

  • This module does exactly what I was looking for, I'm enjoying it
  • there is a hidden option of options._browser that allows you to pass the browser instance in. This is incredibly useful for speeding up rendering since the cost of opening a browser instance is pretty substantial (about ~3s in my testing). I've hijacked this to greatly improve performance
  • in relation to the above point, _keepAlive is also useful for speeding up future requests.
  • you're awaiting the page.close method, which is wasted time for anyone just wanting the screenshot. My assumption as the consumer is that the page will close. And even if it didn't for some reason, I still want the screenshot returned to me.

Thanks for the nice work. I ended up making myself a really simple Docker image that just takes in all available options as JSON and returns an image.

Is it possible to cancel the image saving?

Is it possible to cancel the image saving?

(async () => {
    await captureWebsite.file('https://sindresorhus.com', 'screenshot.png', {
        beforeScreenshot: async (page, browser) => {
            await checkSomething(); // <----- here?
        }
    });
})();

MaxListenersExceededWarning when trying to resolve more than 10 URLs

When I invoke code:

const urlsArray = [] // more than 10 items
let imagePromises = []
urlsArray.forEach(item => {
    const imagePath = `./tmp/${item.id}.jpg`
    const captureWebsitePromise = captureWebsite.file(item.url, imagePath, {
      width: 1280,
      height: 800,
      type: "jpeg",
      overwrite: true
    })
    imagePromises.push(captureWebsitePromise)
})
Promise.all(imagePromises).then(function(imageResult) {
})

I have these errors:

(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGINT listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added to [process]. Use emitter.setMaxListeners() to increase limit
(node:9535) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added to [process]. Use emitter.setMaxListeners() to increase limit

Do you have an idea how to capture more than 10 pages using your module?

UnhandledPromiseRejectionWarning: Error: Page crashed!

Hi,

Thank you for the awesome package. This has been great so far.

I have encountered an issue on a production server: UnhandledPromiseRejectionWarning: Error: Page crashed! which is coming from Puppeteer.

From what I have read, the code below would help mitigate the issue, but I am not sure how to implement this with capture-website:

page.on('error', error => reject(error));

Could you point me in the right direction please?

Capturing website with webgl-canvas only captures initial angle

When using the captureWebsite.file function on a page with a 360-degree panorama image on a WebGL canvas, it only captures the initial 'angle'.

No matter what angle I position the panorama in, it always generates the output from the same angle.

waitFor is deprecated

Ran across this today when using the delay.

waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.

Full page screenshot

Can we improve the rendering of full-page screenshots, i.e. handle fixed elements as well as elements that lazy-load at a particular scroll position, and maybe infinite scrolling too?


IssueHunt Summary

Backers (Total: $60.00)

Add a version of this library to be included in another app

The basic idea of this is to allow this to be used as a library to let an app take a screenshot of itself. For example, it could be used as a part of a share sheet. An app might also want to allow users to have a full-page screenshot option (which is not available to most users in Chrome without opening DevTools, unlike Firefox).

beforeScreenshot is run after element bounding rectangle is determined

I noticed that if element and beforeScreenshot are both specified, whatever occurs as part of beforeScreenshot will happen after the bounding rectangle of element has already been determined. So, if for example, the size of the element was modified by actions in the beforeScreenshot function, the new size will not be determined.

It would make sense to me that order of operations should be changed slightly so that beforeScreenshot is executed prior to determining the bounding rectangle of the targeted element.

hideElements and elements options not working

I'm creating a webscraper for a Discord bot and I scrape the webpage displaying players' info on RealmEye (for a game called Realm of the Mad God).

What I'm trying to do, essentially, is to get a character's image with a screenshot (I would just scrape the url of the image, but there's no href attribute, it's built with classes when the webpage loads).

I used hideElements: ["#mys-content"] (hides ads) and elements: ".character" (to frame the picture around the character's image). But when I run the code, it doesn't hide the ads and it just takes a pic of the full page (as seen below)

image

Here's my code:

const captureWebsite = require('capturewebsite');

[...]

captureWebsite.file("https://www.realmeye.com/player/Vyle", "charRaw.png", {
            elements: ".character",
            hideElements: [
                "#mys-content"
            ],
            fullPage: true
        }).then(() => {
            //Handling image
        });

Allow direct HTML input

It would be useful to be able to just pass a HTML string. The problem is that we can't really overload the url argument to accept a HTML string as it would make the parsing too ambiguous. Well, we could detect <, but then we would fail if the user thought they could just pass in a string like Hello without any tags. Better to make it explicit, I think.

Puppeteer docs: https://github.com/GoogleChrome/puppeteer/blob/v1.12.0/docs/api.md#pagesetcontenthtml-options

We should also add support for this in the CLI tool.

There are two alternatives I can think of:

1. Method

.html() will return an object with a html property and a Symbol we check for internally so we can handle it correctly.

await captureWebsite.file(captureWebsite.html('<h1>🦄</h1>'), 'screenshot.png');

2. Option

await captureWebsite.file('<h1>🦄</h1>', 'screenshot.png', {inputType: 'html'});

I'm happy to consider other solutions too.


Note: This issue has a bounty, so it's expected that you are an experienced programmer and that you give it your best effort if you intend to tackle this. Don't forget, if applicable, to add tests, docs (double-check for typos), and update TypeScript definitions. And don't be sloppy. Review your own diff multiple times and try to find ways to improve and simplify your code. Instead of asking too many questions, present solutions. The point of an issue bounty is to reduce my workload, not give me more. Include a 🦄 in your PR description to indicate that you've read this. Thanks for helping out 🙌 - @sindresorhus


IssueHunt Summary

fisker has been rewarded.

Backers (Total: $40.00)

Bundle jQuery?

When I create scripts to capture websites where I would like to do some quick changes, I would like it to be as easy as possible. While the native DOM methods have come a long way, they are still verbose and annoying.

Would be nice if the modules option already had jQuery available as an import, so I could just do:

import $ from './jquery.js';

Happy to consider other ways to handle this.

Optionally connect to running Chrome instance

Thanks for creating this very useful wrapper around puppeteer, Sindre!

Currently, it launches a Chrome instance for every capture via puppeteer.launch(). Would it make sense if we added an option to let it connect to a running Chrome instance via puppeteer.connect()? I could take a look at implementing this, if there are no obvious (or not so obvious) arguments against it. 😄

Add `clickElement` option

Could be useful with an option that accepts a CSS selector to click. In case you need to click away some kind of modal dialog.

I considered supporting multiple elements, but most real-world use-cases would require some delay between the clicks and that's too advanced for such a simple option. If some users need that, they can easily do it with the beforeScreenshot hook.

Should also be implemented for the command-line interface.

Screenshot doesnt capture all lazily loaded page elements

Example: https://medichecks.com/ the banner at the top doesn't get captured.

I've tried various combination of timeouts, scrolling trickery, all sorts.

The carousel is dynamically added by exponea.com, a marketing platform.

I need to screenshot various sites which use technology like this so a page-specific hack isn't appropriate.

Anyone know why a simple wait timeout doesn't help?

Add `clip` option

Implementation of puppeteer clip functionality.
Example:

{
    clip: {
        x: 10,
        y: 30,
        height: 300,
        width: 200
    }
}

  • x - <number> - x-coordinate of top-left corner of clip area;
  • y - <number> - y-coordinate of top-left corner of clip area;
  • width - <number> - width of clipping area;
  • height - <number> - height of clipping area.

Add `inset` option

To be able to reduce the scope/clip of the screenshot.

For example ignore 10px at the top of the page:

{
	inset: {
		top: 10
	}
}

Or select an element and also include 10px around it:

{
	element: '.foo',
	inset: {
		top: -10,
		right: -10,
		bottom: -10,
		left: -10
	}
}

Or the shorthand for all sides:

{
	element: '.foo',
	inset: -10
}

Happy to consider improvements to the option proposal.


IssueHunt Summary

krnik has been rewarded.

Backers (Total: $40.00)

fullPage is not working and causing zombie process

https://github.com/sindresorhus/capture-website/blob/master/index.js#L321
Based on this fullPage statement, I found 3 problems:

  1. It never enters the while loop. bodyBoundingHeight is an object; it should be bodyBoundingHeight.height.
  2. waitForNavigation resolves when navigating to a new page, so here it never resolves and always ends in a timeout. (L331)
  3. A failure in waitForFunction can leave a zombie process, since page.close() is never reached. (L351)

I will open a PR for these items.

Can't seem to get scrollToElement to work

I'm trying to use scrollToElement, but for some reason it doesn't work. I've tried using lots of different elements and ways to identify them, but can't get it to work. Also, I'm sorry if this is the wrong place to put this, but I couldn't get help with this issue anywhere else.

offset and offsetFrom not working?

Hi,

These are the options that I have:
const options = { overwrite: true, offset: 100, offsetFrom: 'left', };

It doesn't seem to make a difference; nothing changes.
Is this a known thing?

Multiple screenshots example fails on .map function

The example provided for capturing multiple screenshots:

const captureWebsite = require('capture-website');

const options = {
	width: 1920,
	height: 1000
};

const items = new Map([
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github'],
	// …
]);

(async () => {
	await Promise.all(items.map(({url, filename}) => {
		return captureWebsite.file(url, `${filename}.png`, options);
	}));
})();

Fails with the following error:

(node:14336) UnhandledPromiseRejectionWarning: TypeError: items.map is not a function
    at /Users/user1/workspace/screenshots/attempt5/app.js:67:27
    at Object.<anonymous> (/Users/user1/workspace/screenshots/attempt5/app.js:70:3)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:743:3)
(node:14336) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:14336) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I don't have a solution yet, but I'll update when I do.
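For what it's worth, the TypeError is consistent with the fact that a Map has no .map method, and that its entries are [key, value] arrays rather than {url, filename} objects. One possible workaround, sketched here, is to spread the Map into an array and destructure each entry as an array. captureAll is a hypothetical helper and capture stands in for a function that calls captureWebsite.file().

```javascript
const items = new Map([
	['https://sindresorhus.com', 'sindresorhus'],
	['https://github.com', 'github']
]);

// Hypothetical helper: `capture(url, file)` stands in for a call such as
// captureWebsite.file(url, file, options).
function captureAll(entries, capture) {
	// Spreading a Map yields an array of [key, value] pairs, which has .map.
	return Promise.all([...entries].map(([url, filename]) => {
		return capture(url, `${filename}.png`);
	}));
}

// Usage (requires capture-website to be installed):
// await captureAll(items, (url, file) => captureWebsite.file(url, file, options));
```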
