microlinkhq / browserless
The headless Chrome/Chromium driver on top of Puppeteer.
Home Page: https://browserless.js.org
License: MIT License
It is a more lightweight dependency and only jpeg/png support is necessary.
Need to add the ability to load ABP rules and parse them.
Related:
I'm trying to launch 100 browser instances at the same time to load-test my website. Is there an example of that using the browserless pool?
Is there any way to destroy the pool after running tasks? Right now it keeps the process running forever.
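The pool API itself isn't shown on this page, so this is a minimal sketch that avoids it. It assumes the factory/context API used in the getting-started snippet quoted on this page (createContext, destroyContext, html on the context, close on the factory). runWithLimit and loadTest are hypothetical helper names. The sketch opens many contexts inside one browser process rather than 100 separate browsers. Closing the factory at the end is what lets the Node process exit.

```javascript
// Run an array of task functions with at most `limit` of them in flight.
async function runWithLimit (tasks, limit) {
  const results = new Array(tasks.length)
  let next = 0
  async function worker () {
    // Each worker keeps pulling the next unclaimed task until none are left.
    while (next < tasks.length) {
      const index = next++
      results[index] = await tasks[index]()
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker))
  return results
}

// Hypothetical load test: `factory` is what createBrowserless() returns.
async function loadTest (factory, url, { total = 100, concurrency = 10 } = {}) {
  const task = async () => {
    const context = await factory.createContext()
    try {
      return await context.html(url)
    } finally {
      // Release every context, even when the request fails.
      await context.destroyContext()
    }
  }
  try {
    return await runWithLimit(Array.from({ length: total }, () => task), concurrency)
  } finally {
    // Shutting down the browser process lets Node exit afterwards.
    await factory.close()
  }
}
```

Usage: `loadTest(require('browserless')(), 'https://example.com', { total: 100, concurrency: 20 })`. Raising `concurrency` toward `total` approximates "all at once" but uses correspondingly more memory.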
When rendering PDFs server-side, it's often better to disable JavaScript execution. It just uses unnecessary server resources, and most of the time it isn't needed to render the page.
It would be awesome if we could pass a property to /pdf to disable JavaScript in:
https://github.com/microlinkhq/browserless/blob/master/packages/pdf/src/index.js
Puppeteer supports this out of the box: just call page.setJavaScriptEnabled(false) before navigating to a page:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetjavascriptenabledenabled
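This is a minimal sketch of the suggestion using plain Puppeteer. It assumes you already hold a Puppeteer Page. The `javascript` option name is hypothetical and is not an existing browserless /pdf parameter. setJavaScriptEnabled, goto and pdf are Puppeteer Page methods. The key detail is the ordering: JavaScript is disabled before navigation.

```javascript
// Render `url` to a PDF, optionally with JavaScript disabled.
// `javascript` is a hypothetical option; the remaining options go to page.pdf().
async function renderPdf (page, url, { javascript = true, ...pdfOptions } = {}) {
  if (!javascript) {
    // Must run before goto() so no page scripts execute at all.
    await page.setJavaScriptEnabled(false)
  }
  await page.goto(url)
  return page.pdf(pdfOptions)
}
```

Usage: `const pdf = await renderPdf(page, 'https://example.com', { javascript: false, format: 'A4' })`.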
Some suggestions:
Inspiration
My environment:
0.38.0
v12.22.6
6.14.15
I just wanted to try browserless by following the getting-started guide in the documentation, but I get this error:
const browserless = await browserlessFactory.createContext()
^^^^^
SyntaxError: await is only valid in async function
at wrapSafe (internal/modules/cjs/loader.js:915:16)
at Module._compile (internal/modules/cjs/loader.js:963:27)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
at Module.load (internal/modules/cjs/loader.js:863:32)
at Function.Module._load (internal/modules/cjs/loader.js:708:14)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
at internal/main/run_main_module.js:17:47
mkdir test
cd test
nvm use v12.22.6
npm init -y
npm install browserless puppeteer --save
touch index.js
index.js content:
const createBrowserless = require('browserless')
const termImg = require('term-img')
// First, create a browserless factory
// that will keep a singleton process running
const browserlessFactory = createBrowserless()
// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browserlessFactory.createContext()
// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('https://basecamp.com', {
device: 'iPhone 6'
})
console.log(termImg(buffer))
// After your task is done, destroy your browser context
await browserless.destroyContext()
// At the end, gracefully shut down the browser process
await browserlessFactory.close()
node index.js
RunKit: https://runkit.com/embed/a7xjdfhnz7xi
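The SyntaxError comes from using `await` at the top level of a CommonJS file. Node v12 has no top-level await; it arrived unflagged for ES modules in Node 14.8. Below is a minimal sketch of the same steps moved inside an async function. It assumes the factory/context API from the snippet above (createContext, screenshot, destroyContext) plus a close() method on the factory. takeScreenshot is a hypothetical name. createBrowserless is passed in as a parameter only so the function can be exercised without launching Chrome.

```javascript
// Same steps as the guide, wrapped in an async function so `await` is legal.
async function takeScreenshot (createBrowserless, url, options = {}) {
  const browserlessFactory = createBrowserless()
  const browserless = await browserlessFactory.createContext()
  try {
    return await browserless.screenshot(url, options)
  } finally {
    // Release the context and the browser process, even if the screenshot fails.
    await browserless.destroyContext()
    await browserlessFactory.close()
  }
}

// Usage in index.js:
// takeScreenshot(require('browserless'), 'https://basecamp.com', { device: 'iPhone 6' })
//   .then((buffer) => console.log(require('term-img')(buffer)))
//   .catch((error) => { console.error(error); process.exitCode = 1 })
```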
We are waiting until sindresorhus/fkill#34 lands.
People don't care whether they have to wait for the 'load' or the 'networkidle*' event.
Implement an auto mode that follows this behavior:
Run the 'networkidle*' and 'load' events in parallel (maybe in two tabs?).
Use the 'networkidle*' event response. Otherwise, fall back to 'load'.
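This is a minimal sketch of the proposed auto mode, under stated assumptions. `gotoWith(waitUntil)` is a hypothetical callback that navigates a separate tab with the given waitUntil value and resolves to the navigation response. Two tabs are needed because a single Puppeteer page cannot run two navigations at once. gotoAuto, idleEvent and idleTimeout are illustrative names. The idle result wins if it arrives before the deadline; otherwise the 'load' result is used.

```javascript
const TIMED_OUT = Symbol('timed out')

async function gotoAuto (gotoWith, { idleEvent = 'networkidle0', idleTimeout = 5000 } = {}) {
  // Start both navigations in parallel (each in its own tab).
  const loadPromise = gotoWith('load')
  const idlePromise = gotoWith(idleEvent)
  // Mark both as handled up front: whichever one we ignore may still reject later.
  loadPromise.catch(() => {})
  idlePromise.catch(() => {})

  let timer
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), idleTimeout)
  })

  try {
    // A failed idle navigation is treated the same as a timeout.
    const idle = await Promise.race([idlePromise.catch(() => TIMED_OUT), deadline])
    if (idle !== TIMED_OUT) return { waitUntil: idleEvent, response: idle }
  } finally {
    clearTimeout(timer)
  }

  // Fall back to the 'load' navigation.
  return { waitUntil: 'load', response: await loadPromise }
}
```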
Testing URLs
Related
When you import browserless in a TypeScript-based project, you get an error saying it has no types. I also don't see them in the repo or in @types.
Just create a new TypeScript project and import browserless:
import createBrowserless from 'browserless';
Browserless should ship types so that it can be used easily in TypeScript and help the user with IntelliSense.
It has no built-in types, and there is no information about installing them separately.
Despite passing the proxy server details in the args, browserless doesn't launch Puppeteer with my proxy server. I have tested the proxy server independently with puppeteer and puppeteer-extra, so the issue isn't the proxy server.
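import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
import createBrowserless from 'browserless'

puppeteer.use(StealthPlugin())

// The semicolon matters: without it, the IIFE below would be parsed as a call
// on the value returned by createBrowserless()
const browserless = createBrowserless();

(async function () {
  try {
    const pageHtml = await browserless.html('https://httpbin.org/ip', {
      puppeteer: puppeteer,
      // 'my-proxy-details' is a placeholder for the real proxy address
      args: ['--proxy-server=my-proxy-details', '--disable-gpu', '--single-process', '--no-zygote', '--no-sandbox', '--hide-scrollbars'],
      incognito: true
    })
    console.log(pageHtml)
  } catch (e) {
    // Rethrowing here would only produce an unhandled rejection
    console.error(e)
    process.exitCode = 1
  }
})()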
Expected: the IP address shown should be the IP address of my proxy server.
Actual: it shows the IP of my machine instead.
According to the Puppeteer documentation:
path: The file path to save the image to. The screenshot type will be inferred from file extension. If path is a relative path, then it is resolved relative to current working directory. If no path is provided, the image won't be saved to the disk.
So the temporary file could be moved out of the library.
Detect username and password from proxy server URI
const args = baseArgs.concat([`--proxy-server=${proxyUrl}`])
Automatically authenticate the page using those credentials
// if you're using an authenticated proxy
await page.authenticate({ username, password })
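This is a sketch of the detection step. It assumes the proxy is given as a URL such as http://user:pass@host:port. parseProxyUrl and authenticatePage are hypothetical helper names. The sketch uses the WHATWG URL class built into Node. It keeps the credentials out of the --proxy-server value, builds the launch args the same way as the snippet above, and hands the credentials to page.authenticate, a Puppeteer Page method.

```javascript
// Split "http://user:pass@host:port" into launch args plus credentials.
function parseProxyUrl (proxyUrl, baseArgs = []) {
  const url = new URL(proxyUrl)
  // URL keeps username/password percent-encoded, so decode them.
  const username = decodeURIComponent(url.username)
  const password = decodeURIComponent(url.password)
  // Rebuild the server address without credentials (host includes the port).
  const server = `${url.protocol}//${url.host}`
  const args = baseArgs.concat([`--proxy-server=${server}`])
  return { args, credentials: username ? { username, password } : null }
}

// Authenticate a Puppeteer page only when the proxy URL carried credentials.
async function authenticatePage (page, credentials) {
  if (credentials) await page.authenticate(credentials)
}
```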
related
Move the prepare step from the screenshot package (https://github.com/microlinkhq/browserless/blob/master/packages/screenshot/src/prepare.js) to the goto package, so the rest of the methods (like pdf) can take advantage of these query parameters.
Wishlist
/dev/shm (or similar) for userDataDir & temporary files (puppeteer/puppeteer#7243).
.destroy()
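This is a sketch of the /dev/shm wishlist item. It assumes Linux, where /dev/shm is normally a RAM-backed tmpfs mount. tempBase and createUserDataDir are hypothetical helpers, and they fall back to os.tmpdir() when /dev/shm isn't writable. The returned directory could be passed as Puppeteer's userDataDir launch option. Docker limits /dev/shm to 64 MB by default, so a larger --shm-size may be needed there.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Return the first writable candidate directory, else the OS temp dir.
function tempBase (candidates = ['/dev/shm']) {
  for (const dir of candidates) {
    try {
      fs.accessSync(dir, fs.constants.W_OK)
      return dir
    } catch {}
  }
  return os.tmpdir()
}

// Create a fresh, uniquely named directory for one browser profile.
function createUserDataDir (prefix = 'browserless-') {
  return fs.mkdtempSync(path.join(tempBase(), prefix))
}
```

Remember to delete the directory afterwards, for example with `fs.rmSync(dir, { recursive: true, force: true })` (Node 14.14+).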
Using puppeteer-extra gives me this error. This seems to happen only when trying to use browserless with puppeteer-extra.
UhandledPromiseRejectionWarning: Error: Could not find browser revision 818858. Run "PUPPETEER_PRODUCT=firefox npm install" or "PUPPETEER_PRODUCT=firefox yarn install" to download a supported Firefox browser binary.
Inspired by pretty-json.now.sh.
Saw this linked on echo.js, decided to give a couple of the examples a shot.
Copy/pasted the screenshot example from the docs into a js file, added a package.json, installed and saved browserless, then ran node on the js file. I'm assuming that would be a standard use-case for the lib.
Here's the error. I will spend some time chasing it down when I get home from work later.
throw err;
^
Error: Cannot find module 'puppeteer'
at Function.Module._resolveFilename (module.js:557:15)
at Function.Module._load (module.js:484:25)
at Module.require (module.js:606:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/home/mike/dev/test/node_modules/browserless/index.js:6:19)
at Module._compile (module.js:662:30)
at Object.Module._extensions..js (module.js:673:10)
at Module.load (module.js:575:32)
at tryModuleLoad (module.js:515:12)
at Function.Module._load (module.js:507:3)
I couldn't try it.
const browserless = require("browserless");

const saveBufferToFile = (buffer, fileName) => {
  const wstream = require("fs").createWriteStream(fileName);
  wstream.write(buffer);
  wstream.end();
};

browserless
  .screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
  .then((buffer) => {
    const fileName = "screenshot.png";
    saveBufferToFile(buffer, fileName);
    console.log("your screenshot is here:", fileName);
  });
Save a screenshot.
$ node index.js
/home/juliolima/projects/poc_browserless/index.js:10
.screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
^
TypeError: browserless.screenshot is not a function
at Object. (/home/juliolima/projects/poc_browserless/index.js:10:4)
at Module._compile (internal/modules/cjs/loader.js:1085:14)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
at Module.load (internal/modules/cjs/loader.js:950:32)
at Function.Module._load (internal/modules/cjs/loader.js:790:12)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
at internal/main/run_main_module.js:17:47
error Command failed with exit code 1.
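The TypeError indicates that the module's export has no screenshot method. In the getting-started snippet quoted earlier on this page, screenshot is called on a context obtained from createContext(). This minimal sketch assumes such a context. It also replaces the stream helper with fs.promises.writeFile, which resolves once the file is written and rejects on I/O errors instead of failing silently. saveScreenshot is a hypothetical name.

```javascript
const fs = require('fs')

// `context` is assumed to be a browserless context from factory.createContext().
async function saveScreenshot (context, url, fileName, options = { device: 'iPhone 6' }) {
  const buffer = await context.screenshot(url, options)
  // Resolves only after the whole buffer is on disk; rejects on write errors.
  await fs.promises.writeFile(fileName, buffer)
  return fileName
}
```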
🚨 You need to enable Continuous Integration on all branches of this repository. 🚨
To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because we are using your CI build statuses to figure out when to notify you about breaking changes.
Since we did not receive a CI status on the greenkeeper/initial branch, we assume that you still need to configure it.
If you have already set up a CI for this repository, you might need to check your configuration. Make sure it will run on all new branches. If you don’t want it to run on every branch, you can whitelist branches starting with greenkeeper/.
We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.
Once you have installed CI on this repository, you’ll need to re-trigger Greenkeeper’s initial Pull Request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper integration’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.
Dependency error
Attempts to run with multiple docker containers at once.
Normal operation
When running multiple instances at once, the request sent by top-user-agent fails with a 503 error.
Presumably https://techblog.willshouse.com/2012/01/03/most-common-user-agents/, which top-user-agent uses, is protected by Cloudflare.
HTTPError: Response code 503 (Service Temporarily Unavailable)
crawler.4.@DESKTOP-2 | at Request.<anonymous> (/home/webdriver/.../crawlers/node_modules/got/dist/source/as-promise/index.js:117:42)
crawler.4.@DESKTOP-2 | at processTicksAndRejections (internal/process/task_queues.js:93:5) {
crawler.4.@DESKTOP-2 | code: undefined,
crawler.4.@DESKTOP-2 | timings: {
crawler.4.@DESKTOP-2 | start: 1630547054142,
crawler.4.@DESKTOP-2 | socket: 1630547054144,
crawler.4.@DESKTOP-2 | lookup: 1630547054203,
crawler.4.@DESKTOP-2 | connect: 1630547054241,
crawler.4.@DESKTOP-2 | secureConnect: 1630547054285,
crawler.4.@DESKTOP-2 | upload: 1630547054285,
crawler.4.@DESKTOP-2 | response: 1630547054330,
crawler.4.@DESKTOP-2 | end: 1630547054335,
crawler.4.@DESKTOP-2 | error: undefined,
crawler.4.@DESKTOP-2 | abort: undefined,
crawler.4.@DESKTOP-2 | phases: {
crawler.4.@DESKTOP-2 | wait: 2,
crawler.4.@DESKTOP-2 | dns: 59,
crawler.4.@DESKTOP-2 | tcp: 38,
crawler.4.@DESKTOP-2 | tls: 44,
crawler.4.@DESKTOP-2 | request: 0,
crawler.4.@DESKTOP-2 | firstByte: 45,
crawler.4.@DESKTOP-2 | download: 5,
crawler.4.@DESKTOP-2 | total: 193
crawler.4.@DESKTOP-2 | }
crawler.4.@DESKTOP-2 | }
crawler.4.@DESKTOP-2 | }
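This is a minimal sketch of one mitigation, not a fix for the underlying block: it retries transient failures with exponential backoff. retry and its options are hypothetical names, not the API of top-user-agent or of p-retry. Retrying will not help if the blocking is persistent.

```javascript
// Retry `fn` up to `retries` extra times, waiting minTimeout * factor^attempt between tries.
async function retry (fn, { retries = 3, minTimeout = 100, factor = 2, shouldRetry = () => true } = {}) {
  let attempt = 0
  for (;;) {
    try {
      return await fn(attempt)
    } catch (error) {
      // Give up when out of attempts or when the error isn't worth retrying.
      if (attempt >= retries || !shouldRetry(error)) throw error
      const delay = minTimeout * factor ** attempt
      await new Promise((resolve) => setTimeout(resolve, delay))
      attempt++
    }
  }
}
```

With got, whose HTTPError carries the response, retries could be limited to 503s via `shouldRetry: (e) => e.response && e.response.statusCode === 503`.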
Puppeteer 2.x uses an array interface:
[
{
name: 'Blackberry PlayBook',
userAgent: 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+',
viewport: {
width: 600,
height: 1024,
deviceScaleFactor: 1,
isMobile: true,
hasTouch: true,
isLandscape: false
}
},
while Puppeteer 3.x uses an object map:
'Nexus 6 landscape': {
name: 'Nexus 6 landscape',
userAgent: 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36',
viewport: {
width: 732,
height: 412,
deviceScaleFactor: 3.5,
isMobile: true,
hasTouch: true,
isLandscape: true
}
},
The current implementation fails because concat is an array method.
That's actually a good thing: right now we convert the array into an object, so we just need to remove that conversion, since it's no longer necessary 🙂
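If both Puppeteer versions had to be supported during the transition, one option is a small normalization step. This is a sketch under that assumption: normalizeDevices is a hypothetical helper that turns the 2.x array into the 3.x name-keyed map and passes the 3.x map through unchanged.

```javascript
// Accept either the Puppeteer 2.x array or the 3.x object map of device descriptors
// and always return an object keyed by device name.
function normalizeDevices (devices) {
  if (Array.isArray(devices)) {
    return Object.fromEntries(devices.map((device) => [device.name, device]))
  }
  // Already a name -> descriptor map (Puppeteer 3.x); no conversion needed.
  return devices
}
```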
Puppeteer is going to implement a global timeout in the next breaking version:
puppeteer/puppeteer#3158
In the meantime, we need to control timeouts with enough granularity to ensure we don't waste resources.
The current implementation delegates the action outside the library, which makes it impossible to close the page on timeout:
https://github.com/Kikobeats/html-get/blob/master/src/index.js#L89
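This is a minimal sketch of keeping the timeout inside the library. It assumes `action(page)` is whatever work runs against a Puppeteer page. withTimeout and TimeoutError are hypothetical names. On timeout the page is closed with page.close(), so the browser stops spending resources on it. Other errors are rethrown without closing the page.

```javascript
class TimeoutError extends Error {}

async function withTimeout (page, action, ms) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`Timed out after ${ms}ms`)), ms)
  })
  try {
    return await Promise.race([action(page), timeout])
  } catch (error) {
    if (error instanceof TimeoutError) {
      // Close the page so whatever the action is still doing stops using resources.
      await page.close().catch(() => {})
    }
    throw error
  } finally {
    clearTimeout(timer)
  }
}
```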
Current Behavior
When I call the MQL API as shown below, result.data.insights.lighthouse.audits['final-screenshot'] in the report is returned as a base64-encoded image. However, this image shows the mobile view of the website, not the desktop view.
const mql = require('@microlink/mql');
const url = 'https://anywebsitehere.com';
const payload = {
meta: false,
insights: {
lighthouse: {
device: 'desktop',
onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo'],
},
technologies: false,
},
};
const result = await mql(url, payload);
Expected behavior/code
I'd expect the above to return the desktop variation of the image and not a mobile version.
Additional context/Screenshots
Can be provided upon request.
The current tracking implementation is a poor port of disconnect rules
We need to implement a built-in solution.
Related:
URLs for testing
Inspiration
Related
It looks like the plugin can't be loaded directly at the Puppeteer layer: puppeteer/puppeteer#1286
But I suppose we can do something at execution time.
Installing browserless via npm fails with an error:
> node scripts/postinstall
/Project/node_modules/hooman/hooman.js:14
const instance = got.extend({
^
TypeError: got.extend is not a function
at Object.<anonymous> (/Project/node_modules/hooman/hooman.js:14:22)
at Module._compile (internal/modules/cjs/loader.js:1133:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
at Module.load (internal/modules/cjs/loader.js:977:32)
at Function.Module._load (internal/modules/cjs/loader.js:877:14)
at Module.require (internal/modules/cjs/loader.js:1019:19)
at require (internal/modules/cjs/helpers.js:77:18)
at Object.<anonymous> (/Project/node_modules/top-user-agents/scripts/postinstall.js:5:13)
at Module._compile (internal/modules/cjs/loader.js:1133:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
Note: you can reproduce this in RunKit's interactive Node.js shell:
npm i -S browserless
This will be done once sindresorhus/p-retry#27 is closed.
Would love to have Typescript type definitions.
(v14.17.5) Trying to use a proxy as described in the documentation results in an error for me. Could you advise on the proper way to define it? In past issues I only found another way that isn't described in the docs (#259). Thanks
const browserless = require('browserless')
const { SocksProxyAgent } = require('socks-proxy-agent')

function testf (url) {
  (async () => {
    const browserlessFactory = browserless()
    const browser = await browserlessFactory.createContext({
      // agent: undefined
      agent: new SocksProxyAgent({
        host: 'localhost',
        port: 9050
      })
    })
    const pageContent = await browser.html(url)
    await browser.destroyContext()
    await browserlessFactory.close()
    console.log(pageContent)
  })()
}

testf('http://ip-api.com/json')
Requests made through specified proxy.
Running the script above results in the following error:
(node:29627) UnhandledPromiseRejectionWarning: Error: Request is already handled!
at Object.assert (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
at HTTPRequest.continue (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:283:21)
at PuppeteerBlocker.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:225:33)
at BlockingContext.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:65:47)
at <MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:226:52
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async HTTPRequest.finalizeInterceptions (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:132:9)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:29627) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:29627) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Note that everything works OK if I change agent to undefined.
browserless uses prism-themes as a dependency, but prism-themes doesn't have a valid main entry (it should be a valid JS file path). This caused deployment failures on Vercel. I also tried vercel/pkg, and it failed as well.
I'm deploying next-imagegen-example to Vercel with the Puppeteer provider (which uses browserless).
You can also use vercel/pkg to bundle imagegen-puppeteer-provider.
Expected: deployment should work.
Actual: deployment failed with an error saying it couldn't resolve the module prism-themes.
Suggested fix: change the require.resolve call for prism-themes to something else, maybe path.resolve.
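One common workaround, sketched here as an assumption rather than the project's fix, is to resolve the package's package.json instead of its main entry. The resolved path is the same whether or not "main" points at a real file, and its directory locates the theme files. resolvePackageDir is a hypothetical helper. It throws for packages whose "exports" map doesn't expose ./package.json. Whether a bundler such as vercel/pkg then ships the theme files is a separate question.

```javascript
const path = require('path')

// Locate a package directory via its package.json; unlike require.resolve(name),
// this does not depend on the package having a valid "main" entry.
function resolvePackageDir (name, options) {
  return path.dirname(require.resolve(`${name}/package.json`, options))
}
```

Usage: `path.join(resolvePackageDir('prism-themes'), ...)`, joined with the relative path of the theme file that is needed.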