microlinkhq / browserless

The headless Chrome/Chromium driver on top of Puppeteer.

Home Page: https://browserless.js.org

License: MIT License

JavaScript 96.67% HTML 3.33%
puppeteer puppeteer-core headless-chrome headless-chromium

browserless's People

Contributors

adityawankhede5, catpea, ctalkington-brado, dependabot-preview[bot], dependabot[bot], eddymens, greenkeeper[bot], imgbot[bot], imgbotapp, kikobeats, lucleray, marcelscruz, pierresaid, remusao, staabm, timkor, tripss

browserless's Issues

Example of pool?

I'm trying to launch 100 browser instances at the same time to load test my website. Is there an example of doing that with the browserless pool?
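
A minimal sketch of one way to approach this without the pool package: run a fixed number of workers that each open a browser context, load the page, and destroy the context, all against a single browser process. It assumes the factory-style API quoted in the getting-started sample on this page (createContext, destroyContext) and that close() is available on the factory. The names runLoadTest, total and concurrency are hypothetical and not part of browserless.

```javascript
// Hypothetical helper: visits `url` `total` times with at most `concurrency`
// browser contexts open at once, all sharing one browser process.
// `createBrowserless` is injected; in a real script it would be
// `require('browserless')`.
async function runLoadTest (createBrowserless, url, { total = 100, concurrency = 10 } = {}) {
  const factory = createBrowserless()
  const results = []
  let next = 0

  // Each worker keeps pulling the next job index until none are left.
  async function worker () {
    while (next < total) {
      const job = next++
      const context = await factory.createContext()
      try {
        const html = await context.html(url)
        results[job] = { ok: true, bytes: html.length }
      } catch (error) {
        results[job] = { ok: false, error: error.message }
      } finally {
        await context.destroyContext()
      }
    }
  }

  try {
    const workers = Array.from({ length: Math.min(concurrency, total) }, worker)
    await Promise.all(workers)
  } finally {
    // Assumed: closing the factory stops the browser process so Node can exit.
    await factory.close()
  }
  return results
}
```

Opening 100 contexts truly at once is mostly limited by memory on the machine running the test, so a bounded concurrency value is usually the safer starting point.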

How to destroy pool?

Is there any way to destroy the pool after the tasks have run? Right now it keeps the process running forever.

[goto] Disable JavaScript

When rendering PDFs server side, it's often better to disable JavaScript execution. It uses server resources unnecessarily, and most of the time it isn't needed to render the page.

It would be awesome if we could pass a property to /pdf where we can disable JavaScript in:
https://github.com/microlinkhq/browserless/blob/master/packages/pdf/src/index.js

Puppeteer supports this out of the box if you just use page.setJavaScriptEnabled(false) before navigation to a page:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetjavascriptenabledenabled
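
As a rough illustration of the ordering described above, here is a sketch that uses Puppeteer's Page API directly (setJavaScriptEnabled, goto, pdf). It is not the browserless pdf package, and the function name renderPdfWithoutJavaScript is hypothetical.

```javascript
// `page` is a Puppeteer Page. JavaScript has to be disabled before
// navigation for the setting to apply to the page being loaded.
async function renderPdfWithoutJavaScript (page, url, pdfOptions = {}) {
  await page.setJavaScriptEnabled(false)
  await page.goto(url, { waitUntil: 'load' })
  return page.pdf(pdfOptions)
}
```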

Sample from documentation site not working out of the box

Prerequisites

  • I'm using the last version.
  • My node version is the same as declared as package.json.

Subject of the issue

My environment:

  • nvm: 0.38.0
  • node: v12.22.6
  • npm: 6.14.15

I just want to test browserless by following the getting started guide from the documentation, but I am getting this error:

const browserless = await browserlessFactory.createContext()
                    ^^^^^

SyntaxError: await is only valid in async function
    at wrapSafe (internal/modules/cjs/loader.js:915:16)
    at Module._compile (internal/modules/cjs/loader.js:963:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47

Steps to reproduce

Installing

mkdir test
cd test
nvm use v12.22.6
npm init -y
npm install browserless puppeteer --save

Sample project

touch index.js
index.js content
const createBrowserless = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory 
// that it will keep a singleton process running
const browserlessFactory = createBrowserless()

// After that, you can create as many browser context
// as you need. The browser contexts won't share cookies/cache 
// with other browser contexts.
const browserless = await browserlessFactory.createContext()

// Perform the action you want, e.g., getting the HTML markup
const buffer = await browserless.screenshot('https://basecamp.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shutdown the browser process
await browserless.close()
Running
node index.js

RunKit: https://runkit.com/embed/a7xjdfhnz7xi
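
The SyntaxError comes from using await at the top level of a CommonJS script, which Node 12 does not allow. A minimal sketch of the same sample wrapped in an async function follows. The name main and the injected createBrowserless parameter are only for illustration, and calling close() on the factory rather than on the context is an assumption about the v9 API.

```javascript
// `createBrowserless` stands in for `require('browserless')`; it is passed in
// so the wrapper itself has no hard dependency.
async function main (createBrowserless) {
  const browserlessFactory = createBrowserless()
  const browserless = await browserlessFactory.createContext()
  try {
    return await browserless.screenshot('https://basecamp.com', { device: 'iPhone 6' })
  } finally {
    // Clean up even if the screenshot fails.
    await browserless.destroyContext()
    // Assumed: `close()` lives on the factory, not on the context.
    await browserlessFactory.close()
  }
}

// Usage (requires `browserless` and `puppeteer` to be installed):
// main(require('browserless'))
//   .then(buffer => console.log(`${buffer.length} bytes`))
//   .catch(error => { console.error(error); process.exitCode = 1 })
```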

[goto] implement `waitUntil: 'auto'`

People don't care whether they need to wait for the 'load' or a 'networkidle*' event.

Implement an auto mode with the following behavior:

  • Run the 'networkidle*' and 'load' waits in parallel (maybe two tabs?).
  • Add a timeout to the network event to prevent waiting forever (10s?).
  • If the timeout is not exceeded, use the 'networkidle*' event response. Otherwise, fall back to 'load'.
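
The three steps above can be sketched with plain promises. This is only one possible reading of the proposal: it takes two caller-supplied functions that start the 'load' and 'networkidle*' waits (for example on one page, or on two tabs) and does not touch Puppeteer itself. The names waitAuto, waitForLoad, waitForNetworkIdle and timeoutMs are hypothetical.

```javascript
const TIMED_OUT = Symbol('timed out')

// Starts both waits in parallel, prefers the network-idle result if it
// arrives within `timeoutMs`, and otherwise falls back to the load result.
async function waitAuto ({ waitForLoad, waitForNetworkIdle, timeoutMs = 10000 }) {
  const load = waitForLoad()
  const idle = waitForNetworkIdle()
  // The losing promise may still reject later; mark both as handled.
  load.catch(() => {})
  idle.catch(() => {})

  let timer
  const timeout = new Promise(resolve => {
    timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs)
  })

  try {
    const winner = await Promise.race([idle, timeout])
    if (winner !== TIMED_OUT) return { event: 'networkidle', response: winner }
  } catch (error) {
    // Network idle failed outright; fall through to the 'load' result.
  } finally {
    clearTimeout(timer)
  }
  return { event: 'load', response: await load }
}
```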

Testing URLs

Related

Add types

Prerequisites

  • I'm using the last version.
  • My node version is the same as declared as package.json.

Subject of the issue

When you import browserless in a TypeScript project, you get an error saying that it has no types. I also don't see them in the repo or in @types.

Steps to reproduce

Just create a new typescript project and import the browserless.

import createBrowserless from 'browserless';

Tell us how to reproduce this issue.

Expected behaviour

Browserless should have types, so that it can be used easily in typescript and help the user with intellisense.

Actual behaviour

It has no inbuilt types or no info on them being installed separately.

browserless not using proxy server

Prerequisites

  • ["browserless": "^8.7.11" ] I'm using the last version.
  • My node version is the same as declared as package.json.

browserless not using proxy server

Despite passing the proxy server details in args, browserless doesn't launch Puppeteer with my proxy server. I have tested the proxy server independently with puppeteer and puppeteer-extra, so the proxy server isn't the issue.

Steps to reproduce

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
import browserles from 'browserless'

puppeteer.use(StealthPlugin())
const browserless = browserles()

(async function () {
    try {
        let pageHtml = await browserless.html('https://httpbin.org/ip', {
            puppeteer:puppeteer,
            args:[`--proxy-server=${'my-proxy-deatils'}`,'--disable-gpu', '--single-process', '--no-zygote', '--no-sandbox', '--hide-scrollbars'],
            incognito: true
        })
        console.log(pageHtml)
    } catch(e){ 
        throw new Error(e.message)
    }
})()

Expected behaviour

ip address be the IP address of my proxy server

Actual behaviour

shows ip of my machine instead
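
One thing worth checking: --proxy-server is a Chromium command-line switch, so it only takes effect if it reaches the browser at launch time. The sketch below builds the launch options in one place. With plain Puppeteer they are passed to puppeteer.launch. Whether browserless forwards the options given when the instance is created (rather than per call to .html) is an assumption here, not something confirmed in this issue. The helper name buildLaunchOptions is hypothetical.

```javascript
// Builds Puppeteer launch options that route traffic through `proxyServer`.
function buildLaunchOptions (proxyServer, extraArgs = []) {
  if (!proxyServer) {
    throw new TypeError('proxyServer is required, e.g. "http://127.0.0.1:8080"')
  }
  return {
    args: [`--proxy-server=${proxyServer}`, ...extraArgs]
  }
}

// With plain Puppeteer the options go to `launch`:
//   const browser = await puppeteer.launch(buildLaunchOptions(proxy, ['--no-sandbox']))
// Assumption: browserless accepts the same options when the instance is
// created, rather than on each `.html()` call:
//   const browserless = require('browserless')({ puppeteer, ...buildLaunchOptions(proxy) })
```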

Evasion techniques

buffer screenshot support

According to the Puppeteer documentation:

path: The file path to save the image to. The screenshot type will be inferred from file extension. If path is a relative path, then it is resolved relative to current working directory. If no path is provided, the image won't be saved to the disk.

So the temporary file could be moved out of the library.
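
A small sketch of what that split could look like with Puppeteer's page.screenshot: the library-side function never passes path and returns the image data, and a caller that wants a file writes it itself. Both function names are hypothetical and this is not the browserless implementation.

```javascript
const fs = require('fs')

// `page` is a Puppeteer Page. With no `path` option, `page.screenshot`
// resolves to the image data without writing anything to disk.
async function screenshotToBuffer (page, options = {}) {
  const { path, ...rest } = options // drop `path` so nothing is written to disk
  return page.screenshot({ type: 'png', ...rest })
}

// Callers that do want a file can write it themselves.
async function screenshotToFile (page, filePath, options) {
  const buffer = await screenshotToBuffer(page, options)
  await fs.promises.writeFile(filePath, buffer)
  return filePath
}
```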

v9 iteration

Wishlist

  • Better process management.
    • Keep a single browser process running.
    • Ensure a default viewport is set up.
    • Find a way to set up proxy requests without flags.
    • Use a connect/disconnect model.
  • Reduce I/O operations.
    • Use in-memory storage (/dev/shm or similar) for userDataDir and temporary files. (puppeteer/puppeteer#7243)
    • Ensure temporary files are removed after .destroy().

Integration with puppeteer-extra

Using puppeteer-extra gives me this error. This seems to be the case only when trying to use browserless with puppeteer-extra

UnhandledPromiseRejectionWarning: Error: Could not find browser revision 818858. Run "PUPPETEER_PRODUCT=firefox npm install" or "PUPPETEER_PRODUCT=firefox yarn install" to download a supported Firefox browser binary.

Cannot find module 'puppeteer'

Saw this linked on echo.js, decided to give a couple of the examples a shot.

Copy/pasted the screenshot example from the docs into a js file, added a package.json, installed and saved browserless, then ran node on the js file. I'm assuming that would be a standard use-case for the lib.

Here's the error. I will spend some time chasing it down when I get home from work later.

    throw err;
    ^

Error: Cannot find module 'puppeteer'
    at Function.Module._resolveFilename (module.js:557:15)
    at Function.Module._load (module.js:484:25)
    at Module.require (module.js:606:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/home/mike/dev/test/node_modules/browserless/index.js:6:19)
    at Module._compile (module.js:662:30)
    at Object.Module._extensions..js (module.js:673:10)
    at Module.load (module.js:575:32)
    at tryModuleLoad (module.js:515:12)
    at Function.Module._load (module.js:507:3)

Problem with screenshot.

Prerequisites

  • I'm using the last version.
  • My node version is the same as declared as package.json.

Subject of the issue

I couldn't try it.

Steps to reproduce

const browserless = require("browserless");

const saveBufferToFile = (buffer, fileName) => {
  const wstream = require("fs").createWriteStream(fileName);
  wstream.write(buffer);
  wstream.end();
};

browserless
  .screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
  .then((buffer) => {
    const fileName = "screenshot.png";
    saveBufferToFile(buffer, fileName);
    console.log(`your screenshot is here: `, fileName);
  });

Expected behaviour

Save a screenshot.

Actual behaviour

$ node index.js
/home/juliolima/projects/poc_browserless/index.js:10
.screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
^

TypeError: browserless.screenshot is not a function
at Object.<anonymous> (/home/juliolima/projects/poc_browserless/index.js:10:4)
at Module._compile (internal/modules/cjs/loader.js:1085:14)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
at Module.load (internal/modules/cjs/loader.js:950:32)
at Function.Module._load (internal/modules/cjs/loader.js:790:12)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
at internal/main/run_main_module.js:17:47
error Command failed with exit code 1.
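
The TypeError is consistent with require("browserless") returning a factory function rather than a ready-to-use instance, so .screenshot does not exist on it. Below is a hedged sketch of the same task, assuming the factory and createContext API shown in the getting-started sample quoted earlier on this page. The name saveScreenshot and the injected createBrowserless parameter are hypothetical.

```javascript
const { writeFile } = require('fs').promises

// `createBrowserless` stands in for `require('browserless')`, which exports a
// factory function; calling `.screenshot` on it directly raises the TypeError.
async function saveScreenshot (createBrowserless, url, fileName) {
  const factory = createBrowserless()
  const browserless = await factory.createContext() // assumed v9-style API
  try {
    const buffer = await browserless.screenshot(url, { device: 'iPhone 6' })
    await writeFile(fileName, buffer)
    return fileName
  } finally {
    await browserless.destroyContext()
    await factory.close() // assumed: shuts down the shared browser process
  }
}

// Usage (requires `browserless` and `puppeteer` to be installed):
// saveScreenshot(require('browserless'), 'https://bot.sannysoft.com', 'screenshot.png')
//   .then(file => console.log('your screenshot is here:', file))
```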

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because we are using your CI build statuses to figure out when to notify you about breaking changes.

Since we did not receive a CI status on the greenkeeper/initial branch, we assume that you still need to configure it.

If you have already set up a CI for this repository, you might need to check your configuration. Make sure it will run on all new branches. If you don’t want it to run on every branch, you can whitelist branches starting with greenkeeper/.

We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

Once you have installed CI on this repository, you’ll need to re-trigger Greenkeeper’s initial Pull Request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper integration’s white list on GitHub. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Error in top-user-agent module

Prerequisites

  • I'm using the last version.
  • My node version is the same as declared as package.json.

Subject of the issue

Dependency error

Steps to reproduce

Attempts to run with multiple docker containers at once.

Expected behaviour

Normal operation

Actual behaviour

When running multiple times at once, a 503 error occurs in the request sent by top-user-agent.

It is presumed that 'https://techblog.willshouse.com/2012/01/03/most-common-user-agents/' used by top-user-agent has Cloudflare defense.

HTTPError: Response code 503 (Service Temporarily Unavailable)
crawler.4.@DESKTOP-2     |     at Request.<anonymous> (/home/webdriver/.../crawlers/node_modules/got/dist/source/as-promise/index.js:117:42)
crawler.4.@DESKTOP-2     |     at processTicksAndRejections (internal/process/task_queues.js:93:5) {
crawler.4.@DESKTOP-2     |   code: undefined,
crawler.4.@DESKTOP-2     |   timings: {
crawler.4.@DESKTOP-2     |     start: 1630547054142,
crawler.4.@DESKTOP-2     |     socket: 1630547054144,
crawler.4.@DESKTOP-2     |     lookup: 1630547054203,
crawler.4.@DESKTOP-2     |     connect: 1630547054241,
crawler.4.@DESKTOP-2     |     secureConnect: 1630547054285,
crawler.4.@DESKTOP-2     |     upload: 1630547054285,
crawler.4.@DESKTOP-2     |     response: 1630547054330,
crawler.4.@DESKTOP-2     |     end: 1630547054335,
crawler.4.@DESKTOP-2     |     error: undefined,
crawler.4.@DESKTOP-2     |     abort: undefined,
crawler.4.@DESKTOP-2     |     phases: {
crawler.4.@DESKTOP-2     |       wait: 2,
crawler.4.@DESKTOP-2     |       dns: 59,
crawler.4.@DESKTOP-2     |       tcp: 38,
crawler.4.@DESKTOP-2     |       tls: 44,
crawler.4.@DESKTOP-2     |       request: 0,
crawler.4.@DESKTOP-2     |       firstByte: 45,
crawler.4.@DESKTOP-2     |       download: 5,
crawler.4.@DESKTOP-2     |       total: 193
crawler.4.@DESKTOP-2     |     }
crawler.4.@DESKTOP-2     |   }
crawler.4.@DESKTOP-2     | }

[devices] adapt puppeteer 3.x interface

Puppeteer 2.x uses an array interface:

[
  {
    name: 'Blackberry PlayBook',
    userAgent: 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+',
    viewport: {
      width: 600,
      height: 1024,
      deviceScaleFactor: 1,
      isMobile: true,
      hasTouch: true,
      isLandscape: false
    }
  },

while puppeteer 3.x is using an object map:

'Nexus 6 landscape': {
    name: 'Nexus 6 landscape',
    userAgent: 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36',
    viewport: {
      width: 732,
      height: 412,
      deviceScaleFactor: 3.5,
      isMobile: true,
      hasTouch: true,
      isLandscape: true
    }
  },

The current implementation is failing because concat is a method for arrays.

That is actually a good thing: right now we convert the array into an object, so we just need to remove that conversion since it is no longer necessary 🙂
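
To illustrate the difference between the two shapes, here is a sketch of a helper that accepts either the 2.x array or the 3.x object map and always returns an object keyed by device name. It is not the code in the devices package, and the names normalizeDevices and mergeDevices are hypothetical.

```javascript
// Accepts either shape and always returns a new object keyed by device name.
function normalizeDevices (devices) {
  if (Array.isArray(devices)) {
    // Puppeteer 2.x style: [{ name, userAgent, viewport }, ...]
    return Object.fromEntries(devices.map(device => [device.name, device]))
  }
  // Puppeteer 3.x style: { [name]: { name, userAgent, viewport } }
  return { ...devices }
}

// Merging custom devices then becomes an object spread instead of `concat`.
function mergeDevices (builtIn, custom = {}) {
  return { ...normalizeDevices(builtIn), ...normalizeDevices(custom) }
}
```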

Lighthouse: images for desktop reports returning mobile interface

Bug Report

Current Behavior
When I call the MQL API as shown below, result.data.insights.lighthouse.audits['final-screenshot'] is returned as a base64-encoded image. However, this image shows the mobile view of the website, not the desktop view.

const url = 'https://anywebsitehere.com';
const payload = {
  meta: false,
  insights: {
    lighthouse: {
      device: 'desktop',
      onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo'],
    },
    technologies: false,
  },
};

const result = await mql(url, payload);

Expected behavior/code

I'd expect the above to return the desktop variation of the image and not a mobile version.

Additional context/Screenshots

Can be provided upon request.

Auto close cookies banners

Installing browserless via npm throws an error

Prerequisites

  • [ x] I'm using the last version.
  • [ x] My node version is the same as declared as package.json.

Subject of the issue

Installing browserless via npm fails and throws an error:

> node scripts/postinstall

/Project/node_modules/hooman/hooman.js:14
const instance = got.extend({
                     ^

TypeError: got.extend is not a function
    at Object.<anonymous> (/Project/node_modules/hooman/hooman.js:14:22)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    at Module.load (internal/modules/cjs/loader.js:977:32)
    at Function.Module._load (internal/modules/cjs/loader.js:877:14)
    at Module.require (internal/modules/cjs/loader.js:1019:19)
    at require (internal/modules/cjs/helpers.js:77:18)
    at Object.<anonymous> (/Project/node_modules/top-user-agents/scripts/postinstall.js:5:13)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)

Steps to reproduce

Note: You can reproduce the code using interactive Node.js shell by Runkit.

npm i -S browserless

Using SocksProxyAgent fails

Prerequisites

  • I'm using the last version. (9.1.6)
  • My node version is the same as declared as package.json. (v14.17.5)

Subject of the issue

Trying to use proxy as described in documentation results in error for me. Could you advise what is the proper way to define it? In past issues I only found some other way, not described in the docs ( #259 ). Thanks

Steps to reproduce

const browserless = require('browserless')
const { SocksProxyAgent } = require('socks-proxy-agent')

function testf(url) {
    (async () => {
        const browserless_factory = browserless()
        const browser = await browserless_factory.createContext({
            // agent: undefined
            agent: new SocksProxyAgent({
                host: 'localhost',
                port: 9050
            })
        })
        page_content = await browser.html(url)
        await browser.destroyContext();
        console.log(page_content)
    })()
}

testf('http://ip-api.com/json')

Expected behaviour

Requests made through specified proxy.

Actual behaviour

Running the script above results in the following error:

(node:29627) UnhandledPromiseRejectionWarning: Error: Request is already handled!
    at Object.assert (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at HTTPRequest.continue (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:283:21)
    at PuppeteerBlocker.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:225:33)
    at BlockingContext.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:65:47)
    at <MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:226:52
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async HTTPRequest.finalizeInterceptions (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:132:9)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:29627) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:29627) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Note that everything works ok if I change agent to undefined.

Failed with vercel deployments or pkg bundling

Prerequisites

  • I'm using the last version.
  • My node version is the same as declared as package.json.

Failed to deploy apps relying on browserless to vercel

browserless is using prism-themes as dependency, but prism-themes doesn't have a valid main entry (should be a valid js file path). It caused deployment failures on vercel. I also tried with vercel/pkg, it failed as well.

Steps to reproduce

I'm deploying next-imagegen-example to vercel with puppeteer provider (which is using browserless)

you can use vercel/pkg to bundle imagegen-puppeteer-provider as well

Expected behaviour

Deployment should work well

Actual behaviour

Deployment failed with error that they couldn't resolve module prism-themes

Ideal Workaround

Change require.resolve for prism-themes to something else, maybe path.resolve.
