The browserless from microlinkhq

Consider using `jimp` over `sharp`

It is a lighter dependency, and only JPEG/PNG support is necessary.

Add ad-block

Need to add the ability to load ABP rules and parse them.

I'm trying to launch 100 browser instances at the same time to load test my website. Is there an example of doing that with the browserless pool?

[screenshot] image url overlay

goto: Rename `disableAnimations` into `animations`

[goto] Improve Shadow DOM support

https://github.com/GoogleChrome/rendertron/blob/master/src/renderer.ts#L76

How to destroy pool?

Is there any way to destroy the pool after running tasks? Right now it keeps the process running forever.

[screenshot] image url overlay

[goto] Disable JavaScript

When rendering PDFs server side, it's often better to disable JavaScript execution. It consumes server resources while, most of the time, it isn't needed to render the page.

It would be awesome if we could pass a property to /pdf where we can disable JavaScript in:
https://github.com/microlinkhq/browserless/blob/master/packages/pdf/src/index.js

Puppeteer supports this out of the box if you call page.setJavaScriptEnabled(false) before navigating to a page:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetjavascriptenabledenabled
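
A minimal sketch of the request above, assuming a Puppeteer `page` object is already available. `pdfWithoutJavaScript` and its `javascript` option are hypothetical names, not part of the browserless API; the only Puppeteer calls used are `page.setJavaScriptEnabled`, `page.goto` and `page.pdf`.

```javascript
// Hypothetical helper: render a PDF with JavaScript optionally disabled.
// `page` is expected to be a Puppeteer Page (or an object with the same methods).
async function pdfWithoutJavaScript (page, url, { javascript = false, ...pdfOptions } = {}) {
  // Set this before navigation so it applies to the page being loaded.
  await page.setJavaScriptEnabled(javascript)
  await page.goto(url)
  return page.pdf(pdfOptions)
}
```

Something like `pdfWithoutJavaScript(page, url, { format: 'A4' })` would then produce the PDF without running the page's scripts.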

Improve screenshot

Some suggestions:

Ability to inject jQuery or other scripts.
Ability to hide or remove elements.
Ability to disable animations.
Add scroll/click support.
Put the screenshot inside an overlay.

Inspiration

Sample from documentation site not working out of the box

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Subject of the issue

My environment:

nvm: 0.38.0
node: v12.22.6
npm: 6.14.15

I just want to test browserless by following the getting started guide from the documentation, but I am getting this error:

const browserless = await browserlessFactory.createContext()
                    ^^^^^

SyntaxError: await is only valid in async function
    at wrapSafe (internal/modules/cjs/loader.js:915:16)
    at Module._compile (internal/modules/cjs/loader.js:963:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47

Steps to reproduce

Installing

mkdir test
cd test
nvm use v12.22.6
npm init -y
npm install browserless puppeteer --save

Sample project

touch index.js

`index.js` content

const createBrowserless = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory 
// that it will keep a singleton process running
const browserlessFactory = createBrowserless()

// After that, you can create as many browser context
// as you need. The browser contexts won't share cookies/cache 
// with other browser contexts.
const browserless = await browserlessFactory.createContext()

// Perform the action you want, e.g., getting the HTML markup
const buffer = await browserless.screenshot('https://basecamp.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shutdown the browser process
await browserless.close()

Running

node index.js

RunKit: https://runkit.com/embed/a7xjdfhnz7xi
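
The `SyntaxError` above comes from using `await` at the top level of a CommonJS script, where it is not valid. A minimal sketch of one way around it is to move the calls into an `async` function. `main` is a hypothetical name, the factory is passed in as an argument so the snippet does not depend on how the package is loaded, and calling `close()` on the factory rather than on the context is an assumption about the intended API.

```javascript
// Hypothetical wrapper: every `await` lives inside an async function,
// so the file can stay a plain CommonJS script.
async function main (createBrowserless, url) {
  const browserlessFactory = createBrowserless()
  const browserless = await browserlessFactory.createContext()
  try {
    return await browserless.screenshot(url, { device: 'iPhone 6' })
  } finally {
    // Always release the context, then shut the browser process down.
    // Assumption: `close()` belongs to the factory, not to the context.
    await browserless.destroyContext()
    await browserlessFactory.close()
  }
}

// Usage (requires `browserless` and `puppeteer` to be installed):
// main(require('browserless'), 'https://basecamp.com')
//   .then(buffer => console.log(buffer.length))
//   .catch(console.error)
```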

Use `fkill` for killing subprocess

We are waiting until sindresorhus/fkill#34 lands.

[goto] implement `waitUntil: 'auto'`

Users shouldn't have to care whether they need to wait for the 'load' or the 'networkidle*' event.

Implement an auto mode that follows this behavior.

Run the 'networkidle*' and 'load' events in parallel (maybe two tabs?).
Add a timeout over the network event to prevent infinite waiting (10s?).
If the timeout is not exceeded, use the 'networkidle*' event response. Otherwise, fall back to 'load'.
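
The three steps above can be sketched with plain promises. This is only an illustration: `waitUntilAuto`, `waitForNetworkIdle` and `waitForLoad` are hypothetical names, and the two callbacks stand in for whatever actually waits on the page (for example, wrappers around `page.goto` with different `waitUntil` values).

```javascript
// Hypothetical helper implementing the fallback described above.
// `waitForNetworkIdle` and `waitForLoad` are functions returning promises.
async function waitUntilAuto ({ waitForNetworkIdle, waitForLoad, timeout = 10000 }) {
  // Start both waits at the same time.
  const load = waitForLoad()
  const idle = waitForNetworkIdle()
  // Avoid an unhandled rejection if `load` fails while `idle` wins.
  load.catch(() => {})

  const TIMED_OUT = Symbol('timeout')
  let timer
  const deadline = new Promise(resolve => {
    timer = setTimeout(resolve, timeout, TIMED_OUT)
  })

  try {
    // In this sketch a network-idle failure is treated like a timeout.
    const result = await Promise.race([idle.catch(() => TIMED_OUT), deadline])
    // Within the deadline: use the network-idle response. Otherwise: 'load'.
    return result === TIMED_OUT ? await load : result
  } finally {
    clearTimeout(timer)
  }
}
```

Passing the waits in as callbacks keeps the open question from the list (one tab or two) out of the timing logic.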

Testing URLs

Related

puppeteer/puppeteer#1353 (comment)

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Subject of the issue

When you import browserless into a TypeScript project, the compiler reports that it has no types. I also don't see them in the repo or in @types.

Steps to reproduce

Just create a new TypeScript project and import browserless.

import createBrowserless from 'browserless';


Expected behaviour

Browserless should ship types so that it can be used easily in TypeScript and help the user with IntelliSense.

Actual behaviour

It has no built-in types and no information about installing them separately.

Prerequisites

["browserless": "^8.7.11" ] I'm using the last version.
My node version is the same as declared as package.json.

browserless not using proxy server

Despite passing the proxy server details in the args, browserless doesn't launch puppeteer with my proxy server. I have tested the proxy server independently with puppeteer and puppeteer-extra, so the proxy server isn't the issue.

Steps to reproduce

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
import browserles from 'browserless'

puppeteer.use(StealthPlugin())
const browserless = browserles()

(async function () {
    try {
        let pageHtml = await browserless.html('https://httpbin.org/ip', {
            puppeteer:puppeteer,
            args:[`--proxy-server=${'my-proxy-deatils'}`,'--disable-gpu', '--single-process', '--no-zygote', '--no-sandbox', '--hide-scrollbars'],
            incognito: true
        })
        console.log(pageHtml)
    } catch(e){ 
        throw new Error(e.message)
    }
})()

Expected behaviour

The reported IP address should be that of my proxy server.

Actual behaviour

It shows the IP address of my machine instead.

Improve PDF support

https://github.com/mikeal/snapkit/blob/master/index.js

Evasion techniques

Libraries

URLs to test

Related

https://timvanscherpenzeel.github.io/detect-gpu/

goto: Rename `disableJavaScript` into `javascript`

buffer screenshot support

According to the puppeteer documentation:

path: The file path to save the image to. The screenshot type will be inferred from file extension. If path is a relative path, then it is resolved relative to current working directory. If no path is provided, the image won't be saved to the disk.

So the temporary file could be moved out of the library.

Add visual tests

https://github.com/americanexpress/jest-image-snapshot

cursor emulation support

https://github.com/Xetera/ghost-cursor?auto_subscribed=false

[goto] Add proxy support

Detect username and password from proxy server URI

const args = baseArgs.concat([`--proxy-server=${proxyUrl}`])

Automatically authenticate a page based on those credentials

// if you're using an authenticated proxy
await page.authenticate({ username, password })
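
The two steps above could be combined along these lines. This is a sketch under assumptions: `parseProxy` is a hypothetical helper, not an existing browserless function, and it relies only on the standard `URL` class to split the credentials from the proxy address.

```javascript
// Hypothetical helper: split a proxy URI into a Chromium flag and credentials.
function parseProxy (proxyUrl) {
  const { protocol, host, username, password } = new URL(proxyUrl)
  return {
    // Keep credentials out of the flag; they go to page.authenticate instead.
    arg: `--proxy-server=${protocol}//${host}`,
    // URL keeps credentials percent-encoded, so decode them before use.
    credentials: username
      ? { username: decodeURIComponent(username), password: decodeURIComponent(password) }
      : undefined
  }
}
```

The returned `arg` can be appended to the launch arguments, and `credentials`, when present, handed to `page.authenticate(credentials)`.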

related

screenshot: Move some API parameters out of screenshot

Move:

hide
click
modules
scripts
styles
scrollTo

From the screenshot prepare step: https://github.com/microlinkhq/browserless/blob/master/packages/screenshot/src/prepare.js

to the goto package, so the rest of the methods (like pdf) can take advantage of these query parameters.

v9 iteration

Wishlist

Better process management.
- keep a single browser process running.
- ensure a default viewport is set up.
- find a way to set up proxy requests without flags.
- use a connect/disconnect model.
Reduce I/O operations.
- use in-memory storage (/dev/shm or similar) for userDataDir & temporary files. (puppeteer/puppeteer#7243)
- ensure temporary files are removed after .destroy()

Integration with puppeteer-extra

Using puppeteer-extra gives me this error. This seems to be the case only when trying to use browserless with puppeteer-extra

UnhandledPromiseRejectionWarning: Error: Could not find browser revision 818858. Run "PUPPETEER_PRODUCT=firefox npm install" or "PUPPETEER_PRODUCT=firefox yarn install" to download a supported Firefox browser binary.

[goto] pretty JSON response

Inspired by pretty-json.now.sh.

Cannot find module 'puppeteer'

Saw this linked on echo.js, decided to give a couple of the examples a shot.

Copy/pasted the screenshot example from the docs into a js file, added a package.json, installed and saved browserless, then ran node on the js file. I'm assuming that would be a standard use-case for the lib.

Here's the error. I will spend some time chasing it down when I get home from work later.

    throw err;
    ^

Error: Cannot find module 'puppeteer'
    at Function.Module._resolveFilename (module.js:557:15)
    at Function.Module._load (module.js:484:25)
    at Module.require (module.js:606:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/home/mike/dev/test/node_modules/browserless/index.js:6:19)
    at Module._compile (module.js:662:30)
    at Object.Module._extensions..js (module.js:673:10)
    at Module.load (module.js:575:32)
    at tryModuleLoad (module.js:515:12)
    at Function.Module._load (module.js:507:3)

Problem with screenshot.

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Subject of the issue

I couldn't try it.

Steps to reproduce

const browserless = require("browserless");

const saveBufferToFile = (buffer, fileName) => {
  const wstream = require("fs").createWriteStream(fileName);
  wstream.write(buffer);
  wstream.end();
};

browserless
  .screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
  .then((buffer) => {
    const fileName = "screenshot.png";
    saveBufferToFile(buffer, fileName);
    console.log("your screenshot is here:", fileName);
  });

Expected behaviour

Save a screenshot.

Actual behaviour

$ node index.js
/home/juliolima/projects/poc_browserless/index.js:10
.screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
^

TypeError: browserless.screenshot is not a function
at Object.<anonymous> (/home/juliolima/projects/poc_browserless/index.js:10:4)
at Module._compile (internal/modules/cjs/loader.js:1085:14)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
at Module.load (internal/modules/cjs/loader.js:950:32)
at Function.Module._load (internal/modules/cjs/loader.js:790:12)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
at internal/main/run_main_module.js:17:47
error Command failed with exit code 1.

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because we are using your CI build statuses to figure out when to notify you about breaking changes.

Since we did not receive a CI status on the greenkeeper/initial branch, we assume that you still need to configure it.

If you have already set up a CI for this repository, you might need to check your configuration. Make sure it will run on all new branches. If you don’t want it to run on every branch, you can whitelist branches starting with greenkeeper/.

We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

Once you have installed CI on this repository, you’ll need to re-trigger Greenkeeper’s initial Pull Request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper integration’s white list on GitHub. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Error in top-user-agent module

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Subject of the issue

Dependency error

Steps to reproduce

Run multiple Docker containers at once.

Expected behaviour

Normal operation

Actual behaviour

When running multiple times at once, a 503 error occurs in the request sent by top-user-agent.

Presumably, 'https://techblog.willshouse.com/2012/01/03/most-common-user-agents/', which top-user-agent uses, is behind Cloudflare protection.

HTTPError: Response code 503 (Service Temporarily Unavailable)
crawler.4.@DESKTOP-2     |     at Request.<anonymous> (/home/webdriver/.../crawlers/node_modules/got/dist/source/as-promise/index.js:117:42)
crawler.4.@DESKTOP-2     |     at processTicksAndRejections (internal/process/task_queues.js:93:5) {
crawler.4.@DESKTOP-2     |   code: undefined,
crawler.4.@DESKTOP-2     |   timings: {
crawler.4.@DESKTOP-2     |     start: 1630547054142,
crawler.4.@DESKTOP-2     |     socket: 1630547054144,
crawler.4.@DESKTOP-2     |     lookup: 1630547054203,
crawler.4.@DESKTOP-2     |     connect: 1630547054241,
crawler.4.@DESKTOP-2     |     secureConnect: 1630547054285,
crawler.4.@DESKTOP-2     |     upload: 1630547054285,
crawler.4.@DESKTOP-2     |     response: 1630547054330,
crawler.4.@DESKTOP-2     |     end: 1630547054335,
crawler.4.@DESKTOP-2     |     error: undefined,
crawler.4.@DESKTOP-2     |     abort: undefined,
crawler.4.@DESKTOP-2     |     phases: {
crawler.4.@DESKTOP-2     |       wait: 2,
crawler.4.@DESKTOP-2     |       dns: 59,
crawler.4.@DESKTOP-2     |       tcp: 38,
crawler.4.@DESKTOP-2     |       tls: 44,
crawler.4.@DESKTOP-2     |       request: 0,
crawler.4.@DESKTOP-2     |       firstByte: 45,
crawler.4.@DESKTOP-2     |       download: 5,
crawler.4.@DESKTOP-2     |       total: 193
crawler.4.@DESKTOP-2     |     }
crawler.4.@DESKTOP-2     |   }
crawler.4.@DESKTOP-2     | }

[devices] adapt puppeteer 3.x interface

Puppeteer 2.x uses an array interface:

[
  {
    name: 'Blackberry PlayBook',
    userAgent: 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+',
    viewport: {
      width: 600,
      height: 1024,
      deviceScaleFactor: 1,
      isMobile: true,
      hasTouch: true,
      isLandscape: false
    }
  },

while puppeteer 3.x is using an object map:

'Nexus 6 landscape': {
    name: 'Nexus 6 landscape',
    userAgent: 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36',
    viewport: {
      width: 732,
      height: 412,
      deviceScaleFactor: 3.5,
      isMobile: true,
      hasTouch: true,
      isLandscape: true
    }
  },

The current implementation is failing because `concat` is a method for arrays.

That's actually a good thing: right now we are converting the array into an object, so we just need to remove that conversion since it is no longer necessary 🙂
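
If both Puppeteer versions had to be supported during the transition, the conversion could be kept behind a small guard. A minimal sketch, where `toDeviceMap` is a hypothetical helper name and the only assumption about the data is that every descriptor carries a `name`, as in the two excerpts above:

```javascript
// Hypothetical helper: accept either the Puppeteer 2.x array or the
// 3.x object map and always return a name -> descriptor map.
function toDeviceMap (devices) {
  // Already an object map (3.x shape): return a shallow copy.
  if (!Array.isArray(devices)) return { ...devices }
  // Array (2.x shape): index each descriptor by its name.
  return devices.reduce((acc, device) => {
    acc[device.name] = device
    return acc
  }, {})
}
```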

Implement timeout on goto method

Puppeteer is going to implement a global timeout in the next breaking version:
puppeteer/puppeteer#3158

In the meantime, we need to control the timeout with enough granularity to ensure we don't waste resources.

The current implementation handles the timeout outside of the library, which makes it impossible to close the page when the timeout fires:
https://github.com/Kikobeats/html-get/blob/master/src/index.js#L89
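
One way to keep the timeout next to the page, so the page can always be closed, is sketched below. `withPageTimeout` is a hypothetical helper, not the library's implementation; it only assumes that `page` exposes a `close()` method, as a Puppeteer page does.

```javascript
// Hypothetical helper: run `task(page)` with a deadline and always close
// the page, including when the deadline is hit.
async function withPageTimeout (page, task, timeout = 30000) {
  let timer
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeout}ms`)), timeout)
  })
  try {
    // Whichever settles first wins: the task or the deadline.
    return await Promise.race([task(page), deadline])
  } finally {
    clearTimeout(timer)
    // Free the page whether the task finished, failed or timed out.
    await page.close()
  }
}
```

A call such as `withPageTimeout(page, p => p.goto(url), 10000)` would reject after ten seconds and still release the page.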

Lighthouse: images for desktop reports returning mobile interface

Bug Report

Current Behavior
When I use the following MQL API call, result.data.insights.lighthouse.audits['final-screenshot'] is returned as a base64-encoded image. However, this image shows the mobile view and not the desktop view of the website.

const url = 'https://anywebsitehere.com';
const payload = {
  meta: false,
  insights: {
    lighthouse: {
      device: 'desktop',
      onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo'],
    },
    technologies: false,
  },
};

const result = await mql(url, payload);

Expected behavior/code

I'd expect the above to return the desktop variation of the image and not a mobile version.

Additional context/Screenshots

Can be provided upon request.

Better tracking support

The current tracking implementation is a poor port of disconnect rules

We need to implement a built-in solution.

Related:

[screenshot] add url on top bar

[screenshot] mobile overlay

Similar to

https://github.com/sindresorhus/capture-website/pull/27/files?short_path=f1d7f01#diff-f1d7f01715e29ea2a7cbaf4f2f8117cc

URLs for testing

Inspiration

Related

Looks like the plugin can't be loaded directly at puppeteer layer: puppeteer/puppeteer#1286

But I suppose we can do something at execution time.

[screenshot] consider jimp alternatives

especially https://github.com/nuxt-community/jimp-compact

Installing browserless via npm throws an error

Prerequisites

[x] I'm using the last version.
[x] My node version is the same as declared as package.json.

Subject of the issue

Installing browserless via npm fails and throws an error:

> node scripts/postinstall

/Project/node_modules/hooman/hooman.js:14
const instance = got.extend({
                     ^

TypeError: got.extend is not a function
    at Object.<anonymous> (/Project/node_modules/hooman/hooman.js:14:22)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    at Module.load (internal/modules/cjs/loader.js:977:32)
    at Function.Module._load (internal/modules/cjs/loader.js:877:14)
    at Module.require (internal/modules/cjs/loader.js:1019:19)
    at require (internal/modules/cjs/helpers.js:77:18)
    at Object.<anonymous> (/Project/node_modules/top-user-agents/scripts/postinstall.js:5:13)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)

Steps to reproduce

Note: You can reproduce the code using interactive Node.js shell by Runkit.

npm i -S browserless

Enable data saver

https://chrome.google.com/webstore/detail/data-saver/pfmgfdlgomnbgkofeojodiodmgpgmkac

ensure to wait respawn

It will be done when sindresorhus/p-retry#27 is closed.

TypeScript definitions

Would love to have TypeScript type definitions.

Add pool support

Inspired by

Implement using

goto: Rename `media` into `mediaType`

Using SocksProxyAgent fails

Prerequisites

I'm using the last version. (9.1.6)
My node version is the same as declared as package.json. (v14.17.5)

Subject of the issue

Trying to use a proxy as described in the documentation results in an error for me. Could you advise on the proper way to define it? In past issues I only found another way that is not described in the docs (#259). Thanks

Steps to reproduce

const browserless = require('browserless')
const { SocksProxyAgent } = require('socks-proxy-agent')

function testf(url) {
    (async () => {
        const browserless_factory = browserless()
        const browser = await browserless_factory.createContext({
            // agent: undefined
            agent: new SocksProxyAgent({
                host: 'localhost',
                port: 9050
            })
        })
        const page_content = await browser.html(url)
        await browser.destroyContext();
        console.log(page_content)
    })()
}

testf('http://ip-api.com/json')

Expected behaviour

Requests made through specified proxy.

Actual behaviour

Running the script above results in the following error:

(node:29627) UnhandledPromiseRejectionWarning: Error: Request is already handled!
    at Object.assert (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at HTTPRequest.continue (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:283:21)
    at PuppeteerBlocker.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:225:33)
    at BlockingContext.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:65:47)
    at <MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:226:52
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async HTTPRequest.finalizeInterceptions (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:132:9)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:29627) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:29627) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Note that everything works ok if I change agent to undefined.

Failed with vercel deployments or pkg bundling

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Failed to deploy apps relying on browserless to vercel

browserless is using prism-themes as dependency, but prism-themes doesn't have a valid main entry (should be a valid js file path). It caused deployment failures on vercel. I also tried with vercel/pkg, it failed as well.

Steps to reproduce

I'm deploying next-imagegen-example to vercel with puppeteer provider (which is using browserless)

you can use vercel/pkg to bundle imagegen-puppeteer-provider as well

Expected behaviour

Deployment should work well

Actual behaviour

Deployment failed with error that they couldn't resolve module prism-themes

Ideal Workaround

Change require.resolve for prism-themes to something else, maybe path.resolve.
