zfcsoftware / puppeteer-real-browser

This package is designed to bypass puppeteer's bot-detecting captchas such as Cloudflare. It acts like a real browser and can be managed with puppeteer.

Home Page: https://www.npmjs.com/package/puppeteer-real-browser

License: MIT License

Languages: JavaScript 94.73%, Dockerfile 5.27%

Topics: cloudflare-bypass, puppeteer, puppeteer-cloudflare-captcha, puppeteer-extra-plugin, puppeteer-fingerprint, puppeteer-real-browser, undetected-browser, puppeteer-undetected-browser, undetected-puppeteer, undetected

Contributors: rtritto, zfcsoftware

puppeteer-real-browser's Issues

aws lambda

Hi =)
I'll admit right away that I'm a beginner in development.
I can't quite figure out whether it's possible to use puppeteer-real-browser in AWS Lambda, or whether I'm doing something wrong; I'm getting a lot of errors.

Question related Puppeteer-real-browser

  1. Fingerprint
    Which one is best?
    "fingerprint-generator": "^2.1.30",
    "fingerprint-injector": "^2.1.30",
    or
    puppeteer-afp

  2. Extra plugins
    Does puppeteer-real-browser need puppeteer-extra and puppeteer-extra-plugin-stealth, or can it mask all parameters itself without extra plugins?

  3. Puppeteer version and cluster tool
    My bot uses "puppeteer": "^21.5.0" with Puppeteer Cluster. Does puppeteer-real-browser work with any version, or does it need a specific one?

Thank you in advance :)

Doesn't work with multiple tabs

I was wondering what is different about this project. I get that it helps to bootstrap the real browser and connect to it.
What I don't get is why you launch two browsers: a first one that is hidden, and a second one that is actually used.
Wouldn't launching a single browser directly and connecting to it have the same effect with much less overhead?

Update chromium

Current chromium version: 3.2171.3008 - published: 2016/04/27
Latest chromium version: 3.0.3 - published: 2021/10/18 (latest publish)

Change package.json:

  "dependencies": {
-    "chromium": "^3.0.3",
+    "chromium": "3.0.3",
  }

I got this error on start:

/node_modules/chromium/Chromium.js:2
  if(global.__TINT.Chromium) {
                   ^

TypeError: Cannot read properties of undefined (reading 'Chromium')

Cant Open Developer Tools

In the latest version, I can't open developer tools, use right click -> Inspect, or open the console on a page.

Using fingerprints is actually worse than using a normal browser?

Thanks for the work you are doing. Just trying to help with some testing. Anyway, creeper.js doesn't pass.
The fingerprint should try to give realistic numbers. Just inspect the lies detected 😉

For example, it's bad to just use Math.random() to specify the amount of RAM.

`Object.defineProperty(navigator, 'deviceMemory', {get: () => Math.floor(Math.random() * 8) + 4, });`

Better to give some realistic values:

// Pick one element of the array uniformly at random
function randomEl(array) {
  const ridx = Math.floor(array.length * Math.random());
  return array[ridx];
}
randomEl([4, 8, 16, 32])

Timeout error when waiting for browser response

While the code is running, after I call page.goto again with a new URL, Puppeteer loses its reference to the browser and returns nothing; I only get a TimeoutError.

export async function login(page, credentials) {
  try {
    // Fill in the login form and submit it
    await page.locator('[name="Username"]').fill(credentials.user)
    await page.locator('[name="Password"]').fill(credentials.password)
    await page.click('[data-qa="submit"]')
    await page.waitForTimeout(10000)
    await page.goto('https://br.betano.com/live/')
    // await page.waitForNavigation({ waitUntil: 'domcontentloaded' });
    // await page.evaluate(() => {
    //   window.location.href = 'https://br.betano.com/live/'
    // })
    await page.waitForTimeout(10000)
    await page.reload()
    await page.waitForTimeout(10000)
  } catch (error) {
    console.log(error)
  }
}

Does this work on headless mode

For me it doesn't: it times out every time. I guess headless mode can't pass Cloudflare. Is it just me, or is it universal? Thanks.

send({
    url: domain,            
})
.then(resp=>{
    return res.status(200).json(resp);
})
 
 
const send = ({ url = '', proxy = {} }) => {
    return new Promise(async (resolve, reject) => {
        try {
            var { puppeteerRealBrowser } = await import('puppeteer-real-browser')
            var data = {}
            if (proxy && proxy.host && proxy.host.length > 0) {
                data.proxy = proxy
            }
            data.headless = true; // headless is enabled here
 
            puppeteerRealBrowser = await puppeteerRealBrowser(data)
            var browser = puppeteerRealBrowser.browser
            var page = puppeteerRealBrowser.page
 
            try {
                var st = setTimeout( async () => {
                    await browser.close();
                    resolve({
                        code: 504,
                        message: 'Time Out'
                    })
 
                }, 55000);
 
                await page.goto(url, { waitUntil: 'domcontentloaded' })
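                // ......
            } finally {
                // Stop the watchdog once the navigation attempt has finished
                clearTimeout(st)
            }
        } catch (err) {
            reject(err)
        }
    })
}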
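
The snippet above combines `new Promise(async ...)` with a manual `setTimeout` watchdog. An alternative that is often easier to reason about is to race the work against a timer. The following is only a sketch: `withTimeout` is a hypothetical helper, not part of puppeteer-real-browser, and the 55000 ms budget is taken from the snippet above.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
function withTimeout(promise, ms, message = 'Time Out') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the timer is cleared in both cases.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch (assumes `page` and `browser` were returned by the library):
// try {
//   await withTimeout(page.goto(url, { waitUntil: 'domcontentloaded' }), 55000);
// } catch (err) {
//   // respond with 504 here
// } finally {
//   await browser.close();
// }
```

Because the timer is cleared in `finally`, a fast navigation does not leave a pending callback that closes the browser later.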

Does not consistently attempt to click Cloudflare Checkbox

When I run the program and take screenshots, it seems that it sometimes clicks the CF checkbox and sometimes does not. I am running headless on Debian. If I restart the program, it sometimes works, but if I keep reloading the page after a failure, it never works.

And sadly, I can't create a new browser in the code, which would likely fix it, because that closes the program.

Creating a new page hangs.

I'm not sure if I'm doing something wrong, but creating a 2nd page other than the first one after calling puppeteer.connect() just hangs until the timeout triggers.

popup / new tab issue

Whenever I click a URL that opens a pop-up, or Ctrl + click to open it in a new tab,
the URL doesn't open and the tab shows about:blank instead.

Any idea how to fix this?

request for new option

Hi, your package is really amazing. It truly acts like a real browser, but when I checked whoer.net it showed a different timezone and language. If this function is added in the next update, I think it will be the best package. Thanks for creating such a nice package...
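
Puppeteer itself exposes `page.emulateTimezone` and `page.setExtraHTTPHeaders`, which may cover part of this request without a dedicated library option. A minimal sketch, assuming `page` is the Puppeteer page returned by the library; the `applyLocale` helper and its option names are hypothetical:

```javascript
// Sketch: set the emulated timezone and the Accept-Language request header.
async function applyLocale(page, { timezoneId, acceptLanguage } = {}) {
  if (timezoneId) {
    // IANA timezone name, e.g. 'Europe/Berlin'
    await page.emulateTimezone(timezoneId);
  }
  if (acceptLanguage) {
    await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage });
  }
}

// Usage sketch:
// await applyLocale(page, { timezoneId: 'Europe/Berlin', acceptLanguage: 'de-DE,de;q=0.9' });
```

Note that the header only affects outgoing requests; it does not change what `navigator.language` reports. The browser's own language is usually set with the Chromium `--lang` switch, which could presumably be passed through the `args` option if the library forwards it.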

Enable ad blocking feature

Can you add ad-blocking feature to puppeteer-real-browser? Thank you for creating such an awesome library.
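
Until such a feature exists, Puppeteer's request interception can approximate it. The following is a rough sketch, not the library's implementation: the host list is a tiny hypothetical example, `isBlocked` and `enableAdBlocking` are made-up names, and enabling interception may interact with other features that also intercept requests.

```javascript
// Hypothetical blocklist; real ad-block lists are far longer.
const BLOCKED_HOSTS = ['doubleclick.net', 'googlesyndication.com'];

// True if the URL's hostname equals a blocked host or is a subdomain of one.
function isBlocked(url, blockedHosts = BLOCKED_HOSTS) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // unparseable URL: let it through
  }
  return blockedHosts.some((h) => hostname === h || hostname.endsWith('.' + h));
}

// Abort requests to blocked hosts and let everything else continue.
async function enableAdBlocking(page, blockedHosts = BLOCKED_HOSTS) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (isBlocked(request.url(), blockedHosts)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```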

The test does not start

OS: Ubuntu 22.04.3 LTS
xvfb is installed

I cloned the repository and ran npm i and node ./src/test, but it doesn't work and there are no console errors. Thank you

Handling Captcha Before doing other Actions

One issue I've encountered is making sure the captcha is fully processed before my script interacts with the page, for example by entering login credentials. Even with turnstile: true and fingerprint: true enabled, the script tries to interact with the page immediately after navigation, before the captcha has been resolved. As a workaround, I added a fixed delay (waitForTimeout(20000)) right after the initial goto. Can the library instead wait whenever it sees the Cloudflare captcha? Currently it solves the captcha, but my script then looks for elements that are not yet available because the captcha has not finished. Cloudflare can also pop up at different points in the process, so I can't wait 20 seconds after each step. Is there a way to detect the captcha appearing through puppeteer-real-browser?

thank you

This is a snippet from my code:

const {
  browser: browserInstance,
  page,
  setTarget,
} = await connect({
  headless: false,
  fingerprint: true, // Injects a unique fingerprint ID into the page
  turnstile: true, // Automatically clicks on Captchas
  tf: true, // Use targetfilter to avoid detection initially
});

browser = browserInstance; 

setTarget({ status: false });
const page2 = await browser.newPage();
setTarget({ status: true });

// Navigate to the appointment page
await page2.goto(
  "https://xxx",
  { waitUntil: "domcontentloaded" }
);
// Wait 20 seconds (fixed delay used as a workaround)
await page2.waitForTimeout(20000);


// Login
await page2.focus("#username");
await page2.type("#username", "[email protected]");

await page2.focus("#password");
await page2.type("#password", "xxxxx");
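
One way to avoid the fixed delay is to wait for the element the script actually needs, which only appears once the challenge page has been replaced by the real page. This is a sketch using Puppeteer's `page.waitForSelector`; the `typeWhenReady` helper is hypothetical, and it does not detect the captcha itself, only the moment the target element becomes usable.

```javascript
// Wait until `selector` is present and visible, then type into it.
// Rejects with a timeout error if the element never shows up.
async function typeWhenReady(page, selector, text, { timeout = 60000 } = {}) {
  await page.waitForSelector(selector, { visible: true, timeout });
  await page.type(selector, text);
}

// Usage sketch, replacing the fixed 20-second wait:
// await typeWhenReady(page2, '#username', username);
// await typeWhenReady(page2, '#password', password);
```

Because the wait is tied to each interaction, it also covers a challenge that appears later in the flow, at the cost of a longer timeout when the element is genuinely missing.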

Running Puppeteer in Headless Mode for Captcha Solving

When I activate headless mode, Puppeteer can't solve the captcha. Is there a way to run it in headless mode?

Additionally, the following approach may be more accurate. Since we disable Puppeteer's access to Cloudflare, there is no iframe access, so it would be more appropriate to modify the response and communicate with the iframe from inside it.

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
const script = `<script>const targetSelector = 'input[type="checkbox"]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
      for (const addedNode of addedNodes) {
        const node = addedNode.querySelector(targetSelector);
        if (node) {
          setTimeout(()=>{node.parentElement.click();},1000);
        }
      }
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true,
};
observer.observe(targetElement, observerOptions);</script>`;
function targetFilter(target) {
  if (target._getTargetInfo().type !== "iframe") {
    return true;
  }
  return false;
}
const main = async () => {
  const browser = await puppeteer.launch({
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",

    targetFilter,
    headless: false,
  });
  const page = await browser.newPage();

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
               body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });
  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });
  try {
    await page.waitForSelector(".link_row", {
      timeout: 100000,
    });
  } catch (error) {
    console.error(error);
  }
  await page.screenshot({ path: "example.png" });
  await browser.close();
};
main();

Question for viewport

Any specific reason why you have width and height hard coded?

    await page.setViewport({
        width: 1920,
        height: 1080
    });

More suggestions

Hello, I have some suggestions for this revolutionary project:

  • add captcha support, if possible
  • bypass DataDome with a custom user agent (if you use --user-agent or page.setUserAgent you might get detected, even if you set the original user agent)
  • bypass YouTube (YouTube somehow detects this; you can test it yourself: the video doesn't autoplay as it does in a normal browser)
  • built-in support for userDataDir without workarounds

TF = true isn't detecting CF Turnstile

Greetings,

When I use tf = true, it doesn't detect Turnstile at all. When I turn it off, it does detect Turnstile, but it cannot pass the captcha.

PS: I just used the example script from your starting page, nothing else added

Issue with consistency of Cloudflare Turnstile

Hello, for me on this page https://www.planetminecraft.com/account/sign_in/#tab_log
It seems that SOMETIMES the captcha shows "Success", but other times (seemingly at random) it doesn't. I have noticed that if it fails once, reloading the page makes it fail over and over. The whole program must be restarted.

It would be nice if there was a way to restart the browser without restarting the whole node.js program, as this would be a potential workaround in the event of a failure. I noticed from screenshots, when it doesn't solve it, it seems to not be attempting at all. It doesn't say fail or anything it just shows the empty checkbox.

Debian 12, headless set to

image
(an image of what I see when it doesn't seem to be clicking the captcha)

Thanks!

it is not working exactly like a real browser because the website is detecting this

It is not working exactly like a real browser: the website I am using for testing detects that I am browsing through Puppeteer. The website opens a video player if you visit it manually in Google Chrome, but with Puppeteer it shows a message and does not display the player.

I'm using the latest version of Node.

const url = 'https://brbeast.com/video/f79921bbae40a577928b76d2fc3edc2a';
const sleep = ms => new Promise(res => setTimeout(res, ms));

const start = async () => {
    var { puppeteerRealBrowser } = await import('puppeteer-real-browser')
    const { page, browser } = await puppeteerRealBrowser({
        headless: false, // (optional) The default is false. If true is sent, the browser opens incognito. If false is sent, the browser opens visible.
        action:'default', // (optional) If default, it connects with puppeteer by opening the browser and returns you the page and browser. if socket is sent, it returns you the browser url to connect to. 
        executablePath:'default', // (optional) If you want to use a different browser instead of Chromium, you can pass the browser path with this variable.
        // (optional) If you are using a proxy, you can send it as follows.
        // proxy:{
        //     host:'<proxy-host>',
        //     port:'<proxy-port>',
        //     username:'<proxy-username>',
        //     password:'<proxy-password>'
        // }
    })
    console.log('Running tests..')
    // You should use it if you want the fingerprint values of the page to be changed.
    // puppeteerAfp(page);

    await page.goto(url)
    await sleep(5000)
    await page.screenshot({ path: 'testresult.png', fullPage: true })
    // await browser.close()
    console.log(`All done, check the screenshot. ✨`)
}

start();

Multiple Browser issue

When I run the script twice at the same time from different terminals, using different userDataDir values, both runs get linked to the same Chrome instance.
How can I open two different browsers?

Adding TypeScript Support.

It would be very nice to get TypeScript support, both for proper autocomplete and because the package currently can't be used in TS without modifying the tsconfig file.

Linux non headless not working

Why doesn't it open the Chrome window on Linux? I saw your preview video, where you took screenshots at an interval. Isn't there a better way, just for debugging purposes? Afterwards, using headless as normal would be fine.

import { connect } from "puppeteer-real-browser";

connect({
  headless: false,
  turnstile: true,
})
  .then(async (response) => {
    const { browser, page } = response;

    setInterval(async () => {
      await page.screenshot({ path: "./page.jpg" });
    }, 250);
    await page.goto("https://disboard.org/");
  })
  .catch((error) => {
    console.log(error.message);
  });
[don@arch pup]$ node index.mjs 
[ERROR] [PUPPETEER-REAL-BROWSER] | This library is stable with headless: true in linuxt environment and headless: false in Windows environment. Please send headless: 'auto' for the library to work efficiently.
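
If periodic screenshots remain the debugging tool, note that `setInterval` with an async callback can start a new screenshot before the previous one has finished. A small sketch of a loop that waits for each capture before scheduling the next; `startScreenshotLoop` is a hypothetical helper, and `page` is assumed to be the Puppeteer page returned by `connect`:

```javascript
// Take screenshots one at a time, pausing `intervalMs` between captures.
// Returns a handle whose stop() resolves once the loop has exited.
function startScreenshotLoop(page, path, intervalMs = 250) {
  let stopped = false;
  const done = (async () => {
    while (!stopped) {
      try {
        await page.screenshot({ path });
      } catch (err) {
        // e.g. the page was closed or is mid-navigation; keep looping
        console.error('screenshot failed:', err.message);
      }
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  })();
  return {
    stop() {
      stopped = true;
      return done;
    },
  };
}

// Usage sketch:
// const loop = startScreenshotLoop(page, './page.jpg');
// await page.goto('https://disboard.org/');
// await loop.stop();
```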

Failed to install deps for v1.2.17

When installing puppeteer-real-browser, I got a failed checksum when installing v1.2.17 (v1.2.16 works).

pnpm i [email protected] --ignore-pnpmfile --no-lockfile -w
 WARN  A pnpm-lock.yaml file exists. The current configuration prohibits to read or write a lockfile
 WARN  deprecated @nx/[email protected]: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.
 WARN  deprecated @types/[email protected]: This is a stub types definition. helmet provides its own type definitions, so you do not need this installed.
 WARN  deprecated @aws-sdk/[email protected]: This package has moved to @smithy/signature-v4
 WARN  deprecated @aws-sdk/[email protected]: This package has moved to @smithy/util-buffer-from
 WARN  deprecated @prisma/[email protected]: Deprecated: @prisma/sdk was an internal package which doesn't follow semver and can include breaking changes without a warning. We renamed it to @prisma/internals to make it clearer.

If you're using this package it would be helpful if you could help us understand where, how, and why you are using it by ginving us feedback in https://github.com/prisma/prisma/discussions/13877). Your feedback will be valuable to us in defining a better API.
 WARN  deprecated [email protected]: The `subscriptions-transport-ws` package is no longer maintained. We recommend you use `graphql-ws` instead. For help migrating Apollo software to `graphql-ws`, see https://www.apollographql.com/docs/apollo-server/data/subscriptions/#switching-from-subscriptions-transport-ws    For general help using `graphql-ws`, see https://github.com/enisdenjo/graphql-ws/blob/master/README.md
 WARN  GET https://github.com/zfcsoftware/puppeteer-afp/tree/master error (ERR_PNPM_TARBALL_EXTRACT). Will retry in 10 seconds. 2 retries left.
 WARN  GET https://github.com/zfcsoftware/puppeteer-afp/tree/master error (ERR_PNPM_TARBALL_EXTRACT). Will retry in 1 minute. 1 retries left.
 ERR_PNPM_TARBALL_EXTRACT  Failed to add tarball from "https://github.com/zfcsoftware/puppeteer-afp/tree/master" to store: Invalid checksum for TAR header at offset 0. Expected NaN, got 42982

This error happened while installing the dependencies of [email protected]
Progress: resolved 4547, reused 4427, downloaded 0, added 0

Not entirely sure of the issue, but noticed v1.2.17 was released three days ago.

Custom User data directory

Hi everyone!
I am trying to use Puppeteer with an existing browser, but my profile does not load properly!
Here is my code:

const { page, browser } = await connect({
  // Backslashes inside JavaScript string literals must be escaped
  executablePath: "D:\\GPMLogin\\gpm_browser\\gpm_browser_chromium_core_119\\chrome.exe",
  headless: 'false',
  args: [
    '--user-data-dir=D:\\GPM\\9eea9e17-d2e9-400a-bd0f-249c49a0a7b4-3624',
    '--window-size=640,640',
  ],
  customConfig: {},
  skipTarget: [],
  fingerprint: true,
  turnstile: true,
  connectOption: {}
})
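
A common pitfall with Windows paths like these: in an ordinary JavaScript string literal, a single backslash starts an escape sequence, so the separators can silently disappear. The short demonstration below uses a shortened, illustrative path; `String.raw` and `path.win32.join` are standard JavaScript and Node.js features.

```javascript
const path = require('path');

// "\G" and "\g" are not recognized escapes, so the backslashes are dropped.
const broken = 'D:\GPMLogin\gpm_browser';

// String.raw keeps every backslash exactly as typed.
const raw = String.raw`D:\GPMLogin\gpm_browser`;

// path.win32.join inserts the separators itself.
const joined = path.win32.join('D:\\', 'GPMLogin', 'gpm_browser');

console.log(broken); // D:GPMLogingpm_browser
console.log(raw);    // D:\GPMLogin\gpm_browser
console.log(joined); // D:\GPMLogin\gpm_browser
```

Forward slashes are another option, since Chrome on Windows generally accepts them in paths.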

Thank you so much!

Making a Cloudflare Proxy

I'm pretty sure most web scrapers use APIs to scrape CF challenge cookies so that they can make requests to specific servers. My idea: since your project is very powerful and revolves around bypassing Cloudflare, you could try making a web server that accepts a URL and a proxy as arguments and returns the CF page content, cookies, and headers using your project. I think it would be very interesting, since most libraries no longer work.

Manageable Usage doesn't work

The first method, which opens an actual browser, works fine, but the second method with headless: true doesn't work. I tried the example from the documentation and it just opens two browsers, one with two blank pages and the other with one blank page.

Bug with reCAPTCHA v2

Hey,

Sorry, me again. When it detects a reCAPTCHA v2, it starts clicking it and then presses only the first picture.
Maybe you could add support for capsolver/capmonster to solve this kind of captcha; that would be awesome.

Puppeteer-real-browser as headless: "true"

I've been trying to run puppeteer-real-browser with headless: 'true', but the browser still opens as if headless were false. With headless: 'auto' the browser also launches non-headless.
Below are the connect params I'm passing:

connect({
    headless: 'true',
    args: [],
    customConfig: {},
    skipTarget: [],
    fingerprint: true,
    turnstile: true,
    connectOption: {},
    tf: true,
})

How do I launch the browser with headless: true?

Error: require() of ES Module ./node_modules/puppeteer-real-browser/src/index.js from ./server.ts not supported. Instead change the require of index.js in ./server.ts to a dynamic import() which is available in all CommonJS modules.

When trying to run the application, the following error appears:

Error: require() of ES module /Users/jhonatabonadio/stfnoticias/node_modules/puppeteer-real-browser/src/index.js from /Users/jhonatabonadio/stfnoticias/server.ts not supported.
Instead, change the require from index.js in /Users/jhonatabonadio/stfnoticias/server.ts to a dynamic import() that is available in all CommonJS modules.

server.ts


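import dotenv from 'dotenv'
// Assumed imports: the original snippet uses these two names without importing them
import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
import { connect } from 'puppeteer-real-browser'

dotenv.config()

puppeteer.use(StealthPlugin())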

async function consultarProcesso(cpf: string) {
  connect({
    turnstile: true,
    fingerprint: true,
    headless: 'auto',
  }).then(async (response: any) => {
    const { page, browser } = response

    await page.goto('https://nopecha.com/demo/cloudflare')
  })
}
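
The error message itself points at the usual workaround: load the ES module with a dynamic `import()` from CommonJS code. A minimal sketch, where `loadEsm` is a hypothetical helper that caches the import promise so the module is loaded once:

```javascript
// Cache of specifier -> Promise<module namespace>
const esmCache = new Map();

// Load an ES module from CommonJS code via dynamic import().
function loadEsm(specifier) {
  if (!esmCache.has(specifier)) {
    esmCache.set(specifier, import(specifier));
  }
  return esmCache.get(specifier);
}

// Usage sketch inside the function from the issue:
// async function consultarProcesso(cpf) {
//   const { connect } = await loadEsm('puppeteer-real-browser');
//   const { page, browser } = await connect({ turnstile: true, fingerprint: true, headless: 'auto' });
//   await page.goto('https://nopecha.com/demo/cloudflare');
// }
```

One caveat for TypeScript projects: with `"module": "commonjs"` in tsconfig, the compiler rewrites `import()` into a `require()` call, which reproduces the same error. Setting `"module"` to `"node16"` or `"nodenext"` keeps the dynamic import intact.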

HELP/SUGGESTION: Multi - Threading

Hello, how would you deal with multi-threading? There are some good libraries like crawlee and puppeteer-cluster, but I can't see a way to integrate them with puppeteer-real-browser.
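
Without a cluster library, a small promise pool can run several jobs in parallel with a fixed limit. This is only a sketch: `runPool` is a hypothetical helper, and the usage comment assumes that each `connect()` call starts its own independent browser, which the library may not guarantee.

```javascript
// Run `worker` over `items` with at most `concurrency` jobs in flight.
// Results are returned in the same order as `items`.
async function runPool(items, worker, concurrency = 2) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage sketch:
// const titles = await runPool(urls, async (url) => {
//   const { page, browser } = await connect({ headless: 'auto', turnstile: true });
//   try {
//     await page.goto(url);
//     return await page.title();
//   } finally {
//     await browser.close();
//   }
// }, 3);
```

If one worker throws, `Promise.all` rejects and the remaining lanes keep running in the background; wrapping the worker body in try/catch is one way to collect per-item errors instead.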

CF looping

Greetings,

When using an extension to generate a new UA and headers, Cloudflare sometimes seems to flag it, and then it loops (solving again and again).
Is there a way to add a check so that it only tries to solve Cloudflare 1-3 times and afterwards closes and reports an error?
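
A cap on attempts can be added in user code. The sketch below launches a fresh browser for each attempt, closes it afterwards, and gives up with an error after a fixed number of tries. `withFreshBrowser` is a hypothetical helper; it assumes `launch()` resolves to an object with a `browser` property, as `connect()` does in the snippets on this page, and that the task throws (for example on a `waitForSelector` timeout) when the challenge is not passed.

```javascript
// Try `task` up to `maxAttempts` times, each time with a new browser session.
async function withFreshBrowser(launch, task, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const session = await launch();
    try {
      return await task(session, attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    } finally {
      await session.browser.close();
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts: ${lastError.message}`);
}

// Usage sketch:
// await withFreshBrowser(
//   () => connect({ headless: 'auto', turnstile: true }),
//   async ({ page }) => {
//     await page.goto(url);
//     await page.waitForSelector('#content', { timeout: 30000 });
//     return page.content();
//   },
// );
```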

Cannot bypass Cloudflare captcha

Hello everyone!
I've been trying to use Puppeteer with the current browser, but I can't bypass the captcha on this particular page: https://jobnib.com/book/i-am-the-luna-chapter-32
Here is my code.

// Small delay helper (the original snippet called sleep() without defining it)
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  var { connect } = await import('puppeteer-real-browser')
  const { page, browser } = await connect({
    headless: 'auto',
    args: [],
    customConfig: {},
    skipTarget: [],
    fingerprint: false,
    turnstile: true,
    connectOption: {},
    fpconfig: {},
  })

  try {
    await page.goto('https://jobnib.com/book/i-am-the-luna-chapter-32');
    await sleep(30000)
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
    pool.end(); // `pool` is defined elsewhere in the original program (not shown)
    process.exit(1);
  }
})();

Thanks, all
