barjin commented on June 7, 2024

Seems like a sign of a much larger underlying issue:

New sessions / fingerprints / proxyUrls are generated only on a browser launch.

The following snippet doesn't rotate the fingerprints correctly: all requests are made with a single session. This is because useIncognitoPages was written with Playwright contexts in mind. We relied on the "newPage() creates a separate environment" invariant, so all the pages/contexts are launched in one browser.

sessionPoolOptions: {
    sessionOptions: {
        maxUsageCount: 1,
    },
},
launchContext: {
    useIncognitoPages: true,
},

The following snippet rotates the fingerprints correctly:

sessionPoolOptions: {
    sessionOptions: {
        maxUsageCount: 1,
    },
},
launchContext: {
    useIncognitoPages: false,
},

This works because an "expired" session throws away the whole browser instance, so the next page has to launch a whole new browser (compare maxOpenPagesPerBrowser, which does the same thing). This is crazy expensive, though: launching and closing a context 100 times in one browser takes ~3.9 seconds, while launching and closing a browser 100 times takes 40 seconds.
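
To make the difference between the two configurations concrete, here is a rough toy model of the behaviour described above. It is not Crawlee's actual code: simulateCrawl and every name inside it are hypothetical, the only rule it encodes is "a session is bound at browser launch, and only the non-incognito path retires the browser when the session expires", and the per-operation costs are simply the two timings quoted above divided by 100.

```javascript
// Toy model only. simulateCrawl and all names in it are hypothetical,
// not part of Crawlee or Playwright.

// Per-operation costs derived from the timings quoted in this comment.
const CONTEXT_COST_MS = 3900 / 100;  // ~3.9 s for 100 contexts
const BROWSER_COST_MS = 40000 / 100; // 40 s for 100 browsers

function simulateCrawl({ requests, useIncognitoPages, maxUsageCount }) {
    let nextSessionId = 0;
    let browser = null;
    let browserLaunches = 0;
    let contextsCreated = 0;
    const sessionsSeen = new Set();

    for (let i = 0; i < requests; i++) {
        if (browser === null) {
            // A session (fingerprint, proxyUrl) is picked only at browser launch.
            browser = { sessionId: nextSessionId++, usageCount: 0 };
            browserLaunches++;
        }

        if (useIncognitoPages) {
            // New context in the same browser; it keeps the launch-time session.
            contextsCreated++;
        }

        sessionsSeen.add(browser.sessionId);
        browser.usageCount++;

        // Assumption of this model: only the non-incognito path throws the
        // browser away once its session has reached maxUsageCount.
        if (!useIncognitoPages && browser.usageCount >= maxUsageCount) {
            browser = null;
        }
    }

    return {
        distinctSessions: sessionsSeen.size,
        browserLaunches,
        contextsCreated,
        estimatedOverheadMs:
            browserLaunches * BROWSER_COST_MS + contextsCreated * CONTEXT_COST_MS,
    };
}

const incognito = simulateCrawl({ requests: 100, useIncognitoPages: true, maxUsageCount: 1 });
const plain = simulateCrawl({ requests: 100, useIncognitoPages: false, maxUsageCount: 1 });

console.log('useIncognitoPages: true ', incognito);
console.log('useIncognitoPages: false', plain);
```

Under these assumptions, 100 requests with useIncognitoPages: true stay on a single session with one browser launch, while useIncognitoPages: false gets 100 distinct sessions at the price of 100 browser launches, roughly a tenfold difference in launch overhead.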

The entire browser-pool and session rotation logic is quite convoluted and worth a total rewrite.

from crawlee.
