Comments (10)

ermolaev1337 commented on May 18, 2024 4

Is there a way to connect the puppeteer-cluster to a remote instance of chromium? (“connect” instead of “launch”)

from puppeteer-cluster.

generic11 commented on May 18, 2024 3

Hello - just wanted to get a feel for how active this project is. I see puppeteer-cluster as being useful for several projects I'd like to work on. However, I'm hesitant to use it if development will be abandoned. Is development still happening? Thanks!

j-manu commented on May 18, 2024 2

  1. Add a mixed concurrency model, i.e., for the PAGE or CONTEXT concurrency model, add an option to distribute jobs across more than one browser instance. That way a crash won't affect all jobs, which offers a good balance between reliability and resource usage.

  2. Add an API that returns the length of the queue, the time at which the oldest item in the queue was added, and the number of jobs processed in the last minute. For a continuously operating cluster, i.e., one where jobs are added continuously, this information is valuable.
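
To make the second request concrete, here is a minimal sketch of the three metrics kept alongside a job queue. It is not part of puppeteer-cluster's API: the class name QueueStats, its methods, and the injectable clock are all hypothetical, chosen only for illustration.

```typescript
// Hypothetical sketch; none of these names exist in puppeteer-cluster.
class QueueStats<T> {
  private pending: Array<{ item: T; addedAt: number }> = [];
  private finishedAt: number[] = [];
  private now: () => number;
  private windowMs: number;

  // `now` is injectable so the behaviour can be checked without a real clock.
  constructor(now: () => number = () => Date.now(), windowMs = 60_000) {
    this.now = now;
    this.windowMs = windowMs;
  }

  // Add a job and remember when it was queued.
  enqueue(item: T): void {
    this.pending.push({ item, addedAt: this.now() });
  }

  // Take the oldest job off the queue; undefined when the queue is empty.
  dequeue(): T | undefined {
    return this.pending.shift()?.item;
  }

  // Record that a job has finished.
  markProcessed(): void {
    this.finishedAt.push(this.now());
  }

  // Number of jobs still waiting.
  get length(): number {
    return this.pending.length;
  }

  // Timestamp (ms) at which the oldest waiting job was added.
  get oldestAddedAt(): number | undefined {
    return this.pending[0]?.addedAt;
  }

  // Jobs finished within the last `windowMs` milliseconds (one minute by default).
  processedInWindow(): number {
    const cutoff = this.now() - this.windowMs;
    // Drop old timestamps so the array does not grow without bound.
    while (this.finishedAt.length > 0 && this.finishedAt[0] < cutoff) {
      this.finishedAt.shift();
    }
    return this.finishedAt.length;
  }
}

const stats = new QueueStats<string>();
stats.enqueue("https://example.com/");
console.log(stats.length, stats.oldestAddedAt, stats.processedInWindow());
```

A long-running cluster could poll these three values periodically to spot a growing backlog or a drop in throughput.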

cyxou commented on May 18, 2024 1

Cool, glad to hear that. Feel free to ping me if you need any help)

barpaw commented on May 18, 2024

I have a question. How many browsers can I spawn in parallel per processor core? Let's say my server has a 4-core processor. How many browsers can I spawn at one time and still have my tests pass?

thomasdondorf commented on May 18, 2024

Next time, please open a separate issue if it has nothing to do with this issue.

Regarding your question: it depends on your use case. For simple DOM handling, I was able to run ~10 workers on my machine (quad-core i5). Just give it a try with the option monitor: true and see how your machine handles the tasks.
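
As a rough starting point for that experiment, the sketch below derives an initial maxConcurrency from the core count and switches the monitor on. The helper name startingOptions and the default ratio of 2.5 workers per core are assumptions: the ratio is simply the single data point above (about 10 workers on four cores), not a recommendation from the library.

```typescript
// Hypothetical helper; the 2.5 workers-per-core default is one anecdotal
// data point (about 10 workers on a quad-core machine), not a rule.
interface ClusterStartOptions {
  maxConcurrency: number;
  monitor: boolean;
}

function startingOptions(cores: number, workersPerCore = 2.5): ClusterStartOptions {
  if (!Number.isFinite(cores) || cores < 1) {
    throw new RangeError("cores must be a number >= 1");
  }
  return {
    // Round down, but always allow at least one worker.
    maxConcurrency: Math.max(1, Math.floor(cores * workersPerCore)),
    // Show live statistics so an overloaded machine is easy to spot.
    monitor: true,
  };
}

console.log(startingOptions(4));
```

The result can be spread into the options passed to Cluster.launch together with a concurrency setting such as Cluster.CONCURRENCY_CONTEXT, and maxConcurrency then adjusted up or down based on what the monitor output shows.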

cyxou commented on May 18, 2024

Unfortunately, the current implementation of custom concurrency doesn't address the case where you need to pass custom puppeteer parameters to jobInstances. IMHO this would effectively solve #36 with the puppeteer option args: ['--incognito', `--proxy-server=${proxyServer}`] and await page.authenticate(credentials).
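
A small sketch of building those arguments per proxy follows. The helper name proxyArgs and the proxy address are hypothetical. One detail worth noting is that the ${proxyServer} placeholder is only interpolated inside a backtick template literal; inside ordinary quotes it is passed to Chromium literally.

```typescript
// Hypothetical helper: build Chromium launch arguments for one proxy.
function proxyArgs(proxyServer: string): string[] {
  // Backticks are required here so that ${proxyServer} is interpolated.
  return ["--incognito", `--proxy-server=${proxyServer}`];
}

// The address below is a placeholder, not a real proxy.
const exampleArgs = proxyArgs("http://proxy.example:8080");
console.log(exampleArgs);
```

With a single shared proxy, an array like this can be passed today as puppeteerOptions.args when launching the cluster, with await page.authenticate({ username, password }) called inside the task function. The limitation described above concerns using a different set of arguments for each job.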

@thomasdondorf, what do you think about this?

thomasdondorf commented on May 18, 2024

I'm currently thinking about completely reworking the concurrency implementations. Then there would be no more "WorkerInstance" and "JobInstance", just one function that is called when a page is needed. The concurrency implementation would then have full control over when a puppeteer instance is started and when one is reused.

Expect some code changes in the next two weeks ;)
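
For readers trying to picture the "one function that is called when a page is needed" idea, here is a rough sketch under stated assumptions. It is not the design that was eventually implemented: createPageProvider, SketchBrowser, SketchPage and the spreading policy are all hypothetical, and stand-in objects replace real browsers so the logic can be read on its own. The policy shown launches browsers until a limit is reached and then reuses the least-loaded one, which also illustrates the mixed concurrency model requested earlier in the thread. Crash recovery and failed launches are left out.

```typescript
// Hypothetical stand-ins for a real browser and page.
interface SketchBrowser { id: number; }
interface SketchPage { browserId: number; release: () => void; }
interface Slot { browser: Promise<SketchBrowser>; openPages: number; }

// Returns the single "give me a page" function. All decisions about when to
// start a browser and when to reuse one live inside it.
function createPageProvider(
  launch: () => Promise<SketchBrowser>,
  maxBrowsers: number,
): () => Promise<SketchPage> {
  if (!Number.isInteger(maxBrowsers) || maxBrowsers < 1) {
    throw new RangeError("maxBrowsers must be a positive integer");
  }
  const slots: Slot[] = [];

  return async function requestPage(): Promise<SketchPage> {
    let slot: Slot;
    if (slots.length < maxBrowsers) {
      // Register the slot before awaiting, so concurrent callers
      // cannot launch more than maxBrowsers browsers.
      slot = { browser: launch(), openPages: 0 };
      slots.push(slot);
    } else {
      // Reuse the browser that currently has the fewest open pages.
      slot = slots.reduce((least, s) => (s.openPages < least.openPages ? s : least));
    }
    slot.openPages += 1;
    const browser = await slot.browser;

    let released = false;
    return {
      browserId: browser.id,
      // Give the capacity back; calling release twice has no extra effect.
      release: () => {
        if (!released) {
          released = true;
          slot.openPages -= 1;
        }
      },
    };
  };
}
```

In a real implementation the launch callback would start a puppeteer instance with whatever options a job needs, which is where per-job arguments such as a proxy could be supplied.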

strarsis commented on May 18, 2024

+1 for Docker container support.
https://github.com/skalfyfan/dockerized-puppeteer

rennokki commented on May 18, 2024

(Long-term runs of puppeteer-cluster #25) Make sure it's reliable and crawl more than 10 million pages with it (so far the maximum I crawled was ~800k pages)

I use k6 benchmarks in my CI tests for soketi to make sure that releases pass the benchmarks in most cases.

Would it be a good idea for me to set something similar up for you for page-rendering tests?
