
Comments (6)

BubuAnabelas commented on May 18, 2024

I'm not sure whether it works like this because of the asynchronicity, but every time onSuccess(response) is called, response contains an array of links. Those links are the ones the crawler will go on to crawl, up to the configured depth. If the crawler processes them sequentially, you would have an ordered list of the pages the crawler will follow.
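As a rough illustration of that idea, the sketch below records the links each page reports, in the order the callbacks fire. It assumes the object passed to onSuccess exposes the crawled URL as `options.url` and the discovered links as `links`; those field names and the `recordLinks` helper are assumptions for illustration, and the crawler itself is not invoked here.

```javascript
// Ordered log of { from, to } pairs, filled as onSuccess-style callbacks fire.
const followed = [];

// Hypothetical helper meant to be called from the crawler's onSuccess callback.
// Assumes result.options.url is the crawled page and result.links its links.
function recordLinks(result) {
  for (const link of result.links || []) {
    followed.push({ from: result.options.url, to: link });
  }
}

// Simulated results standing in for two crawled pages.
recordLinks({
  options: { url: 'https://example.com/' },
  links: ['https://example.com/a', 'https://example.com/b'],
});
recordLinks({
  options: { url: 'https://example.com/a' },
  links: ['https://example.com/c'],
});

console.log(followed.map(({ from, to }) => `${from} -> ${to}`).join('\n'));
```

The log reflects the order in which callbacks complete, so it only matches the crawl order if pages are processed one at a time.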

from headless-chrome-crawler.

yvmarques commented on May 18, 2024

I noticed it as well. My best guess so far is that we could store this list in a global variable, since the order is correct, and then in preRequest match the upcoming request against this global variable.

But I am also thinking that this option could be useful to, for example, configure the referrer for the next request. As far as I understand, currently none of the requests has a referrer, which can set off a few alarms and get the crawler blocked.
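A minimal sketch of the global-variable approach might look like the following. The `referrerByUrl` map and both helper names are hypothetical, and the `options.url` and `links` fields are assumed shapes for what the callbacks receive; the crawler is only simulated here.

```javascript
// Global lookup: link URL -> URL of the page where it was first seen.
const referrerByUrl = new Map();

// Hypothetical onSuccess-style handler.
// Assumes result.options.url is the crawled page and result.links its links.
function rememberReferrers(result) {
  for (const link of result.links || []) {
    // Keep the first page that linked to this URL.
    if (!referrerByUrl.has(link)) referrerByUrl.set(link, result.options.url);
  }
}

// Hypothetical preRequest-style handler: looks up where the upcoming request came from.
function findReferrer(options) {
  return referrerByUrl.get(options.url) || null;
}

// Simulated crawl of the start page, which links to one other page.
rememberReferrers({
  options: { url: 'https://example.com/' },
  links: ['https://example.com/a'],
});

console.log(findReferrer({ url: 'https://example.com/a' }));
console.log(findReferrer({ url: 'https://example.com/' }));
```

The lookup relies on exact string matches, so it would miss a link if the URL is normalized between the two callbacks.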


yujiosaka commented on May 18, 2024

@yvmarques
Would your use case be satisfied if the previous page's information (like document.referrer) were passed to onSuccess's result?


yvmarques commented on May 18, 2024

@yujiosaka I am not sure you can change the headers of the upcoming request in onSuccess. Wouldn't that previous page's information be more useful in the preRequest method?

The idea would be to have something similar to what Scrapy has for the Request and Response.

https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta


yujiosaka commented on May 18, 2024

@yvmarques

> Wouldn't that previous page's information be more useful on the preRequest method?

Yes, it would be. I just thought you only wanted to know where the request is coming from.
If the referrer is passed to preRequest, you can even modify the headers through the extraHeaders option.

If that's what you wanted, I can probably add the feature quickly.
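To make the proposal concrete, here is a sketch of what a preRequest-style callback could do once it knows the referrer. The `withReferer` helper is hypothetical, and how the referrer reaches preRequest, or how the updated options are handed back to the crawler, is not settled in this thread; only the extraHeaders option name comes from the discussion.

```javascript
// Hypothetical helper for a preRequest-style callback.
// `referrer` stands for the previous page's URL, however it ends up being provided.
function withReferer(options, referrer) {
  if (!referrer) return options; // first request: nothing to add
  return {
    ...options,
    // Keep any headers already configured and add the Referer header.
    extraHeaders: { ...(options.extraHeaders || {}), Referer: referrer },
  };
}

// Simulated options for a request that was discovered on the start page.
const next = withReferer(
  { url: 'https://example.com/a', extraHeaders: { 'Accept-Language': 'en' } },
  'https://example.com/'
);

console.log(next.extraHeaders);
```

The helper returns a new object rather than mutating its input, which keeps it easy to test in isolation.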


yvmarques commented on May 18, 2024

I don't know how hard it would be to, for example, pass the result of the previous request to preRequest and the executed request to onSuccess.

