I am not sure how I can achieve this, but my requirement is that I need to know the

Passes meta information from previous request about headless-chrome-crawler HOT 6 CLOSED

yujiosaka commented on May 18, 2024

Passes meta information from previous request

from headless-chrome-crawler.

Comments (6)

BubuAnabelas commented on May 18, 2024

I'm not sure if it works like this because of the asynchronicity, but every time onSuccess(response) is called, response contains an array of links. Those links are the ones the crawler will go on to crawl, up to the configured depth. If the crawler does this sequentially, you would have an ordered list of the pages it will follow.
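A minimal sketch of that idea, assuming the object passed to onSuccess carries a `links` array as described above. The `recordLinks` helper and the sample results are hypothetical, and nothing here calls the library itself.

```javascript
// Hypothetical helper: keeps discovered links in the order they are reported.
const discovered = [];
const seen = new Set();

function recordLinks(result) {
  // `result.links` is assumed to be the array of URLs found on the page.
  for (const link of result.links || []) {
    if (!seen.has(link)) {
      seen.add(link);
      discovered.push(link);
    }
  }
}

// Simulated onSuccess calls, in the order the pages finished.
recordLinks({ links: ['https://example.com/a', 'https://example.com/b'] });
recordLinks({ links: ['https://example.com/b', 'https://example.com/c'] });

console.log(discovered);
```

If pages are crawled concurrently, the list reflects the order in which pages finished, which is not necessarily the order in which the requests were sent.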

yvmarques commented on May 18, 2024

I noticed it as well, and my best guess so far is that we could store this list in a global variable, because the order is correct, and then in preRequest match the upcoming request against this global variable.

But I am also thinking that this option could be useful for other things, for example configuring the referrer for the next request. As far as I understand, currently none of the requests have a referrer, which can set off a few alarms and get the crawler blocked.
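The approach described above could look roughly like this. It is only a sketch: `referrers`, `rememberLinks`, and `applyReferrer` are hypothetical names, the `url`, `links`, and `extraHeaders` fields are assumed shapes for the objects passed to onSuccess and preRequest, and nothing here calls the library itself.

```javascript
// Hypothetical global store: link URL -> URL of the page it was found on.
const referrers = new Map();

// Intended to be called from an onSuccess-style callback.
function rememberLinks(result) {
  for (const link of result.links || []) {
    // Keep the first page that linked to this URL.
    if (!referrers.has(link)) referrers.set(link, result.url);
  }
}

// Intended to be called from a preRequest-style callback.
// Returns true so the request is always allowed to proceed.
function applyReferrer(options) {
  const referrer = referrers.get(options.url);
  if (referrer) {
    options.extraHeaders = { ...(options.extraHeaders || {}), Referer: referrer };
  }
  return true;
}

rememberLinks({ url: 'https://example.com/', links: ['https://example.com/about'] });
const next = { url: 'https://example.com/about' };
applyReferrer(next);
console.log(next.extraHeaders);
```

The map keeps only the first page that linked to a URL, so a page reachable from several places gets a single, stable referrer.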

yujiosaka commented on May 18, 2024

@yvmarques
Is your use case satisfied if the previous page's information (like document.referrer) is passed to onSuccess's result?

yvmarques commented on May 18, 2024

@yujiosaka I am not sure that you can change the headers for the upcoming request in onSuccess. Wouldn't that previous page's information be more useful on the preRequest method?

The idea would be to have something similar to what Scrapy has for the Request and Response.

https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta
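To make the Scrapy comparison concrete, here is a rough sketch of meta passing in a toy breadth-first crawl over an in-memory link graph. Everything in it (`pages`, `crawl`, the shape of `meta`) is hypothetical and only illustrates the pattern; it is neither Scrapy's nor headless-chrome-crawler's API.

```javascript
// Hypothetical in-memory "site": page URL -> links found on that page.
const pages = {
  '/': ['/a', '/b'],
  '/a': ['/c'],
  '/b': [],
  '/c': [],
};

// Each queued request carries a `meta` object created by its parent.
function crawl(startUrl, maxDepth) {
  const visited = [];
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, meta: { referrer: null, depth: 1 } }];

  while (queue.length > 0) {
    const request = queue.shift();
    visited.push({ url: request.url, ...request.meta });
    if (request.meta.depth >= maxDepth) continue;

    for (const link of pages[request.url] || []) {
      if (seen.has(link)) continue;
      seen.add(link);
      // Meta for the child request is derived from the current request.
      queue.push({
        url: link,
        meta: { referrer: request.url, depth: request.meta.depth + 1 },
      });
    }
  }
  return visited;
}

const visited = crawl('/', 3);
console.log(visited);
```

Because the meta travels with the request rather than living in a global variable, it stays correct even when requests are processed out of order.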

yujiosaka commented on May 18, 2024

@yvmarques

"Wouldn't that previous page's information be more useful on the preRequest method?"

Yes, it will be. I just thought you only wanted to know where the request is coming from.
If the referrer is passed to preRequest, you can even modify headers via the extraHeaders option.

If that's what you wanted, I can probably add the feature quickly.

yvmarques commented on May 18, 2024

I don't know how hard it would be to, for example, have the result of the previous request passed to preRequest, and the executed request passed to onSuccess.
