Comments (6)
I'm not sure whether this holds given the asynchronicity, but every time onSuccess(response) is called, response contains an array of links. Those links are the ones the crawler will continue to crawl, up to the configured depth. If the crawler does this sequentially, you would have an ordered list of the pages the crawler will follow.
from headless-chrome-crawler.
I noticed it as well. My best guess so far is that we could store this list in a global variable, because the order is correct, and then in preRequest match the upcoming request against that variable.
But I also think this option could be useful for other things, for example setting the referrer for the next request. As far as I understand, none of the requests currently carry a referrer, which can set off a few alarms and get the crawler blocked.
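A rough sketch of that idea, in case it helps the discussion. It assumes the onSuccess result exposes the crawled page as result.options.url and the discovered links as result.links, that preRequest may modify the options it receives, and that extraHeaders is honored per request. The names referrerByUrl, rememberLinks and applyReferrer are made up for illustration and are not part of headless-chrome-crawler. It also assumes the URLs in links match options.url exactly, which may not hold if the crawler normalizes them.

```javascript
// Hypothetical helpers, not part of headless-chrome-crawler.
// Maps each discovered link to the page it was first found on.
const referrerByUrl = new Map();

// Intended to be called from onSuccess.
// Assumption: result.options.url is the crawled page and
// result.links is the array of links found on it.
function rememberLinks(result) {
  const from = result.options.url;
  for (const link of result.links || []) {
    // Keep only the first page that linked to each URL.
    if (!referrerByUrl.has(link)) {
      referrerByUrl.set(link, from);
    }
  }
}

// Intended to be called from preRequest.
// Assumption: options.url is the upcoming request's URL and
// options.extraHeaders is applied to that request.
function applyReferrer(options) {
  const referrer = referrerByUrl.get(options.url);
  if (referrer) {
    // Preserve any headers that were already configured.
    options.extraHeaders = Object.assign({}, options.extraHeaders, {
      Referer: referrer,
    });
  }
  // Returning true lets the request go ahead.
  return true;
}
```

These would be wired in as the onSuccess and preRequest options when launching the crawler. Because the map is filled as responses arrive, it does not depend on the crawl being sequential, only on a page's onSuccess running before the preRequest of the links found on it.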
@yvmarques
Is your use case satisfied if the previous page's information (like document.referrer) is passed to onSuccess's result?
@yujiosaka I am not sure you can change the headers for the upcoming request in onSuccess. Wouldn't that previous page's information be more useful in the preRequest method?
The idea would be to have something similar to what Scrapy has for the Request and Response.
https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta
Wouldn't that previous page's information be more useful on the preRequest method ?
Yes, it will be. I just thought you only wanted to know where the request is coming from.
If the referrer is passed to preRequest, you can even modify headers via the extraHeaders option.
If that's what you wanted, I can probably add the feature quickly.
I don't know how hard it would be to, for example, have the result of the previous request passed to preRequest and the executed request passed to onSuccess.