Comments (4)
How large is the allowed_domains list? Could you keep the allowed_domains list and disable the offsite middleware? Could you also run the spider with the allowed_domains list but with scrapy-redis disabled?
from scrapy-redis.
It's about 250 URLs. For some reason I cannot disable the offsite middleware by setting 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': None. I disabled scrapy-redis and it looks like the same result, so this must be a Scrapy bug in allowed_domains?
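For reference, a minimal sketch of how that setting is usually written in settings.py. It assumes the entry goes in SPIDER_MIDDLEWARES (the offsite filter is a spider middleware, so an entry placed in DOWNLOADER_MIDDLEWARES would have no effect) and that the module path matches the Scrapy version discussed in this thread; later Scrapy releases moved the class, so the path may need adjusting.

```python
# settings.py (sketch; assumes the scrapy.contrib layout used in this thread)

# Mapping a middleware path to None disables it. The offsite filter is a
# spider middleware, so the entry belongs in SPIDER_MIDDLEWARES.
SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': None,
}
```

If the crawl is just as slow with this in place, the offsite check is probably not the bottleneck.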
You could time how long the offsite middleware operations take:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spidermiddleware/offsite.py
If that's the problem, it might be better to perform the offsite check
before pushing the URLs to Redis.
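A rough sketch of both ideas, timing a domain check and filtering URLs before they are queued. This is not the middleware's actual code: the helper names (build_host_regex, filter_onsite) and the example domains are made up for illustration, the single-regex approach is only modeled on the linked offsite.py, and the Redis push itself is left out so the snippet runs with the standard library alone.

```python
import re
import time
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Compile one regex that matches each allowed domain and its subdomains."""
    if not allowed_domains:
        # No restriction configured: the empty pattern matches every host.
        return re.compile("")
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(r"^(.*\.)?(%s)$" % domains)


def filter_onsite(urls, allowed_domains):
    """Return only the URLs whose host falls under allowed_domains."""
    host_regex = build_host_regex(allowed_domains)
    onsite = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host_regex.search(host):
            onsite.append(url)
    return onsite


if __name__ == "__main__":
    # Hypothetical data: 250 allowed domains and a batch of candidate URLs.
    allowed = ["site%d.example.com" % i for i in range(250)]
    candidates = ["http://site%d.example.com/page" % (i % 500) for i in range(10000)]

    start = time.perf_counter()
    kept = filter_onsite(candidates, allowed)
    elapsed = time.perf_counter() - start

    # Only the URLs in `kept` would then be pushed to the Redis queue.
    print("kept %d of %d URLs in %.4f s" % (len(kept), len(candidates), elapsed))
```

Comparing the elapsed time here with the slowdown seen in the crawl should show whether a 250-entry allowed_domains list can account for it.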
Hi Doginal,
The warning at the beginning of your crawl appears because scrapy-redis is based on an older version of Scrapy. Scrapy renamed scrapy.spider.BaseSpider to scrapy.spider.Spider in version 0.22.0. You could change it in your scrapy-redis code and the warning will disappear.