I have a crawler setup with Scrapy version 0.24.2 and the latest version of scrapy-red

Slow crawling speeds with allowed_domains about scrapy-redis HOT 4 CLOSED

rmax commented on May 13, 2024

Slow crawling speeds with allowed_domains

from scrapy-redis.

Comments (4)

rmax commented on May 13, 2024

How large is the allowed_domains list? Could you keep the allowed_domains list and disable the offsite middleware? Could you also run the spider with the allowed_domains list but with scrapy-redis disabled?


Doginal commented on May 13, 2024

It's about 250 URLs. For some reason I cannot disable the offsite middleware; I have been setting 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': None. I disabled scrapy-redis and it looks like the same result, so this must be a Scrapy bug in allowed_domains?
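For reference, the setting described above would normally go in the SPIDER_MIDDLEWARES dict of the project's settings.py. This is only a sketch, assuming the Scrapy 0.24 module layout (later Scrapy releases moved the class to scrapy.spidermiddlewares.offsite), and the key has to match the path used by the installed version exactly:

```python
# settings.py (sketch; assumes the Scrapy 0.24 module path)
# Mapping a middleware's import path to None disables it.
SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': None,
}
```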


rmax commented on May 13, 2024

You could time how long the offsite middleware operations take:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spidermiddleware/offsite.py

If that's the problem, it might be better to perform the offsite check
before pushing the URLs to Redis.
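A rough way to try both suggestions outside a crawl is sketched below. It is not the middleware itself: it builds a single host regex in the same spirit as the linked offsite.py (treat that resemblance as an assumption), times it with timeit, and filters a URL list before anything would be pushed to Redis. The domain list, the URLs, and the helper names build_host_regex, is_onsite and filter_onsite are all hypothetical, and the code is written for Python 3.

```python
import re
import timeit
from urllib.parse import urlparse

# Hypothetical stand-in for the roughly 250 allowed_domains entries.
allowed_domains = ["site%d.example.com" % i for i in range(250)]

def build_host_regex(domains):
    """Compile one regex matching each domain and any of its subdomains."""
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in domains)
    return re.compile(pattern)

host_regex = build_host_regex(allowed_domains)

def is_onsite(url, regex=host_regex):
    """Return True if the URL's host is an allowed domain or a subdomain."""
    host = urlparse(url).hostname or ""
    return bool(regex.search(host))

def filter_onsite(urls):
    """Keep only on-site URLs; these are the ones that would go to Redis."""
    return [u for u in urls if is_onsite(u)]

# Hypothetical sample URLs: the first is on-site, the second is not.
urls = ["http://www.site42.example.com/page", "http://other.example.org/"]

# Time 1000 filtering passes to see whether the check is a bottleneck.
elapsed = timeit.timeit(lambda: filter_onsite(urls), number=1000)
print("1000 filter passes: %.4f s" % elapsed)
```

If the measured time is small compared with the overall crawl time, the slowdown is probably somewhere other than the domain check.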

On Wed, Aug 6, 2014 at 7:27 PM, Doginal [email protected] wrote:

Its about 250 urls, for some reason can not disable the offsite middleware
been setting 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware':
None . I disabled scrapy-redis and looks like the same result so this must
be a scrapy bug in the allowed_domains?



younghz commented on May 13, 2024

Hi Doginal,
The warning at the beginning of your crawl appears because scrapy-redis is based on an older version of Scrapy.
Scrapy renamed scrapy.spider.BaseSpider to scrapy.spider.Spider in version 0.22.0. You could change it in your scrapy-redis code and the warning will disappear.

