Comments (5)
Crawling page 2023...
Crawling page 2024...
2020-03-11 21:32:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tieba.baidu.com/p/3564343563?pn=5098> (failed 3 times): User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..
2020-03-11 21:32:42 [scrapy.core.scraper] ERROR: Error downloading <GET https://tieba.baidu.com/p/3564343563?pn=5098>
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "c:\python37\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\middleware.py", line 42, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 377, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..
from tieba_spider.
This message appears, and then the script stops running.
Looks like your network connection dropped. The error message says the network gave no response for 180 seconds.
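If the drops are transient, Scrapy's standard download and retry settings can be tuned in the project's `settings.py`. Scrapy's default `DOWNLOAD_TIMEOUT` is 180 seconds. Its default `RETRY_TIMES` is 2, which means one attempt plus two retries. Together these match the "180.0 seconds" and "failed 3 times" in the log above.

Below is a minimal sketch. The values are illustrative assumptions and have not been tested against Tieba. `JOBDIR` only helps if the project's `run` command honors standard Scrapy settings and its requests can be serialized to disk.

```python
# settings.py (excerpt): a sketch with illustrative values, not tuned for Tieba.

# Abort a single request after 60 s instead of the default 180 s,
# so the retry middleware gets a chance to try again sooner.
DOWNLOAD_TIMEOUT = 60

# RETRY_TIMES is the number of retries in addition to the first attempt
# (default 2). TimeoutError is one of the exceptions the retry middleware retries.
RETRY_ENABLED = True
RETRY_TIMES = 5

# Be gentler with the server. With AutoThrottle enabled, DOWNLOAD_DELAY
# acts as the minimum delay between requests to the same site.
DOWNLOAD_DELAY = 1.0
AUTOTHROTTLE_ENABLED = True

# Persist scheduler state so an interrupted crawl can be resumed by
# re-running with the same JOBDIR (directory name is hypothetical).
JOBDIR = "crawls/tieba-job"
```

A longer timeout would also work for a slow connection. A shorter timeout with more retries usually recovers faster from a brief outage.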
@Aqua-Dream
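Thanks for the reply. I tested a few more tiebas, each with over 10,000 pages in total. In almost every case the error above appears once the download gets somewhere past page 2,000.
One possible cause I suspect: Tieba now restricts access, and posts from before 2017 can no longer be viewed. Could that be what triggers the error?
One more thing, a feature request. If I want to back up dozens of tiebas, could they all be written into the config and downloaded in a single run? Alternatively, there could be a tieba list file, list.txt, that the config reads. It would have one entry per line in the form: tieba name, database name.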
Couldn't you just write a batch script and run it in the background?
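The batch-script approach can be sketched in Python. This is a minimal sketch under several assumptions:

- Each tieba is crawled with the project's `scrapy run <tieba name> <database name>` command.
- The script is started from the project directory.
- list.txt uses the one-entry-per-line `<tieba name> <database name>` layout proposed above.

The helper names (`parse_tieba_list`, `build_command`, `run_all`, `main`) and the `#` comment syntax in list.txt are hypothetical. If the `run` command asks for interactive confirmation, this sketch does not answer it.

```python
"""Back up several tiebas by running the spider once per list.txt entry (sketch)."""
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Tuple


def parse_tieba_list(text: str) -> List[Tuple[str, str]]:
    """Parse '<tieba name> <database name>' lines; skip blanks and '#' comments."""
    entries: List[Tuple[str, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<tieba> <database>', got {raw!r}")
        entries.append((parts[0], parts[1]))
    return entries


def build_command(tieba: str, database: str) -> List[str]:
    # Assumed invocation of the project's custom `run` command.
    return ["scrapy", "run", tieba, database]


def run_all(entries: List[Tuple[str, str]],
            runner: Callable = subprocess.run) -> List[Tuple[str, int]]:
    """Run the entries one after another; a failed tieba does not stop the rest."""
    results: List[Tuple[str, int]] = []
    for tieba, database in entries:
        completed = runner(build_command(tieba, database))
        results.append((tieba, completed.returncode))
    return results


def main(list_file: str = "list.txt") -> None:
    path = Path(list_file)
    if not path.is_file():
        print(f"{path} not found", file=sys.stderr)
        return
    # utf-8-sig also accepts files saved with a BOM (e.g. by Windows Notepad).
    entries = parse_tieba_list(path.read_text(encoding="utf-8-sig"))
    for tieba, code in run_all(entries):
        status = "ok" if code == 0 else f"exit code {code}"
        print(f"{tieba}: {status}")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

The entries run sequentially rather than in parallel. This keeps the request rate the same as a single crawl, which matters when the server is already timing out or blocking. The `runner` parameter lets the loop be exercised without launching Scrapy.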
Related Issues (20)
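- Hi, what can I do if Baidu bans my IP?
- Add cookie
- make_requests_from_url is deprecated in newer Scrapy versions, so the code won't run
- Does this mean I've been blocked?
- I ran into a problem when running the project
- Error on startup
- AttributeError: 'Values' object has no attribute 'overwrite_output'
- Could an option be added to choose the date from which to start crawling posts?
- mysqldb fails to run; is MySQLdb only for Python 2?
- AttributeError: 'list' object has no attribute 'values', please help
- How can I resume a crawl that was interrupted midway?
- Python beginner on day one: all dependencies are installed, running scrapy run 沙发 aa
- Stuck on a CAPTCHA
- Another problem: on large tiebas the crawl reaches pn values above 10000 and the posts start looping. Are those old posts gone?
- 302 verification issue
- How do I add a proxy?
- Can't see what the problem is, help
- Scraping the first tieba (仙五前修改吧) works fine, but scraping the second one fails
- Scraping a tieba with a large amount of data fails with an error at page 36
- Changed the add_argument function, still won't run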