Comments (5)

barnett2010 commented on July 29, 2024

Crawling page 2023...
Crawling page 2024...
2020-03-11 21:32:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tieba.baidu.com/p/3564343563?pn=5098> (failed 3 times): User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..
2020-03-11 21:32:42 [scrapy.core.scraper] ERROR: Error downloading <GET https://tieba.baidu.com/p/3564343563?pn=5098>
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "c:\python37\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\middleware.py", line 42, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 377, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..

from tieba_spider.

barnett2010 commented on July 29, 2024

I get the messages above, and then the script stops running.


Aqua-Dream commented on July 29, 2024

Your network connection probably dropped. The error message says there was no response from the network for 180 seconds.
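
For reference, the "180.0 seconds" and "failed 3 times" in the log match Scrapy's defaults: DOWNLOAD_TIMEOUT is 180 seconds and RETRY_TIMES is 2, which gives one initial attempt plus two retries. Below is a minimal sketch of the settings one might adjust on an unstable connection. It assumes the project has not already overridden them in its own settings.py, and the values are illustrative only, not something tested against Tieba.

```python
# settings.py fragment (sketch). The setting names are standard Scrapy settings;
# the values are illustrative assumptions, not taken from tieba_spider.
DOWNLOAD_TIMEOUT = 60   # seconds to wait per request; Scrapy's default is 180
RETRY_ENABLED = True    # keep the retry middleware active (this is the default)
RETRY_TIMES = 5         # retries after the first attempt; Scrapy's default is 2
DOWNLOAD_DELAY = 1      # seconds between requests to the same site; default is 0
```

A shorter timeout with more retries fails fast and tries again rather than waiting three minutes per attempt. Whether that helps depends on why the requests hang in the first place.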


barnett2010 commented on July 29, 2024

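@Aqua-Dream
Thanks for the reply. I tested a few more forums, each with over 10,000 pages in total, and in almost every case the messages above appear once the download reaches a little past page 2,000.

My guess is that Tieba now restricts access, so posts from before 2017 can no longer be viewed. Could that be what causes the error?

One more thing: I'd like to request a feature.
If I have dozens of forums to back up, could I put them all in the config and download them in one go?
Or I could write a forum list in list.txt and have the config read that file.
The list would have one entry per line, in the form: forum name, database name.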


Aqua-Dream commented on July 29, 2024

Couldn't you just write a batch script and run it in the background?
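
A batch wrapper along those lines could look roughly like the sketch below. It reads the list.txt format proposed earlier in the thread (forum name and database name on each line, separated by whitespace) and runs the spider once per entry. It assumes the spider is started with `scrapy run <forum name> <database name>` from the project directory; check the project README for the actual command. The helper names parse_list, build_command and run_all are hypothetical and not part of tieba_spider.

```python
import subprocess
import sys


def parse_list(text):
    """Parse 'forum_name database_name' pairs, one per line.

    Blank lines and lines starting with '#' are skipped.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("expected 'forum_name database_name', got: %r" % line)
        jobs.append((parts[0], parts[1]))
    return jobs


def build_command(forum, database):
    # Assumption: the spider is launched as `scrapy run <forum> <database>`.
    return ["scrapy", "run", forum, database]


def run_all(path):
    """Run the spider for every entry in the list file, one after another."""
    with open(path, encoding="utf-8") as f:
        jobs = parse_list(f.read())
    for forum, database in jobs:
        result = subprocess.run(build_command(forum, database))
        if result.returncode != 0:
            # Report the failure and continue with the next forum.
            print("crawl failed for %s (exit code %d)" % (forum, result.returncode),
                  file=sys.stderr)
```

Calling `run_all("list.txt")` from the project directory crawls the forums sequentially. Running them one at a time keeps the request rate the same as a single manual run, which seems the safer choice given the timeouts reported above. Forum names that contain spaces would need a different separator.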

