chanwoood / crawl-zsxq
Crawls a 知识星球 (Knowledge Planet, zsxq) group and turns it into a PDF e-book.
License: MIT License
https://api.zsxq.com/v1.10/groups/2421112121/topics?scope=digests&count=20&end_time=2018-04-12T15%3A49%3A13.443%2B0800
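The end_time parameter at the end of the path is the timestamp of the last post already loaded; changing it is how the crawler pages through the topics.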
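A minimal sketch of building that paginated URL, assuming the query parameters shown above are all the endpoint needs (any authentication headers the real crawler sends are not shown here). `topics_url` is a hypothetical helper, not a function from the repo; omitting end_time for the first page is also an assumption.

```python
from urllib.parse import urlencode

# Endpoint and group ID copied from the example URL above.
BASE = "https://api.zsxq.com/v1.10/groups/2421112121/topics"

def topics_url(end_time=None, count=20, scope="digests"):
    """Build the topics URL; pass end_time to fetch the page before that post."""
    params = {"scope": scope, "count": count}
    if end_time is not None:
        # urlencode percent-encodes ':' as %3A and '+' as %2B, as in the example.
        params["end_time"] = end_time
    return BASE + "?" + urlencode(params)

print(topics_url("2018-04-12T15:49:13.443+0800"))
```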
File "D:/SourceCodes/crawl-zsxq-master/crawl1.py", line 37, in get_data
for topic in json.loads(f.read()).get('resp_data').get('topics'):
TypeError: 'NoneType' object is not iterable
For example:
The first page's timestamp is 2018-01-01T01:01:01.000+0800 (the problem is the 000 part).
After paging it becomes 2018-01-01T01:01:01.0-1+0800.
The correct value would be 2018-01-01T01:01:00.999+0800.
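The example suggests the millisecond field is being decremented as a bare number, so 000 turns into -1. A sketch of an alternative, assuming timestamps always use three-digit milliseconds and a +HHMM offset as shown: parse the value and subtract one millisecond with `timedelta`, which borrows into seconds, minutes and days correctly. `previous_end_time` is a hypothetical name, not the repo's code.

```python
from datetime import datetime, timedelta

# Layout seen in the examples: 2018-01-01T01:01:01.000+0800
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

def previous_end_time(create_time: str) -> str:
    """Return the timestamp one millisecond earlier, in the same layout."""
    dt = datetime.strptime(create_time, FMT) - timedelta(milliseconds=1)
    millis = dt.microsecond // 1000  # %f would print 6 digits; the API uses 3
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}" + dt.strftime("%z")

print(previous_end_time("2018-01-01T01:01:01.000+0800"))
# 2018-01-01T01:01:00.999+0800
```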
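The crawled content has lost all its line breaks and runs together into one block. I can't tell whether the crawled data never had them or they were stripped during PDF conversion; please fix this when you have time. Thanks to the author for the contribution!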
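One plausible cause: browsers and HTML renderers collapse raw newline characters into spaces, so text dropped into a page without `<br>` or `<p>` tags loses its line breaks. A minimal sketch, assuming the topic text from the API does contain newline characters; `text_to_html` is a hypothetical helper:

```python
import html

def text_to_html(text: str) -> str:
    """Escape each line and join with <br> so breaks survive HTML-to-PDF conversion."""
    lines = text.replace("\r\n", "\n").split("\n")
    return "<br>\n".join(html.escape(line) for line in lines)

print(text_to_html("first line\nsecond <line>"))
```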
Scraping the images is easy; they just have to be inserted into the HTML document with the right tags so they convert correctly to PDF.
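A minimal sketch of emitting image tags, assuming you already have the image URLs for a topic as a list of strings (how they appear in the API response is not shown here, so `image_urls` and `images_to_html` are hypothetical). Escaping the URL keeps characters such as `&` valid inside the attribute.

```python
import html

def images_to_html(image_urls):
    """Render each image URL as an <img> tag scaled to the page width."""
    return "\n".join(
        f'<img src="{html.escape(url, quote=True)}" style="max-width:100%">'
        for url in image_urls
    )
```

Whether the PDF converter can fetch remote URLs itself or needs the images downloaded to local files first depends on the tool in use; this sketch does not settle that.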
Sharing the technique is fine, but publishing a Planet's content is not really appropriate.
Traceback (most recent call last):
File "C:/Users/Administrator/Downloads/crawl-zsxq-master/crawl.py", line 119, in
make_pdf(get_data(start_url))
File "C:/Users/Administrator/Downloads/crawl-zsxq-master/crawl.py", line 37, in get_data
for topic in json.loads(f.read()).get('resp_data').get('topics'):
TypeError: 'NoneType' object is not iterable
How do I fix this error?
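In this traceback `.get('resp_data')` must have returned a dict (if it had returned None, Python would raise AttributeError on `.get('topics')` instead), so it is the `topics` key that is missing or null in the server's reply. One possible trigger is the malformed end_time described earlier in this thread. A minimal sketch that reports the actual reply rather than a bare TypeError; `extract_topics` is a hypothetical helper:

```python
def extract_topics(payload: dict) -> list:
    """Return payload['resp_data']['topics'], failing loudly if it is missing."""
    resp_data = payload.get("resp_data") or {}
    topics = resp_data.get("topics")
    if topics is None:
        # Show what the server actually sent so the cause can be diagnosed.
        raise ValueError(f"response has no 'topics' list: {payload!r}")
    return topics
```

In the crawler's loop this would hypothetically become `for topic in extract_topics(json.loads(f.read())):`.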
I didn't know what to put in start_url, so I just copied the URL directly and got the following error:
Traceback (most recent call last):
File "crawl.py", line 119, in
make_pdf(get_data(start_url))
File "crawl.py", line 34, in get_data
f.write(json.dumps(rsp.json(), indent=2, ensure_ascii=False))
File "C:\anaconda\lib\site-packages\requests\models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "C:\anaconda\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\anaconda\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\anaconda\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Is this only a start_url problem? If so, how do I find the API endpoint for the digests (精华) section? Sorry for the trouble.
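"Expecting value: line 1 column 1 (char 0)" means the response body was not JSON at all, for example an empty body or an HTML page. That is what you would expect if start_url is a web page address rather than an api.zsxq.com endpoint like the scope=digests URL at the top of this thread; whether the endpoint also needs authentication headers is not shown here. A minimal sketch that decodes the body and reports what actually came back; `parse_json_body` is a hypothetical helper:

```python
import json

def parse_json_body(text: str) -> dict:
    """Decode a response body, raising a readable error if it isn't JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        snippet = text[:200].strip() or "<empty body>"
        raise ValueError(
            f"expected JSON but got: {snippet!r} "
            "(check that start_url is an api.zsxq.com endpoint)"
        ) from exc
```

With `requests`, this would be called as `parse_json_body(rsp.text)` in place of `rsp.json()`.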