I'm hiring for meideng.net
Hiring R&D engineers for Hangzhou Meideng Technology; see meideng.net/join
A backup tool for renren.com
License: MIT License
Windows 10 x64 1909 + Python 3.8.1
Ran the program normally:
renrenBackup.exe fetch -e USERNAME -p PASSWORD -s -g -a -b
After a long wait, it reported an error:
“
fetched 42 albums
prepare to fetch blogs
start crawl blog list page 0
Traceback (most recent call last):
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 41, in fetch
File "fetch.py", line 99, in fetch_user
File "fetch.py", line 76, in fetch_blog
File "crawl\blog.py", line 83, in get_blogs
File "crawl\blog.py", line 26, in load_blog_list
File "crawl\crawler.py", line 123, in get_json
File "json\__init__.py", line 348, in loads
File "json\decoder.py", line 337, in decode
File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[16464] Failed to execute script manage
”
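The blog module on renren.com no longer works. Is that what is causing this error?
Also, the pages served by renrenBackup runserver only show the home page; none of the other modules are visible.
Please update the tool to fix this.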
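The `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` in the traceback above is what `json.loads` raises when the response body is empty or is not JSON at all, for example an HTML 404 page. A minimal sketch of a more defensive decode step follows. `parse_json_or_none` is a hypothetical helper, not a function in renrenBackup, and the project's real `get_json` may be structured differently.

```python
import json


def parse_json_or_none(text):
    """Return the decoded JSON value, or None if text is not valid JSON.

    Hypothetical helper for illustration; not part of renrenBackup.
    """
    stripped = text.strip()
    # An empty body or an HTML page (e.g. a 404 page) cannot be JSON.
    if not stripped or stripped.startswith("<"):
        return None
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return None
```

A caller could then log the failure and skip or retry the page when it gets `None`, instead of aborting the whole fetch.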
Describe the bug
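Thanks for this repo, it is very useful!
While using it I often hit this error: illegal multibyte sequence, which looks encoding-related. It does not happen every time, but it tends to show up when the content contains unusual characters.
The system is Windows 10 x64 (Chinese edition), but the problem seems to occur during the insert into the database.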
--- Logging error ---
Traceback (most recent call last):
File "logging\__init__.py", line 1036, in emit
UnicodeEncodeError: 'gbk' codec can't encode character '\uc190' in position 322: illegal multibyte sequence
Call stack:
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 41, in fetch
File "fetch.py", line 99, in fetch_user
File "fetch.py", line 76, in fetch_blog
File "crawl\blog.py", line 83, in get_blogs
File "crawl\blog.py", line 49, in load_blog_list
File "crawl\utils.py", line 124, in get_comments
File "site-packages\peewee.py", line 1574, in inner
File "site-packages\peewee.py", line 1645, in execute
File "site-packages\peewee.py", line 2288, in _execute
File "site-packages\peewee.py", line 2063, in _execute
File "site-packages\peewee.py", line 2653, in execute
File "site-packages\peewee.py", line 2628, in execute_sql
File "logging\__init__.py", line 1371, in debug
File "logging\__init__.py", line 1519, in _log
File "logging\__init__.py", line 1529, in handle
File "logging\__init__.py", line 1591, in callHandlers
File "logging\__init__.py", line 905, in handle
File "logging\handlers.py", line 479, in emit
File "logging\__init__.py", line 1132, in emit
File "logging\__init__.py", line 1040, in emit
Message: ('INSERT OR REPLACE INTO "comment" ("id", "t", "entry_id", "entry_type", "authorId", "authorName", "content") VALUES (?, ?, ?, ?, ?, ?, ?)', [36028797501124092, datetime.datetime(2008, 4, 8, 7, 22, 45), 282116120, 'blog', 172790766, '赵欢', '回复孙鹤中손학중:过奖啦!呵呵<img src="<a href=\'http://uu.ren/kRMsjR\' target=\'_blank\' title=\'http://static.xiaonei.com/img/editor/emot/emot-10.gif\'>http://uu.ren/kRMsjR </a> "/>'])
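The call stack above ends in `logging`'s `emit`, so the failure is in writing the log record to a 'gbk' stream rather than in the SQL itself. One possible workaround, sketched under that assumption, is to log through a stream that escapes characters the codec cannot encode. `make_safe_handler` and the logger name are hypothetical; the `gbk` default simply mirrors the codec named in the traceback.

```python
import io
import logging


def make_safe_handler(binary_stream, encoding="gbk"):
    """Build a StreamHandler that escapes unencodable characters.

    Hypothetical helper; not part of renrenBackup.
    """
    text_stream = io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",  # write \uXXXX instead of raising
        write_through=True,
    )
    return logging.StreamHandler(text_stream)


# Demo: log a character that gbk cannot encode into an in-memory buffer.
buffer = io.BytesIO()
logger = logging.getLogger("encoding_demo")
logger.propagate = False
logger.setLevel(logging.DEBUG)
logger.addHandler(make_safe_handler(buffer))
logger.debug("name: \uc190")
```

In a real console program the binary stream would more likely be `sys.stderr.buffer`; on Python 3.7+ calling `sys.stderr.reconfigure(errors="backslashreplace")` is another option with a similar effect.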
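The crawl sometimes gets cut off partway through, either because of network problems or because of server issues. Could a resume-after-interruption feature be added? Or does it already exist?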
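Using Python 3.7.0
Ran python manage.py -e XXX -p OOO -b
Login is reported as successful, but no blog content is crawled (python manage.py -e XXX -p OOO -s -g -a all succeed).
I suspect this is because the desktop blog pages on renren.com recently went down (opening them in a browser gives a 404).
The mobile web version still works, though. Could the code be changed to crawl blog content from the mobile web version?
Thanks!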
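When crawling someone else's content, occasionally the uid printed at the end of the run is correct, but the user name and avatar belong to the person doing the crawl.
I have an example at hand, but cannot share it for privacy reasons.
By my analysis, the problem has two parts.
First, when fetch.py runs, whether or not the -u option is given, utils.get_user() almost always returns the crawling user's own name and avatar. Whichever uid's homepage is supplied, the result is the same.
There does not seem to be a short fix. The avatar and name can be scraped from the homepage, but that probably needs BeautifulSoup for parsing rather than a simple string search.
As a result, the initial name and avatar are wrong.
Second, in the vast majority of cases the correct user name and avatar do get updated while the script runs, but occasionally they are not. I have not yet worked out why they are updated most of the time and not in a few cases.
Update:
After reading the source I think I understand what is going on. save_user is called from only three places. Since get_user() always returns the wrong result, the user's small avatar can only have been updated by get_comments() or get_likes(). For gossips, because the avatar shown is the one from back then, crawling the message board does not update the user's avatar information.
Conclusion: when a user has never replied to a comment and has never liked their own content, the correct user name and avatar cannot be crawled.
So fixing get_user() looks like the more workable route. I may look into it and send a PR.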
Traceback (most recent call last):
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "C:\Users\xxx.virtualenvs\renrenBackup-master-RLOICigy\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 1321, in getresponse
response.begin()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 296, in begin
version, status, reason = self._read_status()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\http\client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\xxx\AppData\Local\Programs\Python\Python37-32\Lib\socket.py", line 589, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
Traceback (most recent call last):
File "fetch.py", line 129, in <module>
fetched = fetch_user(fetch_uid, cmd_args)
File "fetch.py", line 98, in fetch_user
fetch_album(uid)
File "fetch.py", line 71, in fetch_album
album_count = crawl_album.get_albums(uid)
File "I:\renrenBackup-master\crawl\album.py", line 118, in get_albums
total += get_album_list_page(cur_page, uid)
File "I:\renrenBackup-master\crawl\album.py", line 106, in get_album_list_page
get_album_summary(aid, uid)
File "I:\renrenBackup-master\crawl\album.py", line 66, in get_album_summary
'src': get_image(p['large']),
File "I:\renrenBackup-master\crawl\utils.py", line 31, in get_image
resp = crawler.get_url(img_url)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
File "I:\renrenBackup-master\crawl\crawler.py", line 97, in get_url
return self.get_url(url, params, method, retry)
[Previous line repeated 2 more times]
File "I:\renrenBackup-master\crawl\crawler.py", line 82, in get_url
raise Exception("network error, exceed max retry time")
Exception: network error, exceed max retry time
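The traceback above shows `get_url` calling itself on each retry until it gives up. As an alternative, here is a minimal sketch of an iterative retry loop with exponential backoff. `fetch_with_retry`, its parameters, and the choice to catch `OSError` are assumptions for illustration, not the project's actual code.

```python
import time


def fetch_with_retry(fetch, max_retry=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_retry attempts are used.

    fetch is any zero-argument callable that raises OSError on a network
    failure. All names here are hypothetical.
    """
    last_error = None
    for attempt in range(max_retry):
        try:
            return fetch()
        except OSError as err:  # ConnectionResetError is a subclass of OSError
            last_error = err
            if attempt < max_retry - 1:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("network error, exceeded max retries") from last_error
```

Growing the delay gives a server that is resetting connections some time to recover, and chaining the last error keeps the original cause visible in the final traceback.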
Traceback (most recent call last):
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 53, in export
File "export.py", line 139, in export_all
KeyError: 'users'
[27288] Failed to execute script manage
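Requesting an update, please.
(PS: I did see the export failure reported in #51, and that you made a fix, but I do not know how to apply it; the code is beyond me...)
The unresponsive comments are concentrated in the second comment on the last three or four pages: on roughly three or four pages, the comments on the second status cannot be displayed.
The system is Windows 10 1909, 64-bit.
One more thing: when backing up a certain friend, the message "renren return 500, wait a moment" appears and the backup then hangs there (backing up an account that has no email, only a numeric user name, shows the same problem).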
Describe the bug
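I used the following command to back up my personal content:
python manage.py fetch -p *** -e *** -s -g -a -b
The earlier downloads were mostly fine, but after reaching the following state it raised an error: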
fetch album 311698393 2008.18 (), 评0/分0/赞0
Traceback (most recent call last):
File "manage.py", line 158, in <module>
cli()
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "manage.py", line 53, in fetch
fetched = fetch_user(
File "/Users/Kinggerm/Downloads/renrenBackup/fetch.py", line 99, in fetch_user
fetch_album(uid)
File "/Users/Kinggerm/Downloads/renrenBackup/fetch.py", line 71, in fetch_album
album_count = crawl_album.get_albums(uid)
File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 163, in get_albums
count, after = get_album_list_page(uid, after)
File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 153, in get_album_list_page
get_album_summary(aid, uid)
File "/Users/Kinggerm/Downloads/renrenBackup/crawl/album.py", line 73, in get_album_summary
album_data = crawler.get_json(
File "/Users/Kinggerm/Downloads/renrenBackup/crawl/crawler.py", line 178, in get_json
r = json.loads(resp.text.replace(",}", "}"))
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
To Reproduce
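Re-running the original command reproduces the error at the same point, but I would rather not disclose the account and password.
Tried running renrenBackup.exe fetch -e EMAIL -p PASSWORD -a directly from cmd on Windows and hit the following error: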
check login, and get homepage for cookie
need login
prepare login encryt info
prepare post login request
Traceback (most recent call last):
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 38, in fetch
File "crawl\crawler.py", line 53, in __init__
File "crawl\crawler.py", line 137, in check_login
File "crawl\crawler.py", line 80, in get_url
File "crawl\crawler.py", line 178, in login
File "json\__init__.py", line 348, in loads
File "json\decoder.py", line 337, in decode
File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[14520] Failed to execute script manage
The system is Windows 10 64-bit.
When I crawl the message board, it fails every time it reaches page 49:
start crawl gossip page 48
crawled 20 gossip on page 48
start crawl gossip page 49
Traceback (most recent call last):
File "fetch.py", line 46, in <module>
gossip_count = crawl_gossip.get_gossip()
File "/path/to/renrenBackup/crawl/gossip.py", line 62, in get_gossip
total = load_gossip_page(cur_page)
File "/path/to/renrenBackup/crawl/gossip.py", line 49, in load_gossip_page
gossip['content'] = normal_pattern.findall(body)[0]
IndexError: list index out of range
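The `IndexError` above comes from indexing `findall(...)[0]` when the pattern matches nothing on the page. A minimal sketch of a safer lookup follows. `first_match` is a hypothetical helper, and what to use as the default value is an assumption the caller would need to decide.

```python
import re


def first_match(pattern, text, default=""):
    """Return the first regex match in text, or default when none exists.

    For a pattern with capture groups this returns group 1; otherwise it
    returns the whole match. Hypothetical helper, not part of renrenBackup.
    """
    found = re.search(pattern, text)
    if found is None:
        return default
    return found.group(1) if found.re.groups else found.group(0)
```

With this, an unexpected page layout yields the default value, which the caller can log and skip, rather than stopping the crawl.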
Describe the bug
Followed the README to install and run; fetching all of my own information failed.
To Reproduce
Steps to reproduce the behavior:
python manage.py fetch -e [email protected] -p xxxx -s -g -a -b
Expected behavior
All of my own information is fetched.
Error Output:
check login, and get homepage for cookie
need login
prepare login encryt info
prepare post login request
login success with [email protected] as 1234456644
check login, and get homepage for cookie
login valid
login valid
Traceback (most recent call last):
File "manage.py", line 170, in <module>
cli()
File "/Users/lfeng/Dev/projects/renrenBackup/venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/lfeng/Dev/projects/renrenBackup/venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/Users/lfeng/Dev/projects/renrenBackup/venv/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/lfeng/Dev/projects/renrenBackup/venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/lfeng/Dev/projects/renrenBackup/venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "manage.py", line 57, in fetch
fetched = fetch_user(
File "/Users/lfeng/Dev/projects/renrenBackup/fetch.py", line 88, in fetch_user
get_user(uid)
File "/Users/lfeng/Dev/projects/renrenBackup/crawl/utils.py", line 108, in get_user
name = re.findall(
IndexError: list index out of range
Additional context
mac m1 os 14
python 3.8.5
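It is a bit of a pity; there are many things I can no longer remember clearly and wanted to look back on.
The album photo detail page should keep images no larger than the current page container; if a larger image exists, allowing a click-through to view the full-size image is enough.
Requested by hackpro@v2ex https://www.v2ex.com/t/481371#r_7265457
Error: ImportError: No module named 'playhouse'
A while back I had the feeling Tencent Weibo was about to shut down,
so I wrote my own Tencent Weibo backup tool (see my Git).
Since Weibo is fairly simple, I just used selenium to back it up into Word,
and then deleted my Tencent Weibo account.
Recently I wanted to delete my Renren account too, and was going to write my own backup tool,
but a search on git turned up this very handy backup tool,
and the result displayed after my backup just now looks great.
Are there any further updates planned for this tool's backup features?
(The LIST in the TODO does not look like it concerns backup features.)
If not, I will go ahead and delete my Renren account.
(If there are updates coming, or this version's backup is incomplete, I will not be able to back up again after deleting 😂)
Finally, thanks for such a useful tool! 👍👍👍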
Describe the solution you'd like
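Suggest adding search to the flask app, to search for keywords in post bodies and comments, for example using elasticsearch.
The file layout used when packaging with PyInstaller has probably changed in the new version and needs fixing.
Following https://pyinstaller.readthedocs.io/en/stable/usage.html#what-to-bundle-where-to-search, use --add-data to add the templates and static directories (but in that case, how should fetched images under the static directory be handled? That may still need checking).
Use -w to add windowed-mode execution, so the tool does not have to be run from the command line only.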
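When data directories are bundled with `--add-data`, the program also has to look them up in the location PyInstaller unpacks them to at runtime. A minimal sketch of the usual lookup follows. `resource_path` is a hypothetical helper, and falling back to the current directory when running from source is an assumption.

```python
import os
import sys


def resource_path(relative_path):
    """Resolve a bundled resource from source or from a PyInstaller build.

    Hypothetical helper; not part of renrenBackup.
    """
    # PyInstaller sets sys._MEIPASS to the folder holding bundled data files.
    # When running from source the attribute is absent, so use the current
    # working directory instead.
    base_dir = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_dir, relative_path)
```

The resulting paths could, for example, be passed to Flask through its `template_folder` and `static_folder` constructor arguments. Fetched images probably should not live under the bundled folder, since a one-file build unpacks to a temporary directory; that point still needs checking, as noted above.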
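Hi there,
Thank you so much for maintaining this repository, it is really awesome!
As you know, renren is no longer active, so even if we send a friend request, the user will not respond. I happened to learn that a friend's album can be accessed via http://photo.renren.com/getalbumprofile.do?owner=targetID, so I was wondering whether you could add support for this as well.
Thank you all!
When browsing albums and similar views, it would be nice to support the left and right arrow keys for quick paging; support on other pages would be even better.
Requested by hackpro@v2ex https://www.v2ex.com/t/481371#r_7265457
Step two:
2. In the command prompt, go to that directory and run renrenBackup.exe fetch -e email -p password -s -g -a -b to fetch the information of the user whose account is email and password is password (see the Python environment instructions below for detailed parameters)
I downloaded the archive, extracted it, and ran the exe. Every time I open it, the window appears for a second and then closes on its own, so I cannot type any commands. How should I deal with this?
Thanks!
Note: it displays correctly when run with python web.py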
Opening the saved index.html gives the following error:
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Looking at the generation process, there is a TemplateNotFound message, which may be related:
Exception on /blog/436610492 [GET]
Traceback (most recent call last):
File "site-packages\flask\app.py", line 2292, in wsgi_app
File "site-packages\flask\app.py", line 1815, in full_dispatch_request
File "site-packages\flask\app.py", line 1718, in handle_user_exception
File "site-packages\flask\_compat.py", line 35, in reraise
File "site-packages\flask\app.py", line 1813, in full_dispatch_request
File "site-packages\flask\app.py", line 1799, in dispatch_request
File "web.py", line 123, in blog_detail_page
File "web.py", line 23, in render_template
File "site-packages\flask\templating.py", line 134, in render_template
File "site-packages\jinja2\environment.py", line 869, in get_or_select_template
File "site-packages\jinja2\environment.py", line 830, in get_template
File "site-packages\jinja2\environment.py", line 804, in _load_template
File "site-packages\jinja2\loaders.py", line 113, in load
File "site-packages\flask\templating.py", line 58, in get_source
File "site-packages\flask\templating.py", line 86, in _get_source_fast
jinja2.exceptions.TemplateNotFound: blog.html
The machine is Windows 10 64-bit, running the exe.
Release standalone executable files
Does this still work? I just tried and it does not; hopefully renren has not blocked all the endpoints.
Describe the bug
--- Logging error ---
Traceback (most recent call last):
File "logging\__init__.py", line 1036, in emit
UnicodeEncodeError: 'gbk' codec can't encode character '\u2665' in position 89: illegal multibyte sequence
Call stack:
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 41, in fetch
File "fetch.py", line 87, in fetch_user
File "fetch.py", line 52, in fetch_status
File "crawl\status.py", line 53, in get_status
File "crawl\status.py", line 41, in load_status_page
File "crawl\utils.py", line 149, in get_likes
File "crawl\utils.py", line 48, in save_user
File "logging\__init__.py", line 1371, in debug
File "logging\__init__.py", line 1519, in _log
File "logging\__init__.py", line 1529, in handle
File "logging\__init__.py", line 1591, in callHandlers
File "logging\__init__.py", line 905, in handle
File "logging\handlers.py", line 479, in emit
File "logging\__init__.py", line 1132, in emit
File "logging\__init__.py", line 1040, in emit
Message: 'try to save masked with headPic masked'
Arguments: ()
get image masked to local
To Reproduce
Steps to reproduce the behavior:
Save any info from a person whose name contains a special character, such as ♥
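Renren's SNS assets were sold to Donews in 2019. Since August 2019 the web blog list and blog detail pages have returned 404, with no sign of recovery. Donews later released a new Renren mobile app in which blogs display normally, which shows the data has not been lost.
In 2020.04 someone pointed out that a blog's summary can be viewed, and that the full text is in fact exposed, through a URL like the following: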
http://dnactivity.renren.com/index.html?p=601%2F30314%2F966126912
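See https://github.com/whusnoopy/renrenBackup/blob/master/docs/a0_fetch_blog_after_201908.md for details. A PR from anyone with spare time would be much appreciated.
I tested the photo endpoint; other users' information can be fetched as well.
We could compare notes.
Some of the downloaded images actually contain: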
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>openresty</center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
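Opening the image links directly, some images return 404 but load fine a while later. My guess is that the CDN returns 404 on first access when it has no copy of the image, then serves it normally once it has fetched it from the origin.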
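Since a failed download is saved as an HTML error page under an image file name, one way to find the files that need re-downloading is to check their leading bytes. A minimal sketch follows. `looks_like_image` is a hypothetical helper, and limiting the check to JPEG, PNG and GIF is an assumption about which formats appear in a backup.

```python
def looks_like_image(data):
    """Return True if data starts with a JPEG, PNG or GIF signature.

    Hypothetical helper; not part of renrenBackup.
    """
    signatures = (
        b"\xff\xd8\xff",       # JPEG
        b"\x89PNG\r\n\x1a\n",  # PNG
        b"GIF87a",             # GIF, 1987 version
        b"GIF89a",             # GIF, 1989 version
    )
    return data.startswith(signatures)
```

Files for which this returns `False` can be queued for another download attempt later, which fits the guess that the CDN serves the image once it has been fetched from the origin.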
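I do not know Python, so I wrote one in nodejs. It reads the 404 images in the img directory and then retries the download from their URLs several times. Renren is pretty much dead by now, but anyone who can use this is welcome to it.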
https://github.com/lqzhgood/renrenBackup-Timeline#downimg
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Describe alternatives you've considered
None for now
Additional context
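Many thanks to the contributors of this project. Although the golden window for backing up passed long ago, I was still able to back up a great deal of precious data. I would be glad to help with the suggestion raised here, but I am not sure whether it is feasible; if the relevant endpoints can be provided, I can try to implement it.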
File "***/renrenBackup/web.py", line 20, in render_template
if request.is_xhr:
Fetching is fine. The error occurs when running runserver and/or exporting. I am not familiar with Flask. A quick Google search says that is_xhr has been removed (see https://stackoverflow.com/a/60995530). Any workaround?
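A common workaround is to check the `X-Requested-With` header directly, which is the header the removed `is_xhr` property used to inspect. A minimal sketch follows. The standalone `is_xhr` function is a hypothetical replacement, and how it would be wired into `render_template` in web.py is left as an assumption.

```python
def is_xhr(headers):
    """Report whether a request was marked as an XMLHttpRequest.

    headers is any mapping with a .get() method, such as Flask's
    request.headers. Hypothetical helper; not part of renrenBackup.
    """
    return headers.get("X-Requested-With", "").lower() == "xmlhttprequest"
```

In web.py this would mean replacing `request.is_xhr` with `is_xhr(request.headers)`. Note that the header is only present when the client-side code sets it, as jQuery's AJAX calls do by default for same-origin requests.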
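When an existing cookie has expired, check_login gets an HTML page instead of an error response; the returned text should be checked.
As of 2020.11.29, statuses on renren.com cannot be read.
Under "Status" on the personal homepage, none of the statuses load.
python manage.py fetch -e [email protected] -p passwordAtRenren -s has no effect.
However, scrolling down in "My Homepage" still shows statuses along with their comments and likes.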
Traceback (most recent call last):
File "fetch.py", line 129, in <module>
fetched = fetch_user(fetch_uid, cmd_args)
File "fetch.py", line 98, in fetch_user
fetch_album(uid)
File "fetch.py", line 71, in fetch_album
album_count = crawl_album.get_albums(uid)
File "I:\renrenBackup-master\crawl\album.py", line 118, in get_albums
total += get_album_list_page(cur_page, uid)
File "I:\renrenBackup-master\crawl\album.py", line 106, in get_album_list_page
get_album_summary(aid, uid)
File "I:\renrenBackup-master\crawl\album.py", line 18, in get_album_summary
first_photo_id = re.findall(r'"photoId":"(\d+)",', resp.text)[0]
IndexError: list index out of range
Describe the bug
An error is raised when the download finishes.
To Reproduce
Run python fetch.py [email] [password] -b -r, with or without the -r option.
Expected behavior
Finishes without an error.
Error Output:
sqlite3.OperationalError: no such table: status
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "fetch.py", line 134, in <module>
update_fetch_info(fetch_uid)
File "fetch.py", line 30, in update_fetch_info
status=Status.select().where(Status.uid == uid).count(),
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1604, in inner
return method(self, database, *args, **kwargs)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1859, in count
return Select([clone], [fn.COUNT(SQL('1'))]).scalar(database)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1604, in inner
return method(self, database, *args, **kwargs)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1845, in scalar
row = self.tuples().peek(database)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1604, in inner
return method(self, database, *args, **kwargs)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1832, in peek
rows = self.execute(database)[:n]
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1604, in inner
return method(self, database, *args, **kwargs)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1675, in execute
return self._execute(database)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 1826, in _execute
cursor = database.execute(self)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 2696, in execute
return self.execute_sql(sql, params, commit=commit)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 2690, in execute_sql
self.commit()
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 2481, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 178, in reraise
raise value.with_traceback(tb)
File "/Users/hou/.local/share/virtualenvs/renrenBackup-I9IlZmg3/lib/python3.7/site-packages/peewee.py", line 2683, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: no such table: status
Additional context
This error also makes python web.py and python export.py backup.tar unusable.
Traceback (most recent call last):
File "fetch.py", line 129, in <module>
fetched = fetch_user(fetch_uid, cmd_args)
File "fetch.py", line 102, in fetch_user
fetch_blog(uid)
File "fetch.py", line 79, in fetch_blog
blog_count = crawl_blog.get_blogs(uid)
File "I:\renrenBackup-master\crawl\blog.py", line 71, in get_blogs
total = load_blog_list(cur_page, uid)
File "I:\renrenBackup-master\crawl\blog.py", line 47, in load_blog_list
get_comments(bid, 'blog', owner=uid)
File "I:\renrenBackup-master\crawl\utils.py", line 88, in get_comments
save_user(c['authorId'], c['authorName'], c['authorHeadUrl'])
KeyError: 'authorName'
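The `KeyError` above shows that some comment records arrive without an `authorName` field. A minimal sketch of reading the fields with fallbacks follows. `comment_author` is a hypothetical helper; the key names are taken from the traceback, and using empty strings as defaults is an assumption.

```python
def comment_author(comment):
    """Extract author fields from a comment dict, tolerating missing keys.

    Returns (author_id, author_name, author_head_url). Hypothetical helper;
    not part of renrenBackup.
    """
    return (
        comment.get("authorId"),
        comment.get("authorName", ""),
        comment.get("authorHeadUrl", ""),
    )
```

The caller could then skip `save_user` when the id is `None`, so a single incomplete comment does not abort the whole blog fetch.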
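Friend count, names, and brief profile info; it would be even better if the web chat history (the chat history from the friend list on the right side of the home page) could be backed up too.
As in the title. Thanks.
After logging in I hit a UnicodeEncodeError; commenting out line 62 made it go away.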
load cookies from ./.cookies.json
check login, and get homepage for cookie
login valid
Traceback (most recent call last):
File "fetch.py", line 129, in <module>
fetched = fetch_user(fetch_uid, cmd_args)
File "fetch.py", line 87, in fetch_user
get_user(uid)
File "/Users/loumingming/code/renrenBackup/crawl/utils.py", line 62, in get_user
print(' get user {uid} {name} with {pic}'.format(uid=uid, name=name, pic=pic))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
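Environment: Windows 10 Home
Command run:
./renrenBackup.exe fetch -e email -p pwd -b
The error appears somewhere between the 1st and 15th blog fetched, so it is probably not caused by a specific blog.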
Arguments: ()
crawled 9 comments on blog 77xxxxx25
Traceback (most recent call last):
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 41, in fetch
File "fetch.py", line 99, in fetch_user
File "fetch.py", line 76, in fetch_blog
File "crawl\blog.py", line 83, in get_blogs
File "crawl\blog.py", line 51, in load_blog_list
File "crawl\utils.py", line 103, in get_comments
File "crawl\crawler.py", line 119, in get_json
File "json\__init__.py", line 348, in loads
File "json\decoder.py", line 337, in decode
File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[15240] Failed to execute script manage
check login, and get homepage for cookie
need login
prepare login encryt info
prepare post login request
can not get login info, needs icode
get icode image, output to ./static/img/icode.jpg
Traceback (most recent call last):
File "fetch.py", line 125, in <module>
cralwer = prepare_crawler(cmd_args)
File "fetch.py", line 22, in prepare_crawler
config.crawler = Crawler(args.email, args.password, Crawler.load_cookie())
File "/Users/whglamrock/Downloads/renrenBackup-master/crawl/crawler.py", line 49, in __init__
self.check_login()
File "/Users/whglamrock/Downloads/renrenBackup-master/crawl/crawler.py", line 128, in check_login
self.get_url("http://www.renren.com/{uid}".format(uid=self.uid))
File "/Users/whglamrock/Downloads/renrenBackup-master/crawl/crawler.py", line 76, in get_url
self.login()
File "/Users/whglamrock/Downloads/renrenBackup-master/crawl/crawler.py", line 172, in login
with open(config.ICODE_FILEPATH, 'wb') as fp:
FileNotFoundError: [Errno 2] No such file or directory: './static/img/icode.jpg'
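The `FileNotFoundError` above is raised because the `./static/img` directory does not exist yet when the captcha image is written. A minimal sketch of creating the parent directory before opening the file follows. `write_binary_file` is a hypothetical helper, and the temporary directory in the demo stands in for the project folder.

```python
import os
import tempfile


def write_binary_file(path, data):
    """Write data to path, creating missing parent directories first.

    Hypothetical helper; not part of renrenBackup.
    """
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as fp:
        fp.write(data)


# Demo: the nested directories do not exist until the helper creates them.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "static", "img", "icode.jpg")
    write_binary_file(target, b"demo")
    written_size = os.path.getsize(target)
```

Passing `exist_ok=True` makes the call safe to repeat on later runs when the directory is already there.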
Support export statics in GUI mode
Traceback (most recent call last):
File "manage.py", line 116, in <module>
File "site-packages\flask_script\__init__.py", line 417, in run
File "site-packages\flask_script\__init__.py", line 386, in handle
File "site-packages\flask_script\commands.py", line 216, in __call__
File "manage.py", line 53, in export
File "export.py", line 139, in export_all
KeyError: 'users'
[17940] Failed to execute script manage
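From a quick look, get_json in export.py always fails.
In May 2021 renren.com overhauled its web front end, and the current version of this tool (2021.06 and earlier) no longer works.
However, the new version provides new APIs for the blog list and the album list, as follows: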
http://rrwapi.renren.com/feed/v1/blogs
http://rrwapi.renren.com/feed/v1/albums
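The blog detail page is still rendered directly as HTML, though, and I have not yet looked closely at the response structure of the album detail page.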