nondanee / weibopicdownloader

Download weibo images without logging-in

License: GNU General Public License v3.0

Python 100.00%

weibo

weibopicdownloader's Introduction

weiboPicDownloader

Batch download tool (CLI) for a weibo user's album (not the real album)

The album is built by picking all photos from the original weibos in the user's post feed

For more login-free weibo APIs, see the wiki

Chinese README (中文)

References

yAnXImIN/weiboPicDownloader

ningshu/weiboPicDownloader

Overview

Dependencies

$ pip install requests
$ pip install colorama # only required on Windows versions below 10.0.14393
$ pip install futures # only required on Python 2

Usage

$ python .\weiboPicDownloader.py -h
usage: weiboPicDownloader [-h] (-u user [user ...] | -f file [file ...])
                          [-d directory] [-s size] [-r retry] [-i interval]
                          [-c cookie] [-b boundary] [-n name] [-v] [-o]

optional arguments:
  -h, --help          show this help message and exit
  -u user [user ...]  specify nickname or id of weibo users
  -f file [file ...]  import list of users from files
  -d directory        set picture saving path
  -s size             set size of thread pool
  -r retry            set maximum number of retries
  -i interval         set interval for feed requests
  -c cookie           set cookie if needed
  -b boundary         focus on weibos in the id range
  -n name             customize naming format
  -v                  download videos together
  -o                  overwrite existing files

Required argument (choose one)

-u user ... users (nickname or id)
-f file ... user list files (nickname or id, separated by linefeed in the file)

Optional arguments

-d directory media saving path (default value: ./weiboPic)
-s size thread pool size (default value: 20)
-r retry max retries (default value: 2)
-i interval request interval (default value: 1, unit: second)
-c cookie login credential (only the value of the cookie key named SUB is needed)
-b boundary mid/bid/date range of weibos (format: id:id for between, :id for before, id: for after, id for one specific weibo, : for all)
-n name naming template (identifiers: url, index, type, mid, bid, date, text, name; the syntax is similar to f-strings)
-v also download miaopai videos
-o overwrite existing files (by default, existing files are skipped)
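
The five documented forms of the -b value can be illustrated with a small parser. This is only a sketch of how they could be interpreted: `parse_boundary` and its `(start, end)` return value are hypothetical names, not taken from the tool's source, and the sample values are a date and a bid that appear in the issues on this page.

```python
def parse_boundary(value):
    """Split a -b style value into (start, end); None means unbounded.

    Hypothetical helper: it only mirrors the documented formats
    id:id, :id, id:, id and : and is not the tool's own parser.
    """
    if ":" not in value:
        # a single id selects exactly one weibo
        return (value, value)
    start, _, end = value.partition(":")
    # an empty side of the colon means "no limit on that side"
    return (start or None, end or None)


for example in ["20190101:20190606", ":20190606", "20190606:", "HjFhXiv82", ":"]:
    print(example, "->", parse_boundary(example))
```

Returning `None` for an open end keeps "before", "after" and "all" as special cases of the same tuple, so a caller would only need one comparison routine.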

✳How to get the value of SUB from a browser (Chrome, for example)

Go to https://m.weibo.cn and log in
Open Inspect > Application > Cookies > https://m.weibo.cn
Double-click the SUB row and copy its value
Paste it into the terminal and run with -c <value>
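
For readers who want to see what sending the SUB value amounts to, here is a minimal sketch that prepares (but does not send) a feed request carrying the cookie. It uses only the standard library to stay self-contained, whereas the tool itself depends on requests. The endpoint and the containerid value are the ones quoted in an issue further down this page; `build_feed_request` is a hypothetical helper, and how the tool actually attaches the cookie is an assumption.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint as quoted in one of the issues on this page
FEED_API = "https://m.weibo.cn/api/container/getIndex"


def build_feed_request(container_id, page, sub_value=None, count=25):
    """Prepare a feed request, optionally with the SUB cookie (hypothetical helper)."""
    query = urlencode({"count": count, "page": page, "containerid": container_id})
    request = urllib.request.Request(FEED_API + "?" + query)
    if sub_value:
        # only the SUB key is sent, matching the -c description above
        request.add_header("Cookie", "SUB=" + sub_value)
    return request


request = build_feed_request("1076035374563919", page=1, sub_value="<value>")
print(request.full_url)
print(request.get_header("Cookie"))
```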

weibopicdownloader's Issues

Support more than 9 pictures

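The weibo list returned by the API only includes 9 images per weibo; anything beyond that is not shown.
Here is one idea: fetch them from m.weibo.cn/status/*** instead. I won't open a pull request, since my code is pretty ugly…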

fireattack@55ceea0

You can test with this account, https://www.weibo.com/u/5629934501, which recently posted several weibos with more than 9 images.

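Crawling images of multiple users

Hello, I now want to crawl multiple users. How should I approach this? In other words, once I have the uids and names of multiple users, how do I crawl their images? Thanks.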

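Can downloaded images be stored in one directory per weibo?

When downloading all albums of a blogger, all images seem to be put together. Could there be an option to put the images of one post into one directory, with the directory name extracted from the post?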

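The program does not exit automatically after finishing

Selected directory: /root/weipic
The directory does not exist. Create it? (Y/n): y
Directory created
Enter the type of account to download:
[1] User ID [2] User nickname
(1/2): 1
Enter user ID: 6033924165
Analysing weibos... 120
Analysis finished, total weibos 125, actually obtained 120
Number of images 319
Set number of download threads (1-20): 20
Processed 319/319, failed 0/319, all done
Download finished, path is /root/weipic/6033924165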

^CTraceback (most recent call last):
  File "weiboPicDownloader.py", line 217, in <module>
    main()
  File "weiboPicDownloader.py", line 213, in main
    sys.stdin.read()
KeyboardInterrupt

HD Video Quality ?

https://weibo.com/5765347158/HjFhXiv82
When I checked, it only downloaded the low-quality video. Can it download the highest video quality available? Thanks.

Graceful termination on Ctrl+C?

How can all download threads be terminated on KeyboardInterrupt,
and the incomplete, half-downloaded files be deleted (because the next run skips a file as soon as it detects that it exists)?

future.cancel() cannot stop a thread that is already in the running state

Python: concurrent.futures How to make it cancelable?
https://stackoverflow.com/questions/42782953/python-concurrent-futures-how-to-make-it-cancelable
Is there a way to stop a running process in concurrent.futures?
https://stackoverflow.com/questions/16050212/is-there-a-way-to-stop-a-running-process-in-concurrent-futures
How do you kill Futures once they have started?
https://stackoverflow.com/questions/29177490/how-do-you-kill-futures-once-they-have-started

Solution 1: check a global variable inside the thread
Solution 2: shut down the thread pool and kill the child threads manually

Is there a more elegant way?
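
Solution 1 can be sketched with a `threading.Event` standing in for the global variable. The code below is an illustration under stated assumptions, not the tool's implementation: `save_chunks` and `run` are hypothetical names, and the chunks are passed in as a plain iterable so the example stays self-contained (in the tool they would come from the HTTP response).

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

stop_event = threading.Event()  # set once on Ctrl+C; every worker polls it


def save_chunks(chunks, file_path, stop=stop_event):
    """Write chunks to file_path; return True on success, False if stopped."""
    completed = False
    try:
        with open(file_path, "wb") as f:
            for chunk in chunks:
                if stop.is_set():
                    return False  # give up between chunks
                f.write(chunk)
        completed = True
        return True
    finally:
        # remove the half-written file so the next run does not skip it
        if not completed and os.path.exists(file_path):
            os.remove(file_path)


def run(tasks, workers=4):
    """tasks is a list of (chunks, file_path) pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(save_chunks, chunks, path) for chunks, path in tasks]
        try:
            return [future.result() for future in futures]
        except KeyboardInterrupt:
            stop_event.set()          # running workers stop at their next chunk
            for future in futures:
                future.cancel()       # only affects tasks that have not started
            raise
```

Running workers cannot be interrupted from outside, so they only notice the event between chunks; a small chunk size keeps that delay short.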

Cannot download all images?

Hi, I ran your code and found that it can only download part of the albums. Is that a problem with my settings?

json.decoder.JSONDecodeError

This error occurs when using -f to download the resources of multiple users

62/408 Thu May  3 01:20:38 2018
SNH48-陈韫凌 5681003434
analysing weibos... 319/577Traceback (most recent call last):
  File ".\weiboPicDownloader.py", line 263, in <module>
    urls = get_urls(uid,args.video)
  File ".\weiboPicDownloader.py", line 170, in get_urls
    json_data = json.loads(response.text)
  File "D:\myprograms\p3\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "D:\myprograms\p3\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "D:\myprograms\p3\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Commit used:

commit b1d94af18a90d18c454c0f28bbcccafb6eb08af6 (HEAD -> dev, origin/dev, origin/HEAD)
Author: nondanee <[email protected]>
Date:   Tue May 1 23:56:18 2018 +0800

    back to old loop query logic #15

env

windows 10
Python 3.6.4
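
"Expecting value: line 1 column 1 (char 0)" means decoding failed at the very first character, i.e. the response body was empty or not JSON at all. The report does not say what the server actually returned. A defensive decode could look like the following sketch, where `parse_feed` is a hypothetical helper and what the caller does with `None` (retry, wait, skip the user) is left open.

```python
import json


def parse_feed(text):
    """Return the decoded JSON object, or None if the body is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_feed(""))            # empty body -> None
print(parse_feed('{"ok": 1}'))   # valid JSON -> dict
```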

error

$ python weiboPicDownloader.py -h
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 6, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

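How long before a download times out?

In places with a slow network, large images or videos often fail to download.
Some resources are served poorly, at only 20 KB/s, and take a minute to download.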

socket.timeout: The read operation timed out

------------------------------
201/408 Wed May  9 22:00:47 2018
GNZ48-肖文铃 5885954688
finish analysis 381/405
practically get 381 weibos, 1631 medias
downloading... 1628/1631(99%)Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 384, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/usr/lib/python3.6/http/client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.6/ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 436, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 401, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 307, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='us.sinaimg.cn', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 206, in download
    for chunk in response.iter_content(chunk_size=512):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 752, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='us.sinaimg.cn', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 288, in <module>
    if task.result() == False:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "weiboPicDownloader.py", line 211, in download
    os.remove(file_path)
FileNotFoundError: [Errno 2] No such file or directory: 'weiboPic/GNZ48-肖文铃/0015cUNQjx073hcGSDFm010401000ZXJ0k01.mp4'
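
In the last traceback the cleanup step itself raises: `os.remove(file_path)` fails because the file is not there, and that hides the original timeout. A tolerant cleanup can be sketched as below; `remove_partial` is a hypothetical helper, not a function from the tool.

```python
import contextlib
import os


def remove_partial(file_path):
    """Delete a partially written file; do nothing if it is already gone."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(file_path)
```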

Can't download the latest photos.

I run this program on a NAS. It basically works properly, and I start it regularly to download the latest pictures.

But there is a problem with one Weibo account that has a lot of pictures (more than 3165): with the -f or -u parameter the latest photos cannot be downloaded; only -u combined with -b downloads them properly. This problem did not occur when I first started using the program a few months ago, but I now see the same problem when testing with older versions of the program.

Is there some limit in this program or in the Weibo site settings that causes this problem?

Other accounts downloaded in turn using the -f parameter are normal.

The error log is usually like this:

analysing weibos... 25/756(#1)
analysing weibos... 49/756(#2)
analysing weibos... 72/756(#3)
analysing weibos... 95/756(#4)
analysing weibos... 120/756(#5)
analysing weibos... 145/756(#6)
analysing weibos... 170/756(#7)
analysing weibos... 195/756(#8)
analysing weibos... 219/756(#9)
analysing weibos... 244/756(#10)
analysing weibos... 269/756(#11)
analysing weibos... 294/756(#12)
analysing weibos... 319/756(#13)
analysing weibos... 344/756(#14)
analysing weibos... 369/756(#15)
analysing weibos... 393/756(#16)
analysing weibos... 418/756(#17)
analysing weibos... 443/756(#18)
analysing weibos... 468/756(#19)
analysing weibos... 493/756(#20)
analysing weibos... 518/756(#21)
analysing weibos... 542/756(#22)
analysing weibos... 567/756(#23)
analysing weibos... 589/756(#24)
analysing weibos... 614/756(#25)
analysing weibos... 639/756(#26)
analysing weibos... 664/756(#27)
analysing weibos... 688/756(#28)
analysing weibos... 712/756(#29)
analysing weibos... 726/756(#30)
finish analysis 726/756(#31)
practically scan 726 weibos, get 3221 resources

all tasks done 3221/3221(100%)
success 3220, failure 1, total 3221
automatic retry 1

downloading... 0/1(0%)
downloading... 0/1(0%)
downloading... 0/1(0%)
all tasks done 1/1(100%)
success 0, failure 1, total 1
automatic retry 2

downloading... 0/1(0%)
downloading... 0/1(0%)
downloading... 0/1(0%)
all tasks done 1/1(100%)
success 0, failure 1, total 1
https://wx4.sinaimg.cn/large/0065Jim5ly1g8cz10ucx0j30u01400zo.jpg failed

By the way, I can access the links that are usually displayed.

Thank you very much for your creation.

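Stuck at analysing weibos

After starting, the program gets stuck at analysing weibos... 957/94738 and does not move.
I also tried adding a cookie, with much the same result.
It seems that users with this many weibos cannot have their albums crawled properly. What is the reason?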

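Abnormal weibo count

1. When no weibos are analysed, the separator line "----------------------" is missing

2. Abnormal weibo count: there are 82 weibos in total, but 83 were analysed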

-CKG48-李恩锐 6195676604
finish analysis 83/82                   
practically get 83 weibos, 349 pictures
all tasks done 349/349(100%)            
successfull 349, failed 0, total 349

3. Why is the number of analysed weibos smaller than the total number of weibos in most cases?

OSError: [Errno 5] Input/output error

------------------------------
30/408 Tue May  8 09:51:27 2018
SNH48-刘菊子 6020769478
finish analysis 515/526
practically get 515 weibos, 1790 medias
downloading... 0/1790(0%)Traceback (most recent call last):
  File "weiboPicDownloader.py", line 288, in <module>
    if task.result() == False:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "weiboPicDownloader.py", line 204, in download
    f = open(file_path,"wb")
OSError: [Errno 5] Input/output error: 'weiboPic/SNH48-刘菊子/006zsxb8gy1fqz7ab5617j33402c0b29.jpg'

commit 16e997f765c3d72402a32980c12bdbf9638c98d5

Corrupted

https://imgur.com/a/WnhI5fg
16/801 pictures corrupted, can't download 100%

Log : successfull 801, failed 1, total 802

Thank you for the tool :)

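Nicknames starting with "-" are not recognized

Weibo nickname rules: a nickname that is set or changed must be 4-30 characters long and may contain Chinese and English characters, digits, "_" and hyphens.
A weibo nickname may contain a hyphen, but a nickname that starts with a hyphen cannot be recognized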

root@ubuntu:~/workspace/rep/weiboPicDownloader# python weiboPicDownloader.py -u -CKG48-曾佳  -v -s 5
usage: weiboPicDownloader [-h] [-u user] [-us users [users ...]] [-f file]
                          [-d directory] [-s size] [-v] [-o]
weiboPicDownloader: error: argument -u: expected one argument
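
The error comes from argparse treating a value that starts with "-" as another option. On a stand-in parser (not the tool's real one), passing the value in the `-u=<value>` form avoids this, as the sketch below shows. Whether the tool's own parser at that commit behaves the same way has not been verified.

```python
import argparse

# Stand-in parser with a -u option like the one in the usage text;
# this is an assumption for illustration, not the tool's real parser.
parser = argparse.ArgumentParser(prog="weiboPicDownloader")
parser.add_argument("-u", metavar="user", dest="users", nargs="+")

# "-u -CKG48-曾佳" fails because the value looks like another option,
# while "-u=-CKG48-曾佳" hands the value to -u explicitly.
args = parser.parse_args(["-u=-CKG48-曾佳"])
print(args.users)
```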

Download media by keywords

Using https://s.weibo.com/, I want to download all posts with the keyword 'apple' or the tag apple, or all posts with that keyword/tag from a specific user. Is this possible?

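Print more information

1. Print the current time before downloading each user's weibos.
2. Print the current progress before downloading each user's weibos, e.g. 2/33: 33 users to download in total, currently downloading the 2nd.
3. When the user does not exist, the UID that is printed is abnormal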

root@ubuntu:~/workspace/rep/weiboPicDownloader# python weiboPicDownloader.py -u CKG-曾佳  -v -s 5
CKG-曾佳 E%E4%BD%B3
finish analysis 0/0                     
practically get 0 weibos, 0 medias
bye bye

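Suggestion: add a parameter to set the request interval

To cope with anti-crawling measures, I suggest adding a parameter that sets the request interval.
In my network environment 1 second is not enough; it needs at least 3 seconds.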

[bug] error when run in native english system

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 221, in <module>
    main()
  File "weiboPicDownloader.py", line 131, in main
    home_path = os.path.realpath(raw_input_fit("Φ»╖Φ╛ôσàÑσ¢╛τëçσ¡ÿσé¿Σ╜ìτ╜«: "))
  File "weiboPicDownloader.py", line 31, in raw_input_fit
    prompt = string.decode("utf-8").encode(sys.stdin.encoding or locale.getpreferredencoding(True))
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>
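
The traceback shows a Chinese prompt being encoded for a cp437 console, which has no Chinese characters. The reported code is Python 2; the following Python 3 sketch only illustrates the general idea of degrading gracefully with `errors="replace"` instead of raising. `encode_prompt` and the sample prompt are hypothetical.

```python
def encode_prompt(prompt, encoding):
    """Encode a prompt for the console, replacing characters the code page lacks."""
    return prompt.encode(encoding, errors="replace")


print(encode_prompt("请输入: ", "cp437"))   # unencodable characters become "?"
print(encode_prompt("path: ", "cp437"))     # ASCII text passes through unchanged
```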

I translated the prompt language in the code to English and it now runs perfectly fine.

Works well

Using "-f uid.txt" and "-s 3", I downloaded the images of more than 300 users, over 300,000 images in total, without any hang or unexpected exit. 👍

cant download as date

I'm using Windows with a non-English language pack installed as a secondary language. When I tried the option -n {date:%Y%m%d}-{name}, the '%' characters disappeared and it appeared as -n {date:md}-{name} in the cmd window. How do I fix that? Thank you.

Cannot run

The latest version cannot run; earlier versions were all OK. It shows the following:
File "weiboPicDownloader.py", line 7

^
SyntaxError: invalid syntax

Feature request: download by latest successful try

Could you make the script parse and download only data newer than the latest successful download (by date)? (e.g. latest successful download: 20190606, downloaded data 100% successful). Since there are a lot of unused images and videos from the target account, I don't want to keep deleting them again and again. It would also save parsing time for accounts with large amounts of images and videos. Thank you :)

Support date format for boundary option

Hi, thanks for the bid support, but I think it's still too much hassle because I must open the page and look up the unique bid. How about a simpler solution: support for -b with a date format? Example: ' -b 20190606: ' would download only posts after 2019-06-06. Thanks!

TypeError: expected string or bytes-like object

xxx@lxx:/media/yingpanhe/weiboPicDownloader$ python3 weiboPicDownloader.py -f macs.txt -v -r 5
1/439 Mon Oct 22 18:34:36 2018
SNH48-李宇琪 3050792913
finish analysis 1136/1263
practically get 1136 weibos, 2542 medias
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 281, in <module>
    file_name = re.sub(r"^\S+/","",url)
  File "/usr/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

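Downloads are never complete

Hello, and first of all thank you for your code.
I found that every time I use it to download a user's weibo images, the download is incomplete; the main reason seems to be that not all weibos are browsed.
Some users have clearly posted more than 600 weibos, but I can only fetch 500 in the end; others have posted 1500, but I "practically" get only a little over 900. Is there a concrete solution for this?
Many thanks.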

Can we save images with Date?

Hello.
I know that this program only saves weibo picture files under their original file names, but I want the saved name to include the date (e.g. 20190606_6b084396gw1f8ta9obw9gj20rsbnve84.jpg).
Is this possible? Thanks.

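Some accounts cannot be crawled; empty data is returned

For example 滚叔街拍
https://m.weibo.cn/api/container/getIndex?count=25&page=1&containerid=1076035374563919
returns the following (the msg value means "there is no content here yet")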

{
  "ok": 0,
  "msg": "这里还没有内容",
  "data": {
    "cards": []
  }
}
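
A response like the one above, and the `KeyError: 'cardlistInfo'` traceback reported in another issue on this page, both suggest reading the payload defensively. The sketch below is an illustration only: `extract_cards` is a hypothetical helper, and treating a falsy `ok` as "no data" is an assumption based on the single response shown here.

```python
import json


def extract_cards(payload):
    """Return (total, cards) from a decoded feed response, tolerating missing keys."""
    if not payload.get("ok"):
        # assumption: a falsy "ok" means the request yielded no usable data
        return 0, []
    data = payload.get("data") or {}
    total = (data.get("cardlistInfo") or {}).get("total", 0)
    return total, data.get("cards") or []


empty = json.loads('{"ok": 0, "msg": "这里还没有内容", "data": {"cards": []}}')
print(extract_cards(empty))
```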

Logic of platform.version() >= '10.0.14393' is wrong

platform.version()
'6.1.7601'
platform.version() >= '10.0.14393'
True

This results in colorama not properly loading in Win 7, thus sys.stdout.write('\r\033[K') doesn't work.
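
The comparison goes wrong because strings are compared character by character, so '6' sorts after '1'. Comparing tuples of integers avoids that. A minimal sketch, assuming the version string is purely dotted digits as on Windows (`version_tuple` is a hypothetical helper):

```python
def version_tuple(version):
    """Turn '10.0.14393' into (10, 0, 14393) so the parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


print("6.1.7601" >= "10.0.14393")                                # True: '6' > '1' as characters
print(version_tuple("6.1.7601") >= version_tuple("10.0.14393"))  # False
```

On other platforms `platform.version()` can contain non-numeric text, so the conversion would need a guard there.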

Does adding a cookie affect the download results?

Can more content be downloaded with a cookie? Does the cookie have to be obtained again for every download?

support for video ?

Video download fails

Downloading videos from links like this one fails, while downloading manually with wget succeeds

GNZ48-郑丹妮- 5887697249
finish analysis 532/623                 
practically get 532 weibos, 2165 medias
all tasks done 2165/2165(100%)          
successfull 2160, failed 5, total 2165
automatic retry 1
all tasks done 5/5(100%)                
successfull 3, failed 2, total 5
automatic retry 2
all tasks done 2/2(100%)                
successfull 1, failed 1, total 2
https://us.sinaimg.cn/002z8kSbjx071Tm4lC8n010401001eTP0k01.mp4?KID=unistore,video&Expires=1521169058&ssig=GH9wSHNDc0&KID=unistore,video failed

Add support for up to 18 images in a single post

Hi there,

Weibo now allows users to upload up to 18 images in a single post. I checked the latest code, and it seems this is still not supported.

Could you add support for this as well? Thanks!

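Don't download again if the target image file already exists

Could you add a parameter that checks whether an image already exists and skips the download if it does?
That would make updating fast.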

Only getting half of the album when running the program?

Is it because Weibo uses paging on its timeline?

Error: KeyError: 'cardlistInfo'

When downloading with a UID file, one of the UIDs produced the following error

7/377 Tue May  1 22:57:03 2018
BEJ48-李烨 6206250993
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 264, in <module>
    urls = get_urls(uid,args.video)
  File "weiboPicDownloader.py", line 171, in get_urls
    if total == -1: total = json_data["data"]["cardlistInfo"]["total"]
KeyError: 'cardlistInfo'

commit:

commit 113c8ead9157f5c0d388c3b07ba9f5f58c38e1fc
Author: nondanee <[email protected]>
Date:   Mon Apr 30 14:29:16 2018 +0800

    optimization: empty sequences(explicit is better than implicit)

Hi, i got some questions

Can you tell me in detail how to rename the file? I did not understand f-string. What should I enter after -n?
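
As a rough illustration of the f-string-like syntax, the sketch below expands two templates with `str.format`, which uses the same `{field:spec}` notation as f-strings. Whether the tool expands templates exactly this way is an assumption; `render_name` is a hypothetical helper, and the field values (including `name` being the original file name with its extension) are made up for the example.

```python
from datetime import datetime


def render_name(template, **fields):
    """Expand a naming template using str.format and its f-string style specs."""
    return template.format(**fields)


fields = {
    "date": datetime(2019, 6, 6),                    # illustrative values only
    "name": "6b084396gw1f8ta9obw9gj20rsbnve84.jpg",
    "index": 3,
}
print(render_name("{date:%Y%m%d}_{name}", **fields))
print(render_name("{index:02d}-{name}", **fields))
```

On the command line such a template would be passed after -n, for example -n "{date:%Y%m%d}_{name}", with quoting adapted to the shell in use.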

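About renaming: could you provide an option to support renaming images?

As the title says.
The file names that Weibo itself uses are unpleasant to look at. It would be great if the time and the weibo text could be used as the file name, which would make the files easier to manage.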

One particular user cannot be crawled

invalid account

python weibopicdownloader.py -u 3156359295
1/1 Tue Jan 22 15:47:57 2019
invalid account 3156359295

bye bye

The old version got stuck on analysis, so I downloaded a newer version (from 2 months ago), and now it has this problem. I ran it without python and then reran it with python; now it works but is still stuck at analysing xxx/xxxx, and I don't know why.

Thanks

Can batch download be supported?

For example by adding a parameter or a configuration file.

How many weibos can be fetched per request at most?

Currently 24 or 25 are fetched per request.

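Can I download only the weibos from a certain time period?

For example, I want to download the images from the last two months. How can this be done?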

weiboPicDownloader-win32 does not support downloading videos

Would you consider supporting downloads of users' miaopai videos?

I think it can be done

Adding feature of pulling image/video from a specific date on

Hi there,

Thank you so much for maintaining this repository, which is really convenient.
However, it has to download all images for a user ID, and every time I have to check them one by one to see which ones are newly added. Could you please add one more option that downloads images/videos from a specific date on? Of course, if it is not specified, everything would be downloaded from the beginning.

Thanks again!