
weibopicdownloader's Introduction

weiboPicDownloader

A batch download tool (CLI) for Weibo user albums. The album is not the real Weibo album.

It builds a user album by collecting every photo from the original weibos in the user's post feed.

For more login-free Weibo APIs, see the wiki.

中文 README

References

yAnXImIN/weiboPicDownloader

ningshu/weiboPicDownloader

Overview

Dependencies

$ pip install requests
$ pip install colorama # required only on Windows versions below 10.0.14393
$ pip install futures # required only in a Python 2 environment

Usage

$ python .\weiboPicDownloader.py -h
usage: weiboPicDownloader [-h] (-u user [user ...] | -f file [file ...])
                          [-d directory] [-s size] [-r retry] [-i interval]
                          [-c cookie] [-b boundary] [-n name] [-v] [-o]

optional arguments:
  -h, --help          show this help message and exit
  -u user [user ...]  specify nickname or id of weibo users
  -f file [file ...]  import list of users from files
  -d directory        set picture saving path
  -s size             set size of thread pool
  -r retry            set maximum number of retries
  -i interval         set interval for feed requests
  -c cookie           set cookie if needed
  -b boundary         focus on weibos in the id range
  -n name             customize naming format
  -v                  download videos together
  -o                  overwrite existing files

Required argument (choose one)

  • -u user ... users (nickname or id)
  • -f file ... user list files (nickname or id, one per line in each file)

Optional arguments

  • -d directory media saving path (default: ./weiboPic)
  • -s size thread pool size (default: 20)
  • -r retry maximum number of retries (default: 2)
  • -i interval interval between feed requests (default: 1, unit: seconds)
  • -c cookie login credential (only the value of the cookie key named SUB is needed)
  • -b boundary mid/bid/date range of weibos (format: id:id for between, :id for before, id: for after, id for a single weibo, : for all)
  • -n name naming template (identifiers: url, index, type, mid, bid, date, text, name; syntax similar to f-strings)
  • -v also download miaopai videos
  • -o overwrite existing files (by default, existing files are skipped)
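The five forms accepted by -b can be illustrated with a small parser. This is only a sketch of the format described above, not the tool's own code; the function name `parse_boundary` and the `(start, end)` return shape are assumptions made for illustration.

```python
from typing import Optional, Tuple

def parse_boundary(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a -b value into (start, end); None means that side is open."""
    if ":" not in spec:
        # "id" alone selects exactly one weibo
        return spec, spec
    start, end = spec.split(":", 1)
    # An empty side (":id", "id:", ":") is treated as unbounded
    return (start or None, end or None)

for spec in ["a:b", ":b", "a:", "a", ":"]:
    print(spec, "->", parse_boundary(spec))
```

Here `a` and `b` stand in for real mid, bid or date values.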

✳ How to get the value of SUB from a browser (Chrome as an example)

  1. Go to https://m.weibo.cn and log in
  2. Inspect > Application > Cookies > https://m.weibo.cn
  3. Double-click the SUB row and copy its value
  4. Paste it into the terminal after -c, as in -c <value>
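For readers who want to see what sending only the SUB key looks like in code, here is a minimal sketch. It uses the standard library's `urllib.request` so that it is self-contained, whereas the tool itself depends on requests; the helper name `build_request` is hypothetical and no request is actually sent.

```python
import urllib.request

def build_request(sub_value, url="https://m.weibo.cn"):
    """Prepare a request that carries only the SUB cookie."""
    return urllib.request.Request(url, headers={"Cookie": "SUB=" + sub_value})

req = build_request("<value>")  # replace <value> with the copied cookie value
print(req.get_header("Cookie"))
```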

weibopicdownloader's People

Contributors

fireattack, nondanee

weibopicdownloader's Issues

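Crawling pictures from multiple users

Hello. I now want to crawl multiple users. What approach should I take? In other words, how do I crawl the pictures once I have the uid and id name of each user? Thanks.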

The program does not exit automatically after it finishes

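Selected directory is /root/weipic
This directory does not exist. Create it? (Y/n): y
Directory created
Enter the type of account to download:
[1] User ID [2] User nickname
(1/2): 1
Enter the user ID: 6033924165
Analysing weibos... 120
Analysis finished, total weibos 125, actually obtained 120
Number of pictures 319
Set the number of download threads (1-20): 20
Processed 319/319, failed downloads 0/319, all done
Download finished, path is /root/weipic/6033924165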

^CTraceback (most recent call last):
  File "weiboPicDownloader.py", line 217, in <module>
    main()
  File "weiboPicDownloader.py", line 213, in main
    sys.stdin.read()
KeyboardInterrupt

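Graceful termination on Ctrl+C?

How can all download threads be terminated on KeyboardInterrupt,
and the partially downloaded, incomplete files be deleted (because the next run detects that the file exists and skips it)?

future.cancel() cannot stop a thread that is already in the running state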

Python: concurrent.futures How to make it cancelable?
https://stackoverflow.com/questions/42782953/python-concurrent-futures-how-to-make-it-cancelable
Is there a way to stop a running process in concurrent.futures?
https://stackoverflow.com/questions/16050212/is-there-a-way-to-stop-a-running-process-in-concurrent-futures
How do you kill Futures once they have started?
https://stackoverflow.com/questions/29177490/how-do-you-kill-futures-once-they-have-started

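Solution 1: check a global variable inside the thread
Solution 2: shut down the thread pool and kill the child threads manually

Is there a more graceful way?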
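Solution 1 could look roughly like the sketch below, using a `threading.Event` as the shared flag. It is one possible shape, not the project's implementation: `download` and `run` are hypothetical names, and `chunks` stands in for whatever iterable yields the response body.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

stop_event = threading.Event()  # shared flag checked inside every worker

def download(chunks, file_path):
    """Write chunks to file_path; delete the partial file and return False if stopped."""
    with open(file_path, "wb") as f:
        for chunk in chunks:
            if stop_event.is_set():
                break
            f.write(chunk)
        else:
            return True  # loop ended without interruption
    os.remove(file_path)  # file is closed here, so the partial download can go
    return False

def run(tasks, pool_size=4):
    """tasks is a list of (chunks, file_path) pairs; returns one result per task."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(download, chunks, path) for chunks, path in tasks]
        try:
            return [f.result() for f in futures]
        except KeyboardInterrupt:
            stop_event.set()      # running workers stop at their next chunk
            for f in futures:
                f.cancel()        # queued tasks never start
            raise
```

Because each worker only checks the flag between chunks, a worker blocked on a slow read still waits for that read to return or time out.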

json.decoder.JSONDecodeError

This error occurs when using -f to download resources for multiple users

62/408 Thu May  3 01:20:38 2018
SNH48-陈韫凌 5681003434
analysing weibos... 319/577Traceback (most recent call last):
  File ".\weiboPicDownloader.py", line 263, in <module>
    urls = get_urls(uid,args.video)
  File ".\weiboPicDownloader.py", line 170, in get_urls
    json_data = json.loads(response.text)
  File "D:\myprograms\p3\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "D:\myprograms\p3\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "D:\myprograms\p3\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Commit used:

commit b1d94af18a90d18c454c0f28bbcccafb6eb08af6 (HEAD -> dev, origin/dev, origin/HEAD)
Author: nondanee <[email protected]>
Date:   Tue May 1 23:56:18 2018 +0800

    back to old loop query logic #15
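"Expecting value: line 1 column 1 (char 0)" is what `json.loads` reports when the body is empty or is not JSON at all. A defensive pattern is to treat such a response as retryable. The sketch below assumes a caller-supplied `fetch` callable that returns the response text; `load_json_with_retry` is a hypothetical helper, not part of the script.

```python
import json
import time

def load_json_with_retry(fetch, retries=2, interval=1.0):
    """Call fetch() until its text decodes as JSON; return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return json.loads(fetch())
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            if attempt < retries:
                time.sleep(interval)  # back off before asking again
    return None
```

Returning `None` lets the caller skip one user and continue with the rest of the list instead of aborting the whole run.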

No module named 'requests'

env

windows 10
Python 3.6.4

error

$ python weiboPicDownloader.py -h
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 6, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

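How long before a download times out?

Where the network is slow, large pictures or videos often fail to download.
Some resources are served poorly: the download speed is only 20 KB/s, and a download takes a minute.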

socket.timeout: The read operation timed out

------------------------------
201/408 Wed May  9 22:00:47 2018
GNZ48-肖文铃 5885954688
finish analysis 381/405
practically get 381 weibos, 1631 medias
downloading... 1628/1631(99%)Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 384, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/usr/lib/python3.6/http/client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/ssl.py", line 1009, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.6/ssl.py", line 871, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 745, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 436, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 401, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 307, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='us.sinaimg.cn', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 206, in download
    for chunk in response.iter_content(chunk_size=512):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 752, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='us.sinaimg.cn', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 288, in <module>
    if task.result() == False:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "weiboPicDownloader.py", line 211, in download
    os.remove(file_path)
FileNotFoundError: [Errno 2] No such file or directory: 'weiboPic/GNZ48-肖文铃/0015cUNQjx073hcGSDFm010401000ZXJ0k01.mp4'
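The last traceback shows the cleanup step itself failing: `os.remove` raises when the file is already gone. One way to make the cleanup tolerant is sketched below, under the assumption that the download loop can be wrapped in a helper (here the hypothetical `save_stream`, writing to a `.part` file first). The timeouts and connection errors seen above all derive from `OSError` in Python 3, so a single `except` covers them.

```python
import contextlib
import os

def save_stream(chunks, file_path):
    """Write chunks to file_path via a temporary .part file; return True on success."""
    part_path = file_path + ".part"
    try:
        with open(part_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(part_path, file_path)  # reached only when the body is complete
        return True
    except OSError:
        # Timeouts and connection errors raised while iterating land here
        with contextlib.suppress(FileNotFoundError):
            os.remove(part_path)  # may not exist if open() itself failed
        return False
```

With this layout a file under its final name is always complete, so skipping existing files on the next run stays safe.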

Can't download the latest photos

I run this program on a NAS. It basically works properly, and I start it regularly to download the latest pictures.

But there is a problem with one Weibo account that has a lot of pictures (more than 3165): with the -f or -u parameter the latest photos cannot be downloaded; they download properly only when -u is combined with -b. This problem did not occur when I first started using the program a few months ago, but I see the same problem when testing with older versions of the program.

Is there some limit in this program or in the Weibo site settings that causes this problem?

Other accounts downloaded in turn using the -f parameter are normal.

The error log is usually like this:

analysing weibos... 25/756(#1)
analysing weibos... 49/756(#2)
analysing weibos... 72/756(#3)
analysing weibos... 95/756(#4)
analysing weibos... 120/756(#5)
analysing weibos... 145/756(#6)
analysing weibos... 170/756(#7)
analysing weibos... 195/756(#8)
analysing weibos... 219/756(#9)
analysing weibos... 244/756(#10)
analysing weibos... 269/756(#11)
analysing weibos... 294/756(#12)
analysing weibos... 319/756(#13)
analysing weibos... 344/756(#14)
analysing weibos... 369/756(#15)
analysing weibos... 393/756(#16)
analysing weibos... 418/756(#17)
analysing weibos... 443/756(#18)
analysing weibos... 468/756(#19)
analysing weibos... 493/756(#20)
analysing weibos... 518/756(#21)
analysing weibos... 542/756(#22)
analysing weibos... 567/756(#23)
analysing weibos... 589/756(#24)
analysing weibos... 614/756(#25)
analysing weibos... 639/756(#26)
analysing weibos... 664/756(#27)
analysing weibos... 688/756(#28)
analysing weibos... 712/756(#29)
analysing weibos... 726/756(#30)
finish analysis 726/756(#31)
practically scan 726 weibos, get 3221 resources

all tasks done 3221/3221(100%)
success 3220, failure 1, total 3221
automatic retry 1

downloading... 0/1(0%)
downloading... 0/1(0%)
downloading... 0/1(0%)
all tasks done 1/1(100%)
success 0, failure 1, total 1
automatic retry 2

downloading... 0/1(0%)
downloading... 0/1(0%)
downloading... 0/1(0%)
all tasks done 1/1(100%)
success 0, failure 1, total 1
https://wx4.sinaimg.cn/large/0065Jim5ly1g8cz10ucx0j30u01400zo.jpg failed

By the way, I can access the links that are usually displayed.

Thank you very much for your creation.

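Stuck at analysing weibos

After starting, the program hangs at analysing weibos... 957/94738 and does not move.
I also tried adding the cookie, with much the same result.
It seems that albums of users with this many weibos cannot be crawled normally. What is the reason?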

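Abnormal weibo count

1. When no weibos are analysed, the separator line "----------------------" is missing.

2. The weibo count is abnormal: there are 82 weibos in total, but 83 are analysed.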

-CKG48-李恩锐 6195676604
finish analysis 83/82                   
practically get 83 weibos, 349 pictures
all tasks done 349/349(100%)            
successfull 349, failed 0, total 349

3. Why is the number of analysed weibos smaller than the total number of weibos in most cases?

OSError: [Errno 5] Input/output error

------------------------------
30/408 Tue May  8 09:51:27 2018
SNH48-刘菊子 6020769478
finish analysis 515/526
practically get 515 weibos, 1790 medias
downloading... 0/1790(0%)Traceback (most recent call last):
  File "weiboPicDownloader.py", line 288, in <module>
    if task.result() == False:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "weiboPicDownloader.py", line 204, in download
    f = open(file_path,"wb")
OSError: [Errno 5] Input/output error: 'weiboPic/SNH48-刘菊子/006zsxb8gy1fqz7ab5617j33402c0b29.jpg'
commit 16e997f765c3d72402a32980c12bdbf9638c98d5

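Weibo nicknames beginning with "-" are not recognized

Weibo nickname rules: a nickname that is set or modified must be 4-30 characters long and may contain Chinese and English characters, digits, "_" and the minus sign.
A Weibo nickname can contain a minus sign, but a nickname that begins with a minus sign cannot be recognized.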

root@ubuntu:~/workspace/rep/weiboPicDownloader# python weiboPicDownloader.py -u -CKG48-曾佳  -v -s 5
usage: weiboPicDownloader [-h] [-u user] [-us users [users ...]] [-f file]
                          [-d directory] [-s size] [-v] [-o]
weiboPicDownloader: error: argument -u: expected one argument
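The error comes from argparse treating a value that starts with "-" as another option. A common workaround is to attach the value with "=", which argparse accepts as an explicit argument. The parser below is a cut-down stand-in that mirrors only the -u, -s and -v options from the README; whether the real script behaves the same depends on how its own parser is set up.

```python
import argparse

# Hypothetical stand-in for the script's parser, limited to three options
parser = argparse.ArgumentParser(prog="weiboPicDownloader")
parser.add_argument("-u", metavar="user", nargs="+")
parser.add_argument("-s", metavar="size", type=int, default=20)
parser.add_argument("-v", action="store_true")

# "-u -CKG48-曾佳" fails because the value looks like an option;
# "-u=-CKG48-曾佳" passes it as an explicit argument instead.
args = parser.parse_args(["-u=-CKG48-曾佳", "-v", "-s", "5"])
print(args.u)
```

Listing such nicknames in a file and passing it with -f sidesteps command-line parsing altogether.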

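Add more printed information

1. Print the current time before downloading each user's weibos.
2. Print the current progress before downloading each user's weibos, for example 2/33: 33 users to download in total, currently downloading the 2nd.
3. When the user does not exist, the printed UID is abnormal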

root@ubuntu:~/workspace/rep/weiboPicDownloader# python weiboPicDownloader.py -u CKG-曾佳  -v -s 5
CKG-曾佳 E%E4%BD%B3
finish analysis 0/0                     
practically get 0 weibos, 0 medias
bye bye

[bug] Error when run on a native English system

Traceback (most recent call last):
  File "weiboPicDownloader.py", line 221, in <module>
    main()
  File "weiboPicDownloader.py", line 131, in main
    home_path = os.path.realpath(raw_input_fit("请输入图片存储位置: "))
  File "weiboPicDownloader.py", line 31, in raw_input_fit
    prompt = string.decode("utf-8").encode(sys.stdin.encoding or locale.getpreferredencoding(True))
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>

I translated the prompt language in the code to English and it now runs perfectly fine.
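The crash happens because cp437 has no mapping for the Chinese prompt characters. Besides translating the prompts, the encode step can be told to substitute unsupported characters rather than raise. The sketch below uses a Python 3 `str`, unlike the Python 2 code in the traceback, and `encode_prompt` is a hypothetical helper.

```python
def encode_prompt(prompt, encoding):
    """Encode prompt for the console, replacing unsupported characters with '?'."""
    return prompt.encode(encoding, errors="replace")

print(encode_prompt("请输入图片存储位置: ", "cp437"))
```

The prompt becomes unreadable on such a console, but the program keeps running.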

Works well

Used "-f uid.txt" and "-s 3" to download pictures of more than 300 users, over 300,000 pictures in total, with no hangs or unexpected exits. 👍

Can't download with a date in the name

I'm using Windows with a non-English language pack installed as a secondary language. When I tried the option -n {date:%Y%m%d}-{name}, the '%' characters disappeared and the option appeared as -n {date:md}-{name} in the cmd window. How do I fix that? Thank you.

Cannot run

The latest version does not run; earlier versions were all OK. The message is as follows:
File "weiboPicDownloader.py", line 7

^
SyntaxError: invalid syntax

Feature request: download by latest successful try

Could you make the script parse and download only the data posted since the latest successful download (by date)? (For example: latest successful download 20190606, with the data downloaded 100% successfully.) Since there are a lot of unused images and videos in the target account, I don't want to keep deleting them repeatedly. It would also save parsing time for accounts with large numbers of images and videos. Thank you :)

Support date format for boundary option

Hi, thanks for the bid support, but I think it's still too much hassle because I must open the page and look up the unique bid. How about a simpler solution: support for -b with a date format? Example: '-b 20190606:' would download only posts after 2019-06-06. Thanks!
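The README lists date among the forms -b accepts. How a start date could be applied to a list of posts is sketched below. The post records, their `created_at` field and the inclusive comparison are all assumptions for illustration, not the tool's data model.

```python
from datetime import datetime

def posted_on_or_after(created_at, start):
    """True if created_at (a datetime) falls on or after the YYYYMMDD day in start."""
    return created_at.date() >= datetime.strptime(start, "%Y%m%d").date()

# Hypothetical post records; real feed entries carry more fields
posts = [
    {"bid": "example-old", "created_at": datetime(2019, 6, 5, 23, 59)},
    {"bid": "example-new", "created_at": datetime(2019, 6, 6, 0, 0)},
]
recent = [p["bid"] for p in posts if posted_on_or_after(p["created_at"], "20190606")]
print(recent)
```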

TypeError: expected string or bytes-like object

xxx@lxx:/media/yingpanhe/weiboPicDownloader$ python3 weiboPicDownloader.py -f macs.txt -v -r 5
1/439 Mon Oct 22 18:34:36 2018
SNH48-李宇琪 3050792913
finish analysis 1136/1263
practically get 1136 weibos, 2542 medias
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 281, in <module>
    file_name = re.sub(r"^\S+/","",url)
  File "/usr/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
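`re.sub` raises this TypeError when its string argument is neither `str` nor `bytes`, for instance `None`. Whether a `None` URL is what reached line 281 here is an assumption, but guarding the call is cheap. `file_name_from_url` is a hypothetical wrapper around the same pattern shown in the traceback.

```python
import re

def file_name_from_url(url):
    """Return the last path segment of url, or None when url is not a string."""
    if not isinstance(url, str):
        return None  # e.g. a post whose media URL could not be extracted
    return re.sub(r"^\S+/", "", url)

urls = ["https://wx4.sinaimg.cn/large/0065Jim5ly1g8cz10ucx0j30u01400zo.jpg", None]
names = [n for n in map(file_name_from_url, urls) if n is not None]
print(names)
```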

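Downloads are never complete

Hello, and first of all thank you for your code.
I found that every time I use this code to download a user's Weibo pictures, the download is incomplete, and the main reason is that not all weibos are scanned.
Some users have clearly posted more than 600 weibos, but I end up fetching only 500; one has posted 1500, but I "practically get" only a bit over 900. Is there a specific solution for this?
Thank you very much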

Can we save images with Date?

Hello.
I know that this program only saves Weibo picture files under their original raw file names, but I want the saved name to include the date (e.g. 20190606_6b084396gw1f8ta9obw9gj20rsbnve84.jpg).
Is this possible? Thanks.

Video download fails

Downloading videos with this kind of link fails, while downloading them manually with wget succeeds

GNZ48-郑丹妮- 5887697249
finish analysis 532/623                 
practically get 532 weibos, 2165 medias
all tasks done 2165/2165(100%)          
successfull 2160, failed 5, total 2165
automatic retry 1
all tasks done 5/5(100%)                
successfull 3, failed 2, total 5
automatic retry 2
all tasks done 2/2(100%)                
successfull 1, failed 1, total 2
https://us.sinaimg.cn/002z8kSbjx071Tm4lC8n010401001eTP0k01.mp4?KID=unistore,video&Expires=1521169058&ssig=GH9wSHNDc0&KID=unistore,video failed

Add support for up to 18 images in a single post

Hi there,

Weibo now allows users to upload up to 18 images in a single post. I checked the latest code, and it seems this is still not supported.

Could you add support for this feature as well? Thanks!

KeyError: 'cardlistInfo' occurs

When downloading with a UID file, one of the UIDs produces the following error

7/377 Tue May  1 22:57:03 2018
BEJ48-李烨 6206250993
Traceback (most recent call last):
  File "weiboPicDownloader.py", line 264, in <module>
    urls = get_urls(uid,args.video)
  File "weiboPicDownloader.py", line 171, in get_urls
    if total == -1: total = json_data["data"]["cardlistInfo"]["total"]
KeyError: 'cardlistInfo'

commit:

commit 113c8ead9157f5c0d388c3b07ba9f5f58c38e1fc
Author: nondanee <[email protected]>
Date:   Mon Apr 30 14:29:16 2018 +0800

    optimization: empty sequences(explicit is better than implicit)
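The KeyError means the response for this UID had a "data" object without "cardlistInfo". A tolerant lookup returns `None` instead of raising, so the caller can skip or retry that user. `get_total` is a hypothetical helper that mirrors the key path in the traceback.

```python
def get_total(json_data):
    """Return data.cardlistInfo.total, or None if any level is missing."""
    data = json_data.get("data") or {}
    info = data.get("cardlistInfo") or {}
    return info.get("total")

print(get_total({"data": {"cardlistInfo": {"total": 82}}}))
print(get_total({"data": {}}))
```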

Hi, I have some questions

Can you tell me in detail how to rename the files? I did not understand f-strings. What should I enter after -n?
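The README describes the -n template as using identifiers in braces with f-string-like syntax. Python's `str.format` follows the same brace and format-spec rules, so it can stand in to show what a template expands to. How the tool evaluates the template internally, and whether `name` includes the file extension, are assumptions here; the field values are samples.

```python
from datetime import datetime

# Template as it would be passed after -n
template = "{date:%Y%m%d}-{name}"

# Sample values for two of the identifiers listed in the README
fields = {
    "date": datetime(2019, 6, 6),
    "name": "6b084396gw1f8ta9obw9gj20rsbnve84.jpg",
}
print(template.format(**fields))
```

Under these assumptions the template yields 20190606-6b084396gw1f8ta9obw9gj20rsbnve84.jpg. The part after the colon in {date:...} is a strftime pattern.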

invalid account

python weibopicdownloader.py -u 3156359295
1/1 Tue Jan 22 15:47:57 2019
invalid account 3156359295

bye bye

The old version got stuck on analysis, so I downloaded a newer version (from 2 months ago), and now it has this problem. I then ran it without python and reran it with python; now it works but is still stuck at analysing xxx/xxxx, and I don't know why.

Thanks

Adding feature of pulling image/video from a specific date on

Hi there,

Thank you so much for maintaining this repository, which is really convenient.
However, it has to download all images for a user ID, and every time I have to check them one by one to see which ones are newly added. Could you add one more option that downloads images/videos from a specific date on and, if not specified, downloads all of them from the beginning?

Thanks again!
