qianyantech / image-downloader
Download images from Google, Bing, Baidu.
License: MIT License
Hello, I ran into this error when running the GUI program:
Fail: http://g.hiphotos.baidu.com/zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (MaxRetryError("HTTPConnectionPool(host='g.hiphotos.baidu.com', port=80): Max retries exceeded with url: /zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000361E208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.mkwd.gov.ph/new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (MaxRetryError("HTTPConnectionPool(host='www.mkwd.gov.ph', port=80): Max retries exceeded with url: /new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000059F2A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.micronbot.com/usr/uploads/2017/03/704946529.png (MaxRetryError("HTTPConnectionPool(host='www.micronbot.com', port=80): Max retries exceeded with url: /usr/uploads/2017/03/704946529.png (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003610A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
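The "getaddrinfo failed" part of these messages suggests the host name could not be resolved, so the failure happens before any HTTP request is sent. A minimal sketch for checking that separately from the downloader, using only the standard library (the helper name `host_resolves` is hypothetical and not part of Image-Downloader):

```python
import socket
from urllib.parse import urlparse


def host_resolves(url):
    """Return True if the host in `url` can be resolved on this machine."""
    host = urlparse(url).hostname
    if not host:
        # No host could be parsed from the string at all
        return False
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Same class of failure as the "getaddrinfo failed" in the log
        return False
```

If this returns False for hosts that open fine in a browser, the DNS or proxy settings seen by Python are a likely place to look.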
When I run a Bing search in Chrome with the window size set to (4000, 3000), the page shows more than 500 images.
However, no matter how I change the window size or use a script to scroll the window down, the code can only crawl about 200 images.
Does anybody know how to solve this?
Thank you very much for your great work! However, I recently ran into a strange problem, shown below.
Any suggestions would help!
Keywords: 杨凡小时候
Number: 100
Face Only: False
Safe Mode: True
Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E6%9D%A8%E5%87%A1%E5%B0%8F%E6%97%B6%E5%80%99
Traceback (most recent call last):
File "Myimage_downloader.py", line 143, in <module>
main(sys.argv[1:])
File "Myimage_downloader.py", line 76, in main
browser=args.driver)
File "Image-Downloader-master\crawler.py", line 326, in crawl_image_urls
proxy=proxy, proxy_type=proxy_type)
File "Image-Downloader-master\crawler.py", line 207, in baidu_get_image_url_using_api
total_num = init_json['listNum']
KeyError: 'listNum'
Hello, and thank you.
The GUI output is as follows:
[17:40:19] -e Google -n 20 -j 50 -o "./download_images/龙猫" -S "龙猫"
[17:40:19] Scraping From Google Image Search ...
[17:40:19] Keywords: 龙猫
[17:40:19] Number: 20
[17:40:19] Face Only: False
[17:40:19] Safe Mode: True
[17:40:19] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=%E9%BE%99%E7%8C%AB&safe=on
[17:40:19] Exception in thread Thread-7:
[17:40:19] Traceback (most recent call last):
[17:40:19] File "threading.py", line 914, in _bootstrap_inner
[17:40:19] File "threading.py", line 862, in run
[17:40:19] File "Image-Downloader-master\image_downloader.py", line 52, in main
[17:40:19] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[17:40:19] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 74, in start
[17:40:19] File "subprocess.py", line 640, in __init__
[17:40:19] File "subprocess.py", line 848, in _get_handles
[17:40:19] OSError: [WinError 6] 句柄无效。
[17:40:19] Exception ignored in:
[17:40:19] <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x000002335A2F50F0>>
[17:40:19] Traceback (most recent call last):
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 173, in __del__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 145, in stop
[17:40:19] AttributeError
[17:40:19] :
[17:40:19] 'Service' object has no attribute 'process'
[17:40:20] stopped
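I searched for many solutions but none of them worked, so I am asking here. I hope to get some guidance, which may also help others who run into the same problem.
Baidu changed recently, so the author needs to update the code.
In crawler.py, in baidu_get_image_url_using_api, add a header to res = requests.get(init_url, proxies=proxies):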
headers = {
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
}
init_url="https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&lm=7&fp=result&ie=utf-8&oe=utf-8&st=-1&word=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&queryWord=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&face=0&pn=0&rn=30"
-195 res = requests.get(init_url,proxies=proxies)
+196 res = requests.get(init_url,proxies=proxies,headers=headers)
Originally posted by @ald2004 in #29 (comment)
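The header fix above can be tried on its own before patching crawler.py. This is only a sketch: it swaps in the standard-library urllib instead of requests so that it has no dependencies, keeps just two of the header entries quoted above, and the name `build_request` is hypothetical.

```python
import urllib.request

# Two of the header values quoted in the comment above
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build (but do not send) a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=dict(headers))
```

Sending it is then `urllib.request.urlopen(build_request(init_url))`. With requests, the equivalent is the `headers=headers` argument shown in the +196 line above.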
File "/Users/moo/project/01_yanglian/Image-Downloader/crawler.py", line 27, in <module>
dcap = dict(DesiredCapabilities.PHANTOMJS)
AttributeError: type object 'DesiredCapabilities' has no attribute 'PHANTOMJS'
[18:33:35] -e Baidu -d chrome_headless -n 100 -j 50 -o "./download_images/dog" -S "dog"
[18:33:35] Scraping From Baidu Image Search ...
[18:33:35] Keywords: dog
[18:33:35] Number: 100
[18:33:35] Face Only: False
[18:33:35] Safe Mode: True
[18:33:35] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=dog
[18:33:38] Exception in thread Thread-1:
[18:33:38] Traceback (most recent call last):
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
[18:33:38] self.run()
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 865, in run
[18:33:38] self._target(*self._args, **self._kwargs)
[18:33:38] File "/home/yuran/project/Image-Downloader/image_downloader.py", line 54, in main
[18:33:38] browser=args.driver)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 315, in crawl_image_urls
[18:33:38] proxy=proxy, proxy_type=proxy_type)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 195, in baidu_get_image_url_using_api
[18:33:38] res = requests.get(init_url, proxies=proxies)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 75, in get
[18:33:38] return request('get', url, params=params, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 60, in request
[18:33:38] return session.request(method=method, url=url, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
[18:33:38] resp = self.send(prep, **send_kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in send
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in <listcomp>
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 165, in resolve_redirects
[18:33:38] raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
[18:33:38] requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
[18:33:38] stopped
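What is causing this problem?
Hello, I have now downloaded the required Qt5 and Python 3, but I do not understand the run and copy steps that follow. Could you explain them in detail?
With Chrome version 118, I keep getting this error: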
[15:30:06] Checking Google Chrome and chromedriver ...
[15:30:07] WARNING:root:Can not find chromedriver for currently installed chrome version.
[15:30:07] Dependencies not resolved, exit.
[15:30:07] stopped
Does running this program require a specific version of Chrome?
C:\Users\James\Desktop\articles\Image-Downloader>pip install PyQt5
Collecting PyQt5
Using cached PyQt5-5.15.9-cp37-abi3-win_amd64.whl (6.8 MB)
Collecting PyQt5-Qt5>=5.15.2
Using cached PyQt5_Qt5-5.15.2-py3-none-win_amd64.whl (50.1 MB)
Collecting PyQt5-sip<13,>=12.11
Using cached PyQt5_sip-12.11.1-cp39-cp39-win_amd64.whl (78 kB)
Installing collected packages: PyQt5-Qt5, PyQt5-sip, PyQt5
Successfully installed PyQt5-5.15.9 PyQt5-Qt5-5.15.2 PyQt5-sip-12.11.1
(yolo_classify) C:\Users\James\Desktop\articles\Image-Downloader>image_downloader_gui.py
Traceback (most recent call last):
File "C:\Users\James\Desktop\articles\Image-Downloader\image_downloader_gui.py", line 7, in <module>
from mainwindow import MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\mainwindow.py", line 5, in <module>
from ui_mainwindow import Ui_MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\ui_mainwindow.py", line 11, in <module>
from PyQt5 import QtCore, QtGui, QtWidgets
ModuleNotFoundError: No module named 'PyQt5'
I downloaded and installed the required dependencies.
Next, I revised the contents of main.py and ran it.
But it hangs without any response, as shown below:
sunner@sunner-All-Series:~/Save/Google-Image-Downloader/src$ python main.py
Scraping From Google Image Search ...
Keywords: orange
Number: 100
Face Only: Yes
Safe Mode: On
Query URL: https://www.google.com/search?tbm=isch&q=orange&tbs=itp:face&safe=on
It stays in this state for several minutes until I interrupt it from the keyboard.
No error message appears.
Is there a problem with how I am running it?
My environment is Ubuntu 14.04, Python 2.7.6.
[10:07:57] -e Baidu -d chrome_headless -n 100 -j 50 -o "E:/zb/code/images/mouse" -S "mouse"
[10:07:57] Scraping From Baidu Image Search ...
[10:07:57] Keywords: mouse
[10:07:57] Number: 100
[10:07:57] Face Only: False
[10:07:57] Safe Mode: True
[10:07:57] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=mouse
[10:07:57] Exception in thread
[10:07:57] Thread-3
[10:07:57] :
[10:07:57] Traceback (most recent call last):
[10:07:57] File "D:\软件\python64\lib\threading.py", line 954, in _bootstrap_inner
[10:07:57] self.run()
[10:07:57] File "D:\软件\python64\lib\threading.py", line 892, in run
[10:07:57] self._target(*self._args, **self._kwargs)
[10:07:57] File "E:\zb\code\Image-Downloader\image_downloader.py", line 50, in main
[10:07:57] crawled_urls = crawler.crawl_image_urls(args.keywords,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 325, in crawl_image_urls
[10:07:57] image_urls = baidu_get_image_url_using_api(keywords, max_number=max_number, face_only=face_only,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 206, in baidu_get_image_url_using_api
[10:07:57] init_json = json.loads(res.text.replace(r"'", ""), encoding='utf-8', strict=False)
[10:07:57] File "D:\软件\python64\lib\json\__init__.py", line 359, in loads
[10:07:57] return cls(**kw).decode(s)
[10:07:57] TypeError
[10:07:57] :
[10:07:57] __init__() got an unexpected keyword argument 'encoding'
[10:07:57] stopped
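The `encoding` keyword of `json.loads` was removed in Python 3.9, which matches the TypeError in this traceback. A minimal sketch of the same call without that keyword (the wrapper name `parse_baidu_json` is hypothetical; the quote-stripping and `strict=False` are copied from the crawler.py line shown in the traceback):

```python
import json


def parse_baidu_json(text):
    """Parse a response body like the crawler.py call, minus the removed `encoding` kwarg."""
    # Stripping single quotes mirrors the original call;
    # strict=False allows control characters inside JSON strings.
    return json.loads(text.replace(r"'", ""), strict=False)
```

Dropping the keyword should be safe on older Python 3 versions too, since `res.text` is already a `str` and no decoding is involved.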
When I click the "Start" button, I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
`python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.0:1080 apple
Scraping From Google Image Search ...
Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.
== 0 out of 0 crawled images urls will be used.
Finished.`
I also tried the GUI, but it doesn't work either. Please guide me.
[09:57:20] -e Baidu -n 100 -j 4 -o "./download_images/rain" -S "rain"
[09:57:20] Scraping From Baidu Image Search ...
[09:57:20] Keywords: rain
[09:57:20] Number: 100
[09:57:20] Face Only: False
[09:57:20] Safe Mode: True
[09:57:20] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=rain
[09:57:24] Exception in thread Thread-4:
[09:57:24] Traceback (most recent call last):
[09:57:24] File "threading.py", line 914, in _bootstrap_inner
[09:57:24] File "threading.py", line 862, in run
[09:57:24] File "Image-Downloader-master\image_downloader.py", line 52, in main
[09:57:24] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[09:57:24] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 96, in start
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 109, in assert_process_still_running
[09:57:24] selenium.common.exceptions.WebDriverException: Message: Service C:\Users\wyy\AppData\Local\Temp_MEI145722/bin/phantomjs.exe unexpectedly exited. Status code was: 4294967295
[09:57:25] stopped
Hello, after installing, not a single image downloads. What could be the reason?
[14:54:36] Keywords: 货拉拉
[14:54:36] Number: 10
[14:54:36] Face Only: True
[14:54:36] Safe Mode: True
[14:54:36] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E8%B4%A7%E6%8B%89%E6%8B%89&face=1
[14:54:38] == 10 out of 10 crawled images urls will be used.
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/503d269759ee3d6d6a6a6c9a54166d224f4ade37.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/738b4710b912c8fc175f1d9ceb039245d6882137.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/b64543a98226cffc539816d9ae014a90f603ea2f.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/fd039245d688d43f5d067aa36a1ed21b0ef43b30.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/dbb44aed2e738bd4af8e5960b68b87d6277ff930.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/d4628535e5dde711047da8a7b0efce1b9d166130.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/d439b6003af33a8781c3eb4bd15c10385343b531.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/dbb44aed2e738bd4af8f5960b68b87d6277ff931.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/cf1b9d16fdfaaf51684d6aee9b5494eef01f7a31.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/0b46f21fbe096b63a7b6b5011b338744ebf8ac30.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] Finished.
[14:54:39] stopped
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
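Hard to rate this. It needs a major rework; it is a half-finished product.
The results page now loads more images as you scroll down, so the number available may be far greater than what is currently crawled?
No matter what keywords are used, it always says == 0 out of 0 crawled images urls will be used for the Google and Bing engines. Only Baidu works. Any clue?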
Hi,
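The following error occurs when I try to run the script:
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Any help would be appreciated!
1. After I set the number of images to crawl, why is only part of it crawled? For example, I set 10000 but only about 2000 were actually crawled, even though I changed the upper limit in your code, and a manual search for the keyword returns far more than about 2000.
Currently, on Linux, downloading Google images through a proxy server does not work correctly from the command line.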
Hello.
Can you add support for https://yandex.ru/images/?
Censorship in this search engine is not as strict as in Google,
and fewer photos are removed for copyright infringement.
Your prompt reply will be highly appreciated.
[15:04:41] -e Google -d chrome_headless -n 100 -j 50 -o "./download_images/as" -S "as"
[15:04:41] Scraping From Google Image Search ...
[15:04:41] Keywords: as
[15:04:41] Number: 100
[15:04:41] Face Only: False
[15:04:41] Safe Mode: True
[15:04:41] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=as&safe=on
[15:04:41] Exception in thread Thread-1:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
[15:04:41] stdin=PIPE)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
[15:04:41] restore_signals, start_new_session)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
[15:04:41] startupinfo)
[15:04:41] FileNotFoundError: [WinError 2] A rendszer nem találja a megadott fájlt
[15:04:41] During handling of the above exception, another exception occurred:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
[15:04:41] self.run()
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
[15:04:41] self._target(*self._args, **self._kwargs)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\image_downloader.py", line 54, in main
[15:04:41] browser=args.driver)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\crawler.py", line 300, in crawl_image_urls
[15:04:41] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
[15:04:41] self.service.start()
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
[15:04:41] os.path.basename(self.path), self.start_error_message)
[15:04:41] selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
[15:04:42] stopped
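System: Windows 10, command-line mode, browser set to chrome. Google images cannot be downloaded through a proxy server.
Traceback:
Message: no such element: Unable to locate element: {"method":"id","selector":"smb"}
The network configuration was correct throughout; in the Chrome window that opens, I can see the Google Images link load and the matching images appear.
Could you help take a look at this problem?
Thanks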
[18:09:39] -e Google -d chrome_headless -n 10 -j 50 -o "./download_images/pear" "pear"
[18:09:39] Scraping From Google Image Search ...
[18:09:39] Keywords: pear
[18:09:39] Number: 10
[18:09:39] Face Only: False
[18:09:39] Safe Mode: False
[18:09:39] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=pear&safe=off
[18:09:42] Exception in thread Thread-1:
[18:09:42] Traceback (most recent call last):
[18:09:42] File "C:\Python37\lib\threading.py", line 917, in _bootstrap_inner
[18:09:42] self.run()
[18:09:42] File "C:\Python37\lib\threading.py", line 865, in run
[18:09:42] self._target(*self._args, **self._kwargs)
[18:09:42] File "C:\Python37\Image-Downloader\image_downloader.py", line 54, in main
[18:09:42] browser=args.driver)
[18:09:42] File "C:\Python37\Image-Downloader\crawler.py", line 282, in crawl_image_urls
[18:09:42] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[18:09:42] File "C:\Python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 81, in __init__
[18:09:42] desired_capabilities=desired_capabilities)
[18:09:42] File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
[18:09:42] self.start_session(capabilities, browser_profile)
[18:09:42] File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
[18:09:42] response = self.execute(Command.NEW_SESSION, parameters)
[18:09:42] File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
[18:09:42] self.error_handler.check_response(response)
[18:09:42] File "C:\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
[18:09:42] raise exception_class(message, screen, stacktrace)
[18:09:42] selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 83
[18:09:43] stopped
I already downloaded all the requirements. I'm using Python3.7.2 and ChromeDriver 81.
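The SessionNotCreatedException above points to a mismatch between the installed Chrome and the ChromeDriver binary. A small sketch of the comparison, assuming the major version numbers are what must agree (the helper names are hypothetical, and the version strings in the usage line are illustrative, not taken from this report):

```python
def major_version(version):
    """Return the leading integer of a dotted version string such as '83.0.4103.39'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, chromedriver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(chromedriver_version)
```

For example, `versions_match("81.0.4044.138", "83.0.4103.39")` is False, which is the kind of pairing that would produce the message above.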
In GUI mode, the crawl mode cannot be selected when Baidu is chosen.
Can not find chromedriver for currently installed chrome version
Is macOS supported?
$ python3 image_downloader.py "stroller on street" -e Google
Scraping From Google Image Search ...
Keywords: stroller on street
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=stroller%20on%20street&safe=off
== 0 out of 0 crawled images urls will be used.
Finished.
It works with Bing and Baidu, but the search results from these two engines are not satisfactory. How can I fix this? Thanks!
While deploying and debugging the code, I kept getting this error:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
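After some research, I found that a few arguments need to be added.
The "--no-sandbox" argument allows Chrome to run as root.
Editing crawler.py in the project, around line 330, and adding these arguments solved the problem: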
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--headless')
When downloading from Google, I set up the proxy as:
proxy_type = "socks5"
proxy = "127.0.0.1:1080"
The download failed. The error message is:
Do you know why? Thanks.
In google image search, you can click "tools" and then pick small, medium, or large resolution for the search results.
Is there a way to limit Image-Downloader to a particular resolution when doing a google image search?
Thanks!
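At first I used a VPN and could not crawl anything.
Then I set up an Alibaba Cloud server in Silicon Valley (Server 2016), and still could not crawl.
Pinging Google and browsing Google in a web browser both work fine.
The following image links all display normally when entered in the browser, but the downloads fail.
Environment:
Python 3.7.4
Mac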
I would like to rename the downloaded files using the keywords.
Thanks! If you need any help, let me know.
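One possible way to name files after the keywords, as a sketch only: the function name `keyword_filename`, the zero-padded index, and the `.jpg` fallback are all assumptions for illustration, not how Image-Downloader names files today.

```python
import os
import re


def keyword_filename(keywords, index, url_or_path):
    """Build a name like 'model_rocket_0003.png' from the search keywords."""
    # Replace anything other than word characters and hyphens with underscores
    stem = re.sub(r"[^\w\-]+", "_", keywords.strip()).strip("_")
    # Take the extension from the original name, dropping any query string
    ext = os.path.splitext(url_or_path.split("?")[0])[1].lower() or ".jpg"
    return f"{stem}_{index:04d}{ext}"
```

The index keeps names unique within one run, and non-ASCII keywords are kept as they are because `\w` matches Unicode letters in Python 3.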
Installation on Archlinux:
Linux archz800 4.9.56-1-lts #1 SMP Thu Oct 12 22:34:15 CEST 2017 x86_64 GNU/Linux
phantomjs Version : 2.1.1-8
Change line 9 of image_downloader_gui.spec to:
datas=[("bin/phantomjs", "bin/")],
Then it works like this:
git clone https://github.com/sczhengyabin/Image-Downloader
mkvirtualenv -p python3 Image-Downloader
cd Image-Downloader
pip3 install PyQt5
pip3 install -r requirements.txt
pip3 install pyinstaller
pacman -S phantomjs
which phantomjs
cd /usr/bin/phantomjs
ln -s /usr/bin/phantomjs ~/Snakepit/Image-Downloader/bin/
pyinstaller image_downloader_gui.spec
Would it be possible to retrieve similar images by upload or specifying urls?
thanks!
'NoneType' object has no attribute 'lower'
When I use the command-line version of Image Downloader, I get:
[0621/105054.701:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -200
I am running it with:
python image_downloader.py model_rocket --output=./images --max-number=1000 --num-threads=50 --engine=Google
About half of the images error out and do not download.
When it finishes, I get:
## Fail: https://www.electroschematics.com/wp-content/uploads/2014/03/Rocket-Launch-Controller.png?fit=687%2C478 (ProtocolError('Connection aborted.', OSError("(10054, 'WSAECONNRESET')")),)
[16:58:11] -e Bing -d chrome -n 100 -j 50 -o "./download_images/man" -S "man"
[16:58:11] Scraping From Bing Image Search ...
[16:58:11] Keywords: man
[16:58:11] Number: 100
[16:58:11] Face Only: False
[16:58:11] Safe Mode: True
[16:58:11] Query URL: https://www.bing.com/images/search?&q=man&qft=
[16:58:12] Exception in thread
[16:58:12] Thread-1
[16:58:12] :
[16:58:12] Traceback (most recent call last):
[16:58:12] File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
[16:58:12] self.run()
[16:58:12] File "/usr/lib/python3.8/threading.py", line 870, in run
[16:58:12] self._target(*self._args, **self._kwargs)
[16:58:12] File "/mnt/e/ai/Image-Downloader/image_downloader.py", line 59, in main
[16:58:12] crawled_urls = crawler.crawl_image_urls(args.keywords,
[16:58:12] File "/mnt/e/ai/Image-Downloader/crawler.py", line 345, in crawl_image_urls
[16:58:12] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
[16:58:12] self.service.start()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
[16:58:12] self.assert_process_still_running()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
[16:58:12] raise WebDriverException(
[16:58:12] selenium.common.exceptions
[16:58:12] .
[16:58:12] WebDriverException
[16:58:12] :
[16:58:12] Message: Service ./bin/chromedriver unexpectedly exited. Status code was: 127
[16:58:13] stopped
[11:00:04] -e Baidu -n 100 -j 50 -o "./download_images/微笑" -S "微笑"
[11:00:04] Scraping From Baidu Image Search ...
[11:00:04] Keywords: 微笑
[11:00:04] Number: 100
[11:00:04] Face Only: False
[11:00:04] Safe Mode: True
[11:00:04] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E5%BE%AE%E7%AC%91
[11:01:05] Exception in thread
[11:01:05] Thread-1
[11:01:05] :
[11:01:05] Traceback (most recent call last):
[11:01:05] File "F:\python\python\lib\threading.py", line 973, in _bootstrap_inner
[11:01:05] self.run()
[11:01:05] File "F:\python\python\lib\threading.py", line 910, in run
[11:01:05] self._target(*self._args, **self._kwargs)
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\image_downloader.py", line 46, in main
[11:01:05] crawled_urls = crawler.crawl_image_urls(args.keywords,
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\crawler.py", line 160, in crawl_image_urls
[11:01:05] driver = webdriver.PhantomJS(executable_path=phantomjs_path,
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 51, in __init__
[11:01:05] self.service.start()
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 82, in start
[11:01:05] raise WebDriverException(
[11:01:05] selenium.common.exceptions
[11:01:05] .
[11:01:05] WebDriverException
[11:01:05] :
[11:01:05] Message: Can not connect to GhostDriver on port 14423
[11:01:05] stopped