
qianyantech / image-downloader


Download images from Google, Bing, and Baidu.

License: MIT License

Python 100.00%
baidu bing google google-images image-downloader pyqt scrapy spider

image-downloader's People

Contributors

chicobentojr, cwchenwang, dependabot[bot], jeffling, sczhengyabin, xingchen1224, zchrissirhcz


image-downloader's Issues

download image failed

Hello, I met this error when running the GUI program.

Fail: http://g.hiphotos.baidu.com/zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (MaxRetryError("HTTPConnectionPool(host='g.hiphotos.baidu.com', port=80): Max retries exceeded with url: /zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000361E208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.mkwd.gov.ph/new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (MaxRetryError("HTTPConnectionPool(host='www.mkwd.gov.ph', port=80): Max retries exceeded with url: /new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000059F2A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.micronbot.com/usr/uploads/2017/03/704946529.png (MaxRetryError("HTTPConnectionPool(host='www.micronbot.com', port=80): Max retries exceeded with url: /usr/uploads/2017/03/704946529.png (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003610A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
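Errno 11004 ("getaddrinfo failed") is raised while resolving the host name, before any HTTP request is sent, so these hosts could not be looked up from this machine at all. A small Python 3 sketch for checking name resolution up front; the helper name can_resolve is hypothetical and not part of the project:

```python
import socket
from urllib.parse import urlsplit


def can_resolve(url: str) -> bool:
    """Return True if the host part of `url` resolves via getaddrinfo()."""
    host = urlsplit(url).hostname
    if not host:
        return False  # not a URL with a host component
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # The same failure that urllib3 reports as "getaddrinfo failed"
        return False
```

Calling it on one of the failing URLs from the log tells you whether the failure is a name-resolution problem on your side rather than something in the downloader.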

Bing Image Search problem.

When using Bing search in Chrome with the window size set to (4000, 3000), it shows more than 500 images.
However, no matter how I change the window size or scroll down the window with a script, the code can only crawl about 200 images.

Anybody know how to solve this?
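If Bing loads more thumbnails as the page is scrolled, one approach is to keep scrolling until the page height stops growing, rather than scrolling a fixed number of times. This is a sketch under that assumption; it will not lift any limit Bing applies on the server side. The function name scroll_until_stable is hypothetical; `driver` is anything with an `execute_script` method, such as a Selenium WebDriver:

```python
import time


def scroll_until_stable(driver, max_rounds=50, pause=1.0):
    """Scroll to the bottom until document height stops changing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded thumbnails time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was appended: assume we reached the end
        last_height = new_height
    return max_rounds
```

Increasing `pause` may help on slow connections, since a height check that runs before new results arrive stops the loop early.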

KeyError: 'listNum'

Thanks a lot for your great work! Recently, however, I ran into the following strange problem.
Any suggestions would help!

Keywords: 杨凡小时候
Number: 100
Face Only: False
Safe Mode: True
Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E6%9D%A8%E5%87%A1%E5%B0%8F%E6%97%B6%E5%80%99
Traceback (most recent call last):
File "Myimage_downloader.py", line 143, in <module>
main(sys.argv[1:])
File "Myimage_downloader.py", line 76, in main
browser=args.driver)
File "Image-Downloader-master\crawler.py", line 326, in crawl_image_urls
proxy=proxy, proxy_type=proxy_type)
File "Image-Downloader-master\crawler.py", line 207, in baidu_get_image_url_using_api
total_num = init_json['listNum']
KeyError: 'listNum'
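The traceback shows `init_json['listNum']` failing because the JSON Baidu returned has no such key, for example when the response layout changes or a different page is served. A defensive sketch that returns 0 instead of crashing; get_total_num is a hypothetical helper, not the project's code:

```python
import json


def get_total_num(response_text: str) -> int:
    """Read 'listNum' from Baidu's JSON, returning 0 if it is missing or invalid."""
    try:
        data = json.loads(response_text, strict=False)
    except json.JSONDecodeError:
        return 0  # not JSON at all (e.g. an HTML page)
    if not isinstance(data, dict):
        return 0
    try:
        return int(data.get("listNum", 0))
    except (TypeError, ValueError):
        return 0
```

A result of 0 can then be logged together with the first part of the response, which usually shows what Baidu sent instead.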

After installing according to the instructions and searching for 龙猫 (Totoro), both Baidu and Bing fail with 'Service' object has no attribute 'process'

Hello, and thanks.

The GUI output is as follows:

[17:40:19] -e Google -n 20 -j 50 -o "./download_images/龙猫" -S "龙猫"
[17:40:19] Scraping From Google Image Search ...
[17:40:19] Keywords: 龙猫
[17:40:19] Number: 20
[17:40:19] Face Only: False
[17:40:19] Safe Mode: True
[17:40:19] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=%E9%BE%99%E7%8C%AB&safe=on
[17:40:19] Exception in thread Thread-7:
[17:40:19] Traceback (most recent call last):
[17:40:19] File "threading.py", line 914, in _bootstrap_inner
[17:40:19] File "threading.py", line 862, in run
[17:40:19] File "Image-Downloader-master\image_downloader.py", line 52, in main
[17:40:19] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[17:40:19] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 74, in start
[17:40:19] File "subprocess.py", line 640, in __init__
[17:40:19] File "subprocess.py", line 848, in _get_handles
[17:40:19] OSError: [WinError 6] The handle is invalid.
[17:40:19] Exception ignored in: <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x000002335A2F50F0>>
[17:40:19] Traceback (most recent call last):
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 173, in __del__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 145, in stop
[17:40:19] AttributeError: 'Service' object has no attribute 'process'
[17:40:20] stopped

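I searched for many solutions, but none of them worked, so I'm asking here. I hope to get some guidance that may also help others who run into the same problem.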

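Not sure where to put the expert's change

Baidu changed something recently, so the maintainer needs to update the code.
In crawler.py, inside baidu_get_image_url_using_api, add headers to the call res = requests.get(init_url, proxies=proxies):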
headers = {
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
}
init_url="https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&lm=7&fp=result&ie=utf-8&oe=utf-8&st=-1&word=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&queryWord=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&face=0&pn=0&rn=30"
- res = requests.get(init_url, proxies=proxies)
+ res = requests.get(init_url, proxies=proxies, headers=headers)

Originally posted by @ald2004 in #29 (comment)
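To see exactly what the patched call sends without contacting Baidu, the request can be prepared but not sent. A minimal sketch using `requests.Request(...).prepare()`; prepare_baidu_request is a hypothetical helper, and the header values are the ones from the comment above:

```python
import requests

# Browser-like headers from the comment above. The Accept value ends in the
# standard "*/*;q=0.8" wildcard, which appears to have been lost in rendering.
HEADERS = {
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
}


def prepare_baidu_request(url):
    """Build (without sending) the GET request the patched line would send."""
    return requests.Request("GET", url, headers=HEADERS).prepare()


if __name__ == "__main__":
    req = prepare_baidu_request(
        "https://image.baidu.com/search/acjson?tn=resultjson_com&pn=0&rn=30")
    for name, value in req.headers.items():
        print(f"{name}: {value}")
    # In crawler.py the equivalent one-liner is:
    #     res = requests.get(init_url, proxies=proxies, headers=HEADERS)
```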

Exception in thread Thread-1

[18:33:35] -e Baidu -d chrome_headless -n 100 -j 50 -o "./download_images/dog" -S "dog"
[18:33:35] Scraping From Baidu Image Search ...
[18:33:35] Keywords: dog
[18:33:35] Number: 100
[18:33:35] Face Only: False
[18:33:35] Safe Mode: True
[18:33:35] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=dog
[18:33:38] Exception in thread Thread-1:
[18:33:38] Traceback (most recent call last):
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
[18:33:38] self.run()
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 865, in run
[18:33:38] self._target(*self._args, **self._kwargs)
[18:33:38] File "/home/yuran/project/Image-Downloader/image_downloader.py", line 54, in main
[18:33:38] browser=args.driver)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 315, in crawl_image_urls
[18:33:38] proxy=proxy, proxy_type=proxy_type)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 195, in baidu_get_image_url_using_api
[18:33:38] res = requests.get(init_url, proxies=proxies)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 75, in get
[18:33:38] return request('get', url, params=params, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 60, in request
[18:33:38] return session.request(method=method, url=url, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
[18:33:38] resp = self.send(prep, **send_kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in send
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in <listcomp>
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 165, in resolve_redirects
[18:33:38] raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
[18:33:38] requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
[18:33:38] stopped
What is causing this?
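TooManyRedirects means the server kept answering with redirects until requests gave up (30 by default, matching "Exceeded 30 redirects"). Fetching with `allow_redirects=False` shows where the first hop points, which often reveals the loop. A self-contained sketch that reproduces the error against a toy local server; the handler and the diagnose helper are illustrative only and are not Baidu's behaviour:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoopHandler(BaseHTTPRequestHandler):
    """Toy server that redirects every request back to the same path."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


def diagnose(url, max_redirects=5):
    """Return (first status, first Location, whether following redirects loops)."""
    session = requests.Session()
    session.trust_env = False  # demo only: ignore proxy environment variables
    session.max_redirects = max_redirects
    first = session.get(url, allow_redirects=False, timeout=5)
    try:
        session.get(url, timeout=5)
        looped = False
    except requests.exceptions.TooManyRedirects:
        looped = True
    return first.status_code, first.headers.get("Location"), looped


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), LoopHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(diagnose(f"http://127.0.0.1:{server.server_port}/search?word=dog"))
    # (302, '/search?word=dog', True)
    server.shutdown()
```

Against the real `init_url`, printing `first.headers.get("Location")` is the interesting part: a redirect to a verification or login page would suggest the request is being treated as a bot.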

Is there a restriction on the Chrome version?

With Chrome 118, I keep getting this error:
[15:30:06] Checking Google Chrome and chromedriver ...
[15:30:07] WARNING:root:Can not find chromedriver for currently installed chrome version.
[15:30:07] Dependencies not resolved, exit.
[15:30:07] stopped

Does this program require a specific version of Chrome?
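ChromeDriver is generally expected to share its major version with the installed Chrome. One plausible explanation here (not confirmed in this thread) is that, from Chrome 115 on, ChromeDriver builds are published through the "Chrome for Testing" channel rather than the old download location, so a lookup written for older versions may not find a driver for 118. A sketch that compares the major versions from the output of `google-chrome --version` and `chromedriver --version`; the function names are hypothetical:

```python
import re
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text like 'Google Chrome 118.0.5993.70'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True if Chrome and ChromeDriver report the same major version."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver
```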

No module named 'PyQt5'

C:\Users\James\Desktop\articles\Image-Downloader>pip install PyQt5
Collecting PyQt5
Using cached PyQt5-5.15.9-cp37-abi3-win_amd64.whl (6.8 MB)
Collecting PyQt5-Qt5>=5.15.2
Using cached PyQt5_Qt5-5.15.2-py3-none-win_amd64.whl (50.1 MB)
Collecting PyQt5-sip<13,>=12.11
Using cached PyQt5_sip-12.11.1-cp39-cp39-win_amd64.whl (78 kB)
Installing collected packages: PyQt5-Qt5, PyQt5-sip, PyQt5
Successfully installed PyQt5-5.15.9 PyQt5-Qt5-5.15.2 PyQt5-sip-12.11.1

(yolo_classify) C:\Users\James\Desktop\articles\Image-Downloader>image_downloader_gui.py
Traceback (most recent call last):
File "C:\Users\James\Desktop\articles\Image-Downloader\image_downloader_gui.py", line 7, in <module>
from mainwindow import MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\mainwindow.py", line 5, in <module>
from ui_mainwindow import Ui_MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\ui_mainwindow.py", line 11, in <module>
from PyQt5 import QtCore, QtGui, QtWidgets
ModuleNotFoundError: No module named 'PyQt5'
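pip installed a cp39 (Python 3.9) wheel, but the script was started as `image_downloader_gui.py` without `python` in front. On Windows that runs whatever interpreter the .py file association points to, which may not be the environment pip installed into (an assumption worth checking, not confirmed in the report). A small sketch that prints which interpreter is running and whether it can see a module; report_module is a hypothetical helper:

```python
import importlib.util
import sys


def report_module(name: str) -> str:
    """Describe the running interpreter and whether it can import `name`."""
    found = importlib.util.find_spec(name) is not None
    status = "available" if found else "NOT importable"
    major, minor = sys.version_info[:2]
    return f"{sys.executable} (Python {major}.{minor}): {name} {status}"


if __name__ == "__main__":
    print(report_module("PyQt5"))
```

If it reports a different interpreter from the one pip used, running `python image_downloader_gui.py` inside the activated environment avoids the mismatch.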

How do I use this project?

I downloaded the project and installed the required dependencies.
Next, I edited main.py and ran it.
But it just hangs without any response, as shown below:

sunner@sunner-All-Series:~/Save/Google-Image-Downloader/src$ python main.py 

Scraping From Google Image Search ...

Keywords:   orange
Number:     100
Face Only:  Yes
Safe Mode:  On
Query URL:  https://www.google.com/search?tbm=isch&q=orange&tbs=itp:face&safe=on

It stays in this state for several minutes until I interrupt it from the keyboard (Ctrl+C).
No error message appears.
Am I doing something wrong?
My environment is ubuntu 14.04, Python 2.7.6.
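When a step such as loading the search page blocks forever, wrapping it in a timeout at least turns the silent hang into a visible error. A Python 3 sketch (the reporter is on Python 2.7, where `concurrent.futures` is only available as a backport); call_with_timeout is a hypothetical helper and does not kill the stuck work:

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread, waiting at most `timeout` s.

    Raises concurrent.futures.TimeoutError if the call has not finished.
    The worker thread is NOT stopped; only the caller stops waiting.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    finally:
        executor.shutdown(wait=False)


if __name__ == "__main__":
    try:
        call_with_timeout(time.sleep, 0.2, 1)  # stands in for a stuck page load
    except concurrent.futures.TimeoutError:
        print("gave up after 0.2 s")
```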

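Hi! I'm a student learning AI, and I'm using your crawler to scrape Baidu images. I'd like to ask for help: when I open the GUI, select Baidu, enter a keyword, and click Start, I get the following error: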

[10:07:57] -e Baidu -d chrome_headless -n 100 -j 50 -o "E:/zb/code/images/mouse" -S "mouse"
[10:07:57] Scraping From Baidu Image Search ...
[10:07:57] Keywords: mouse
[10:07:57] Number: 100
[10:07:57] Face Only: False
[10:07:57] Safe Mode: True
[10:07:57] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=mouse
[10:07:57] Exception in thread Thread-3:
[10:07:57] Traceback (most recent call last):
[10:07:57] File "D:\软件\python64\lib\threading.py", line 954, in _bootstrap_inner
[10:07:57] self.run()
[10:07:57] File "D:\软件\python64\lib\threading.py", line 892, in run
[10:07:57] self._target(*self._args, **self._kwargs)
[10:07:57] File "E:\zb\code\Image-Downloader\image_downloader.py", line 50, in main
[10:07:57] crawled_urls = crawler.crawl_image_urls(args.keywords,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 325, in crawl_image_urls
[10:07:57] image_urls = baidu_get_image_url_using_api(keywords, max_number=max_number, face_only=face_only,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 206, in baidu_get_image_url_using_api
[10:07:57] init_json = json.loads(res.text.replace(r"'", ""), encoding='utf-8', strict=False)
[10:07:57] File "D:\软件\python64\lib\json\__init__.py", line 359, in loads
[10:07:57] return cls(**kw).decode(s)
[10:07:57] TypeError: __init__() got an unexpected keyword argument 'encoding'
[10:07:57] stopped
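The `encoding` keyword of `json.loads()` had been ignored for years and was removed in Python 3.9, so the call in crawler.py line 206 fails on newer interpreters. Since `res.text` is already a `str`, the argument can simply be dropped. A sketch of the corrected call, wrapped in a hypothetical helper:

```python
import json


def parse_baidu_json(text: str):
    # Same as the crawler's call, minus the removed `encoding` argument.
    # strict=False still allows raw control characters inside strings.
    return json.loads(text.replace("'", ""), strict=False)
```

In crawler.py itself this corresponds to `init_json = json.loads(res.text.replace(r"'", ""), strict=False)`.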

JSONDecodeError

When I click the "Start" button, I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
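"Expecting value: line 1 column 1 (char 0)" means the very first character of the response is not valid JSON, which typically happens when the body is empty or is an HTML page instead of JSON. A sketch that reports what was actually received; loads_or_explain is a hypothetical helper:

```python
import json


def loads_or_explain(text: str, preview_chars: int = 80):
    """Decode JSON, or raise ValueError showing the start of the response."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        preview = text[:preview_chars].strip() or "<empty body>"
        raise ValueError(
            f"Expected JSON but got: {preview!r} ({err.msg} at pos {err.pos})"
        ) from err
```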

Not downloading any images

`python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.0:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.`

I also tried the GUI, but it doesn't work either. Please help.

Can you help me fix this problem?

[09:57:20] -e Baidu -n 100 -j 4 -o "./download_images/rain" -S "rain"
[09:57:20] Scraping From Baidu Image Search ...
[09:57:20] Keywords: rain
[09:57:20] Number: 100
[09:57:20] Face Only: False
[09:57:20] Safe Mode: True
[09:57:20] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=rain
[09:57:24] Exception in thread Thread-4:
[09:57:24] Traceback (most recent call last):
[09:57:24] File "threading.py", line 914, in _bootstrap_inner
[09:57:24] File "threading.py", line 862, in run
[09:57:24] File "Image-Downloader-master\image_downloader.py", line 52, in main
[09:57:24] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[09:57:24] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 96, in start
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 109, in assert_process_still_running
[09:57:24] selenium.common.exceptions.WebDriverException: Message: Service C:\Users\wyy\AppData\Local\Temp_MEI145722/bin/phantomjs.exe unexpectedly exited. Status code was: 4294967295
[09:57:25] stopped
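The odd status code 4294967295 is 2^32 - 1, i.e. the value -1 reported as an unsigned 32-bit number, so phantomjs.exe simply exited with -1. That confirms an abnormal exit but not its cause. The arithmetic as a short sketch:

```python
def as_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit status as a signed integer."""
    return code - (1 << 32) if code >= (1 << 31) else code


print(as_signed32(4294967295))  # -1
```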

Download failed with the following error

Hello, after installing it, not a single image downloads. What could be the reason?
[14:54:36] Keywords: 货拉拉
[14:54:36] Number: 10
[14:54:36] Face Only: True
[14:54:36] Safe Mode: True
[14:54:36] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E8%B4%A7%E6%8B%89%E6%8B%89&face=1
[14:54:38] == 10 out of 10 crawled images urls will be used.
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/503d269759ee3d6d6a6a6c9a54166d224f4ade37.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/738b4710b912c8fc175f1d9ceb039245d6882137.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/b64543a98226cffc539816d9ae014a90f603ea2f.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/fd039245d688d43f5d067aa36a1ed21b0ef43b30.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/dbb44aed2e738bd4af8e5960b68b87d6277ff930.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/d4628535e5dde711047da8a7b0efce1b9d166130.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/d439b6003af33a8781c3eb4bd15c10385343b531.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/dbb44aed2e738bd4af8f5960b68b87d6277ff931.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/cf1b9d16fdfaaf51684d6aee9b5494eef01f7a31.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] ## Fail: https://gss0.baidu.com/7LsWdDW5_xN3otqbppnN2DJv/forum/pic/item/0b46f21fbe096b63a7b6b5011b338744ebf8ac30.jpg ("'NoneType' object has no attribute 'startswith'",)
[14:54:38] Finished.
[14:54:39] stopped
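"'NoneType' object has no attribute 'startswith'" means some value the downloader expected to be a string was `None`. The relevant code is not shown here, so this is an assumption: a common case is a missing response header such as Content-Type. A sketch of a guard that treats a missing value as "not an image"; is_image_response is hypothetical:

```python
from typing import Mapping, Optional


def is_image_response(headers: Mapping[str, Optional[str]]) -> bool:
    """Treat a missing or None Content-Type as 'not an image' instead of crashing."""
    content_type = headers.get("Content-Type") or ""
    return content_type.lower().startswith("image/")
```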

error

raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)

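Limit on the number of images

Results now load as you scroll down, so could the actual number of images be far larger than what is currently crawled?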

No url found for google or bing

No matter which keywords I use, the Google and Bing engines always report "== 0 out of 0 crawled images urls will be used". Only Baidu works. Any clue?

Error when downloading pics using chrome

Hi,
The following error occurs when I try to run the script:

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

Any help would be appreciated!
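"cannot find Chrome binary" means ChromeDriver started but could not locate the Chrome browser itself. A sketch that searches PATH for a few common executable names (the candidate list is an assumption; adjust it for your install); the path it returns can be assigned to Selenium's `ChromeOptions.binary_location`:

```python
import shutil
from typing import Iterable, Optional

# Common Chrome/Chromium executable names; an assumption, not exhaustive.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium",
              "chromium-browser", "chrome")


def find_chrome(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the full path of the first matching executable found on PATH."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If it returns `None`, Chrome is either not installed or installed outside PATH; on Windows the browser usually lives under Program Files and has to be passed by full path.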

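Number of images crawled from Baidu

1. After I specify how many images to crawl, why can only part of them be crawled? For example, I set 10000 but only about 2000 are actually crawled, even though I raised the upper limit in your code, and a manual search for the keyword returns far more than 2000 results.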
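Baidu's acjson endpoint is paged: in the `init_url` quoted earlier, `pn` is the result offset and `rn` the page size. A sketch that generates one URL per page; whether Baidu keeps returning results for large offsets is not guaranteed, and a server-side cap would explain a ceiling around 2000 regardless of the limit set in the crawler (an assumption, not verified). baidu_page_urls is hypothetical:

```python
from urllib.parse import urlencode

BASE = "https://image.baidu.com/search/acjson"


def baidu_page_urls(keyword: str, total: int, page_size: int = 30):
    """Yield one acjson URL per page, advancing the `pn` offset by `page_size`."""
    for offset in range(0, total, page_size):
        params = {
            "tn": "resultjson_com", "ipn": "rj", "ct": 201326592,
            "fp": "result", "ie": "utf-8", "oe": "utf-8",
            "word": keyword, "queryWord": keyword,
            "pn": offset, "rn": page_size,
        }
        yield f"{BASE}?{urlencode(params)}"
```

Logging how many results each page actually returns shows at which offset the results stop.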

Support for Yandex

Hello.
Could you add support for https://yandex.ru/images/?
This search engine's censorship is less strict than Google's,
and it also removes fewer photos over copyright infringement.

Your prompt reply will be highly appreciated

Any ideas?

[15:04:41] -e Google -d chrome_headless -n 100 -j 50 -o "./download_images/as" -S "as"
[15:04:41] Scraping From Google Image Search ...
[15:04:41] Keywords: as
[15:04:41] Number: 100
[15:04:41] Face Only: False
[15:04:41] Safe Mode: True
[15:04:41] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=as&safe=on
[15:04:41] Exception in thread Thread-1:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
[15:04:41] stdin=PIPE)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
[15:04:41] restore_signals, start_new_session)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
[15:04:41] startupinfo)
[15:04:41] FileNotFoundError: [WinError 2] The system cannot find the file specified
[15:04:41] During handling of the above exception, another exception occurred:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
[15:04:41] self.run()
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
[15:04:41] self._target(*self._args, **self._kwargs)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\image_downloader.py", line 54, in main
[15:04:41] browser=args.driver)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\crawler.py", line 300, in crawl_image_urls
[15:04:41] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
[15:04:41] self.service.start()
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
[15:04:41] os.path.basename(self.path), self.start_error_message)
[15:04:41] selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
[15:04:42] stopped
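The final message says Selenium could not start `chromedriver` because it is not on PATH. A sketch that looks for it in a project-local directory first and then on PATH, using `shutil.which(..., path=...)`; the `./bin` default is an assumption about where the driver was unpacked, and locate_chromedriver is hypothetical:

```python
import os
import shutil
from typing import Optional


def locate_chromedriver(extra_dir: str = "./bin") -> Optional[str]:
    """Find chromedriver, searching `extra_dir` before the normal PATH."""
    name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    search_path = os.pathsep.join(
        [os.path.abspath(extra_dir), os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)
```

The returned path can be passed as `chrome_path` in the `webdriver.Chrome(chrome_path, ...)` call shown in the traceback.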

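Problem downloading Google images through a proxy server from the Windows 10 command line

System: Windows 10, command-line mode, browser: Chrome. Google images cannot be downloaded through a proxy server.
Traceback:
Message: no such element: Unable to locate element: {"method":"id","selector":"smb"}
Throughout this process the network configuration is correct: in the corresponding Chrome window I can see the Google Images link open and the matching images load.

Could you help look into this problem?
Thanks.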

It fails to download anything

[18:09:39]   -e Google -d chrome_headless -n 10 -j 50 -o "./download_images/pear" "pear"
[18:09:39]  Scraping From Google Image Search ...
[18:09:39]  Keywords:  pear
[18:09:39]  Number:  10
[18:09:39]  Face Only:  False
[18:09:39]  Safe Mode:  False
[18:09:39]  Query URL:  https://www.google.com/search?tbm=isch&hl=en&q=pear&safe=off
[18:09:42]  Exception in thread Thread-1:
[18:09:42]  Traceback (most recent call last):
[18:09:42]    File "C:\Python37\lib\threading.py", line 917, in _bootstrap_inner
[18:09:42]      self.run()
[18:09:42]    File "C:\Python37\lib\threading.py", line 865, in run
[18:09:42]      self._target(*self._args, **self._kwargs)
[18:09:42]    File "C:\Python37\Image-Downloader\image_downloader.py", line 54, in main
[18:09:42]      browser=args.driver)
[18:09:42]    File "C:\Python37\Image-Downloader\crawler.py", line 282, in crawl_image_urls
[18:09:42]      driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 81, in __init__
[18:09:42]      desired_capabilities=desired_capabilities)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
[18:09:42]      self.start_session(capabilities, browser_profile)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
[18:09:42]      response = self.execute(Command.NEW_SESSION, parameters)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
[18:09:42]      self.error_handler.check_response(response)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
[18:09:42]      raise exception_class(message, screen, stacktrace)
[18:09:42]  selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 83
[18:09:43]  stopped

I have already installed all the requirements. I'm using Python 3.7.2 and ChromeDriver 81.


Error DevToolsActivePort file doesn't exist

While deploying and debugging the code, I kept getting this error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

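After some research, I found that a few arguments need to be added.
The "--no-sandbox" argument lets Chrome run with root privileges.

Modify the project file crawler.py around line 330 and add these arguments to fix the problem: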
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--headless')

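Images can now be downloaded. Many thanks to the project author!

I also modified this part of crawler.py: splitting the statement across lines caused a syntax error, so I put the code on one line and the error went away.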
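Chrome refuses to start as root unless its sandbox is disabled, and `--disable-dev-shm-usage` makes it write shared-memory files to a temporary directory instead of the often tiny `/dev/shm` found in containers and some VMs. A sketch that builds the flags from the comment above but only drops the sandbox when actually running as root; chrome_arguments is a hypothetical helper, not the project's code:

```python
import os


def chrome_arguments(headless: bool = True) -> list:
    """Chrome flags for servers and containers."""
    args = ["--disable-dev-shm-usage"]  # use a temp dir instead of small /dev/shm
    if headless:
        args.append("--headless")
    # os.geteuid() only exists on POSIX; euid 0 means we are root.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        args.append("--no-sandbox")
    return args
```

In crawler.py the result would be applied with `for arg in chrome_arguments(): chrome_options.add_argument(arg)`, which also keeps each call on a single line.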

Is there a way to set the image resolution?

In Google Images, you can click "Tools" and then pick a small, medium, or large size for the search results.

Is there a way to limit Image-Downloader to a particular resolution when doing a google image search?

Thanks!
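The project's query URLs already carry Google's `tbs` filter parameter (see `tbs=itp:face` in the face-only Query URL earlier on this page). A sketch that adds a size filter the same way; the `isz:*` values below are an assumed, undocumented mapping of the Tools > Size menu and may change at any time. google_image_url is hypothetical:

```python
from urllib.parse import urlencode

# Assumed tbs codes for Google Images "Tools > Size"; undocumented.
SIZE_FILTERS = {"large": "isz:l", "medium": "isz:m", "icon": "isz:i"}


def google_image_url(query, size=None, face_only=False, safe=True):
    """Build a Google Images search URL with optional tbs filters."""
    tbs = []
    if face_only:
        tbs.append("itp:face")
    if size:
        tbs.append(SIZE_FILTERS[size])
    params = {"tbm": "isch", "hl": "en", "q": query,
              "safe": "on" if safe else "off"}
    if tbs:
        params["tbs"] = ",".join(tbs)  # multiple filters are comma-separated
    return "https://www.google.com/search?" + urlencode(params)
```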

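Cannot crawl Google images on Windows

At first I used a VPN, and crawling failed.
Then I set up an Alibaba Cloud server in Silicon Valley (Windows Server 2016), and crawling still failed.
Pinging Google and browsing Google in a web browser both work fine.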

Installation on Archlinux x86_64

Thanks! If you need any help, let me know.

Installation on Archlinux:
Linux archz800 4.9.56-1-lts #1 SMP Thu Oct 12 22:34:15 CEST 2017 x86_64 GNU/Linux
phantomjs Version : 2.1.1-8

Change line 9 of image_downloader_gui.spec to:
datas=[("bin/phantomjs", "bin/")],

Then it works like this:

	git clone https://github.com/sczhengyabin/Image-Downloader
	mkvirtualenv -p python3 Image-Downloader
	cd Image-Downloader
	
	pip3 install PyQt5
	pip3 install -r requirements.txt 
	pip3 install pyinstaller

	pacman -S phantomjs
	which phantomjs    # should print /usr/bin/phantomjs
	ln -s /usr/bin/phantomjs ~/Snakepit/Image-Downloader/bin/

	pyinstaller image_downloader_gui.spec 

Errors

When I use the command-line version of Image Downloader, I get:
[0621/105054.701:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -200
I run it with: python image_downloader.py model_rocket --output=./images --max-number=1000 --num-threads=50 --engine=Google
About half of the images fail with errors and are not downloaded.

When it finishes I get ## Fail: https://www.electroschematics.com/wp-content/uploads/2014/03/Rocket-Launch-Controller.png?fit=687%2C478 (ProtocolError('Connection aborted.', OSError("(10054, 'WSAECONNRESET')")),)
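WSAECONNRESET (10054) means the remote server closed the connection, which can happen intermittently, especially with 50 parallel downloads. Retrying with exponential backoff often recovers such transient failures, and lowering `--num-threads` may also help. A sketch; retry is a hypothetical helper, and requests' connection errors derive from `OSError`, so the default catches them:

```python
import random
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(OSError,)):
    """Call func(), retrying on `exceptions` with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A download would then be wrapped as `retry(lambda: requests.get(url, timeout=10))`.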

Error with Windows 10 + WSL2 Ubuntu 20.04 + Chrome 92.0.4515.107 + ChromeDriver 92.0.4515.43

[16:58:11] -e Bing -d chrome -n 100 -j 50 -o "./download_images/man" -S "man"
[16:58:11] Scraping From Bing Image Search ...
[16:58:11] Keywords: man
[16:58:11] Number: 100
[16:58:11] Face Only: False
[16:58:11] Safe Mode: True
[16:58:11] Query URL: https://www.bing.com/images/search?&q=man&qft=
[16:58:12] Exception in thread Thread-1:
[16:58:12] Traceback (most recent call last):
[16:58:12] File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
[16:58:12] self.run()
[16:58:12] File "/usr/lib/python3.8/threading.py", line 870, in run
[16:58:12] self._target(*self._args, **self._kwargs)
[16:58:12] File "/mnt/e/ai/Image-Downloader/image_downloader.py", line 59, in main
[16:58:12] crawled_urls = crawler.crawl_image_urls(args.keywords,
[16:58:12] File "/mnt/e/ai/Image-Downloader/crawler.py", line 345, in crawl_image_urls
[16:58:12] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
[16:58:12] self.service.start()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
[16:58:12] self.assert_process_still_running()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
[16:58:12] raise WebDriverException(
[16:58:12] selenium.common.exceptions.WebDriverException: Message: Service ./bin/chromedriver unexpectedly exited. Status code was: 127
[16:58:13] stopped
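Exit status 127 conventionally means "command not found"; for a dynamically linked program it can also come from the loader when a required shared library is missing. One thing worth ruling out (an assumption, not confirmed in this report) is that `./bin/chromedriver` is a build for another OS. A sketch that inspects the file's magic bytes; binary_format is hypothetical:

```python
def binary_format(path: str) -> str:
    """Guess an executable's target platform from its first bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "ELF (Linux)"
    if magic[:2] == b"MZ":
        return "PE (Windows .exe)"
    if magic in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe", b"\xca\xfe\xba\xbe"):
        return "Mach-O (macOS)"
    return "unknown"
```

If it reports ELF, running `ldd ./bin/chromedriver` lists any shared libraries that cannot be found.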

driver = webdriver.PhantomJS(executable_path=phantomjs_path, ...) fails with the following error

[11:00:04] -e Baidu -n 100 -j 50 -o "./download_images/微笑" -S "微笑"
[11:00:04] Scraping From Baidu Image Search ...
[11:00:04] Keywords: 微笑
[11:00:04] Number: 100
[11:00:04] Face Only: False
[11:00:04] Safe Mode: True
[11:00:04] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E5%BE%AE%E7%AC%91
[11:01:05] Exception in thread Thread-1:
[11:01:05] Traceback (most recent call last):
[11:01:05] File "F:\python\python\lib\threading.py", line 973, in _bootstrap_inner
[11:01:05] self.run()
[11:01:05] File "F:\python\python\lib\threading.py", line 910, in run
[11:01:05] self._target(*self._args, **self._kwargs)
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\image_downloader.py", line 46, in main
[11:01:05] crawled_urls = crawler.crawl_image_urls(args.keywords,
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\crawler.py", line 160, in crawl_image_urls
[11:01:05] driver = webdriver.PhantomJS(executable_path=phantomjs_path,
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 51, in __init__
[11:01:05] self.service.start()
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 82, in start
[11:01:05] raise WebDriverException(
[11:01:05] selenium.common.exceptions.WebDriverException: Message: Can not connect to GhostDriver on port 14423
[11:01:05] stopped
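The message means Selenium launched PhantomJS but could not reach its GhostDriver endpoint on the chosen port; the timestamps show it waited about a minute first. A quick stdlib check of whether anything is listening on a port; port_accepts_connections is a hypothetical helper, and 14423 is just the port from this log (it changes from run to run):

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable
```

If nothing is listening while PhantomJS is running, something such as a firewall or security tool may be blocking or killing the process.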
