qianyantech / image-downloader

Download images from Google, Bing, Baidu.

License: MIT License

Python 100.00%

baidu bing google google-images image-downloader pyqt scrapy spider

image-downloader's Issues

download image failed

Hello, I met this error when running the GUI program.

Fail: http://g.hiphotos.baidu.com/zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (MaxRetryError("HTTPConnectionPool(host='g.hiphotos.baidu.com', port=80): Max retries exceeded with url: /zhidao/wh%3D450%2C600/sign=e5c2877a6f600c33f02cd6cc2f7c7d39/cefc1e178a82b901c431875d778da9773812efc4.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x000000000361E208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.mkwd.gov.ph/new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (MaxRetryError("HTTPConnectionPool(host='www.mkwd.gov.ph', port=80): Max retries exceeded with url: /new-wordpress/wp-content/uploads/2015/02/asas-001.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000059F2A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
[13:16:30] ## Fail: http://www.micronbot.com/usr/uploads/2017/03/704946529.png (MaxRetryError("HTTPConnectionPool(host='www.micronbot.com', port=80): Max retries exceeded with url: /usr/uploads/2017/03/704946529.png (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003610A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))",),)
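
`[Errno 11004] getaddrinfo failed` in the log above is a DNS resolution failure: the hostname could not be turned into an address, so no connection was ever attempted. A minimal sketch for checking which hosts resolve before retrying a download; the helper name `unresolvable_hosts` and its `resolver` parameter are illustrative and not part of Image-Downloader:

```python
import socket
from urllib.parse import urlsplit


def unresolvable_hosts(urls, resolver=socket.getaddrinfo):
    """Return the sorted hostnames from `urls` that fail DNS resolution."""
    failed = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # not an absolute URL, nothing to resolve
        try:
            resolver(host, 80)
        except socket.gaierror:
            # getaddrinfo failures surface as socket.gaierror
            failed.add(host)
    return sorted(failed)
```

If every host ends up in the returned list, the cause is more likely the local DNS or proxy setup than the individual image servers.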

Bing Image Search problem.

When I use Bing search in Chrome with the window size set to (4000, 3000), it shows more than 500 images.
However, no matter how I change the window size or use a script to scroll the window down, the code can only crawl about 200 images.

Anybody know how to solve this?

Key error: 'listnum'

Thank you for your great work! Recently, however, I ran into a strange problem, shown below.
Any suggestions would help!

Keywords: 杨凡小时候
Number: 100
Face Only: False
Safe Mode: True
Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E6%9D%A8%E5%87%A1%E5%B0%8F%E6%97%B6%E5%80%99
Traceback (most recent call last):
File "Myimage_downloader.py", line 143, in
main(sys.argv[1:])
File "Myimage_downloader.py", line 76, in main
browser=args.driver)
File "Image-Downloader-master\crawler.py", line 326, in crawl_image_urls
proxy=proxy, proxy_type=proxy_type)
File "Image-Downloader-master\crawler.py", line 207, in baidu_get_image_url_using_api
total_num = init_json['listNum']
KeyError: 'listNum'
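
The `KeyError` above means the JSON that came back had no `listNum` key. A minimal sketch of reading the key defensively so the failure is easier to diagnose; `extract_total_num` is a hypothetical helper, not a function from crawler.py:

```python
import json


def extract_total_num(response_text):
    """Return the 'listNum' value from a JSON response, or None if absent."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        return None
    total = data.get("listNum")
    if total is None:
        # Show which keys did arrive so the changed response can be inspected.
        print("listNum missing; keys received:", sorted(data))
    return total
```

A `None` result suggests that the server returned a different payload than the crawler expects, which is worth inspecting before retrying.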

After installing per the instructions and searching for 龙猫, both Baidu and Bing fail with 'Service' object has no attribute 'process'

Hello, and thank you.

The GUI output is as follows:

[17:40:19] -e Google -n 20 -j 50 -o "./download_images/龙猫" -S "龙猫"
[17:40:19] Scraping From Google Image Search ...
[17:40:19] Keywords: 龙猫
[17:40:19] Number: 20
[17:40:19] Face Only: False
[17:40:19] Safe Mode: True
[17:40:19] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=%E9%BE%99%E7%8C%AB&safe=on
[17:40:19] Exception in thread Thread-7:
[17:40:19] Traceback (most recent call last):
[17:40:19] File "threading.py", line 914, in _bootstrap_inner
[17:40:19] File "threading.py", line 862, in run
[17:40:19] File "Image-Downloader-master\image_downloader.py", line 52, in main
[17:40:19] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[17:40:19] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 74, in start
[17:40:19] File "subprocess.py", line 640, in __init__
[17:40:19] File "subprocess.py", line 848, in _get_handles
[17:40:19] OSError: [WinError 6] 句柄无效。
[17:40:19] Exception ignored in:
[17:40:19] <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x000002335A2F50F0>>
[17:40:19] Traceback (most recent call last):
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 173, in __del__
[17:40:19] File "site-packages\selenium\webdriver\common\service.py", line 145, in stop
[17:40:19] AttributeError
[17:40:19] :
[17:40:19] 'Service' object has no attribute 'process'
[17:40:20] stopped

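I searched for many solutions and none of them worked, so I am asking here. I hope to get guidance that can also help others who run into the same problem.

Not sure where to add the suggested change

Baidu changed things recently, so the author needs to update.
In crawler.py, in baidu_get_image_url_using_api, add a header to the call res = requests.get(init_url, proxies=proxies):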
headers = {
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
}
init_url="https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&lm=7&fp=result&ie=utf-8&oe=utf-8&st=-1&word=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&queryWord=%25E7%258E%25A9%25E6%2589%258B%25E6%259C%25BA&face=0&pn=0&rn=30"
-195 res = requests.get(init_url,proxies=proxies)
+196 res = requests.get(init_url,proxies=proxies,headers=headers)

Originally posted by @ald2004 in #29 (comment)
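
The change quoted above passes browser-like headers to `requests.get`. A minimal sketch of that idea as a standalone helper; the name `get_with_headers` and the injectable `get` parameter are assumptions made for illustration, and the header set is a trimmed copy of the one in the quote:

```python
BROWSER_HEADERS = {
    "Accept-Language": "en-US,en;q=0.8",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/webp,*/*;q=0.8",
}


def get_with_headers(url, proxies=None, get=None):
    """Fetch `url` with browser-like headers; `get` defaults to requests.get."""
    if get is None:
        # Third-party dependency, imported lazily so the helper can be
        # exercised with a stand-in `get` when requests is not installed.
        import requests
        get = requests.get
    return get(url, proxies=proxies, headers=BROWSER_HEADERS)
```

In crawler.py the equivalent edit is the one-line diff shown in the quote: add `headers=headers` to the existing `requests.get` call.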

Newer Selenium versions no longer support PhantomJS; do I need to use an old version?

File "/Users/moo/project/01_yanglian/Image-Downloader/crawler.py", line 27, in
dcap = dict(DesiredCapabilities.PHANTOMJS)
AttributeError: type object 'DesiredCapabilities' has no attribute 'PHANTOMJS'

Exception in thread Thread-1

[18:33:35] -e Baidu -d chrome_headless -n 100 -j 50 -o "./download_images/dog" -S "dog"
[18:33:35] Scraping From Baidu Image Search ...
[18:33:35] Keywords: dog
[18:33:35] Number: 100
[18:33:35] Face Only: False
[18:33:35] Safe Mode: True
[18:33:35] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=dog
[18:33:38] Exception in thread Thread-1:
[18:33:38] Traceback (most recent call last):
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
[18:33:38] self.run()
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/threading.py", line 865, in run
[18:33:38] self._target(*self._args, **self._kwargs)
[18:33:38] File "/home/yuran/project/Image-Downloader/image_downloader.py", line 54, in main
[18:33:38] browser=args.driver)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 315, in crawl_image_urls
[18:33:38] proxy=proxy, proxy_type=proxy_type)
[18:33:38] File "/home/yuran/project/Image-Downloader/crawler.py", line 195, in baidu_get_image_url_using_api
[18:33:38] res = requests.get(init_url, proxies=proxies)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 75, in get
[18:33:38] return request('get', url, params=params, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/api.py", line 60, in request
[18:33:38] return session.request(method=method, url=url, **kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
[18:33:38] resp = self.send(prep, **send_kwargs)
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in send
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 668, in
[18:33:38] history = [resp for resp in gen] if allow_redirects else []
[18:33:38] File "/home/yuran/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 165, in resolve_redirects
[18:33:38] raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
[18:33:38] requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
[18:33:38] stopped
What is causing this?

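Question about Windows setup for your software

Hello, I have downloaded the required Qt5 and Python 3, but I don't understand the later run and copy steps. Could you explain them in more detail?

Is there a restriction on the Chrome version?

With Chrome version 118, I keep getting this error: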
[15:30:06] Checking Google Chrome and chromedriver ...
[15:30:07] WARNING:root:Can not find chromedriver for currently installed chrome version.
[15:30:07] Dependencies not resolved, exit.
[15:30:07] stopped

Does running this program require a specific Chrome version?

No module named 'PyQt5'

C:\Users\James\Desktop\articles\Image-Downloader>pip install PyQt5
Collecting PyQt5
Using cached PyQt5-5.15.9-cp37-abi3-win_amd64.whl (6.8 MB)
Collecting PyQt5-Qt5>=5.15.2
Using cached PyQt5_Qt5-5.15.2-py3-none-win_amd64.whl (50.1 MB)
Collecting PyQt5-sip<13,>=12.11
Using cached PyQt5_sip-12.11.1-cp39-cp39-win_amd64.whl (78 kB)
Installing collected packages: PyQt5-Qt5, PyQt5-sip, PyQt5
Successfully installed PyQt5-5.15.9 PyQt5-Qt5-5.15.2 PyQt5-sip-12.11.1

(yolo_classify) C:\Users\James\Desktop\articles\Image-Downloader>image_downloader_gui.py
Traceback (most recent call last):
File "C:\Users\James\Desktop\articles\Image-Downloader\image_downloader_gui.py", line 7, in
from mainwindow import MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\mainwindow.py", line 5, in
from ui_mainwindow import Ui_MainWindow
File "C:\Users\James\Desktop\articles\Image-Downloader\ui_mainwindow.py", line 11, in
from PyQt5 import QtCore, QtGui, QtWidgets
ModuleNotFoundError: No module named 'PyQt5'
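
In the log above, `pip install` runs from a prompt with no environment prefix and picks a cp39 wheel for PyQt5-sip, while the script runs from a prompt prefixed with `(yolo_classify)`. One possible cause, which the issue does not confirm, is that PyQt5 was installed into a different interpreter than the one running the GUI. A minimal sketch for checking this; `module_available` is a hypothetical helper:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter that launches image_downloader_gui.py.
    print("interpreter:", sys.executable)
    print("PyQt5 importable:", module_available("PyQt5"))
```

If this prints `False`, running `python -m pip install PyQt5` from the same prompt installs the package into that interpreter.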

How to use this project

I downloaded the project and installed the corresponding dependencies.
Next, I edited main.py and ran it.
But it just hangs without any response, as shown below:

sunner@sunner-All-Series:~/Save/Google-Image-Downloader/src$ python main.py 

Scraping From Google Image Search ...

Keywords:   orange
Number:     100
Face Only:  Yes
Safe Mode:  On
Query URL:  https://www.google.com/search?tbm=isch&q=orange&tbs=itp:face&safe=on

The status stays like this for several minutes until I interrupt it from the keyboard.
However, no error message appears.
Is there a problem with how I am running it?
My environment is ubuntu 14.04, Python 2.7.6.

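Hi, I am a student learning AI and I am using your crawler to fetch Baidu images. Asking for help: I open the GUI, choose Baidu, enter a keyword, and click Start, and then get the following error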

[10:07:57] -e Baidu -d chrome_headless -n 100 -j 50 -o "E:/zb/code/images/mouse" -S "mouse"
[10:07:57] Scraping From Baidu Image Search ...
[10:07:57] Keywords: mouse
[10:07:57] Number: 100
[10:07:57] Face Only: False
[10:07:57] Safe Mode: True
[10:07:57] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=mouse
[10:07:57] Exception in thread
[10:07:57] Thread-3
[10:07:57] :
[10:07:57] Traceback (most recent call last):
[10:07:57] File "D:\软件\python64\lib\threading.py", line 954, in _bootstrap_inner
[10:07:57] self.run()
[10:07:57] File "D:\软件\python64\lib\threading.py", line 892, in run
[10:07:57] self._target(*self._args, **self._kwargs)
[10:07:57] File "E:\zb\code\Image-Downloader\image_downloader.py", line 50, in main
[10:07:57] crawled_urls = crawler.crawl_image_urls(args.keywords,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 325, in crawl_image_urls
[10:07:57] image_urls = baidu_get_image_url_using_api(keywords, max_number=max_number, face_only=face_only,
[10:07:57] File "E:\zb\code\Image-Downloader\crawler.py", line 206, in baidu_get_image_url_using_api
[10:07:57] init_json = json.loads(res.text.replace(r"'", ""), encoding='utf-8', strict=False)
[10:07:57] File "D:\软件\python64\lib\json\__init__.py", line 359, in loads
[10:07:57] return cls(**kw).decode(s)
[10:07:57] TypeError
[10:07:57] :
[10:07:57] __init__() got an unexpected keyword argument 'encoding'
[10:07:57] stopped
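
The `encoding` keyword of `json.loads` was removed in Python 3.9, which is consistent with the `TypeError` above. A minimal sketch of the same call without that keyword; `parse_baidu_json` is a hypothetical wrapper, not a function from crawler.py:

```python
import json


def parse_baidu_json(text):
    """Parse a response body like the line in the traceback, minus `encoding`."""
    # strict=False allows control characters inside JSON strings.
    return json.loads(text.replace("'", ""), strict=False)
```

The input is already a `str`, so dropping `encoding='utf-8'` changes nothing about how it is decoded.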

JSONDecodeError

When I click the "start" button, I get the following error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
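
`Expecting value: line 1 column 1 (char 0)` means the text handed to the JSON parser did not begin with a JSON value, which typically happens with an empty body or an HTML page. A minimal sketch that surfaces the start of the body when parsing fails; `loads_or_explain` is a hypothetical helper:

```python
import json


def loads_or_explain(text, preview=80):
    """Parse JSON, or raise ValueError that includes the start of the body."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        snippet = text[:preview] if text else "<empty body>"
        raise ValueError(
            f"response is not JSON ({err.msg}); body starts with: {snippet!r}"
        ) from err
```

Seeing the first characters of the response usually shows whether the server sent an error page, a redirect notice, or nothing at all.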

Not downloading any images

`python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.0:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.`

I also tried the GUI, but it doesn't work either. Please guide me.

Can you help me fix this problem?

[09:57:20] -e Baidu -n 100 -j 4 -o "./download_images/rain" -S "rain"
[09:57:20] Scraping From Baidu Image Search ...
[09:57:20] Keywords: rain
[09:57:20] Number: 100
[09:57:20] Face Only: False
[09:57:20] Safe Mode: True
[09:57:20] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=rain
[09:57:24] Exception in thread Thread-4:
[09:57:24] Traceback (most recent call last):
[09:57:24] File "threading.py", line 914, in _bootstrap_inner
[09:57:24] File "threading.py", line 862, in run
[09:57:24] File "Image-Downloader-master\image_downloader.py", line 52, in main
[09:57:24] File "Image-Downloader-master\crawler.py", line 254, in crawl_image_urls
[09:57:24] File "site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in __init__
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 96, in start
[09:57:24] File "site-packages\selenium\webdriver\common\service.py", line 109, in assert_process_still_running
[09:57:24] selenium.common.exceptions.WebDriverException: Message: Service C:\Users\wyy\AppData\Local\Temp_MEI145722/bin/phantomjs.exe unexpectedly exited. Status code was: 4294967295
[09:57:25] stopped

Unsplash search engine, and firefox browser enhancement and image resolution preferences

Dear schzengyabin,

I would like to contribute to this project with the following enhancements

Unsplash search engine option
Firefox browser option
Image Resolution Preferences

The screenshots can be seen below. Hope to hear from you soon!
Thank you

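Download fails with the following error

Chinese keyword search is not supported

error

raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)

Unusable: the required selenium version is never stated, and every version update brings a pile of errors, all from incompatibility with older versions

Hard to evaluate. This is a half-finished product; send it back for a major rework.

Quantity limit

Pages now load more results as you scroll down, so could the number available be far greater than what is currently crawled?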

No url found for google or bing

No matter what keywords are used, it always says == 0 out of 0 crawled images urls will be used for Google and Bing engine. Only Baidu works. Any clue?

Error when downloading pics using chrome

Hi,
The following error occurs when I try to run the script.

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

Any help will be appreciated!

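Question about the number of Baidu images crawled

1. After I specify the number to crawl, why is only part of it crawled? For example, I specify 10000 images but only about 2000 are actually crawled, even though I changed the upper limit in your code, and a manual search for the keyword returns more than about 2000.

Problem downloading Google images through a proxy server on Linux

Currently, on Linux, downloading Google images through a proxy server does not run correctly from the command line.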

Support for Yandex

Hello.
Can you add support for https://yandex.ru/images/
In this search engine, censorship is not as strict as in Google.
It also removes fewer photos for copyright infringement.

Your prompt reply will be highly appreciated

Unhandled Python exception when choosing .txt file

Whenever I click the choose .txt button and choose a .txt file, an exception happens, as shown in the picture.

Any ideas?

[15:04:41] -e Google -d chrome_headless -n 100 -j 50 -o "./download_images/as" -S "as"
[15:04:41] Scraping From Google Image Search ...
[15:04:41] Keywords: as
[15:04:41] Number: 100
[15:04:41] Face Only: False
[15:04:41] Safe Mode: True
[15:04:41] Query URL: https://www.google.com/search?tbm=isch&hl=en&q=as&safe=on
[15:04:41] Exception in thread Thread-1:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
[15:04:41] stdin=PIPE)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
[15:04:41] restore_signals, start_new_session)
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
[15:04:41] startupinfo)
[15:04:41] FileNotFoundError: [WinError 2] A rendszer nem találja a megadott fájlt
[15:04:41] During handling of the above exception, another exception occurred:
[15:04:41] Traceback (most recent call last):
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
[15:04:41] self.run()
[15:04:41] File "C:\Users\sya\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
[15:04:41] self._target(*self._args, **self._kwargs)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\image_downloader.py", line 54, in main
[15:04:41] browser=args.driver)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\crawler.py", line 300, in crawl_image_urls
[15:04:41] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
[15:04:41] self.service.start()
[15:04:41] File "C:\Users\sya\Desktop\Image-Downloader-master\venv\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
[15:04:41] os.path.basename(self.path), self.start_error_message)
[15:04:41] selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
[15:04:42] stopped
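
The final message above says the `chromedriver` executable needs to be on PATH. A minimal sketch for checking that from Python before launching the crawler; `find_driver` is a hypothetical helper:

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    print("chromedriver found at:" if path else "chromedriver not on PATH", path or "")
```

On Windows, `shutil.which` also tries the extensions listed in PATHEXT, so `chromedriver.exe` is found under the plain name.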

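Problem downloading Google images through a proxy server from the Windows 10 command line

System: Windows 10, command-line mode, browser set to chrome; unable to download Google images through a proxy server.
Traceback:
Message: no such element: Unable to locate element: {"method":"id","selector":"smb"}
The network configuration is correct throughout; in the corresponding Chrome window I can see the Google Images link open and the matching images found.

Could you take a look at this problem?
Thanks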

It fails to download anything

[18:09:39]   -e Google -d chrome_headless -n 10 -j 50 -o "./download_images/pear" "pear"
[18:09:39]  Scraping From Google Image Search ...
[18:09:39]  Keywords:  pear
[18:09:39]  Number:  10
[18:09:39]  Face Only:  False
[18:09:39]  Safe Mode:  False
[18:09:39]  Query URL:  https://www.google.com/search?tbm=isch&hl=en&q=pear&safe=off
[18:09:42]  Exception in thread Thread-1:
[18:09:42]  Traceback (most recent call last):
[18:09:42]    File "C:\Python37\lib\threading.py", line 917, in _bootstrap_inner
[18:09:42]      self.run()
[18:09:42]    File "C:\Python37\lib\threading.py", line 865, in run
[18:09:42]      self._target(*self._args, **self._kwargs)
[18:09:42]    File "C:\Python37\Image-Downloader\image_downloader.py", line 54, in main
[18:09:42]      browser=args.driver)
[18:09:42]    File "C:\Python37\Image-Downloader\crawler.py", line 282, in crawl_image_urls
[18:09:42]      driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 81, in __init__
[18:09:42]      desired_capabilities=desired_capabilities)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
[18:09:42]      self.start_session(capabilities, browser_profile)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
[18:09:42]      response = self.execute(Command.NEW_SESSION, parameters)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
[18:09:42]      self.error_handler.check_response(response)
[18:09:42]    File "C:\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
[18:09:42]      raise exception_class(message, screen, stacktrace)
[18:09:42]  selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 83
[18:09:43]  stopped

I already downloaded all the requirements. I'm using Python3.7.2 and ChromeDriver 81.
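
The message `This version of ChromeDriver only supports Chrome version 83` points to a mismatch between the browser's and the driver's major versions. A minimal sketch of that comparison, assuming both version strings have already been obtained (for example from each program's `--version` output); the helper names are hypothetical:

```python
def major_version(version):
    """Return the leading integer of a version string such as '92.0.4515.107'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """Return True when browser and driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)
```

For example, `versions_match("92.0.4515.107", "92.0.4515.43")` is true, while a version 81 browser paired with a version 83 driver is not a match.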

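Cannot download Baidu images with the GUI

In GUI mode, when Baidu is selected, the crawl mode cannot be checked.

Google search keeps failing; does something need to be upgraded? Can not find chromedriver for currently installed chrome version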

Can not find chromedriver for currently installed chrome version

google image search problem

Is Mac supported?

Can not download images from Google

$ python3 image_downloader.py "stroller on street" -e Google

Scraping From Google Image Search ...

Keywords: stroller on street
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=stroller%20on%20street&safe=off

== 0 out of 0 crawled images urls will be used.

Finished.

It works with Bing and Baidu, but the search results from these two engines are not satisfactory. How can I fix this? Thanks!

Hmm, I'm not sure what is going on, but it seems Google images can no longer be crawled.

Error DevToolsActivePort file doesn't exist

While deploying and debugging the code, I kept getting this error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

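After some research, I found that a few arguments need to be added.
The "--no-sandbox" argument lets Chrome run as root.

I edited the project file crawler.py around line 330 and added these arguments, which solved the problem: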
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--headless')

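Images can be downloaded now. Many thanks to the project author.

I also changed this spot in crawler.py: a line break there caused a syntax error, and after putting the code on one line the error went away.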
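
The three `add_argument` calls above can be kept together in one place. A minimal sketch, assuming `options` is any object with an `add_argument` method, as Selenium's `ChromeOptions` has; `HEADLESS_ARGS` and `apply_args` are hypothetical names:

```python
HEADLESS_ARGS = ("--no-sandbox", "--disable-dev-shm-usage", "--headless")


def apply_args(options, args=HEADLESS_ARGS):
    """Call options.add_argument for each flag and return the same object."""
    for arg in args:
        options.add_argument(arg)
    return options
```

Keeping the flags in a tuple makes it easy to drop `--no-sandbox` again when the crawler is not running as root.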

Most of the time I cannot download any images, and it is really time-consuming: more than one minute for one keyword.

Missing dependencies for SOCKS support

When downloading from Google, I set up the proxy as
proxy_type = "socks5"
proxy = "127.0.0.1:1080"

Download failed, error message is:

Fail: http://bpic.ooopic.com/11/05/28/49b1OOOPIC4f.jpg!/fw/780/quality/90/unsharp/true/compress/true ('Missing dependencies for SOCKS support.',)

Do you know why? Thanks.
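
requests reports `Missing dependencies for SOCKS support` when the optional PySocks package (module name `socks`) is not installed; `pip install requests[socks]` adds it. A minimal sketch that checks for the module and builds a proxies mapping in the form requests accepts; both helper names are hypothetical:

```python
import importlib.util


def socks_available():
    """Return True if the PySocks module (`socks`) can be imported."""
    return importlib.util.find_spec("socks") is not None


def build_proxies(proxy_type, proxy):
    """Build a requests-style proxies mapping from a scheme and host:port."""
    url = f"{proxy_type}://{proxy}"
    return {"http": url, "https": url}
```

With the settings from the issue, `build_proxies("socks5", "127.0.0.1:1080")` maps both `http` and `https` to `socks5://127.0.0.1:1080`.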

Is there a way to set image resolution?

In google image search, you can click "tools" and then pick small, medium, or large resolution for the search results.

Is there a way to limit Image-Downloader to a particular resolution when doing a google image search?

Thanks!

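Cannot crawl Google images on Windows

At first I used a VPN and could not crawl.
Then I set up an Alibaba Cloud server in Silicon Valley (Windows Server 2016) and still could not crawl.
Pinging Google and browsing Google in a web browser both work fine.

Cannot download Baidu images

Entering the image links below in a browser displays them normally, but downloading fails.
Environment: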
Python 3.7.4
Mac

Example image link: https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fpic11.nipic.com%2F20101112%2F1295091_140128943000_2.jpg&refer=http%3A%2F%2Fpic11.nipic.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1670402385&t=a81c213b5f734ab2da917d2cd012ff23

How can I rename the downloaded files with the keywords?

I would like to rename the downloaded files with the keywords.

Can the GUI version only download a maximum of 2000 images?

Installation on Archlinux x86_64

Thanks! If you need any help, let me know.

Installation on Archlinux:
Linux archz800 4.9.56-1-lts #1 SMP Thu Oct 12 22:34:15 CEST 2017 x86_64 GNU/Linux
phantomjs Version : 2.1.1-8

Change line 9 of image_downloader_gui.spec to:
datas=[("bin/phantomjs", "bin/")],

Then it works like this:

	git clone https://github.com/sczhengyabin/Image-Downloader
	mkvirtualenv -p python3 Image-Downloader
	cd Image-Downloader
	
	pip3 install PyQt5
	pip3 install -r requirements.txt 
	pip3 install pyinstaller

	pacman -S phantomjs
	which phantomjs
	cd /usr/bin/phantomjs
	ln -s /usr/bin/phantomjs  ~/Snakepit/Image-Downloader/bin/

	pyinstaller image_downloader_gui.spec

Visual / similar image search feature

Would it be possible to retrieve similar images by upload or specifying urls?
thanks!

AttributeError

'NoneType' object has no attribute 'lower'

Errors

When I use the command-line version of Image Downloader I get
[0621/105054.701:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -200
I am running python image_downloader.py model_rocket --output=./images --max-number=1000 --num-threads=50 --engine=Google.
About half of the images error out and don't download.

When it finishes I get ## Fail: https://www.electroschematics.com/wp-content/uploads/2014/03/Rocket-Launch-Controller.png?fit=687%2C478 (ProtocolError('Connection aborted.', OSError("(10054, 'WSAECONNRESET')")),)

win10+wsl2 ubuntu20.04+chrome92.0.4515.107+ChromeDriver+92.0.4515.43 error

[16:58:11] -e Bing -d chrome -n 100 -j 50 -o "./download_images/man" -S "man"
[16:58:11] Scraping From Bing Image Search ...
[16:58:11] Keywords: man
[16:58:11] Number: 100
[16:58:11] Face Only: False
[16:58:11] Safe Mode: True
[16:58:11] Query URL: https://www.bing.com/images/search?&q=man&qft=
[16:58:12] Exception in thread
[16:58:12] Thread-1
[16:58:12] :
[16:58:12] Traceback (most recent call last):
[16:58:12] File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
[16:58:12] self.run()
[16:58:12] File "/usr/lib/python3.8/threading.py", line 870, in run
[16:58:12] self._target(*self._args, **self._kwargs)
[16:58:12] File "/mnt/e/ai/Image-Downloader/image_downloader.py", line 59, in main
[16:58:12] crawled_urls = crawler.crawl_image_urls(args.keywords,
[16:58:12] File "/mnt/e/ai/Image-Downloader/crawler.py", line 345, in crawl_image_urls
[16:58:12] driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
[16:58:12] self.service.start()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
[16:58:12] self.assert_process_still_running()
[16:58:12] File "/home/akira/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
[16:58:12] raise WebDriverException(
[16:58:12] selenium.common.exceptions
[16:58:12] .
[16:58:12] WebDriverException
[16:58:12] :
[16:58:12] Message: Service ./bin/chromedriver unexpectedly exited. Status code was: 127
[16:58:13] stopped

driver = webdriver.PhantomJS(executable_path=phantomjs_path raises the following error

[11:00:04] -e Baidu -n 100 -j 50 -o "./download_images/微笑" -S "微笑"
[11:00:04] Scraping From Baidu Image Search ...
[11:00:04] Keywords: 微笑
[11:00:04] Number: 100
[11:00:04] Face Only: False
[11:00:04] Safe Mode: True
[11:00:04] Query URL: https://image.baidu.com/search/index?tn=baiduimage&word=%E5%BE%AE%E7%AC%91
[11:01:05] Exception in thread
[11:01:05] Thread-1
[11:01:05] :
[11:01:05] Traceback (most recent call last):
[11:01:05] File "F:\python\python\lib\threading.py", line 973, in _bootstrap_inner
[11:01:05] self.run()
[11:01:05] File "F:\python\python\lib\threading.py", line 910, in run
[11:01:05] self._target(*self._args, **self._kwargs)
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\image_downloader.py", line 46, in main
[11:01:05] crawled_urls = crawler.crawl_image_urls(args.keywords,
[11:01:05] File "D:\mine\毕设\软件\Image-Downloader-1.0.5\crawler.py", line 160, in crawl_image_urls
[11:01:05] driver = webdriver.PhantomJS(executable_path=phantomjs_path,
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 51, in __init__
[11:01:05] self.service.start()
[11:01:05] File "F:\python\python\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 82, in start
[11:01:05] raise WebDriverException(
[11:01:05] selenium.common.exceptions
[11:01:05] .
[11:01:05] WebDriverException
[11:01:05] :
[11:01:05] Message: Can not connect to GhostDriver on port 14423
[11:01:05] stopped
