hardikvasa / google-images-download
Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!
License: MIT License
I've had several downloaded images where the file is not closed properly and I get a ResourceWarning. I checked and in the download_image function there is no exception handling for opening and writing to the file. I don't even see the output file being closed.
path = main_directory + "/" + dir_name + "/" + prefix + str(count) + ". " + image_name
output_file = open(path, 'wb')
data = response.read()
output_file.write(data)
response.close()
The same issue is in the single_image function.
I'm less familiar with urlopen, but does it also need some exception handling so that the resource is closed if need be?
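A minimal sketch of one way to make sure both the output file and the response are closed, even if the read or the write fails. The function name save_response and its parameters are hypothetical and are not the script's actual code. In Python 3 the object returned by urlopen can be used in a with statement, so the same pattern should cover the urlopen question as well.

```python
import io
import os
import tempfile


def save_response(response, path):
    """Write the body of a file-like response to path, closing both."""
    # Both context managers run their cleanup even if read() or write() raises.
    with response, open(path, "wb") as output_file:
        output_file.write(response.read())
    return path


# Usage with an in-memory stand-in for the object returned by urlopen().
fake_response = io.BytesIO(b"image-bytes")
target = os.path.join(tempfile.mkdtemp(), "1. example.jpg")
save_response(fake_response, target)
```

With this shape there is no separate close() call to forget, which is what the ResourceWarning points at.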
@hardikvasa
Hi Hardikvasa, thanks for your work. I have a question about downloading the images. What can I do if I want to download more images? Specifically, what should I do if I want to download all the images under the keyword "Taj Mahal"? Thanks for your help.
I am trying to play with your script and found it damn slow. Even the code doesn't work with Python 3.5 only.
If you have any idea how I can make the script run faster, please let me know.
On Windows, the command
<cd google-images-download && sudo python setup.py install> gives the error <bash: sudo: command not found>
Re the line in your disclaimer
"Please do not download any image without violating its copyright terms. "
you probably meant it the other way around. ;)
works great, thanks
If you were to change google_images_download.py to be a class rather than a set of functions and main caller, you could then allow it to be used as an API from other python scripts. That way it becomes a very powerful tool.
This is more of a feature request, not a bug.
Dear all, I would like to search for images using search by image. I think this is accomplished by providing (-u + url) in the CLI, but I am having problems.
Originally, the problem was with the name of the file and of the directory where the download has to occur. Specifically, Windows does not accept directory or file names containing the ":" character, so I replaced image_name with image_name.replace(':',"x") and applied the same .replace call to dir_name in the google_images_download.py file.
Unfortunately, the correction was not sufficient because if I run the following CLI :
py google_images_download.py -k "adidas" -u "https://www.adidas.it/dis/dw/image/v2/aagl_prd/on/demandware.static/-/Sites-adidas-products/default/dwc78be248/zoom/C77124_01_standard.jpg?sh=600&strip=false&sw=600" -l 20
I receive the following error msg: "***** This search result did not return any results...please try a different search filter *****"
This is quite strange, because the search works correctly if I simply remove the "-u image_url" part and run from the CLI:
py google_images_download.py -k "adidas" -l 20
Has anyone encountered this problem before and solved it? Is there a way to upload the image from a jpg or png file in a local directory instead of providing the web address of the same image?
Thanks
PS I run Python 3.6 and Windows 10
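The Windows naming workaround described above (replacing ":" in image_name and dir_name) can be generalised a little. This is only a sketch: safe_name is a hypothetical helper, not part of google_images_download.py, and it covers the characters Windows documents as reserved in file names plus trailing spaces and dots.

```python
import re

# Characters that Windows does not allow in file or directory names.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_name(name, replacement="_"):
    """Return name with Windows-forbidden characters replaced."""
    cleaned = _WINDOWS_FORBIDDEN.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    return cleaned.rstrip(" .")


print(safe_name("https://www.adidas.it/image.jpg?sw=600"))
```

This only addresses the naming error; it does not explain why the -u search returns no results.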
Maybe I just haven't seen how to do this yet, but when using this library in a Python script, it's not obvious how to get the information about the downloaded file from the googleimagesdownload object. It would be nice to have a dictionary of the relevant information about the downloaded image (i.e. basically the metadata that is printed out). At a minimum, it would be helpful to have the filenames of the downloaded images. The size would also be nice. Maybe this already exists, but I'm not sure how to get it.
Every time I run the script, a downloads/ folder is created that is neither committed to git nor listed in .gitignore
[~/mysites/2018/google-images-download]$ gst *[master]
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
downloads/
Installation
[~/mysites/2018/google-images-download]$ sudo python3 setup.py install [master]
Password:
running install
running bdist_egg
running egg_info
writing google_images_download.egg-info/PKG-INFO
writing dependency_links to google_images_download.egg-info/dependency_links.txt
writing entry points to google_images_download.egg-info/entry_points.txt
writing requirements to google_images_download.egg-info/requires.txt
writing top-level names to google_images_download.egg-info/top_level.txt
reading manifest file 'google_images_download.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.md'
writing manifest file 'google_images_download.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.9-x86_64/egg
creating build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/__init__.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/__main__.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/google_images_download.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/__main__.py to __main__.cpython-36.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/google_images_download.py to google_images_download.cpython-36.pyc
creating build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/PKG-INFO -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/entry_points.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/requires.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/top_level.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/google_images_download-2.0.4-py3.6.egg' and adding 'build/bdist.macosx-10.9-x86_64/egg' to it
removing 'build/bdist.macosx-10.9-x86_64/egg' (and everything under it)
Processing google_images_download-2.0.4-py3.6.egg
Removing /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google_images_download-2.0.4-py3.6.egg
Copying google_images_download-2.0.4-py3.6.egg to /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
google-images-download 2.0.4 is already the active version in easy-install.pth
Installing googleimagesdownload script to /Library/Frameworks/Python.framework/Versions/3.6/bin
Installed /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google_images_download-2.0.4-py3.6.egg
Processing dependencies for google-images-download==2.0.4
Searching for selenium==3.11.0
Best match: selenium 3.11.0
Processing selenium-3.11.0-py3.6.egg
selenium 3.11.0 is already the active version in easy-install.pth
Using /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium-3.11.0-py3.6.egg
Finished processing dependencies for google-images-download==2.0.4
Trying out
[~/mysites/2018/google-images-download]$ googleimagesdownload [master]
zsh: command not found: googleimagesdownload
[~/mysites/2018/google-images-download]$ googleimagesdownload --version [master]
zsh: command not found: googleimagesdownload
Error message is shown instead of the help text when no arguments are provided
Traceback (most recent call last):
File "/usr/local/bin/google_images_download.py", line 793, in <module>
main()
File "/usr/local/bin/google_images_download.py", line 785, in main
response.download(arguments)
File "/usr/local/bin/google_images_download.py", line 707, in download
raise ValueError('Keywords is a required argument!')
ValueError: Keywords is a required argument!
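One way to show the help text instead of a traceback is to let the argument parser print its own usage when no keywords are given. The sketch below uses a cut-down, hypothetical parser with only two options; the real script defines many more, and the long option names here are assumptions.

```python
import argparse


def build_parser():
    """Build a reduced stand-in for the script's argument parser."""
    parser = argparse.ArgumentParser(prog="googleimagesdownload")
    parser.add_argument("-k", "--keywords", help="comma-separated search keywords")
    parser.add_argument("-l", "--limit", type=int, help="images per keyword")
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.keywords:
        # Show usage instead of raising ValueError with a traceback.
        parser.print_help()
        return 2
    return 0
```

Calling main([]) prints the help text and returns a non-zero status, while main(["-k", "Polar bears"]) proceeds normally.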
I am unable to get the keywords_from_file option to work. I get the message "OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\Users\Stell\Desktop\CurrentWorkfile\Data\Dorman\GID\Dorman 00588\r'".
If I specify keywords in the command instead of using the -kf option, it works fine. I've attached both my keyword file and the console output.
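The trailing '\r' in the path suggests that the keyword file has Windows (CRLF) line endings and that the carriage return ends up in the directory name. A hedged sketch of reading such a file in Python 3; read_keywords is a hypothetical helper, not the library's own reader.

```python
def read_keywords(path):
    """Return the non-empty keywords in path, one per line."""
    # newline=None is universal-newline mode: \r\n and \r are read as \n,
    # and strip() removes any whitespace left around each keyword.
    with open(path, "r", encoding="utf-8", newline=None) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Blank lines are skipped so that an empty keyword never becomes a directory name.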
while trying to download 100 images found by searching "pokemon go para ios" I get the following error:
ssl.CertificateError: hostname 'assets.phonedog.com' doesn't match either of 'cloudfront.net', '*.cloudfront.net'
I'm using the python 3.6 fork.
Is there any way to bypass this?
First line of google_images_download.py is
# In[ ]:
instead of
#!/usr/bin/env python
Hello,
Thanks a lot for your work,
I modified it a bit and had to share this with you,
I simply changed the 'Edit From Here' section to this
########### Edit From Here ###########
search_keyword = [sys.argv[1]]
keywords = ['']
This lets you run the program from the terminal alone.
Example:
python google-image-download.py kitty
For multi-word searches, don't forget the quotes.
python google-image-download.py 'flying laserbeam overkill kitty'
I think it's faster and easier for non-coders.
Have a nice day.
If a keyword has more than one hyphen, it will sometimes throw "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)". It doesn't seem to do it with every example of this, but the ones that throw errors throw them every time. Log output attached.
I failed to run your code.
Do I need to change the value of headers['User-Agent']?
name 'request' is not defined
[2018-01-17 06:11:18.374911] [ INFO ] ERROR : 'NoneType' object has no attribute 'find'
elif cmd.startswith("image "):
    sep = msg.text.split(" ")
    search = msg.text.replace(sep[0] + " ","")
    url = 'https://www.google.com/search?hl=en&biw=1366&bih=659&tbm=isch&sa=1&ei=vSD9WYimHMWHvQTg_53IDw&q=' + urllib.parse.quote(search)
    raw_html = (download_page(url))
    items = []
    items = items + (_images_get_all_items(raw_html))
    path = random.choice(items)
    try:
        start = timeit.timeit()
        client.sendImageWithURL(to,path)
        # wb1.sendMessage(msg.to, "「Google Image」\nType: Search Image\nTime taken: %s" % (start) +"\nTotal Image Links = "+str(len(items)))
    except Exception as e:
        client.sendMessage(to, str(e))
The first time I ran the parser I used the following.
memes = findMemes("meme",Meme_secondary_keywords,i)
Where "findMemes" is simply the parser wrapped in a method, Meme_secondary_keywords is a list of meme related terms like "greg,trump, and kobe" and i is superfluous
The second time I ran the parser I switched to this method
for word in variety_keywords:
    Secondarykeyword = []
    Secondarykeyword.append(word)
    memes = findMemes(search_keyword,Secondarykeyword,i)
    i = i + 1
In the above, i just acts as a tag that is added onto each unique results folder.
The results from the second method contained FAR more "meme-like" content. Reviewing the results from the first method, I noticed that after the first 50 or so results I was getting very few memes.
For example, the word "Trump" is the 11th element of the list, so it will run after 1,100 searches have been done. The first method of running the method did not return any trump memes, all of the results were just generic pictures of trump. The second method returned over 95 pictures which I would classify as memes
This is likely an idiosyncrasy of Google's image search algorithm. I would hypothesize that as you get further from the original results page, the scope of the results widens dramatically. I do not know if this problem can be fixed on your end, but I think it would be useful for end users to be aware of the second method. If I had run my parser with the first method my results would have been unusable, but the second method gave me very good data.
See the master class in my memeClassifier for an example of this behavior https://github.com/dkennedy778/memeClassifier
Searching for memes with the for loop provides very good results, but searching with just the method and keyword is practically useless
I noticed that images weren't being directly downloaded to the output directory specified. A subdirectory is created in that output directory and the files are saved there. It's a hassle because I always have to move files up a directory. It would also be great to just name them 1.jpg, 2.jpg, etc... so that the file names are easy to sort by the order in which they are downloaded.
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/myQuery /1
Ex:
Starting Download...
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 220px-Polar_Bear_-Alaska%28cropped%29.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. edu-center-D00002331.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Medium_WW22786.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Original_WW215278.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. skynews-polar-bears-arctic_4130795.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Facts-about-polar-bears-5.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. ct_110915-23.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 162652-004-34A1BA10.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. polar-bear-help-604mk101612-604-337-7b6e7b90.rendition.598.336.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 6375873931_9046c0c779_b-1024x738.jpg'
Happening on Windows and Linux.
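The directory in these messages ends in a space ('downloads/Polar bears /'), which may be why the path cannot be opened. A minimal sketch, assuming the keywords arrive as one comma-separated string; keyword_directories is a hypothetical helper that strips each keyword and creates its directory before any file is written.

```python
import os


def keyword_directories(keywords, main_directory):
    """Create and return one directory per comma-separated keyword."""
    created = []
    for raw in keywords.split(","):
        keyword = raw.strip()  # drop spaces around each keyword
        if not keyword:
            continue
        path = os.path.join(main_directory, keyword)
        os.makedirs(path, exist_ok=True)  # exist_ok needs Python 3.2+
        created.append(path)
    return created
```

Using os.path.join rather than concatenating "/" also keeps the separator correct on both Windows and Linux.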
Hello! I noticed that there was a minor typo in the readme file: the "Compatability" section should say "Compatibility".
Hello!
Thank you for your great script. However, requests sometimes hang indefinitely. I wonder if it is possible to add a timeout to stop image requests.
req = Request( items[i], headers={"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"})
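A hedged sketch of the requested timeout, assuming Python 3 and urllib.request. urlopen accepts a timeout argument in seconds; fetch is a hypothetical wrapper, and the 10 second default is an arbitrary choice rather than a value taken from the script.

```python
import socket
from urllib.error import URLError
from urllib.request import Request, urlopen

USER_AGENT = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
              "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17")


def fetch(url, timeout=10.0):
    """Return the body at url, or None if the request fails or times out."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    try:
        # timeout applies to blocking socket operations such as connect and read.
        with urlopen(req, timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout):
        return None
```

Returning None lets the caller skip the image and move on to the next URL instead of waiting forever.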
Is the download limit of 100 images set by Google? Even if I change the limit inside the code, I am unable to download more than 100 images. Is there a way to download more than 100 images?
Is pagination possible?
In this script, the library 'urllib2' is imported directly in the code, which causes an error on Python 3. I guess an if statement that imports urllib.request when the Python version is 3 would solve this problem.
There are situations where we are interested in more than one specific file format, but not in all formats. Currently the config file does not allow an array for the "Format" attribute.
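The version check suggested above could look roughly like this. It is a sketch of the general pattern, not the script's actual import section; the names imported here are the ones a downloader would typically need.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: urllib2 was split into urllib.request, urllib.parse, urllib.error.
    from urllib.request import Request, urlopen
    from urllib.parse import quote
else:
    # Python 2: the same names live in urllib2 and urllib.
    from urllib2 import Request, urlopen
    from urllib import quote

print(quote("Taj Mahal"))
```

The rest of the code can then use Request, urlopen and quote without caring which interpreter is running.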
When running the example I get the following error on Ubuntu 17.10:
google-images-download.py: error: unrecognized arguments: Polar bears, baloons, Beaches 20
Can we have the metadata of the downloaded images saved along with the images themselves?
I'm just trying to run your sample code, but on the import line I get an error saying command line arguments are required. I'm using Python 3.6.
Here's the code in the test file I created, testing_images_download.py:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload() #class instantiation
arguments = {"keywords":"Polar bears, balloons, Beaches", "limit":1, "print_urls":True} #creating list of arguments
response.download(arguments) #passing the arguments to the function
Here's the error:
/usr/local/bin/python3 testing_images_download.py
usage: testing_images_download.py [-h] [-k KEYWORDS] [-kf KEYWORDS_FROM_FILE]
[-sk SUFFIX_KEYWORDS] [-l LIMIT]
[-f {jpg,gif,png,bmp,svg,webp,ico}] [-u URL]
[-x SINGLE_IMAGE] [-o OUTPUT_DIRECTORY]
[-d DELAY]
[-c {red,orange,yellow,green,teal,blue,purple,pink,white,gray,black,brown}]
[-ct {full-color,black-and-white,transparent}]
[-r {labled-for-reuse-with-modifications,labled-for-reuse,labled-for-noncommercial-reuse-with-modification,labled-for-nocommercial-reuse}]
[-s {large,medium,icon,>400*300,>640*480,>800*600,>1024*768,>2MP,>4MP,>6MP,>8MP,>10MP,>12MP,>15MP,>20MP,>40MP,>70MP}]
[-t {face,photo,clip-art,line-drawing,animated}]
[-w {past-24-hours,past-7-days}]
[-wr TIME_RANGE]
[-a {tall,square,wide,panoramic}]
[-si SIMILAR_IMAGES] [-ss SPECIFIC_SITE]
[-p] [-ps] [-m] [-e] [-st SOCKET_TIMEOUT]
[-th]
[-la {Arabic,Chinese Simplified),Chinese (Traditional,Czech,Danish,Dutch,English,Estonian,Finnish,French,German,Greek,Hebrew,Hungarian,Icelandic,Italian,Japanese,Korean,Latvian,Lithuanian,Norwegian,Portuguese,Polish,Romanian,Russian,Spanish,Swedish,Turkish}]
[-pr PREFIX] [-px PROXY]
testing_images_download.py: error: Keywords is a required argument!
Process finished with exit code 2
There is an option for google image search size called "Exactly" where you can input a custom resolution like width = 3440 height = 1440 or '3440x1440'. This would be really handy to get exact resolution images / aspect ratio control. In the same vein a custom aspect ratio like "21:9" would be nice or a better explanation of panoramic vs wide.
If I want to filter only the images that are "Labeled for Reuse", then, how do I specify it in the code? (tools -> usage rights -> labeled for reuse).
I installed with pip3, and when running an "SI" search it outputs this:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'TrainFiles\2018-03-03 22:21:01'
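The directory name in this error is a timestamp containing ":" characters, which Windows does not allow in file or directory names. A minimal sketch of a timestamp format that avoids them; timestamp_dir_name is a hypothetical helper, not the library's code.

```python
from datetime import datetime


def timestamp_dir_name(moment=None):
    """Return a timestamp usable as a directory name on Windows."""
    moment = moment or datetime.now()
    # Use '-' instead of ':' between the time fields.
    return moment.strftime("%Y-%m-%d %H-%M-%S")
```

The same date and time remain readable and still sort chronologically.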
OS: osx 10.12.6
Python version: 3.6.5
Issue steps:
When running CLI with -l or --limit option specified, the following error is returned.
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine
executing without -l works fine.
Attempted to use json input but still returning same error. json file contents below:
{ "Records": [ {"keywords": "tops","limit": 1000}, {"keywords": "jacket","limit": 1000} ] }
Would it be possible to modify this so that we can use a txt or csv with a list of keywords? I have 400+ keywords to scrape images for and so manually doing it would take a while!
Excellent project, but how can I access Google from China?
Great API, however when using something like:
arguments = {
"keywords": "red",
"limit": imgs,
"color": "red",
"print_urls": True,
"format" : "jpg"
}
response.download(arguments)
and setting imgs to over 100, like 120,
I am getting the following error:
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine
An exception has occurred, use %tb to see the full traceback.
Using Python 3.6 with Spyder
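The error message points at a missing chromedriver path once the limit goes above 100. The sketch below builds the arguments dictionary and checks the path up front. The "chromedriver" dictionary key is an assumption that mirrors the '--chromedriver' CLI argument named in the error message, and build_arguments is a hypothetical helper.

```python
import os


def build_arguments(keywords, limit, chromedriver_path=None):
    """Build an arguments dict, adding a chromedriver path for large limits."""
    arguments = {"keywords": keywords, "limit": limit}
    if limit > 100:
        if not chromedriver_path or not os.path.isfile(chromedriver_path):
            raise FileNotFoundError(
                "limits above 100 need a valid chromedriver path")
        # Assumed key name, mirroring the --chromedriver CLI flag.
        arguments["chromedriver"] = chromedriver_path
    return arguments
```

The resulting dictionary would then be passed to response.download(arguments) as in the snippet above, so a wrong path fails early with a clear message.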
Hey,
I am always having an IOError, can you help me out?
Hi,
I am trying to download images. I set the limit to 10000. However, downloading stopped near the 600th image. I used a config file as below:
But, the result on command line is below:
Unfortunately all 10000 could not be downloaded because some images were not downloadable. 646 is all we got for this search filter!
What about making some UI elements on the current version of google-images-download? Is it planned for later versions?
It would be great if an enthusiast can add some tests (in the /tests directory) or add some tutorials (in the /docs or /tutorials directory) on different ways in which you have used and integrated this project into your application.
Urllib2 is deprecated and urllib.request.Request doesn't exist.
Thank you.
I followed the command from the README:
python google-images-download.py -si "https://storage.googleapis.com/zopnow-static/images/products/320/fresh-apple-red-delicious-v-500-g.png" -l 10
I got this error:
Traceback (most recent call last):
File "google-images-download.py", line 393, in
errorCount = bulk_download(search_keyword,suffix_keywords,limit,main_directory,delay_time)
NameError: name 'search_keyword' is not defined
Hello, loving this program - so useful, thanks for creating it!
I am having trouble setting it up to download more than 100 images. I installed via CLI and ran setup.py, so I am assuming based on your instructions that Selenium is installed. I also followed the geckodriver instructions and that seems to have been successful. I'm running Linux Mint. Any support would be much appreciated!
$ googleimagesdownload -k "Siamese cat, Domestic cat" -l 199 -o "cat images" -f jpg
Item no.: 1 --> Item name = Siamese cat
Evaluating...
Starting Download...
Unfortunately all 199 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Item no.: 2 --> Item name = Domestic cat
Evaluating...
Starting Download...
Unfortunately all 199 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Everything downloaded!
Total Errors: 0
Total time taken: 0.14542031288146973 Seconds
Traceback (most recent call last):
File "/usr/bin/googleimagesdownload", line 11, in
sys.exit(main())
File "/usr/lib/python2.7/site-packages/google_images_download/__init__.py", line 2, in main
import google_images_download.google_images_download
ImportError: No module named google_images_download
Thank you!
Is there any way to download 1000 images?
'related_images' and 'prefix_keywords' do not exist in args_list in the file google_images_download.py when installed with $ pip install google_images_download