hardikvasa / google-images-download

Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!

License: MIT License

Python 100.00%

python python-script image-download google-images image-processing color-filter image-dataset image-database image-search image-scraper


google-images-download's Issues

ResourceWarning: unclosed file

I've had several downloaded images where the file is not closed properly and I get a ResourceWarning. I checked and in the download_image function there is no exception handling for opening and writing to the file. I don't even see the output file being closed.

path = main_directory + "/" + dir_name + "/" + prefix + str(count) + ". " + image_name
output_file = open(path, 'wb')
data = response.read()
output_file.write(data)
response.close()

The same issue is in the single_image function.

I'm less familiar with urlopen, but does it also need some exception handling so that the resource is closed if need be?
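A minimal sketch of one way to address this, assuming the response is a file-like object with read() and close() (as the object returned by urlopen is). The helper name save_response is hypothetical and not part of the script.

```python
import io
import os
import tempfile
from contextlib import closing


def save_response(response, path):
    """Write the body of a file-like response to path and close both ends."""
    # closing() calls response.close() even if read() or write() raises,
    # and the open() context manager closes the output file in every case.
    with closing(response), open(path, "wb") as output_file:
        data = response.read()
        output_file.write(data)
    return len(data)
```

With this shape, neither the output file nor the response is left open when an exception interrupts the download.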

How to download more images?

@hardikvasa
Hi Hardikvasa, thanks for your work. I have a question about downloading the images. What can I do if I want to download more images? Specifically, what should I do if I want to download all the images under the keyword "Taj Mahal"? Thanks for your help.

Any idea why the script is damn slow?

I am trying to play with your script and found it damn slow. Even the code doesn't work with Python 3.5 only.
If you have any idea how I can make the script run faster, please let me know.

for windows

On Windows, the command
<cd google-images-download && sudo python setup.py install> gives the error <bash: sudo: command not found>

typo in your disclaimer

Re the line in your disclaimer
"Please do not download any image without violating its copyright terms. "
you probably meant it the other way around. ;)

works great, thanks

Nice code, would make a good API

If you were to change google_images_download.py to be a class rather than a set of functions and main caller, you could then allow it to be used as an API from other python scripts. That way it becomes a very powerful tool.

This is more of a feature request, not a bug.

search by url

Dear all, I would like to search for images using search by image. I think this is accomplished by providing -u plus a URL on the command line, but I am having problems.

Originally, the problem was with the names of the file and of the directory where the download takes place. Specifically, Windows does not accept directory or file names containing the ":" character, so I replaced image_name with image_name.replace(':',"x") and changed dir_name with the same .replace call in the google_images_download.py file.

Unfortunately, the correction was not sufficient, because when I run the following command:

py google_images_download.py -k "adidas" -u "https://www.adidas.it/dis/dw/image/v2/aagl_prd/on/demandware.static/-/Sites-adidas-products/default/dwc78be248/zoom/C77124_01_standard.jpg?sh=600&strip=false&sw=600" -l 20

I receive the following error msg: "***** This search result did not return any results...please try a different search filter *****"

This is quite strange, because the search works correctly if I remove the "-u image_url" part and simply run:

py google_images_download.py -k "adidas" -l 20

Has anyone encountered this problem before and solved it? Is there a way to upload the image from a jpg or png file in a local directory instead of providing the web address of the same image?

Thanks

PS I run Python 3.6 and Windows 10
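The single-character replacement described above can be generalized. The sketch below is an assumption about how one might do it, not the script's own code: sanitize_name is a hypothetical helper that replaces every character Windows rejects in file and directory names, and also trims the trailing spaces and dots that Windows refuses.

```python
import re

# Characters Windows does not allow in file or directory names,
# plus ASCII control characters such as a stray carriage return.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_name(name, replacement="_"):
    """Return name with characters that are invalid on Windows replaced."""
    cleaned = _WINDOWS_FORBIDDEN.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    return cleaned.rstrip(" .")
```

Applying the same helper to both image_name and dir_name keeps the two consistent.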

Getting list of image names/file paths in response to the download method call

Maybe I just haven't seen how to do this yet, but when using this library in a Python script, it's not obvious how to get the information about the downloaded file from the googleimagesdownload object. It would be nice to have a dictionary of the relevant information about the downloaded image (i.e. basically the meta data that is printed out). At a minimum, it would be helpful to have the filenames of the downloaded images. The size would also be nice. Maybe this already exists, but I'm not sure how to get it.
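Until the library returns this information itself, one possible workaround is to inspect the output directory after the download call. The sketch below assumes the images end up somewhere under a known directory; describe_downloads is a hypothetical helper, not part of google_images_download.

```python
import os


def describe_downloads(directory):
    """Return one record (filename, path, size) per file under directory."""
    records = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            records.append({
                "filename": name,
                "path": path,
                "size_bytes": os.path.getsize(path),
            })
    return records
```

Calling describe_downloads("downloads") after response.download(arguments) would then list whatever was saved, under the assumption that "downloads" is the output directory in use.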

Do you know why only 100 images is the limit? Can we have more than 100?

`downloads/` Should not be committed to git.

Every time I run the script, a downloads/ folder is created that is neither committed to git nor listed in .gitignore

[~/mysites/2018/google-images-download]$ gst                                                                                                                              *[master]
On branch master
Your branch is up-to-date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	downloads/

Even after installation (through manual CLI) `googleimagesdownload` command not found.

Installation

[~/mysites/2018/google-images-download]$ sudo python3 setup.py install                                                                                                     [master]
Password:
running install
running bdist_egg
running egg_info
writing google_images_download.egg-info/PKG-INFO
writing dependency_links to google_images_download.egg-info/dependency_links.txt
writing entry points to google_images_download.egg-info/entry_points.txt
writing requirements to google_images_download.egg-info/requires.txt
writing top-level names to google_images_download.egg-info/top_level.txt
reading manifest file 'google_images_download.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.md'
writing manifest file 'google_images_download.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-x86_64/egg
running install_lib
running build_py
creating build/bdist.macosx-10.9-x86_64/egg
creating build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/__init__.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/__main__.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
copying build/lib/google_images_download/google_images_download.py -> build/bdist.macosx-10.9-x86_64/egg/google_images_download
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/__main__.py to __main__.cpython-36.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/google_images_download/google_images_download.py to google_images_download.cpython-36.pyc
creating build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/PKG-INFO -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/entry_points.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/requires.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying google_images_download.egg-info/top_level.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/google_images_download-2.0.4-py3.6.egg' and adding 'build/bdist.macosx-10.9-x86_64/egg' to it
removing 'build/bdist.macosx-10.9-x86_64/egg' (and everything under it)
Processing google_images_download-2.0.4-py3.6.egg
Removing /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google_images_download-2.0.4-py3.6.egg
Copying google_images_download-2.0.4-py3.6.egg to /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
google-images-download 2.0.4 is already the active version in easy-install.pth
Installing googleimagesdownload script to /Library/Frameworks/Python.framework/Versions/3.6/bin

Installed /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google_images_download-2.0.4-py3.6.egg
Processing dependencies for google-images-download==2.0.4
Searching for selenium==3.11.0
Best match: selenium 3.11.0
Processing selenium-3.11.0-py3.6.egg
selenium 3.11.0 is already the active version in easy-install.pth

Using /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium-3.11.0-py3.6.egg
Finished processing dependencies for google-images-download==2.0.4

Trying out

[~/mysites/2018/google-images-download]$ googleimagesdownload                                                                                                              [master]
zsh: command not found: googleimagesdownload
[~/mysites/2018/google-images-download]$ googleimagesdownload --version                                                                                                    [master]
zsh: command not found: googleimagesdownload

Error message when no arguments provided

Error message is shown instead of the help text when no arguments are provided

Traceback (most recent call last):
  File "/usr/local/bin/google_images_download.py", line 793, in <module>
    main()
  File "/usr/local/bin/google_images_download.py", line 785, in main
    response.download(arguments)
  File "/usr/local/bin/google_images_download.py", line 707, in download
    raise ValueError('Keywords is a required argument!')
ValueError: Keywords is a required argument!
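One way this could be handled, sketched with argparse: check for an empty argument list before parsing and print the help text instead of raising. The parser below is a reduced stand-in with only the -k and -l options; build_parser and parse_or_help are hypothetical names, not the script's functions.

```python
import argparse


def build_parser():
    """Build a reduced stand-in for the script's argument parser."""
    parser = argparse.ArgumentParser(prog="googleimagesdownload")
    parser.add_argument("-k", "--keywords", help="comma-separated search keywords")
    parser.add_argument("-l", "--limit", type=int, help="number of images to download")
    return parser


def parse_or_help(argv, out=None):
    """Print the help text when argv is empty, otherwise parse it."""
    parser = build_parser()
    if not argv:
        # file=None makes argparse write to sys.stdout.
        parser.print_help(file=out)
        return None
    return parser.parse_args(argv)
```

A caller would pass sys.argv[1:] and stop when the result is None.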

keywords_from_file throws error

I am unable to get the keywords_from_file option to work. I get the message "OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\Users\Stell\Desktop\CurrentWorkfile\Data\Dorman\GID\Dorman 00588\r'".

If I specify keywords in the command instead of using the -kf option, it works fine. I've attached both my keyword file and the console output.

error.txt
output.txt
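The trailing \r in the path suggests that the keyword file has Windows line endings and that only the \n was removed from each line. A sketch of a reader that avoids this, assuming one keyword per line; read_keywords is a hypothetical helper, not the script's own function.

```python
import io


def read_keywords(path):
    """Read one keyword per line, dropping line endings and blank lines."""
    keywords = []
    with io.open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # strip() removes "\n", "\r" and surrounding spaces alike.
            keyword = line.strip()
            if keyword:
                keywords.append(keyword)
    return keywords
```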

Ssl Error

While trying to download 100 images found by searching "pokemon go para ios", I get the following error:
ssl.CertificateError: hostname 'assets.phonedog.com' doesn't match either of 'cloudfront.net', '*.cloudfront.net'
I'm using the Python 3.6 fork.
Is there any way to bypass this?
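A sketch of one possible bypass using only the standard ssl module: build a context that skips hostname and certificate checks and pass it to urlopen. This trades away protection against tampered downloads, so it is only an option when that risk is acceptable. The function names are hypothetical, and the script would need to be edited to use them.

```python
import ssl
from urllib.request import urlopen


def insecure_context():
    """Return an SSL context that does not verify the server certificate."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def open_unverified(url, timeout=10):
    """Open url without certificate verification (assumed acceptable here)."""
    return urlopen(url, context=insecure_context(), timeout=timeout)
```

A more conservative alternative is to catch ssl.CertificateError around the download and skip the affected image.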

Missing Shebang

First line of google_images_download.py is

# In[ ]:

instead of

#!/usr/bin/env python

Using The Script Without Having To Modify It

Hello,

Thanks a lot for your work,
I modified it a bit and had to share this with you,

I simply changed the 'Edit From Here' section to this

########### Edit From Here ###########
search_keyword = [sys.argv[1]]
keywords = ['']

This allows running the program from the terminal only.
Example:

python google-image-download.py kitty

For multi-word searches, don't forget the quotes.

python google-image-download.py 'flying laserbeam overkill kitty'

I think it's faster and easier for non-coders.
Have a nice day.

More than one hyphen in keyword causes failure

If a keyword has more than one hyphen, it will sometimes throw "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)". It doesn't seem to do it with every example of this, but the ones that throw errors throw them every time. Log output attached.

error2.txt

headers['User-Agent']

I failed to run your code.
Do I need to change the value of headers['User-Agent']?

name 'request' is not defined

name 'request' is not defined
[2018-01-17 06:11:18.374911] [ INFO ] ERROR : 'NoneType' object has no attribute 'find'

elif cmd.startswith("image "):
    sep = msg.text.split(" ")
    search = msg.text.replace(sep[0] + " ","")
    url = 'https://www.google.com/search?hl=en&biw=1366&bih=659&tbm=isch&sa=1&ei=vSD9WYimHMWHvQTg_53IDw&q=' + urllib.parse.quote(search)
    raw_html = (download_page(url))
    items = []
    items = items + (_images_get_all_items(raw_html))
    path = random.choice(items)
    try:
        start = timeit.timeit()
        client.sendImageWithURL(to,path)
        # wb1.sendMessage(msg.to, "「Google Image」\nType: Search Image\nTime taken: %s" % (start) +"\nTotal Image Links = "+str(len(items)))
    except Exception as e:
        client.sendMessage(to, str(e))

del

Running the parser with one alternative keyword provides much more accurate results than a list of keywords.

The first time I ran the parser I used the following.

memes = findMemes("meme",Meme_secondary_keywords,i)

Where "findMemes" is simply the parser wrapped in a method, Meme_secondary_keywords is a list of meme related terms like "greg,trump, and kobe" and i is superfluous

The second time I ran the parser I switched to this method

for word in variety_keywords:
    Secondarykeyword = []
    Secondarykeyword.append(word)
    memes = findMemes(search_keyword,Secondarykeyword,i)
    i = i + 1

In the above, i just acts as a tag that is added onto each unique results folder.

The results from the second method contained FAR more "meme-like" content. Reviewing the results from the first method, I noticed that after the first 50 or so results I was getting very few memes.

For example, the word "Trump" is the 11th element of the list, so it will run after 1,100 searches have been done. The first way of running the method did not return any Trump memes; all of the results were just generic pictures of Trump. The second method returned over 95 pictures which I would classify as memes.

This is likely an idiosyncrasy of Google's image search algorithm. I would hypothesize that as you get further from the original results page the scope of the results widens dramatically. I do not know if this problem can be fixed on your end, but I think it would be useful for end users to be aware of the second method. If I had run my parse with the first method my results would be unusable, but the second method gave me very good data.

See the master class in my memeClassifier for an example of this behavior https://github.com/dkennedy778/memeClassifier

Searching for memes with the for loop provides very good results, but searching with just the method and keyword is practically useless

Images aren't actually saved to the output directory

I noticed that images weren't being directly downloaded to the output directory specified. A subdirectory is created in that output directory and the files are saved there. It's a hassle because I always have to move files up a directory. It would also be great to just name them 1.jpg, 2.jpg, etc... so that the file names are easy to sort by the order in which they are downloaded.

Error: [Errno 2] No such file or directory

IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/myQuery /1

Ex:
Starting Download...
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 220px-Polar_Bear_-Alaska%28cropped%29.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. edu-center-D00002331.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Medium_WW22786.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Original_WW215278.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. skynews-polar-bears-arctic_4130795.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. Facts-about-polar-bears-5.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. ct_110915-23.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 162652-004-34A1BA10.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. polar-bear-help-604mk101612-604-337-7b6e7b90.rendition.598.336.jpg'
IOError on an image...trying next one... Error: [Errno 2] No such file or directory: 'downloads/Polar bears /1. 6375873931_9046c0c779_b-1024x738.jpg'

Happening on Windows and Linux.

Typo in README

Hello! I noticed that there was a minor typo in the readme file: the "Compatability" section should say "Compatibility".

Request timeout

Hello!

Thank you for your great script. However, requests sometimes hang indefinitely. I wonder if it is possible to add a timeout to stop image requests.

req = Request( items[i], headers={"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"})
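A sketch of how a timeout could be added, assuming Python 3: urlopen accepts a timeout argument in seconds and raises an error when it expires. The helper fetch_image, the shortened User-Agent string and the opener parameter (which only exists so the function can be exercised without a network) are illustrative assumptions, not the script's code.

```python
import socket
from urllib.error import URLError
from urllib.request import Request, urlopen

USER_AGENT = "Mozilla/5.0 (X11; Linux i686)"  # shortened placeholder value


def fetch_image(url, timeout=10, opener=urlopen):
    """Return the bytes at url, or None if the request fails or times out."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    try:
        # timeout bounds the connection attempt and each blocking read.
        with opener(req, timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout) as error:
        print("Skipping {}: {}".format(url, error))
        return None
```

Returning None lets the caller move on to the next image instead of waiting forever.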

Download limit

Is the download limit of 100 images set by Google? Even if I change the limit inside the code, I am unable to download >100 images. Is there a way to download >100 images?

Pagination

Is pagination possible?

urllib2 is removed from python3

In this script, the library 'urllib2' is imported directly, which causes an error on Python 3. I guess using an if statement that imports urllib.request when the Python version is 3 can solve this problem.
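The version check suggested here could look roughly like the sketch below. Which names the script actually needs is an assumption; the ones imported are common choices for this kind of downloader.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: urllib2 was split into urllib.request and urllib.error.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
    from urllib.parse import quote
else:
    # Python 2: the same names live in urllib2 and urllib.
    from urllib2 import Request, urlopen, HTTPError, URLError
    from urllib import quote
```

The rest of the code can then use Request, urlopen and quote without caring which interpreter is running.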

allowing multiple values (Array) for file format argument

There are situations where we are interested in more than one specific file format but not all formats. Currently the config file does not allow an array for the "Format" attribute.

unrecognized arguments

When running the example I get the following error on Ubuntu 17.10

google-images-download.py: error: unrecognized arguments: Polar bears, baloons, Beaches 20

Image Metadata

Can we have the metadata of the downloaded images saved along with the images themselves?

Importing the package requires command line arguments

I'm just trying to run your sample code, but on the import line I get an error saying command line arguments are required. I'm using Python 3.6.

Here's the code in the test file I created, testing_images_download.py:

from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()   #class instantiation
arguments = {"keywords":"Polar bears, balloons, Beaches", "limit":1, "print_urls":True}   #creating list of arguments
response.download(arguments)   #passing the arguments to the function

Here's the error:

/usr/local/bin/python3 testing_images_download.py
usage: testing_images_download.py [-h] [-k KEYWORDS] [-kf KEYWORDS_FROM_FILE]
                                  [-sk SUFFIX_KEYWORDS] [-l LIMIT]
                                  [-f {jpg,gif,png,bmp,svg,webp,ico}] [-u URL]
                                  [-x SINGLE_IMAGE] [-o OUTPUT_DIRECTORY]
                                  [-d DELAY]
                                  [-c {red,orange,yellow,green,teal,blue,purple,pink,white,gray,black,brown}]
                                  [-ct {full-color,black-and-white,transparent}]
                                  [-r {labled-for-reuse-with-modifications,labled-for-reuse,labled-for-noncommercial-reuse-with-modification,labled-for-nocommercial-reuse}]
                                  [-s {large,medium,icon,>400*300,>640*480,>800*600,>1024*768,>2MP,>4MP,>6MP,>8MP,>10MP,>12MP,>15MP,>20MP,>40MP,>70MP}]
                                  [-t {face,photo,clip-art,line-drawing,animated}]
                                  [-w {past-24-hours,past-7-days}]
                                  [-wr TIME_RANGE]
                                  [-a {tall,square,wide,panoramic}]
                                  [-si SIMILAR_IMAGES] [-ss SPECIFIC_SITE]
                                  [-p] [-ps] [-m] [-e] [-st SOCKET_TIMEOUT]
                                  [-th]
                                  [-la {Arabic,Chinese Simplified),Chinese (Traditional,Czech,Danish,Dutch,English,Estonian,Finnish,French,German,Greek,Hebrew,Hungarian,Icelandic,Italian,Japanese,Korean,Latvian,Lithuanian,Norwegian,Portuguese,Polish,Romanian,Russian,Spanish,Swedish,Turkish}]
                                  [-pr PREFIX] [-px PROXY]
testing_images_download.py: error: Keywords is a required argument!

Process finished with exit code 2

Custom Resolution Option

There is an option for google image search size called "Exactly" where you can input a custom resolution like width = 3440 height = 1440 or '3440x1440'. This would be really handy to get exact resolution images / aspect ratio control. In the same vein a custom aspect ratio like "21:9" would be nice or a better explanation of panoramic vs wide.

How to add further specifications?

If I want to filter only the images that are "Labeled for Reuse", then, how do I specify it in the code? (tools -> usage rights -> labeled for reuse).

OSError directory name is invalid

I installed via pip3, and running an "SI" search outputs this:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'TrainFiles\2018-03-03 22:21:01'

Looks like we cannot locate the path the 'chromedriver' if limit is included

OS: osx 10.12.6
Python version: 3.6.5
Issue steps:
When running CLI with -l or --limit option specified, the following error is returned.
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine
executing without -l works fine.

Attempted to use json input but still returning same error. json file contents below:
{ "Records": [ {"keywords": "tops","limit": 1000}, {"keywords": "jacket","limit": 1000} ] }

Use keyword file instead

Would it be possible to modify this so that we can use a txt or csv with a list of keywords? I have 400+ keywords to scrape images for and so manually doing it would take a while!

Excellent project, but how to visit google in china?

Limit over 100 won't work

Great API, however when using something like:

arguments =         {
            "keywords": "red",
            "limit": imgs,
            "color": "red",
            "print_urls": True,
            "format" : "jpg"
        }


response.download(arguments)

and setting imgs over 100, e.g. 120

I am getting the following error:

Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine
An exception has occurred, use %tb to see the full traceback.

Using Python 3.6 with Spyder

IOError

Hey,
I am always having an IOError, can you help me out?

Limit isn't working

Hi,
I am trying to download images. I set the limit to 10000. However, downloading stopped near the 600th image. I used a config file as below:

But, the result on command line is below:
Unfortunately all 10000 could not be downloaded because some images were not downloadable. 646 is all we got for this search filter!

UI Feature Request

What about making some UI elements on the current version of google-images-download? Is it planned for later versions?

Adding some tests and/or tutorials

It would be great if an enthusiast can add some tests (in the /tests directory) or add some tutorials (in the /docs or /tutorials directory) on different ways in which you have used and integrated this project into your application.

Won't work for Python 3.6

Urllib2 is deprecated and urllib.request.Request doesn't exist.
Thank you.

about Reverse Image Search

I followed the command from the README

python google-images-download.py -si "https://storage.googleapis.com/zopnow-static/images/products/320/fresh-apple-red-delicious-v-500-g.png" -l 10

I got this error:
Traceback (most recent call last):
  File "google-images-download.py", line 393, in <module>
    errorCount = bulk_download(search_keyword,suffix_keywords,limit,main_directory,delay_time)
NameError: name 'search_keyword' is not defined

>100 images on linux OS

Hello, loving this program - so useful, thanks for creating it!

I am having trouble setting it up to download >100 images. I installed via CLI and ran setup.py, so I am assuming based on your instructions that Selenium is installed. I also followed the geckodriver instructions and that seems to have been successful. I'm running Linux Mint. Any support would be much appreciated!

$ googleimagesdownload -k "Siamese cat, Domestic cat" -l 199 -o "cat images" -f jpg

Item no.: 1 --> Item name = Siamese cat
Evaluating...
Starting Download...


Unfortunately all 199 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Item no.: 2 --> Item name =  Domestic cat
Evaluating...
Starting Download...


Unfortunately all 199 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Everything downloaded!
Total Errors: 0

Total time taken: 0.14542031288146973 Seconds

After downloading finishes, it throws the following error every time, which stalls my script. Could you please share some suggestions?

Traceback (most recent call last):
  File "/usr/bin/googleimagesdownload", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/google_images_download/__init__.py", line 2, in main
    import google_images_download.google_images_download
ImportError: No module named google_images_download

Thank you!

Improper file format naming.

All images are treated as jpg; if a non-jpg file is downloaded, it will be given the wrong file format.
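One way to pick the right extension is to look at the first bytes of the downloaded data, since common image formats start with fixed signatures. This is a sketch of that technique, not the script's behavior; guess_extension is a hypothetical helper and the signature table covers only a few formats.

```python
# Leading byte signatures of some common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def guess_extension(data, default="jpg"):
    """Guess a file extension from the leading bytes of image data."""
    for signature, extension in SIGNATURES:
        if data.startswith(signature):
            return extension
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return default
```

The result can be used to name the file once the response body has been read, falling back to jpg when nothing matches.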

Download more than 100 images

Is there any way to download 1000 images?

'related_images' and 'prefix_keywords' do not exist in args_list

'related_images' and 'prefix_keywords' do not exist in args_list in file google_images_download.py when installed by $ pip install google_images_download
