xkumiyu / imagenet-downloader
Downloader from ImageNet Image URLs
Home Page: http://www.kumilog.net/entry/imagenet-download
License: MIT License
Is it possible to share these files somewhere, such as Google Drive?
Without these files, this repo is useless.
When I try:
wget http://image-net.org/imagenet_data/urls/imagenet_fall11_urls.tgz
I get
--2020-07-01 08:27:14-- http://image-net.org/imagenet_data/urls/imagenet_fall11_urls.tgz
Resolving image-net.org (image-net.org)... 171.64.68.16
Connecting to image-net.org (image-net.org)|171.64.68.16|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-07-01 08:27:14 ERROR 404: Not Found.
Also, when I go to the page http://image-net.org/download-imageurls and click on the links, it says that the URL is not valid.
I'm not saying that there is an error in this script, just that it looks like it can no longer be used?
Hi, this is very helpful now that the ImageNet website seems to be down. It's been several months and they still haven't granted me direct download access, so URLs are the way to go.
While the suggested download script works well, I found that opening so many wget requests at once crashed my home internet connection. Download throughput jumps to the connection's cap, and I get through about 3 GB before the connection dies. Instead, I use GNU parallel to limit the number of concurrent downloads. The differences are that download.sh keeps the wget requests in the foreground and that the double quotes are stripped from list/urllist.txt before running parallel:
download.sh:
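#!/bin/sh
# Usage: download.sh OUTPUT_FILE URL
if [ $# -ne 2 ]; then
    exit 1
fi
# original line (-b sends wget to the background, -a appends to wget.log)
# wget $2 -O $1 -T 1 -t 5 -nc -b -a wget.log
# new line (stays in the foreground so parallel can cap the number of jobs)
wget "$2" -O "$1" -T 1 -t 5 -nc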
sed 's/\"//g' list/urllist.txt > list/urllist_noquote.txt
cat list/urllist_noquote.txt | parallel --jobs 12 --colsep ' ' ./download.sh {1} {2}
It's slower, yes, but for people on a limited connection this way lets you keep working during the download :)
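For anyone without GNU parallel, the same idea (a fixed cap on simultaneous downloads) can be sketched in Python with a thread pool. This is only an illustration and not part of the repo: `parse_line`, `fetch`, and `download_all` are hypothetical names, and it assumes each line of the URL list holds an output path and a URL separated by whitespace, as the `{1} {2}` arguments above suggest. Unlike `wget -nc -t 5`, it does not skip existing files or retry.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def parse_line(line):
    """Split one 'output_path url' line, dropping any double quotes."""
    parts = line.replace('"', "").split()
    if len(parts) != 2:
        return None  # skip lines that do not have exactly two fields
    return parts[0], parts[1]


def fetch(path, url, timeout=1):
    """Download url to path; return True on success, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        return False
    with open(path, "wb") as out:
        out.write(data)
    return True


def download_all(lines, jobs=12, fetch_fn=fetch):
    """Download every (path, url) pair with at most `jobs` requests in flight."""
    pairs = [p for p in map(parse_line, lines) if p is not None]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(lambda p: fetch_fn(*p), pairs))
    return {path: ok for (path, _url), ok in zip(pairs, results)}
```

Calling `download_all(open("list/urllist.txt"), jobs=12)` would return a dict mapping each output path to whether its download succeeded; lowering `jobs` trades speed for a lighter load on the connection.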
ubuntu@ip-10-0-175-243:~/imagenet-downloader$ python3 gen_urls.py
Traceback (most recent call last):
  File "gen_urls.py", line 73, in <module>
    main()
  File "gen_urls.py", line 50, in main
    df = get_categories(args.words, args.categories)
  File "gen_urls.py", line 12, in get_categories
    all_list = pd.read_csv(all_list_file, header=None, delimiter='\t')
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 705, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 445, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 814, in __init__
    self._make_engine(self.engine)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1045, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1684, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 391, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 710, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'list/words.txt' does not exist
ubuntu@ip-10-0-175-243:~/imagenet-downloader$ ls
download.sh gen_list.py gen_urls.py README.md requirements.txt resize_images.py
After running gen_urls.py, the clist.csv file contains class IDs from 0 to 998, so one class appears to be missing from the expected 1000. Is this normal behaviour?
Thanks
Several images are still visible but return NoneType when read
Ex: 2711
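One possible cause, offered only as a guess since the report does not name the reader library: some loaders (OpenCV's `cv2.imread`, for example) return `None` instead of raising when a file cannot be decoded, which happens if the download is really an HTML placeholder page, an empty file, or a format the loader does not handle. A minimal sketch for flagging such files by their leading bytes follows; `sniff_image_type` and `find_suspect_files` are hypothetical helper names, and the signature table covers only JPEG, PNG, and GIF.

```python
import os

# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(path):
    """Return 'jpeg', 'png', or 'gif' based on the file header, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_suspect_files(directory):
    """List files under directory whose header matches no known signature."""
    suspects = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            if sniff_image_type(path) is None:
                suspects.append(path)
    return suspects
```

A file that passes this check can still be truncated or corrupt, so it narrows the search rather than proving an image is readable.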