
This project forked from cvdfoundation/google-landmark


Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes.


Google Landmarks Dataset v2

This is the second version of the Google Landmarks dataset, which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset is presented in our Google AI blog post.

This dataset is associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results will be discussed as part of a CVPR'19 workshop. Please visit the Kaggle challenge webpages for more details on the data.

For reference, the previous version of the Google Landmarks dataset is available here.

Download train set

There are 4,132,914 images in the train set.

Download the labels and metadata

Downloading the data

The train set is split into 500 TAR files (each of size ~1GB) containing JPG-encoded images. The files are located in the train/ directory, and are named images_000.tar, images_001.tar, ..., images_499.tar. To download them, access the following link:

https://s3.amazonaws.com/google-landmark/train/images_000.tar

And similarly for the other files.
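The naming scheme above can be turned into a full list of URLs for any split. The following is a minimal sketch, not part of the provided tooling: the helper names tar_url and list_tar_urls are hypothetical, and the URL pattern is taken from the example link above.

```shell
# Hypothetical helper: print the URL of TAR shard number $2 in split $1
# (split is one of: train, index, test). Pass the shard number as a plain
# decimal integer (e.g. 8, not 008).
tar_url() {
  printf 'https://s3.amazonaws.com/google-landmark/%s/images_%03d.tar\n' "$1" "$2"
}

# Hypothetical helper: print the URLs of shards 0..$2 of split $1,
# one per line.
list_tar_urls() {
  i=0
  while [ "$i" -le "$2" ]; do
    tar_url "$1" "$i"
    i=$((i + 1))
  done
}
```

For example, `list_tar_urls train 499` prints the 500 train URLs. The output could then be piped to a downloader, e.g. `list_tar_urls test 19 | xargs -n 1 curl -O`, assuming curl is installed.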

Using the provided script

mkdir train && cd train
bash ../download-dataset.sh train 499

This will automatically download, verify and extract the images to the train directory.

Note: This script downloads files in parallel. To adjust the number of parallel downloads, modify NUM_PROC in the script.

train image licenses

All images in the train set have CC-BY licenses without the NoDerivs (ND) restriction. To verify the license for a particular image, please refer to train_attribution.csv.

Download index set

There are 761,757 images in the index set.

Download the list of images

Downloading the data

The index set is split into 100 TAR files (each of size ~850MB) containing JPG-encoded images. The files are located in the index/ directory, and are named images_000.tar, images_001.tar, ..., images_099.tar. To download them, access the following link:

https://s3.amazonaws.com/google-landmark/index/images_000.tar

And similarly for the other files.

Using the provided script

mkdir index && cd index
bash ../download-dataset.sh index 99

This will automatically download, verify and extract the images to the index directory.

Note: This script downloads files in parallel. To adjust the number of parallel downloads, modify NUM_PROC in the script.

index image licenses

All images in the index set have CC-0 or Public Domain licenses.

Download test set

There are 117,577 images in the test set.

Download the list of images

Downloading the data

The test set is split into 20 TAR files (each of size ~500MB) containing JPG-encoded images. The files are located in the test/ directory, and are named images_000.tar, images_001.tar, ..., images_019.tar. To download them, access the following link:

https://s3.amazonaws.com/google-landmark/test/images_000.tar

And similarly for the other files.

Using the provided script

mkdir test && cd test
bash ../download-dataset.sh test 19

This will automatically download, verify and extract the images to the test directory.

Note: This script downloads files in parallel. To adjust the number of parallel downloads, modify NUM_PROC in the script.

test image licenses

All images in the test set have CC-0 or Public Domain licenses.

Checking the download

We also make available md5sum files for checking the integrity of the downloaded files. Each md5sum file corresponds to one of the TAR files mentioned above; they are located in the md5sum/index/, md5sum/test/ and md5sum/train/ directories, with file names md5.images_000.txt, md5.images_001.txt, etc. For example, the md5sum file corresponding to the images_000.tar file in the index set can be found via the following link:

https://s3.amazonaws.com/google-landmark/md5sum/index/md5.images_000.txt

And similarly for the other files.

If you use the provided download-dataset.sh script, the integrity of the files is already checked right after download.
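For files downloaded by hand, the check can also be done manually. The sketch below is an illustration under two assumptions: the first whitespace-separated field of each md5sum file is the hex digest, and the md5sum command from GNU coreutils is available (macOS ships md5 instead). The function name verify_tar is hypothetical.

```shell
# Hypothetical helper: succeed if the MD5 digest of TAR file $1 matches
# the digest recorded in md5 file $2.
# Assumption: the digest is the first field on the first line of $2.
verify_tar() {
  expected=$(awk 'NR == 1 { print $1 }' "$2")
  actual=$(md5sum "$1" | awk '{ print $1 }')
  [ "$expected" = "$actual" ]
}
```

A possible use is `verify_tar images_000.tar md5.images_000.txt && echo OK`, run in the directory that holds both files.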

Extracting the data

We recommend that the TAR files for each dataset split be extracted into a directory per split; i.e., the index TARs extracted into an index directory, the train TARs extracted into a train directory, etc. The directory structure of the image data is as follows: each image is stored at the path ${a}/${b}/${c}/${id}.jpg, where ${a}, ${b} and ${c} are the first three characters of the image id, and ${id} is the image id found in train.csv. For example, an image with the id 0123456789abcdef would be stored in 0/1/2/0123456789abcdef.jpg.
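The path layout can be expressed as a small shell function. This is a minimal sketch; the function name image_path is hypothetical and is not part of the provided script.

```shell
# Hypothetical helper: print the relative path of the image with id $1,
# following the ${a}/${b}/${c}/${id}.jpg layout described above.
image_path() {
  a=$(printf '%s' "$1" | cut -c1)   # first character of the id
  b=$(printf '%s' "$1" | cut -c2)   # second character
  c=$(printf '%s' "$1" | cut -c3)   # third character
  printf '%s/%s/%s/%s.jpg\n' "$a" "$b" "$c" "$1"
}
```

With the id from the example above, `image_path 0123456789abcdef` prints 0/1/2/0123456789abcdef.jpg, which is relative to the directory the split was extracted into.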

Dataset licenses

The annotations are licensed by Google under the CC BY 4.0 license. The images listed in this dataset are publicly available on the web, and may have different licenses. Google does not own their copyright. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
