

Curator, the guide 🌎

This is a guide to SpaceML’s machine learning pipeline, which has the seven components summarized below. Each program plays a different role in the pipeline: downloading satellite images, labeling images, training a machine learning model, improving an existing model, and running image similarity search. The programs can be used together, but you can also use just one or a few of them, depending on your needs. Throughout this guide, we show a few ways to combine them.

 

Program description & guide

GIBS Downloader: A tool for downloading Earth images. You can download NASA satellite imagery for the areas and time periods you specify, which makes it useful for building an Earth image dataset.
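GIBS Downloader's own options are not covered here. As a rough illustration of the kind of request such a tool makes, the sketch below builds one tile URL per day for a fixed tile position, assuming the GIBS WMTS REST URL pattern for the EPSG:4326 "best" endpoint. The function `daily_tile_urls`, the default tile matrix set `250m`, and the example layer and tile indices are assumptions for illustration, not GIBS Downloader's interface; converting a region of interest into tile rows and columns is not shown.

```python
from datetime import date, timedelta

# Assumed GIBS WMTS REST template (EPSG:4326, "best" endpoint). Check the GIBS
# documentation for the layer, tile matrix set and image format you need.
GIBS_TEMPLATE = (
    "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/"
    "{layer}/default/{day}/{matrix_set}/{zoom}/{row}/{col}.{ext}"
)


def daily_tile_urls(layer, start, end, zoom, row, col,
                    matrix_set="250m", ext="jpg"):
    """Return one tile URL per day from start to end (both inclusive)."""
    if end < start:
        raise ValueError("end date must not be before start date")
    urls = []
    day = start
    while day <= end:
        urls.append(GIBS_TEMPLATE.format(
            layer=layer, day=day.isoformat(), matrix_set=matrix_set,
            zoom=zoom, row=row, col=col, ext=ext))
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    # Hypothetical example: three days of one true-color MODIS tile.
    for url in daily_tile_urls("MODIS_Terra_CorrectedReflectance_TrueColor",
                               date(2021, 6, 1), date(2021, 6, 3),
                               zoom=6, row=13, col=36):
        print(url)
```

Each URL could then be fetched with any HTTP client and saved into a dataset folder.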

Self-Supervised Learner: A self-supervised learning program for training a machine learning model with less labeled data. You train an encoder on unlabeled data, then train a classifier on fewer labeled images than supervised learning would need.
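To make "train an encoder with unlabeled data" more concrete, here is a NumPy sketch of one widely used self-supervised objective, the NT-Xent contrastive loss popularized by SimCLR. Whether Self-Supervised Learner uses this exact loss is not stated in this guide, so treat it as an illustration; the function name `nt_xent_loss` and the default temperature of 0.5 are assumptions.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for a batch of paired embeddings.

    z1[i] and z2[i] are encoder outputs for two augmentations of the same
    unlabeled image (a positive pair); every other embedding in the batch
    is treated as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)               # shape (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize each row
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # never compare a row with itself
    n = z1.shape[0]
    # Row i's positive is row i + n, and row i + n's positive is row i.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to each positive pair.
    return float(-log_prob[np.arange(2 * n), pos_idx].mean())
```

Minimizing this loss pulls the two views of each image together and pushes different images apart, without any labels. Afterwards, a small classifier can be trained on the frozen embeddings using only a few labeled images.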

Image Similarity Search: A reverse image search app. Once you have a dataset and a model trained on it, Image Similarity Search computes similarities between the images in the dataset and shows you the images most similar to one you pick. This is useful as a sanity check that your model is trained well.
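The core of a reverse image search like this can be sketched in a few lines, assuming you already have one embedding vector per image from your trained encoder and that similarity is measured with cosine similarity (an assumption; the app's exact metric is not stated here). The function `most_similar` is hypothetical.

```python
import numpy as np


def most_similar(embeddings, query_index, k=5):
    """Return (indices, scores) of the k images most similar to the query.

    Similarity is cosine similarity between embedding vectors; the query
    image itself is excluded from the results.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[query_index]   # cosine similarity to the query
    scores[query_index] = -np.inf           # do not return the query itself
    order = np.argsort(-scores)[:k]         # highest similarity first
    return order.tolist(), scores[order].tolist()
```

If the top results for a few hand-picked queries look visually unrelated, that is a hint the encoder needs more training.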

Index & Search (GCP): The Image Similarity Search app works well with up to 3 million images. For scalable image similarity search on larger datasets, we used Index & Search (GCP), which runs on Google Cloud Platform. First, we saved the dataset and the model produced by GIBS Downloader and Self-Supervised Learner to a Google Cloud Storage bucket. The system then has two parts: ① the Index API and ② the Search API. The Index API generates embeddings, an indexer file, and a metadata file on a Google Compute Engine VM, using NVIDIA DALI and FAISS to make the process more efficient. The Search API, built with FastAPI for minimal latency, is deployed to Google App Engine for live image similarity search. Google Cloud Functions made it easy to use GCP smoothly throughout the process. To get a glimpse of how the Index API works, check out this sample notebook.
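The split between an indexing step (embeddings, an indexer file, a metadata file) and a search step can be sketched locally. In the setup described above, FAISS builds the index (an inner-product index such as `faiss.IndexFlatIP` plays this role); the sketch below substitutes a brute-force NumPy matrix so it runs without FAISS or GCP. The file names `index.npy` and `metadata.json` and the functions `build_index` and `search` are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def build_index(embeddings, image_paths, out_dir):
    """Index step: save normalized embeddings plus an id -> image path map."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    np.save(os.path.join(out_dir, "index.npy"), normed.astype(np.float32))
    with open(os.path.join(out_dir, "metadata.json"), "w") as f:
        json.dump({str(i): p for i, p in enumerate(image_paths)}, f)


def search(query, out_dir, k=3):
    """Search step: return the k image paths closest to the query embedding."""
    index = np.load(os.path.join(out_dir, "index.npy"))
    with open(os.path.join(out_dir, "metadata.json")) as f:
        metadata = json.load(f)
    q = query / np.linalg.norm(query)
    scores = index @ q                  # cosine similarity, since rows are unit length
    top = np.argsort(-scores)[:k]
    return [metadata[str(i)] for i in top]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        rng = np.random.default_rng(0)
        build_index(rng.normal(size=(4, 8)), [f"img{i}.png" for i in range(4)], out_dir)
        print(search(rng.normal(size=8), out_dir, k=2))
```

Keeping the index and metadata as files means the slow indexing job and the latency-sensitive search service can run on different machines, which matches the VM plus App Engine split described above.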

Swipe Labeler: A GUI-based image labeling program. You can label images by swiping right or left, clicking accept or reject, or pressing the right or left arrow key. Multiple people can use Swipe Labeler at the same time without overwriting one another's labels, so you can label quickly with your teammates.
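This guide does not describe how Swipe Labeler prevents teammates from overwriting each other's labels. One simple way to get that property is to hand each image to at most one labeler at a time and accept a label only from the labeler holding that image. The hypothetical `LabelQueue` class below sketches that idea with a lock; it is not Swipe Labeler's actual implementation.

```python
import threading


class LabelQueue:
    """Hand out each unlabeled image to at most one labeler at a time.

    A lock guards the shared state so two labelers never receive the same
    image, and a label is only accepted from the labeler holding the image.
    """

    def __init__(self, image_paths):
        self._pending = list(image_paths)   # images nobody is labeling yet
        self._in_progress = {}              # image -> labeler currently holding it
        self._labels = {}                   # image -> "accept" or "reject"
        self._lock = threading.Lock()

    def next_image(self, labeler):
        """Give the labeler the next free image, or None when all are handed out."""
        with self._lock:
            if not self._pending:
                return None
            image = self._pending.pop(0)
            self._in_progress[image] = labeler
            return image

    def submit(self, labeler, image, label):
        """Record a label; return False if the image is not held by this labeler."""
        if label not in ("accept", "reject"):
            raise ValueError("label must be 'accept' or 'reject'")
        with self._lock:
            if self._in_progress.get(image) != labeler:
                return False
            del self._in_progress[image]
            self._labels[image] = label
            return True

    def labels(self):
        """Return a copy of all labels recorded so far."""
        with self._lock:
            return dict(self._labels)
```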

Active Labeler: A program for improving your model efficiently. Once you have a trained model, Active Labeler picks out the images the model has the most difficulty with. You then label those images in Swipe Labeler and retrain the model on the newly labeled images so that it can overcome its weaknesses.
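"Images the model has the most difficulty with" can be measured in several ways; this guide does not say which one Active Labeler uses. A common choice is uncertainty sampling by prediction entropy, sketched below for a model that outputs softmax probabilities. The function `hardest_examples` is hypothetical.

```python
import numpy as np


def hardest_examples(probs, k):
    """Return indices of the k images with the highest prediction entropy.

    probs has shape (num_images, num_classes); each row is the model's
    softmax output. Higher entropy means the model is less certain.
    """
    p = np.clip(probs, 1e-12, 1.0)          # avoid log(0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(-entropy)[:k].tolist()
```

The selected images would be sent to a labeling tool, added to the labeled set, and used to retrain the model.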

A Chrome extension for finding similar images on the NASA Worldview website. Take a snapshot of a particular scene in a satellite image on the website, and the extension shows you satellite images similar to the one you chose.

 

Combination guide

 

Required dataset format

Self-Supervised Learner, Image Similarity Search, Index & Search (GCP), and Active Labeler require the dataset to be organized in the PyTorch ImageFolder format, with one subfolder per class, like this:

/Dataset
    /Class 1
        Image1.png
        Image2.png
    /Class 2
        Image3.png
        Image4.png
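Before pointing any of these tools at a folder, it can help to check that the layout is what ImageFolder expects. The helper below is a hypothetical, dependency-free sketch that mimics ImageFolder's rules: class folders are sorted by name and numbered from 0, and only files with common image extensions are kept. If torchvision is installed, `torchvision.datasets.ImageFolder(root)` exposes the real `class_to_idx` and `samples` attributes directly.

```python
import os
import tempfile

# Image extensions similar to those torchvision's ImageFolder accepts by default.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def image_folder_samples(root):
    """Return (class_to_idx, samples) for an ImageFolder-style directory."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    if not classes:
        raise FileNotFoundError(f"no class folders found in {root}")
    class_to_idx = {c: i for i, c in enumerate(classes)}
    samples = []
    for c in classes:
        # Walk each class folder (including nested folders) in sorted order.
        for dirpath, _, filenames in sorted(os.walk(os.path.join(root, c))):
            for name in sorted(filenames):
                if name.lower().endswith(IMG_EXTENSIONS):
                    samples.append((os.path.join(dirpath, name), class_to_idx[c]))
    return class_to_idx, samples


if __name__ == "__main__":
    # Build the example layout above with empty placeholder files.
    with tempfile.TemporaryDirectory() as root:
        layout = {"Class 1": ["Image1.png", "Image2.png"], "Class 2": ["Image3.png", "Image4.png"]}
        for cls, files in layout.items():
            os.makedirs(os.path.join(root, cls))
            for f in files:
                open(os.path.join(root, cls, f), "w").close()
        print(image_folder_samples(root)[0])
```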

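The UC Merced Land Use dataset, which is used in some of our guide notebooks, is a good example. Here the ImageFolder root is the /Images folder, and each land-use class has its own subfolder: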

/UCMerced_LandUse
    /Images
        /agricultural
            agricultural00.tif
            agricultural01.tif
            ...
        /airplane
            airplane00.tif
            airplane01.tif
            ...
        /...

If there are no labels, put all the images in a single subfolder, which acts as one placeholder class, like this:

/Dataset
    /Unlabelled
        Image1.png
        Image2.png
        Image3.png
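If your unlabeled images currently sit loose in one directory, a short script can copy them into this layout. The function `make_unlabelled_dataset` and its extension list are hypothetical; adjust the folder names to match your own paths.

```python
import os
import shutil
import tempfile

IMG_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def make_unlabelled_dataset(src_dir, dataset_dir, class_name="Unlabelled"):
    """Copy every image file in src_dir into dataset_dir/<class_name>/.

    Returns the number of images copied. Non-image files are skipped.
    """
    target = os.path.join(dataset_dir, class_name)
    os.makedirs(target, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and name.lower().endswith(IMG_EXTENSIONS):
            shutil.copy2(path, os.path.join(target, name))
            copied += 1
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ["Image1.png", "Image2.png", "Image3.png"]:
            open(os.path.join(src, name), "w").close()
        print(make_unlabelled_dataset(src, dst), "images copied")
```

Copying rather than moving leaves the original directory untouched, at the cost of extra disk space.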

Citation

If you find Curator useful in your research, please consider citing the GitHub code for this tool:

@misc{curator2021,
  title={Curator: A No-Code, Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery},
  url={https://github.com/spaceml-org/Curator-Unlabeled-Image-Search-Guide},
  year={2021}
}
