jeremyjordan / flower-classifier

A simple image classifier for flowers.

Home Page: https://share.streamlit.io/jeremyjordan/flower-classifier/app.py

License: MIT License

Languages: Python 96.50%, Jupyter Notebook 1.99%, Makefile 1.50%

flower-classifier's Introduction

flower-classifier


Authors: Jeremy Jordan and John Huffman

John and I were walking through a garden one day and kept pointing out flowers that we thought looked cool. The only problem was... we didn't know the names of any of the flowers! As machine learning engineers, our first thought was "let's build an image classifier" and this project was born.

(Photo: a passion flower)

Getting started

  1. Spin up a Colab notebook.
  2. Install colabcode.
  3. Start the code server:

from colabcode import ColabCode
ColabCode(port=10000, mount_drive=True)

  4. Go to the ngrok link provided.
  5. Clone the repo:

git clone https://github.com/jeremyjordan/flower-classifier.git

  6. Run make colab to set up the project on your Colab instance (or run make init if running locally).
  7. Start a training job by running train, optionally providing configuration options.
    • e.g. if you want to do a quick check, you can run train trainer=smoke_test

Training a model

You can initiate a training job from the command line using the train script. We use Hydra to manage the job's configuration, which allows us to compose multiple separate config files for various parts of the system into a hierarchical structure. You can see all of the available config groups in the conf/ directory. We specify the default values to use in conf/config.yaml, but these can be overridden using Hydra's override syntax.
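
For reference, here is a minimal sketch of how a Hydra entry point is typically wired (illustrative only; the actual train script in this repo may differ):

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config")
def train(cfg: DictConfig) -> None:
    # Hydra composes conf/config.yaml with the config groups and any
    # command-line overrides before this function runs.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    train()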

Examples:

Start training job using default values

train

Override a configuration group (referencing non-default config files)

train trainer=smoke_test dataset=random

Override specific values in a config file (+ appends a new value, ~ deletes a value from the config)

train model.architecture=resnest200e +model.extra_arg=example ~model.dropout_rate

Realistic example

train model.architecture=efficientnet_b3 \
    model.dropout_rate=0.35 \
    optimizer.lr=0.01 \
    dataset.batch_size=64 \
    optimizer=sgd \
    lr_scheduler=onecycle \
    trainer.max_epochs=50 \
    dataset=folder

You can run train --help to view available configuration options.

Contributing

In order to commit code from a Colab machine, you'll need to do the following:

  1. Make sure you have a GitHub auth token (https://github.com/settings/tokens)
  2. Configure the git settings on the machine
git config --global user.name "Jeremy Jordan"
git config --global user.email ""
gh auth login --with-token <<< INSERT_TOKEN_HERE

Note: make sure you've run make colab before setting this up.

flower-classifier's People

Contributors: huffmanjohnf, jeremyjordan

flower-classifier's Issues

[BUG] model checkpointing isn't saving weights

Describe the bug
We get the following error at the end of our first epoch.

Traceback (most recent call last):
  File "flower_classifier/models/train.py", line 67, in <module>
    trainer.fit(model, datamodule=data_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
    results = self.accelerator_backend.train(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
    results = self.trainer.run_pretrain_routine(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
    self.train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
    self.run_training_epoch()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py", line 516, in run_training_epoch
    self.run_evaluation(test_mode=False)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 603, in run_evaluation
    self.on_validation_end()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/callback_hook.py", line 176, in on_validation_end
    callback.on_validation_end(self, self.get_model())
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py", line 27, in wrapped_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 380, in on_validation_end
    self._do_check_save(filepath, current, epoch, trainer, pl_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 421, in _do_check_save
    self._save_model(filepath, trainer, pl_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in _save_model
    raise ValueError(".save_function() not set")
ValueError: .save_function() not set

This happens because PyTorch Lightning does some extra processing for the checkpoint_callback argument that it doesn't do for normal callbacks:
https://github.com/PyTorchLightning/pytorch-lightning/blob/ff0064f9563bcbbd2e3606ffb99ce8ba85a2791b/pytorch_lightning/trainer/connectors/callback_connector.py#L67

To Reproduce
Steps to reproduce the behavior:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint


model = LightningModel()  # our LightningModule subclass
checkpoint_callback = ModelCheckpoint(save_top_k=3)
trainer = Trainer(
    gpus=1, callbacks=[checkpoint_callback], overfit_pct=0.01
)
trainer.fit(model)
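
A possible workaround, going by the connector code linked above (the checkpoint_callback argument gets the special processing that the callbacks list does not), is to pass the callback through that argument instead:

# Possible workaround (an assumption based on the connector code linked above):
# pass the ModelCheckpoint through the dedicated argument so the trainer
# wires up its save_function.
trainer = Trainer(
    gpus=1, checkpoint_callback=checkpoint_callback, overfit_pct=0.01
)
trainer.fit(model)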

build a model inference app

What's your idea?
We should have a quick and easy way to drop in photos and get model predictions.

Describe the solution you'd like
This would be straightforward to build using Streamlit.
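
A minimal sketch of the app (illustrative; predict() is a placeholder for whatever inference function we expose):

# Minimal Streamlit sketch; predict() is a hypothetical placeholder for the
# model's inference function.
import streamlit as st
from PIL import Image

st.title("Flower Classifier")
uploaded = st.file_uploader("Drop in a flower photo", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, use_column_width=True)
    st.write(f"Prediction: {predict(image)}")  # hypothetical inference call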

allow users to submit photos to training set

What's your idea?
Add an option in our Streamlit app that allows users to submit their photo for us to include in our training set.

Describe the solution you'd like
Add a button to send the photo to a Google Drive folder. We won't be able to do this until Streamlit Sharing enables secrets, since the Google Drive API will need to be authenticated.

write some tests for models and training

What's your idea?
We should improve our test coverage in this repo.

Describe the solution you'd like
Simple tests (see the sketch after this list) for:

  • model can accept tensors of expected shape
  • model can decrease the loss through a single optimization step
  • feature processing code is tested
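
A hedged sketch of the first two tests, assuming a FlowerClassifier module with a standard forward(); the import path and class name are assumptions, not the repo's actual layout:

import torch
import torch.nn.functional as F

from flower_classifier.models import FlowerClassifier  # hypothetical import path

def test_model_accepts_expected_input_shape():
    model = FlowerClassifier()
    model.eval()  # deterministic forward pass (no dropout)
    batch = torch.randn(2, 3, 224, 224)  # (batch, channels, height, width)
    logits = model(batch)
    assert logits.shape[0] == 2

def test_single_optimization_step_decreases_loss():
    model = FlowerClassifier()
    model.eval()
    batch = torch.randn(2, 3, 224, 224)
    targets = torch.randint(0, 102, (2,))  # 102 classes in Oxford Flowers
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_before = F.cross_entropy(model(batch), targets)
    loss_before.backward()
    optimizer.step()
    loss_after = F.cross_entropy(model(batch), targets)
    assert loss_after < loss_before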

reorganize train script + expose through CLI

What's your idea?
Right now all of our training args are hard-coded in the script. We should expose these through a CLI.
We should also have the lightning module and the training script in separate .py files.

Describe the solution you'd like
Typer is a nice tool for building CLIs.
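
For example, a minimal Typer CLI might look like the following (option names are illustrative, not the final interface):

# Illustrative Typer sketch; option names are assumptions.
import typer

app = typer.Typer()

@app.command()
def train(
    architecture: str = "resnet34",
    batch_size: int = 32,
    max_epochs: int = 10,
):
    """Kick off a training run with the given hyperparameters."""
    typer.echo(f"Training {architecture} for {max_epochs} epochs (batch size {batch_size})")

if __name__ == "__main__":
    app()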

Add alternative names to top image prediction

What's your idea?
Sometimes variations in a flower's common name or scientific name lead to variations in users' prediction grades or labels for shared photos of the same underlying flower class. As a user, it would be nice to see commonly used names for the top prediction when available, or at the very least both common and scientific names.


set up experiment tracking server

What's your idea?
We should set up an experiment tracking server to record metrics across various model training runs.

Describe the solution you'd like
Since this is a public repo, we'd be able to use Weights and Biases for free. If we go this route, we don't have to worry about how to deploy the server.

Describe alternatives you've considered
MLflow has a nice experiment tracking server that I've used on other projects. This is a self-hosted solution, so we'd want to consider how we deploy the experiment tracking server.

update how we manage datasets

What's your idea?
Currently, we have two ways of creating a dataset: CSVDataset and OxfordFlowers102Dataset. We're going to need a way to support the incremental addition of photos that we've collected from our Streamlit app.

Describe the solution you'd like
I think it makes sense to organize photos in Google Drive according to the ImageFolder standard, where we have a folder for each class. This can enable us to easily add new photos and explore the existing photos in our dataset (e.g. look at all the photos for a single class). However, we'll still want to be able to reproduce a single training run; if we have a mutating dataset, we'll need a way to represent the dataset at the point in time when the model was trained. We can accomplish this by creating a CSV file describing the dataset which could be reloaded at a later point using the existing CSVDataset.
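
A hedged sketch of the snapshot step, assuming an ImageFolder-style layout; the column names are assumptions about what CSVDataset expects:

import csv
from pathlib import Path

def snapshot_dataset(root: str, output_csv: str) -> None:
    """Write a point-in-time CSV listing of an ImageFolder-style directory."""
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])  # assumed CSVDataset columns
        for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
            for image_path in sorted(class_dir.glob("*.jpg")):
                writer.writerow([str(image_path), class_dir.name])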

Describe alternatives you've considered
We could keep using the CSVDataset as we add new images, but this approach feels a bit more awkward.

We could look into something like DVC to version control the data, but since we're storing the dataset in Google Drive I'm not sure how well this would work.

collect feedback from users in the app

What's your idea?
We should provide a simple way to collect feedback from users.

Describe the solution you'd like
Ask two optional questions:

  • "Do you think this prediction is correct?" โ€“ depending on the user response, we can add tags such as {classification:correct, classification:wrong, classification:unsure}
  • "Do you know what breed this flower is?" โ€“ add the input as an image tag

Completing #52 should help provide users with enough information to make a judgement if they'd like.

We can also add top_k model predictions and a model ID as image tags, which would help provide additional context to the feedback.
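
An illustrative Streamlit sketch of the two questions (the tag formats follow the proposal above and are assumptions):

import streamlit as st

verdict = st.radio(
    "Do you think this prediction is correct?",
    options=["unsure", "correct", "wrong"],
)
breed_guess = st.text_input("Do you know what breed this flower is?")

tags = [f"classification:{verdict}"]
if breed_guess:
    tags.append(f"breed:{breed_guess}")
# tags would then be attached to the stored image alongside top_k predictions
# and a model ID.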

add a makefile for the project

What's your idea?
Add a makefile to run common commands.

Describe the solution you'd like
Commands for:

  • init (include setup for git hooks)
  • running tests
  • formatting

self-supervised learning on internet flower photos

What's your idea?
There's a large collection of flower photos available on Flickr that we could learn from.
https://www.flickr.com/search/?text=flowers&license=2%2C3%2C4%2C5%2C6%2C9

Describe the solution you'd like
A number of self-supervised learning papers for image data have been released.

e.g.

write a callback to log images for model errors

What's your idea?
We should log a few example images to Weights and Biases during training so that we can visually inspect model errors.

Describe the solution you'd like
Write a PyTorch Lightning callback that logs images (https://docs.wandb.com/library/log#images-and-overlays) for validation examples that our model is misclassifying.

Note: need to check if we can get model outputs through validation callback metrics.

Parameters (a rough sketch follows this list):

  • Allow users to specify how often this runs (every X epochs)
  • Allow users to specify the number of examples to log
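
A hedged sketch of what this callback could look like, assuming PyTorch Lightning >= 1.0 hook signatures, (image, label) validation batches, and a W&B logger on the trainer:

import torch
import pytorch_lightning as pl
import wandb

class MisclassifiedImageLogger(pl.Callback):
    def __init__(self, every_n_epochs: int = 1, max_images: int = 16):
        self.every_n_epochs = every_n_epochs
        self.max_images = max_images
        self._images = []

    def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        if trainer.current_epoch % self.every_n_epochs != 0:
            return
        images, targets = batch  # assumes (image, label) validation batches
        with torch.no_grad():
            preds = pl_module(images.to(pl_module.device)).argmax(dim=1).cpu()
        for img, pred, target in zip(images, preds, targets.cpu()):
            if pred != target and len(self._images) < self.max_images:
                self._images.append(
                    wandb.Image(img, caption=f"pred={pred.item()}, true={target.item()}")
                )

    def on_validation_epoch_end(self, trainer, pl_module):
        if self._images:
            trainer.logger.experiment.log({"val/misclassified": self._images})
            self._images = []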

explore various data augmentation strategies

Define your research question and variables
Can we squeeze out higher validation accuracy with more nuanced data augmentation?
Can data augmentation help us adapt to images from different domains (e.g. iPhone photos)?

State your hypothesis

Describe your experimental methods

abstract mapping of class idx to class names

What's your idea?
Currently, our Streamlit app uses a hardcoded list for mapping the model prediction to human-readable names. As we support new models trained on datasets with potentially different sets of class names (e.g. we might decide to add flower breeds), we'll want to be more flexible in how we perform this mapping.
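
One possible direction (a hedged sketch, assuming a PyTorch Lightning module): persist the class-name list in the checkpoint via save_hyperparameters, so the app can recover the mapping from whichever weights file it loads.

import pytorch_lightning as pl

class FlowerClassifier(pl.LightningModule):
    def __init__(self, classes: list):
        super().__init__()
        # Stores `classes` in the checkpoint, so the app no longer needs a
        # hardcoded list.
        self.save_hyperparameters()

# In the Streamlit app:
# model = FlowerClassifier.load_from_checkpoint("weights.ckpt")
# class_names = model.hparams.classes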


open access to streamlit

What's your idea?
Right now, you need to have an API key to be able to view the model predictions. We should have a user-facing application that doesn't require an API key. We can add weights to a GitHub release so that the Streamlit app can download these weights without having to authenticate with any services.
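
A hedged sketch of the download step (the release tag and asset name below are placeholders, not a real release):

import requests

WEIGHTS_URL = (
    "https://github.com/jeremyjordan/flower-classifier/releases/download/"
    "<tag>/model.ckpt"  # placeholder release tag and asset name
)

def download_weights(dest: str = "model.ckpt") -> str:
    """Fetch public weights from a GitHub release; no auth required."""
    response = requests.get(WEIGHTS_URL, timeout=60)
    response.raise_for_status()
    with open(dest, "wb") as f:
        f.write(response.content)
    return dest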

explore various backbone architectures

Define your research question and variables

torchvision provides a number of backbone architectures for us to use; we should explore these to see which performs best on our dataset.

We can also use: https://rwightman.github.io/pytorch-image-models/

State your hypothesis

ResNeXt-101-32x8d has the top score on ImageNet, so it's reasonable to guess that it could give us the best performance on our dataset as well.

Describe your experimental methods

  • Implement backbone networks for a few torchvision models (we can skip ones like AlexNet and VGGNet)
  • Train the models using reasonable hyperparameters and log to Weights and Biases
  • Put together a Report in WandB presenting the results

Improve "other {breed} examples" photos with memorization

What's your idea?
Let's improve how we show example photos of the model's predicted breed.

Currently, we have a component in the Streamlit UI which shows the user three example images of the predicted breed. We obtain these images through a query using the Google Images Search API. However, because our API key only allows 100 free requests per day, we don't automatically show these images (the user must click an additional button to view them).

Describe the solution you'd like
Because we have a fixed set of possible flower breeds to query, we should query the service once and memoize the results. We can store the results in a JSON file that gets loaded alongside the Streamlit app. Each time our model makes a prediction, we can simply perform a dictionary lookup to get the image URLs of example photos.

{
  "breed": [url, url, url],
  ...
}
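
A hedged sketch of the lookup side (the JSON filename is an assumption):

import json
import functools

@functools.lru_cache(maxsize=1)
def load_example_photos(path: str = "example_photos.json") -> dict:
    """Load the precomputed breed -> image URLs mapping once per process."""
    with open(path) as f:
        return json.load(f)

def example_urls(breed: str) -> list:
    # A dictionary lookup replaces a live Google Images API request.
    return load_example_photos().get(breed, [])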


add photo examples for top image prediction

What's your idea?
Since many people don't know flower breeds that well, it's hard to know when the model makes a correct prediction. However, if you show examples of the predicted breed, it's easier to discern whether or not the model might be correct. We should add a new option in our Streamlit app which shows the user a couple of examples of the top predicted flower breed.

Describe the solution you'd like
Take the top predicted breed, search Google Images (or a similar site), and return the first 3 photos.

Describe alternatives you've considered
We could also show examples of the breed from our training set, but this would require us to host the dataset somewhere that is publicly available.
