nicovandenhooff / indoor-scene-detector

This repository contains the source code for Indoor Scene Detector, a full stack deep learning computer vision application.

Home Page: https://www.indoorscenedetector.com/

License: MIT License

Python 48.22% CSS 3.88% HTML 2.22% Shell 0.05% JavaScript 44.69% Dockerfile 0.88% Procfile 0.06%

deep-learning neural-networks image-classification image-recognition convolutional-neural-networks machine-learning pytorch flask react captum

indoor-scene-detector's Issues

Rename repository

To rename repository to indoor-scene-detector and ensure FE works OK with gh pages

Frontpage

Welcome to the Indoor Scene Image Detector. Select or upload an image of an indoor scene to classify it!

About

Application description

Take from README

Acknowledgements

The data set used in building this application was the Indoor Scene Recognition Data set collected by MIT. https://web.mit.edu/torralba/www/indoor.html

Contact

Mel short bio

mel to add bio

Mel linkedin: https://www.linkedin.com/in/mel-liow/
Mel github: https://github.com/mel-liow

Nico short bio

Nico Van den Hooff, CPA is a graduate student in the Master of Data Science program at the University of British Columbia. Nico is also a Chartered Professional Accountant and has six years of experience working in finance, including three years at KPMG LLP as a Senior Consultant/Accountant.

Nico linkedin: https://www.linkedin.com/in/nicovandenhooff/
Nico github: https://github.com/nicovandenhooff

Tests in progress

The majority of tests for the backend are completed - just keeping a list here of ones that have not been done (with reasoning).

ml.training.py

train_model: to refactor code so that it is more unit testable

ml.prediction

_get_weights_s3: weights will be moved off S3 soon, so no test written
load_models: same as above
wrangle_topk_predictions

ml.plotting.py

Currently determining how to speed up all tests since captum is slow for some attribute calculations
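
One common way to keep the default test run fast is to gate the slow attribution tests behind an opt-in flag. Below is a minimal sketch using only the standard library `unittest`. The environment variable `RUN_SLOW_TESTS` and the test names are hypothetical, and the test bodies are placeholders rather than real captum calls:

```python
import os
import unittest

# Hypothetical opt-in flag: slow tests only run when RUN_SLOW_TESTS=1.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


class TestPlotting(unittest.TestCase):
    def test_fast_check(self):
        # Placeholder for a quick test that always runs.
        self.assertEqual(1 + 1, 2)

    @unittest.skipUnless(RUN_SLOW, "slow attribution test; set RUN_SLOW_TESTS=1")
    def test_slow_attribution(self):
        # Placeholder for a slow attribution calculation.
        self.assertTrue(True)


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPlotting)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout the slow tests are skipped locally by default and could be switched on in CI by exporting the flag.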

ml.preprocessing

Currently this file is not used in training so it is redundant

Deployment instructions

I was able to deploy a toy Flask-React app to Heroku/GitHub pages... here are instructions for us, with example links below:

Fully deployed toy app: https://github.com/nicovandenhooff/react-flask-app
Heroku: https://react-flask-app-toy.herokuapp.com/
- Note that there's only one API call in the toy app which goes to https://react-flask-app-toy.herokuapp.com/api/time
GitHub pages: https://nicovandenhooff.github.io/react-flask-app/

Finally, the toy app code is based mainly on this tutorial: https://blog.miguelgrinberg.com/post/how-to-deploy-a-react--flask-project

1. Backend (`Flask`) with Heroku

HOSTED HERE: https://cnn-dashboard-backend.herokuapp.com/

1.1: Set up prior to Heroku deployment

Add CORS to Flask app and decorate routing functions
- Imports: `from flask_cors import CORS, cross_origin`
- Wrap app with CORS: `CORS(app)`
- Decorate routes with `@cross_origin()`
- https://stackoverflow.com/questions/25594893/how-to-enable-cors-in-flask
- https://flask-cors.readthedocs.io/en/latest/
Create a Procfile within the backend folder that contains `web: gunicorn <module-name>:<flask-app>`
- Backend folder is `api` in our case
- Therefore our Procfile should contain `web: gunicorn app:app`
Ensure that requirements.txt is up to date and now includes gunicorn and Flask-Cors

1.2 Heroku deployment

Create Heroku app on https://www.heroku.com/
Within the app on Heroku, go to Deploy and connect to the relevant GitHub repository.
- Note: DO NOT enable automatic deploys since we have frontend and backend in the same repository.
- To consider: Creating a deployment branch rather than deploying from master or main
Within the app on Heroku, go to Settings and set a new Config Var that points to the code we want deployed on Heroku
- key: PROJECT_PATH
- value: api
- Apparently this is supposed to ensure that only the api folder pushes to Heroku's git, but it didn't seem to work, hence the subtree push step below
- To discuss with Mel: is it possible to have a GitHub branch that only has backend files? If so we can probably get rid of the subtree push step below as well
- https://stackoverflow.com/questions/39197334/automated-heroku-deploy-from-subfolder
Push ONLY the backend subdirectory to Heroku with `git subtree push --prefix <path/to/directory> heroku main`
- Our code will be `git subtree push --prefix api heroku main`
- https://medium.com/@shalandy/deploy-git-subdirectory-to-heroku-ea05e95fce1f
Go to Heroku site and confirm that everything has been deployed properly

2. Frontend (`React`) with GitHub pages

HOSTED HERE: https://nicovandenhooff.github.io/cnn-dashboard/

2.1 Set up prior to GitHub pages deployment

Point React to call the API server which is deployed on Heroku
- Note that I did this without axios in the toy app: I used fetch and hardcoded the backend Heroku API link so the React app knows where the API is
- To discuss with Mel how to make this more programmatic, or replicate it with axios
- Another note: the proxy line in package.json is only relevant in development, it has no impact on production

2.2 GitHub pages deployment

This part is simple, follow these instructions: https://github.com/gitname/react-gh-pages

After the GH pages deploys, check that everything is working properly...

3 Next steps

I think we can use GitHub actions to automate the entire build process above, but this isn't MVP

Frontend Improvements

Docs

Just a reminder to myself to address all TODO in README, CONTRIBUTING...

Heroku RAM full

Heroku is constantly failing since the RAM is full, currently trying to debug why

https://stackoverflow.com/questions/49991234/flask-app-memory-leak-caused-by-each-api-call
https://stackoverflow.com/questions/67198421/possible-python-flask-memory-leak
https://www.joelsleppy.com/blog/gunicorn-application-preloading/
https://github.com/extensive-vision-ai/thetensorclan-backend-heroku
https://discuss.pytorch.org/t/requires-grad-or-no-grad-in-prediction-phase/35759/2
https://kirankumargmrur.medium.com/memory-leak-in-django-application-aaa094ea324
https://stackoverflow.com/questions/67637004/gunicorn-worker-terminated-with-signal-9
http://www.streppone.it/cosimo/blog/2021/08/deploying-large-deep-learning-models-in-production/
https://devcenter.heroku.com/articles/python-gunicorn#advanced-configuration
http://jck.bio/pytorch_estimating_model_size/
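
Several of the links above deal with gunicorn preloading and worker settings. A gunicorn config file is plain Python, so the options they discuss can be sketched as below. The file name `gunicorn.conf.py` and every value are illustrative assumptions, not tuned settings for this app:

```python
# gunicorn.conf.py (hypothetical file; all values are illustrative assumptions)
import os

# Load the application in the master process before forking workers, so the
# models are loaded once rather than once per worker.
preload_app = True

# Keep the worker count low on a small dyno; WEB_CONCURRENCY overrides it.
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

# Recycle each worker after a number of requests to limit slow memory growth.
max_requests = 200
max_requests_jitter = 20

# Seconds before a silent worker is killed and restarted.
timeout = 120
```

It could be passed explicitly in the Procfile with `web: gunicorn -c gunicorn.conf.py app:app`. Whether preloading actually lowers memory use here would need to be measured.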

FE compiling errors

@mel-liow

FYI the GH actions continuous deployment is failing with the below JS errors.

I think I could intuitively figure these out but will let you have a look since it's FE.

Failed to compile.

./src/App.js
  Line 2:   'Panel' is defined but never used           no-unused-vars
  Line 6:   'Routes' is defined but never used          no-unused-vars
  Line 6:   'Route' is defined but never used           no-unused-vars
  Line 11:  'TableContainer' is defined but never used  no-unused-vars

./src/components/form/Form.js
  Line 87:   'buttonText' is assigned a value but never used  no-unused-vars
  Line 122:  Expected '===' and instead saw '=='              eqeqeq

./src/components/layout/navbar/NavBar.js
  Line 9:  'ThemeToggle' is defined but never used  no-unused-vars

PyTorch deployment links

https://www.kdnuggets.com/2019/03/deploy-pytorch-model-production.html
https://www.python-engineer.com/posts/pytorch-model-deployment-with-flask/
https://devcenter.heroku.com/articles/git#reset-a-git-repository
https://devcenter.heroku.com/articles/slug-compiler#build-cache
https://aws.amazon.com/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/
https://devcenter.heroku.com/articles/s3
https://aws.amazon.com/s3/

https://towardsdatascience.com/serving-pytorch-models-with-torchserve-6b8e8cbdb632

docker compose m1 chip

Docker compose doesn't work on M1 chip (fails at torch install), to debug w/ Mel

Transfer learning

Just creating this issue for myself to track the architecture of networks for transfer learning

Front end ideas

Compiling a bunch of ideas for the front end here as a draft... Can discuss tomorrow and move concrete ones to Trello.

Image upload

Option to provide a URL for an image? See http://places2.csail.mit.edu/demo.html for an example

Networks

Rename Simple Network? I can't think of anything better though right now...
When SimpleNetwork is selected, the Fully tuned and Last layer tuned should either grey out or go away
Title above Fully tuned that says something like Select transfer learning technique
Rename Fully tuned and Last layer tuned. I think maybe All layers tuned and Only last layer tuned? But happy to chat abt this

Prediction output (numeric and label)

Add spinning wheel on the module of the FE that will show the output that says computing... or something similar after Submit is hit
Alternatively on the above, there could be a computing... for the prediction and then also another one for the heatmap
Allow user to pick between single classification, where the output is one class and its probability, OR the top three classes and their corresponding probabilities
Remove class label #
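
The single-class versus top-three option can be sketched in plain Python. This assumes the network returns one raw score (logit) per class. The function name `top_k_predictions` is hypothetical and is not the project's `wrangle_topk_predictions`:

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort classes by probability, highest first, and keep the first k.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Calling it with `k=1` gives the single-classification output and `k=3` gives the top three, so the frontend toggle would only need to send one number.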

Heatmaps

To consider having multiple pages or scrolling for heatmaps
We have four options here: saliency, occlusion, integrated gradients, and SHAP. The last three all take at least a minute to calculate, whereas saliency is almost instant. I think we should display saliency with the image, but for the other three, consider creating a separate interface that lets the user choose which one they want and then submit, since there will be high latency involved

Image carousel

Need to ensure that images selected are "interesting" in the sense that they are not just all 100% accurate predictions, i.e. challenging images can show why networks are struggling

Random

Refactor this code below in app.py to App.js

    # TODO: move network name to react
    if data["network"] != "simple_cnn":
        model_name += "_" + data["transferLearning"]

Not important but is there a way to add docstrings in the js code? Just so I can understand what function does what lol, but not a big deal if not

Image classes

Here's the counts per image class... to discuss how we want to filter the data to reduce classes down from 67.

A couple options:

If we cut off at a mandatory 200 images per class that actually works nicely and would mean we use the top 25 classes below (from kitchen to grocerystore).
We could pick and choose classes that we think are the most interesting/useful

[('kitchen', 734),
 ('livingroom', 706),
 ('bedroom', 662),
 ('airport_inside', 608),
 ('bar', 604),
 ('subway', 539),
 ('casino', 515),
 ('restaurant', 513),
 ('warehouse', 506),
 ('inside_subway', 457),
 ('bakery', 405),
 ('pantry', 384),
 ('bookstore', 380),
 ('toystore', 347),
 ('corridor', 346),
 ('laundromat', 276),
 ('dining_room', 274),
 ('winecellar', 269),
 ('deli', 258),
 ('locker_room', 249),
 ('hairsalon', 239),
 ('meeting_room', 233),
 ('gym', 231),
 ('bowling', 213),
 ('grocerystore', 213),
 ('bathroom', 197),
 ('church_inside', 180),
 ('auditorium', 176),
 ('mall', 176),
 ('movietheater', 175),
 ('poolinside', 174),
 ('museum', 168),
 ('tv_studio', 166),
 ('jewelleryshop', 157),
 ('stairscase', 155),
 ('trainstation', 153),
 ('waitingroom', 151),
 ('nursery', 144),
 ('artstudio', 140),
 ('closet', 135),
 ('operating_room', 135),
 ('dentaloffice', 131),
 ('gameroom', 127),
 ('kindergarden', 127),
 ('laboratorywet', 125),
 ('cloister', 120),
 ('fastfood_restaurant', 116),
 ('shoeshop', 116),
 ('computerroom', 114),
 ('classroom', 113),
 ('children_room', 112),
 ('buffet', 111),
 ('videostore', 110),
 ('office', 109),
 ('studiomusic', 108),
 ('library', 107),
 ('restaurant_kitchen', 107),
 ('clothingstore', 106),
 ('concert_hall', 103),
 ('florist', 103),
 ('garage', 103),
 ('prisoncell', 103),
 ('inside_bus', 102),
 ('elevator', 101),
 ('greenhouse', 101),
 ('hospitalroom', 101),
 ('lobby', 101)]
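
The 200-image cutoff option can be sketched as a simple filter over the counts. The function name `filter_classes` is hypothetical, and the sample below copies only a few entries from the list above, around the cutoff:

```python
def filter_classes(class_counts, min_images=200):
    """Keep only the (class_name, count) pairs with at least min_images images."""
    return [(name, count) for name, count in class_counts if count >= min_images]


# A few entries copied from the full list, around the 200-image cutoff.
sample_counts = [
    ("kitchen", 734),
    ("bowling", 213),
    ("grocerystore", 213),
    ("bathroom", 197),
    ("lobby", 101),
]

kept = filter_classes(sample_counts)
```

Applied to the full list, the same filter keeps the 25 classes from kitchen to grocerystore.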

Agenda - March 21, 2022 Call

Draft agenda for our call later. Feel free to add to this @mel-liow

Project management
- What app to use for a project board to track milestones/features?
Dataset
- To decide how many classes we should use for our first MVP (see #4)
- To brainstorm what to do if low accuracy is still a problem with fewer classes, new dataset?
Backend goals this week
- Fully trained AlexNet, DenseNet, ResNet transfer learning models (last layer tuned) completed in .pth files for the number of classes that we decide upon above
- Fully trained custom CNN completed in .pth file
- Probably can do more, to discuss
Frontend goals this week
- To discuss
Other
- Next touchpoint/meeting (weekend?)
- Any other items

Prediction bug

@mel-liow currently if you select/upload an image and then just press submit (without choosing a model or pressing any other buttons) the prediction comes from alexnet even though densenet is selected. i was tinkering around with the order of the buttons so i probably broke something (see this commit b9c3f96)