
Semi/Self-Supervised Learning on a Pediatric Pneumonia Dataset

About

Fully supervised approaches need large, densely annotated datasets, so only hospitals that can afford to collect and annotate such datasets can use them to aid their physicians. The goal of this project is to use self-supervised and semi-supervised learning to significantly reduce the need for fully labelled data. This repo contains the project source code, the training notebooks, and the final TensorFlow 2 saved model used to build the web application for detecting pediatric pneumonia from chest X-rays.

The semi/self-supervised learning framework used in the project comprises three stages:

  1. Self-supervised pretraining
  2. Supervised fine-tuning with active-learning
  3. Knowledge distillation using unlabeled data

Refer to the Google Research team's paper (SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners) for more details on the framework.
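To make Stage 3 more concrete, here is a minimal NumPy sketch of the distillation loss as formulated in the SimCLRv2 paper: the student is trained to match the teacher's temperature-softened class probabilities on unlabeled images. The function names and the default temperature are illustrative assumptions, not taken from this repo's code, which runs on TensorFlow 2.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between softened teacher and student distributions.

    Both arrays have shape (batch, num_classes). No ground-truth labels
    are needed: the teacher's probabilities act as soft targets.
    """
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))
    log_p_student = log_softmax(student_logits, temperature)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


# Hypothetical logits for a batch of two unlabeled X-rays (normal vs. pneumonia).
teacher = np.array([[4.0, 0.0], [0.0, 4.0]])
agreeing_student = teacher.copy()
disagreeing_student = teacher[::-1]

print(distillation_loss(teacher, agreeing_student))
print(distillation_loss(teacher, disagreeing_student))
```

The loss is lowest when the student reproduces the teacher's distribution and grows as the two disagree, which is what lets unlabeled images drive training in this stage.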

The training notebooks for Stage 1, 2, and 3 can be found in the notebooks folder. Notebooks for Selective Labeling (active-learning) using Entropy or Augmentations policies can be found in the Active_Learn folder. We also evaluated another Semi-Supervised Learning approach called FixMatch. Benchmarks for Fully-Supervised Learning can be found in the FSL_Benchmarks folder. The code for Data Preprocessing can be found in Data_Preparation.
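As an illustration of the Entropy policy for selective labeling, the sketch below ranks unlabeled images by the entropy of the model's predicted class probabilities and returns the most uncertain ones for annotation. This is a generic version of entropy-based active learning; the function names and the example probabilities are hypothetical, and the notebooks in Active_Learn may implement the policy differently.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=-1)


def select_for_labeling(probs, budget):
    """Return indices of the `budget` most uncertain samples, most uncertain first."""
    entropy = predictive_entropy(probs)
    ranked = np.argsort(-entropy, kind="stable")
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled images (normal, pneumonia).
probs = [
    [0.99, 0.01],  # confident
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]
print(select_for_labeling(probs, budget=2))  # indices 3 and 1
```

The selected images are the ones a physician would be asked to label next; the remaining images stay in the unlabeled pool.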

Results

Stage 1 - Contrastive Accuracy

| Labels | Stage 1 (Self-Supervised) |
|---|---|
| No labels used | 99.99% |

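Contrastive Accuracy is a measure of how invariant the model's representations are to image augmentations. In the SimCLR framework, it is the fraction of augmented views that the model correctly matches to the other augmented view of the same image, rather than to a view of a different image in the batch.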
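A simplified way to compute this metric is sketched below. It assumes two arrays of embeddings, where row i of each array comes from a different augmentation of the same image, and counts how often the most similar embedding in the other array is the matching one. The function name is hypothetical, and the SimCLR code also includes same-view negatives among the candidates, which this sketch omits.

```python
import numpy as np


def contrastive_accuracy(z_a, z_b):
    """Top-1 accuracy of matching each view in z_a to its partner in z_b.

    z_a, z_b: arrays of shape (batch, dim); row i of each array is an
    embedding of a different augmentation of image i.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    # L2-normalize so the dot product is the cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    similarity = z_a @ z_b.T            # (batch, batch)
    predicted = similarity.argmax(axis=1)
    return float((predicted == np.arange(len(z_a))).mean())


# Toy embeddings: every view points in the same direction as its partner.
views_a = np.eye(4)
views_b = 2.0 * np.eye(4)
print(contrastive_accuracy(views_a, views_b))
```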

Stage 2 and 3 - Test Accuracy Comparison

| Labels | FSL (Benchmark) | Stage 2 (Finetuning) | Stage 3 (Distillation) |
|---|---|---|---|
| 1% | 85.2% | 94.5% | 96.3% |
| 2% | 85.1% | 96.8% | 97.6% |
| 5% | 86.0% | 97.1% | 98.1% |
| 100% | 98.9% | N/A | N/A |

Despite needing only a small fraction of the labels, our Stage 2 and Stage 3 models achieved test accuracies comparable to that of a Fully-Supervised (FSL) model trained on 100% of the labels. With 5% of the labels, the Stage 3 model comes within 0.8 percentage points of it (98.1% vs. 98.9%). Refer to the Project Report and the Final Presentation for a more detailed discussion of the findings.
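The comparison can be made explicit with a little arithmetic on the table above. The short script below uses only the accuracies reported in that table; the dictionary layout and function name are illustrative.

```python
# Test accuracies (%) copied from the results table, keyed by label fraction (%).
RESULTS = {
    1: {"fsl": 85.2, "stage2": 94.5, "stage3": 96.3},
    2: {"fsl": 85.1, "stage2": 96.8, "stage3": 97.6},
    5: {"fsl": 86.0, "stage2": 97.1, "stage3": 98.1},
}
FSL_FULL = 98.9  # FSL benchmark trained on 100% of the labels


def summarize(results, fsl_full):
    """Per label fraction: Stage 3 gain over FSL and gap to the 100%-label FSL model."""
    summary = {}
    for pct, acc in results.items():
        summary[pct] = {
            "gain_over_fsl": round(acc["stage3"] - acc["fsl"], 1),
            "gap_to_full_fsl": round(fsl_full - acc["stage3"], 1),
        }
    return summary


for pct, row in summarize(RESULTS, FSL_FULL).items():
    print(
        f"{pct}% labels: Stage 3 is +{row['gain_over_fsl']} points over FSL "
        f"and {row['gap_to_full_fsl']} points below the 100%-label FSL model"
    )
```

At every label fraction, Stage 3 improves on the equally labelled FSL benchmark by more than 11 percentage points, and its gap to the fully labelled model shrinks from 2.6 to 0.8 points as the label fraction grows from 1% to 5%.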

ML Workflow

Web App Demo

Installation

You can run the app locally if you have Docker installed. First, clone this repo:

git clone https://github.com/TeamSemiSuperCV/semi-super

Navigate to the webapp directory of the repo:

cd semi-super/webapp

Build the container image using the docker build command (this will take a few minutes):

docker build -t semi-super .

Start the container using the docker run command, specifying the name of the image we just created:

docker run -dp 8080:8080 semi-super

After a few seconds, open your web browser to http://localhost:8080. You should see the app.
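If you would rather check from a script that the container is serving before opening the browser, a small polling helper such as the following can be used. It relies only on the Python standard library; the function name, timeout, and polling interval are assumptions for illustration, and the only detail taken from the steps above is the URL http://localhost:8080.

```python
import time
import urllib.request


def wait_for_app(url="http://localhost:8080", timeout_s=30.0, interval_s=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Returns True if the app responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: the app is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_app()` right after `docker run` returns `True` once the app answers, or `False` if it has not responded within the timeout.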

Acknowledgements

We took the SimCLR framework code from Google Research and heavily modified it for the purposes of this project. We enhanced the knowledge distillation feature along with several other changes to make it perform better with our dataset. With these changes and improvements, knowledge distillation can be performed on the Google Cloud TPU infrastructure, which reduces training time significantly.
