Semi/Self-Supervised Learning on a Pediatric Pneumonia Dataset

About

Fully supervised approaches need large, densely annotated datasets, so only hospitals that can afford to collect and annotate such datasets can use them to aid their physicians. The goal of this project is to use self-supervised and semi-supervised learning to significantly reduce the need for fully labelled data. This repo contains the project source code, the training notebooks, and the final TensorFlow 2 saved model used to build the web application for detecting pediatric pneumonia from chest X-rays.

The semi/self-supervised learning framework used in the project comprises three stages:

Self-supervised pretraining
Supervised fine-tuning with active-learning
Knowledge distillation using unlabeled data

Refer to the Google Research team's paper (SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners) for more details regarding the framework used.
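To make the third stage more concrete, here is a minimal sketch of a distillation loss for a single unlabeled image: the cross-entropy between the teacher's and the student's temperature-scaled class distributions. It is an illustration written with the standard library only, not the code used in the notebooks; the function names and the temperature parameter are assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between teacher and student distributions for one image.

    Hypothetical helper: the teacher's soft predictions act as the target,
    so no ground-truth label is needed.
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))
```

The loss is smallest when the student reproduces the teacher's distribution, for example `distillation_loss([2.0, 0.0], [2.0, 0.0])` is lower than `distillation_loss([2.0, 0.0], [0.0, 2.0])`.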

The training notebooks for Stages 1, 2, and 3 can be found in the notebooks folder. Notebooks for Selective Labeling (active-learning) using the Entropy or Augmentations policies can be found in the Active_Learn folder. We also evaluated another Semi-Supervised Learning approach called FixMatch. Benchmarks for Fully-Supervised Learning can be found in the FSL_Benchmarks folder. The code for Data Preprocessing can be found in Data_Preparation.
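As a rough illustration of an entropy-based selective labeling policy, the sketch below ranks unlabeled images by the entropy of the model's predicted class probabilities and returns the most uncertain ones for annotation. This is a generic sketch, not the code in the Active_Learn notebooks; the function names and the `budget` parameter are assumptions.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, budget):
    """Return indices of the `budget` most uncertain unlabeled images.

    `predictions` is a list of per-image class-probability lists
    (hypothetical input format).
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: predictive_entropy(predictions[i]),
        reverse=True,  # highest entropy (least confident) first
    )
    return ranked[:budget]
```

For example, with predictions `[[0.99, 0.01], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]` and a budget of 2, the function returns `[3, 1]`, the two images the model is least sure about.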

Results

Stage 1 - Contrastive Accuracy

Labels	Stage 1 (Self-Supervised)
No labels used	99.99%

Contrastive Accuracy is a measure of how invariant the model's predictions are when tested against image augmentations.
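One simplified way to compute such a metric is sketched below: embed two augmented views of every image, then count how often the most similar embedding in the second view belongs to the same image. This is an assumption about the general shape of the metric, written in plain Python; the training code may compute it differently (for instance over a larger candidate set), and all names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_accuracy(view_a, view_b):
    """Fraction of images in view_a whose best match in view_b is the same image.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    """
    hits = 0
    for i, a in enumerate(view_a):
        sims = [cosine_similarity(a, b) for b in view_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == i:
            hits += 1
    return hits / len(view_a)
```

A value near 1.0 means the embeddings change little under augmentation relative to the differences between images.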

Stage 2 and 3 - Test Accuracy Comparison

Labels	FSL (Benchmark)	Stage 2 (Finetuning)	Stage 3 (Distillation)
1%	85.2%	94.5%	96.3%
2%	85.1%	96.8%	97.6%
5%	86.0%	97.1%	98.1%
100%	98.9%	N/A	N/A

Despite needing only a small fraction of the labels, our Stage 2 and Stage 3 models achieved test accuracies comparable to those of a Fully-Supervised (FSL) model trained on 100% of the labels. Refer to the Project Report and the Final Presentation for a more detailed discussion and findings.
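The comparison can be made explicit with a little arithmetic on the table above. The snippet only restates the reported numbers as percentage-point differences; the variable and key names are illustrative.

```python
# Test accuracies (%) copied from the table above.
FSL_FULL_LABELS = 98.9
results = {
    "1%": {"fsl": 85.2, "stage2": 94.5, "stage3": 96.3},
    "2%": {"fsl": 85.1, "stage2": 96.8, "stage3": 97.6},
    "5%": {"fsl": 86.0, "stage2": 97.1, "stage3": 98.1},
}


def gaps(row, full=FSL_FULL_LABELS):
    """Percentage-point differences for one label fraction."""
    return {
        # Stage 3 versus FSL trained on the same fraction of labels
        "gain_over_fsl": round(row["stage3"] - row["fsl"], 1),
        # Stage 3 versus FSL trained on all labels
        "gap_to_full_fsl": round(full - row["stage3"], 1),
    }


summary = {labels: gaps(row) for labels, row in results.items()}
```

With 1% of the labels, Stage 3 is 11.1 points above the FSL benchmark trained on the same labels and 2.6 points below the fully labelled benchmark; with 5% of the labels the remaining gap is 0.8 points.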

ML Workflow

Web App Demo

Installation

You can run the app locally if you have Docker installed. First, clone this repo:

git clone https://github.com/TeamSemiSuperCV/semi-super

Navigate to the webapp directory of the repo:

cd semi-super/webapp

Build the container image using the docker build command (this will take a few minutes):

docker build -t semi-super .

Start the container using the docker run command, specifying the name of the image we just created:

docker run -dp 8080:8080 semi-super

After a few seconds, open your web browser to http://localhost:8080. You should see the app.
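If you prefer to check from a script rather than a browser, the following sketch polls the address until the app answers. It assumes the app returns HTTP 200 on its root page, which is not confirmed here; the function names, retry count, and delay are illustrative.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_app(url="http://localhost:8080", attempts=10, delay=1.0, probe=http_ok):
    """Poll `url` until `probe` succeeds or `attempts` are used up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the container time to start
    return False
```

Calling `wait_for_app()` after `docker run` returns `True` once the container is serving requests and `False` if it never responds within the allotted attempts.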

Acknowledgements

We took the SimCLR framework code from Google Research and heavily modified it for the purposes of this project. We enhanced the knowledge distillation feature along with several other changes to make it perform better with our dataset. With these changes and improvements, knowledge distillation can be performed on the Google Cloud TPU infrastructure, which reduces training time significantly.
