Computer_vision_pneumonia_x_ray

Authors of the project : Kai Yung TAN (Adam) & Jean Christophe Meunier

1. Purpose and project objective

Purpose

Learning how to design and evaluate a custom-made convolutional neural network (CNN) for practical purposes
Using CNN models to analyse X-ray images
Designing a CNN capable of recognising pneumonia in patients' X-rays

Objectives

Consolidate knowledge of Python, specifically TensorFlow/Keras, NumPy, Pandas, Matplotlib,...
To be able to search for and implement new libraries
Consolidate knowledge of data science and machine/deep learning algorithms for developing an accurate classification model
To be able to perform appropriate hyperparameter tuning

Features

Must-have

A CNN trained on a large X-ray dataset (>5k images) that can recognise new images outside of the training set
Proper model evaluation (split dataset, confusion matrix, etc.)
Visualisations of model results (properly labeled, titled...)

Nice-to-Have

A visualisation of the feature maps of the model
Comparison with other CNN model structures
Assessing and comparing

Context of the project

All the work was done during BeCode's AI/data science bootcamp (2020-2021).

2. The project

Working plan and steps

1. Research

Research and understand the terms, concepts and requirements of the project
Discover new libraries that can serve the project's purposes
Developing, using and testing machine learning algorithms (i.a. TensorFlow/Keras,...)
Consolidating knowledge of model building and hyperparameter tuning (e.g. type of layers, pooling, dropout, batch normalization, type of activation functions,...)
Data augmentation
We also searched online for published work and studies on X-ray data manipulation and modelling, for example:
- sibeltan/pneumonia_detection_CNN
- Jain et al., 2020. Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement, 165, 1.

2. Data collection

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
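
As a quick sanity check of the folder layout described above, the images can be counted per split and per category. This is a minimal sketch, not the code used in the project: the function name `count_images` and the root folder name passed to it are assumptions, and it only assumes the split/category/file nesting stated in the text.

```python
from collections import Counter
from pathlib import Path


def count_images(root):
    """Count JPEG files per (split, category) under root/<split>/<category>/."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        # Only count image files; skip anything else that may sit in the folders.
        if path.is_file() and path.suffix.lower() in {".jpeg", ".jpg"}:
            split, category = path.parts[-3], path.parts[-2]
            counts[(split, category)] += 1
    return counts


if __name__ == "__main__":
    # "chest_xray" is a hypothetical local path to the downloaded dataset.
    for (split, category), n in sorted(count_images("chest_xray").items()):
        print(f"{split:>5} / {category}: {n}")
```

Comparing the per-split totals with the figures quoted in this README is a cheap way to detect an incomplete download.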

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

Examples of data input

3. Data manipulation

Image size reduction: the original JPEG images were resized to 128 x 128 pixels in order to speed up data processing during model training
Standardisation of the images
Data augmentation using the OpenCV (cv2) library and the Keras 'ImageDataGenerator' class in order to improve training quality

4. Modelling

In total, 17 models were built, trained and compared using various hyperparameter settings (see notebook section):

depth of the neural network
type of layers (dense, convolutional,...)
filters (number, size, padding, etc.)
type of activation (i.a. relu, leaky-relu, sigmoid, softmax,...)
dropout
pooling
batch normalization

For each model, the hyperparameters were fine-tuned based on the performance indices on the test set (624 pictures). When a model reached a satisfying accuracy, it was then run on the validation set (16 pictures).

The best-fitting model was chosen partly based on good performance on the train and test sets, but mostly on its performance on the validation set.

Final best fitting model

1. Model architecture

8 convolution layers (filters=32/32/32/64/64/64/128/128, kernel_size=(3, 3), activation='Leaky-relu')
MaxPool2D((2, 2))
Dropout(0.25) on all layers except the last one
Flatten
1 dense layer (1024, activation='relu')
1 output dense layer (2, activation='sigmoid')
Dropout(0.5)
loss='binary_crossentropy', optimizer='adam'
shuffle = True
data augmentation: rotation_range = 20, zoom_range = 0.2, width_shift_range = 0.2, height_shift_range = 0.2, horizontal_flip = True, vertical_flip = True
Batch size : 16
Epochs : 100
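
The list above can be turned into a Keras model along the following lines. This is only a sketch under stated assumptions, not the exact model from the notebook: the list does not say where pooling and dropout sit, so the eight convolutions are grouped into three blocks (32 x 3, 64 x 3, 128 x 2) with one MaxPool2D and one Dropout(0.25) after each block, 'same' padding is used, Dropout(0.5) is placed between the two dense layers, and the input is assumed to be a single-channel 128 x 128 image. The function name `build_model` is hypothetical.

```python
from tensorflow.keras import layers, models


def build_model(input_shape=(128, 128, 1)):
    """Build a CNN loosely following the architecture list (assumptions in comments)."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))

    # Assumed grouping of the 8 convolution layers into 3 blocks.
    blocks = [(32, 3), (64, 3), (128, 2)]
    for filters, n_convs in blocks:
        for _ in range(n_convs):
            model.add(layers.Conv2D(filters, (3, 3), padding="same"))  # padding assumed
            model.add(layers.LeakyReLU())
        model.add(layers.MaxPool2D((2, 2)))  # assumed: one pooling step per block
        model.add(layers.Dropout(0.25))      # assumed: one dropout per block

    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))           # assumed position: before the output layer
    model.add(layers.Dense(2, activation="sigmoid"))

    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    build_model().summary()
```

With a two-unit output and binary cross-entropy, the labels need to be one-hot encoded (one column per category). Pooling after every single convolution would not be possible here, because eight halvings reduce a 128-pixel side below one pixel, which is why some grouping has to be assumed.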

2. Performance evaluation

Loss and accuracy

Confusion matrix on test set

Performance indices on test set

Confusion matrix on validation set

Performance indices on validation set
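
For reference, common performance indices can be derived directly from the four cells of a binary confusion matrix. The sketch below assumes pneumonia is the positive class and that the indices of interest are accuracy, precision, recall and F1; the project's own notebook may report a different set. The function name `performance_indices` and the counts in the usage example are hypothetical and are not the project's results.

```python
def performance_indices(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp / fn: pneumonia images classified as pneumonia / as normal
    tn / fp: normal images classified as normal / as pneumonia
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for a 16-image set, for illustration only.
    for name, value in performance_indices(tp=8, fp=2, fn=0, tn=6).items():
        print(f"{name}: {value:.3f}")
```

On a set as small as 16 images, a single misclassified image shifts accuracy by more than six percentage points, so these indices are best read together with the confusion matrix itself.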

3. Further development

Further train the model on additional data
Model optimization: constructing simpler models that reach similar metric performance
Building a RESTful API to be deployed in a web-based environment (e.g. Heroku, Azure, etc.)
Completing the API with a web-based interface (e.g. using Streamlit) that allows users to upload X-ray images and obtain a pneumonia diagnosis
Extending model to include other types of pathologies (i.e. multiclass classification including other respiratory diseases)
