Computer_vision_pneumonia_x_ray

Authors of the project : Kai Yung TAN (Adam) & Jean Christophe Meunier

1. Purpose and project objective

Purpose

  • Learning how to design and evaluate a custom-made convolutional neural network (CNN) for practical purposes
  • Using CNN models to analyse X-ray images
  • Designing a CNN capable of recognising pneumonia in chest X-rays of patients

Objectives

  • Consolidate knowledge of Python, specifically TensorFlow/Keras, NumPy, Pandas, Matplotlib,...
  • To be able to search for and implement new libraries
  • Consolidate knowledge of data science and machine/deep learning algorithms for developing an accurate classification model
  • To be able to perform appropriate hyperparameter tuning

Features

Must-have

  • A CNN trained on a large X-ray dataset (>5k images) that can recognise new images outside of the training set
  • Proper model evaluation (split dataset, confusion matrix, etc)
  • Visualisations of model results (properly labeled, titled...)

Nice-to-Have

  • A visualisation of the feature maps of the model
  • Comparison with other CNN model structures
  • Assessing and comparing

Context of the project

  • All the work was done during BeCode's AI/data science bootcamp 2020-2021

2. The project

Working plan and steps

1. Research

2. Data collection

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
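
The folder layout described above can be checked with a short script. This is a minimal sketch, not the authors' code: the function name `count_images` and the root path in the usage comment are hypothetical, and the script assumes a `root/split/category/file` layout with `.jpg` or `.jpeg` files.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored as .jpg or .jpeg files.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def count_images(root):
    """Return {(split, category): n_images} for a root/split/category/file layout."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split = path.parent.parent.name   # e.g. train, test or val
            category = path.parent.name       # e.g. Pneumonia or Normal
            counts[(split, category)] += 1
    return dict(counts)


# Usage (hypothetical path):
# for (split, category), n in sorted(count_images("chest_xray").items()):
#     print(f"{split:5s} {category:10s} {n}")
```

Counting per split and per category is a quick way to spot class imbalance before training.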

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

  • Examples of data input

3. Data manipulation

  • Image size reduction: the original JPEG images were resized to 128 x 128 pixels to speed up data processing during model training

  • Standardisation of the images

  • Data augmentation using the OpenCV (cv2) library and the Keras 'ImageDataGenerator' class in order to improve training quality
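
The resizing and standardisation steps above could look roughly like the sketch below. Several points are assumptions, since the README does not give them: reading the images as grayscale, interpreting "standardisation" as rescaling pixel intensities to [0, 1], and the helper names `scale_pixels` and `load_image`. OpenCV is imported lazily so the scaling helper can be used without it.

```python
TARGET_SIZE = (128, 128)  # (width, height) used for all images, as stated above


def scale_pixels(pixels):
    """Map 8-bit intensities (0-255) to [0, 1]; works on numbers and NumPy arrays."""
    return pixels / 255.0


def load_image(path):
    """Read one image, resize it to TARGET_SIZE and rescale it (requires opencv-python)."""
    import cv2  # lazy import: only needed when images are actually loaded

    image = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)  # assumption: grayscale input
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    resized = cv2.resize(image, TARGET_SIZE, interpolation=cv2.INTER_AREA)
    return scale_pixels(resized.astype("float32"))
```

`cv2.INTER_AREA` is a common choice when shrinking images; the interpolation actually used in the project is not documented here.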

4. Modelling

In total, 17 models were built, trained and compared using various hyperparameter settings (see notebook section):

  • depth of the neural network
  • type of layers (dense, convolutional,...)
  • filters (number, size, padding, etc.)
  • type of activation (e.g. ReLU, Leaky ReLU, sigmoid, softmax,...)
  • dropout
  • pooling
  • batch normalization

For each model, the hyperparameters were fine-tuned based on the performance indices on the test data set (624 pictures). When a model reached a satisfying accuracy, it was finally rerun on the validation set (16 pictures).

The best-fitting model was chosen partly based on good performance on the train and test data sets, but mostly on its performance on the validation data set.

Final best fitting model

1. Model architecture

  • 8 convolution layers (filters=32/32/32/64/64/64/128/128, kernel_size=(3, 3), activation=Leaky ReLU)
  • MaxPool2D((2, 2))
  • Dropout(0.25) on all layers except the last one
  • Flatten
  • 1 dense layer (1024, activation='relu')
  • Output layer: Dense(2, activation='sigmoid')
  • Dropout(0.5)
  • loss='binary_crossentropy', optimizer='adam'
  • shuffle = True
  • data augmentation: rotation_range = 20, zoom_range = 0.2, width_shift_range = 0.2, height_shift_range = 0.2, horizontal_flip = True, vertical_flip = True
  • Batch size : 16
  • Epochs : 100
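
The settings listed above can be assembled into a Keras model along the following lines. This is a sketch under stated assumptions, not the authors' notebook code: the list does not say where the pooling layers sit, so pooling after the 3rd, 6th and 8th convolution is assumed; `padding="same"`, a single-channel 128 x 128 input, placing `Dropout(0.5)` before the output layer, and the names `build_model` and `build_augmenter` are assumptions as well. TensorFlow is imported lazily inside the functions.

```python
CONV_FILTERS = [32, 32, 32, 64, 64, 64, 128, 128]
POOL_AFTER = {2, 5, 7}  # assumption: pool after each group of equal filter counts

AUGMENTATION = dict(
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)


def build_model(input_shape=(128, 128, 1)):
    """Build and compile a CNN following the listed settings (requires TensorFlow)."""
    from tensorflow.keras import Sequential, layers  # lazy import

    model = Sequential()
    model.add(layers.Input(shape=input_shape))  # assumption: grayscale input
    for i, filters in enumerate(CONV_FILTERS):
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.LeakyReLU())
        if i in POOL_AFTER:
            model.add(layers.MaxPooling2D((2, 2)))
        if i < len(CONV_FILTERS) - 1:  # Dropout(0.25) on all but the last conv layer
            model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))  # assumption: applied before the output layer
    model.add(layers.Dense(2, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def build_augmenter():
    """Create an ImageDataGenerator with the listed augmentation settings."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator  # lazy import

    return ImageDataGenerator(**AUGMENTATION)


# Usage (x_train: float array of shape (n, 128, 128, 1); y_train: one-hot, shape (n, 2)):
# model = build_model()
# batches = build_augmenter().flow(x_train, y_train, batch_size=16, shuffle=True)
# model.fit(batches, epochs=100)
```

Because the output layer has two sigmoid units trained with binary cross-entropy, the labels need to be one-hot encoded with two columns rather than given as a single 0/1 column.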

2. Performance evaluation

  • Loss and accuracy

  • Confusion matrix on test set

  • Performance indices on test set

  • Confusion matrix on validation set

  • Performance indices on validation set
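
For reference, a confusion matrix and the usual performance indices for a binary classifier can be computed as in the sketch below. The README does not state which indices were reported, so accuracy, precision, recall and F1 are assumed here; the function names and the toy labels in the usage comment are hypothetical, with 1 standing for pneumonia and 0 for normal.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classification."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def performance_indices(counts):
    """Derive accuracy, precision, recall and F1 from confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Usage with toy labels (not project results):
# counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 1])
# print(counts, performance_indices(counts))
```

In a screening context, recall on the pneumonia class is usually the index to watch, since a false negative means a missed case.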

3. Further development

  • Further train the model on additional data
  • Model optimization: constructing simpler models that reach similar metric performance
  • Building a RESTful API to be deployed in a web-based environment (e.g. Heroku, Azure, etc.)
  • Completing the API with a web-based interface (e.g. using Streamlit) that allows uploading X-ray images to get a pneumonia diagnosis
  • Extending model to include other types of pathologies (i.e. multiclass classification including other respiratory diseases)
