flamingofugang / azurechestxraynoaml

This project forked from georgeaccnt-gh/azurechestxraynoaml

Azure Chest X-Ray project outside AML

License: MIT License

HTML 24.59% Jupyter Notebook 74.86% Python 0.11% Roff 0.45%

Azure Chest X-Ray project outside AML

Author: George Iordanescu

Introduction

This repository contains the code related to the blog post: Using Microsoft AI to Build a Lung-Disease Prediction Model using Chest X-Ray Images, by Xiaoyong Zhu, George Iordanescu, and Ilia Karmanov, data scientists at Microsoft, and Mazen Zawaideh, radiology resident at the University of Washington Medical Center. While the blog repo was developed using the Azure Machine Learning Services (AML) Workbench, this repository provides the non-AML version of the code. It shows how to use the Microsoft AI platform to build advanced analytics solutions with open source tools such as Docker, Jupyter notebooks, and deep learning frameworks like Keras (with a TensorFlow backend) on multi-GPU Azure Deep Learning Virtual Machines (DLVM).

Understanding the inner workings of training and deploying deep learning models is important for developing new models. It also highlights the benefits of using Azure Machine Learning Services for training, operationalization, and model management.

Step by step instructions:

  • Deploy an Azure Deep Learning Virtual Machine (DLVM)
  • Open up ports for ssh, plus 2 Jupyter Notebook servers (one plain and the other one used for building the dockerized training and scoring scripts)
  • Add disks or expand the current ones as needed (you will need several hundred GB to store data and images)
  • Move/download NIH Chest X-Ray data (12 images_xxx.tar.gz files) into an Azure blob storage account. Make sure you know the container address and its key
  • We will use a Jupyter notebook running on the provisioned Azure DLVM to create the training docker image
  • The training Docker image will run in a container on the same DLVM. We'll connect to it via a second Jupyter Notebook server, and we will develop the training script and train a deep learning model for image classification.
  • The trained model and its associated scoring script will then be deployed via a scoring Docker image on an Azure Kubernetes Service (AKS) cluster.
  • Log in (ssh) to the VM and create the project base directory structure:
sudo mkdir -p /data/datadrive01
sudo chmod -R ugo=rwx  /data/datadrive01/
sudo mkdir -p /data/datadrive01/prj
sudo mkdir -p /data/datadrive01/data
sudo chmod -R ugo=rwx  /data/datadrive01/
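Since the data and images need several hundred GB, it can help to confirm that the data drive has enough free space before downloading anything. The following is a minimal sketch, assuming a POSIX `df`; the function name `free_gb` and the 300 GB threshold in the usage comment are illustrative and not part of the project:

```shell
# Print the free space, in whole GB, of the filesystem that holds the given directory.
# free_gb is a hypothetical helper name, not part of the project.
free_gb() {
    # -P forces the one-line-per-filesystem POSIX format, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Example (the 300 GB threshold is an arbitrary illustration):
# [ "$(free_gb /data/datadrive01)" -ge 300 ] || echo "consider adding or expanding a disk"
```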
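  • Add your user to the docker group, then log in to Docker Hub:
sudo groupadd docker
sudo usermod -aG docker $USER
# log out and back in so that the group change takes effect, then:
docker login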
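A group change made with `usermod` only applies to new login sessions, so docker commands may still fail with a permission error until you log out and back in. The following is a minimal sketch for checking whether the current session already has the group; `session_in_group` is a hypothetical helper name:

```shell
# Succeed if the current shell session is a member of the given group.
# session_in_group is a hypothetical helper name, not part of the project.
session_in_group() {
    # id -nG (without a user argument) prints the group names of the current process;
    # put one name per line and look for an exact, literal match.
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# Example:
# session_in_group docker || echo "log out and back in before running docker without sudo"
```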
  • Update/install a few system libs:
sudo apt-get update
pip install --upgrade pip
sudo apt-get install tmux
pip install -U python-dotenv
  • Clone the project:
cd /data/datadrive01/prj/
git clone https://github.com/georgeAccnt-GH/AzureChestXRayNoAML.git
sudo chmod -R ugo=rwx  /data/datadrive01/
  • The project code structure is shown below. Data will be downloaded by notebooks and stored in /data/datadrive01/data/:
ls -l /data/datadrive01/prj/AzureChestXRayNoAML/
total 16
drwxrwxrwx 6 loginVM_001 loginVM_001 4096 Sep 26 17:08 code
drwxrwxrwx 2 loginVM_001 loginVM_001 4096 Sep 26 16:56 docker
-rwxrwxrwx 1 loginVM_001 loginVM_001 1161 Sep 26 16:56 LICENSE
-rwxrwxrwx 1 loginVM_001 loginVM_001 2828 Sep 26 16:56 README.md
  • Start the base Jupyter notebook server on the VM (inside tmux if you want to keep CLI control):
cd /data/datadrive01/
# create the tmux session, or attach to it if it already exists
tmux new-session -A -s jupyter_srvr
jupyter notebook --notebook-dir=$(pwd) --ip='*' --port=10002 --no-browser --allow-root
  • If you cannot save notebooks, run these commands to grant write rights on your directories:
sudo chmod -R ugo=rwx  /data/datadrive01/

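# change ownership of the project directory tree
sudo chown -R loginVM_001:loginVM_001 /data/datadrive01/prj/

# change ownership of regular files under the current directory
# (-exec handles file names with spaces, unlike piping to xargs)
sudo find . -type f -exec chown loginVM_001:loginVM_001 {} +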
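Before changing ownership recursively, it can be useful to see which entries are actually owned by someone else. The following is a minimal sketch using only POSIX `find`; `not_owned_by` is a hypothetical helper name:

```shell
# List every file and directory under a path that is NOT owned by the given user.
# An empty result suggests ownership is unlikely to be what blocks saving notebooks.
not_owned_by() {
    owner=$1
    root=$2
    find "$root" ! -user "$owner" -print
}

# Example, using the login name from the commands above:
# not_owned_by loginVM_001 /data/datadrive01/prj/
```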

  • You can connect to the base Jupyter notebook server from your local machine by using the appropriate port (e.g. 10002 below) and the token reported by the server on the VM in the tmux session:
http://ghiordtlvisgpvm.southcentralus.cloudapp.azure.com:10002/some_token
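If the startup output has scrolled out of the tmux session, running `jupyter notebook list` on the VM should print the URL of each running classic notebook server again. The sketch below pulls the token out of such a line. It assumes the URL carries a `token=` query parameter, which may differ from the URL form shown above depending on the Jupyter version, and `extract_token` is a hypothetical helper name:

```shell
# Read Jupyter server lines on stdin and print the value of each token= parameter.
# Lines without a token= parameter produce no output.
extract_token() {
    sed -n 's/.*[?&]token=\([0-9A-Za-z]*\).*/\1/p'
}

# Example:
# jupyter notebook list | extract_token
```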
  • Go to AzureChestXRayNoAML/code/00_create_docker_image.ipynb and generate the training Dockerfile and the associated Docker image. The command to start the training Docker container is printed towards the end of the notebook, as shown below. Port 10003 is the one used for the second (training, dockerized) Jupyter notebook server, and it should match the third port opened on the VM.
nvidia-docker run -i -t -p 10003:8888 -v $(pwd):/local_dir:rw ...
  • You can run the above command in a new tmux session:
tmux new -s jupyter_srvr_docker
sudo nvidia-docker run -i -t -p 10003:8888 -v $(pwd):/local_dir:rw georgedockeraccount/chestxray-no-aml-gpu:1.0.1 /bin/bash -c "/opt/conda/bin/jupyter notebook --notebook-dir=/local_dir --ip=* --port=8888 --no-browser --allow-root"
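In `-p 10003:8888`, the first number is the host port that has to match a port opened on the VM, while 8888 is the port Jupyter listens on inside the container. The following is a minimal sketch that prints the same command for a different host port or image tag without running it; `training_container_cmd` is a hypothetical helper name:

```shell
# Print (do not run) the training container command for a given host port and image.
# training_container_cmd is a hypothetical helper name, not part of the project.
training_container_cmd() {
    host_port=$1
    image=$2
    # Single quotes keep $(pwd) literal, so it is expanded only when the printed
    # command is actually executed from the desired directory.
    printf 'sudo nvidia-docker run -i -t -p %s:8888 -v "$(pwd)":/local_dir:rw %s /bin/bash -c "/opt/conda/bin/jupyter notebook --notebook-dir=/local_dir --ip=* --port=8888 --no-browser --allow-root"\n' "$host_port" "$image"
}

# Example:
# training_container_cmd 10003 georgedockeraccount/chestxray-no-aml-gpu:1.0.1
```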
  • You can connect to the training dockerized Jupyter notebook server from your local machine by using the other port (e.g. 10003 below) and the token reported by the server on the VM in the tmux session:
http://ghiordtlvisgpvm.southcentralus.cloudapp.azure.com:10003/some_token
  • Follow the project notebooks (use edit_python_files.ipynb to edit .py files as needed):
    • AzureChestXRayNoAML/code/01_DataPrep/001_get_data.ipynb to get the data from the storage account to which you downloaded the NIH image data and auxiliary files (BBox_List_2017.csv, blacklist.csv and Data_Entry_2017.csv)
    • AzureChestXRayNoAML/code/02_Model/000_preprocess.ipynb
    • AzureChestXRayNoAML/code/02_Model/010_train.ipynb
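Before preprocessing, it may be worth confirming that the auxiliary files named above and the 12 image archives actually arrived. The following is a minimal sketch; the directory is passed as an argument because the exact location used by the notebooks is not stated here, and both helper names are hypothetical:

```shell
# Print the name of each expected auxiliary CSV file that is missing from a directory.
missing_aux_files() {
    for name in BBox_List_2017.csv blacklist.csv Data_Entry_2017.csv; do
        if [ ! -f "$1/$name" ]; then
            echo "$name"
        fi
    done
}

# Count the images_xxx.tar.gz archives in a directory (12 are expected).
count_image_archives() {
    count=0
    for f in "$1"/images_*.tar.gz; do
        # when nothing matches, the unexpanded pattern is returned, so test for existence
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example (the directory is whichever one holds your copy of the data):
# missing_aux_files /data/datadrive01/data
# count_image_archives /data/datadrive01/data
```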
