CS598DL4H

Project for CS 598 Deep Learning for Healthcare

Local Setup

Submodules

git submodule init
git submodule update

Virtual Environments

CS598DL4H

pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt -r dev-requirements.txt

Use env/bin/python as the kernel for MIMIC_III.ipynb.

caml-mimic

pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt

Use caml-mimic/env/bin/python as the kernel for caml-mimic/notebooks/dataproc_mimic_III.ipynb.

Explainable-Automated-Medical-Coding

pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt

Use Explainable-Automated-Medical-Coding/env/bin/python as the kernel for Explainable-Automated-Medical-Coding/HLAN/demo_HLAN_viz.ipynb.

Google Colab Setup

See Setup.ipynb for the Google Colab-only setup steps. The Colab-only header sections in each notebook refer to this setup.

Repro Steps

Prerequisites / Demo

MIMIC_III.ipynb

Training

Examples are detailed in Explainable-Automated-Medical-Coding/README.md.

cd Explainable-Automated-Medical-Coding/HLAN/
source ../env/bin/activate

MIMIC-III Top 50

Training currently uses Explainable-Automated-Medical-Coding/datasets/mimiciii_*_50_th0.txt.

Original
cd Explainable-Automated-Medical-Coding/HLAN/
../env/bin/python HAN_train.py \
    --dataset mimic3-ds-50 \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=False \
    --num_epochs 100 \
    --report_rand_pred=False \
    --running_times 1 \
    --early_stop_lr 0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir ../checkpoints/checkpoint_HAN_50_per_label_bs32_LE/ \
    --use_sent_split_padded_version=False \
    --marking_id 50-hlan \
    --gpu=True  # Colab only
Clone

See the scripts directory for the operational parameters of the HLAN, HA-GRU, and HAN variants, with and without label embedding (LE):

  • scripts/HLAN+LE.sh
  • scripts/HA-GRU+LE.sh
  • scripts/HAN+LE.sh
  • scripts/HLAN.sh
  • scripts/HA-GRU.sh
  • scripts/HAN.sh

MIMIC-III COVID-19 Shielding

NOTE: This section is incomplete due to reproducibility challenges with the COVID-19 shielding data, which is sourced from the UK's NHS and mapped from ICD-10 to ICD-9 using tools from the Government of New Zealand.

Preprocessing is needed to extract only the admission IDs of admissions that contain COVID-19-related ICD-9 codes. These codes are derived from the COVID-19-related ICD-10 codes in ./spl-icd10-opcs4-disease-groups-v2.0.csv, using the ICD-10-to-ICD-9 mapping in ./masterb8.csv. The process needs to generate CSV files akin to caml-mimic/mimicdata/mimic3/*_50.csv, which csv_to_text.py then converts to Explainable-Automated-Medical-Coding/datasets/mimiciii_*_full_th_50_covid_shielding.txt, the files that training expects.
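The filtering step described above could be sketched roughly as follows. This is only an illustration, not the project's preprocessing code: the function names, the in-memory data shapes, and the toy codes are all hypothetical, and the real column layouts of the two CSV files are not assumed here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def shielding_icd9_codes(
    icd10_codes: Iterable[str],
    icd10_to_icd9: Dict[str, List[str]],
) -> Set[str]:
    """Translate shielding-related ICD-10 codes into the ICD-9 codes they map to."""
    icd9: Set[str] = set()
    for code in icd10_codes:
        # ICD-10 codes with no mapping entry are skipped.
        icd9.update(icd10_to_icd9.get(code, []))
    return icd9


def shielding_admissions(
    diagnoses: Iterable[Tuple[str, str]],
    icd9_codes: Set[str],
) -> Set[str]:
    """Return the admission IDs that have at least one diagnosis in icd9_codes."""
    return {hadm_id for hadm_id, code in diagnoses if code in icd9_codes}


# Toy data with made-up codes and IDs (hypothetical, not real ICD codes).
icd10 = ["A10", "B20", "C30"]
mapping = {"A10": ["100"], "B20": ["200", "201"]}  # "C30" has no mapping
diagnoses = [("adm1", "100"), ("adm1", "999"), ("adm2", "999"), ("adm3", "201")]

codes = shielding_icd9_codes(icd10, mapping)
admissions = shielding_admissions(diagnoses, codes)
print(sorted(admissions))
```

The resulting set of admission IDs would then be used to select the rows that go into the generated CSV files.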

Original
cd Explainable-Automated-Medical-Coding/HLAN/
../env/bin/python HAN_train.py \
    --dataset mimic3-ds-shielding-th50 \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=False \
    --num_epochs 100 \
    --report_rand_pred=False \
    --running_times 1 \
    --early_stop_lr 0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir ../checkpoints/checkpoint_HAN_shielding_per_label_bs32_LE/ \
    --use_sent_split_padded_version=False \
    --marking_id shielding-hlan \
    --gpu=True  # Colab only

Code Changes

Refactoring of HAN Class

  • Original source: Explainable-Automated-Medical-Coding/HLAN/HAN_model_dynamic.py
  • Refactored source: HLAN/HAN_model_dynamic.py

Original Class

The original class is a single implementation with the responsibility for HLAN (Hierarchical Label-wise Attention Network), as well as both downgraded models, HA-GRU (Hierarchical Attention - Gated Recurrent Unit) and HAN (Hierarchical Attention Network). In addition, it handles the transparent application of Label Embedding (LE) to each, by conditional application of a pre-trained word2vec model.

The important callout in this diagram is the number of <method> and <method>_per_label pairs that exist, indicative of an imperative implementation. Concretely, where to apply label-wise attention (i.e., attention per label) is the primary difference among the model variants HAN, HA-GRU, and HLAN.

Replace Conditional with Polymorphism

The first refactoring was Replace Conditional with Polymorphism. This allowed all the <method> and <method>_per_label pairs to be modeled, instead, with an inheritance hierarchy from the simplest model (HAN, which applies no label-wise attention) to the most complex (HLAN, which applies label-wise attention at the sentence and word level).
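A minimal sketch of the idea, not the refactored source itself: each variant overrides only the attention step it changes, so no per_label conditionals are needed. The class and method names are hypothetical, the returned strings stand in for real TensorFlow graph code, and the sketch assumes HA-GRU applies label-wise attention at the sentence level only, as the per_label_sent_only flag suggests.

```python
class HAN:
    """Simplest variant: attention is shared across labels at both levels."""

    def word_attention(self) -> str:
        return "shared"

    def sentence_attention(self) -> str:
        return "shared"


class HAGRU(HAN):
    """Assumed here to add label-wise attention at the sentence level only."""

    def sentence_attention(self) -> str:
        return "label-wise"


class HLAN(HAGRU):
    """Most complex variant: label-wise attention at the word level as well."""

    def word_attention(self) -> str:
        return "label-wise"


def describe(model: HAN) -> str:
    # The caller never checks which variant it has; dispatch does the work.
    return (
        f"{type(model).__name__}: word={model.word_attention()}, "
        f"sentence={model.sentence_attention()}"
    )


for variant in (HAN(), HAGRU(), HLAN()):
    print(describe(variant))
```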

Form Template Method

The second refactoring applied was Form Template Method. This allowed a great deal of duplication to be removed by breaking the original class's methods into many finer-grained ones. With this change, commonality among methods defined by more than one class became apparent, and all common methods could be pushed up the inheritance hierarchy as a Template Method.
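As a rough illustration of Form Template Method (with hypothetical names and placeholder strings rather than the actual model code), the base class below fixes the order of the fine-grained steps once, and a subclass overrides only the steps that differ:

```python
from typing import List


class HANTemplate:
    """Base variant: owns the skeleton and the default (shared) steps."""

    def build(self) -> List[str]:
        # Template method: the sequence of steps is defined here, once.
        return [
            self.word_encoder(),
            self.word_attention(),
            self.sentence_encoder(),
            self.sentence_attention(),
            self.output_layer(),
        ]

    def word_encoder(self) -> str:
        return "word encoder"

    def word_attention(self) -> str:
        return "shared word attention"

    def sentence_encoder(self) -> str:
        return "sentence encoder"

    def sentence_attention(self) -> str:
        return "shared sentence attention"

    def output_layer(self) -> str:
        return "output layer"


class HLANTemplate(HANTemplate):
    """Overrides only the two attention steps; everything else is inherited."""

    def word_attention(self) -> str:
        return "label-wise word attention"

    def sentence_attention(self) -> str:
        return "label-wise sentence attention"


print(HLANTemplate().build())
```

Because build() lives only in the base class, the encoder and output steps are written once instead of once per variant.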

Deduplication Results

Together, the two refactorings allowed approximately a 40% reduction in lines of code and a 75% reduction in words, for a functionally equivalent implementation, as shown.

Lines
$ wc -l Explainable-Automated-Medical-Coding/HLAN/HAN_model_dynamic.py
    1193 Explainable-Automated-Medical-Coding/HLAN/HAN_model_dynamic.py
$ wc -l HLAN/HAN_model_dynamic.py
     698 HLAN/HAN_model_dynamic.py
Words
$ wc -w Explainable-Automated-Medical-Coding/HLAN/HAN_model_dynamic.py
    6577 Explainable-Automated-Medical-Coding/HLAN/HAN_model_dynamic.py
$ wc -w HLAN/HAN_model_dynamic.py
    1678 HLAN/HAN_model_dynamic.py
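The percentages can be checked directly from the wc counts above. The helper name below is just for illustration:

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


lines = reduction_percent(1193, 698)    # wc -l counts
words = reduction_percent(6577, 1678)   # wc -w counts

print(f"lines: {lines:.1f}%")  # lines: 41.5%
print(f"words: {words:.1f}%")  # words: 74.5%
```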
