CLOCS is a patient-specific contrastive learning method for pre-training models on medical time-series data, which can improve the generalization performance of downstream supervised tasks.
The method is described in the paper "CLOCS: Contrastive Learning of Cardiac Signals".
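As a rough illustration of the idea (a schematic sketch under my own assumptions, not the authors' implementation), CMSC-style pretraining splits each patient's recording into temporally adjacent segments and treats segments from the same patient as positive pairs for the contrastive loss:

```python
import numpy as np

def make_cmsc_pairs(recording, n_segments=2):
    """Split one patient's recording into equal, non-overlapping
    temporal segments. Every pair of segments from the same patient
    is treated as a positive pair for the contrastive objective.
    (Hypothetical helper for illustration only.)"""
    seg_len = len(recording) // n_segments
    segments = [recording[i * seg_len:(i + 1) * seg_len]
                for i in range(n_segments)]
    # All unordered segment pairs from this patient are positives;
    # segments from different patients would serve as negatives.
    positives = [(a, b) for i, a in enumerate(segments)
                 for b in segments[i + 1:]]
    return segments, positives

# Toy example: a 2500-sample single-lead signal split into two segments.
signal = np.random.randn(2500)
segments, positives = make_cmsc_pairs(signal, n_segments=2)
```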
The CLOCS code requires:
- Python 3.6 or higher
- PyTorch 1.0 or higher

(The authors did not document the remaining package requirements, so I provide a working conda environment.yml file for easy setup.)
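For reference, an environment file along these lines can be created with `conda env create -f environment.yml` (the package list and versions below are illustrative assumptions, not the exact pinned environment shipped with this repo):

```yaml
name: clocs
channels:
  - pytorch
  - defaults
dependencies:
  - python>=3.6
  - pytorch>=1.0
  - numpy
  - pandas
  - scikit-learn
```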
The datasets can be downloaded from the following links:
- PhysioNet 2020: https://physionetchallenges.github.io/2020/
- Chapman: https://figshare.com/collections/ChapmanECG/4560497/2
- Cardiology: https://irhythm.github.io/cardiol_test_set/
- PhysioNet 2017: https://physionet.org/content/challenge-2017/1.0.0/
In order to pre-process the datasets appropriately for CLOCS and the downstream supervised tasks, please refer to the following repository: https://anonymous.4open.science/r/9ecc66f3-e173-4771-90ce-ff35ee29a1c0/
To train the model(s) in the paper, run this command:

```shell
python run_experiments.py
```

To evaluate the model(s) in the paper, run the same command (training and evaluation are both driven by the parameters set at the bottom of the script):

```shell
python run_experiments.py
```
- Run the `download_data.py` script to download the PhysioNet 2020 data, which will be used for pretraining. The same script will also download the Chapman data, which will be used for fine-tuning during the transfer-learning phase. Note that you have to manually rename the spreadsheet in the `chapman_ecg` folder to `Diagnostics.xlsx` and then export a `.csv` file yourself (I should probably do this in the script).
- For the data-preparation stage, we also have to run some of the authors' scripts. For PhysioNet 2020, first run `load_physionet2020.py` and then `generates_frames_and_lables_phases.py`. For Chapman, just run `load_chapman_ecg.py`. I believe I have modified these scripts to automatically detect the current working directory for the file manipulations; if not, please check the paths yourself.
- Finally, we can train the model. For pretraining, first open
`run_experiments.py` and modify the parameters at the bottom of the file. Specifically, change `trial_to_run_list` to contain just `CMSC`, `CMLC`, or `CMSMLC`. The `to_load` list doesn't matter. For the `downstream_dataset` and `second_dataset`, use `physionet2020` for both. Finally, set the `labelled_fraction` to 1. Additionally, you can provide a list of embedding dimensions to train models of multiple sizes.
- After pretraining, we have to modify the script to do fine-tuning. Change `trial_to_run` to `Fine-Tuning` and the `second_dataset` to `chapman`. Then lower the `labelled_fraction` to 0.5 or below, otherwise you will get spuriously good results. Run the script again and it will produce fine-tuned models. By default it should run for about 80 epochs.
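The reason `labelled_fraction` matters is that fine-tuning only sees that fraction of the labelled training patients. A minimal sketch of patient-level subsampling (the function name and data layout here are my own assumptions for illustration, not the repo's actual code):

```python
import random

def subsample_patients(patient_ids, labelled_fraction, seed=0):
    """Keep only a fraction of patients' labels for fine-tuning.
    Sampling at the patient level (rather than the segment level)
    avoids leaking segments of the same patient into the labelled
    subset. (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)
    n_keep = max(1, int(len(patient_ids) * labelled_fraction))
    return sorted(rng.sample(patient_ids, n_keep))

# Toy example: keep half of 100 patients for the fine-tuning labels.
all_patients = list(range(100))
labelled = subsample_patients(all_patients, labelled_fraction=0.5)
```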