Running main.py will complete the following nine steps:
- Unpack 2020-10-21_gisaid-fasta.tar.bz2 [1]
tar -xjvf data/2020-10-21_gisaid-fasta.tar.bz2 -C fasta
- Create VCF files from FASTA files [1]
python 00_fasta.py
- Create JSON files from 2020-10-21_gisaid-patient.json [1]
python 00_recode.py
- Run VCF data processing scripts
python 01_long.py
python 02_wide.py
- Join VCF and JSON data
python 03_join.py
- Calculate variant frequency from VCF data
python 03_var-freq.py
- Clean data in preparation for modeling
python 04_clean.py
- Run logistic regression and create Figure 2
python 05_logit.py
- Create Figures 1, 3, and S1-S3
python 06_plot_variants.py
python 06_fig-s1.py
python 06_fig-s2.py
1: Datasets are not provided. Users must obtain data from GISAID after signing a GISAID Data Use Agreement.
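
The contents of main.py are not shown here, so the following is only a minimal sketch of how a driver script could chain the nine steps above in order. It assumes that each script is run as its own command from the repository root, that the fasta output directory already exists, and that the GISAID data files are in place. The names STEPS and run_pipeline are hypothetical and are not taken from the repository.

```python
import subprocess

# Step order and commands copied from the list above. Running each script as
# a separate command is an assumption, not a description of the real main.py.
STEPS = [
    ("Unpack FASTA archive",
     [["tar", "-xjvf", "data/2020-10-21_gisaid-fasta.tar.bz2", "-C", "fasta"]]),
    ("Create VCF files from FASTA files", [["python", "00_fasta.py"]]),
    ("Create JSON files from patient data", [["python", "00_recode.py"]]),
    ("Run VCF data processing scripts",
     [["python", "01_long.py"], ["python", "02_wide.py"]]),
    ("Join VCF and JSON data", [["python", "03_join.py"]]),
    ("Calculate variant frequency", [["python", "03_var-freq.py"]]),
    ("Clean data for modeling", [["python", "04_clean.py"]]),
    ("Run logistic regression and create Figure 2", [["python", "05_logit.py"]]),
    ("Create Figures 1, 3, and S1-S3",
     [["python", "06_plot_variants.py"],
      ["python", "06_fig-s1.py"],
      ["python", "06_fig-s2.py"]]),
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run every command of every step in order, stopping at the first failure."""
    for number, (label, commands) in enumerate(steps, start=1):
        print(f"[{number}/{len(steps)}] {label}")
        for command in commands:
            # With subprocess.run, check=True raises CalledProcessError on a
            # non-zero exit code, so later steps never see incomplete inputs.
            runner(command, check=True)


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    # Call run_pipeline() with no arguments to execute the steps for real.
    run_pipeline(runner=lambda command, check: print("  " + " ".join(command)))
```

Stopping at the first failing command matters here because each step consumes the files written by the one before it. The dry run is a quick way to confirm the order before any data is touched.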