The following two scripts will create a `databases` directory containing the sub-directories `databases/{kegg,kofamscan}`. If preferred, these output directories can be manually changed in the scripts.
The first script downloads the KEGG databases and places them in their corresponding sub-directories, `databases/kegg/{brite,pathways}`:

```bash
bash 00-download-kegg-dbs.sh
```
The second script downloads the KofamScan databases and places them in their sub-directory, `databases/kofamscan`:

```bash
bash 00-download-kofamscan-dbs.sh
```
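
For reference, the expected layout after running both scripts (directory contents omitted):

```
databases/
├── kegg/
│   ├── brite/
│   └── pathways/
└── kofamscan/
```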
The following script will download reference genomes using NCBI's `datasets` command-line tool. This will first need to be installed and available in the environment before running the script:

```bash
# Install the NCBI datasets command-line tool
mamba install -c conda-forge ncbi-datasets-cli
bash 00-download-ncbi-reference-genomes.sh
```
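
For context, the script's downloads boil down to `datasets` calls like the one below (the accession shown is a placeholder, not necessarily one the script uses):

```bash
# Placeholder accession for illustration; the script supplies its own list.
datasets download genome accession GCF_000005845.2
unzip -o ncbi_dataset.zip
```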
`kofamscan` may be installed using mamba:

```bash
mamba create -n kofamscan -c bioconda kofamscan pandas
conda activate kofamscan

# With the kofamscan env active
bash 01-kofamscan-mags-and-refs.sh
```
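
Under the hood, kofamscan annotation runs through its `exec_annotation` entry point; below is a minimal sketch of a single-genome call, assuming the database paths created by `00-download-kegg-dbs.sh` (file names are illustrative):

```bash
# Illustrative paths; 01-kofamscan-mags-and-refs.sh handles these per genome.
exec_annotation \
    --profile databases/kofamscan/profiles \
    --ko-list databases/kofamscan/ko_list \
    --format detail-tsv \
    --cpu 8 \
    -o genome.kofamscan.tsv \
    genome.faa
```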
```bash
mamba create -c bioconda -n autometa autometa -y
conda activate autometa

# With the autometa env active
bash 02-tabulate-kofamscan-results.sh
```
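
To confirm the tabulation step succeeded, the generated files can be previewed (paths match the defaults referenced below):

```bash
# First few columns of the results matrix:
head -n 5 processed/kofamscan_results_matrix.tsv | cut -f 1-5
# Long-format table plus the transformed embeddings:
ls -lh processed/kofamscan_results_table.tsv \
    processed/kofamscan_results_{clr,ilr}_{umap,densmap,bhsne}.tsv
```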
```bash
mamba env create -f feature-analysis-app.environment.yml
conda activate feature-analysis-app
```
matrix="processed/kofamscan_results_matrix.tsv"
table="processed/kofamscan_results_table.tsv"
# The below embedding paths are files generated by:
# 02-tabulate-kofamscan-results.sh
# Choose one of the paths represented in $embedding below
embedding="processed/kofamscan_results_{clr,ilr}_{umap,densmap,bhsne}.tsv"
python src/feature-analysis-app.py \
--matrix $matrix \
--table $table \
--embedding $embedding
```console
(feature-analysis-app) evan@userserver:~/metabolismCoDa$ ./src/feature-analysis-app.py -h
usage: feature-analysis-app.py [-h] --matrix MATRIX --table TABLE --embedding EMBEDDING [--debug]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       path to kofamscan_results_matrix.tsv
  --table TABLE         path to kofamscan_results_table.tsv
  --embedding EMBEDDING
                        path to kofamscan_results_embedding.tsv
  --debug               Set app.debug to True
```
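
Since the help output exposes no `--host`/`--port` flags, the app presumably serves on Dash's stock defaults; assuming `127.0.0.1:8050`, a quick reachability check might be:

```bash
# Assumption: Dash's default bind address; check the script if it differs.
curl -I http://127.0.0.1:8050
```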
```bash
mamba env create -f explainer-dashboard.environment.yml
conda activate explainer-dashboard-app
```
```console
(explainer-dashboard-app) evan@userserver:~/metabolismCoDa$ python src/explainer-dashboard.py -h
usage: explainer-dashboard.py [-h] --matrix MATRIX --ko-data KO_DATA [--factor-name FACTOR_NAME]
                              [--n-estimators N_ESTIMATORS] [--n-jobs N_JOBS] [--host HOST] [--port PORT]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       Path to kofamscan_results_matrix.tsv (default: None)
  --ko-data KO_DATA     Path to metabolism_feature_analysis.tsv (downloaded from feature-analysis-app.py) (default: None)
  --factor-name FACTOR_NAME
                        Factor to use for modeling feature analysis (default: None)
  --n-estimators N_ESTIMATORS, -T N_ESTIMATORS
                        Number of trees to use for training RandomForestClassifier (default: 50)
  --n-jobs N_JOBS       Parallelizes jobs using joblib. For now only used for calculating permutation importances. (default: None)
  --host HOST           Host address to use for dashboard (default: 0.0.0.0)
  --port PORT           Port number to use for dashboard (default: 8855)
```
If unsure what factor names are available, omit the `--factor-name` argument and the program will print the available columns, then exit.
matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
python src/explainer-dashboard.py \
--matrix $matrix \
--ko-data $ko_data
Determining permutation importances and other metadata may take some time...
matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
factor_name="Endobugula Grouping"
python src/explainer-dashboard.py \
--matrix $matrix \
--ko-data $ko_data \
--factor-name "${factor_name}" \
--n-estimators 50 \
--n-jobs 48
```bash
# Syntax: ssh -L <localport>:<host>:<remoteport>
# Start a new tmux session on the remote host with the tunnel open:
ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC
# Or attach to an existing tmux session (-CC a):
ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC a
```
NOTE: Whatever is specified as `remoteport` should be provided via `--port` when calling `explainer-dashboard.py`.
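
For example, to tunnel and serve on a non-default port (9000 here is arbitrary):

```bash
# Local machine: forward local port 9000 to remote port 9000
ssh -L 9000:127.0.0.1:9000 deep-thought
# Remote host: match the tunneled remote port with --port
python src/explainer-dashboard.py \
    --matrix "processed/kofamscan_results_matrix.tsv" \
    --ko-data "metabolism_feature_analysis_data.tsv" \
    --port 9000
# Then browse to http://localhost:9000 locally
```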