Metabolism Compositional Data Analysis (CoDa)

Create KEGG ortholog counts matrix

Download KEGG and `kofamscan` databases

The following two scripts create a `databases` directory containing two sub-directories: `databases/{kegg,kofamscan}`. To use different output directories, edit the paths in the scripts.

The first script downloads the KEGG databases into `databases/kegg/{brite,pathways}`:

bash 00-download-kegg-dbs.sh

The second script downloads the `kofamscan` databases into `databases/kofamscan`:

bash 00-download-kofamscan-dbs.sh
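
Before moving on, it can help to confirm that both download scripts produced the directory layout described above. The following is a minimal Python sketch, assuming the default output locations were left unchanged; `missing_database_dirs` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# Sub-directories the two download scripts are described as creating
# (assumption: the default output locations were not changed in the scripts).
EXPECTED_SUBDIRS = ("kegg/brite", "kegg/pathways", "kofamscan")


def missing_database_dirs(root="databases"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_database_dirs()
    if missing:
        print("Missing database directories:", ", ".join(missing))
    else:
        print("All expected database directories are present.")
```

An empty result only means the directories exist; it does not verify that the downloads inside them completed.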

Download reference genomes

The following script downloads reference genomes using NCBI's `datasets` command-line tool.

The tool must be installed and available in the environment before running the script.

# Install the NCBI datasets command-line tool
mamba install -c conda-forge ncbi-datasets-cli

bash 00-download-ncbi-reference-genomes.sh

Annotate MAGs and reference genomes using `kofamscan`

`kofamscan` can be installed using mamba:

`kofamscan` environment setup

mamba create -n kofamscan -c bioconda kofamscan pandas

conda activate kofamscan

# with kofamscan env active
bash 01-kofamscan-mags-and-refs.sh

Create processed results of KEGG ortholog annotations

Tabulate `kofamscan` results environment setup

mamba create -c bioconda -n autometa autometa -y

conda activate autometa

Tabulate results

# with autometa env active
bash 02-tabulate-kofamscan-results.sh

Feature analysis app

Create `feature-analysis-app` env

mamba env create -f=feature-analysis-app.environment.yml

conda activate feature-analysis-app

Run feature analysis app

matrix="processed/kofamscan_results_matrix.tsv"
table="processed/kofamscan_results_table.tsv"
# The embedding files are generated by 02-tabulate-kofamscan-results.sh
# and follow the pattern:
#   processed/kofamscan_results_{clr,ilr}_{umap,densmap,bhsne}.tsv
# Choose exactly one of them (the pattern is not expanded inside quotes).
embedding="processed/kofamscan_results_clr_umap.tsv"

python src/feature-analysis-app.py \
    --matrix "$matrix" \
    --table "$table" \
    --embedding "$embedding"
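
The `{clr,ilr}_{umap,densmap,bhsne}` pattern stands for six candidate embedding files, and only one of them is passed to `--embedding`. As a minimal sketch, the candidates can be listed like this in Python; `embedding_paths` is a hypothetical helper, and the `processed` output directory is assumed from the paths shown above.

```python
from itertools import product

# Transform and embedding-method names taken from the file-name pattern.
TRANSFORMS = ("clr", "ilr")
METHODS = ("umap", "densmap", "bhsne")


def embedding_paths(outdir="processed"):
    """List every embedding path that the brace pattern stands for."""
    return [
        f"{outdir}/kofamscan_results_{transform}_{method}.tsv"
        for transform, method in product(TRANSFORMS, METHODS)
    ]


if __name__ == "__main__":
    for path in embedding_paths():
        print(path)
```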

Feature analysis app usage

(feature-analysis-app) evan@userserver:~/metabolismCoDa$ ./src/feature-analysis-app.py -h
usage: feature-analysis-app.py [-h] --matrix MATRIX --table TABLE --embedding EMBEDDING [--debug]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       path to kofamscan_results_matrix.tsv
  --table TABLE         path to kofamscan_results_table.tsv
  --embedding EMBEDDING
                        path to kofamscan_results_embedding.tsv
  --debug               Set app.debug to True

Explainer Dashboard app

Create `explainer-dashboard-app` env

mamba env create -f=explainer-dashboard.environment.yml

conda activate explainer-dashboard-app

Explainer dashboard app usage

(explainer-dashboard-app) evan@userserver:~/metabolismCoDa$ python src/explainer-dashboard.py -h
usage: explainer-dashboard.py [-h] --matrix MATRIX --ko-data KO_DATA [--factor-name FACTOR_NAME] [--n-estimators N_ESTIMATORS] [--n-jobs N_JOBS] [--host HOST] [--port PORT]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       Path to kofamscan_results_matrix.tsv (default: None)
  --ko-data KO_DATA     Path to metabolism_feature_analysis.tsv (downloaded from feature-analysis-app.py) (default: None)
  --factor-name FACTOR_NAME
                        Factor to use for modeling feature analysis (default: None)
  --n-estimators N_ESTIMATORS, -T N_ESTIMATORS
                        Number of trees to use for training RandomForestClassifier (default: 50)
  --n-jobs N_JOBS       Parallelizes jobs using joblib. For now only used for calculating permutation importances. (default: None)
  --host HOST           Host address to use for dashboard (default: 0.0.0.0)
  --port PORT           Port number to use for dashboard (default: 8855)

Example usage

Get available factor names

If you are unsure which factor names are available, omit the `--factor-name` argument; the program prints the available columns and then exits.

matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
python src/explainer-dashboard.py \
    --matrix "$matrix" \
    --ko-data "$ko_data"
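
The columns can also be inspected by hand. The following is a minimal sketch, assuming the candidate factor names are the column headers of the tab-separated `--ko-data` file (the usage text does not say which file the columns come from); `list_columns` is a hypothetical helper and is not part of `explainer-dashboard.py`.

```python
import csv
import sys


def list_columns(tsv_path):
    """Return the header fields of a tab-separated file ([] if it is empty)."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return next(reader, [])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python list_columns.py metabolism_feature_analysis_data.tsv
    for name in list_columns(sys.argv[1]):
        print(name)
```

A column name containing spaces, such as `Endobugula Grouping`, has to be quoted when passed to `--factor-name`.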

Run explainer-dashboard-app

Computing permutation importances and other metadata may take some time.

matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
factor_name="Endobugula Grouping"
python src/explainer-dashboard.py \
    --matrix "$matrix" \
    --ko-data "$ko_data" \
    --factor-name "${factor_name}" \
    --n-estimators 50 \
    --n-jobs 48

Tunnel/attach to remote

Create tunnel using tmux

# syntax
# ssh -L localport:host:remoteport
ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC

Attach to existing tunnel using tmux

ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC a

NOTE: The value used for `remoteport` must match the `--port` passed to `explainer-dashboard.py` (default: 8855).
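
Once the tunnel and the dashboard are both running, a quick check from the local machine is to test whether the forwarded port accepts connections. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and 8855 is assumed to be the local port from the `ssh -L` examples above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8855, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8855 is the dashboard's default --port and the local port in the tunnel.
    state = "reachable" if port_is_open() else "not reachable"
    print(f"127.0.0.1:8855 is {state}")
```

A successful connection only shows that something is listening on the local end of the tunnel; it does not confirm that the dashboard itself has finished starting.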
