Metabolism Compositional Data Analysis (CoDa)

Create KEGG ortholog counts matrix

Download KEGG and kofamscan databases

The following two scripts create a databases directory containing two sub-directories: databases/{kegg,kofamscan}. To use different output directories, edit the paths in the scripts.

The first script downloads the KEGG databases and places them in databases/kegg/{brite,pathways}

bash 00-download-kegg-dbs.sh

The second script downloads the kofamscan databases and places them in databases/kofamscan

bash 00-download-kofamscan-dbs.sh
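
After both downloads finish, it can be useful to confirm that the directory layout described above is in place before moving on. The following is a minimal sketch, not part of the repository: check_db_dirs is a hypothetical helper, and it assumes the default databases root unless another path is passed.

```shell
#!/bin/sh
# Sketch only: check_db_dirs is a hypothetical helper, not a repository script.
# It checks for the sub-directories the two download scripts are described
# as creating: kegg/brite, kegg/pathways and kofamscan.
check_db_dirs() {
    root="${1:-databases}"   # assumed default output root
    status=0
    for sub in kegg/brite kegg/pathways kofamscan; do
        if [ -d "$root/$sub" ]; then
            echo "ok: $root/$sub"
        else
            echo "missing: $root/$sub" >&2
            status=1
        fi
    done
    return "$status"
}
# Usage: check_db_dirs databases
```

The function returns a non-zero status if any expected sub-directory is absent, so it can gate later steps in a wrapper script.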

Download reference genomes

The following script downloads reference genomes using NCBI's datasets command-line tool.

The tool must be installed and available in the environment before running the script.

# Install the NCBI datasets tool
mamba install ncbi-datasets
# Download the reference genomes
bash 00-download-ncbi-reference-genomes.sh
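
Because the download script depends on the datasets executable being on PATH, a small guard can fail early with a clear message. This is a sketch under that assumption: require_tool is a hypothetical helper and is not part of the repository.

```shell
#!/bin/sh
# Sketch only: require_tool is a hypothetical helper, not a repository script.
# It succeeds when the named executable can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH; install it before continuing" >&2
    return 1
}
# Usage: require_tool datasets && bash 00-download-ncbi-reference-genomes.sh
```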

Annotate MAGs and reference genomes using kofamscan

kofamscan may be installed using mamba:

kofamscan environment setup

mamba create -n kofamscan -c bioconda kofamscan pandas
conda activate kofamscan
# with kofamscan env active
bash 01-kofamscan-mags-and-refs.sh

Create processed results of KEGG ortholog annotations

Tabulate kofamscan results environment setup

mamba create -c bioconda -n autometa autometa -y
conda activate autometa

Tabulate results

# with autometa env active
bash 02-tabulate-kofamscan-results.sh

Feature analysis app

Create feature-analysis-app env

mamba env create -f=feature-analysis-app.environment.yml
conda activate feature-analysis-app

Run feature analysis app

matrix="processed/kofamscan_results_matrix.tsv"
table="processed/kofamscan_results_table.tsv"
# The embedding files are generated by 02-tabulate-kofamscan-results.sh
# and follow the pattern:
#   processed/kofamscan_results_{clr,ilr}_{umap,densmap,bhsne}.tsv
# Choose ONE of them (braces are not expanded inside quotes), e.g.:
embedding="processed/kofamscan_results_clr_umap.tsv"

python src/feature-analysis-app.py \
    --matrix "$matrix" \
    --table "$table" \
    --embedding "$embedding"
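
The embedding file pattern above covers two transforms (clr, ilr) and three embedding methods (umap, densmap, bhsne), so there are six candidate paths. The sketch below enumerates them explicitly. list_embeddings is a hypothetical helper, not part of the repository, and the processed directory default is taken from the paths shown above.

```shell
#!/bin/sh
# Sketch only: list_embeddings is a hypothetical helper, not a repository script.
# It prints every path matching the documented naming pattern, one per line.
list_embeddings() {
    dir="${1:-processed}"
    for transform in clr ilr; do
        for method in umap densmap bhsne; do
            echo "$dir/kofamscan_results_${transform}_${method}.tsv"
        done
    done
}
# Usage: show only the embedding files that actually exist
# list_embeddings | while read -r f; do [ -f "$f" ] && echo "$f"; done
```

Any single line of the output can then be assigned to the embedding variable before launching the app.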

Feature analysis app usage

(feature-analysis-app) evan@userserver:~/metabolismCoDa$ ./src/feature-analysis-app.py -h
usage: feature-analysis-app.py [-h] --matrix MATRIX --table TABLE --embedding EMBEDDING [--debug]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       path to kofamscan_results_matrix.tsv
  --table TABLE         path to kofamscan_results_table.tsv
  --embedding EMBEDDING
                        path to kofamscan_results_embedding.tsv
  --debug               Set app.debug to True

Explainer Dashboard app

Create explainer-dashboard-app env

mamba env create -f=explainer-dashboard.environment.yml
conda activate explainer-dashboard-app

Explainer dashboard app usage

(explainer-dashboard-app) evan@userserver:~/metabolismCoDa$ python src/explainer-dashboard.py -h
usage: explainer-dashboard.py [-h] --matrix MATRIX --ko-data KO_DATA [--factor-name FACTOR_NAME] [--n-estimators N_ESTIMATORS] [--n-jobs N_JOBS] [--host HOST] [--port PORT]

options:
  -h, --help            show this help message and exit
  --matrix MATRIX       Path to kofamscan_results_matrix.tsv (default: None)
  --ko-data KO_DATA     Path to metabolism_feature_analysis.tsv (downloaded from feature-analysis-app.py) (default: None)
  --factor-name FACTOR_NAME
                        Factor to use for modeling feature analysis (default: None)
  --n-estimators N_ESTIMATORS, -T N_ESTIMATORS
                        Number of trees to use for training RandomForestClassifier (default: 50)
  --n-jobs N_JOBS       Parallelizes jobs using joblib. For now only used for calculating permutation importances. (default: None)
  --host HOST           Host address to use for dashboard (default: 0.0.0.0)
  --port PORT           Port number to use for dashboard (default: 8855)

Example usage

Get available factor names

If you are unsure which factor names are available, omit the --factor-name argument; the program will print the available columns and then exit.

matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
python src/explainer-dashboard.py \
    --matrix "$matrix" \
    --ko-data "$ko_data"

Run explainer-dashboard-app

Computing permutation importances and other metadata may take some time.

matrix="processed/kofamscan_results_matrix.tsv"
ko_data="metabolism_feature_analysis_data.tsv"
factor_name="Endobugula Grouping"
python src/explainer-dashboard.py \
    --matrix "$matrix" \
    --ko-data "$ko_data" \
    --factor-name "${factor_name}" \
    --n-estimators 50 \
    --n-jobs 48
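
The value 48 passed to --n-jobs above is specific to the machine the example was run on. One way to pick a value for another machine is to query the number of online CPUs. This is a sketch only: detect_n_jobs is a hypothetical helper, and it assumes getconf supports _NPROCESSORS_ONLN (as it does on common Linux and macOS systems), falling back to 1 otherwise.

```shell
#!/bin/sh
# Sketch only: detect_n_jobs is a hypothetical helper, not a repository script.
# It prints the number of online CPUs, or 1 if that cannot be determined.
detect_n_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output: fall back to 1
    esac
    echo "$n"
}
# Usage: python src/explainer-dashboard.py ... --n-jobs "$(detect_n_jobs)"
```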

Tunnel/attach to remote

Create tunnel using tmux

# syntax
# ssh -L localport:host:remoteport
ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC

Attach to existing tunnel using tmux

ssh -L 8855:127.0.0.1:8855 deep-thought -t /home/evan/miniconda3/bin/tmux -CC a

NOTE: Whatever is specified as remoteport must match the value passed to --port when calling explainer-dashboard.py (default: 8855).
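
To keep the forwarded port and the dashboard port in sync, both can be derived from one variable. The sketch below only prints the ssh forwarding command and does not open a connection. tunnel_cmd is a hypothetical helper, not part of the repository, and the tmux portion of the commands above is left out for brevity.

```shell
#!/bin/sh
# Sketch only: tunnel_cmd is a hypothetical helper, not a repository script.
# It prints an ssh command that forwards the same port number on both ends.
tunnel_cmd() {
    remote_host="$1"
    port="${2:-8855}"   # 8855 is the dashboard's documented default port
    echo "ssh -L ${port}:127.0.0.1:${port} ${remote_host}"
}
# Usage: tunnel_cmd deep-thought 8855
```

The same port value would then be passed to the dashboard with --port on the remote machine.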
