matt-sd-watson / spora

:mushroom: spora: Streamlined Phylogenomic Outbreak Report Analysis

Home Page: https://matt-sd-watson.github.io/spora/

License: GNU General Public License v3.0

Python 100.00%
covid-19 outbreak snakemake bioinformatics

An integrated Snakemake and Python workflow that generates intermediate files for COVID-19 outbreak analysis.

Installation

git clone https://github.com/matt-sd-watson/spora.git
conda env create -f spora/environments/environment.yml
conda activate ncov_spora
cd spora
pip install . 

Updating

conda activate ncov_spora
cd ~/spora
git checkout main
git pull
pip install . 

Usage

usage: 
    	spora -c <config.yaml> 
    	OR
    	spora --focal-sequences ...<input args>

spora: Streamlined Phylogenomic Outbreak Report Analysis

optional arguments:
  -h, --help            Show the help output and exit.
  -c CONFIG, --config CONFIG
                        Input config file in yaml format, all command line arguments can be passed via the config file.
  -f FOCAL_SEQS, --focal-sequences FOCAL_SEQS
                        Input .txt list or multi-FASTA focal samples for outbreak. Required
  -b BACKGROUND_SEQS, --background-sequences BACKGROUND_SEQS
                        Optional input .txt list or multi-FASTA background samples to add to analysis
  -m MASTER_FASTA, --master-fasta MASTER_FASTA
                        Master FASTA of genomic sequences to select from. Required if either --focal-sequences or --background-sequences are not supplied in FASTA format
  -o OUTDIR, --output-directory OUTDIR
                        Path to the desired output directory. If none is provided, a new folder named spora will be created in the current directory
  -r REFERENCE, --reference REFERENCE
                        .gb file containing the desired COVID-19 reference sequence. Required
  -p PREFIX, --prefix PREFIX
                        Prefix string to label all output files. Default: outbreak
  -t NTHREADS, --nthreads NTHREADS
                        Number of threads to use for processing. Default: 2
  -s, --snps-only       Generate a snps-only FASTA from the input FASTA. Default: False
  -rn, --rename         Rename the FASTA headers to be compatible with NML standards. Default: False
  -nc NAMES_CSV, --names-csv NAMES_CSV
                        Use the contents of a CSV to rename the input FASTA. Requires the following column headers: original_name, new_name
  -ncs, --no-constant-sites
                        Do not enable constant sites to be used for SNPs only tree generation. Default: Enabled
  -fi, --filter         Filter both the focal and background sequences based on genome completeness and length. Default: Not enabled
  -gc GENOME_COMPLETENESS, --genome-completeness GENOME_COMPLETENESS
                        Integer for the minimum genome completeness percentage for filtering. Default: 90
  -gl GENOME_LENGTH, --genome-length GENOME_LENGTH
                        Integer for the minimum genome length for filtering. Default: 29500
  -rp, --report         Generate a summary output report for the spora run. Default: Not enabled
  -v, --version         Show the current spora version then exit.
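
As a rough illustration of what the --filter, --genome-completeness and --genome-length options describe, the following sketch keeps only sequences that meet both thresholds. It is not spora's own implementation: the function names and sample records are hypothetical, and it assumes that completeness means the percentage of unambiguous bases (A, C, G, T) relative to the sequence's own length, which may differ from how spora computes it.

```python
def genome_completeness(sequence: str) -> float:
    """Percentage of unambiguous bases (A, C, G, T) in the sequence.

    Assumption: completeness is measured against the sequence's own length;
    spora may define it differently (for example, against the reference).
    """
    if not sequence:
        return 0.0
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return 100.0 * called / len(sequence)


def passes_filter(sequence: str, min_completeness: int = 90,
                  min_length: int = 29500) -> bool:
    """Return True if the sequence meets both minimum thresholds.

    The defaults mirror the documented defaults of
    --genome-completeness (90) and --genome-length (29500).
    """
    return (len(sequence) >= min_length
            and genome_completeness(sequence) >= min_completeness)


# Hypothetical records keyed by FASTA header.
records = {
    "sample_ok": "ACGT" * 7400,                  # 29,600 bases, all called
    "sample_short": "ACGT" * 7000,               # 28,000 bases, too short
    "sample_gappy": "ACGT" * 5000 + "N" * 9600,  # 29,600 bases, many Ns
}

kept = [name for name, seq in records.items() if passes_filter(seq)]
print(kept)  # ['sample_ok']
```

Only the first record survives: the second is below the length cutoff, and roughly a third of the third is N.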

Documentation

More detailed documentation for spora usage and functionality can be found at https://matt-sd-watson.github.io/spora/

Acknowledgments

The code structure and design of spora were inspired by pangolin and civet, and minor code blocks were adopted from these tools.

The Background section in the documentation describing outbreak definitions was written by Mark Horsman.

Contributors

matt-sd-watson

spora's Issues

enable auto editing of margins for trees with tiplabs

Tip labels on the trees in the summary report need to be visible automatically within the ggtree plot margin. This will require scaling xlim when rendering the tree, based on the maximum branch length.

Cannot detect focal sequence names in summary report if --rename is used

In the draft version of the summary report, the .txt file or multi-FASTA for both the focal and background sequences is read in to compute summary statistics. Any sample name in the tree that matches the focal sequence list is categorized as "focal"; all others are categorized as "background".

Currently this breaks if --rename is used: the focal and background names are read from the files before renaming, so they no longer match the tip names in the tree and no focal sequences are detected.
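
One possible way to avoid the mismatch, sketched here under assumptions, is to pass the focal names through the same rename mapping before comparing them with the tip names. The function `classify_tips`, the `rename_map` dictionary and the sample names are all hypothetical. The summary report itself uses ggtree, so this Python sketch only illustrates the logic and is not the report's code.

```python
def classify_tips(tip_names, focal_names, rename_map=None):
    """Label each tree tip as "focal" or "background".

    rename_map maps original names to renamed ones (hypothetical structure).
    Focal names absent from the map are compared unchanged.
    """
    rename_map = rename_map or {}
    focal_after_rename = {rename_map.get(name, name) for name in focal_names}
    return {
        tip: "focal" if tip in focal_after_rename else "background"
        for tip in tip_names
    }


# Hypothetical names: the tree already carries the renamed labels.
focal = ["ON-123", "ON-456"]
renames = {"ON-123": "outbreak_1", "ON-456": "outbreak_2", "ON-789": "other_1"}
tips = ["outbreak_1", "outbreak_2", "other_1"]

# Without the mapping, no focal sequences are detected (the reported problem).
print(classify_tips(tips, focal))

# With the mapping applied first, both focal sequences are recognized.
print(classify_tips(tips, focal, renames))
```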

auto-place treescale

The summary report should determine where to render geom_treescale() in the ggtree trees based on branch locations, so that the scale is never placed over a branch.

--rename with names csv does not work if filter removes sequences

--rename with --names-csv requires that all sequences listed in the CSV exist in the multi-FASTA. If filtering removes any sequences, the fastafurious rename command fails.

The order of rule filter and rule rename likely needs to change so that renaming is compatible with the removal of sequences.
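
As an alternative illustration of the same problem, the sketch below renames only the records that are still present after filtering and reports the CSV entries with no matching sequence instead of failing. It is a stand-in written with Python's standard csv module, not the fastafurious command and not the rule reordering proposed above. The helper names and sample data are hypothetical, and only the original_name and new_name column headers come from the --names-csv description.

```python
import csv
import io


def load_rename_map(csv_handle):
    """Read original_name -> new_name pairs from a names CSV."""
    reader = csv.DictReader(csv_handle)
    return {row["original_name"]: row["new_name"] for row in reader}


def rename_records(records, rename_map):
    """Rename the headers that are present and list the CSV names that are not.

    records maps FASTA header -> sequence (hypothetical in-memory structure).
    """
    renamed = {rename_map.get(header, header): seq
               for header, seq in records.items()}
    missing = sorted(set(rename_map) - set(records))
    return renamed, missing


# Hypothetical CSV listing three sequences.
names_csv = io.StringIO(
    "original_name,new_name\n"
    "ON-123,outbreak_1\n"
    "ON-456,outbreak_2\n"
    "ON-789,outbreak_3\n"
)
rename_map = load_rename_map(names_csv)

# Suppose filtering has already removed ON-789.
filtered = {"ON-123": "ACGT", "ON-456": "ACGA"}

renamed, missing = rename_records(filtered, rename_map)
print(renamed)  # {'outbreak_1': 'ACGT', 'outbreak_2': 'ACGA'}
print(missing)  # ['ON-789']
```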
