
kovid-trees-nf

Generate fast trees for clustering purposes in SARS-CoV-2

NOTE: THIS REPO IS IN ACTIVE DEVELOPMENT. LOTS OF CHANGES ARE TO COME.

Background

There is an ever-increasing number of samples included in phylogenetic tree building for SARS-CoV-2. Building high-quality trees for cluster detection in SARS-CoV-2 comes with a number of complications (addressed elsewhere [ADD CITATIONS]).

Running the current pipeline

Requirements

Running the test dataset

  • With conda:
    nextflow run MDU-PHL/kovid-trees-nf --test
  • With docker:
    nextflow run MDU-PHL/kovid-trees-nf -profile docker --test
  • With singularity:
    nextflow run MDU-PHL/kovid-trees-nf -profile singularity --test

Running with your own dataset

All you need is an alignment file in multiFASTA format as input to the pipeline.

nextflow run MDU-PHL/kovid-trees-nf --input_aln relative/path/to/alignment.aln --input_id aln_20210201
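
Before launching a run, it can help to confirm that the input really is an alignment, meaning every sequence has the same length. The following is a minimal sketch, assuming a POSIX shell with awk available. The function name `aln_lengths_ok` is hypothetical and is not part of the pipeline.

```shell
# Hypothetical pre-flight check, not part of kovid-trees-nf.
# Succeeds (exit 0) only if the file holds at least one sequence and
# all sequences have the same length.
aln_lengths_ok() {
    awk '
        function check() {
            if (n++ == 0) ref = len        # first sequence sets the expected length
            else if (len != ref) bad = 1   # any other length means it is not aligned
        }
        /^>/ { if (seen) check(); seen = 1; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }   # sequences may wrap over lines
        END  { if (seen) check(); exit ((seen && !bad) ? 0 : 1) }
    ' "$1"
}

# Usage (path as in the command above):
#   aln_lengths_ok relative/path/to/alignment.aln || echo "sequences differ in length" >&2
```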

Current output files

| Filename | Description | Step |
|---|---|---|
| *_replace.aln | Replace any character that is not one of ACGTacgt- with - (needed for FastTree) | GOALIGN_REPLACE |
| *_clean.aln | Alignment with gappy sequences removed (sequences with >5% missing data are dropped) | GOALIGN_CLEAN_SEQS |
| *_filtered.aln | Remove gappy columns and columns with no phylogenetic value | CLIPKIT |
| *_dedup.aln | Keep only unique sequences | GOALIGN_DEDUP |
| *_compress.aln | Keep only unique columns | GOALIGN_COMPRESS |
| *.weights | The number of times each column pattern occurs in the alignment | GOALIGN_COMPRESS |
| *.nwk | The FastTree newick file | FASTTREE |
| *_resolve.nwk | Resolve multifurcations in the FastTree topology | GOTREE_RESOLVE |
| *.raxml.bestTree | Tree with branch lengths optimised by RAxML-NG | RAXMLNG_EVALUATE |
| *_brlen_round.nwk | Round branch lengths | GOTREE_BRLEN_ROUND |
| *_collapse_length.nwk | Collapse branches whose lengths are too small to be real | GOTREE_COLLAPSE_LENGTH |
| *_repopulate.nwk | Add back all the removed duplicate samples | GOTREE_REPOPULATE |
| *_order.nwk | Order nodes to make comparisons easier | NWUTILS_ORDER |
| *_reroot.nwk | Reroot the tree with the appropriate outgroup (e.g., Wuhan-1) | NWUTILS_REROOT |

Pipeline details

GOALIGN_REPLACE	
    goalign \
        --auto-detect \
        replace \
        -e -s '[^ACTGactg]' -n '-' \
        -t 1 \
        -o test_replace.aln \
        -i test_aln.aln
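
To see what this substitution does to the data, the same regular expression can be mimicked with awk on a toy alignment. This is only an illustration of the pattern used above, not a description of how goalign is implemented. `mask_non_acgt` is a hypothetical name.

```shell
# Illustration only: replace every non-ACGT character in sequence lines with a gap.
# Reads FASTA on stdin, writes FASTA on stdout.
mask_non_acgt() {
    awk '
        /^>/ { print; next }                        # headers pass through untouched
             { gsub(/[^ACGTacgt]/, "-"); print }    # N, R, Y, ? and so on become gaps
    '
}

# Usage:
#   printf ">s1\nACNTRY-a\n" | mask_non_acgt
```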
    
GOALIGN_CLEAN_SEQS	
    goalign \
        --auto-detect \
        clean \
        seqs \
        --cutoff 0.05 \
        -t 1 \
        -o test_clean.aln \
        -i test_replace.aln
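
The `--cutoff 0.05` value corresponds to the 5% missing-data threshold listed in the output table. To see which sequences sit near that threshold before this step drops them, a rough sketch like the one below lists the gap fraction of each sequence. It assumes `-` is the only missing-data character, which should hold after GOALIGN_REPLACE, and it does not try to reproduce goalign's exact boundary behaviour. `gap_fraction` is a hypothetical helper.

```shell
# Hypothetical helper: print "name<TAB>gap fraction" for every sequence.
# Reads a FASTA file given as an argument, or stdin if none is given.
gap_fraction() {
    LC_ALL=C awk '
        function report(    tmp, gaps) {
            tmp = seq
            gaps = gsub(/-/, "", tmp)      # gsub returns how many gaps it removed
            printf "%s\t%.4f\n", name, (length(seq) ? gaps / length(seq) : 0)
        }
        /^>/ { if (have) report(); have = 1; name = substr($0, 2); seq = ""; next }
             { seq = seq $0 }              # join wrapped sequence lines
        END  { if (have) report() }
    ' "$@"
}
```

For example, `gap_fraction test_replace.aln | awk -F'\t' '$2 > 0.05'` would list the sequences whose gap fraction exceeds 5%.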
    
CLIPKIT	
    clipkit \
        test_clean.aln \
        -m kpic-smart-gap \
        -l \
        -o test_filtered.aln
    
GOALIGN_DEDUP	
    goalign \
        --auto-detect \
        dedup \
        -l test.dedup \
        -t 1 \
        -o test_dedup.aln \
        -i test_filtered.aln
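
How much this step shrinks the alignment depends on how many samples are identical. The sketch below counts total and unique sequences by exact string match. That is an assumption about what counts as a duplicate, and goalign may treat case or ambiguity differently. `count_unique_seqs` is a hypothetical name, and the approach holds every distinct sequence in memory, so it is meant for spot checks rather than very large inputs.

```shell
# Hypothetical helper: print "total<TAB>unique" sequence counts for a FASTA file
# (argument) or stdin. Sequences are compared as exact strings.
count_unique_seqs() {
    awk '
        function tally() {
            total++
            if (!(seq in seen)) { seen[seq] = 1; uniq++ }
        }
        /^>/ { if (have) tally(); have = 1; seq = ""; next }
             { seq = seq $0 }              # join wrapped sequence lines
        END  { if (have) tally(); printf "%d\t%d\n", total, uniq }
    ' "$@"
}

# Usage:
#   count_unique_seqs test_filtered.aln
```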
    
GOALIGN_COMPRESS	
    goalign \
        --auto-detect \
        compress \
        -t 1 \
        -o test_compress.aln \
        --weight-out test.weights \
        -i test_dedup.aln
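
The idea behind compression is that identical alignment columns carry the same information, so each distinct column (site pattern) needs to be kept only once, together with a count of how often it occurred. The sketch below shows that idea on a small alignment. It is not goalign's implementation, and the order and format of its output are not meant to match the `.weights` file. `site_patterns` is a hypothetical name.

```shell
# Illustration only: list each distinct alignment column and how often it occurs.
# Reads a FASTA alignment (argument or stdin); prints "pattern<TAB>count"
# in order of first appearance.
site_patterns() {
    awk '
        /^>/ { n++; next }
             { seq[n] = seq[n] $0 }        # join wrapped sequence lines
        END {
            len = length(seq[1])
            for (c = 1; c <= len; c++) {
                pat = ""
                for (s = 1; s <= n; s++) pat = pat substr(seq[s], c, 1)
                if (!(pat in count)) order[++k] = pat
                count[pat]++
            }
            for (i = 1; i <= k; i++) printf "%s\t%d\n", order[i], count[order[i]]
        }
    ' "$@"
}
```

The counts always sum to the number of columns in the input, which is a quick sanity check on any pattern-and-weight representation.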
    
FASTTREE	
    OMP_NUM_THREADS=1 \
        fasttree \
        -nosupport -fastest \
        -out test.nwk \
        -nt test_compress.aln
    
GOTREE_RESOLVE	
    gotree \
        resolve \
        -t 1 \
        -i test.nwk \
        -o test_resolve.nwk
    
RAXMLNG_EVALUATE	
    raxml-ng \
        --evaluate \
        --blmin 0.0000000001 \
        --threads 1 \
        --prefix test \
        --force perf_threads \
        --model GTR+G4 \
        --tree test_resolve.nwk \
        --msa test_compress.aln \
        --site-weights test.weights
    
GOTREE_BRLEN_ROUND	
    gotree \
        brlen \
        round \
        -p 6 \
        -t 1 \
        -i test.raxml.bestTree \
        -o test_brlen_round.nwk
    
GOTREE_COLLAPSE_LENGTH	
    gotree \
        collapse \
        length \
        -l 0 \
        -t 1 \
        -i test_brlen_round.nwk \
        -o test_collapse_length.nwk
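
The rounding and collapsing steps work as a pair. RAxML-NG is run with `--blmin 0.0000000001`, so branches with no real support come out with tiny non-zero lengths. Rounding to six decimal places (`-p 6`) turns those into exactly 0, and the collapse step with `-l 0` can then remove them. This assumes `-l 0` collapses branches whose length is zero, which is what the "too small to be real" description implies. The sketch below only demonstrates the rounding arithmetic with awk and does not call gotree. `round_brlen` is a hypothetical name.

```shell
# Illustration only: round a branch length to six decimal places,
# mirroring the precision requested with "-p 6".
round_brlen() {
    LC_ALL=C awk -v x="$1" 'BEGIN { printf "%.6f\n", x }'
}

# Usage:
#   round_brlen 0.0000000001   # the --blmin floor
#   round_brlen 0.0000334      # a short but non-zero branch
```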
    
GOTREE_REPOPULATE	
    gotree \
        repopulate \
        -t 1 \
        -g test.dedup \
        -i test_collapse_length.nwk \
        -o test_repopulate.nwk
    
NWUTILS_ORDER	
    nw_order \
        test_repopulate.nwk > test_order.nwk
    
NWUTILS_REROOT	
    nw_reroot \
        test_order.nwk \
        ref\|NC_045512.2\| > test_reroot.nwk

Authors

  • Anders Goncalves da Silva (@andersgs)
  • Torsten Seemann (@tseemann)
