lebeerlab / nf-metagenome-seq-kraken-bracken

Nextflow pipeline for metagenome sequencing reads

Dockerfile 0.73% Python 41.76% R 11.45% Nextflow 46.06%

nf-metagenome-seq-kraken-bracken's Introduction

Kraken-Bracken Pipeline

Metagenomics analysis pipeline that uses Kraken for OTU detection and Bracken to correct the detected abundance values.

Input

You can check all input arguments by using the --help flag of the pipeline. All inputs and options can be modified either from the command line or directly by changing their default value in the nextflow.config file.
The following arguments are required:

reads

Location of the input fastq files. Examples:

--reads '/data/samples/*_R{1,2}_001.fastq.gz' for paired-end reads
--reads 'data/samples/*.fastq.gz' for single-end reads

The path is given in quotes, and a * glob pattern is used to find all fastq files.
Whether the reads are paired is specified with the pairedEnd parameter. For paired reads, use the {1,2} notation so that both files of each read pair are matched.
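To make the {1,2} notation concrete, the following Python sketch shows how file names matching the paired-end example above could be grouped into read pairs. It is an illustration only, not the pipeline's own code; the function name `group_read_pairs` and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def group_read_pairs(filenames,
                     pattern=r"^(?P<sample>.+)_R(?P<mate>[12])_001\.fastq\.gz$"):
    """Group FASTQ file names into {sample: [R1, R2]} by their shared prefix."""
    mates_by_sample = defaultdict(dict)
    for name in filenames:
        match = re.match(pattern, name)
        if match is None:
            continue  # file does not follow the paired naming scheme
        mates_by_sample[match["sample"]][match["mate"]] = name
    # Keep only samples for which both mates were found.
    return {
        sample: [mates["1"], mates["2"]]
        for sample, mates in sorted(mates_by_sample.items())
        if "1" in mates and "2" in mates
    }


# Hypothetical file names, used only for illustration.
files = [
    "sampleA_R1_001.fastq.gz",
    "sampleA_R2_001.fastq.gz",
    "sampleB_R1_001.fastq.gz",  # mate 2 is missing, so sampleB is dropped
]
print(group_read_pairs(files))
```

A sample whose second file is missing is silently dropped in this sketch; a stricter variant could raise an error instead.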

krakendb

Path to the Kraken database. Before running the pipeline, it is wise to copy the database to a ramdisk to improve read/write speed.
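One way to stage the database on a ramdisk is sketched below in Python. The helper name `stage_database` is hypothetical, and `/dev/shm` is only an assumption about where a RAM-backed filesystem is mounted (common on Linux, but not guaranteed); check that enough memory is available before copying.

```python
import shutil
from pathlib import Path


def stage_database(db_dir, ramdisk_root="/dev/shm"):
    """Copy a database directory onto a RAM-backed filesystem and return the new path.

    `ramdisk_root` is an assumption; adjust it to wherever your ramdisk is mounted.
    """
    source = Path(db_dir)
    target = Path(ramdisk_root) / source.name
    if not target.exists():  # skip the copy if the database is already staged
        shutil.copytree(source, target)
    return target


# Example call (paths are placeholders):
# staged = stage_database("/path/to/krakendb/")
# then pass the staged path to the pipeline via --krakendb
```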

Output

ReadCountsFiltered

Kraken

Bracken

Help

$ nextflow run main.nf --help
N E X T F L O W  ~  version 22.10.3
Launching `main.nf` [golden_wiles] DSL2 - revision: e005bf7aee
WARN: Access to undefined parameter `min_size` -- Initialise it to a default value eg. `params.min_size = some_value`

 Name: nf-kraken2-bracken
 Author: LAMB (UAntwerp)
=========================================
Required arguments:
  --reads                   Path to directory with input samples. If using paired reads
                            they need to be captured using a glob expression such as the following:
                            data/samples/*_R{1,2}_001.fastq.gz

  --krakendb                Path to kraken database.
Optional arguments:

  --help  --h               Shows this help page
  --test_pipeline           Run a test of the pipeline on SRR2085099 and print the 10 most abundant taxa at the end of the pipeline.
  --debug                   Run on a small subset of samples, for debugging purposes.
  --outdir                  The output directory where the results will be saved. Defaults to ./results

  --pairedEnd               Specifies if reads are paired-end (true | false). Default = true
  --min_reads               Minimum amount of reads needed for analysis. Default = null

  --truncLen                Truncation length used by fastp. Default = 0
  --trimLeft --trimRight    Trimming on left or right side of reads by fastp. Default = 0
  --minLen                  Minimum length of reads kept by fastp. Default = 50
  --maxN                    Maximum amount of uncalled bases N to be kept by fastp. Default = 2

  --b_treshold              Minimum base quality used in classification with Kraken2.
  --confidence              The confidence used in Kraken2 classfication. Default = 0

  --bracken_treshold        The minimum number of reads required for a classification at a specified rank.

Usage example:
    nextflow run main.nf --reads '/path/to/reads' --krakendb '/path/to/krakendb/'


nf-metagenome-seq-kraken-bracken's People

Contributors

theoafidian

nf-metagenome-seq-kraken-bracken's Issues

Add normalization of abundance by genome size

Add a step that takes the output of Bracken and divides each abundance by the genome size of the classified taxon, then multiplies by a standard size (e.g. 10e6) to obtain the counts per x base pairs.
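A minimal Python sketch of the proposed normalization follows. It is not part of the pipeline; the function name, the dictionary-based inputs, and the example taxa and genome sizes are all hypothetical. The default `scale` follows the "10e6" written in the issue, which Python reads as 10,000,000.

```python
def normalize_by_genome_size(read_counts, genome_sizes, scale=10e6):
    """Return reads per `scale` base pairs for each taxon with a known genome size."""
    normalized = {}
    for taxon, reads in read_counts.items():
        size = genome_sizes.get(taxon)
        if not size:
            continue  # skip taxa whose genome size is unknown or zero
        normalized[taxon] = reads * scale / size
    return normalized


# Made-up counts and genome sizes, for illustration only.
counts = {"taxon_A": 4000, "taxon_B": 1600, "taxon_C": 10}
sizes = {"taxon_A": 2_000_000, "taxon_B": 1_600_000}  # taxon_C has no known size
print(normalize_by_genome_size(counts, sizes))
```

Taxa without a known genome size are omitted here; whether they should instead be reported unnormalized is a design decision left open by the issue.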

Validate output test pipeline

Write a piece of code (within the pipeline?) that validates the output against an expected result when test_pipeline=true.
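Such a check could look roughly like the Python sketch below, which compares the most abundant observed taxa with an expected list. The function name, the input shapes, and the choice to compare set membership rather than exact ranks or abundances are assumptions; the issue does not specify what the expected result contains.

```python
def validate_top_taxa(observed, expected, top_n=10):
    """Compare the top_n most abundant observed taxa with an expected list.

    observed: dict mapping taxon name to abundance.
    expected: iterable of taxon names that should make up the top_n.
    Returns (ok, missing, unexpected).
    """
    ranked = sorted(observed, key=observed.get, reverse=True)[:top_n]
    missing = [taxon for taxon in expected if taxon not in ranked]
    unexpected = [taxon for taxon in ranked if taxon not in expected]
    return (not missing and not unexpected), missing, unexpected


# Made-up abundances, for illustration only.
observed = {"taxon_A": 500, "taxon_B": 300, "taxon_C": 20}
ok, missing, unexpected = validate_top_taxa(observed, ["taxon_A", "taxon_B"], top_n=2)
print(ok, missing, unexpected)
```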

Add Method to generate read distribution

A new method needs to be written to assess the read length distribution per sample and output the maximum length from (50, 100, 200 bp) for which >=80% of total reads conform to that definition.

Alternatively, this tool would keep 90% (?) of reads before filtering, and a manual inspection of the fastp files after the workflow can confirm whether or not enough reads are kept.
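The first approach could be sketched in Python as follows. This assumes that a read "conforms" to a candidate length when it is at least that long, which is one reading of the issue and not a confirmed definition; the function name and the example lengths are hypothetical.

```python
def max_supported_length(read_lengths, candidates=(50, 100, 200), min_fraction=0.8):
    """Return the largest candidate length reached by at least min_fraction of reads.

    Returns None if there are no reads or no candidate meets the threshold.
    """
    if not read_lengths:
        return None
    total = len(read_lengths)
    for length in sorted(candidates, reverse=True):
        long_enough = sum(1 for n in read_lengths if n >= length)
        if long_enough / total >= min_fraction:
            return length
    return None


# Made-up read lengths: nine reads of 150 bp and one of 40 bp.
print(max_supported_length([150] * 9 + [40]))
```

In this example 90% of the reads are at least 100 bp long but none reach 200 bp, so 100 is the largest candidate that passes the 80% threshold.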
