GithubHelp home page GithubHelp logo

444thliao / amplicon_workflow Goto Github PK

View Code? Open in Web Editor NEW
6.0 2.0 3.0 55.66 MB

amplicon analysisng pipelines based on luigi with qiime2-dada2/ qiime2-deblur/vsearch/usearch

License: GNU General Public License v3.0

Python 97.74% Perl 0.56% R 1.71%
amplicon-pipeline luigi-pipeline

amplicon_workflow's Introduction

16s pipelines based on luigi

For better handle different pipelines with vsearch/usearch, qiime2-deblur or qiime2-dada2. We present a unified pipelines which you could perforom different analysis with single data_input.tab

pipelines overview

red arrow means: combine with multiple down stream tasks.

installation

qiime2

Because this pipelines are embed with qiime2 pipelines, it is better to install a qiime2 environment.

Just follow the official instruction qiime2 like that :

wget https://data.qiime2.org/distro/core/qiime2-2019.4-py36-linux-conda.yml
conda env create -n qiime2-2019.4 --file qiime2-2019.4-py36-linux-conda.yml

vsearch == 2.6.2

Just follow the official instruction vsearch like that :

wget https://github.com/torognes/vsearch/archive/v2.6.2.tar.gz
tar xzf v2.6.2.tar.gz
cd vsearch-2.6.2
./autogen.sh
./configure
make
make install # (optional)  # as root or sudo make install

script requires

Requirements could follow requirements.txt

Using pip install -r requirements.txt

or install its inside the environment of qiime2

source active qiime2-2019.4
pip install -r requirements.txt

conf

there are multiple config file need to be adjusted.

  • conf of pipelines: config/soft_db_path.py
  • conf of fastq_screen db: dir_of_fastq_screen/fastq_screen.conf

testing

Within environment of qiime2 (if you don't want to perform qiime2 relative analysis pipelines), you could ignore this.

Just type:

source activate qiime2-2019.4
python3 main.py test -o ~/temp/16s_testdata --local-scheduler

It will run all three pipelines with testing data contained at testset directory.

Because there have --local-scheduler, it will not monitor by luigid. More detailed about the scheduler and luigid, please follow luigi Central Scheduler doc

QuickStart

After installation, you need to run testdata first to validate all software and required database is installed.

When everything is ready, you may have your own pair-end sequencing data.

Following the header and separator of config.data_input.template, fulfill a new data_input.tab.

With this tab, you could run:

python3 ~main.py run -- workflow --tab data_input.tab --odir output_dir --analysis-type otu --workers 4 --log-path output_dir/cmd_log.txt --local-scheduler

Besides the params --tab, --odir, --analysis-type, --log-path, other params are luigi implemented.

Here describe a little bit about these params. For more detailed, you should check the documentation of luigi at luigi doc

  • --tab: given a path(could be relative/absolute) of input_data.tab
  • --odir: jus the path of output directory. a little be need to say is that, different pipelines like otu, deblur, dada2, it will separately located the final output of different pipelines. So don't worry using same output dir will confuse the result.
  • --analysis-type: for now, three options including otu, deblur, dada2 could be selected, if you want to perform all at once. You could pass all param to it. Because there are a lot of overlapped tasks among three different pipelines, it would save a lot of time than running these separately with different odir.
  • --log-path: it just record the cmd history.(optional)
  • --workers: it could control how many tasks could be parallel.

about the input_data.tab

If you look at the config.data_input.template, there are only three header.

Tab is taken as separator of this input_data.tab for better handle some weird filename.

Inside this iniput_data.tab, you could append more columns besides the necessary three columns(sample ID R1 R2). This pipelines only check you have these three instead of only have these three.

Problems

1. Could I use barcoded/multiplexed data?

If you have a multiplexed data and you want to use this pipelines. First, you need to use a demux tool to demutiplex your data, and fulfill a data_input.tab with generated/demutiplexed data.

If you are not sure which demux tool to use, you could use our demux pipelines instead of main pipelines.

please following README.md at static directory.

2. Could I tune the parameter of this pipelines?

All parameter have been embedd into a unify directory called config. If you want to change the path of software/database, you could see config/soft_db_path.py. If you want to change the parameter of otu/deblur/dada2, you could see config/default_params.py.

config/default_file_structures.py is not yet used at pipelines, so changing it would not change the result.

3. Error raised by dada2?

Maybe because the version of r-base in global environment is higher than the installed r-base version inside the conda environment. So please united the version of R in global and env.

Feedback

Please file questions, bugs or ideas to the Issue Tracker

Authors

Tianhua Liao email: [email protected]

amplicon_workflow's People

Contributors

444thliao avatar lam-c avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.