This project forked from shu251/18s-pipeline-dada2-r

18s-pipeline-dada2-R

Pipeline for running DADA2 in R specific for microbial eukaryotic work.

(1) Perform initial QC & primer removal

For high-throughput sequence quality control and primer removal, I use snakemake.

# build conda environment to run snakemake
conda env create --name snake-qc --file envs/snake.yaml

# Enter this conda environment
conda activate snake-qc

Set up working directory

  1. Modify your config.yaml file to tell snakemake where to look for raw sequences, the format of those sequences, and where to write the trimmed reads. Use a text editor like nano to modify this file.
  2. Raw fastq sequences should be

(2) DADA2 pipeline in R

library(dada2)

## pending ##


Assign taxonomy to reference sequences

Use assignTaxonomy() from the dada2 package to assign taxonomy. The code below can also be used to assign taxonomy to other sequences.

(1) Bring sequences for assignment into R

Extract reference sequences from qiime2 run:

# Enter qiime2 environment
qiime tools export --input-path ref-sequences.qza \
	--output-path ref-seqs/

The output is a new directory called ref-seqs/ that contains a .fna file with all reference sequences. The fasta headers are the Feature.IDs from the output ASV or OTU table.

Now import those sequences into R to assign taxonomy with dada2.

## R v3.6.1

# Read in reference sequence file, create dataframe with Feature.ID and reference sequence as columns

library(Biostrings)
fna_in <- readDNAStringSet("xxxx.fasta") #Import reference sequences
Feature.ID <- names(fna_in)
SEQUENCE <- paste(fna_in)
fna_df <- data.frame(Feature.ID, SEQUENCE)
# Dataframe with 1 column of sequence header and a 2nd column with sequence.

(2) Assign taxonomy

library(dada2)

seqs <- as.character(fna_df$SEQUENCE) #extract sequences

# Assign taxonomy. Note that for the PR2 database in dada2, you need to set the taxLevels argument as shown here (do not use the default)
taxa_pr2 <- assignTaxonomy(seqs, "/vortexfs1/omics/huber/shu/db/pr2-db/pr2_version_4.12.0_18S_dada2.fasta.gz", 
	taxLevels = c("Kingdom","Supergroup","Division","Class","Order","Family","Genus","Species"), 
	multithread = TRUE, minBoot = 0, outputBootstraps = TRUE)

# See "Compile-taxonomy-assignments.ipynb" to compile with existing count table

(3) Compile with original fasta file or count table

See the R notebook Compile-taxonomy-assignments.ipynb to compile the taxonomy assignments with an existing count table. The example below compares the taxonomy assigned by qiime2 with the taxonomy assigned by dada2.

library(Biostrings); library(tidyverse)
## ASVs
ref_asv <- readDNAStringSet("/vortexfs1/omics/huber/shu/slo-pier-weekly/qiime2/asv/slo-pier-ref-seqs-asv.fna")
Feature.ID <- names(ref_asv)
ReferenceSequence <- paste(ref_asv)
fna_df <- data.frame(Feature.ID, ReferenceSequence)

load("Pier-assigned-refseqs.RData", verbose = T)
asv_table <- read.delim("/vortexfs1/omics/huber/shu/slo-pier-weekly/qiime2/asv/CountTable-wtax-2020-04-22.txt")

# Compile ASV results with reference sequence
asv_wtax <- data.frame(taxa_pr2) %>% 
    rownames_to_column(var = "ReferenceSequence") %>% 
    right_join(fna_df) %>% 
    unite(Taxon_dada2_boot0, starts_with("tax."), sep = ";") %>% 
    unite(Confidence_dada2, starts_with("boot."), sep = ";") %>% 
    left_join(asv_table) %>% 
    select(Feature.ID, Taxon_qiime2 = Taxon, Conf_qiime2 = Confidence, 
            Taxon_dada2_boot0, Conf_dada2_boot0 = Confidence_dada2, everything()) %>% 
    data.frame
# head(asv_wtax)
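
The starts_with("tax.") and starts_with("boot.") selectors above rely on how data.frame() flattens the list that assignTaxonomy() returns when outputBootstraps = TRUE: one matrix named tax and one named boot. A minimal sketch with a mock two-sequence result shows the column names this produces. The sequences, taxa, and bootstrap values are invented for illustration, and only three levels are shown.

```r
# Mock of the list returned by assignTaxonomy(..., outputBootstraps = TRUE).
# All values are hypothetical; a real result has one row per input sequence.
lev <- c("Kingdom", "Supergroup", "Division")
seqs <- c("ACGTACGT", "TTGACCAA")

mock_taxa <- list(
  tax = matrix(c("Eukaryota", "Eukaryota",          # Kingdom
                 "Alveolata", "Stramenopiles",      # Supergroup
                 "Dinoflagellata", "Ochrophyta"),   # Division
               nrow = 2, dimnames = list(seqs, lev)),
  boot = matrix(c(100, 100,   # Kingdom
                   98,  85,   # Supergroup
                   72,  40),  # Division
                nrow = 2, dimnames = list(seqs, lev))
)

# data.frame() prefixes each matrix's columns with the list element name,
# and the sequences become the row names
mock_df <- data.frame(mock_taxa, stringsAsFactors = FALSE)
colnames(mock_df)
rownames(mock_df)
```

Because the sequences end up as row names, rownames_to_column() is what turns them into a column that can be joined against the reference sequence table.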

(4) Explore & reassign taxonomy from PR2

Using the list of taxa and bootstrap values for all taxonomic assignment levels, we need to determine what the most appropriate threshold is.
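
One way to compare candidate thresholds is to tabulate, for each taxonomic level, the fraction of sequences whose bootstrap value meets a given cutoff. The sketch below is only an illustration of that idea: the bootstrap matrix is invented and frac_passing() is a hypothetical helper. In practice the matrix would be the boot element returned by assignTaxonomy().

```r
# Hypothetical bootstrap values for four sequences (rows) at four levels (columns)
boot <- matrix(c(100, 100, 100,  90,   # Kingdom
                 100,  95,  80,  60,   # Supergroup
                  90,  75,  65,  30,   # Division
                  70,  50,  40,  10),  # Class
               nrow = 4,
               dimnames = list(paste0("seq", 1:4),
                               c("Kingdom", "Supergroup", "Division", "Class")))

# Fraction of sequences at or above a cutoff, per taxonomic level
frac_passing <- function(boot_mat, cutoff) {
  colMeans(boot_mat >= cutoff)
}

# Compare several candidate cutoffs side by side (rows = cutoffs, columns = levels)
cutoffs <- c(50, 70, 90)
summary_tab <- t(sapply(cutoffs, function(x) frac_passing(boot, x)))
rownames(summary_tab) <- paste0("minBoot_", cutoffs)
summary_tab
```

A table like this makes the trade-off visible: a stricter cutoff keeps fewer sequences assigned at the deeper levels.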

In R (v3.6.1)

library(tidyverse)

# Import R object that has FeatureID information, reference sequence, and PR2 assignment information from above
head(asv_wtax)


# Create new column of updated taxonomic list, based on a minBoot threshold of 70
asv_updated_tax <- asv_wtax %>% 
  type.convert(as.is = TRUE) %>%
  # The dada2 bootstrap column was renamed to Conf_dada2_boot0 in step (3)
  separate(Conf_dada2_boot0, c("Kingdom_boot","Supergroup_boot","Division_boot","Class_boot","Order_boot","Family_boot","Genus_boot","Species_boot"), sep = ";", convert = TRUE) %>%
  separate(Taxon_dada2_boot0, c("Kingdom_lev","Supergroup_lev","Division_lev","Class_lev","Order_lev","Family_lev","Genus_lev","Species_lev"), sep = ";", convert = TRUE) %>%
  mutate(Taxon_updated = 
     case_when(
      Species_boot >= 70 ~
         select(., ends_with("_lev")) %>% 
            reduce(str_c, sep=";"),
      Species_boot < 70 & Genus_boot >= 70 ~ 
          select(., Kingdom_lev:Genus_lev) %>%
            reduce(str_c, sep=";"),
      Genus_boot < 70 & Family_boot >=70 ~
          select(., Kingdom_lev:Family_lev) %>%
            reduce(str_c, sep=";"),
      Family_boot < 70 & Order_boot >=70 ~
          select(., Kingdom_lev:Order_lev) %>%
            reduce(str_c, sep=";"),
      Order_boot < 70 & Class_boot >=70 ~
          select(., Kingdom_lev:Class_lev) %>%
            reduce(str_c, sep=";"),
      Class_boot < 70 & Division_boot >=70 ~
          select(., Kingdom_lev:Division_lev) %>%
            reduce(str_c, sep=";"),
      Division_boot < 70 & Supergroup_boot >=70 ~
          select(., Kingdom_lev:Supergroup_lev) %>%
            reduce(str_c, sep=";"),
      TRUE ~ Kingdom_lev)) %>% 
  data.frame

The output has a new column, Taxon_updated, with the taxonomic assignment truncated at the deepest level whose bootstrap value meets the minimum threshold (70 here); levels are joined with semicolons (;). Different thresholds can be set for different taxonomic levels.
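
The same rule can be written for a single sequence as a small function that also accepts a separate threshold for each level. This is a sketch of the idea rather than the pipeline's own code: truncate_taxonomy() is a hypothetical helper, and the taxa and bootstrap values are invented. Like the case_when() call above, it keeps everything down to the deepest level whose bootstrap meets its threshold and falls back to the first level when no level passes.

```r
# Hypothetical helper: truncate one taxonomy string at the deepest passing level.
# taxa     - character vector of names, ordered from highest to lowest level
# boots    - numeric vector of bootstrap values, same order and length as taxa
# min_boot - a single threshold, or one threshold per level
truncate_taxonomy <- function(taxa, boots, min_boot = 70) {
  # Recycle a single threshold across all levels
  min_boot <- rep_len(min_boot, length(taxa))
  passing <- which(boots >= min_boot)
  # Deepest passing level; fall back to the first level if none pass
  deepest <- if (length(passing) > 0) max(passing) else 1
  paste(taxa[seq_len(deepest)], collapse = ";")
}

# Invented example values
taxa  <- c("Eukaryota", "Alveolata", "Dinoflagellata", "Dinophyceae")
boots <- c(100, 98, 72, 55)

# Single threshold of 70 for every level
truncate_taxonomy(taxa, boots)

# Hypothetical per-level thresholds
truncate_taxonomy(taxa, boots, min_boot = c(90, 90, 80, 70))
```

To apply it to a whole table, the function could be called once per row, for example with mapply() over the rows of the taxon and bootstrap columns.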

Last updated - 19-08-2020 - SKH
