femiliani / circuitseq-kit14

This project forked from mckennalab/circuitseq

Nanopore Plasmid Sequencing

License: Other

Circuit-seq

CircuitSeq is a pipeline to assemble and analyze plasmids from Nanopore long-read sequencing. We use a number of tools: reads are first basecalled and demultiplexed with Guppy, chimeric and short reads are filtered out (with Porechop and NanoFilt), the remaining reads are corrected with Canu, and the corrected reads are assembled with Flye and Miniasm. These assemblies are then polished with Medaka.

Setup

Things you'll need if running from fast5s:

  • Location of your fast5 files from your Oxford Nanopore run
  • A server with a GPU with CUDA correctly setup (currently CUDA 11.2 is best to match TensorFlow support)
  • Singularity (preferred) or Docker (fine, just requires admin permissions), configured with access for your username

Things you'll need if running with basecalled data:

  • Location of your fastq directory
  • Location of the sequencing_summary.txt file (usually found in the basecalling directory)
  • Singularity (preferred) or Docker (fine, just requires admin permissions), configured with access for your username
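If you want a quick sanity check before starting, the sketch below tests whether the tools listed above are on your PATH. The helper names check_tool and check_prereqs are hypothetical (they are not part of Circuit-seq), and using nvidia-smi as a stand-in for "a working GPU" is an assumption: it only shows that the NVIDIA driver utilities are installed, not that CUDA 11.2 is set up correctly.

```shell
# Hypothetical pre-flight check; these functions are illustrative and not part of Circuit-seq.

# Print whether a command is available on PATH; return non-zero if it is not.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the requirements for a given starting point ("fast5" or "fastq").
check_prereqs() {
  status=0
  check_tool nextflow || status=1
  # Either container engine is acceptable; Singularity is preferred on shared systems.
  if ! check_tool singularity; then
    check_tool docker || status=1
  fi
  # A GPU is only needed when basecalling from fast5.
  # Assumption: nvidia-smi being present is used as a rough proxy for a usable NVIDIA GPU.
  if [ "$1" = "fast5" ]; then
    check_tool nvidia-smi || status=1
  fi
  return "$status"
}
```

For example, `check_prereqs fast5` (or `check_prereqs fastq`) prints one found/missing line per tool and returns non-zero if a required tool is missing. A "missing: singularity" line is fine as long as Docker is found.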

The computational pipeline starts from a directory of raw Nanopore fast5 files; you can also skip the basecalling step if you have already basecalled your data. Circuit-seq uses the Nextflow pipeline engine to move data through each step and to output assemblies, plasmid assessments, and other information about each plasmid.

Install Nextflow

Directions are on the Nextflow website: https://www.nextflow.io/. No administrator permissions are needed. Ideally, put nextflow in your PATH.
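A minimal sketch of the install, assuming a Unix shell with curl and a Java runtime (which Nextflow itself needs). The curl | bash line is the launcher download documented on the Nextflow website; the function names install_nextflow and add_to_path and the default ~/.local/bin location are our own choices for illustration.

```shell
# Sketch only: install_nextflow and add_to_path are hypothetical helper names, not part of Circuit-seq.

# Prepend a directory to PATH for the current shell session, unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;  # already on PATH, nothing to do
    *) PATH="$1:$PATH"; export PATH ;;
  esac
}

# Download the Nextflow launcher into a user-writable directory (requires network access and Java).
# The curl | bash line is the installer documented on https://www.nextflow.io/.
install_nextflow() {
  bindir="${1:-$HOME/.local/bin}"
  mkdir -p "$bindir" || return 1
  # The installer writes a "nextflow" launcher into the current directory, so run it inside $bindir.
  (cd "$bindir" && curl -s https://get.nextflow.io | bash && chmod +x nextflow) || return 1
  add_to_path "$bindir"
}
```

Run `install_nextflow` once, then add `export PATH="$HOME/.local/bin:$PATH"` to your shell startup file so nextflow stays on your PATH in new sessions. The run command further down pins the version with NXF_VER=21.10.6, which the launcher downloads on first use.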

Singularity setup

The tools used by Circuit-seq are often complex to install and have many dependencies. To make this easier, we've packaged all the tools into a single Docker container that can be used by either the Singularity or the Docker container engine. You'll need either Singularity (with user control of binds) or Docker running on the system where you want to run Circuit-seq. Unfortunately, running Docker on some HPC nodes requires permissions that you may have to set up with your institution; Singularity is generally a better option on shared systems.

It's easiest to pre-download the Singularity container into the current directory (and remove it when you're done). This is done with the following command:

 singularity pull plasmidassembly.sif docker://aaronmck/plasmidassembly:1_0_1

This will create a file called plasmidassembly.sif in the current working directory with the fully packaged Singularity container.

With older versions of Singularity (v2 and below), the container can be downloaded with:

 singularity build plasmidassembly.sif docker://aaronmck/plasmidassembly:1_0_1
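If you rerun the pipeline often, a small guard avoids pulling the container again when plasmidassembly.sif is already present. This is a sketch: ensure_container is a hypothetical helper, and it only checks that the file exists and is non-empty, not that it is a complete image.

```shell
# Hypothetical helper (not part of Circuit-seq): pull the container only if the .sif is missing.
ensure_container() {
  sif="${1:-plasmidassembly.sif}"
  image="${2:-docker://aaronmck/plasmidassembly:1_0_1}"
  if [ -s "$sif" ]; then
    # File exists and is non-empty; assume it is a usable image.
    echo "using existing $sif"
  else
    singularity pull "$sif" "$image"
  fi
}
```

On Singularity v2 and below, swap `singularity pull` for `singularity build` as described above.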

Circuit-seq setup

  1. Clone the Circuit-seq repository into the directory where you'll perform data analysis:
git clone https://github.com/mckennalab/Circuitseq/
  2. Prepare a sample sheet. An example sample sheet is provided in the example directory of this repository. This is a tab-delimited file with the following headers: position, sample, reference.

    • position: the number of the barcode well you used for this sample (e.g. 01-96)
    • sample: the sample ID, which can be a plasmid name or an alphanumeric code (no special characters or spaces)
    • reference: the location of the known FASTA reference. Due to a Nextflow quirk, this can't be directly in the run directory (but a subdirectory like ./references/ is fine). If you have a reference, it is worth setting up: it allows the Circuit-seq pipeline to assess the quality of the assembly and gives you aligned BAM files even when the assembly fails. If you don't have a reference, simply fill in this column with NA.
  3. Create a copy of the run_nf.sh shell script found in the pipelines directory and modify the following parameters:

    • Path to nextflow, if it is not in your PATH
    • Path to the CircuitSeq.nf pipeline file found in /pipelines
    • Path to the nextflow.config file found in /pipelines
    • Path to your sample sheet
    • Whether you are running from fast5 or fastq, by setting use_existing_basecalls to false or true, respectively
    • If running from fast5, provide a path to the --fast5 directory and leave --basecalling_dir and --base_calling_summary_file as ""
    • If running from fastq, leave --fast5 as "" and provide your fastq directory for --basecalling_dir and your sequencing_summary.txt file for --base_calling_summary_file
# It is safest to use absolute paths
NXF_VER=21.10.6 nextflow run <path to /pipelines/CircuitSeq.nf> \
           --GPU ON \
           -c <path to /pipelines/nextflow.config> \
           -with-singularity <path to .sif file> \
           --samplesheet <path to sample_sheet.tsv> \
           --use_existing_basecalls <false if from fast5, true if from fastq> \
           --fast5 <path to fast5 directory, use "none" if starting from fastq> \
           --basecalling_dir <path_to_fastq_dir, use "none" if starting from fast5> \
           --base_calling_summary_file <path_to_summary.txt, use "none" if starting from fast5> \
           --barcodes /plasmidseq/barcodes/v2/ \
           --barcode_kit "MY-CUSTOM-BARCODES" \
           --guppy_model dna_r9.4.1_450bps_sup.cfg \
           --medaka_model r941_min_sup_g507 \
           --gpu_slot cuda:0 \
           --barcode_min_score 65 \
           --quality_control_processes true \
           -resume

# To use Nanopore barcoding kits, change:
# --barcodes to /plasmidseq/barcodes/nanopore_official/
# --barcode_kit to the name of the barcode kit you used (these are Guppy v5.0.16 names, which can be found in the barcodes/nanopore_official directory of our GitHub repository).
# Our --barcode_min_score was set based on our demultiplexing data to achieve the best sensitivity/specificity. If you find that you are losing too many reads or getting too much noise with Nanopore barcodes, adjust this parameter accordingly.

  4. Finally, once you have modified the files mentioned above, run the pipeline with:
bash <shell_script_you_modified_just_above_here.sh>
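Because sample sheet mistakes tend to surface only partway through a run, it can help to check the sheet before launching. The sketch below encodes the sample sheet rules described above: a position/sample/reference header, two-digit positions from 01 to 96 (the two-digit format is assumed from the "01-96" example), alphanumeric sample names, and a reference that is either NA or an existing file outside the run directory. check_samplesheet is a hypothetical helper, not part of Circuit-seq, and the pipeline itself may be more or less strict (for example, it may accept underscores in sample names).

```shell
# Hypothetical sample sheet checker (not part of Circuit-seq).
# Run it from your analysis directory so relative reference paths resolve from there.
check_samplesheet() {
  sheet="$1"
  tab=$(printf '\t')
  cr=$(printf '\r')
  here=$(pwd -P)
  errors=0
  n=0
  # IFS=tab splits each row into three columns; "|| [ -n ... ]" keeps a last row without a newline.
  while IFS="$tab" read -r position sample reference || [ -n "$position" ]; do
    n=$((n + 1))
    reference=${reference%"$cr"}   # tolerate Windows (CRLF) line endings
    if [ "$n" -eq 1 ]; then
      if [ "$position" != "position" ] || [ "$sample" != "sample" ] || [ "$reference" != "reference" ]; then
        echo "line 1: header must be position<TAB>sample<TAB>reference"
        errors=$((errors + 1))
      fi
      continue
    fi
    # position: two-digit barcode well number, 01-96
    case "$position" in
      [0-9][0-9])
        if [ "$position" -lt 1 ] || [ "$position" -gt 96 ]; then
          echo "line $n: position '$position' is outside 01-96"; errors=$((errors + 1))
        fi ;;
      *)
        echo "line $n: position '$position' is not a two-digit number"; errors=$((errors + 1)) ;;
    esac
    # sample: letters and digits only (no spaces or special characters)
    case "$sample" in
      ''|*[!A-Za-z0-9]*)
        echo "line $n: sample '$sample' must be alphanumeric"; errors=$((errors + 1)) ;;
    esac
    # reference: NA, or an existing file that is not directly in the run directory
    if [ "$reference" != "NA" ]; then
      if [ ! -f "$reference" ]; then
        echo "line $n: reference '$reference' not found"; errors=$((errors + 1))
      elif [ "$(cd "$(dirname "$reference")" && pwd -P)" = "$here" ]; then
        echo "line $n: move '$reference' into a subdirectory such as ./references/"; errors=$((errors + 1))
      fi
    fi
  done < "$sheet"
  if [ "$errors" -eq 0 ]; then
    echo "sample sheet OK"
    return 0
  fi
  echo "$errors problem(s) found"
  return 1
}
```

For example, `check_samplesheet sample_sheet.tsv` prints one line per problem and returns non-zero if any are found.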

Barcodes and custom barcodes

The official Nanopore barcodes are already included in the Docker container. If you use the Nanopore kits, you need to set:

 --barcodes /plasmidseq/barcodes/nanopore_official/ \
 --barcode_kit "kit-name-in-quotes" \

The kit name is the one you would use for normal Guppy demultiplexing, e.g. "SQK-RBK110-96".

If you would like to use your own barcodes, mount your custom files in the Singularity container by adding --bind /source/directory:/location/in/container to the run script, then change --barcodes to point to where you mounted them, e.g.:

NXF_VER=21.10.6 nextflow run <path to /pipelines/CircuitSeq.nf> \
           --bind /my/custom/barcodes:/mnt/barcodes \
           --GPU ON \
           -c <path to /pipelines/nextflow.config> \
           -with-singularity <path to .sif file> \
           --samplesheet <path to sample_sheet.tsv> \
           --use_existing_basecalls <false if from fast5, true if from fastq> \
           --fast5 <path to fast5 directory, use "none" if starting from fastq> \
           --basecalling_dir <path_to_fastq_dir, use "none" if starting from fast5> \
           --base_calling_summary_file <path_to_summary.txt, use "none" if starting from fast5> \
           --barcodes /mnt/barcodes/ \
           --barcode_kit "MY-NEW-CUSTOM-BARCODES" \
           --guppy_model dna_r9.4.1_450bps_sup.cfg \
           --medaka_model r941_min_sup_g507 \
           --gpu_slot cuda:0 \
           --barcode_min_score 65 \
           --quality_control_processes true \
           -resume
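The value passed to --bind has the form host_directory:container_directory, and the container directory is what --barcodes should point to. The sketch below builds that string from an absolute host path after checking that the directory exists. bind_spec is a hypothetical helper, and /mnt/barcodes is only a default taken from the example above.

```shell
# Hypothetical helper (not part of Circuit-seq): build a "host:container" bind spec
# after checking that the host directory with your custom barcode files exists.
bind_spec() {
  src="$1"
  dest="${2:-/mnt/barcodes}"
  if [ ! -d "$src" ]; then
    echo "barcode directory '$src' does not exist" >&2
    return 1
  fi
  # Resolve the host side to an absolute path; --barcodes should then point to $dest.
  printf '%s:%s\n' "$(cd "$src" && pwd -P)" "$dest"
}
```

For example, `--bind "$(bind_spec /my/custom/barcodes)"` expands to /my/custom/barcodes:/mnt/barcodes when that directory exists and is not a symlink, and the script stops early with an error message otherwise.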

Test data

If you want to test the pipeline, we have added downsampled fast5 and fastq files to the example_data directory. There you will find an explanation of how to run each example.

pLannotate your plasmids

By popular demand, we are working on a quick way to annotate your plasmids with Matt McGuffie's pLannotate. If you are interested, go to the plannotate directory and follow its README.

Learn more / cite our publication

For more information, please refer to our publication. If you find this work useful, please consider citing it:

@article{doi:10.1021/acssynbio.2c00126,
  author = {Emiliani, Francesco E. and Hsu, Ian and McKenna, Aaron},
  title = {Multiplexed Assembly and Annotation of Synthetic Biology Constructs Using Long-Read Nanopore Sequencing},
  journal = {ACS Synthetic Biology},
  volume = {0},
  number = {0},
  pages = {null},
  year = {0},
  doi = {10.1021/acssynbio.2c00126},
  note = {PMID: 35695379},
  URL = {https://doi.org/10.1021/acssynbio.2c00126}
}

Contributors

aaronmck, femiliani, francs50