GithubHelp home page GithubHelp logo

moradpuor / hifiadapterfilt Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sheinasim/hifiadapterfilt

0.0 0.0 0.0 70 KB

Remove CCS reads with remnant PacBio adapter sequences and convert outputs to a compressed .fastq (.fastq.gz).

License: GNU General Public License v3.0

Shell 99.29% TeX 0.71%

hifiadapterfilt's Introduction

HiFiAdapterFilt

Convert .bam to .fastq and remove reads with remnant PacBio adapter sequences

Dependencies:

  • BamTools
  • BLAST+

Optional:

  • pigz

Add script and database to your path using:

export PATH=$PATH:[PATH TO HiFiAdapterFilt]
export PATH=$PATH:[PATH TO HiFiAdapterFilt]/DB

Usage

bash pbadapterfilt.sh [ -p file Prefix ] [ -l minimum Length of adapter match to remove. Default=44 ] [ -m minimum percent Match of adapter to remove. Default=97 ] [ -t Number of threads for blastn. Default=8 ] [ -o outdirectory prefix Default=. ]

All flags are optional.

If no -p argument is provided, the script will run on all sequence files (.bam, .fastq, .fastq.gz, .fq, .fq.gz) in the working directory.

Outputs

  • {prefix}.contaminant.blastout (Output of BLAST search)
  • {prefix}.blocklist (Headers of PB adapter contaminated reads to be removed)
  • {prefix}.filt.fastq.gz (Fastq reads free of PB adapter sequence ready for assembly)
  • {prefix}.stats (File with simple math on number of reads removed, etc)

Citation

If this script is useful to you, please cite the following in your publication:

@article{HiFiAdapterFilt,
   author = {Sim, Sheina B. and Corpuz, Renee L. and Simmonds, Tyler J. and Geib, Scott M.},
   title = {HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly},
   journal = {BMC Genomics},
   volume = {23},
   number = {1},
   pages = {157},
   ISSN = {1471-2164},
   DOI = {10.1186/s12864-022-08375-1},
   url = {https://doi.org/10.1186/s12864-022-08375-1},
   year = {2022},
   type = {Journal Article}
}

Sheina B. Sim
USDA-ARS
US Pacific Basin Agricultural Research Service
Hilo, Hawaii, 96720 USA
[email protected]

This script is in the public domain in the United States per 17 U.S.C. § 105

hifiadapterfilt's People

Contributors

sheinasim avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.