PacBio-Hifi-Sequencing-Pipeline

This document outlines the steps to process a PacBio HiFi BAM file: align the reads to GRCh38 with pbmm2, generate alignment statistics with samtools, call structural variants with pbsv, and call small variants with DeepVariant.

Requirements

  • PacBio HiFi BAM file
  • Snakemake
  • pbmm2
  • GRCh38 reference genome
  • samtools
  • bedtools
  • pbsv
  • DeepVariant
  • GATK
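
Before starting, it can help to confirm that the command-line tools are installed. The sketch below is one possible pre-flight check, not part of the original pipeline. The function name `missing_tools` is hypothetical, and the executable names in the usage comment assume default installs. DeepVariant is left out because it is often run from a container rather than from PATH.

```bash
# Print the name of every tool that is not found on PATH, one per line.
# Returns 0 when all tools are present and 1 otherwise.
# (missing_tools is a hypothetical helper name, not part of any tool.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (executable names assume default installs):
# missing_tools snakemake pbmm2 samtools bedtools pbsv gatk
```

An empty output and a zero exit status mean every listed tool was found.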

Steps

1. Align PacBio HiFi reads to GRCh38 reference genome using pbmm2

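    pbmm2 align --preset HIFI --sort -j 24 -J 8 --log-level INFO \
      {path_to_reference}/GRCh38.fa \
      {input_bam_file} \
      {output_directory}/aligned.bam

  • pbmm2 align takes the reference, the input file, and the output file as positional arguments, in that order
  • --preset HIFI selects alignment parameters for HiFi reads (older pbmm2 releases name this preset CCS; check pbmm2 align --help for your version)
  • --sort writes a coordinate-sorted BAM file
  • -j 24 sets the number of alignment threads and -J 8 the number of sorting threads
  • --log-level INFO prints progress messages to the log
  • output_directory is the path to the directory where the aligned BAM file will be saved
  • path_to_reference is the path to the directory containing the reference genome
  • input_bam_file is the path to the input PacBio HiFi BAM file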
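
The thread counts 24 and 8 above suit a 32-core machine. On other hardware the same 3:1 ratio can be derived from the available core count. The following is a minimal sketch under that assumption: the function name `thread_split` and the one-quarter rule are illustrative choices mirroring the example, not a pbmm2 recommendation.

```bash
# Split a total thread budget into alignment (-j) and sorting (-J) threads.
# Assumption: one quarter of the budget goes to sorting, mirroring 24 and 8.
# Each side gets at least one thread, so a budget of 1 yields two threads.
thread_split() {
    total=$1
    sort_threads=$(( total / 4 ))
    [ "$sort_threads" -ge 1 ] || sort_threads=1
    align_threads=$(( total - sort_threads ))
    [ "$align_threads" -ge 1 ] || align_threads=1
    printf '%s %d %s %d\n' -j "$align_threads" -J "$sort_threads"
}

# Usage: thread_split 32 prints "-j 24 -J 8"; the output can be pasted
# into the pbmm2 align command in place of the fixed values.
```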

2. Generate alignment stats using samtools

    samtools stats {output_directory}/aligned.bam > {output_directory}/aligned.stats

This writes the alignment statistics for the aligned BAM file, which can be used for QC purposes.
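
For QC, the summary numbers in the stats report are on the tab-separated lines that start with `SN`. The sketch below pulls two of those fields and computes the mapped-read percentage. The helper names `sn_value` and `mapped_percent` are hypothetical, and the code assumes the report contains the `raw total sequences` and `reads mapped` fields.

```bash
# Print the value of one "SN" summary field from a samtools stats report.
# $1 = stats file, $2 = field name without the trailing colon.
sn_value() {
    awk -F '\t' -v key="$2:" '$1 == "SN" && $2 == key { print $3; exit }' "$1"
}

# Print the percentage of reads that mapped, to two decimal places.
# Prints "NA" when the total is missing or zero.
mapped_percent() {
    total=$(sn_value "$1" "raw total sequences")
    mapped=$(sn_value "$1" "reads mapped")
    awk -v m="$mapped" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.2f\n", 100 * m / t; else print "NA" }'
}

# Usage: mapped_percent {output_directory}/aligned.stats
```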

3. Call structural variants using pbsv

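    pbsv discover \
      {output_directory}/aligned.bam \
      {output_directory}/aligned.svsig.gz

    pbsv call \
      {path_to_reference}/GRCh38.fa \
      {output_directory}/aligned.svsig.gz \
      {output_directory}/structural_variants.vcf

pbsv works in two stages. pbsv discover scans the aligned BAM file for structural variant signatures and writes them to an svsig.gz file; pbsv call then turns those signatures into a VCF file of structural variants. Both commands take their inputs and outputs as positional arguments. pbsv call also accepts a HiFi preset (--ccs, also spelled --hifi in recent releases); check pbsv call --help for your version.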
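
A quick sanity check on the structural variant VCF is to tally records by type. The sketch below assumes each record carries an `SVTYPE=` entry in its INFO column, as structural variant VCFs normally do. The function name `count_svtypes` is hypothetical.

```bash
# Count structural-variant records per SVTYPE in an uncompressed VCF on stdin.
# Output: one "<type> <count>" line per type, sorted by type name.
count_svtypes() {
    awk -F '\t' '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, info, ";")       # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^SVTYPE=/) {
                    counts[substr(info[i], 8)]++
                }
            }
        }
        END { for (t in counts) print t, counts[t] }
    ' | sort
}

# Usage: count_svtypes < {output_directory}/structural_variants.vcf
```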

4. Call small variants using DeepVariant

Create a GVCF file

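    python deepvariant_runner.py \
      --ref {path_to_reference}/GRCh38.fa \
      --reads {output_directory}/aligned.bam \
      --model_type=PACBIO \
      --output_gvcf={output_directory}/output.gvcf.gz \
      --num_shards=24

--model_type=PACBIO selects the DeepVariant model trained on PacBio HiFi reads (WGS is the Illumina whole-genome model), and --num_shards sets the number of parallel workers.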

Joint call the GVCF file

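    gatk --java-options "-Xmx50g" GenotypeGVCFs \
      -R {path_to_reference}/GRCh38.fa \
      -V {output_directory}/output.gvcf.gz \
      -O {output_directory}/output.vcf.gz

This generates a VCF file of small variants detected by DeepVariant. GenotypeGVCFs is documented for GVCF files produced by GATK HaplotypeCaller, and the DeepVariant documentation recommends GLnexus for merging DeepVariant GVCF files, so validate the output if you keep this step.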

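Note: deepvariant_runner.py is a script provided by the DeepVariant team that takes the BAM file as input and runs it through the DeepVariant pipeline to generate the GVCF file. In the official DeepVariant Docker image, the equivalent one-step entry point is named run_deepvariant.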
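
Once the small variant VCF exists, a rough tally of variant classes helps spot obvious problems. The sketch below is an illustrative check only. The function name `count_small_variants` is hypothetical, and it assumes records with FILTER set to `PASS` or `.` should be counted (GATK output often leaves FILTER as `.`).

```bash
# Tally small variants from an uncompressed VCF on stdin.
# Counts records whose FILTER is PASS or "." and skips everything else.
#   snv   = single-base REF and single-base ALT
#   indel = REF and ALT differ in length
#   other = multi-allelic records and same-length multi-base substitutions
count_small_variants() {
    awk -F '\t' '
        /^#/ { next }
        $7 != "PASS" && $7 != "." { next }
        $5 ~ /,/ { other++; next }
        length($4) == 1 && length($5) == 1 { snv++; next }
        length($4) != length($5) { indel++; next }
        { other++ }
        END { printf "snv=%d indel=%d other=%d\n", snv, indel, other }
    '
}

# Usage: gzip -dc {output_directory}/output.vcf.gz | count_small_variants
```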

Conclusion

This pipeline can be used to process PacBio HiFi sequencing data for alignment to GRCh38, generate alignment stats, call structural variants using pbsv, and call small variants using DeepVariant.
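
When the steps are rerun by hand, it is convenient to skip any step whose output already exists. The helper below is a minimal sketch of that idea. The name `run_step` is hypothetical, and it only checks that the output file exists, so a partial file left by a failed run has to be deleted manually. Snakemake, listed in the requirements, handles this kind of dependency tracking more robustly.

```bash
# Run a command only when its expected output file is missing.
# $1 = expected output file, remaining arguments = the command to run.
run_step() {
    output=$1
    shift
    if [ -e "$output" ]; then
        printf 'skip: %s already exists\n' "$output"
        return 0
    fi
    printf 'run: %s\n' "$*"
    "$@"
}

# Usage (paths are placeholders):
# run_step out/aligned.svsig.gz pbsv discover out/aligned.bam out/aligned.svsig.gz
```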
