geng-lee / transindel

This project forked from cauyrd/transindel

Indel caller for DNA-seq or RNA-seq

License: GNU General Public License v3.0

Introduction

transIndel detects indels (insertions and deletions) from DNA-seq or RNA-seq data by parsing chimeric alignments from BWA-MEM.

Prerequisites

  • Samtools/1.0 or newer (http://www.htslib.org/)
  • Python 3.6 or newer (https://www.python.org/)
  • Python packages:

Getting Source Code

git clone https://github.com/cauyrd/transIndel.git
cd transIndel

Running transIndel

STEP 1: Build new BAM file with redefined CIGAR string

  • analyzing DNA-seq data (whole genome seq/exome-seq/targeted capture)

     python transIndel_build_DNA.py -i input_bam_file -o output_bam_file [options]
    
  • analyzing RNA-seq data

     python transIndel_build_RNA.py -i input_bam_file -r reference_genome_fasta -g gtf_file -o output_bam_file [options]
    

Options:

-h, --help            show this help message and exit
-i INPUT, --input INPUT
                      Input BAM file
-o OUTPUT, --output OUTPUT
                      Output BAM file
-r REF, --ref REF     reference genome used for analyzing RNA-seq data
-g GTF, --gtf GTF     gene annotation file used for analyzing RNA-seq data
-s SPLICE_BIN, --splice_bin SPLICE_BIN
                      splice site half bin size (default: 20)
-m MAPQ, --mapq MAPQ  minimal MAPQ of read from BAM file for supporting
                      Indel (default: 15)
-l LENGTH, --length LENGTH
                      Maximum deletion length to be detected (default: 1000000)
-v, --version         show program's version number and exit

Input:

input_bam_file                        :input BAM file; must be produced by BWA-MEM, sorted, and indexed
reference_genome_fasta (for RNA-seq)  :reference genome in FASTA format
gtf_file (for RNA-seq)                :gene annotation file in GTF format

Output:

output_bam_file                       :BAM file with redefined CIGAR strings
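
The two build commands above differ only in the script name and the extra -r/-g arguments, so a small wrapper can assemble either one. The following is a minimal sketch, assuming the transIndel scripts sit in the current working directory. `build_step1_command` and its parameter names are hypothetical helpers for illustration, not part of transIndel.

```python
import sys
from typing import List, Optional


def build_step1_command(
    input_bam: str,
    output_bam: str,
    ref_fasta: Optional[str] = None,
    gtf: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for STEP 1 (hypothetical helper).

    With no reference/GTF the DNA-seq script is selected; with both,
    the RNA-seq script is selected. Only flags shown in the usage
    lines above (-i, -o, -r, -g) are emitted.
    """
    if (ref_fasta is None) != (gtf is None):
        raise ValueError("RNA-seq mode needs both a reference FASTA and a GTF file")
    if ref_fasta is None:
        script, extra = "transIndel_build_DNA.py", []
    else:
        script, extra = "transIndel_build_RNA.py", ["-r", ref_fasta, "-g", gtf]
    # sys.executable keeps the call on the same interpreter as the caller.
    return [sys.executable, script, "-i", input_bam, *extra, "-o", output_bam]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list avoids shell quoting problems with file paths that contain spaces.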

transIndel adds the following optional fields to the output BAM:

Tag| Meaning
--------------------------------------------------------------------------------------
OA | original representative alignment; format: (pos,CIGAR)
JM | splicing junction reads; inferred from GTF or splicing motif (used in RNA-seq BAM)
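
To compare a read's original alignment with its redefined one, the OA value can be split into a position and a list of CIGAR operations. This is a minimal sketch, assuming the tag value is a string such as `100,60M40S`, with or without the surrounding parentheses shown in the table. The exact serialization is an assumption, and `parse_cigar` and `parse_oa` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# CIGAR operation codes as defined in the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Rebuilding the string catches stray characters the regex skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def parse_oa(value: str) -> Tuple[int, List[Tuple[int, str]]]:
    """Parse an assumed 'pos,CIGAR' OA value into (pos, operations)."""
    pos_text, cigar = value.strip().strip("()").split(",", 1)
    return int(pos_text), parse_cigar(cigar)
```

For example, `parse_oa("(100,60M40S)")` yields position 100 with a 60-base match followed by a 40-base soft clip.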

STEP 2: Call indels

  • Option 1: using transIndel_call.py script

     python transIndel_call.py -i input_bam_from_transIndel_build -o output_vcf_filename_prefix [options]
    

Options:

-h, --help            show this help message and exit
-i INPUT, --input INPUT
                      Input BAM file
-o OUTPUT, --output OUTPUT
                      output VCF file prefix
-c AO, --ao AO        minimal observation count for Indel (default: 4)
-d DEPTH, --depth DEPTH
                      minimal depth to call Indel (default: 10)
-f VAF, --vaf VAF     minimal variant allele frequency (default: 0.1)
-l LENGTH, --length LENGTH
                      minimal Indel length (>=1) to report (default: 10)
-m MAPQ, --mapq MAPQ  minimal MAPQ of read from BAM file to call Indel
                      (default: 15)
-t REGION             Limit analysis to targets listed in the BED-format
                      FILE or a samtools region string
-v, --version         show program's version number and exit
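
The -c, -d, -f and -l thresholds act together as a filter on each candidate indel. The sketch below shows one plausible way they combine, using the default values listed above. It assumes that VAF is computed as supporting reads divided by depth, which the option text does not state. `IndelCandidate` and `passes_filters` are hypothetical names, not transIndel's own code.

```python
from dataclasses import dataclass


@dataclass
class IndelCandidate:
    length: int  # indel length in bp
    ao: int      # number of reads supporting the indel
    depth: int   # total read depth at the site


def passes_filters(
    c: IndelCandidate,
    min_ao: int = 4,        # -c default
    min_depth: int = 10,    # -d default
    min_vaf: float = 0.1,   # -f default
    min_length: int = 10,   # -l default
) -> bool:
    """Return True when the candidate clears every threshold."""
    if c.depth <= 0:
        return False  # avoids dividing by zero below
    if c.depth < min_depth or c.ao < min_ao or c.length < min_length:
        return False
    # Assumption: VAF = supporting reads / depth.
    return c.ao / c.depth >= min_vaf
```

Under these assumptions, a 12 bp deletion with 5 supporting reads at depth 40 passes, while the same deletion with 4 supporting reads at depth 50 fails on VAF alone.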

Input:

input_bam_file                        :BAM file produced by transIndel_build_DNA.py or transIndel_build_RNA.py

Output:

output_vcf_file                       :reported indels in VCF format

  • Option 2: using an existing variant caller (e.g. VarDict, freebayes, GATK)

     Follow that variant caller's manual.
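
Either option ends with a VCF file, which is tab-delimited text and can be inspected without extra packages. The following is a minimal sketch of a reader that relies only on the fixed VCF columns and the generic `key=value;flag` layout of the INFO field. It makes no assumption about which INFO keys transIndel writes, and `parse_info` and `read_vcf_records` are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator, Union

InfoValue = Union[str, bool]


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Turn a VCF INFO column into a dict; bare flags map to True."""
    info: Dict[str, InfoValue] = {}
    if field == ".":
        return info
    for item in field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def read_vcf_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data line, skipping header and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first eight columns are fixed by the VCF specification.
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
               "info": parse_info(info)}
```

An open file handle can be passed directly, for example `for rec in read_vcf_records(open(path)): ...`, where `path` is the VCF produced in this step.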
    
