ngsClean

DO NOT USE, UNDER DEVELOPMENT

Scripts for initial data filtering. These have been shamelessly copied from https://github.com/fgvieira/ngsClean and slightly modified.

These scripts are a simple way to go from raw FASTQ reads to BAM files, by filtering reads, sites, and samples.

Briefly, we can wrap up this process in 4 main steps:

  1. Reads QC.

    ./filter_reads.sh file_1.fastq file_2.fastq

If file_2.fastq is omitted, the script assumes single-end reads. It reads two input files: filter_reads.paths.txt, with paths to software, and filter_reads.options.txt, with the parameters to be used for filtering. Example files are provided.
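When many samples need read QC, the call above can be repeated per sample. The sketch below only prints the commands it would run, so they can be inspected first. The function name `list_filter_cmds` is hypothetical, and the `<sample>_1.fastq` / `<sample>_2.fastq` naming is an assumption, not a convention stated by these scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of ngsClean): print one filter_reads.sh
# command per sample found in a directory.
# ASSUMPTION: paired files are named <sample>_1.fastq and <sample>_2.fastq.
list_filter_cmds() {
    dir=$1
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue               # no match: the glob stays literal, skip it
        r2="${r1%_1.fastq}_2.fastq"            # expected name of the mate file
        if [ -e "$r2" ]; then
            echo "./filter_reads.sh $r1 $r2"   # paired-end reads
        else
            echo "./filter_reads.sh $r1"       # second file omitted: single-end reads
        fi
    done
}
```

Running `list_filter_cmds raw_reads` lists the commands; piping that output to `sh` would execute them, provided the file names contain no spaces.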

  2. Read mapping. Not included here! Use your favorite read mapping program (e.g. BWA, Bowtie2, SNAP, ...)

  3. Sites and samples QC.

    ./filter_SNP.sh run_ID $N_IND $CHR bam_list.txt $REF_SEQ $ANC_SEQ

Again, this script reads an input file called filter_SNP.paths.txt with paths to software, and filter_SNP.options.txt with parameters to be used for filtering.

The output from this step is a BED file with the positions that passed the quality filters. This file can be used in all downstream analyses.
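As one example of a downstream use, the sketch below keeps only the VCF records whose position falls inside an interval of the BED file. It assumes the file follows the usual BED convention (tab-separated, 0-based start, end exclusive) and contains no header lines; whether the output of filter_SNP.sh matches this exactly should be checked first. The function name `keep_sites` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of ngsClean): keep_sites BED VCF
# Prints the VCF header lines plus every record covered by a BED interval.
# ASSUMPTIONS: standard BED coordinates (0-based start, end exclusive),
# no track/comment lines in the BED file, and a non-empty BED file.
keep_sites() {
    awk -F'\t' '
        # First file (BED): store the intervals per chromosome.
        NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
        # Second file (VCF): pass header lines through unchanged.
        /^#/ { print; next }
        {
            pos = $2 + 0                       # VCF POS is 1-based
            for (i = 1; i <= n[$1]; i++)
                if (pos > s[$1, i] && pos <= e[$1, i]) { print; break }
        }
    ' "$1" "$2"
}
```

The linear scan over intervals is only meant for illustration; for genome-scale files a dedicated interval tool is likely a better fit.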

  4. Check VCF and SFS to assess overall quality of data.

    Rscript analyze_vcf.R myvariants.vcf output_file

| Script | Description |
| --- | --- |
| filter_reads.sh | Script to perform raw reads QC, namely read filtering, quality trimming, adaptor trimming, paired-end (PE) read merging, ... |
| filter_SNP.sh | SNP QC based on min/max depth, HWE, quality bias, strand bias, ... It calls the scripts below. |
| get_mut_bias.pl | Script to plot the mutation frequency bias along the read. It was developed to deal with ancient DNA. |
| get_depth_thresh.R | R script to automatically define the upper and lower depth coverage limits used by SNPcleaner. It fits a mixture of Gamma and negative binomial distributions to the empirical read coverage distribution. |
| SNPcleaner.pl | Filters sites from VCF files following a set of rules. |
| analyze_vcf.R | Script to visualize aspects of data in VCF files. |
