Estimate sequencing saturation for GEX, VDJ, and ADT data from the 10x Genomics platform.

License: MIT License


saturation.R

Introduction

Here is an R script saturation.R for estimating sequencing saturation from a GEX, VDJ, or ADT dataset from the 10x Genomics platform.

The script uses the binomial distribution to downsample the reads and estimate a saturation curve. The curve helps determine whether a sequencing experiment has enough reads.

10xgenomics.com gives us this formula for sequencing saturation:

Sequencing Saturation = 1 - (n_deduped_reads / n_reads)
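
The binomial downsampling described above, combined with this formula, can be sketched roughly as follows. This is a minimal illustration and not necessarily how saturation.R implements it: the function names, the `reads_per_umi` input (one read count per observed molecule), and the simulated toy data are all assumptions made for the example.

```r
# Saturation = 1 - (n_deduped_reads / n_reads), where each molecule (UMI)
# with at least one read contributes exactly one deduplicated read.
saturation <- function(reads_per_umi) {
  n_reads <- sum(reads_per_umi)
  n_deduped_reads <- sum(reads_per_umi > 0)
  if (n_reads == 0) return(NA_real_)
  1 - n_deduped_reads / n_reads
}

# For each sampling fraction p, keep every read independently with
# probability p (a binomial draw per molecule), then recompute saturation.
downsample_saturation <- function(reads_per_umi, fractions) {
  do.call(rbind, lapply(fractions, function(p) {
    kept <- rbinom(length(reads_per_umi), size = reads_per_umi, prob = p)
    data.frame(
      fraction    = p,
      total_reads = sum(kept),
      saturation  = saturation(kept)
    )
  }))
}

set.seed(1)
# Toy data (hypothetical, not real 10x output): 10,000 molecules,
# each with at least one read.
reads_per_umi <- rpois(10000, lambda = 3) + 1
sat_curve <- downsample_saturation(reads_per_umi, fractions = (1:10) / 10)
print(sat_curve)
```

Plotting `total_reads` against `saturation` gives a curve of the kind the script writes out. A curve that is still rising steeply at fraction 1 suggests that more reads would keep revealing new molecules.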

Here is my illustration of the relationship between sequencing saturation and reads per unique molecular identifier (UMI):

We can compute reads per UMI from the saturation, and vice versa:

d <- data.frame(sat = seq(0, 1, length.out = 1001))
d$rpu <- 1 / (1 - d$sat)
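
The "vice versa" direction follows by rearranging the same expression. Here is a small sketch of both conversions, assuming reads per UMI means n_reads / n_deduped_reads; the helper names `sat_to_rpu` and `rpu_to_sat` are made up for this example.

```r
# Saturation -> mean reads per UMI (same expression as d$rpu above)
sat_to_rpu <- function(sat) 1 / (1 - sat)

# Mean reads per UMI -> saturation (the inverse)
rpu_to_sat <- function(rpu) 1 - 1 / rpu

sat_to_rpu(c(0.5, 0.9))  # about 2 and 10 reads per UMI
rpu_to_sat(c(2, 10))     # 0.5 and 0.9
```

At a saturation of exactly 1 the first conversion returns `Inf`, because no number of reads per UMI reaches full saturation.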

Learn more about sequencing saturation from the 10xgenomics.com documentation:

File formats:

Getting started

Install the dependencies:

install.packages(
  c("data.table", "ggplot2", "ggtext", "glue", "optparse", "pbapply", "scales", "stringr", "BiocManager")
)
BiocManager::install("rhdf5")
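
To confirm that the installation worked, a quick check along these lines may help. It is only a convenience sketch and not part of saturation.R; the variable names are arbitrary.

```r
# Packages listed in the install commands above (BiocManager itself is
# only needed for installing rhdf5, so it is not checked here).
pkgs <- c("data.table", "ggplot2", "ggtext", "glue", "optparse",
          "pbapply", "scales", "stringr", "rhdf5")

# requireNamespace() returns TRUE if the package can be loaded.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All dependencies are installed.")
}
```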

See the output directory in the repository for examples of the output files.

Usage examples

GEX

Rscript saturation.R --out output --file molecule_info.h5
Reading molecule_info.h5
Estimating GEX saturation for 519882 barcodes
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Writing output/saturation-gex.tsv
Writing output/total_reads-vs-saturation-gex.pdf

VDJ

Rscript saturation.R --out output/tcr --file all_contig_annotations.csv
Reading all_contig_annotations.csv
INFO: Removing '-1' from the end of each barcode
Estimating VDJ saturation for 20233 barcodes
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Writing output/tcr/saturation-vdj.tsv
Writing output/tcr/total_reads-vs-saturation-vdj.pdf

ADT

Rscript saturation.R --out output --file Batch_1A_ADT.stat.csv.gz
Reading Batch_1A_ADT.stat.csv.gz
Estimating ADT saturation curve for 729372 barcodes
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
Writing output/saturation-adt.tsv
Writing output/total_reads-vs-saturation-adt.pdf
Writing output/saturation-adt-feature.tsv
Writing output/histogram-saturation-adt-feature.pdf

Running parallel jobs with rush

Install rush by Wei Shen:

go install github.com/shenwei356/rush@latest

Then make a list of input files and pass it to rush:

ls /project/cellranger_output/*/{molecule_info.h5,all_contig_annotations.csv,*.stat.csv.gz} > files.txt

# Run 16 jobs in parallel, capture outputs from each job in one file
rush -i files.txt -o rush-saturation.txt -j16 'Rscript saturation.R --out out/{/%} --file {}'
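
To preview the command rush would run for each input file, the placeholders can be expanded by hand. This sketch assumes that `{}` is the full input line and that `{/%}` expands to the name of the file's parent directory; check the rush documentation to confirm. The two paths are hypothetical stand-ins for the lines of files.txt.

```r
# Hypothetical input paths, one per line of files.txt
files <- c(
  "/project/cellranger_output/sample1/molecule_info.h5",
  "/project/cellranger_output/sample2/all_contig_annotations.csv"
)

# Assumption: {/%} is the basename of the directory that contains the file
out_dirs <- file.path("out", basename(dirname(files)))

# Build one command per file, quoting the paths for the shell
commands <- sprintf(
  "Rscript saturation.R --out %s --file %s",
  shQuote(out_dirs, type = "sh"),
  shQuote(files, type = "sh")
)
writeLines(commands)
```

Under that assumption, each sample directory gets its own output directory below out/, so results from different samples do not overwrite each other.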
