All 10x Genomics processing pipelines rolled into one containerized pipeline

License: MIT License

Groovy 54.01% Nextflow 0.57% Python 28.17% Makefile 0.17% C 17.09%
bioinformatics pipeline

nf-tenx's Introduction

nf-tenx

All 10x Genomics data processing pipelines in one containerized workflow

Purpose

This pipeline is intended to be used at The Jackson Laboratory for processing 10x Genomics data of all kinds. However, with only a few tweaks to container locations, this pipeline and its associated tools could be utilized anywhere.

Supported Assays

Fully supported

  • 3'/5' Gene Expression
  • 5' Immune Profiling
  • Chromatin Accessibility
  • Multiome (joint GEX & ATAC)
  • CITE-seq with BioLegend TotalSeq-A/B/C ADTs
  • Multiplexing with BioLegend TotalSeq-A/C HTOs
  • Spatial Transcriptomics (Fresh frozen and FFPE)
  • Multiplexing with CellPlex

To do:

  • Targeted Gene Expression
  • Aggregating all assays

Usage

  1. Configure singularity to pull from JAX's Singularity registry:
srun -q <insert cluster-specific QOS> --time=30 --pty bash
module load singularity
singularity remote add jaxreg jaxreg.jax.org
singularity remote use jaxreg

All containers are in the singlecell collection and are named with the format <tool-name>:<tool-version>.

  2. Fill out a YAML/JSON-based samplesheet (described below).

  3. Run with:

nextflow run -profile elion TheJacksonLaboratory/nf-tenx -r 0.3.5 --samplesheet path/to/samplesheet.yml

Samplesheet specifics

The samplesheet format expects a root level YAML sequence or JSON array:

# Long form
- sample_name: tinygex
  libraries:
    - LIB123
  library_types:
    - Gene Expression
  ...
# Condensed
- { sample_name: another-sample, libraries: [LIB987], library_types: [Gene Expression], ...}
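
Because the samplesheet may also be written as JSON, the root-level requirement can be illustrated with Python's standard library alone. The following is a minimal sketch, not the pipeline's own loader: the function name load_samplesheet is hypothetical, and the example records carry only the keys shown above (the elided keys are left out).

```python
import json


def load_samplesheet(text: str) -> list[dict]:
    """Parse a JSON samplesheet and check that the root is an array of records.

    Hypothetical helper for illustration; the pipeline's real loader may differ.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("samplesheet root must be a JSON array (or YAML sequence)")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} must be a mapping of keys to values")
    return records


# JSON equivalent of the two YAML records above, truncated to the keys shown.
example = """
[
  {"sample_name": "tinygex", "libraries": ["LIB123"], "library_types": ["Gene Expression"]},
  {"sample_name": "another-sample", "libraries": ["LIB987"], "library_types": ["Gene Expression"]}
]
"""

records = load_samplesheet(example)
print([record["sample_name"] for record in records])
```

A samplesheet whose root is a single mapping rather than a sequence is rejected with a ValueError in this sketch.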

Required keys

These typically depend on the assay, but most assays require a bare minimum of the following:

  • libraries: sequence of library names. Each library name must be present somewhere in the FASTQ file names, e.g. AB12345 would be valid for XYZ_AB12345_LABNAME_S1_L001_R1_001.fastq.gz
  • library_types: sequence of library types, one for each library specified above, chosen from [Gene Expression, Chromatin Accessibility, Spatial Gene Expression, TotalSeq-A, etc.]
  • fastq_paths: sequence of directory paths, one for each of the libraries specified above
  • tool: a software tool [cellranger, spaceranger, cellranger-arc, etc.]
  • tool_version: a semantic version string for the tool specified above e.g. 6.1.1
  • command: tool command to run e.g. count to run cellranger count
  • sample_name: any alphanumeric string including hyphens.
  • reference_path: a file path for the reference transcriptome/genome/etc.

Note that

  1. len(libraries) == len(library_types) == len(fastq_paths).
  2. even if there is only one library, these three keys must be sequences (e.g. [lib123]).
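
The rules above can be sketched as a small record check. This is an illustrative sketch only and is not taken from the pipeline's validation script: check_record, REQUIRED_KEYS and PER_LIBRARY_KEYS are hypothetical names, the paths in the example are placeholders, and the sample_name pattern is an assumed reading of "alphanumeric string including hyphens".

```python
import re

# Keys listed above as the bare minimum for most assays.
REQUIRED_KEYS = (
    "libraries", "library_types", "fastq_paths", "tool",
    "tool_version", "command", "sample_name", "reference_path",
)
# Keys that must be sequences of equal length, one entry per library.
PER_LIBRARY_KEYS = ("libraries", "library_types", "fastq_paths")
# Assumption: letters, digits and hyphens only.
SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9-]+")


def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    lengths = {}
    for key in PER_LIBRARY_KEYS:
        value = record.get(key)
        if value is None:
            continue  # already reported as missing
        if not isinstance(value, list):
            problems.append(f"{key} must be a sequence, even for a single library")
        else:
            lengths[key] = len(value)
    if len(set(lengths.values())) > 1:
        problems.append(f"per-library keys differ in length: {lengths}")

    name = record.get("sample_name")
    if isinstance(name, str) and not SAMPLE_NAME_RE.fullmatch(name):
        problems.append("sample_name may contain only letters, digits and hyphens")
    return problems


good = {
    "sample_name": "tinygex",
    "libraries": ["LIB123"],
    "library_types": ["Gene Expression"],
    "fastq_paths": ["/path/to/fastqs"],      # placeholder path
    "tool": "cellranger",
    "tool_version": "6.1.1",
    "command": "count",
    "reference_path": "/path/to/reference",  # placeholder path
}
# A bare string instead of a sequence, plus mismatched lengths.
bad = dict(good, libraries="LIB123", fastq_paths=[])

print(check_record(good))
print(check_record(bad))
```

Returning a list of problems rather than raising on the first one lets a user fix every issue in a record in one pass.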

Documentation is in progress for assays that require additional keys, like TotalSeq (to specify which tags/oligos are in use) or Visium (to specify image and probeset paths). For now, just look in modules/assay-type.nf to see which CLI options are utilized.

Check out the example samplesheets under the examples directory.

Tools available on JAXReg

Updated 2023-10-20

Tool             Versions                                                                                                                                   Notes
CellRanger       1.0.0, 1.2.0, 1.3.0, 1.3.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0, 3.0.0, 3.0.2, 3.1.0, 4.0.0, 5.0.0, 5.0.1, 6.0.0, 6.0.1, 6.0.2, 6.1.1, 7.0.0, 7.1.0
CellRanger ATAC  1.1.0, 1.2.0, 2.0.0, 2.1.0                                                                                                                 1.1.0: do not use, barcode doublet issue
CellRanger ARC   1.0.0, 1.0.1, 2.0.0
SpaceRanger      1.0.0, 1.1.0, 1.2.0, 1.2.2, 1.3.0, 2.0.0, 2.1.0
CITEseq Count    1.4.5

Testing

This is a work in progress, but we have compiled and dramatically augmented the small chr21 test datasets made available from 10x Genomics into a testing suite for all assays.

To download this data (while connected to the JAX network):

nextflow pull TheJacksonLaboratory/nf-tenx
nextflow run -profile test nf-tenx/example.nf

Contributing

Please help.

nf-tenx's People

Contributors

wflynny, ahmed-said-jax

nf-tenx's Issues

Better error handling if there are missing or corrupted FASTQs

FASTQ existence isn't strictly enforced during the preflight and samplesheet checking process.

Need to fix this such that:

  • Records with missing FASTQs (likely because the user specified the wrong fastq_path) are skipped
  • Records with erroneous FASTQs (misformatted, not gzipped?, something else) raise an error
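
One way the proposed behaviour could look is sketched below. This is not the pipeline's preflight code: find_fastqs, check_fastq, usable_records and BadFastqError are hypothetical names, and "erroneous" is approximated here by two assumed checks (the file must decompress as gzip and its first line must start with "@").

```python
import gzip
import tempfile
import zlib
from pathlib import Path


class BadFastqError(Exception):
    """Raised when a FASTQ file exists but cannot be read as gzipped FASTQ."""


def find_fastqs(library: str, fastq_dir: str) -> list[Path]:
    """Return the *.fastq.gz files in fastq_dir whose names contain the library name."""
    directory = Path(fastq_dir)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*.fastq.gz") if library in p.name)


def check_fastq(path: Path) -> None:
    """Raise BadFastqError if the file is not gzipped or lacks a FASTQ header line."""
    try:
        with gzip.open(path, "rt") as handle:
            first_line = handle.readline()
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        raise BadFastqError(f"{path} is not readable as gzipped text: {exc}") from exc
    if not first_line.startswith("@"):
        raise BadFastqError(f"{path} does not start with a FASTQ header line")


def usable_records(records: list[dict]) -> list[dict]:
    """Skip records with missing FASTQs; raise on records with erroneous FASTQs."""
    kept = []
    for record in records:
        per_library = [
            find_fastqs(library, fastq_dir)
            for library, fastq_dir in zip(record["libraries"], record["fastq_paths"])
        ]
        if any(not found for found in per_library):
            continue  # missing FASTQs: likely a wrong fastq_path, so skip the record
        for found in per_library:
            for path in found:
                check_fastq(path)  # erroneous FASTQs: let the error propagate
        kept.append(record)
    return kept


# Small demonstration with a temporary directory holding one valid FASTQ.
with tempfile.TemporaryDirectory() as tmp:
    fastq = Path(tmp) / "XYZ_LIB123_LABNAME_S1_L001_R1_001.fastq.gz"
    with gzip.open(fastq, "wt") as handle:
        handle.write("@read1\nACGT\n+\nFFFF\n")
    records = [
        {"sample_name": "tinygex", "libraries": ["LIB123"], "fastq_paths": [tmp]},
        {"sample_name": "wrong-path", "libraries": ["LIB987"], "fastq_paths": [tmp + "/missing"]},
    ]
    kept = usable_records(records)

print([record["sample_name"] for record in kept])
```

Only the first line of each file is read, so the check stays cheap; a fully corrupted file body would not be caught by this sketch.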

Refactor samplesheet validation script

Currently this script runs through all records for each assay type. Because Visium, Visium FFPE, and possibly Visium HD in the future all use spaceranger count, the script cannot distinguish between their different required processing options.

5' GEX with VDJ added to nf-tenx Pipeline

Currently, this has to be done manually. Is it worth putting the time into adding this assay to the nf-tenx pipeline, or is the frequency of this assay low enough to make manual data processing acceptable?
