barski-lab / sc-seq-analysis

CWL toolkit for single-cell sequencing data analysis

Home Page: https://barski-lab.github.io/sc-seq-analysis/

License: Apache License 2.0

Common Workflow Language 50.04% Dockerfile 0.91% R 48.27% Shell 0.79%
bioinformatics genomics rna-seq single-cell 10x 10x-genomics cwl fair scatac-seq scrna-seq


Notes:

  • For details on how to use the published version v1.0.1 of the workflows for scRNA-Seq data analysis in SciDAP, refer to the Tutorials page.
  • For an up-to-date workflow description, see the Wiki page.
  • Although we strive to make our pipelines as reproducible as possible, certain issues with Seurat may affect reproducibility even for containerized tools (see Reproducibility issue #5358).

Publications:

  • Aizhan Surumbayeva, Michael Kotliar, Linara Gabitova-Cornell, Andrey Kartashov, Suraj Peri, Nathan Salomonis, Artem Barski, Igor Astsaturov, Preparation of mouse pancreatic tumor for single-cell RNA sequencing and analysis of the data, STAR Protocols, Volume 2, Issue 4, 2021, 100989, ISSN 2666-1667, https://doi.org/10.1016/j.xpro.2021.100989
  • Kotliar M, Kartashov A and Barski A. CWL toolkit for single-cell sequencing data analysis [version 1; not peer reviewed]. F1000Research 2022, 11:819 (poster) (https://doi.org/10.7490/f1000research.1119046.1)

Minimum software requirements:


How to use it

This repository contains R scripts, CWL tools and examples of CWL workflows for single-cell RNA-Seq and Multiome data analyses.

Each R script can be run directly from the command line by following the instructions in its --help message. However, to make the results reproducible, we containerized the scripts and wrapped them in CWL format.
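As a rough illustration of running one of the wrapped tools, the sketch below writes a job file and assembles a `cwltool` command line without executing it. It assumes the CWL reference runner `cwltool` is installed. The tool path `tools/sc-rna-filter.cwl`, the input name `feature_bc_matrices_folder` and the data path are hypothetical placeholders, so check the `inputs` section of the tool itself for the real names.

```python
import json
import tempfile
from pathlib import Path


def build_cwltool_command(tool_path, job_inputs, job_dir):
    """Write job_inputs to a JSON job file and return the cwltool argv list.

    cwltool accepts job files in YAML or JSON; JSON is used here so the
    sketch needs only the standard library.
    """
    job_file = Path(job_dir) / "job.json"
    job_file.write_text(json.dumps(job_inputs, indent=2))
    return ["cwltool", str(tool_path), str(job_file)]


# Hypothetical input name and paths, for illustration only.
example_inputs = {
    "feature_bc_matrices_folder": {
        "class": "Directory",
        "path": "/data/filtered_feature_bc_matrix",
    }
}

with tempfile.TemporaryDirectory() as tmp:
    command = build_cwltool_command("tools/sc-rna-filter.cwl", example_inputs, tmp)
    print(" ".join(command))
    # To actually run the tool: subprocess.run(command, check=True)
```

Keeping the inputs in a job file rather than on the command line makes a run easier to repeat with exactly the same parameters.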

CWL tools can be combined into workflows depending on the type of input datasets and the required complexity of the analysis. For example, for single-cell RNA-Seq use 1(a) – 2(a) – 3(a) and optionally 4(a) – 5(a,b); for Multiome ATAC-Seq and RNA-Seq use 1(b) – 2(b) – 2(a) – 3(a) – 3(b) – 3(c) and optionally 4(a) – 5(a,b).
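The step sequences above can be sketched as a small lookup that turns an analysis type into an ordered list of tools. The numbering legend is not reproduced in this text, so the label-to-tool mapping below is an assumption inferred from the tool names in the tables that follow. The function name `plan_tools` and the keys `scrna` and `multiome` are hypothetical.

```python
# Assumed legend: inferred from tool names, may not match the original numbering.
STEP_TOOLS = {
    "1a": "sc-rna-filter.cwl",
    "1b": "sc-multiome-filter.cwl",
    "2a": "sc-rna-reduce.cwl",
    "2b": "sc-atac-reduce.cwl",
    "3a": "sc-rna-cluster.cwl",
    "3b": "sc-atac-cluster.cwl",
    "3c": "sc-wnn-cluster.cwl",
    "4a": "sc-ctype-assign.cwl",
    "5a": "sc-rna-de-pseudobulk.cwl",
    "5b": "sc-rna-da-cells.cwl",
}

# Step sequences exactly as stated in the paragraph above.
REQUIRED_STEPS = {
    "scrna": ["1a", "2a", "3a"],
    "multiome": ["1b", "2b", "2a", "3a", "3b", "3c"],
}
OPTIONAL_STEPS = ["4a", "5a", "5b"]


def plan_tools(analysis, with_optional=False):
    """Return the ordered tool names for the given analysis type."""
    steps = list(REQUIRED_STEPS[analysis])
    if with_optional:
        steps += OPTIONAL_STEPS
    return [STEP_TOOLS[step] for step in steps]


for tool_name in plan_tools("multiome"):
    print(tool_name)
```

Calling `plan_tools("scrna", with_optional=True)` appends the optional secondary analyses to the three required scRNA-Seq steps.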

All CWL tools are divided into groups that cover the major steps of data analysis. For integrity reasons, we recommend starting from the raw FASTQ files and using one of the Cell Ranger based pipelines from the Data preprocessing group. The results of these pipelines can optionally be exported into UCSC Cell Browser (see the Visualization group).

Both sc-rna-filter.cwl and sc-multiome-filter.cwl tools use feature-barcode matrices as the main inputs. All other tools from the scRNA-Seq, scATAC-Seq and Multiome, and Secondary analyses groups exchange data through RDS files.
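One way to picture this data exchange is a small consistency check over a chain of tools. The sketch below covers only the scRNA-Seq, scATAC-Seq and Multiome, and Secondary analyses groups. That the two filter tools emit RDS files is an assumption implied by the paragraph above rather than something it states outright, and the helper names `io_kinds` and `check_chain` are hypothetical.

```python
# The two tools that take feature-barcode matrices as their main input.
FILTER_TOOLS = {"sc-rna-filter.cwl", "sc-multiome-filter.cwl"}


def io_kinds(tool):
    """Return (expected input kind, produced output kind) for a tool.

    Assumption: filter tools produce RDS; all other tools read and write RDS.
    """
    if tool in FILTER_TOOLS:
        return ("feature-barcode matrix", "RDS")
    return ("RDS", "RDS")


def check_chain(tools, initial_input="feature-barcode matrix"):
    """Return a list of mismatch messages; an empty list means a consistent chain."""
    problems = []
    available = initial_input
    for tool in tools:
        expected, produced = io_kinds(tool)
        if expected != available:
            problems.append(f"{tool} expects {expected}, got {available}")
        available = produced
    return problems


# A chain that starts with a filter tool is consistent.
print(check_chain(["sc-rna-filter.cwl", "sc-rna-reduce.cwl", "sc-rna-cluster.cwl"]))
# Skipping the filter step leaves the first tool without an RDS input.
print(check_chain(["sc-rna-reduce.cwl", "sc-rna-cluster.cwl"]))
```

The second call reports a mismatch for `sc-rna-reduce.cwl`, which is why an analysis normally begins with one of the two filter tools.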

Data preprocessing

Name                      Description
cellranger-mkref.cwl      Builds a Cell Ranger compatible reference folder from custom genome FASTA and gene GTF annotation files
cellranger-count.cwl      Quantifies gene expression from a single-cell RNA-Seq library
cellranger-aggr.cwl       Aggregates outputs from multiple runs of Cell Ranger Count Gene Expression
cellranger-arc-mkref.cwl  Builds a Cell Ranger ARC compatible reference folder from custom genome FASTA and gene GTF annotation files
cellranger-arc-count.cwl  Quantifies chromatin accessibility and gene expression from a single-cell Multiome ATAC/RNA-Seq library
cellranger-arc-aggr.cwl   Aggregates outputs from multiple runs of Cell Ranger ARC Count Chromatin Accessibility and Gene Expression

Visualization

Name                                  Description
cellbrowser-build-cellranger.cwl      Exports clustering results from Cell Ranger Count Gene Expression and Cell Ranger Aggregate experiments into a format compatible with UCSC Cell Browser
cellbrowser-build-cellranger-arc.cwl  Exports clustering results from Cell Ranger ARC Count Chromatin Accessibility and Gene Expression or Cell Ranger ARC Aggregate experiments into a format compatible with UCSC Cell Browser

scRNA-Seq

Name                Description
sc-rna-filter.cwl   Filters single-cell RNA-Seq datasets based on the common QC metrics
sc-rna-reduce.cwl   Integrates multiple single-cell RNA-Seq datasets, reduces dimensionality using PCA
sc-rna-cluster.cwl  Clusters single-cell RNA-Seq datasets, identifies gene markers

scATAC-Seq and Multiome

Name                    Description
sc-multiome-filter.cwl  Filters single-cell multiome ATAC-Seq and RNA-Seq datasets based on the common QC metrics
sc-atac-reduce.cwl      Integrates multiple single-cell ATAC-Seq datasets, reduces dimensionality using LSI
sc-atac-cluster.cwl     Clusters single-cell ATAC-Seq datasets, identifies differentially accessible peaks
sc-wnn-cluster.cwl      Clusters multiome ATAC-Seq and RNA-Seq datasets, identifies gene markers and differentially accessible peaks

Secondary analyses

Name                      Description
sc-ctype-assign.cwl       Assigns cell types to clusters based on the provided metadata file
sc-rna-de-pseudobulk.cwl  Identifies differentially expressed genes between groups of cells coerced to pseudobulk datasets
sc-rna-da-cells.cwl       Detects cell subpopulations with differential abundance between datasets split by biological condition
sc-triangulate.cwl        Harmonizes conflicting annotations in single-cell genomics studies using scTriangulate

Utilities

Name              Description
tar-extract.cwl   Extracts the contents of a TAR file into a folder
tar-compress.cwl  Creates a compressed TAR file from a folder

Workflow examples for scRNA-Seq analysis

Name                     Description
sc-ref-indices-wf.cwl    Builds Cell Ranger and Cell Ranger ARC compatible reference folders from custom genome FASTA and gene GTF annotation files
sc-rna-align-wf.cwl      Runs Cell Ranger Count to quantify gene expression from a single-cell RNA-Seq library
sc-rna-aggregate-wf.cwl  Aggregates gene expression data from multiple Single-cell RNA-Seq Alignment experiments
sc-rna-analyze-wf.cwl    Runs filtering, normalization, scaling, integration (optionally) and clustering for single or aggregated single-cell RNA-Seq datasets

Workflow examples for Multiome analysis

Name                          Description
sc-multiome-align-wf.cwl      Runs Cell Ranger ARC Count to quantify chromatin accessibility and gene expression from a single-cell Multiome ATAC and RNA-Seq library
sc-multiome-aggregate-wf.cwl  Aggregates data from multiple Single-cell Multiome ATAC and RNA-Seq Alignment experiments
sc-multiome-analyze-wf.cwl    Runs filtering, normalization, scaling, integration (optionally) and clustering for single or aggregated single-cell Multiome ATAC-Seq and RNA-Seq datasets
