GithubHelp home page GithubHelp logo

chenyongrowan / degage Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 1.0 5.47 MB

DEGage is a novel model-based method for gene differential expression analysis between two groups of scRNA-seq count data. It employs a novel family of discrete distributions for describing the difference of two NB distributions (named DOTNB).

License: MIT License

R 58.65% C 0.36% C++ 40.99%
comparative-analysis differentially-expressed-genes negative-binomial scrna-seq

degage's Introduction

DEGage

This package allows for the differential expression analysis of two groups of scRNA-seq count data. It employs a novel family of discrete distributions for describing the difference of two NB distributions (named DOTNB). DEGage take the raw counts of scRNA-seq as inputs, and thus avoid introducing artificially bias in normalization steps in current methods. A workflow is shown as follows.

DEGage Workflow

Installation

To install DEGage, run the following:

library(devtools)
install_github("chenyongrowan/DEGage")

Dependencies

DEgage requires the following dependencies:

MASS >= 7.3-59
pscl >= 1.5.5
hypergeo >= 1.2-13
stats >= 4.2.2
doParallel >= 1.0.17
parallel >= 4.3.0
foreach >=1.5.2

A recent MASS update made it incompatible with R versions below 4.4. To install a legacy version of MASS, go to https://cran.r-project.org/src/contrib/Archive/MASS/ and download an archived version that is compatible with your current R installation. Then, run the following in an R terminal:

install.pacakges("path/to/MASS_SOURCE.tar.gz", repos=NULL, type="source")

Detailed Documentation And Usage

For more detailed documentation and tutorials, please refer to the following:
For thorough documentation: https://rpubs.com/aliciaprowan/1043456
For a basic comparison with other methods: https://rpubs.com/aliciaprowan/1202999

Differential Expression Functions

DEGage

DEGage performs pairwise differential analysis on NGS count data. The input is typically a dataframe where columns contain samples and rows contain genes. DEGage equips with two sampling strategies, random assignment and subsampling, for handling imbalanced scRNA-seq datasets.

DEGage(counts,
     group,
     perm.preprocess = TRUE,
     gene.filter.threshold = 0.9,
     nperms = 2000,
     nsubsample = NA,
     perm.pval = 0.1,
     ncores = 4,
     maxiter = 100,
     mean.ratio = 4,
     subsampled.k = T)

DEGage_multitest

Similar to DEGage, except it can perform an indefinite number of pairwise comparisons.

DEGage_multitest(counts,
               group,
               perm.preprocess = TRUE,
               gene.filter.threshold = 0.9,
               nperms = 2000,
               nsubsample = NA,
               perm.pval = 0.1,
               ncores = 4,
               maxiter = 100,
               mean.ratio = 4,
               subsampled.k = T,
               writing.dir = NULL)

DEGage_preprocess

An optional function that performs basic prefiltering on a dataframe of read counts prior to calling DEGage()

DEGage_preprocess(counts,
                min.transcripts.per.gene = 200,
                max.transcripts.per.gene = 10000,
                min.transcripts.per.cell = 500,
                max.transcripts.per.cell = 20000,
                min.cells.per.gene = 3,
                max.dropout.prop = 0.8,
                max.mt.percent = 0.2,
                mt.prefix = 'MT-',
                max.rRna.percent = 0.2,
                rRna.prefix = 'rr')

DEGage_Simulation

Generates simplified simulated scRNA-Seq counts following an NB distribution with pre-defined proportions of dropouts.

DEGage_Simulation(ngenes,
       ndegs,
       ngroup1,
       ngroup2,
       lfc = 1,
       min.prop.zeros = 0.1,
       max.prop.zeros = 0.5,
       seed = NULL,
       ncores = 4,
       dispersions = NULL,
       means = NULL)

Example Usage

In this section, we will detail how to use DEGage functions

First, we will simulate a small data frame of counts to pass through DEGage() using DEGage simulate. It has 100 cells total with 50 in each condition, as well as 1000 genes with 100 degs:

library(DEGage)
df <- DEGage_Simulation(ngenes = 1000, ndegs = 100, ngroup1 = 50, ngroup2 = 50)

Next, we will pass these counts through DEGage:

cellgroups <- factor(c(rep(1,50), rep(2,50)))
results <- DEGage(counts = df, group = cellgroups)

To test DEGage_multitest, we will simulate a second dataframe of counts, merge them together, and pass them through DEGage_multitest():

df2 <- DEGage_Simulation(ngenes = 1000, ndegs = 100, ngroup1 = 50, ngroup2 = 50)

df <- cbind(df, df2)
cellgroups2 <- factor(c(rep(3,50), rep(4,50)))
cellgroups <- factor(c(cellgroups, cellgroups2))

multitest.results <- DEGage_multitest(df, cellgroups)

Citation

Please cite the following article if you use DEGage in your research:

Petrany A., Chen R., Zhang S. and Chen, Y. "Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data". Genome Research, 2024. doi: 10.1101/gr.278843.124.

degage's People

Contributors

chenyongrowan avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.