zjdaye / mdseq

MDSeq: Gene expression mean and variability analysis for RNA-seq counts

License: Other

R 11.66% HTML 88.34%

mdseq's Issues

Calculation of normalised counts

Hi, I think there is an error in the calculation of normalised counts using normalize.counts() when using TMM or RLE.

The function gets normalisation factors using edgeR's calcNormFactors() and divides the counts by the resulting factors, but the factors returned by calcNormFactors should be applied to library sizes, not directly to counts. I think that the factors (nf) calculated in the vignette to be used as offsets are the factors that should be used to directly normalise counts, i.e.:

cnf <- calcNormFactors(dat.filtered, method="TMM")
libsize <- colSums(dat.filtered)
rellibsize <- libsize/exp(mean(log(libsize)))
nf <- cnf * rellibsize

In normalize.counts(), these could be obtained as:

y$samples$norm.factors * y$samples$lib.size / exp(mean(log(y$samples$lib.size)))
rather than just using y$samples$norm.factors.
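
As a rough illustration of the arithmetic described above, the sketch below computes the per-sample scaling on a toy matrix in base R. The counts and the `cnf` values are made up; `cnf` only stands in for what edgeR's calcNormFactors() would return, so the example runs without edgeR installed.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers for illustration).
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical factors standing in for calcNormFactors(counts, method = "TMM").
cnf <- c(s1 = 1.00, s2 = 0.95, s3 = 1.05)

libsize    <- colSums(counts)                     # raw library sizes
rellibsize <- libsize / exp(mean(log(libsize)))   # relative to their geometric mean
nf         <- cnf * rellibsize                    # per-sample scaling for the counts

# Divide each column (sample) by its own scaling factor.
normalized <- sweep(counts, 2, nf, "/")
```

Dividing by `cnf` alone would leave the threefold difference in library size between samples in place, which is the concern raised in this issue.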

Please let me know if I've misinterpreted something here, but I think this is right. Thanks.

Help running the package

I tried calling MDSeq(Seuratobject) with all other parameters left at their defaults, and got this message:

Error in as.data.frame.default(count) : 
  cannot coerce class ‘structure("Seurat", package = "Seurat")’ to a data.frame

This data is PBMC 5k and has been through a more or less "standard" Seurat pipeline, having just completed DE analysis. I am looking to find "Cluster vs All" genes, similar to Seurat's FindAllMarkers function.
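
The error message suggests that MDSeq() passes its first argument to as.data.frame(), which has no method for a Seurat object. The sketch below reproduces that behaviour with a toy S4 class (`ToyObject` is hypothetical and only stands in for a Seurat object) and shows that coercing the extracted count matrix works. With a real Seurat object the counts would be pulled out with Seurat's own accessor, such as GetAssayData(); check the Seurat documentation for the argument names in your version.

```r
library(methods)

# Toy S4 class standing in for a Seurat object (hypothetical, illustration only).
setClass("ToyObject", representation(counts = "matrix"))

toy <- new("ToyObject",
           counts = matrix(1:6, nrow = 3,
                           dimnames = list(paste0("gene", 1:3),
                                           c("cell1", "cell2"))))

# Coercing the whole S4 object fails, as in the reported error.
res <- tryCatch(as.data.frame(toy), error = function(e) e)
coercion_failed <- inherits(res, "error")

# Coercing the extracted gene-by-cell count matrix works.
count_df <- as.data.frame(toy@counts)
```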

What function to call to test for homogeneity of variance?

I've run MDSeq(...), but the results don't seem to include results of this particular test (and its p-value), although I'm not sure if I'm interpreting the results correctly.

Is it there? Is it extract.ZIMD? Or something else?

About checked$outliers

Hi!
I would like your help. I have read your paper titled "Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq", which is excellent work, and I plan to use MDSeq to detect gene expression variability. However, I run into a problem when I try to detect outliers: every gene appears to be flagged with status 2 and num.outliers is NA.

Why is that? Could you give me any hints on how to fix this issue?

> dat.checked <- remove.outlier(dat.normalized[1:1000, ], X=covars, U=covars,
+                               contrast = groups, mc.cores = 4)
4 threads are using!
Total time elapsed:3.9 seconds
> head(dat.checked$outliers)
                  status num.outliers
ENSG00000227232.4      2           NA
ENSG00000238009.2      2           NA
ENSG00000237683.5      2           NA
ENSG00000239906.1      2           NA
ENSG00000241860.2      2           NA
ENSG00000228463.4      2           NA
> table(dat.checked$outliers$status)

   2 
1000 
>

> covars
          GENDER AGE RACE   BMI SMRIN SMTSISCH            X1            X2            X3            X4
Normal1        1  63    3 29.53   5.9     1017  0.1630506541 -1.095766e-01 -0.0056186635  0.0540478146
Normal2        1  62    3 30.78   8.1      133 -0.0688636685 -1.749981e-02  0.0345506232  0.0869748239
COPD1          2  66    3 25.82   7.6      840  0.1037386933 -9.394710e-02 -0.0855297718  0.0384818582
Normal3        1  58    3 29.83   6.2      825  0.1425086228  6.758439e-02  0.0474234960 -0.0076811603
COPD2          1  58    3 27.02   6.4     1218  0.0328722772  5.871772e-02  0.0569128696 -0.0170814274
COPD3          1  62    3 29.54   5.8      991  0.1076800161  7.706005e-02  0.0719706122 -0.0013471444
COPD4          2  66    3 20.12   7.6      670 -0.0160420656  9.598426e-02 -0.0436259659  0.1806682021
Normal4        2  66    3 29.05   6.9      773  0.0994010357  3.385225e-02 -0.0921304267  0.0165312554
Normal5        2  51    3 29.35   7.2       80 -0.0787204805 -6.110502e-02 -0.0235813528  0.0084930880

Finally, my sessionInfo() output is included below.

Best wishes
Kevin

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtools_3.5.0               edgeR_3.18.1               limma_3.32.2              
 [4] MDSeq_1.0.5                BiocInstaller_1.26.0       devtools_1.13.2           
 [7] org.Hs.eg.db_3.4.0         AnnotationDbi_1.38.1       BiocParallel_1.10.1       
[10] WGCNA_1.51                 fastcluster_1.1.22         dynamicTreeCut_1.63-1     
[13] R.matlab_3.6.1             RColorBrewer_1.1-2         clusterProfiler_3.4.3     
[16] DOSE_3.2.0                 gplots_3.0.1               ggplot2_2.2.1             
[19] sva_3.24.0                 genefilter_1.58.1          mgcv_1.8-17               
[22] nlme_3.1-131               DESeq2_1.16.1              SummarizedExperiment_1.6.3
[25] DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
[28] GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
[31] S4Vectors_0.14.3           BiocGenerics_0.22.0
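
One thing that may be worth checking, although nothing in this thread confirms it as the cause, is whether the covariate matrix is rank deficient. In the rows of `covars` printed above, RACE is constant, which would make it collinear with an intercept. The sketch below uses a toy data frame (`covars_toy`, copied from a few of the printed rows and columns) and assumes an intercept column is added to the design.

```r
# Toy covariates mirroring the first rows printed above (subset of columns).
covars_toy <- data.frame(
  GENDER = c(1, 1, 2, 1, 1, 1),
  AGE    = c(63, 62, 66, 58, 58, 62),
  RACE   = c(3, 3, 3, 3, 3, 3),      # constant in the rows shown
  BMI    = c(29.53, 30.78, 25.82, 29.83, 27.02, 29.54)
)

# Columns with a single distinct value carry no information.
is_constant   <- vapply(covars_toy, function(x) length(unique(x)) == 1, logical(1))
constant_cols <- names(covars_toy)[is_constant]

# Assumed design: intercept plus all covariates. A rank below the number
# of columns means some columns are linear combinations of the others.
design         <- cbind(Intercept = 1, as.matrix(covars_toy))
rank_deficient <- qr(design)$rank < ncol(design)
```

If the full covariate table shows the same pattern, dropping the constant column before calling remove.outlier() would be a reasonable first experiment.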

Help with Installation

Install MDSeq from local source with

install.packages("MDSeq_1.0.5.tar.gz", repos=NULL, type="source")

Where can I get MDSeq_1.0.5.tar.gz from? This repository has no releases. I can download the entire repository (by default MDSeq-master.zip) and run
install.packages("MDSeq-master.zip", repos=NULL, type="source")
but I get
Installing package into ‘C:/Users/<NAME>/Documents/R/win-library/4.0’ (as ‘lib’ is unspecified)
and then
library(MDSeq) and library(MDSeq-master) both fail with "no package called...". Did it install correctly but under a different name?

Install MDSeq from GitHub with

library(devtools)
install_github("zjdaye/MDSeq")

This gives the following error:
Error: Failed to install 'MDSeq' from GitHub: (converted from warning) package ‘cqn’ is not available (for R version 4.0.2)
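
The cqn package is distributed through Bioconductor rather than CRAN, which would explain why install_github() cannot find it in the default repositories. A possible sequence, offered as a sketch and not as the maintainers' instructions, is to install cqn with BiocManager first and then retry. The commented lines show one way to install from the downloaded zip, assuming it unpacks to a folder named MDSeq-master.

```r
# cqn lives on Bioconductor, so install it through BiocManager first.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("cqn")

# Then install MDSeq from GitHub, as in the README.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("zjdaye/MDSeq")

# Alternative (assumption: the zip unpacks to a folder called MDSeq-master):
# unzip("MDSeq-master.zip")
# install.packages("MDSeq-master", repos = NULL, type = "source")
```

Other dependencies may need the same treatment if further "not available" messages appear.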

Please include in documentation more than a single groups column

I'm having trouble running your code with two columns of groups:

>XX
DataFrame with 300 rows and 2 columns
      Strain   Condition
    <factor>    <factor>
17d    RIM15         SDC
19d     SLT2         SDC
20d     SNF1         SDC
21d     SSN3         SDC
22d    STE11         SDC
...      ...         ...
30s     YGK3        Salt
32s     YPK3        Salt
8r      IRE1   Rapamycin
9r      KIN1   Rapamycin
15t     PKC1 Tunicamycin

Which I can use to run MDSeq:

fit <- MDSeq(counts, contrast = XX, mc.cores = 30)

But now I can't run extract.ZIMD. Here is what I have tried:

extract.ZIMD(fit, compare = list(A='StrainWT', B='StrainPBS2'))
extract.ZIMD(fit, compare = list(A='WT', B='PBS2'))

Any advice?
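
One possible workaround, assuming MDSeq and extract.ZIMD compare the levels of a single grouping factor (this thread does not confirm that), is to collapse the two columns into one combined factor and inspect its level names. The data frame `XX_toy` below is a made-up stand-in for XX.

```r
# Toy sample sheet with the same two columns as XX (values are illustrative).
XX_toy <- data.frame(
  Strain    = factor(c("WT", "WT", "PBS2", "PBS2")),
  Condition = factor(c("SDC", "Salt", "SDC", "Salt"))
)

# Combine both columns into one factor; drop = TRUE removes unused combinations.
group <- interaction(XX_toy$Strain, XX_toy$Condition, sep = "_", drop = TRUE)

# The level names are the labels that a comparison would have to refer to.
group_levels <- levels(group)
```

Under that assumption, the names passed in `compare` would presumably be combined labels such as "WT_SDC" and "PBS2_SDC" instead of "WT" or "StrainWT".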
