mdseq's Issues

Calculation of normalised counts

Hi, I think there is an error in the calculation of normalised counts using normalize.counts() when using TMM or RLE.

The function gets normalisation factors using edgeR's calcNormFactors() and divides the counts by the resulting factors, but the factors returned by calcNormFactors should be applied to library sizes, not directly to counts. I think that the factors (nf) calculated in the vignette to be used as offsets are the factors that should be used to directly normalise counts, i.e.:

cnf <- calcNormFactors(dat.filtered, method="TMM")
libsize <- colSums(dat.filtered)
rellibsize <- libsize/exp(mean(log(libsize)))
nf <- cnf * rellibsize

In normalize.counts(), these could be obtained as:

y$samples$norm.factors * y$samples$lib.size / exp(mean(log(y$samples$lib.size)))
rather than just using y$samples$norm.factors.

Please let me know if I've misinterpreted something here, but I think this is right. Thanks.
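
The arithmetic behind this point can be sketched in base R. The count matrix and the `norm_factors` values below are made-up toy numbers standing in for the output of calcNormFactors(); they are not computed by edgeR here. The block only illustrates the difference between the two scalings and is not the package's actual code.

```r
# Toy counts: 3 genes x 3 samples (hypothetical values, filled column by column).
counts <- matrix(c(10, 20, 30,
                   20, 40, 60,
                   40, 80, 120),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

# Hypothetical TMM-style factors, standing in for y$samples$norm.factors.
norm_factors <- c(0.8, 1.0, 1.25)

libsize    <- colSums(counts)                   # 60, 120, 240
rellibsize <- libsize / exp(mean(log(libsize))) # relative to the geometric mean
nf         <- norm_factors * rellibsize         # per-sample factor for the counts

# Dividing by nf removes both library size and composition effects.
normalised <- sweep(counts, 2, nf, "/")

# Dividing by norm_factors alone leaves the library size differences in place.
not_normalised <- sweep(counts, 2, norm_factors, "/")

print(round(colSums(normalised), 2))
print(round(colSums(not_normalised), 2))
```

With these toy numbers the column sums after dividing by `nf` are of similar magnitude, while after dividing by `norm_factors` alone they still differ by roughly the original library size ratio.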

Help running the package

I tried running MDSeq(Seuratobject) with all other parameters left at their defaults, and got this message:

Error in as.data.frame.default(count) : 
  cannot coerce class ‘structure("Seurat", package = "Seurat")’ to a data.frame

This data is PBMC 5k and has been through a more or less "standard" Seurat pipeline, having just completed DE analysis. I am looking to find "Cluster vs All" genes, similar to Seurat's FindAllMarkers function.
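
The error suggests that MDSeq() tries to coerce its first argument to a data frame of counts, so a Seurat object would probably need to be unpacked first. The following is a rough sketch under assumptions: the raw counts would be pulled out with something like Seurat's GetAssayData() (the slot or layer argument depends on the Seurat version) and the cluster labels with Idents(). Both are replaced here by toy stand-ins so the block runs without Seurat. The helper name `cluster_vs_rest` is hypothetical, and whether MDSeq is suited to single-cell counts is not addressed.

```r
# Toy stand-ins for what would come out of a Seurat object (assumption):
#   counts   ~ as.matrix(GetAssayData(obj, slot = "counts"))
#   clusters ~ Idents(obj)
counts <- matrix(c(5, 0, 3, 8, 1, 0, 2, 7, 4, 6, 0, 9), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
clusters <- factor(c("0", "1", "0", "2"))

# Hypothetical helper: two-level factor marking one cluster against all others.
cluster_vs_rest <- function(clusters, target) {
  factor(ifelse(clusters == target, "cluster", "rest"),
         levels = c("cluster", "rest"))
}

groups <- cluster_vs_rest(clusters, "0")

# One label per cell (column) is needed.
stopifnot(length(groups) == ncol(counts))

# A plain count matrix and a per-cell factor could then be passed on, e.g.
# fit <- MDSeq(counts, contrast = groups)   # not run here
```

Repeating this for each cluster level would give the "Cluster vs All" comparisons one cluster at a time.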

About checked$outliers

Hi!
I would like your help. I have read your paper, "Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq", which is excellent work, so I plan to use MDSeq to detect gene expression variability. However, I get errors when I try to detect outliers. It seems that all the genes were detected as outliers.

Why is that? Could you give me any hints on how to fix this issue?

> dat.checked <- remove.outlier(dat.normalized[1:1000, ], X=covars, U=covars,
+                               contrast = groups, mc.cores = 4)
4 threads are using!
Total time elapsed:3.9 seconds
> head(dat.checked$outliers)
                  status num.outliers
ENSG00000227232.4      2           NA
ENSG00000238009.2      2           NA
ENSG00000237683.5      2           NA
ENSG00000239906.1      2           NA
ENSG00000241860.2      2           NA
ENSG00000228463.4      2           NA
> table(dat.checked$outliers$status)

   2 
1000 
> 
> covars
          GENDER AGE RACE   BMI SMRIN SMTSISCH            X1            X2            X3            X4
Normal1        1  63    3 29.53   5.9     1017  0.1630506541 -1.095766e-01 -0.0056186635  0.0540478146
Normal2        1  62    3 30.78   8.1      133 -0.0688636685 -1.749981e-02  0.0345506232  0.0869748239
COPD1          2  66    3 25.82   7.6      840  0.1037386933 -9.394710e-02 -0.0855297718  0.0384818582
Normal3        1  58    3 29.83   6.2      825  0.1425086228  6.758439e-02  0.0474234960 -0.0076811603
COPD2          1  58    3 27.02   6.4     1218  0.0328722772  5.871772e-02  0.0569128696 -0.0170814274
COPD3          1  62    3 29.54   5.8      991  0.1076800161  7.706005e-02  0.0719706122 -0.0013471444
COPD4          2  66    3 20.12   7.6      670 -0.0160420656  9.598426e-02 -0.0436259659  0.1806682021
Normal4        2  66    3 29.05   6.9      773  0.0994010357  3.385225e-02 -0.0921304267  0.0165312554
Normal5        2  51    3 29.35   7.2       80 -0.0787204805 -6.110502e-02 -0.0235813528  0.0084930880

Finally, my sessionInfo() output is below.

Best wishes
Kevin

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtools_3.5.0               edgeR_3.18.1               limma_3.32.2              
 [4] MDSeq_1.0.5                BiocInstaller_1.26.0       devtools_1.13.2           
 [7] org.Hs.eg.db_3.4.0         AnnotationDbi_1.38.1       BiocParallel_1.10.1       
[10] WGCNA_1.51                 fastcluster_1.1.22         dynamicTreeCut_1.63-1     
[13] R.matlab_3.6.1             RColorBrewer_1.1-2         clusterProfiler_3.4.3     
[16] DOSE_3.2.0                 gplots_3.0.1               ggplot2_2.2.1             
[19] sva_3.24.0                 genefilter_1.58.1          mgcv_1.8-17               
[22] nlme_3.1-131               DESeq2_1.16.1              SummarizedExperiment_1.6.3
[25] DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
[28] GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
[31] S4Vectors_0.14.3           BiocGenerics_0.22.0  
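
One thing that may be worth ruling out, assuming the printed covars is the complete table: it shows 9 samples but 10 covariate columns, and RACE has the same value in every row, so a design matrix built from it with an intercept cannot be of full rank. Whether this is what status 2 reports is not confirmed here. The sketch below only shows how such a check could look in base R, using the first six columns copied from the output above.

```r
# First six covariate columns copied from the printed covars table.
covars <- data.frame(
  GENDER   = c(1, 1, 2, 1, 1, 1, 2, 2, 2),
  AGE      = c(63, 62, 66, 58, 58, 62, 66, 66, 51),
  RACE     = c(3, 3, 3, 3, 3, 3, 3, 3, 3),
  BMI      = c(29.53, 30.78, 25.82, 29.83, 27.02, 29.54, 20.12, 29.05, 29.35),
  SMRIN    = c(5.9, 8.1, 7.6, 6.2, 6.4, 5.8, 7.6, 6.9, 7.2),
  SMTSISCH = c(1017, 133, 840, 825, 1218, 991, 670, 773, 80)
)

# Columns with a single distinct value carry no information beyond the intercept.
is_constant   <- vapply(covars, function(x) length(unique(x)) == 1, logical(1))
constant_cols <- names(which(is_constant))
print(constant_cols)

# Compare the numerical rank of the design matrix with its number of columns.
X <- cbind(Intercept = 1, as.matrix(covars))
design_rank <- qr(X)$rank
cat("rank:", design_rank, " columns:", ncol(X), " samples:", nrow(X), "\n")
```

If the rank is smaller than the number of columns, dropping the constant column and reducing the number of covariates relative to the number of samples might be a sensible first step before rerunning remove.outlier().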

Help with Installation

Install MDSeq from local source with

install.packages("MDSeq_1.0.5.tar.gz", repos=NULL, type="source")

Where can I get MDSeq_1.0.5.tar.gz? This repository has no releases. I can download the entire repository (the default MDSeq-master.zip) and run
install.packages("MDSeq-master.zip", repos=NULL, type="source")
but I get
Installing package into ‘C:/Users/<NAME>/Documents/R/win-library/4.0’ (as ‘lib’ is unspecified)
and then
library(MDSeq) and library(MDSeq-master) both give "no package called..." errors. Did it install correctly but under a different name?

Install MDSeq from GitHub with

library(devtools)
install_github("zjdaye/MDSeq")

This gives the following error:
Error: Failed to install 'MDSeq' from GitHub: (converted from warning) package ‘cqn’ is not available (for R version 4.0.2)
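
The 'cqn' package named in the error is distributed through Bioconductor rather than CRAN, which would explain why install_github() cannot resolve it. A possible workaround, sketched under that assumption, is to install it with BiocManager first and then retry. The helper names `missing_packages` and `install_mdseq` are hypothetical, and the install function is defined but not called because it needs network access.

```r
# Return the subset of `pkgs` that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Hypothetical wrapper: install the Bioconductor dependency, then MDSeq itself.
install_mdseq <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  to_install <- missing_packages("cqn")
  if (length(to_install) > 0) {
    BiocManager::install(to_install)
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("zjdaye/MDSeq")
}

# install_mdseq()   # uncomment to run; requires network access
```

If other dependencies turn out to be missing, their names could be added to the vector passed to `missing_packages()`.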

Please include in documentation more than a single groups column

I'm having trouble running your code with two columns of groups:

>XX
DataFrame with 300 rows and 2 columns
      Strain   Condition
    <factor>    <factor>
17d    RIM15         SDC
19d     SLT2         SDC
20d     SNF1         SDC
21d     SSN3         SDC
22d    STE11         SDC
...      ...         ...
30s     YGK3        Salt
32s     YPK3        Salt
8r      IRE1   Rapamycin
9r      KIN1   Rapamycin
15t     PKC1 Tunicamycin

Which I can use to run MDSeq:

fit <- MDSeq(counts, contrast = XX, mc.cores = 30)

But now I can't run extract.ZIMD. Here is what I have tried:

extract.ZIMD(fit, compare = list(A='StrainWT', B='StrainPBS2'))
extract.ZIMD(fit, compare = list(A='WT', B='PBS2'))

Any advice?
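
One possible workaround, assuming that `contrast` expects a single grouping factor and that `compare` refers to its levels: collapse the two columns into one combined factor and compare the combined levels. This is a guess at the intended usage, not documented behaviour. The toy data frame below stands in for XX, and the MDSeq calls are left as comments.

```r
# Toy stand-in for XX, using strain and condition names from the post.
XX <- data.frame(
  Strain    = factor(c("WT", "WT", "PBS2", "PBS2")),
  Condition = factor(c("SDC", "Salt", "SDC", "Salt"))
)

# One factor with a level per observed Strain/Condition combination.
group <- interaction(XX$Strain, XX$Condition, sep = "_", drop = TRUE)
print(levels(group))

# The combined levels could then be used in the comparison, e.g.
# fit <- MDSeq(counts, contrast = group, mc.cores = 30)          # not run here
# extract.ZIMD(fit, compare = list(A = "WT_SDC", B = "PBS2_SDC")) # not run here
```

Printing `levels(group)` first shows the exact strings that would need to appear in `compare`.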
