ocbe-uio / discbio
A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics
License: Other
The following lines of PPI() always write and read a file from the user's local drive:
Lines 42 to 45 in 5b2b23a
1.1.0
R functions are not supposed to write to the user's working directory unless explicitly told to.
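A minimal sketch of one way to avoid this, assuming PPI() only needs the file as an intermediate step: write to a path returned by tempfile(), which lives in the session's temporary directory, and delete it on exit. The helper name and the tab-separated format are illustrative assumptions, not PPI()'s actual code.

```r
# Hypothetical helper: round-trip a table through a temporary file instead of
# the user's working directory. tempfile() returns a path inside tempdir().
roundtrip_via_tempfile <- function(x) {
  path <- tempfile(fileext = ".tsv")
  on.exit(unlink(path), add = TRUE)  # remove the file even if reading fails
  utils::write.table(x, path, sep = "\t", quote = FALSE, row.names = FALSE)
  utils::read.delim(path, stringsAsFactors = FALSE)
}

df <- data.frame(gene = c("TP53", "MYC"), score = c(0.9, 0.4))
res <- roundtrip_via_tempfile(df)
```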
Under DIscBIO version 0.0.0.9004, the new vignette hangs with the reduced dataset.
The following code works fine, and is identical to the one currently present on the vignette (minus commented-out code for brevity):
## ----options, echo=FALSE------------------------------------------------------
library(knitr)
opts_chunk$set(fig.width=7, fig.height=7)
## -----------------------------------------------------------------------------
library(DIscBIO)
library(enrichR)
## -----------------------------------------------------------------------------
DataSet <- valuesG1msReduced
head(DataSet)
## -----------------------------------------------------------------------------
sc <- DISCBIO(DataSet)
## -----------------------------------------------------------------------------
sc<-NoiseFiltering(sc,percentile=0.9, CV=0.2)
#### Normalizing the reads without any further gene filtering
sc<-Normalizedata(sc, mintotal=1000, minexpr=0, minnumber=0, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000)
#### Additional gene filtering step based on gene expression
sc<-FinalPreprocessing(sc,GeneFlitering="NoiseF",export = TRUE) ### GeneFlitering can be either "NoiseF" or "ExpF"
## -----------------------------------------------------------------------------
#OnlyExpressionFiltering=TRUE
OnlyExpressionFiltering=FALSE
if (OnlyExpressionFiltering==TRUE){
MIinExp<- mean(rowMeans(DataSet,na.rm=TRUE))
MIinExp
MinNumber<- round(length(DataSet[1,])/3) # To be expressed in at least one third of the cells.
MinNumber
sc<-Normalizedata(sc, mintotal=1000, minexpr=MIinExp, minnumber=MinNumber, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000) #### In this case this function is used to filter out genes and cells.
sc<-FinalPreprocessing(sc,GeneFlitering="ExpF",export = TRUE)
}
## -----------------------------------------------------------------------------
sc<- Clustexp(sc,cln=3,quiet=TRUE) #### K-means clustering to get three clusters
plotGap(sc) ### Plotting gap statistics
# sc<- Clustexp(sc, clustnr=20,bootnr=50,metric="pearson",do.gap=T,SE.method="Tibs2001SEmax",SE.factor=.25,B.gap=50,cln=K,rseed=17000)
## -----------------------------------------------------------------------------
sc<- comptSNE(sc,rseed=15555,quiet = TRUE)
cat("\t"," Cell-ID"," Cluster Number","\n")
sc@cpart
## -----------------------------------------------------------------------------
# Silhouette of k-means clusters
par(mar=c(6,2,4,2))
plotSilhouette(sc,K=3) # K is the number of clusters
## -----------------------------------------------------------------------------
Jaccard(sc,Clustering="K-means", K=3, plot = TRUE) # Jaccard of k-means clusters
## -----------------------------------------------------------------------------
############ Plotting K-means clusters
plottSNE(sc)
plotKmeansLabelstSNE(sc) # To plot the ID of the cells in each cluster
plotSymbolstSNE(sc,types=sub("(\\_\\d+)$","", names(sc@ndata))) # To plot the ID of the cells in each cluster
## -----------------------------------------------------------------------------
outlg<-round(length(sc@fdata[,1])/200) # The cell will be considered as an outlier if it has a minimum of 0.5% of the number of filtered genes as outlier genes.
Outliers<- FindOutliersKM(sc, K=3, outminc=5,outlg=outlg,probthr=.5*1e-3,thr=2**-(1:40),outdistquant=.75, plot = TRUE, quiet = FALSE)
RemovingOutliers=FALSE
# RemovingOutliers=TRUE # Removing the defined outlier cells based on K-means Clustering
if(RemovingOutliers==TRUE){
names(Outliers)=NULL
Outliers
DataSet=DataSet[-Outliers]
dim(DataSet)
colnames(DataSet)
cat("Outlier cells were removed, now you need to start from the beginning")
}
## -----------------------------------------------------------------------------
sc<-KmeanOrder(sc,quiet = FALSE, export = TRUE)
plotOrderKMtsne(sc)
## -----------------------------------------------------------------------------
KMclustheatmap(sc,hmethod="single", plot = TRUE)
## -----------------------------------------------------------------------------
g='ENSG00000000003' #### Plotting the expression of ENSG00000000003 (TSPAN6)
plotExptSNE(sc,g)
## ----degKM--------------------------------------------------------------------
####### Differential expression analysis between cluster 1 and cluster 3 of the K-means clustering using an FDR of 0.1
cdiff <- DEGanalysis2clust(
sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE, quiet=TRUE
)
## -----------------------------------------------------------------------------
#### To show the result table
head(cdiff[[1]]) # The first component
head(cdiff[[2]]) # The second component
The next line, however, hangs:
cdiff <- DEGanalysis(
sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE,
quiet=FALSE
)
The last output lines before the freeze are these:
Number of thresholds chosen (all possible thresholds) = 115
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl2 in Cl1 VS Cl2
Estimating sequencing depths...
Resampling to get new data matrices...
The up should be down and the down should be up
This is most likely due to the ">" in line 122.
The package's internal functions are scattered across several files in the R/ folder. For example, internal-functions.R contains several functions, samr-adapted.R contains internal functions adapted from the samr package, and a few functions are isolated in their own files, such as reformatSiggenes.R and replaceDecimals.R. It would be great if this were more consistent.
One suggestion would be to keep the samr functions separated and aggregate all the rest into internal-functions.R.
DIscBIO was developed based on two datasets using human and mouse genes. It would be great if it could be adapted to work on other organisms.
Conquer is a collection of analysis-ready public scRNA-seq data sets. We would like to add it to our manuscript. It has about 40 datasets from three organisms: human, zebrafish and mouse. When I wrote DIscBIO I was focusing on humans, but now we want to make it applicable to any organism with a taxonomy ID. To do so, we need to change lines 157-159 of DIscBIO-classes.R from
Lines 157 to 159 in 0c90899
to
shortNames <- substr(rownames(tmpExpdataAll), 1, 3)
geneTypes <- factor(
c(ENS = "ENS", ERC = "ERC")[shortNames]
)
I did not change the code because the dev branch is not working and I was worried about making the situation worse. Could you change the code once dev is working again?
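The proposed change relies on every Ensembl gene ID starting with "ENS" regardless of species (ENSG for human, ENSMUSG for mouse, ENSDARG for zebrafish) and on ERCC spike-ins starting with "ERC". A minimal sketch of that classification on made-up row names; note that IDs matching neither prefix (such as plain gene symbols) become NA, which the real code would need to handle.

```r
# Classify features by their first three characters, as in the proposed change.
# The IDs below are illustrative, not taken from a real dataset.
ids <- c("ENSG00000000003", "ENSMUSG00000000001", "ENSDARG00000000001",
         "ERCC-00002", "Xkr4")
shortNames <- substr(ids, 1, 3)
geneTypes <- factor(c(ENS = "ENS", ERC = "ERC")[shortNames])
```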
library(DIscBIO)
library(MultiAssayExperiment)
library(SummarizedExperiment) # provides assays()
GSE41265 <- readRDS("~/GSE41265.rds")
Dataset <- assays(experiments(GSE41265)[["gene"]])[["count"]]
# Strip the Ensembl version suffix, e.g. "ENSG00000000003.14" -> "ENSG00000000003"
rownames(Dataset) <- sub("\\..*$", "", rownames(Dataset))
sc <- DISCBIO(Dataset)
sc <- Clustexp(sc, cln = 2, quiet = FALSE, clustnr = 6, rseed = 17000)
Cdiff <- DEGanalysis2clust(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "M", export = TRUE, quiet = FALSE)
Cdiff <- DEGanalysis(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "All", export = TRUE, quiet = FALSE) ####### differential expression analysis between all clusters
CdiffBinomial <- ClustDiffGenes(sc, K = 2, export = TRUE, fdr = .01, quiet = FALSE)
For now, it would be great if DEGanalysis and DEGanalysis2clust could work without gene names, as ClustDiffGenes does.
There are some circumstances that cause the DEGanalysis function to hang.
library(DIscBIO)
load("data/pan_indrop_matrix_8000cells_18556genes.rda")
# ==============================================================================
# Determining constants
# ==============================================================================
n_genes <- 500
K <- 3
# ==============================================================================
# Subsetting and formatting datasets
# ==============================================================================
sc_dataframe <- pan_indrop_matrix_8000cells_18556genes[, seq_len(n_genes)]
sc <- DISCBIO(sc_dataframe)
# ==============================================================================
# Performing operations
# ==============================================================================
MIinExp <- mean(rowMeans(sc_dataframe, na.rm=TRUE))
MinNumber <- round(length(sc_dataframe[1, ]) / 10)
sc <- Normalizedata(
sc, mintotal=1000, minexpr=MIinExp, minnumber=MinNumber, maxexpr=Inf,
downsample=FALSE, dsn=1, rseed=17000
)
sc <- FinalPreprocessing(sc, GeneFlitering="ExpF", export=FALSE, quiet=TRUE)
sc <- Clustexp(sc, cln=K, quiet=TRUE)
sc <- comptSNE(sc, rseed=15555, quiet=TRUE)
# This is the part that freezes
cdiff <- DEGanalysis(
sc, Clustering="K-means", K=K, fdr=0.10, name="all_clusters",
export=FALSE, quiet=FALSE, plot=FALSE, nresamp=5, nperms=10
)
Output of DEGanalysis before it freezes:
The dataset is ready for differential expression analysis
[1] "Cl2" "Cl1" "Cl3"
Number of comparisons: 6
Estimating sequencing depths...
Resampling to get new data matrices...
perm= 1
perm= 2
perm= 3
perm= 4
perm= 5
perm= 6
perm= 7
perm= 8
perm= 9
perm= 10
Number of thresholds chosen (all possible thresholds) = 1283
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Low-regulated genes in the Cl1 in Cl2 VS Cl1
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Estimating sequencing depths...
Resampling to get new data matrices...
Ctrl+C doesn't quit the function, only killing R does the trick.
For a simpler set of parameters, for example n_genes <- 100 and K <- 2, the code above ends with the following output:
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Comparisons Target cluster Gene number File name Gene number File name
1 Cl2 VS Cl1 Cl1 477 Up-regulated-all_clustersCl1inCl2VSCl1.csv 941 Low-regulated-all_clustersCl1inCl2VSCl1.csv
2 Cl2 VS Cl1 Cl2 477 Low-regulated-all_clustersCl2inCl2VSCl1.csv 941 Up-regulated-all_clustersCl2inCl2VSCl1.csv
Moreover, the structure of cdiff is:
List of 2
$ : chr [1:1422, 1:2] "ENSG00000005022" "ENSG00000006327" "ENSG00000008394" "ENSG00000008517" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "DEGsE" "DEGsS"
$ :'data.frame': 2 obs. of 6 variables:
..$ Comparisons : chr [1:2] "Cl2 VS Cl1" "Cl2 VS Cl1"
..$ Target cluster: chr [1:2] "Cl1" "Cl2"
..$ Gene number : int [1:2] 477 477
..$ File name : chr [1:2] "Up-regulated-all_clustersCl1inCl2VSCl1.csv" "Low-regulated-all_clustersCl2inCl2VSCl1.csv"
..$ Gene number : int [1:2] 941 941
..$ File name : chr [1:2] "Low-regulated-all_clustersCl1inCl2VSCl1.csv" "Up-regulated-all_clustersCl2inCl2VSCl1.csv"
The following Notebooks contain calls to the legacy function KmeanOrder():
As per a warning on that function, KmeanOrder() has been replaced by pseudoTimeOrdering(), so all calls to KmeanOrder() should be replaced ASAP.
Ideally, just changing the name of the function called should suffice; if bugs occur, fixes should be made to pseudoTimeOrdering().
This change would allow us to remove KmeanOrder(), thus reducing the code footprint and check time of the package (something DIscBIO needs).
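Since the change is mostly a rename, it could be done mechanically. A hedged sketch that rewrites the calls in a notebook's text; the word boundary (`\\b`) avoids touching longer identifiers that merely end in KmeanOrder. It assumes pseudoTimeOrdering() accepts the same arguments, which should be checked notebook by notebook.

```r
# Replace calls to the legacy KmeanOrder() with pseudoTimeOrdering() in text.
rename_legacy_calls <- function(text) {
  gsub("\\bKmeanOrder\\(", "pseudoTimeOrdering(", text, perl = TRUE)
}

# Usage on a notebook file (the path is hypothetical):
# nb <- readLines("notebook.ipynb")
# writeLines(rename_legacy_calls(nb), "notebook.ipynb")
```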
I am opening this issue as a notification because DIscBIO is listed here as a package that relies (depends/imports/suggests) on Seurat. As you may know, we recently released Seurat v5 as a beta in March of this year, with new updates for spatial, multimodal, and massively scalable analysis. For more information on updates and improvements, check out our website https://satijalab.org/seurat/.
We are now preparing to release Seurat v5 to CRAN, and plan to submit it on October 23rd. While we have tried our best to keep things backward-compatible, it is possible that updates to Seurat and SeuratObject might break your existing functionality or tests. We wanted to reach out before the new version is on CRAN, so that there's time to report issues/incompatibilities and prepare you for any changes in your code base that might be necessary.
We apologize for any disruption or inconvenience, but hope that the improvements to Seurat v5 will benefit your users going forward.
To test the upcoming release, you can install Seurat from the seurat5 branch using the instructions available on this page: https://satijalab.org/seurat/articles/install.
Thank you!
Seurat v5 team
The linter workflow has found several refactoring improvements to suggest. Solving all the warnings raised should be enough to close this issue.
Error: `file` must be a string, raw vector or a connection.
Traceback:
1. PPI(data, FileName)
2. read_tsv(repo_content)
3. read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
. locale = locale, skip = skip, skip_empty_rows = skip_empty_rows,
. comment = comment, n_max = n_max, guess_max = guess_max,
. progress = progress)
4. col_spec_standardise(data, skip = skip, skip_empty_rows = skip_empty_rows,
. comment = comment, guess_max = guess_max, col_names = col_names,
. col_types = col_types, tokenizer = tokenizer, locale = locale)
5. datasource(file, skip = skip, skip_empty_rows = skip_empty_rows,
. comment = comment)
6. stop("`file` must be a string, raw vector or a connection.",
. call. = FALSE)
[13.39] Salim Ghannoum I am always running the notebook
[13.39] Salim Ghannoum CTCs-Binder-Part2.ipynb
[13.40] Salim Ghannoum I printed the repo_content
So it is not empty; I don't know why read_tsv() is not working
The names of DIscBIO functions are not consistent: some use Pascal case (e.g. ClassVectoringDT, PCAplotSymbols), others use Dromedary case (e.g. clustheatmap, plotGap); even the spelling of similar functions can change (e.g. plotOrderTsne vs. plotLabelstSNE). This can confuse users. It would be great if all functions followed a single naming convention.
Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2dockerbqeislym'...
HEAD is now at b9a1dc3 Merge branch 'release-1.2.0'
Error during build: .0.0 is not valid SemVer st
Possible solution provided on jupyterhub/repo2docker#1140 (comment).
The Networking() and PPI() functions use httr::GET() to parse URLs. They only try each URL once, which sometimes fails due to network errors (a common problem among users with less-than-perfect internet connections). It would be great if these calls were wrapped in loops that retry GET a few times (e.g. 3) until they get a status_code() of 200, which indicates a successful URL retrieval.
Networking()
PPI()
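A minimal sketch of such a loop, as a hypothetical helper that is not part of DIscBIO. The defaults assume httr; because R evaluates default arguments lazily, they are only used if no replacement functions are passed, which also makes the loop easy to test offline. httr also ships RETRY() (e.g. `httr::RETRY("GET", url, times = 3)`), which may be simpler than a hand-written loop.

```r
# Hypothetical helper: call fetch(url) up to `times` times until the response
# has status code 200, pausing `wait` seconds between failed attempts.
get_with_retry <- function(url, times = 3, wait = 1,
                           fetch = httr::GET, get_status = httr::status_code) {
  for (attempt in seq_len(times)) {
    # Treat connection errors like a failed status and try again
    response <- tryCatch(fetch(url), error = function(e) NULL)
    if (!is.null(response) && get_status(response) == 200) {
      return(response)
    }
    if (attempt < times) Sys.sleep(wait)
  }
  stop("Failed to retrieve ", url, " after ", times, " attempts", call. = FALSE)
}

# Usage inside Networking() or PPI() (illustrative):
# response <- get_with_retry(url)
```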
The code below lacks the steps for downloading the scripts and data from the Binder.
library("DIscBIO")
source("DIscBIO-CTCs-Binder-Part1.r")
source("DIscBIO-CTCs-Binder-Part2.r")
Networking(data, FileName)
PPI(data, FileName)
TravisCI builds for DIscBIO are failing
A passing build.
The last passing build was https://travis-ci.org/github/ocbe-uio/DIscBIO/builds/721646083. The only change between commit 0567683 and the next one (c7d9929) is a few lines in README.md. Maybe the new lines regarding BiocManager::install caused this?
P.S.: to check the difference between the two commits, run
git diff c7d99297a2dea98763d79248ffd0d675a3be64b5 0567683379186cf3b60a37f861932bb86696e04b
on a terminal at the DIscBIO working directory.
Installation of DIscBIO from CRAN fails.
From an R interactive session, run:
install.packages("DIscBIO")
Output that ends in:
* DONE (DIscBIO)
The downloaded source packages are in
‘/tmp/Rtmpgn8yf4/downloaded_packages’
The name of the temporary directory, Rtmpgn8yf4, will probably be different in your case.
The downloaded source packages are in
‘/tmp/Rtmpgn8yf4/downloaded_packages’
Warning messages:
1: In install.packages("DIscBIO") :
installation of package ‘rJava’ had non-zero exit status
2: In install.packages("DIscBIO") :
installation of package ‘RWekajars’ had non-zero exit status
3: In install.packages("DIscBIO") :
installation of package ‘RWeka’ had non-zero exit status
4: In install.packages("DIscBIO") :
installation of package ‘DIscBIO’ had non-zero exit status
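The chain of warnings starts with rJava, which the other failing packages appear to depend on; rJava needs a Java Development Kit that R is configured to use (on Linux, typically by running `R CMD javareconf` after installing a JDK). A small diagnostic sketch, not part of DIscBIO, to list which of these packages cannot be loaded:

```r
# Return the subset of `pkgs` whose namespaces cannot be loaded.
unloadable <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Usage:
# unloadable(c("rJava", "RWekajars", "RWeka", "DIscBIO"))
```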
Package poorman is scheduled for removal from CRAN, which affects the philentropy package and, by extension, DIscBIO. We use philentropy for calculating the Jaccard distances, so that functionality can be rewritten in-house.
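A minimal base-R sketch of such a rewrite. It assumes philentropy's "jaccard" method uses the formula below (worth double-checking against its documentation before swapping); for 0/1 vectors the formula reduces to 1 minus the size of the intersection over the size of the union. Returning 0 for two all-zero vectors is an assumption of this sketch.

```r
# Jaccard distance between two numeric vectors:
#   d = sum((p - q)^2) / (sum(p^2) + sum(q^2) - sum(p * q))
jaccard_distance <- function(p, q) {
  stopifnot(length(p) == length(q))
  denom <- sum(p^2) + sum(q^2) - sum(p * q)
  if (denom == 0) return(0)  # both vectors all zero: treat as identical
  sum((p - q)^2) / denom
}

a <- c(1, 1, 0, 1, 0)
b <- c(1, 0, 0, 1, 1)
jaccard_distance(a, b)  # 0.5: 2 shared elements out of 4 in the union
```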
With 59 838 observations and 94 variables, the valuesG1ms dataset that comes with the package is too large for some function examples.
@SystemsBiologist, is it possible to add a second example dataset, with a subset of valuesG1ms? What is the best way to subset the data so that the dataset still makes sense? One idea is to keep just 33 columns (G1–G1.10, S1–S1.10, G2–G2.10) and, say, the first 1 000 rows. Is this reasonable?
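The proposed subset could be sketched as below. The helper name is hypothetical, and the column naming (base name, then ".1" to ".10") is assumed from the ranges quoted above; the function stops if any expected column is missing, so a wrong assumption surfaces immediately.

```r
# Hypothetical helper: keep the 33 columns G1..G1.10, S1..S1.10, G2..G2.10
# and the first `n_rows` rows of the data.
subset_valuesG1ms <- function(data, n_rows = 1000) {
  phases <- c("G1", "S1", "G2")
  keep <- unlist(lapply(phases, function(p) c(p, paste0(p, ".", 1:10))))
  missing <- setdiff(keep, colnames(data))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  data[seq_len(min(n_rows, nrow(data))), keep, drop = FALSE]
}

# Usage (illustrative):
# reduced <- subset_valuesG1ms(valuesG1ms)
```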
DIscBIO has the boot package as a dependency just to use the boot() function inside Jaccard() (see here). If this were replaced by an in-house solution, DIscBIO would have one fewer dependency (it currently depends on 21 non-default packages, which generates a NOTE from devtools::check()).
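A minimal sketch of an in-house replacement, assuming Jaccard() only needs ordinary resampling with replacement (if it relies on other boot() features, such as stratified resampling, this would need extending). It mimics boot's calling convention, statistic(data, indices), and its t0 and t components, with replicates stored as rows of t. The function name is hypothetical.

```r
# Minimal nonparametric bootstrap in base R.
simple_boot <- function(data, statistic, R, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- NROW(data)
  t0 <- statistic(data, seq_len(n))  # statistic on the original data
  reps <- vapply(seq_len(R), function(i) {
    statistic(data, sample.int(n, n, replace = TRUE))
  }, numeric(length(t0)))
  # vapply() returns a vector for scalar statistics and a k x R matrix
  # otherwise; store one replicate per row, as boot does.
  t_mat <- if (is.matrix(reps)) t(reps) else matrix(reps, ncol = 1)
  list(t0 = t0, t = t_mat)
}

# Usage (illustrative):
# res <- simple_boot(x, function(d, i) mean(d[i]), R = 100, seed = 17000)
```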
Clicking on the "View as code" icon of some notebook files leads to a 404 error page.
I don't know if I'm using the notebook properly, but take this page as an example:
I would like to copy the R code chunks, but manually selecting each chunk and Ctrl+C/Ctrl+V-ing my way through each one of them sounds like way more work than it should be. Normal Python notebooks can export just the code chunks for simple copying and pasting, so I assume Binder allows a reader to do that too. I also assume that can be done by clicking here:
When I click on the highlighted icon in the image above, this is where I end up:
So maybe the icon is pointing to the wrong place. Please fix or advise.
The dev branch is unstable; Binder should be pointed to master by default.
Some functions are very explicit about their dependency on other functions. For example, running
comptSNE(DISCBIO(valuesG1ms))
Returns
Error in comptSNE(DISCBIO(valuesG1ms)) : run clustexp before comptsne
However, several other functions, such as KmeanOrder, have the same requirement but are not explicit about it. Running
KmeanOrder(DISCBIO(valuesG1ms), export = FALSE)
Returns
Error in `[<-`(`*tmp*`, cid, , value = colMeans(pcareduceres[names(clusterid[clusterid == : indeksen ligg utanfor grensene
(apologies for the error in Nynorsk: it translates to "subscript out of bounds", and the point is that the first argument is out of bounds).
The function clearly expects another clustering function, such as clustexp, to be run beforehand.
There are a few functions like this, and I can fix them by adding a similar validation to each one as I go through the demo script, but I was wondering what this validation contains. Is it enough to check whether length(object@cpart) == 0, or should I be checking other slots? Is clustexp the only clustering function that needs to be run beforehand, or are there alternatives?
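A sketch of the simplest version of that validation, assuming (as the question suggests) that an empty cpart slot means no clustering has been run; whether other clustering functions fill other slots is the open question above, so this may need extending. The helper name and message are hypothetical, modelled on the comptSNE() error.

```r
# Hypothetical validation helper: stop early with an informative message
# if no clustering results are stored in the object's `cpart` slot.
check_clustered <- function(object, caller) {
  if (length(object@cpart) == 0) {
    stop("run Clustexp() before ", caller, "()", call. = FALSE)
  }
  invisible(TRUE)
}

# Usage at the top of a function such as KmeanOrder() (illustrative):
# check_clustered(object, "KmeanOrder")
```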
Dear Author,
Thank you for the excellent tool. I have tested it on the CTC test dataset and it works flawlessly. I then tried to use it on a 10x dataset from GSE136103, which has 10 healthy and 10 diseased samples.
I processed these with Seurat as follows:
library(Seurat)
### Sample IDs, in the same order in which the samples are read below
cirrhotic <- paste0("GSM", 4041161:4041169)
healthy <- paste0("GSM", 4041150:4041160)
samples <- c(cirrhotic, healthy)
base.dir <- "/home/abhishek/UseCase/Liver/UseCaseLiver"
sample.dirs <- c(
  file.path(base.dir, "Cirrhotic.Cellranger/Cirrhotic", cirrhotic, "cellranger"),
  file.path(base.dir, "Healthy.Cellranger/Healthy", healthy, "cellranger")
)
data.10x <- lapply(sample.dirs, function(d) Read10X(data.dir = d))
### Create Seurat objects, labelling each one with its own sample ID
scrna.list <- list()
for (i in seq_along(data.10x)) {
  scrna.list[[i]] <- CreateSeuratObject(counts = data.10x[[i]], min.cells = 5, min.features = 50, project = samples[i])
  scrna.list[[i]][["DataSet"]] <- samples[i]
}
rm(data.10x)
### Merge the Seurat objects into a single object
scrna <- merge(x = scrna.list[[1]], y = scrna.list[-1], add.cell.ids = samples)
Then I computed the count matrix per sample by taking the average, as below:
avg.counts <- AverageExpression(object = scrna)
write.csv(avg.counts, file="Liver.csv")
Then I used the DIscBIO pipeline as outlined for further analyses:
library(DIscBIO)
FileName<-"Liver" # must match the file written above (Liver.csv); file names are case-sensitive on Linux
DataSet <- read.csv(file = paste0(FileName,".csv"), sep = ",", header = TRUE)
rownames(DataSet)<-DataSet[,1]
DataSet<-(DataSet[,-1])
cat(paste0("The ", FileName," contains:","\n","Genes: ",length(DataSet[,1]),"\n","cells: ",length(DataSet[1,]),"\n"))
sc<- DISCBIO(DataSet)
In the next step, all the values become infinite:
S1<-summary(colSums(DataSet,na.rm=TRUE)) # It gives an idea about the number of reads across cells
print(S1)
I am not sure where I am going wrong or how to fix this. In addition, is DIscBIO capable of handling 10x data, or is it only for FACS-sorted and SMART-seq data?
Could you please guide me and suggest how to fix this problem?
Thank you
Unit tests, both local and on GitHub Actions, are either failing (usually due to a pathfinding problem) or taking forever to finish (because the datasets are large and the tests need adaptations, e.g. smaller bootstrapping runs).
test-coverage.yaml
As a matter of fact, I'm not sure the Binder tests should be part of the package, since they use datasets that are external to it (i.e., they are only used on the notebook). Alternatively, use smaller versions of them.
The unit tests for the package have been simplified so the package could conform to CRAN policies. There is no need to completely remove them from the repository, as they only need to be excluded from the package itself. Having unit tests is very important in maintaining compatibility across versions.
Hence, I propose:
[20:21, 10.10.2020] Salim Ghannoum: I have just noticed something strange in DIscBIO, the Networking() is giving a network that is not the same as the hyperlink
[20:22, 10.10.2020] Salim Ghannoum: You can see that in the binder for MLS or part 2 or part 4
[20:25, 10.10.2020] Salim Ghannoum: I went through the code but could not understand perfectly the part you added, could you please check it when you have time?
Several functions detect whether k-means or model-based clustering was performed. The code to do so is duplicated in each function, even though the procedure is almost identical and should be aggregated into one internal function.
Here are some examples of code chunks containing such redundancy:
DIscBIO/R/DIscBIO-generic-pseudoTimeOrdering.R
Lines 28 to 45 in 0c90899
DIscBIO/R/DIscBIO-generic-plotSilhouette.R
Lines 25 to 44 in 0c90899
DIscBIO/R/DIscBIO-generic-plottSNE.R
Lines 15 to 28 in 0c90899
Also, the code makes the functions prefer k-means over MB, so if both were run, k-means is used. This should be documented. Ideally, the user should also be made aware of this, perhaps by being prompted to choose or by being warned of the choice made by the function.
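A hedged sketch of what the shared internal function could look like, keeping the current preference for k-means but warning when both results are present. The slot names are assumptions: `cpart` for k-means (as used elsewhere in these issues) and a hypothetical `MBclusters` for model-based results; the returned labels "K-means" and "MB" are likewise assumed.

```r
# Hypothetical shared helper: report which clustering result an object holds.
detect_clustering <- function(object, km_slot = "cpart", mb_slot = "MBclusters") {
  has_result <- function(name) {
    methods::.hasSlot(object, name) && length(methods::slot(object, name)) > 0
  }
  km <- has_result(km_slot)
  mb <- has_result(mb_slot)
  if (km && mb) {
    warning("Both k-means and model-based results found; using k-means.",
            call. = FALSE)
  }
  if (km) return("K-means")
  if (mb) return("MB")
  stop("No clustering results found; run a clustering function first.",
       call. = FALSE)
}
```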