shamir-lab / multi-omics-cancer-benchmark
License: GNU General Public License v3.0
Hello,
I'm trying to run some parts of your benchmark and I have some questions about your code and some of the choices you made.
First, I have a question about the MultiCCA run:
cca.ret = PMA::MultiCCA(omics.transposed, ncomponents = MAX.NUM.CLUSTERS)
sample.rep = omics.transposed[[1]] %*% cca.ret$ws[[1]]
It seems that only the first omic dataset is used to generate sample.rep: it is projected onto the canonical variates found for that dataset, and sample.rep is then used for the clustering. Why did you choose the first omic? Could we use another dataset instead? For example:
sample.rep = omics.transposed[[2]] %*% cca.ret$ws[[2]]
What would the consequences be for the results?
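To make the comparison concrete, here is a minimal sketch of projecting every omic onto its own weights instead of only the first one. It does not call PMA: the random matrices in ws are hypothetical stand-ins that only assume the shape of cca.ret$ws (one features x components matrix per omic), and combining the projections with cbind is just one possible option, not what the benchmark does.

```r
set.seed(1)
n.samples <- 12
n.components <- 3

# Toy stand-in for omics.transposed: samples in rows, features in columns.
omics.transposed <- list(
  matrix(rnorm(n.samples * 50), nrow = n.samples),
  matrix(rnorm(n.samples * 30), nrow = n.samples)
)

# Hypothetical stand-in for cca.ret$ws (assumed: features x components per omic).
ws <- lapply(omics.transposed, function(x) {
  matrix(rnorm(ncol(x) * n.components), ncol = n.components)
})

# One sample representation per omic: reps[[1]] mirrors the first-omic choice,
# reps[[2]] mirrors the alternative asked about above.
reps <- Map(function(x, w) x %*% w, omics.transposed, ws)

# One possible way to use all omics at once: concatenate the projections.
combined.rep <- do.call(cbind, reps)
```

Clustering reps[[1]], reps[[2]] and combined.rep separately and comparing the resulting partitions would show how sensitive the result is to this choice.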
Second, in the same MultiCCA run, the silhouette values of the clusterings are computed to choose a coherent number of clusters:
sils = c()
clustering.per.num.clusters = list()
for (num.clusters in 2:MAX.NUM.CLUSTERS) {
  cur.clustering = kmeans(sample.rep, num.clusters, iter.max=100, nstart=30)$cluster
  sil = get.clustering.silhouette(list(t(sample.rep)), cur.clustering)
  sils = c(sils, sil)
  clustering.per.num.clusters[[num.clusters - 1]] = cur.clustering
}
cca.clustering = clustering.per.num.clusters[[which.min(sils)]]
I don't understand the last line of this code: why did you choose the minimum average silhouette width? I thought that the higher the silhouette value, the better the clustering. Shouldn't it be which.max(sils) instead?
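As a point of comparison, here is a small sketch on toy data with three well-separated groups, using cluster::silhouette from the cluster package in place of the repository's get.clustering.silhouette helper. It assumes that helper returns the plain average silhouette width, which is not shown in this thread; the toy data and the name best.k are illustrative only.

```r
library(cluster)  # recommended package shipped with standard R distributions

set.seed(1)
# Toy sample representation: three compact groups of 20 points each.
centers <- rbind(c(0, 0), c(10, 10), c(20, 0))
toy.rep <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(20, centers[i, 1], 0.5), rnorm(20, centers[i, 2], 0.5))
}))

MAX.NUM.CLUSTERS <- 6
toy.dist <- dist(toy.rep)
sils <- c()
clustering.per.num.clusters <- list()
for (num.clusters in 2:MAX.NUM.CLUSTERS) {
  cur.clustering <- kmeans(toy.rep, num.clusters, iter.max = 100, nstart = 30)$cluster
  # Average silhouette width over all samples for this clustering.
  sil <- mean(silhouette(cur.clustering, toy.dist)[, "sil_width"])
  sils <- c(sils, sil)
  clustering.per.num.clusters[[num.clusters - 1]] <- cur.clustering
}

# Index i corresponds to i + 1 clusters, because the loop starts at 2.
best.k <- which.max(sils) + 1
best.clustering <- clustering.per.num.clusters[[which.max(sils)]]
```

Under that assumption, which.max is the selection that favours the most coherent clustering, while which.min would pick the least coherent one.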
Finally, my last question is about the choice to remove some sample types from the datasets:
filter.non.tumor.samples <- function(raw.datum, only.primary=only.primary) {
  # 01 is primary, 06 is metastatic, 03 is blood derived cancer
  if (!only.primary)
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01', '03', '06')])
  else
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01')])
}
Why did you choose to keep only primary tumors for some cancers and discard other sample types, such as metastatic or recurrent tumors? Would it be coherent to discard only the "normal" samples and keep the sample-type information (that is, not run the fix.patient.names function), so that the clusters also take this information into account?
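A minimal sketch of what such a variant could look like is given below. The function filter.by.sample.type and the toy barcodes are hypothetical and are not part of the repository. It relies on the TCGA barcode convention that characters 14 and 15 hold the sample-type code (01 primary solid tumor, 02 recurrent solid tumor, 03 primary blood derived cancer, 06 metastatic, 10 and 11 normal).

```r
# Hypothetical variant: the set of sample-type codes to keep is a parameter.
filter.by.sample.type <- function(raw.datum, keep.codes = c("01", "03", "06")) {
  type.codes <- substring(colnames(raw.datum), 14, 15)
  # drop = FALSE keeps a matrix even when a single column survives.
  raw.datum[, type.codes %in% keep.codes, drop = FALSE]
}

# Toy matrix with made-up barcodes: primary, normal, metastatic, recurrent.
toy <- matrix(1:8, nrow = 2, dimnames = list(
  c("geneA", "geneB"),
  c("TCGA-AA-0001-01A", "TCGA-AA-0002-11A", "TCGA-AA-0003-06A", "TCGA-AA-0004-02A")
))

tumor.only <- filter.by.sample.type(toy)
with.recurrent <- filter.by.sample.type(toy, c("01", "02", "03", "06"))

# Keep the sample type as an annotation instead of stripping it from the names.
sample.types <- substring(colnames(with.recurrent), 14, 15)
```

Keeping sample.types alongside the data would make it possible to check afterwards whether the clusters separate by sample type rather than by tumor subtype.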
I hope my questions are clear,
Thank you in advance!
Galadriel
Hello,
Thank you for this well-documented project and code.
I am interested in the way you performed Multi-NMF on your multi-omics data. Could I have access to the binary file?
Thank you very much in advance for your answer,
Best,
We would like to use your benchmark to test our tool, but there is currently no license available. Could you please include one?