The multi-omics-cancer-benchmark from shamir-lab

MultiCCA clustering and tumor samples

Hello,

I'm trying to run some parts of your benchmark and I have some questions about your code and some of the choices you made.

First, I have a question about the MultiCCA run:

cca.ret = PMA::MultiCCA(omics.transposed, ncomponents = MAX.NUM.CLUSTERS)
sample.rep = omics.transposed[[1]] %*% cca.ret$ws[[1]]

It seems that only the first omic dataset is used to generate sample.rep, by reducing it with the canonical variates found for that dataset. sample.rep is then used for the clustering. Why did you choose the first omic? Could another dataset be used instead? For example:

sample.rep = omics.transposed[[2]] %*% cca.ret$ws[[2]]

What would the consequences be for the results?
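
To make the question concrete, here is a minimal sketch of the comparison I have in mind. The matrices and weights are random toy stand-ins (toy.omics and toy.ws are hypothetical names, not output from PMA::MultiCCA); only their shapes mimic omics.transposed and cca.ret$ws.

```r
set.seed(1)
n.samples = 20
n.components = 3

# Toy stand-ins: one samples x features matrix per omic (hypothetical data)
toy.omics = list(matrix(rnorm(n.samples * 50), nrow = n.samples),
                 matrix(rnorm(n.samples * 30), nrow = n.samples))

# Toy stand-ins for cca.ret$ws: one features x components matrix per omic
toy.ws = lapply(toy.omics, function(omic) {
  matrix(rnorm(ncol(omic) * n.components), ncol = n.components)
})

# One low-dimensional sample representation per omic
sample.reps = lapply(seq_along(toy.omics), function(i) {
  toy.omics[[i]] %*% toy.ws[[i]]
})

# Cluster each representation separately, then cross-tabulate the labels
clusterings = lapply(sample.reps, function(m) {
  kmeans(m, 2, iter.max = 100, nstart = 30)$cluster
})
agreement = table(omic1 = clusterings[[1]], omic2 = clusterings[[2]])
print(agreement)
```

On real data, a cross-tabulation like this would show how much the clustering depends on which omic is projected.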

Second, in the same MultiCCA run, the silhouette values of the clusterings are computed to choose a coherent clustering:

sils = c()
clustering.per.num.clusters = list()
for (num.clusters in 2:MAX.NUM.CLUSTERS) {
  cur.clustering = kmeans(sample.rep, num.clusters, iter.max=100, nstart=30)$cluster
  sil = get.clustering.silhouette(list(t(sample.rep)), cur.clustering)
  sils = c(sils, sil)
  clustering.per.num.clusters[[num.clusters - 1]] = cur.clustering
}
cca.clustering = clustering.per.num.clusters[[which.min(sils)]]

I don't understand the last line of this code: why did you choose the minimum average silhouette width? I thought that the higher the silhouette value, the better the clustering. Shouldn't it be which.max(sils) instead?
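
To illustrate what I mean, here is a small sketch on toy data (not your benchmark data). It assumes the average silhouette width from cluster::silhouette as the score, which may differ from what get.clustering.silhouette computes; the toy.* and best.by.* names are mine.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Toy data: three well-separated 2-D groups of 20 points each (hypothetical)
toy = rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
            matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2),
            matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2))
toy.dist = dist(toy)

candidate.k = 2:6
# Average silhouette width of a k-means clustering for each candidate k
toy.sils = sapply(candidate.k, function(k) {
  cl = kmeans(toy, k, iter.max = 100, nstart = 30)$cluster
  mean(silhouette(cl, toy.dist)[, "sil_width"])
})

best.by.max = candidate.k[which.max(toy.sils)]  # k with the highest average width
best.by.min = candidate.k[which.min(toy.sils)]  # k with the lowest average width
print(rbind(k = candidate.k, avg.sil = round(toy.sils, 3)))
```

On this toy example which.max recovers the three groups while which.min picks a different number of clusters, which is why the last line surprised me.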

Finally, my last question is about the choice of removing some sample types from the datasets:

filter.non.tumor.samples <- function(raw.datum, only.primary=only.primary) {
  # 01 is primary, 06 is metastatic, 03 is blood derived cancer
  if (!only.primary)
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01', '03', '06')])
  else
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01')])
}

Why did you choose to keep only primary tumors for some cancers and discard other sample types, such as metastatic or recurrent tumors? Would it be reasonable to discard only the "normal" samples and keep the sample type information (by not running the fix.patient.names function), so that the clusters can also reflect this information?
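
As a sketch of the alternative I am asking about, the function below drops only non-tumor samples. It relies on the TCGA barcode convention that sample type codes 01 to 09 are tumor types, 10 to 19 are normal types and 20 to 29 are controls. The function name and the toy barcodes are hypothetical and not part of your code.

```r
# Hypothetical column names in TCGA barcode format (sample type at characters 14-15)
toy.barcodes = c("TCGA-AA-0001-01A",  # primary solid tumor
                 "TCGA-AA-0002-02A",  # recurrent solid tumor
                 "TCGA-AA-0003-06A",  # metastatic
                 "TCGA-AA-0004-11A")  # solid tissue normal
toy.datum = matrix(1:8, nrow = 2,
                   dimnames = list(c("gene1", "gene2"), toy.barcodes))

# Keep every tumor sample type (codes 01-09); drop normals and controls
filter.normal.samples.only <- function(raw.datum) {
  sample.type = as.integer(substring(colnames(raw.datum), 14, 15))
  raw.datum[, sample.type < 10, drop = FALSE]
}

kept = filter.normal.samples.only(toy.datum)
print(table(substring(colnames(kept), 14, 15)))
```

Leaving the column names untouched afterwards would keep the sample type available for interpreting the clusters.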

I hope my questions are clear.
Thank you in advance!
Galadriel

Use of Multi-NMF binary file

Hello,
Thank you for this well-documented project and code.
I am interested in how you performed Multi-NMF on your multi-omics data. Would it be possible to get access to the binary file?
Thank you very much in advance for your answer,
Best,

Request for a license

We would like to use your benchmark to test our tool; however, the repository currently has no license. Could you please add one?
