multi-omics-cancer-benchmark's People

Contributors

nimrodrap

multi-omics-cancer-benchmark's Issues

MultiCCA clustering and tumor samples

Hello,

I'm trying to run some parts of your benchmark and I have some questions about your code and some of the choices you made.

First, I have a question about the MultiCCA run:

cca.ret = PMA::MultiCCA(omics.transposed, ncomponents = MAX.NUM.CLUSTERS)
sample.rep = omics.transposed[[1]] %*% cca.ret$ws[[1]]

It seems that only the first omic dataset is used to generate sample.rep: it is projected onto the canonical weight vectors found for that dataset (cca.ret$ws[[1]]). sample.rep is then used for the clustering. Why did you choose the first omic? Could another dataset be used instead? For example:

sample.rep = omics.transposed[[2]] %*% cca.ret$ws[[2]]

What would the consequences be for the results?
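
One way to explore this question empirically is to cluster each omic's projection separately and measure how much the two clusterings agree. The sketch below is only an illustration under stated assumptions: it uses synthetic data, and base R's stats::cancor (two views only) as a stand-in for PMA::MultiCCA, so the names omic1, omic2 and rand.index are hypothetical and not part of the benchmark.

```r
# Synthetic two-omic example; stats::cancor is a stand-in for PMA::MultiCCA.
set.seed(1)
n <- 60
group <- rep(1:2, each = n / 2)                    # hidden two-group structure
omic1 <- matrix(rnorm(n * 5), n, 5) + group * 2    # samples x features
omic2 <- matrix(rnorm(n * 4), n, 4) + group * 2

cc <- cancor(omic1, omic2)
k <- 2  # number of canonical components kept

# Project each omic onto its own canonical weights (centered as cancor does).
rep1 <- scale(omic1, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1:k]
rep2 <- scale(omic2, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1:k]

cl1 <- kmeans(rep1, 2, iter.max = 100, nstart = 30)$cluster
cl2 <- kmeans(rep2, 2, iter.max = 100, nstart = 30)$cluster

# Rand index: fraction of sample pairs on which both clusterings agree
# (pair placed together in both, or apart in both).
rand.index <- function(a, b) {
  agree <- outer(a, a, "==") == outer(b, b, "==")
  mean(agree[upper.tri(agree)])
}

ri <- rand.index(cl1, cl2)
print(table(cl1, cl2))
print(ri)
```

A Rand index close to 1 would suggest that the choice of omic matters little for that dataset; a low value would suggest the clusters depend strongly on which projection is used.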

Second, in the same MultiCCA run, the silhouette values of the clusters are computed to choose coherent clusters:

sils = c()
clustering.per.num.clusters = list()
for (num.clusters in 2:MAX.NUM.CLUSTERS) {
  cur.clustering = kmeans(sample.rep, num.clusters, iter.max=100, nstart=30)$cluster
  sil = get.clustering.silhouette(list(t(sample.rep)), cur.clustering)
  sils = c(sils, sil)
  clustering.per.num.clusters[[num.clusters - 1]] = cur.clustering
}
cca.clustering = clustering.per.num.clusters[[which.min(sils)]]

I don't understand the last line of this code: why did you choose the minimum average silhouette width? I thought that the higher the silhouette value, the better the clustering. Shouldn't it be which.max(sils) instead?
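
For reference, the difference between the two selections can be checked on toy data. The sketch below assumes the standard definition of the average silhouette width, where higher is better; the definition of get.clustering.silhouette is not shown here, so it may return something else. avg.silhouette and the synthetic matrix x are hypothetical names used only for illustration.

```r
set.seed(42)
# Synthetic data: three well-separated groups of 30 samples in 2 dimensions.
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(30, centers[g, 1], 0.5), rnorm(30, centers[g, 2], 0.5))
}))

# Average silhouette width, computed by hand from Euclidean distances.
avg.silhouette <- function(x, cl) {
  d <- as.matrix(dist(x))
  s <- sapply(seq_len(nrow(x)), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)           # singleton clusters score 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)   # mean distance within own cluster
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(d[i, cl == k])))  # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(s)
}

max.k <- 5
sils <- sapply(2:max.k, function(k) {
  avg.silhouette(x, kmeans(x, k, iter.max = 100, nstart = 30)$cluster)
})
names(sils) <- 2:max.k   # position i corresponds to k = i + 1

best.k <- as.integer(names(sils)[which.max(sils)])
worst.k <- as.integer(names(sils)[which.min(sils)])
print(round(sils, 3))
```

On data like this, which.max recovers the number of planted groups while which.min picks one of the poorer partitions, which is why the sign convention of the score matters.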

Finally, my last question is about the choice of removing some sample types from the datasets:

filter.non.tumor.samples <- function(raw.datum, only.primary=only.primary) {
  # 01 is primary, 06 is metastatic, 03 is blood derived cancer
  if (!only.primary)
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01', '03', '06')])
  else
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01')])
}

Why did you choose to select only primary tumors for some cancers and discard other sample types, such as metastatic or recurrent tumors? Would it be reasonable to discard only the "normal" samples and keep the sample-type information in the sample names (that is, not run the fix.patient.names function), so that the clustering can also take this information into account?
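
To make the trade-off concrete, here is a small sketch of filtering by the sample-type code at characters 14-15 of a TCGA barcode (01 primary solid tumor, 02 recurrent solid tumor, 03 primary blood derived cancer, 06 metastatic, 11 solid tissue normal). The barcodes are made up, keep.sample.types and patient.id are hypothetical helpers, and truncating names to the 12-character patient prefix is only an assumption about what a function like fix.patient.names might do.

```r
# Hypothetical barcodes for illustration only; not real TCGA samples.
barcodes <- c("TCGA-AA-0001-01A", "TCGA-AA-0001-06A",
              "TCGA-AA-0002-01A", "TCGA-AA-0002-11A",
              "TCGA-AA-0003-02A", "TCGA-AA-0004-03A")
raw.datum <- matrix(rnorm(3 * length(barcodes)), nrow = 3,
                    dimnames = list(paste0("gene", 1:3), barcodes))

# Characters 14-15 of the barcode hold the sample-type code.
sample.type <- function(x) substring(colnames(x), 14, 15)

# Keep only the requested sample types; drop = FALSE keeps a matrix
# even when a single column survives the filter.
keep.sample.types <- function(x, types) {
  x[, sample.type(x) %in% types, drop = FALSE]
}

# Assumed patient-level identifier: the first 12 characters of the barcode.
patient.id <- function(x) substring(colnames(x), 1, 12)

print(table(sample.type(raw.datum)))
primary.only <- keep.sample.types(raw.datum, "01")
all.tumor <- keep.sample.types(raw.datum, c("01", "02", "03", "06"))

# Keeping several tumor types can give the same patient more than one column.
print(anyDuplicated(patient.id(primary.only)) > 0)
print(anyDuplicated(patient.id(all.tumor)) > 0)
```

In this toy case, keeping metastatic samples alongside primary ones means one patient appears twice, so names truncated to the patient level would no longer be unique.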

I hope my questions are clear.
Thank you in advance!
Galadriel

Use of Multi-NMF binary file

Hello,
Thank you for this well-documented project and code.
I am interested in the way you ran Multi-NMF on your multi-omics data. Would it be possible to get access to the binary file?
Thank you very much in advance for your answer,
Best,

Request for a license

We would like to use your benchmark to test our tool, but the repository currently has no license. Could you please add one?
