Comments (6)

wallet-maker commented on August 31, 2024

Hi Cristian,

thanks for your feedback.

Regarding your first question:

These behaviors are expected.

Three aspects: a) multiple factors being named by the same gene set, b) cell-type-specific factors being named by gene sets from a different cell type, and c) a factor not getting a gene set name assigned.

a) By design, there is no 1:1 mapping between gene sets and factors. The factors are named by their overlap coefficient with all input gene sets, so cell-type-specific gene sets are also used for naming all of the factors. But don't worry, Spectra only uses them to fit the factors for the cell types you indicated. Maybe I will constrain the factor naming to the gene sets that were used for fitting the factors for the respective cell type. Personally, I felt it was helpful to perform the naming by computing the overlap of all gene sets vs. all factors, because when you set a gene set as cell-type-specific, Spectra may still discover a similar factor outside of that cell type in an unsupervised way, that is, without using the gene set. Let me know if that makes sense.

b) Because of this lack of 1:1 mapping, gene sets which are not very coherent (maybe they are a mix of several processes) can be split into several factors that map to the same gene set. You may want to look at the marker genes for each of these factors. I suspect they will use different genes from the gene set. We believe this behavior is desired, as the gene sets cannot be regarded as ground truth and may contain several ground truth processes, which should then be split into distinct factors by Spectra. I think this behavior will be reduced if you reduce lambda. However, this will also lead to stronger supervision by the gene sets, so you learn less from the data. If you have a lot more factors than gene sets, you can try reducing the number of factors first. Another approach could be to take a closer look at the marker genes and see whether there are some important subprocesses in your gene set and whether the gene set can be split and renamed or made more coherent. Of course, there are also other ways to name the factors, e.g. by running GSEA on the marker genes/gene scores/loadings.

c) If the overlap coefficient between a gene set and the factor's marker genes is lower than the threshold in est_spectra (default overlap_threshold = 0.2), the factor does not get a gene set assigned, so it is just called 'factor-index-X-celltype-X-factor-index', with the cell type being the cell type of that factor and the factor index just the index in the factors x gene loadings/scores matrix. The factor names should be unique though, and this is also what I see from the data you posted, correct?
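
To make points a) to c) concrete, here is a minimal sketch of naming factors by overlap coefficient. It is not Spectra's actual implementation: the helper names `overlap_coefficient` and `name_factors`, the toy genes, and the naming pattern are all made up for illustration, and it assumes the standard overlap coefficient |A ∩ B| / min(|A|, |B|). Only the 0.2 threshold comes from the discussion above.

```python
def overlap_coefficient(a, b):
    """Standard overlap coefficient: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def name_factors(factor_markers, gene_sets, overlap_threshold=0.2):
    """Hypothetical helper: name each factor after its best-overlapping gene set.

    factor_markers: dict mapping factor index -> list of marker genes
    gene_sets:      dict mapping gene set name -> list of genes
    """
    names = {}
    for factor_id, markers in factor_markers.items():
        best_name, best_score = None, 0.0
        # Every factor is compared against ALL gene sets (no 1:1 mapping).
        for set_name, genes in gene_sets.items():
            score = overlap_coefficient(markers, genes)
            if score > best_score:
                best_name, best_score = set_name, score
        if best_score >= overlap_threshold:
            names[factor_id] = f"{factor_id}-{best_name}"
        else:
            # Below the threshold: no gene set name is assigned.
            # The "-unnamed" suffix is an illustrative placeholder only.
            names[factor_id] = f"{factor_id}-unnamed"
    return names


# Toy data, made up for illustration.
gene_sets = {
    "broad_set": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "small_set": ["G7", "G8"],
}
factor_markers = {
    0: ["G1", "G2", "G3", "X1"],  # uses one half of broad_set
    1: ["G4", "G5", "G6", "X2"],  # uses the other half of broad_set
    2: ["X3", "X4", "X5", "X6"],  # overlaps with no gene set
}

names = name_factors(factor_markers, gene_sets)
print(names)
```

In this toy case, factors 0 and 1 are both named after `broad_set` even though they share no marker genes, and factor 2 stays unnamed. The names remain unique only because the factor index is part of each name.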

Regarding eta:
You can find eta in the model returned by est_spectra by calling:
model.return_eta_diag()

Regarding the cell scores:
Yes, this is expected behavior. The absolute value of the cell score is not easy to interpret. You are correct that in the preprint we have generally higher cell scores, which is probably just because we used a lower number of factors. We have added information and importance scores in the revised manuscript. These quantify the contribution of a factor to explaining the observed expression data and cell type variation, respectively. You can find these in spectra_util. I will try to add these to the tutorial soon. Also, if you have suggestions for better naming conventions, we are happy to consider them.

Regarding the gene set dictionary:
Yes. If you run Spectra without a gene set dictionary, it will only be supervised by cell type. If you run it without a gene set dictionary and without cell types, it will be completely unsupervised.

Regarding the GPU version:
This is interesting. I do not have a very confident answer and would refer this to @russellkune.

Let me know if that helps.

Thanks,
Thomas

from spectra.

kvshams commented on August 31, 2024

@ccruizm I have also seen something similar. I got drastically different lam saturation on the same data set, the same machine, and the same seed (#23) across multiple runs. The GPU implementation is much more inconsistent than the CPU one.


ccruizm commented on August 31, 2024

Hello Thomas,

I appreciate your thorough explanation. Now things are much clearer. I will need to invest some time to digest all the output I got from Spectra, taking into account all the remarks you made :)

I will also test different lambda values and try to refine my gene sets so there are no 'promiscuous' gene sets that cover many cell processes.

Regarding your comment on factors not assigned to a specific gene set, you are right that all entries are unique and named with no duplicates. However, I am curious why the final number of factors is 'limited/stuck' to the number of gene sets. I would assume that if there are factors that do not fit that well or are less 'relevant', then the number of final factors should change. I now have the feeling that from my 465 gene sets, Spectra will try to fit 465 factors even if some of them are not specific for a particular cell type (such as the example with jessa22_M2, which appears at least 8 times and abrogates other gene sets). What happened to the gene sets that this particular gene set (jessa22_M2) has 'replaced'? Does it mean that gene set 17, which in my dictionary belongs to 'all_selenoamino-acid metabolism', has an overlap with jessa22_M2 but is better defined by the latter? (Although I have noticed that factors try to keep the same order as in the dictionary, sometimes they are shuffled and there is no direct 1:1 replacement with less fitting factors.)

About the cell scores, I had the feeling they are more 'arbitrary' units and not necessarily comparable across factors. Looking forward to reading more about it once the revised version is publicly available (or in an updated tutorial 🙂).

Lastly, if I run Spectra in an unsupervised way, how does it determine the number of factors? Do I need to set an expected number of factors to be recovered? Or will Spectra determine the best and most stable value, where no more overlapping factors and only the more relevant ones are reached?

Hope @russellkune can tell us a bit more about the GPU implementation of Spectra.

Thanks in advance for the time and detailed answers.


ccruizm commented on August 31, 2024

Hi there,

I was trying to run Spectra fully unsupervised (no dictionary and no cell types), but I am unsure how to set up the model besides setting use_cell_types=False. Since gene_set_dictionary must be provided, should I create an empty dictionary with global as its only key? Which of the other options should be modified when running in an unsupervised way (e.g., lam, delta, etc.)?

Thanks!


wallet-maker commented on August 31, 2024

Hi Cristian,
We have never used the method this way. I think setting gene_set_dictionary = {'global':{}} sounds good. The choice of lambda should not matter because it controls the weight of the data vs gene sets. I think you can use default settings for the parameters. There might be bugs running the method without gene sets and cell types. It was not really designed for that.
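
A rough sketch of the setup discussed here, untested for the reasons given above. The `import Spectra` path and the `adata=` keyword are assumptions that may differ between package versions, and `run_unsupervised` is a hypothetical wrapper. Only `gene_set_dictionary`, `use_cell_types`, and `est_spectra` are taken from this thread.

```python
# Settings discussed in this thread for a fully unsupervised run.
gene_set_dictionary = {"global": {}}  # no gene sets, only the 'global' key

unsupervised_kwargs = {
    "gene_set_dictionary": gene_set_dictionary,
    "use_cell_types": False,  # no cell type supervision
    # All other parameters (lam, delta, ...) are left at their defaults.
}


def run_unsupervised(adata):
    """Hypothetical wrapper: fit Spectra without gene sets or cell types."""
    # Imported lazily so this snippet loads without Spectra installed.
    # The import path is an assumption; check your installed version.
    import Spectra

    return Spectra.est_spectra(adata=adata, **unsupervised_kwargs)
```

Calling `run_unsupervised(adata)` on an AnnData object would then attempt the fit. Whether any further arguments are required in this mode is not covered here.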
best,
Thomas


wallet-maker commented on August 31, 2024

Perhaps to add to that: if you use Spectra without cell types and without gene sets, it should essentially behave like NMF, so it is probably better to run NMF for this use case. Mind that in the paper we show that NMF actually performs very poorly, so I would maybe rather use something like scHPF if you want to go fully unsupervised.
