
Componentization · patroon · 6 comments · closed

CorinaMeyer avatar CorinaMeyer commented on August 12, 2024
Componentization

from patroon.

Comments (6)

rickhelmus avatar rickhelmus commented on August 12, 2024

Hello Corina,

For CliqueMS: I submitted a few fixes some time ago which could be related to this. Unfortunately, they haven't been integrated by the author so far. The handbook suggests installing the version with my patches, so maybe you could (re-)install that version just to be safe:

remotes::install_github("rickhelmus/cliqueMS")

For the rest, it sounds like the componentization algorithms have a lot of data to chew on. Maybe they simply haven't finished yet, even if things appear to be stuck? I am curious how many feature groups you are dealing with. Perhaps you could apply stricter filters to reduce that number. At the very least this would confirm whether componentization is working at all, for instance by increasing the intensity threshold during the feature group filter step.
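A minimal sketch of the suggestion above; the threshold value and the choice of the CAMERA algorithm are only illustrations and should be adapted to your own workflow:

```r
library(patRoon)

# Keep only feature groups whose intensity exceeds the threshold;
# raising absMinIntensity shrinks the dataset so componentization
# finishes faster (the value here is just an example).
fGroups <- filter(fGroups, absMinIntensity = 1E5)

# Re-run componentization on the reduced set, e.g. with CAMERA
components <- generateComponents(fGroups, "camera", ionization = "positive")
```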

Thanks,
Rick


CorinaMeyer avatar CorinaMeyer commented on August 12, 2024


rickhelmus avatar rickhelmus commented on August 12, 2024

Hi Corina,

About 3000 feature groups after the filter() step is a decent amount, but nothing too extreme, I would say. How many analyses are you processing? I just ran the componentization with the unfiltered demo data (~1500 feature groups, ~8000 features) and it took my system about two minutes. So it's not strange that with your feature numbers you would need to wait a bit :-)

Since you mentioned you applied a very high intensity threshold now, does that mean you previously had far more feature groups? Having much more than a couple of thousand feature groups after filtering may lead to very long processing times in some steps.

For the cache: yes, it's normal that it grows quickly, but of course only within reasonable bounds. What kind of file sizes are we talking about with your workflow? With the demo data test run my cache file was ~300 MB, although I have seen it grow to a few GB with more realistic projects.
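If the cache file does become unwieldy, patRoon can clear it from within R; a small sketch (passing a specific category instead of "all" would only clear the cached results of that step):

```r
library(patRoon)

# Remove all cached results; the cache database is rebuilt on the next run.
clearCache("all")
```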

Thanks,
Rick


CorinaMeyer avatar CorinaMeyer commented on August 12, 2024


rickhelmus avatar rickhelmus commented on August 12, 2024

Hi Corina,

Thank you for the additional details.

You are right that so many hours of processing and a cache file of hundreds of GB is very unreasonable! The 50,000 features you mentioned: were they grouped features (feature groups) or raw features (e.g. what findFeatures() returns)? And if they were feature groups, was this after the filter() step? If the answer is 'yes' to both, I understand what is happening ;-) These amounts would simply be way too much to work with (I guess it would've been nice if patRoon had warned you about this). Are you by any chance working with Orbitrap data? I have noticed that with Orbitraps the intensity scale is quite different and some more careful optimization is needed. You could have a look at the suspect screening workflow in the patRoon paper for some inspiration.

The situation with GenForm is a bit tricky. You are right that the progress bar is not linear. The reason is that the calculations generally go from low to high m/z, and GenForm tends to produce a lot of candidates for higher feature masses, especially with additional elements specified. There are a few things you can try to remedy this:

  1. Reduce the maximum timeout: by default there is a timeout of two minutes for each calculation. You can lower this by passing the timeout option (in seconds) to generateFormulas().
  2. Set maxCandidates: similar to the timeout option, but here calculations are stopped when GenForm reaches a maximum number of candidates. The difference with the timeout option is that it still proceeds with the candidates that were generated before the threshold was reached.
  3. Set calculateFeatures to FALSE. This way calculations are done at the feature group level instead of per feature. This may be slightly less accurate, but in my experience it still works very well, and I'm considering making this the default as it often makes more sense.
  4. Reduce the number of possible elements. Obviously, your study needs to allow this... but every element you leave out can be a huge gain.
  5. Limit the feature masses you look at. For instance, maybe you are only interested in masses up to m/z 600? In that case you could first run a filter step on the feature groups, e.g. filter(fGroups, mzRange = c(0, 600)).
  6. Try SIRIUS instead of GenForm. I think it might be more robust in these kinds of scenarios.

Hope this helps!


rickhelmus avatar rickhelmus commented on August 12, 2024

Closed due to inactivity, feel free to re-open!

