Comments (6)
Hello Corina,
For CliqueMS: I submitted a few fixes some time ago, which could be related to this. Unfortunately, they haven't been integrated by the author so far. The handbook suggests installing the version with my patches, so maybe you could (re-)install this version just to be safe?
remotes::install_github("rickhelmus/cliqueMS")
For the rest, it sounds like the componentization algorithms have a lot of data to chew on. Maybe they simply didn't finish yet, even if things appear to be stuck? How many feature groups are you dealing with? Perhaps you could try stricter filters to reduce the number. You could at least do this to confirm whether componentization is working at all, for instance by increasing the intensity threshold during the feature group filter step.
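As a rough sketch of the stricter filter step mentioned above: the helper name filterAndCount and the threshold value are only illustrative assumptions, and fGroups is assumed to be the feature groups object from your own workflow. The absMinIntensity argument is the absolute intensity threshold of the feature group filter().

```r
# Hypothetical helper (not part of patRoon): apply a stricter intensity
# filter and report how many feature groups remain.
# 'fGroups' is assumed to be a patRoon featureGroups object from your workflow.
filterAndCount <- function(fGroups, minIntensity = 1E5) {
    message("Feature groups before filtering: ", length(fGroups))
    # absMinIntensity: minimum absolute intensity; the default used here is an
    # assumed example value, tune it to your instrument and data
    fGroups <- patRoon::filter(fGroups, absMinIntensity = minIntensity)
    message("Feature groups after filtering: ", length(fGroups))
    fGroups
}
```

Running componentization on the returned, smaller object should quickly tell whether it works at all.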
Thanks,
Rick
from patroon.
Hi Corina,
About 3000 feature groups after the filter() step is a decent amount, but nothing too extreme I would say. How many analyses are you processing? I just tried to run the componentization with the unfiltered demo data (~1500 feature groups, ~8000 features) and it took my system about 2 minutes. So I guess it's not strange that with your feature numbers you would need to wait a bit :-)
Since you mentioned you applied a very high intensity threshold now, does that mean you previously also had way more feature groups? I would say that having much more than a couple of thousand feature groups after filtering may lead to very long processing times in some steps.
For the cache: yes, it's normal that this can grow quickly, but of course within reasonable amounts. What kind of file sizes are we talking about with your workflow? With the demo data test run my cache file was ~300 MB, although I have seen it grow to a few GB with more realistic projects.
Thanks,
Rick
Hi Corina,
Thank you for the additional details.
You are right that so many hours of processing and a cache file of hundreds of GB is very unreasonable! The 50,000 features you mentioned, were they the grouped features (feature groups) or raw features (e.g. what findFeatures() returns)? And if they were feature groups, was this after the filter() step? If the answer is 'yes' to both, I understand what is happening ;-) These amounts would simply be way too much to work with (I guess it would've been nice if patRoon warned you about this). Are you by any chance working with Orbitrap data? I noticed that with Orbitraps the intensity scale is quite different and some more careful optimization is needed. You can have a look at the suspect screening workflow in the patRoon paper to get some inspiration.
The situation with GenForm is a bit tricky. You are right that the progress bar is not linear. The reason for this is that the calculations generally go from low to high m/z. Unfortunately, GenForm tends to produce a lot of candidates for higher feature masses, especially with additional elements specified. There are a few things that you can try to remedy this:
- Reduce the maximum timeout: by default there is a timeout of 2 minutes for each calculation. You can lower this by setting the timeout option to generateFormulas() (in seconds).
- Setting maxCandidates: this is similar to the timeout option: in this case calculations will be stopped when GenForm reaches a maximum number of candidates. The difference with the timeout option is that it will still proceed with the candidates that were generated before the threshold was reached.
- Setting calculateFeatures to FALSE. This way calculations are done on a feature group level, instead of per feature. This might be a bit less accurate, but in my experience it still works very well and I'm considering making this the default, as this often makes more sense.
- Reduce the number of possible elements. Obviously, your study needs to allow this... But every element you leave out can be a huge gain.
- Limit the feature masses you look at. For instance, maybe you are only interested up to 600 m/z? In that case you could first run a filter step on the feature groups, e.g. filter(fGroups, mzRange = c(0, 600))
- Try SIRIUS instead of GenForm. I think it might be more robust for these kinds of scenarios.
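Several of the options above can be combined in one call. The following is only a sketch: the wrapper name leanGenForm, the element set and the numeric values are assumptions for illustration, and fGroups and mslists are assumed to be the feature groups and MS peak lists from your own workflow. The option names are used as given in the list above, so please check them against the documentation of your installed patRoon version.

```r
# Hypothetical wrapper (not part of patRoon) combining the suggestions above.
# 'fGroups' and 'mslists' are assumed to come from earlier workflow steps.
leanGenForm <- function(fGroups, mslists) {
    # Only keep feature groups up to m/z 600 (range from the example above)
    fGroups <- patRoon::filter(fGroups, mzRange = c(0, 600))

    patRoon::generateFormulas(
        fGroups,
        MSPeakLists = mslists,
        algorithm = "genform",
        elements = "CHNO",          # assumed element set; use what your study allows
        calculateFeatures = FALSE,  # calculate per feature group, not per feature
        timeout = 30,               # seconds per calculation (assumed value, default is 2 minutes)
        maxCandidates = 1000        # assumed value; stop once this many candidates are generated
    )
}
```

Starting with strict values and relaxing them step by step makes it easier to see which option gives the biggest gain for your data.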
Hope this helps!
Closed due to inactivity, feel free to re-open!