Comments (5)

drewszabo commented on August 12, 2024

Just on the point about using the isolatePrec rule: it's hard to evaluate the exact MS1 isotopic abundance if the entire spectrum is included. Here is an example of the MS1 spectrum of simazine. I should be able to see the M+2 isotope fingerprint of Cl, but it's difficult to distinguish in this spectrum.

[Image: MS1 spectrum of simazine (spec-MS-c27cc6c2f8ac1b4d)]

You see the same thing if you manually add the .ms files to the SIRIUS GUI. I wonder if so much noise makes it more difficult for the DNN algorithm that SIRIUS uses to properly evaluate the correct formula?
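To make the idea concrete, here is a minimal sketch of what isolating the precursor isotope cluster from a full MS1 peak list could look like. It is not patRoon's or SIRIUS's implementation: the function `isolate_precursor`, its parameters, the tolerance, and the peak list are all hypothetical, and the simazine [M+H]+ m/z is an approximate value used only for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Natural abundances of 35Cl and 37Cl (approximate), giving the expected
# M+2 / M intensity ratio for a compound with one chlorine atom.
CL_M2_RATIO = 0.2424 / 0.7576  # roughly 0.32


def isolate_precursor(peaks: List[Peak], precursor_mz: float,
                      max_isotopes: int = 3, mz_tol: float = 0.01) -> List[Peak]:
    """Keep only peaks close to precursor_mz + n (n = 0..max_isotopes).

    Hypothetical helper: a peak is kept when its offset from the precursor
    is within mz_tol of a whole number of nominal mass units.
    """
    kept = []
    for mz, intensity in peaks:
        offset = mz - precursor_mz
        n = round(offset)
        if 0 <= n <= max_isotopes and abs(offset - n) <= mz_tol:
            kept.append((mz, intensity))
    return kept


# Made-up MS1 spectrum: the simazine cluster plus unrelated, more intense peaks.
precursor = 202.0854  # approximate [M+H]+ of simazine (assumption)
spectrum = [
    (150.1270, 8.0e5),  # unrelated ion
    (202.0854, 1.0e5),  # M
    (202.6000, 5.0e4),  # unrelated ion inside the cluster range
    (203.0884, 9.0e3),  # M+1
    (204.0825, 3.2e4),  # M+2 (37Cl)
    (310.2000, 6.0e5),  # unrelated ion
]

isolated = isolate_precursor(spectrum, precursor)
m2_ratio = isolated[2][1] / isolated[0][1]
print(f"kept {len(isolated)} of {len(spectrum)} peaks")
print(f"observed M+2/M = {m2_ratio:.2f}, expected for one Cl = {CL_M2_RATIO:.2f}")
```

With the unrelated peaks gone, the M+2/M ratio can be compared directly against the value expected for one chlorine atom.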

from patroon.

rickhelmus commented on August 12, 2024

Hi Drew,

Many thanks for the bug report and other feedback! :-)

The bug was actually related to reAverage=TRUE. Setting it removed the peak IDs, which caused the errors you got during formula annotation. A second issue was that this argument was ignored when checking whether cached data was available, which led to strange situations such as the one you saw, where precursor isolation appeared to be the cause. I just pushed some fixes, so hopefully all should be fine now.
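The caching problem described above is a general one: if an argument is left out of the cache lookup, results computed with one setting can be returned for another. The sketch below illustrates that failure mode and its fix in generic terms. It is not patRoon's caching code; `cache_key`, the step name, and every parameter other than `reAverage` are hypothetical.

```python
import hashlib
import json


def cache_key(step: str, params: dict) -> str:
    """Build a cache key from the step name and ALL of its parameters."""
    payload = json.dumps({"step": step, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two runs that differ only in reAverage (the other names are made up).
run_a = {"isolatePrec": True, "reAverage": False}
run_b = {"isolatePrec": True, "reAverage": True}

# Buggy lookup: reAverage is dropped, so both runs share one cache entry.
buggy_a = cache_key("filterPeakLists", {"isolatePrec": run_a["isolatePrec"]})
buggy_b = cache_key("filterPeakLists", {"isolatePrec": run_b["isolatePrec"]})

# Fixed lookup: every argument contributes to the key.
fixed_a = cache_key("filterPeakLists", run_a)
fixed_b = cache_key("filterPeakLists", run_b)

print("buggy keys collide:", buggy_a == buggy_b)
print("fixed keys collide:", fixed_a == fixed_b)
```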

Quite interesting that you see that the isolation of precursors may also be useful for SIRIUS! I thought that SIRIUS had its own filtering, that's why I only recommend it currently for GenForm. I am curious if you see any differences in the isoScores? If you get consistently better results it might be good to also make it default for SIRIUS.

drewszabo commented on August 12, 2024

Fantastic. I'll run some tests with SIRIUS with and without the precursor isolation and get back to you with the isoScore results. You might be right that SIRIUS performs its own filtering within the algorithm and does not include the other peaks in the scoring. When I manually submit the .ms files to the GUI, it does show the entire spectrum.

drewszabo commented on August 12, 2024

Hey Rick,

Here are some of the results from my testing. I ran SIRIUS back-to-back to try to eliminate inconsistencies with hitting their server. The biggest difference is the total run time. With fewer MS1 peaks, SIRIUS takes a fraction of the time to complete. I imagine this will scale up enormously with more features and peaks in the mslists. The effects on the predictions are negligible. In fact, SIRIUS only correctly annotated Emamectin B1a (m/z = 886.5317) when the precursor isotopologue was isolated. Otherwise, the SIRIUS results were exactly the same for the 14 features (with MS/MS data), including the isoScores, which were unchanged with and without isolatePrec. SIRIUS must be performing its own precursor filtering, but at a huge cost in compute time.

Elapsed time

                       With isolatePrec   Without isolatePrec
features               42                 42
peaks                  3495               70034
formulasSIRIUS         31 sec             328.65 sec
compoundsSIRIUS        39.49 sec          781.65 sec
Top1 comp annotation   11/14              10/14

It's not a definitive experiment by any means, but I will probably be using the isolatePrec rule moving forward for my own analysis. The time it saves me is a huge advantage, especially without impacting the annotation performance.
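As a quick check on the size of the effect, the ratios between the two runs can be worked out from the numbers reported above. This is plain arithmetic on a single back-to-back comparison, not a benchmark; the dictionary keys are made up for the example.

```python
# Figures copied from the two runs reported above.
with_isolation = {"peaks": 3495, "formulas_s": 31.0, "compounds_s": 39.49}
without_isolation = {"peaks": 70034, "formulas_s": 328.65, "compounds_s": 781.65}

# Ratio "without / with" for each reported quantity.
ratios = {
    key: without_isolation[key] / with_isolation[key]
    for key in with_isolation
}

# Combined run time of the formula and compound annotation steps.
total_with = with_isolation["formulas_s"] + with_isolation["compounds_s"]
total_without = without_isolation["formulas_s"] + without_isolation["compounds_s"]
ratios["total_s"] = total_without / total_with

for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

In this run, about 20 times fewer peaks went along with a roughly 11-fold shorter formula step and a roughly 20-fold shorter compound step.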

rickhelmus commented on August 12, 2024

Wow, that's awesome! Thanks for the tests! Makes me wonder if this filtering step should be done in default workflows... Something to think about ... :-)
