Comments (13)

ypriverol commented on August 15, 2024

@mmattano @nilshoffmann I have contacted the MetaboLights team, and they provided three different examples of datasets that would be great to have represented in SDRF-metabolomics:

These three examples would be good gold standard datasets for annotations.

from proteomics-sample-metadata.

deeptijk commented on August 15, 2024

[#703] Metabolomics specification DRAFT [PLEASE DO NO MERGE]

nilshoffmann commented on August 15, 2024

@ypriverol Instead of pure ISA-Tab, would the MetaboLights flavor of it also work for a start?
This study could serve as a starting point: https://www.ebi.ac.uk/metabolights/editor/study/MTBLS1375

mmattano commented on August 15, 2024

Hi @ypriverol and @nilshoffmann, I have thought about how the SDRF-proteomics format needs to be modified to fit metabolomics. In general, mass spectrometry-based analysis is very similar regardless of the investigated molecule, so I think it's more a question of what should be commented on or recommended. For example, fractionation is quite common in proteomics and is listed as required information in the proteomics SDRF paper, but it's an edge case in metabolomics, so I would treat it as optional.
In the recommended section, I would suggest adding comments on information that is critical for reanalysis, such as derivatization, whether positive or negative ionization mode was used, or whether the samples stem from an isotopic labeling experiment. There are additional comments that are debatable, since they are either rare or not related to the measurement. For example, multiplexing (using isobaric labeling tags) is quite rare, but an annotation procedure analogous to multiplexing in proteomics could be described. Also, downstream analysis information, first and foremost whether the measurement is intended for targeted or untargeted analysis, could be added, but it is not part of the data itself.
What do you think about this? Can you think of specific information that could/should be added? Should we discuss NMR-based metabolomics as well or limit ourselves to MS?
Going forward (loosely following Yasset's outline above), I would suggest that I set up a list of required and recommended information plus an explanation/glossary. Then we can use this to request community feedback from EuBIC and potentially the broader metabolomics community. In the meantime, I would set up a list of databases and collect example studies. For databases with metadata, and for specific metadata formats such as ISA-Tab, I will write parsers to translate them to SDRF. Then we can call for a EuBIC meeting with whoever wants to be involved, discuss details (I would do this after writing some parsers, since they can be adjusted and used to provide example files to present), and ask for contributions in annotating/checking annotated files.
Please let me know what you think about this and what you think I should get started with.
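
A list of required and recommended information could be checked mechanically once it exists. The following is a minimal sketch, assuming a tab-separated SDRF file. The column lists are hypothetical placeholders taken from this discussion and are not an agreed specification: `comment[derivatization]`, `comment[ionization mode]` and `comment[isotopic labeling]` are invented names. `check_columns` is also a hypothetical helper and is not part of sdrf-pipelines.

```python
import csv
import io

# Hypothetical draft column lists for SDRF-metabolomics. These names are
# placeholders drawn from this discussion, not an agreed specification.
REQUIRED = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
]
RECOMMENDED = [
    "comment[derivatization]",       # hypothetical column name
    "comment[ionization mode]",      # hypothetical: positive / negative
    "comment[isotopic labeling]",    # hypothetical column name
    "comment[fraction identifier]",  # required in SDRF-proteomics, optional here
]


def check_columns(sdrf_text: str) -> dict:
    """Report required/recommended columns missing from a tab-separated SDRF header."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    # Compare case-insensitively (an assumption about how names are matched).
    header = {name.strip().lower() for name in next(reader)}
    return {
        "missing_required": [c for c in REQUIRED if c not in header],
        "missing_recommended": [c for c in RECOMMENDED if c not in header],
    }


example = (
    "source name\tcharacteristics[organism]\tassay name\t"
    "comment[data file]\tcomment[ionization mode]\n"
    "sample 1\thomo sapiens\trun 1\tsample1.mzML\tpositive\n"
)
print(check_columns(example))
```

In this design, a missing required column would be an error and a missing recommended column only a warning. Fractionation moves from required to optional, as suggested above.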

ypriverol commented on August 15, 2024

Hi @mmattano and @nilshoffmann, here are my comments:

First of all, thanks for leading this.

> Hi @ypriverol and @nilshoffmann, I have thought about how the SDRF-proteomics format needs to be modified to fit metabolomics. In general, mass spectrometry-based analysis is very similar regardless of the investigated molecule, so I think it's more a question of what should be commented on or recommended. For example, fractionation is quite common in proteomics and is listed as required information in the proteomics SDRF paper, but it's an edge case in metabolomics, so I would treat it as optional.

Along these lines, we need to check what the case will be for multiplexing studies. We use the label column to represent multiplexing: multiple samples can be associated with the same file and are distinguished using the label column. If multiplexing and labeling turn out to be uncommon in metabolomics, we can make that column optional.
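
The label-column convention can be sketched as a consistency check. The check groups rows by data file and verifies two rules. First, each sample sharing a file carries a distinct label. Second, a label-free file maps to exactly one sample. Rows are assumed to be dicts keyed by the SDRF-proteomics column names `comment[data file]` and `comment[label]`. `check_multiplexing` is a hypothetical helper, and whether metabolomics adopts the same rules is still open.

```python
from collections import defaultdict

LABEL_FREE = "label free sample"


def check_multiplexing(rows):
    """Check that samples sharing one data file are told apart by comment[label].

    rows: list of dicts keyed by SDRF column names.
    Returns a list of human-readable problems (empty if none were found).
    """
    labels_by_file = defaultdict(list)
    for row in rows:
        labels_by_file[row["comment[data file]"]].append(row["comment[label]"])

    problems = []
    for data_file, labels in labels_by_file.items():
        # Each sample in a multiplexed file needs its own label (e.g. its own tag channel).
        if len(labels) != len(set(labels)):
            problems.append(f"{data_file}: duplicate label")
        # A label-free file should correspond to exactly one sample.
        if any(LABEL_FREE in label for label in labels) and len(labels) > 1:
            problems.append(f"{data_file}: label-free file shared by several samples")
    return problems


rows = [
    {"source name": "s1", "comment[data file]": "run1.raw", "comment[label]": "TMT126"},
    {"source name": "s2", "comment[data file]": "run1.raw", "comment[label]": "TMT127"},
    {"source name": "s3", "comment[data file]": "run2.raw", "comment[label]": "label free sample"},
]
print(check_multiplexing(rows))  # prints []
```

The label-free test uses substring matching, so that it also accepts values written in the `NT=...;AC=...` form.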

> In the recommended section, I would suggest adding comments on information that is critical for reanalysis, such as derivatization, whether positive or negative ionization mode was used, or whether the samples stem from an isotopic labeling experiment. There are additional comments that are debatable, since they are either rare or not related to the measurement. For example, multiplexing (using isobaric labeling tags) is quite rare, but an annotation procedure analogous to multiplexing in proteomics could be described.

Related to my previous comment ☝️.

> Also, downstream analysis information, first and foremost whether the measurement is intended for targeted or untargeted analysis, could be added, but it is not part of the data itself. What do you think about this?

I think columns describing the type of experiment MUST be part of the data information, since targeted and untargeted approaches are related, in part, to the way the data are acquired and analyzed. We have similar cases in proteomics, where we specify the type of acquisition method. We have two options here:

1- We can define, in the same way, something like comment[metabolomics profiling] with two possible values: untargeted metabolite profiling or targeted metabolite profiling.

2- We can also use the technology type column, with the same two possible values: untargeted metabolite profiling or targeted metabolite profiling.
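
Either option comes down to validating a column against the same two allowed values. Below is a minimal sketch that handles both options through a `column` parameter. Only the two values come from this thread. The `comment[metabolomics profiling]` name is the proposal above, and `profiling_errors` is a hypothetical helper.

```python
# The two values proposed in this thread; the column is one of the two
# options under discussion (the comment[...] name is only a proposal).
ALLOWED_PROFILING = {"untargeted metabolite profiling", "targeted metabolite profiling"}


def profiling_errors(rows, column="technology type"):
    """Return (row number, value) pairs whose profiling value is not allowed."""
    errors = []
    for number, row in enumerate(rows, start=1):
        value = row.get(column, "").strip().lower()
        if value not in ALLOWED_PROFILING:
            errors.append((number, value))
    return errors


# Option 2: reuse the technology type column.
print(profiling_errors([{"technology type": "untargeted metabolite profiling"}]))
# Option 1: a dedicated comment column.
print(profiling_errors(
    [{"comment[metabolomics profiling]": "metabolite profiling"}],
    column="comment[metabolomics profiling]",
))
```

With this design, switching between the two options only changes which column is checked, not the set of allowed values.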

> Can you think of specific information that could/should be added? Should we discuss NMR-based metabolomics as well or limit ourselves to MS?

The main priority and first proposal must be about MS-based metabolomics and how SDRF can facilitate reanalysis of public metabolomics data. Then we can focus on the other use cases. What do you think?

> Going forward (loosely following Yasset's outline above), I would suggest that I set up a list of required and recommended information plus an explanation/glossary. Then we can use this to request community feedback from EuBIC and potentially the broader metabolomics community.

Fully agreed. I think we should have a PR in the same repo with three documents:

1- A proposal for SDRF-metabolomics, in which we reference SDRF-proteomics for the sections that are common and refine the ones that differ.

2- A set of templates and one example that represent the proposal.

3- A list of ontology terms that need to be added to PSI-MS to fulfil the specification.

> In the meantime, I would set up a list of databases and collect example studies. For databases with metadata, and for specific metadata formats such as ISA-Tab, I will write parsers to translate them to SDRF.

Agreed.

> Then we can call for a EuBIC meeting with whoever wants to be involved, discuss details (I would do this after writing some parsers, since they can be adjusted and used to provide example files to present), and ask for contributions in annotating/checking annotated files. Please let me know what you think about this and what you think I should get started with.

As soon as we have a solid proposal and topics to discuss, we can present this to the EuBIC and HUPO-PSI groups.

mwang87 commented on August 15, 2024

I think this is a good initiative. One thing we might want to consider on the analysis portal side, to make it easier for people to get their data into these formats, is automatically converting GNPS2 metadata to SDRF, so it'll be super easy.

It'll help meet people where they are right now.

An example of the controlled-vocabulary form of what we have in public GNPS is here:

https://redu.gnps2.org/dump

We've also put some effort into mapping as much as we can from Metabolomics Workbench into the same CV by mining the metadata they already have available.

Best,

Ming

ypriverol commented on August 15, 2024

@mwang87 Thanks for your comments:

I agree that we should make the conversion from GNPS to SDRF easy. The major challenge could be transforming free text into CV terms. We can:

1- Collect all of them.
2- Add them to the corresponding ontology or define the corresponding mapping.
3- Finally, integrate the GNPS-to-SDRF conversion into sdrf-pipelines.
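
The three steps can be sketched on a small scale. The sketch collects free-text values (step 1), maps them through a curated table (step 2), and separates out the values that still need curation. The mapping table and input values are hypothetical examples and are not taken from GNPS or ReDU. `map_free_text` is a hypothetical helper and is not an existing sdrf-pipelines function.

```python
# Hypothetical mapping from free-text values (as they might appear in GNPS
# metadata) to ontology term names. Real mappings would be curated (step 2).
ORGANISM_MAP = {
    "human": "homo sapiens",
    "homo sapiens": "homo sapiens",
    "mouse": "mus musculus",
}


def map_free_text(values, mapping):
    """Map free-text values to CV term names.

    Returns the mapped list (None where no mapping exists) and the sorted list
    of unmapped inputs, which is what still needs curation before conversion.
    """
    mapped = []
    unmapped = set()
    for raw in values:
        term = mapping.get(raw.strip().lower())  # normalise case and whitespace
        if term is None:
            unmapped.add(raw)
        mapped.append(term)
    return mapped, sorted(unmapped)


values = ["Human", "mouse ", "Homo Sapiens", "lab rat"]  # step 1: collected values
mapped, todo = map_free_text(values, ORGANISM_MAP)
print(mapped)  # ['homo sapiens', 'mus musculus', 'homo sapiens', None]
print(todo)    # ['lab rat']
```

The sketch returns `None` for unmapped values instead of passing the free text through. This keeps unconverted values visible and prevents them from ending up in the SDRF as if they were CV terms.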

The most important thing now is to define the columns in SDRF-metabolomics that enable semi-automatic reanalysis. I have a couple of questions:

1- Is GNPS mainly focused on MS-based targeted and untargeted metabolomics experiments?
2- Should SDRF-metabolomics also cover other technologies and analytical methods, like NMR?
3- Which properties do you think are crucial as data-level comments in SDRF to enable automatic reanalysis at the resource level?

nilshoffmann commented on August 15, 2024

@ypriverol @mmattano @TineClaeys
I have started test-driving/adapting lesSDRF with the MTBLS1129 study here: https://github.com/nilshoffmann/lesSDRF/tree/sdrf_metabolomics to get a better understanding of which fields/columns are currently supported and which are not. One immediate finding is the difference arising from MetaboLights' multi-column encoding of, e.g.:

Characteristics[Organism]   Term Source REF   Term Accession Number
Homo sapiens                NCBITAXON         http://purl.obolibrary.org/obo/NCBITaxon_9606

In this case, the mapping Characteristics[Organism] <-> characteristics[organism] is trivial, but there are other, more difficult cases.

I'm not sure whether lesSDRF should be able to import/edit MetaboLights ISA files, but if we plan for adaptation/conversion at some point, having a programmatic route would be very helpful, imho.
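
A programmatic route for the trivial case might look like the following. It is a minimal sketch, assuming each ISA-Tab annotated column is immediately followed by its `Term Source REF` and `Term Accession Number` columns. The function lower-cases the column name and drops those two annotation columns. `isa_row_to_sdrf` is a hypothetical helper and is not part of lesSDRF or any ISA library. The more difficult cases mentioned above, such as renamed columns, are not handled.

```python
# ISA-Tab annotation columns that qualify the column immediately before them.
ONTOLOGY_COLUMNS = {"Term Source REF", "Term Accession Number"}


def isa_row_to_sdrf(header, row):
    """Collapse MetaboLights/ISA-Tab multi-column terms into SDRF-style columns.

    Only the trivial case is handled: the column name is lower-cased and the
    trailing Term Source REF / Term Accession Number columns are dropped.
    """
    sdrf = {}
    i = 0
    while i < len(header):
        sdrf[header[i].lower()] = row[i]
        i += 1
        # Skip annotation columns belonging to the column just copied.
        while i < len(header) and header[i] in ONTOLOGY_COLUMNS:
            i += 1
    return sdrf


header = ["Characteristics[Organism]", "Term Source REF", "Term Accession Number"]
row = ["Homo sapiens", "NCBITAXON", "http://purl.obolibrary.org/obo/NCBITaxon_9606"]
print(isa_row_to_sdrf(header, row))  # {'characteristics[organism]': 'Homo sapiens'}
```

This sketch discards the accession. A fuller converter would likely keep it, for example in an `NT=...;AC=...` style value, because the accession is what makes the term unambiguous.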

ypriverol commented on August 15, 2024

@nilshoffmann @TineClaeys:

We should not support SDRF for metabolomics in any tool until the standard exists. For example, in the proteomics standard the fraction identifier is required/mandatory, but I don't think this is needed for metabolomics datasets. What @mmattano is trying to do in this PR is standardize SDRF for metabolomics: decide which fields should be required, which should be optional, etc., a similar exercise to what we did for proteomics. We are doing the same for other use cases, like affinity proteomics datasets.

My point is that for MS-based proteomics we can agree on templates, etc. However, for metabolomics we have to create the format, rules and guidelines. So, at the moment, if we create an SDRF with lesSDRF, it will be wrong.

Can you give your input in the PR created by @mmattano, who is leading the development of SDRF for metabolomics? Can we have a chat in early February about metabolomics SDRF with @mmattano, @nilshoffmann and others interested in the topic?

Here is the PR from @mmattano: #680

nilshoffmann commented on August 15, 2024

@ypriverol @TineClaeys Please do not misunderstand my intention here. I do not plan to support SDRF for metabolomics in lesSDRF until there is a spec, which is also not up to me to decide. I will, of course, contribute my findings to @mmattano's PR #680.
Happy to chat any time in February.

ypriverol commented on August 15, 2024

Thanks @nilshoffmann for your quick reply. I will schedule a meeting for early February and send an email around. @mwang87 would you be able to participate?

mwang87 commented on August 15, 2024

Happy to chat. We've also done some work on our end harmonizing MetaboLights with other metabolomics repositories, so hopefully our work/insights will be helpful.

ypriverol commented on August 15, 2024

@nilshoffmann @mwang87 @mmattano and @bigbio/collaborators, here is a Doodle poll for the meeting about SDRF for metabolomics: https://doodle.com/meeting/participate/id/avmJom0e
