EMSL study identifiers (nmdc-schema issue, closed)

aclum commented on July 1, 2024
EMSL study identifiers

from nmdc-schema.

Comments (23)

turbomam commented on July 1, 2024

We should also rename emsl_proposal_identifier to emsl_proposal_identifiers, because all alternative identifiers take a list of values.

SamuelPurvine commented on July 1, 2024

Let me preface my comment by saying I wish I had the power to "make it so", meaning I'm not certain I am a/the decider. Lee Ann showed a preference for keeping emsl_project_identifier as the slot and setting emsl_proposal_identifier as an alias, if that's possible. The description would change to something like "The project number assigned to the EMSL awarded study proposal that relates to that which is represented in NMDC", and we would drop any description for emsl_proposal_identifier. As to renaming it to the plural form, that's dealer's choice :)

Part of the confusion is that there is not yet consensus on EMSL's part as to which term is the correct one to use. "Proposal" is what we've used for years, since a written proposal is what an EMSL User actually sends, so we may as well just call it that. However, PNNL culture speaks in terms of projects when discussing a body of work, irrespective of whether it was derived from a proposal we sent out, was sent to us, or was broken out internally with no particular proposal associated. I dimly recall a semi-pitched battle when I started here about whether we should/could refer to EMSL work as a proposal or a project; the decision was to call these things "proposals" to reduce confusion with the higher-level entities known as projects (in PNNL parlance, the entire EMSL endeavor is a project). Sorry for the pedantry; it's increasingly becoming part of my nature!

mslarae13 commented on July 1, 2024

I just had a brief discussion with the EMSL NEXUS team about this.
Project vs. proposal is funded vs. not funded. That said, Sam's point still holds: the terms are used loosely.

@kauberry made a good point that the proposal vs. project distinction doesn't matter much, since only awarded projects get DOIs. So they'd prefer that we link the DOIs.

https://www.osti.gov/award-doi-service/biblio/10.46936/intm.proj.2021.60141/60000423

So the link above shows the award DOI:
https://doi.org/10.46936/intm.proj.2021.60141/60000423

FYI @plithnar @turbomam @lamccue

mslarae13 commented on July 1, 2024

@turbomam

IMO,

emsl_proposal_doi
emsl_proposal_identifier (move to src/schema/external_identifiers.yaml?)

can be deprecated, and we should just have emsl_project_identifier, where we link the DOIs, with aliases for proposal, project_doi, and award_doi.

emsl_project_identifier needs to be under alternative identifiers > external database identifiers:
https://microbiomedata.github.io/nmdc-schema/study_identifiers/

With the DOI, you can get the EMSL project (proposal) number, and that will also provide the information you'd need to find it in DMS at EMSL (@SamuelPurvine).

Once the slots are there and correct, it should be easy to add the DOI via a change sheet. So, for 1000 Soils (linked in the comment above), it'll be:

slot emsl_project_identifier
emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423

I will work on getting all the identifiers we need together.

aclum commented on July 1, 2024

@turbomam I've made example data: https://github.com/microbiomedata/nmdc-schema/blob/issue-927-emsl-doi/src/data/valid/Study-emsl-doi.yaml

It's currently modeled as a single value, but pending input from @mslarae13 we may want this to be a list.

aclum commented on July 1, 2024

@mslarae13 confirmed that the slot should allow a list and requested a change to the slot name. I've updated the example valid data and created some example invalid data that uses the old slot name.

To summarize, the slot should:

  • be named emsl_project_doi
  • use the doi prefix
  • use identifiers.org or bioregistry.io as the registry for the prefix
  • accept a list of DOIs

@turbomam
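As a rough illustration of the slot requirements summarized above, here is a minimal Python sketch. It normalizes DOI resolver URLs (like the https://doi.org/... award DOI linked earlier) to the doi: prefix and then checks every value in a list. The function names and the simplified DOI regex are hypothetical and only illustrate the idea. They are not the validation pattern nmdc-schema actually uses.

```python
import re

# Hypothetical, simplified DOI CURIE pattern: "doi:10." + registrant code + "/" + suffix.
# Not the pattern nmdc-schema actually applies.
DOI_CURIE = re.compile(r"doi:10\.\d{4,9}/\S+")

# Resolver URL forms that we assume should be rewritten to the doi: prefix.
DOI_URL_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/")


def to_doi_curie(value: str) -> str:
    """Return value as a doi: CURIE, converting a DOI resolver URL if needed."""
    for prefix in DOI_URL_PREFIXES:
        if value.startswith(prefix):
            return "doi:" + value[len(prefix):]
    return value


def validate_emsl_project_dois(values: list[str]) -> list[str]:
    """Normalize a list of DOIs; raise ValueError listing any that still don't match."""
    normalized = [to_doi_curie(v) for v in values]
    bad = [v for v in normalized if not DOI_CURIE.fullmatch(v)]
    if bad:
        raise ValueError(f"not a doi: CURIE: {bad}")
    return normalized


print(validate_emsl_project_dois(
    ["https://doi.org/10.46936/intm.proj.2021.60141/60000423"]
))
```

Accepting a list (rather than a single string) matches the "accept a list of DOIs" requirement. A value such as emsl.project:60141 would be rejected here, because it belongs in a different slot.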

mslarae13 avatar mslarae13 commented on July 1, 2024

From @lamccue:
Checked with EMSL leadership (George and Ken) about the Bioregistry CURIE for EMSL. The emsl.project prefix that was registered is no problem from their viewpoint. We agreed that if NMDC identifies a desire or need for more prefixes (for samples or other things), our process will be to talk to George and Ken first about what the prefix is, how it will be used, etc. They can decide whether it is something they also want to incorporate into the NEXUS schema, and then we decide how to proceed and who does the registering.

mslarae13 commented on July 1, 2024

With that, we'd like to get EMSL projects linked in the data portal on the study landing pages, under "External Resources" -> "Additional data".

@aclum or @turbomam or @naglepuff or @dwinston: can someone tell me which class/slot I need to put these links under? I attempted to use the API but couldn't find the slot.

I assume this will need to be added to Mongo via a change sheet?

mslarae13 commented on July 1, 2024

@naglepuff, when we do get these added, they should appear here:
[Screenshot, 2023-06-01]

With this logo:
[Image: EML_Logo-Light]

Unless someone has another opinion? (I will check with Lee Ann)

mslarae13 commented on July 1, 2024

@turbomam, this emsl.identifiers CURIE should probably have its own slot, like GOLD:

https://microbiomedata.github.io/nmdc-schema/study_identifiers/

I believe this is where these links come from?

Or is it this one? https://microbiomedata.github.io/nmdc-schema/emsl_project_identifier/ I cannot tell...

aclum commented on July 1, 2024

That slot hasn't been used yet, so it could be used. If we do use it, it should be updated so that it inherits from alternative_identifiers and is enforced as a CURIE, now that we have a registered prefix. @turbomam, let us know whether you want to do that now under this ticket (that seems cleaner, since we haven't used the slot yet) or whether this ticket should be closed and we just need an nmdc-server repo ticket to get this information displayed.

mslarae13 commented on July 1, 2024

#960

@turbomam is adding emsl_project_identifiers to study_identifiers: https://microbiomedata.github.io/nmdc-schema/study_identifiers/

However, EMSL uses study, proposal, and project rather interchangeably.

Adding this new slot will include a TODO to add aliases.

  • What are the aliases?

Montana to check with NEXUS

  • Add aliases

turbomam commented on July 1, 2024

Here are the authority-based mixins in src/schema/external_identifiers.yaml, and the object-type-based identifier slots that they are mixed into.

gold_identifiers

| mixed into | description | range | domain |
|---|---|---|---|
| gold_study_identifiers | identifiers for corresponding project(s) in GOLD | None | Study |
| gold_biosample_identifiers | identifiers for corresponding sample in GOLD | None | Biosample |
| gold_sequencing_project_identifiers | identifiers for corresponding sequencing project in GOLD | None | OmicsProcessing |
| gold_analysis_project_identifiers | identifiers for corresponding analysis project in GOLD | None | MetagenomeAnnotationActivity, MetatranscriptomeAnnotationActivity |

insdc_identifiers

| mixed into | description | range | domain |
|---|---|---|---|
| insdc_sra_ena_study_identifiers | identifiers for corresponding project in INSDC SRA / ENA | None | |
| insdc_bioproject_identifiers | identifiers for corresponding project in INSDC Bioproject | None | Study |
| insdc_biosample_identifiers | identifiers for corresponding sample in INSDC | None | Biosample |
| insdc_secondary_sample_identifiers | secondary identifiers for corresponding sample in INSDC | None | |
| insdc_experiment_identifiers | | None | OmicsProcessing |
| insdc_analysis_identifiers | | None | |
| insdc_assembly_identifiers | | None | MetagenomeAssembly, MetatranscriptomeAssembly |

turbomam commented on July 1, 2024

Some of that hierarchy is lost in the submission-schema.

jgi_portal_identifiers and neon_identifiers haven't been mixed into any usable identifier slots yet. They're just abstract for now.

turbomam commented on July 1, 2024

There isn't any emsl_identifiers mixin yet! But there are some slots with 'emsl_' in their names, defined in these schema files:

  • src/schema/portal/emsl.yaml:
    • emsl_store_temp
  • src/schema/nmdc.yaml:
    • emsl_biosample_identifiers (move to src/schema/external_identifiers.yaml?)
    • emsl_project_identifier (move to src/schema/external_identifiers.yaml?)
    • emsl_proposal_doi
    • emsl_proposal_identifier (move to src/schema/external_identifiers.yaml?)

turbomam commented on July 1, 2024

@mslarae13 @SamuelPurvine

Here's the thing I'm most confused about:

Class Study uses both:

  • emsl_project_identifier
  • emsl_proposal_identifier

If a Study needs to mention multiple EMSL entities, these slots need better differentiating criteria.

emsl_proposal_identifier has the description 'The proposal number assigned to the EMSL awarded study that relates to that which is represented in NMDC.'

but emsl_project_identifier doesn't have any annotations besides a see_also and a comment I just added.

turbomam commented on July 1, 2024

Thanks. Are you saying that you only want emsl_project_identifier, which would be a sub-property of study_identifiers?

TL;DR: I don't see how we can consider emsl_project_identifier a study_identifiers slot and use it to link to a DOI.

The range of study_identifiers is uri or curie, but I have applied a validation pattern that requires the CURIE form.

Is this bit from your post above supposed to be a valid value for emsl_project_identifier?

emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423

I would expect the value to be something like 'emsl.project:60141', where emsl.project resolves to 'https://bioregistry.io/reference/emsl.project:'

So the expanded URL would be https://bioregistry.io/reference/emsl.project:60141, which redirects to https://www.emsl.pnnl.gov/project/60141.
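Here is a minimal Python sketch of the distinction being drawn here. It assumes a hypothetical CURIE regex: a prefix, a colon, then a local ID with no whitespace that does not start with '/' or ':'. The real pattern applied to study_identifiers may differ. Under this assumption, 'emsl.project:60141' passes and the 'emsl.project: https://doi.org/...' value does not. A valid CURIE can then be turned into the Bioregistry reference URL quoted above.

```python
import re

# Hypothetical CURIE pattern (not the actual nmdc-schema pattern):
# prefix = letter followed by letters/digits/'.'/'_'/'-', then ':',
# then a non-empty local ID without whitespace that doesn't start with '/' or ':'.
CURIE = re.compile(r"[A-Za-z][A-Za-z0-9._-]*:[^\s:/]\S*")

BIOREGISTRY_REFERENCE = "https://bioregistry.io/reference/"


def is_curie(value: str) -> bool:
    """True if the whole value looks like a compact CURIE rather than a URL."""
    return CURIE.fullmatch(value) is not None


def bioregistry_url(curie: str) -> str:
    """Build the Bioregistry reference URL for a CURIE (Bioregistry then redirects)."""
    if not is_curie(curie):
        raise ValueError(f"not a CURIE: {curie!r}")
    return BIOREGISTRY_REFERENCE + curie


for candidate in [
    "emsl.project:60141",
    "emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423",
]:
    print(candidate, "->", is_curie(candidate))
```

Excluding a leading '/' in the local ID is what rejects bare URLs such as https://doi.org/..., which would otherwise parse as a CURIE with the prefix https.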

turbomam commented on July 1, 2024

Per your example, the following will be removed from the schema:

  • emsl_project_identifier
  • emsl_proposal_doi
  • emsl_proposal_identifier

turbomam commented on July 1, 2024

Remaining question: should emsl_project_doi be placed in the alternative_identifiers hierarchy? Perhaps in study_identifiers? Maybe even taking the place that will be vacated by emsl_proposal_identifier?

That sounds good to me, as long as the objects of an emsl_project_doi will always be information about a study or project.

The DOI you provided, 'doi:10.46936/intm.proj.2021.60141/60000423', resolves to a page that's about a "Research Campaign". Is that the same as a study?

aclum commented on July 1, 2024

Blocked on the data portal study page revamp squad.

mslarae13 commented on July 1, 2024

emsl_project_doi & emsl_proposal_identifier were removed.

EMSL study CURIE prefix: https://bioregistry.io/registry/emsl.project

Next steps: add the EMSL study ID numbers, and add (uncomment) emsl_project_identifier in the schema as a Study alternative_identifiers slot whose range is the EMSL CURIE.

mslarae13 commented on July 1, 2024

Prefix expansion: www.emsl.pnnl.gov/project/

So when the 5-digit project number is combined with the prefix (emsl.project:60141), it resolves to https://www.emsl.pnnl.gov/project/60141.

Full CURIE: emsl.project:60141

The registered prefix is on Bioregistry: https://bioregistry.io/registry/emsl.project

The slot that should take this value is emsl_project_identifier. Mark, please confirm by checking Mongo that it has NOT been used yet. (It has not been used in the data yet.)
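For illustration, the prefix expansion described above can be sketched in Python as a prefix-map lookup. The PREFIX_MAP and LOCAL_ID_PATTERNS names are hypothetical. The 5-digit rule comes only from this comment and may not hold for every EMSL project. The Bioregistry redirect mentioned earlier in the thread does the equivalent expansion when resolving emsl.project CURIEs.

```python
import re

# Hypothetical prefix map, assumed from this thread:
# emsl.project CURIEs expand to EMSL's project pages.
PREFIX_MAP = {"emsl.project": "https://www.emsl.pnnl.gov/project/"}

# Assumption: EMSL project numbers are 5 digits (taken from the comment above).
LOCAL_ID_PATTERNS = {"emsl.project": re.compile(r"\d{5}")}


def expand_curie(curie: str) -> str:
    """Expand a CURIE to a URL using PREFIX_MAP; raise ValueError if it can't."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIX_MAP:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    pattern = LOCAL_ID_PATTERNS.get(prefix)
    if pattern is not None and not pattern.fullmatch(local_id):
        raise ValueError(f"unexpected local ID {local_id!r} for prefix {prefix!r}")
    return PREFIX_MAP[prefix] + local_id


print(expand_curie("emsl.project:60141"))
```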

turbomam commented on July 1, 2024

I'm going to summarize the situation in my own words, and say what I think I should do next.

Background

Old modelling that could be reactivated

At one point, emsl_project_identifier was defined as a slot in the nmdc-schema and was associated with the Study class, but that modeling was all commented out on June 8th.

Therefore, I should reintroduce emsl_project_identifier as a slot associated with Study

Here are some old definitions that I could base it on:

Possibly redundant modeling?

It appears that emsl.project:60141 is very closely coupled with doi:10.46936/intm.proj.2021.60141/60000423 aka https://bioregistry.io/doi:10.46936/intm.proj.2021.60141/60000423 aka https://dx.doi.org/10.46936/intm.proj.2021.60141/60000423, which could be modeled as the following nmdc-schema Doi:

- doi_value: doi:10.46936/intm.proj.2021.60141/60000423
- doi_provider: emsl
- doi_category: award_doi

In general, we should be avoiding alternate ways of saying the same thing in nmdc-schema modeling, but I will take it at face value that others have determined that there really is a need for Dois like the one I just illustrated, plus an emsl_project_identifier slot.

cc @cmungall
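To make the coupling concrete, here is a hedged Python sketch. It has a plain dataclass mirroring the three Doi fields listed above, plus a helper that derives an emsl.project CURIE from an EMSL award DOI. The derivation regex is a hypothetical assumption inferred from the single DOI in this thread, where 60141 appears after intm.proj.2021. It is not a documented EMSL or OSTI DOI format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doi:
    """Mirror of the Doi fields shown above; all fields assumed to be plain strings."""
    doi_value: str
    doi_provider: str
    doi_category: str


# Hypothetical: inferred only from doi:10.46936/intm.proj.2021.60141/60000423,
# where the EMSL project number follows "intm.proj.<year>.".
AWARD_DOI_PROJECT = re.compile(r"doi:10\.46936/intm\.proj\.\d{4}\.(\d+)/")


def emsl_project_curie_from_doi(doi: Doi) -> Optional[str]:
    """Return an emsl.project CURIE if the DOI looks like an EMSL award DOI, else None."""
    if doi.doi_provider != "emsl" or doi.doi_category != "award_doi":
        return None
    match = AWARD_DOI_PROJECT.match(doi.doi_value)
    return f"emsl.project:{match.group(1)}" if match else None


award = Doi(
    doi_value="doi:10.46936/intm.proj.2021.60141/60000423",
    doi_provider="emsl",
    doi_category="award_doi",
)
print(emsl_project_curie_from_doi(award))
```

If such a derivation held for every EMSL award DOI, one of the two values could be computed from the other. That is the kind of redundancy raised above.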
