Comments (23)
We should also rename emsl_proposal_identifier to emsl_proposal_identifiers, because all alternative identifiers take a list of values.
from nmdc-schema.
Let me preface my comment by saying I wish I had the power to "make it so", meaning I'm not certain I am a/the decider. Lee Ann expressed a preference for keeping emsl_project_identifier as the slot and setting emsl_proposal_identifier as an alias, if that's possible. The description would change to something like "The project number assigned to the EMSL awarded study proposal that relates to that which is represented in NMDC", and we would drop any description for emsl_proposal_identifier. As for renaming it to the plural form, that's dealer's choice :)
Part of the confusion is that there is not yet consensus on EMSL's part as to which term is the correct one to use. "Proposal" is what we've used for years: a written proposal is the thing an EMSL User actually sends, so we may as well call it that. However, PNNL culture speaks in terms of projects when discussing a body of work, irrespective of whether it was derived from a proposal we sent out, was sent to us, or was broken out internally with no particular proposal associated. I dimly recall a semi-pitched battle when I started here about whether we should/could refer to EMSL work as a proposal or a project; the decision was to call these things "proposals" to reduce confusion with the higher-level entities known as projects (in PNNL parlance, the entire EMSL endeavor is a project). Sorry for the pedantry; it's increasingly becoming part of my nature!
I just had a brief discussion with the EMSL NEXUS team about this.
Project vs proposal is funded vs not funded. That said, Sam's point is still true. The terms are used loosely.
@kauberry made a good point that a proposal vs a project doesn't matter much as only the awarded projects get DOIs. So they'd prefer us link the DOIs.
https://www.osti.gov/award-doi-service/biblio/10.46936/intm.proj.2021.60141/60000423
So the link above shows the award DOI.
Award DOI:
https://doi.org/10.46936/intm.proj.2021.60141/60000423
FYI @plithnar @turbomam @lamccue
IMO, emsl_proposal_doi and emsl_proposal_identifier (move to src/schema/external_identifiers.yaml?) can be deprecated, and we should just have emsl_project_identifier, where we link the DOIs, with aliases for proposal, project_doi, and award_doi.
emsl_project_identifier needs to be under alternative identifiers > external database identifiers
https://microbiomedata.github.io/nmdc-schema/study_identifiers/
With the DOI, you can get the EMSL project (proposal) number, and that will also provide the information you'd need to find it on DMS in EMSL (@SamuelPurvine).
Once the slots are there and correct, it should be easy to add the DOI via a change sheet. So, for 1000 soils (linked in the comment above), it'll be:
slot emsl_project_identifier
emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423
I will work on getting all the identifiers we need together.
@turbomam I've made example data: https://github.com/microbiomedata/nmdc-schema/blob/issue-927-emsl-doi/src/data/valid/Study-emsl-doi.yaml
It's currently modeled as a single value, but pending input from @mslarae13 we may want this to be a list.
@mslarae13 confirmed that slots should allow a list and requested a change to the slot name. I've updated the example valid data and created some example invalid data that uses the old slot name
To summarize, the slot should have the following:
- name: emsl_project_doi
- use a prefix of doi
- use identifiers.org or bioregistry.io as the registry where the prefix is registered
- accept a list of DOIs
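The four requirements above could be checked with something like the following minimal Python sketch. The function name `validate_emsl_project_dois` and the DOI regex are illustrative assumptions, not part of nmdc-schema; the regex only checks the general `doi:10.<registrant>/<suffix>` shape and rejects plain `https://doi.org/...` URLs.

```python
import re
from typing import List

# Assumed, simplified DOI CURIE shape: "doi:" prefix, "10." registrant code, "/", non-empty suffix.
DOI_CURIE_PATTERN = re.compile(r"^doi:10\.\d{4,9}/\S+$")


def validate_emsl_project_dois(values: object) -> List[str]:
    """Return problems found in a hypothetical emsl_project_doi value (empty list means valid)."""
    if not isinstance(values, list):
        return ["emsl_project_doi must be a list of DOI CURIEs"]
    problems = []
    for value in values:
        if not isinstance(value, str) or not DOI_CURIE_PATTERN.match(value):
            problems.append(f"not a doi: CURIE: {value!r}")
    return problems


record = {
    "id": "nmdc:sty-11-ab",
    "emsl_project_doi": ["doi:10.46936/intm.proj.2021.60141/60000423"],
}
print(validate_emsl_project_dois(record["emsl_project_doi"]))  # []
# A bare string is rejected because the slot is meant to be multivalued.
print(validate_emsl_project_dois("doi:10.46936/intm.proj.2021.60141/60000423"))
```

In the real schema this would more likely be expressed declaratively (multivalued slot plus a pattern) than in Python; the sketch only makes the rules concrete.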
From @lamccue
Check with EMSL leadership (George and Ken) about the bioregistry curie for EMSL. The emsl.project prefix that was registered is no problem from their viewpoint. We agreed that if NMDC identifies a desire/need to have more prefixes (for samples or other things), then the process for us will be to talk to George and Ken about it first - what the prefix is and how it will be used, etc. They can decide if it is something that they also want to incorporate into the NEXUS schema, and then we decide how to proceed and who does the registering.
With that, we'd like to get EMSL projects linked in the data portal on the study landing pages at the "External Resources" -> "Additional data".
@aclum or @turbomam or @naglepuff or @dwinston: can someone tell me which class and slot I need to put these links under? I attempted to use the API but was unsuccessful in finding the slot.
I assume this will need to be added to mongo via change sheet?
@naglepuff when we do get these added, they should appear here
Unless someone has another opinion? (I will check with Lee Ann)
@turbomam this emsl.identifiers curie should probably have its own slot, like GOLD:
https://microbiomedata.github.io/nmdc-schema/study_identifiers/
I believe this is where these links come from?
Or is it this: https://microbiomedata.github.io/nmdc-schema/emsl_project_identifier/ ? I cannot tell...
That slot hasn't been used yet, so it could be used. If we do use it, it should be updated so it inherits from alternative_identifiers and is enforced as a CURIE, now that we have one. @turbomam, let us know whether you want to do that now under this ticket (that seems cleaner, since we haven't used the slot yet) or whether this ticket should be closed and we just need an nmdc-server repo ticket to get this information displayed.
@turbomam is adding emsl_project_identifiers to study_identifiers: https://microbiomedata.github.io/nmdc-schema/study_identifiers/
However, EMSL uses study, proposal, and project rather interchangeably.
Adding this new slot will include a TODO to add aliases.
- What are the aliases?
Montana to check with NEXUS
- Add aliases
Here are the authority-based mixins in src/schema/external_identifiers.yaml, and the object-type-based identifier slots that they are mixed into.
gnps_identifiers is only mixed into study_identifiers to get gnps_task_identifiers
- alternative_identifiers
  - external_database_identifiers
    - study_identifiers
      - gnps_task_identifiers [gnps_identifiers]
- study_identifiers
- external_database_identifiers
- alternative_identifiers
gold_identifiers
- mixed into many identifiers. see below.
insdc_identifiers
- mixed into many identifiers. see below.
jgi_portal_identifiers
- not mixed into anything yet
massive_identifiers is only mixed into study_identifiers to get massive_study_identifiers
- alternative_identifiers
  - external_database_identifiers
    - study_identifiers
      - massive_study_identifiers [massive_identifiers]
- study_identifiers
- external_database_identifiers
- alternative_identifiers
mgnify_identifiers is only mixed into analysis_identifiers to get mgnify_analysis_identifiers
- alternative_identifiers
  - external_database_identifiers
    - analysis_identifiers
      - mgnify_analysis_identifiers [mgnify_identifiers]
- analysis_identifiers
- external_database_identifiers
- alternative_identifiers
neon_identifiers
- not mixed into anything yet
gold_identifiers
| mixed into | description | range | domain |
|---|---|---|---|
| gold_study_identifiers | identifiers for corresponding project(s) in GOLD | None | Study |
| gold_biosample_identifiers | identifiers for corresponding sample in GOLD | None | Biosample |
| gold_sequencing_project_identifiers | identifiers for corresponding sequencing project in GOLD | None | OmicsProcessing |
| gold_analysis_project_identifiers | identifiers for corresponding analysis project in GOLD | None | MetagenomeAnnotationActivity, MetatranscriptomeAnnotationActivity |
insdc_identifiers
| mixed into | description | range | domain |
|---|---|---|---|
| insdc_sra_ena_study_identifiers | identifiers for corresponding project in INSDC SRA / ENA | None | |
| insdc_bioproject_identifiers | identifiers for corresponding project in INSDC Bioproject | None | Study |
| insdc_biosample_identifiers | identifiers for corresponding sample in INSDC | None | Biosample |
| insdc_secondary_sample_identifiers | secondary identifiers for corresponding sample in INSDC | None | |
| insdc_experiment_identifiers | | None | OmicsProcessing |
| insdc_analysis_identifiers | | None | |
| insdc_assembly_identifiers | | None | MetagenomeAssembly, MetatranscriptomeAssembly |
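The is_a chains listed above (for example gnps_task_identifiers, then study_identifiers, then external_database_identifiers, then alternative_identifiers) can be pictured as a child-to-parent map plus a separate record of which authority mixin each slot uses. The minimal Python sketch below hard-codes only the parents and mixins stated in this comment; it is an illustration, not a query against the actual schema YAML.

```python
from typing import Dict, List, Optional

# is_a parents as listed in the comment above (illustrative subset only).
SLOT_PARENTS: Dict[str, Optional[str]] = {
    "alternative_identifiers": None,
    "external_database_identifiers": "alternative_identifiers",
    "study_identifiers": "external_database_identifiers",
    "analysis_identifiers": "external_database_identifiers",
    "gnps_task_identifiers": "study_identifiers",
    "massive_study_identifiers": "study_identifiers",
    "mgnify_analysis_identifiers": "analysis_identifiers",
}

# Authority-based mixins applied to each object-type-based slot, as listed above.
SLOT_MIXINS: Dict[str, List[str]] = {
    "gnps_task_identifiers": ["gnps_identifiers"],
    "massive_study_identifiers": ["massive_identifiers"],
    "mgnify_analysis_identifiers": ["mgnify_identifiers"],
}


def ancestors(slot: str) -> List[str]:
    """Walk the is_a chain upward, nearest parent first."""
    chain = []
    parent = SLOT_PARENTS.get(slot)
    while parent is not None:
        chain.append(parent)
        parent = SLOT_PARENTS.get(parent)
    return chain


def slots_using_mixin(mixin: str) -> List[str]:
    """List the slots that mix in the given authority-based mixin."""
    return sorted(slot for slot, mixins in SLOT_MIXINS.items() if mixin in mixins)


print(ancestors("gnps_task_identifiers"))
# ['study_identifiers', 'external_database_identifiers', 'alternative_identifiers']
print(slots_using_mixin("massive_identifiers"))  # ['massive_study_identifiers']
```

Keeping is_a parents and mixins in separate maps mirrors the distinction above: the object type decides where a slot sits in the hierarchy, while the mixin records which authority issues the identifiers.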
Some of that hierarchy is lost in the submission-schema
jgi_portal_identifiers and neon_identifiers haven't been mixed into any usable identifiers yet. They're just abstract for now.
There isn't any emsl_identifiers mixin yet! But there are some slots with 'emsl_' in their names, defined in these schema files:
- src/schema/portal/emsl.yaml:
  - emsl_store_temp
- src/schema/nmdc.yaml:
  - emsl_biosample_identifiers (move to src/schema/external_identifiers.yaml?)
  - emsl_project_identifier (move to src/schema/external_identifiers.yaml?)
  - emsl_proposal_doi
  - emsl_proposal_identifier (move to src/schema/external_identifiers.yaml?)
Here's the thing I'm most confused about: class Study uses both
- emsl_project_identifier
- emsl_proposal_identifier
If a Study needs to mention multiple EMSL entities, these slots need better differentiating criteria. emsl_proposal_identifier has the description 'The proposal number assigned to the EMSL awarded study that relates to that which is represented in NMDC.' but emsl_project_identifier doesn't have any annotations besides a see_also and a comment I just added.
Thanks. Are you saying that you only want emsl_project_identifier, which would be a sub-property of study_identifiers?
TL;DR: I don't see how we can consider emsl_project_identifier one of the study_identifiers and also use it to link to a DOI.
The range of study_identifiers is uri or curie, but I have applied a validation pattern that requires the CURIE form.
Is this bit from your post above supposed to be a valid value for emsl_project_identifier?
emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423
I would expect the value to be something like 'emsl.project:60141', where emsl.project resolves to 'https://bioregistry.io/reference/emsl.project:'.
So the expanded URL would be https://bioregistry.io/reference/emsl.project:60141, which redirects to https://www.emsl.pnnl.gov/project/60141.
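To make the CURIE-versus-URL distinction concrete, here is a minimal Python sketch: a value in CURIE form (`emsl.project:60141`) passes a CURIE-shaped pattern and expands against a prefix map, while the pasted `emsl.project: https://doi.org/...` value does not. The regex is an assumed approximation, not the actual pattern applied to study_identifiers, and the prefix map holds only the expansion quoted in this comment.

```python
import re
from typing import Dict, Optional

# Assumed approximation of a CURIE: prefix, colon, then a local ID that
# contains no whitespace and does not start with "/" (so "https://..." URLs fail).
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:[^\s:/]\S*$")

# Only the expansion quoted in this comment; the real schema declares many more prefixes.
PREFIX_MAP: Dict[str, str] = {
    "emsl.project": "https://bioregistry.io/reference/emsl.project:",
}


def expand_curie(value: str) -> Optional[str]:
    """Return the expanded URL, or None if value is not a CURIE with a known prefix."""
    if not CURIE_PATTERN.match(value):
        return None
    prefix, local_id = value.split(":", 1)
    base = PREFIX_MAP.get(prefix)
    return base + local_id if base is not None else None


print(expand_curie("emsl.project:60141"))
# https://bioregistry.io/reference/emsl.project:60141
print(expand_curie("emsl.project: https://doi.org/10.46936/intm.proj.2021.60141/60000423"))
# None
```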
Your example
- uses a new emsl_project_doi slot to associate study 'nmdc:sty-11-ab' with a resource with the CURIE doi:10.46936/intm.proj.2021.60141/60000423. The schema will expand the prefix 'doi' to a base URL of 'https://bioregistry.io/doi:', so 'doi:10.46936/intm.proj.2021.60141/60000423' will expand to 'https://bioregistry.io/doi:10.46936/intm.proj.2021.60141/60000423'
- Bioregistry and/or OSTI expand that to https://www.osti.gov/award-doi-service/biblio/10.46936/intm.proj.2021.60141/60000423
The following will be removed from the schema:
- emsl_project_identifier
- emsl_proposal_doi
- emsl_proposal_identifier
Remaining question: should emsl_project_doi be placed in the alternative_identifiers hierarchy? Perhaps in study_identifiers? Maybe even taking the place that will be vacated by emsl_proposal_identifier?
That sounds good to me, as long as the objects of an emsl_project_doi will always be information about a study or project.
The DOI you provided, 'doi:10.46936/intm.proj.2021.60141/60000423', resolves to a page that's about a "Research Campaign". Is that the same as a study?
Blocked on the data portal study page revamp squad.
emsl_project_doi & emsl_proposal_identifier were removed.
EMSL study CURIE: https://bioregistry.io/registry/emsl.project
Need to add EMSL study ID numbers to the. Add (uncomment) emsl_project_identifier to the schema as one of the Study alternative_identifiers, with a range of the EMSL CURIE.
Prefix expansion: www.emsl.pnnl.gov/project/
So when the 5-digit project number is combined with the prefix (emsl.project:60141), it resolves to https://www.emsl.pnnl.gov/project/60141
Full CURIE = emsl.project:60141
and the registered prefix is at Bioregistry: https://bioregistry.io/registry/emsl.project
The slot that should take this value is emsl_project_identifier. Mark to confirm it has NOT been used yet by checking Mongo: it has not been used in the data yet.
I'm going to summarize the situation in my own words, and say what I think I should do next.
Background
- the prefix emsl.project is associated with the expansion https://bioregistry.io/emsl.project: in nmdc.yaml
- emsl.project is registered as a prefix at https://bioregistry.io/
- 60141 is a valid EMSL Project ID. Abstractly, we would call that a valid local identifier.
- A CURIe (or compact URI) is a prefix + a local identifier, so at least NMDC and Bioregistry are saying that emsl.project:60141 is a reasonable CURIe, which should be expanded to https://bioregistry.io/emsl.project:60141
- Bioregistry will redirect URLs like https://bioregistry.io/emsl.project:60141 to https://www.emsl.pnnl.gov/project/60141
- this redirection increases complexity and overhead, and it requires coordination between NMDC, Bioregistry and, ideally, authorized people from EMSL.
- the benefit is FAIRness, especially Findability, Interoperability, and Reuse. Hopefully other people who need to abbreviate EMSL Project IDs and page URLs will follow our example and use the emsl.project prefix and Bioregistry's redirection
- if EMSL ever decides to serve its Project ID pages from a different address, only Bioregistry's redirection needs to be changed, as opposed to lots of projects needing to update their own direct, hard-coded prefix expansion.
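The two-hop resolution described above (NMDC expands the CURIE to a Bioregistry URL, and Bioregistry redirects to EMSL) can be modeled in a few lines of Python to show why only one mapping needs to change if EMSL moves its pages. Both dictionaries are hypothetical stand-ins for the nmdc.yaml prefix declaration and Bioregistry's redirect rule; nothing here calls the real Bioregistry service, and the "moved" address is made up.

```python
from typing import Dict

# Stand-in for the prefix expansion declared in nmdc.yaml (first bullet above).
NMDC_PREFIXES: Dict[str, str] = {"emsl.project": "https://bioregistry.io/emsl.project:"}

# Stand-in for Bioregistry's redirect rule for the emsl.project prefix.
BIOREGISTRY_PROVIDERS: Dict[str, str] = {"emsl.project": "https://www.emsl.pnnl.gov/project/"}


def nmdc_expand(curie: str) -> str:
    """Hop 1: NMDC turns the CURIE into a Bioregistry URL."""
    prefix, local_id = curie.split(":", 1)
    return NMDC_PREFIXES[prefix] + local_id


def bioregistry_redirect(url: str, providers: Dict[str, str]) -> str:
    """Hop 2: strip the Bioregistry base (str.removeprefix needs Python 3.9+) and map to the provider."""
    curie = url.removeprefix("https://bioregistry.io/")
    prefix, local_id = curie.split(":", 1)
    return providers[prefix] + local_id


def resolve(curie: str, providers: Dict[str, str]) -> str:
    return bioregistry_redirect(nmdc_expand(curie), providers)


print(resolve("emsl.project:60141", BIOREGISTRY_PROVIDERS))
# https://www.emsl.pnnl.gov/project/60141

# If EMSL moved its pages, only the Bioregistry-side mapping would change;
# NMDC_PREFIXES and every stored CURIE stay the same. (Hypothetical new address.)
moved = {"emsl.project": "https://new.example.org/project/"}
print(resolve("emsl.project:60141", moved))
# https://new.example.org/project/60141
```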
Old modelling that could be reactivated
At one point, emsl_project_identifier was defined as a slot in the nmdc-schema and was associated with the Study class, but that modeling was all commented out on June 8th.
Therefore, I should reintroduce emsl_project_identifier as a slot associated with Study.
Here are some old definitions that I could base it on:
- nmdc-schema/src/schema/external_identifiers.yaml, lines 378 to 384 in 387ed3b
- nmdc-schema/src/schema/external_identifiers.yaml, lines 229 to 234 in 387ed3b
Possibly redundant modeling?
It appears that emsl.project:60141 is very closely coupled with doi:10.46936/intm.proj.2021.60141/60000423, aka https://bioregistry.io/doi:10.46936/intm.proj.2021.60141/60000423, aka https://dx.doi.org/10.46936/intm.proj.2021.60141/60000423, which could be modeled as the following nmdc-schema Doi:
- doi_value: doi:10.46936/intm.proj.2021.60141/60000423
- doi_provider: emsl
- doi_category: award_doi
In general, we should avoid alternate ways of saying the same thing in nmdc-schema modeling, but I will take it at face value that others have determined there really is a need for Dois like the one I just illustrated, plus an emsl_project_identifier slot.
cc @cmungall
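To make the possible redundancy concrete, here is a minimal Python sketch that holds the Doi described above as a plain dataclass and derives the matching emsl.project CURIE from it. The dataclass only mirrors the three fields listed above (it is not the generated nmdc-schema class, and the provider/category values are plain strings rather than the schema's enums), and the parsing assumes EMSL award DOIs keep the `intm.proj.<year>.<project number>/<id>` shape seen in this single example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doi:
    """Hypothetical mirror of the three Doi fields listed above."""
    doi_value: str
    doi_provider: str
    doi_category: str


# Assumed shape of EMSL award DOIs, inferred from one example only.
EMSL_AWARD_DOI = re.compile(r"^doi:10\.46936/intm\.proj\.\d{4}\.(\d+)/\d+$")


def emsl_project_curie(doi: Doi) -> Optional[str]:
    """Derive emsl.project:<number> from an EMSL award DOI, or None if it doesn't fit."""
    if doi.doi_provider != "emsl" or doi.doi_category != "award_doi":
        return None
    match = EMSL_AWARD_DOI.match(doi.doi_value)
    return f"emsl.project:{match.group(1)}" if match else None


award = Doi(
    doi_value="doi:10.46936/intm.proj.2021.60141/60000423",
    doi_provider="emsl",
    doi_category="award_doi",
)
print(emsl_project_curie(award))  # emsl.project:60141
```

If a derivation like this held for every EMSL award DOI, storing both values would be redundant; if some awards break the pattern, that would be one argument for keeping both.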