Comments (13)
Continuation of work discussed (at least on GH) from the May sprint at https://github.com/microbiomedata/nmdc-metadata/issues/308
from nmdc-schema.
Third level of checks handled in https://github.com/microbiomedata/nmdc-metadata/issues/362
Removing other assignees. @turbomam let me know if this assignment isn't you.
RE Is the ID prefix valid? (e.g. KEGG.KO vs KEGG.ORTHOLOG)
I see prefix definitions, especially in nmdc-schema/src/schema/annotation.yaml
MAM@MAM-M74 schema % pwd
/Users/MAM/Documents/gitrepos/nmdc-schema/src/schema
MAM@MAM-M74 schema % grep -i kegg *
annotation.yaml: - KEGG.PATHWAY
annotation.yaml: - KEGG.REACTION
annotation.yaml: - KEGG.ORTHOLOGY ## KO number
core.yaml: - KEGG.COMPOUND
Anywhere else I should be looking? @wdduncan @cmungall
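The prefix check asked about above could be sketched roughly as follows. This is a minimal illustration, not anything in nmdc-schema: `ALLOWED_PREFIXES`, `split_curie`, and `has_valid_prefix` are hypothetical names, and the prefix set holds only the four values visible in the grep output, so the real schema presumably declares more.

```python
# Prefixes copied from the grep output above (assumption: this is not the full list).
ALLOWED_PREFIXES = {"KEGG.PATHWAY", "KEGG.REACTION", "KEGG.ORTHOLOGY", "KEGG.COMPOUND"}


def split_curie(curie):
    """Split 'PREFIX:LOCAL' at the first colon; raise ValueError if no colon is present."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local


def has_valid_prefix(curie, allowed=ALLOWED_PREFIXES):
    """Return True if the CURIE's prefix is one of the declared prefixes."""
    prefix, _ = split_curie(curie)
    return prefix in allowed
```

With this set, `KEGG.ORTHOLOGY:K00001` passes while `KEGG.KO:K00001` and `KEGG.ORTHOLOG:K00001` are rejected, which is the kind of mismatch the question is about.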
RE Is the local part of the ID syntactically conformant? (e.g. KEGG:K\d+)
I don't see patterns for the local parts, at least not in nmdc-schema/src/schema/annotation.yaml
pathway:
  aliases:
    - biological process
    - metabolic pathway
    - signaling pathway
  is_a: functional annotation term
  description: >-
    A pathway is a sequence of steps/reactions carried out by an organism or community of organisms
  slot_usage:
    has_part:
      range: reaction
      multivalued: true
      description: >-
        A pathway can be broken down to a series of reaction step
  id_prefixes:
    - KEGG.PATHWAY
    - COG
  exact_mappings:
    - biolink:Pathway
very rough example for nmdc-schema/src/schema/annotation.yaml
from @cmungall
functional annotation term:
  aliases:
    - function
    - functional annotation
  is_a: ontology class
  slot_usage:
    id:
      pattern: "^(KEGG.ORTHOLOG:K\\d+|EC:\\d+\\.ETC)$"
  description: >-
    Abstract grouping class for any term/descriptor that can be applied to a functional unit of a genome (protein, ncRNA, complex).
  abstract: true
  todos:
    - decide if this should be used for product naming
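To see how a pattern like the one in the rough example behaves, here is a small Python sketch. It keeps only the KEGG branch, because the EC branch ends in a placeholder (`ETC`). `ROUGH`, `STRICT`, and `conforms` are hypothetical names, and escaping the dot in `STRICT` is an assumed tightening, not something proposed in the thread.

```python
import re

# The KEGG branch of the rough pattern, as it reads after YAML unescaping.
ROUGH = re.compile(r"^(KEGG.ORTHOLOG:K\d+)$")
# Assumed tightening: escape the dot so it matches only a literal ".".
STRICT = re.compile(r"^(KEGG\.ORTHOLOG:K\d+)$")


def conforms(curie, pattern):
    """Return True if the whole CURIE matches the given compiled pattern."""
    return pattern.match(curie) is not None
```

Because an unescaped `.` matches any character, `ROUGH` also accepts a string such as `KEGGxORTHOLOG:K00001`, while `STRICT` rejects it. Both reject a local part with no `K` or no digits.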
Was microbiomedata/nmdc-metadata issue 360
I will be adding local part patterns to the yaml files in this repo.
@ssarrafan @wdduncan @cmungall
See notes from @cmungall at PR #70, especially
I suggested the parens to indicate that we need other IDs, e.g. (FOO|BAR|...)
@wdduncan and @cmungall: I can't find patterns for COG or RetroRules at the Bioregistry, or sample usages in the MongoDB.
example working query:
> db.raw.functional_annotation_set.find({"has_function": {"$regex": "^pfam", $options: 'i'}})
{ "_id" : ObjectId("6011a09275ead576bdc24c02"), "subject" : "nmdc:Ga0482148_260452_3_287", "has_function" : "PFAM:PF00001", "was_generated_by" : "nmdc:8a43ec3baf8aafe09d96eb7fbf58c916" }
{ "_id" : ObjectId("6011a0d2666867f660864500"), "subject" : "nmdc:Ga0482235_197390_1_279", "has_function" : "PFAM:PF00001", "was_generated_by" : "nmdc:e763e255fa74e2629d7d86e10f838d4b" }
{ "_id" : ObjectId("6011a1113350938c11bd6527"), "subject" : "nmdc:Ga0482263_74753_2_277", "has_function" : "PFAM:PF00001", "was_generated_by" : "nmdc:686818cb31dc45d3d4482847ec007584" }
But neither of these returns any matches:
db.raw.functional_annotation_set.find({"has_function": {"$regex": "^cog", $options: 'i'}})
db.raw.functional_annotation_set.find({"has_function": {"$regex": "^retrorules:", $options: 'i'}})
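For readers without access to the database, the matching behavior of these queries can be approximated in plain Python. This is only a sketch of the semantics (an unanchored, case-insensitive regex search on `has_function`). `SAMPLE_DOCS` holds just the three records shown above, and `find_by_function` is a hypothetical helper, so the empty results here say nothing about the actual collection.

```python
import re

# The three records returned by the PFAM query above (fields trimmed for brevity).
SAMPLE_DOCS = [
    {"subject": "nmdc:Ga0482148_260452_3_287", "has_function": "PFAM:PF00001"},
    {"subject": "nmdc:Ga0482235_197390_1_279", "has_function": "PFAM:PF00001"},
    {"subject": "nmdc:Ga0482263_74753_2_277", "has_function": "PFAM:PF00001"},
]


def find_by_function(docs, regex):
    """Mimic {"has_function": {"$regex": regex, "$options": "i"}} on a list of dicts."""
    pattern = re.compile(regex, re.IGNORECASE)
    # re.search is used because $regex is unanchored unless the pattern itself uses ^ or $.
    return [doc for doc in docs if pattern.search(doc["has_function"])]
```

Calling `find_by_function(SAMPLE_DOCS, "^pfam")` returns all three sample records, while `"^cog"` and `"^retrorules:"` return empty lists for this sample.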
@ssarrafan do you have a sense of who raised this concern? Can I close it?
This doesn't seem to be specifically about checking the contents of a JSON file against the JSON Schema serialization of the schema.
Is the concern specifically about validating KEGG-related CURIEs?
@ssarrafan do you have a sense of who raised this concern? Can I close it?
This is from 2021 so I don't remember which meeting this came from. I would say it can probably be closed.