Comments (12)
To elaborate on the previous comment: I temporarily restored two backups into a Mongo server and @aclum then queried those databases to get the information she was interested in. I'm going to delete those temporary restorations now.
from nmdc-schema.
The plan is to discuss this at the infrastructure sync today. If no schema development is needed, I'll convert this to an nmdc-runtime issue.
from nmdc-schema.
I want to emphasize that the Mongo docs say that the timestamp embedded in the ObjectId indicates the creation time of the ObjectId. I think the author of that converter may be making a "logical leap" by assuming it also indicates the creation time of the [rest of the] document, itself. That's something I haven't tested yet—I just want to reiterate that distinction here.
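To make the distinction concrete, here is a minimal sketch of what "the timestamp embedded in the ObjectId" means. It relies only on the documented ObjectId layout (the first 4 of its 12 bytes are seconds since the Unix epoch); the function name `objectid_timestamp` is made up for illustration, and the sample id is the one used elsewhere in this thread. It says nothing about when the rest of the document was written.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the UTC time encoded in the first 4 bytes of an ObjectId.

    `oid_hex` is the 24-character hex form of the ObjectId.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The leading 8 hex characters are a big-endian count of seconds
    # since 1970-01-01T00:00:00Z.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


decoded = objectid_timestamp("5272e0f00000000000000000")
print(decoded.isoformat())  # 2013-10-31T23:00:00+00:00
```

The decoded value is the time the id was generated, so it only equals the document's creation time if the id was generated at insert time and has been kept unchanged since.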
from nmdc-schema.
@dwinston and I can update the runtime to support analytics queries against when data was created/updated in Mongo. There are a couple of ways to go about this, one of which would not require updates in nmdc-schema.
@aclum could you:
- Give us a sense of the urgency of the request, to help us decide on the best approach?
- Close this issue and open one in the runtime repo, if we decide to move forward with a runtime approach that does not require a new slot to be added to the schema?
Approach Options
1. New `created_at` attribute/slot
One approach is to add `created_at` and `updated_at` fields for individual collections (e.g. Biosample, Study, etc.). However, this would introduce an issue with data validation if an equivalent slot is not added in the nmdc schema.
2. New ledger collection
A second approach is to create an append-only ledger of datomic entries. This would allow for a broader range of search queries and preserve update histories. However, this would add additional complexity to querying and maintenance.
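As a rough illustration of the second option, the sketch below models an append-only ledger in memory. It is not the runtime's design: the class names, the field names (`entity_id`, `attribute`, `value`, `recorded_at`), and the sample id are all hypothetical, and a real implementation would presumably live in its own Mongo collection. It only shows why such a ledger can answer both "when was this created?" and "when was it last updated?" while keeping the full history.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class LedgerEntry:
    """One immutable fact: an attribute of an entity had a value at a time."""
    entity_id: str
    attribute: str
    value: Any
    recorded_at: datetime


class Ledger:
    """Append-only list of entries; nothing is ever edited or removed."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def record(self, entity_id: str, attribute: str, value: Any,
               at: Optional[datetime] = None) -> LedgerEntry:
        # Default to the current UTC time when no timestamp is supplied.
        entry = LedgerEntry(entity_id, attribute, value,
                            at or datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, entity_id: str) -> List[LedgerEntry]:
        return [e for e in self._entries if e.entity_id == entity_id]

    def created_at(self, entity_id: str) -> Optional[datetime]:
        times = [e.recorded_at for e in self.history(entity_id)]
        return min(times) if times else None

    def updated_at(self, entity_id: str) -> Optional[datetime]:
        times = [e.recorded_at for e in self.history(entity_id)]
        return max(times) if times else None


# Hypothetical usage: one biosample whose name is set and later changed.
ledger = Ledger()
t1 = datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 3, 9, 8, 30, tzinfo=timezone.utc)
ledger.record("nmdc:bsm-example-1", "name", "first name", at=t1)
ledger.record("nmdc:bsm-example-1", "name", "revised name", at=t2)
print(ledger.created_at("nmdc:bsm-example-1"), ledger.updated_at("nmdc:bsm-example-1"))
```

The trade-off named above shows up directly: the current state of a document is no longer a single record but has to be derived from its entries.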
from nmdc-schema.
I was able to get what I needed for the quarterly report from Eric restoring some of the backups, so this isn't urgent, but I would like to see this addressed this quarter.
from nmdc-schema.
> I was able to get what I needed for the quarterly report from Eric restoring some of the backups, so this isn't urgent, but I would like to see this addressed this quarter.

Great! That should give @dwinston and me enough time to implement the more robust append-only ledger solution, pending any larger decisions from the 4/25 database discussion.
from nmdc-schema.
Good discussion. Where does this stand as a schema request?
If it is a schema request, how would the slot we're talking about relate to the `add_date` slot?
from nmdc-schema.
RE `add_date`: this currently pulls from GOLD, so it is the GOLD added date. It would be good to clarify that at some point.
from nmdc-schema.
> If it is a schema request, how would the slot we're talking about relate to the `add_date` slot?

Here's a link to the documentation for the `add_date` slot: https://microbiomedata.github.io/nmdc-schema/add_date/.
Here are ways that I think I'd want the slot's specification to change if it were going to be used in the way people are talking about here:
- Rename slot to not imply its value is a date alone (as opposed to a date-and-time)
  - I like the name `created_at` (`add_date`, to me, sounds like a function name)
- Restrict slot to only accept strings that conform to some date-and-time standard (as opposed to any string)
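To illustrate the second bullet, here is a small sketch of what "conforms to some date-and-time standard" could mean in practice. The thread does not name a standard, so this assumes ISO 8601 date-times with an explicit UTC offset (or `Z`); the function name `is_datetime_string` is hypothetical and this is not how the schema itself would enforce the constraint.

```python
from datetime import datetime


def is_datetime_string(value: str) -> bool:
    """Accept only ISO 8601 date-times that carry a UTC offset or 'Z'."""
    # Reject date-only values such as "2024-04-25" up front.
    if "T" not in value:
        return False
    # Older Python versions do not parse a trailing "Z", so normalize it.
    candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        parsed = datetime.fromisoformat(candidate)
    except ValueError:
        return False
    # A value without an offset is ambiguous, so treat it as invalid.
    return parsed.tzinfo is not None


for sample in ["2024-04-25T13:45:00Z", "2024-04-25", "last Tuesday"]:
    print(sample, is_datetime_string(sample))
```

Requiring an offset is a design choice in this sketch: it keeps values comparable across systems, which matters if the slot is meant to answer "created before or after date X" queries.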
from nmdc-schema.
Also consider pulling from Mongo ObjectID which encodes the timestamp.
from nmdc-schema.
I just learned about that option within the past couple days (never knew that)! There is one caveat that I think exists with that option: based on what I read, the timestamp encoded in the ObjectId indicates when the ObjectId was created, not when "the [rest of the] document" was created. So, if we were to restore from a backup and not use the `--preserveUUID` flag when doing so, I think the ObjectIds would all describe the restoration time, not the original creation time. Note: I haven't confirmed that suspicion through testing yet; it's just something that came to mind when I was reading about the fact that the ObjectId contains a timestamp.
from nmdc-schema.
i.e., https://steveridout.com/mongo-object-time/
Closing for now. We will use the `db.comments.find({_id: {$gt: ObjectId("5272e0f00000000000000000")}})` syntax for now, where `5272e0f00000000000000000` encodes the target date.
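For anyone who would rather not use the web converter, the boundary id in that query can be built by hand. This is a minimal sketch under the documented ObjectId layout (4-byte seconds-since-epoch prefix, then 8 more bytes); `objectid_floor` is a made-up helper name, and the collection name `comments` is simply copied from the query above.

```python
from datetime import datetime, timezone


def objectid_floor(moment: datetime) -> str:
    """Return the lowest ObjectId hex string whose timestamp is `moment`."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid ambiguity")
    seconds = int(moment.timestamp())
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("timestamp does not fit in the 4-byte ObjectId prefix")
    # 8 hex characters of timestamp, then 16 zeros for the remaining 8 bytes.
    return f"{seconds:08x}" + "0" * 16


boundary = objectid_floor(datetime(2013, 10, 31, 23, 0, tzinfo=timezone.utc))
query = 'db.comments.find({_id: {$gt: ObjectId("' + boundary + '")}})'
print(query)
```

With the date shown, `boundary` comes out as the same `5272e0f00000000000000000` value used in the query above, so that id corresponds to 2013-10-31 23:00 UTC.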
from nmdc-schema.