For some classes, like workflow activity records we have information about when data w


support slot for when document was added about nmdc-schema HOT 12 CLOSED

aclum commented on July 1, 2024

support slot for when document was added

from nmdc-schema.

Comments (12)

eecavanna commented on July 1, 2024

To elaborate on the previous comment: I temporarily restored two backups into a Mongo server and @aclum then queried those databases to get the information she was interested in. I'm going to delete those temporary restorations now.

aclum commented on July 1, 2024

The plan is to discuss this at the infrastructure sync today. If no schema development is needed, I'll convert this to an nmdc-runtime issue.

eecavanna commented on July 1, 2024

I want to emphasize that the Mongo docs say that the timestamp embedded in the ObjectId indicates the creation time of the ObjectId. I think the author of that converter may be making a "logical leap" by assuming it also indicates the creation time of the [rest of the] document, itself. That's something I haven't tested yet—I just want to reiterate that distinction here.

PeopleMakeCulture commented on July 1, 2024

@dwinston and I can update the runtime to support analytics queries against when data was created/updated in Mongo. There are a couple of ways to go about this, one of which would not require updates in nmdc-schema.

@aclum could you:

- Give a sense of the urgency of the request, to help us decide on the best approach?
- If we decide to move forward with a runtime approach that does not require a new slot to be added to the schema, close this issue and open one in the runtime repo?

Approach Options

1. New `created_at` attribute/slot

One approach is to add `created_at` and `updated_at` fields for individual collections (e.g., Biosample, Study, etc.). However, this would introduce a data validation issue unless an equivalent slot is added to the nmdc schema.

2. New `ledger` collection

A second approach is to create an append-only ledger of datomic entries. This would allow for a broader range of search queries and preserve update histories. However, this would add complexity to querying and maintenance.
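
To make the ledger idea concrete, here is a minimal in-memory sketch. It is not the runtime's design: the `Ledger` class, its method names, and the entry fields (`collection`, `docId`, `op`, `at`) are all hypothetical, and a plain array stands in for a Mongo collection.

```javascript
// Hypothetical stand-in for a `ledger` collection: entries are only ever
// appended, never modified or removed.
class Ledger {
  #entries = [];

  // Record one event. `op` is assumed to be "insert" or "update".
  append(collection, docId, op, at = new Date()) {
    const entry = Object.freeze({ collection, docId, op, at: at.toISOString() });
    this.#entries.push(entry);
    return entry;
  }

  // All recorded events for one document, in the order they were appended.
  history(collection, docId) {
    return this.#entries.filter(
      (e) => e.collection === collection && e.docId === docId
    );
  }

  // Creation time = timestamp of the first "insert" event, if any.
  createdAt(collection, docId) {
    const first = this.history(collection, docId).find((e) => e.op === "insert");
    return first ? first.at : null;
  }
}

const ledger = new Ledger();
ledger.append("biosample_set", "doc-1", "insert", new Date("2024-04-01T00:00:00Z"));
ledger.append("biosample_set", "doc-1", "update", new Date("2024-06-15T00:00:00Z"));
console.log(ledger.createdAt("biosample_set", "doc-1"));
console.log(ledger.history("biosample_set", "doc-1").length);
```

Because the creation time lives in the ledger rather than in the document, the documents themselves would not need a new slot, which is the trade-off described above.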

aclum commented on July 1, 2024

I was able to get what I needed for the quarterly report from Eric restoring some of the backups so this isn't urgent but I would like to see this addressed this quarter.

PeopleMakeCulture commented on July 1, 2024

> I was able to get what I needed for the quarterly report from Eric restoring some of the backups so this isn't urgent but I would like to see this addressed this quarter.

Great! That should give @dwinston and me enough time to implement the more robust append-only ledger solution, pending any larger decisions from the 4/25 database discussion.

turbomam commented on July 1, 2024

Good discussion. Where does this stand as a schema request?

If it is a schema request, how would the slot we're talking about relate to the add_date slot?

aclum commented on July 1, 2024

Re: add_date, this currently pulls from GOLD, so it is the GOLD added date. It would be good to clarify that at some point.

eecavanna commented on July 1, 2024

> If it is a schema request, how would the slot we're talking about relate to the add_date slot?

Here's a link to the documentation for the add_date slot: https://microbiomedata.github.io/nmdc-schema/add_date/.

Here are ways that I think I'd want the slot's specification to change if it were going to be used in the way people are talking about here:

- Rename slot to not imply its value is a date alone (as opposed to a date-and-time)
  - I like the name created_at (add_date, to me, sounds like a function name)
- Restrict slot to only accept strings that conform to some date-and-time standard (as opposed to any string)
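
As an illustration of the second point, here is a rough sketch of what "conforms to a date-and-time standard" could mean in practice. It assumes the standard would be an ISO 8601 / RFC 3339 style date-time with an explicit UTC offset; the thread does not pick a standard, and the `isDateTime` helper is hypothetical rather than part of nmdc-schema.

```javascript
// Shape check: YYYY-MM-DDTHH:MM:SS, optional fractional seconds,
// then either "Z" or a +HH:MM / -HH:MM offset.
const DATE_TIME_PATTERN =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isDateTime(value) {
  return (
    typeof value === "string" &&
    DATE_TIME_PATTERN.test(value) &&
    // Loose sanity check: rejects out-of-range fields such as month 13.
    !Number.isNaN(Date.parse(value))
  );
}

console.log(isDateTime("2024-07-01T17:30:00Z"));      // a full date-and-time
console.log(isDateTime("2024-07-01T17:30:00-07:00")); // with an explicit offset
console.log(isDateTime("2024-07-01"));                // a date alone
console.log(isDateTime("July 1, 2024"));              // free-form string
```

A date alone and a free-form string both fail this check, which is the distinction the list above is asking the slot to enforce.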

shreddd commented on July 1, 2024

Also consider pulling from Mongo ObjectID which encodes the timestamp.

eecavanna commented on July 1, 2024

I just learned about that option within the past couple days (never knew that)! There is one caveat that I think exists with that option: based on what I read, the timestamp encoded in the ObjectId indicates when the ObjectId was created, not when "the [rest of the] document" was created. So, if we were to restore from a backup and not use the --preserveUUID flag when doing so, I think the ObjectIds would all describe the restoration time, not the original creation time. Note: I haven't confirmed that suspicion through testing yet—it's just something that came to mind when I was reading about the fact that the ObjectId contains a timestamp.
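
For reference, the embedded timestamp is the first 4 bytes (8 hex characters) of the ObjectId, interpreted as seconds since the Unix epoch. In the mongo shell, `ObjectId("...").getTimestamp()` returns it directly; the sketch below does the same decoding in plain JavaScript without a driver. The function name `objectIdToDate` is just for illustration.

```javascript
// Decode the timestamp embedded in a MongoDB ObjectId hex string.
// Note: this is the time the ObjectId was generated, which is the
// distinction raised in the comment above.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("5272e0f00000000000000000").toISOString());
// prints "2013-10-31T23:00:00.000Z"
```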

aclum commented on July 1, 2024

i.e., https://steveridout.com/mongo-object-time/
Closing for now. I will use the following syntax for the time being, where 5272e0f00000000000000000 encodes the target date:
db.comments.find({_id: {$gt: ObjectId("5272e0f00000000000000000")}})
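
A boundary value like the one in that query can be computed instead of looked up on the converter page. The sketch below is plain JavaScript with a hypothetical helper name, `dateToObjectIdHex`: it writes the target time as 8 hex characters of epoch seconds and pads the remaining 16 characters with zeros.

```javascript
// Build the smallest ObjectId hex string whose embedded timestamp
// equals the given date (truncated to whole seconds).
function dateToObjectIdHex(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  if (!Number.isFinite(seconds) || seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError("date is outside the 4-byte timestamp range");
  }
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const boundary = dateToObjectIdHex(new Date("2013-10-31T23:00:00Z"));
console.log(boundary); // prints "5272e0f00000000000000000"

// In the mongo shell, the result plugs into the same query shape:
//   db.comments.find({ _id: { $gt: ObjectId(boundary) } })
```

This only reflects when each ObjectId was generated, so the caveat raised earlier in the thread still applies.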
