Comments (12)

eecavanna commented on July 1, 2024

To elaborate on the previous comment: I temporarily restored two backups into a Mongo server and @aclum then queried those databases to get the information she was interested in. I'm going to delete those temporary restorations now.

from nmdc-schema.

aclum commented on July 1, 2024

The plan is to discuss this at the infrastructure sync today. If no schema development is needed, I'll convert this to an nmdc-runtime issue.

eecavanna commented on July 1, 2024

I want to emphasize that the Mongo docs say the timestamp embedded in the ObjectId indicates the creation time of the ObjectId. I think the author of that converter may be making a "logical leap" by assuming it also indicates the creation time of the [rest of the] document itself. I haven't tested that yet; I just want to reiterate the distinction here.
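
As a rough illustration of that distinction: the first 4 bytes (8 hex characters) of an ObjectId hold a count of seconds since the Unix epoch, and that is all the embedded timestamp can tell you. The sketch below decodes it in plain JavaScript without any driver. The helper name `objectIdToDate` is hypothetical, and the sample id is the one quoted elsewhere in this thread.

```javascript
// Hypothetical helper: decode the timestamp embedded in an ObjectId hex string.
// This reports when the ObjectId was generated, which is not necessarily
// when the rest of the document was written.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The leading 8 hex characters are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("5272e0f00000000000000000").toISOString());
```

In the mongo shell, `ObjectId(...).getTimestamp()` returns the same value.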

PeopleMakeCulture commented on July 1, 2024

@dwinston and I can update the runtime to support analytics queries against when data was created/updated in Mongo. There are a couple of ways to go about this, one of which would not require updates in nmdc-schema.

@aclum could you:

  1. Give a sense of the urgency of the request, to help us decide on the best approach?
  2. If we decide to move forward with a runtime approach that does not require a new slot to be added to the schema, close this issue and open one in the runtime repo?

Approach Options

1. New created_at attribute/slot

One approach is to add created_at and updated_at fields for individual collections (e.g., Biosample, Study, etc.). However, this would introduce a data validation issue if an equivalent slot is not added to the nmdc schema.

2. New ledger collection

A second approach is to create an append-only ledger of datomic entries. This would allow for a broader range of search queries and preserve update histories. However, this would add complexity to querying and maintenance.
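
To make the ledger idea a little more concrete, here is a minimal in-memory sketch, not the runtime's actual design. The class name `Ledger`, the entry fields (`collection`, `docId`, `op`, `at`), and the method names are all assumptions made for illustration. A real implementation would presumably write to a Mongo collection instead of an array.

```javascript
// Hypothetical in-memory stand-in for an append-only ledger collection.
class Ledger {
  constructor() {
    this.entries = [];
  }

  // Append one entry; existing entries are never modified or removed.
  append(collection, docId, op, at = new Date()) {
    const entry = Object.freeze({ collection, docId, op, at: at.toISOString() });
    this.entries.push(entry);
    return entry;
  }

  // Full update history for one document, in the order it was recorded.
  history(collection, docId) {
    return this.entries.filter(
      (e) => e.collection === collection && e.docId === docId
    );
  }

  // Ids of documents inserted in the half-open interval [start, end).
  createdBetween(collection, start, end) {
    const lo = start.toISOString();
    const hi = end.toISOString();
    return this.entries
      .filter((e) => e.collection === collection && e.op === "insert")
      .filter((e) => e.at >= lo && e.at < hi) // same-format ISO strings sort chronologically
      .map((e) => e.docId);
  }
}

const ledger = new Ledger();
ledger.append("biosample_set", "bsm-1", "insert", new Date("2024-01-15T00:00:00Z"));
ledger.append("biosample_set", "bsm-1", "update", new Date("2024-02-01T00:00:00Z"));
ledger.append("biosample_set", "bsm-2", "insert", new Date("2024-04-10T00:00:00Z"));

console.log(
  ledger.createdBetween(
    "biosample_set",
    new Date("2024-01-01T00:00:00Z"),
    new Date("2024-04-01T00:00:00Z")
  )
);
```

The trade-off mentioned above shows up even here: answering "when was this created?" means scanning entries rather than reading one field from the document.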

aclum commented on July 1, 2024

I was able to get what I needed for the quarterly report because Eric restored some of the backups, so this isn't urgent, but I would like to see it addressed this quarter.

PeopleMakeCulture commented on July 1, 2024

> I was able to get what I needed for the quarterly report from Eric restoring some of the backups so this isn't urgent but I would like to see this addressed this quarter.

Great! That should give @dwinston and me enough time to implement the more robust append-only ledger solution, pending any larger decisions from the 4/25 database discussion.

turbomam commented on July 1, 2024

Good discussion. Where does this stand as a schema request?

If it is a schema request, how would the slot we're talking about relate to the add_date slot?

aclum commented on July 1, 2024

Re: add_date, this currently pulls from GOLD, so it is the GOLD added date. It would be good to clarify that at some point.

eecavanna commented on July 1, 2024

> If it is a schema request, how would the slot we're talking about relate to the add_date slot?

Here's a link to the documentation for the add_date slot: https://microbiomedata.github.io/nmdc-schema/add_date/.

Here are ways that I think I'd want the slot's specification to change if it were going to be used in the way people are talking about here:

  • Rename slot to not imply its value is a date alone (as opposed to a date-and-time)
    • I like the name created_at (add_date, to me, sounds like a function name)
  • Restrict slot to only accept strings that conform to some date-and-time standard (as opposed to any string)
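
As a sketch of what the second restriction could look like in practice, the check below accepts only strings shaped like an RFC 3339 / ISO 8601 date-and-time with an explicit UTC offset. The choice of that standard, the regular expression, and the function name `isDateTimeString` are assumptions for illustration; the thread does not settle on a standard, and in the schema itself this would more likely be expressed as a slot range or pattern.

```javascript
// Hypothetical validator for a created_at-style value.
// Shape: YYYY-MM-DDThh:mm:ss[.fraction](Z|±hh:mm)
const DATE_TIME_PATTERN =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isDateTimeString(value) {
  if (typeof value !== "string") return false;
  if (!DATE_TIME_PATTERN.test(value)) return false;
  // Also reject strings the JavaScript engine cannot parse as a date.
  return !Number.isNaN(Date.parse(value));
}

console.log(isDateTimeString("2024-07-01T12:30:00Z")); // date-and-time with offset
console.log(isDateTimeString("2024-07-01"));           // date alone
```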

shreddd commented on July 1, 2024

Also consider pulling from the Mongo ObjectId, which encodes the timestamp.

eecavanna commented on July 1, 2024

I just learned about that option within the past couple of days (never knew that)! There is one caveat that I think exists with that option: based on what I read, the timestamp encoded in the ObjectId indicates when the ObjectId was created, not when "the [rest of the] document" was created. So, if we were to restore from a backup and not use the --preserveUUID flag when doing so, I think the ObjectIds would all describe the restoration time, not the original creation time. Note: I haven't confirmed that suspicion through testing yet. It's just something that came to mind when I was reading about the fact that the ObjectId contains a timestamp.

aclum commented on July 1, 2024

See, e.g., https://steveridout.com/mongo-object-time/
Closing for now. I will use the following syntax for the time being, where 5272e0f00000000000000000 encodes the target date:
db.comments.find({_id: {$gt: ObjectId("5272e0f00000000000000000")}})
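
For reference, a boundary value like the one in that query can be derived from a date by writing the Unix timestamp (in seconds) as 8 hex characters and padding with 16 zeros. The sketch below does this in plain JavaScript. The function name `objectIdHexFromDate` is hypothetical, and the result only bounds ObjectId generation time, subject to the caveat raised earlier in the thread.

```javascript
// Hypothetical helper: build the smallest ObjectId hex string for a given instant.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // 4-byte timestamp as 8 hex chars, followed by 8 zero bytes (16 hex chars).
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const boundary = objectIdHexFromDate(new Date("2013-10-31T23:00:00Z"));
console.log(boundary);
```

The resulting string can be passed to ObjectId(...) in the mongo shell, as in the query above.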
