
Comments (11)

KarlLevik commented on August 23, 2024

There are a lot of foreign keys (implicitly) proposed here. My first thought is that it might not be the optimal schema design.

Also, we have a DataCollectionComment table which could be extended to capture this information, although it was intended for user comments.

from ispyb-database-modeling.

antolinos commented on August 23, 2024

Hi @KarlLevik ,

The idea of this table is to centralize all notifications. For example, it allows one to check all the errors produced by data analysis software on a session, sample, or data collection.
It is similar to what Stu proposes here, but uses the same table for all notifications.
#10

Let's say it is a shortcut to get an overview of what is happening. The idea is not to retrieve information quickly but exhaustively.


stufisher commented on August 23, 2024

I would be interested to know exactly what type of notifications you add at the moment. Do you have some examples?

I agree with Karl: there are many foreign keys here that are implied by other foreign keys.
sessionid, sampleid, and datacollectiongroupid are all implied by datacollectionid, energyscanId, ... and should therefore be removed. You do not need both workflowid and workflowstepid; the step must imply a workflow.

RobotAction already reports errors (that is one of its primary uses), so maybe it doesn't belong in here.
AutoProcStatus already tells us about integration steps and their outcomes (if we populate them; DLS do not at the moment, maybe ESRF do?).
MXMRRun has a status column and an associated comment (dimple).
PhasingProgramRun has status and comments columns.
ScreeningOutput has status flags and associated comments.

We should avoid duplicating existing reporting.
These are just statuses and messages; of course, you may be doing more detailed reporting than that (hence my asking for examples).

We have lots of error reporting distributed across tables, and writing the right query could bring it together.
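
A minimal sketch of what such a query could look like, using Python's built-in sqlite3 so it runs anywhere. The two tables are heavily simplified stand-ins, and their column names and status values are assumptions for illustration, not the real ISPyB schema. The view `ReportedProblems` is likewise a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins; column names and status values are assumptions.
CREATE TABLE RobotAction (robotActionId INTEGER PRIMARY KEY, status TEXT, message TEXT);
CREATE TABLE MXMRRun (mxMRRunId INTEGER PRIMARY KEY, status TEXT, message TEXT);

INSERT INTO RobotAction VALUES (1, 'ERROR', 'Gripper timeout');
INSERT INTO RobotAction VALUES (2, 'SUCCESS', NULL);
INSERT INTO MXMRRun VALUES (1, 'FAILED', 'No solution found');

-- One view that gathers non-successful rows from each reporting table.
CREATE VIEW ReportedProblems AS
    SELECT 'RobotAction' AS source, robotActionId AS sourceId, status, message
    FROM RobotAction WHERE status <> 'SUCCESS'
    UNION ALL
    SELECT 'MXMRRun', mxMRRunId, status, message
    FROM MXMRRun WHERE status <> 'SUCCESS';
""")

problems = conn.execute(
    "SELECT source, sourceId, status, message "
    "FROM ReportedProblems ORDER BY source, sourceId"
).fetchall()
for row in problems:
    print(row)
```

Each further reporting table would add one more `UNION ALL` branch, which is where the query starts to get long.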


antolinos commented on August 23, 2024


stufisher commented on August 23, 2024

A central table sounds sensible, though it is still a complex query to retrieve what this table has to say.

You should probably remove sessionid, blsampleid, datacollectiongroupid, and workflowid. These are implicit, and by having them in this table you are duplicating data (there should be only one instance of this information in the db). Instead you should join to BLSession, BLSample, ... . I'm not actually sure dataCollectionId should be in here. What are you going to report regarding the datacollection that is not mopped up by the autoproc tables? I think maybe we need to have a chat about this.

Ahh, and the obvious missing column in my opinion is a timestamp; I'd like to know when these happened.

So to summarise, if we are to merge into #10 and be complete:

notificationId int pk (auto-incremented)
level enum ('info', 'warning', 'error')
origin varchar(100) (for example "Dimple") <-- not sure this is needed; it should be implied by the foreign key used
summary varchar(255)
message text
bltimestamp timestamp
energyScanId fk
xfeFluorescenceSpectrumId fk
autoProcIntegrationId or autoProcProgramId fk (not sure which is most sensible)
phasingStepId fk
screeningId fk
mxmrrunid fk

robotActionId fk <-- still not sure about this
dataCollectionId fk (maybe remove)
workflowStepId fk (is this not covered by the auto proc tables too? I can only see a workflow link from datacollectiongroup)
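
As a rough illustration of the trimmed-down table above, here is a sketch in Python with sqlite3. It keeps only one of the foreign keys (energyScanId) for brevity, and assumes a simplified `EnergyScan` table that carries a `sessionId`. SQLite has no enum type, so a CHECK constraint stands in for it. The helper `notifications_for_session` is a hypothetical name. The point is that the session is reached through a join rather than stored again in the notification row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Simplified stand-in for the real table (assumed columns).
CREATE TABLE EnergyScan (
    energyScanId INTEGER PRIMARY KEY,
    sessionId INTEGER NOT NULL
);
CREATE TABLE Notification (
    notificationId INTEGER PRIMARY KEY AUTOINCREMENT,
    level TEXT NOT NULL CHECK (level IN ('info', 'warning', 'error')),
    summary TEXT,
    message TEXT,
    bltimestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    energyScanId INTEGER REFERENCES EnergyScan(energyScanId)
);
INSERT INTO EnergyScan VALUES (7, 42);
INSERT INTO EnergyScan VALUES (8, 43);
INSERT INTO Notification (level, summary, energyScanId)
    VALUES ('error', 'Edge not found', 7);
INSERT INTO Notification (level, summary, energyScanId)
    VALUES ('info', 'Scan complete', 8);
""")


def notifications_for_session(conn, session_id):
    """Resolve the session via EnergyScan instead of a duplicated sessionId."""
    return conn.execute(
        """
        SELECT n.level, n.summary
        FROM Notification n
        JOIN EnergyScan e ON e.energyScanId = n.energyScanId
        WHERE e.sessionId = ?
        ORDER BY n.notificationId
        """,
        (session_id,),
    ).fetchall()


session_42 = notifications_for_session(conn, 42)
print(session_42)
```

With all the foreign keys from the list in place, the same lookup would need one join branch per linked table, which is the complexity mentioned at the top of this comment.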

I'm not sure recording things like beamdumps in here is sensible, because it means that for every ongoing session / sample you will duplicate the same information (i.e. if you have 5 beamlines in operation, you have 5 entries in this table with the same information, and 5 beamlines may be 25 in due course).


KarlLevik commented on August 23, 2024

Let's say it is a shortcut to get an overview of what is happening. The idea is not to retrieve information quickly but exhaustively.

Maybe I still don't get it, but this reminds me of something we've talked about previously for Synchweb: The idea of a "wall" or "activity stream" with everything of relevance to the user, a bit like Facebook and other social media platforms. This allows users to see what is happening with their data collections and processing in "real-time". We wanted to have a special table for this, an "events" table, which sounds a lot like your notifications table. The application would then poll this table at short intervals.

We could never quite agree on this table at Diamond, so at the moment we solve this in Synchweb through complex queries and joins. The data collection list on a session is our "activity stream" page.

The idea of duplicating information for performance reasons was just too alien to some in my group, though personally I don't see any problem with it as long as we're clear on what is the authoritative data and what is just duplicated for performance reasons.

However, consider this:

  • The notifications / events data can potentially grow quite large (e.g. if every new row in ImageQualityIndicators is considered an event ...).
  • High write frequency on a table could lead to contention due to locking, especially if there are a lot of foreign keys with implicit indexes that will be updated with every insert.
  • The notifications / events data really isn't something we need to store forever, i.e. it's temporary / ephemeral data.

For these reasons I've been thinking more recently that maybe a better solution could be found outside of the relational database realm. I haven't put a lot of thought into this yet, but perhaps something like a messaging system could be a better solution? Maybe WAMP and websockets?
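
To make the contrast with a database table concrete, here is a minimal in-process sketch of the publish/subscribe idea in Python. It is not WAMP and not a real broker. The class `EventStream`, the topic names, and the event fields are all hypothetical. It only illustrates two properties discussed above: subscribers are pushed events instead of polling, and only a bounded number of recent events is retained.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class EventStream:
    """In-process stand-in for a message broker (hypothetical, not WAMP)."""

    def __init__(self, max_events: int = 100) -> None:
        # Only the newest max_events are kept, so the data stays ephemeral.
        self._recent: Deque[Event] = deque(maxlen=max_events)
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, event: Event) -> None:
        self._recent.append({"topic": topic, **event})
        # Push to subscribers of this topic; nobody has to poll.
        for callback in self._subscribers.get(topic, []):
            callback(event)

    def recent(self) -> List[Event]:
        return list(self._recent)


stream = EventStream(max_events=2)
received: List[Event] = []
stream.subscribe("session/42", received.append)

stream.publish("session/42", {"level": "info", "summary": "Data collection started"})
stream.publish("session/42", {"level": "error", "summary": "Integration failed"})
stream.publish("session/43", {"level": "info", "summary": "Sample mounted"})

print(len(received))          # events delivered to the session/42 subscriber
print(len(stream.recent()))   # events still retained in the buffer
```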


antolinos commented on August 23, 2024

Hi @KarlLevik

Back to your considerations:

The notifications / events data can potentially grow quite large (e.g. if every new row in ImageQualityIndicators is considered an event ...).

It really depends on the facility and how you fill that table. We don't expect this to be very large. We don't want to store every event, only the events that we want to notify the user about.

High write frequency on a table could lead to contention due to locking, especially if there are a lot of foreign keys with implicit indexes that will be updated with every insert.

As we are using this inside a more complex transaction that will most likely block many tables, I don't see this as a showstopper. But if we run into problems in the future, it is true that this should be reconsidered.

The notifications / events data really isn't something we need to store forever, i.e. it's temporary / ephemeral data.

Again, in my opinion this is like point one. We are discussing the data model here; a facility might want to keep the data forever or to remove it every few days.

I am not against the use of a messaging system or WAMP; however, I think we can start with something simpler for the moment.

So, what do you think?


stufisher commented on August 23, 2024

I have been thinking about this, and I think we are looking at this totally the wrong way round. The output we are talking about here always comes from a process. Therefore we should be using a table linked to AutoProcProgram. I had not dug in depth into the Phasing tables, but PhasingProgram is a 1:1 copy of AutoProcProgram, so why doesn't phasing use this instead? We should add a foreign key reference from all tables that relate to a process back to AutoProcProgram, and then create an AutoProcProgramMessage table, something like:

AutoProcProgramMessage

autoprocprogrammessageid int pk ai
autoprocprogramid int fk
timestamp timestamp
severity int (0-2)
message varchar(200)
description text
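
A sketch of how that table could be declared and queried, again in Python with sqlite3 so it is runnable as-is. The `AutoProcProgram` table is reduced to a hypothetical `programName` column. Treating severity 2 as the most severe level is an assumption, since the listing only gives the range 0-2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE AutoProcProgram (
    autoProcProgramId INTEGER PRIMARY KEY,
    programName TEXT              -- hypothetical column for illustration
);
CREATE TABLE AutoProcProgramMessage (
    autoProcProgramMessageId INTEGER PRIMARY KEY AUTOINCREMENT,
    autoProcProgramId INTEGER NOT NULL
        REFERENCES AutoProcProgram(autoProcProgramId),
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    severity INTEGER NOT NULL CHECK (severity BETWEEN 0 AND 2),
    message TEXT,
    description TEXT
);
INSERT INTO AutoProcProgram VALUES (1, 'dimple');
INSERT INTO AutoProcProgramMessage (autoProcProgramId, severity, message)
    VALUES (1, 0, 'Started');
INSERT INTO AutoProcProgramMessage (autoProcProgramId, severity, message)
    VALUES (1, 2, 'No MTZ file found');
""")

# Assumption: severity 2 is the most severe level (errors).
errors = conn.execute(
    """
    SELECT p.programName, m.message
    FROM AutoProcProgramMessage m
    JOIN AutoProcProgram p USING (autoProcProgramId)
    WHERE m.severity = 2
    ORDER BY m.autoProcProgramMessageId
    """
).fetchall()
print(errors)
```

Because every message hangs off a single autoProcProgramId, the table needs only one foreign key, in contrast to the many optional keys in the earlier notification proposal.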


antolinos commented on August 23, 2024

The original idea was that a notification does not necessarily come from a process. It might come from the beamline control module, the user portal, users, the local contact, etc.


stufisher commented on August 23, 2024

Ahh, that is beyond @olofsvensson's original scope.

What sort of things do you want to record, and against what? That last list looks like things that would be reported against a session?

There is DataCollectionComment, where local contacts / users can write extended comments against a datacollection.

I'm not sure what you'd expect from your user portal? I think UAS is unlikely to put any notifications into ispyb.

I'm not trying to be obstructive, just trying to understand exactly what you are trying to do. Still to discuss, I'm afraid. I think the answer to @olofsvensson's original question is as per above; as for other notifications, I think I need some examples to understand.


delageniere commented on August 23, 2024

As the notifications are stored now in another log book for the beamlines, this issue can be closed.

