EM Data Model (ispyb-database-modeling, open issue, 11 comments)

ispyb commented on August 23, 2024
EM Data Model

from ispyb-database-modeling.

Comments (11)

stufisher commented on August 23, 2024

@antolinos I got some clarification on the per-movie nominal defocus we were discussing. There is a value recorded with each movie, but it is a total guess; it is determined properly by the CTF correction. So it is debatable whether we should store it (it is apparently captured in the xml file).

antolinos commented on August 23, 2024

Hi @stufisher,

Thanks. We are starting with Scipion and the ISPyB monitors, gathering all the metadata that will be pushed into ISPyB later on.
My feeling today is that some parameters will need to be stored per movie.
As soon as we have a clear and clean data flow, we will share it with you.

olofsvensson commented on August 23, 2024

Hi @stufisher and @antolinos,

I have been examining the files produced by our CryoEM, and after a discussion with Isai, here is a suggested list of metadata we would like to start uploading to ISPyB after each movie acquisition (i.e. before motion correction):

Common meta-data to all movies:

  • Path to the directory of the movies
  • Paths to the GridSquare JPG, MRC and XML files
  • NumberOfFractions (i.e. number of frames per movie)
  • Pixelsize
  • Counting or Super resolution mode

Individual movie meta-data:

  • Filename of the movie file
  • Identifier of the foilhole
  • Date and time of acquisition
  • Sequential index of movie
  • Paths to movie meta-data JPG, MRC and XML files
  • Dose per movie
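To make the split between collection-level and per-movie metadata concrete, the two lists above can be sketched as two record types. This is a minimal sketch: the class and field names are hypothetical and are not ISPyB column names, and the example values come from the movie file name quoted later in this thread.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class SessionMetadata:
    """Metadata common to all movies (hypothetical field names)."""
    movie_directory: str
    grid_square_files: Tuple[str, ...]  # GridSquare JPG, MRC and XML paths
    number_of_fractions: int            # frames per movie
    pixel_size: float                   # unit not stated in the thread
    super_resolution: bool              # False means counting mode


@dataclass(frozen=True)
class MovieMetadata:
    """Metadata recorded for each individual movie (hypothetical field names)."""
    file_name: str
    foil_hole_id: int
    acquired_at: datetime
    sequence_index: int
    metadata_files: Tuple[str, ...]     # per-movie JPG, MRC and XML paths
    dose: Optional[float] = None        # left empty in this sketch


# Example built from the movie file name discussed in this thread.
movie = MovieMetadata(
    file_name="FoilHole_19150795_Data_19148847_19148848_20170619_2101-0344.mrc",
    foil_hole_id=19150795,
    acquired_at=datetime(2017, 6, 19, 21, 1),
    sequence_index=344,
    metadata_files=(
        "FoilHole_19150795_Data_19148847_19148848_20170619_2101.jpg",
        "FoilHole_19150795_Data_19148847_19148848_20170619_2101.mrc",
        "FoilHole_19150795_Data_19148847_19148848_20170619_2101.xml",
    ),
)
```

Keeping the session-level record separate means only the per-movie record grows with the number of movies.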

This list will probably be extended in the future, however, for now it should get us going.

After a quick inspection of the suggested data model, I found that these parameters can be stored without any modification:

  • Path to the directory of the movies
  • Paths to the GridSquare JPG, MRC and XML files (via DataCollectionFileAttachment)
  • NumberOfFractions (i.e. number of frames per movie)
  • Pixelsize

However, I don't see how these parameters can be fitted into the model:

  • Counting or Super resolution mode
  • Filename of the movie file
  • Identifier of the foilhole
  • Date and time of acquisition
  • Sequential index of movie
  • Paths to movie meta-data JPG, MRC and XML files
  • Dose per movie

Maybe we need to add a new specific "movie" table?

This is just a start of discussion and not a list of requirements written in stone...

stufisher commented on August 23, 2024

I'm trying to avoid a movie table if at all possible, as it sends us down the same hole as the Image table for MX. We should really avoid saving full paths to jpg, mrc and xml files. The Image table does not scale well at all, which is why we have abandoned it at DLS (we can assume that, long term, EM will scale like MX has, so we should think carefully about this now!). We should be able to construct the per-movie jpg, mrc and xml paths from other variables, as is done for images in MX (DC.fileprefix, DC.imagedirectory, GridImageMap.<some sequential number>).
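The idea of rebuilding per-movie paths from collection-level fields, rather than storing them, can be sketched as follows. The "<prefix>_<4-digit number>.<ext>" layout and the example directory are hypothetical; they mirror the MX template approach and are not a naming scheme defined by ISPyB for EM.

```python
import posixpath


def movie_file_path(image_directory: str, file_prefix: str,
                    sequence_number: int, extension: str) -> str:
    """Derive a per-movie file path from collection-level fields.

    The "<prefix>_<4-digit number>.<ext>" layout is a hypothetical
    template in the spirit of MX image templates.
    """
    file_name = f"{file_prefix}_{sequence_number:04d}.{extension}"
    return posixpath.join(image_directory, file_name)


# Each movie row only needs its sequence number; the paths are rebuilt on demand.
paths = [movie_file_path("/data/em/session1", "movie", 344, ext)
         for ext in ("jpg", "mrc", "xml")]
```

The per-movie cost then stays at one integer, whatever the length of the directory and prefix.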

Dose per movie is an interesting one: we know the total dose of the whole experiment, so can't we just divide through, or is each movie really unique? Do people actually care, given that this will be calculated properly in MotionCorr afterwards?
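The "just divide through" option amounts to the arithmetic below. It is a sketch that assumes every movie, and every frame within a movie, received the same dose, which is exactly the assumption being questioned; the function names are hypothetical.

```python
def dose_per_movie(total_dose: float, movie_count: int) -> float:
    """Split the total experiment dose evenly across movies (uniform-dose assumption)."""
    if movie_count <= 0:
        raise ValueError("movie_count must be positive")
    return total_dose / movie_count


def dose_per_frame(movie_dose: float, number_of_fractions: int) -> float:
    """Spread one movie's dose evenly over its frames (fractions)."""
    if number_of_fractions <= 0:
        raise ValueError("number_of_fractions must be positive")
    return movie_dose / number_of_fractions
```

If movies do differ, this derived value would diverge from the per-frame dose that motion correction reports.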

The sequential index of the movie is stored in GridImageMap, and I will add a timestamp there too, as it is required here as well.

olofsvensson commented on August 23, 2024

Hi @stufisher,
I agree that we should think carefully about this now so that we don't end up in the same situation as with the Image table. The situation is, though, not quite the same:

The file name from an SR data collection can be found via a template and an image number. This is not true for a Cryo-EM movie file name, e.g. FoilHole_19150795_Data_19148847_19148848_20170619_2101-0344.mrc. For each movie, many parts of the filename change:

  • "FoilHole_": This prefix is always the same for a whole grid square
  • "19150795": This seems to me to be the identifier of the foilhole, as this number is the same for four (or more) consecutive movies.
  • "19148847_19148848": These numbers seem to identify the foilhole, as these numbers are repeated for movies taken in different foilholes.
  • "20170619_2101": Date and time of the acquisition
  • "0344": a sequential index which is increased by one for each new movie.
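The breakdown above can be checked with a small parser. The regular expression is inferred from the single example file name in this thread and is not a documented naming specification of the acquisition software; the returned keys are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from one example; other software versions may differ.
MOVIE_NAME = re.compile(
    r"^FoilHole_(?P<foil_hole>\d+)_Data_(?P<data_a>\d+)_(?P<data_b>\d+)_"
    r"(?P<date>\d{8})_(?P<time>\d{4})"
    r"(?:-(?P<index>\d+))?"          # sequential index, present on the movie only
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_movie_name(name: str) -> dict:
    """Split a FoilHole file name into the parts listed above."""
    match = MOVIE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name}")
    parts = match.groupdict()
    return {
        "foil_hole": int(parts["foil_hole"]),
        "data_ids": (int(parts["data_a"]), int(parts["data_b"])),
        "acquired_at": datetime.strptime(parts["date"] + parts["time"], "%Y%m%d%H%M"),
        "index": int(parts["index"]) if parts["index"] is not None else None,
        "extension": parts["ext"],
    }


parsed_movie = parse_movie_name("FoilHole_19150795_Data_19148847_19148848_20170619_2101-0344.mrc")
parsed_xml = parse_movie_name("FoilHole_19150795_Data_19148847_19148848_20170619_2101.xml")
```

Every component except the extension is numeric, so it could in principle be stored in integer columns instead of a free-text path.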

We can find the date, time and sequential index from GridImageMap, but where will we find the other foilhole identifiers? You could argue that we don't need them since we have a unique sequential index; however, this is not true for the corresponding mrc, jpg and xml files:

  • FoilHole_19150795_Data_19148847_19148848_20170619_2101.jpg
  • FoilHole_19150795_Data_19148847_19148848_20170619_2101.mrc
  • FoilHole_19150795_Data_19148847_19148848_20170619_2101.xml
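Judging only from these three names, the metadata file paths appear derivable from the movie path by dropping the trailing "-<index>" and swapping the extension. The sketch below assumes that rule and assumes the metadata files sit in the same directory as the movie, which the thread does not state; the "/data" directory is hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Dict


def metadata_paths(movie_path: str) -> Dict[str, str]:
    """Guess the jpg/mrc/xml metadata paths that accompany a movie file."""
    path = PurePosixPath(movie_path)
    # Drop the trailing "-<index>" that only the movie file carries.
    stem = re.sub(r"-\d+$", "", path.stem)
    return {ext: str(path.with_name(f"{stem}.{ext}")) for ext in ("jpg", "mrc", "xml")}


paths = metadata_paths("/data/FoilHole_19150795_Data_19148847_19148848_20170619_2101-0344.mrc")
```

If the rule holds, only the movie path would need storing, with the other three paths recomputed when needed.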

Current data rates from one Cryo-EM instrument are (I guess) about 10-20 movies/minute, while current image rates from one SR data collection are > 1000 images/minute. So the question is whether the data rate from Cryo-EMs is going to increase significantly in the not-so-distant future.
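For a sense of table growth, the quoted rates translate into rows per day as below, assuming one row per movie or image and round-the-clock operation (both simplifying assumptions).

```python
MINUTES_PER_DAY = 24 * 60


def rows_per_day(items_per_minute: float) -> float:
    """Rows added per day if every item gets its own row (assumes 24 h operation)."""
    return items_per_minute * MINUTES_PER_DAY


em_low = rows_per_day(10)    # 14,400 movies/day
em_high = rows_per_day(20)   # 28,800 movies/day
mx = rows_per_day(1000)      # 1,440,000 images/day
```

At these rates a per-movie table grows roughly 50 to 100 times more slowly than a per-image MX table would.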

stufisher commented on August 23, 2024

We could add some other identifier fields to GridImageMap and store the corresponding numbers; these fields would then have a fixed size and be more scalable than a varchar(255).

i.e.

GridImageMap
identifier1 int
identifier2 int

I really want to keep these generic too, and not EM-specific if possible.
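A minimal SQLite sketch of the proposed fixed-size columns follows. Only identifier1 and identifier2 come from the proposal; the other column names are hypothetical, and putting the "19148847_19148848" pair into the two identifiers is just one illustrative mapping (the foil-hole number would need a home elsewhere). The second row is invented to show a shared pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE GridImageMap (
        gridImageMapId   INTEGER PRIMARY KEY,
        dataCollectionId INTEGER NOT NULL,
        imageNumber      INTEGER NOT NULL,  -- sequential index of the movie
        identifier1      INTEGER,           -- fixed-size instead of varchar(255)
        identifier2      INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO GridImageMap (dataCollectionId, imageNumber, identifier1, identifier2) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 344, 19148847, 19148848),  # from the example file name
        (1, 345, 19148847, 19148848),  # invented row sharing the same pair
    ],
)

# Movies sharing an identifier pair are found without any string matching.
rows = conn.execute(
    "SELECT imageNumber FROM GridImageMap "
    "WHERE identifier1 = ? AND identifier2 = ? ORDER BY imageNumber",
    (19148847, 19148848),
).fetchall()
```

Because the columns are generic integers, nothing in the schema is tied to the FoilHole naming convention.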

Why the FEI/Gatan(?) software can't write sane file names is beyond me...

I think we should try to assume nothing: when ISPyB was designed 10 years ago, we didn't expect MX to collect ~1000s of images a second. 1-3 kfps detectors already exist for EM (we make one).

stufisher commented on August 23, 2024

Following on from yesterday's discussion, I have now added a movie table and deprecated GridImageMap:
[Image: em_ispyb_model]

antolinos commented on August 23, 2024

Thanks. @olofsvensson and I are still working on the web services, and even though this will most likely change, I wanted to keep you updated and get your feedback.

So, this is the preliminary structure for the Movie table:
[Image: preliminary Movie table structure]

We propose to rename movieFullPath to moviePath and to add some extra fields.

antolinos commented on August 23, 2024

Hi @stufisher,

This is how it looks now:
[Image: current schema proposal]

Please have a look, as there are some changes because:

  • We wanted to store a value that was not in your schema
  • We did not include fields because we thought that they don't belong to that table or we don't know how to get them (yet)

In both cases we are not sure, so some discussion about them would be appreciated.

There are still a few parameters that belong to the data collection:

  • voltage
  • sphericalAberration
  • amplitudeContrast
  • magnification
  • scannedPixelSize

We are thinking about creating a specialized table called EMDataCollection with these values. This would avoid increasing the number of columns in DataCollection and would make ISPyB more scalable.
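One way to picture the proposed specialization is a 1:1 side table keyed on the data collection id, so EM-only columns stay out of DataCollection. This SQLite sketch is hypothetical (column types and the stripped-down DataCollection are assumptions), and the inserted values are placeholders, not real instrument settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DataCollection (
        dataCollectionId INTEGER PRIMARY KEY
    );
    -- At most one EM-specific row per data collection (1:1 via the primary key).
    CREATE TABLE EMDataCollection (
        dataCollectionId    INTEGER PRIMARY KEY
                            REFERENCES DataCollection(dataCollectionId),
        voltage             REAL,
        sphericalAberration REAL,
        amplitudeContrast   REAL,
        magnification       INTEGER,
        scannedPixelSize    REAL
    );
    """
)
conn.execute("INSERT INTO DataCollection (dataCollectionId) VALUES (1)")  # an EM collection
conn.execute("INSERT INTO DataCollection (dataCollectionId) VALUES (2)")  # e.g. an MX collection
# Placeholder values for illustration only.
conn.execute("INSERT INTO EMDataCollection VALUES (1, 300.0, 2.7, 0.1, 100000, 1.0)")

# Non-EM collections simply have no EMDataCollection row.
rows = conn.execute(
    "SELECT dc.dataCollectionId, em.voltage "
    "FROM DataCollection dc LEFT JOIN EMDataCollection em "
    "ON em.dataCollectionId = dc.dataCollectionId "
    "ORDER BY dc.dataCollectionId"
).fetchall()
```

The trade-off is an extra join when EM parameters are needed, against wider DataCollection rows for every technique.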

stufisher commented on August 23, 2024
  • We did not include fields because we thought that they don't belong to that table or we don't know how to get them (yet)

Please specify these explicitly; the last two points are quite different from each other!

You have undone a lot of my work here. You have renamed a lot of the columns, and I don't understand why. Why not work from our existing schema rather than starting from scratch?

I had conceded and added a table (Movie) to store a single varchar(255) per movie; now you have added another 4 columns of the same size to a table that, as we discussed, is going to be heavily populated and may grow exponentially over time. Can we not determine the xml path from movieFullPath? I had chosen movieFullPath as the name to be consistent with the other tables in ISPyB.

As I had previously described:

  • voltage = wavelength => energy of radiation
  • sphericalAberration is a fixed function of the microscope and should be stored in BeamlineSetup as CS (what it's referred to in EM terms).
  • amplitudeContrast is a function of the CTF (= it is determined by the correction)
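The first point relies on the accelerating voltage fixing the electron wavelength. The standard relativistic de Broglie relation, λ = h / sqrt(2·m0·e·V·(1 + e·V / (2·m0·c²))), gives that conversion; the sketch below uses CODATA 2018 constants, and the function name is hypothetical. Which unit a wavelength column would use is not settled in this thread.

```python
import math

# CODATA 2018 values (SI units).
PLANCK = 6.62607015e-34              # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
SPEED_OF_LIGHT = 299_792_458.0       # m/s


def electron_wavelength_pm(voltage_volts: float) -> float:
    """Relativistic wavelength (picometres) of electrons accelerated through voltage_volts."""
    energy = ELEMENTARY_CHARGE * voltage_volts          # kinetic energy in joules
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2   # m0 c^2 in joules
    momentum = math.sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum * 1e12


wavelength_300kv = electron_wavelength_pm(300e3)  # about 1.969 pm
wavelength_200kv = electron_wavelength_pm(200e3)  # about 2.508 pm
```

Since the mapping is one-to-one, either quantity can be derived from the other, so storing both would be redundant.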

Please can you provide a diff from my schema?

Movie

I'm not sure why you have micrograph or micrograph snapshot in Movie. Does a micrograph even exist at this point? A movie is, as the name says, a series of frames; isn't a micrograph constructed from these via another process, i.e. MotionCorrection (at least the one people will look at)?
dosePerImage = dosePerFrame in MotionCorrection (=duplication?)

MotionCorrection

You have added a log file; please remove it. MotionCorrection links to AutoProcProgram, which has a link to AutoProcProgramAttachment, where logs should be stored.
timestamp should be removed; it is catered for by AutoProcProgram as well.
Do we really need another varchar(255) for the dose-corrected micrograph?
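The point about logs and timestamps is that they are reachable through AutoProcProgram rather than stored on MotionCorrection itself. The SQLite sketch below shows the two lookups; the tables are heavily simplified, the column names are modeled loosely on the ISPyB tables named above and should be treated as assumptions, and the inserted rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE AutoProcProgram (
        autoProcProgramId   INTEGER PRIMARY KEY,
        processingStartTime TEXT    -- the timestamp lives here
    );
    CREATE TABLE AutoProcProgramAttachment (
        autoProcProgramAttachmentId INTEGER PRIMARY KEY,
        autoProcProgramId INTEGER REFERENCES AutoProcProgram(autoProcProgramId),
        fileType TEXT,
        filePath TEXT,
        fileName TEXT
    );
    CREATE TABLE MotionCorrection (
        motionCorrectionId INTEGER PRIMARY KEY,
        autoProcProgramId  INTEGER REFERENCES AutoProcProgram(autoProcProgramId)
    );
    """
)
# Placeholder rows for illustration.
conn.execute("INSERT INTO AutoProcProgram VALUES (10, '2017-06-19 21:02:00')")
conn.execute("INSERT INTO AutoProcProgramAttachment VALUES (1, 10, 'Log', '/logs', 'motioncorr.log')")
conn.execute("INSERT INTO MotionCorrection VALUES (5, 10)")


def log_files(conn: sqlite3.Connection, motion_correction_id: int) -> list:
    """Log attachments for a motion correction, found via its AutoProcProgram."""
    return conn.execute(
        "SELECT a.filePath, a.fileName FROM MotionCorrection mc "
        "JOIN AutoProcProgramAttachment a ON a.autoProcProgramId = mc.autoProcProgramId "
        "WHERE mc.motionCorrectionId = ? AND a.fileType = 'Log'",
        (motion_correction_id,),
    ).fetchall()


def processing_start(conn: sqlite3.Connection, motion_correction_id: int) -> str:
    """Timestamp for a motion correction, taken from its AutoProcProgram."""
    row = conn.execute(
        "SELECT p.processingStartTime FROM MotionCorrection mc "
        "JOIN AutoProcProgram p ON p.autoProcProgramId = mc.autoProcProgramId "
        "WHERE mc.motionCorrectionId = ?",
        (motion_correction_id,),
    ).fetchone()
    return row[0]
```

The same joins would serve CTF, which also links to AutoProcProgram.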

CTF

You have added a log file; please remove it. CTF links to AutoProcProgram, which has a link to AutoProcProgramAttachment, where logs should be stored.
timestamp should be removed; it is catered for by AutoProcProgram as well.

I'm not sure why you have removed amplitudeContrast from CTF. It is not a function of the movie/datacollection, it is determined by the CTF correction.

Why have both spectraImage and spectraImageThumbnail? We don't need both. Why rename it from fftTheoretical? (It's a [fast] Fourier transform of the micrograph plus the theoretical one from the CTF function.)

MotionCorrectionDrift

This is the data that makes up driftPlotFullPath. You will probably show driftPlotFullPath in EXI, but I want access to the raw data. Kevin tells me this data is available to Scipion somewhere.

Lots of other columns have changed name; I don't understand why...
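Having the raw drift data means the plot, or summary numbers, can be recomputed at any time. This sketch assumes each MotionCorrectionDrift row holds a frame-to-frame (deltaX, deltaY) shift; whether the shifts are frame-to-frame or relative to a reference frame, and their units, are assumptions here, and the function names are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Tuple

Shift = Tuple[float, float]  # (deltaX, deltaY) for one frame


def drift_trajectory(shifts: List[Shift]) -> List[Shift]:
    """Cumulative (x, y) position after each frame, assuming frame-to-frame shifts."""
    xs = accumulate(dx for dx, _ in shifts)
    ys = accumulate(dy for _, dy in shifts)
    return list(zip(xs, ys))


def total_drift(shifts: List[Shift]) -> float:
    """Total path length travelled over the movie."""
    return sum(math.hypot(dx, dy) for dx, dy in shifts)
```

The trajectory is what a drift plot would draw; the total gives a single per-movie number suitable for filtering.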

stufisher commented on August 23, 2024

Can we pick this up again and try to converge on this model?
