dm-usecases's People

Contributors

bonnarel, glemson, lmichel, loumir, mcdittmar, msdemlei

dm-usecases's Issues

Linking use cases to astronomical projects

The current use cases tree is built up following the organisation of the metadata used in data files (columns in VOTables).
It aims at covering multiple scenarios of arrangements of identifiers, measures, reference systems, calibration metadata, etc. and their relations to each other.
Many astronomical projects encounter these kinds of metadata arrangements.
I would be interested to see a cross table between the use cases listed in this tree and the various projects (Chandra, Mast, Gavo, Gaia, Vizier, etc.) with the various levels of interest and involvement:

  • (C) concerned with this use-case
  • (T) testing various solutions
  • (I) already implementing this use case in routine operations
Usecase vs Project | Gaia  | Chandra | GLAST | LSST | Gavo | Vizier | "my project"
combined_data      | C, T? | C?      | C?    | C?   | T, I | T, I   |
column_grouping    | C, T? |         |       |      | T, I | T, I   |

Each project could then fill in this table as work progresses.

The idea is to have an overview of the use case coverage, and of the DM experience feedback at various places.
Thanks.

Baseline use cases

I've always assumed the following has been the (rough) consensus agenda behind the effort, but I'm starting to realise that that is not necessarily the case, so I'm trying to raise the following (IMHO) baseline use case as an explicit issue.

"Let a client figure out non-VOTable metadata for a column independently of whether they are in a time series, a spectrum, or some generic table."

Examples (all assuming "if present in the table", of course):

  • If I see an ra, figure out the corresponding dec
  • If I have an ra/dec, figure out the reference system, the reference position, the epoch
  • If I have a position, figure out an associated proper motion
  • If I have any sort of quantity, figure out the/an error (and eventually its sort, perhaps one day distribution parameters, etc.)
  • If I have a flux or magnitude, figure out the band it's in, perhaps a zeropoint, etc.

Again, and that is important: Clients must not need to know a full data model (e.g., "timeseries") to do that. The rationale behind that is that client writers will curse us if they have to re-implement "find an error for a value" again and again just because there's a new product type. And the data providers will curse us if they cannot annotate a value/error pair just because it's, say, a redshift and there's no "data model" for that.
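To make this concrete, here is a minimal, self-contained Python sketch of the kind of lookup being asked for. The flat parsed-annotation structure and the helper are purely illustrative and not part of any proposal; the point is only that the lookup works without knowing whether the table is a time series, a spectrum, or a plain catalogue.

from typing import Optional

# Illustrative stand-in for a parsed annotation: each entry ties a FIELD name
# to a model role and to the group (e.g. a position) it belongs to.
ANNOTATION = [
    {"column": "ra",     "role": "longitude", "group": "pos1"},
    {"column": "dec",    "role": "latitude",  "group": "pos1"},
    {"column": "ra_err", "role": "error",     "group": "pos1"},
]

def find_partner(column: str, wanted_role: str) -> Optional[str]:
    """E.g. find_partner('ra', 'latitude') -> 'dec', whatever the product type."""
    group = next((e["group"] for e in ANNOTATION if e["column"] == column), None)
    if group is None:
        return None
    return next((e["column"] for e in ANNOTATION
                 if e["group"] == group and e["role"] == wanted_role), None)

print(find_partner("ra", "latitude"))  # -> dec
print(find_partner("ra", "error"))     # -> ra_err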

I'll note here that I will have a very hard time signing off on anything that does not show code that actually lets clients do these things and that will credibly make it to astropy and/or STIL.

I'll shamelessly link again to https://github.com/msdemlei/astropy, which is an astropy fork that can pull such information from annotation produced right now by DaCHS' time series code. An analogous astropy (or STIL, if you prefer) fork for the other proposals would really help to give me some confidence in them.

I'll freely admit that my fork clearly isn't ready for a PR yet (it was hacked together in one and a half afternoons). But I claim it can be made ready for a PR with about a week of concentrated work, with write and programmatic annotation support possible in perhaps another week; and something like that would of course be enough to convince me of the other proposals' viability (as long as all the proposed features are exercised, which is particularly true for the SQL-in-XML part if you do insist on that).

Sorry for being a bit obstinate here, but I have apologised far too often that 20 years into the VO we still cannot give a reliable recipe for "what's the dec for this ra", and if we again fail to create that it would be a real shame.

After the introduction: Do people roughly agree these are use cases that our annotation, and the models, ought to support?

Where to put this example?

This is a follow-up from the discussion which took place in the PR Issue #29.

I've completed a new example file and am trying to decide where it fits in the dm_usecases set.
I think it would fit nicely into the "combined_data" case, whose primary goal appears to be the AssociatedData relation (with WebEndpoint). This case covers the other two flavors, and could fit into the case description: data obtained from a cross-match (Source, Detections with associated LightCurves) can be combined with similar data from other catalogs to generate combined LightCurves.

My 'implementation' of this merely summarizes the content, showing that everything is accessible.

Thoughts?

The example illustrates:

  • Multiple table usage
    • TimeSeries in Master table, with data from Detection table.
  • Different flavors of AssociatedData
    • VOModelInstance and AssociatedMangoInstance
  • Annotation to different models
    • The same data is annotated to different model 'root' elements (mango:Source and cube:SparseCube). This would be a bit clearer if the TimeSeries relation were a reference.

Raw file: (modification of standard_properties case - Chandra file)

  • Master Source table: (one record per source)
    • identifier, position, significance, extent, variability
  • Detections table: (one per observation)
    • identifier, position, time, flux, hardness_ratios

These are mapped to model instances as:

  • Mango instance - Master Source
    Source.identifier => name
    Property (meas:Position) => (gal_l,gal_b)
    Property (meas:Generic) => significance
    Property (mango:FLAG) => extent_flag
    Property (mango:FLAG) => var_inter_hard_flag
    AssocData (mango:Source) => Detections
    AssocData (cube:SparseCube)=> TimeSeries

  • Mango instance - Detection
    Source.identifier => name
    Property (meas:Time) => o.gti_obs
    Property (mango:Photometry)=>o.flux_aper_b
    Property (mango:HR) => o.hard_hs|lolim|hilim
    Property (mango:HR) => o.hard_ms|lolim|hilim
    Property (mango:HR) => o.hard_hm|lolim|hilim

  • Cube - Light Curve of Detection data for each source
    Observable (meas:Time) => o.gti_obs
    Observable (mango:Photometry)=>o.flux_aper_b

Role of the roles

This issue is related to the @msdemlei proposal.
In ts.vot, I'm wondering how a client can get the role of a mapped object (or does it even need to?).

Let's take an easy example, at line 79:

      <INSTANCE ID="ndwibspodost" dmtype="ndcube:Cube">
      ...
      </INSTANCE>

The client can read the dmtype and guess that this mapping block contains a cube. But what happens if the data set contains two ndcube:Cube instances? (Two positions is more realistic, I admit.)

  • How to distinguish them (see the sketch below)?
  • Are you sure you can always get rid of dmroles on top-level instances?
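As a small illustration of the problem (the XML below is an illustrative stand-in, not the actual ts.vot content): a generic client that selects instances only by dmtype gets back a bare list with nothing to tell the entries apart.

import xml.etree.ElementTree as ET

MAPPING = """
<TEMPLATES>
  <INSTANCE ID="cube_a" dmtype="ndcube:Cube"/>
  <INSTANCE ID="cube_b" dmtype="ndcube:Cube"/>
</TEMPLATES>
"""

root = ET.fromstring(MAPPING)
cubes = root.findall(".//INSTANCE[@dmtype='ndcube:Cube']")
print([c.get("ID") for c in cubes])  # ['cube_a', 'cube_b'] -- but which is which?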

Let's take another example, more confusing to me, at line 53:

<TEMPLATES>
      <INSTANCE ID="ndwibspapabt" dmtype="stc2:Coords">
      ...
      </INSTANCE>

This object is typed as stc2:Coords, which looks like an abstract class. There is no way to know which role this instance plays. If I enter it, I can see two un-typed attributes (time and space), but this does not tell me more about what to do with it.

  • Do you assume that the client does not need to be informed of the role?
  • Did I miss some key point?

Native Frame case: example file mixup in GLON|GLAT spec

In the DATA:
column 4 = values ~0.3, range +/-90, therefore is Galactic latitude (b)
column 5 = values ~342, range 0:360, therefore is Galactic longitude (l)

FIELD 4 = specs for LONGITUDE, with DESCRIPTION for latitude
FIELD 5 = specs for LATITUDE, with DESCRIPTION for longitude

<FIELD name="GLON" ucd="pos.galactic.lon" datatype="double" width="12" precision="E5" unit="deg"><!-- ucd="POS_GAL_LON" -->
<DESCRIPTION>[-90/90] Source position, Galactic latitude (equinox J2000, epoch J2000) (gal_b)</DESCRIPTION>
</FIELD>
<FIELD name="GLAT" ucd="pos.galactic.lat" datatype="double" width="12" precision="E5" unit="deg"><!-- ucd="POS_GAL_LAT" -->
<DESCRIPTION>[0/360] Source position, Galactic longitude (gal_l)</DESCRIPTION>
</FIELD>

Column Groups: Question on coordinate systems involved

I have a question about the coordinate systems involved in this case.
The file has a COOSYS element for FK4 with equinox=B1950
* the RA, DEC columns refer to this system

The file also has Proper Motion and Radial Velocity elements, which do not reference any coordinate system.
I expect that they share the same system as the Position, but:
* the Radial Velocity UCD and description both say it is "HELIOCENTRIC",
and my understanding is that the FK4 origin is the BARYCENTER, not the HELIOCENTER.

A little help sorting this out would be appreciated.
In AstroPy, the Position, Proper Motion, and Radial Velocity are all contained within the same Frame, so I'm curious how these would reconcile into an AstroPy instance.

<FIELD name="RV" ucd="spect.dopplerVeloc;pos.heliocentric" datatype="float" width="6" precision="1" unit="km/s"><!-- ucd="VELOC_HC" -->
<DESCRIPTION>? Heliocentric radial velocity</DESCRIPTION>
<VALUES null="NaN" />
</FIELD>
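For reference, a minimal astropy sketch (placeholder values) of how such a row could be loaded into a single frame; the file's 'total' proper motion would first have to be split into the two components astropy expects. Note that astropy simply takes the radial velocity as the radial velocity component in the chosen frame and does not record whether the catalogued value was heliocentric or barycentric, so the reconciliation question above still has to be settled at the annotation level.

import astropy.units as u
from astropy.coordinates import SkyCoord

# Placeholder values; frame and equinox follow the file's COOSYS (FK4, B1950).
c = SkyCoord(ra=10.5 * u.deg, dec=41.2 * u.deg,
             frame="fk4", equinox="B1950",
             pm_ra_cosdec=12.0 * u.mas / u.yr,   # components derived from the total pm
             pm_dec=-3.0 * u.mas / u.yr,
             radial_velocity=15.0 * u.km / u.s)  # convention left to the data provider
print(c.radial_velocity)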

MANGO model

Regarding the MANGO model

I am using a particular commit in my mapping, namely
https://raw.githubusercontent.com/ivoa-std/MANGO/a46441f6fc498a6aeb33ed97e65689fee3d00f6c/vo-dml/mango.vo-dml.xml
Referring only to the "last" commit may break certain mappings if the model changes.
I suppose I should do the same with the Cube and the various STC models.
I ran into some issues there that MCD fixed.

It would be nice if all the mappings that we compare here use exactly the same versions of the models. Can we define such a set? Note that these need to be internally consistent as well when considering model imports.

There are some problems with the VO-DML for that mango model:

The validation in the volute vo-dml folder produces some xsd problems, mainly pattern validation errors caused by leading spaces.

More importantly the following model errors were found:

  • Target role of subsets constraint on 'exterrors.MultiParamError1D' with vodml-ref mango:errors.MultiParamError.correlatedErrors can not be found
  • Target role of subsets constraint on 'exterrors.MultParamErro2D' with vodml-ref mango:exterrors.MultiParamError.correlatedErrrors can not be found
  • datatype mango:extcoords.FlagState of composition extcoords.FlagSys.statusLabels is not an object type but a 'dataType'

Minimal provenance in VOTable output

For VizieR it would be really appreciated (not to say required) to have a common way to provide minimal origin information.

The mango VizieR prototype uses the "associatedData" dock to link to a remote URL which contains a "complete" VOProvenance.
I would like to add another, concise provenance output in the VOTable (for "naive" clients).

The minimal provenance for a VOTable is: author + year_of_publication, and the doi or bibcode of the reference article.
In DatasetDM, I didn't see a clear distinction between creator and author. Markus, do you have an example of this serialization in your output?

... and I would like more, if it is possible in a concise serialization: a short annotation specifying the origin of a measure, e.g. the filter configuration, with the curator + a URL.
Any ideas?

Proper motions: modeling

There are currently 2 use cases whose data includes proper motions.
* Precise Astrometry: includes ra, dec, pm_ra, pm_dec, radial velocity, parallax
* Column Groupings: includes ra, dec, 'total' pm, radial velocity

My question has to do with the modeling of proper motion.
The Measurements model currently has ProperMotion( longitude, latitude ), which corresponds to the Precise Astrometry case. The other appears to be ProperMotion( magnitude, position_angle ), where the position angle is missing (perhaps because it is not relevant to the Column Groupings use case).
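To make the relation between the two representations explicit, here is a small illustrative sketch (not code from either use case file): the 'total' form can be derived from the component form, but not the other way round unless the position angle is kept alongside the magnitude.

import math

def to_total_and_pa(pm_lon_coslat: float, pm_lat: float) -> tuple[float, float]:
    """Return (total proper motion, position angle in degrees, East of North)."""
    total = math.hypot(pm_lon_coslat, pm_lat)
    pa = math.degrees(math.atan2(pm_lon_coslat, pm_lat)) % 360.0
    return total, pa

print(to_total_and_pa(10.0, 5.0))  # placeholder values, e.g. in mas/yr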

Questions:

  1. am I interpreting this correctly?
  2. are we going to want to support both representations of ProperMotion?

MANGO Annotation Scope

This issue is a fork of #12, which diverged from the initial dependent axes topic.

Last message (#12 (comment)):

On Fri, Mar 19, 2021 at 07:23:56AM -0700, Laurent MICHEL wrote:

The scope of the annotations must go beyond simple column annotations, which must remain supported though. I detailed it here, section 2.
I'm starting to be unsure whether we are actually disagreeing on much
here -- and I've not found anything in that section 2 that I'd need
to contradict.

So, perhaps a clarification: is my time series use case "single
column annotation", and if so, why? What actual usage would go
beyond what's possible there?

My point is, since we have a self-consistent model made of a hierarchy of elements identified with dmtype, dmrole and other things, the annotation must be something matching that structure.

Well, the thing with dmrole and dmtype to me is the annotation, but
I think what you're saying here is that the annotation should be
directly derived from the model.

That I wholeheartedly agree with,
and that's why I'm so concerned about the current MCT proposal -- if
it were some abstract musing, I'd be totally ok with it. But when
the model defines the annotation structure, whatever we do in the
model has concrete operational consequences. Which, mind you, is
fine -- we'll have to deal with them somewhere and the DM is the
right place for that.

Once you have it, you can use accessors based on those identifiers.
That is what I call a public API: it does not refer to any native data element but only to model elements.

...and I still cannot figure out why you want this -- after all, the
point of the whole exercise IMNSHO is to add information to VOTables
(and later perhaps other container formats) that is not previously in
there.

What would the use case for your free-floating annotation be, if this
is what you are proposing?

In the examples I showed for these use cases, I transform the annotation into Python dictionaries that are easily serializable to JSON (a good point for data exchange).

In pseudo code, this would look like this:

annotation_reader = AnnotationReader(my_votable)
if annotation_reader.support("mango") is False:
    sys.exit(1)

mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
# ['pos', 'magField']
print("Magnetic field is: " + mango_instance.get_measure("magField"))
# Magnetic field is: 1.23e-6T +/- 2.e-7

This wouldn't require Python classes implementing the model (a fundamental point).

I claim that the annotation must be designed in a way that allows
this in addition to basic usages.

-- but why would you want to do this JSON serialisation? Wouldn't it
be much better overall to just put that value into a VOTable and
transmit that rather than fiddle around with custom JSON
dictionaries? In particular when there are quite tangible benefits
if you make it explicit in the model what exactly it is that you're
annotating?

By the way, if by "wouldn't require Python classes" you mean "You
don't have to map model classes into python classes" then yes, I
agree, that is a very desirable part of anything we come up with.
Let's avoid code generators and similar horrors as much as we can.
Nobody likes those.

Let's consider that all Vizier tables come with such annotations; the same API code could then get many things:

  • Basic quantities (no significant gain I admit)
  • Complex quantities (e.g. complex errors)
  • Column grouping
  • Status values
  • Associated data or services

I agree to all these use cases (except, as I said, even for basic
quantities the gain is enormous because we can finally express
frames, photometric systems, and the like in non-hackish ways).

But: which of these use cases would you miss with the non-entangled,
explicit-reference models?

MeasCoords enhancement

The topics reported here have already been developed on either the mailing list or the RFC page.

Need a lon/lat sky position

Let's have a look at the Coords:Point class that represents a sky position:

              <INSTANCE dmrole="meas:Measure.coord" dmtype="coords:Point">
                <INSTANCE dmrole="coords:Point.axis1" dmtype="ivoa:RealQuantity">
                  <ATTRIBUTE dmrole="ivoa:RealQuantity.value" dmtype="ivoa:real" ref="RAJ2000"/>
                  <ATTRIBUTE dmrole="ivoa:Quantity.unit" dmtype="ivoa:Unit" value="deg"/>
                </INSTANCE>
                <INSTANCE dmrole="coords:Point.axis2" dmtype="ivoa:RealQuantity">
                  <ATTRIBUTE dmrole="ivoa:RealQuantity.value" dmtype="ivoa:real" ref="DEJ2000"/>
                  <ATTRIBUTE dmrole="ivoa:Quantity.unit" dmtype="ivoa:Unit" value="deg"/>
                </INSTANCE>
                <INSTANCE dmrole="coords:Coordinate.coordSys" dmref="SpaceSys_ICRS_Spherical"/>
              </INSTANCE>

Reading this, I've no way to guess that coords:Point.axis1 is a right ascension and coords:Point.axis2 the declination.
This might not be an issue for clients into which the model is hard-coded (e.g. code generated from the VO-DML), but it is a big issue for clients exploring the mapping using e.g. XPath selectors (I can testify).
MCT needs a LonLatPoint class looking like this:

              <INSTANCE dmrole="meas:Measure.coord" dmtype="coords:LonLatPoint">
                <ATTRIBUTE dmrole="coords:LonLatPoint.longitude" dmtype="ivoa:real" ref="RAJ2000"/>                
                <ATTRIBUTE dmrole="coords:LonLatPoint.latitude" dmtype="ivoa:real" ref="DEJ2000"/>                
                <ATTRIBUTE dmrole="coords:LonLatPoint.unit" dmtype="ivoa:string" value="deg"/>
                <INSTANCE dmrole="coords:Coordinate.coordSys" dmref="SpaceSys_ICRS_Spherical"/>
              </INSTANCE>
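As a small runnable sketch of why this matters for such clients (the helper is hypothetical, not part of MCT): with LonLatPoint the longitude column can be found from its dmrole alone, whereas the role coords:Point.axis1 carries no such semantics.

import xml.etree.ElementTree as ET

def find_longitude_ref(annotation_xml: str):
    """Return the FIELD ref holding the longitude, or None if it cannot be told."""
    root = ET.fromstring(annotation_xml)
    # Works without hard-coding any host model (time series, catalogue, ...):
    attr = root.find(".//ATTRIBUTE[@dmrole='coords:LonLatPoint.longitude']")
    if attr is not None:
        return attr.get("ref")  # e.g. "RAJ2000"
    # With the generic Point, axis1 may be a longitude, a latitude or anything
    # else; the role alone does not say, so a generic client has to give up.
    return None

Applied to the LonLatPoint block above, this returns "RAJ2000"; applied to the axis1/axis2 block, it returns None.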

Proper Motion

This MCT pattern is the following:

 meas -> coord 

Having all measures built on the same pattern is good for both interoperability and developers, and it must be preserved as much as possible.
So why is it broken for proper motions?

 meas  -> coord lon
     | -> coord lat

The proper motion (or a LonLatProperMotion) could use the above coords:LonLatPoint as its coordinate class.

Impact of model changes

This important issue comes in continuation of MANGO Annotation Scope.

It continues the discussion whose content is recalled here:

have to be mapped. The rest can (must) be ignored. The mapping block represents a subset of the model. If the model changes keep backward compatibility, the 'old' annotations remain consistent and the interoperability between datasets mapped with different DM versions is preserved.

Yes -- that's a minor version. These aren't a (large) problem, and
indeed I'm claiming that our system needs to be built in a way that
clients don't even notice minor versions unless they really want to
(which, I think, so far is true for all proposals).

If you are saying that clients must be updated to take advantage of new model features, you are right, whatever the annotation scheme is; this is simply because: new model class => new role => new processing.

No, that is not my point. My point is what happens in a major version change. When DM includes Coord and Coord includes Meas and you now need to change Meas incompatibly ("major version"), going to Meas2 with entangled DMs will require new Coord2 and DM2 models, even if nothing changes in them, simply to update the types of the references -- which are breaking changes.

With the simple, stand-alone models, you just add a Meas2 annotation,
and Coord and DM remain as they are. In an ideal world, once all
clients are updated, we phase out the legacy Meas annotation. The
reality is of course going to be uglier, but still feasible, in
contrast to having to re-do all DM standards when we need to re-do
Meas.

VODML Mapping vs ModelInstanceInVot

Both proposals (VODML Mapping, ModelInstanceInVot) have similar structures. There is nevertheless a major difference that is justified here.

  • Fig 1 shows a dataset made of a block of metadata on top of the data block, mapped with VODML Mapping
    • Data are mapped in the TEMPLATE block that maps data for one row.
    • Metadata are located in the GLOBALS block along with the coordinate frames.
    • Issues
      • The parser has to browse 2 different mapping blocks to retrieve all the components of the dataset
      • The parser has no easy way to discriminate which GLOBALS elements are part of the dataset mapping and which are not.
  • Fig 3 shows the same dataset mapped with ModelInstanceInVot
    • Both data and metadata are mapped within the TABLE_TEMPLATE blocks
    • The data mapping is enclosed in a TABLE_RAW_TEMPLATE block
    • Gains
      • All elements related to the dataset mapping are located in a single block (TABLE_TEMPLATE), which makes the parser's job easier.
      • All elements not related to any dataset are located in the GLOBALS block
      • We can have several TABLE_RAW_TEMPLATE blocks within a single TABLE_TEMPLATE. This makes it possible to tell the parser to iterate several times over the same data table, e.g. to extract data subsets (selection by filter in time-series/gaia_multiband).

[Screenshot: Figs. 1 and 3]

The advantage of the ModelInstanceInVot mapping structure is even more obvious for multi-table VOTables.

  • Fig 2 shows the mapping structure for a VOTable containing 2 tables, each one with its own mapping (e.g. source + detections).
    • The GLOBALS block now contains the metadata of the 2 datasets in addition to the coordinate frames.
    • Issues
      • Same as before
      • The parser has no easy way to identify which GLOBALS element is part of dataset 1 or dataset 2.
  • Fig 4 shows the same datasets mapped with ModelInstanceInVot
    • Each table is mapped in a specific TABLE_TEMPLATE.
    • Gains
      • Same as before
      • No risk of mixing up the metadata of the two datasets.

[Screenshot: Figs. 2 and 4]

What are these dependent axes?

If I understand your serialisation correctly, you map a list of NDPoints, each one being composed of

  • one independent value typed as a time
  • 2 dependent values typed as GenericMeasure

I do not see how a client can tell that the 1st dependent value is a magnitude and the 2nd a flux.

  • Is it supposed to check the FIELD ucd or the coord system type?

This question is related to the discussion we have been having here.

Using integrated models

I am opening this thread because several times in the discussion I mentioned SAMP as an interesting use case for integrated models, and I never took enough room to say more about it.

Let's imagine a SAMP emitter displaying a VOTable in which the user spots one interesting source (table row).
So far, and within the limits of my SAMP skills, if that user wants to share this source (not only the position but all parameters) with other applications, he/she has to generate a single-row VOTable and operate a table.load, with the hope that the recipient will be able to parse it (*).

If that VOTable is mapped on MANGO, the client will be able to easily build a MANGO (source) instance serialized in a convenient way (e.g. JSON, which I like, with MType=mango.load.json) that can be interpreted very easily by e.g. a web application.
This will work whatever the complexity of the source data (grouped columns, associated detections, ...) thanks to the use of an integrated model. This example can be extended to other models.
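As a rough sketch of the emitting side (assuming astropy.samp, a running SAMP hub, and the mango.load.json MType suggested above; the instance content is a placeholder for whatever gets built from the annotation):

import json
from astropy.samp import SAMPIntegratedClient

# Placeholder for a MANGO source instance built from the VOTable annotation.
source_instance = {
    "identifier": "source-123",
    "position": {"lon": 10.68, "lat": 41.27, "frame": "ICRS", "unit": "deg"},
}

client = SAMPIntegratedClient(name="mango-sender")
client.connect()  # requires a SAMP hub to be running
try:
    client.notify_all({
        "samp.mtype": "mango.load.json",  # MType suggested above
        "samp.params": {"source": json.dumps(source_instance)},
    })
finally:
    client.disconnect()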

This is one of the cases for which I promote the use of integrated models. This is a little aside from the main topic (VOTable annotation), but anticipating new realistic usages must be part of the design.

(*) All VO clients are able to parse VOTables, but SAMP can be used in other contexts (web applications in mind).

Have human-readable labels on instances that may need UI choices

In many applications, user agents will want to present a choice between different instances in a UI. For example, there could be several positions per table row, or perhaps several photometric measurements (Kron vs. aperture). In these cases, it is helpful for the user agent to have a hint on how to present this choice.

So, when a coordinate has such a human-readable label, a UA could show a selector for, say,

"Short-term solution/Long-term solution/Galactic"

rather than the less helpful

"Coord in ICRS 1/Coord in ICRS 2/Coord in Galactic 1"
