
opencadc / caom2


Common Archive Observation Model

License: GNU Affero General Public License v3.0

Java 95.60% Roff 0.06% HTML 0.44% XSLT 3.54% Shell 0.01% Makefile 0.19% Python 0.16%

caom2's People

Contributors

at88mph, brianmajor, hjeeves, jburke-cadc, opencadc-admin, pdowler, yeunga


caom2's Issues

allow hierarchy of composite observations

The current UML diagram indicates that the members of a CompositeObservation are always SimpleObservation(s).

Allowing other CompositeObservation(s) to be members would more clearly describe the intended aggregation of observations.

fits2caom2 help is inaccurate.

The -help content suggests that -test does not persist to the database, but really -test turns off writing to a file.

Are there cases where fits2caom2 writes to a database anymore?

option A: remove -test but that might break code that expects -test

option B: Change the behaviour so -test still writes the .xml (also might break current usage)

option C: Change the help line to indicate that no file will be written to -out; impact is likely minimal.

python module validation of observation_id field - too stringent?

Several of our collections at IRSA have observationid values that include spaces or slashes, and we're running into the following error trying to export them with the python caom2 library:

invalid observation_id: may not contain space ( ), slash (/), escape (\), or percent (%)

I reached out to @pdowler, who said this is to ensure that observation_id does not contain any characters that aren't URI-safe, and suggested I open an issue here for discussion.

We don't use that field for URI population and haven't been able to find any documentation indicating that the observation_id string has any restrictions.

Is the caom2 module possibly being too stringent in how it validates this column?

(FYI we use CAOM 2.3, and the python module caom2 2.3.8.4)
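
For discussion, the rule stated in the error message can be sketched as follows. This is a hypothetical Java illustration of that rule only, not the code of the python caom2 module; the class and method names are invented for the example.

```java
/** Hypothetical illustration of the rule quoted in the error message above. */
final class ObservationIdCheck {
    // The four characters the error message lists: space, slash, backslash, percent.
    private static final String FORBIDDEN = " /\\%";

    /** Returns true if the value contains none of the four listed characters. */
    static boolean isAccepted(String observationID) {
        for (int i = 0; i < observationID.length(); i++) {
            if (FORBIDDEN.indexOf(observationID.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("obs-0001"));   // true
        System.out.println(isAccepted("field 12/a")); // false
    }
}
```

Under this rule, any identifier with a space or slash is rejected regardless of whether it is ever placed in a URI, which is the behaviour being questioned here.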

Requiring pixel values for some WCS

While validating CAOM v2.3 XML files, I’ve run into an error which appears to be driven by the schema design: in the SpectralWCS, PolarizationWCS, and TemporalWCS (energy, polarization, and time in Chunk, respectively) there is an axis element that contains a bounds element of type CoordsBound1D. This element consists of samples of start/end points defined by RefCoord, which currently requires both pix and val to be provided. While the physical value (val) should always be present, it’s not clear that a pixel value (pix) will be, and it has been absent for some of the data I’ve been working with.
A simple fix may be to set these as minOccurs="0" in the schema.

Add NAIF ID to Target?

Most (but not all) moving object targets have a NAIF ID. Target already has a keywords field, so we can add it there. It might be better for there to be a separate field for it. I am not sure, though.

Schema description should include units.

The schema descriptions that appear in the TAP services should tell the user what UNITs quantities in the table are expressed in. e.g. Plane.energy_bounds_lower is described as:
'lower bound on energy axis (barycentric wavelength)' which is helpful, but without knowing what units those are (nm?) the user is left to guess.

One might think that just declaring the entire table as following some convention (cgs? mks?) would be sufficient, but reminding the user with good column descriptions would be more friendly.

PublisherID constructor URI arg produces odd resource ID

The PublisherID constructor in caom2 with the URI argument constructs a resourceID from the path of the argument. However, the URI.getPath() contains the leading slash, which produces the following:

final PublisherID publisherID = new PublisherID(URI.create(PublisherID.SCHEME + "://com.myauth/MYCOLLECTION?OBSID/PRODID"));
System.out.println(publisherID.getResourceID()); // << Outputs ivo://com.myauth//MYCOLLECTION

The double slash MAY not be a problem for some parsers, but breaks tests.
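
The leading slash can be seen with java.net.URI alone. The sketch below is a standalone illustration, not the PublisherID implementation; LeadingSlashDemo and resourceID are hypothetical names. It shows one way to assemble the resource ID without adding a second separator.

```java
import java.net.URI;

/** Hypothetical demo: URI.getPath() already begins with "/" when an authority is present. */
final class LeadingSlashDemo {
    /** Joins scheme, authority and path without adding a second slash. */
    static String resourceID(URI uri) {
        String path = uri.getPath(); // "/MYCOLLECTION": the leading slash is included
        return uri.getScheme() + "://" + uri.getAuthority() + path;
    }

    public static void main(String[] args) {
        URI uri = URI.create("ivo://com.myauth/MYCOLLECTION?OBSID/PRODID");
        System.out.println(uri.getPath());   // /MYCOLLECTION
        System.out.println(resourceID(uri)); // ivo://com.myauth/MYCOLLECTION
    }
}
```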

Provenance from multiple versions

This sort of relates to Issue #66 and the question of cardinality.

Planes are often produced from an ensemble of software, not a single application. In the case of ALMA MS data, in particular, we have MS data that is calibrated using CASA XXX and then split using CASA YYY. The provenance of CASA YYY would tell you that you should use YYY to open these files (MS is not a standard format) but the CASA XXX part is needed to tell you what the calibration system was. In particular CASA XXX is what tells you about calibration trust while CASA YYY part is more about data form. How (if at all) should this be expressed in the provenance?

Missing slash in MastResolver

The URLs produced by the MastResolver are missing the slash between the base URL and the scheme specific part of the URI.

The tests for MastResolver should be corrected so that they fail with this implementation. Then, the implementation should be corrected so that the tests pass.
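
A minimal sketch of the intended join, assuming the resolver combines a base URL with URI.getSchemeSpecificPart(). The class name, the base URL and the example artifact URI are all hypothetical and are not taken from MastResolver.

```java
import java.net.URI;

/** Hypothetical helper: join a base URL and a URI's scheme-specific part with exactly one slash. */
final class BaseUrlJoin {
    static String join(String baseURL, URI artifactURI) {
        // Drop a trailing slash on the base, if any, so the separator is added exactly once.
        String base = baseURL.endsWith("/")
                ? baseURL.substring(0, baseURL.length() - 1)
                : baseURL;
        String ssp = artifactURI.getSchemeSpecificPart();
        if (ssp.startsWith("/")) {
            ssp = ssp.substring(1);
        }
        return base + "/" + ssp;
    }

    public static void main(String[] args) {
        // Both the base URL and the artifact URI are made-up values for illustration.
        URI artifact = URI.create("mast:HST/product/file.fits");
        System.out.println(join("https://example.org/download", artifact));
    }
}
```

A test that asserts on the complete expected URL, rather than on its parts, would fail against an implementation that omits the slash.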

Add data release date to Artifacts

JWST has file based release dates rather than observation based. Would like to be able to optionally add dataRelease to Artifact - we could then roll up to higher levels as needed.

support alternative representations of plane metadata

The current plane metadata is roughly:
position: ICRS (deg)
energy: (barycentric) wavelength (m)
time: MJD UTC (d)
polarization: list of states

These coordinate/reference systems and units are now listed in the "interoperable profile".
However, the CADC TAP service also provides some energy columns with frequency values because the typical query cannot be re-written in a way that can be (easily) indexed...

In principle, one could specify places to put values with alternate representations. This is most obvious for wavelength/frequency/energy/velocity, less so for position (icrs/other epochs/galactic?), and then maybe time frames....

RFE: add checksum to Artifact

Proposal from the HST Archive Coordination Meeting: add a checksum (probably MD5) to the artifact so that metadata sharing of CAOM observation metadata provides sufficient information to enable partners to figure out which data files they need to download. In the case of new artifacts, the partner won't have the file (denoted by the Artifact.uri) at all. For changed files, they will detect this via the checksum. For changed artifact metadata, the partner would examine the artifact due to timestamp change but can determine from the checksum that they do not need to download the data again.
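
The partner-side decision described above could look roughly like this. It is a sketch under the assumption that the checksum is a hex-encoded MD5 string; the class and method names are hypothetical and nothing here is part of the current model or library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of checksum-based download decisions at a mirror site. */
final class ArtifactChecksumSketch {
    /** Hex-encoded MD5 of the given file content. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * A download is needed when there is no local copy (null local checksum)
     * or when the local checksum differs from the one in the harvested metadata.
     */
    static boolean needsDownload(String localChecksum, String remoteChecksum) {
        return localChecksum == null || !localChecksum.equals(remoteChecksum);
    }
}
```

With this check, a metadata-only change (new timestamp, same checksum) leads to re-examining the artifact but not to re-downloading the file.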

support for IVOA polygon data type is missing

The Plane.position.bounds Polygon allows for disjoint pieces and holes while IVOA (DALI) polygon must be a simple outer hull.

energy.bounds and time.bounds Interval values are consistent with IVOA (DALI) interval.

Add s/n (signal to noise) to Plane.Metrics

We have a new spectroscopic initiative for JWST work which would require having the observation signal to noise ratio (float value) available in the model. The obvious place is to add it to the plane.metrics class. Would it be possible to get this in v2.4 since it is a pretty simple addition?

Addition of *_calib_status "optional" ObsCore attributes to the model?

The ObsCore data model contains "optional" elements of the form *_calib_status, providing more information about the level of calibration of various axes, spatial, temporal, spectral, and observable. We are likely to include these in the Rubin Observatory's ObsCore tables, basically because the overall calib_level = 1 vs. =2 distinction doesn't adequately capture the way we create the planned data products, and in particular how the observable (flux-like) axis is calibrated.

In order to maintain the connection with CAOM2, we would like to suggest that the Position, Time, Energy, and Observable objects in the CAOM2 data model each be supplemented with a string-valued calib_status attribute with multiplicity [0..1] (i.e., optional).

Looking at the language in the ObsCore standard (quoted verbatim below) it doesn't seem like the enumeration values for these attributes are sufficiently standardized to be able to force them to be explicit enumerations in CAOM.

s_calib_status
  Short title: Type of calibration along the spatial axis
  Principal: 1
  Utype: Char.SpatialAxis.calibrationStatus
  Suggested values: uncalibrated, raw, calibrated
  Description: A string to encode the calibration status along the spatial axis (astrometry). Possible values could be {uncalibrated, raw, calibrated} and correspond to the Utype Char.SpatialAxis.calibrationStatus. For some observations, only the pointing position is provided (s_calib_status = "uncalibrated"). Some other may have a raw linear relationship between the pixel coordinates and the world coordinates (s_calib_status = "raw").

t_calib_status
  Short title: Type of time coordinate calibration
  Principal: 0
  Utype: Char.TimeAxis.calibrationStatus
  Suggested values: uncalibrated, calibrated, raw, relative
  Description: This parameter gives the status of time axis calibration. This is especially useful for time series. Possible values are principally {uncalibrated, calibrated, raw, relative}. This may be extended for specific time domain collections.

em_calib_status
  Short title: Type of spectral coord calibration
  Principal: 0
  Utype: Char.SpectralAxis.calibrationStatus
  Suggested values: uncalibrated, calibrated, relative, absolute
  Description: This attribute of the spectral axis indicates the status of the data in terms of spectral calibration. Possible values are defined in the Characterisation Data Model and belong to {uncalibrated, calibrated, relative, absolute}.

o_calib_status
  Short title: Type of calibration for the observable coordinate
  Principal: 1
  Utype: Char.ObservableAxis.calibrationStatus
  Suggested values: absolute, relative, normalized, any
  Description: This describes the calibration applied on the Flux observed (or other observable quantity). It is a string to be selected in {absolute, relative, normalized, any} as defined in the SSA specification (Tody, Dolensky and al. 2012) in section 4.1.2.10. This list can be extended or updated for instance using an extension mechanism similar to the definition of new UCDs in the IVOA process, following the feedback from implementations of ObsTAP services.

The following characteristics are in common for all four attributes:

Attribute Value
Datatype adql:VARCHAR / Enum string
Units NULL
UCD meta.code.qual
Mandatory 0
Index 0 (except for o_calib_status, where it's "TBD")
Std 1

RFE: add a flag to indicate that a simple observation is a member

Proposal from HST Archive Coordination Meeting: make it easy to exclude members (SimpleObservation) from search results.

If members could be flagged then the query would not need to include a subquery or join to determine this. In addition, such an on-the-fly approach would mean that composites created by someone else and included in the system (eg in the aggregate database at CADC) would cause simple observations to be hidden. A specific flag would mean the provider that curates the simple and composite observations would control this explicitly.

The form of the flag is TBD.

fits2caom2 unit tests have system-dependent Date assertions

e.g. FitsMapperTest.testPopulateObservation, line 461 verifies a java.util.Date value by comparing a hard-coded string (in PST) to {Date variable}.toString(). The latter relies on the locally configured time zone, so it fails if one is not in the Pacific time zone.

It's also totally the wrong way to compare Date values, which have a perfectly good equals method.
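
The difference between the two comparisons can be reproduced outside the test suite. The sketch below is a standalone illustration with hypothetical names, not the code of FitsMapperTest; it compares by instant with Date.equals and shows that Date.toString depends on the default time zone.

```java
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical demo of time-zone-independent versus time-zone-dependent Date comparison. */
final class DateComparisonSketch {
    /** Compares the instants (milliseconds since the epoch); the time zone plays no role. */
    static boolean sameInstant(Date expected, Date actual) {
        return expected.equals(actual);
    }

    /** Renders the date with Date.toString() under the given default time zone, then restores it. */
    static String toStringIn(Date date, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Date(date.getTime()).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        Date expected = new Date(1262304000000L); // 2010-01-01T00:00:00Z
        Date actual = new Date(1262304000000L);
        System.out.println(sameInstant(expected, actual));
        System.out.println(toStringIn(actual, "UTC"));
        System.out.println(toStringIn(actual, "America/Vancouver"));
    }
}
```

An assertion on the Date values themselves passes on any machine, while an assertion on the toString() output passes only where the default time zone matches the hard-coded string.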

RFE: add more values to DataProductType

Proposal from HST Archive Coordination Meeting to support additional values/details.

Currently allows ObsCore values plus catalog. We should consider moving to a more loosely coupled vocabulary to allow for extensions.

caom2-compute: WCS validator should check for restfrq|restwav if spectral axis is velocity

restfrq and restwav are technically optional, but if the spectral ctype is velocity then one of them is needed or the validation fails in an obscure way.

The validator should check that one is provided when the axis is velocity and throw an exception with a good error message like "one of restfrq or restwav is required for axis with ctype={the ctype value}"
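
A rough sketch of such a check follows. It is not the caom2-compute validator: the class and method names are hypothetical, and the list of ctype prefixes treated as velocity (VRAD, VOPT, VELO) is an assumption that the real validator may define differently.

```java
import java.util.Locale;

/** Hypothetical sketch of the proposed restfrq/restwav check. */
final class RestValueCheck {
    // Assumption: these FITS spectral ctype prefixes are the ones treated as velocity.
    private static final String[] VELOCITY_PREFIXES = {"VRAD", "VOPT", "VELO"};

    static boolean isVelocity(String ctype) {
        String upper = ctype.toUpperCase(Locale.ROOT);
        for (String prefix : VELOCITY_PREFIXES) {
            if (upper.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Throws if the axis is velocity and neither rest value is provided (null means absent). */
    static void validate(String ctype, Double restfrq, Double restwav) {
        if (isVelocity(ctype) && restfrq == null && restwav == null) {
            throw new IllegalArgumentException(
                "one of restfrq or restwav is required for axis with ctype=" + ctype);
        }
    }
}
```

Running the check before the WCS computation would replace the obscure downstream failure with the message proposed above.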

RFE: add concept of logical plane identifier

This is an identifier that is generated by the content originator and kept intact at mirror sites. The CAOM publisherID (publisher dataset identifier) is specific to each data centre.

restrict Algorithm.name values for simple observations

There are some de facto standard algorithm names in use (exposure) and a few others that could be restricted for use with simple observations only (simulation was the one that prompted allowing other names in the first place).

add Range subclass of Shape to conform to DAL standards

CutoutUtil.getPositionBounds(SpatialWCS, Shape) throws an UnsupportedOperationException

ultimately needed for complete IVOA SODA support

possible solution: add the class to the java library to support SODA but don't add it to the model since it isn't needed to describe data bounds?

Extend Observation Intent Type

STScI has been ingesting observations encapsulating images created by the Office of Public Outreach into our CAOM database. We've needed to use intent_type=science for these, but ideally we would prefer an outreach intent for them.
Partially implemented via PR opencadc/caom2tools#171 but opening an issue here for further discussion.

RFE: add metadata checksum

From thinking about the plan that emerged from the HST Archive Coordination Meeting:

Usage scenario: if site B has a mirror of the caom metadata from site A, they can harvest new/changed observation documents by looking for recent maxLastModified but they cannot feasibly perform a validation that the content they have is correct in detail.

Proposal: add a checksum of the metadata that could be transmitted within the observation document and found via a query that lists observation identifiers, maxLastModified, and the checksum.

SUBARU artifact resolution has invalid hostname

The URL resolution in the caom2-artifact-resolvers/src/main/java/ca/nrc/cadc/caom2/artifact/resolvers/SubaruResolver.java class contains a base URL for the CADC site, but the logic is contained on the CANFAR site.

The base URL should be http://www.canfar.net rather than http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca.

Add lunar keywords

A number of our observatories, including space missions like WISE, record characteristics about the moon. IRTF has the most complete information:

  • LUN_FLI: Fraction Lunar Illumination (FLI) is the percent of the Moon's visible disk illuminated by the sun. Range is 0.0 to 100.0.

  • LUN_LIGHT = The lunar light level based on the lunar elevation (EL) and Fraction Lunar Illumination (FLI) values from JPL Horizons. Values are:

    • dark = 0% <= FLI <25.0%, or Moon Elevation < 0 degrees.
    • gray = 25% <= FLI < 75.0% with Moon Elevation > 0 degrees.
    • bright = 75.0 <= FLI, and Moon Elevation > 0 degrees.
  • LUN_SEP = The lunar separation in degrees of RA,DEC – moon.

  • LUN_EL = The lunar position's Elevation in degrees, +90.0 to -90.0

  • LUN_AZ = The lunar position's Azimuth in degrees. 0-360. 0=North, 90=east.

This seems common enough, especially for ground based observatories, such that at least some of these keywords should be supported.
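
To make the LUN_LIGHT definition concrete, the thresholds listed above can be written as a small function. This is a hypothetical sketch, not IRTF code. The list does not say how to classify an elevation of exactly 0 degrees, so treating it as below the horizon is an assumption of this example.

```java
/** Hypothetical sketch of the LUN_LIGHT classification from FLI (percent) and Moon elevation (degrees). */
final class LunarLightSketch {
    static String classify(double fliPercent, double moonElevationDeg) {
        if (fliPercent < 0.0 || fliPercent > 100.0) {
            throw new IllegalArgumentException("FLI must be in [0, 100]: " + fliPercent);
        }
        // Assumption: elevation of exactly 0 is treated like a Moon below the horizon.
        if (moonElevationDeg <= 0.0 || fliPercent < 25.0) {
            return "dark";
        }
        if (fliPercent < 75.0) {
            return "gray";
        }
        return "bright";
    }

    public static void main(String[] args) {
        System.out.println(classify(10.0, 45.0)); // dark
        System.out.println(classify(50.0, 30.0)); // gray
        System.out.println(classify(90.0, 20.0)); // bright
        System.out.println(classify(90.0, -5.0)); // dark
    }
}
```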

CutoutUtil.initCutout in caom2-compute does not implement correct axis order

The comment in the method is correct but the appending of codes to create the template is only done in typical axis order and not by using values of axis indices in the chunk.

For input with Chunk.naxis=3, energyAxis=1, positionAxis1=2, positionAxis2=3 (legit) the current initCutout would create px,py,ee instead of ee,px,py
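
One way to build the template from the axis indices is sketched below. It is not the CutoutUtil code: the class, method and placeholder are hypothetical, only the position and energy axes are handled, and the codes px, py and ee are the ones used in the example above.

```java
import java.util.Arrays;

/** Hypothetical sketch: place each axis code at the slot given by its 1-based axis index. */
final class CutoutTemplateSketch {
    /** Pass null for an axis that is not present in the chunk. */
    static String template(int naxis, Integer positionAxis1, Integer positionAxis2, Integer energyAxis) {
        String[] codes = new String[naxis];
        Arrays.fill(codes, "--"); // placeholder for axes this sketch does not handle
        put(codes, positionAxis1, "px");
        put(codes, positionAxis2, "py");
        put(codes, energyAxis, "ee");
        return String.join(",", codes);
    }

    private static void put(String[] codes, Integer axis, String code) {
        if (axis != null) {
            codes[axis - 1] = code; // axis indices are 1-based
        }
    }

    public static void main(String[] args) {
        System.out.println(template(3, 2, 3, 1)); // ee,px,py
        System.out.println(template(3, 1, 2, 3)); // px,py,ee
    }
}
```

Because each code is written into the slot named by its axis index, the result follows the order in the chunk rather than the typical order.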
