opencadc / caom2
Common Archive Observation Model
License: GNU Affero General Public License v3.0
The current UML diagram indicates that the members of a CompositeObservation are always SimpleObservation(s).
Allowing other CompositeObservation(s) to be members would more clearly describe the intended aggregation of observations.
The frameinfo=<YYYY-MM-DD>/<FRAMEID> format is being URL-encoded by the SubaruResolver class. When a user requests SUBARU data from the browser, the browser encodes it again, which results in a double-encoded value that is unreadable.
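The double-encoding effect can be reproduced with the standard library alone. The following is a minimal sketch, not the SubaruResolver code; the class name and the frame value are hypothetical, and it assumes Java 10 or later for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates what encoding the same value twice does.
final class DoubleEncodingDemo {

    // Percent-encode a query value (Charset overload, Java 10+).
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Reverse a single round of percent-encoding.
    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "2019-01-01/FRAME0001"; // hypothetical frameinfo value
        String once = encode(raw);            // what a resolver would emit
        String twice = encode(once);          // what results if the client encodes again
        System.out.println(once);
        System.out.println(twice);
        // A server that decodes once gets the single-encoded form, not the raw value.
        System.out.println(decode(twice).equals(raw));
    }
}
```

The slash becomes %2F on the first pass and the percent sign itself becomes %25 on the second, so one round of decoding no longer recovers the original value.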
The -help content suggests that -test does not persist to the database, but in fact -test turns off writing to a file.
Are there cases where fits2caom2 writes to a database anymore?
option A: remove -test, but that might break code that expects -test
option B: change the behaviour so -test still writes the .xml (this might also break current usage)
option C: change the help line to indicate that no file will be written to -out; the impact is likely minimal.
Several of our collections at IRSA have observationid values that include spaces or slashes, and we're running into the following error trying to export them with the python caom2 library:
invalid observation_id: may not contain space ( ), slash (/), escape (\), or percent (%)
I reached out to @pdowler, who said this is to ensure that observation_id does not contain any characters that aren't URI-safe, and suggested I open an issue here for discussion.
We don't use that field for URI population and haven't been able to find any documentation indicating that the observation_id string has any restrictions.
Is the caom2 module possibly being too stringent in how it validates this column?
(FYI we use CAOM 2.3, and the python module caom2 2.3.8.4)
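For discussion, the rule implied by the error message can be written down as a small check. This is only a sketch of that rule in Java, not the code of the python caom2 module; the class and method names are hypothetical, and the forbidden set is taken from the message quoted above.

```java
// Hypothetical helper; mirrors the four characters named in the error message.
final class ObservationIdCheck {

    // space, slash, escape (backslash), percent
    private static final String FORBIDDEN = " /\\%";

    // Returns true when the identifier contains none of the forbidden characters.
    static boolean isValid(String observationID) {
        if (observationID == null || observationID.isEmpty()) {
            return false;
        }
        for (char c : observationID.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("obs-0001"));   // no forbidden characters
        System.out.println(isValid("obs 0001"));   // contains a space
        System.out.println(isValid("2MASS/0001")); // contains a slash
    }
}
```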
While validating CAOM v2.3 XML files, I’ve run into an error which appears to be driven by the schema design: in the SpectralWCS, PolarizationWCS, and TemporalWCS (energy, polarization, and time in Chunk, respectively) there is an axis element that contains a bounds element of type CoordsBound1D. This element consists of samples of start/end points defined by RefCoord, which currently requires both pix and val to be provided. While the physical value (val) should always be present, it’s not clear that a pixel value (pix) will be, and it has been absent in some of the data I’ve been working with.
A simple fix may be to set these as minOccurs="0" in the schema.
Most (but not all) moving object targets have a NAIF ID. Target already has a keywords field, so we can add it there. It might be better for there to be a separate field for it. I am not sure, though.
In caom2-artifact-sync, print the URL in the message field of the END JSON message when there is a failure.
add to xml serialisation ;
also add support in other repos: caom2tools.git (py) and caom2db.git
The artifact URIs for the smoka observation records need to be resolved.
To get data make a POST request as shown in this example:
data requests
To get preview PNG files make a get request on the service as described here:
preview requests
this comes from an archive partners slack discussion started by David Rodrigues
proposal.id
telescope.name
instrument.name
plane.energy.bandpassName
... maybe more
The schema descriptions that appear in the TAP services should tell the user what UNITs quantities in the table are expressed in. e.g. Plane.energy_bounds_lower is described as:
'lower bound on energy axis (barycentric wavelength)' which is helpful, but without knowing what units those are (nm?) the user is left to guess.
One might think that just declaring the entire table as following some convention (cgs? mks?) would be sufficient, but reminding the user with good column descriptions would be more friendly.
The PublisherID constructor in caom2 with the URI argument constructs a resourceID from the path of the argument. However, the URI.getPath() contains the leading slash, which produces the following:
final PublisherID publisherID = new PublisherID(URI.create(PublisherID.SCHEME + "://com.myauth/MYCOLLECTION?OBSID/PRODID"));
System.out.println(publisherID.getResourceID()); // << Outputs ivo://com.myauth//MYCOLLECTION
The double slash MAY not be a problem for some parsers, but breaks tests.
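The leading slash can be seen with java.net.URI alone. The sketch below is an assumption about how a double slash could arise (joining the authority and the path with an extra separator); it does not reproduce the actual PublisherID constructor, and the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch; not the caom2 PublisherID implementation.
final class ResourceIdSketch {

    // Adds a separator even though getPath() already starts with "/".
    static String naive(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + "/" + uri.getPath();
    }

    // Relies on the leading slash that getPath() already provides.
    static String fixed(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = URI.create("ivo://com.myauth/MYCOLLECTION?OBSID/PRODID");
        System.out.println(uri.getPath()); // the path includes the leading slash
        System.out.println(naive(uri));
        System.out.println(fixed(uri));
    }
}
```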
This sort of relates to Issue #66 and the question of cardinality.
Planes are often produced from an ensemble of software, not a single application. In the case of ALMA MS data, in particular, we have MS data that is calibrated using CASA XXX and then split using CASA YYY. The provenance of CASA YYY would tell you that you should use YYY to open these files (MS is not a standard format) but the CASA XXX part is needed to tell you what the calibration system was. In particular CASA XXX is what tells you about calibration trust while CASA YYY part is more about data form. How (if at all) should this be expressed in the provenance?
The current Polygon definition is limited to less than all-sky. A MultiPolygon with two hemispheres could be constructed, but you cannot create an outer simple polygon that contains it.
The URLs produced by the MastResolver are missing the slash between the base URL and the scheme specific part of the URI.
The tests for MastResolver should be corrected so that they fail with this implementation. Then, the implementation should be corrected so that the tests pass.
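A test along these lines would fail against a plain concatenation and pass once a separator is added. This is a sketch only: the base URL, the artifact URI, and the class and method names are all hypothetical, and the real MastResolver may build its URLs differently.

```java
import java.net.URI;

// Hypothetical sketch of joining a base URL and the scheme-specific part of a URI.
final class UrlJoinSketch {

    // Plain concatenation: drops the slash between the two parts.
    static String concatenated(String baseUrl, URI artifactUri) {
        return baseUrl + artifactUri.getSchemeSpecificPart();
    }

    // Inserts exactly one slash, whether or not the base already ends with one.
    static String joined(String baseUrl, URI artifactUri) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return base + "/" + artifactUri.getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        String base = "https://example.org/download";          // hypothetical base URL
        URI artifact = URI.create("mast:HST/product/file.fits"); // hypothetical artifact URI
        System.out.println(concatenated(base, artifact));
        System.out.println(joined(base, artifact));
    }
}
```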
JWST has file based release dates rather than observation based. Would like to be able to optionally add dataRelease to Artifact - we could then roll up to higher levels as needed.
The current plane metadata is roughly:
position: ICRS (deg)
energy: (barycentric) wavelength (m)
time: MJD UTC (d)
polarization: list of states
These coordinate/reference systems and units are now listed in the "interoperable profile".
However, the CADC TAP service also provides some energy columns with frequency values because the typical query cannot be re-written in a way that can be (easily) indexed...
In principle, one could specify places to put values with alternate representations. This is most obvious for wavelength/frequency/energy/velocity, less so for position (icrs/other epochs/galactic?), and then maybe time frames....
It would be nice to track which software and version was used to generate metadata for an observation. This is especially true when sharing metadata between sites.
to refer to the details of the proposal (abstract, document, etc)
some telescopes provide the endpoint
Proposal from the HST Archive Coordination Meeting: add a checksum (probably MD5) to the artifact so that sharing of CAOM observation metadata provides sufficient information for partners to figure out which data files they need to download. In the case of new artifacts, the partner won't have the file (denoted by the Artifact.uri) at all. For changed files, they will detect the change via the checksum. For changed artifact metadata, the partner would examine the artifact because of the timestamp change but can determine from the checksum that they do not need to download the data again.
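The three cases in the proposal reduce to one comparison on the partner side. The following is a minimal sketch using the JDK's MessageDigest; the class and method names are hypothetical, and a real harvester would stream the file rather than hold it in memory.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical partner-side check; not part of the caom2 library.
final class ArtifactChecksumSketch {

    // Lower-case hex MD5 of the given bytes.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // New artifact (no local copy) or changed file (checksum differs): download.
    // Metadata-only change (checksum matches): no download needed.
    static boolean needsDownload(byte[] localContent, String publishedMd5) {
        if (localContent == null) {
            return true;
        }
        return !md5Hex(localContent).equalsIgnoreCase(publishedMd5);
    }

    public static void main(String[] args) {
        byte[] local = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(local));
        System.out.println(needsDownload(local, md5Hex(local)));
        System.out.println(needsDownload(null, md5Hex(local)));
    }
}
```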
From HST Archive coordination meeting: add a detector name.
The Plane.position.bounds Polygon allows for disjoint pieces and holes while IVOA (DALI) polygon must be a simple outer hull.
energy.bounds and time.bounds Interval values are consistent with IVOA (DALI) interval.
Some filters span multiple predefined energy regimes (eg HST filters span optical and UV).
We have a new spectroscopic initiative for JWST work which would require having the observation signal to noise ratio (float value) available in the model. The obvious place is to add it to the plane.metrics class. Would it be possible to get this in v2.4 since it is a pretty simple addition?
The ObsCore data model contains "optional" elements of the form *_calib_status, providing more information about the level of calibration of various axes: spatial, temporal, spectral, and observable. We are likely to include these in the Rubin Observatory's ObsCore tables, basically because the overall calib_level = 1 vs. = 2 distinction doesn't adequately capture the way we create the planned data products, and in particular how the observable (flux-like) axis is calibrated.
In order to maintain the connection with CAOM2, we would like to suggest that the Position, Time, Energy, and Observable objects in the CAOM2 data model each be supplemented with a string-valued calib_status attribute with multiplicity [0..1] (i.e., optional).
Looking at the language in the ObsCore standard (quoted verbatim below) it doesn't seem like the enumeration values for these attributes are sufficiently standardized to be able to force them to be explicit enumerations in CAOM.
Attribute | Short title | Principal | Utype | Suggested values | Description |
---|---|---|---|---|---|
s_calib_status | Type of calibration along the spatial axis | 1 | Char.SpatialAxis.calibrationStatus | uncalibrated, raw, calibrated | A string to encode the calibration status along the spatial axis (astrometry). Possible values could be {uncalibrated, raw, calibrated} and correspond to the Utype Char.SpatialAxis.calibrationStatus. For some observations, only the pointing position is provided (s_calib_status =”uncalibrated”). Some other may have a raw linear relationship between the pixel coordinates and the world coordinates (s_calib_status = ”raw”). |
t_calib_status | Type of time coordinate calibration | 0 | Char.TimeAxis.calibrationStatus | uncalibrated, calibrated, raw, relative | This parameter gives the status of time axis calibration. This is especially useful for time series. Possible values are principally {uncalibrated, calibrated, raw, relative}. This may be extended for specific time domain collections. |
em_calib_status | Type of spectral coord calibration | 0 | Char.SpectralAxis.calibrationStatus | uncalibrated, calibrated, relative, absolute | This attribute of the spectral axis indicates the status of the data in terms of spectral calibration. Possible values are defined in the Characterisation Data Model and belong to {uncalibrated, calibrated, relative, absolute}. |
o_calib_status | Type of calibration for the observable coordinate | 1 | Char.ObservableAxis.calibrationStatus | absolute, relative, normalized, any | This describes the calibration applied on the Flux observed (or other observable quantity). It is a string to be selected in {absolute, relative, normalized, any} as defined in the SSA specification (Tody, Dolensky and al. 2012) in section 4.1.2.10. This list can be extended or updated for instance using an extension mechanism similar to the definition of new UCDs in the IVOA process, following the feedback from implementations of ObsTAP services. |
The following characteristics are in common for all four attributes:
Attribute | Value |
---|---|
Datatype | adql:VARCHAR / Enum string |
Units | NULL |
UCD | meta.code.qual |
Mandatory | 0 |
Index | 0 (except for o_calib_status, where it's "TBD") |
Std | 1 |
Proposal from HST Archive Coordination Meeting: make it easy to exclude members (SimpleObservation) from search results.
If members could be flagged then the query would not need to include a subquery or join to determine this. In addition, such an on-the-fly approach would mean that composites created by someone else and included in the system (eg in the aggregate database at CADC) would cause simple observations to be hidden. A specific flag would mean the provider that curates the simple and composite observations would control this explicitly.
The form of the flag is TBD.
It currently only supports polygons.
e.g. FitsMapperTest.testPopulateObservation, line 461 verifies a java.util.Date value by comparing a hard coded string (in PST) to {Date variable}.toString(). The latter relies on the locally configured timezone so fails if one is not in the pacific timezone.
It's also totally the wrong way to compare Date values, which have a perfectly good equals method.
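The difference between the two comparisons can be shown outside the test suite. This is a minimal sketch, not the FitsMapperTest code; the class and method names are hypothetical. It renders the same instant in two explicitly chosen time zones to stand in for two differently configured machines.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch; shows why string comparison of Date values is fragile.
final class DateCompareSketch {

    // Build the expected value from an unambiguous UTC timestamp.
    static Date parseUtc(String iso) {
        return Date.from(Instant.parse(iso));
    }

    // Render a Date as text in a given zone, as a locally configured machine would.
    static String render(Date date, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss zzz");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date expected = parseUtc("1970-01-01T00:00:00Z");
        Date actual = new Date(0L);
        // Same instant, so equals() holds on any machine.
        System.out.println(expected.equals(actual));
        // The text form depends on the zone, so a hard-coded string only matches in one zone.
        System.out.println(render(actual, "UTC"));
        System.out.println(render(actual, "America/Vancouver"));
    }
}
```

Comparing with equals (or comparing getTime() values) checks the instant itself and is independent of the machine's configured time zone.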
Proposal from HST Archive Coordination Meeting to support additional values/details.
Currently allows ObsCore values plus catalog. We should consider moving to a more loosely coupled vocabulary to allow for extensions.
restfrq and restwav are technically optional, but if the spectral ctype is velocity then one of them is needed or the validation fails in an obscure way.
The validator should check that one is provided when the axis is velocity and throw an exception with a good error message like "one of restfrq or restwav is required for axis with ctype={the ctype value}"
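The requested check could look roughly like the sketch below. It is not the caom2 validator: the class and method names are hypothetical, and the set of ctype codes treated as velocity (VELO, VRAD, VOPT, matched on the first four characters) is an assumption that the real validator may define differently.

```java
// Hypothetical sketch of the proposed validation rule.
final class SpectralRestCheck {

    // Assumption: these FITS spectral codes denote a velocity axis.
    static boolean isVelocity(String ctype) {
        if (ctype == null) {
            return false;
        }
        // Ignore any algorithm suffix such as "-F2W".
        String code = ctype.length() >= 4 ? ctype.substring(0, 4) : ctype;
        return code.equals("VELO") || code.equals("VRAD") || code.equals("VOPT");
    }

    // Throws with an explicit message when a velocity axis has neither rest value.
    static void validate(String ctype, Double restfrq, Double restwav) {
        if (isVelocity(ctype) && restfrq == null && restwav == null) {
            throw new IllegalArgumentException(
                "one of restfrq or restwav is required for axis with ctype=" + ctype);
        }
    }

    public static void main(String[] args) {
        validate("WAVE", null, null);           // not velocity: nothing required
        validate("VRAD", 1.420405751e9, null);  // velocity with restfrq: accepted
        try {
            validate("VOPT-F2W", null, null);   // velocity without either value
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```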
This is technically possible, but we should define a common practice for the URI structure so that other users can, in principle, understand the URI and do something useful.
The new URL should look like:
http://archive1.dm.noao.edu:7003/?fileRef=$image
instead of:
http://nsaserver.sdm.noao.edu:7003/?fileRef=$image
This is an identifier that is generated by the content originator and kept intact at mirror sites. The CAOM publisherID (publisher dataset identifier) is specific to each data centre.
For ALMA, product planes could refer to separate targets and the target name at the observation level is not sufficient.
There are some de facto standard algorithm names in use (exposure) and a few others that could be restricted for use with simple observations only (simulation was the one that prompted allowing other names in the first place).
Chunk.naxis=3
positionAxis1=null
positionAxis2=null
energyAxis=null
timeAxis=3
time =
This should fail validation because naxis=3 but nothing is assigned to axes 1 and 2.
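One way to express the missing check is to list the axis indices in 1..naxis that no coordinate axis claims. This is a sketch under that interpretation; the class and method names are hypothetical and it is not the caom2 validation code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a naxis consistency check.
final class ChunkAxisCheck {

    // Returns the indices in 1..naxis that are not assigned; null entries mean "not set".
    static List<Integer> unassignedAxes(int naxis, Integer... assigned) {
        boolean[] seen = new boolean[naxis + 1];
        for (Integer axis : assigned) {
            if (axis == null) {
                continue;
            }
            if (axis < 1 || axis > naxis) {
                throw new IllegalArgumentException("axis index " + axis + " outside 1.." + naxis);
            }
            if (seen[axis]) {
                throw new IllegalArgumentException("axis index " + axis + " assigned twice");
            }
            seen[axis] = true;
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = 1; i <= naxis; i++) {
            if (!seen[i]) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Argument order: positionAxis1, positionAxis2, energyAxis, timeAxis
        List<Integer> missing = unassignedAxes(3, null, null, null, 3);
        System.out.println(missing); // a non-empty list means validation should fail
    }
}
```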
FDEP is Faraday Depth
RM is Rotation Measure
Radio cubes from the GMIMS project at DRAO have CTYPE3=RM (rotation measure) and FDEP is another usage that we need to support.
DerivedObservation would include composites (eg stacks), but also:
A Chunk with null naxis fails validation, but the model allows naxis to be null.
Correct behaviour: null naxis is allowed; the presence of other metadata just describes the "blob"
CutoutUtil.getPositionBounds(SpatialWCS, Shape) throws an UnsupportedOperationException
ultimately needed for complete IVOA SODA support
possible solution: add the class to the java library to support SODA but don't add it to the model since it isn't needed to describe data bounds?
STScI has been ingesting observations encapsulating images created by the Office of Public Outreach into our CAOM database. We've needed to use intent_type=science for these, but ideally we would prefer an outreach intent for them.
Partially implemented via PR opencadc/caom2tools#171 but opening an issue here for further discussion.
From thinking about the plan that emerged from the HST Archive Coordination Meeting:
Usage scenario: if site B has a mirror of the caom metadata from site A, they can harvest new/changed observation documents by looking for recent maxLastModified but they cannot feasibly perform a validation that the content they have is correct in detail.
Proposal: add a checksum of the metadata that could be transmitted within the observation document and found via a query that lists observation identifiers, maxLastModified, and the checksum.
For example:
keyword is a phrase with spaces
keyword has special characters like single-tick, quote, etc...
Plane.position.resolution [0..1] Interval
Plane.energy.resolvingPower [0..1] Interval
Plane.time.resolution [0..1] Interval
The URL resolution in the caom2-artifact-resolvers/src/main/java/ca/nrc/cadc/caom2/artifact/resolvers/SubaruResolver.java class contains a base URL for the CADC site, but the logic is contained on the CANFAR site.
The base URL should be http://www.canfar.net rather than http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca.
A number of our observatories, including space missions like WISE, record characteristics about the moon. IRTF has the most complete information:
LUN_FLI: Fraction Lunar Illumination (FLI) is the percent of the Moon's visible disk illuminated by the sun. Range is 0.0 to 100.0.
LUN_LIGHT = The lunar light level based on the lunar elevation (EL), and Fraction Lunar Illumination (FLI) values from JPH Horizon. Values are:
LUN_SEP = The lunar separation in degrees of RA,DEC – moon.
LUN_EL = The lunar position's Elevation in degrees, +90.0 to -90.0
LUN_AZ = The lunar position's Azimuth in degrees. 0-360. 0=North, 90=east.
This seems common enough, especially for ground based observatories, such that at least some of these keywords should be supported.
The comment in the method is correct but the appending of codes to create the template is only done in typical axis order and not by using values of axis indices in the chunk.
For input with Chunk.naxis=3, energyAxis=1, positionAxis1=2, positionAxis2=3 (legit) the current initCutout would create px,py,ee instead of ee,px,py
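Ordering the codes by the axis indices stored in the chunk, rather than by a fixed sequence, gives the expected template. The following is a minimal sketch and not the actual initCutout method: the class and method names are hypothetical, the codes px, py, and ee come from the example above, and the codes tt and pp for time and polarization are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of building the cutout template in chunk axis order.
final class CutoutOrderSketch {

    private static void put(Map<Integer, String> byIndex, Integer axis, String code) {
        if (axis != null) {
            byIndex.put(axis, code);
        }
    }

    // Null means the axis is not present in the chunk.
    static String template(Integer positionAxis1, Integer positionAxis2,
                           Integer energyAxis, Integer timeAxis, Integer polarizationAxis) {
        // TreeMap iterates in ascending key order, i.e. in axis index order.
        Map<Integer, String> byIndex = new TreeMap<>();
        put(byIndex, positionAxis1, "px");
        put(byIndex, positionAxis2, "py");
        put(byIndex, energyAxis, "ee");
        put(byIndex, timeAxis, "tt");          // assumed code
        put(byIndex, polarizationAxis, "pp");  // assumed code
        return String.join(",", byIndex.values());
    }

    public static void main(String[] args) {
        // energyAxis=1, positionAxis1=2, positionAxis2=3
        System.out.println(template(2, 3, 1, null, null));
    }
}
```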