CfRadial

NetCDF CF Conventions for RADAR and LIDAR data in polar coordinates.

Overview

[Overview document](./docs/CfRadialOverview.20170201.pdf)

[AGU poster 2016](./presentations/AGU_2016.poster_IN23A-1761.cfradial.pdf)

[Earthcube workshop](./reports/Workshop.20160525.md)

Timeline - CfRadial Version 1 - Classic model

| Date | Activity |
| --- | --- |
| 2016/08/01 | [Version 1.4](./current_docs/CfRadialDoc.v1.4.20160801.pdf) |
| 2013/07/01 | [Version 1.3](./old_docs/CfRadialDoc.v1.3.20130701.pdf) |
| 2011/06/07 | [Version 1.2](./old_docs/CfRadialDoc.v1.2.20110607.pdf) |
| 2011/02/15 | [Version 1.1](./old_docs/CfRadialDoc.v1.1.20110215.pdf) |
| 2010 | [Initial submission to CF](http://cf-trac.llnl.gov/trac/ticket/59) |

Timeline - CfRadial Version 2 - NetCDF4 with Groups

| Date | Activity |
| --- | --- |
| 2018 | [Version 2.0](./current_docs/CfRadialDoc-v2.0-20180430.pdf) |
| 2017 | [Version 2.0 draft](./current_docs/CfRadialDoc.v2.0.draft.20170308.pdf) |

Standard Names for CfRadial

| Date | Activity |
| --- | --- |
| 2018 | [Version 2.0](./current_docs/CfRadialStandardNames.20180615.pdf) |
| 2017 | [Version 1.0](./old_docs/CfRadialStandardNames.20171222.pdf) |

This page is published at:

https://ncar.github.io/CfRadial


Issues

Need to clarify NetCDF type for string attributes

The current CfRadial 2 draft (2019-02-03) specifies the type of attributes as one of string, int, float, double, string[], or an array of the same type as the field data.

For most of these types the mapping to a concrete NetCDF data type is obvious:

  • int --> NC_INT
  • float --> NC_FLOAT
  • double --> NC_DOUBLE

For the string attributes things are a little more complicated, because the NetCDF API provides two functions for writing string-based attributes: nc_put_att_text and nc_put_att_string.

The nc_put_att_text function writes the attribute as a 1D array of NC_CHAR. This is the traditional and most common way to write a scalar string attribute.

The nc_put_att_string function writes the attribute as a NC_STRING. This API allows us to output arrays of strings, and is thus the only option for our string[] attributes.

These two methods of writing strings result in fundamentally different types in the output file. The difference is visible in ncdump output:

  	short KDP(time, range) ;
  		string KDP:ancillary_variables = "foo" ;
  		KDP:legend_xml = "bar" ;

Here the ancillary_variables attribute was written with nc_put_att_string (with a single string passed), while legend_xml was written with nc_put_att_text.

We probably need to clarify that string in the CfRadial 2 specification maps to the traditional array of NC_CHAR as output by nc_put_att_text, while string[] maps to an array of the NC_STRING type as output by nc_put_att_string.
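For concreteness, a minimal C sketch of the two calls that produce the ncdump output above (error handling omitted; ncid and the KDP varid are assumed to exist already):

```c
#include <netcdf.h>
#include <string.h>

/* Sketch only: the two ways of writing the attributes shown in the
   ncdump fragment above. */
static void write_string_attrs(int ncid, int kdp_varid)
{
    /* nc_put_att_text: scalar string stored as a 1D array of NC_CHAR.
       This is what plain "string" in the spec would map to. */
    const char *legend = "bar";
    nc_put_att_text(ncid, kdp_varid, "legend_xml", strlen(legend), legend);

    /* nc_put_att_string: attribute of type NC_STRING; the only option for
       string[] attributes, and what makes ncdump print the leading
       "string" keyword. */
    const char *ancillary[1] = { "foo" };
    nc_put_att_string(ncid, kdp_varid, "ancillary_variables", 1, ancillary);
}
```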

Issues with range coordinate variable in sweep group

A couple of minor issues with the range coordinate variable specification in section 5.2.2 (page 25):

  1. The long_name attribute value is styled as a standard_name. Long names are intended to be human readable (suitable for plot / axis titles). Suggest we change it to "Range to measurement volume".
  2. The spacing_is_constant, meters_to_center_of_first_gate and meters_between_gates variables are technically redundant, since this data can be obtained from the range array itself (a sketch of doing so follows this list). I'm worried about what happens if these values disagree with the array. Do we really need them?
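To illustrate the redundancy in point 2, a rough C sketch of recovering all three values from the range coordinate itself (the tolerance used here is arbitrary):

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch: derive the gate geometry from the range coordinate (range[]
   in metres, n_gates >= 2) instead of relying on the spacing_is_constant /
   meters_to_center_of_first_gate / meters_between_gates metadata. */
static void gate_geometry(const double *range, size_t n_gates,
                          double *first_gate, double *spacing,
                          bool *spacing_is_constant)
{
    *first_gate = range[0];
    *spacing = range[1] - range[0];
    *spacing_is_constant = true;
    for (size_t i = 2; i < n_gates; i++) {
        if (fabs((range[i] - range[i - 1]) - *spacing) > 1e-3) {
            *spacing_is_constant = false;
            break;
        }
    }
}
```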

No way to enumerate field datasets within a sweep group

The variables containing moment data ('field variables') are stored directly within the group of the containing sweep. Other variables containing metadata related to the sweep, rays or bins are also stored directly within the sweep group. For example, sweep_number, azimuth, range and pulse_width.

As a result of this, there is no easy way to discover or loop through all moments within a sweep.

All other levels of the data model support discovery / enumeration of child objects:

  • Sweeps can be discovered via the sweep_group_names global variable
  • Spectrum datasets can be discovered via the spectrum_group_names variable within each sweep

To support discovery of moments within a sweep, we could reuse the mechanism established above:

  • A new dimension called field, added to the sweep group. The dimension should be required, and should be unlimited so that new fields can be inserted later.
  • A new variable called field_names, added to the sweep group and dimensioned by field. This variable would be of string type and would contain the name of each moment/field variable in the sweep (a sketch of the proposed layout follows this list).
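A minimal C sketch of how a writer might create the proposed dimension and variable (field and field_names are the names proposed above, not part of the current spec; error handling omitted):

```c
#include <netcdf.h>
#include <stddef.h>

/* Hypothetical: define the proposed "field" dimension and "field_names"
   variable inside an existing sweep group (sweep_ncid); names[] holds one
   entry per field variable in the sweep. */
static void define_field_names(int sweep_ncid, const char **names,
                               size_t n_fields)
{
    int field_dimid, field_names_varid;

    /* Unlimited, so that new fields can be appended later. */
    nc_def_dim(sweep_ncid, "field", NC_UNLIMITED, &field_dimid);
    nc_def_var(sweep_ncid, "field_names", NC_STRING, 1, &field_dimid,
               &field_names_varid);

    size_t start = 0, count = n_fields;
    nc_put_vara_string(sweep_ncid, field_names_varid, &start, &count, names);
}
```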

Specification of latitude, longitude and altitude variables

The site latitude, longitude and altitude variables don't currently specify the ellipsoid they reference.

I'm particularly worried about the altitude variable. The comment states that the value is 'above mean sea level'. In my experience the values for lat/lon/alt used in site metadata are extracted from GPS instrumentation, generally as WGS84 coordinates. There can be up to 100m difference between WGS84 and MSL heights (according to Wikipedia).

Questions:

  1. Should we specify that all lat/lon/alt values in the standard are relative to a reference ellipsoid?
  2. Should we nominate a fixed reference ellipsoid (WGS84) or allow the user to nominate one through metadata?
  3. Should we change the description of altitude from 'above mean sea level' to 'above the reference ellipsoid'?

Sweep time coordinate relative to time_coverage_start may be problematic

The time coordinate variable for a sweep is currently described as "Time at center of each ray, in fractional seconds since time_coverage_start". I think this specification may cause problems for systems that need to send data on a sweep-by-sweep basis.

A very common use case (for me) is to send a volume as a series of single sweep files. This allows early viewing of the data while the volume is still being scanned. After all sweeps are received, the individual files are combined into a single volume.

Because the ray times are relative to time_coverage_start, and time_coverage_start is the time of the first ray in the file, the times in each sweep file will be relative to a different reference point. When combining sweeps into a single volume file it would be necessary to rewrite all of the time coordinates to be relative to the time_coverage_start of the earliest sweep.

I would like to propose that we remove the "in fractional seconds since time_coverage_start" part of the description for this variable in section 5.2 (page 24). The variable already has the units well specified as 'seconds since YYYY-MM-DD hh:mm:ss'. The user should be free to use any appropriate timestamp in this unit string rather than being forced to use the time corresponding to time_coverage_start.

In the sweep-by-sweep scenario above this would allow files to be combined without forcing a translation of all time coordinates to be relative to the earliest sweep.
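As an illustration, a C sketch of a per-sweep time coordinate whose units string carries its own epoch, chosen by the writer of that particular sweep file (the epoch shown is arbitrary):

```c
#include <netcdf.h>
#include <string.h>

/* Sketch: time coordinate anchored to an epoch chosen by the writer of
   this sweep file, rather than to time_coverage_start. */
static void define_sweep_time(int sweep_ncid, int time_dimid)
{
    int time_varid;
    const char *units = "seconds since 2019-02-03 00:12:34";

    nc_def_var(sweep_ncid, "time", NC_DOUBLE, 1, &time_dimid, &time_varid);
    nc_put_att_text(sweep_ncid, time_varid, "units", strlen(units), units);
}
```

Because each sweep group keeps its own time variable and units attribute, merging sweeps into a volume file would not require rewriting any time values, only honouring the per-sweep units strings.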

Unnecessary `calib_index` variable in `radar_calibration` sub-group

The table in section 7.3.2 (page 41) lists variables for the radar_calibration subgroup. The first entry in this table is calib_index, which is incorrect.

The calib_index variable is part of the per-ray metadata and is used to identify the calibration used for any given ray. It is (correctly) specified as part of the sweep group metadata under section 5.3 (page 27). It is also listed using the dimension time, which is impossible since the time dimensions are scoped to each sweep, while radar_calibration is scoped to the volume.

We should delete the calib_index row from the table in section 7.3.2.

Type of time_coverage_start and time_coverage_end

The time_coverage_start and time_coverage_end variables are currently specified as strings, which is not very "CF" like. The standard way of representing a time is as a numerical offset from an epoch.

I would like to change the type of these variables from string to double and specify them as times in the traditional CF manner. This allows the variables to be used with existing functions for dealing with times in NetCDF files.

The new specification could be as type double, with units 'seconds since YYYY-MM-DD HH:MM:SS UTC', where the exact YYYY-MM-DD HH:MM:SS time is up to the user.
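A minimal C sketch of what that could look like (the epoch shown is only an example; error handling omitted):

```c
#include <netcdf.h>
#include <string.h>

/* Sketch of the proposed change: time_coverage_start / time_coverage_end
   as scalar doubles with CF time units, instead of strings. */
static void define_time_coverage(int root_ncid)
{
    int start_varid, end_varid;
    const char *units = "seconds since 1970-01-01 00:00:00";

    nc_def_var(root_ncid, "time_coverage_start", NC_DOUBLE, 0, NULL,
               &start_varid);
    nc_def_var(root_ncid, "time_coverage_end", NC_DOUBLE, 0, NULL,
               &end_varid);
    nc_put_att_text(root_ncid, start_varid, "units", strlen(units), units);
    nc_put_att_text(root_ncid, end_varid, "units", strlen(units), units);
}
```

With CF-style units in place, 'ncdump -t' displays the values as calendar date-times, so human readability is not lost.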

As part of this change I would also like to remove the comment that the 'T' character in the ISO8601 spec is optional and can be replaced by any character. As far as I can tell the character may only be 'T' or whitespace.

The duplication of these variables as attributes is also a bit awkward. When writing an API to deal with the file, should you read the attribute or the variable? What should happen when they don't agree? I think this was originally done for convenience of users, but I'm not sure it's necessary. If using the standard time representation above, the value can easily be viewed by users with 'ncdump -t'.

To summarize:

  1. Can we change time_coverage_start and time_coverage_end from string to double with the standard CF time units?
  2. Can we remove the comment that the date/time separator 'T' can be any character? It should be 'T' or a space only.
  3. Can we eliminate the duplicated attributes which introduce the possibility of self-inconsistency in the file by having different variable and attribute values?

Minimise use of XML

There are three places where we specify the use of XML packed into strings. These are status_xml, thresholding_xml and legend_xml. Having XML packed into strings introduces an extra language that must be parsed by users and makes the format more complex than needed.

It would be nice to remove XML entirely and just rely on the built-in metadata functionality already provided by NetCDF. Here are my thoughts:

status_xml:

  • Rename this variable to status
  • Remove the reference to XML from the description
  • The contents of this field are free for the user to specify, so why limit them to XML? This would allow them to store BITE messages, JSON, YAML, or whatever is appropriate for their system.

legend_xml:

  • Replace it with two attributes, legend_values and legend_labels
  • legend_values is an integer array (int[]) storing the value for each entry
  • legend_labels is a string array (string[]) storing the label for each entry (a sketch follows this list)
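A minimal C sketch of writing the two proposed attributes (the attribute names legend_values / legend_labels and the example entries are hypothetical, not current spec):

```c
#include <netcdf.h>

/* Sketch of the proposed replacement for legend_xml. */
static void write_legend(int ncid, int varid)
{
    const int   values[3] = { 0, 1, 2 };
    const char *labels[3] = { "no_echo", "rain", "hail" };

    nc_put_att_int(ncid, varid, "legend_values", NC_INT, 3, values);
    nc_put_att_string(ncid, varid, "legend_labels", 3, labels);
}
```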

thresholding_xml:

  • One idea would be to use a similar approach to the suggestion above, but with threshold_field, threshold_min and threshold_max. Each would be an array giving the name of the field used to threshold this field, along with the min/max limits.
  • Unfortunately this doesn't cater for complex conditions like the one mentioned in the note field of the XML example ("NCP only checked if DBZ > 40"). On the other hand, since this note is stored as a comment it's also not well represented in the XML (and is using an illegal '>' character).
  • Another idea would be to specify the thresholding attribute as a simple string. The example would encourage using basic mathematical syntax to combine values from the fields. So the value from the example would be "SNR > -3.0 and (DBZ < 40 or NCP > 0.15)".

Inconsistency in prefixing of metadata with `radar_` and `lidar_`

We have several optional groups that are specified at the global level as a way to cater for radar or lidar specific metadata. These are radar_parameters, lidar_parameters, radar_calibration, and lidar_calibration.

All of the variables specified under radar_parameters are prefixed with radar_. For example radar_antenna_gain. However under radar_calibration we do not prefix variable names, for example receiver_gain_hc.

Since the variables stored within these groups are already known to relate to radar or lidar by virtue of the group name, the use of a radar_ prefix is redundant.

Can we remove the radar_ and lidar_ prefixes from the variable names in the two parameter tables (sections 7.1 and 7.2)?

Move `frequency` variable into `radar_calibration` subgroup?

We currently have a frequency variable defined at the global level. To support the rare case that multiple frequencies are used this variable is defined as a 1D array, which means we are also forced to have a dedicated frequency dimension.

For 99% of users, having frequency be an array is an annoying detail to trip over. Since this variable is not linked to anything else, we also provide no way for users to determine what data/metadata is associated with each frequency.

I would like to propose moving frequency from the global level into the radar_calibration subgroup. This has the advantage of no longer requiring a dedicated dimension to support a single variable. It also unambiguously associates the other calibration parameters with a particular frequency when multiple are used.

For dual-wavelength radars where the same geometry is scanned using two frequencies simultaneously the suggested approach would be to store the data from each wavelength in a separate sweep. The geometry of the two sweeps would be identical. Something like:

group: sweep_0.5_wl1 {
  dimensions:
    time = 360;
    range = 512;
  variables:
    double time(time);
    double range(range);
    int calib_index(time);      // all values set to 0
    byte DBZH(time, range);     // reflectivity from frequency 1
}
group: sweep_0.5_wl2 {
  dimensions:
    time = 360;
    range = 512;
  variables:
    double time(time);          // exact same values as the wl1 sweep
    double range(range);        // exact same values as the wl1 sweep
    int calib_index(time);      // all values set to 1
    byte DBZH(time, range);     // reflectivity from frequency 2
}
group: radar_calibration {
  dimensions:
    calib = 2;
  variables:
    float frequency(calib);     // lists the frequencies
}
