amf-check-writer's People

Contributors

agstephens, gapintheclouds, joesingo

amf-check-writer's Issues

Get last part of integrity checking in place.

download-from-drive:

  • check correct spreadsheets produced
  • check correct worksheets produced

create-yaml-checks:

  • check correct yaml checks produced

create-cvs:

  • check correct JSON CVs produced
  • check correct pyessv CVs produced

Looks like global attribute checks are duplicated

See:

$ diff /vagrant-share/amf-compliance-checker-work-2021/check-data-2021-08-27/yaml_checks/v2.0/AMF_global_attrs.yml  /vagrant-share/amf-compliance-checker-work-2021/check-data-2021-08-27/yaml_checks/v2.0/AMF_product_common_global-attributes_land.yml
1c1
< suite_name: global_attrs_checks:2.0
---
> suite_name: product_common_global-attributes_land_checks:2.0
3c3
< description: Check 'global_attrs' in AMF files
---
> description: Check 'product common global-attributes land' in AMF files

Apart from the suite name and description, the two files are identical, yet both are included by the main check files (land, sea, air and trajectory):

$ grep attr yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_{land,sea,air,trajectory}.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_product_common_global-attributes_land.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_product_common_global-attributes_sea.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_product_common_global-attributes_air.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_trajectory.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_trajectory.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-att

We need to work out which one should be favoured.

Integer global attr check fails due to regex check against a non-string type

The spreadsheets define a possible type as: "Integer"

However, the regex checks expect everything to be a string. This causes a warning when running the test because the check tries to perform a regular expression match against a non-string type.

The simplest fix is just to patch compliance-check-lib so that it will convert all global attrs to strings if doing a regex check:

(amf)$ cd compliance-check-lib/
(amf)$ git diff
diff --git a/checklib/code/nc_util.py b/checklib/code/nc_util.py
index 66553b2..41672e6 100644
--- a/checklib/code/nc_util.py
+++ b/checklib/code/nc_util.py
@@ -77,7 +77,9 @@ def check_global_attr_against_regex(ds, attr, regex):
     """
     if attr not in ds.ncattrs():
         return 0
-    if not re.match("^{}$".format(regex), getattr(ds, attr), re.DOTALL):
+
+    # Always coerce the attribute to a string to do the regex check
+    if not re.match("^{}$".format(regex), str(getattr(ds, attr)), re.DOTALL):
         return 1
     # Success
     return 2

NOTE: The test code for seeing if you get a warning is:

export DATA_DIR=check-data-2021-09-15
VERSION=v2.0
export PYESSV_ARCHIVE_HOME=$DATA_DIR/$VERSION/pyessv-vocabs
TEST_FILE=../NCAS-Data-Project-Training-Data/Data/ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc
amf-checker --yaml-dir $DATA_DIR/$VERSION/checks $TEST_FILE --version $VERSION
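
For reference, the behaviour of the patched check can be reproduced without a NetCDF file. This is only a minimal sketch: the plain dict stands in for the dataset's global attributes, and the function name and example attributes are made up for illustration rather than taken from compliance-check-lib.

```python
import re


def check_attr_against_regex(attrs, attr, regex):
    """Return 0 if the attribute is missing, 1 if it fails the regex, 2 on success."""
    if attr not in attrs:
        return 0
    # Coerce to str first: re.match raises TypeError when given an int
    value = str(attrs[attr])
    if not re.match("^{}$".format(regex), value, re.DOTALL):
        return 1
    return 2


# Hypothetical attributes: one stored as an integer, one as a string
attrs = {"year": 2021, "title": "Mean winds"}
print(check_attr_against_regex(attrs, "year", r"\d{4}"))
```

The return codes mirror the 0/1/2 convention visible in the diff above.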

Rule for "Exact match in vocabulary" - develop and test

A new compliance checking rule for global attributes called "Exact match in vocabulary" needs to check whether the attribute value exists in the corresponding vocabulary.

For example, in the 'global_attributes' sheet of '_common.xlsx', the check called 'source' needs an "Exact match in vocabulary" against the list in the 'Descriptor' column of the 'ncas_instrument_name_and_descriptors' sheet (or, in this case, possibly also the 'community-instrument-name-and-descriptors' sheet) in '_vocabularies.xlsx'.

The file I've been testing with is the training file 'ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc'. For this file the source global attribute needs to equal "NCAS Mechanicle Anemometer unit 1" as given in cell L37.
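
The core of the rule could look roughly like the sketch below. It assumes the descriptors have already been read out of the vocabulary sheets into plain lists; the function name and the placeholder descriptor values are hypothetical and not part of the existing code.

```python
def exact_match_in_vocabulary(value, *vocabularies):
    """Return True only if value is exactly equal to an entry in any vocabulary."""
    allowed = set()
    for vocabulary in vocabularies:
        allowed.update(vocabulary)
    # Exact comparison: no case folding, trimming or partial matching
    return value in allowed


# Placeholder descriptor lists standing in for the two vocabulary sheets
ncas_descriptors = ["Example NCAS instrument unit 1"]
community_descriptors = ["Example community instrument"]

print(exact_match_in_vocabulary(
    "Example NCAS instrument unit 1", ncas_descriptors, community_descriptors))
```

Accepting several lists keeps the option open of checking 'source' against both the NCAS and the community instrument sheets.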

Cannot use relative path for spreadsheets dir in create-cvs

Something is going wrong when using a relative path for spreadsheets dir with create-cvs:

(venv) vagrant@localhost:/vagrant/AMF_CVs$ create-cvs spreadsheets ./AMF_CVs
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Air.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Land.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Sea.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Air.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Land.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Sea.tsv'

It is looking under spreadsheets/spreadsheets instead of just spreadsheets. Using an absolute path works as expected.

This may also affect create-yaml-checks.
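
Since absolute paths work, one possible workaround is to resolve the directory arguments once, before anything else joins onto them. The cause of the doubled path has not been tracked down, so this is a sketch of that workaround and not a confirmed fix; `resolve_dir` is a hypothetical helper.

```python
import os


def resolve_dir(path, start=None):
    """Return an absolute, normalised path, resolving relative paths against start."""
    start = os.getcwd() if start is None else start
    # os.path.join discards start when path is already absolute
    return os.path.normpath(os.path.join(start, path))


print(resolve_dir("spreadsheets", "/vagrant/AMF_CVs"))
```

Calling this on both the spreadsheets directory and the output directory at the start of the script would mean every later join works from an absolute base.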

Get amf-checker installed and working

Get amf-checker working. Stages are:

  • Get Python 3.7 running on Linux
  • Set up virtual environment for project
  • Fork amf-checker to your github
  • Clone your amf-checker
  • Modify requirements.txt to non-specific versions
  • pip install -r requirements.txt - to install all dependencies.
  • pip install --editable . --no-deps - to install local package as a link so that any local edits are reflected when you run.

Can you now run command-line script: download-from-drive?

Get our plugin working with ioos-compliance-checker.

Once we have a reproducible installation for the compliance-checker, we want to check that our plugin works with it.

The repository is here:

https://github.com/cedadev/cc-yaml

See the README for instructions on getting started.

The measure of success is whether the command-line setup works okay - i.e. the cchecker.py script is accepting our new command-line option:

--yaml <path-to-YAML-file>

Test pyessv with our old vocabularies

@gapintheclouds Assuming you have your base environment activated. Try:

git clone https://github.com/ES-DOC/pyessv

export PYESSV_ARCHIVE_HOME=<YOUR_WORKING_DIR>/compliance-check-lib/cc-vocab-cache/pyessv-archive-eg-cvs

Then save this in a script (test-pyessv.py) and see if it runs okay with python test-pyessv.py:

import pyessv

# Authority: ncas.
ncas = pyessv.NCAS
assert isinstance(ncas, pyessv.Authority)

print('ncas/amf/flux-components-variable/air-pressure')

amf = ncas.amf
assert isinstance(amf, pyessv.Scope)

fcv = amf.flux_components_variable
assert isinstance(fcv, pyessv.Collection)

air_pressure = fcv.air_pressure
assert isinstance(air_pressure, pyessv.Term)

assert pyessv.load('ncas:amf:flux-components-variable') == fcv

assert pyessv.load('ncas:amf:flux-components-variable:air-pressure') == air_pressure

It seems to work fine for me.

Get all tests working and add more for: compliance-check-lib

The tests need checking and updating.

  • In particular we need to get this working...
cd compliance-check-lib/
python -m pytest tests
  • Update the test data to test the source attribute when comparing to multiple vocab lists (NCAS and community instruments, accessed via data:description properties).
  • Add more tests as examples of specific kinds of checks that we have been working on. The tests should have enough annotation (i.e. comments) to guide somebody creating YAML checks in how to write them for a specific check class. So each possible option should be included as an example in the tests.

Tagging: @gapintheclouds

Invalid rows now show a WARNING - need to resolve in google drive

I have added a warning line in the output (to stderr) when the parser cannot understand a row (normally because the compliance checking rule is not defined yet). We can use these to retrospectively fix the spreadsheets on the google drive.
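
The pattern is roughly as sketched below. This is an illustration only, not the code that was pushed: the row layout, the "rule" key and the set of known rules are all assumptions.

```python
import sys


def filter_valid_rows(rows, known_rules, stream=None):
    """Return rows with a recognised rule; warn about the others on stream."""
    stream = sys.stderr if stream is None else stream
    valid = []
    for number, row in enumerate(rows, start=1):
        rule = row.get("rule")
        if rule not in known_rules:
            # Report and skip the row instead of aborting the whole run
            print("WARNING: cannot parse row {}: unknown rule {!r}".format(number, rule),
                  file=stream)
            continue
        valid.append(row)
    return valid


rows = [{"name": "source", "rule": "String: min 2 characters"},
        {"name": "title", "rule": "Not yet defined"}]
print(filter_valid_rows(rows, {"String: min 2 characters"}))
```

Writing warnings to stderr keeps them separate from normal output, so they can be collected into a list of spreadsheet rows to fix.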

I will push to master.

Support and document different installation and usage modes

We are dependent on the IOOS compliance checker. It has a number of its own dependencies. We need a standard python environment to install it into.

Tasks:

  1. Start at current version: 4.3.3
  2. Try: Python3.8 (we might have to go back to Python3.7 if there are problems).
  3. Let's go for installing with Conda following recipes below.

Installation recipes:

Check spreadsheet compliance check rules for data product in v2.0

Updates to spreadsheets (progress):

Three of them still need attention. @barbarabrooks - please can you take a look at those 3.

Use of `<>` brackets in variable names

Currently the voc-concentration.xlsx spreadsheet contains the variable name mole_fraction_of_<voc_species_name>_in_air. This needs to be addressed by adding a system that can cope with this syntax.
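
One way to cope with this syntax would be to treat each `<...>` section as a wildcard and turn the templated name into a regular expression. A minimal sketch, assuming a placeholder may stand for any run of letters, digits and underscores (`template_to_regex` is a hypothetical helper, not an existing function):

```python
import re


def template_to_regex(template):
    """Compile 'a_<x>_b' into a regex where each <...> matches one or more word characters."""
    # Split on the placeholders, keeping only the literal text between them
    literal_parts = re.split(r"<[^<>]+>", template)
    pattern = "[A-Za-z0-9_]+".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + pattern + "$")


pattern = template_to_regex("mole_fraction_of_<voc_species_name>_in_air")
print(bool(pattern.match("mole_fraction_of_benzene_in_air")))
```

Names without any placeholder still work, because the split then returns the whole name as a single literal part.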

The structure of the spreadsheets changed with version 3

VERSION 3.0.0 is different!!!

However, this only needs looking at once it is officially released; it is currently labelled as "WORKING".

https://drive.google.com/drive/u/1/folders/1GNfifCvctYJgTjUoBjfkFKx9yR_ddMjv

Data Products / v3.0.0 /
 comm & vocab/
   _common
   _vocabularies
 timeSeries/
   <instr sub-type>/
     <product>
 timeSeriesProfile/
   <instr sub-type>/
     <product>
 trajectory/
   <instr sub-type>/
     <product>

** We hope that we can just identify the product spreadsheets and
write them to the same flattened structure used in v1 and v2! **

We need to make sure that the spreadsheet scanner can walk the directory structure and download all the spreadsheets that it finds.
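
The walk-and-flatten idea can be sketched against a local copy of the folder tree. The real scanner talks to Google Drive, so `os.walk` is only a stand-in here, and the function name and `.xlsx` suffix are assumptions.

```python
import os


def flatten_spreadsheets(root, suffix=".xlsx"):
    """Map each spreadsheet file name under root to its full path (flat view)."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            if name in found:
                # A flat structure cannot hold two products with the same name
                raise ValueError("Duplicate spreadsheet name: {}".format(name))
            found[name] = os.path.join(dirpath, name)
    return found
```

Raising on duplicate names is a deliberate choice: if two sub-types ever hold a product with the same file name, flattening would silently lose one of them.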

Migrate away from using git submodule (use "roocs" approach)

Git submodule does not behave exactly as we would expect. In particular, the main GitHub repo that holds the submodule keeps track of a specific commit on the submodule's timeline and stays bound to it unless you manage it. To manage it, you might need to:

cd submod/
git checkout master # or other point
cd ../  # back to main repo
git add submod  # to tell it you want to update commit that is used in submodule
git commit -m 'Updated submodule commit point'
git push

See details here:

https://intellipaat.com/community/9971/git-update-submodule-to-latest-commit-on-origin#:~:text=The%20git%20submodule%20update%20command,this%20directly%20within%20the%20submodules.

Maybe it would be easier for us to avoid using submodules. Not sure at the moment.
