fernando-aristizabal / gval

A Python framework to evaluate geospatial datasets by comparing candidate and benchmark maps to compute agreement maps and statistics.

License: GNU General Public License v3.0

earth-observation earth-science evaluation evaluation-framework geospatial geospatial-analysis remote-sensing research science statistics

gVal: Geospatial Evaluation Framework

NOTE: Development of this package has migrated to noaa-owp/gval.

gVal (pronounced "g-val") is a high-level Python framework that evaluates the geospatial skill of candidate maps against benchmark maps, producing agreement maps and metrics.

Architecture

Input maps
- Candidates and Benchmarks
  - Including metadata
- Variable name
  - e.g., inundation, land cover, land use, backscatter
- Statistical Data Type
  - Categorical (two- and multi- class)
    - encodings for positive and negative condition values
  - Continuous
- Raster attribute table: associates names to data values
- Data format
  - GDAL compatible vector
  - GDAL compatible raster
- Cataloging standards with metadata
  - modeling parameters
  - time
  - GeoNetwork
  - STAC
- Decide on storage types and in-memory data structures
  - Deserialization methods
  - Especially for metadata (STAC, geoparquet, geojson, etc.)
Comparison Prep
- The following prep operations should be done during the comparison to avoid excessive I/O operations.
  - Check for alignment between candidate and benchmark.
    - spatial
      - CRS
      - extents (reject if no alignment is found)
      - resolution
    - temporal
    - metadata
  - Data Format Check
    - Check for vector and raster data formats
- The following should be done after loading the datasets.
  - Homogenize
    - spatial
      - Reproject
      - Match extents
      - Resample resolutions
    - temporal
      - select temporal mis-alignment criteria (done before loading)
    - metadata
      - select rules for disagreement (done before loading)
  - Statistical Data Type Conversions
    - Pass operator functions both registered and user defined
    - Conversion Types
      - Categorical to binary
      - Continuous to categorical
      - Continuous to binary
  - Data Format Conversion
    - Convert to one consistent data format for comparison
    - Use [colortables](https://rasterio.readthedocs.io/en/latest/topics/color.html)?
    - Include [tags](https://rasterio.readthedocs.io/en/latest/topics/tags.html)?
  - Metadata prep
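
The spatial alignment checks listed above (CRS, extents, resolution) can be sketched in plain Python. This is only an illustration under stated assumptions: `GridSpec`, `extents_overlap`, and `check_spatial_alignment` are hypothetical names, not part of gVal, and a real implementation would read these properties from the loaded raster objects rather than from hand-built records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical summary of a raster's spatial reference (not a gVal type)."""
    crs: str            # e.g. "EPSG:5070"
    bounds: tuple       # (xmin, ymin, xmax, ymax)
    resolution: tuple   # (x pixel size, y pixel size)


def extents_overlap(a, b):
    """Return True when two (xmin, ymin, xmax, ymax) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def check_spatial_alignment(candidate, benchmark):
    """List the homogenization steps needed before a comparison.

    Raises ValueError when the extents do not overlap at all, because no
    reprojection or resampling can make such a pair comparable.
    """
    if candidate.crs != benchmark.crs:
        # Bounds and resolutions are in different units until reprojected,
        # so the remaining checks are deferred.
        return ["reproject"]
    if not extents_overlap(candidate.bounds, benchmark.bounds):
        raise ValueError("candidate and benchmark extents do not overlap")
    steps = []
    if candidate.bounds != benchmark.bounds:
        steps.append("match extents")
    if candidate.resolution != benchmark.resolution:
        steps.append("resample")
    return steps


candidate = GridSpec("EPSG:5070", (0, 0, 100, 100), (10, 10))
benchmark = GridSpec("EPSG:5070", (50, 50, 150, 150), (5, 5))
print(check_spatial_alignment(candidate, benchmark))
# prints ['match extents', 'resample']
```

Because the check only touches metadata, it can run before any pixel data is read, which fits the goal of avoiding excessive I/O.
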
Comparison
- Comparisons should avoid loading entire files into memory.
- Comparisons should minimize I/O operations.
- Comparison type
  - Binary
  - Categorical
    - one vs one
    - one vs all
  - Continuous
- Metrics to use:
  - registered list per comparison type
    - handle multiple names for same metric
  - user provided
  - user ignored
- Data format of comparison
  - vector or raster
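
The metric selection rules above (a registered list, multiple names for the same metric, user-provided and user-ignored metrics) could look roughly like the following. This is a minimal sketch, not gVal's API: `REGISTRY`, `ALIASES`, and `compute_metrics` are hypothetical names, and the three registered metrics are only a small sample of binary metrics.

```python
def critical_success_index(tp, fp, fn, tn):
    """TP / (TP + FP + FN); true negatives are not used."""
    return tp / (tp + fp + fn)


def true_positive_rate(tp, fp, fn, tn):
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp, fp, fn, tn):
    """(TP + TN) / all cells."""
    return (tp + tn) / (tp + fp + fn + tn)


# Registered metrics for the binary comparison type (hypothetical registry).
REGISTRY = {
    "critical_success_index": critical_success_index,
    "true_positive_rate": true_positive_rate,
    "accuracy": accuracy,
}

# Several common names can point at one canonical metric.
ALIASES = {
    "csi": "critical_success_index",
    "threat_score": "critical_success_index",
    "recall": "true_positive_rate",
    "sensitivity": "true_positive_rate",
}


def compute_metrics(counts, ignore=(), user_metrics=None):
    """Evaluate registered plus user-provided metrics, skipping ignored ones.

    `counts` is a dict with keys tp, fp, fn, tn. Names in `ignore` may be
    canonical names or aliases.
    """
    ignored = {ALIASES.get(name, name) for name in ignore}
    functions = {**REGISTRY, **(user_metrics or {})}
    return {
        name: func(**counts)
        for name, func in functions.items()
        if name not in ignored
    }


counts = {"tp": 60, "fp": 20, "fn": 20, "tn": 100}
results = compute_metrics(
    counts,
    ignore=["recall"],  # alias for true_positive_rate
    user_metrics={"false_alarm_ratio": lambda tp, fp, fn, tn: fp / (tp + fp)},
)
print(sorted(results))
# prints ['accuracy', 'critical_success_index', 'false_alarm_ratio']
```
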
Outputs
- Decide on storage types and methods to serialize
- agreement maps
  - raster, vector, both
- metric values
  - contingency tables
  - metric values
  - dataframes
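
To make the two main outputs concrete, the sketch below builds an agreement map and a contingency table for a binary comparison. It uses nested lists to stay dependency-free; the function names, the string labels, and the treatment of unrecognized values as `"nodata"` are assumptions made for illustration, not gVal's actual encoding.

```python
from collections import Counter

# Keyed by (candidate is positive, benchmark is positive).
LABELS = {
    (True, True): "tp",
    (True, False): "fp",
    (False, True): "fn",
    (False, False): "tn",
}


def agreement_map(candidate, benchmark, positive=1, negative=0):
    """Label every cell pair of two equally shaped 2D grids."""
    if len(candidate) != len(benchmark):
        raise ValueError("grids have different numbers of rows")
    valid = (positive, negative)
    result = []
    for cand_row, bench_row in zip(candidate, benchmark):
        if len(cand_row) != len(bench_row):
            raise ValueError("grids have different numbers of columns")
        row = []
        for c, b in zip(cand_row, bench_row):
            if c not in valid or b not in valid:
                row.append("nodata")  # value outside the declared encodings
            else:
                row.append(LABELS[(c == positive, b == positive)])
        result.append(row)
    return result


def contingency_table(agreement):
    """Count the four agreement classes, ignoring nodata cells."""
    counts = Counter(cell for row in agreement for cell in row)
    return {key: counts.get(key, 0) for key in ("tp", "fp", "fn", "tn")}


candidate = [[1, 1, 0], [0, 1, -9999]]
benchmark = [[1, 0, 0], [1, 1, 1]]
agreement = agreement_map(candidate, benchmark)
print(agreement)
# prints [['tp', 'fp', 'tn'], ['fn', 'tp', 'nodata']]
print(contingency_table(agreement))
# prints {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 1}
```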

Technology Stacks

Python

Serialization, Numerical Computation, and Scheduling
- PyData Stack: NumPy, pandas, xarray, Dask, xarray-spatial
Geospatial Components
- Vector
  - OGR, fiona, shapely, geopandas
  - zarr for collections of vector files
- Raster
  - GDAL, rasterio, xarray, rioxarray
  - STAC for collections of raster files

Road Map

Checkpoint 1: Minimum Viable Product

Checkpoint 2: Extending Functionality

Extend to include continuous data inputs and metrics.
Support discretization of continuous maps into categorical maps.
Create a survey of metrics.
- Organize in hierarchy.
- Include in tables with descriptions, math formulas, and references.
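
The discretization step mentioned above can be sketched with the standard library. The `discretize` function and its break-point convention are assumptions for illustration only; nodata handling is left out, and an array-based implementation would be the natural choice for real rasters.

```python
from bisect import bisect_right


def discretize(grid, breaks):
    """Convert a continuous 2D grid to integer classes.

    `breaks` must be sorted ascending. Class 0 holds values below
    breaks[0]; class k holds values v with breaks[k-1] <= v < breaks[k];
    the last class holds values at or above breaks[-1].
    """
    breaks = list(breaks)
    if breaks != sorted(breaks):
        raise ValueError("breaks must be sorted in ascending order")
    return [[bisect_right(breaks, value) for value in row] for row in grid]


depths = [[0.0, 0.2, 0.5], [1.2, 0.0, 2.5]]

# Continuous to categorical: dry, shallow, deep.
print(discretize(depths, [0.1, 1.0]))
# prints [[0, 1, 1], [2, 0, 2]]

# Continuous to binary is the single-break special case.
print(discretize(depths, [0.1]))
# prints [[0, 1, 1], [1, 0, 1]]
```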

Checkpoint 3: Scaling to Catalogs of Maps

Evaluations should be scaled to accept a series of candidates and benchmarks.
- These maps should be accepted as lists of objects, file paths, or catalogs.
- Catalogs should be a data structure designed for this purpose to include experiment relevant parameters associated with each map.
  - GeoNetwork
  - STAC
Candidate and benchmark maps need to be cataloged with associated metadata values
- space, time, parameters, etc
Agreement maps and metrics should be able to inherit these metadata
- Consider the metadata problem: STAC, raster tags, database, table?
When comparing catalogs, need to address the alignment problem
- Have functions to test for candidate and benchmark map alignment across the following dimensions:
  - space (extents and resolutions)
  - time (extents and resolutions)
  - modeling parameters (e.g., flow rates)
  - target variable (e.g., extents, depths, speeds, LULC)
Compute statistical significance, confidence intervals, etc. for a sample of metrics.
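
One possible way to put a confidence interval on a metric sampled across a catalog is a percentile bootstrap of its mean. The sketch below is an assumption about how this could be done, not a method the roadmap commits to; `bootstrap_ci` is a hypothetical helper and the metric values are made-up illustration data.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return lower, upper


# Hypothetical metric values, one per candidate/benchmark pair.
csi_values = [0.61, 0.58, 0.72, 0.66, 0.49, 0.70, 0.63, 0.55]
lower, upper = bootstrap_ci(csi_values)
print(f"mean={statistics.fmean(csi_values):.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```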

Checkpoint 4: Extending Functionality

Accept vector files (points, lines, and polygons) for candidate or benchmark maps.
- Handling raster/raster, vector/raster, raster/vector, or vector/vector comparison?
Allow metrics to be sorted by geometries with associated parameter combinations for analysis purposes.
Multi-band raster support?
Multi-class categorical extension
Analyze contingency tables with statistics:
- StatsModels
- SciPy
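
As one example of the kind of contingency-table statistic meant here, the sketch below computes the Pearson chi-square statistic by hand, without a continuity correction. The helper name is hypothetical and the table is assumed to have no empty rows or columns. SciPy provides the same test as `scipy.stats.chi2_contingency`, which also returns a p-value and applies Yates' correction to 2x2 tables by default.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if candidate and benchmark were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: candidate positive/negative. Columns: benchmark positive/negative.
table = [[60, 20], [20, 100]]
statistic, dof = chi_square_statistic(table)
print(f"chi-square={statistic:.2f}, dof={dof}")
# prints chi-square=68.06, dof=1
```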