
opendatacube / odc-geo

GeoBox and geometry utilities extracted from datacube-core

Home Page: https://odc-geo.readthedocs.io/en/latest/

License: Apache License 2.0

Python 99.88% CSS 0.12%

odc-geo's Introduction

odc.geo

This library combines geometry shape classes from shapely with CRS from pyproj to provide projection-aware Geometry. It exposes all the functionality provided by shapely modules, but will refuse operations between geometries defined in different projections. Geometries can be brought into a common projection with the Geometry.to_crs method.

Based on that foundation a number of data types and utilities useful for working with geospatial metadata are implemented. Of particular importance is GeoBox. It is an abstraction for a geo-registered bounded pixel plane where a linear mapping from pixel coordinates to the real world is defined.

To make working with geo-registered raster data easier an integration with xarray is provided. Importing odc.geo.xr enables the .odc. accessor on every xarray.Dataset and xarray.DataArray. This exposes geospatial information of a raster loaded with Open Datacube or rioxarray. Methods for attaching geospatial information to xarray objects in a robust way are also provided. Geospatial information attached in this way survives most operations you might do on the data: basic mathematical operations, type conversions, cropping, serialization to most formats like zarr, netcdf, GeoTIFF.

For more details see Documentation.

Map with GeoBoxes

Origins

This repository contains geometry related code extracted from Open Datacube.

For details and motivation see ODC-EP-06 enhancement proposal.

odc-geo's People

Contributors

alex-ip, alexgleith, andrewdhicks, ceholden, cronosnull, emmaai, gfkeith, gypsybojangles, jdh-ama, jeremyh, kirill888, omad, petewa, pre-commit-ci[bot], richardscottoz, roarmstrong, robbibt, rtaib, simonaoliver, snowman2, spacemanpaul, uchchwhash, v0lat1le, valpesendorfer, woodcockr

odc-geo's Issues

Add a `.odc.explore()` method for automatically plotting raster data on an interactive map

Geopandas recently added an extremely useful .explore() method that allows users to quickly plot and inspect data on an interactive map: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html

Being able to quickly plot and inspect raster data on an interactive map would be incredibly useful, and would avoid a common workflow of having to export to GeoTIFF and load data into a GIS just to inspect it closely.

Luckily with odc-geo we can already do something like this:

import folium
import rioxarray
import odc.geo.xr 

ds = rioxarray.open_rasterio(
    "https://dea-public-data-dev.s3-ap-southeast-2.amazonaws.com/derivative/dea_intertidal/sample_data/160/2019-2021/DEV_160_2019_2021_elevation.tif",
    masked=True,
).squeeze("band")

# Create folium Map
m = folium.Map()

# Reproject to 3857
ds_reproj = ds.odc.reproject("epsg:3857")

# Add to map
ds_reproj.odc.add_to(m)

# Zoom map to bounds
m.fit_bounds(ds.odc.map_bounds())
display(m)

But this adds a lot of boilerplate code that makes this harder than it needs to be.

Proposal
Instead, we could add a new .odc.explore() method to odc-geo that mimics the .explore() functionality of Geopandas. In my mind, calling ds.odc.explore() would automatically:

  • Create a Folium or Ipyleaflet map
  • Reproject data to "epsg:3857"
  • Add the reprojected data to the map
  • Fit the map bounds to the data
  • Display the map

I'd be keen to have a go at adding something like this, but I'd love to know if there are any obvious issues with this proposal. One thing I can think of is that neither folium nor ipyleaflet is currently included as a requirement for odc-geo (I imagine to keep it as small and manageable as possible), so this may be a blocker to adding a feature that depends on those packages?
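
On the dependency question, one common pattern is to keep the mapping libraries optional and detect them only when the method is called. The following is a minimal sketch using just the standard library; `pick_map_backend` and its `finder` parameter are hypothetical names for illustration and are not part of odc-geo.

```python
import importlib.util
from typing import Callable, Iterable


def pick_map_backend(
    candidates: Iterable[str] = ("folium", "ipyleaflet"),
    finder: Callable[[str], object] = importlib.util.find_spec,
) -> str:
    """Return the name of the first installed mapping library.

    `finder` is injectable so the behaviour can be checked without
    installing anything; by default it asks the import system.
    """
    candidates = tuple(candidates)
    for name in candidates:
        if finder(name) is not None:
            return name
    raise ImportError("explore() needs one of: " + ", ".join(candidates))
```

With this approach the core package would not need to list either library as a hard requirement; a user without them would get a clear ImportError only when asking for a map.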

Feature Proposal: GCPGeoBox

Some imagery does not use projected coordinates and instead provides a list of Ground Control Points linking a set of pixels to world coordinates in lon/lat or some projection. A popular example of such imagery is Sentinel-1 data. It would be useful to be able to attach such information to xarray data. This can be achieved in a way similar to how we currently support rotated geoboxes (#28).

  • Record pixel coordinates of the original imagery in the x,y coordinate values
  • Record CRS of GCPs in the spatial_ref coordinate
  • Return GCPGeoBox from .odc.geobox property

GCPGeoBox would expose a similar interface to GeoBox: things like slicing and querying of extents. Everything but .{affine,transform} should be possible, and even that can be approximated, although I would not expose such functionality under the same name to avoid accidental usage.

GCPGeoBox could then be used by .odc.reproject on input and output as this is supported by rasterio/gdal, allowing native pixel processing on data like Sentinel-1. Not sure about saving to COG as I'm unclear about support on the write side in rasterio (reading GCPs is supported by rasterio).

"name" param from `.odc.add_to()` not working as expected

Expected result
The .odc.add_to() method has a "name" parameter that I expected would be used to name output layers in the LayerControl widget of a Folium map. For example:

import folium
m = folium.Map(control_scale=True)
ds.elevation.odc.add_to(m, name="elevation")
ds.elevation_uncertainty.odc.add_to(m, name="elevation_uncertainty")

# Zoom map
m.fit_bounds(ds.odc.map_bounds())
folium.LayerControl().add_to(m)
display(m)

Actual result
Layers in the LayerControl dropdown are instead named "macro_element". This makes it difficult to properly visualise layers using odc.geo.
image

odc-geo version: '0.4.0'
Folium version: '0.14.0'

GridSpec.tiles unexpectedly produces multiple tiles

Test code:

from odc.geo.gridspec import GridSpec
from odc.geo import BoundingBox
bb = BoundingBox(left=1200000.0, bottom=-3720000.0, right=1320000.0, top=-3600000.0, crs='EPSG:3577')
grid_spec = GridSpec(crs='EPSG:3577',
                     tile_shape=(4000, 4000),
                     resolution=30)
idx_bounds = grid_spec.idx_bounds(bb)
gbox = [gbox for _, gbox in grid_spec.tiles(bb)]

Outputs

idx_bounds

BoundingBox(left=10, bottom=-31, right=11, top=-30, crs=CRS('EPSG:3577'))

gbox

[GeoBox((4000, 4000), Affine(30.0, 0.0, 1200000.0,
        0.0, -30.0, -3600000.0), CRS('EPSG:3577')),
 GeoBox((4000, 4000), Affine(30.0, 0.0, 1320000.0,
        0.0, -30.0, -3600000.0), CRS('EPSG:3577')),
 GeoBox((4000, 4000), Affine(30.0, 0.0, 1200000.0,
        0.0, -30.0, -3480000.0), CRS('EPSG:3577')),
 GeoBox((4000, 4000), Affine(30.0, 0.0, 1320000.0,
        0.0, -30.0, -3480000.0), CRS('EPSG:3577'))]

Expected outputs

[GeoBox((4000, 4000), Affine(30.0, 0.0, 1200000.0,
        0.0, -30.0, -3600000.0), CRS('EPSG:3577'))]

The cause:

odc-geo/odc/geo/gridspec.py

Lines 198 to 201 in b59a87c

for iy in range(iy1, iy2 + 1):
    for ix in range(ix1, ix2 + 1):
        tile_index = (ix, iy)
        yield tile_index, geobox(tile_index)

The right and top boundary idx was forced +1 as shown above.

The question:

Is it a new convention that the BoundingBox upper bound (bottom right) is now closed instead of open?
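
To make the open versus closed question concrete, here is a small standalone sketch of the half-open interpretation, under which a box that only touches a tile edge does not select the neighbouring tile. It assumes a grid origin at (0, 0) and square tiles; `tile_indexes` is a hypothetical helper, not the GridSpec implementation.

```python
import math
from typing import Iterator, Tuple


def tile_indexes(
    left: float, bottom: float, right: float, top: float, tile_size: float
) -> Iterator[Tuple[int, int]]:
    """Yield (ix, iy) of tiles overlapping a bounding box.

    Tiles are treated as half-open: a box whose right/top edge only
    touches a tile boundary does not select the next tile.
    Assumes the grid origin is at (0, 0) and tiles are square.
    """
    ix1 = math.floor(left / tile_size)
    iy1 = math.floor(bottom / tile_size)
    # ceil gives the exclusive upper index; always keep at least one tile
    ix2 = max(math.ceil(right / tile_size), ix1 + 1)
    iy2 = max(math.ceil(top / tile_size), iy1 + 1)
    for iy in range(iy1, iy2):
        for ix in range(ix1, ix2):
            yield ix, iy


# 4000 pixels per tile at 30 m per pixel, same bounding box as the test code
tiles = list(tile_indexes(1200000.0, -3720000.0, 1320000.0, -3600000.0, 4000 * 30))
print(tiles)
```

Under these assumptions the bounding box from the report selects the single tile (10, -31), which matches the expected output listed above.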

Expand visualization capabilities

  • [DONE] Output folium or ipyleaflet maps when libraries are available when displaying geometries/geoboxes/bounding boxes.
  • Provide GeoJSON outputs for more types
  • Conversion to geopandas when available

Implement robust cmap plotting

xarray's plotting methods include a robust=True flag that sets vmin/vmax to the 2nd and 98th percentiles respectively, e.g. ds.plot.imshow(robust=True). This ensures the colour ramp isn't skewed by outliers in the data. Implementing the same feature in ds.odc.add_to(robust=True) would be very handy.

i.e. The equivalent of this:

vmin, vmax = np.nanpercentile(xx.band.data, [2, 98])
xx.band.odc.add_to(vmin=vmin, vmax=vmax)
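
For reference, the percentile clipping itself is small. The sketch below reimplements it with the standard library only (linear interpolation between ranks, NaN values skipped), which is meant to mirror the default behaviour of np.nanpercentile; `robust_limits` is a hypothetical helper name.

```python
import math
from typing import Iterable, Tuple


def robust_limits(
    values: Iterable[float], lo: float = 2.0, hi: float = 98.0
) -> Tuple[float, float]:
    """Return (vmin, vmax) at the lo-th and hi-th percentiles, skipping NaN."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        raise ValueError("no valid values")

    def percentile(p: float) -> float:
        # fractional rank into the sorted data, linearly interpolated
        rank = (len(data) - 1) * p / 100.0
        i = math.floor(rank)
        j = min(i + 1, len(data) - 1)
        return data[i] + (data[j] - data[i]) * (rank - i)

    return percentile(lo), percentile(hi)


pixels = [float(v) for v in range(101)] + [float("nan")]
vmin, vmax = robust_limits(pixels)
```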

Setup testing for Python 3.10

Currently odc-geo is tested in one single configuration: Python 3.8 with optional dependencies installed to maximize test coverage. We should be testing things with minimal dependencies and with other versions of Python.

0.3.1 broke compatibility with Python 3.10

This is somewhat confusing, but I think Python 3.10 removed the deprecated aliases to the Collections Abstract Base Classes from the collections module (they are now only available from collections.abc): https://docs.python.org/3/whatsnew/3.10.html#collections-abc.

That means 2b32ed6 and your latest release results in the following error when using Python 3.10:

...
File "/usr/local/lib/python3.10/site-packages/odc/geo/crs.py", line 53, in _make_crs_key
    if isinstance(crs_spec, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'
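
The usual fix for this class of error is to import the abstract base class from collections.abc, which works on both older and newer Python 3 versions. A minimal sketch follows; `make_cache_key` is a hypothetical stand-in and does not reproduce the body of `_make_crs_key`.

```python
from collections.abc import Hashable
from typing import Any


def make_cache_key(crs_spec: Any) -> Hashable:
    """Hypothetical stand-in for a cache-key helper.

    Hashable inputs (e.g. strings) are used directly; anything else
    falls back to its repr so it can still index a dict.
    """
    if isinstance(crs_spec, Hashable):
        return crs_spec
    return repr(crs_spec)
```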

Support geobox extraction from rioxarrays loaded from non-rectilinear sources

A GeoBox cannot currently be extracted from xarrays produced by rioxarray from non-rectilinear sources.

#!wget https://github.com/corteva/rioxarray/raw/master/test/test_data/input/2d_test.tif
import rioxarray
import odc.geo.xr
from odc.geo.geobox import GeoBox

xx = rioxarray.open_rasterio("2d_test.tif")
display(xx.spatial_ref.attrs)
gbox_rio = GeoBox(xx.rio.shape, xx.rio.transform(), xx.rio.crs)
display(gbox_rio)
assert xx.odc.crs == xx.rio.crs
assert xx.odc.geobox == gbox_rio # Fails here

With the current code, xx.odc.geobox is None; it should not be. We should detect the presence of the GeoTransform attribute on the spatial_ref coordinate and use that to construct a geobox, instead of the normal way of computing a rectilinear transform from x/y coordinates that might be absent (when using the parse_coordinates=False option in rioxarray).
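
As an illustration of the detection step, the sketch below converts a GeoTransform value into affine coefficients. It assumes the attribute is stored as a string of six space-separated numbers in GDAL order; both function names are hypothetical and the code does not depend on odc-geo or rioxarray.

```python
from typing import Tuple

AffineCoeffs = Tuple[float, float, float, float, float, float]


def affine_from_geotransform(text: str) -> AffineCoeffs:
    """Convert a GDAL GeoTransform string to (a, b, c, d, e, f).

    GDAL order is (x_origin, x_step, row_rotation, y_origin, col_rotation, y_step).
    The returned order follows the Affine(a, b, c, d, e, f) convention, where
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 numbers, got {len(values)}")
    c, a, b, f, d, e = values
    return a, b, c, d, e, f


def pixel_to_world(coeffs: AffineCoeffs, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel coordinate (col, row) to world (x, y)."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


coeffs = affine_from_geotransform("1200000.0 30.0 0.0 -3600000.0 0.0 -30.0")
```

Because the coefficients come from the attribute rather than from x/y coordinate arrays, this route also works when the off-diagonal terms are non-zero or when the coordinates were never parsed.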

Dask colorize vmin vmax error

this failed

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_19782/44836221.py in <cell line: 1>()
----> 1 newcheckdask = odc.geo.xr.colorize(newcheck['test'], cmap=newcmp,vmin=1,vmax=240)

~/exploracornzarr/lib/python3.8/site-packages/odc/geo/_rgba.py in colorize(x, cmap, attrs, clip, vmin, vmax)
    240 
    241         assert x.chunks is not None
--> 242         data = da.map_blocks(
    243             _impl,
    244             x.data,

~/exploracornzarr/lib/python3.8/site-packages/dask/array/core.py in map_blocks(func, name, token, dtype, chunks, drop_axis, new_axis, meta, *args, **kwargs)
    732         adjust_chunks = None
    733 
--> 734     out = blockwise(
    735         func,
    736         out_ind,

~/exploracornzarr/lib/python3.8/site-packages/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
    274         from .utils import compute_meta
    275 
--> 276         meta = compute_meta(func, dtype, *args[::2], **kwargs)
    277     return new_da_object(graph, out, chunks, meta=meta, dtype=dtype)
    278 

~/exploracornzarr/lib/python3.8/site-packages/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs)
    158                 return None
    159 
--> 160         if _dtype and getattr(meta, "dtype", None) != _dtype:
    161             with contextlib.suppress(AttributeError):
    162                 meta = meta.astype(_dtype)

~/exploracornzarr/lib/python3.8/site-packages/dask/delayed.py in __bool__(self)
    588 
    589     def __bool__(self):
--> 590         raise TypeError("Truth of Delayed objects is not supported")
    591 
    592     __nonzero__ = __bool__

TypeError: Truth of Delayed objects is not supported

this works - but as your docs suggest, gives strange results

newcheckdask = odc.geo.xr.colorize(newcheck['test'], cmap=newcmp,vmin=1,vmax=240)

Have I done something wrong, or is there a bug?

Tweak CRS hashing approach

As reported in the datacube repo (opendatacube/datacube-core#1230), the CRS hash function can be slow because extracting the WKT string from the pyproj CRS object is slow. Since we are already caching the EPSG code, we might as well use it for the hash when it is available and only fall back to WKT when EPSG is not set. Caching the WKT string itself is also not a bad idea.
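
A toy version of that idea is sketched below. `HashableCRS` and `wkt_factory` are hypothetical names; the factory callable stands in for the slow WKT export from pyproj so the sketch runs without it. Equality is deliberately simplified so that it stays consistent with the hash.

```python
from typing import Callable, Optional


class HashableCRS:
    """Toy CRS wrapper: hash by EPSG code when known, else by cached WKT."""

    def __init__(self, epsg: Optional[int], wkt_factory: Callable[[], str]):
        self._epsg = epsg
        self._wkt_factory = wkt_factory  # stands in for the slow WKT export
        self._wkt: Optional[str] = None

    @property
    def wkt(self) -> str:
        if self._wkt is None:  # compute once, reuse afterwards
            self._wkt = self._wkt_factory()
        return self._wkt

    def __hash__(self) -> int:
        if self._epsg is not None:
            return hash(("epsg", self._epsg))
        return hash(("wkt", self.wkt))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, HashableCRS):
            return NotImplemented
        # Simplification: if either side has an EPSG code, compare codes only,
        # so that equal objects always have equal hashes.
        if self._epsg is not None or other._epsg is not None:
            return self._epsg == other._epsg
        return self.wkt == other.wkt
```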

`GeoBox.zoom_to` with a resolution

I've been trying to use odc-geo to define the target grid for regridding with xesmf. This works great so far, but one of the features I've been missing so far is creating a Dataset from the GeoBox.

The naive version I've been using so far

import xarray as xr


def to_dataset(geobox):
    def to_variable(dim, coord):
        data = coord.values
        units = coord.units
        resolution = coord.resolution
        if resolution < 0:
            data = data[::-1]
            resolution *= -1
        return xr.Variable(
            dims=dim, data=data, attrs={"units": units, "resolution": resolution}
        )

    coords = {
        name: to_variable(name, odc_coord)
        for name, odc_coord in geobox.coordinates.items()
    }
    return xr.Dataset(coords=coords)

which obviously lacks support for affine transformations, assumes 1D coordinates, and the resolution trick should probably be configurable (and off by default?).

Did I miss anything? Do you know of a better way to do this? If not, would you be open to adding something like this to odc.geo.xr?

Make `.odc.add_to()` use input array's `.name` attribute if available

The .odc.add_to() method allows users to specify a name that is used to name layers in Folium and Ipyleaflet Layer Controls. If name is left at its default of None, then we get either an awkward auto-generated layer name (Folium):
image

Or no layer name (Ipyleaflet):
image

Xarray datasets can however commonly include a .name attribute, for example here:
image

It would be great for the default of name=None to use this .name attribute by default if it exists.

GeoboxTiles.tiles gives wrong results for global inputs

GeoboxTiles.tiles takes in a geometry in any projection and returns an iterator over tiles that overlap with the given geometry. To do that, it first needs to project the input shape into the CRS native to GeoboxTiles. This breaks down when the source shape covers an area well outside the valid region of the destination CRS: instead of returning all the tiles, nothing is returned at all. Essentially the step Geometry@crs1 -> BoundingBox(Geometry@crs2) produces wrong results. This is especially common when the source image has "global coverage" and the destination CRS has a limited range, like epsg:3577.

What's needed is a robust "bounding box of a geometry in some other projection method", that does not slow down the common case where source shape is in a safe region for coordinate transformation. Ideally we would also have a "robust to_crs method for polygons" that would safely clip sections of the source geometry that are outside of the valid range of transform.

Use `GeoTransform` to get resolution for GeoBox?

Related to: opendatacube/datacube-core#752

Currently the resolution inside of the coordinate attributes are used:

https://github.com/opendatacube/datacube-core/blob/05093b75a8f15b643c7047502ae27b662af2d9b4/datacube/utils/xarray_geoextensions.py#L133-L139

GeoTransform is a property used by GDAL to store this information (https://gdal.org/drivers/raster/netcdf.html#georeference)

Here is an example of reading in this property:
https://github.com/corteva/rioxarray/blob/f658d5f829a88204b2ec7b239735aba31694ed0a/rioxarray/rioxarray.py#L540-L545

Thoughts about supporting this as well?

Configure GDAL warp with known (X|Y)SCALE

There are undocumented parameters in the GDAL warp pipeline, XSCALE= and YSCALE=, that allow the user to override the scale change estimate that GDAL itself computes from source/destination image sizes. This estimate of scale change across the X/Y axes is then used by the Linear/Cubic/Lanczos interpolation methods, leading to blurriness in the output image.

Verify thread-safety of CRS caching approach

We are caching pyproj.CRS objects here:

odc-geo/odc/geo/_crs.py

Lines 31 to 32 in 202cd8a

@cachetools.cached(_crs_cache, key=_make_crs_key)
def _make_crs(crs: Union[str, _CRS]) -> Tuple[_CRS, str, Optional[int]]:

And pyproj transformers here:

odc-geo/odc/geo/_crs.py

Lines 47 to 48 in 202cd8a

@cachetools.cached({}, key=_make_crs_transform_key)
def _make_crs_transform(from_crs, to_crs, always_xy):

  • What does that mean for multi-threaded access?
  • Should we use a lock when populating the cache?
  • Should we use a thread-local cache instead of locking?
  • What are the constraints of multi-threaded access in pyproj/PROJ itself?
    • currently we are assuming that sharing CRS objects across threads is fine?
  • We should also add purging rules to those caches, or at least expose manual purge option.
  • We should understand cost of caching in terms of RAM, especially for transformers cache.
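
One of the options listed above, locking while populating the cache, can be sketched with the standard library alone. `LockedCache` is a hypothetical class for illustration. As far as I know, cachetools.cached also accepts a lock argument that could serve a similar purpose, but whether it fits here would need checking against PROJ's own threading constraints.

```python
import threading
from collections.abc import Hashable
from typing import Callable, Dict, TypeVar

V = TypeVar("V")


class LockedCache:
    """Dictionary cache whose population is guarded by a lock."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def get_or_create(self, key: Hashable, factory: Callable[[], V]) -> V:
        # Holding the lock while calling factory() guarantees a single
        # construction per key, at the cost of serialising cache misses.
        with self._lock:
            if key not in self._data:
                self._data[key] = factory()
            return self._data[key]  # type: ignore[return-value]

    def clear(self) -> None:
        """Manual purge of all cached entries."""
        with self._lock:
            self._data.clear()
```

The trade-off is that every lookup takes the lock, including cache hits. A thread-local cache avoids that contention but builds one object per thread, which matters for the RAM question raised in the last bullet.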

Port xr_reproject from odc.algo

Most of the low-level utilities needed for Dask-backed reprojection are already in odc-geo. This is mostly

class GeoboxTiles:

and also:

class ReprojectInfo:

Expected interface:

xx = dc.load(.., dask_chunks= {}) # or any other supported load backend

# automatically choose resolution and bounding box, align pixel edges to 0
# automatically choose chunk size
yy = xx.odc.to_crs("epsg:3857")

# fully defined destination pixel plane
# configurable destination chunking
yy = xx.odc.reproject(GeoBox.from_bbox(..), chunks={'x': 2048, 'y': 4096})
  • Support Dask and non-Dask inputs
  • Support xarray and dask.dataarray
  • Support xarray Datasets as well as DataArray
  • Support reasonable automatic resolution, bbox and chunking determination
  • Support any number of leading dimensions as well as optional interleaved band dimension: ..., y, x[,band]

Feature request: Support non-standard CRS in mapping functions

I would like to plot EPSG:3031 data using something like data.odc.add_to(map).

I have been experimenting with solutions using something like below which currently does not work.

Others have encountered similar issues jupyter-widgets/ipyleaflet#550 though without the same projection requirements.

For an example input we can use a sea ice concentration image

from ipyleaflet import Map, basemaps, ImageOverlay, projections
from ipywidgets import Layout
import folium
import xarray as xr
import rioxarray as rio
from numpy.random import uniform
from odc.geo.data import country_geom
from odc.geo.xr import rasterize
from osgeo import gdal,ogr,osr

def GetExtent(ds):
    """ Return list of corner coordinates from a gdal Dataset """
    xmin, xpixel, _, ymax, _, ypixel = ds.GetGeoTransform()
    width, height = ds.RasterXSize, ds.RasterYSize
    xmax = xmin + width * xpixel
    ymin = ymax + height * ypixel

    return (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)

path = 'S_201905_concentration_v3.0.tif'

src = gdal.Open(path)

bounds = GetExtent(src)

spsLayout=Layout(width='800px', height='800px')

m = Map(basemap=basemaps.NASAGIBS.BlueMarble3031, center=(-90, 0), zoom=1, crs=projections.EPSG3031, layout=spsLayout)

image = ImageOverlay(
    url=path,
    bounds=(bounds[1], bounds[3])
)

m.add_layer(image)

m

Feature request: reprojection for `xr.Dataset` as well as `xr.DataArray`

Currently, only the odc.geo.xr.ODCExtensionDa ODC extension has a .reproject method:

https://odc-geo.readthedocs.io/en/latest/_api/odc.geo.xr.ODCExtensionDa.html#odc.geo.xr.ODCExtensionDa
https://odc-geo.readthedocs.io/en/latest/_api/odc.geo.xr.ODCExtensionDs.html#odc.geo.xr.ODCExtensionDs

However, users of ODC often load EO imagery with multiple spectral bands as a single xr.Dataset containing multiple variables.

Because of this, it would be very useful to be able to reproject an entire xr.Dataset using odc.geo so that users can transform their entire dataset in one pass, rather than having to convert each individual xr.DataArray separately.

I believe this makes logical sense as individual variables in an xr.Dataset usually share the same overall coordinates/GeoBox. However, I can see this being problematic when a user has added additional variables to their dataset (e.g. 1D data with only time coords etc) - perhaps any variables that don't fit the GeoBox model could be skipped during the reproject (or an error raised)?

PROPOSAL: Support integer EPSG code in CRS initialization?

Related: corteva/geocube#118

odc-geo/odc/geo/crs.py

Lines 61 to 100 in 201bd5f

def __init__(self, crs_spec: Any):
    """
    Construct CRS object from *something*.
    :param crs_spec:
       String representation of a CRS, often an EPSG code like ``'EPSG:4326'``. Can also be any
       object that implements ``.to_epsg()`` or ``.to_wkt()``.
    :raises: :py:class:`pyproj.exceptions.CRSError`
    """
    if isinstance(crs_spec, str):
        self._crs, self._str, self._epsg = _make_crs(crs_spec)
    elif isinstance(crs_spec, CRS):
        self._crs = crs_spec._crs
        self._epsg = crs_spec._epsg
        self._str = crs_spec._str
    elif isinstance(crs_spec, _CRS):
        self._crs, self._str, self._epsg = _make_crs(crs_spec)
    elif isinstance(crs_spec, dict):
        self._crs, self._str, self._epsg = _make_crs(_CRS.from_dict(crs_spec))
    else:
        try:
            epsg = crs_spec.to_epsg()
        except AttributeError:
            epsg = None
        if epsg is not None:
            self._crs, self._str, self._epsg = _make_crs(f"EPSG:{epsg}")
            return
        try:
            wkt = crs_spec.to_wkt()
        except AttributeError:
            wkt = None
        if wkt is not None:
            self._crs, self._str, self._epsg = _make_crs(wkt)
            return
        raise CRSError(
            "Expect string or any object with `.to_epsg()` or `.to_wkt()` methods"
        )
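
One way the proposal could be supported without touching the existing branches is to normalise integers before dispatching. This is only a sketch; `normalise_crs_spec` is a hypothetical helper and not part of odc-geo.

```python
from typing import Any


def normalise_crs_spec(crs_spec: Any) -> Any:
    """Turn a bare integer EPSG code into an ``'EPSG:<code>'`` string.

    Everything else is returned unchanged so the existing branches
    (str, dict, objects with ``.to_epsg()``/``.to_wkt()``) still apply.
    """
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(crs_spec, int) and not isinstance(crs_spec, bool):
        return f"EPSG:{crs_spec}"
    return crs_spec
```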

TYPE: missing py.typed

From mypy:

error: Skipping analyzing "odc.geo": module is installed, but missing library stubs or py.typed marker

Add a `snap_to` method to GeoBoxes to enable coercing GeoBoxes to the same pixel alignment

I have two low-resolution GeoBoxes created by "zooming out" of two existing higher resolution dataset GeoBoxes:

lowres_geobox1 = ds1.odc.geobox.zoom_out(500)
lowres_geobox2 = ds2.odc.geobox.zoom_out(500)

Although these GeoBoxes share the same resolution and CRS, they do not share the same pixel alignment, resulting in pixels that do not align correctly:
image

I would like to be able to "snap" lowres_geobox2 to the pixel alignment of lowres_geobox1, so that both of my low resolution grids align perfectly:
image

Snapping is currently supported when creating a new GeoBox directly (e.g. via GeoBox.from_bbox), however it would be useful to have a GeoBox method that would allow a user to perform snapping on existing GeoBoxes as well.
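
The arithmetic behind such a snap is small. The sketch below shifts one grid origin onto the nearest pixel corner of a reference grid, assuming both grids share the same CRS and the same square pixel size; `snap_origin` is a hypothetical function and works on plain tuples rather than GeoBox objects.

```python
from typing import Tuple


def snap_origin(
    origin: Tuple[float, float],
    reference_origin: Tuple[float, float],
    resolution: float,
) -> Tuple[float, float]:
    """Shift `origin` to the nearest pixel corner of the reference grid.

    Assumes both grids share CRS and (square) pixel size `resolution`.
    """

    def snap(value: float, ref: float) -> float:
        # whole number of pixels between the two grids, rounded to nearest
        return ref + round((value - ref) / resolution) * resolution

    return snap(origin[0], reference_origin[0]), snap(origin[1], reference_origin[1])
```

For example, with 500-unit pixels and a reference origin of (1000.0, -5000.0), an origin of (1300.0, -4700.0) would move to (1500.0, -4500.0).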

Support generic affine in xarray rasters

Motivation

Code in xx.odc.geobox assumes that xarray raster data is axis aligned. That is the off-diagonal terms of the affine matrix are zero and so entire row of pixels will have the same Y coordinate and similarly entire column same X coordinate:

y\x   0     1     2     3
0     0,0   1,0   2,0   3,0
1     0,1   1,1   2,1   3,1

This limitation means one can not use .odc.geobox on rotated images. For a lot of satellite data one swatch of observation happens on an angle, when this gets rendered into axis aligned image you end up with a large proportion of the mosaic image being empty. Supporting arbitrary affine transform could make it much easier to work with large scale mosaics for such data sources.

Possible Implementations

2-d axis

Xarray supports 2d coordinate, so one could simply compute X,Y coordinate for every pixel and store that in two 2d arrays, one for X and one for Y.

pix = img[r,c]
coord = (img.x[c], img.y[r])  # axis aligned representation
...
coord = (img.x[r,c], img.y[r,c]) # 2d-axis representation

.odc.geobox could then detect that 2d axis are used and reconstruct affine matrix from coordinates similar to current implementation but it will need to be computed from 3 points (Affine matrix has 6 degrees of freedom).

  • 👎 Significant memory overhead (2 float32 values per pixel, so 8 bytes per pixel)
  • 👎 This representation demands axis being Dask arrays as well.
  • 👍 Same representation can be used for completely non-linear pixel to world mapping as well
  • 👍 Direct support for plotting from xarray itself
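
The "computed from 3 points" step mentioned above can be sketched as follows. It assumes the three points are the world positions of pixel coordinates (0, 0), (1, 0) and (0, 1), given as (col, row); `affine_from_corners` is a hypothetical name.

```python
from typing import Tuple

Point = Tuple[float, float]


def affine_from_corners(
    p00: Point, p10: Point, p01: Point
) -> Tuple[float, float, float, float, float, float]:
    """Recover (a, b, c, d, e, f) from world coordinates of three pixels.

    p00, p10 and p01 are the world (x, y) positions of pixel coordinates
    (0, 0), (1, 0) and (0, 1), given as (col, row).
    """
    c, f = p00
    a, d = p10[0] - c, p10[1] - f  # change per step along a row (col + 1)
    b, e = p01[0] - c, p01[1] - f  # change per step down a column (row + 1)
    return a, b, c, d, e, f
```

If the stored 2-d coordinates refer to pixel centres, the recovered c and f describe the centre of pixel (0, 0), so a half-pixel shift would still be needed to get the corner-based transform.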

Pixel coords + Affine matrix in attributes

In this scheme we keep 1-d X,Y axis, but they contain pixel coordinates and not real world coordinates, so x = [0.5, 1.5, 2.5... W-0.5], that sort of thing. We then also need to store actual affine matrix in some attributes attached to x,y axis.

In this scenario .odc.geobox needs to detect the fact that X,Y axis contain pixel coordinates rather than real world coordinates and instead extract affine matrix from attributes/encoding. Slicing should still work as we will have access to the original image coordinates which is what we need to compute world coordinates from. Need to experiment with the location and representation for affine matrix data and how that works with things like .to_netcdf|to_zarr.

  • affine format: 6 float values (string vs array vs dict)

  • attachment point: x,y axis vs crs axis vs band

  • 👍 Same memory requirements as axis aligned representation

  • 👎 xarray plotting won't display more than one dataset on the same axis properly

Reproject should prune crs attributes on output

The original crs attribute is copied from source to destination during reproject, which leads to confusion. This attribute is set by datacube and some other libraries. There could be other attributes used by different libraries, like epsg, wkt or crs_wkt, so it's probably worthwhile adding a "ban list" of attributes that should not be copied across from input to output during reprojection.
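
A ban list of this kind could be as simple as a filter over the attribute dictionary. In the sketch below, the default names are taken from the paragraph above and the function name `prune_attrs` is hypothetical; which attributes actually belong on the list is an open question.

```python
from typing import Any, Dict, Iterable, Mapping

# Attribute names mentioned in the issue; the exact list is an assumption
BANNED_ATTRS = ("crs", "epsg", "wkt", "crs_wkt")


def prune_attrs(
    attrs: Mapping[str, Any], banned: Iterable[str] = BANNED_ATTRS
) -> Dict[str, Any]:
    """Return a copy of `attrs` without stale CRS-related entries."""
    banned = set(banned)
    return {k: v for k, v in attrs.items() if k not in banned}
```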

API: better bounding box

Currently BoundingBox is just a named tuple of 4 floating point values with some methods attached; it does not keep track of CRS information. I think it would be more convenient and less surprising if the bounding box class were aware of the projection in which it is defined.

Fail in passing kwargs into gdal in reproject with dask

Issue:
_dask_rio_reproject stopped passing **kwargs into rasterio.warp and hence GDAL. This causes problems when extra options are required for GDAL to work as expected, especially for the transform path between lat/lon (epsg:4326) and Australian Albers (epsg:3577). Since GDA2020 was introduced, the transform path is not unique, and the default option has changed arbitrarily across upgrades of either PROJ or GDAL.

Expected behaviour:
Be able to pass **kwargs into rasterio.warp

BUG: Shape/Bounds different with odc-geo compared to datacube

Currently testing transitioning from datacube to odc-geo in geocube: corteva/geocube#95

Have some scenarios in the tests where the shape is off by one:
odc-geo: (y: 100001, x: 11)
datacube: (y: 100001, x: 12)

And the boundary differs:
odc-geo: [1665945., 7018509., 1665478., 7018306.]
datacube: [1665478., 7018306., 1665945., 7018509.]

Are these changes expected?

odc.geo does not correctly calculate/load a geobox for a 1 x 1 pixel data

It appears that odc.geo does not correctly calculate a geobox for a 1 x 1 pixel data load (.odc.geobox is returned as None). If the size is increased to two or more pixels then it returns the correct geometry (courtesy of @robbibt):

assert data.odc.geobox.crs == data.geobox.crs
assert data.odc.geobox.affine == data.geobox.affine

You can replicate this bug by running this notebook:
https://bitbucket.org/geoscienceaustralia/cultivated_model/src/main/notebooks/extract_data_for_shp_custom.ipynb

The error occurs when calling collect_training_data function but it's thrown when running the following line:

https://github.com/GeoscienceAustralia/dea-notebooks/blob/c0c7e1d73576d48063ae40d345c6fb5ca7f047e8/Tools/dea_tools/spatial.py#L276

API: extend geobox slicing to support more input types

I think syntax like this should be possible and makes sense:

geobox: GeoBox  = some_geobox(...)
poly: Geometry = some_geom(...)
assert geobox.crs == poly.crs

sub_geobox = geobox[poly]
assert geobox[poly] == geobox[poly.boundingbox]

If #25 is done then we can remove same crs constraint in which case

assert geobox[poly] == geobox[poly.to_crs(geobox.crs).boundingbox]

Alternatively or in addition this could also be thought of as intersection operation and have the following syntax:

sub_geobox: GeoBox = geobox & poly
sub_geobox: GeoBox = geobox & poly.boundingbox

The returned sub_geobox will tightly enclose the parts of the supplied geometry that overlap with the original geobox:

assert sub_geobox.extent.contains(poly & geobox.extent)

add_to with name is not passing the name in to Folium

I'm adding a couple of xarray variables to a folium map, and it works fantastically but I can't get it to name them neatly.

Here's my code:

import folium
import odc.geo.xr

m = folium.Map(control_scale=True, tiles=None)

data.vv.odc.add_to(m, opacity=0.7, name="vv")
data.vh.odc.add_to(m, opacity=0.7, name="vh")

# Zoom map
m.fit_bounds(data.odc.map_bounds())

tile = folium.TileLayer(
    tiles = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr = 'Esri',
    name = 'Esri Satellite',
    overlay = False,
    control = True
).add_to(m)
folium.TileLayer('openstreetmap').add_to(m)

folium.LayerControl().add_to(m)
display(m)

I have the following software versions:

  • folium==0.14.0
  • odc-geo==0.4.0

And here's how the layers display:

image

Writing a tif with save_cog_with_dask writes a blank file

As mentioned, I used odc-stac to load data and create a median, then used the new dask function to write to S3.

File wrote fine, but only contains zeros.

ODC-related libraries are as follows:

odc-algo==0.2.3
odc-geo==0.4.2rc1
odc-stac==0.3.7
odc-ui==0.2.0a3

Code I used is below.

import os

import odc.geo  # noqa
from odc.algo import mask_cleanup
from odc.stac import configure_rio, load
from pystac_client import Client

from datacube.utils.dask import start_local_dask
from odc.geo.cog import save_cog_with_dask

dask_client = start_local_dask(n_workers=8, threads_per_worker=4, mem_safety_margin="2G")

catalog = "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/"

# Searching across both landsat and sentinel at 30 m
collections = ["HLSS30.v2.0", "HLSL30.v2.0"]

client = Client.open(catalog)

# BBOX over Precipitous Bluff in Tasmania
ll = (-43.55, 146.45)
ur = (-43.35, 146.75)
bbox = [ll[1], ll[0], ur[1], ur[0]]

# Search for items in the collection
items = client.search(collections=collections, bbox=bbox, datetime="2023-07-01/2023-09-30").items()

# Finds about 109 items
items = [i for i in items]
print(f"Found {len(items)} items")

# Configure GDAL. You need to export your earthdata token as an environment variable.
header_string = f"Authorization: Bearer {os.environ['EARTHDATA_TOKEN']}"
configure_rio(cloud_defaults=True, GDAL_HTTP_HEADERS=header_string)

data = load(
    items,
    bbox=bbox,
    crs="epsg:6933",
    resolution=30,
    chunks={"x": 2500, "y": 2500, "time": 1},
    groupby="solar_day",
    bands=["B04", "B03", "B02", "Fmask"]
)

# Get cloud  mask bitfields
# I think 1 is cloud, but I can't find docs...
# And bit 2 is a mess, but might be cloud shadow... not using that
# I can't actually find the cloud shadow bit
mask_bitfields = [1]  
bitmask = 0
for field in mask_bitfields:
    bitmask |= 1 << field

# Get cloud mask
cloud_mask = data["Fmask"].astype(int) & bitmask != 0

# Contract and then expand the cloud mask to remove small areas
dilated = mask_cleanup(cloud_mask, [("opening", 2), ("dilation", 3)])

masked = data.where(~dilated)
masked = masked.drop("Fmask")

# Create a simple cloud-free median, still in memory
median = masked.median("time")

cog_visual = save_cog_with_dask(
    median["B04"],
    "s3://files.auspatious.com/test_red.tif",
    compression="lzw",
    level=80,
    overview_resampling="average",
    # blocksize=[bsz, bsz // 2, bsz // 4],
    client=dask_client,  # optional, default client will be queried
)

# Seems to write a file full of zeros
cog_visual.compute()
