corteva / geocube

Tool to convert geopandas vector data into rasterized xarray data.

Home Page: https://corteva.github.io/geocube

License: BSD 3-Clause "New" or "Revised" License

Makefile 2.19% Python 97.81%

python gis geopandas xarray opendatacube geospatial rasterio gdal vector raster

geocube's Issues

cannot import name 'clock_gettime' from 'time' (unknown location)

time.clock was removed in Python 3.8 after being deprecated since Python 3.3. The recommended replacement is either time.perf_counter or time.process_time.

This error arises with the following code on Python 3.8.x:
from geocube.api.core import make_geocube
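
For code that still calls time.clock, the replacement described above can be sketched as follows. This is a minimal illustration using only the standard library; the `timed` helper is a hypothetical name and is not part of geocube.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds).

    time.perf_counter() measures elapsed wall-clock time and
    time.process_time() measures CPU time of the current process;
    together they cover the two historical uses of time.clock().
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


result, wall, cpu = timed(sum, range(100_000))
print(f"sum={result} wall={wall:.6f}s cpu={cpu:.6f}s")
```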

DEP: Drop Python 3.8 Support

Related #86

Community references:

Python 3.11 - October 2022 ref. #1110
NEP-29 says numpy drops support for Python 3.8 in April 2023.
rioxarray
pyproj
rasterio
xarray

ENH: Define datatype for make_geocube

Discussed in #61

Originally posted by mjyshin March 19, 2021
Hello, I have a lot of Shapely polygons as shapes that I want to rasterize into an output image, but the output will be huge. Is there a way to use make_geocube so that when I run something like

gdf = gpd.GeoDataFrame({'mask': shape_types}, geometry=shapes)
cube = make_geocube(gdf, resolution=(-1, 1), fill=0)
mask = cube.mask.values

the mask will be an int8 array instead of float64 (shape_types will naturally only be small integers)? Thanks in advance!
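
Until a dtype option exists, one possible workaround is to cast the rasterized variable after the fact. The sketch below only picks a suitable integer dtype name from the values; `smallest_int_dtype` is a hypothetical helper, and the cast in the usage note assumes the variable contains no NaN (for example because fill=0 was used).

```python
def smallest_int_dtype(values):
    """Return the name of the smallest signed integer dtype that holds all values."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        # A signed integer of this width covers [-2**(bits-1), 2**(bits-1) - 1].
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return f"int{bits}"
    raise OverflowError("values do not fit in a 64-bit signed integer")


print(smallest_int_dtype([0, 1, 2, 3]))
print(smallest_int_dtype([0, 300]))
```

With the cube from the question, cube["mask"].astype(smallest_int_dtype(shape_types)) would then yield a small-integer array. The float64 array produced by make_geocube still exists before the cast, so this does not by itself address the memory concern.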

Docs should specify that categorical_enums gets sorted

The docs state that the categorical_enums parameter should be a dict of lists, but each list is then sorted internally.

geocube/geocube/vector_to_cube.py

Line 76 in 0840e42

categories=sorted(set(categories)) + ["nodata"]

This should be stated in the documentation, because it can lead users to wrong results without any warning.
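
The effect of the quoted line can be reproduced in plain Python. The sketch below assumes that the integer code stored in the raster is the position of a category in the sorted list; that is an assumption about how geocube uses the list, and the category names and the `category_codes` helper are made up for illustration.

```python
def category_codes(categories):
    """Map each category to the integer code implied by the quoted line."""
    ordered = sorted(set(categories)) + ["nodata"]
    return {name: code for code, name in enumerate(ordered)}


user_order = ["wheat", "barley", "maize"]
codes = category_codes(user_order)
print(codes)
# Compare the position in the user's list with the code after sorting.
print(user_order.index("wheat"), codes["wheat"])
```

A user who decodes the raster with the original list order would therefore attach the wrong label to a pixel value.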

make_geocube removes some polygons

Hi,

I am trying to rasterize polygons using make_geocube, but I am losing some polygons. My reference raster file is of 1 km resolution. Here is a quick look at my problem: https://github.com/aniketgupta2009/NextGen_Data/blob/main/Zonal_Statistics.ipynb

Aniket

Look into failing tests with custom_rasterize_function

test_make_geocube__custom_rasterize_function
test_make_geocube__custom_rasterize_function__filter_null

See: https://travis-ci.com/corteva/geocube/builds/146261557

Support dask-geopandas

@snowman2 mentioned using dask-geopandas to rasterize in chunks using Dask. I just tried and it doesn't seem to work out of the box. Here is an example:

from functools import partial

import geopandas
import dask_geopandas
from rasterio.enums import MergeAlg
from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_image

# download: wget https://raw.githubusercontent.com/drei01/geojson-world-cities/master/cities.geojson

df = geopandas.read_file(
    "cities.geojson",
    crs="epsg:4326",
)

ddf = dask_geopandas.from_geopandas(df, npartitions=4)
ddf["mask"] = 1

geo_grid = make_geocube(
    vector_data=ddf,
    resolution=(-0.01, 0.01),
    measurements=["mask"],
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
    fill=0,
)

Traceback

---------------------------------------------------------------------------
VectorDataError                           Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 geo_grid = make_geocube(
      2     vector_data=ddf,
      3     resolution=(-0.05, 0.05),
      4     measurements=['mask'],
      5     rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
      6     fill=0,
      7 )
      8 geo_grid

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/api/core.py:99, in make_geocube(vector_data, measurements, datetime_measurements, output_crs, resolution, align, geom, like, fill, group_by, interpolate_na_method, categorical_enums, rasterize_function)
     35 """
     36 Rasterize vector data into an ``xarray`` object.  Each attribute will be a data
     37 variable in the :class:`xarray.Dataset`.
   (...)
     95 
     96 """
     97 geobox_maker = GeoBoxMaker(output_crs, resolution, align, geom, like)
---> 99 return VectorToCube(
    100     vector_data=vector_data,
    101     geobox_maker=geobox_maker,
    102     fill=fill,
    103     categorical_enums=categorical_enums,
    104 ).make_geocube(
    105     measurements=measurements,
    106     datetime_measurements=datetime_measurements,
    107     group_by=group_by,
    108     interpolate_na_method=interpolate_na_method,
    109     rasterize_function=rasterize_function,
    110 )

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/vector_to_cube.py:79, in VectorToCube.__init__(self, vector_data, geobox_maker, fill, categorical_enums)
     53 def __init__(
     54     self,
     55     vector_data: Union[str, os.PathLike, geopandas.GeoDataFrame],
   (...)
     58     categorical_enums: Optional[Dict[str, List]],
     59 ):
     60     """
     61     Initialize the GeoCube class.
     62 
   (...)
     77 
     78     """
---> 79     self._vector_data = load_vector_data(vector_data)
     80     self._geobox = geobox_maker.from_vector(self._vector_data)
     81     self._grid_coords = affine_to_coords(
     82         self._geobox.affine, self._geobox.width, self._geobox.height
     83     )

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/geo_utils/geobox.py:73, in load_vector_data(vector_data)
     71     raise VectorDataError("Empty GeoDataFrame.")
     72 if "geometry" not in vector_data.columns:
---> 73     raise VectorDataError(
     74         "'geometry' column missing. Columns in file: "
     75         f"{vector_data.columns.values.tolist()}"
     76     )
     78 # make sure projection is set
     79 if not vector_data.crs:

VectorDataError: 'geometry' column missing. Columns in file: [0, 1, 2]
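
The traceback shows load_vector_data failing to find a geometry column in the Dask object. Until Dask collections are supported, one workaround is to materialize the data before the call. The sketch below is duck-typed and `materialize` is a hypothetical helper; note that calling .compute() loads everything into memory, so it gives up the chunked processing this issue asks for.

```python
def materialize(vector_data):
    """Return an in-memory object, computing Dask collections if needed.

    Dask collections (including dask_geopandas.GeoDataFrame) expose a
    .compute() method; anything without it is returned unchanged.
    """
    compute = getattr(vector_data, "compute", None)
    return compute() if callable(compute) else vector_data
```

In the example above, this would be used as make_geocube(vector_data=materialize(ddf), ...) with the remaining arguments unchanged.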

odc-geo: anchor replaces align

https://github.com/opendatacube/odc-geo/releases/tag/v0.1.2

Geocube doesn't load gpd Dataframe

Code Sample, a copy-pastable example if possible

from geocube.api.core import make_geocube
import geopandas as gpd

# load
path = R"path.gpkg"
shp = gpd.read_file(path)

# create geocube
blds_cube = make_geocube(
    vector_data=shp,
    measurements=["to_tif"],
    output_crs=3857,
    resolution=(-2, 2),
    fill=0
)

Problem description

Geocube doesn't work...
Ends with error:
pyproj.exceptions.CRSError: Expect string or any object with .to_epsg() or .to_wkt() methods

Expected Output

Geocube works as expected. No changes to previously working code have been made

Environment Information

geocube version 0.3.2
rasterio version 1.2.10
rasterio GDAL version 3.5.0
fiona version 1.8.21y
fiona GDAL version (fio --gdal-version)
Python version 3.10.5
Operation System Information Windows

Installation method

conda

DOC: Can I perform operations on the geocube pixels during rasterization? Count, Mean, etc.

Hi this is my first foray into geocube. Thanks for your work here, clearly a good idea.

I'd like to add an example usage that I believe should be possible, but I can't quite get it right.

Is it possible to count the number of polygons in a cell, or perform operations on the geocube pixels during rasterization? Imagine a geopandas dataframe with many polygons. For each cell, perform some numpy operation (like rasterstats does) on the incoming polygons. I can perform pandas-like operations after creating a geocube, but that doesn't preserve the spatial object. I feel like the group_by argument points in this direction, but I could not get it to work.

Something like

#Turn each set of predictions into a raster
import geopandas as gpd
from shapely.geometry import Polygon
from geocube.api.core import make_geocube

#a numeric column to count
g["mask"] = 1

cube = make_geocube(vector_data=g, resolution=(-50, 50), function=mask.count())

To create a heatmap of number of polygons in each cell.

Here is a notebook to help illustrate use case (sorry the geocube svg doesn't play well with ipython on github).

https://github.com/weecology/NEON_crown_maps/blob/master/notebooks/Create%20a%20raster%20of%20polygon%20counts.ipynb
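
One way to approach the counting case is sketched below. It relies on the rasterize_function argument and on rasterize_image with MergeAlg.add, both of which appear in the dask-geopandas report on this page; the function= argument in the snippet above is not in the make_geocube signature shown in the tracebacks here. `count_polygons` is a hypothetical helper, and other statistics (mean, etc.) are not covered by this sketch.

```python
from functools import partial


def count_polygons(gdf, resolution=(-50, 50)):
    """Rasterize gdf so that each cell holds the number of overlapping polygons.

    Third-party imports are kept inside the function so that the module can
    be imported even where the geospatial stack is not installed.
    """
    from rasterio.enums import MergeAlg
    from geocube.api.core import make_geocube
    from geocube.rasterize import rasterize_image

    # Every polygon contributes 1; MergeAlg.add sums overlapping contributions.
    counted = gdf.assign(mask=1)
    return make_geocube(
        vector_data=counted,
        measurements=["mask"],
        resolution=resolution,
        rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add),
        fill=0,
    )
```

Calling count_polygons(g) with the dataframe from the question would return a Dataset whose "mask" variable is the per-cell polygon count, which can be plotted as a heatmap.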

ImportError: cannot import name 'NDArray' from 'numpy.typing'

This code used to work, and now suddenly it is not importing a package within the geocube package.

Losing data when add a raster to a geocube

I don't know if it is a bug or my misunderstanding, but I'm following this example without success.

vector_in = geopandas.read_file(input_vector)
raster_in = rioxarray.open_rasterio(input_raster)

out_grid = make_geocube(
    vector_data=vector_in,
    measurements=["id"],
    like=raster_in,
)

When I use out_grid["process_value"] = raster_in and later out_grid.to_netcdf('mydata.nc'), I see that the output has a lot of nodata. But neither vector_in nor raster_in has this amount of nodata.

Here is a QGIS project with my data.

geobox or to_wkt error while rasterizing a geodataframe with a single row

I'm getting an error that looks like it could be related to invalid or empty geometries, but it is happening with a single, valid geometry. I thought this was because I had an old version of the library, but I updated to 0.3.1 and the error persisted. Any tips would be greatly appreciated! This library has been extremely useful to my projects!

basegrid = xr.open_mfdataset('D:/to_uzbekistan/FR100_output/NLDAS2/FR_100_swed_wy1980.nc') # load a grid to use as reference
w = watersheds[3] # loop will go here

print(watershedNames[w])
#vect = gpd.read_file('../../snowhydro/data/watershed_outlines/FR/09025400.geojson') # open the watershed
vect = gpd.read_file(shapeloc, layer= w) # load the watershed
vect['gridcode'] = 1 # make an attribute to use later
out_grid = make_geocube(vector_data=vect, measurements=['gridcode'], like=basegrid, fill=np.nan) # rasterize the base grid

Error:

FRASER RIVER AT WINTER PARK, CO
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~\Miniconda3\envs\py3\lib\site-packages\geocube\geo_utils\geobox.py:163, in GeoBoxMaker.from_vector(self, vector_data)
    162 try:
--> 163     geobox = self.like.geobox
    164 except AttributeError:

File ~\Miniconda3\envs\py3\lib\site-packages\xarray\core\common.py:239, in AttrAccessMixin.__getattr__(self, name)
    238             return source[name]
--> 239 raise AttributeError(
    240     f"{type(self).__name__!r} object has no attribute {name!r}"
    241 )

AttributeError: 'Dataset' object has no attribute 'geobox'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 8>()
      6 vect = gpd.read_file(shapeloc, layer= w) # load the watershed
      7 vect['gridcode'] = 1 # make an attribute to use later
----> 8 out_grid = make_geocube(vector_data=vect, measurements=['gridcode'], like=basegrid, fill=np.nan)

File ~\Miniconda3\envs\py3\lib\site-packages\geocube\api\core.py:99, in make_geocube(vector_data, measurements, datetime_measurements, output_crs, resolution, align, geom, like, fill, group_by, interpolate_na_method, categorical_enums, rasterize_function)
     35 """
     36 Rasterize vector data into an ``xarray`` object.  Each attribute will be a data
     37 variable in the :class:`xarray.Dataset`.
   (...)
     95 
     96 """
     97 geobox_maker = GeoBoxMaker(output_crs, resolution, align, geom, like)
---> 99 return VectorToCube(
    100     vector_data=vector_data,
    101     geobox_maker=geobox_maker,
    102     fill=fill,
    103     categorical_enums=categorical_enums,
    104 ).make_geocube(
    105     measurements=measurements,
    106     datetime_measurements=datetime_measurements,
    107     group_by=group_by,
    108     interpolate_na_method=interpolate_na_method,
    109     rasterize_function=rasterize_function,
    110 )

File ~\Miniconda3\envs\py3\lib\site-packages\geocube\vector_to_cube.py:80, in VectorToCube.__init__(self, vector_data, geobox_maker, fill, categorical_enums)
     60 """
     61 Initialize the GeoCube class.
     62 
   (...)
     77 
     78 """
     79 self._vector_data = load_vector_data(vector_data)
---> 80 self._geobox = geobox_maker.from_vector(self._vector_data)
     81 self._grid_coords = affine_to_coords(
     82     self._geobox.affine, self._geobox.width, self._geobox.height
     83 )
     84 self._fill = fill if fill is not None else numpy.nan

File ~\Miniconda3\envs\py3\lib\site-packages\geocube\geo_utils\geobox.py:165, in GeoBoxMaker.from_vector(self, vector_data)
    163         geobox = self.like.geobox
    164     except AttributeError:
--> 165         geobox = geobox_from_rio(self.like)
    166     return geobox
    168 if self.resolution is None:

File ~\Miniconda3\envs\py3\lib\site-packages\geocube\geo_utils\geobox.py:44, in geobox_from_rio(xds)
     39 except AttributeError:
     40     transform = xds[xds.rio.vars[0]].rio.transform()
     41 return GeoBox(
     42     shape=wh_(width, height),
     43     affine=transform,
---> 44     crs=CRS(xds.rio.crs.to_wkt()),
     45 )

AttributeError: 'NoneType' object has no attribute 'to_wkt'
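
The final frame fails because xds.rio.crs is None, meaning the dataset passed as like carries no CRS that rioxarray can find. A possible pre-check is sketched below; `ensure_crs` is a hypothetical helper, and the CRS to write is an assumption the caller must supply from knowledge of the data.

```python
def ensure_crs(xds, fallback_crs):
    """Return xds with a CRS, writing fallback_crs only if none is set.

    Expects an xarray object with the rioxarray accessor loaded
    (import rioxarray), so that xds.rio.crs and xds.rio.write_crs exist.
    """
    if xds.rio.crs is None:
        return xds.rio.write_crs(fallback_crs)
    return xds
```

For the code above this would be basegrid = ensure_crs(basegrid, "EPSG:4326") before calling make_geocube, where EPSG:4326 is only a placeholder for whatever CRS the NetCDF grid actually uses.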

Remove empty geometries:

Check for invalid geometries:

Error when importing make_geocube

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

# Your code here
from geocube.api.core import make_geocube

Problem description

It appears to be an issue in spyder, since the command and package works when running it in a cmd prompt python console.

It looks like something to do with the rasterio dependency but when I import rasterio on its own it imports fine.
Any suggestions on how to fix spyder?

Expected Output

none

Output of `geocube --version 0.0.11`

Traceback (most recent call last):

File "rasterio/_crs.pyx", line 215, in rasterio._crs._CRS.from_epsg

File "rasterio/_err.pyx", line 182, in rasterio._err.exc_wrap_int

CPLE_AppDefinedError: PROJ: proj_create_from_database: Cannot find proj.db

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "", line 1, in
from geocube.api.core import make_geocube

File "C:\Anaconda3\lib\site-packages\geocube\api\core.py", line 7, in
from geocube.geo_utils.geobox import GeoBoxMaker

File "C:\Anaconda3\lib\site-packages\geocube\geo_utils\geobox.py", line 9, in
from datacube.utils import geometry

File "C:\Anaconda3\lib\site-packages\datacube\utils\geometry\__init__.py", line 52, in
from ._warp import (

File "C:\Anaconda3\lib\site-packages\datacube\utils\geometry\_warp.py", line 10, in
_WRP_CRS = rasterio.crs.CRS.from_epsg(3857)

File "C:\Anaconda3\lib\site-packages\rasterio\crs.py", line 321, in from_epsg
obj._crs = _CRS.from_epsg(code)

File "rasterio/_crs.pyx", line 217, in rasterio._crs._CRS.from_epsg

CRSError: The EPSG code is unknown. PROJ: proj_create_from_database: Cannot find proj.db
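
Both errors say that PROJ cannot locate its proj.db database. A quick way to compare the failing interpreter (here, the one Spyder launches) with the working console is to inspect the environment variables PROJ consults for its data directory, PROJ_DATA and the older PROJ_LIB. The sketch uses only the standard library, and `find_proj_db` is a hypothetical helper.

```python
import os
from pathlib import Path


def find_proj_db(environ=None):
    """Return the first existing proj.db found via PROJ_DATA or PROJ_LIB, else None."""
    environ = os.environ if environ is None else environ
    for variable in ("PROJ_DATA", "PROJ_LIB"):
        # Each variable may hold several directories separated by os.pathsep.
        for directory in environ.get(variable, "").split(os.pathsep):
            candidate = Path(directory) / "proj.db"
            if directory and candidate.is_file():
                return candidate
    return None


print(find_proj_db())
```

If this prints a path in the cmd prompt console but None (or a different path) inside Spyder, the two sessions are running with different environments.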

ModuleNotFoundError: No module named 'geocube.vector'

import geopandas

from geocube.api.core import make_geocube
from geocube.vector import vectorize

Problem description

ModuleNotFoundError Traceback (most recent call last)
Cell In[88], line 4
1 import geopandas
3 from geocube.api.core import make_geocube
----> 4 from geocube.vector import vectorize

ModuleNotFoundError: No module named 'geocube.vector'
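
A guarded check makes this failure mode easier to diagnose. The sketch below tests whether a submodule can be found using the standard library; `has_module` is a hypothetical helper. That the installed geocube release simply predates geocube.vector is an assumption here, not something the report confirms.

```python
import importlib.util


def has_module(name):
    """Return True if the dotted module name can be found.

    Note that find_spec imports the parent packages of a dotted name.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "geocube") is itself missing.
        return False


print(has_module("geocube.vector"))
```

If this prints False while geocube itself imports, upgrading geocube and re-running the check would be the first thing to try.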

Environment Information

geocube v0.3.3

GDAL deps:
fiona: 1.8.22
GDAL[fiona]: 3.5.2
rasterio: 1.3.8.post2
GDAL[rasterio]: 3.6.4

Python deps:
appdirs: 1.4.4
click: 8.1.7
geopandas: 0.13.2
odc_geo: 0.4.1
rioxarray: 0.13.4
pyproj: 3.5.0
xarray: 2023.1.0

System:
python: 3.8.0 (default, Nov 6 2019, 16:00:02) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\gji19\AppData\Local\anaconda3\envs\GDAL\python.exe
machine: Windows-10-10.0.19041-SP0

Missing data causes rasterize_points_griddata in cubic mode to fail

from geocube.rasterize import rasterize_points_griddata
from geocube.api.core import make_geocube
from functools import partial

rasterize_function_cubic = partial(rasterize_points_griddata, method="cubic")

Notes: It would be good to do this by column as there are cases where one column may have data in a row when another does not.
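
The per-column idea in the note can be sketched in plain Python: for each measurement, keep only the points whose value is present before handing them to the interpolator. `valid_points` is a hypothetical helper and the record layout is made up for illustration; it is not how geocube stores its data.

```python
import math


def valid_points(records, column):
    """Return (x, y, value) for records where column is neither None nor NaN."""
    points = []
    for record in records:
        value = record.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        points.append((record["x"], record["y"], value))
    return points


records = [
    {"x": 0.0, "y": 0.0, "a": 1.0, "b": float("nan")},
    {"x": 1.0, "y": 0.0, "a": float("nan"), "b": 2.0},
    {"x": 0.0, "y": 1.0, "a": 3.0, "b": 4.0},
]
print(valid_points(records, "a"))
print(valid_points(records, "b"))
```

Filtering per column keeps the second record for "b" even though its "a" value is missing, which a row-wise drop would discard.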

[Question] How to install on Windows?

Unfortunately, I need to install this package on Windows because of client requirements.
I currently have a virtual environment created with virtualenv, with Geopandas installed.
I downloaded the GDAL and Fiona binaries from this site (https://www.lfd.uci.edu/~gohlke/pythonlibs) and installed them, and they work well.
Now I need to rasterize a vector and found this package, but I can't install it.
Any tips for installing on Windows without Conda, using only pip?
Thanks

DEP: odc-geo

Migrate to https://github.com/opendatacube/odc-geo from opendatacube when it is released.

BUG: Handle numpy (1.22+) error when attempting to rasterize string

https://github.com/corteva/geocube/runs/4707259171?

    def test_make_geocube__group_by_no_measurements(input_geodata, tmpdir):
>       out_grid = make_geocube(
            vector_data=input_geodata,
            output_crs=TEST_GARS_PROJ,
            geom=json.dumps(mapping(TEST_GARS_POLY)),
            group_by="hzdept_r",
            resolution=(-10, 10),
            fill=-9999.0,
        )

test/integration/api/test_core_integration.py:644: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
geocube/api/core.py:90: in make_geocube
    return VectorToCube(
geocube/vector_to_cube.py:163: in make_geocube
    return self._get_dataset(
geocube/vector_to_cube.py:233: in _get_dataset
    grid_array = self._get_grouped_grid(
geocube/vector_to_cube.py:294: in _get_grouped_grid
    image = self._rasterize_function(
geocube/rasterize.py:101: in rasterize_image
    image = rasterio.features.rasterize(
/usr/share/miniconda/envs/test/lib/python3.8/site-packages/rasterio/env.py:387: in wrapper
    return f(*args, **kwds)
/usr/share/miniconda/envs/test/lib/python3.8/site-packages/rasterio/features.py:325: in rasterize
    if not validate_dtype(shape_values, valid_dtypes):
/usr/share/miniconda/envs/test/lib/python3.8/site-packages/rasterio/dtypes.py:186: in validate_dtype
    get_minimum_dtype(values) in valid_dtypes)
/usr/share/miniconda/envs/test/lib/python3.8/site-packages/rasterio/dtypes.py:109: in get_minimum_dtype
    min_value = values.min()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = array(['12577452', '12577266', '12577459', '12577323', '12577225',
       '12577369', '12577275'], dtype='<U8')
axis = None, out = None, keepdims = False, initial = <no value>, where = True

    def _amin(a, axis=None, out=None, keepdims=False,
              initial=_NoValue, where=True):
>       return umr_minimum(a, axis, None, out, keepdims, initial, where)
E       numpy.core._exceptions._UFuncNoLoopError: ufunc 'minimum' did not contain a loop with signature matching types (dtype('<U8'), dtype('<U8')) -> None

Export raster as 4-bit-thematic integer?

Hey Guys,
I am producing a lot of GeoTIFFs in areas with lakes. Within the lakes, the raster value is either 1 or 0 depending on the state, and where there is no lake the raster should be NaN. This works so far, but importing the GeoTIFFs into e.g. ArcMap takes very long.
My boss asked me if it is possible to produce 4-bit thematic integers as output.
I did not find any solution for this in the docs, so I just wanted to ask here.

Thanks a lot,
Felix

Install geocube error with pg_config

When installing geocube I receive this error:

ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o1sdpu5h/psycopg2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o1sdpu5h/psycopg2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-y45z87sy
         cwd: /tmp/pip-install-o1sdpu5h/psycopg2/
    Complete output (23 lines):
    running egg_info
    creating /tmp/pip-pip-egg-info-y45z87sy/psycopg2.egg-info
    writing /tmp/pip-pip-egg-info-y45z87sy/psycopg2.egg-info/PKG-INFO
    writing dependency_links to /tmp/pip-pip-egg-info-y45z87sy/psycopg2.egg-info/dependency_links.txt
    writing top-level names to /tmp/pip-pip-egg-info-y45z87sy/psycopg2.egg-info/top_level.txt
    writing manifest file '/tmp/pip-pip-egg-info-y45z87sy/psycopg2.egg-info/SOURCES.txt'
    
    Error: pg_config executable not found.
    
    pg_config is required to build psycopg2 from source.  Please add the directory
    containing pg_config to the $PATH or specify the full executable path with the
    option:
    
        python setup.py build_ext --pg-config /path/to/pg_config build ...
    
    or with the pg_config option in 'setup.cfg'.
    
    If you prefer to avoid building psycopg2 from source, please install the PyPI
    'psycopg2-binary' package instead.
    
    For further information please check the 'doc/src/install.rst' file (also at
    <https://www.psycopg.org/docs/install.html>).
    
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I installed psycopg2-binary successfully and can import psycopg2, so that part works as expected. However, installing geocube still fails with the same error as above. Do I need to add the psycopg2-binary install to my PATH?

Using:
Python 3.8.5
pip 20.2.3

Running in docker
Ubuntu 20.04.1 LTS (Focal Fossa)

DEP: Drop Python 3.7 support

Community references:

NEP-29 says numpy dropped support for Python 3.7 in December 2021.
pandas (pandas-dev/pandas#41678)
xarray will likely drop support soon (pydata/xarray#6138)
opendatacube is Python 3.8+ already ref
django ref
pyproj (pyproj4/pyproj#930)
rioxarray (corteva/rioxarray#451)

Add merge algorithm option for rasterization

https://rasterio.readthedocs.io/en/stable/api/rasterio.features.html#rasterio.features.rasterize

DOC: Add all_touched example

Follow on from #108

Worth adding an example in the docs to make the solution more discoverable

PROJ error when importing make_geocube

I get the following error when using from geocube.api.core import make_geocube.

ERROR 1: PROJ: proj_create_from_database: /home/ubu/miniconda3/envs/geoenv/share/proj/proj.db lacks DATABASE.LAYOUT.VERSION.MAJOR / DATABASE.LAYOUT.VERSION.MINOR metadata. It comes from another PROJ installation.

Running geocube --show-versions from the cli also results in this error.

Environment Information

geocube version 0.1.2
rasterio version 1.2.10
rasterio GDAL version 3.3.2
fiona version 1.8.21
fiona GDAL version 3.4.1
Python version 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
Operation System Information Linux-4.19.128-microsoft-standard-x86_64-with-glibc2.17

Installation method

conda environment with pip install geocube

Conda environment information (if you installed with conda):

Environment (conda list):

cf_xarray                 0.7.0              pyhd8ed1ab_0    conda-forge
fiona                     1.8.21                   pypi_0    pypi
rasterio                  1.2.10                   pypi_0    pypi
rioxarray                 0.10.2                   pypi_0    pypi
scipy                     1.8.0            py38h56a6a73_1    conda-forge
xarray                    0.20.1             pyhd3eb1b0_1

Details about conda and system ( conda info ):

active environment : geoenv
    active env location : ~/miniconda3/envs/geoenv
            shell level : 2
       user config file : ~/.condarc
 populated config files : ~/.condarc
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.9.5.final.0
       virtual packages : __linux=4.19.128=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : ~/miniconda3  (writable)
      conda av data dir : ~/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : ~/miniconda3/pkgs
                          ~/.conda/pkgs
       envs directories : ~/miniconda3/envs
                          ~/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.25.1 CPython/3.9.5 Linux/4.19.128-microsoft-standard ubuntu/20.04.1 glibc/2.31
                UID:GID : 1000:1000
             netrc file : ~/.netrc
           offline mode : False

Subsequent call to `make_geocube` for a second categorical variable fails with `KeyError`

Scenario: a GeoDataFrame with two categorical variables. Call make_geocube for one variable and then, at some later point, call it again to rasterize another variable in the same GeoDataFrame. The second call fails because some internal code in VectorToCube tries to access the xarray.Dataset using the original variable name, which is not present in the Dataset created by the second invocation. This seems to occur specifically when the same GeoDataFrame object is passed in. The workarounds therefore seem to be either to call make_geocube each time with a copy of the GeoDataFrame, or to pass in the path string so that the GeoDataFrame is read afresh on each call.
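
The copy-based workaround can be wrapped so that every call receives its own copy. This is a sketch under the assumption stated above, namely that the failure comes from reusing the same GeoDataFrame object; `rasterize_with_copy` is a hypothetical helper, and the rasterizing function is passed in as an argument (in practice geocube.api.core.make_geocube).

```python
def rasterize_with_copy(make_cube, vector_data, **kwargs):
    """Call make_cube with a copy of vector_data so the original is never touched."""
    return make_cube(vector_data=vector_data.copy(), **kwargs)
```

With the names used in the illustration that follows, the second call would become rasterize_with_copy(make_geocube, crome_22, measurements=['Land Cover Description'], like=dem, categorical_enums=lcd_enum).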

To illustrate:

Imports

import geopandas as gp
import xarray as xr

from geocube.api.core import make_geocube

Key versions

!conda list xarray
# packages in environment at /home/guy/anaconda3/envs/invest_geo_3:
#
# Name                    Version                   Build  Channel
rioxarray                 0.14.1             pyhd8ed1ab_0    conda-forge
xarray                    2024.1.0           pyhd8ed1ab_0    conda-forge

!conda list geocube
# packages in environment at /home/guy/anaconda3/envs/invest_geo_3:
#
# Name                    Version                   Build  Channel
geocube                   0.4.3              pyhd8ed1ab_0    conda-forge

!conda list geopandas
# packages in environment at /home/guy/anaconda3/envs/invest_geo_3:
#
# Name                    Version                   Build  Channel
dask-geopandas            0.3.1              pyhd8ed1ab_0    conda-forge
geopandas                 0.12.2             pyhd8ed1ab_0    conda-forge
geopandas-base            0.12.2             pyha770c72_0    conda-forge

Load input vector data as GeoDataFrame

crome_22 = gp.read_file('./local_data/crome_22.gpkg')
crome_22.shape

(350755, 9)

crome_22.head()

	prob	county	cromeid	SHAPE_Length	SHAPE_Area	LUCODE	Land Cover Description	Land Use Description	geometry
0	0.946	CMB	RPA355726477514	239.998908	4156.884	PG01	Grassland	Permanent Grassland	POLYGON ((355706.313 477480.156, 355686.313 47...
1	0.640	CMB	RPA360766501417	239.998561	4156.872	PG01	Grassland	Permanent Grassland	POLYGON ((360746.313 501382.438, 360726.313 50...
2	0.354	CMB	RPA360826501036	239.998908	4156.884	PG01	Grassland	Permanent Grassland	POLYGON ((360806.313 501001.406, 360786.313 50...
3	0.856	CMB	RPA360946500481	239.998561	4156.872	PG01	Grassland	Permanent Grassland	POLYGON ((360926.313 500447.125, 360906.313 50...
4	0.998	CMB	RPA353686475852	239.998561	4156.872	WO12	Trees	Trees and Scrubs, short Woody plants, hedgerows	POLYGON ((353666.313 475817.375, 353646.313 47...

Load elevation data as `DataArray` for use in `like=` parameter:

dem = xr.load_dataarray('./local_data/lune_dem.tiff')
dem

<xarray.DataArray 'band_data' (band: 1, y: 2446, x: 1492)>
array([[[731.0231  , 725.2918  , 719.77106 , ..., 193.4232  ,
         195.0497  , 197.2258  ],
        [713.22296 , 707.98254 , 719.77106 , ..., 191.12787 ,
         191.66556 , 192.84222 ],
        [713.22296 , 707.98254 , 702.0825  , ..., 193.54457 ,
         197.48878 , 199.48225 ],
        ...,
        [ 19.37331 ,  18.617643,  17.189133, ..., 253.5     ,
         251.02068 , 247.42574 ],
        [ 22.89966 ,  18.451714,  17.763134, ..., 253.5     ,
         251.1492  , 233.6541  ],
        [ 31.439272,  24.327406,  19.548042, ..., 253.99286 ,
         253.26584 , 240.41885 ]]], dtype=float32)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 3.372e+05 3.372e+05 ... 3.819e+05 3.819e+05
  * y            (y) float64 5.133e+05 5.133e+05 5.133e+05 ... 4.4e+05 4.4e+05
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Area
    long_name:      data

lud_enum = {'Land Use Description': [x for x in crome_22['Land Use Description'].unique()]}
lud_enum

{'Land Use Description': ['Permanent Grassland',
'Trees and Scrubs, short Woody plants, hedgerows',
'Non-vegetated or sparsely-vegetated Land',
'Perennial Crops and Isolated Trees',
'Spring Barley',
'Maize',
'Potato',
'Water',
'Spring Wheat',
'Fallow Land',
'Winter Field beans',
'Winter Oilseed',
'Winter Wheat',
'Winter Barley',
'Beet',
'Spring Cabbage',
'Spring Oats',
'Winter Oats',
'Spring Peas',
'Spring Field beans',
'Winter Rye',
'Winter Triticale']}

lcd_enum = {'Land Cover Description': [x for x in crome_22['Land Cover Description'].unique()]}
lcd_enum

{'Land Cover Description': ['Grassland',
'Trees',
'Non-Agricultural Land',
'Cereal Crops',
'Water',
'Leguminous Crops']}

# first call  for first variable
lud_grid = make_geocube(
    vector_data=crome_22,
    measurements=['Land Use Description'],
    like=dem,
    categorical_enums=lud_enum,
)
lud_grid

<xarray.Dataset>
Dimensions:                          (y: 2446, x: 1492,
                                      Land Use Description_categories: 23)
Coordinates:
  * y                                (y) float64 5.133e+05 5.133e+05 ... 4.4e+05
  * x                                (x) float64 3.372e+05 ... 3.819e+05
  * Land Use Description_categories  (Land Use Description_categories) object ...
    spatial_ref                      int64 0
Data variables:
    Land Use Description             (y, x) int16 -1 -1 -1 -1 -1 ... -1 -1 -1 -1

Now attempt a second call to `make_geocube`

This passes in the same GeoDataFrame, but specifies the 'Land Cover Description' variable and
passes in the associated categorical enumerations defined above.
Nowhere does this call reference the 'Land Use Description' variable.

# change only the desired `measurement` and pass in relevant categorical enums
# No mention of Land Use Description here...
lcd_grid = make_geocube(
    vector_data=crome_22,
    measurements=['Land Cover Description'],
    like=dem,
    categorical_enums=lcd_enum
)
lcd_grid

This results in a KeyError:

{
	"name": "KeyError",
	"message": "\"No variable named 'Land Use Description'. Variables on the dataset include ['Land Cover Description', 'y', 'x', 'Land Cover Description_categories']\"",
	"stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/xarray/core/dataset.py in ?(self, name)
   1445             variable = self._variables[name]
   1446         except KeyError:
-> 1447             _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
   1448 

KeyError: 'Land Use Description'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/xarray/core/dataset.py in ?(self, key)
   1546                 return self._construct_dataarray(key)
   1547             except KeyError as e:
-> 1548                 raise KeyError(
   1549                     f\"No variable named {key!r}. Variables on the dataset include {shorten_list_repr(list(self.variables.keys()), max_items=10)}\"

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/xarray/core/dataset.py in ?(self, name)
   1445             variable = self._variables[name]
   1446         except KeyError:
-> 1447             _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
   1448 

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/xarray/core/dataset.py in ?(variables, key, dim_sizes)
    209     split_key = key.split(\".\", 1)
    210     if len(split_key) != 2:
--> 211         raise KeyError(key)
    212 

KeyError: 'Land Use Description'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_44099/1077586481.py in ?()
      1 # change only the desired `measurement` and pass in relevant categorical enums
      2 # No mention of Land Use Description here...
----> 3 lcd_grid = make_geocube(
      4     vector_data=crome_22,
      5     measurements=['Land Cover Description'],
      6     like=dem,

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/geocube/api/core.py in ?(vector_data, measurements, datetime_measurements, output_crs, resolution, align, geom, like, fill, group_by, interpolate_na_method, categorical_enums, rasterize_function)
     95 
     96     \"\"\"
     97     geobox_maker = GeoBoxMaker(output_crs, resolution, align, geom, like)
     98 
---> 99     return VectorToCube(
    100         vector_data=vector_data,
    101         geobox_maker=geobox_maker,
    102         fill=fill,

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/geocube/vector_to_cube.py in ?(self, measurements, datetime_measurements, group_by, interpolate_na_method, rasterize_function)
    175                 measurements.remove(group_by)
    176             except ValueError:
    177                 pass
    178 
--> 179         return self._get_dataset(
    180             vector_data, measurements, group_by, interpolate_na_method
    181         )

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/geocube/vector_to_cube.py in ?(self, vector_data, measurements, group_by, interpolate_na_method)
    270         out_xds = xarray.Dataset(data_vars=data_vars, coords=self._grid_coords)
    271 
    272         for categorical_measurement, categoral_enums in self._categorical_enums.items():
    273             enum_var_name = f\"{categorical_measurement}_categories\"
--> 274             cat_attrs = dict(out_xds[categorical_measurement].attrs)
    275             cat_attrs[\"categorical_mapping\"] = enum_var_name
    276             out_xds[categorical_measurement].attrs = cat_attrs
    277             out_xds[enum_var_name] = categoral_enums

~/anaconda3/envs/invest_geo_3/lib/python3.10/site-packages/xarray/core/dataset.py in ?(self, key)
   1544         if utils.hashable(key):
   1545             try:
   1546                 return self._construct_dataarray(key)
   1547             except KeyError as e:
-> 1548                 raise KeyError(
   1549                     f\"No variable named {key!r}. Variables on the dataset include {shorten_list_repr(list(self.variables.keys()), max_items=10)}\"
   1550                 ) from e
   1551 

KeyError: \"No variable named 'Land Use Description'. Variables on the dataset include ['Land Cover Description', 'y', 'x', 'Land Cover Description_categories']\""
}

Note that it seems to be trying to index the Dataset using the previous variable name. Why?
I've taken a peek at the code in conjunction with the above. It seems that the VectorToCube object's _categorical_enums attribute is at the heart of it.

Gut feel: are the __init__ lines that initialize this attribute with an empty dict (a mutable) the root cause of this behaviour? It looks like the classic symptom of that pattern, but I haven't dug into how importing from geocube.api.core behaves.
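
For readers unfamiliar with the pitfall alluded to above, here is a minimal sketch of how shared mutable state can leak from one call into the next. The classes and the register method are hypothetical stand-ins, not geocube's VectorToCube code, and whether this pattern is the actual root cause is not confirmed in this report.

```python
class SharedDefault:
    """Hypothetical class with a mutable default argument (the pitfall)."""

    def __init__(self, enums={}):  # one dict object shared by every instance
        self.enums = enums

    def register(self, name, categories):
        self.enums[name] = categories
        return sorted(self.enums)


class FreshDefault:
    """Same interface, but every instance gets its own dict."""

    def __init__(self, enums=None):
        self.enums = {} if enums is None else dict(enums)

    def register(self, name, categories):
        self.enums[name] = categories
        return sorted(self.enums)


# First "call" registers one variable, second "call" registers another.
SharedDefault().register("Land Use Description", ["a", "b"])
leaky = SharedDefault().register("Land Cover Description", ["c"])

FreshDefault().register("Land Use Description", ["a", "b"])
clean = FreshDefault().register("Land Cover Description", ["c"])

print(leaky)  # the stale 'Land Use Description' key is still present
print(clean)  # only 'Land Cover Description'
```

If state of this kind were kept on the input object rather than on a default argument, the symptom would look the same, which would be consistent with a fresh copy of the input avoiding the error.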

Workarounds

It seems this KeyError problem can be averted if make_geocube is either called with a fresh copy of the GeoDataFrame each time, or with the string path to the file, forcing the vector data to be read in afresh with each call.

Feature request: Pathlib support

Summary

This is a feature request to better integrate pathlib objects to avoid surprises and confusion.

Code Sample

import xarray as xr
from geocube.api.core import make_geocube
from pathlib import Path

base_dir = Path("C:/dir/to/data")
raster_fname = base_dir / "Mana.tif"
polys_fname = base_dir / "Mana_polygons.shp"

xds = xr.open_rasterio(raster_fname)

cube = make_geocube(polys_fname, like=xds)

Problem description

Shows error from last command:

Traceback (most recent call last):

  File "C:\Users\mtoews\AppData\Local\Temp/ipykernel_16964/1127047227.py", line 10, in <module>
    cube = make_geocube(polys_fname, like=xds)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\api\core.py", line 90, in make_geocube
    return VectorToCube(

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\vector_to_cube.py", line 68, in __init__
    self._vector_data = load_vector_data(vector_data)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\geo_utils\geobox.py", line 73, in load_vector_data
    vector_data = geopandas.GeoDataFrame(vector_data)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geopandas\geodataframe.py", line 122, in __init__
    super().__init__(data, *args, **kwargs)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\pandas\core\frame.py", line 730, in __init__
    raise ValueError("DataFrame constructor not properly called!")

ValueError: DataFrame constructor not properly called!

Expected Output

Naïvely expected make_geocube to handle a pathlib object. To be fair, the docstring has:

vector_data: str or geopandas.GeoDataFrame
A file path to an OGR supported source or GeoDataFrame containing the vector data.

A workaround is to convert the path to str with either os.fspath(polys_fname) or str(polys_fname) before passing it as vector_data.
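
The workaround can be wrapped in a small helper so that call sites accept both strings and pathlib objects. This is only a sketch: as_vector_source is a hypothetical name, not part of geocube, and PurePosixPath is used purely so the printed result is the same on every platform.

```python
import os
from pathlib import PurePosixPath


def as_vector_source(vector_data):
    """Return path-like inputs as str; pass anything else through unchanged."""
    if isinstance(vector_data, os.PathLike):
        return os.fspath(vector_data)
    return vector_data


polys_fname = PurePosixPath("data") / "Mana_polygons.shp"
source = as_vector_source(polys_fname)
print(type(source).__name__, source)
```

The returned value could then be passed as vector_data in place of the Path object. Inputs that are already strings (or GeoDataFrames) are returned untouched.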

Environment Information

python -c "import geocube; geocube.show_versions()"

geocube v0.0.18

GDAL deps:
         fiona: 1.8.20
   GDAL[fiona]: 3.3.2
      rasterio: 1.2.10
GDAL[rasterio]: 3.3.2

Python deps:
       appdirs: 1.4.4
         click: 8.0.3
      datacube: 1.8.6
     geopandas: 0.10.2
     rioxarray: 0.8.0
        pyproj: 3.2.1
        xarray: 0.19.0

System:
        python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\mtoews\Miniconda3\envs\pyforge\python.exe
       machine: Windows-10-10.0.17134-SP0

Installation method

conda

Conda environment information

$ conda list  (selective)
datacube                  1.8.6              pyhd8ed1ab_0    conda-forge
fiona                     1.8.20           py39hea8b339_2    conda-forge
gdal                      3.3.2            py39h7c9a9b1_4    conda-forge
rasterio                  1.2.10           py39h20dd13d_0    conda-forge
rioxarray                 0.8.0              pyhd8ed1ab_0    conda-forge
scipy                     1.7.1            py39hc0c34ad_0    conda-forge
xarray                    0.19.0             pyhd8ed1ab_1    conda-forge

Details about conda and system ( conda info ):

$ conda info
     active environment : pyforge
    active env location : C:\Users\mtoews\Miniconda3\envs\pyforge
            shell level : 2
       user config file : C:\Users\mtoews\.condarc
 populated config files : C:\Users\mtoews\.condarc
                          C:\Users\mtoews\Miniconda3\envs\pyforge\.condarc
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.9.7.final.0
       virtual packages : __cuda=7.5=0
                          __win=0=0
                          __archspec=1=x86_64
       base environment : C:\Users\mtoews\Miniconda3  (writable)
      conda av data dir : C:\Users\mtoews\Miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\mtoews\Miniconda3\pkgs
                          C:\Users\mtoews\.conda\pkgs
                          C:\Users\mtoews\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\mtoews\Miniconda3\envs
                          C:\Users\mtoews\.conda\envs
                          C:\Users\mtoews\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.9.7 Windows/10 Windows/10.0.17134
          administrator : False
             netrc file : None
           offline mode : False

DeprecationWarning: Please use `str(crs)` instead of `crs.crs_str`

datacube/utils/geometry/_base.py:301: DeprecationWarning: Please use `str(crs)` instead of `crs.crs_str`
  warnings.warn("Please use `str(crs)` instead of `crs.crs_str`", category=DeprecationWarning)

Support for rasterisation to dask arrays

Hi folks,

I've had a quick look through the source code and haven't been able to find this functionality, so apologies if it exists and I've missed something.

What do you think about the feasibility/attractiveness of being able to run make_geocube as a delayed operation, so that the returned xarray object wraps a dask array rather than an in-memory numpy ndarray (by, say, passing a chunks argument somewhere, as in rioxarray.open_rasterio)?

An example use case is a heavy machine learning workload, where a neural network would be trained on an O(10-100) GB dataset of high-resolution aerial photography with rasterised vector layers representing ground truth data.

I'm happy to take a look at this but don't have the familiarity with the codebase to know where a good seam would be for it and whether it's possible to do without breaking things downstream, so would be nice to hear your thoughts.

Cheers!

make_geocube function kills kernel

Hello! Ever since the new update, my kernel dies whenever I use the make_geocube function. I tried using code and data that I have successfully run in the past, and it still yields the same result... Thank you so much for these tools, they are invaluable!

DOC: Missing link in example

I'd like to walk through the example of zonal statistics, but the zip-file to be downloaded at step [2] is not available:
https://corteva.github.io/geocube/html/examples/zonal_statistics.html

Btw, step 3 seems to assume that one has the source code of geocube locally, and that the working folder is "geocube/docs/examples". Perhaps this should be mentioned, or a dynamic path could be used instead.
ssurgo_data = gpd.read_file("../../test/test_data/input/soil_data_group.geojson")

Clarify documentation - fill parameter only used for numerical data

Discussed in #151

Originally posted by gtmaskall December 12, 2023
The default behaviour of make_geocube with categorical data is to append a nodata string to the <varname>_categories variable. This makes sense from a Python indexing perspective inasmuch as the corresponding integer code for this is -1. This seems to override the fill argument. Again this makes sense with the use of -1 and the position of nodata. But if I'm correct in this, the documentation could be tightened up a little to clarify that fill only applies to numerical data.
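
The indexing argument made above can be illustrated with plain Python lists. This is a sketch of the described behaviour, not geocube's implementation; the category names are invented for the example.

```python
raw_categories = ["Woodland", "Arable", "Grassland", "Arable"]

# Sorted unique categories with a trailing "nodata" entry, as described above.
categories = sorted(set(raw_categories)) + ["nodata"]

# Integer code for each real category; cells without data get -1.
code_for = {name: code for code, name in enumerate(categories[:-1])}
NODATA_CODE = -1

# A tiny row of raster cells: Woodland, empty, Arable.
grid_codes = [code_for["Woodland"], NODATA_CODE, code_for["Arable"]]

# Because "nodata" is last, index -1 resolves to it without special handling.
labels = [categories[code] for code in grid_codes]
print(categories)
print(labels)
```

Under this scheme a user-supplied fill value has no natural role for categorical variables, which is the point the documentation request is making.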

Drop Python 3.5 Support

Dependencies are dropping support, so geocube will follow.

Rasterization as DataArray

The function geocube.rasterize.rasterize_image returns a numpy array.
It would be nice if it also could be returned as a DataArray/Dataset. The extent, crs and resolution were already given as parameters.
Or is there a simple wrapper for that?

Drop python 2 support

Will be removed from rasterio, fiona, rioxarray. Already dropped in pyproj.

Look into replacing geobox functionality

Replacing the geobox functionality could remove the datacube dependency, which would remove the gdal python wrapper dependency.

Optimize make_geocube() when converting a GeoDataFrame with multiple columns

Hello there, I stumbled on this library from a stackoverflow answer you posted and I have to say I really dig it.

I'm working on a project where I have a GDF that needs to get rasterized, and this turned out to be the perfect solution. However, the GDFs that will be coming through the pipeline will be kind of big (150K rows x 700 columns). Right now the rasterization part is becoming a bottleneck: it takes a little over an hour, while the other operations happen in minutes. We can cut down the resolution of some of this data on our end, but it seems like there could be some room to optimize the function.

For example, one column with the 150K shapely Point features rasterizes in about 5 seconds using the 'nearest' interpolation method. I believe it should be possible to run the 5 second algorithm that aligns the outgoing grid with the 'nearest' vector features just once, and then simply apply the same pattern across the other n columns of data, so as n scales up, the time to execute the function doesn't scale up with it.

For the 'nearest' option, imagine rasterizing a numbered index associated with the geometry features, and then simply mapping the remaining columns to the new pattern (something akin to pandas take()). I'm not sure what's going on exactly under the hood, but I imagine something analogous could be done for the other interpolation options.
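
The idea proposed above can be sketched in a few lines of plain Python. The index grid below is a hypothetical result of running the expensive nearest-feature step once; the column names and values are invented, and nothing here reflects how geocube is implemented.

```python
# Hypothetical output of rasterizing the row index once with 'nearest':
# each cell stores the position of its nearest feature (2 x 3 grid, 4 features).
index_grid = [
    [0, 0, 1],
    [2, 3, 3],
]

# Attribute columns keyed by name, one value per feature.
columns = {
    "yield": [10.0, 20.0, 30.0, 40.0],
    "moisture": [0.1, 0.2, 0.3, 0.4],
}


def take(column, grid):
    """Map one attribute column onto the grid using the shared index."""
    return [[column[i] for i in row] for row in grid]


# The geometry work is done once; each extra column is only a cheap lookup.
rasters = {name: take(column, index_grid) for name, column in columns.items()}
print(rasters["yield"])
```

With NumPy arrays the lookup would be a single fancy-indexing expression per column. Either way, the cost per additional column no longer includes the nearest-neighbour search.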

Deprecation warnings

https://travis-ci.com/github/corteva/geocube/jobs/415108723

test/integration/xarray_extensions/test_integration_xarray_extensions_vectorxarray.py::test_to_netcdf

  /home/travis/build/corteva/geocube/geocube/xarray_extensions/vectorxarray.py:59: PendingDeprecationWarning: dropping variables using `drop` will be deprecated; using drop_vars is encouraged.

    out_obj = self._obj.drop("crs")

test/integration/xarray_extensions/test_integration_xarray_extensions_vectorxarray.py::test_to_netcdf

  /home/travis/miniconda/envs/test/lib/python3.6/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

    return array(a, dtype, copy=False, order=order)

ENH: Update minimize dtype for int64 & int8 support

geocube/geocube/rasterize.py

Lines 39 to 52 in e32d163

def _minimize_dtype(dtype: numpy.dtype, fill: float) -> numpy.dtype:
    """
    If int64, convert to float64:
    https://github.com/OSGeo/gdal/issues/3325

    Attempt to convert to float32 if fill is NaN and dtype is integer.
    """
    if numpy.issubdtype(dtype, numpy.integer):
        if dtype.name == "int8":
            # GDAL/rasterio doesn't support int8
            dtype = numpy.dtype("int16")
        if dtype.name == "int64":
            # GDAL/rasterio doesn't support int64
            dtype = numpy.dtype("float64")

Could you help upgrade the vulnerable dependency in geocube?

Hi, @snowman2 , @BENR0 , I'd like to report a vulnerability issue in geocube_0.2.0.

Issue Description

I noticed that geocube_0.2.0 directly depends on rasterio_1.2.10.
However, rasterio_1.2.10 suffers from vulnerabilities exposed by its C libraries, as the following dependency graph shows.

Dependency Graph between Python and Shared Libraries

Suggested Vulnerability Patch Versions

rasterio has upgraded these vulnerable C libraries to patched versions; see the linked issue.

Python build tools cannot report vulnerable C libraries, which may introduce security issues into many downstream Python projects.
As geocube is a popular Python package (3,352 downloads per month), could you please upgrade this vulnerable dependency?

Thanks for your help~
Best regards,
Joe Gardner

all_touched arg to make_geocube?

Nice to see the solution here #97 (comment)

I wonder if there is any interest in adding all_touched as an arg to make_geocube to simplify this for users?

For example, a user wouldn't be able to use all_touched and rasterize_function at the same time, and all_touched would be a thin wrapper around partial(rasterize_image, all_touched=True).
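
A rough sketch of how the proposal above could resolve the two arguments is shown here. Everything in it is hypothetical: rasterize_image_stub is a stand-in that only reports its settings (it is not geocube's rasterize_image), and pick_rasterize_function is an invented helper name.

```python
from functools import partial


def rasterize_image_stub(geometries, all_touched=False):
    """Stand-in for a rasterize function; it only reports its settings."""
    return {"n_geometries": len(geometries), "all_touched": all_touched}


def pick_rasterize_function(all_touched=False, rasterize_function=None):
    """Hypothetical resolution of the two mutually exclusive arguments."""
    if all_touched and rasterize_function is not None:
        raise ValueError("Pass either all_touched or rasterize_function, not both.")
    if rasterize_function is not None:
        return rasterize_function
    # all_touched is just a thin wrapper around functools.partial.
    return partial(rasterize_image_stub, all_touched=all_touched)


rasterize = pick_rasterize_function(all_touched=True)
result = rasterize(["polygon_a", "polygon_b"])
print(result)
```

Raising on the conflicting combination keeps the behaviour explicit, rather than silently letting one argument win over the other.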

Appveyor tests failing due to scipy 1.3.1

https://ci.appveyor.com/project/snowman2/geocube/builds/32101051

Tests:

test_make_geocube__custom_rasterize_function[function2-rasterize_griddata_cubic.nc]
test_make_geocube__custom_rasterize_function__filter_null[function1-rasterize_griddata_cubic_nodata.nc]

   scipy-1.3.1                |   py36h29ff71c_0        14.4 MB  conda-forge

Passing on Travis:

    scipy-1.4.1                |   py36h1dac7e4_2        19.0 MB  conda-forge

DEP: Python 3.6 Support

Related: corteva/rioxarray#215

[Feature] Is it possible to add polygonizing functionality?

First, thank you very much for this package, which helps solve the pain point of converting Geopandas vector data to tif through xarray. May I ask if the package can also facilitate converting an Xarray dataset to Geopandas vectors (polygonizing)? That would be much more convenient than using gdal functions. Thank you very much.

Wrong shape of resulting Dataset when using `like` argument while rasterizing

When rasterizing point data with make_geocube and handing over a template dataset to like, the result is wrong because the width and height are switched. This is due to the wrong order of width and height in line:

geocube/geocube/geo_utils/geobox.py

Line 43 in 6a37df2

width, height = xds.rio.shape

in the function geobox_from_rio because xds.rio.shape returns height, width:
https://github.com/corteva/rioxarray/blob/20351700c851aabe557905be3a216f9479919d95/rioxarray/rioxarray.py#L825-L827

Let me know if you'd like me to make a PR with the fix.
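
The ordering problem described above is easy to reproduce with any 2-D shape given in (height, width) order. The tiny grid below is invented for illustration and does not use rioxarray.

```python
grid = [
    [1, 2, 3],
    [4, 5, 6],
]  # 2 rows along y, 3 columns along x

# A 2-D array-style shape is (number of rows, number of columns),
# i.e. (height, width).
shape = (len(grid), len(grid[0]))

width_wrong, height_wrong = shape  # swapped unpacking, as in the report
height, width = shape              # unpacking that matches the shape order

print("wrong:", width_wrong, "x", height_wrong)
print("right:", width, "x", height)
```

For square rasters both unpackings give the same numbers, which may be why a swap like this can go unnoticed until a non-square template is used.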

make_geocube shows confusing error: AttributeError: 'NoneType' object has no attribute '__geo_interface__'

Summary

This issue seems to occur with a large dataset, where it is not clear what the main issue is in the error trace.

Code Sample

import xarray as xr
from geocube.api.core import make_geocube
from pathlib import Path

base_dir = Path("C:/dir/to/data")
raster_fname = base_dir / "Big.tif"
polys_fname = base_dir / "Big_polygons.shp"

xds = xr.open_rasterio(raster_fname)

cube = make_geocube(str(polys_fname), like=xds)

Problem description

From the last command, a warning is first shown (this is an unreported xarray issue, see xds.crs later in report):

C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)

followed by an error trace:

Traceback (most recent call last):

  File "C:\Users\mtoews\AppData\Local\Temp/ipykernel_23028/741645946.py", line 13, in <module>
    cube = make_geocube(str(polys_fname), like=xds)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\api\core.py", line 90, in make_geocube
    return VectorToCube(

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\vector_to_cube.py", line 163, in make_geocube
    return self._get_dataset(

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\vector_to_cube.py", line 238, in _get_dataset
    grid_array = self._get_grid(

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\vector_to_cube.py", line 345, in _get_grid
    image_data = self._rasterize_function(

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geocube\rasterize.py", line 79, in rasterize_image
    zip(geometry_array.apply(mapping).values, data_values),

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\geopandas\geoseries.py", line 624, in apply
    result = super().apply(func, convert_dtype=convert_dtype, args=args, **kwargs)

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\pandas\core\series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\pandas\core\apply.py", line 1043, in apply
    return self.apply_standard()

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\pandas\core\apply.py", line 1098, in apply_standard
    mapped = lib.map_infer(

  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer

  File "C:\Users\mtoews\Miniconda3\envs\pyforge\lib\site-packages\shapely\geometry\geo.py", line 205, in mapping
    return ob.__geo_interface__

AttributeError: 'NoneType' object has no attribute '__geo_interface__'

As for the inputs:

>>> xds
<xarray.DataArray (band: 1, y: 3466, x: 2400)>
[8318400 values with dtype=float64]
Coordinates:
  * band     (band) int32 1
  * y        (y) float64 6.234e+06 6.234e+06 6.233e+06 ... 5.368e+06 5.368e+06
  * x        (x) float64 1.492e+06 1.492e+06 1.493e+06 ... 2.092e+06 2.092e+06
Attributes:
    transform:         (250.0, 0.0, 1492000.0, 0.0, -250.0, 6234000.0)
    crs:               +init=epsg:2193
    res:               (250.0, 250.0)
    is_tiled:          1
    nodatavals:        (nan,)
    scales:            (1.0,)
    offsets:           (0.0,)
    AREA_OR_POINT:     Area
    TIFFTAG_SOFTWARE:  MATLAB 9.7, Mapping Toolbox 4.9
>>> import geopandas
>>> polys = geopandas.read_file(polys_fname)
>>> polys.shape
(47772, 5)
>>> polys.dtypes
reach        int64
down         float64
order          int64
rids          object
geometry    geometry
dtype: object

Approximately half of the polygons cover the raster.

Expected Output

Expected make_geocube to work without issue, or to provide a better error message for the underlying issue.
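
The traceback ends with shapely's mapping receiving None, which suggests that at least one feature has no geometry. A clearer error could come from a pre-check along these lines. This is a sketch only: check_geometries is a hypothetical helper, and plain strings stand in for geometry objects.

```python
def check_geometries(geometries):
    """Raise a descriptive error if any geometry is missing (None)."""
    missing = [i for i, geom in enumerate(geometries) if geom is None]
    if missing:
        raise ValueError(
            f"{len(missing)} of {len(geometries)} features have no geometry "
            f"(first rows: {missing[:5]}); drop or repair them before rasterizing."
        )
    return geometries


features = ["POLYGON A", None, "POLYGON B", None]
try:
    check_geometries(features)
except ValueError as err:
    message = str(err)
    print(message)
```

In the meantime, a user could likely avoid the error by filtering out rows with missing geometry before calling make_geocube, for example with polys[polys.geometry.notna()] in geopandas.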

Environment Information

python -c "import geocube; geocube.show_versions()"

geocube v0.0.18

GDAL deps:
         fiona: 1.8.20
   GDAL[fiona]: 3.3.2
      rasterio: 1.2.10
GDAL[rasterio]: 3.3.2

Python deps:
       appdirs: 1.4.4
         click: 8.0.3
      datacube: 1.8.6
     geopandas: 0.10.2
     rioxarray: 0.8.0
        pyproj: 3.2.1
        xarray: 0.19.0

System:
        python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\mtoews\Miniconda3\envs\pyforge\python.exe
       machine: Windows-10-10.0.17134-SP0

Installation method

conda

Conda environment information

$ conda list  (selective)
datacube                  1.8.6              pyhd8ed1ab_0    conda-forge
fiona                     1.8.20           py39hea8b339_2    conda-forge
pyproj                    3.2.1            py39h39b2389_2    conda-forge
gdal                      3.3.2            py39h7c9a9b1_4    conda-forge
rasterio                  1.2.10           py39h20dd13d_0    conda-forge
rioxarray                 0.8.0              pyhd8ed1ab_0    conda-forge
scipy                     1.7.1            py39hc0c34ad_0    conda-forge
xarray                    0.19.0             pyhd8ed1ab_1    conda-forge

Details about conda and system ( conda info ):

$ conda info
     active environment : pyforge
    active env location : C:\Users\mtoews\Miniconda3\envs\pyforge
            shell level : 2
       user config file : C:\Users\mtoews\.condarc
 populated config files : C:\Users\mtoews\.condarc
                          C:\Users\mtoews\Miniconda3\envs\pyforge\.condarc
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.9.7.final.0
       virtual packages : __cuda=7.5=0
                          __win=0=0
                          __archspec=1=x86_64
       base environment : C:\Users\mtoews\Miniconda3  (writable)
      conda av data dir : C:\Users\mtoews\Miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\mtoews\Miniconda3\pkgs
                          C:\Users\mtoews\.conda\pkgs
                          C:\Users\mtoews\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\mtoews\Miniconda3\envs
                          C:\Users\mtoews\.conda\envs
                          C:\Users\mtoews\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.9.7 Windows/10 Windows/10.0.17134
          administrator : False
             netrc file : None
           offline mode : False

REF: Update grid mapping attribute handling

https://github.com/corteva/geocube/runs/2592740629

 ValueError: failed to prevent overwriting existing key grid_mapping in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

This change likely has something to do with it: corteva/rioxarray#284

DOC: Missing file link in example

I'd like to walk through the example of zonal statistics, but step [3] fails:
https://corteva.github.io/geocube/html/examples/zonal_statistics.html

The link to the TIF file returns a 404.
https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/n42w091/USGS_13_n42w091.tif

DOC: rasterize_image with MergeAlg.add always returns dataset with only NaN's

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

import geopandas as gpd
import functools
from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_image
from rasterio.enums import MergeAlg

ra_image = functools.partial(rasterize_image, merge_alg=MergeAlg.add)

gdfdata = gpd.read_file(
    "https://github.com/corteva/geocube/blob/master/test/test_data/input/time_vector_data.geojson",
    crs="epsg:4326",
)

gdfdata["test_attr"] = gdfdata["test_attr"].div(gdfdata["test_attr"]).astype({"test_attr": "int16"})

test = make_geocube(
    vector_data=gdfdata,
    measurements=["test_attr"],
    datetime_measurements=None,
    interpolate_na_method=None,
    resolution=[0.1, 0.1],
    rasterize_function=ra_image
)

Problem description

I have a point dataset which I want to rasterize. There are multiple points for each target pixel. I want to "sum" up the values of all points in that pixel while rasterizing.

Expected Output

If the rasterize_image function is used with merge_alg=MergeAlg.add each pixel in the resulting dataset should contain the sum of the values of all points that fall within that pixel.
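
To make the expected behaviour concrete, here is a pure-Python sketch of additive merging on a 2 x 2 grid. It does not call rasterio or geocube, and the points are invented. The last lines show why a NaN starting value would be one plausible explanation for an all-NaN result; that explanation is an assumption and is not confirmed in this report.

```python
import math
from collections import defaultdict

# Hypothetical points as (pixel_row, pixel_col, value).
points = [(0, 0, 1), (0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)]

# Accumulate the values of all points that fall in the same pixel.
sums = defaultdict(int)
for row, col, value in points:
    sums[(row, col)] += value

# Pixels without any point start from 0 here.
grid = [[sums.get((r, c), 0) for c in range(2)] for r in range(2)]
print(grid)

# Starting a sum from NaN instead of 0 poisons every total.
nan_total = float("nan") + 1
print(math.isnan(nan_total))
```

If that assumption holds, passing a numeric fill such as 0 would be worth trying when summing values per pixel.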

Environment Information

geocube v0.1.2.dev0

GDAL deps:
fiona: 1.8.21
GDAL[fiona]: 3.4.1
rasterio: 1.2.10
GDAL[rasterio]: 3.3.2

Python deps:
appdirs: 1.4.4
click: 8.0.4
datacube: 1.8.6
geopandas: 0.10.2
rioxarray: 0.10.2
pyproj: 3.3.0
xarray: 2022.3.0

System:
python: 3.8.12 (default, Sep 13 2021, 09:11:55) [GCC 9.1.0]
executable: /bin/python3
machine: Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.2.5

Installation method

ImportError: cannot import name 'crs_to_wkt'

Hi,
I have installed geocube using conda but I have issues with the import when I do

from geocube.api.core import make_geocube

I get the import error given in the title of this issue.

Installation method/steps

conda install -c conda-forge geocube

Environment Information

geocube.show_versions() gives the following

geocube v0.0.15
GDAL deps:
fiona: 1.8.18
GDAL[fiona]: 3.1.4
rasterio: 1.2.0
GDAL[rasterio]: 3.1.4
Python deps:
appdirs: 1.4.4
click: 7.1.2
datacube: 1.8.3
geopandas: 0.8.1
rioxarray: 0.3.0
pyproj: 2.6.1.post1
xarray: 0.16.2
System:
python: 3.6.11 | packaged by conda-forge | (default, Aug 5 2020, 20:09:42) [GCC 7.5.0]
executable: /home/decide/miniconda2/envs/geo_env/bin/python
machine: Linux-4.15.0-135-generic-x86_64-with-debian-buster-sid

