thinkingmachines / geowrangler

🌏 A python package for wrangling geospatial datasets

Home Page: https://geowrangler.thinkingmachin.es/

License: MIT License

Topics: geopandas, gis, python

geowrangler's Introduction

Geowrangler

[badges: MIT license, black, isort, flake8, versions, docs]

Tools for wrangling with geospatial data

Overview

Geowrangler is a python package for geodata wrangling. It helps you build data transformation workflows for tasks that have no out-of-the-box solutions in other geospatial libraries.

We surveyed our past geospatial projects to extract these solutions, and we hope they will be useful for others as well.

Our audience is researchers, analysts, and engineers delivering geospatial projects.

We welcome your comments, suggestions, bug reports and code contributions to make Geowrangler better.

Modules

  • Grid Tile Generation
  • Geometry Validation
  • Vector Zonal Stats
  • Raster Zonal Stats
  • Area Zonal Stats
  • Distance Zonal Stats
  • Demographic and Health Survey (DHS) Processing Utils
  • Geofabrik (OSM) Data Download
  • Ookla Data Download

Check this page for more details about our roadmap.

Installation

pip install geowrangler
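To see what a typical workflow looks like, here is a minimal sketch of grid tile generation. It assumes the SquareGridGenerator usage shown in the grid tutorial excerpt further down this page and the generate_grids method named in the grid-optimization issue below; the file path is illustrative and the exact API may differ across versions.

import geopandas as gpd

from geowrangler import grids

# Load an area of interest and generate square grid tiles over it.
# The grid size is in the units of the GeoDataFrame's CRS.
aoi = gpd.read_file("aoi.geojson")  # illustrative path
grid_generator = grids.SquareGridGenerator(aoi, 5000)
tiles = grid_generator.generate_grids()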

Documentation

The documentation for the package is available here

Development

Development Setup

If you want to learn more about Geowrangler and explore its inner workings, you can set up a local development environment. You can run geowrangler's Jupyter notebooks to see how the different modules are built and how they work.

Pre-requisites

  • OS: Linux, MacOS, Windows Subsystem for Linux (WSL) on Windows

  • Requirements:

    • python 3.7 or higher
    • virtualenv, venv or conda for python environment virtualization
    • poetry for dependency management

Github Repo Fork

If you plan to make contributions to geowrangler, we encourage you to create a fork of the Geowrangler repo.

This allows you to push commits to your fork and then open a Pull Request (PR) from it against the main geowrangler repo for approval by geowrangler's maintainers.

Development Installation

We recommend creating a virtual python environment via virtualenv or conda for your development environment. Please see the relevant documentation for more details on installing these python environment managers. Afterwards, continue with the geowrangler setup instructions below.

Set-up with virtualenv

First, install libgeos (version >=3.8, required for building pygeos/shapely) with the following commands.

See libgeos documentation for installation details on other systems.

sudo apt update # updates package info
sudo apt install build-essential # installs GCC
sudo apt install libgeos-dev

Next, set up your Python env with virtualenv and install the pre-commit hooks and the necessary python libs by running the following commands.

Remember to replace <your-github-id> in the github url below with your GitHub ID to clone from your fork.

git clone https://github.com/<your-github-id>/geowrangler.git
cd geowrangler
virtualenv -p /usr/bin/python3.9 .venv
source .venv/bin/activate
pip install pre-commit "poetry>=1.2.0"
pre-commit install
poetry install

You're all set! Run the tests to make sure everything was installed properly.

Whenever you open a new terminal, you can cd <your-local-geowrangler-folder> and run poetry shell to activate the geowrangler environment.

Set-up with conda

Run the following commands to set up a conda env and install geos.

conda create -y -n geowrangler-env python=3.9 # replace geowrangler-env if you prefer a different env name
conda deactivate # important to ensure libs from other envs aren't used
conda activate geowrangler-env
conda install -y geos

Then run the following to install the pre-commits and python libs.

cd geowrangler # cd into your geowrangler local directory
pip install pre-commit "poetry>=1.2.0"
pre-commit install
poetry install

You're all set! Run the tests to make sure everything was installed properly.

Whenever you open a new terminal, run conda deactivate && conda activate geowrangler-env to deactivate any conda env, and then activate the geowrangler environment.

Jupyter Notebook Development

The code for the geowrangler python package resides in Jupyter notebooks located in the notebooks folder.

Using nbdev, we generate the python modules residing in the geowrangler folder from code cells in jupyter notebooks marked with an #export comment. A #default_exp <module_name> comment at the first code cell of each notebook directs nbdev to put the code in a module named <module_name> in the geowrangler folder.
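For example, a notebook that builds a hypothetical my_module module would contain cells roughly like these (module and function names are illustrative):

# --- first code cell of the notebook ---
# default_exp my_module

# --- any later code cell marked for export ---
# export
def exported_function():
    """nbdev writes this function to geowrangler/my_module.py."""
    return "hello"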

See the nbdev cli documentation for more details on the commands to generate the package as well as the documentation.

Running notebooks

Run the following to view the jupyter notebooks in the notebooks folder.

poetry run jupyter lab

Generating and viewing the documentation site

To generate and view the documentation site on your local machine, the quickest way is to set up Docker. The following assumes that you have Docker set up on your system.

poetry run nbdev_build_docs --mk_readme False --force_all True
docker-compose up jekyll

Alternatively, if you don't want to use Docker, you can install Jekyll to view the documentation site locally.

nbdev converts notebooks within the notebooks/ folder into a jekyll site.

From this jekyll site, you can then create a static site.

To generate the docs, run the following


poetry run nbdev_build_docs --mk_readme False --force_all True
cd docs && bundle i && cd ..

To run the jekyll site, run the following

cd docs
bundle exec jekyll serve

Running tests

We are using pytest as our test framework. To run all tests and generate a coverage report, run the following.

poetry run pytest --cov --cov-config=.coveragerc -n auto

To run a single test or test file

# for a single test function
poetry run pytest tests/test_grids.py::test_create_grids
# for a single test file
poetry run pytest tests/test_grids.py

Contributing

Please read CONTRIBUTING.md and CODE_OF_CONDUCT.md before contributing.

Development Notes

For more details regarding our development standards and processes, please see our wiki.

geowrangler's People

Contributors

alronlam, butchtm, dependabot[bot], joshuacortez, jtmiclat, rnvllflores, tm-abby-moreno, tm-dafrose-bajaro, tm-jace-peralta, tm-jc-nacpil


geowrangler's Issues

Vector Zonal Stats: Error when aligning OSM POIs to an Indonesian Regency

Colab notebook for testing:

Scenario:

  • Created an AOI GDF representing the Subang Regency of Indonesia.
  • Tried simply counting the POIs within this area, but got the error seen in the screenshot:

[screenshots: gw_aoi_index_error, gw_aoi_index_error_full_stacktrace]

Could it be a typo? The key error mentions "GeoWrangleer_aoi_index".

In case it's relevant: the function works when I pass in grid tiles instead of the Subang Regency gdf itself.
[screenshot: gw_aoi_works]

Getting the `CalledProcessError` when running `poetry install`

I ran poetry install and got the following error:

• Updating nbdev (1.2.8 -> 1.2.9 c151342): Failed

  CalledProcessError

  Command '['git', '--git-dir', '/home/jt/.cache/pypoetry/virtualenvs/geowrangler-U9oiUrW5-py3.9/src/nbdev/.git', '--work-tree', '/home/jt/.cache/pypoetry/virtualenvs/geowrangler-U9oiUrW5-py3.9/src/nbdev', 'checkout', 'c15134220b4d6b96dd67952e27a57a8e5c1bf4c3']' returned non-zero exit status 128.

  at ~/.poetry/lib/poetry/utils/_compat.py:217 in run
      213│                 process.wait()
      214│                 raise
      215│             retcode = process.poll()
      216│             if check and retcode:
    → 217│                 raise CalledProcessError(
      218│                     retcode, process.args, output=stdout, stderr=stderr
      219│                 )
      220│         finally:
      221│             # None because our context manager __exit__ does not use them.
      

This error also came up once we merged #30 to master https://github.com/thinkingmachines/geowrangler/runs/6833093902 but not when the PR was reviewed: https://github.com/thinkingmachines/geowrangler/actions/runs/2453771035

Fix 02_vector_zonal_stats.html

From https://github.com/thinkingmachines/geowrangler/runs/7194843190?check_suite_focus=true

converting: /home/runner/work/geowrangler/geowrangler/notebooks/index.ipynb
converting: /home/runner/work/geowrangler/geowrangler/notebooks/02_vector_zonal_stats.ipynb
An error occurred while executing the following cell:
------------------
show_doc(_fix_agg)
------------------
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 show_doc(_fix_agg)
NameError: name '_fix_agg' is not defined
NameError: name '_fix_agg' is not defined
converting: /home/runner/work/geowrangler/geowrangler/notebooks/tutorial.geometry_validation.ipynb
converting: /home/runner/work/geowrangler/geowrangler/notebooks/00_validation.ipynb
converting: /home/runner/work/geowrangler/geowrangler/notebooks/tutorial.grids.ipynb
converting: /home/runner/work/geowrangler/geowrangler/notebooks/00_grids.ipynb
converting: /home/runner/work/geowrangler/geowrangler/notebooks/tutorial.vector_zonal_stats.ipynb
Conversion failed on the following:
02_vector_zonal_stats.ipynb

Bing Tile vector zonal stats: auto-conversion of quadkey to str

Ran into this error when trying to do bing tile vector zonal stats. The root cause was loading a dataframe from a CSV: the quadkeys were automatically interpreted as ints by pandas.

On the user-side, it's easy enough to just manually fix this and change the column dtype to string. But wondering if in the bing tile vector zonal stats function, we can (and should) just do this auto-conversion for convenience (at least just for the purposes of joining based on quadkeys)?
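For reference, the manual user-side fix mentioned above looks roughly like this (the file path and column name are illustrative):

import pandas as pd

# Read quadkeys as strings up front so pandas doesn't coerce them to ints,
# which breaks joins on the quadkey column.
df = pd.read_csv("tiles.csv", dtype={"quadkey": str})

# Or convert after the fact:
df["quadkey"] = df["quadkey"].astype(str)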

[screenshot: bingtile_quadkey_autostr]


Related to Feature #42

Ookla Data Download

Download and cache data from Ookla’s S3 bucket given parameters (wired/wireless, year, quarter).

Vector Zonal Stats: Support for vector data with non-point geometry (e.g. tiles)

Creating an issue to track that Vector Zonal Stats currently supports only point geometries.

There is a workaround where you simply convert these non-point geometries into points by taking their centroids. From there you can use the existing VZS feature, but that may introduce some imprecision.
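A minimal sketch of that workaround, assuming a GeoDataFrame tiles_gdf of polygon tiles:

# Replace each polygon with its centroid so the existing point-based
# vector zonal stats can be applied (with the precision caveats below).
points_gdf = tiles_gdf.copy()
points_gdf["geometry"] = points_gdf.geometry.centroid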

A popular dataset example that doesn't come in the form of points is Ookla, where data is in the form of tiles.

Case 1: AOI Tile is smaller than data tile (Ookla)
For example, if your AOI tiles are 1m x 1m, but Ookla tiles are 30m x 30m, and assuming they are aligned (all AOI tiles are within an Ookla tile), then ideally, the average download speed of the Ookla tile should be attributed to all 900 AOI tiles within it. However, if we convert to point geometries, only one of the AOI tiles will intersect with the Ookla tile centroid and get the right attributes; the rest will be null.

Case 2: AOI Tile is bigger than data tile (Ookla)
In the reverse case where the AOI tile is bigger than the Ookla tile (which is likely the majority case), this should be more tolerable because it is likely that each AOI tile will get multiple Ookla tile centroids anyway, leading to a reasonable approximation.

Vector Zonal Stats: Colab crashing due to RAM consumption

Colab notebook for testing:
https://colab.research.google.com/drive/147HWUgaBztsZuBPrI_HTckBrz_vl9l1l#scrollTo=wvLenjgDUgod

Scenario

  • Created AOI grid tiles for the Subang Regency in Indonesia (~36k 250mx250m grid tiles)
  • Tried to get the average population density per tile using HRSL vector data (CSV file is around 1.8GB)

Error
Colab crashes due to exceeding the RAM limit.
[screenshot: gw_vzs_hrsl]

Just creating this issue to check if there are straightforward ways to optimize. Otherwise, are there workarounds for handling relatively large vector datasets like this?

Distinguishing between grids and tiles

This might be pedantic, but what do you think about distinguishing between a grid and a grid tile for added clarity?

  • Grid Tile (or tile): a single square
  • Grid: the collection of squares

This will imply some renaming in the GridGenerator class

Geometry Validation: Catching polygons with less than 3 unique vertices

This could be a feature to consider for Geowrangler Geometry Validation

I encountered an error when I tried to upload a geopandas dataframe to BQ; it said:

GenericGBQException: Reason: 400 Error while reading data, error message: Invalid geography value for column 'geometry', error: Polygon loop should have at least 3 unique vertices, but only had 2; in WKB geography

It turns out there was a "polygon" that was actually a line. I verified this by computing its area, which was 0.

'POLYGON ((122.95320551089915 11.473736609261481, 122.952381 11.4737421, 122.95320551089915 11.47373660926148, 122.95320551089915 11.473736609261481))'

The weird thing is that it's not caught by is_valid on the EPSG:4326 GeoSeries, but it is caught by is_valid when the GeoSeries is projected to EPSG:3123. I expected is_valid to return False even if the polygon was not projected.

Perhaps this can be something geowrangler's geometry validation can also catch?
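A hedged sketch of such a check, using the reported geometry (this is not geowrangler's actual validation code):

from shapely import wkt

poly = wkt.loads(
    "POLYGON ((122.95320551089915 11.473736609261481, "
    "122.952381 11.4737421, "
    "122.95320551089915 11.47373660926148, "
    "122.95320551089915 11.473736609261481))"
)
# A ring with fewer than 3 unique vertices (or ~zero area) is really a
# line or a point, even when is_valid reports True in EPSG:4326.
ring = list(poly.exterior.coords)[:-1]  # drop the repeated closing vertex
print(len(set(ring)), poly.area)  # per the report above: ~2 unique vertices, area ~0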

Change default fillna to be false and allow configurable fill value

Discussed in #62

Originally posted by mosesckim June 30, 2022
Noticed the default is to fill NaNs with zero after aggregation; this might make it difficult to identify original NaNs if there are actual zeros in the aggregation.

https://github.com/thinkingmachines/geowrangler/blob/master/geowrangler/vector_zonal_stats.py#L205

Also, a suggestion: in case the fillna option is set to True, make the replacement value (currently 0) something users can set (e.g. -1), as sketched below.

Thanks!
Moses
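A hypothetical sketch of the suggested behavior (names are illustrative, not geowrangler's actual internals):

def fill_aggregated(df, columns, fillna=False, fill_value=0):
    # Leave NaNs alone by default so original NaNs stay distinguishable
    # from true zeros; let callers opt in and pick the replacement value.
    if fillna:
        df[columns] = df[columns].fillna(fill_value)
    return df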

Enable "cache" behavior by default when downloading OSM data

Hello, I noticed that the OSM data download does not support any kind of caching.

It would be nice to support this natively so users don't have to keep writing their own file-existence checks when they re-run cells in a Jupyter notebook or re-run scripts. Otherwise, the line of code downloads the file again, resulting in long runtimes.

Maybe we can (see the sketch after this list):

  • Add an optional param like overwrite to the function. E.g. geofabrik.download_geofabrik_region("laos", "../test_dir", overwrite=False)
  • Make this False by default, so that caching is enabled by default.
  • This param will still allow users to overwrite the file should they wish to do so (e.g. it's been a while and they want to get a newer version of the OSM data).
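A hypothetical sketch of that behavior (the overwrite parameter and the internals shown here are illustrative, not current geowrangler code):

import os

def download_geofabrik_region(region, directory, overwrite=False):
    # Illustrative cache check: skip the download when the file already
    # exists, unless the caller explicitly asks for a fresh copy.
    filepath = os.path.join(directory, f"{region}-latest.osm.pbf")
    if os.path.exists(filepath) and not overwrite:
        return filepath
    # ... download from Geofabrik and write to filepath ...
    return filepath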

Clarification about grid generation units

Hi! I'm currently using the grid generation tutorial and would like to clarify whether the output grids in the example are in degrees or meters.

In the create grids section, the tutorial mentions that the units of the grids are dependent on the units of the projection.

Create a grid generator with a size of 50000. The units of the grid size are dependent on the projection of the geodataframe, in this case, EPSG:4326.

When printing out the CRS of region3_gdf, it shows that the units are in decimal degrees.

>>> region3_gdf.crs  # CRS info
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

However, the output grids in the tutorial are described as being at kilometer scale:

grid_generator5k = grids.SquareGridGenerator(
    region3_gdf, 5000
)  # 5 km x 5 km square cells

Let me know also if this is the correct place for this question or if I should place this in Discussions instead. Thanks! :D
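For what it's worth, one hedged way to make the cell size mean meters, given the tutorial's note that grid-size units follow the gdf's projection, is to reproject to a metric CRS first (EPSG:3857 here is only illustrative and distorts distances away from the equator):

region3_metric = region3_gdf.to_crs("EPSG:3857")  # units become meters
grid_generator5k = grids.SquareGridGenerator(region3_metric, 5000)  # ~5 km cells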

Add more broken geometry examples

@ncdejito created additional broken geometries to add to the test suite:

{
"type": "FeatureCollection",
"name": "broken2",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "description": "correct" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343802163610064, 16.376524213304915 ], [ 120.343822299718241, 16.37652526707825 ], [ 120.343823581106932, 16.376508933591005 ], [ 120.343802895832169, 16.376508406704307 ], [ 120.343802163610064, 16.376524213304915 ] ] ] } },
{ "type": "Feature", "properties": { "description": "counterclockwise coordinates" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343795024444461, 16.376520173840436 ], [ 120.343777268058162, 16.376519295695978 ], [ 120.343779647780039, 16.376504894126214 ], [ 120.343797038055257, 16.376506123528557 ], [ 120.343795024444461, 16.376520173840436 ] ] ] } },
{ "type": "Feature", "properties": { "description": "self-intersecting polygons (e.g. twirled edges)" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343746880840385, 16.376508231075398 ], [ 120.34376646778199, 16.376510689880028 ], [ 120.343768481392758, 16.376492424473494 ], [ 120.343748528340186, 16.376490141297555 ], [ 120.343759511671877, 16.376513148684623 ], [ 120.343746880840385, 16.376508231075398 ] ] ] } },
{ "type": "Feature", "properties": { "description": "slither polygons" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343721070010858, 16.376486453090216 ], [ 120.343736263619718, 16.376488736266193 ], [ 120.34373571445316, 16.376486804348058 ], [ 120.343721070010858, 16.376486453090216 ] ] ] } },
{ "type": "Feature", "properties": { "description": "coordinates outside of -180,180" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 71.163398107805307, -88.300673651538673 ], [ 77.641630757327448, -88.362103634964285 ], [ 75.050337697518586, -88.738011928477775 ], [ 69.867751577900847, -88.612832998982199 ], [ 71.163398107805307, -88.300673651538673 ] ] ] } },
{ "type": "Feature", "properties": { "description": "holes" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343828569370118, 16.376482940511725 ], [ 120.343827196453617, 16.376463972586816 ], [ 120.343804589095839, 16.376465289803889 ], [ 120.34380632812335, 16.376483818656357 ], [ 120.343828569370118, 16.376482940511725 ] ], [ [ 120.343810080761713, 16.376468538939285 ], [ 120.343823352287544, 16.376467397351178 ], [ 120.343824725204001, 16.376478901046394 ], [ 120.343810263817204, 16.376479252304257 ], [ 120.343810080761713, 16.376468538939285 ] ] ] } },
{ "type": "Feature", "properties": { "description": "non-closed polygon" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343799646596622, 16.37644096519395 ], [ 120.343815206316521, 16.376441140822909 ], [ 120.343814840205468, 16.376428319908158 ], [ 120.343800012707675, 16.376429022424055 ] ] ] } },
{ "type": "Feature", "properties": { "description": "multipolygon" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 120.343776581599954, 16.376478374159603 ], [ 120.343789395487008, 16.376478549788533 ], [ 120.343789578542498, 16.376465553247304 ], [ 120.343776947711007, 16.376466607020941 ], [ 120.343776581599954, 16.376478374159603 ] ] ], [ [ [ 120.343764499935091, 16.376454137365769 ], [ 120.343777313822116, 16.376453786107874 ], [ 120.343778046044221, 16.376441843338764 ], [ 120.343765781323782, 16.376442370225647 ], [ 120.343764499935091, 16.376454137365769 ] ] ], [ [ [ 120.343738872161055, 16.376437979501546 ], [ 120.343751136881494, 16.376438155130508 ], [ 120.343752601325718, 16.376426036731463 ], [ 120.343739970494241, 16.376425685473517 ], [ 120.343738872161055, 16.376437979501546 ] ] ] ] } },
{ "type": "Feature", "properties": { "description": "polygon z" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343707386610134, 16.376456069284234, 0.0 ], [ 120.343727888829335, 16.376455718026328, 0.0 ], [ 120.343720200497131, 16.376444653402149, 0.0 ], [ 120.343707752721187, 16.376445531546949, 0.0 ], [ 120.343707386610134, 16.376456069284234, 0.0 ] ] ] } },
{ "type": "Feature", "properties": { "description": "complex self-intersecting polygon" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 120.343830354161554, 16.376432886261462 ], [ 120.343846096936986, 16.376432008116602 ], [ 120.343845547770428, 16.376417782169376 ], [ 120.34382962193942, 16.376418309056326 ], [ 120.343834564438708, 16.376425685473521 ], [ 120.343839140826901, 16.376425685473521 ], [ 120.343838591660344, 16.376421821635979 ], [ 120.343834747494199, 16.376421646006996 ], [ 120.343830354161554, 16.376432886261462 ] ] ] } }
]
}

Create meta function to install necessary packages and resources to run in colab

Currently we are installing a bunch of things within notebooks just so they work in Colab:

"# hide\n",
"# no_test\n",
"! [ -e /content ] && pip install -Uqq git+https://github.com/thinkingmachines/geowrangler.git\n",
"! [ -e /content ] && pip install -Uqq git+https://github.com/butchland/nbdev.git@add-black-format\n",
"# downgrade tornado if in colab\n",
"! [ -e /content ] && pip install -Uqq tornado==5.1.0"

Is this something we should keep doing?

Bing Tile vector zonal stats much slower compared to the regular one

Tried refactoring my code to generate Ookla vector zonal stats using Bing Tile quadkeys instead of the regular spatial joins.

In theory, it should be faster. But in practice, the regular vector zonal stats takes a few seconds (~15-30s) while the bing tile version runs far longer (I tried for ~13 mins and then interrupted it). Not sure if I'm doing something wrong.

Here's a Colab notebook to replicate it:
https://colab.research.google.com/drive/1IdwTu2oQjL6fBPwe-Kgk1KReeIc0n8NW#scrollTo=3FYr2rJu-wwj

The notebook needs this file: phl_tiles.csv

A current hunch is that raw spatial joins benefit from spatial indexes, while the implementation of matching/aggregating by quadkeys does not.


Related to Feature #42

Decoupling the bounding box from the AOIs

In the GridGenerator class here, the overall bounding box (minx, miny, maxx, maxy) is automatically derived from the projected gdf. Perhaps it would be good to make the overall bounding box an optional parameter: if None, it is computed from the gdf as before; otherwise, the user-defined bounding box is used.

Having a user-defined bounding box is useful for making consistently defined grids. This helps in cases where our AOIs do not necessarily encompass the entire country: we can define the overall bounding box based on the country admin boundaries, and regardless of what AOI we supply as the gdf input, the x and y coordinates of the grid tiles remain consistent.
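A hypothetical sketch of the proposed change (the bounds parameter and constructor shape are illustrative, not the current API):

class GridGenerator:
    def __init__(self, gdf, grid_size, bounds=None):
        self.gdf = gdf
        self.grid_size = grid_size
        # Use the user-supplied bounding box if given; otherwise derive
        # it from the gdf as the class does today.
        self.bounds = bounds if bounds is not None else tuple(gdf.total_bounds)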

DHS Wealth Index Calculation

Calculate the DHS Wealth Index given specified data. Useful when re-calculating the wealth index across multiple countries, or when applying the same procedure to non-DHS surveys (e.g. Indonesia Susenas).

Add default aggregations param value for create_distance_zonal_stats

Since the create_distance_zonal_stats function will always have one aggregation by default (the distance to nearest), maybe we can just set the default to aggregations=[] for convenience? It's useful when all I really want is the distance to nearest.

Noting this could be a good first issue.

[screenshot: aggregations_default]
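A hypothetical sketch of the change (the signature is illustrative; the empty-list default is expressed via the safer None idiom):

def create_distance_zonal_stats(aoi, data, max_distance, aggregations=None):
    # Default to no extra aggregations so callers who only want
    # distance-to-nearest can omit the parameter.
    aggregations = [] if aggregations is None else aggregations
    ...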

Update the nbdev version to nbdev2

This will improve the doc site aesthetics (see new nbdev)

  • replaces jekyll w/ quarto for site generation
  • removes need for custom nbdev (since black formatting is already implemented in nbdev2)
  • lots of other improvements

Caveats:

  • some features (e.g. source links) are still being worked on
  • nbdev2 commands have changed slightly and will require readjustments esp. w/ GH workflows.
  • need to test the update thoroughly before deploying to prod.

Optimizing grid tile generation

I think there’s room for further optimization, especially for generate_grids in the GridGenerator class. Right now the grid tiles are first generated across the entire span of xrange and yrange and then filtered afterwards. While this isn’t an issue for very coarse grids, it can easily run into runtime and memory issues for fine grids.

Instead of generating all the tiles and then filtering after, we can generate only the grid tiles we need.

  1. This drastically saves on memory since our gdf won’t need to store tiles it doesn’t need.
  2. This saves on runtime since we don’t need to generate unnecessary tiles, and don’t need to reproject unnecessary tiles.

To determine which grid tiles to generate in the first place, we can use the cheapest possible geometric operations.

  1. Looping through xrange and yrange and intersecting tiles with the gdf’s unary_union can be expensive, since the unary_union is a single geometry that most likely has a large number of points.
  2. We can instead generate tiles per polygon within the gdf (if there are multipolygons, we can explode them into polygons). Per polygon, we can:
    1. Find the bounding box of the polygon. (should be cheap)
    2. Determine which tiles are within the polygon’s bounding box. (should be cheap since you don’t need to use shapely operations)
    3. Check which tiles are within the polygon itself by doing an intersection. (this is the most computationally expensive part, but it's mitigated since there are fewer tiles to compare and a single polygon should have fewer points than the unary union)
    4. Avoid generating tiles that have already been generated. (can be a quick Python set lookup) This is relevant in cases where two polygons share the same tiles.

Can make a PR for this too!
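A hedged pseudocode sketch of the steps above (not geowrangler's implementation; create_cell is assumed to build a square tile from its lower-left corner):

import math

def generate_tiles_per_polygon(gdf, cell_size, create_cell):
    seen = set()
    tiles = []
    # Explode multipolygons so we handle one polygon at a time.
    for poly in gdf.geometry.explode(index_parts=False):
        minx, miny, maxx, maxy = poly.bounds  # 1. cheap bounding box
        # 2. candidate tile indices inside the bbox, no shapely needed
        for xi in range(math.floor(minx / cell_size), math.ceil(maxx / cell_size)):
            for yi in range(math.floor(miny / cell_size), math.ceil(maxy / cell_size)):
                if (xi, yi) in seen:  # 4. skip tiles already generated
                    continue
                cell = create_cell(xi * cell_size, yi * cell_size)
                if cell.intersects(poly):  # 3. the only expensive shapely test
                    seen.add((xi, yi))
                    tiles.append(cell)
    return tiles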

Catch case where the aoi is outside the bounds of the grid generator

The following errors out if no polygons/cells are within the aoi:

for x_idx, x in enumerate(xrange):
    for y_idx, y in enumerate(yrange):
        polygons.append(
            {
                "x": x_idx + x_idx_offset,
                "y": y_idx + y_idx_offset,
                "geometry": self.create_cell(x, y),
            }
        )
dest = GeoDataFrame(polygons, geometry="geometry", crs=self.grid_projection)

Catch instances like this by returning an empty dataframe if polygons is empty:

    if polygons:
        dest = GeoDataFrame(polygons, geometry="geometry", crs=self.grid_projection)
        dest_reproject = dest.to_crs(self.gdf.crs)
        final = dest_reproject[dest_reproject.intersects(self.gdf.unary_union)]
        return final
    else:
        return GeoDataFrame({"x":[], "y":[], "geometry":[]}, geometry="geometry", crs=self.gdf.crs)

Fix issue with grid generator when mask returns all false but the aoi is within bounds

In these lines:

def get_range_subset(
    self, x_min: float, y_min: float, x_max: float, y_max: float, cell_size: float
) -> Tuple[float, List[float], float, List[float]]:
    """Returns a subset of grids from the original boundary based on the boundary and a grid size"""
    xrange = np.arange(self.x_min, self.x_max, cell_size)
    yrange = np.arange(self.y_min, self.y_max, cell_size)
    x_mask = (xrange >= x_min) & (xrange <= x_max)
    y_mask = (yrange >= y_min) & (yrange <= y_max)

there is a case where the resulting x_mask is all false: when the span between x_min and x_max, the bounds of the aoi, is less than the cell size. For example:

self.x_min is 12621582.219997052
self.x_max is 14243844.181000795
cell_size = 100
x_min = 13762392.958057601
x_max = 13762473.616812669

In this scenario, the following returns an empty array:

 xrange = np.arange(self.x_min, self.x_max, cell_size)
 np.nonzero(x_mask)

the solution is to add a buffer to the x_max

 x_mask = (xrange >= x_min) & (xrange <= x_max + cell_size)

Multiband support for Raster zonal stats

Right now, the create_raster_zonal_stats method of the raster zonal stats module only supports a single band per call. Being able to handle multiple bands would be a nice enhancement, minimizing multiple passes over the same raster file across the same zones. The underlying rasterstats module has some suggestions on how to handle this (which I'm documenting below so that it may serve as a future reference for an implementation).
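A hedged sketch of a per-band loop using rasterstats (band is a real rasterstats parameter; the file paths and band indices are illustrative):

from rasterstats import zonal_stats

# One zonal_stats pass per band; a multiband-aware implementation could
# avoid re-reading the zones and raster for every pass.
results_per_band = {
    band: zonal_stats("zones.geojson", "multiband.tif", band=band, stats=["mean"])
    for band in (1, 2, 3)
}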
