ungarj / mapchete


Tile-based geodata processing using rasterio & Fiona

License: MIT License

Python 99.19% HTML 0.81%

tile pyramid raster geoprocessing rasterio gis vector fiona earth-observation earth-sciences

mapchete's Introduction

Tile-based geodata processing.

Mapchete processes raster and vector geodata in digestible chunks.

Processing larger amounts of data requires chunking the input data into smaller tiles and processing them one by one. Python provides many useful packages for processing geodata, such as shapely or numpy. From within your process code you have access to the geodata in the form of NumPy arrays for raster data or GeoJSON-like feature dictionaries for vector data.

Internally, the processing job is split into tasks which can be processed in parallel using concurrent.futures, or assembled into task graphs that dask can process either on the local machine or on a cluster.
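The task-per-tile idea can be sketched with concurrent.futures alone. This is a minimal illustration and not mapchete's actual scheduler: process_tile, process_all and the (zoom, row, col) tuples are hypothetical names, and the per-tile work is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_tile(tile):
    """Hypothetical per-tile task: here it only sums the tile indices."""
    zoom, row, col = tile
    return zoom + row + col


def process_all(tiles, max_workers=4):
    """Run one task per tile in parallel and collect the results by tile."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_tile, tile): tile for tile in tiles}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the task,
            # so failures in a tile do not pass silently.
            results[futures[future]] = future.result()
    return results


# All 16 tiles of zoom level 2 in a square pyramid.
tiles = [(2, row, col) for row in range(4) for col in range(4)]
results = process_all(tiles)
```

A thread pool is used here only to keep the sketch short; CPU-bound work would more likely use a process pool.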

With the help of fiona and rasterio, Mapchete takes care of resampling and reprojecting geodata, applying your Python code to the tiles, and writing the output either into a single file or into a directory of files organized in a WMTS-like tile pyramid. Details on the tiling scheme and available map projections are outlined in the tiling documentation.

(standard Web Mercator pyramid used in the web)
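To make the pyramid layout concrete, the following sketch computes the bounds of a single tile. It assumes the common Web Mercator convention (a square extent in EPSG:3857, 2**zoom tiles per side, row 0 at the top); tile_bounds is a hypothetical helper, not a mapchete function.

```python
import math

# Half the side length of the Web Mercator (EPSG:3857) square in meters.
HALF_EXTENT = math.pi * 6378137.0


def tile_bounds(zoom, row, col):
    """Return (left, bottom, right, top) of a tile; row 0 is the northernmost row."""
    tiles_per_side = 2 ** zoom
    if not (0 <= row < tiles_per_side and 0 <= col < tiles_per_side):
        raise ValueError("tile index outside of pyramid")
    size = 2 * HALF_EXTENT / tiles_per_side
    left = -HALF_EXTENT + col * size
    top = HALF_EXTENT - row * size
    return (left, top - size, left + size, top)
```

Each zoom level halves the tile size, so one tile at zoom z covers exactly four tiles at zoom z + 1.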

Usage

You need a .mapchete file for the process configuration. The configuration is based on the YAML syntax.

process: my_python_process.py  # or a Python module path: mypythonpackage.myprocess
zoom_levels:
    min: 0
    max: 12
input:
    dem: /path/to/dem.tif
    land_polygons: /path/to/polygon/file.geojson
output:
    format: PNG_hillshade
    path: /output/path
pyramid:
    grid: mercator

# process specific parameters
resampling: cubic_spline

You also need either a .py file or a Python module path where you specify the process itself.

def execute(mp, resampling="nearest"):

    # Open elevation model.
    with mp.open("dem") as src:
        # Skip tile if there is no data available or read data into a NumPy array.
        if src.is_empty(1):
            return "empty"
        else:
            dem = src.read(1, resampling=resampling)

    # Create hillshade using a built-in hillshade function.
    hillshade = mp.hillshade(dem)

    # Clip with polygons from vector file and return result.
    with mp.open("land_polygons") as land_file:
        return mp.clip(hillshade, land_file.read())

You can then interactively inspect the process output directly on a map in a browser. First install the dependencies with pip install mapchete[serve], then start the serve tool and go to localhost:5000:

$ mapchete serve hillshade.mapchete --memory

The serve tool recognizes changes in your process configuration or in the process file. If you edit one of these, just refresh the browser and inspect the changes (note: use the --memory flag to make sure to reprocess each tile and turn off browser caching).

Once you are done with editing, batch process everything using the execute tool.

$ mapchete execute hillshade.mapchete

Documentation

There are many more options such as zoom-dependent process parameters, metatiling, tile buffers or interpolating from an existing output of a higher zoom level. For deeper insights, please go to the documentation.

Mapchete is used in many preprocessing steps for the EOX Maps layers:

Merge multiple DEMs into one global DEM.
Create a customized relief shade for the Terrain Layer.
Generalize landmasks & coastline from OSM for multiple zoom levels.
Extract cloudless pixels for Sentinel-2 cloudless.

Installation

via PyPI:

$ pip install mapchete

from source:

$ git clone git@github.com:ungarj/mapchete.git && cd mapchete
$ pip install .

To make sure Rasterio, Fiona and Shapely are properly built against your local GDAL and GEOS installations, don't install the binaries but build them on your system:

$ pip install --upgrade rasterio fiona shapely --no-binary :all:

To keep the core dependencies minimal if you install mapchete using pip, some features are only available if you manually install additional dependencies:

# for contour extraction:
$ pip install mapchete[contours]

# for dask processing:
$ pip install mapchete[dask]

# for S3 bucket reading and writing:
$ pip install mapchete[s3]

# for mapchete serve:
$ pip install mapchete[serve]

# for VRT generation:
$ pip install mapchete[vrt]

License

MIT License


mapchete's Issues

mapchete_serve multiprocessing

quick solution: get neighbor tiles and process parallel as it can be assumed, these tiles will be requested next

complex solution: multi-threaded on demand tile processing, including locks
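The quick solution could start from a helper that lists the tiles around a requested one. The sketch below assumes a square pyramid with 2**zoom tiles per side, with columns wrapping around the antimeridian; neighbor_tiles is a hypothetical name and not part of mapchete.

```python
def neighbor_tiles(zoom, row, col):
    """Return the up to eight tiles surrounding (zoom, row, col).

    Assumes a square pyramid with 2**zoom tiles per side. Columns wrap
    around the antimeridian, rows are cut off at the top and bottom.
    """
    tiles_per_side = 2 ** zoom
    neighbors = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue
            n_row = row + d_row
            if not 0 <= n_row < tiles_per_side:
                continue
            n_col = (col + d_col) % tiles_per_side
            candidate = (zoom, n_row, n_col)
            # Wrapping can map a neighbor onto the tile itself or onto a
            # neighbor already found on very low zoom levels.
            if candidate != (zoom, row, col) and candidate not in neighbors:
                neighbors.append(candidate)
    return neighbors
```

The returned tiles could then be handed to a worker pool while the requested tile is being served.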

zoom level filtering error on input files

input_files:
    DEM:
        zoom=1: file1.vrt
        zoom=2: file2.vrt
        zoom=3: file3.vrt

config parser returns file1.vrt for all zoom levels
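The expected behavior can be sketched as a small resolver that picks the value matching a given zoom level. This is an illustration only, not the parser used by mapchete: value_for_zoom is a hypothetical name, and only the zoom= form appears in this issue; the other comparison operators are an assumption added for illustration.

```python
def value_for_zoom(definition, zoom):
    """Resolve a zoom-dependent config value for a single zoom level.

    `definition` is either a plain value or a dict whose keys look like
    "zoom=3", "zoom<=5" or "zoom>7". Returns None if no key matches.
    """
    if not isinstance(definition, dict):
        return definition
    # Check two-character operators first so "<=" is not read as "<".
    operators = [
        ("<=", lambda z, ref: z <= ref),
        (">=", lambda z, ref: z >= ref),
        ("<", lambda z, ref: z < ref),
        (">", lambda z, ref: z > ref),
        ("=", lambda z, ref: z == ref),
    ]
    for key, value in definition.items():
        condition = key.replace(" ", "")
        if not condition.startswith("zoom"):
            raise ValueError(f"not a zoom condition: {key!r}")
        rest = condition[len("zoom"):]
        for symbol, matches in operators:
            if rest.startswith(symbol):
                if matches(zoom, int(rest[len(symbol):])):
                    return value
                break  # operator recognized but condition not met
        else:
            raise ValueError(f"unknown operator in {key!r}")
    return None


dem = {"zoom=1": "file1.vrt", "zoom=2": "file2.vrt", "zoom=3": "file3.vrt"}
```

With the configuration above, each zoom level should resolve to its own file instead of file1.vrt.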

implement .mapchete files as input files

read from tile pyramids defined in .mapchete file
process tiles if missing

write basic documentation

basic concept
command line usage
modules (tilematrix, tilematrix_io)

use geotiff.js for serve client

awesome geotiff.js can render GeoTIFFs on OpenLayers

baselevel interpolation fails when using pixelbuffer

process pixelbuffer setting causes the following error:

ValueError: could not broadcast input array from shape (3,256,256) into shape (3,276,276)
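The two shapes in the message are consistent with a 256 pixel tile and a 10 pixel buffer on every side (276 = 256 + 2 * 10); that reading is an inference from the numbers, not something stated in the issue. The sketch below only shows the shape bookkeeping with hypothetical helpers and plain lists; it does not reproduce the baselevel interpolation code.

```python
def buffered_shape(tile_size, pixelbuffer):
    """Height and width of a tile array including the buffer on every side."""
    side = tile_size + 2 * pixelbuffer
    return (side, side)


def unbuffered_window(tile_size, pixelbuffer):
    """Row and column slices selecting the tile without its buffer."""
    stop = pixelbuffer + tile_size
    return (slice(pixelbuffer, stop), slice(pixelbuffer, stop))


# 256 pixel tiles with a 10 pixel buffer give the 276 from the error message.
height, width = buffered_shape(256, 10)
buffered = [[0] * width for _ in range(height)]

# Crop the buffered rows and columns back to the plain tile.
rows, cols = unbuffered_window(256, 10)
core = [line[cols] for line in buffered[rows]]
```

With a three-band NumPy array the same slices would be applied as array[:, rows, cols], so that source and target shapes agree before assignment.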

automatically determine max zoom level

read raster pixel resolution and determine highest zoomlevel without oversampling
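One way to derive that zoom level is sketched below. It assumes a Web Mercator pyramid with 256 pixel tiles and a raster pixel size given in meters; the function name and the upper search limit are hypothetical.

```python
import math


def max_zoom_without_oversampling(pixel_size, tile_size=256, max_search_zoom=30):
    """Return the highest zoom whose pixels are not finer than the raster's.

    Assumes a Web Mercator pyramid, so the pixel size at zoom 0 is the
    full extent divided by the tile size and halves with every zoom level.
    """
    if pixel_size <= 0:
        raise ValueError("pixel_size must be positive")
    zoom_0_pixel_size = 2 * math.pi * 6378137.0 / tile_size
    zoom = 0
    # Step up as long as the next zoom level is still coarser than the raster.
    while zoom < max_search_zoom and zoom_0_pixel_size / 2 ** (zoom + 1) >= pixel_size:
        zoom += 1
    return zoom
```

For a raster with 30 m pixels this stops at the last zoom level whose pixel size is still at least 30 m.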

create read_vector_window()

read vector data and clip to tile (+buffer)

automatic zoom to output in web client upon startup

performance issue on larger VRTs

silent errors in user process

errors raised in user process do not propagate to the mapchete command line tool

rethink input_files parameter

input_files should rather be named input, as it could potentially include non-file based data (e.g. a PostGIS layer)
enable passing on metadata per input, e.g. besides path also the bounding box as it would speed up configuration parsing when having many inputs

data cut off on mercator grid

when there is just one metatile left on a zoom level, there is a data loss (or nodata stripe) on the eastern boundary of the map next to the antimeridian

expose all output formats to REST endpoint

not just PNG and PNG_hillshade, also GTiff and GeoJSON, even if standard client cannot handle the data (raise warning if used with mapchete serve)

maybe an additional metatiling setting is required: web_metatiling, which should behave like: process_metatiling >= output_metatiling >= web_metatiling
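The proposed ordering could be checked while parsing the configuration. In this sketch, validate_metatiling is a hypothetical helper, and the requirement that every value is a power of two is an assumption rather than something stated in the issue.

```python
def validate_metatiling(process_metatiling, output_metatiling, web_metatiling=1):
    """Check process_metatiling >= output_metatiling >= web_metatiling."""
    values = (process_metatiling, output_metatiling, web_metatiling)
    for value in values:
        # Assumption: metatiling values are positive powers of two.
        if value < 1 or value & (value - 1):
            raise ValueError(f"metatiling must be a power of two, got {value}")
    if not process_metatiling >= output_metatiling >= web_metatiling:
        raise ValueError(
            "expected process_metatiling >= output_metatiling >= web_metatiling"
        )
    return values
```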

use other contour extraction method

to get rid of the matplotlib dependency

possible approaches:

use rasterio.features
GDAL

examine if write function can be removed from user process

it could be more convenient to simply let the user return an array and mapchete handles the rest

add "mixed" (PNG/JPEG) as viewing output format

generalize vector

move generalize function to plugin

test data validity before write_raster

check if numpy array / tuple and convert if possible; return appropriate error if invalid

create write_vector_window()

clip to tile boundaries & write vector data

s2a: rgb creation

read s2a bands
create rgb png

add --debug flag for serve

enable flask's stacktrace propagation

fix metatiling bug

in mapchete_serve, metatiling is not correctly handled

examine read_vector_window for huge OSM coastline dataset

... to be used for e.g. clipping

merge_dem: rescaling

select primary DEM
select secondary DEMs if necessary
rescale
simple overlay

create basic installer

automatic installation of dependencies

write fails when vector schema has just one entry

schema needs more than one item otherwise mapchete file parsing will fail

try out celery as backend

examine using celery instead of multiprocessing

invalid geometries while reading and clipping vector

https://github.com/ungarj/mapchete/blob/pyramid_seed/mapchete/commons.py#L130

string assertion error on some machines

on a test machine, this line threw an error: https://github.com/ungarj/mapchete/blob/master/mapchete/io/raster.py#L132

A string was passed, so it's unclear what caused the error.

use rasterio's mask handling approach

https://github.com/mapbox/rasterio/blob/master/rasterio/_io.pyx#L571

Comparison with PyWPS

Hi folks, sorry for making a GH issue on this, but I couldn't find a good way to get in touch.

I watched the FOSS4G presentation on this tool, and I'm very interested. mapchete looks great for my use case, trying to break a never-ending PostGIS spatial join query into bite-sized chunks.

During the FOSS4G presentation, in the questions, a crowd member stated something along the lines of:

Why didn't you use PyWPS?... It does everything
(emphasis added)

I'm currently verifying the truth behind that comment, and I'd love to hear the mapchete maintainers give their thoughts on the crossover between the 2 libraries, and what PyWPS is missing that mapchete provides.

I'm yet to start experimenting with either package, but I've read the docs and some source code for both. My initial (possibly incorrect) take is that:

mapchete is focused on running custom geoprocessing jobs, while PyWPS is for creating a standards-compliant web server to wrap these geoprocessing jobs
mapchete supports spatial tiling for processing jobs, PyWPS does not (this is a big one, arguably the main appeal of mapchete)
mapchete is analogous to a geospatially-focused MapReduce library, while PyWPS is an open-source version of an ArcGIS Server Geoprocessing service
mapchete uses modern Pythonic libraries such as Fiona and rasterio to assist with code-driven processing, while PyWPS requires the use of OGC-standard XML documents for running processes

It's probably evident that I'm excited about mapchete, and not excited about PyWPS. After 2 hours of research, including watching the FOSS4G presentation, I've concluded that the crowd comment was misleading, and I should ignore it and move forward with mapchete.

However, 2 hours isn't very much research, and I'm quite happy to be mistaken in my assumptions.

What are the main overlaps and differentiators between mapchete and PyWPS?

To be clear, my use case is for speeding up batch geoprocessing jobs for internal company use. I'm interested in the tiling mechanisms mostly. I don't need to create an OGC-compliant web server for XML-based geoprocessing - I already have a simple JSON API which allows me to trigger these jobs.

hillshade raises warnings & is probably slow

try to:

find proper GDAL bindings, or
implement in Cython

have a look at optimized algorithm in GDAL

python 3 support

error when empty feature list in clip_array_with_vector

merge_dem: YAML config

define input parameters
command line params or yaml

add .SAFE file support for Sentinel-2

merge_dem: clip to coastline

create hillshade function

instead of using GDAL system calls.

configuration preparation inefficient

When reading a file group, according to logging, the configuration is prepared each time. This seems to be inefficient.

return errors to web view

add input data groups

add option to combine input data into iterable groups

add GeoPackage support

segmentize input file bounding box before reprojecting

to get rid of unnecessary tiles at corners
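Segmentizing means inserting extra vertices along the bounding box edges, so that each edge can follow its curved shape after reprojection instead of being stretched between four reprojected corners. A minimal pure-Python sketch, with segmentize_bounds as a hypothetical helper:

```python
import math


def segmentize_bounds(left, bottom, right, top, max_segment_length):
    """Return a closed ring of (x, y) tuples along the bounding box.

    Every edge is split into equal parts no longer than max_segment_length.
    """
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be positive")
    corners = [(left, bottom), (right, bottom), (right, top), (left, top)]
    ring = []
    for index, (x0, y0) in enumerate(corners):
        x1, y1 = corners[(index + 1) % 4]
        length = math.hypot(x1 - x0, y1 - y0)
        parts = max(1, math.ceil(length / max_segment_length))
        # Add the start vertex and all intermediate vertices of this edge;
        # the end vertex is added as the start of the next edge.
        for step in range(parts):
            fraction = step / parts
            ring.append((x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0)))
    ring.append(ring[0])  # close the ring
    return ring
```

Each vertex of the ring would then be reprojected individually and the polygon rebuilt from the result.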

resolve reprojection issue

Reprojection of EUDEM (EPSG:3035) to 4326 didn't work.

rasterio transform_bounds() returns wrong coordinates (maybe I misused it)

update client to ol3 or leaflet

--bounds parameter shall process full tiles

Custom bounding box currently gets intersected with process area, potentially clipping output within a tile. The bounding box however should restrict the potential work tiles to tiles intersecting with the bounding box, not clipping the output.
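The intended behavior can be sketched as follows: the bounding box only selects tiles, and every selected tile keeps its full extent. The example assumes a simple grid in degrees with its origin at the top left corner and bounds that lie inside the grid; both function names are hypothetical.

```python
import math


def tiles_from_bounds(bounds, tile_size, grid_left=-180.0, grid_top=90.0):
    """Return (row, col) of all tiles intersecting bounds (left, bottom, right, top)."""
    left, bottom, right, top = bounds
    col_min = math.floor((left - grid_left) / tile_size)
    col_max = math.ceil((right - grid_left) / tile_size) - 1
    row_min = math.floor((grid_top - top) / tile_size)
    row_max = math.ceil((grid_top - bottom) / tile_size) - 1
    return [
        (row, col)
        for row in range(row_min, row_max + 1)
        for col in range(col_min, col_max + 1)
    ]


def tile_extent(row, col, tile_size, grid_left=-180.0, grid_top=90.0):
    """Return the full (left, bottom, right, top) extent of a tile."""
    left = grid_left + col * tile_size
    top = grid_top - row * tile_size
    return (left, top - tile_size, left + tile_size, top)


# The bounds touch two 45 degree tiles; both are processed completely.
selected = tiles_from_bounds((1, 1, 50, 20), tile_size=45)
extents = [tile_extent(row, col, 45) for row, col in selected]
```

The combined extent of the selected tiles is larger than the requested bounds, which is the point: output is never clipped inside a tile.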

generalize: invalid geometries

resolve empty loop issue

shapely.geos warnings

shapely raises a lot of warnings (self-intersection, ...)

try:

lower log level when not using --debug flag or
clean up geometries earlier
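The first option could look like the sketch below. It assumes that the warnings are emitted through a standard library logger named "shapely.geos", as the issue title suggests; configure_logging is a hypothetical helper.

```python
import logging


def configure_logging(debug=False):
    """Show shapely's GEOS messages only when debugging.

    Assumption: the warnings come from the logger named "shapely.geos".
    """
    level = logging.DEBUG if debug else logging.ERROR
    logging.getLogger("shapely.geos").setLevel(level)


configure_logging(debug=False)
```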

add handling driver plugins

Handling input and output drivers has become tedious as driver-specific functions (create directories/subdirectories for file-based output, handling DB connections for PostGIS, file extensions for .tif, .SAFE.zip etc.) are all over the code and complicate code maintenance. Furthermore, adding new drivers (e.g. GeoPackage, SQLite caches, etc.) would increase the clutter even more.

Therefore, create a generic class/API with generic functions (read, write, open, is_empty, ...); every driver derives from this class but provides its own implementations. This cleans up the code and makes it possible to create a plug-in system, where other drivers could easily be added to the main package. It also has the effect that many dependencies move from the main package to the plugins (e.g. GeoAlchemy2 from mapchete to mapchete-postgis, bloscpack to mapchete-numpy, sqlite to mapchete-sqlite, lxml to mapchete-safe etc.)
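Such a driver API might be sketched with an abstract base class and a small registry. All names below (Driver, MemoryDriver, DRIVERS, register_driver) are hypothetical and only illustrate the proposal; the sketch covers read, write and is_empty and leaves out open.

```python
from abc import ABC, abstractmethod

# Registry mapping driver names to driver classes; plugins would add to it.
DRIVERS = {}


def register_driver(cls):
    """Class decorator adding a driver to the registry under its name."""
    DRIVERS[cls.name] = cls
    return cls


class Driver(ABC):
    """Generic interface every driver has to implement."""

    name = None

    @abstractmethod
    def read(self, tile):
        """Return the data stored for a tile, or None if there is none."""

    @abstractmethod
    def write(self, tile, data):
        """Store data for a tile."""

    def is_empty(self, tile):
        """Default implementation based on read(); drivers may override it."""
        return self.read(tile) is None


@register_driver
class MemoryDriver(Driver):
    """Toy driver keeping tiles in a dictionary."""

    name = "memory"

    def __init__(self):
        self._tiles = {}

    def read(self, tile):
        return self._tiles.get(tile)

    def write(self, tile, data):
        self._tiles[tile] = data
```

The core package would then look drivers up by name, for example DRIVERS["memory"](), without importing any driver-specific dependency itself.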
