
ungarj / mapchete


Tile-based geodata processing using rasterio & Fiona

License: MIT License

Python 99.19% HTML 0.81%
tile pyramid raster geoprocessing rasterio gis vector fiona earth-observation earth-sciences

mapchete's Introduction

Tile-based geodata processing.


Mapchete processes raster and vector geodata in digestible chunks.

Processing larger amounts of data requires chunking the input data into smaller tiles and processing them one by one. Python provides many useful packages for processing geodata, such as shapely or numpy. From within your process code you have access to the geodata as NumPy arrays for raster data or as GeoJSON-like feature dictionaries for vector data.

Internally, the processing job is split into tasks which can be processed in parallel, either using concurrent.futures or by building task graphs and using dask to process them intelligently on the local machine or on a cluster.
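As a rough illustration of the chunk-then-parallelize idea, the sketch below splits a bounding box into square tiles and hands each tile to a worker via concurrent.futures. It is not mapchete's internal task scheduler; `make_tiles`, `process_tile` and `run` are hypothetical names, and the per-tile "processing" is just an area computation standing in for real raster or vector work.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_tiles(bounds, tile_size):
    """Yield (row, col, tile_bounds) for square tiles covering `bounds`.

    `bounds` is (left, bottom, right, top); rows count downward from the top.
    Edge tiles are trimmed so that no tile extends past `bounds`.
    """
    left, bottom, right, top = bounds
    n_rows = math.ceil((top - bottom) / tile_size)
    n_cols = math.ceil((right - left) / tile_size)
    for row in range(n_rows):
        for col in range(n_cols):
            tile_left = left + col * tile_size
            tile_top = top - row * tile_size
            yield row, col, (
                tile_left,
                max(tile_top - tile_size, bottom),
                min(tile_left + tile_size, right),
                tile_top,
            )


def process_tile(tile):
    """Placeholder per-tile job: return the tile area keyed by (row, col)."""
    row, col, (left, bottom, right, top) = tile
    return (row, col), (right - left) * (top - bottom)


def run(bounds, tile_size, max_workers=4):
    """Process all tiles in parallel and collect the results into a dict."""
    tiles = list(make_tiles(bounds, tile_size))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(executor.map(process_tile, tiles))
```

Threads keep the example simple; for CPU-bound pure-Python work, `ProcessPoolExecutor` can be swapped in, as long as the worker function is defined at module level so it can be pickled.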

With the help of fiona and rasterio, Mapchete takes care of resampling and reprojecting geodata, applying your Python code to the tiles, and writing the output either into a single file or into a directory of files organized in a WMTS-like tile pyramid. Details on the tiling scheme and available map projections are outlined in the tiling documentation.

(Figure: the standard Web Mercator tile pyramid used on the web)
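To make the pyramid idea concrete, the sketch below uses the widely used XYZ ("slippy map") Web Mercator formulas, where zoom level z has 2^z × 2^z tiles with row 0 at the top. It illustrates the general scheme only and is not mapchete's own grid code; mapchete's grid definitions are described in the tiling documentation. `lonlat_to_tile` is a hypothetical helper.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return (row, col) of the Web Mercator XYZ tile containing lon/lat.

    Uses the common slippy-map formulas: 2**zoom tiles per axis, columns
    increasing eastward from -180 degrees, rows increasing southward from
    the northern edge (about 85.0511 degrees latitude).
    """
    n = 2 ** zoom
    col = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    row = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the outer edges (e.g. lon == 180) into the valid range.
    return min(max(row, 0), n - 1), min(max(col, 0), n - 1)
```

For example, `lonlat_to_tile(0, 0, 1)` returns `(1, 1)`: at zoom 1 the point where the equator meets the prime meridian falls on the corner shared by all four tiles and is assigned to the south-east one.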

Usage

You need a .mapchete file for the process configuration, which uses YAML syntax.

process: my_python_process.py  # or a Python module path: mypythonpackage.myprocess
zoom_levels:
    min: 0
    max: 12
input:
    dem: /path/to/dem.tif
    land_polygons: /path/to/polygon/file.geojson
output:
    format: PNG_hillshade
    path: /output/path
pyramid:
    grid: mercator

# process specific parameters
resampling: cubic_spline
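The zoom_levels range largely determines how much work a run involves. Assuming a pyramid that starts with a single tile at zoom 0 and doubles along each axis per zoom level (as in the common Web Mercator scheme), and ignoring metatiling, the tile count can be estimated as below. The dict mirrors the YAML zoom_levels block; `tiles_per_zoom` and `total_tiles` are hypothetical helpers, and the `tiles_at_zoom0` default is an assumption to adjust for other grids.

```python
def tiles_per_zoom(zoom, tiles_at_zoom0=(1, 1)):
    """Number of tiles at `zoom` if each axis doubles per zoom level."""
    rows0, cols0 = tiles_at_zoom0
    return (rows0 * 2 ** zoom) * (cols0 * 2 ** zoom)


def total_tiles(zoom_levels, tiles_at_zoom0=(1, 1)):
    """Sum tile counts over the inclusive zoom range of a zoom_levels dict."""
    return sum(
        tiles_per_zoom(zoom, tiles_at_zoom0)
        for zoom in range(zoom_levels["min"], zoom_levels["max"] + 1)
    )


# Same range as the example configuration above.
zoom_levels = {"min": 0, "max": 12}
print(total_tiles(zoom_levels))  # 22369621
```

Under these assumptions zoom 12 alone accounts for 4^12 = 16,777,216 of the roughly 22.4 million tiles, so the deepest zoom level dominates the tile count.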

You also need either a .py file or a Python module path in which you define the process itself.

def execute(mp, resampling="nearest"):

    # Open elevation model.
    with mp.open("dem") as src:
        # Skip tile if there is no data available or read data into a NumPy array.
        if src.is_empty(1):
            return "empty"
        else:
            dem = src.read(1, resampling=resampling)

    # Create hillshade using a built-in hillshade function.
    hillshade = mp.hillshade(dem)

    # Clip with polygons from vector file and return result.
    with mp.open("land_polygons") as land_file:
        return mp.clip(hillshade, land_file.read())
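Because the process in this example is an ordinary Python function that only talks to the `mp` object, its control flow can be exercised without mapchete by passing in a stand-in. In the sketch below, `FakeInput` and `FakeMP` are hypothetical stubs written for illustration, not mapchete classes, and their `hillshade` and `clip` methods are dummies that only mimic the call signatures used above.

```python
class FakeInput:
    """Hypothetical stand-in for an opened mapchete input (not a real class)."""

    def __init__(self, data):
        self.data = data
        self.last_resampling = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # propagate any exception raised inside the block

    def is_empty(self, band=None):
        return self.data is None

    def read(self, band=None, resampling="nearest"):
        self.last_resampling = resampling  # remember what the process asked for
        return self.data


class FakeMP:
    """Hypothetical stand-in for the `mp` object passed to execute()."""

    def __init__(self, inputs):
        self.inputs = {name: FakeInput(data) for name, data in inputs.items()}

    def open(self, name):
        return self.inputs[name]

    def hillshade(self, dem):
        # Dummy "hillshade": just doubles every value.
        return [[value * 2 for value in row] for row in dem]

    def clip(self, array, features):
        # Dummy "clip": tags the array with the number of clip features.
        return {"array": array, "n_features": len(features)}


def execute(mp, resampling="nearest"):
    # Same control flow as the process example above.
    with mp.open("dem") as src:
        if src.is_empty(1):
            return "empty"
        dem = src.read(1, resampling=resampling)
    hillshade = mp.hillshade(dem)
    with mp.open("land_polygons") as land_file:
        return mp.clip(hillshade, land_file.read())
```

With these stubs, a tile without DEM data returns `"empty"`, and a tile with data runs through the hillshade and clip steps using whatever `resampling` value was passed in.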

You can then inspect the process output interactively on a map in a browser. First install the required dependencies with pip install mapchete[serve], then start the server and go to localhost:5000:

$ mapchete serve hillshade.mapchete --memory

The serve tool recognizes changes in your process configuration or in the process file. If you edit one of these, just refresh the browser and inspect the changes (note: use the --memory flag to make sure each tile is reprocessed, and turn off browser caching).

Once you are done with editing, batch process everything using the execute tool.

$ mapchete execute hillshade.mapchete

Documentation

There are many more options such as zoom-dependent process parameters, metatiling, tile buffers or interpolating from an existing output of a higher zoom level. For deeper insights, please go to the documentation.
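Metatiling and tile buffers are mostly index and bounds arithmetic. The sketch below assumes that a metatiling value of n groups n × n base tiles into one processing unit and that a pixel buffer extends a tile's bounds by that many pixels on each side; the function names are hypothetical and edge handling at the grid boundaries is omitted.

```python
def metatile_of(row, col, metatiling):
    """Return the (row, col) index of the metatile containing a base tile."""
    return row // metatiling, col // metatiling


def tiles_in_metatile(meta_row, meta_col, metatiling):
    """List the base tiles (row, col) covered by one metatile."""
    return [
        (meta_row * metatiling + r, meta_col * metatiling + c)
        for r in range(metatiling)
        for c in range(metatiling)
    ]


def buffered_bounds(bounds, pixel_size, pixelbuffer):
    """Grow (left, bottom, right, top) by `pixelbuffer` pixels on every side."""
    left, bottom, right, top = bounds
    margin = pixel_size * pixelbuffer
    return left - margin, bottom - margin, right + margin, top + margin
```

A buffered tile typically lets operations that look at neighboring pixels, such as a hillshade, see data across the tile edge, after which the result can be cropped back to the unbuffered bounds.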

Mapchete is used in many preprocessing steps for the EOX Maps layers:

  • Merge multiple DEMs into one global DEM.
  • Create a customized relief shade for the Terrain Layer.
  • Generalize landmasks & coastline from OSM for multiple zoom levels.
  • Extract cloudless pixels for Sentinel-2 cloudless.

Installation

via PyPI:

$ pip install mapchete

from source:

$ git clone git@github.com:ungarj/mapchete.git && cd mapchete
$ pip install .

To make sure Rasterio, Fiona and Shapely are properly built against your local GDAL and GEOS installations, don't install the prebuilt binary wheels; build the packages on your system instead:

$ pip install --upgrade rasterio fiona shapely --no-binary :all:

To keep the core dependencies minimal, some features are only available if you install additional dependencies via pip extras:

# for contour extraction:
$ pip install mapchete[contours]

# for dask processing:
$ pip install mapchete[dask]

# for S3 bucket reading and writing:
$ pip install mapchete[s3]

# for mapchete serve:
$ pip install mapchete[serve]

# for VRT generation:
$ pip install mapchete[vrt]

License

MIT License

Copyright (c) 2015 - 2022 EOX IT Services

mapchete's People

Contributors

dependabot[bot], geowill, scartography, sevcikp, ungarj


mapchete's Issues

mapchete_serve multiprocessing

quick solution: get the neighbor tiles and process them in parallel, as it can be assumed that these tiles will be requested next

complex solution: multi-threaded, on-demand tile processing, including locks
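The "quick solution" boils down to computing a tile's neighbors and submitting them ahead of time. The sketch below is a hypothetical helper, not mapchete code: it wraps columns across the antimeridian (an assumption that suits global grids), drops rows beyond the grid edges, and prefetches the neighbors with concurrent.futures.

```python
from concurrent.futures import ThreadPoolExecutor


def neighbors(row, col, n_rows, n_cols, wrap_columns=True):
    """Return the sorted (row, col) indices of the up to 8 neighboring tiles."""
    result = set()
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the tile itself
            r, c = row + d_row, col + d_col
            if not 0 <= r < n_rows:
                continue  # no tiles beyond the top or bottom of the grid
            if wrap_columns:
                c %= n_cols  # wrap across the antimeridian
            elif not 0 <= c < n_cols:
                continue
            result.add((r, c))
    result.discard((row, col))  # tiny grids can wrap back onto the tile itself
    return sorted(result)


def prefetch(executor, process, row, col, n_rows, n_cols):
    """Submit all neighbors of (row, col) and return {tile: future}."""
    return {
        tile: executor.submit(process, *tile)
        for tile in neighbors(row, col, n_rows, n_cols)
    }
```

On a request for one tile, a server could call `prefetch` and keep the futures in a cache, so that the likely next requests are already computed or in progress.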

rethink input_files parameter

  • input_files should rather be named input, as it could potentially include non-file-based data (e.g. a PostGIS layer)
  • enable passing metadata per input, e.g. the bounding box besides the path, as this would speed up configuration parsing when there are many inputs

data cut off on mercator grid

When there is just one metatile left on a zoom level, data is lost (or a nodata stripe appears) along the eastern boundary of the map next to the antimeridian.

expose all output formats to REST endpoint

Not just PNG and PNG_hillshade, but also GTiff and GeoJSON, even if the standard client cannot handle the data (raise a warning if used with mapchete serve).

An additional metatiling setting may be required, web_metatiling, which should satisfy process_metatiling >= output_metatiling >= web_metatiling.
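The proposed ordering rule is easy to check when a configuration is parsed. In the sketch below, web_metatiling is the hypothetical setting from this issue, and the extra requirement that each value divides the next larger one is an added assumption so that a larger metatile splits cleanly into smaller ones.

```python
def validate_metatiling(process_metatiling, output_metatiling, web_metatiling):
    """Raise ValueError unless process >= output >= web metatiling.

    Assumption beyond the issue text: each value must also divide the next
    larger one, so one metatile splits evenly into smaller metatiles.
    """
    values = {
        "process_metatiling": process_metatiling,
        "output_metatiling": output_metatiling,
        "web_metatiling": web_metatiling,
    }
    for name, value in values.items():
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if not process_metatiling >= output_metatiling >= web_metatiling:
        raise ValueError(
            "expected process_metatiling >= output_metatiling >= web_metatiling, "
            f"got {process_metatiling}, {output_metatiling}, {web_metatiling}"
        )
    if process_metatiling % output_metatiling or output_metatiling % web_metatiling:
        raise ValueError("each metatiling value must divide the next larger one")
```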

Comparison with PyWPS

Hi folks, sorry for making a GH issue on this, but I couldn't find a good way to get in touch.

I watched the FOSS4G presentation on this tool, and I'm very interested. mapchete looks great for my use case, trying to break a never-ending PostGIS spatial join query into bite-sized chunks.

During the FOSS4G presentation, in the questions, a crowd member stated something along the lines of:

Why didn't you use PyWPS?... It does everything
(emphasis added)

I'm currently verifying the truth behind that comment, and I'd love to hear the mapchete maintainers give their thoughts on the crossover between the 2 libraries, and what PyWPS is missing that mapchete provides.

I'm yet to start experimenting with either package, but I've read the docs and some source code for both. My initial (possibly incorrect) take is that:

  • mapchete is focused on running custom geoprocessing jobs, while PyWPS is for creating a standards-compliant web server to wrap these geoprocessing jobs
  • mapchete supports spatial tiling for processing jobs, PyWPS does not (this is a big one, arguably the main appeal of mapchete)
  • mapchete is analogous to a geospatially-focused MapReduce library, while PyWPS is an open-source version of an ArcGIS Server Geoprocessing service
  • mapchete uses modern Pythonic libraries such as Fiona and rasterio to assist with code-driven processing, while PyWPS requires the use of OGC-standard XML documents for running processes

It's probably evident that I'm excited about mapchete, and not excited about PyWPS. After 2 hours of research, including watching the FOSS4G presentation, I've concluded that the crowd comment was misleading, and I should ignore it and move forward with mapchete.

However, 2 hours isn't very much research, and I'm quite happy to be mistaken in my assumptions.

What are the main overlaps and differentiators between mapchete and PyWPS?

To be clear, my use case is for speeding up batch geoprocessing jobs for internal company use. I'm interested in the tiling mechanisms mostly. I don't need to create an OGC-compliant web server for XML-based geoprocessing - I already have a simple JSON API which allows me to trigger these jobs.

resolve reprojection issue

Reprojection of EUDEM (EPSG:3035) to EPSG:4326 didn't work.

rasterio transform_bounds() returns wrong coordinates (maybe I misused it)

--bounds parameter shall process full tiles

The custom bounding box is currently intersected with the process area, which can clip the output within a tile. Instead, the bounding box should only restrict the work tiles to those intersecting it, rather than clip the output.
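The proposed behavior amounts to using the bounding box only to select tiles, then processing each selected tile over its full extent. The sketch below assumes a simple regular grid with its origin at the top-left corner of `grid_bounds` and square tiles of `tile_size` map units; `tiles_intersecting` is a hypothetical helper, and tiles that merely touch the bounding box edge are not selected.

```python
import math


def tiles_intersecting(bbox, grid_bounds, tile_size):
    """Return [((row, col), full_tile_bounds)] for tiles intersecting bbox.

    The bbox only selects tiles; the returned bounds are the complete tile
    extents, so the output is never clipped to the bbox itself.
    """
    grid_left, grid_bottom, grid_right, grid_top = grid_bounds
    # Limit the bbox to the grid extent, for index computation only.
    left = max(bbox[0], grid_left)
    bottom = max(bbox[1], grid_bottom)
    right = min(bbox[2], grid_right)
    top = min(bbox[3], grid_top)
    if left >= right or bottom >= top:
        return []  # bbox lies outside the grid
    col_min = int((left - grid_left) // tile_size)
    col_max = math.ceil((right - grid_left) / tile_size) - 1
    row_min = int((grid_top - top) // tile_size)
    row_max = math.ceil((grid_top - bottom) / tile_size) - 1
    tiles = []
    for row in range(row_min, row_max + 1):
        for col in range(col_min, col_max + 1):
            tile_left = grid_left + col * tile_size
            tile_top = grid_top - row * tile_size
            tiles.append(
                ((row, col), (tile_left, tile_top - tile_size, tile_left + tile_size, tile_top))
            )
    return tiles
```

For a bbox of (15, 15, 25, 18) on a 100 × 100 grid with 10-unit tiles, this selects two tiles and returns their full bounds (10, 10, 20, 20) and (20, 10, 30, 20), even though the bbox covers only part of each.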

shapely.geos warnings

shapely raises a lot of warnings (self-intersection, ...)

try:

  • lower the log level when the --debug flag is not used, or
  • clean up geometries earlier
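The first option can be sketched with the standard logging module. The logger name "shapely.geos" is an assumption about where shapely (1.x) emits these GEOS messages and may differ between versions; the second option, cleaning geometries earlier, is not shown.

```python
import logging


def configure_logging(debug=False):
    """Set up logging so noisy GEOS warnings only appear with --debug.

    Assumption: shapely reports GEOS messages through the "shapely.geos"
    logger; adjust the name if your shapely version uses another one.
    """
    logging.basicConfig(level=logging.DEBUG if debug else logging.INFO)
    geos_logger = logging.getLogger("shapely.geos")
    # Without --debug, only errors from this logger get through.
    geos_logger.setLevel(logging.DEBUG if debug else logging.ERROR)
    return geos_logger
```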

add handling driver plugins

Handling input and output drivers has become tedious, as driver-specific functions (creating directories/subdirectories for file-based output, handling DB connections for PostGIS, file extensions for .tif, .SAFE.zip etc.) are scattered throughout the code and complicate maintenance. Furthermore, adding new drivers (e.g. GeoPackage, SQLite caches, etc.) would increase the clutter even more.

Therefore, create a generic class/API with generic functions (read, write, open, is_empty, ...) which every driver subclasses, providing its own implementations. This cleans up the code and makes a plug-in system possible, where other drivers can easily be plugged into the main package. It also moves many dependencies from the main package to the plugins (e.g. GeoAlchemy2 from mapchete to mapchete-postgis, bloscpack to mapchete-numpy, sqlite to mapchete-sqlite, lxml to mapchete-safe etc.)
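A minimal sketch of such a generic driver API, using an abstract base class and a name-based registry, is shown below. `Driver`, `register_driver`, `load_driver` and the in-memory example driver are hypothetical and do not reflect mapchete's actual driver interface; a real plug-in system would additionally need a discovery mechanism, for example Python package entry points, so that separately installed packages can register themselves.

```python
from abc import ABC, abstractmethod

# Registry mapping driver names to driver classes.
DRIVERS = {}


def register_driver(name):
    """Class decorator that registers a driver under `name`."""
    def decorator(cls):
        DRIVERS[name] = cls
        return cls
    return decorator


class Driver(ABC):
    """Generic driver API; every concrete driver implements these methods."""

    def __init__(self, path):
        self.path = path

    @abstractmethod
    def open(self, tile):
        """Return an object giving access to the data of `tile`."""

    @abstractmethod
    def is_empty(self, tile):
        """Return True if there is no data for `tile`."""

    @abstractmethod
    def read(self, tile):
        """Return the data for `tile`."""

    @abstractmethod
    def write(self, tile, data):
        """Store `data` for `tile`."""


@register_driver("memory")
class MemoryDriver(Driver):
    """Toy driver that keeps tile data in a dict (for illustration only)."""

    def __init__(self, path, data=None):
        super().__init__(path)
        self.data = dict(data or {})

    def open(self, tile):
        return self

    def is_empty(self, tile):
        return tile not in self.data

    def read(self, tile):
        return self.data.get(tile)

    def write(self, tile, data):
        self.data[tile] = data


def load_driver(name, *args, **kwargs):
    """Instantiate the driver registered under `name`."""
    try:
        driver_cls = DRIVERS[name]
    except KeyError:
        raise ValueError(f"no driver registered under {name!r}") from None
    return driver_cls(*args, **kwargs)
```

Core code would then call only `load_driver(...)` and the four generic methods, so driver-specific details such as directory creation or database connections stay inside each driver class.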
