This project forked from geopython/stetl

Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.

Home Page: https://www.stetl.org

License: GNU General Public License v3.0

Stetl - Streaming ETL

Stetl, streaming ETL, pronounced "staedl", is a lightweight ETL-framework for geospatial data conversion.

Notice: the Stetl GitHub repository is now hosted at the GeoPython GitHub organization.

License

Stetl is released under a GNU GPL v3 license (see LICENSE.txt).

Documentation

The Stetl website and documentation can be found via http://stetl.org. For a quick overview read the 5-minute Stetl-introduction, or a more detailed presentation. Stetl was presented at several events like the FOSS4G 2013 in Nottingham and GeoPython 2016.

Concepts

Stetl glues together existing parsing and transformation tools like GDAL/OGR, Jinja2 and XSLT with custom Python code. It is optimized for speed by relying on native libraries like libxml2 and libxslt (via Python lxml).

A configuration file, in Python config .ini format, specifies a chained sequence of transformation steps: typically an Input connected to one or more Filters, and finally to an Output. At runtime, this sequence is instantiated and run as a linked series of Python objects. These objects are symbolically specified (by their module/class name) and parameterized in the config file. Via the stetl -c <config file> command, the transformation is executed.
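To make the idea of a symbolically specified, linked chain concrete, here is a minimal sketch in plain Python. It is not Stetl's implementation and does not import Stetl. The `[etl]` section, the `chains` key, the `|` separator, and the classes `WordInput`, `UpperFilter` and `ListOutput` are all assumptions made up for illustration; consult the Stetl documentation for the real config syntax and component classes.

```python
import configparser

# Hypothetical .ini config: one chain of three named components.
CONFIG = """
[etl]
chains = read_words|to_upper|collect

[read_words]
class = WordInput
text = alpha beta gamma

[to_upper]
class = UpperFilter

[collect]
class = ListOutput
"""


class Component:
    """Base class: holds its config options and a link to the next component."""

    def __init__(self, options):
        self.options = options
        self.next = None

    def process(self, packet):
        # Handle the packet, then hand the result to the next link (if any).
        result = self.invoke(packet)
        return self.next.process(result) if self.next else result


class WordInput(Component):
    def invoke(self, packet):
        return self.options["text"].split()


class UpperFilter(Component):
    def invoke(self, packet):
        return [word.upper() for word in packet]


class ListOutput(Component):
    def invoke(self, packet):
        return list(packet)


# Maps the class names used in the config to actual Python classes.
REGISTRY = {cls.__name__: cls for cls in (WordInput, UpperFilter, ListOutput)}


def build_chain(config_text):
    """Instantiate the components named in the config and link them in order."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = parser["etl"]["chains"].split("|")
    components = [REGISTRY[parser[name]["class"]](dict(parser[name])) for name in names]
    for current, following in zip(components, components[1:]):
        current.next = following
    return components[0]


def run(config_text):
    return build_chain(config_text).process(None)


if __name__ == "__main__":
    print(run(CONFIG))
```

The point of the design is that the chain's shape lives entirely in the config text: reordering or swapping steps means editing the `chains` line and the sections, not the Python code.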

Stetl has been shown to handle tens of millions of GML objects without memory issues. This is achieved through a technique called "streaming and splitting". For example, using the OgrPostgisInput module, a GML stream can be generated from the database. A component called the GmlSplitter can split this stream into manageable chunks (for example, 20000 features each) and feed them into the ETL chain.
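The general idea behind splitting a stream into bounded chunks can be sketched with a plain Python generator. This is an illustration of the technique only, not the GmlSplitter code; `split_stream` and `feature_stream` are hypothetical names, and the features are stand-in strings rather than real GML.

```python
from itertools import islice


def split_stream(features, chunk_size):
    """Yield lists of at most chunk_size items from any iterable, lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(features)
    while True:
        # Only chunk_size items are pulled from the source at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def feature_stream(count):
    """Stand-in for a database-backed source: produces features one by one."""
    for index in range(count):
        yield f"<feature id='{index}'/>"


if __name__ == "__main__":
    for chunk in split_stream(feature_stream(45), 20):
        # Each chunk would be handed to the next step of the chain here.
        print(len(chunk))
```

Because both the source and the splitter are generators, memory use is bounded by the chunk size rather than by the total number of features.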

Use Cases

Stetl has been found particularly useful for complex GML-related ETL cases, like those found within EU INSPIRE Data Harmonization and the transformation of GML/XML-based national geo-datasets to, for example, PostGIS.

Most of the data conversions within the Dutch NLExtract Project apply Stetl.

Stetl also proved to be very effective in IoT-related transformations involving the SensorWeb/SOS.

Examples

Browse all examples under the examples dir. It is best to start with the basic examples.

Installation

Stetl can be installed from PyPI with pip install stetl and, more recently, is also available as a Stetl Docker image. More on installation can be found in the documentation.

Contributing

Anyone and everyone is welcome to contribute. Please take a moment to review the guidelines for contributing.

Origins

Stetl originated in the INSPIRE-FOSS project (2009-2013, now archived). Since then, its use has widened to include transforming Dutch GML-based open datasets such as IMGEO/BGT (Large Scale Topography) and IMKAD/BRK (Cadastral Data), as well as sensor data transformation and calibration.

Finally

The word "stetl" is also an alternative spelling of "shtetl": http://en.wikipedia.org/wiki/Stetl : "...Material things were neither disdained nor extremely praised in the shtetl. Learning and education were the ultimate measures of worth in the eyes of the community, while money was secondary to status..."

Contributors

borrob, ewsterrenburg, fsteggink, justb4, lamby, reinout, sebastic, thijsbrentjens, vnuhaan
