GithubHelp home page GithubHelp logo

imclab / loader Goto Github PK

View Code? Open in Web Editor NEW

This project forked from astuntechnology/loader

0.0 2.0 0.0 447 KB

Simple GML & KML loader written in Python using OGR

License: MIT License

Assembly 7.94% PLpgSQL 2.98% Python 89.08%

loader's Introduction

A loader for geographic data in GML and KML

(that needs some preparation before loading via ogr2ogr)

Author: Astun Technology Ltd.

Contact: support [at] astuntechnology.com

A GML and KML loader written in Python that makes use of OGR 1.9. Source data can be in GML or KML format (including compressed files in GZ or ZIP format) and can be output to any of the formats supported by OGR. The source data can be prepared using a simple Python to both make it suitable for loading with OGR (useful with complex feature types) or to add value by deriving attributes.

The loader was originally written to load Ordnance Survey OS MasterMap Topographic Layer data in GML/GZ format but has since been used to load other GML and KML data.

Dependencies

  • OGR 1.9

    • OGR is part of the GDAL suite of tools for translating and manipulation geospatial data.
  • Python 2.6+ or 3

    • Python 2.6 or above (including 3) is required. Most modern Linux operating systems will already have 2.6 or above.
    • Python lxml module for parsing and manipulating XML

Installation details are available on the project wiki

Usage

First configure Loader by editing loader.config specifying:

Basic configuration

  • src_dir
    • The directory containing your source files or an individual file. All supported files in the specified directory and it's descendants will be loaded.
  • out_dir
    • The directory used to store the translated data if writing to a file based format such as ESRI Shape, MapInfo TAB etc.
  • tmp_dir
    • The directory used to store temporary working files during loading.
  • ogr_cmd
    • The ogr2ogr command that will be used to load the data. Here you can specify the destination format and any associated settings (for example database connection details if you are writing to PostGIS).
  • prep_cmd
    • The command used to prepare the source data so it is suitable for loading with OGR, choose one that is suitable for your source data such as prep_osgml.prep_osmm_topo for OS MasterMap Topo.
  • post_cmd
  • An optional command to be run once OGR has created it's output. Called once per file, useful for loading SQL dump files etc.
  • gfs_file
    • OGR .gfs file used to define the feature attributes and geometry type of the features read from the GML again choose a suitable gfs file for your source data such as ../gfs/osmm_topo_postgres.gfs for loading OS MasterMap Topo into PostgreSQL.

See python/loader.config for further explanation and details of available tokens. Environment variables can be used with any of the options by using a token of the form: $HOME, ${HOME} or %TEMP% (Windows only)

Then run from the command-line:

python loader.py loader.config

Additional arguments can be passed to override the values in the config file (useful when running more than one instance of the loader) for example to specify a different source directory (src_dir):

python loader.py loader.config src_dir=./data/tq

Some configuration examples are available on the project wiki

To-do

  • Data

    • OS OSMM Water Network Layer

      • Improve support for elements that require an external code list by fetching the code list when it's available
      • Support for nil attributes such as: <net:inNetwork nilReason="missing" xsi:nil="true" />, <hy-n:length xsi:nil="true" uom="m" nilReason="missing" />, <water:level xsi:nil="true" nilReason="missing" />`
      • Add example to wiki
    • OS MasterMap ITN

      • Add roadpartiallinkinformation, roadpartialrouteinformation feature types
  • Core loader.py

    • Specify gfs file via command-line using GML_GFS_TEMPLATE path_to_template.gfs
    • Add exception and message when source data is not found
    • Identify errors with subprocess calls and raise
    • Use standard logging instead of print
    • Parallel processing, either:
      • Run loader.py instances in parallel one per core each processing a single input file
      • A single loader.py process which spawns one process per feature type
        • Would allow using --config GML_READ_MODE SEQUENTIAL_LAYERS as each ogr2ogr instance would only be loading a single feature type
        • When loading PostgreSQL features could potentially be piped: prepgml4ogr | ogr2ogr | psql
  • Potential improvements due to changes in OGR

    • Use --config GML_GFS_TEMPLATE path/to/file.gfs to specify template instead of copying template file for each source file (requires GDAL 1.9.0)
    • Use --config GML_READ_MODE SEQUENTIAL_LAYERS with GML files that include multiple feature types that appear sequentially to avoid the GML being scanned multiple times (requires GDAL 1.9.0)
    • Make use of ability to use GML attributes as feature attributes using the element@attribute syntax in the GFS file (and remove relevant prep logic that creates an element to hold the attribute values) (requires GDAL 1.11.0)
    • Use /vsigzip/ to read gz directly

Authors

See AUTHORS.md.

License

MIT, Copyright (c) 2014 Astun Technology Ltd. (http://astuntechnology.com). See LICENSE.txt for full terms.

The logic to apply style_code and style_description values to OSMM Topography Layer data is derived from the ESRI UK OSMM-Styling project licensed under Apache-2.0.

loader's People

Contributors

aileenheal avatar andynberry avatar archaeogeek avatar walkermatt avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.