An osm2pgsql style to simplify OSM data use

License: MIT License


cleartables's Introduction

ClearTables

An osm2pgsql multi-backend style designed to simplify consumption of OSM data for rendering, export, or analysis.

ClearTables is currently under rapid development, and schema changes will frequently require database reloads.

Requirements

  • osm2pgsql 0.90.1 or later. Earlier versions from 0.86.0 onward may still work, but with bugs.
  • Lua, required for both osm2pgsql and testing the transforms
  • PostgreSQL 9.1 or later
  • PostGIS 2.0 or later
  • Python with PyYAML
  • Make. Any version of Make should work, or the commands are simple enough to run by hand.

Usage

make
createdb ct
psql -d ct -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore;'
cat sql/types/*.sql | psql -1Xq -d ct
# Add other osm2pgsql flags for large imports, updates, etc
osm2pgsql -d ct --number-processes 2 --output multi --style cleartables.json extract.osm.pbf
cat sql/post/*.sql | psql -1Xq -d ct

Replace ct with the name of your database if naming it differently.

osm2pgsql will connect to PostgreSQL once per process for each table, for a total of processes * tables connections. If PostgreSQL max_connections is increased from the default, --number-processes can be increased. If --number-processes is omitted, osm2pgsql will attempt to use as many processes as hardware threads.
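As a rough illustration of the processes * tables arithmetic above, the following sketch estimates the largest --number-processes value that fits within a given max_connections. The function name, the table count, and the number of reserved connections are made up for the example; only the stock PostgreSQL default of 100 connections is a known value.

```lua
-- Estimate the largest --number-processes that keeps
-- processes * tables within the available connections.
-- "reserved" is a hypothetical allowance for other clients (psql, etc).
local function max_processes (max_connections, tables, reserved)
    reserved = reserved or 0
    local available = max_connections - reserved
    if tables <= 0 or available < tables then
        return 0
    end
    return math.floor(available / tables)
end

-- 100 is PostgreSQL's stock max_connections; 30 tables and 5 reserved
-- connections are illustrative numbers only.
print(max_processes(100, 30, 5)) --> 3
```

A result of 0 means even a single process would need more connections than are available, so max_connections would have to be raised first.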

Principles

These are still a bit vague, and might later be split into principles and practices.

  • Simplify data for the consumer

  • Use PostgreSQL types other than text if appropriate

  • Use boolean for yes/no values

  • Use enum types where there's a limited list of possibilities independent of data to be included, or a well defined ordering

FAQ

Why no addresses in the building table?

Addresses and buildings have a many-to-many relationship. Multiple addresses inside one building are very common, and a single address covering multiple buildings also occurs. For rendering, a separate address table works fine. For analysis, both cases need to be handled, which requires joins either way.

Why road refs as an array?

A road may have multiple refs, and it's wrong to ignore this. To pretend that there's only one ref, use SQL like array_to_string(refs, E'\n') or array_to_string(refs, ';'). The latter reconstructs the ref tag as it was in the original data.

Why no support for osm2pgsql --hstore?

ClearTables uses the hstore type but doesn't support --hstore.

  1. The goal of ClearTables is to abstract away OSM tagging. Copying all the tags to the output is contrary to this.

  2. Copying all tags is technically possible, but it wouldn't be done with --hstore. Instead, it would be done in a similar way to the names column. The --hstore option doesn't work well when using custom column names which may collide with OSM tags.

  3. With tables for different types of features fine-grained selection of appropriate columns is possible and hstore isn't necessary.

  4. Values within a hstore are untyped which is contrary to the principle of using appropriate types.

Contributing

Bug reports, suggestions and (especially!) pull requests are very welcome on the GitHub issue tracker. Please check the tracker to see if your issue is already known, and be nice. For questions, please use IRC (irc.oftc.net or http://irc.osm.org, channel #osm-dev) and http://help.osm.org.

Code style

  • 2sp for YAML, 4sp for Lua
  • tags are OSM tags, cols are database columns
  • Space after function name when defining a function, e.g. function f (args)
  • Tests for all Lua functions except ones which are only tail calls

Table names

  • Use _polygon and _point suffix when there will be two tables holding the same type of object represented differently (e.g. most POIs)
  • Use _area when there isn't a corresponding _point table for the same object, but there is another table for points or lines of a similar class but different objects (e.g. wood_areas for forests and wood_line for rows of trees)

Lua guidelines

  • Always set columns to strings, even if they're only true/false. It's unwise to count on anything else making it from Lua to C to C++ to PostgreSQL. This lets PostgreSQL do the only conversion.
  • Test particular columns of a transform function instead of the entire output table, e.g. assert(transform({foo="bar"}).baz == "qux") instead of assert(deepcompare(transform({foo="bar"}), {baz="qux"})).
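A minimal sketch of both guidelines together, assuming a hypothetical yesno helper and a hypothetical transform_bridge function (neither is taken from the ClearTables source): boolean columns are returned as the strings "true" and "false", and each test checks one column rather than the whole output table.

```lua
-- Hypothetical helper: map OSM yes/no style values to the strings
-- PostgreSQL accepts for a boolean column. Anything else becomes nil (NULL).
local function yesno (v)
    if v == "yes" or v == "true" or v == "1" then
        return "true"
    elseif v == "no" or v == "false" or v == "0" then
        return "false"
    end
    return nil
end

-- Hypothetical transform, used only to demonstrate the test style
local function transform_bridge (tags)
    local cols = {}
    cols.name = tags["name"]
    cols.bridge = yesno(tags["bridge"])
    return cols
end

-- Test one column at a time instead of comparing whole tables
assert(transform_bridge({bridge="yes"}).bridge == "true")
assert(transform_bridge({bridge="no"}).bridge == "false")
assert(transform_bridge({bridge="viaduct"}).bridge == nil)
assert(transform_bridge({name="Example Bridge"}).name == "Example Bridge")
```

Testing single columns means adding a new column to a transform doesn't break every existing test.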

Getting started

Issues tagged with new column are often good ones to get started with. Issues tagged experimental are focused on researching new best practices and state of the art.


cleartables's People

Contributors

pnorman


cleartables's Issues

Normalize rail tags to fewer values

Currently possible values are rail, subway, narrow_gauge, light_rail, preserved, funicular, monorail, miniature, turntable, tram, disused, construction.

This list needs reducing, and the service tag needs handling.

Perhaps map rail, narrow_gauge, and preserved to rail; map subway, light_rail, and monorail to transit; and have another column for further classification.
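The proposed mapping could be sketched as a lookup table. The column names class and subclass are hypothetical, and values the proposal doesn't mention (tram, funicular, and so on) are simply left without a class here, since the issue doesn't say where they belong.

```lua
-- Mapping taken from the proposal above; unlisted values have no class yet
local rail_class = {
    rail = "rail",
    narrow_gauge = "rail",
    preserved = "rail",
    subway = "transit",
    light_rail = "transit",
    monorail = "transit",
}

local function transform_rail (tags)
    local cols = {}
    cols.class = rail_class[tags["railway"]]
    -- Hypothetical second column keeping the original value
    -- for further classification
    cols.subclass = tags["railway"]
    return cols
end

print(transform_rail({railway="narrow_gauge"}).class) --> rail
```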

Sidewalks tagged on ways

Two possible ways come to mind of storing this

  1. A both/left/right/none enum for a sidewalk column
  2. sidewalk_left and sidewalk_right boolean columns
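Both storage options can be derived from the same tag, which the sketch below tries to show. The function names are hypothetical, and treating sidewalk=no the same as sidewalk=none is an assumption about how the values would be normalized.

```lua
-- Option 1: a single enum-like value, or nil if the tag is missing or unknown
local function sidewalk_enum (tags)
    local v = tags["sidewalk"]
    if v == "both" or v == "left" or v == "right" then
        return v
    end
    if v == "no" or v == "none" then
        return "none"
    end
    return nil
end

-- Option 2: two boolean columns, returned as strings for PostgreSQL to convert
local function sidewalk_booleans (tags)
    local v = sidewalk_enum(tags)
    if v == nil then
        return nil, nil
    end
    local left = (v == "both" or v == "left")
    local right = (v == "both" or v == "right")
    return tostring(left), tostring(right)
end

print(sidewalk_enum({sidewalk="left"}))     --> left
print(sidewalk_booleans({sidewalk="left"})) --> true    false
```

The two forms carry the same information, so the choice mostly affects how queries are written.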

Expand maxspeed checks

What happens with a maxspeed that matches string.find(v, "mph") but doesn't end in ?mph?
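One way to make the answer explicit is to anchor both patterns, so a value that merely contains mph is rejected rather than half-parsed. This is a sketch of a possible stricter check, not the current ClearTables behaviour; the function name and the choice to return nil for anything unrecognised (including special values) are assumptions.

```lua
local KMH_PER_MPH = 1.609344

-- Returns the speed in km/h as a string, or nil if the value isn't understood
local function parse_maxspeed (v)
    if v == nil then
        return nil
    end
    -- Plain number: taken to be km/h already
    if string.find(v, "^%d+%.?%d*$") then
        return v
    end
    -- Number followed by an optional space and mph, anchored at both ends
    local mph = string.match(v, "^(%d+%.?%d*) ?mph$")
    if mph then
        return tostring(tonumber(mph) * KMH_PER_MPH)
    end
    -- Contains mph somewhere but doesn't end in it, or is something else
    return nil
end

assert(parse_maxspeed("50") == "50")
assert(parse_maxspeed("30 mph") ~= nil)
assert(parse_maxspeed("30 mph zone") == nil)
```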

Add landcover layer

e.g. natural=wood, landuse=forest, natural=scrub, etc and amenity=school, landuse=residential, etc.

Should this be split into two, separating natural features (e.g. forest) from use-based constructs (e.g. school)?

Route ref handling

Proper handling of routes would involve taking information from route relations and adding it to the component ways, as @kevinkreiser proposes in osm2pgsql-dev/osm2pgsql#230. Absent that functionality all that can be done at transform time is more intelligent parsing of the ref tag, which is probably required before more advanced relation-based processing can be done.

osm2pgsql turns the ref field into a text[] in SQL. Something similar could be done in transform.

The array value input options include using a string constant, and because the value is assigned directly to a table column, the explicit type cast is not needed.

This is a good way to go, because it avoids the need to modify osm2pgsql to support different types being returned from Lua transforms, but it requires a function to escape ".

A failure to escape would not result in SQL injection, but an odd ref array.
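A minimal sketch of such an escaping function, assuming refs are separated by semicolons and that surrounding whitespace should be trimmed. The function names are hypothetical. In a PostgreSQL array literal, a double-quoted element needs backslashes before any embedded double quote or backslash.

```lua
-- Quote one element for a PostgreSQL array literal
local function escape_element (s)
    -- Escape backslashes first, so the ones added for quotes aren't doubled
    s = string.gsub(s, "\\", "\\\\")
    s = string.gsub(s, '"', '\\"')
    return '"' .. s .. '"'
end

-- Turn a semicolon-separated ref tag into an array literal string
local function ref_array (ref)
    if ref == nil then
        return nil
    end
    local parts = {}
    for part in string.gmatch(ref, "[^;]+") do
        local trimmed = string.match(part, "^%s*(.-)%s*$")
        if trimmed ~= "" then
            parts[#parts + 1] = escape_element(trimmed)
        end
    end
    if #parts == 0 then
        return nil
    end
    return "{" .. table.concat(parts, ",") .. "}"
end

print(ref_array("A 1; B 2")) --> {"A 1","B 2"}
```

Always quoting every element avoids having to decide which characters (commas, braces, spaces) would otherwise need special treatment.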

Make yaml2mml work the same on Windows

Because line-ending issues are no fun.

But is this necessary here? The JSON is the same, just with a different text representation, and we don't check the JSON into git.

Names in multiple languages

There are three obvious options for adding multi-lingual names

  1. Add a fixed list of columns (e.g. name_en) which are populated from the name tag and name:en etc, roughly with cols.name_en = tags["name:en"] or tags["name"] or nil
  2. Add a fixed list of columns (e.g. name_en, name_de) which contain the name:en, etc tag values
  3. Add a name_int hstore column with all alternate-language names. cols.name_int = "en=>"..tags["name:en"] extended to arbitrary languages.

1 and 2 can't handle arbitrary lists of languages. 2 and 3 require stylesheet logic for fallback names if you want to default to another language, e.g. COALESCE(name_en,name) AS name
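Option 3 extended to arbitrary languages might look like the sketch below. The function names are hypothetical, keys are sorted only to make the output deterministic, and the pattern used to recognise language suffixes is an assumption: it would also pick up non-language keys such as name:left, which a real implementation would need to filter out.

```lua
-- Quote a key or value for an hstore literal
local function hstore_quote (s)
    s = string.gsub(s, "\\", "\\\\")
    s = string.gsub(s, '"', '\\"')
    return '"' .. s .. '"'
end

-- Build an hstore literal from all name:<lang> tags, or nil if there are none
local function names_int (tags)
    local langs = {}
    for k, _ in pairs(tags) do
        local lang = string.match(k, "^name:([%a_%-]+)$")
        if lang then
            langs[#langs + 1] = lang
        end
    end
    if #langs == 0 then
        return nil
    end
    table.sort(langs)
    local items = {}
    for _, lang in ipairs(langs) do
        items[#items + 1] = hstore_quote(lang) .. "=>" .. hstore_quote(tags["name:" .. lang])
    end
    return table.concat(items, ",")
end

print(names_int({name="London", ["name:fr"]="Londres", ["name:en"]="London"}))
--> "en"=>"London","fr"=>"Londres"
```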

cc @yurik, @MaxSem, since I know they've looked at international languages

Easier way to do POI point/polygon tables?

It will be common to have pairs of tables like

- name: education_point
  type: point
  comment: Education facilities mapped as points
  tagtransform: education.lua
  tagtransform-node-function: education_nodes
  tagtransform-way-function: drop_all
  tagtransform-relation-function: drop_all
  tagtransform-relation-member-function: drop_all
  tags:
  - <<: *name
  - <<: *names
  - &education
    name: education
    type: text
    comment: Type of education facility
- name: education_poly
  type: polygon
  comment: Education facilities mapped as polygons
  tagtransform: education.lua
  tagtransform-node-function: drop_all
  tagtransform-way-function: education_ways
  tagtransform-relation-function: education_rels
  tagtransform-relation-member-function: education_rel_members
  tags:
  - <<: *name
  - <<: *names
  - <<: *education
  - <<: *way_area

It'd be nice to be able to specify these only once and have it expanded by script, also creating a view which UNION ALLs them

Nicer yaml2json error reporting

Traceback (most recent call last):
  File "./yaml2json.py", line 3, in <module>
    json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=2, separators=(',', ': '))
  File "/usr/lib/python2.7/dist-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib/python2.7/dist-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 111, in compose_sequence_node
    node.value.append(self.compose_node(node, index))
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 111, in compose_sequence_node
    node.value.append(self.compose_node(node, index))
  File "/usr/lib/python2.7/dist-packages/yaml/composer.py", line 77, in compose_node
    "second occurence", event.start_mark)
yaml.composer.ComposerError: found duplicate anchor 'way_area'; first occurence
  in "<stdin>", line 27, column 5
second occurence
  in "<stdin>", line 162, column 5
make: *** [cleartables.json] Error 1

We can catch yaml.composer.ComposerError, print a nicer message, then exit with status 1. This avoids an uninformative backtrace.

Avoid deepcompare in tests

Change tests to do foo({baz="boof"}).bar == "qux" instead of deepcompare(foo({baz="boof"}), {bar="qux"})

building=no area

> require "common"
> print (isarea({building="no"}))
true

building=no shouldn't be assumed to be an area. This is probably true for all unconditional_polygon_keys.
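A sketch of the suggested behaviour, written as a standalone function rather than a patch to the real isarea in common.lua. The key list here is a hypothetical subset, and the function only covers the unconditional-key check, not whatever other rules the real function applies.

```lua
-- Hypothetical subset of keys that normally imply an area
local unconditional_polygon_keys = { "building", "landuse", "amenity" }

-- A key only implies an area when it is present with a value other than "no"
local function isarea_sketch (tags)
    for _, k in ipairs(unconditional_polygon_keys) do
        local v = tags[k]
        if v ~= nil and v ~= "no" then
            return true
        end
    end
    return false
end

print(isarea_sketch({building="no"}))  --> false
print(isarea_sketch({building="yes"})) --> true
```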

drop z_order

We shouldn't need it with enumerated road types

Handle construction consistently

Right now railway=construction is put into railway tables, but the equivalent isn't done for highway.

Should these go into their own table?

Benchmark single-key acceptance function styles

I've used both

return tags["aeroway"] and (tags["aeroway"] == "taxiway" or tags["aeroway"] == "runway")

and

return tags["aeroway"] == "taxiway" or tags["aeroway"] == "runway"

Is one faster than the other? Does this depend on how many specific tags are being checked?
Generally P(specific tags match | key matches) is high, but P(key matches) might be low.
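A rough way to measure this is a timing loop using os.clock. The harness below is only a sketch: the sample tag sets and the iteration count are arbitrary, and results will vary by Lua version and machine. One difference worth noting regardless of speed is that the first style returns nil when the key is absent while the second returns false, which only matters if a caller compares against false explicitly.

```lua
local function accept_guarded (tags)
    return tags["aeroway"] and (tags["aeroway"] == "taxiway" or tags["aeroway"] == "runway")
end

local function accept_plain (tags)
    return tags["aeroway"] == "taxiway" or tags["aeroway"] == "runway"
end

-- CPU time in seconds for calling f(tags) the given number of times
local function time_it (f, tags, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        f(tags)
    end
    return os.clock() - start
end

local samples = {
    {highway="residential"}, -- key absent, likely the common case
    {aeroway="runway"},      -- key present and accepted
    {aeroway="apron"},       -- key present but rejected
}

for _, tags in ipairs(samples) do
    print(time_it(accept_guarded, tags, 200000), time_it(accept_plain, tags, 200000))
end
```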

Have a way to generate a Lua file from YAML

Most files follow a similar convention, so it should be possible to create all the functions needed for a new lua file by scanning cleartables.yaml for all the layers which use it.

Taginfo JSON

See http://wiki.openstreetmap.org/wiki/Taginfo/Projects

It'd be good to list ClearTables on Taginfo. A few constraints

  • The taginfo JSON will be generated from existing files, not be a new file which needs updating whenever tag usage changes
  • Generated files do not get checked into git

Travis can generate artifacts but they need uploading somewhere. Once CT becomes stable enough to tag versions I can upload from Travis to Github.

Reading the documentation, it's not clear how to specify some things. cc @joto

  • ClearTables uses the access tag, but only the values no, private, destination, customers, delivery, yes, permissive or designated. Should this be added as multiple objects in the "tags" list?

e.g.

"tags": [
  {
    "key":"access",
    "value":"no"
  },
  {
    "key":"access",
    "value":"private"
  },
  ...
]

Some tags do not accept "all values" like maxspeed which accepts ^%d+%.?%d*$, or the previous expression appended with ?mph, or a special value like RO:urban.


The ideal way to generate the JSON would be to evaluate the transforms.
cleartables.yaml lists all Lua files, and I could instrument a table with overridden accessor methods via metatables that log when they're called, but I'm not sure that's enough. Taking airport_point as an example, I can tell from the YAML that for nodes it calls

function airport_nodes (tags, num_keys)
    return generic_node(tags, accept_airport, transform_airport)
end

This is equivalent to

function airport_nodes (tags, num_keys)
    if tags["aeroway"] and (tags["aeroway"] == "aerodrome" or tags["aeroway"] == "heliport") then
        local cols = {}
        cols.airport = tags["aeroway"] -- guaranteed by accept_airport to be either aerodrome or heliport
        cols.name = tags["name"]
        cols.names = names(tags)
        -- Lua strings are 1-indexed, so the codes are truncated starting at 1
        cols.iata = tags["iata"] and string.sub(tags["iata"], 1, 3) or nil
        cols.icao = tags["icao"] and string.sub(tags["icao"], 1, 4) or nil
        cols.ref = tags["ref"] or cols.iata or cols.icao or nil
        return 0, cols
    end
    return 1, {}
end

I can tell that the aeroway key is used from the tags["aeroway"] call, but even if I return non-nil for that call, I can't tell that it's comparing it against aerodrome. If I returned an object from tags["aeroway"] I could then override the == operation, but this quickly gets very complicated.

And if I don't know the tags to meet the accept check, I'll never reach the part of the function call which uses tags["iata"].
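The limitation can be demonstrated with a small logging proxy. The functions below are simplified stand-ins for the real airport transform, and logging_tags is a hypothetical helper; the point is only that the keys recorded depend on whether the accept check passes.

```lua
-- Wrap a tag table in an empty proxy whose __index records every key read
local function logging_tags (real_tags, seen)
    return setmetatable({}, {
        __index = function (_, key)
            seen[key] = true
            return real_tags[key]
        end
    })
end

-- Simplified stand-ins for the real accept and transform functions
local function accept_airport (tags)
    return tags["aeroway"] and (tags["aeroway"] == "aerodrome" or tags["aeroway"] == "heliport")
end

local function transform_airport (tags)
    local cols = {}
    cols.airport = tags["aeroway"]
    cols.name = tags["name"]
    cols.iata = tags["iata"] and string.sub(tags["iata"], 1, 3) or nil
    return cols
end

local function airport_nodes (tags)
    if accept_airport(tags) then
        return 0, transform_airport(tags)
    end
    return 1, {}
end

-- With no tags, only the key in the accept check is ever read
local seen_empty = {}
airport_nodes(logging_tags({}, seen_empty))
assert(seen_empty["aeroway"] and not seen_empty["iata"])

-- With tags that pass the accept check, the transform's keys show up too
local seen_match = {}
airport_nodes(logging_tags({aeroway="aerodrome"}, seen_match))
assert(seen_match["aeroway"] and seen_match["name"] and seen_match["iata"])
```

The proxy records keys but never the values they are compared against, which is the gap described above.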

airport_point is a simple table and doesn't need all the power of Lua, but other tables do (e.g. transportation).

Another option is expanding the luadoc strings. This brings a few issues

  1. The function which is called is not where all the documentation belongs, e.g. airport documentation belongs on accept_airport and transform_airport, not airport_nodes for airport_point and airport_ways, airport_rels, and airport_rel_members for airport_polygon.

    The functions called for airport_nodes({}, 0) won't include the transform so I can't use that to determine the docstrings I need to parse.

    I could override the generic_[node|line_way|area_way|polygon_way|multipolygon_members] functions to capture the function names (need code locations not function objects, how?) to get the right docstrings for each table

    This may not be enough. For example, road access is done

    cols.motor_access = access(tags["motor_vehicle"] or
                           tags["vehicle"] or 
                           tags["access"] or 
                           highway[tags["highway"]]["motor_access"])

    Access is in another file, so even though the use of the motor_vehicle, vehicle, access, and highway is documented here, I wouldn't document what values of those tags are accepted, as that belongs in the access function documentation

  2. I don't consider potential duplication between the code and docstrings a blocker, but it's not ideal

The third option I can think of is using the test suite. I could override require and wrap the functions in the Lua file with a wrapper that adds logging to the metatable of any table parameters. Unlike the first option, this would capture all keys used, since all code paths would be followed, but it wouldn't capture all tag values.

Add rail layer

Add rail table, very similar to roads. Lua to go into existing transportation file.

Support maxspeed:forward/backward

How? Similar question to #39 except obviously an enum won't work

Ideas:

  1. maxspeed and maxspeed_reverse columns, where cols.maxspeed = tags["maxspeed:forward"] or tags["maxspeed"] or nil and cols.maxspeed_reverse = tags["maxspeed:backward"] or tags["maxspeed"] or nil
  2. maxspeed and maxspeed_reverse columns, where cols.maxspeed = tags["maxspeed:forward"] or tags["maxspeed"] or nil and cols.maxspeed_reverse = tags["maxspeed:backward"] or nil
  3. 1., except with maxspeed_forward and maxspeed_backward as column names

1 adds what is duplicate information most of the time
2 avoids that, but might be less clear
3 is explicit about one being forward and the other being reverse, but that doesn't matter most of the time.
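The first two ideas differ only in whether the reverse column falls back to the plain maxspeed tag, which the sketch below tries to make concrete. The function names are hypothetical, the values are passed through unparsed, and the code reads the maxspeed:backward key named in the issue title.

```lua
-- Idea 1: both columns fall back to the undirected maxspeed tag
local function maxspeed_duplicated (tags)
    local cols = {}
    cols.maxspeed = tags["maxspeed:forward"] or tags["maxspeed"]
    cols.maxspeed_reverse = tags["maxspeed:backward"] or tags["maxspeed"]
    return cols
end

-- Idea 2: the reverse column is set only when the directions are tagged apart
local function maxspeed_sparse (tags)
    local cols = {}
    cols.maxspeed = tags["maxspeed:forward"] or tags["maxspeed"]
    cols.maxspeed_reverse = tags["maxspeed:backward"]
    return cols
end

local plain = {maxspeed="50"}
assert(maxspeed_duplicated(plain).maxspeed_reverse == "50")
assert(maxspeed_sparse(plain).maxspeed_reverse == nil)

local split = {["maxspeed:forward"]="50", ["maxspeed:backward"]="30"}
assert(maxspeed_sparse(split).maxspeed == "50")
assert(maxspeed_sparse(split).maxspeed_reverse == "30")
```

With idea 2, a consumer wanting the reverse speed would use something like COALESCE(maxspeed_reverse, maxspeed) in SQL.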

Malformed array literal

Osm2pgsql failed due to ERROR: result COPY_END for roads failed: ERROR:  malformed array literal: "{"Naturlehrpfad "Schwarzer Damm", grüner Strich auf weißem Grund"}"
DETAIL:  Unexpected array element.
CONTEXT:  COPY roads, line 2903747, column refs: "{"Naturlehrpfad "Schwarzer Damm", grüner Strich auf weißem Grund"}"
