livinthelookingglass / overpassify

A Python to OverpassQL transpiler

Home Page: https://pypi.python.org/pypi/overpassify

License: GNU Lesser General Public License v3.0

Python 99.33% Makefile 0.67%
python transpiler openstreetmap overpass-api overpass-turbo overpass osm

overpassify's Introduction

overpassify

A Python to OverpassQL transpiler, now on both GitHub and GitLab

OverpassQL is the language used to query features on OpenStreetMap. Unfortunately, it's not very readable.

The goal here is to enable people to write in a more developer-friendly language, and still have it work on the existing infrastructure. As of now, overpassify can take a snippet like:

from overpassify import overpassify

@overpassify
def query():
    search = Area(3600134503)
    ways = Way(search, highway=...)
    odd_keys_demo = Way(search, **{Regex('maxspeed(?::.+)?'): Regex('.+ mph')})
    nodes = Node(search)
    out(ways, geom=True, count=True)
    out(nodes, geom=True, count=True)
    noop()

And from that generate:

(area(3600134503);) -> .search;
(way["highway"](area.search);) -> .ways;
(way[~"maxspeed(?::.+)?"~".+ mph"](area.search);) -> .odd_keys_demo;
(node(area.search);) -> .nodes;
.ways out count;
.ways out geom;
.nodes out count;
.nodes out geom;

That last noop() is because of issue #2. And as a note, this library assumes you never use a variable name of the form tmp*. That format will probably be changed to something even less likely in the future, but some translations (for instance, a full ternary) require the use of temporary variables.

Overview

I'll say this from the outset: overpassify will support a subset of Python. Some things are just too difficult (and maybe impossible) to automatically translate. Functions are an example of that.

On the other hand, it will try to support a superset of easily-usable OverpassQL. Some of those extra features won't be as efficient as their Python counterparts, but they will be available.

Currently overpassify supports 41/56 of the features listed in the OverpassQL guide, and additionally supports ternary statements, if blocks, break, and continue.

Classes

This library provides wrappers for five types: Set(), Node(), Way(), Area(), and Relation(). Those last four are all considered subclasses of Set().

This library also provides support for strings and numbers. In the future it will provide support for regex and other things in specific places.

(Note: Currently nested constructors have some problems in implementation)

Assignment

This works about the way you'd expect it to. There are a couple caveats though.

  1. You cannot assign a non-Set() to a variable. This means only those five classes listed above.
  2. You cannot assign multiple variables in one line. No a, b = b, a, and the like. This could potentially be changed later.

Number and Set Arithmetic

Another supported feature is the ability to manipulate these sets and numbers.

Adding sets will produce the union of those sets. Adding numbers will produce their sum.

Subtracting two sets will produce their difference. Subtracting numbers will do the same.
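
As a rough illustration of what set arithmetic corresponds to, the sketch below renders the union and difference forms from the translation table as OverpassQL strings. The helper names union_ql and difference_ql are hypothetical and not part of overpassify, and the exact text the transpiler emits may differ.

```python
def union_ql(target, *names):
    """Render `target = a + b + ...` as an OverpassQL union (hypothetical helper)."""
    members = " ".join(".{};".format(name) for name in names)
    return "({}) -> .{};".format(members, target)


def difference_ql(target, left, right):
    """Render `target = left - right` as an OverpassQL difference (hypothetical helper)."""
    return "(.{}; - .{};) -> .{};".format(left, right, target)


print(union_ql("both", "ways", "nodes"))       # (.ways; .nodes;) -> .both;
print(difference_ql("rest", "ways", "nodes"))  # (.ways; - .nodes;) -> .rest;
```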

Set Filtering

You are also allowed to filter a Set()'s contents by type. For instance, Way.filter(<some set>) would yield all the ways within <some set>.

Set intersections

A similar process will allow you to take the intersection of arbitrary numbers of named sets. So Set.intersect(a, b) will yield all elements common between a and b. You cannot, at the moment, use an expression inside this function; each argument must be a previously assigned variable.

You can also use this in tandem with Set Filtering. So Area.intersect(a, b) would yield only the areas common between a and b.

Searching for Sets

This library also supports most of the ways OverpassQL can search for information. This currently includes:

  1. Checking within an area (or set of areas)
  2. Fetching by ID
  3. Tag matching
  4. Conditional filters (see next section)

The first two are just given as arguments to the constructor. If you put in Way(12345), that will find the Way with ID 12345. If you put in Way(<some area>), it will return all ways within that area.

You can also define areas using the Around() function. This has two useful overloads. The first takes the form Around(<some set>, <radius in meters>). The second takes the form Around(<radius in meters>, <latitude>, <longitude>).

Tag matching can be done with keyword arguments. So if you look for Node(highway="stop"), that will find you all stop signs. It also supports existence checking (Way(highway=...)), and non-existence checking (Area(landuse=None)), and regex matching (Way(highway=Regex("path|cycleway|sidewalk"))).

For keys which are not usable as a keyword, you can use a "splatted" dictionary. For instance Node(**{'maxspeed:advisory': Regex('.+ mph')}). The same follows for regex key matching, though regex key matching must be with a regex value.
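
To make the tag-matching rules above concrete, here is a minimal sketch that maps each kind of Python value to the OverpassQL tag filter shown in the translation table. The Regex and NotRegex classes below are simplified stand-ins (assumed to wrap a pattern string), and tag_filter is a hypothetical helper, not overpassify's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regex:
    """Stand-in for overpassify's Regex marker (assumption: wraps a pattern string)."""
    pattern: str


@dataclass(frozen=True)
class NotRegex:
    """Stand-in for the NotRegex marker listed in the translation table."""
    pattern: str


def tag_filter(key, value):
    """Render one key/value criterion as an OverpassQL tag filter (hypothetical helper)."""
    if isinstance(key, Regex):
        # Regex keys are only valid together with a regex value.
        if not isinstance(value, Regex):
            raise TypeError("a regex key requires a regex value")
        return '[~"{}"~"{}"]'.format(key.pattern, value.pattern)
    if value is Ellipsis:
        return '["{}"]'.format(key)            # tag exists
    if value is None:
        return '[!"{}"]'.format(key)           # tag does not exist
    if isinstance(value, Regex):
        return '["{}"~"{}"]'.format(key, value.pattern)
    if isinstance(value, NotRegex):
        return '["{}"!~"{}"]'.format(key, value.pattern)
    return '["{}"="{}"]'.format(key, value)    # exact match


# A splatted dictionary can mix all of these forms.
criteria = {"highway": ..., "maxspeed:advisory": Regex(".+ mph"), "landuse": None}
print("way" + "".join(tag_filter(k, v) for k, v in criteria.items()))
```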

You can also search by both an area and a filter. For instance: Way(<your hometown>, maxspeed=None).

Ternary Expressions and Conditional Filters

You can also filter using the familiar a if b else c. This means that if b is truthy, the expression evaluates to a, and otherwise it evaluates to c.

Unfortunately, since this is not a native feature to OverpassQL, it ends up evaluating both sides of that statement.

If you want c to be an empty set, however, we can optimize that. So foo = a if b else <type>() is the syntax to use there.

Additional performance is lost because OverpassQL does not support a conditional being the only filter. This means that we need to provide some other filter, and one in current use is to divide it by type and reconstruct. Because of this, filtering down to the appropriate set type yields significantly better performance.
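
The cost of the divide-by-type approach can be sketched as follows. This is only an illustration under stated assumptions: conditional_filter_ql is a hypothetical helper, the list of element types is a guess at what a generic Set() would be split into, and count_tags() is used merely as an example condition.

```python
# Element types a generic set is split into; the exact list overpassify uses is an assumption.
ELEMENT_TYPES = ("node", "way", "relation")


def conditional_filter_ql(target, source, condition, types=ELEMENT_TYPES):
    """Render `target = source if condition else <type>()` (hypothetical helper).

    Because `(if: ...)` cannot be the only filter, each statement also filters
    by element type, and the pieces are unioned back together.
    """
    parts = " ".join(
        "{}.{}(if: {});".format(kind, source, condition) for kind in types
    )
    return "({}) -> .{};".format(parts, target)


# Generic Set(): one statement per element type.
print(conditional_filter_ql("foo", "a", "count_tags() > 2"))
# Known to contain only ways: a single statement is enough.
print(conditional_filter_ql("foo", "a", "count_tags() > 2", types=("way",)))
```

The second call shows why narrowing to a specific type helps: the union shrinks to one typed statement.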

Returning Data

In OverpassQL, data can be returned in pieces throughout the function. It's more equivalent to Python's yield than return. The function we use for that here is out().

out() takes in one positional argument, and many possible keyword arguments. It yields data for the positional argument using all the types defined in the keywords.

For instance out(<set of nodes>, geom=True, body=True, qt=True) would return all the data that MapRoulette needs to draw those points on their map.

As a sidenote, the value given for these keywords is never actually checked. It could as easily be geom=False as geom=True, and overpassify will not care.
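
A minimal sketch of that behaviour, modelled on the generated output shown in the introduction, where each keyword became its own out statement. The helper out_ql is hypothetical, and the alphabetical ordering is an assumption made to match that example.

```python
def out_ql(set_name, **modes):
    """Render out(<set>, **modes) as OverpassQL statements (hypothetical helper).

    Only the keyword names are used; the values are ignored, mirroring the
    behaviour described above.
    """
    return [".{} out {};".format(set_name, mode) for mode in sorted(modes)]


print(out_ql("ways", geom=True, count=True))
print(out_ql("ways", geom=False, count=False))  # same statements as the line above
```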

For-Each Loop

Here you can use the traditional Python for loop:

for way in ways:
    out(way, geom=True)

It does not yet support the else clause, and though it supports break and continue, please be aware that this will dramatically slow runtime in that loop.

If Statements

This is a feature that OverpassQL cannot do without some emulation. So what we do here is:

  1. Grab an individual item that will probably be stable over long periods of time; in this case, the Relation() representing Antarctica
  2. Use a conditional filter on that relation to get a one item or zero item Set()
  3. Iterate over that in a for loop
  4. If there is an else clause, use a conditional filter with the negation of the test given to get a one item or zero item Set()
  5. Iterate over the else clause in a for loop
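
The five steps above can be sketched as a small string builder. This is not overpassify's code: if_ql is a hypothetical helper, the temporary variable names are illustrative, and the relation ID is deliberately left as a placeholder.

```python
def if_ql(condition, body, else_body=None, anchor_id="<anchor id>", tmp="tmp_if"):
    """Emulate `if condition: body else: else_body` in OverpassQL (hypothetical helper).

    anchor_id is the ID of the stable relation mentioned in step 1; it is left
    as a placeholder here. tmp is an illustrative temporary variable name.
    """
    lines = [
        # Step 1: fetch the single stable relation.
        "(relation({});) -> .{};".format(anchor_id, tmp),
        # Step 2: keep it only if the condition holds (one item or zero items).
        "(relation.{0}(if: {1});) -> .{0}_then;".format(tmp, condition),
        # Step 3: the loop body runs once or not at all.
        "foreach.{}_then(".format(tmp),
        "  " + body,
        ");",
    ]
    if else_body is not None:
        lines += [
            # Steps 4 and 5: the same trick with the negated condition.
            "(relation.{0}(if: !({1}));) -> .{0}_else;".format(tmp, condition),
            "foreach.{}_else(".format(tmp),
            "  " + else_body,
            ");",
        ]
    return "\n".join(lines)


print(if_ql("1 == 1", ".ways out geom;", else_body=".nodes out geom;"))
```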

Settings

We also provide a wrapper for the option headers. Note that this will raise an error if it's not on the first line of your query.

The valid keywords for Settings() are as follows:

  • timeout: The maximum number of seconds you would like your query to run for
  • maxsize: The maximum number of bytes you would like your query to return
  • out: The format to return in. It defaults to XML, but you can set it to "json" or a variant on "csv", as described in the OverpassQL spec
  • bbox: The string describing a global bounding box. It is used to limit the area your query can encompass, and should take the form "<southern lat>,<western lon>,<northern lat>,<eastern lon>"
  • date: The string describing what date you would like to query for. This allows you to look at past database states. Note that it needs an extra set of quotes, so it would look like date='"2012-09-12T06:55:00Z"'
  • diff: Similar to the above, except it returns the difference between the results of the query at two points in time. If you give one time, it will assume you want to compare to now. It would look like diff='"2012-09-12T06:55:00Z","2014-12-24T13:33:00Z"'
  • adiff: Similar to the above, except that it tells you what happened to each absent element
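
To show why date, diff, and adiff need their own inner quotes, here is a sketch that renders these keywords into an OverpassQL settings line by inserting each value verbatim. settings_ql is a hypothetical helper, and the ordering of the bracketed options is an arbitrary choice for the example.

```python
def settings_ql(**options):
    """Render Settings(...) keywords as an OverpassQL settings line (hypothetical helper)."""
    allowed = ("timeout", "maxsize", "out", "bbox", "date", "diff", "adiff")
    unknown = set(options) - set(allowed)
    if unknown:
        raise ValueError("unsupported settings: {}".format(sorted(unknown)))
    # Values are inserted verbatim, which is why date/diff/adiff carry their own quotes.
    parts = ["[{}:{}]".format(key, options[key]) for key in allowed if key in options]
    return "".join(parts) + ";"


print(settings_ql(timeout=900, out="json", date='"2012-09-12T06:55:00Z"'))
# [timeout:900][out:json][date:"2012-09-12T06:55:00Z"];
```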

Rough Translation Table

Feature                 OverpassQL                          Python
Assignment              <expr> -> .name                     name = <expr>
Unions                  (<set>; ...; <set>)                 <set> + ... + <set>
Difference              (<set> - <set>)                     <set> - <set>
Intersection            .<set>.<set>                        Set.intersect(<set>, <set>)
Type-filtering          way.<set>                           Way.filter(<set>)
Searching
..By ID                 area(1) or way(7)                   Area(1) or Way(7)
..In an area            way(area.<set>)                     Way(<set>)
..By tags               way["tag"="value"]                  Way(tag=value)
..By tag existence      way["tag"]                          Way(tag=...)
..By tag nonexistence   way[!"tag"]                         Way(tag=None)
..By regex              way["highway"~"a|b"](area.<set>)    Way(<set>, highway=Regex("a|b"))
..By inverse regex      way["highway"!~"a|b"](area.<set>)   Way(<set>, highway=NotRegex("a|b"))
..In area + tag         way["highway"](area.<set>)          Way(<set>, highway=...)
Ternary                 very long                           <expr> if <condition> else <expr>
Conditional Filter      <type>.<set>(if: <condition>)       <expr> if <condition> else <type>()
For Loop                foreach.<set>->.<each>(<body>)      for <each> in <set>:\n <body>
If Statement            very long                           if <condition>:\n <body>\nelse:\n <body>
Recursing
..Up                    .a < or .a < -> .b                  a.recurse_up() or b = a.recurse_up()
..Up (w/ relations)     .a << or .a << -> .b                a.recurse_up_relations()
..Down                  .a > or .a > -> .b                  a.recurse_down()
..Down (w/ relations)   .a >> or .a >> -> .b                a.recurse_down_relations()
is_in filters
..On a set              .a is_in -> .areas_with_part_of_a   areas_containing_part_of_a = is_in(a)
..On a lat/lon pair     is_in(0, 0) -> .areas_with_0_0      areas_containing_0_0 = is_in(0, 0)

Features Not Yet Implemented

  1. Filters
    1. Recursion Functions
    2. Filter By Bounding Box
    3. Filter By Polygon
    4. Filter By "newer"
    5. Filter By Date Of Change
    6. Filter By User
    7. Filter By Area Pivot
  2. ID Evaluators
    1. id() And type()
    2. is_tag() And Tag Fetching
    3. Property Count Functions
  3. Aggregators
    1. Union and Set
    2. Min and Max
    3. Sum
    4. Statistical Counts
  4. Number Normalizer
  5. Date Normalizer

overpassify's People

Contributors

livinthelookingglass, micparke


overpassify's Issues

Add support for conditionals

This is for an actual if statement, not a ternary.

Essentially, my idea is to hack my way to a conditional using something like the following:

(node(1);) -> .tmp_var;
(node.tmp_var(if: <condition>);) -> .tmp_var;
foreach.tmp_var(
<body>
);

Support Basic Operations

This is the initial set. When this issue is closed, this code should be able to do:

from overpassify import overpassify
@print
@overpassify
def query():
    search = Area(3600134503)
    ways = Way(search)
    nodes = Node(search)
    out(ways, geom=True, count=True)
    out(nodes, geom=True, count=True)

This should translate to something like:

(area["name"="Trowbridge Park"]) -> .search;
way(.search) -> .ways;
node(.search) -> .nodes;
.ways;
out geom;
out count;
.nodes;
out geom;
out count;

Ensure that typing concepts match up

I think there's a minor confusion on my part.

@mmd-osm, is a single node, for instance generated by node(1), also considered a set? Or do you need to explicitly construct it as such?

If it's the latter, then my object model is fundamentally wrong, and I have a better excuse to do a redesign.

Add support for ternaries

This can be done using the normal if filter in OverpassQL.

Essentially,

a = <expr> if b else <expr2>

Translates to

(<expr>;) -> .a;
(<expr2>;) -> ._;
(.a(if: b); ._(if: !(b));) -> .a;

There's an implementation issue, however. OverpassQL does not support having an if being the only filter. A potential solution would be to filter the set by itself. If that doesn't work, it may need to be broken into each type component and reassembled.

Add TMP_PREFIX constant

This should allow for two goals:

  1. Allow people to change the prefix if desired
  2. Allow the prefix to be refactored easily
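
One possible shape for this, as a sketch only: a module-level constant plus a generator of temporary names. TMP_PREFIX is the name proposed in this issue; tmp_names and the numbering scheme are hypothetical.

```python
import itertools

# Default follows the tmp* convention; the constant name comes from this issue.
TMP_PREFIX = "tmp"


def tmp_names(prefix=TMP_PREFIX):
    """Yield an endless series of temporary variable names (hypothetical helper)."""
    for index in itertools.count():
        yield "{}{}".format(prefix, index)


names = tmp_names()
print(next(names), next(names))      # tmp0 tmp1

# Changing the prefix in one place changes every generated name.
custom = tmp_names(prefix="_overpassify_tmp")
print(next(custom))                  # _overpassify_tmp0
```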

Long Term: Restructure to store everything in temporary variables

Doing this would alleviate a lot of the pain points around nested functions. It also would be rather difficult, and probably require a redesign. Certainly it would need a smarter approach to translation than the one that is currently being used. On the bright side, it would allow the removal of the "don't use names in tmp*" caveat.

Support regex tag filters

One way to do this would be to make a tag handler. It would be something like:

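import re
from functools import singledispatch


@singledispatch
def parse_tag(criteria, tag):
    # Fallback for criteria types without a registered handler
    raise TypeError("unsupported tag criteria")


@parse_tag.register(re.Pattern)  # compiled regex -> regex match
def _(regex, tag):
    return '"{}"~"{}"'.format(tag, regex.pattern)


@parse_tag.register(str)  # plain string -> exact match
def _(value, tag):
    return '"{}"="{}"'.format(tag, value)


@parse_tag.register(type(Ellipsis))  # ... -> tag exists
def _(e, tag):
    return '"{}"'.format(tag)


@parse_tag.register(type(None))  # None -> tag does not exist
def _(n, tag):
    return '!"{}"'.format(tag)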

That should support all but "regex not equal". I don't know how you would support that one in such a scheme.

Provide a settings function

Should allow you to set timeout, change out mode, etc. Should raise an error if it isn't at the top, if feasible.

Provide "around" function

There should be two cases to handle:

  1. No set fed in, so use _
  2. A set fed in, so use that

Radius should be required, and if not fed as a float, should have a .0 appended for consistency
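
The two cases and the radius rule could be sketched as below, together with the lat/lon form described in the README. around_ql is a hypothetical helper that only builds the OverpassQL around filter text; it is not the library's Around() implementation.

```python
def around_ql(radius, lat=None, lon=None, source=None):
    """Render an OverpassQL around filter (hypothetical helper).

    With source set, search around that named set; with lat/lon, search around
    a point; with neither, fall back to the default set `_`.
    """
    radius = float(radius)  # 100 -> 100.0, so the radius always prints as a float
    if lat is not None and lon is not None:
        return "(around:{},{},{})".format(radius, lat, lon)
    if source is None:
        return "(around:{})".format(radius)  # case 1: implicit default set
    return "(around.{}:{})".format(source, radius)  # case 2: named set


print("node" + around_ql(100, source="stops"))  # node(around.stops:100.0)
print("node" + around_ql(100))                  # node(around:100.0)
print("node" + around_ql(50, 51.5, -0.1))       # node(around:50.0,51.5,-0.1)
```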

Transform only works on the top layer

To fix this: have the body of for loops recursively transform. All the other block-based ones transform into a for loop anyways, so no need to waste effort in other places.

Bug: Cannot get full source code when derived from annotation

When code is gotten from annotation, the last line is truncated from the source code. This means that the last line is not translated, potentially resulting in lost data. To avoid this, I would suggest adding out(Set()) to the end of your code. If this bug is ever fixed, that will result in no extra data anyways.

Support OverpassQL's "recurse" features

The syntax and transformations here are going to be difficult to decide on. For instance, do I use the specific functions? The operator? And how do these get tagged?

Way.nodes(<set>) feels reasonable, but so does <set>.get_children_nodes()

Support while blocks

I'm mostly making this so I can point to it later.

As near as I can tell, while will be impossible to fully implement in this language. It requires two things:

  1. The ability to emulate a conditional branch (done)
  2. The ability to EITHER go back to a different point in code OR loop indefinitely and escape

That second point looks impossible here. I would be happy to be shown wrong though.

Does not support key regex

Pretty sure it does on normal

from overpassify import overpassify

@overpassify
def query():
    search = Area(3600134503)
    ways = Way(search, highway=..., **{Regex('a'): Regex('b')})
    nodes = Node(search)
    out(ways, geom=True, count=True)
    out(nodes, geom=True, count=True)
    noop()

Add support for foreach loops

This one will be a bit more difficult. That said, the syntax transformation isn't too hard. If Python's can be considered:

for *item* in *set*:
    *body*

Then OverpassQL is

foreach.set->.item(
    body
);
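
The syntax transformation itself can be sketched in a few lines. foreach_ql is a hypothetical helper that takes an already translated body; handling break and continue is out of scope here.

```python
def foreach_ql(set_name, item_name, body_lines):
    """Render `for item in set: body` as an OverpassQL foreach block (hypothetical helper)."""
    body = "\n".join("    " + line for line in body_lines)
    return "foreach.{}->.{}(\n{}\n);".format(set_name, item_name, body)


print(foreach_ql("ways", "way", [".way out geom;"]))
```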

If statements are now broken

from overpassify import overpassify
def test():
    search = Area(3600134503) + Area(3600134502)
    relevant = Node(search, highway=...)
    for item in relevant:
        out(item, count=True)
        if 4:
            break

print(overpassify(test))

This produces incorrect results. The name of the break variable in the if statement ends up being different than the name of the break variable above it.

Long term: Support user-defined functions

This would depend heavily on #15. But in a world where that is implemented, you could essentially pretend that these functions are macros. After all, the result gets stored in a temporary variable anyways.
