htenkanen / pyrosm

Read OpenStreetMap data from Protobuf files into GeoDataFrame with Python, faster.

Home Page: https://pyrosm.readthedocs.io/en/latest/

License: MIT License

Python 72.92% Batchfile 0.21% Cython 26.77% Makefile 0.10%
openstreetmap geopandas python

pyrosm's Introduction

Pyrosm

Pyrosm is a Python library for reading OpenStreetMap data from Protocolbuffer Binary Format files (*.osm.pbf) into Geopandas GeoDataFrames. Pyrosm makes it easy to extract various datasets from OpenStreetMap PBF dumps, including road networks, buildings, Points of Interest (POI), landuse and natural elements. Fully customized queries are also supported, which makes it possible to parse OSM data with more specific filters.

Pyrosm is easy to use and provides a user interface somewhat similar to that of OSMnx. The main difference is that OSMnx reads the data over the internet using the Overpass API, whereas pyrosm reads the data from local OSM data dumps that can be downloaded e.g. from Geofabrik's website. This makes reading faster, which allows e.g. parsing the street network of a whole country fairly efficiently (however, see caveats).

The library has been developed with performance in mind; hence, it is mainly written in Cython (Python with C-like performance), which makes it fast to parse OpenStreetMap data from PBF files. Pyrosm is built on top of another Cython library called Pyrobuf, which is a faster Cython alternative to Google's version with C++ backend. Google's Protocol Buffers is a commonly used and efficient method to serialize and compress structured data, and OpenStreetMap contributors also use it to distribute the OSM data in PBF format (Protocolbuffer Binary Format).

Documentation is available at https://pyrosm.readthedocs.io.

Current features

  • download PBF data easily from hundreds of locations across the world
  • read street networks (separately for driving, cycling, walking and all-combined)
  • read buildings from PBF
  • read Points of Interest (POI) from PBF
  • read landuse from PBF
  • read "natural" from PBF
  • read boundaries from PBF (+ allow searching by name)
  • read any other data from PBF by using a custom user-defined filter
  • filter data based on bounding box
  • export networks as a directed graph to igraph, networkx and pandana

Install

Pyrosm is distributed via PyPI and conda-forge.

The recommended way to install pyrosm is with the conda package manager:

$ conda install -c conda-forge pyrosm

You can also install the package with pip:

$ pip install pyrosm

Troubleshooting

Notice that pyrosm requires geopandas to work. On Linux and Mac, installing geopandas with pip should work without problems, and this is handled automatically when installing pyrosm.

However, on Windows installing geopandas with pip is likely to cause issues; hence, it is recommended to install Geopandas before installing pyrosm. See the instructions on the Geopandas website.

When should I use Pyrosm?

Pyrosm can of course be used whenever you need to parse data from OSM into geopandas GeoDataFrames. However, pyrosm is best suited for situations where you want to fetch data for a whole city or a larger region (even a whole country).

If you are interested in fetching OSM data for smaller areas such as neighborhoods, or in searching data around a specific location/address, we recommend using OSMnx, which is more flexible in terms of specifying the area of interest. That being said, it is also possible to extract neighborhood-level information with pyrosm and filter data based on a bounding box (see docs).

How to use?

Using pyrosm is straightforward. See the docs for instructions on how to use the library.

Get in touch + contributions

If you find a bug in the tool, have a question, or would like to suggest a new feature, you can open a new issue here.

We warmly welcome contributions that make pyrosm better. If you are interested in contributing to the library, please check the contribution guidelines.

Development

You can install a local development version of the tool by 1) installing the necessary packages with conda and 2) building pyrosm from source:

  1. create the conda environment for Python 3.12 (the environment is named test by default, which you might want to change):

    • $ conda env create -f ci/312-conda.yaml
  2. build the pyrosm development version from master (activate the environment first):

    • $ pip install -e .

You can run tests with pytest by executing:

$ pytest . -v

License and copyright

Pyrosm is licensed under MIT (see license).

The OSM data is downloaded from two sources:

Data © Geofabrik GmbH, BBBike and OpenStreetMap Contributors

All data from the OpenStreetMap is licensed under the OpenStreetMap License.

Caveats

Filtering large files by bounding box

Although pyrosm makes it possible to filter even large data files by bounding box, doing so can slow down reading significantly (1.5-3x longer) because of the lookups needed while parsing the data. This might not be an issue with smaller files (up to ~100MB), but with larger data dumps reading can take longer than necessary.

Hence, the recommended approach with large data files is to first filter the protobuf file by bounding box into a smaller subset using a dedicated open source Java tool called Osmosis, which is available for all operating systems. Detailed installation instructions are here, and instructions on how to filter data based on a bounding box are here.
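The workflow above can be sketched in Python as follows. This is only an illustration: the helper names, file names and coordinates are made up for the example, the 100 MB threshold is the rule of thumb mentioned above rather than a hard limit, and the command is built but not executed. The Osmosis tasks used (--read-pbf, --bounding-box, --write-pbf) should be checked against the Osmosis documentation for your installed version.

```python
import os

# Rule-of-thumb threshold from the text above (~100MB); an assumption, not a hard limit.
SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024


def should_precrop(pbf_path, threshold=SIZE_THRESHOLD_BYTES):
    """Return True when the file is large enough that pre-cropping may pay off."""
    return os.path.getsize(pbf_path) > threshold


def osmosis_bbox_command(src, dst, left, bottom, right, top):
    """Build (but do not run) an Osmosis command that crops src to a bounding box."""
    return [
        "osmosis",
        "--read-pbf", f"file={src}",
        "--bounding-box",
        f"left={left}", f"bottom={bottom}", f"right={right}", f"top={top}",
        "--write-pbf", f"file={dst}",
    ]


# Hypothetical file names and coordinates, for illustration only.
cmd = osmosis_bbox_command(
    "finland-latest.osm.pbf", "helsinki.osm.pbf",
    left=24.8, bottom=60.1, right=25.3, top=60.3,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once Osmosis is installed, and the cropped file can then be opened with pyrosm without a bounding box.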

pyrosm's People

Contributors

christophfink, htenkanen, leonlowitzki, realead

pyrosm's Issues

Feature request: Regular expressions for use within filters, read complete osm.pbf file

I wanted to extract all addr:-related objects from an osm.pbf file and tried several ways. Excluding doesn't seem to work, as an empty gdf is returned. Constructing a full gdf with all data was not easy either.

So I thought it would be nice to be able to use regular expressions in filters in general,
and being able to read a complete osm.pbf file into a GeoDataFrame would be very valuable as well.
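To make the request concrete, here is a minimal pure-Python sketch of what regex-based tag filtering could look like. It does not use pyrosm; the element records and the filter_by_tag_regex helper are hypothetical and only model the idea of matching tag keys such as addr:street against a pattern.

```python
import re


def filter_by_tag_regex(elements, pattern):
    """Keep elements that have at least one tag key matching the pattern."""
    key_re = re.compile(pattern)
    return [
        el for el in elements
        if any(key_re.search(key) for key in el["tags"])
    ]


# Hypothetical, hand-made records standing in for parsed OSM elements.
elements = [
    {"id": 1, "tags": {"addr:street": "Mannerheimintie", "addr:housenumber": "1"}},
    {"id": 2, "tags": {"highway": "residential"}},
    {"id": 3, "tags": {"building": "yes", "addr:city": "Helsinki"}},
]

with_address = filter_by_tag_regex(elements, r"^addr:")
print([el["id"] for el in with_address])  # [1, 3]
```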

ENH: Use pygeos get_parts() to explode multigeometries

Currently, when exporting the street network into graph edges, the ways (multigeometries) need to be exploded so that network connectivity works when routing.

Exploding the geometries using gdf.explode() is time consuming, but pygeos v0.9.0 will have a get_parts() function that significantly improves the efficiency of turning multigeometries into single geometries. Start using this when available.

Influence: Improved speed when parsing nodes and edges using osm.get_network(nodes=True)

polygon in filtered ways

My question is why sometimes there are polygon geometries in the filtered ways (lines), and sometimes not (I tried different values from the custom_filter and also different places)? What are those polygon geometries?

from pyrosm import OSM, get_data
import time

my_place = 'helsinki'
fp = get_data(my_place)
osm = OSM(fp)

# 1: Using filter_type='keep', keep only 'river' and 'stream' in waterway
waterways_keep = ['river', 'stream']

my_rivers_streams = osm.get_data_by_custom_criteria(custom_filter={
    'waterway': waterways_keep},
    filter_type='keep',
    keep_nodes=False,
    keep_ways=True,
    keep_relations=True)

my_rivers_streams['geom_type'] = my_rivers_streams.geometry.geom_type
print(my_rivers_streams['geom_type'].unique())
# out: ['LineString' 'Polygon']

# 2: Using filter_type='keep', keep only 'river' in waterway
waterways_keep = ['river']

my_rivers = osm.get_data_by_custom_criteria(custom_filter={
    'waterway': waterways_keep},
    filter_type='keep',
    keep_nodes=False,
    keep_ways=True,
    keep_relations=True)

my_rivers['geom_type'] = my_rivers.geometry.geom_type
print(my_rivers['geom_type'].unique())
# out: ['LineString']

Error when retrieving data area restricted by bounding box

Hi,
I was trying to extract some data within a bounding box given as a Polygon, but ran into an issue. You can see the traceback below.
I also duplicated the code from the docs and used a shapely geometry taken directly from a GeoDataFrame of an existing boundary in a pbf file, but got the same result. As I used the code directly from the docs, I have to assume that this is a bug in the library.

Traceback
This shows the traceback for getting pois, but it looks the same when getting other information (like in the docs code).

Traceback (most recent call last):
  File "/home/XXX/sandbox/osm/main.py", line 62, in <module>
    run_osm()
  File "/home/XXX/sandbox/osm/main.py", line 57, in run_osm
    pois = osm.get_pois(custom_filter=custom_filter)
  File "/home/XXX/sandbox/osm/venv/lib/python3.8/site-packages/pyrosm/pyrosm.py", line 492, in get_pois
    self._read_pbf()
  File "/home/XXX/sandbox/osm/venv/lib/python3.8/site-packages/pyrosm/pyrosm.py", line 85, in _read_pbf
    nodes, ways, relations, way_tags = parse_osm_data(self.filepath,
  File "pyrosm/pbfreader.pyx", line 310, in pyrosm.pbfreader.parse_osm_data
  File "pyrosm/pbfreader.pyx", line 312, in pyrosm.pbfreader.parse_osm_data
  File "pyrosm/pbfreader.pyx", line 327, in pyrosm.pbfreader._parse_osm_data
  File "pyrosm/pbfreader.pyx", line 124, in pyrosm.pbfreader.parse_dense
IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 3200446

Using Python 3.8.5 on Manjaro.
See all dependencies below:

  • Installing click (7.1.2)
  • Installing attrs (20.2.0)
  • Installing click-plugins (1.1.1)
  • Installing cligj (0.5.0)
  • Installing markupsafe (1.1.1)
  • Installing munch (2.5.0)
  • Installing numpy (1.19.2)
  • Installing python-dateutil (2.8.1)
  • Installing pytz (2020.1)
  • Installing cython (0.29.21)
  • Installing fiona (1.8.17)
  • Installing jinja2 (2.11.2)
  • Installing pandas (1.1.3)
  • Installing pyproj (2.6.1.post1)
  • Installing shapely (1.7.1)
  • Installing cykhash (1.0.2)
  • Installing geopandas (0.8.1)
  • Installing pygeos (0.8)
  • Installing pyrobuf (0.9.3)
  • Installing python-rapidjson (0.9.1)
  • Installing pyrosm (0.5.3)

Code snippet:

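    from pyrosm import OSM, get_data

    fp = get_data("Berlin")
    osm = OSM(fp)
    # Look up the boundary by name and use its geometry as the bounding box
    bounding_box = osm.get_boundaries(name="Friedrichshain")
    bbox_geom = bounding_box['geometry'].values[0]
    print(bbox_geom)
    # Re-open the same file, restricted to the boundary geometry
    osm = OSM(fp, bounding_box=bbox_geom)
    custom_filter = {'amenity': ["bar"]}
    pois = osm.get_pois(custom_filter=custom_filter)
    pois.to_csv("export.csv", sep='\t')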
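For readers unfamiliar with the error message in the traceback, the following NumPy-only sketch reproduces the same kind of IndexError. It illustrates what the message means (a boolean mask whose length differs from the array it indexes); it is not a diagnosis of what happens inside pyrosm, which this report does not establish.

```python
import numpy as np

values = np.array([], dtype=np.int64)  # empty array: axis 0 has size 0
mask = np.ones(3, dtype=bool)          # mask built for 3 elements

try:
    values[mask]
except IndexError as err:
    # The mask length (3) does not match the array length (0).
    print(f"IndexError: {err}")

# A mask derived from the array itself always has a matching length.
safe_mask = values > 0
print(values[safe_mask].shape)
```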

ENH: Add possibility to optimize memory usage

Pyrosm parses quite a lot of attributes by default from the OSM data. Add possibility to optimize memory usage by keeping only the most necessary attributes in the results and dropping everything else out. This helps when working with large files.

The memory optimization should influence the behavior when parsing the PBF (i.e. not parsing less important attributes at all) and when parsing the tags (keeping only the most relevant ones).

Relates to #53

Add possibility to crop PBF and save to disk

For processing very large pbf files, it is better to let the user first crop the pbf into a smaller chunk and save that to disk (i.e. mimic what can be done with Osmosis). This should also be relatively fast with Cython.

pygeos geometry arrays

I was reading the documentation here: https://pyrosm.readthedocs.io/en/latest/benchmarking.html, and specifically the comment on memory usage

The most memory consuming part currently is constructing Shapely geometries into GeoDataFrame. There might be improvements coming on this once Geopandas starts to support Pygeos geometry arrays.

I'm interested in this, as I'm hitting memory limits with some large files. Do you have any further information on the support for this in Geopandas, and/or the plans for support in pyrosm?

I could possibly help contribute if there is a need.

Custom filter fails when Tag doesn't exist in dataset

File "C:\Users\X\AppData\Roaming\Python\Python36\site-packages\pyrosm\pyrosm.py", line 497, in get_pois
    tags_as_columns += getattr(self.conf.tags, k)
AttributeError: 'Tags' object has no attribute 'park_ride'

I'm using a custom filter to get POI info. With a basic filter as shown in the documentation, things work fine. The error above occurs when I use a more detailed filter that includes some rarer tags.

Is there a way to ignore a tag that appears in the custom filter but isn't in the underlying dataset? Or is there a simple way for me to look at the tags in the loaded dataset and then remove from the custom filter those that aren't among the loaded tags?
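One possible workaround, sketched under assumptions: the traceback shows that the failure comes from getattr(self.conf.tags, k), so a filter could be pre-screened with hasattr before it is passed on. The Tags class below is a hypothetical stand-in for the object behind osm.conf.tags, and split_known_keys is an illustrative helper, not part of pyrosm.

```python
class Tags:
    """Hypothetical stand-in for the object behind ``osm.conf.tags``."""
    amenity = ["amenity", "name", "opening_hours"]
    shop = ["shop", "name"]


def split_known_keys(custom_filter, tags):
    """Separate filter keys that exist as attributes on ``tags`` from those that do not."""
    known = {k: v for k, v in custom_filter.items() if hasattr(tags, k)}
    unknown = sorted(k for k in custom_filter if k not in known)
    return known, unknown


custom_filter = {"amenity": True, "shop": True, "park_ride": ["yes"]}
known, unknown = split_known_keys(custom_filter, Tags())
print(known)    # {'amenity': True, 'shop': True}
print(unknown)  # ['park_ride']
```

Note that dropping a key also drops that condition from the query, so the list of unknown keys is worth logging rather than silently discarding.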

Data extraction is not working for city "marseille"

The city is listed in cities, but trying to get data gives an error

~/anaconda3/lib/python3.7/site-packages/pyrosm/data/__init__.py in get_data(dataset, update, directory)
133
134 elif dataset in sources._all_sources:
--> 135 return retrieve(search_source(dataset),
136 update, directory)
137

~/anaconda3/lib/python3.7/site-packages/pyrosm/data/__init__.py in search_source(name)
99 if name in available2:
100 return sources.subregions.__dict__[subregion].__dict__[name]
--> 101 raise ValueError(f"Could not retrieve url for '{name}'.")
102
103

ValueError: Could not retrieve url for 'marseille'

Fix numpy deprecation warning

Numpy has started to throw a deprecation warning:
/home/hentenka/miniconda3/envs/pyrosm/lib/python3.8/site-packages/pyrosm/buildings.py:21: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated.

Go through the code-base and ensure ragged nested arrays are constructed correctly with dtype=object, more info: https://stackoverflow.com/questions/63097829/debugging-numpy-visibledeprecationwarning-ndarray-from-ragged-nested-sequences
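A minimal sketch of one way to build such arrays explicitly, assuming only NumPy. The helper name to_object_array is illustrative and does not come from the pyrosm code base. Calling np.array(ragged, dtype=object) also works for ragged input, but it yields a 2-D array when all sequences happen to have the same length; pre-allocating the array avoids that ambiguity.

```python
import numpy as np


def to_object_array(sequences):
    """Pack sequences of possibly different lengths into a 1-D object array."""
    out = np.empty(len(sequences), dtype=object)
    for i, seq in enumerate(sequences):
        out[i] = seq  # each element stores one sequence as a Python object
    return out


ragged = [[1, 2, 3], [4, 5], [6]]
arr = to_object_array(ragged)
print(arr.shape, arr.dtype)  # (3,) object
```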

How to keep all components, not just the largest strongly connected one

The documentation states:

By default, Pyrosm will only keep connected edges in the output graph (largest strongly connected component). This means that all “isolated islands” of the network will be filtered out because those cannot be reached from other parts of the network (you can also disable this behavior, see below).

I looked around, but I couldn't find the parameter to control this behavior.

The OSM data for my area is terribly dirty and incomplete; among the many problems are many disconnected walking elements (such as pedestrian bridges, tunnels, and parks). I plan to test some methods to automatically connect these elements to the main network, but to do that I need to keep them.

My guess is that the feature is supported, and perhaps even documented somewhere, but I just couldn't find it.
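To illustrate what keeping only the "largest strongly connected component" does to a network, here is a small pure-Python sketch using Kosaraju's algorithm on a made-up directed graph. It does not use pyrosm and is not how pyrosm implements the filtering; the node names and edges are invented for the example.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph (Kosaraju)."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # First pass: record nodes in order of DFS completion.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: explore the reversed graph in reverse completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Main network A-B-C (two-way), an isolated footbridge X-Y, and a one-way dead end C->D.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
         ("X", "Y"), ("Y", "X"), ("C", "D")]
components = strongly_connected_components(edges)
largest = max(components, key=len)
dropped = set().union(*components) - largest
print(sorted(largest))  # ['A', 'B', 'C']
print(sorted(dropped))  # ['D', 'X', 'Y']
```

In this toy graph the footbridge X-Y and the dead end D are exactly the kind of "isolated islands" that would be removed, which is why disabling the filter matters for the use case described above.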

Install fails

I can't install pyrosm:

When I pip install pyrosm, I get a very long stack trace that ends like this:

 pyrosm/data_filter.pyx:186:11: 'Int64Set_from_buffer' is not a constant, variable or function identifier
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-d8e3udyj/pyrosm/setup.py", line 107, in <module>
        compiler_directives={'language_level': "3",
      File "/home/nlehuby/.local/share/virtualenvs/python-JOH7DBZn/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
        cythonize_one(*args)
      File "/home/nlehuby/.local/share/virtualenvs/python-JOH7DBZn/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
        raise CompileError(None, pyx_file)
    Cython.Compiler.Errors.CompileError: pyrosm/data_filter.pyx
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I'm on ubuntu, and here is my config:

  • python --version: Python 3.7.7
  • pip --version: pip 20.1 from /home/nlehuby/.local/share/virtualenvs/python-JOH7DBZn/lib/python3.7/site-packages/pip (python 3.7)

It collects pyrosm-0.5.2.tar.gz

BUG: Can't apply custom filter to bounded data

As the title says, I get an error when I try to apply a custom filter to bounded data.

After the installation fiasco, I was finally able to use this package to isolate one "Ku" from the Kanto PBF file (using a Shapely geometry object from another dataset as a "bounding_box" [sic] because I couldn't get it from the name "Shibuya" like I can via the Overpass API).

That worked, and I was able to get a geoPandas dataframe of the walking network in Shinjuku (although I don't know exactly what kinds of "highways" are included/excluded in the "walking network"). Unfortunately, what I really need there is a NetworkX graph of the network, and that can't be built from the geoDF, so I'll need to figure out a different way to process the bounded data into a NetworkX graph.

Anyway, where the geoDF would be great is for all the "shops" in my area. So I did basically a copy/paste of the example in the documentation, and it crashed:

thisAreaData = pyrosm.OSM('../Data/OSMData/kanto-latest.osm.pbf', bounding_box=boundingPolygon)
allStoreFilter = {"shop": True, "tourism": True, "amenity": True, "leisure": True}
pois = thisAreaData.get_pois(custom_filter=allStoreFilter)

throws

File "G:/My Drive/Codebase/ExtractingOSMData_v01.py", line 217, in <module>
   pois = thisAreaData.get_pois(custom_filter=allStoreFilter)

 File "C:\WinPython\python-3.6.5.amd64\lib\site-packages\pyrosm\pyrosm.py", line 540, in get_pois
   self.bounding_box)

 File "C:\WinPython\python-3.6.5.amd64\lib\site-packages\pyrosm\pois.py", line 32, in get_poi_data
   tags_as_columns, bounding_box)

 File "pyrosm\frames.pyx", line 134, in pyrosm.frames.prepare_geodataframe

 File "pyrosm\frames.pyx", line 178, in pyrosm.frames.prepare_geodataframe

 File "C:\WinPython\python-3.6.5.amd64\lib\site-packages\geopandas\tools\sjoin.py", line 92, in sjoin
   l_idx, r_idx = sindex.query_bulk(input_geoms, predicate=predicate, sort=False)

 File "C:\WinPython\python-3.6.5.amd64\lib\site-packages\geopandas\sindex.py", line 410, in query_bulk
   res = super().query_bulk(geometry, predicate)

 File "C:\WinPython\python-3.6.5.amd64\lib\site-packages\pygeos\strtree.py", line 188, in query_bulk
   return self._tree.query_bulk(geometry, predicate)

TypeError: One of the arguments is of incorrect type. Please provide only Geometry objects.

The only thing that is different is that instead of using "get_data" for a city (which I couldn't get to work), I have a bounded area of the full Kanto dataset stored locally. So, is this operation not supported?

Integrate khash function into pyrosm natively

Currently pyrosm relies on using cykhash for the fast lookups. However, we only use two functions from cykhash, and as that library is not available on PyPi nor conda-forge, it would be better to integrate the cykhash.khashsets.Int64Set_from_buffer and cykhash.isin_int64 functions directly to pyrosm to avoid the dependency.
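For context, the kind of lookup these functions provide is a membership test of many integer ids against a set of wanted ids. The sketch below shows the same idea with NumPy's np.isin as a stand-in. It is not the cykhash API and makes no claim about relative performance; the id values are invented.

```python
import numpy as np

node_ids = np.array([10, 11, 12, 13, 14, 15], dtype=np.int64)
wanted_ids = np.array([11, 14, 99], dtype=np.int64)

# Boolean mask: True where a node id is part of the wanted set.
mask = np.isin(node_ids, wanted_ids)
print(node_ids[mask])  # [11 14]
```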

Release on conda-forge

To make installations more stable on all operating systems, publish the tool on conda-forge

Use Pygeos Geometry arrays if Geopandas 0.8.0 is installed

The latest Geopandas supports Pygeos geometry arrays. Hence, if the user has geopandas 0.8.0 installed, there is no need to convert geometries from pygeos to shapely. Using Geometry Arrays (Pygeos) probably makes pyrosm faster and more memory efficient as well.

We need a flag that tells whether the user has a Geopandas older than 0.8.0 installed. In such cases, geometries still need to be converted to shapely.
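A minimal sketch of such a flag, assuming Python 3.8+ for importlib.metadata. The function names are illustrative and not taken from pyrosm. The version parsing is deliberately simple: it keeps only the leading integers of each component, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.8.1' or '2.6.1.post1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '0.8' compares equal to '0.8.0'
        parts.append(0)
    return tuple(parts)


def supports_pygeos_arrays(installed, minimum="0.8.0"):
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def geopandas_flag():
    """Check the installed Geopandas version; False when it is not installed."""
    try:
        return supports_pygeos_arrays(metadata.version("geopandas"))
    except metadata.PackageNotFoundError:
        return False


print(supports_pygeos_arrays("0.7.0"), supports_pygeos_arrays("0.8.1"))  # False True
```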

Cannot install pyrosm

I've been trying to install pyrosm on Windows 10 with Python 3.6 because it looks potentially useful (although inadequately documented... like, which elements are included in the "walking" network?) for isolating parts of OSM data within a polygon before converting the remaining "highways" into a NetworkX graph.

However, when I run pip install pyrosm, I cannot install the package because I get the infamous "A GDAL API version must be specified" error.

Collecting fiona
Using cached Fiona-1.8.17.tar.gz (1.3 MB)
ERROR: Command errored out with exit status 1:
command: 'c:\winpython\python-3.6.5.amd64\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = ...
Complete output (1 lines):
A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

The thing is, I already have the newest version of GDAL and Fiona 1.8.17 installed, and I already added GDAL to my path variables, and I can already use geopandas, fiona, shapely, rtree, etc. normally.

I couldn't find a list of requirements and/or dependencies in the documentation aside from Py3.6, so maybe the real problem is something else.

I hope to be able to use this package, and so I appreciate any assistance in properly installing it.

Failed to install pyrosm on Python3.7 OSX

$ sudo python3.7 -m pip install pyrosm
WARNING: The directory '/home/fippo/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pyrosm
  Downloading pyrosm-0.5.3.tar.gz (2.0 MB)
     |████████████████████████████████| 2.0 MB 1.7 MB/s
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3.7 /usr/local/lib/python3.7/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-pa39_a85/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel Cython cykhash pyrobuf
       cwd: None
  Complete output (58 lines):
  WARNING: The directory '/home/fippo/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
  Collecting setuptools
    Downloading setuptools-50.3.2-py3-none-any.whl (785 kB)
  Collecting wheel
    Downloading wheel-0.35.1-py2.py3-none-any.whl (33 kB)
  Collecting Cython
    Downloading Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl (2.0 MB)
  Collecting cykhash
    Downloading cykhash-1.0.2.tar.gz (26 kB)
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
      Preparing wheel metadata: started
      Preparing wheel metadata: finished with status 'done'
  Collecting pyrobuf
    Downloading pyrobuf-0.9.3.tar.gz (258 kB)
      ERROR: Command errored out with exit status 1:
       command: /usr/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-4yhaa6ai/pyrobuf/setup.py'"'"'; __file__='"'"'/tmp/pip-install-4yhaa6ai/pyrobuf/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-4yhaa6ai/pyrobuf/pip-egg-info
           cwd: /tmp/pip-install-4yhaa6ai/pyrobuf/
      Complete output (33 lines):
      WARNING: The directory '/home/fippo/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
      rendering 'pyrobuf_list.pyx' from '/tmp/pip-install-4yhaa6ai/pyrobuf/pyrobuf/protobuf/templates/pyrobuf_list_pyx.tmpl'
      rendering 'pyrobuf_list.pxd' from '/tmp/pip-install-4yhaa6ai/pyrobuf/pyrobuf/protobuf/templates/pyrobuf_list_pxd.tmpl'
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/tmp/pip-install-4yhaa6ai/pyrobuf/setup.py", line 158, in <module>
          zip_safe=False,
        File "/usr/local/lib/python3.7/dist-packages/setuptools/__init__.py", line 144, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/tmp/pip-install-4yhaa6ai/pyrobuf/setup.py", line 72, in run_commands
          self.ext_modules.extend(self.pyrobufize_builtins())
        File "/tmp/pip-install-4yhaa6ai/pyrobuf/setup.py", line 135, in pyrobufize_builtins
          include_path=['pyrobuf/src'])
        File "/usr/lib/python3/dist-packages/Cython/Build/Dependencies.py", line 749, in cythonize
          ctx = c_options.create_context()
        File "/usr/lib/python3/dist-packages/Cython/Compiler/Main.py", line 577, in create_context
          self.cplus, self.language_level, options=self)
        File "/usr/lib/python3/dist-packages/Cython/Compiler/Main.py", line 75, in __init__
          from . import Builtin, CythonScope
        File "/usr/lib/python3/dist-packages/Cython/Compiler/CythonScope.py", line 5, in <module>
          from .UtilityCode import CythonUtilityCode
        File "/usr/lib/python3/dist-packages/Cython/Compiler/UtilityCode.py", line 3, in <module>
          from .TreeFragment import parse_from_strings, StringParseContext
        File "/usr/lib/python3/dist-packages/Cython/Compiler/TreeFragment.py", line 17, in <module>
          from .Visitor import VisitorTransform
        File "/usr/lib/python3/dist-packages/Cython/Compiler/Visitor.py", line 15, in <module>
          from . import ExprNodes
        File "/usr/lib/python3/dist-packages/Cython/Compiler/ExprNodes.py", line 2713
          await = None
                ^
      SyntaxError: invalid syntax
      ----------------------------------------
  ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  WARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.
  You should consider upgrading via the '/usr/bin/python3.7 -m pip install --upgrade pip' command.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3.7 /usr/local/lib/python3.7/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-pa39_a85/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel Cython cykhash pyrobuf Check the logs for full command output.
WARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.
You should consider upgrading via the '/usr/bin/python3.7 -m pip install --upgrade pip' command.
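The log ends with a SyntaxError on the line await = None inside Cython's ExprNodes.py. Since Python 3.7, await is a reserved keyword, so source code that uses it as a variable name can no longer be compiled. The paths in the traceback (/usr/lib/python3/dist-packages/Cython) suggest, as an inference from the log rather than a confirmed diagnosis, that an older system-wide Cython is being imported instead of the freshly downloaded one. The snippet below only demonstrates the keyword behaviour:

```python
import keyword
import sys

# On Python 3.7 and later this prints True for the keyword check.
print(sys.version_info[:2], keyword.iskeyword("await"))

try:
    compile("await = None", "<example>", "exec")
except SyntaxError as err:
    print(f"SyntaxError: {err.msg}")
```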

setup.py install of Cython and cykhash breaks in some contexts

Hey! Really neat package! Will love to be a part & contribute.

I've had an adventure to get the module to install with pipenv. First, the module was trying to install with python2.7 because setup.py is executed with /usr/bin/env python which for me (and on many systems) resolves to python2, and same with pip resolving to pip2. I've made a pull request to fix these issues #37 and make installation with python3 more explicit.

The install of Cython and cykhash is indeed a hacky way (as suggested by the comments in file) and I'm super keen to help find a way to make these dependencies resolve more organically. In particular, the explicit installing of these modules via pip doesn't play well with virtual environments/pipenv.

Any ideas on how to make the dependencies resolve in a more self-contained & reproducible manner?
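One small, commonly used safeguard when a script has to call pip itself is to invoke it through the interpreter that is currently running, so that python and pip cannot resolve to different installations. The helper below is a hypothetical sketch and is not taken from pyrosm's setup.py; it only builds the command and does not run it.

```python
import sys


def pip_install_command(*packages):
    """Build a pip command bound to the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


cmd = pip_install_command("Cython", "cykhash")
print(cmd)
# To actually run it: subprocess.check_call(cmd)
```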

QUESTION: custom filter Bug

I want to filter out all values from 'waterway' tag, except for 'river' and 'stream'.
Using filter_type='exclude' will get the desired values but also other values from 'waterway', which should be filtered out (if I don't misunderstand). And filter_type='keep' works fine in this case. Why?

from pyrosm import OSM, get_data
import time


fp = get_data('helsinki')
osm = OSM(fp)

# Show all tags that are converted into columns from waterway features
waterways = osm.conf.tags.waterway

# Using filter_type='exclude'
waterways_exclude = waterways
waterways_exclude.remove('river')
waterways_exclude.remove('stream')

# Parse lines and time it
start_time = time.time()
my_waterways = osm.get_data_by_custom_criteria(custom_filter={
    'waterway': waterways_exclude},
    filter_type='exclude',
    keep_nodes=False,
    keep_ways=True,
    keep_relations=True)
print(f"Parsing lines lasted {round(time.time() - start_time, 0)} seconds.")
print(f"Number of lines parsed: {len(my_waterways)}")

# Count the number of each waterway type
type_waterways = my_waterways.groupby('waterway')
print(type_waterways.count())

# Parsing lines lasted 13.0 seconds.
# Number of lines parsed: 3775
#                 id  timestamp  version  tags  geometry  osm_type  changeset
# waterway                                                                   
# brook           72         72       72    30        72        72          0
# construction     6          6        6     6         6         6          0
# fish_pass        1          1        1     0         1         1          0
# flood_path       1          1        1     0         1         1          0
# proposed         1          1        1     1         1         1          0
# river           39         39       39    37        39        39          1
# ster             1          1        1     0         1         1          0
# stream        3654       3654     3654  2195      3654      3654          2

# Using filter_type='keep'
waterways_keep = ['river', 'stream']

start_time = time.time()
my_waterways = osm.get_data_by_custom_criteria(custom_filter={
    'waterway': waterways_keep},
    filter_type='keep',
    keep_nodes=False,
    keep_ways=True,
    keep_relations=True)
print(f"Parsing lines lasted {round(time.time() - start_time, 0)} seconds.")
print(f"Number of lines parsed: {len(my_waterways)}")

type_waterways = my_waterways.groupby('waterway')
print(type_waterways.count())

# Parsing lines lasted 1.0 seconds.
# Number of lines parsed: 3693
#             id  timestamp  version  tags  geometry  osm_type  changeset
# waterway                                                               
# river       39         39       39    37        39        39          1
# stream    3654       3654     3654  2195      3654      3654          2
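One reading of the two outputs above, not confirmed in this thread, is that an exclude filter only removes the values it names, so any value missing from the list (brook, construction and so on) survives. The code comment above describes osm.conf.tags.waterway as the tags that are converted into columns, which would explain why it does not list every waterway value. The toy model below illustrates that behaviour in plain Python; apply_filter is a hypothetical helper and not pyrosm's implementation.

```python
def apply_filter(elements, key, values, filter_type):
    """Toy model of a value filter on a single tag key."""
    if filter_type == "keep":
        return [el for el in elements if el.get(key) in values]
    if filter_type == "exclude":
        return [el for el in elements if key in el and el[key] not in values]
    raise ValueError(f"unknown filter_type: {filter_type!r}")


elements = [{"waterway": v} for v in ["river", "stream", "brook", "canal", "ditch"]]

# Hypothetical list of "known" values; note that "brook" is missing from it.
listed = ["canal", "ditch", "river", "stream"]

# Build a new list instead of calling .remove() on the original, which would mutate it.
to_exclude = [v for v in listed if v not in ("river", "stream")]

excluded = apply_filter(elements, "waterway", to_exclude, "exclude")
kept = apply_filter(elements, "waterway", ["river", "stream"], "keep")

print([el["waterway"] for el in excluded])  # ['river', 'stream', 'brook']
print([el["waterway"] for el in kept])      # ['river', 'stream']
```

The sketch also builds a fresh exclude list; in the snippet above, waterways_exclude = waterways makes both names refer to the same list, so the .remove() calls also change osm.conf.tags.waterway.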

Relational Filter

Hi Henrikki,
Thanks for making this useful Python library available :)
In addition to the functionality the library offers, I am looking for a way to filter OSM data relationally,
e.g. all playgrounds inside parks.
Using the Overpass API I would do it like this:

[bbox:{{bbox}}][timeout:1800];

way["leisure"="park"];map_to_area->.a;
( node(area.a)[leisure=playground];
  way(area.a)[leisure=playground];
 );
foreach(
  (._;>;);
  is_in;
  way(pivot)["leisure"="park"];
  out geom;
);

In other words, is there a way to filter OSM features by their distance to other OSM features?

Many thanks,

Sébastien
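One way to approach this kind of relational filter outside the parser is a spatial join in GeoPandas. The sketch below is not pyrosm functionality: `features_within` is a hypothetical helper, and the toy GeoDataFrames stand in for what two separate `osm.get_data_by_custom_criteria()` calls (one for `leisure=park`, one for `leisure=playground`) would return.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon


def features_within(features, containers):
    """Return the rows of `features` whose geometry lies within any row of `containers`."""
    joined = gpd.sjoin(
        features, containers[["geometry"]], how="inner", predicate="within"
    )
    # A feature inside two overlapping containers is matched twice; keep it once.
    return features.loc[joined.index.unique()]


# Toy stand-ins for the parsed parks and playgrounds (assumed data).
parks = gpd.GeoDataFrame(
    {"leisure": ["park"]},
    geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])],
    crs="EPSG:4326",
)
playgrounds = gpd.GeoDataFrame(
    {"leisure": ["playground", "playground"]},
    geometry=[Point(5, 5), Point(20, 20)],
    crs="EPSG:4326",
)

playgrounds_in_parks = features_within(playgrounds, parks)
print(len(playgrounds_in_parks))
```

For a distance-based rather than containment-based filter, `geopandas.sjoin_nearest` with its `max_distance` argument is an option, provided both layers are first reprojected to a metric CRS.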

Feature request: optionally, return a node geodataframe from osm.get_network()

Thank you very much for the package. Downloading first from Geofabrik and then parsing a network is much faster than doing it in osmnx.

However, a common use case for pyrosm is to extract a network and then analyze it in osmnx. That requires constructing a graph from GeoDataFrames with osmnx.graph_from_gdfs().

Signature: osmnx.graph_from_gdfs(gdf_nodes, gdf_edges, graph_attrs=None)

Parameters
----------
gdf_nodes : geopandas.GeoDataFrame
    GeoDataFrame of graph nodes
gdf_edges : geopandas.GeoDataFrame
    GeoDataFrame of graph edges, must have crs attribute set
graph_attrs : dict
    the new G.graph attribute dict; if None, add crs as the only
    graph-level attribute

Returns
-------
G : networkx.MultiDiGraph

Currently, osm.get_network() only returns an edge gdf. Is it possible to also return a node gdf? Thanks.
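Until a node GeoDataFrame is available from the library itself, a rough workaround is to derive nodes from the edge endpoints. The sketch below is an assumption-laden illustration, not pyrosm or osmnx code: `nodes_from_edges` is a hypothetical helper, the node ids are synthetic integers rather than OSM ids, and every edge geometry is assumed to be a single LineString. Only endpoints become nodes, so ways that meet mid-line are not connected. If the installed pyrosm version documents a nodes option on get_network(), that is preferable.

```python
import geopandas as gpd
from shapely.geometry import LineString


def nodes_from_edges(edges):
    """Derive a node GeoDataFrame from the endpoints of LineString edges."""
    node_ids = {}  # (x, y) -> synthetic integer id, in order of first appearance
    u, v = [], []
    for line in edges.geometry:
        start, end = line.coords[0][:2], line.coords[-1][:2]
        u.append(node_ids.setdefault(start, len(node_ids)))
        v.append(node_ids.setdefault(end, len(node_ids)))

    edges_uv = edges.copy()
    edges_uv["u"], edges_uv["v"] = u, v

    xs = [xy[0] for xy in node_ids]
    ys = [xy[1] for xy in node_ids]
    # The default RangeIndex 0..n-1 coincides with the synthetic ids above.
    nodes = gpd.GeoDataFrame(
        {"x": xs, "y": ys}, geometry=gpd.points_from_xy(xs, ys), crs=edges.crs
    )
    nodes.index.name = "node_id"
    return nodes, edges_uv


# Toy edges sharing one endpoint (assumed data, not parsed from a PBF file).
edges = gpd.GeoDataFrame(
    {"highway": ["residential", "residential"]},
    geometry=[LineString([(0, 0), (1, 0)]), LineString([(1, 0), (1, 1)])],
    crs="EPSG:4326",
)
nodes, edges_uv = nodes_from_edges(edges)
print(nodes)
print(edges_uv[["u", "v"]])
```

Recent osmnx versions additionally expect the edge frame to be indexed by (u, v, key), so some reshaping would still be needed before calling osmnx.graph_from_gdfs().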

Filters on highway are "and", others are "or"?

Thanks for this great project.

The boolean logic of filters seems to be an "or" across keys, except for "highway":
{
"cycleway": ["yes", "track", "opposite_track", "share_busway", "shared", "lane", "opposite_lane", "opposite", "shared_lane", "opposite_share_busway"],
"cycleway:left": ["lane", "yes"],
"cycleway:right": ["lane", "yes"],
"cycleway:both": ["lane", "yes"],
"bicycle": ["designated", "shared", "yes"],
}

gives 11410 rows on my dataset provence-alpes-cote-d-azur-latest.osm.pbf,
while
{
"cycleway": ["yes", "track", "opposite_track", "share_busway", "shared", "lane", "opposite_lane", "opposite", "shared_lane", "opposite_share_busway"],
"cycleway:left": ["lane", "yes"],
"cycleway:right": ["lane", "yes"],
"cycleway:both": ["lane", "yes"],
"highway": ["cycleway"],
"bicycle": ["designated", "shared", "yes"],
}
gives 1825 rows.

Is "highway" treated as an "and" during filtering?

If so, could that be made optional, or at least documented?
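The two semantics in question can be made concrete with a small pure-Python sketch. It does not reproduce pyrosm's internals: both functions and the sample tag dictionaries are hypothetical. It only shows that under a pure "or", adding a key can never reduce the number of matches, so a drop like the one reported is consistent with "highway" acting as an "and".

```python
def matches_any(tags, criteria):
    """OR semantics: any key in `criteria` carries one of its listed values."""
    return any(tags.get(key) in values for key, values in criteria.items())


def matches_required_and_any(tags, criteria, required_key):
    """AND on `required_key`, OR across all remaining keys."""
    others = {k: v for k, v in criteria.items() if k != required_key}
    return tags.get(required_key) in criteria[required_key] and matches_any(tags, others)


# Hypothetical tag sets for four ways.
ways = [
    {"highway": "cycleway"},
    {"highway": "residential", "cycleway": "lane"},
    {"highway": "cycleway", "bicycle": "designated"},
    {"highway": "primary"},
]

without_highway = {"cycleway": ["lane", "track"], "bicycle": ["designated", "yes"]}
with_highway = {**without_highway, "highway": ["cycleway"]}

or_without = sum(matches_any(w, without_highway) for w in ways)
or_with = sum(matches_any(w, with_highway) for w in ways)
and_with = sum(matches_required_and_any(w, with_highway, "highway") for w in ways)

print(or_without, or_with, and_with)
```

If "or" behaviour across all keys is what is wanted, one workaround is to run the filter without "highway" and then select rows in pandas afterwards.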

Error: "ValueError: Provide at least 4 coordinates to create a linearring."

Description

I am trying to use pyrosm to parse my OSM data. However, I am unable to run two important methods, namely "get_buildings()" and "get_landuse()". Both return a similar error, attached below. Please advise if something might be wrong at my end, or treat this as a bug report if it's something internal.

Code:

from pyrosm import OSM
from pyrosm import get_data
fp = get_data("test_pbf")
osm = OSM(fp)
buildings = osm.get_buildings()
buildings.plot()

Error:

/home/saifullah/anaconda3/envs/ox/bin/python /home/saifullah/Saif/FYP/Projects/covid19-preprocess/src/Parser-test.py
/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pyrosm/buildings.py:21: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
nodes, ways, relation_ways, relations = get_osm_data(node_arrays=None,
Traceback (most recent call last):
File "/home/saifullah/Saif/FYP/Projects/covid19-preprocess/src/Parser-test.py", line 6, in
buildings = osm.get_buildings()
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pyrosm/pyrosm.py", line 213, in get_buildings
gdf = get_building_data(self._node_coordinates,
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pyrosm/buildings.py", line 38, in get_building_data
gdf = prepare_geodataframe(nodes, node_coordinates, ways,
File "pyrosm\frames.pyx", line 68, in pyrosm.frames.prepare_geodataframe
File "pyrosm\frames.pyx", line 75, in pyrosm.frames.prepare_geodataframe
File "pyrosm\frames.pyx", line 37, in pyrosm.frames.prepare_way_gdf
File "pyrosm\geometry.pyx", line 360, in pyrosm.geometry.create_way_geometries
File "pyrosm\geometry.pyx", line 349, in pyrosm.geometry._create_way_geometries
File "pyrosm\geometry.pyx", line 318, in pyrosm.geometry.create_polygon_geometry
File "pyrosm\geometry.pyx", line 309, in pyrosm.geometry.create_polygon_geometry
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pygeos/decorators.py", line 55, in wrapped
return func(*args, **kwargs)
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pygeos/creation.py", line 88, in polygons
shells = linearrings(shells)
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pygeos/creation.py", line 72, in linearrings
return _wrap_construct_ufunc(lib.linearrings, coords, y, z)
File "/home/saifullah/anaconda3/envs/ox/lib/python3.8/site-packages/pygeos/creation.py", line 21, in _wrap_construct_ufunc
return func(coords)
ValueError: Provide at least 4 coordinates to create a linearring.
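The final message says that a linear ring needs at least 4 coordinates, which suggests some way in the data has too few points to become a polygon shell. A minimal sketch of the kind of guard that would screen such ways out is shown below. `can_form_ring` is a hypothetical helper, not part of pyrosm or pygeos, and the requirement that the first and last coordinates coincide is an assumption about what counts as an area-like way.

```python
def can_form_ring(coords):
    """True if `coords` is closed and has the 4+ points a linear ring needs."""
    return len(coords) >= 4 and tuple(coords[0]) == tuple(coords[-1])


# Hypothetical coordinate lists for ways tagged as buildings.
candidate_ways = {
    "square": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "triangle": [(0, 0), (1, 0), (0, 1), (0, 0)],
    "two_points": [(0, 0), (1, 1)],
    "degenerate_loop": [(0, 0), (1, 1), (0, 0)],
}

usable = [name for name, coords in candidate_ways.items() if can_form_ring(coords)]
print(usable)
```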

Invalid geometries during handling of get_buildings()

Dear Henrikki

First things first, thanks a lot for sharing this great and useful project!

Checking out your tutorials has worked fine for me; however, I keep running into GEOSExceptions when trying to get building dataframes for bigger areas (e.g. "Helsinki" or "Zuerich"), apparently because some relations produce invalid geometries when they are constructed from the data dump.

Here's code that reproduces the error; any idea on how to correct that would be very much appreciated!

fp_h = pyrosm.get_data("Helsinki", directory="/Users/user/Downloads")
osm_h = pyrosm.OSM(fp_h)
buildings_h = osm_h.get_buildings()

Returns

buildings_h = osm_h.get_buildings()
Traceback (most recent call last):

  File "<ipython-input-14-457334a3e0f2>", line 1, in <module>
    buildings_h = osm_h.get_buildings()

  File "/Users/user/opt/anaconda3/lib/python3.7/site-packages/pyrosm/pyrosm.py", line 218, in get_buildings
    self.bounding_box

  File "/Users/user/opt/anaconda3/lib/python3.7/site-packages/pyrosm/buildings.py", line 40, in get_building_data
    bounding_box)

  File "pyrosm\frames.pyx", line 68, in pyrosm.frames.prepare_geodataframe

  File "pyrosm\frames.pyx", line 78, in pyrosm.frames.prepare_geodataframe

  File "pyrosm\frames.pyx", line 57, in pyrosm.frames.prepare_relation_gdf

  File "pyrosm\relations.pyx", line 175, in pyrosm.relations.prepare_relations

  File "pyrosm\relations.pyx", line 176, in pyrosm.relations.prepare_relations

  File "pyrosm\relations.pyx", line 165, in pyrosm.relations._prepare_relations

  File "pyrosm\relations.pyx", line 134, in pyrosm.relations.get_relations

  File "/Users/user/opt/anaconda3/lib/python3.7/site-packages/pygeos/predicates.py", line 258, in is_valid
    result = lib.is_valid(geometry, **kwargs)

GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing

Thanks a lot,
cheers,

Evelyn
