osm-search / nominatim-data-analyser

QA Tool for Nominatim. Helps to improve the OpenStreetMap data quality and therefore the Nominatim search results.

License: GNU General Public License v2.0

Languages: Python 89.67%, C++ 9.20%, Makefile 1.12%
Topics: openstreetmap, nominatim

nominatim-data-analyser's Issues

Optimize supercluster-vt zoom descent.

When iterating over zoom levels up to the maximum zoom, the number of tiles grows exponentially. Calling index.getTile(z, x, y) for every tile is enough to make the execution time much longer as we reach higher zoom levels.

The generation could be made much faster by ignoring tiles for which we have already seen that they contain no features or clusters.
We should try to use a data structure with O(1) access time to store the ignored tiles.
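
A minimal sketch of the idea in Python, not the tool's actual code. It assumes a hypothetical get_tile(z, x, y) callable that returns None for a tile without features or clusters, and it assumes that an empty parent tile implies empty child tiles (tile buffers could break that assumption). A set of (z, x, y) tuples gives O(1) average lookup.

```python
from typing import Callable, Iterator, Optional, Set, Tuple

Tile = Tuple[int, int, int]  # (z, x, y)


def non_empty_tiles(
    get_tile: Callable[[int, int, int], Optional[object]],
    max_zoom: int,
) -> Iterator[Tuple[Tile, object]]:
    """Yield (tile id, tile) pairs, skipping descendants of empty tiles.

    get_tile is a hypothetical stand-in for index.getTile(z, x, y).
    """
    empty: Set[Tile] = set()  # tiles known to be empty, O(1) membership test
    for z in range(max_zoom + 1):
        for x in range(2 ** z):
            for y in range(2 ** z):
                parent = (z - 1, x // 2, y // 2)
                if z > 0 and parent in empty:
                    # Parent was empty: mark this tile too, without calling get_tile.
                    empty.add((z, x, y))
                    continue
                tile = get_tile(z, x, y)
                if tile is None:
                    empty.add((z, x, y))
                    continue
                yield (z, x, y), tile
```

The loop still visits every tile coordinate, so a further refinement would be to expand only the children of non-empty tiles instead of testing every coordinate against the set.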

Add an 'Open in JOSM' link in pop-up bubble

Hi,

To fix an error found by the tool, the user has to right-click the link, choose Copy Link, switch to the JOSM window, press Ctrl+Shift+O and then Enter to load the object. Could you please add a link to open the object in JOSM next to the 'Node ID' link shown in the pop-up bubble?
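
One way to build such a link is JOSM's remote control, which listens on port 8111 on the local machine when enabled and accepts a load_object request. The sketch below is in Python for illustration only; the helper name josm_link is hypothetical, and the front end would need the equivalent in its own language.

```python
from urllib.parse import urlencode

# JOSM remote control endpoint (remote control must be enabled in JOSM).
JOSM_LOAD_OBJECT = "http://127.0.0.1:8111/load_object"

# Remote control expects a one-letter type prefix in front of the OSM id.
_TYPE_PREFIX = {"node": "n", "way": "w", "relation": "r"}


def josm_link(osm_type: str, osm_id: int) -> str:
    """Return a URL that asks a running JOSM to download one OSM object."""
    objects = f"{_TYPE_PREFIX[osm_type]}{osm_id}"
    return f"{JOSM_LOAD_OBJECT}?{urlencode({'objects': objects})}"
```

For example, josm_link("node", 5139522754) builds a link for the node with that ID.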

log function requires an integer

When tippecanoe returns an error, then an exception is thrown while the error is being printed:

tippecanoe: must specify -o out.mbtiles or -e directory
Traceback (most recent call last):
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 42, in call_tippecanoe
    result = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['tippecanoe', '--output-to-directory=/srv/qa-data.nominatim.org/qa-data/addr_housenumber_no_digit/vector-tiles', '--force', '--no-tile-compression', '--no-tile-size-limit', '--no-feature-limit', '--buffer=120', '--no-clipping', '-r1', '--cluster-distance=60']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cli.py", line 17, in <module>
    Core().execute_all()
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/core.py", line 28, in execute_all
    self.execute_one(file_without_ext)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/core.py", line 36, in execute_one
    PipelineAssembler(loaded_yaml, name).assemble().process_and_next()
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 36, in process_and_next
    result = self.process(data)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 28, in process
    self.call_tippecanoe(self.base_folder_path, feature_collection)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 61, in call_tippecanoe
    self.log(logging.FATAL, e)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 84, in log
    LOG.log(level, f'Rule <{self.exec_context.rule_name}> : {msg}')
  File "/usr/lib/python3.8/logging/__init__.py", line 1508, in log
    raise TypeError("level must be an integer")
TypeError: level must be an integer
Command exited with non-zero status 1

This looks like a bug in the log() function in pipe.py.
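
logging.FATAL is itself an integer, so one plausible cause (an assumption, not confirmed by the traceback) is that the log() helper takes its arguments in a different order than the call self.log(logging.FATAL, e) supplies them, so the exception ends up as the level. A minimal sketch of a helper that rules out this mix-up by making the level keyword-only follows; the Pipe class here is a simplified stand-in, not the real one in pipe.py.

```python
import logging

LOG = logging.getLogger("analyser.sketch")  # hypothetical logger name


class Pipe:
    """Simplified stand-in for the real pipe class."""

    def __init__(self, rule_name: str) -> None:
        self.rule_name = rule_name

    def log(self, msg: object, *, level: int = logging.INFO) -> None:
        # The level is keyword-only, so it cannot be swapped with the message.
        LOG.log(level, "Rule <%s> : %s", self.rule_name, msg)
```

With this signature the failing call would be written as self.log(e, level=logging.FATAL), and passing the level positionally fails immediately at the call site.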

Statistics over found issues

The analyser should collect some statistics over the number of errors it finds in each run. In the long run this should be displayed in the front end but for now it would just be useful to have the data on the server. I'd like to be able to compare runs before and after changes to the Nominatim code.

My suggestion would be to simply log the data in a table in the database. That makes it easy to generate summaries as required. A simple table would do, with columns for the date, the name of the QA check and the number of errors. Maybe add an extra_data column in JSONB so that we are future-proof against any additional data we might want to save.
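
A rough sketch of what this could look like, in Python. The table name qa_statistics and the column names other than extra_data are made up for illustration, and the %s placeholders assume a psycopg2-style PostgreSQL driver.

```python
import json
from datetime import date
from typing import Any, Dict, Optional, Tuple

# Hypothetical table layout: one row per QA check and run.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS qa_statistics (
    run_date    DATE    NOT NULL,
    check_name  TEXT    NOT NULL,
    error_count INTEGER NOT NULL,
    extra_data  JSONB
)
"""

INSERT_SQL = (
    "INSERT INTO qa_statistics (run_date, check_name, error_count, extra_data) "
    "VALUES (%s, %s, %s, %s)"
)


def build_insert(
    check_name: str,
    error_count: int,
    extra_data: Optional[Dict[str, Any]] = None,
    run_date: Optional[date] = None,
) -> Tuple[str, tuple]:
    """Return the INSERT statement and its parameters for one check result."""
    params = (
        run_date or date.today(),
        check_name,
        error_count,
        json.dumps(extra_data) if extra_data is not None else None,
    )
    return INSERT_SQL, params
```

With an open cursor this would be used as cursor.execute(*build_insert("place_nodes_close", 42)), and summaries per check and date can then be produced with ordinary GROUP BY queries.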

"place nodes close" does not take higher level admin borders into account?

Getting a warning here about two place nodes named "Bruchmühlen" being close to each other.

https://nominatim.org/qa/#map=12.95/52.21/8.43&layer=place_nodes_close

This is a false positive, as the village is (for reasons I never fully understood) split by a state border, with the southern and western parts belonging to Lower Saxony and the north-eastern part to North Rhine-Westphalia.

So the two are indeed separate administrative entities known by the same name, but they have different ZIP codes (32289 vs. 49328), are in different states and districts, and have different car license plate letters ("HF" vs. "OS").

take layer information into account on addr:street name check?

I'm getting a false positive here, saying that the node's street name "Jahnplatz" conflicts with the parent "Jahnplatz (U)".

The node is on the building's ground floor, above ground, while "Jahnplatz (U)" is actually underground at layer=-2 (and wrongly labeled as "raiyway=platform" and "highway=footpath"):

https://nominatim.org/qa/#map=21.03/52.02/8.53&layer=addr_street_wrong_name

Node is: https://www.openstreetmap.org/node/5139522754

Platform way is: https://www.openstreetmap.org/way/260631154

It's probably a rare situation to have overlapping streets at different levels, and in this specific case the tagging is also questionable, but it may make sense to prioritize streets on the same level as the object tagged with addr:street in this check?

Add multi threading feature

Adding multi-threading to the tool would greatly reduce the time needed to execute all the rules. Because Python threads run concurrently rather than in parallel, the real benefit will come while queries are executing on the PostgreSQL server (maybe by using a connection pool) and while clustering-vt is running. Clustering-vt and PostgreSQL queries are the most time-consuming operations when executing a rule, so threads would definitely improve the performance of the tool.

It would be interesting to check whether it is better to execute a whole rule in its own thread or to spawn threads locally when executing PostgreSQL queries and when calling clustering-vt.

In the first case, we should make sure some operations are thread safe, for example when accessing the config or when writing/reading to files.
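
A minimal sketch of the first option (one thread per rule) using the standard library's ThreadPoolExecutor. The execute_one argument is a hypothetical callable that runs a single rule by name; it stands in for whatever the tool actually does per rule and must itself be thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def execute_all(
    rule_names: Iterable[str],
    execute_one: Callable[[str], object],
    max_workers: int = 4,
) -> Dict[str, object]:
    """Run every rule in a worker thread and collect results per rule.

    A failing rule does not stop the others: its exception is stored
    as the result for that rule instead.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(execute_one, name): name for name in rule_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going, report the failure
                results[name] = exc
    return results
```

The results dictionary is only written from the calling thread, so it needs no lock. Shared resources touched inside execute_one (config, files, database connections) would still need their own protection.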

False negatives in "Suspicious addr:street tag"

The "Suspicious addr:street tag" layer is missing quite a few nodes that Nominatim assigns a street with a name different from the value of their addr:street tag, because the value isn't in the name field of any nearby highway objects.

Some example nodes

The street names are recorded in name:left or name:right of the highway objects (example), but I thought Nominatim ignores those tags? Since these nodes don't work correctly in Nominatim (it assigns the correct street but then shows the address as 21 Leith Walk, see here), it's odd that the QA tool doesn't flag them.

make it easy to share current map position

Osmoscope adds URL fragments with the current map position and zoom whenever one moves the map, e.g. #map=17.991666666666667/6.8867/52.24745. This makes it easy to share the current view with other users.
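
The QA links quoted in other issues on this page use a fragment of the form #map=12.95/52.21/8.43&layer=place_nodes_close. The Python sketch below formats and parses that shape; reading the three numbers as zoom, latitude and longitude is an assumption based on those links, and the MapView type is made up for illustration.

```python
from typing import NamedTuple
from urllib.parse import parse_qs


class MapView(NamedTuple):
    zoom: float
    lat: float
    lon: float
    layer: str


def format_fragment(view: MapView) -> str:
    """Build a shareable URL fragment, rounded to two decimals."""
    return f"#map={view.zoom:.2f}/{view.lat:.2f}/{view.lon:.2f}&layer={view.layer}"


def parse_fragment(fragment: str) -> MapView:
    """Parse a fragment produced by format_fragment back into a MapView."""
    params = parse_qs(fragment.lstrip("#"))
    zoom, lat, lon = (float(part) for part in params["map"][0].split("/"))
    return MapView(zoom, lat, lon, params["layer"][0])
```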

Add "Last Update" to layer info

It would be useful to have the date of the database, so we can see if the updates worked and the right data is shown. The date can be queried from the database with SELECT lastimportdate FROM import_status.
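
A small sketch of reading that value from Python, assuming a DB-API cursor on the Nominatim database is already available. The function name is hypothetical; the query is the one given above.

```python
from typing import Any, Optional


def last_import_date(cursor: Any) -> Optional[Any]:
    """Return lastimportdate from import_status, or None if the table is empty."""
    cursor.execute("SELECT lastimportdate FROM import_status")
    row = cursor.fetchone()
    return row[0] if row else None
```

The returned value could then be written into the layer metadata so that the front end can show it as "Last Update".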

Suspicious addr:street tag turns up results which aren't present on linked OSM object

https://nominatim.org/qa/#map=18.16/49.32/-123.14&layer=addr_street_wrong_name turns up a bunch of errors, for example one with this node.

It claims "street_name: Park Royal S" and "parent_name: Park Royal South". Assuming that addr:street is the street_name, it is not set on the way, and the tagging has not changed recently.

addr:street was present on an enclosing way that was not in the area I initially downloaded.

The instructions make no mention of addr:street tags on different objects.

Duplicate entries in layers.json

Rerunning the tile generation usually only adds new layers and leaves existing ones untouched. This is good behaviour when you just want to regenerate a single layer. But there are some corner cases where you end up with bogus entries in the file:

  • when changing the WebPrefixPath, all entries now exist with the old and new prefix
  • when changing the name of the layer, the old layer name remains in the layers.json

Two possible solutions to the problem:

  • when running with --execute-all always create the layers.json file from scratch
  • always remove entries that do not correspond to the current prefix path
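
The second solution could be sketched as follows in Python. This assumes, purely for illustration, that the entries in layers.json can be treated as a list of layer URLs; the real file layout may differ, and the function name is hypothetical.

```python
from typing import Iterable, List


def prune_layers(layer_urls: Iterable[str], current_prefix: str) -> List[str]:
    """Keep only entries under the current prefix path, without duplicates."""
    prefix = current_prefix.rstrip("/") + "/"
    kept: List[str] = []
    for url in layer_urls:
        if url.startswith(prefix) and url not in kept:
            kept.append(url)
    return kept
```

This only addresses stale prefixes. A renamed layer keeps the same prefix, so it would still need the first solution (rebuilding the file from scratch) or a check against the list of current rule names.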

Updating QA vector tiles

We generate about 3.5G of vector tiles at the moment which need to be updated daily.

One obvious update strategy would be to just overwrite the existing tiles with new ones. But this may leave 'ghost tiles' where data has gone and no new tile is generated, so old data needs to be deleted. Removing the entire data set before regenerating the new one is not an option, because then the vector tiles would not be accessible while the new ones are generated. That means the new tiles need to be generated in a temporary location and then swapped in for the old ones. This is a workable solution for now, but it means that 3.5G of data needs to be deleted every day, which takes quite a bit of time because the directory consists of lots of small files.

It would be nice if the vector tile generator could support in-place updates that overwrite existing tiles and delete ghost tiles where needed. Bonus points if it also skips writing tiles whose content hasn't changed.
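
A minimal Python sketch of such an in-place update, independent of the actual tile generator. It assumes the new tiles are available as a mapping from relative path to tile bytes, which is a simplification for illustration; a real implementation would more likely stream tiles than hold them all in memory.

```python
from pathlib import Path
from typing import Dict, Tuple


def sync_tiles(out_dir: Path, new_tiles: Dict[str, bytes]) -> Tuple[int, int, int]:
    """Update out_dir in place; return (written, unchanged, deleted) counts."""
    written = unchanged = deleted = 0
    for rel_path, content in new_tiles.items():
        target = out_dir / rel_path
        if target.is_file() and target.read_bytes() == content:
            unchanged += 1  # identical content: leave the file alone
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name, then rename, so readers never see a partial tile.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(content)
        tmp.replace(target)
        written += 1
    keep = {out_dir / rel_path for rel_path in new_tiles}
    for existing in list(out_dir.rglob("*")):
        if existing.is_file() and existing not in keep:
            existing.unlink()  # ghost tile: no longer generated
            deleted += 1
    return written, unchanged, deleted
```

Comparing file contents costs a read per tile, but it avoids both the daily mass deletion and rewriting tiles that did not change. Empty directories left behind by deleted ghost tiles are not cleaned up in this sketch.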
