osm-search / nominatim-data-analyser

11.0 4.0 3.0 2.24 MB

QA Tool for Nominatim. Helps to improve the OpenStreetMap data quality and therefore the Nominatim search results.

License: GNU General Public License v2.0

Python 89.67% Makefile 1.12% C++ 9.20%

openstreetmap nominatim

nominatim-data-analyser's People

antojvlt lonvia nslxndr

nominatim-data-analyser's Issues

Print timestamp with log output

It would be nice to have timestamp prefixes for every log line, so it is easy to spot long-running steps. For what it is worth, Nominatim uses: https://github.com/osm-search/Nominatim/blob/925195725dfcb7f1a6795c50244c1df6cb7242ce/nominatim/cli.py#L79
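
A minimal sketch of what timestamp prefixes could look like with Python's standard logging module. The logger name and the helper make_timestamped_logger are hypothetical, and the format string is only an assumption about what would suit the analyser; it is not taken from its code.

```python
import io
import logging


def make_timestamped_logger(name, stream):
    """Return a logger whose lines start with 'YYYY-MM-DD HH:MM:SS: '."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        fmt='%(asctime)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
    ))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


# Demo: log into an in-memory buffer instead of stderr.
buffer = io.StringIO()
log = make_timestamped_logger('analyser-demo', buffer)
log.info('Rule <addr_housenumber_no_digit> : start')
```

With timestamps on every line, the gap between two consecutive lines shows how long a step took.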

Optimize supercluster-vt zoom descent.

When iterating over zoom levels until reaching the max zoom, the number of tiles increases exponentially. Calling index.getTile(z, x, y) for every tile makes the execution time much longer as we reach higher zoom levels.

The generation could be made much faster by skipping tiles whose parent tile is already known to contain no features or clusters.
We should try to use a data structure with O(1) access time to store the ignored tiles.
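
The idea can be sketched as follows, written in Python for brevity. The function generate_tiles and the get_tile callback are hypothetical stand-ins for the real index.getTile(z, x, y), and the sketch assumes that an empty parent tile implies empty children (tile buffers could break this in edge cases).

```python
def generate_tiles(get_tile, max_zoom):
    """Yield ((z, x, y), tile) for every non-empty tile up to max_zoom."""
    empty = set()  # tiles known to contain nothing; average O(1) lookup
    for z in range(max_zoom + 1):
        for x in range(2 ** z):
            for y in range(2 ** z):
                parent = (z - 1, x // 2, y // 2)
                if parent in empty:
                    # An empty parent means this tile is empty too:
                    # remember it and skip the expensive get_tile call.
                    empty.add((z, x, y))
                    continue
                tile = get_tile(z, x, y)
                if not tile:
                    empty.add((z, x, y))
                    continue
                yield (z, x, y), tile


# Demo with a fake index holding one point at normalised position (0.9, 0.9).
calls = []

def fake_get_tile(z, x, y):
    calls.append((z, x, y))
    inside = int(0.9 * 2 ** z) == x and int(0.9 * 2 ** z) == y
    return ['point'] if inside else []

tiles = dict(generate_tiles(fake_get_tile, max_zoom=3))
```

In this demo only 13 of the 85 tiles between zoom 0 and 3 are requested from the index. Expanding only the children of non-empty tiles would avoid the loop over skipped tiles as well.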

False-positive for neighbouring place=square

https://www.openstreetmap.org/node/9122275031
is next to the Otto-Weidt-Platz:
https://www.openstreetmap.org/way/642581102

Add an 'Open in JOSM' link in pop-up bubble

Hi,

To fix an error found by the tool, the user has to right-click the link > Copy Link, switch to the JOSM window, hit Ctrl+Shift+O and press Enter to load the object. Can you please add a link to open the object in JOSM next to the 'Node ID' link (shown in the pop-up bubble)?
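
As an illustration, such a link could target JOSM's remote control, which listens on port 8111 when enabled in the JOSM preferences. The helper josm_load_object_url is hypothetical, and the front end would build the same URL in JavaScript; Python is used here only to keep the sketch short.

```python
from urllib.parse import urlencode

JOSM_REMOTE_CONTROL = 'http://127.0.0.1:8111'


def josm_load_object_url(osm_type, osm_id):
    """Build a remote-control URL that loads a single OSM object in JOSM."""
    prefix = {'node': 'n', 'way': 'w', 'relation': 'r'}[osm_type]
    query = urlencode({'objects': f'{prefix}{osm_id}'})
    return f'{JOSM_REMOTE_CONTROL}/load_object?{query}'


# Node ID taken from another issue in this list.
url = josm_load_object_url('node', 9122275031)
```

Loading the object by ID rather than by bounding box does not depend on the current map view.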

log function requires an integer

When tippecanoe returns an error, then an exception is thrown while the error is being printed:

tippecanoe: must specify -o out.mbtiles or -e directory
Traceback (most recent call last):
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 42, in call_tippecanoe
    result = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['tippecanoe', '--output-to-directory=/srv/qa-data.nominatim.org/qa-data/addr_housenumber_no_digit/vector-tiles', '--force', '--no-tile-compression', '--no-tile-size-limit', '--no-feature-limit', '--buffer=120', '--no-clipping', '-r1', '--cluster-distance=60']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cli.py", line 17, in <module>
    Core().execute_all()
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/core.py", line 28, in execute_all
    self.execute_one(file_without_ext)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/core.py", line 36, in execute_one
    PipelineAssembler(loaded_yaml, name).assemble().process_and_next()
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 38, in process_and_next
    result = pipe.process_and_next(result)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 36, in process_and_next
    result = self.process(data)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 28, in process
    self.call_tippecanoe(self.base_folder_path, feature_collection)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipes/output_formatters/vector_tile_formatter.py", line 61, in call_tippecanoe
    self.log(logging.FATAL, e)
  File "/srv/qa-data.nominatim.org/Nominatim-Data-Analyser/analyser/core/pipe.py", line 84, in log
    LOG.log(level, f'Rule <{self.exec_context.rule_name}> : {msg}')
  File "/usr/lib/python3.8/logging/__init__.py", line 1508, in log
    raise TypeError("level must be an integer")
TypeError: level must be an integer
Command exited with non-zero status 1

This looks like a bug in the log() function in pipe.py.
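
One way this error can arise is sketched below. It ASSUMES that log() takes the message first and the level second, which the traceback does not show; the Pipe class here is a simplified stand-in, not the code from pipe.py.

```python
import logging

LOG = logging.getLogger('analyser-demo-pipe')


class Pipe:
    rule_name = 'addr_housenumber_no_digit'

    # ASSUMPTION: message first, level second. With this signature the
    # call log(logging.FATAL, e) passes the exception as the level.
    def log(self, msg, level=logging.INFO):
        LOG.log(level, f'Rule <{self.rule_name}> : {msg}')


pipe = Pipe()
error = RuntimeError('tippecanoe failed')

try:
    pipe.log(logging.FATAL, error)  # arguments in the wrong order
    swapped_call_failed = False
except TypeError:
    swapped_call_failed = True  # "level must be an integer"

# Passing the level by keyword makes the intent explicit.
pipe.log(error, level=logging.FATAL)
```

If the real signature matches this assumption, fixing the call site in call_tippecanoe would be enough.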

Statistics over found issues

The analyser should collect some statistics over the number of errors it finds in each run. In the long run this should be displayed in the front end but for now it would just be useful to have the data on the server. I'd like to be able to compare runs before and after changes to the Nominatim code.

My suggestion would be to simply log the data in a table in the database. That makes it easy to generate summaries as required. A simple table would do with columns for date, name of the QA check and number of errors. Maybe add an extra_data column in JSONB, so we are future proof against any additional data we might want to save in the future.
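
A rough sketch of such a table and the insert that would fill it. The table name qa_statistics, the column names and the helper record_statistics are hypothetical; the %s placeholders assume a psycopg2-style DB-API cursor.

```python
import json
from datetime import date

CREATE_STATS_TABLE = """
CREATE TABLE IF NOT EXISTS qa_statistics (
    run_date    DATE    NOT NULL,
    rule_name   TEXT    NOT NULL,
    error_count INTEGER NOT NULL,
    extra_data  JSONB
)
"""

INSERT_STATS = """
INSERT INTO qa_statistics (run_date, rule_name, error_count, extra_data)
VALUES (%s, %s, %s, %s::jsonb)
"""


def record_statistics(cursor, rule_name, error_count,
                      extra_data=None, run_date=None):
    """Insert one row per QA check and run; returns the bound parameters."""
    params = (
        run_date or date.today(),
        rule_name,
        error_count,
        json.dumps(extra_data) if extra_data is not None else None,
    )
    cursor.execute(INSERT_STATS, params)
    return params
```

Comparing two runs then becomes a plain SQL query grouped by rule_name and run_date.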

waterway=boatyard can have addresses

waterway=boatyard is for "a place for constructing, repairing and storing vessels out of the water" but triggers the addr:* tags on non-addressable places check.

Although I suspect this tag is inappropriate in my particular case, I see no reason a boatyard couldn't have an address.

False positives with "addr:* tags on non-addressable places" for place=farm

This layer has several thousand errors in Australia. Most of them seem to be places that have been tagged place=farm and represent a single farming property, which would also have a street address. The same problem is probably happening with place=isolated_dwelling and place=plot.

Use service roads to help determine the "right" road for an address

Example: https://www.openstreetmap.org/way/497029571
https://nominatim.org/qa/#map=16.90/39.10/-108.42&layer=addr_street_wrong_name

In this instance, the address is wrongly detected as belonging to 34 1/4 Road, when it properly belongs with 34 Road. The building with the address has a service road (highway=service service=driveway) leading from the proper road to the building.

"place nodes close" does not take higher level admin borders into account?

Getting a warning here about two place nodes named "Bruchmühlen" being close to each other.

https://nominatim.org/qa/#map=12.95/52.21/8.43&layer=place_nodes_close

This is a false positive as the village is (for reasons I never really fully understood) split by a state border, the southern and western part belonging to Lower Saxony, and the north-east part to North Rhine-Westphalia.

So the two are indeed two separate administrative entities known by the same name, but they have different ZIP codes (32289 vs. 49328), are in different states and districts, and have different car license plate letters ("HF" vs. "OS").

Add "address far away from its road"

It often indicates address problems, road data problems or both.

https://wiki.openstreetmap.org/wiki/Mr%C3%B3wki runs such QA server for Poland and it was/is very useful.

pink: 150m - 1500m from road to address
yellow: over 1500m from road to address
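
The two colour classes above can be expressed as a small classifier. The function name is hypothetical, and how the exact boundary values (150m and 1500m) are treated is an assumption.

```python
def classify_distance(distance_m):
    """Map the distance between an address and its road to a colour class."""
    if distance_m > 1500:
        return 'yellow'
    if distance_m >= 150:
        return 'pink'
    return None  # close enough to the road, nothing to report
```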

take layer information into account on addr:street name check?

I'm getting a false positive here, saying that the node's street name "Jahnplatz" conflicts with the parent "Jahnplatz (U)"

The node is on the building's ground floor above ground, while "Jahnplatz (U)" is actually underground at layer=-2 (and wrongly labeled as "railway=platform" and "highway=footpath"):

https://nominatim.org/qa/#map=21.03/52.02/8.53&layer=addr_street_wrong_name

Node is: https://www.openstreetmap.org/node/5139522754

Platform way is: https://www.openstreetmap.org/way/260631154

It's probably a rare situation to have overlapping streets at different levels, and in this specific case the tagging is also questionable, but it may make sense to prioritize streets on the same level as the object tagged with addr:street in this check?

Add documentation for clustering-vt

Add some documentation for the clustering-vt to explain how it works.

Clicking on JOSM link while zoomed out takes you to the wrong area

When I am zoomed out and click the link for an error on node 5164325981 JOSM loads the wrong area. If I zoom in, I get the correct area when I click the link. I suspect coordinate rounding, as the area I am sent to is nearby, but does not contain the error.

can't see last update info for some layers

related: #17

the html data is there, the max-height: 500px is just too small to show it at the bottom

add link to parent

see: https://nominatim.org/qa/#map=19/52.53776/13.36440&layer=addr_street_wrong_name
for object https://www.openstreetmap.org/way/891386318

it is not immediately obvious which street or way is referenced as parent

Add multi threading feature

Adding multi-threading to the tool would greatly reduce the time needed to execute all the rules. Because Python threads run concurrently rather than in parallel, the real benefit will come while queries are being executed on the PostgreSQL server (maybe by using a connection pool) and while clustering-vt is running. Clustering-vt and PostgreSQL queries are the most time-consuming operations when executing a rule, so threads would definitely improve the performance of the tool.

It would be interesting to check whether it is better to execute a whole rule in its own thread or to spawn threads locally when executing PostgreSQL queries and when calling clustering-vt.

In the first case, we should make sure some operations are thread safe, for example when accessing the config or when writing/reading to files.
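
The first option (one thread per rule) might look roughly like this. run_rules and the execute_rule callback are hypothetical names; the lock stands in for whatever shared state (config, files) would need protecting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_results_lock = threading.Lock()


def run_rules(rule_names, execute_rule, max_workers=4):
    """Execute every rule in its own worker thread and collect the results."""
    results = {}

    def worker(name):
        outcome = execute_rule(name)
        # Shared state is only touched while holding the lock.
        with _results_lock:
            results[name] = outcome

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so worker exceptions are re-raised here.
        list(pool.map(worker, rule_names))
    return results
```

Each worker would need its own database connection, for example one taken from a connection pool.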

remove visualizer/build directory from git

I'm not sure if the directory was added by mistake. There is a build directory mentioned in the .gitignore file (near the end of the file).

link this repository on published website

https://nominatim.org/qa/#map=2.91/0.00/0.00 links to https://github.com/AntoJvlt/Nominatim-Data-Analyser/issues, which redirects to https://github.com/AntoJvlt/Nominatim-Data-Analyser/pulls

please come to the "Issues" section of the github repository to discuss this.

Linking to and mentioning issues in a repository that has issues disabled does not seem intentional.

False negatives in "Suspicious addr:street tag"

The "Suspicious addr:street tag" layer is missing quite a few nodes that Nominatim assigns a street with a name different from the value of their addr:street tag, because the value isn't in the name field of any nearby highway objects.

Some example nodes

The street names are recorded in name:left or name:right of the highway objects (example), but I thought Nominatim ignores those tags? Since these nodes don't work correctly in Nominatim (it assigns the correct street but then shows the address as 21 Leith Walk, see here), it's odd that the QA tool doesn't flag them.

Add tests for clustering-vt and build/test clustering-vt in CI workflow

Add C++ tests for the clustering-vt module.
Build and test clustering-vt in the GitHub workflow.

make it easy to share current map position

Osmoscope adds URL fragments with the current map position and zoom whenever one moves the map, e.g. #map=17.991666666666667/6.8867/52.24745. This makes it easy to share the current view with other users.
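
A small sketch of building and reading such a fragment. Both helper names are hypothetical, and the zoom/lat/lon order with an optional layer parameter is an assumption based on the QA links quoted in other issues here. The front end would do this in JavaScript; Python is used only for illustration.

```python
def format_map_fragment(zoom, lat, lon, layer=None):
    """Build a '#map=zoom/lat/lon' fragment, optionally with a layer."""
    fragment = f'#map={zoom:.2f}/{lat:.2f}/{lon:.2f}'
    if layer:
        fragment += f'&layer={layer}'
    return fragment


def parse_map_fragment(fragment):
    """Parse a fragment produced by format_map_fragment back into values."""
    params = dict(part.split('=', 1)
                  for part in fragment.lstrip('#').split('&'))
    zoom, lat, lon = (float(value) for value in params['map'].split('/'))
    return {'zoom': zoom, 'lat': lat, 'lon': lon,
            'layer': params.get('layer')}


fragment = format_map_fragment(16.9, 39.1, -108.42, 'addr_street_wrong_name')
view = parse_map_fragment(fragment)
```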

Add "Last Update" to layer info

It would be useful to have the date of the database, so we can see if the updates worked and the right data is shown. The date can be queried from the database with SELECT lastimportdate FROM import_status.
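
A minimal sketch of reading that date through a DB-API cursor. fetch_last_update is a hypothetical helper, and it assumes the column comes back as a datetime value.

```python
LAST_UPDATE_QUERY = 'SELECT lastimportdate FROM import_status'


def fetch_last_update(cursor):
    """Return the last import date as an ISO 8601 string, or None."""
    cursor.execute(LAST_UPDATE_QUERY)
    row = cursor.fetchone()
    if row is None or row[0] is None:
        return None
    return row[0].isoformat()
```

The returned string could be stored alongside each layer and shown in the layer info.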

Suspicious addr:street tag turns up results which aren't present on linked OSM object

https://nominatim.org/qa/#map=18.16/49.32/-123.14&layer=addr_street_wrong_name turns up a bunch of errors, for example one with this node

It claims "street_name: Park Royal S" and "parent_name: Park Royal South". Assuming that addr:street is the street_name, it is not set on the way, and the tagging has not changed recently.

addr:street was present on an enclosing way that was not in the area I initially downloaded.

The instructions make no mention of addr:street tags on different objects.

Duplicate entries in layers.json

Rerunning the tile generation usually only adds new layers and leaves existing ones untouched. This is a good behaviour when you want to just regenerate a single layer. But there are some corner cases where you end up with bogus entries in the file:

when changing the WebPrefixPath, all entries now exist with the old and new prefix
when changing the name of the layer, the old layer name remains in the layers.json

Two possible solutions to the problem:

when running with --execute-all always create the layers.json file from scratch
always remove entries that do not correspond to the current prefix path
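
The second solution could be sketched like this. The real structure of layers.json is not shown here, so the sketch ASSUMES the entries are plain URL strings; prune_layer_entries and the example URLs in the comment are hypothetical.

```python
def prune_layer_entries(entries, web_prefix_path):
    """Keep only entries under the current prefix, dropping duplicates.

    ASSUMPTION: each entry is a URL string such as
    'https://example.org/qa-data/some_layer/layer.json'.
    """
    prefix = web_prefix_path.rstrip('/') + '/'
    seen = set()
    kept = []
    for entry in entries:
        if entry.startswith(prefix) and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return kept
```

This does not catch a renamed layer under an unchanged prefix; recreating the file from scratch with --execute-all covers that case.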

remove node_modules folder from git

The .gitignore file already contains node_modules but there's a folder full of modules in the clustering-vt subdirectory.

Updating QA vector tiles

We generate about 3.5G of vector tiles at the moment which need to be updated daily.

One obvious update strategy would be to just overwrite the existing tiles with new ones. But this may leave 'ghost tiles' where data has gone and no new tile is generated. So old data needs to be deleted. Removing the entire data set before regenerating the new one is not an option because then the vector tiles would not be accessible while the new ones are generated. That means the new tiles need to be generated in a temporary location and then swapped in for the old ones. This is a workable solution for now but means that every day 3.5G of data needs to be deleted, which takes quite a bit of time because the directory consists of lots of small files.

It would be nice if the vector tile generator could support in-place updates that overwrite existing tiles and delete ghost tiles where needed. Bonus points if it also skips writing tiles whose content hasn't changed.
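
A rough sketch of such an in-place update, assuming the new tiles are available as a mapping from relative path to content. sync_tiles is a hypothetical helper; a real generator would stream tiles rather than hold 3.5G in memory.

```python
from pathlib import Path


def sync_tiles(tile_dir, new_tiles):
    """Update tile_dir in place from {relative_path: bytes}.

    Unchanged tiles are left alone, changed or new tiles are replaced
    via a temporary file, and tiles missing from new_tiles are deleted.
    """
    tile_dir = Path(tile_dir)
    stats = {'written': 0, 'unchanged': 0, 'deleted': 0}

    for rel_path, content in new_tiles.items():
        target = tile_dir / rel_path
        if target.is_file() and target.read_bytes() == content:
            stats['unchanged'] += 1
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + '.tmp')
        tmp.write_bytes(content)
        tmp.replace(target)  # rename, so readers never see a partial tile
        stats['written'] += 1

    # Remove ghost tiles: files on disk that the new run did not produce.
    wanted = {tile_dir / rel_path for rel_path in new_tiles}
    for existing in [p for p in tile_dir.rglob('*') if p.is_file()]:
        if existing not in wanted:
            existing.unlink()
            stats['deleted'] += 1
    return stats
```

Empty directories left behind by deleted tiles are not removed in this sketch.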
