komoot / photon
An open source geocoder for OpenStreetMap data.
License: Apache License 2.0
Do you intend to move the query Python code into Java, e.g. via Spark?
There should be another filter parameter in the API to search in a specific area (bounding box) for a specific tag combination or even a set of combinations ('I want food').
For that, osm_key and osm_value have to be indexed in the mapping. Or maybe we make this configurable if it increases the index size too much.
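As an illustration of the proposed filter (not photon's actual API), here is a sketch of filtering candidate places by a bounding box plus a set of osm_key/osm_value combinations such as "I want food". The record layout is a simplified, hypothetical version of photon's document fields.

```python
# Hypothetical sketch: bbox + tag-combination filtering over place records.
# Field names (lon/lat/osm_key/osm_value) are assumptions for illustration.

def in_bbox(place, bbox):
    """bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= place["lon"] <= max_lon and min_lat <= place["lat"] <= max_lat

def matches_tags(place, tag_combinations):
    """tag_combinations is a set of (osm_key, osm_value) pairs;
    a None value acts as a wildcard for that key."""
    for key, value in tag_combinations:
        if place["osm_key"] == key and value in (None, place["osm_value"]):
            return True
    return False

def filter_places(places, bbox, tag_combinations):
    return [p for p in places if in_bbox(p, bbox) and matches_tags(p, tag_combinations)]

places = [
    {"name": "Trattoria", "lon": 11.0, "lat": 49.6, "osm_key": "amenity", "osm_value": "restaurant"},
    {"name": "Bakery", "lon": 11.1, "lat": 49.5, "osm_key": "shop", "osm_value": "bakery"},
    {"name": "Far pub", "lon": 2.0, "lat": 48.8, "osm_key": "amenity", "osm_value": "pub"},
]
food = {("amenity", "restaurant"), ("amenity", "fast_food"), ("shop", "bakery")}
hits = filter_places(places, bbox=(10.5, 49.0, 11.5, 50.0), tag_combinations=food)
# "Far pub" is dropped by the bbox even though it matches a food tag.
```

In Elasticsearch this would map naturally onto a bool query with a geo_bounding_box filter plus a terms filter on the indexed osm_key/osm_value fields.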
Photon should be able to do searches over all available language variants mapped in OSM (i.e. all name:* tags). Bonus points if it can improve search results by guessing the language of the query correctly and re-weighting the results accordingly.
The README indicates that the demo UI is in src/main/python/demo, but it seems to be in website/. Or is that not the demo UI in that directory?
How does location bias work?
On http://photon.komoot.de/api results seem to be only loosely influenced by different lat|lon parameters, and some results that are much farther from the given point come before others that are much nearer.
Is it possible to order results based on distance from the point given as lat|lon?
Would it be possible to provide a bounding box to search, so that search wouldn't return results outside that box?
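For the strict-ordering question, here is a sketch of pure distance sorting around the lat|lon bias point, assuming results carry plain lat/lon fields. Note that photon's location bias mixes distance into the text relevance score rather than sorting purely by distance, which explains the "loose" influence observed above.

```python
# Sketch: order geocoding results strictly by great-circle distance
# from a bias point. Field names on the result dicts are assumptions.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sort_by_distance(results, lat, lon):
    return sorted(results, key=lambda f: haversine_km(lat, lon, f["lat"], f["lon"]))

results = [
    {"name": "far", "lat": 48.1, "lon": 11.6},   # roughly Munich
    {"name": "near", "lat": 52.5, "lon": 13.4},  # roughly Berlin
]
ordered = sort_by_distance(results, 52.52, 13.40)  # bias point in Berlin
```

A bounding-box restriction (the second request) could be layered on top by simply discarding results outside the box before sorting.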
We should track photon's search performance, so we can improve photon's search logic and know whether it is getting better.
see https://hackpad.com/Photon-sprint-2-r8OrAVWa2oI#:h=Testcases-and-user-feedback
Using word endings is kind of a workaround for edge n-gram searches: a query like
berlin erlange
would match berlinerstraße erlangen,
but should better match only things like 'berlin erlange*'.
Given this workaround, why not avoid edge n-grams entirely: tokenize the query and do a prefix query for the last term? This would save disk space and memory at the same quality. The only concern could be performance, but my simple tests on small data didn't reveal any problems there.
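The contrast between the two strategies can be sketched as follows: instead of indexing edge n-grams for every term, tokenize the query and treat only the last (possibly unfinished) term as a prefix. This is an illustrative model, not photon's implementation.

```python
# Sketch: edge n-grams vs. tokenize + prefix-match the last term.

def edge_ngrams(term, min_gram=2, max_gram=15):
    """All leading substrings of a term, as an edge n-gram analyzer would emit."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

def prefix_query_match(query, document_terms):
    """All query terms must match exactly, except the last one,
    which only needs to be a prefix of some document term."""
    tokens = query.lower().split()
    if not tokens:
        return False
    *full, last = tokens
    docs = [t.lower() for t in document_terms]
    return all(t in docs for t in full) and any(d.startswith(last) for d in docs)

# "berlin erlange" matches the document ["Berlin", "Erlangen"] ...
assert prefix_query_match("berlin erlange", ["Berlin", "Erlangen"])
# ... but not ["Berlinerstraße", "Erlangen"]: only the final term is a prefix.
assert not prefix_query_match("berlin erlange", ["Berlinerstraße", "Erlangen"])
```

The space saving comes from not materializing `edge_ngrams(...)` for every indexed term; the prefix behavior is pushed to query time instead.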
You mention in the readme that it can take up to 12 days to create the Nominatim database dump from scratch. I spent a bit of time recently coming up with a way to convert the OSM planet XML dump to a joined JSON data set that you can build in around 15 hours. All it does is join nodes, ways, and relations, but from there it should be quite easy to extract whatever information you need without having to set up Nominatim.
I've actually been considering building a little geocoder on top of Elasticsearch based on this, so I'm quite interested in your experiences with photon.
If you are interested, the project for my conversion tool is here: https://github.com/jillesvangurp/osm2geojson
Searching
Via Ca' de Volpi
yields no result even though a street with that exact name exists; see this way.
Thanks.
Cristian
Following the initial step from the quickstart (at https://github.com/komoot/photon) fails ...
# important: we do not yet provide this dump, creation will be finished soon
java -jar target/photon-0.1-SNAPSHOT.jar -import-snapshot http://photon.komoot.de/data/world.zip
... because http://photon.komoot.de/data/world.zip is not available any more!
Where can I download world.zip?
The Nominatim importance field is not imported from Nominatim to Solr.
I think it should be available for use in boost functions in some cases:
I plan to use photon with a map and sort search results according to map bounds/center.
But...
I would like queries like "Eiffel tower" or "Statue of Liberty" to push the original places first or at least fairly up in the search results (I can accept that if a replica is in or near map bounds, it shows up first).
I believe this could be accomplished by introducing the importance field from Nominatim and a custom query.
I think I can deal with the Solr query, but the import part is tougher for me, since I am not exactly an experienced Java developer...
I'll give it a try (unless you think I am totally wrong), but if anyone feels they can achieve this "easily", they're welcome.
When searching for street + house number, e.g. bödmerstraße 7, the right address can be found because both tokens can be found in the collector's raw field. Searching for bödmerstr 7 won't be successful: bödmerstr is found in the edge n-gram field, but 7 got stripped because the edge n-gram starts at 2 characters:
"photonngram": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
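Why "7" disappears can be seen by simulating the analyzer: with min_gram=2, an edge n-gram tokenizer emits no token at all for a single-character term, so the house number can never match in the n-gram field. A minimal sketch:

```python
# Sketch of edge n-gram token generation with the mapping above
# (min_gram=2, max_gram=15): single characters produce no tokens.

def edge_ngrams(term, min_gram=2, max_gram=15):
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

tokens_street = edge_ngrams("bödmerstr")  # ['bö', 'böd', ..., 'bödmerstr']
tokens_number = edge_ngrams("7")          # [] : nothing to match against
```

Possible fixes would be lowering min_gram to 1 or routing short numeric tokens to a separate field that is matched exactly.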
Check the performance and find bottlenecks of photon to evaluate usage on osm.org.
Maybe Sarah can provide some logs of search terms from Nominatim. At komoot we also have logs of this kind.
mount everest
can be found in photon, but many users are used to typing mt everest
and currently get no results.
Synonyms from the old Solr implementation:
https://github.com/komoot/photon/tree/master/solrconfig/testing/conf
pelias created a list too https://github.com/mapzen/pelias/blob/master/lib/pelias/tasks/synonym.rake
https://github.com/mapzen/pelias/blob/master/config/feature_synonyms.yml
Installing Nominatim and importing OSM files is time consuming. People who don't care about continuous updates will be very happy if we provide data dumps (world / country extracts in the most common languages). Setting up photon will then be dead easy and fast.
I wanted to try photon with Italy data. I started the import yesterday and it is not over yet: how do I see how far along it is? Is there an import log available?
Thank you
Alessandra
While trying to build photon I get the following build failure:
/photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,17] error: unmappable character for encoding ASCII
The same error appears at other places in the same file; it would seem the character used in the source for the single quote in error messages ("Can´t...") is not mappable in ASCII: probably a curly quote instead of a straight single quote?
Would it be possible to search/replace that character in the source?
I think "Can't" would work, but "Can´t" can't.
Full stack trace below:
[DEBUG] Source roots:
[DEBUG] /photon/src/main/java
[INFO] Compiling 16 source files to /photon/target/classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.332s
[INFO] Finished at: Wed May 28 09:05:31 UTC 2014
[INFO] Final Memory: 15M/481M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project photon: Compilation failure: Compilation failure:
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,18] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[205,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[205,18] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[230,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[230,18] error: unmappable character for encoding ASCII
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project photon: Compilation failure
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:516)
at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:114)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Right now, the server keeps running after import and dump actions.
This might not be the desired behavior all the time. How about one of these solutions:
1.) Introduce a "-shutdown" parameter, that shuts down photon after executing imports/export/indexing.
2.) Shut down automatically after one of the i/o actions.
I would refactor the code in App.java and send it as a pull request if so desired. Also I can add more description to the docs about the behavior.
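A minimal sketch of option 1, a -shutdown flag that makes the process exit after an I/O action instead of keeping the server running. The flag name and CLI structure are illustrative, not photon's actual interface (which is Java; the equivalent there would be a System.exit(0) after the action).

```python
# Hypothetical sketch of a "-shutdown" CLI flag (illustrative only).
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="photon")
    parser.add_argument("-import", dest="do_import", action="store_true")
    parser.add_argument("-shutdown", dest="shutdown", action="store_true",
                        help="exit after the requested I/O action instead of serving")
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    if args.do_import:
        pass  # ... perform the import here ...
    if args.shutdown:
        return "exit"   # shut down after import/export/indexing
    return "serve"      # default: keep the HTTP server running
```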
I would like at some time to import other sources than OSM into Photon, like the datasets from openaddresses.io or the BANO project, as soon as the licence is compliant.
This will need some changes in the mapping, like having source and source_id keys (instead of just osm_id).
And in the importer, we will need to find a way not to import the data from OSM where it can duplicate these other sources (which we consider "better" than OSM if we chose to import them), like skipping all place=house if country=France, etc.
I will work on importing some BANO data, to have a first use case that we can discuss.
When I was trying to upload Iceland data I got some errors about missing fields. I had to add these fields to the schema:
Probably I made some errors during the installation.
Hi,
I tried to use NominatimImporter with the options described in the documentation sample, but it seems that the -l option doesn't exist (reading the source code, I can't find it).
greetings
Mirko
Currently small typos like jamiaca
instead of jamaica
return no results. Typos are very common in the world of search engines; surely there are already best practices to solve this with Elasticsearch.
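Elasticsearch handles this class of typo with fuzzy queries based on edit distance (Lucene's fuzzy matching additionally counts a transposition as a single edit). A minimal plain Levenshtein sketch shows that "jamiaca" is within the common fuzziness limit of 2 from "jamaica":

```python
# Sketch: plain Levenshtein edit distance (insert/delete/substitute),
# the measure underlying fuzzy queries.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

assert levenshtein("jamiaca", "jamaica") == 2  # within fuzziness 2
```

So enabling a fuzzy match (or Elasticsearch's "fuzziness": "AUTO") on the name fields should recover this query.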
Searching for the village mittelberg
reveals a lot of places in mittelberg, but the actual village cannot be found.
To show the results for areas at the correct scale, it would be helpful if the bounding box could be provided directly as part of the result. With the current OSM web services, one would either request the full OSM data via Overpass/XAPI or try to get the matching result from Nominatim (which unfortunately does not support querying by osm_id).
Another solution to the problem described in #56 could be to use an address parser. If the search backend is in Java, one could use this address parser here
Sometimes one needs to do a geocoding lookup on an address that contains irrelevant data, such as an unknown condominium name. For example when searching for "Grand Parkview Asoke Unit 233/11 Sukhumvit 21" then none of the information is in OSM's database other than "Sukhumvit 21".
I would consider this an example of a weak partial match. For my use case, such irrelevant data needs to be ignored and therefore partial matches need to be returned too. E.g. http://photon.komoot.de/api/?q=grand%20parkview%20asoke%20sukhumvit%2021 should return the same as http://photon.komoot.de/api/?q=sukhumvit%2021
Would it be possible to add a score to each returned feature, so that one can programmatically determine whether a result's score is high enough to be displayed? This varies by application.
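Each Elasticsearch hit already carries a _score, so one way to support this (a sketch, not photon's current behavior) is to copy that score into each GeoJSON feature and let the application apply its own cutoff for weak partial matches:

```python
# Sketch: client-side score thresholding over GeoJSON-like features.
# The "score" property is an assumed addition, not an existing photon field.

def filter_by_score(features, min_score):
    return [f for f in features if f["properties"].get("score", 0.0) >= min_score]

features = [
    {"properties": {"name": "Sukhumvit 21", "score": 10.8}},
    {"properties": {"name": "weak partial match", "score": 1.2}},
]
good = filter_by_score(features, min_score=5.0)
```

Note that raw Lucene scores are not comparable across queries, so applications would still need to calibrate the threshold per use case.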
Is it expected that some entries don't have a 'default' or 'en' entry? Even worse, some contain 'null' (not printed here as gson is too clever).
{
"_index": "photon",
"_type": "place",
"_id": "1406467",
"_score": 10.824916,
"_source": {
"osm_key": "place",
"city": {
"default": "Frankfort"
},
"osm_value": "village",
"osm_id": "306444989",
"ranking": 11,
"context": {
"default": "Saint Mary, Middlesex County"
},
"id": "1406467",
"country": {
"de": "Jamaika",
"it": "Giamaica",
"fr": "Jamaïque"
},
"coordinate": "18.418850,-77.053417",
"name": {
"default": "Frankfort"
}
}
}
Hi there,
I downloaded a sub-region from geofabrik.de, imported it into the Nominatim database and finally started photon. Everything works like a charm, but unfortunately the query localhost:2322/api?q=Gutenberg results in a "500 Internal Error"; the console output is:
"Error spark.webserver.MatchFilter - java.lang.NullPointerException"
I have no idea what's wrong, do you have any suggestions?
Cheers
Ivan
Possible changes to the importer to speed up the process and reduce IO load:
- query place_addressline directly instead of using get_addressdata
I am trying to install photon on Ubuntu. I have manually installed Solr and tried to load the photon config. Solr is not able to load collection1 without the core.properties file.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
It's a sort of "dynamic" stop-word handling.
One more normalization has to be done to improve search, e.g. Erlangerstraße should be split into "Erlanger straße". This has to be done both while indexing and while searching.
There is a plugin, but it is GPL due to one library it uses; it could be less restrictive, but Python code would have to be ported to Java: jprante/elasticsearch-analysis-decompound#5
For POIs there is also often the Bahnhof vs. Hauptbahnhof problem. But a main railway station that is not named like this should probably also rank as important in a different country; probably this should be handled via a different fix: #318
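A naive decompounding sketch, assuming a small hand-picked list of compound tails (real decompounders like the plugin above use dictionaries or statistical models): split "Erlangerstraße" into its stem and suffix at both index and query time.

```python
# Sketch: suffix-list decompounding for German street/place compounds.
# The suffix list is an illustrative assumption, not a complete dictionary.

SUFFIXES = ("straße", "strasse", "platz", "weg", "gasse")

def decompound(token):
    low = token.lower()
    for suffix in SUFFIXES:
        if low.endswith(suffix) and len(low) > len(suffix):
            return [low[: -len(suffix)], suffix]
    return [low]

tokens = decompound("Erlangerstraße")  # ['erlanger', 'straße']
```

Emitting both sub-tokens alongside the original term would let "Erlanger straße" and "Erlangerstraße" match each other.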
Hi,
You mention in readme "It takes up to 10 days and sufficient RAM to import the entire world,".
Could you define "sufficient RAM"?
I may be about to rent a new server for this purpose (unless "sufficient" means 300GB...).
yohan created a leaflet plugin (https://github.com/komoot/leaflet.photon) to easily use photon's public API in a Leaflet map.
It is still a secret, so let people know about it on the project page!
This is only reproducible on local installations:
Some queries return, after their normal results, several entries with completely wrong data. E.g., using an extracted part of Austria, the query
http://localhost:2322/api?lang=de&q=Kaisersch%C3%BCtzenstra%C3%9Fe%2017
returns normal results, and afterwards there are several entries like:
{
"properties": {
"osm_id": 31344,
"postcode": "17503",
"osm_value": "postcode",
"country": "Vereinigte Staaten von Amerika"
},
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-76.08483701582313,
39.93753065153189
]
}
},
Which shouldn't even be in the extract. Is there an error in the extraction, or does photon "invent" these results?
One could try this analyzer. Not sure if this is necessary if fuzzy matching is enabled; e.g. fuzzy matching is already able to match strasse vs. straße.
It is likely that one street in OSM is composed of multiple independent OSM ways, e.g.:
At the moment, when searching for the street Bödmerstraße,
photon returns multiple results for the same street. Ideally, connected ways with the same name would be merged at index time.
Address geocoding on a worldwide level is complicated for many reasons, of which one is that every single country lists addresses in different ways. For example, in some countries provinces and states are important (Netherlands), whereas in others districts and sub-districts are more important (Thailand).
When I query your API for a road in Bangkok, Thailand, like so:
http://photon.komoot.de/api/?q=sukhumvit
I get the following:
{
"features": [
{
"properties": {
"name": "Sukhumvit",
"osm_value": "neighbourhood",
"country": "Thailand",
"osm_id": 2203974487,
"osm_key": "place",
"postcode": "10110"
},
"geometry": {
"coordinates": [
100.565073,
13.73384
],
"type": "Point"
},
"type": "Feature"
},
{
"properties": {
"country": "Thailand",
"name": "Sukhumvit Road",
"osm_value": "tertiary",
"street": "Sukhumvit Road",
"osm_id": 232865089,
"osm_key": "highway"
},
"geometry": {
"coordinates": [
102.45796,
12.18993
],
"type": "Point"
},
"type": "Feature"
},
... ETC ...
],
"type": "FeatureCollection"
}
The first entry is the one I need, but the problem is that there is a lot more data in the OSM database about this entry than your API shows. For example, it should return the following entries when available, just like Nominatim does:
- (Object) administrative
- (Object) attraction
- (Object) city
- (Object) city_district
- (Object) clothes
- (Object) commercial
- (Object) country
- (Object) country_code
- (Object) county
- (Object) house_number
- (Object) pedestrian
- (Object) place
- (Object) postcode
- (Object) road
- (Object) state
- (Object) state_district
- (Object) suburb
- (Object) town
- (Object) village
In the case of my query we already know this data: postcode = 10110 (given), country = thailand (given), district = Watthana District (not given, but should be), sub-district = Sukhumvit (not given, but should be), city = Bangkok, province = Bangkok Metropolitan Area (not given, but should be), etc.
Can I make this change myself to let photon return all known data that is relevant to the query? If I would have queried a US street I want to know all data that is available too, so in that case it should return the state, among other fields.
I hope I expressed my request clearly, if not, please tell me. ;)
There are a bunch of existing test cases (from Nominatim, komoot, ...) that can be integrated into our own test framework to help us find bugs and improvements.
How does one import data without installing Nominatim? The readme says
curl http://localhost:4567/dump/import # not working yet!
so what should one do...?
Could you elaborate on the search-as-you-type feature? Does it create a new query for every word that is inserted?
Moreover, is there an online demo available that I could check out?
Any feedback is highly appreciated.
Tom
When searching for extended items like germany
you will see a marker in a wood at a very high zoom level.
Currently only the centroid of the geometry is stored. If we stored the extent of items too, we could adjust the zoom level so the entire item is visible on the map.
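Storing the extent alongside the centroid amounts to computing the bounding box of the geometry's coordinates at import time, so a client can zoom to fit the whole item instead of guessing a zoom level from a point. A minimal sketch:

```python
# Sketch: centroid vs. extent of a geometry, over (lon, lat) pairs.

def bounding_box(coords):
    """coords: list of (lon, lat) pairs; returns (min_lon, min_lat, max_lon, max_lat)."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return (min(lons), min(lats), max(lons), max(lats))

def centroid(coords):
    """Simple vertex average; fine as a marker position sketch."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return (sum(lons) / len(lons), sum(lats) / len(lats))

way = [(13.0, 52.0), (13.5, 52.2), (13.2, 52.6)]
extent = bounding_box(way)   # what the client would zoom to
marker = centroid(way)       # what photon currently returns
```

Elasticsearch itself does not need to index the box for this use case; returning it as a stored "extent" property on each feature would suffice.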
Instead of creating an index called 'photon', we should think about creating an index called e.g. 'photon_1' or 'photon_' and then adding an alias called 'photon'. This would make it easy in production to feed a new index without touching the old one, and to switch quickly. Not sure if this should be part of this project, but it is an easy change and increases flexibility.
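The switch can be done atomically with Elasticsearch's _aliases endpoint. Here is a sketch that only builds the actions payload (index names are illustrative); a real deployment would POST it to the cluster's /_aliases endpoint.

```python
# Sketch: build the _aliases actions payload that atomically repoints
# the "photon" alias from the old index to the freshly fed one.

def alias_swap_actions(alias, old_index, new_index):
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

payload = alias_swap_actions("photon", "photon_1", "photon_2")
# POSTing this payload to /_aliases applies both steps in one atomic request,
# so searches against the "photon" alias never see a half-switched state.
```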
... and let other people know about your feature on the project website
Sometimes OSM items have more than one name, stored in tags like int_name, alt_name, loc_name, official_name ...
photon would benefit from considering these names too.
It always depends on how the search mechanism should work, but from my point of view filters could be very useful in general (and would be for our case).
Filters:
Provide a way to filter results, e.g.
q=Hauptstraße&city=Salzburg
and the result will only show results from the city of Salzburg.
And as an idea:
q=Hauptstraße&city=Salzburg&city:mode=fuzzy
this result will also include fuzzy matches, like Salzberg and Salzbrug (typos).
Filters could be:
Why:
Often, when searching for addresses, the user is confronted with partial and incorrect addresses. The only thing they know is part of the name and in which city, or part of the city, the address is. This would help greatly for people who have to research addresses.
For frontend-development (i.e. the komoot main site), the query could be totally adjusted to the needs or even user preferences just by the passed parameters.
Additionally, as a crazy idea (I don't know if it is possible with Elasticsearch):
Provide a "near" parameter, so you have:
q=Haubdstr&near=(Salzburg OR lat/lon)
and the search term can match less strictly the nearer a result is to the point provided by "near".
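The filter part of this proposal can be sketched as follows. The parameter names (city, city:mode=fuzzy) are the proposal's, not an existing photon API, and the fuzziness here is deliberately a cheap same-length comparison rather than a full edit-distance implementation.

```python
# Sketch: exact and fuzzy city filtering over result records.
# Field names and the fuzzy heuristic are illustrative assumptions.

def edits_within(a, b, limit=2):
    """Cheap bound: equal-length strings differing in <= limit positions.
    A real implementation would use Levenshtein distance."""
    if a == b:
        return True
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= limit

def filter_by_city(results, city, fuzzy=False):
    city = city.lower()
    if fuzzy:
        return [r for r in results if edits_within(r["city"].lower(), city)]
    return [r for r in results if r["city"].lower() == city]

results = [
    {"name": "Hauptstraße", "city": "Salzburg"},
    {"name": "Hauptstraße", "city": "Salzberg"},  # typo variant
    {"name": "Hauptstraße", "city": "Wien"},
]
exact = filter_by_city(results, "Salzburg")              # only Salzburg
fuzzy = filter_by_city(results, "Salzburg", fuzzy=True)  # Salzburg + Salzberg
```

In Elasticsearch the exact mode maps to a term filter on the city field and the fuzzy mode to a fuzzy match query; the "near" idea corresponds to combining that with a distance-decay function score.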
QueryParsingException[[photon] script_score the script could not be loaded
Where is this script?