komoot / photon
An open source geocoder for OpenStreetMap data.
License: Apache License 2.0
Do you intend to move the query Python code into Java, e.g. via Spark?
There should be another filter parameter in the API to search in a specific area (bounding box) for a specific tag combination or even a set of combinations ('I want food').
For that, osm_key and osm_value have to be indexed in the mapping. Or maybe we make this configurable if it increases the index size too much.
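As an illustration of the proposed filter (not photon's actual API), here is a sketch of filtering candidate places by a bounding box plus a set of osm_key/osm_value combinations such as "I want food". The record layout is a simplified, hypothetical version of photon's document fields.

```python
# Hypothetical sketch: bbox + tag-combination filtering over place records.
# Field names (lon/lat/osm_key/osm_value) are assumptions for illustration.

def in_bbox(place, bbox):
    """bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= place["lon"] <= max_lon and min_lat <= place["lat"] <= max_lat

def matches_tags(place, tag_combinations):
    """tag_combinations is a set of (osm_key, osm_value) pairs;
    a None value acts as a wildcard for that key."""
    for key, value in tag_combinations:
        if place["osm_key"] == key and value in (None, place["osm_value"]):
            return True
    return False

def filter_places(places, bbox, tag_combinations):
    return [p for p in places if in_bbox(p, bbox) and matches_tags(p, tag_combinations)]

places = [
    {"name": "Trattoria", "lon": 11.0, "lat": 49.6, "osm_key": "amenity", "osm_value": "restaurant"},
    {"name": "Bakery", "lon": 11.1, "lat": 49.5, "osm_key": "shop", "osm_value": "bakery"},
    {"name": "Far pub", "lon": 2.0, "lat": 48.8, "osm_key": "amenity", "osm_value": "pub"},
]
food = {("amenity", "restaurant"), ("amenity", "fast_food"), ("shop", "bakery")}
hits = filter_places(places, bbox=(10.5, 49.0, 11.5, 50.0), tag_combinations=food)
# "Far pub" is dropped by the bbox even though it matches a food tag.
```

In Elasticsearch this would map naturally onto a bool query with a geo_bounding_box filter plus a terms filter on the indexed osm_key/osm_value fields.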
Photon should be able to do searches over all available language variants mapped in OSM (i.e. all name:* tags). Bonus points if it can improve search results by guessing the language of the query correctly and re-weighting the results accordingly.
The README indicates that the demo UI is in src/main/python/demo, but it seems to be in website/. Or is that not the demo UI in that directory?
How does location bias work?
On http://photon.komoot.de/api results seem to be only loosely influenced by different lat|lon parameters, and some results that are much farther from the given point come before others that are much nearer.
Is it possible to order results based on distance from the point given as lat|lon?
Would it be possible to provide a bounding box to search, so that search wouldn't return results outside that box?
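For the strict-ordering question, here is a sketch of pure distance sorting around the lat|lon bias point, assuming results carry plain lat/lon fields. Note that photon's location bias mixes distance into the text relevance score rather than sorting purely by distance, which explains the "loose" influence observed above.

```python
# Sketch: order geocoding results strictly by great-circle distance
# from a bias point. Field names on the result dicts are assumptions.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sort_by_distance(results, lat, lon):
    return sorted(results, key=lambda f: haversine_km(lat, lon, f["lat"], f["lon"]))

results = [
    {"name": "far", "lat": 48.1, "lon": 11.6},   # roughly Munich
    {"name": "near", "lat": 52.5, "lon": 13.4},  # roughly Berlin
]
ordered = sort_by_distance(results, 52.52, 13.40)  # bias point in Berlin
```

A bounding-box restriction (the second request) could be layered on top by simply discarding results outside the box before sorting.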
We should track photon's search performance, so we can improve photon's search logic and know whether it is getting better.
see https://hackpad.com/Photon-sprint-2-r8OrAVWa2oI#:h=Testcases-and-user-feedback
Using word endings is kind of a workaround for edge n-gram searches: a query like
berlin erlange
would match berlinerstraße erlangen,
but should better match only things like 'berlin erlange*'.
Given this workaround, why not avoid edge n-grams entirely: tokenize the query and do a prefix query for the last term? This would save disk space and memory at the same quality. The only concern could be performance, but my simple tests on small data didn't reveal any problems there.
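The contrast between the two strategies can be sketched as follows: instead of indexing edge n-grams for every term, tokenize the query and treat only the last (possibly unfinished) term as a prefix. This is an illustrative model, not photon's implementation.

```python
# Sketch: edge n-grams vs. tokenize + prefix-match the last term.

def edge_ngrams(term, min_gram=2, max_gram=15):
    """All leading substrings of a term, as an edge n-gram analyzer would emit."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

def prefix_query_match(query, document_terms):
    """All query terms must match exactly, except the last one,
    which only needs to be a prefix of some document term."""
    tokens = query.lower().split()
    if not tokens:
        return False
    *full, last = tokens
    docs = [t.lower() for t in document_terms]
    return all(t in docs for t in full) and any(d.startswith(last) for d in docs)

# "berlin erlange" matches the document ["Berlin", "Erlangen"] ...
assert prefix_query_match("berlin erlange", ["Berlin", "Erlangen"])
# ... but not ["Berlinerstraße", "Erlangen"]: only the final term is a prefix.
assert not prefix_query_match("berlin erlange", ["Berlinerstraße", "Erlangen"])
```

The space saving comes from not materializing `edge_ngrams(...)` for every indexed term; the prefix behavior is pushed to query time instead.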
You mention in the readme that it can take up to 12 days to create the Nominatim database dump from scratch. I spent a bit of time recently coming up with a way to convert the OSM planet XML dump to a joined JSON data set that you can build in around 15 hours. All it does is join nodes, ways, and relations, but from there it should be quite easy to extract whatever information you need without having to set up Nominatim.
I've actually been considering building a little geocoder on top of Elasticsearch based on this, so I'm quite interested in your experiences with photon.
If you are interested, the project for my conversion tool is here: https://github.com/jillesvangurp/osm2geojson
Searching
Via Ca' de Volpi
yields no result even though a street with that exact name exists; see this way.
Thanks.
Cristian
Following the initial step from the quickstart (at https://github.com/komoot/photon) fails ...
# important: we do not yet provide this dump, creation will be finished soon
java -jar target/photon-0.1-SNAPSHOT.jar -import-snapshot http://photon.komoot.de/data/world.zip
... because http://photon.komoot.de/data/world.zip is not available any more!
Where can I download world.zip?
The Nominatim importance field is not imported from Nominatim to Solr.
I think it should be available for use in boost functions in some cases:
I plan to use photon with a map and sort search results according to map bounds/center.
But...
I would like queries like "Eiffel tower" or "Statue of Liberty" to push the original places first or at least fairly up in the search results (I can accept that if a replica is in or near map bounds, it shows up first).
I believe this could be accomplished by introducing the importance field from Nominatim and a custom query.
I think I can deal with the Solr query, but the import part is tougher for me, since I am not exactly an experienced Java developer...
I'll give it a try (unless you think I am totally wrong), but if anyone feels they can achieve this "easily", they're welcome.
When searching for street + house number, e.g. bödmerstraße 7, the right address can be found because both tokens can be found in the collector's raw field. Searching for bödmerstr 7 won't be successful: bödmerstr is found in the edge n-gram field, but 7 got stripped because the edge n-gram starts at 2 characters:
"photonngram": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
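Why "7" disappears can be seen by simulating the analyzer: with min_gram=2, an edge n-gram tokenizer emits no token at all for a single-character term, so the house number can never match in the n-gram field. A minimal sketch:

```python
# Sketch of edge n-gram token generation with the mapping above
# (min_gram=2, max_gram=15): single characters produce no tokens.

def edge_ngrams(term, min_gram=2, max_gram=15):
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

tokens_street = edge_ngrams("bödmerstr")  # ['bö', 'böd', ..., 'bödmerstr']
tokens_number = edge_ngrams("7")          # [] : nothing to match against
```

Possible fixes would be lowering min_gram to 1 or routing short numeric tokens to a separate field that is matched exactly.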
Check the performance and find bottlenecks of photon to evaluate usage on osm.org.
Maybe Sarah can provide some logs of search terms from Nominatim. At komoot we also have logs of this kind.
mount everest
can be found in photon, but many users are used to typing mt everest
and currently get no results.
Synonyms from the old Solr implementation:
https://github.com/komoot/photon/tree/master/solrconfig/testing/conf
pelias created a list too https://github.com/mapzen/pelias/blob/master/lib/pelias/tasks/synonym.rake
https://github.com/mapzen/pelias/blob/master/config/feature_synonyms.yml
Installing Nominatim and importing OSM files is time consuming. People who don't care about continuous updates will be very happy if we provide data dumps (world / country extracts in the most common languages). Setting up photon will then be dead easy and fast.
I wanted to try photon with Italy data. I started the import yesterday and it is not over yet: how do I see how far along it is? Is there an import log available?
Thank you
Alessandra
While trying to build photon I get the following build failure:
/photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,17] error: unmappable character for encoding ASCII
The same error appears at other places in the same file; it would seem the character used in the source for the single quote in error messages ("Can´t...") is not mappable in ASCII: probably a curly quote instead of a straight single quote?
Would it be possible to search/replace that character in the source?
I think "Can't" would work, but "Can´t" can't.
Full stack trace below:
[DEBUG] Source roots:
[DEBUG] /photon/src/main/java
[INFO] Compiling 16 source files to /photon/target/classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.332s
[INFO] Finished at: Wed May 28 09:05:31 UTC 2014
[INFO] Final Memory: 15M/481M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project photon: Compilation failure: Compilation failure:
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[131,18] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[205,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[205,18] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[230,17] error: unmappable character for encoding ASCII
[ERROR]
[ERROR] /photon/src/main/java/de/komoot/photon/importer/elasticsearch/Server.java:[230,18] error: unmappable character for encoding ASCII
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project photon: Compilation failure
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:516)
at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:114)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Right now, the server keeps running after import and dump actions.
This might not be the desired behavior all the time. How about one of these solutions:
1.) Introduce a "-shutdown" parameter, that shuts down photon after executing imports/export/indexing.
2.) Shut down automatically after one of the i/o actions.
I would refactor the code in App.java and send it as a pull request if so desired. Also I can add more description to the docs about the behavior.
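A minimal sketch of option 1, a -shutdown flag that makes the process exit after an I/O action instead of keeping the server running. The flag name and CLI structure are illustrative, not photon's actual interface (which is Java; the equivalent there would be a System.exit(0) after the action).

```python
# Hypothetical sketch of a "-shutdown" CLI flag (illustrative only).
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="photon")
    parser.add_argument("-import", dest="do_import", action="store_true")
    parser.add_argument("-shutdown", dest="shutdown", action="store_true",
                        help="exit after the requested I/O action instead of serving")
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    if args.do_import:
        pass  # ... perform the import here ...
    if args.shutdown:
        return "exit"   # shut down after import/export/indexing
    return "serve"      # default: keep the HTTP server running
```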
I would like at some time to import other sources than OSM into Photon, like the datasets from openaddresses.io or the BANO project, as soon as the licence is compliant.
This will need some changes in the mapping, like having source and source_id keys (instead of just osm_id).
And in the importer, we will need to find a way not to import the data from OSM where it can duplicate these other sources (which we consider "better" than OSM if we chose to import them), like skipping all place=house if country=France, etc.
I will work on importing some BANO data, to have a first use case that we can discuss.
When I was trying to upload Iceland data I got some errors about missing fields. I had to add these fields to the schema:
Probably I made some errors during the installation.
Hi,
I tried to use NominatimImporter with the options described in the documentation sample, but it seems that the -l option doesn't exist (reading the source code, I can't find it).
greetings
Mirko
Currently small typos like jamiaca
instead of jamaica
return no results. Typos are very common in the world of search engines; surely there are already best practices to solve this with Elasticsearch.
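Elasticsearch handles this class of typo with fuzzy queries based on edit distance (Lucene's fuzzy matching additionally counts a transposition as a single edit). A minimal plain Levenshtein sketch shows that "jamiaca" is within the common fuzziness limit of 2 from "jamaica":

```python
# Sketch: plain Levenshtein edit distance (insert/delete/substitute),
# the measure underlying fuzzy queries.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

assert levenshtein("jamiaca", "jamaica") == 2  # within fuzziness 2
```

So enabling a fuzzy match (or Elasticsearch's "fuzziness": "AUTO") on the name fields should recover this query.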
Searching for the village mittelberg
reveals a lot of places in mittelberg, but the actual village cannot be found.
To show the results for areas at the correct scale, it would be helpful if the bounding box could be provided directly as part of the result. With the current OSM web services, one would either request the full OSM data via Overpass/XAPI or try to get the matching result from Nominatim (which unfortunately does not support querying by osm_id).
Another solution to the problem described in #56 could be to use an address parser. If the search backend is in Java, one could use this address parser here
Sometimes one needs to do a geocoding lookup on an address that contains irrelevant data, such as an unknown condominium name. For example when searching for "Grand Parkview Asoke Unit 233/11 Sukhumvit 21" then none of the information is in OSM's database other than "Sukhumvit 21".
I would consider this an example of a weak partial match. For my use case, such irrelevant data needs to be ignored and therefore partial matches need to be returned too. E.g. http://photon.komoot.de/api/?q=grand%20parkview%20asoke%20sukhumvit%2021 should return the same as http://photon.komoot.de/api/?q=sukhumvit%2021
Would it be possible to add a score to each returned feature, so that one can programmatically determine whether a result's score is high enough to be displayed? This varies by application.
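Each Elasticsearch hit already carries a _score, so one way to support this (a sketch, not photon's current behavior) is to copy that score into each GeoJSON feature and let the application apply its own cutoff for weak partial matches:

```python
# Sketch: client-side score thresholding over GeoJSON-like features.
# The "score" property is an assumed addition, not an existing photon field.

def filter_by_score(features, min_score):
    return [f for f in features if f["properties"].get("score", 0.0) >= min_score]

features = [
    {"properties": {"name": "Sukhumvit 21", "score": 10.8}},
    {"properties": {"name": "weak partial match", "score": 1.2}},
]
good = filter_by_score(features, min_score=5.0)
```

Note that raw Lucene scores are not comparable across queries, so applications would still need to calibrate the threshold per use case.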
Is it expected that some entries don't have a 'default' or 'en' entry? Even worse, some contain 'null' (not printed here as gson is too clever).
{
"_index": "photon",
"_type": "place",
"_id": "1406467",
"_score": 10.824916,
"_source": {
"osm_key": "place",
"city": {
"default": "Frankfort"
},
"osm_value": "village",
"osm_id": "306444989",
"ranking": 11,
"context": {
"default": "Saint Mary, Middlesex County"
},
"id": "1406467",
"country": {
"de": "Jamaika",
"it": "Giamaica",
"fr": "Jamaïque"
},
"coordinate": "18.418850,-77.053417",
"name": {
"default": "Frankfort"
}
}
}
Hi there,
I downloaded a sub-region from geofabrik.de, imported it into the Nominatim database and finally started photon. Everything works like a charm, but unfortunately the query localhost:2322/api?q=Gutenberg results in a "500 Internal Error"; the console output is:
"Error spark.webserver.MatchFilter - java.lang.NullPointerException"
I have no idea what's wrong, do you have any suggestions?
Cheers
Ivan
Possible changes to the importer to speed up the process and reduce IO load:
- query place_addressline directly instead of using get_addressdata
I am trying to install photon on Ubuntu. I have manually installed Solr and tried to load the photon config. Solr is not able to load collection1 without the core.properties file.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
It's a sort of "dynamic" stop-word handling.
One more normalization has to be done to improve search, e.g. Erlangerstraße should be split into "Erlanger straße". This has to be done both while indexing and while searching.
There is a plugin, but it is GPL due to one library it uses; it could be less restrictive, but Python code would have to be ported to Java: jprante/elasticsearch-analysis-decompound#5
For POIs there is also often the Bahnhof vs. Hauptbahnhof problem. But a main railway station that is not named like this should probably also rank as important in a different country; probably this should be handled via a different fix: #318
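A naive decompounding sketch, assuming a small hand-picked list of compound tails (real decompounders like the plugin above use dictionaries or statistical models): split "Erlangerstraße" into its stem and suffix at both index and query time.

```python
# Sketch: suffix-list decompounding for German street/place compounds.
# The suffix list is an illustrative assumption, not a complete dictionary.

SUFFIXES = ("straße", "strasse", "platz", "weg", "gasse")

def decompound(token):
    low = token.lower()
    for suffix in SUFFIXES:
        if low.endswith(suffix) and len(low) > len(suffix):
            return [low[: -len(suffix)], suffix]
    return [low]

tokens = decompound("Erlangerstraße")  # ['erlanger', 'straße']
```

Emitting both sub-tokens alongside the original term would let "Erlanger straße" and "Erlangerstraße" match each other.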
Hi,
You mention in readme "It takes up to 10 days and sufficient RAM to import the entire world,".
Could you define "sufficient RAM"?
I may be about to rent a new server for this purpose (unless "sufficient" means 300GB...).
yohan created a leaflet plugin (https://github.com/komoot/leaflet.photon) to easily use photon's public API in a Leaflet map.
It is still a secret, so let people know about it on the project page!
This is only reproducible on local installations:
Some queries return, after their normal results, several entries with completely wrong data. E.g., using an extracted part of Austria, the query
http://localhost:2322/api?lang=de&q=Kaisersch%C3%BCtzenstra%C3%9Fe%2017
returns normal results, and afterwards there are several entries like:
{
"properties": {
"osm_id": 31344,
"postcode": "17503",
"osm_value": "postcode",
"country": "Vereinigte Staaten von Amerika"
},
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-76.08483701582313,
39.93753065153189
]
}
},
Which shouldn't even be in the extract. Is there an error in the extraction, or does photon "invent" these results?
One could try this analyzer. Not sure if this is necessary if fuzzy matching is enabled; e.g. fuzzy matching is already able to match strasse vs. straße.
It is likely that one street in OSM is composed of multiple independent OSM ways, e.g.:
At the moment, when searching for the street Bödmerstraße,
photon returns multiple results for the same street. Ideally, connected ways with the same name would be merged at index time.
Address geocoding on a worldwide level is complicated for many reasons, of which one is that every single country lists addresses in different ways. For example, in some countries provinces and states are important (Netherlands), whereas in others districts and sub-districts are more important (Thailand).
When I query your API for a road in Bangkok, Thailand, like so:
http://photon.komoot.de/api/?q=sukhumvit
I get the following:
{
"features": [
{
"properties": {
"name": "Sukhumvit",
"osm_value": "neighbourhood",
"country": "Thailand",
"osm_id": 2203974487,
"osm_key": "place",
"postcode": "10110"
},
"geometry": {
"coordinates": [
100.565073,
13.73384
],
"type": "Point"
},
"type": "Feature"
},
{
"properties": {
"country": "Thailand",
"name": "Sukhumvit Road",
"osm_value": "tertiary",
"street": "Sukhumvit Road",
"osm_id": 232865089,
"osm_key": "highway"
},
"geometry": {
"coordinates": [
102.45796,
12.18993
],
"type": "Point"
},
"type": "Feature"
},
... ETC ...
],
"type": "FeatureCollection"
}
The first entry is the one I need, but the problem is that there is a lot more data in the OSM database about this entry than your API shows. For example, it should return the following entries when available, just like Nominatim does:
- (Object) administrative
- (Object) attraction
- (Object) city
- (Object) city_district
- (Object) clothes
- (Object) commercial
- (Object) country
- (Object) country_code
- (Object) county
- (Object) house_number
- (Object) pedestrian
- (Object) place
- (Object) postcode
- (Object) road
- (Object) state
- (Object) state_district
- (Object) suburb
- (Object) town
- (Object) village
In the case of my query we already know this data: postcode = 10110 (given), country = thailand (given), district = Watthana District (not given, but should be), sub-district = Sukhumvit (not given, but should be), city = Bangkok, province = Bangkok Metropolitan Area (not given, but should be), etc.
Can I make this change myself to let photon return all known data that is relevant to the query? If I would have queried a US street I want to know all data that is available too, so in that case it should return the state, among other fields.
I hope I expressed my request clearly, if not, please tell me. ;)
There are a bunch of existing test cases (from Nominatim, komoot, ...) that can be integrated into our own test framework to help us find bugs and improvements.
How does one import data without installing Nominatim? The readme says
curl http://localhost:4567/dump/import # not working yet!
so what should one do...?
Could you elaborate on the search-as-you-type feature? Does it create a new query for every word that is inserted?
Moreover, is there an online demo available that I could check out?
Any feedback is highly appreciated.
Tom
When searching for extended items like germany
you will see a marker in a wood at a very high zoom level.
Currently only the centroid of the geometry is stored. If we stored the extent of items too, we could adjust the zoom level so the entire item is visible on the map.
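Storing the extent alongside the centroid amounts to computing the bounding box of the geometry's coordinates at import time, so a client can zoom to fit the whole item instead of guessing a zoom level from a point. A minimal sketch:

```python
# Sketch: centroid vs. extent of a geometry, over (lon, lat) pairs.

def bounding_box(coords):
    """coords: list of (lon, lat) pairs; returns (min_lon, min_lat, max_lon, max_lat)."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return (min(lons), min(lats), max(lons), max(lats))

def centroid(coords):
    """Simple vertex average; fine as a marker position sketch."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return (sum(lons) / len(lons), sum(lats) / len(lats))

way = [(13.0, 52.0), (13.5, 52.2), (13.2, 52.6)]
extent = bounding_box(way)   # what the client would zoom to
marker = centroid(way)       # what photon currently returns
```

Elasticsearch itself does not need to index the box for this use case; returning it as a stored "extent" property on each feature would suffice.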
Instead of creating an index called 'photon', we should think about creating an index called e.g. 'photon_1' or 'photon_' and then adding an alias called 'photon'. This would make it easy in production to feed a new index without touching the old one, and to switch quickly. Not sure if this should be part of this project, but it is an easy change and increases flexibility.
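The switch can be done atomically with Elasticsearch's _aliases endpoint. Here is a sketch that only builds the actions payload (index names are illustrative); a real deployment would POST it to the cluster's /_aliases endpoint.

```python
# Sketch: build the _aliases actions payload that atomically repoints
# the "photon" alias from the old index to the freshly fed one.

def alias_swap_actions(alias, old_index, new_index):
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

payload = alias_swap_actions("photon", "photon_1", "photon_2")
# POSTing this payload to /_aliases applies both steps in one atomic request,
# so searches against the "photon" alias never see a half-switched state.
```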
... and let other people know about your feature on the project website
Sometimes OSM items have more than one name, stored in tags like int_name, alt_name, loc_name, official_name ...
photon would benefit from considering these names too.
It always depends on how the search mechanism should work, but from my point of view filters could be very useful in general (and would be for our case).
Filters:
Provide a way to filter results, e.g.
q=Hauptstraße&city=Salzburg
and the result will only show results from the city of Salzburg.
And as an idea:
q=Hauptstraße&city=Salzburg&city:mode=fuzzy
this result will also include fuzzy matches, like Salzberg and Salzbrug (typos).
Filters could be:
Why:
Often, when searching for addresses, the user is confronted with partial and incorrect addresses. The only thing they know is part of the name and in which city, or part of the city, the address is. This would help greatly for people who have to research addresses.
For frontend-development (i.e. the komoot main site), the query could be totally adjusted to the needs or even user preferences just by the passed parameters.
Additionally, as a crazy idea (I don't know if it is possible with Elasticsearch):
Provide a "near" parameter, so you have:
q=Haubdstr&near=(Salzburg OR lat/lon)
and the search term can match less strictly the nearer a result is to the point provided by "near".
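The filter part of this proposal can be sketched as follows. The parameter names (city, city:mode=fuzzy) are the proposal's, not an existing photon API, and the fuzziness here is deliberately a cheap same-length comparison rather than a full edit-distance implementation.

```python
# Sketch: exact and fuzzy city filtering over result records.
# Field names and the fuzzy heuristic are illustrative assumptions.

def edits_within(a, b, limit=2):
    """Cheap bound: equal-length strings differing in <= limit positions.
    A real implementation would use Levenshtein distance."""
    if a == b:
        return True
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= limit

def filter_by_city(results, city, fuzzy=False):
    city = city.lower()
    if fuzzy:
        return [r for r in results if edits_within(r["city"].lower(), city)]
    return [r for r in results if r["city"].lower() == city]

results = [
    {"name": "Hauptstraße", "city": "Salzburg"},
    {"name": "Hauptstraße", "city": "Salzberg"},  # typo variant
    {"name": "Hauptstraße", "city": "Wien"},
]
exact = filter_by_city(results, "Salzburg")              # only Salzburg
fuzzy = filter_by_city(results, "Salzburg", fuzzy=True)  # Salzburg + Salzberg
```

In Elasticsearch the exact mode maps to a term filter on the city field and the fuzzy mode to a fuzzy match query; the "near" idea corresponds to combining that with a distance-decay function score.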
QueryParsingException[[photon] script_score the script could not be loaded
Where is this script?