Comments (7)
Hi Daniel,
Currently the results of a term search in Snowstorm are sorted by length only, so shorter terms come back first.
Alphabetical sorting is not currently implemented for any language, so there has been no language-specific sorting for languages other than English either.
Sorting by length seems to work well. Is this sufficient for you?
Kai
from snowstorm.
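Length-only ordering amounts to a stable sort on term length. Below is a minimal Python sketch of it. It assumes the matching description terms have already been retrieved. The terms are illustrative and this is not Snowstorm's own code:

```python
# Illustrative matched terms for a search such as "heart" (made-up data).
terms = [
    "Heart structure",
    "Heart",
    "Entire heart",
    "Heart disease",
]

def sort_by_length(found: list[str]) -> list[str]:
    """Order terms shortest first. sorted() is stable, so terms of equal
    length keep their incoming (e.g. relevance) order."""
    return sorted(found, key=len)

print(sort_by_length(terms))  # ['Heart', 'Entire heart', 'Heart disease', 'Heart structure']
```

Because the sort is stable, ties are not broken alphabetically. Adding an alphabetical tie-break would require a language-aware collation, and that is exactly what is missing for non-English languages.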
Hi Kai,
this is not so much about sorting (which is relevant as well) as it is about character matching. In Swedish, o and ö are distinct characters and should not match, while in German, for example, ö is just a variant (umlaut) of o, and there they do match. This is captured in different collation rules for each language.
So this is quite an important function for non-English languages, and Elasticsearch has basic support for it, see https://www.elastic.co/guide/en/elasticsearch/guide/master/character-folding.html
/Daniel
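The Swedish and German rules differ in which letters get folded. This can be sketched as diacritic folding with per-language exceptions. The minimal Python sketch below uses only `unicodedata`. The `PROTECTED` table and the `fold`/`matches` helpers are hypothetical. Real collation rules, and Elasticsearch's folding filters, cover far more than three letters:

```python
import unicodedata

# Hypothetical per-language exceptions: letters that are separate letters of the
# alphabet in that language and must therefore NOT be folded to a base letter.
PROTECTED = {
    "sv": set("åäöÅÄÖ"),  # Swedish: å, ä and ö are distinct letters
    "de": set(),          # German: umlauts fold to their base letter (ö -> o)
}

def fold(text: str, lang: str) -> str:
    """Lowercase text and strip diacritics, except on letters the language protects.
    Unknown languages fall back to folding everything."""
    protected = PROTECTED.get(lang, set())
    out = []
    for ch in text:
        if ch in protected:
            out.append(ch.lower())
            continue
        # NFD splits "ö" into "o" + U+0308 COMBINING DIAERESIS; drop the combining mark.
        base = "".join(c for c in unicodedata.normalize("NFD", ch) if not unicodedata.combining(c))
        out.append(base.lower())
    return "".join(out)

def matches(query: str, term: str, lang: str) -> bool:
    """Substring match after applying the same language-specific folding to both sides."""
    return fold(query, lang) in fold(term, lang)

print(matches("mork", "mörk", "sv"))  # False: ö is its own letter in Swedish
print(matches("mork", "mörk", "de"))  # True: ö is treated as a variant of o
```

The linked Elasticsearch guide follows a similar idea. It folds all characters except a protected Unicode set, such as `[^åäöÅÄÖ]` for Swedish.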
For reference, I've added MongoDB collations to the sct-snapshot-rest-api in this commit: danka74/sct-snapshot-rest-api@e704c32
I did some testing with a local installation of Snowstorm. Currently it seems that strings are matched as binary, without any collation rules: e.g. searching for "magyar agar" returns no hits, whereas "magyar agår" returns 132436001 | Magyar Agår dog breed (organism) |. The snapshot API, by contrast, uses hardcoded folding of characters (e.g. 'å' becomes 'a') if selected.
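The tiny Python sketch below reproduces this behaviour. It contrasts character-exact (binary) matching with the kind of diacritic folding the snapshot API applies. The helpers are hypothetical and model only the matching step, not Snowstorm's actual query:

```python
import unicodedata

TERM = "Magyar Agår dog breed (organism)"

def binary_match(query: str, term: str) -> bool:
    """Case-insensitive, but every other character must be identical."""
    return query.lower() in term.lower()

def fold(s: str) -> str:
    """Strip all diacritics (e.g. "å" -> "a") and lowercase."""
    return "".join(c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)).lower()

def folded_match(query: str, term: str) -> bool:
    return fold(query) in fold(term)

for q in ("magyar agar", "magyar agår"):
    print(q, binary_match(q, TERM), folded_match(q, TERM))
# magyar agar False True
# magyar agår True True
```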
Hi @danka74, sorry for the slow response, I've been away.
Yes, the current behaviour is to not convert any special characters to a simpler form during search but match using the variant given. It sounds like this is not adequate for some languages like German. Thanks for your example to help me understand this.
Although we have the language code to hand when we index Description components, it may not be necessary to change the analyser at index time. The simplest approach may be to rely on the request language header and to use a different search analyser based on the language being requested. If terms in the German language are being requested, both the exact characters in the search string and the folded version could be used to match descriptions. Matches against the original search characters should probably be given a greater search score. Would that work for you?
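The search-time proposal can be sketched in plain Python. The sketch assumes the request language arrives as an HTTP `Accept-Language` header. The `FOLDING_LANGUAGES` set and the scores of 2 (exact) and 1 (folded only) are hypothetical. In Snowstorm itself this would be expressed through Elasticsearch analysers and query boosts rather than in application code:

```python
import unicodedata

# Hypothetical: languages where an accent-insensitive (folded) match is acceptable.
FOLDING_LANGUAGES = {"de"}

def fold(s: str) -> str:
    """Strip diacritics and lowercase ("Tür" -> "tur")."""
    return "".join(c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)).lower()

def primary_language(accept_language: str) -> str:
    """'de-DE,de;q=0.9' -> 'de'. Simplified: uses the first tag and ignores q-values."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    return first.split("-")[0].lower()

def score(query: str, term: str, accept_language: str) -> int:
    if query.lower() in term.lower():
        return 2  # the exact search characters match: best score
    if primary_language(accept_language) in FOLDING_LANGUAGES and fold(query) in fold(term):
        return 1  # only the folded form matches: lower score
    return 0

def search(query: str, terms: list[str], accept_language: str) -> list[str]:
    scored = [(score(query, t, accept_language), t) for t in terms]
    # Highest score first; sorted() is stable, so ties keep their input order.
    return [t for s, t in sorted(scored, key=lambda pair: -pair[0]) if s > 0]

print(search("tur", ["Tür", "Turm"], "de-DE,de;q=0.9"))  # ['Turm', 'Tür']
print(search("tur", ["Tür", "Turm"], "sv-SE"))           # ['Turm']
```

With German requested, the folded match "Tür" is still returned, but it ranks below the exact match. With Swedish, it is not matched at all.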
For the record: Daniel and I have started a branch to collaborate on this feature. We will play around to find the best Elasticsearch settings. We have identified that it would be best to set the correct Elasticsearch language analyser at index time. One possible solution is to use the Description language code field during import or component creation to choose the analyser. We will continue to play with this as time allows.
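Choosing the analyser at index time from the Description's language code could look roughly like the sketch below. Several parts are assumptions. The per-language field names (`term_sv`, `term_de`), the analyser names and the Swedish filter are all hypothetical. The filter mirrors the ICU folding example in the Elasticsearch documentation and requires the `analysis-icu` plugin. Snowstorm's real mapping may be organised differently. Elasticsearch assigns analysers per field rather than per document, so the sketch uses the language code to route each term into a language-specific field:

```python
import json

# Hypothetical per-language analysers. The Swedish filter folds everything
# except å, ä and ö (icu_folding needs the analysis-icu plugin).
SETTINGS = {
    "analysis": {
        "filter": {
            "swedish_folding": {"type": "icu_folding", "unicode_set_filter": "[^åäöÅÄÖ]"},
        },
        "analyzer": {
            "folding_sv": {"tokenizer": "icu_tokenizer", "filter": ["swedish_folding", "lowercase"]},
            "folding_de": {"tokenizer": "standard", "filter": ["lowercase", "asciifolding"]},
        },
    }
}

# Hypothetical: one text field per supported language code.
LANGUAGE_FIELDS = {"sv": "term_sv", "de": "term_de"}

MAPPINGS = {
    "properties": {
        "term": {"type": "text"},  # fallback field, default (standard) analyser
        **{field: {"type": "text", "analyzer": f"folding_{lang}"}
           for lang, field in LANGUAGE_FIELDS.items()},
    }
}

def to_index_document(description: dict) -> dict:
    """Put the Description's term into the field whose analyser matches its languageCode."""
    field = LANGUAGE_FIELDS.get(description["languageCode"], "term")
    return {"conceptId": description["conceptId"], field: description["term"]}

# Body that could be sent when creating the index.
print(json.dumps({"settings": SETTINGS, "mappings": MAPPINGS}, indent=2))
```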
Closing as a duplicate of #41, which has had more recent chatter and is now fixed in dev.
Related Issues (20)
- Add Target active status to relationship
- getting error while uploading extensions
- Dependant Version Effective Time is set with the same value for all versions
- Trying to integrate kibana here
- HOW TO RUN SNOWSTORM IN ECS CLUSTER?
- Running snowstorm on OpenJDK 11.0.11 fails - (Looks like disk space issue)
- Snowstorm FHIR API does not interpret ECL v2.0 query correctly (returning refset member fields)
- Implicit ValueSets created via $expand should have a status and a timestamp
- Failed RF2 SNAPSHOT import on branch MAIN/SNOMEDCT-ES
- Problem with ECL matching role grouped [0..0] cardinality
- Loading concepts with embedded definitions
- Performance improvements
- Problem with language in a query via api
- Error un load snomed in snowstorm
- Get ICD10 Code
- Classifications
- --exit flag is not working properly
- API FHIR validation
- Failed to import IPS Terminology RF2 file
- get problems according to human body system