
Comments (7)

kaicode commented on August 10, 2024

Hi Daniel,

Currently the results of a term search in Snowstorm are sorted by length only, so shorter terms come back first. Alphabetical sorting is not currently implemented for any language, so language-specific sorting for languages other than English has not been used.

Sorting by length seems to work well. Is this sufficient for you?

Kai

from snowstorm.

danka74 commented on August 10, 2024

Hi Kai,
This is not so much about sorting (which is relevant as well) as it is about character matching. In Swedish, o and ö are distinct characters and should not match, while in German, for example, ö is just a variant (umlaut) of o, and there they do match. These differences are captured in the collation rules for each language.
So this is quite an important function for non-English languages, and it has basic support in Elasticsearch; see https://www.elastic.co/guide/en/elasticsearch/guide/master/character-folding.html
/Daniel
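
To make the distinction concrete, here is a minimal Python sketch of language-aware character folding. The `PROTECTED` table and the `fold` helper are hypothetical names invented for illustration; the Swedish and German entries only restate the example in this comment, and a real system would rely on proper collation data rather than a hand-written table.

```python
import unicodedata

# Hypothetical per-language table of characters that count as distinct
# letters and must survive folding. The entries only mirror the example
# in this thread; they are not taken from real collation data.
PROTECTED = {
    "sv": set("åäöÅÄÖ"),  # Swedish: å, ä, ö are letters in their own right
    "de": set(),          # German: per the comment above, ö may match o
}


def fold(text: str, lang: str) -> str:
    """Lowercase `text` and strip diacritics, except for characters that
    the given language treats as distinct letters."""
    keep = PROTECTED.get(lang, set())
    out = []
    # NFC first, so that "o" + combining diaeresis is seen as a single "ö".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).lower()


if __name__ == "__main__":
    print(fold("Öl", "de"))  # folded form used for German matching
    print(fold("Öl", "sv"))  # Swedish keeps ö distinct from o
```

With this sketch, a German search key for "Öl" collides with "ol", while the Swedish key does not, which is the behaviour the collation rules are meant to capture.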


danka74 commented on August 10, 2024

For reference, I've added MongoDB collations to the sct-snapshot-rest-api in this commit: danka74/sct-snapshot-rest-api@e704c32


danka74 commented on August 10, 2024

I did some testing with a local installation of Snowstorm. Currently it seems that strings are matched as-is (binary), without using any collation rules: searching for "magyar agar" returns no hits, whereas "magyar agår" returns 132436001 | Magyar Agår dog breed (organism) |. The snapshot-api, by contrast, uses hardcoded folding of characters (e.g. 'å' becomes 'a') if selected.
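
The difference between the two behaviours can be reproduced in a few lines of Python. This is only an illustration of the matching logic, not Snowstorm or snapshot-api code; `strip_marks`, `plain_match`, and `folded_match` are hypothetical helpers, and the term is the one quoted above.

```python
import unicodedata

TERM = "Magyar Agår dog breed"


def strip_marks(text: str) -> str:
    """Remove combining marks, so that 'å' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def plain_match(query: str, term: str) -> bool:
    # Case-insensitive, but every character must match as given.
    return query.lower() in term.lower()


def folded_match(query: str, term: str) -> bool:
    # Fold both sides before comparing, as a hardcoded folding step would.
    return strip_marks(query).lower() in strip_marks(term).lower()


if __name__ == "__main__":
    for q in ("magyar agar", "magyar agår"):
        print(q, plain_match(q, TERM), folded_match(q, TERM))
```

Without folding, only the query spelled with 'å' finds the term; with folding on both sides, both spellings do.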


kaicode commented on August 10, 2024

Hi @danka74, sorry for the slow response; I've been away.

Yes, the current behaviour is not to convert any special characters to a simpler form during search, but to match using the variant given. It sounds like this is not adequate for some languages, such as German. Thanks for your example; it helped me understand this.

Although we have the language code to hand when we index Description components, it may not be necessary to change the analyser at index time. The simplest approach may be to rely on the request language header and to use a different search analyser based on the language being requested. If terms in the German language are being requested, both the exact characters in the search string and the folded version could be used to match descriptions. Matches against the original search characters should probably be given a greater search score. Would that work for you?
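
As a rough sketch of that idea, the Python below builds an Elasticsearch query body that always matches the exact characters and, for languages where folding is acceptable, also matches a folded variant at a lower score. The field names `term` and `term.folded`, the `FOLDING_LANGUAGES` set, and the helper functions are assumptions made for illustration and do not reflect Snowstorm's actual mapping. `FOLDED_ANALYSIS` shows one way such a folded subfield could be analysed, using Elasticsearch's `asciifolding` token filter.

```python
# Assumed analysis settings for a hypothetical "term.folded" subfield.
FOLDED_ANALYSIS = {
    "analyzer": {
        "folded": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding"],
        }
    }
}

# Hypothetical: languages in which folded matches are acceptable.
FOLDING_LANGUAGES = {"de"}


def primary_language(accept_language: str) -> str:
    """Return the first language code of an Accept-Language header value,
    e.g. 'de-DE,de;q=0.9' -> 'de'. Falls back to 'en' if the value is empty."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    return first.split("-")[0].lower() or "en"


def build_query(search: str, accept_language: str) -> dict:
    """Build a query body: exact matches always, folded matches only for
    languages in FOLDING_LANGUAGES, with exact matches boosted."""
    exact = {"match": {"term": {"query": search, "boost": 2.0}}}
    if primary_language(accept_language) not in FOLDING_LANGUAGES:
        return {"query": exact}
    folded = {"match": {"term.folded": {"query": search}}}
    return {
        "query": {
            "bool": {"should": [exact, folded], "minimum_should_match": 1}
        }
    }


if __name__ == "__main__":
    print(build_query("magyar agar", "de-DE,de;q=0.9,en;q=0.8"))
    print(build_query("magyar agar", "sv"))
```

The boost value of 2.0 is arbitrary; the point is only that a match on the original characters outranks a match that needed folding.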


kaicode commented on August 10, 2024

For the record: Daniel and I have started a branch to collaborate on this feature. We will play around to find the best Elasticsearch settings. We have identified that it would be best to set the correct Elasticsearch language analyser at index time. Using the Description language code field during import or component creation to set the analyser is a possible solution. We will continue to play with this as time allows.
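
One possible shape for the index-time approach is sketched below in Python: the description's language code selects one of Elasticsearch's built-in language analysers for the field that holds the term. The `term_<lang>` field naming, the `ANALYSERS` table, and both helper functions are hypothetical and are not taken from the branch mentioned above.

```python
# Built-in Elasticsearch language analysers, keyed by description language
# code. The selection of languages here is only an example.
ANALYSERS = {
    "en": "english",
    "de": "german",
    "sv": "swedish",
}


def field_mapping(lang: str) -> dict:
    """Mapping for one term field; unknown languages fall back to the
    'standard' analyser."""
    return {"type": "text", "analyzer": ANALYSERS.get(lang, "standard")}


def description_mappings(langs: list[str]) -> dict:
    """Build a mappings body with one hypothetical 'term_<lang>' field per
    language, so each description is analysed by its own language rules."""
    return {"properties": {f"term_{lang}": field_mapping(lang) for lang in langs}}


def target_field(lang: str) -> str:
    """Name of the field a description with this language code is indexed into."""
    return f"term_{lang}"


if __name__ == "__main__":
    print(description_mappings(["en", "sv", "de"]))
    print(target_field("sv"))
```

Splitting terms into per-language fields is just one design option; it keeps each analyser fixed per field, at the cost of the search side having to pick the matching field for the requested language.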


kaicode commented on August 10, 2024

Closing as a duplicate of #41, which has had more recent chatter and is now fixed in dev.

