Comments (18)

martent commented on June 3, 2024

We need to use a search engine with fuzzy search support that works well for the Swedish mapping between pronunciation and spelling. We have a couple of options:

- Switch from MySQL to Postgres as the general DB. The latter has better support for this.
- Use a separate free-text search engine like Elastic or Sphinx with support for fuzzy search.

Let’s discuss the level of ambition for this.

Related to finding employees based on free text info #13.

from intranet-dashboard.

martent commented on June 3, 2024

Should we have an AFK discussion regarding this and #13?

jesperbylund commented on June 3, 2024

yes.

martent commented on June 3, 2024

  1. Evaluate whether Elasticsearch is the right search server to use.
  2. Investigate if we
    1. need to set it up on a new virtual instance, or
    2. should move MySQL to a separate virtual instance, or
    3. should add more resources to the existing server, or
    4. should lower the level of aggression on the news update queue worker
  3. Integrate the employee directory with the text search engine.
  4. Fine-tune weighting of fields in the search.
  5. Develop a UI that does not pollute the simple name searches we have today.

The above will be time-boxed.

martent commented on June 3, 2024

Fuzzy search for autocomplete is ready to set up and deploy in test as soon as there is 2 GB more RAM on the instance. We may not be able to run Elasticsearch for both test and prod on the same instance, but for the first deployment we can fine-tune things for test first and then turn it off before deploying into prod.

The Levenshtein edit distance is set to 2, meaning that you can have two spelling errors in the string and still get a match. Exact matches are scored higher. The setting can be changed to 1 or to a percentage similarity against the indexed terms.
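
To make the distance setting concrete, here is a minimal Python sketch of Levenshtein edit distance with a cutoff of 2. It only illustrates the concept and is not how Elasticsearch implements fuzzy matching internally; the names `edit_distance` and `fuzzy_match` are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, term: str, max_distance: int = 2) -> bool:
    """True if query is within max_distance edits of the indexed term."""
    return edit_distance(query.lower(), term.lower()) <= max_distance
```

With this sketch, `fuzzy_match("svenson", "svensson")` is true (one edit), while `fuzzy_match("bylund", "berglund")` is false (three edits). Lowering `max_distance` to 1 tightens the match in the same way as the setting described above.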

The current settings for index and search analyzers have a few experimental features that can be evaluated and either expanded or abandoned. The following synonyms are used:

- "carlsson, carlson => karlsson"
- "karlson => karlsson"
- "hanson => hansson"
- "carl => karl"

And these character mappings are added:

- "û => y"
- "ph => f"

This means that the search strings "hanson" and "hansson" are the exact same search, as are "bylund" and "bûlund". This is both good and bad: good if the user is wrong about the spelling, bad if s/he is right.
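
The effect of the synonyms and character mappings listed above can be sketched in a few lines of Python. This is a rough stand-in for the analyzer, not its actual configuration: the `normalize` function is hypothetical, and it assumes character mappings run before synonym lookup, which is the order Elasticsearch uses when mappings are char filters and synonyms are token filters.

```python
# Rules copied from the lists above; the data structures are illustrative.
SYNONYMS = {
    "carlsson": "karlsson",
    "carlson": "karlsson",
    "karlson": "karlsson",
    "hanson": "hansson",
    "carl": "karl",
}
CHAR_MAPPINGS = [("û", "y"), ("ph", "f")]


def normalize(term: str) -> str:
    """Apply character mappings, then map the whole term via synonyms."""
    term = term.lower()
    for source, target in CHAR_MAPPINGS:
        term = term.replace(source, target)
    return SYNONYMS.get(term, term)
```

Because both the indexed names and the query go through the same normalization, `normalize("hanson")` and `normalize("hansson")` produce the same term, so the original spelling can no longer be told apart at search time.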

The search also matches phone and cell phone numbers. This does not add any noise to the search. The matching is done from the back of the indexed term, not from the front as it is for names, since the phone numbers in the directory are randomly cut off at the front.
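
As a rough illustration of matching phone numbers from the back, the Python sketch below strips everything but digits and compares suffixes. The helper names and the sample numbers are invented for the example, and the real analyzer may achieve the same result in a different way.

```python
import re


def digits_only(number: str) -> str:
    """Drop spaces, dashes, plus signs and anything else that is not a digit."""
    return re.sub(r"\D", "", number)


def phone_matches(query: str, indexed_number: str) -> bool:
    """Match from the back: the query digits must be a suffix of the number."""
    query_digits = digits_only(query)
    if not query_digits:
        return False  # nothing to match on
    return digits_only(indexed_number).endswith(query_digits)
```

A query such as `"34 10 00"` then matches both `"040-34 10 00"` and a truncated entry like `"34 10 00"`, whereas a query for the area code `"040"` alone does not match either, which is the trade-off of anchoring at the end of the number.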

martent commented on June 3, 2024

The server is configured and a test version is deployed with the settings mentioned in the previous comment.

I will mess with the settings for Elasticsearch on Wednesday; no results in the autocomplete means I'm working on something, sort of. I will also split the prod and test indices and search clients.

In the scope of this release, fuzzy search is enabled in the autocomplete in the masthead and on the search page, but the full search results are not fuzzy. This means that if you type svennevall you will get two items in the autocomplete, but if you execute a search by hitting enter, pressing search or selecting "View all matches" you will not get any matches. We can hide the "View all" to get around this strangeness until we deploy a fuzzy search for the full search as well.

jesperbylund commented on June 3, 2024

Spelling error at the bottom of the suggest list: "Visa all alla träffar"

jesperbylund commented on June 3, 2024

Testing the fuzziness. "anna-karin jangmar" does not give an auto suggest for Anna-Karin Jangmark. An example of too much fuzziness?

jesperbylund commented on June 3, 2024

"jan-inge ahlfridh" is another search query that gives no autosuggest results. I think there is a problem with how Elastic handles hyphens.

martent commented on June 3, 2024

Typo fixed in malmostad/intranet-assets@ea4c279

martent commented on June 3, 2024

Yes, there is a problem with the hyphen. Not in Elastic, but in my Elastic text analyzer 🉑

martent commented on June 3, 2024

Fixed the hyphen bug and the typos above. Also boosted exact matches in a better way.

New version deployed in test.

jesperbylund commented on June 3, 2024

Ok. Deploy in prod!

martent commented on June 3, 2024

Autocomplete out in production.

Next step is to jack in Elastic in the full search.

martent commented on June 3, 2024

Full search in test is now using Elastic with the same text analyzers as the autocomplete.

martent commented on June 3, 2024

The "123 employees matched your query" message is a little bit weird: it doesn't say anything about how many Svens we have, only how fuzzy we are.

jesperbylund commented on June 3, 2024

Good! Deploy in prod!

martent commented on June 3, 2024

Out in production.
