GithubHelp home page GithubHelp logo

key's Introduction

key

Key Search engine

KEY is a python3.7 implementation of basic search engine, suitable for using as the autosuggest component. Based on nested dict Trie data structure, and aims to keep low memory footprint:

memory

Features:

  • Loads denormalized geodata records from csv.

  • Optimizes space usage by internally normalizing data via parsing and linking parent entities.

  • Creates parent entities if they are missing in dataset. This allows to quickly fill parent data from the most detailed child entities.

  • Supports flexible lookup by parsing record names into current and old name (specified in parentheses).

  • Supports partial lookup by adding records to DictTrie (HashMap Trie) including all word suffixes.

  • Provides import/export to normalized csv with parent ids.

  • Handles lookups in wrong keyboard layout (e.g. key -> лун).

  • Provides CLI tool for import/export, and interactive query mode.

  • Provides simple Flask backend with search endpoint, and React frontend for fullstack experience.

Live version

Things to improve

  • Explore Suffix automaton (MA-FSA, DAWG), which could be used to compress tree branches, but it's not suitable for data that's changed often. However, a balanced combination could be found for items with less frequent updates.

  • Explore CTrie data structure, which relies on atomic CAS operation. Python can be expanded with atomos, which aims to provide support of said operations.

  • Multiprocess data import/export, grouped by geo type.

  • Serialize and export Trie to speed up initialization process.

  • Optimize Address names and keep them in separate index. Most often they have same name for all languages {street_id : address_numbers}

  • Search results ranking should be added. Results matched by prefix should be shown first, and then all results matched in the middle of the word. One idea is to track what is searched more often and rank those items higher. Second idea is to sort results by geo type, increasing/decreasing area.

  • More intelligent lookups can be added, to account for typos and mixed/wrong cyrillic layouts, transliteration.

  • Database could handle some work, but for simplicity sake all indexes are in memory to simulate some kind of cached storage (which is not used again for simplicity).

  • Pandas for csv could be used, but standard csv module with few tricks is more than enough.

Notable mentions:

key's People

Contributors

orlovol avatar

key's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.