Comments (8)
Hi there :)
When you're dealing with text search, a prefix tree (trie) or an n-gram tree is usually a better fit than a hash table, because it lets you run partial searches on terms.
A trie is fairly simple to implement with only the standard library, but specializing it to n-grams may need external libraries or considerably more work.
Some inspirational examples:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
https://whoosh.readthedocs.io/en/latest/ngrams.html
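To make the trie idea concrete, here is a minimal sketch in Python using only the standard library. It is not the project's code: the `Trie` class, its method names, and the sample words are all hypothetical, chosen only to illustrate prefix lookup.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True when a stored word ends at this node


class Trie:
    """Hypothetical prefix tree, for illustration only."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored word below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Sample words are placeholders, not taken from the project's data.
trie = Trie()
for word in ["oxe", "oxente", "arretado"]:
    trie.insert(word)

print(trie.starts_with("ox"))
```

A lookup costs time proportional to the prefix length plus the size of the matching subtree, which is what makes partial searches cheap compared with scanning every key of a hash table.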
Another point concerns the data vs. search index model. Most search engines keep a clear separation between the data they store and the search indexes they use. So a good approach here is to turn the data model into a K/V store and keep an index for it in a separate data structure, so that the data is easily addressable by key but searchable through a more elaborate search index :)
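A rough sketch of that separation, again in plain Python and under my own assumptions: `DialectStore`, `ngrams`, the trigram size, and the sample entries are hypothetical and only show the shape of the idea (a dict as the K/V store, plus a separate n-gram index that maps each gram to the keys containing it).

```python
from collections import defaultdict


def ngrams(term, n=3):
    """Return the set of character n-grams of a term (whole term if shorter than n)."""
    term = term.lower()
    if len(term) < n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


class DialectStore:
    """Hypothetical K/V store with a separate n-gram search index."""

    def __init__(self, n=3):
        self.n = n
        self.data = {}                 # K/V store: key -> stored value
        self.index = defaultdict(set)  # search index: n-gram -> keys

    def put(self, key, value):
        self.data[key] = value
        for gram in ngrams(key, self.n):
            self.index[gram].add(key)

    def get(self, key):
        # Direct addressing by key never touches the index.
        return self.data.get(key)

    def search(self, query):
        q = query.lower()
        if len(q) < self.n:
            # Too short for the index: fall back to scanning the keys.
            return sorted(k for k in self.data if q in k.lower())
        # Keys must contain every n-gram of the query.
        candidate_sets = [self.index.get(g, set()) for g in ngrams(q, self.n)]
        candidates = set.intersection(*candidate_sets)
        # Sharing all n-grams does not guarantee a substring match, so verify.
        return sorted(k for k in candidates if q in k.lower())


# Entries are placeholders, not taken from the project's data.
store = DialectStore()
store.put("oxente", {"meaning": "placeholder meaning"})
store.put("arretado", {"meaning": "placeholder meaning"})

print(store.search("xen"))
print(store.get("oxente"))
```

The stored values stay readable and addressable by key, while the index can be rebuilt or swapped for something more elaborate without touching the data.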
from dialetus-service.
Wow, that is amazing! I like the proposal, especially thinking about scalability, the long term, and other projects that may arise in the future. Please send a PR for the project!
I'll work on that, but I'm kinda slow these days when working on something this complex. I will start by writing the implementation and tech specs for it. 🎉
@mateusduboli You're absolutely right about using trees for search purposes. I was being simplistic in my solution and didn't consider a more searchable structure for looking up words. 🤦‍♂️
Instead of having a user-readable JSON file, we would design some logic to retrieve data from the searched characters. It makes sense to me. Is this aligned with the project's long-term expectations, @mvfsillva?
I found it super interesting. I didn't know about the n-gram tree, and I think it aligns completely with the project's expectations.
If I understand correctly, it will improve both the performance of looking up words and the semantics of the data storage.
Hey guys, @mtmr0x @mateusduboli, let's do it \0/