Hey @jonian, thanks! Supporting aggregations would be nice, but currently I don't have any use for them, so it wasn't in my plans. What's your use case? And what kind of API are you interested in, by the way?
from tantiny.
I'm using Elasticsearch aggregations to build filters. Currently I only use term counts and min/max on numeric fields. I want to move away from ES and was thinking of using Meilisearch with facet distribution, but then I saw your project and really liked it.
Any API would be great; I don't have anything specific in mind. Maybe something like the API provided by Searchkick.
I would really like to contribute to this feature, but unfortunately my experience with Rust is very limited.
If it is of any help/inspiration, here is a tantivy-aggregations repo.
Tantivy has collectors that can be used for aggregations, especially MultiCollector, which is meant for cases where the collector types are unknown at compile time.
Yeah, I'm aware of collectors, but when I was writing the code I couldn't come up with a good use case for them (apart from the obvious one, for which I used the TopDocs collector), so I decided to keep things simple. But the screenshot in the Searchkick documentation made me realize that aggregations are actually pretty useful, and it would be great to add the ability to customize what data is aggregated during the search. That would only go so far, though: implementing fully custom collectors is tricky for the same reason it's difficult to implement #17.
So, here's what I propose:
- Add a Tantiny::Collector object (in the same fashion as Tantiny::Tokenizer). There will be some predefined collectors (TopDocs and other collectors from the tantivy docs) that the user could configure individually (again, the same as tokenizers).
- Add a collectors: option to Tantiny::Index.new, where we would pass an array of Tantiny::Collector objects that will be used by default during the search (and also allow overriding it when calling the #search method).
- Make #search return whatever the collector collected (in case of multiple collectors it should probably be a hash with collectors as keys and the data they collected as values).
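To make the proposed return shape more concrete, here is a minimal pure-Ruby sketch of how a collectors: list and the hash-of-results return value might behave. It is purely hypothetical. CollectorSketch, its TopDocs and Count classes, and CollectorSketch.search are illustrative stand-ins, not tantiny's API. Plain hashes stand in for matched documents, and no real scoring happens.

```ruby
# Hypothetical sketch of the proposed collector semantics; none of these
# classes exist in tantiny. Plain hashes stand in for matched documents.
module CollectorSketch
  # Returns the ids of the first `limit` matched documents
  # (a stand-in for tantivy's TopDocs, without any real scoring).
  class TopDocs
    def initialize(limit:)
      @limit = limit
    end

    def collect(docs)
      docs.first(@limit).map { |doc| doc[:id] }
    end
  end

  # Returns the number of matched documents.
  class Count
    def collect(docs)
      docs.size
    end
  end

  # Mimics the proposed #search return shape: a single collector returns its
  # data directly; several collectors return a hash of collector => data.
  def self.search(docs, collectors:)
    return collectors.first.collect(docs) if collectors.size == 1

    collectors.to_h { |collector| [collector, collector.collect(docs)] }
  end
end

docs = [{ id: 1 }, { id: 2 }, { id: 3 }]
top = CollectorSketch::TopDocs.new(limit: 2)
count = CollectorSketch::Count.new
results = CollectorSketch.search(docs, collectors: [top, count])
results[top]   # => [1, 2]
results[count] # => 3
```

Using the collector objects themselves as hash keys means callers look up results with the same object they configured, which matches the "collectors as keys" idea in the proposal.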
This does require some work, but at least it's more or less straightforward to implement. However, I don't know which collectors would cover your use case. As for filtering by numeric fields, it's already supported (check out the range_query), but I'm not sure which collector would work for aggregating the term count.
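For reference, a term count in the Elasticsearch sense is just "how many matched documents have each distinct value of a field". The following is a minimal pure-Ruby sketch of that logic, under the assumption that a collector sees the matched documents as hashes. TermCountCollector is a hypothetical name, not a tantiny or tantivy class. On the tantivy side, its FacetCollector, which counts documents per facet on facet fields, may be one candidate worth looking at.

```ruby
# Hypothetical term-count collector: counts matched documents per distinct
# value of one field, similar in spirit to an Elasticsearch terms aggregation.
class TermCountCollector
  def initialize(field)
    @field = field
  end

  # Returns a hash of value => number of matched documents with that value,
  # ordered by descending count (ties broken by the value's string form).
  # Documents without the field are skipped.
  def collect(docs)
    counts = docs.each_with_object(Hash.new(0)) do |doc, acc|
      value = doc[@field]
      acc[value] += 1 unless value.nil?
    end
    counts.sort_by { |value, count| [-count, value.to_s] }.to_h
  end
end

docs = [
  { id: 1, color: "red" },
  { id: 2, color: "blue" },
  { id: 3, color: "red" },
  { id: 4 }
]
TermCountCollector.new(:color).collect(docs) # => { "red" => 2, "blue" => 1 }
```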
Maybe I will draft a PR in the next couple of weeks if I have spare time. You can help if you want (btw I don't have much experience with Rust myself tbh, but I don't find it particularly difficult to write as long as it's just bindings to another library).
P.S. By the way, I'd love to hear what motivated you to move away from Elastic :) Also, if you decide to go with Meilisearch, it would be cool if you shared your experience, because I wanted to try it out myself but haven't had an opportunity to yet.
The proposed collector API looks great! Thank you for considering adding this feature. If only term count is added, I think other aggregations like min, max, avg and mean can be calculated by the user. Is counting on numeric fields possible?
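As a rough illustration of that idea: if a numeric field's term counts were available as a value => count hash, min, max, and the average could be derived on the Ruby side. This is a minimal sketch under that assumption. numeric_stats is a hypothetical helper, and counting on numeric fields is not something tantiny currently supports.

```ruby
# Hypothetical helper: derives min, max and average from a term-count result
# for a numeric field, given as { value => number of matched documents }.
# Returns nil when there is nothing to aggregate.
def numeric_stats(counts)
  return nil if counts.empty?

  total = counts.values.sum
  # Each value contributes once per document that has it.
  weighted_sum = counts.sum { |value, count| value * count }
  {
    min: counts.keys.min,
    max: counts.keys.max,
    avg: weighted_sum.fdiv(total)
  }
end

price_counts = { 10 => 2, 25 => 1, 40 => 1 }
numeric_stats(price_counts) # => min 10, max 40, avg 21.25
```

The average has to be weighted by the counts. Averaging only the distinct keys would give 25.0 here instead of 21.25.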
I was thinking I could contribute by adding features similar to meilisearch-rails to make it easier to integrate with ActiveRecord, but if you think I can help with the Rust extension, I will give it a try.
I want to move away from Elastic mainly because of its high memory usage: right now it uses more memory than my apps do, one of which is multi-tenant. The recent change to their license is another factor.
I will test Meilisearch in the coming weeks and share my experience. My use case is a bit complex: users define filters from the admin interface, and Elastic works well with that.
Is counting on numeric fields possible?
Nope, currently only filtering.