
Aggregations (tantiny, open issue)

baygeldin commented on July 2, 2024
Aggregations

Comments (7)

baygeldin commented on July 2, 2024

Hey @jonian, thanks! Supporting aggregations would be nice, but currently I don't have any use for them, so it wasn't in my plans. What's your use case? And what kind of API are you interested in by the way?

jonian commented on July 2, 2024

I'm using Elasticsearch aggregations to build filters. Currently I only use term counts and min/max on numeric fields. I want to move away from Elasticsearch and was thinking of using Meilisearch with facet distribution, but I saw your project and really liked it.

Any API would be great; I don't have anything specific in mind. Maybe something like the API provided by searchkick.

I would really like to contribute to this feature, but unfortunately my experience with Rust is very limited.
If it is of any help or inspiration, here is a tantivy-aggregations repo.

jonian commented on July 2, 2024

Tantivy has collectors that can be used for aggregations, especially MultiCollector, which can be used when the Collector types are unknown at compile time.
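As a rough illustration of why a MultiCollector-style design helps here, the plain-Ruby sketch below feeds each matching document to several independent collectors in a single pass. All class names (CountCollector, MinMaxCollector, MultiCollector) and the hit format are assumptions for illustration; this is not tantivy's Rust API or anything Tantiny exposes.

```ruby
# Hypothetical sketch of the MultiCollector idea in plain Ruby; these classes are
# illustrative and are not part of tantivy or Tantiny.

# Counts every matching document.
class CountCollector
  def initialize
    @count = 0
  end

  def collect(_doc, _score)
    @count += 1
  end

  def result
    @count
  end
end

# Tracks the smallest and largest value of a numeric field.
class MinMaxCollector
  def initialize(field)
    @field = field
    @min = nil
    @max = nil
  end

  def collect(doc, _score)
    value = doc[@field]
    return if value.nil?

    @min = value if @min.nil? || value < @min
    @max = value if @max.nil? || value > @max
  end

  def result
    [@min, @max]
  end
end

# Forwards each hit to several collectors in one pass; collector types can be
# mixed at runtime because each one only needs #collect and #result.
class MultiCollector
  def initialize(collectors)
    @collectors = collectors
  end

  def collect(doc, score)
    @collectors.each { |collector| collector.collect(doc, score) }
  end

  def result
    @collectors.map(&:result)
  end
end

# Hypothetical search hits as [document, score] pairs.
hits = [
  [{ "category" => "book", "price" => 10 }, 1.5],
  [{ "category" => "toy", "price" => 25 }, 0.9],
  [{ "category" => "book", "price" => 15 }, 0.7]
]

multi = MultiCollector.new([CountCollector.new, MinMaxCollector.new("price")])
hits.each { |doc, score| multi.collect(doc, score) }
multi.result # => [3, [10, 25]]
```

The point of the design is that the search loop only talks to one object, while the set of aggregations can be chosen per request.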

baygeldin commented on July 2, 2024

Yeah, I'm aware of collectors, but when I was writing the code I couldn't come up with a good use case for them (apart from the obvious one, for which I used the TopDocs collector), so I decided to keep things simple. But looking at the screenshot in the searchkick documentation makes me realize that aggregations are actually pretty useful, and it would be great to add the ability to customize which data is aggregated during the search (but only to some extent, because implementing fully custom collectors is tricky for the same reason it's difficult to implement #17).

So, here's what I propose:

  • Add a Tantiny::Collector object (in the same fashion as Tantiny::Tokenizer). There will be some predefined collectors (TopDocs and other collectors from the tantivy docs) that users could configure individually (again, same as tokenizers).
  • Add a collectors: option to Tantiny::Index.new that takes an array of Tantiny::Collector objects to be used by default during the search (and also allow overriding it when calling the #search method).
  • Make #search return whatever the collector collected (in case of multiple collectors it should probably be a hash with collectors as keys and the data they collected as values).
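To make the shape of this proposal concrete, here is a plain-Ruby mock of it, assuming collectors are configuration objects that turn the list of matching documents into a result. TopDocs, TermCount and SketchIndex are hypothetical stand-ins (not Tantiny classes), and a block replaces a real query; it only demonstrates default collectors set on the index, a per-search override, and a hash keyed by collector.

```ruby
# Hypothetical sketch of the proposed collector API. TopDocs, TermCount and
# SketchIndex are illustrative names only; none of this is Tantiny's real API.

# Keeps the ids of the `limit` highest-scoring hits.
TopDocs = Struct.new(:limit) do
  def collect(hits)
    hits.max_by(limit) { |hit| hit[:score] }.map { |hit| hit[:id] }
  end
end

# Counts how often each value of `field` appears among the hits.
TermCount = Struct.new(:field) do
  def collect(hits)
    hits.each_with_object(Hash.new(0)) { |hit, counts| counts[hit[field]] += 1 }
  end
end

class SketchIndex
  # `collectors:` sets the collectors used by default for every search.
  def initialize(docs, collectors: [TopDocs.new(10)])
    @docs = docs
    @default_collectors = collectors
  end

  # A block stands in for a real query; a per-search `collectors:` overrides the
  # defaults. Returns a hash mapping each collector to the data it collected.
  def search(collectors: @default_collectors, &matcher)
    matcher ||= ->(_doc) { true }
    hits = @docs.select(&matcher)
    collectors.to_h { |collector| [collector, collector.collect(hits)] }
  end
end

docs = [
  { id: 1, category: "book", score: 1.5 },
  { id: 2, category: "toy", score: 0.9 },
  { id: 3, category: "book", score: 0.7 }
]

top = TopDocs.new(2)
categories = TermCount.new(:category)
index = SketchIndex.new(docs, collectors: [top])

default_result = index.search { |doc| doc[:score] > 0.8 } # => { top => [1, 2] }
custom_result = index.search(collectors: [top, categories])
custom_result[categories] # => { "book" => 2, "toy" => 1 }
```

Returning a hash keyed by the collector objects keeps the single-collector and multi-collector cases uniform: callers look up whichever collectors they passed in.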

baygeldin commented on July 2, 2024

This does require some work, but at least it's more or less straightforward to implement. However, I don't know which collectors would cover your use case. As for filtering by numeric fields, it's already supported (check out the range_query), but I'm not sure which collector would work for aggregating the term count.

Maybe I will draft a PR in the next couple of weeks if I have spare time. You can help if you want (btw I don't have much experience with Rust myself tbh, but I don't find it particularly difficult to write as long as it's just bindings to another library).

P.S: btw would love to hear what motivated you to move away from elastic :) also, if you decide to go with Meilisearch it would be cool if you share your experience because I wanted to try it out myself, but didn't have an opportunity for that just yet.

jonian commented on July 2, 2024

The proposed collector API looks great! Thank you for considering adding this feature. If only term count is added, I think other aggregations like min, max, avg and mean can be calculated by the user. Is counting on numeric fields possible?
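To illustrate how min, max and average could be derived on the user side from counts alone, here is a small Ruby sketch. It assumes a hypothetical aggregation result shaped as a hash of numeric value => number of matching documents; Tantiny does not provide such counts for numeric fields today, and `stats_from_counts` is an illustrative helper, not a library method.

```ruby
# Hypothetical helper: derives summary statistics from value => count pairs,
# such as a term-count aggregation over a numeric field might return.
def stats_from_counts(counts)
  return nil if counts.empty?

  total = counts.values.sum
  # Weight each distinct value by how many documents had it.
  weighted_sum = counts.sum { |value, count| value * count }
  {
    min: counts.keys.min,
    max: counts.keys.max,
    count: total,
    avg: weighted_sum.to_f / total
  }
end

stats_from_counts({ 10 => 2, 25 => 1, 15 => 1 })
# => { min: 10, max: 25, count: 4, avg: 15.0 }
```

This works because the counts preserve how often each value occurs, so the average can be weighted correctly rather than averaging the distinct values.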

> You can help if you want (btw I don't have much experience with Rust myself tbh, but I don't find it particularly difficult to write as long as it's just bindings to another library).

I was thinking I could contribute by adding features like those in meilisearch-rails to make it easier to integrate with ActiveRecord, but if you think I can help with the Rust extension, I will give it a try.

> P.S: btw would love to hear what motivated you to move away from elastic :) also, if you decide to go with Meilisearch it would be cool if you share your experience because I wanted to try it out myself, but didn't have an opportunity for that just yet.

I want to move away from Elastic mainly because of its high memory usage: right now it uses more memory than my apps, one of which is multi-tenant. The recent change to their license is another factor.

I will test Meilisearch in the coming weeks and share my experience. My use case is a bit complex: users define filters from the admin interface, and Elastic works well with that.

baygeldin commented on July 2, 2024

> Is counting on numeric fields possible?

Nope, currently only filtering.