Comments (15)

c0da commented on June 2, 2024 4

Hello @fbaligand,
I'm trying it right now and processing logs from 2016 :) It seems to be working perfectly fine. Thank you very much for this enhancement!

from logstash-filter-aggregate.

fbaligand commented on June 2, 2024 2

Nice!
Happy to see that it matches the needs of both of you, @c0da and @SolomonShorser-OICR!
The way to implement this feature is now quite clear to me, so I will do it soon!

Stay tuned... :)

fbaligand commented on June 2, 2024 2

Hi @SolomonShorser-OICR and @c0da,

I am pleased to announce that the feature is now released!
You can download v2.8.0 right now and use the brand new option "timeout_timestamp_field", which lets you process old logs!

Feel free to give feedback about the feature!

SolomonShorser-OICR commented on June 2, 2024 1

Hmmmm...

Maybe, for timeouts that are based on a user-specified field, expired maps could be checked for whenever an event (or every n events) is processed.

Calculating timeouts would need to be based on the user-specified timestamp field. So when a user specifies that @log_based_timestamp should be used for the aggregation, new maps are created such that their "map create time" is set to the value of @log_based_timestamp when they are first created. A timeout is detected when the most recent @log_based_timestamp - "map create time" > "timeout amount".
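
To make the rule above concrete, here is a minimal Python sketch of the comparison being proposed. It is only an illustration, not the plugin's code: the function name `is_expired` and the sample timestamps are made up for this example.

```python
from datetime import datetime, timedelta


def is_expired(map_created_at, latest_event_time, timeout):
    """Return True when the event-time gap exceeds the timeout.

    Hypothetical helper: both timestamps come from the log-based
    timestamp field, never from the machine's wall clock.
    """
    return latest_event_time - map_created_at > timeout


# The map is stamped with the log timestamp of the first event seen for it.
created = datetime(2016, 3, 1, 10, 0, 0)
timeout = timedelta(minutes=30)

print(is_expired(created, datetime(2016, 3, 1, 10, 20, 0), timeout))  # False
print(is_expired(created, datetime(2016, 3, 1, 10, 45, 0), timeout))  # True
```

Because only log timestamps are compared, the result is the same whether the logs are processed live or years later.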

Checking for expired maps when processing every n events might not be as performant as doing it every 5 seconds in real-time, but I might be willing to live with that if it lets me get aggregate data on old logs.

fbaligand commented on June 2, 2024 1

@SolomonShorser-OICR
First, sorry for my late reply.
Tell me if this implementation would match your need:

  • add a new option named 'timeout_timestamp_field', which contains the name of the field to use to compute the timeout
  • when this option is set, each time an event arrives in the aggregate filter, the filter computes the difference between the event timestamp and the aggregate map creation timestamp. If the timeout is reached and push_map_as_event_on_timeout=true, the map is pushed as a new event and then removed from the aggregate filter maps.
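
The two points above could look roughly like the following Python sketch. It is a toy model under stated assumptions, not the plugin's actual Ruby code: the class `EventTimeAggregator`, its `process` method, and the `clicks` counter are hypothetical, and checking every pending map on each event is an assumption of this sketch.

```python
from datetime import datetime, timedelta


class EventTimeAggregator:
    """Toy aggregator whose timeouts follow event timestamps, not the wall clock."""

    def __init__(self, timeout, push_map_as_event_on_timeout=True):
        self.timeout = timeout
        self.push_map_as_event_on_timeout = push_map_as_event_on_timeout
        self.maps = {}  # task_id -> {"created_at": datetime, "data": dict}

    def process(self, task_id, event_time):
        """Handle one event and return the maps pushed because they timed out."""
        pushed = []
        # Compare the incoming event's timestamp to each map's creation time.
        for tid in list(self.maps):
            entry = self.maps[tid]
            if event_time - entry["created_at"] > self.timeout:
                del self.maps[tid]  # expired maps are always removed
                if self.push_map_as_event_on_timeout:
                    pushed.append({"task_id": tid, **entry["data"]})
        # Create the map on first sight, stamped with the event's own timestamp.
        entry = self.maps.setdefault(
            task_id, {"created_at": event_time, "data": {"clicks": 0}}
        )
        entry["data"]["clicks"] += 1
        return pushed


agg = EventTimeAggregator(timeout=timedelta(minutes=30))
t0 = datetime(2016, 3, 1, 10, 0, 0)
first = agg.process("alice", t0)
second = agg.process("alice", t0 + timedelta(minutes=10))
third = agg.process("bob", t0 + timedelta(minutes=45))
print(third)  # [{'task_id': 'alice', 'clicks': 2}]
```

Here the event for "bob" arrives 45 minutes (in log time) after the "alice" map was created, so that map is pushed as a new event and removed.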

Does it match your need?

fbaligand commented on June 2, 2024

Hi @SolomonShorser-OICR,

In the very first version of the aggregate plugin, timeout management was based on the event's @timestamp field.
Because of that, I had problems when parsing old logs (from several days earlier): the timeout (set to 30 minutes) occurred before all the logs belonging to one task were processed.
That's why I changed the code to use the "current" time.

For this reason, I'm afraid it's not possible to meet your need.

So that I better understand your need, could you provide the aggregate configuration that does what you need?

SolomonShorser-OICR commented on June 2, 2024

I was trying to get a count of "website visitors" for some web applications. It is very similar to this example. After a certain number of "clicks" or visits, I set some fields, depending on which parts of the site they visited.

There is no real "end" event that I can use to end the aggregation, so I was relying on "timeout" to end aggregation - if a user has not "clicked" after n minutes, it is considered a single session. The aggregation will work on logs that are currently being written, but I've been tasked with analyzing data in old logs.

I'd also mention that I'm creating a @log_timestamp which is based on the timestamp field in the log files, and I'm using that as the main timestamp field for the index in Kibana. Since this field contains the historic timestamps, it would be nice to be able to aggregate based on @log_timestamp, and not @timestamp.

fbaligand commented on June 2, 2024

I should explain timeout management:
Timeout management runs every 5 seconds and looks at all pending aggregate maps to determine whether they are "expired". To know whether a map is expired, I compute "current time" - "map create time". If the result is greater than the timeout, the map is expired.
So the problem is: how can a timeout be detected based on a timestamp field that is very old?
I mean: given that I detect the timeout from the difference between the "current time" and the time in the aggregate map's initial timestamp field, the timeout would always occur.
If you have a way to detect a timeout based on an old timestamp field, tell me; I would be happy to hear it.
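
The problem described above can be reproduced with a few lines of Python. This is only an illustration with made-up values: the function name `expired_by_wall_clock`, the fixed `now`, and the sample log timestamps are assumptions, not taken from the plugin.

```python
from datetime import datetime, timedelta


def expired_by_wall_clock(map_created_at, now, timeout):
    """Wall-clock check: "current time" - "map create time" > timeout."""
    return now - map_created_at > timeout


timeout = timedelta(minutes=30)
now = datetime(2024, 6, 2, 12, 0, 0)  # fixed "current time" so the example is repeatable

# Map creation times taken from an old log's timestamp field.
old_log_times = [
    datetime(2016, 3, 1, 10, 0, 0),
    datetime(2016, 3, 1, 10, 1, 0),
    datetime(2016, 3, 1, 10, 2, 0),
]

results = [expired_by_wall_clock(t, now, timeout) for t in old_log_times]
print(results)  # [True, True, True]
```

Every map stamped with an old log time is already "expired" the moment it is created, so no aggregation can take place.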

c0da commented on June 2, 2024

That would be absolutely great, @fbaligand!!

fbaligand commented on June 2, 2024

@c0da Great to see you enjoy the feature!

@SolomonShorser-OICR Does it match your need?

SolomonShorser-OICR commented on June 2, 2024

@fbaligand That sounds like it would let the aggregate plugin work on historic data! :)

fbaligand commented on June 2, 2024

@SolomonShorser-OICR @c0da
You can find the documentation for the feature here:
https://www.elastic.co/guide/en/logstash-versioned-plugins/current/v2.8.0-plugins-filters-aggregate.html#v2.8.0-plugins-filters-aggregate-timeout_timestamp_field

SolomonShorser-OICR commented on June 2, 2024

@fbaligand I saw, this looks great! I've had to put some of my Kibana experimental work on the back-burner for a while, but I hope to get back to this and try it out in the next few weeks.

fbaligand commented on June 2, 2024

Great!
Happy to see you enjoy the feature!

fbaligand commented on June 2, 2024

As the feature works as expected, I am closing the issue.
