Comments (15)
Hello @fbaligand,
I'm trying it right now, processing logs from 2016 :) It seems to be working perfectly fine. Thank you very much for this enhancement!
from logstash-filter-aggregate.
Nice!
Happy to see that it matches the needs of both of you, @c0da and @SolomonShorser-OICR!
The way to implement this feature is now quite clear to me, so I will do it soon!
Stay tuned... :)
Hi @SolomonShorser-OICR and @c0da,
I am pleased to announce that the feature is now released!
You can download v2.8.0 right now and use the brand-new option "timeout_timestamp_field", which lets you process old logs!
Feel free to give feedback on the feature!
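For reference, a minimal configuration using the new option might look like the sketch below; the `task_id` pattern, the aggregation code, and the 30-minute timeout are illustrative, not taken from a real setup:

```
filter {
  aggregate {
    task_id => "%{task_id}"
    code => "map['events'] ||= 0; map['events'] += 1"
    push_map_as_event_on_timeout => true
    timeout => 1800
    # compute the timeout from this event field instead of the wall clock,
    # so that logs from the past can be aggregated
    timeout_timestamp_field => "@timestamp"
  }
}
```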
Hmmmm...
Maybe, for timeouts that are based on a user-specified field, timeout management could be checked as each event (or every n events) is processed.
Calculating timeouts would need to be based on the user-specified timestamp field. So when a user specifies that @log_based_timestamp should be used for the aggregation, new maps would have their "map create time" set to the value of @log_based_timestamp when they are first created. A timeout is detected when: most recent @log_based_timestamp - "map create time" > "timeout amount".
Checking for expired maps while processing every n events might not be as performant as doing it every 5 minutes in real time, but I might be willing to live with that if it lets me aggregate data from old logs.
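A minimal Python sketch of the proposal above, assuming a hypothetical in-memory map store and a 30-minute timeout (this is an illustration, not the plugin's actual implementation):

```python
# Event-timestamp-based timeout detection (hypothetical sketch).
# Maps record the event-time at which they were created; expiry is
# checked against the timestamp of the most recent event instead of
# the wall clock, so it also works on logs that are days or years old.

TIMEOUT = 30 * 60  # "timeout amount", in seconds

maps = {}  # task_id -> {"created": event_timestamp, "data": {...}}

def on_event(task_id, event_ts):
    """Process one event; return the maps that expired at this point."""
    expired = []
    for tid in list(maps):
        # most recent event timestamp - "map create time" > "timeout amount"
        if event_ts - maps[tid]["created"] > TIMEOUT:
            expired.append(maps.pop(tid))
    # create a map for this task if none exists, stamped with event time
    if task_id not in maps:
        maps[task_id] = {"created": event_ts, "data": {}}
    return expired
```

Here the expiry check runs on every event; batching it to run every n events, as suggested above, only changes how often the loop executes.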
@SolomonShorser-OICR
First, sorry for my late answer.
Tell me if this implementation matches your need:
- add a new option named 'timeout_timestamp_field', containing the name of the field to use to compute the timeout
- when this option is set, each time an event arrives in the aggregate filter, the filter computes the difference between the event timestamp and the aggregate map creation timestamp. If the timeout is reached and push_map_as_event_on_timeout => true, the map is pushed as a new event and then removed from the aggregate filter maps.
Does it match your need?
In the very first version of the aggregate plugin, timeout management was based on the event's @timestamp field.
Because of that, I had problems when parsing old logs (several days old): the timeout (set to 30 minutes) occurred before all the logs of one task were processed. That's why I changed the code to use the "current" time.
For this reason, I'm afraid it's not possible to match your need.
So that I better understand your need, could you provide the aggregate configuration that does what you need?
I was trying to get a count of "website visitors" for some web applications. It is very similar to this example. After a certain number of "clicks" or visits, I set some fields, depending on which parts of the site they visited.
There is no real "end" event that I can use to end the aggregation, so I was relying on the "timeout" to end it: if a user has not "clicked" after n minutes, it is considered a single session. The aggregation works on logs that are currently being written, but I've been tasked with analyzing data in old logs.
I'd also mention that I'm creating a @log_timestamp field, based on the timestamp field in the log files, and I'm using it as the main timestamp field for the index in Kibana. Since this field contains the historic timestamps, it would be nice to be able to aggregate based on @log_timestamp, and not @timestamp.
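The setup described above might look roughly like the following aggregate configuration; the field names (`client_ip`, `request_path`) and the 20-minute timeout are hypothetical, chosen only to illustrate the session-by-timeout pattern:

```
filter {
  aggregate {
    # one session per client; 'client_ip' is a hypothetical task id field
    task_id => "%{client_ip}"
    code => "
      map['clicks'] ||= 0
      map['clicks'] += 1
      map['pages'] ||= []
      map['pages'] << event.get('request_path')
    "
    # there is no 'end' event, so the session is closed by timeout:
    # after 20 minutes the session map is flushed as its own event
    push_map_as_event_on_timeout => true
    timeout => 1200
    timeout_task_id_field => "client_ip"
  }
}
```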
I should explain timeout management:
Timeout management runs every 5 seconds and looks at all pending aggregate maps to determine whether any are "expired". To decide whether a map is "expired", I compute "current time" - "map create time"; if the result is greater than the timeout, the map is expired.
So the problem is: how to detect that a timeout has occurred based on a timestamp field which is very old?
I mean: given that I detect the timeout from the difference between the "current time" and the aggregate map's initial timestamp, the timeout will always occur immediately for old logs.
If you have a way to detect a timeout based on an old timestamp field, tell me; I would be happy to know it.
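To illustrate the problem, here is a small Python sketch of that wall-clock check (hypothetical names, not the plugin's code): a map whose creation time comes from an old log line is already past the timeout on the very first check.

```python
import time

TIMEOUT = 30 * 60  # 30-minute timeout, in seconds

def expired_ids(maps, now=None):
    """Return ids of maps where "current time" - "map create time" > timeout."""
    now = time.time() if now is None else now
    return [tid for tid, m in maps.items() if now - m["created"] > TIMEOUT]

# A map seeded from a log written three days ago is "expired" immediately,
# which is why a wall-clock timeout cannot work on historic logs:
maps = {
    "old_task": {"created": time.time() - 3 * 24 * 3600},
    "live_task": {"created": time.time()},
}
```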
That would be absolutely great, @fbaligand!!
@c0da Great to see you enjoy the feature!
@SolomonShorser-OICR does it match your need?
@fbaligand That sounds like it would let the aggregate plugin work on historic data! :)
@SolomonShorser-OICR @c0da
You can find the documentation for the feature here:
https://www.elastic.co/guide/en/logstash-versioned-plugins/current/v2.8.0-plugins-filters-aggregate.html#v2.8.0-plugins-filters-aggregate-timeout_timestamp_field
@fbaligand I saw, this looks great! I've had to put some of my Kibana experimental work on the back-burner for a while, but I hope to get back to this and try it out in the next few weeks.
Great!
Happy to see you enjoy the feature!
As the feature works as expected, I'm closing the issue.