logstash-plugins / logstash-filter-aggregate

The aim of this filter is to aggregate information available among several events (typically log lines) belonging to the same task, and finally push the aggregated information into a final task event.

License: Apache License 2.0


logstash-filter-aggregate's Introduction

Aggregate Logstash Plugin


This is a plugin for Logstash.

It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

Documentation

Latest aggregate plugin documentation is available here.

Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation is placed under one central location.

Changelog

Read CHANGELOG.md.

Need Help?

Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.

Developing

1. Plugin Development and Testing

Code

  • To get started, you'll need JRuby with the Bundler gem installed.

  • Create a new plugin or clone an existing one from the GitHub logstash-plugins organization. We also provide example plugins.

  • Install dependencies

bundle install

Test

  • Update your dependencies
bundle install
  • Run tests
bundle exec rspec

2. Running your unpublished Plugin in Logstash

2.1 Run in a local Logstash clone

  • Edit Logstash Gemfile and add the local plugin path, for example:
gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
  • Install plugin
# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify
  • Run Logstash with your plugin
bin/logstash -e 'filter {awesome {}}'

At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.

2.2 Run in an installed Logstash

You can use the same 2.1 method to run your plugin in an installed Logstash by editing its Gemfile and pointing the :path to your local plugin development directory, or you can build the gem and install it using:

  • Build your plugin gem
gem build logstash-filter-awesome.gemspec
  • Install the plugin from the Logstash home
# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify
  • Start Logstash and proceed to test the plugin

Contributing

All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.

Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.

It is more important to the community that you are able to contribute.

For more information about contributing, see the CONTRIBUTING file.

logstash-filter-aggregate's People

Contributors

colinsurprenant, dedemorton, dgnys2, fbaligand, ffknob, fjgal, imweijh, jakelandis, jdratlif, jordansissel, jsvd, kunal8164705, leaf-lin, ph, purbon, robbavey, robin13, rocco8620, trumanliu, wiibaa, yaauie, ycombinator


logstash-filter-aggregate's Issues

Aggregation without a clear final event?

Hi, I got a log file from a migration task and I want to aggregate the logs, but I have no defined final event in the log lines. Is it possible to merge events from log data like:

2016-05-03 15:18:55.128 [FINER] {CustomCorrelationID} Message Send to XY GroupID=T10RBE and MsgSegNumber="1"
2016-05-03 15:18:55.487 [FINER] {CustomCorrelationID} Message Send to XY GroupID=T10RBE and MsgSegNumber="2"

2016-05-03 15:18:56.148 [FINER] {CustomCorrelationID} Message Send to XY GroupID=X48ZUR and MsgSegNumber="1" ...

The message count depends on the GroupID and varies from MsgSegNumber="1" to MsgSegNumber="645" (so there is no fixed pattern like always 10 entries per GroupID or something), and there is no marker for a final event in the logs. The entries for one GroupID come one after another, so you cannot tell that a new migration event has started until a line with a new GroupID and MsgSegNumber = 1 appears.

I'd like to aggregate to something like: "message" => "GroupId", "count_msgSegNumber" => 16 (the last value for each GroupID).
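One way to handle this without a final-event marker is to let the filter push the previous map whenever a line with a new GroupID arrives, with a timeout as a safety net. A minimal sketch, assuming grok has already extracted illustrative fields named groupid and msg_seg_number (this is not a tested configuration):

filter {
  aggregate {
    task_id => "%{groupid}"
    code => "
      map['message'] ||= event.get('groupid')
      map['count_msgSegNumber'] = event.get('msg_seg_number')
    "
    push_previous_map_as_event => true
    timeout => 120
  }
}

Because push_previous_map_as_event only flushes when the next task starts, this relies on all lines of one GroupID arriving in order and on running the filter with a single worker.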

Does Logstash log the misused Aggregate action ? (update action called on not created map)

I was wondering whether there is currently an option to see when aggregate is called with map_action => "update" on an event whose map has not been created yet (the --debug option for Logstash?).

Background:

I had a problem where some of my data in Elasticsearch had missing fields (maybe 5%). I lowered the number of concurrent workers pushing data to Redis and the situation got better, but some events were still incomplete. Then I found that my aggregate setup was wrong: some maps were never created in a scenario where one event was sent too fast (push workers are not guaranteed to send events to Redis in the same order), so the update action ran before the map had been created. After the fix, all events are good.

So, is there an option to check for this in the logs? I see that on such a misaction the filter simply returns:

   aggregate_maps_element = @@aggregate_maps[task_id]
   if (aggregate_maps_element.nil?)
     return if @map_action == "update"

But does it have any impact on logging? Something I could see like "no filter action taken"?

Could we potentially add something like "tag_on_misaction", which would add a custom tag to such events? For example tag_on_misaction => "_aggregate_update_on_mapless_event".
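Until something like tag_on_misaction exists, one workaround is to check, right after the aggregate filter, whether the field the update code was supposed to fill is actually present, and tag the event if it is not. A minimal sketch with illustrative field names (correlated_user stands for whatever the update code copies out of the map):

filter {
  aggregate {
    task_id => "%{task_id}"
    code => "event.set('correlated_user', map['user'])"
    map_action => "update"
  }
  if ![correlated_user] {
    mutate {
      add_tag => ["_aggregate_update_on_mapless_event"]
    }
  }
}

When the map does not exist yet, the filter returns before running the code, so correlated_user stays unset and the tag marks the event.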

Catch and log exceptions during code call execution

I have a rather complex aggregate setup. Sometimes an exception is raised in one of the code blocks and it's really hard to get a clear view of the error.

Do you think it's a good idea to re-raise the exception with the relevant @codeblock data? The simplest way would be to include @code in the exception, so it's a lot easier to find out where the invalid code is.

I've monkey-patched @codeblock.call to something like this:

begin
   @codeblock.call(event, map)
rescue => exception
   raise RuntimeError.new("There is an error with \n#{@code}\n, Error:\n#{exception}")
end

It's not the best solution, but at least I get the @code, which is likely to be unique in my setup.

Could you please extend example #4 in the docs for this use case

Using the jdbc input plugin, you get these 3 events:

  { "country_name": "France", "town_name": "Paris", "street": "Rue Baillet" }
  { "country_name": "France", "town_name": "Marseille", "street": "Boulevard Chave" }
  { "country_name": "USA", "town_name": "New-York", "street": "Washington Street" }

And you would like these 2 result events, to push them into Elasticsearch:


{ "country_name": "France",
  "places": [ 
                  {"town_name": "Paris", "street": "Rue Baillet"},
                  {"town_name": "Marseille", "street": "Boulevard Chave"}
            ]
 }
{ "country_name": "USA", "places": [{"town_name": "New-York", "street": "Washington Street"}] }

Cannot use logstash aggregate filter

I am using the Logstash aggregate filter within a docker-elk container (https://github.com/deviantony/docker-elk), but when I bring the container up I get these error messages:

logstash_1       | [2017-04-26T12:39:28,726][ERROR][logstash.plugins.registry] Problems loading a plugin with {:type=>"filter", :name=>"aggregate", :path=>"logstash/filters/aggregate", :error_message=>"NameError", :error_class=>NameError, :error_backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:221:in `namespace_lookup'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:157:in `legacy_lookup'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:133:in `lookup'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:175:in `lookup_pipeline_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:137:in `lookup'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:88:in `plugin'", "(eval):76:in `initialize'", "org/jruby/RubyKernel.java:1079:in `eval'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:60:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:139:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:277:in `create_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:95:in `register_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:264:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:183:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}

logstash_1       | [2017-04-26T12:39:28,742][ERROR][logstash.agent           ] Cannot load an invalid configuration {:reason=>"Couldn't find any filter plugin named 'aggregate'. Are you sure this is correct? Trying to load the aggregate filter plugin resulted in this error: Problems loading the requested plugin named aggregate of type filter. Error: NameError NameError"}

dockerelk_logstash_1 exited with code 1

here's my logstash.conf:

input {
    tcp {
        port => 5000
        codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601} "
            negate => true
            what => previous
        }
    }
}

filter {

	if "Dumping Scrapy stats" in [message] {

		## Scraped items, invalid items
		grok{
				match => [ "message", "'item_scraped_count': %{NUMBER:scraped_items:int}" ]
		}
		grok{
				match => [ "message", "'invalid_items_count': %{NUMBER:invalid_items:int}" ]
		}

		aggregate {
        	task_id => "%{type}"
        	code => "
        		map['scraped_items_agg'] = event.get('scraped_items')
        		map['invalid_items_agg'] = event.get('invalid_items')
        		"
			map_action => "update"
			end_of_task => true
       		push_previous_map_as_event => true
    	    }

	}

	else{
		grok{
				match => [ "message", "Crawled iteration for merchant %{WORD:merchant_name} started" ]
			}

		aggregate {
        	task_id => "%{type}"
        	code => "
        		map['merchant_name_agg'] = event.get('merchant_name')
        		"
			map_action => "create"
    	    }			
		}

	if "_grokparsefailure" in [tags] {            
			drop { }
		}

	grok{
			match => [ "message", "%{DATE_EU:timestamp}" ]
	}
	date{
		   	match => [ "timestamp", "yy-MM-dd" ]
		   	target => "@timestamp"	
		   	}
}

output {
	if "_grokparsefailure" not in [tags]{
		stdout {
		codec => rubydebug
		}
	}
}

A sample input logfile:

2017-01-01 07:53:44 [monitor_utils.py] INFO: Crawled iteration for merchant ariika started
2017-01-01 07:53:44 [utils.py] INFO: UpdateCrawlIteration._start_crawl_iteration function took 0.127 s
2017-01-01 07:57:22 [statscollectors.py] INFO: Dumping Scrapy stats:
{'item_scraped_count': 22,
 'invalid_items_count': 84}
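The NameError means Logstash could not find any filter plugin named aggregate in that image, so the first thing to verify is whether the plugin is actually installed there. If it is missing, one way forward (a sketch, not a confirmed fix for docker-elk specifically) is to install it into the Logstash image at build time, using the same command the README above uses, run from the Logstash home directory:

bin/logstash-plugin install logstash-filter-aggregate

Then rebuild the Logstash image and bring the stack up again.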

Last event does not aggregate properly

  • Version: ES 5.02 with elasticsearch-jdbc-2.3.4.1, openjdk version "1.8.0_111"

  • Operating System: ubuntu16.04.1, Oracle 12c on linux

  • Config File (if you have sensitive info, please remove it):
    input {
    jdbc {
    # Oracle jdbc connection string to Oracle database
    jdbc_connection_string => "jdbc:oracle:thin:@//host:port/sid"
    # The user we wish to execute our statement as
    jdbc_user => "xxx"
    jdbc_password=> "xxx"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "PATH/ojdbc7.jar"
    # The name of the driver class for Oracle
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # our query
    statement => "select t1.id,t1.name,t1.description, t2.date_of_birth from t1 inner join t2 on t1.ID=t2.T1_ID"
    }
    }
    filter {
    aggregate {
    task_id => "%{id}"
    code => "
    map['id'] ||= event.get('id')
    map['name'] ||= event.get('name')
    map['description'] ||= event.get('description')

              map['date_of_birth'] ||= []
              map['date_of_birth'] <<= event.get('date_of_birth')
           "
           push_previous_map_as_event => true
           timeout => 5
    

    }
    }
    output {
    elasticsearch {
    index => "test"
    document_type => "aggregate"
    document_id => "%{id}"
    hosts => "host"
    }
    }

  • Sample Data:
    DROP TABLE t1;

CREATE TABLE t1 (id NUMBER (10) NOT NULL, name VARCHAR2 (30 BYTE) NOT NULL, description VARCHAR2 (200 BYTE));

ALTER TABLE t1 ADD (
CONSTRAINT t1_pk
PRIMARY KEY
(id)
ENABLE VALIDATE);
COMMIT;

INSERT INTO t1
VALUES (1, 'name1', '1st name');

INSERT INTO t1
VALUES (2, 'name2', '2nd name');

INSERT INTO t1
VALUES (3, 'name3', '3rd name');

COMMIT;

drop table t2;
CREATE TABLE t2 (id NUMBER (10) NOT NULL, t1_id NUMBER (10) NOT NULL, date_of_birth DATE NOT NULL);

ALTER TABLE t2 ADD (
CONSTRAINT t2_pk
PRIMARY KEY
(id)
ENABLE VALIDATE);
COMMIT;

INSERT INTO t2
VALUES (1, 1, to_date('1/1/2000','mm/dd/yyyy'));
INSERT INTO t2
VALUES (2, 1, to_date('2/1/2000','mm/dd/yyyy'));
INSERT INTO t2
VALUES (3, 1, to_date('3/1/2000','mm/dd/yyyy'));
INSERT INTO t2
VALUES (4, 2, to_date('4/1/2000','mm/dd/yyyy'));
INSERT INTO t2
VALUES (5, 3, to_date('5/1/2000','mm/dd/yyyy'));
INSERT INTO t2
VALUES (6, 3, to_date('6/1/2000','mm/dd/yyyy'));
commit;

  • Steps to Reproduce:
    Create the table in Oracle 12C as script provide above
    Create the config file as provided above as oracle-out_test.conf
    run:
    sudo ./bin/logstash -f oracle-out_test.conf --path.settings=/etc/logstash

Results are as follows:
curl -XGET localhost:9200/test/_search?pretty
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "aggregate",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"@timestamp" : "2016-12-06T22:54:35.107Z",
"date_of_birth" : [
"2000-04-01T08:00:00.000Z"
],
"name" : "name2",
"@Version" : "1",
"description" : "2nd name",
"id" : 2,
"tags" : [ ]
}
},
{
"_index" : "test",
"_type" : "aggregate",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"@timestamp" : "2016-12-06T22:54:35.107Z",
"date_of_birth" : [
"2000-01-01T08:00:00.000Z",
"2000-02-01T08:00:00.000Z",
"2000-03-01T08:00:00.000Z"
],
"name" : "name1",
"@Version" : "1",
"description" : "1st name",
"id" : 1,
"tags" : [ ]
}
},
{
"_index" : "test",
"_type" : "aggregate",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"@timestamp" : "2016-12-06T22:54:34.993Z",
"date_of_birth" : "2000-06-01T07:00:00.000Z",
"name" : "name3",
"@Version" : "1",
"description" : "3rd name",
"id" : 3,
"tags" : [ ]
}
}
]
}
}

Did you see that the last doc date_of_birth is: "date_of_birth" : "2000-06-01T07:00:00.000Z",
while the first doc is: "date_of_birth" : [
"2000-04-01T08:00:00.000Z"
],

Last doc does not aggregate.

Also, why is the mapping like this:
curl -XGET localhost:9200/test/_mapping?pretty
{
"test" : {
"mappings" : {
"aggregate" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"@Version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"date_of_birth" : {
"type" : "date"
},
"description" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}

I would expect to have "type":"nested" somewhere. Did I do anything wrong?

Thanks
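One thing to check in this setup, offered as a hedged suggestion rather than a confirmed fix: the aggregate filter relies on all events of one task_id being processed in order, and its documentation asks for a single filter worker. With several workers, rows sharing an id can be handled out of order and a group can end up split or incomplete. Running the same config with one worker rules that factor out:

sudo ./bin/logstash -w 1 -f oracle-out_test.conf --path.settings=/etc/logstash

Regarding the mapping: Elasticsearch dynamic mapping never produces the nested type on its own; date_of_birth only becomes nested if an explicit mapping or index template declares "type": "nested" for it.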

Timeout event generation

Hi guys!

I have seen a discussion in a different thread but it was closed. Today I came across a requirement for our use case where we want to aggregate our data, but we do not have end events. So I googled a bit, found this plugin, and thought the timeout use case would be ideal for me. I cloned your repo and implemented the following:

  1. Add "timeout_code"
    This field is the same as code, but it is code that will be executed for the timeout action. I am not sure if this is necessary or if I can simply reuse the code property.
  2. Add "timeout_id" which is nothing but a string mapping towards the task_id. It defines how this will be mapped in the final event.
  3. Add event generation for expired events.
    Whenever flush is called, I now generate a new event per key in the aggregated_map. The key is added by default, as is the creation timestamp. The key is mapped to "timeout_id".
    If no timeout_code is defined, nothing happens (I am hoping this ensures backwards compatibility).
  4. Changing the expiry:
    Instead of comparing against the creation timestamp, I now update a "last_modified" property each time a new event is aggregated. This way I can aggregate over a longer period of time without the aggregation being restarted because the timeout measured from the start event was reached.

I added test cases for the changes that I have made.

You can find my changeset here: pandaadb@ba57fac

I would be happy to create a pull request if you guys think that this is a useful feature.

I am coming from a java background and am not too experienced with ruby coding, so if there are serious issues with my code, please give me a chance to correct them :)

Let me know what you think,

thanks,

Artur

timeout => 0 sets timeout to 1800 sec

According to the documentation:
"Timeout:
The default value is 0, which means no timeout so no auto eviction. "

But when I set the timeout to 0, it is set to 1800 s.

  • Logstash Version: 2.3.2
  • Aggregate version: 2.1.1
  • Operating System: Raspbian GNU/Linux 8
  • Config File:
input {
  file {
    path => "/var/log/dnsmasq.log"
    start_position => "beginning"
    type => "dnsmasq"
  }
}

filter {
  if [type] == "dnsmasq" {
    grok {
      match =>  [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: ?(%{NONNEGINT:num} )?%{NOTSPACE:action} %{IP:clientip} %{MAC:clientmac} ?(%{HOSTNAME:clientname})?"]
      match =>  [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: ?(%{NONNEGINT:num} )?%{USER:action}?(\[%{USER:subaction}\])? %{NOTSPACE:domain} %{NOTSPACE:function} %{IP:clientip}"]
      match =>  [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: %{NOTSPACE:action} %{DATA:data}"]
    }

    if [action] =~ "DHCPACK" {
      if ![clientname] {
        mutate {
          add_field => { "clientname" => "No name" }
        }
      }
      aggregate {
        task_id => "%{clientip}"
        code => "map['clientmac'] = event['clientmac']; map['clientname'] = event['clientname'];"
        map_action => "create_or_update"
        # timeout = 0 sets the timeout to the default value 1800 seconds.
        timeout => 172800
      }
    } else if [action] == "query" {
      aggregate {
        task_id => "%{clientip}"
        code => "event['clientmac'] = map['clientmac']; event['clientname'] = map['clientname']"
        map_action => "update"
      }
      if ![clientname] {
        mutate {
          add_field => { "clientname" => "%{clientip}" }
        }
      }
      if ![clientmac] {
        mutate {
          add_field => { "clientmac" => "%{clientip}" }
        }
      }
    } else if [action] == "reply" {
        mutate {
          rename => { "clientip" => "serverip" }
        }
      geoip {
        source => "serverip"
      }
    } else
    {
      drop{}
    }
  }
}
output {
 elasticsearch { hosts => ["localhost:9200"] }
}

Aggregate Function not working in Logstash version 5.X

Hi,
Previously I was using Logstash version 2.3 and aggregation worked fine there. A few days back I moved my ELK stack to the latest version. Now I'm using Logstash 5.2.0 and aggregate (2.5.1) is not working; can anyone help me out? Please find the code below for reference.
I'm trying to move the response file fields into the request file: I create the aggregate map for the response and update the request with it. You can find both types below.

    input {
    beats{
		port => 10523
		}
	
}

filter 
{
	if [type] == "ddoa_req" {
		xml{
			source => "message"
			store_xml => false 
			remove_namespaces => true
			xpath => [				
				"//context/correlation/dealerID/text()","dlr_dlrCode"			
				
				
				]
		}
		
				mutate {
				gsub => [					
					"request_fileName","-",""			
						]									
				}
			
		mutate{
			remove_field => ["message","type"]
		}
		aggregate {
				task_id => "%{request_fileName}"
				code => "event.set('response_fileName', map['response_fileName']) 
						event.set('app_res_releaseID', map['app_res_releaseID']) 
						event.set('app_res_creatorNameCode', map['app_res_creatorNameCode'])
						event.set('app_res_senderNameCode', map['app_res_senderNameCode'])
						event.set('app_res_sysVersion', map['app_res_sysVersion'])
						event.set('app_res_creationDateTime', map['app_res_creationDateTime'])
						event.set('app_res_bodID', map['app_res_bodID'])
						event.set('app_res_destinationName', map['app_res_destinationName'])
						event.set('response_desc', map['response_desc'])
						event.set('response_status', map['response_status'])
						event.set('response_reason', map['response_reason'])"
						map_action => "update"
						end_of_task => true
					}
					fingerprint {
                                source => ["request_fileName"]
                                target => "fingerprint"
                                key => "78787878"
                                method => "SHA1"
                                concatenate_sources => true
								}
								
								
	}
	if [type] == "ddoa_res" {
		xml{
			source => "message"
			store_xml => false
			remove_namespaces => true
			xpath => [				
				"//context/correlation/fileName/text()","response_fileName",					
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/@releaseID","app_res_releaseID",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/Sender/CreatorNameCode/text()","app_res_creatorNameCode",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/Sender/SenderNameCode/text()","app_res_senderNameCode",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/Sender/SystemVersion/text()","app_res_sysVersion",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/CreationDateTime/text()","app_res_creationDateTime",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/BODID/text()","app_res_bodID",						
				"//body/ProcessMessageResponse/payload/content/ConfirmBOD/ApplicationArea/Destination/DestinationNameCode/text()","app_res_destinationName",
				"//*[local-name()='Description']/text()","response_desc",
				"//*[local-name()='ReasonCode']/text()","response_reason"
								
				
				]
		
		}
		
		
		grok{
		match =>{"message" => ".?(?<response_status>(BODSuccessMessage))"}
		}
		grok{
		match =>{"message" => ".?(?<response_status>(BODFailureMessage))"}
		}
		
		
		translate {
      field => "response_status"
      destination => "response_status"
      override => true	 
      dictionary => ["BODSuccessMessage","Success",
					 "BODFailureMessage","Failure"]
    }
				mutate {
				gsub => [					
					"response_fileName","-",""			
						]									
				}
				
			mutate{
			remove_field => ["message","type","path"]
		}
		
		aggregate {
				task_id => "%{response_fileName}"
				code => "map['response_fileName'] = event.get('response_fileName')
						map['app_res_releaseID']  = event.get('app_res_releaseID') 
						map['app_res_creatorNameCode'] = event.get('app_res_creatorNameCode')
						map['app_res_senderNameCode'] = event.get('app_res_senderNameCode')
						map['app_res_sysVersion'] = event.get('app_res_sysVersion')
						map['app_res_creationDateTime'] = event.get('app_res_creationDateTime')
						map['app_res_bodID'] = event.get('app_res_bodID');
						map['app_res_destinationName'] = event.get('app_res_destinationName')
						map['response_desc'] = event.get('response_desc')
						map['response_status'] = event.get('response_status')
						map['response_reason'] = event.get('response_reason')"
						map_action => "create"
					} 
					fingerprint {
                                source => ["response_fileName"]
                                target => "fingerprint"
                                key => "78787878"
                                method => "SHA1"
                                concatenate_sources => true
								}
										
	}
	
  }
output{

	elasticsearch {		  
		  index => "logstash-dd.ddoa_req_log_v1"		  
		  hosts => ["localhost:9200"]				
				document_id => "%{fingerprint}" # !!! prevent duplication		  
		}
			
		
	stdout {
				codec => rubydebug
			}
   
}

help with the code option of the aggregate plugin

Hello. I understand that my post is not an issue in the conventional sense, but I need help and I don't know where else to go. I need to use the aggregate plugin, but I don't know Ruby, so I cannot properly write the code option. I would be very grateful if anyone can help me do it.
I have the following case:
Hi All,

I am having hard times trying to merge multi-line Postgresql logs with GROK into one Logstash event.

My log file look like this:

Jul 22 17:03:27 my.host example.com[24977]: [137-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc1
Jul 22 17:03:27 my.host example.com[24977]: [137-2] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc2
Jul 22 17:03:27 my.host example.com[24597]: [2953-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38053) Proc ID: 24597 etc
Jul 22 17:03:27 my.host example.com[3637]: [3779-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(17809) Proc ID: 3637 etc
Jul 22 17:03:27 my.host example.com[24977]: [138-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc1
Jul 22 17:03:27 my.host example.com[3637]: [3780-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(17809) Proc ID: 3637 etc
Jul 22 17:03:27 my.host example.com[24977]: [138-2] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc2
Jul 22 17:03:27 my.host example.com[24977]: [139-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc
Jul 22 17:03:27 my.host example.com[24597]: [2954-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(38053) Proc ID: 24597 etc
Jul 22 17:03:27 my.host example.com[24597]: [2954-2] #11 SELECT count(*) FROM table#015

I try to use:

multiline {
pattern => "... [\d+-1]"
negate => true
what => "previous"
}

but I have:

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1]...
line 5: ...[3780-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

That's not what I need. I want:

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1] and [138-2]...
line 5: ...[3780-1]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

or

line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[3780-1]...
line 5: ...[138-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

I understand that the aggregate plugin could help me, but I don't understand how to use it.
Thanks
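A minimal sketch of how the aggregate approach could look, assuming grok first extracts the process id and the statement/part counters from the bracketed prefix (all field names below are illustrative, and this is untested):

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:ts} %{HOSTNAME:loghost} %{DATA:prog}\[%{NUMBER:pid}\]: \[%{NUMBER:stmt}-%{NUMBER:part}\] %{GREEDYDATA:body}" }
  }
  aggregate {
    task_id => "%{pid}_%{stmt}"
    code => "
      map['message'] ||= ''
      map['message'] << event.get('body').to_s << ' '
      event.cancel()
    "
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "statement_id"
    timeout => 10
  }
}

Each pid/statement pair gets its own map, so lines from different backends can interleave freely; the combined message is pushed as a new event roughly 10 seconds after that statement first appears. As always with this filter, a single worker is needed so the parts are appended in order.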

Want to get startTime and endTime of an event.

I want to aggregate logs which have no start and end events, but I want to know the first time a log of that type came in and the last time it came in within a timeout duration.

For example -

Logs

13-Mar-2018 22:54:22.281 google.com
13-Mar-2018 22:54:22.282 google.com
13-Mar-2018 22:54:22.283 google.com
13-Mar-2018 22:54:22.284 google.com
13-Mar-2018 22:54:22.285 google.com
13-Mar-2018 22:54:22.286 twitter.com
13-Mar-2018 22:54:22.287 twitter.com
13-Mar-2018 22:54:22.288 twitter.com

Current Filter -

aggregate {
  task_id => "%{query}"
  code => "map['occurrences'] ||= 0; map['occurrences'] += 1; map['query'] = event.get('query'); map['end_time'] = event.get('@timestamp')"
  push_map_as_event_on_timeout => true
  timeout_task_id_field => "query_id"
  timeout => 30 # 30 sec timeout
  timeout_tags => ['_summary']
  timeout_code => "event.set('duration', '30')"
}

Output -

{
  "_index": "domxx",
  "_id": "",
  "_version": 1,
  "_source": {
    "occurrences": 3320,
    "duration": "30",
    "query_id": "google.com",
    "end_time": "2018-03-15T21:20:40.130Z",
    "tags": [
      "_summary"
    ]
  }
}

In this scenario end_time works fine since the field gets overwritten, but is there any way to capture start_time?

Thanks
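Since ||= only assigns on the first event of a task, the same trick that maintains occurrences can capture the start time. A minimal sketch of the code option only, with the rest of the filter unchanged:

code => "
  map['occurrences'] ||= 0; map['occurrences'] += 1
  map['query'] = event.get('query')
  map['start_time'] ||= event.get('@timestamp')
  map['end_time'] = event.get('@timestamp')
"

start_time keeps the @timestamp of the first event seen for that query, while end_time is overwritten by every later one.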

error aggregate in logstash 5.0

Hello
When I use Logstash 2.4 I don't have a problem, but with Logstash 5.0 my config does not work.
I get the following error:
[ERROR][logstash.filters.aggregate] Aggregate exception occurred {:error=>#<NoMethodError: Direct event field references (i.e. event['field']) have been disabled in favor of using event get and set methods (e.g. event.get('field')). Please consult the Logstash 5.0 breaking changes documentation for more details.>
What does this mean?
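The error comes from the Logstash 5.0 event API change named in the message: code blocks can no longer read or write fields with event['field'] and must use event.get and event.set instead. A sketch of the change inside an aggregate code option:

# Logstash 2.x style (rejected by 5.0)
code => "map['user'] = event['user']"

# Logstash 5.x style
code => "map['user'] = event.get('user')"
code => "event.set('user', map['user'])"

The map itself still uses plain Ruby hash access; only the event object changed.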

Read code from a ruby file.

I'm not sure whether this feature request belongs here, so I'm creating an issue. Feel free to close it if it doesn't.

For the sake of testing purposes, I want to be able to write something like this:

filter {
  aggregate {
    task_id => "%{my_beautiful_group_field}"
    code => "/my/absolute/path/to/a/tested/file.rb"
  }
}

I can fork the code and submit a PR, but I don't know if that's the common process to do it.

Let me know what you think.

Cheers.

Track timeout time based on a defined timestamp field rather than the time of the platform

In this issue: #33, an idea was mentioned:

Track timeout time based on a defined timestamp field rather than the time of the platform (this is important when re-parsing old data since otherwise all data will be read within timeout and aggregated wrongly)

I think this feature would be extremely useful for processing events in old log files. I have log files for some systems going back several years, and I think having this ability would make it possible to do advanced log analysis on application usage by distinct users over time.

As far as I can tell, aggregation is based on the time that the events are received, so there is no way to do aggregations on old data - it can only be done as it happens in real time.

adding shared field to link related logs

I have a query on the logstash discussion board which is related to the aggregate filter:
https://discuss.elastic.co/t/logstash-adding-shared-field-to-link-related-logs/68933/1

As I noted in my post I haven't found a way to use the aggregate filter to create a shared event field which links related logs due to the master log arriving after the other logs.

The config I used is like the following, repeated for call_ids[0] through call_ids[4]:

if ([call_ids][0] != "") {
                aggregate {
                        task_id         => "%{call_ids[0]}"
                        code            => "
                                        if !map.key?(:call_id_link)
                                                if event.get('call_id_link')
                                                        map['call_id_link'] = event.get('call_id_link')
                                                else
                                                        map['call_id_link'] = event.get('[@metadata][fingerprint]')
                                                        event.set('call_id_link', map['call_id_link'])
                                                end
                                        end
                                        "
                        timeout         => 60
                }
        }

Do you have any advice on how to proceed?

thanks for your help,

colm

Aggregating maps for multiple tasks

Hey guys,

So, following is part of my logstash conf file:

    filter {
        if ( [resource_id] and [resource_type] ) {
            aggregate {
                task_id => "%{resource_type}_%{resource_id}"
                code => "
                    map['tags'] ||= ['aggregated']
                    map['event_count'] ||= 0 ; map['event_count'] += 1 ;
                    event_hash = event.to_hash
                    event.to_hash.each do |key,value|
                        map[key] = value unless map.has_key?(key)
                    end
                "
                push_previous_map_as_event => true
                timeout => 10
            }

            if "aggregated" not in [tags] {
                drop {}
            }
        }
    }

When I pass the following 4 values consecutively:

    {"resource_id": 1, "resource_type": "flight"}
    {"resource_id": 2, "resource_type": "flight"}
    {"resource_id": 1, "resource_type": "flight"}
    {"resource_id": 2, "resource_type": "flight"}

Issue: The map for resource_id 1 gets pushed as an event instantly, and only the last task with resource_id 2 waits for the 10-second timeout before being pushed out. Moreover, the counter value for each task is not correct: both tasks should have a count of 2 but instead it's 1. It's as if aggregation is not happening.

Expectation: Each task gets its own map in code and is pushed after 10 seconds.

Is this possible in any way without changing the code part of the config to use unique keys in a single map? I'd rather not keep a separate key in Logstash for each value, as there can be many resources.
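For what it's worth, push_previous_map_as_event keeps only one current map and flushes it as soon as an event with a different task_id arrives, which matches the behaviour described above. When several tasks are interleaved, push_map_as_event_on_timeout is usually the better fit: it keeps one map per task_id and flushes each of them after the timeout. A hedged sketch of the changed options (the field name in timeout_task_id_field is illustrative):

aggregate {
  task_id => "%{resource_type}_%{resource_id}"
  code => "
    map['tags'] ||= ['aggregated']
    map['event_count'] ||= 0 ; map['event_count'] += 1
    event.to_hash.each do |key, value|
      map[key] = value unless map.has_key?(key)
    end
  "
  push_map_as_event_on_timeout => true
  timeout_task_id_field => "aggregate_task_id"
  timeout => 10
}

With this, both resource_id 1 and resource_id 2 should reach event_count 2 and each be pushed roughly 10 seconds after its first event.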

Logstash-filter-aggregate (2.5.0) fails with latest logstash (>=5.0.2)

Hi,

after updating Logstash from 5.0.0 to 5.0.2, my pipeline fails when aggregating events with logstash-filter-aggregate.

Has something changed in the aggregate syntax? Is it a compatibility issue?

    Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
    SystemStackError: stack level too deep
        synchronize at org/jruby/ext/thread/Mutex.java:149
             filter at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-aggregate-2.5.0/lib/logstash/filters/aggregate.rb:427
       multi_filter at /usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:156
               each at org/jruby/RubyArray.java:1613
       multi_filter at /usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:153
       multi_filter at /usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:41
         initialize at (eval):441
               each at org/jruby/RubyArray.java:1613
         initialize at (eval):432
               call at org/jruby/RubyProc.java:281
        filter_func at (eval):302
       filter_batch at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:260
               call at org/jruby/RubyProc.java:281
               each at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:186
               each at org/jruby/RubyHash.java:1342
               each at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:185
       filter_batch at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:258
        worker_loop at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246
      start_workers at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:225

Thank you,

Jesús López

Option to store aggregate map to file that can be loaded upon logstash restart

I am currently using the aggregate plugin to read headers which can change dynamically in a file and correlate them to the data that follows.

An issue I have is that if Logstash is stopped and restarted, the existing map data is lost from memory, and therefore I cannot correlate the processed events with their appropriate header.

Would it be possible to create an enhancement which stores the map data to a file that can be reloaded upon a logstash restart? Are there any other suggestions?
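For what it's worth, the plugin documents an option for this: aggregate_maps_path stores all in-memory maps to a file when Logstash stops and reloads them at the next start, and it must be declared in one (and only one) aggregate filter of the configuration. A minimal sketch with illustrative names and path:

aggregate {
  task_id => "%{header_id}"
  code => "map['header'] ||= event.get('header')"
  aggregate_maps_path => "/var/lib/logstash/aggregate_maps"
}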

Logstash Filter Aggregate (2.6.3) - timeout push doesn't work

I'm running Logstash version 5.6.3-1 with the aggregate plugin version 2.6.3 .
This is a snip from the relevant part from my Logstash config:

if [program] == "sophos-message" {
    aggregate {
      task_id => "%{logsource}%{sophos-message_queueid}"
      code => "
      map ||= {}
      event.to_hash.each { |k,v|
        if k.to_s !~ /\@.+/
          map[k] ||= []
          if !map[k].is_a?(Array) then map[k] = [map[k]] end
          if v.is_a?(Array) then map[k] |= v else map[k] |= [v] end
          if map[k].count == 1 then map[k] = map[k].first end
        end
      }
      map.each { |k,v|
        event.set(k, v)
      }
      "
      map_action => "create_or_update"
      inactivity_timeout => 10
      timeout => 1800
      push_map_as_event_on_timeout => true
      end_of_task => true
      timeout_task_id_field => 'aggr_taskid'
      timeout_tags => ['_aggr_sophos_timeout']
      add_tag => ['_aggr_sophos_end', '_aggr_sophos_end_1' ]
    }
    if "_aggr_sophos_end_2" not in [tags] {
      aggregate {
        task_id => "%{logsource}%{sophos-message_queueid}"
        code => "
        map ||= {}
        event.to_hash.each { |k,v|
          if k.to_s !~ /\@.+/
            map[k] ||= []
            if !map[k].is_a?(Array) then map[k] = [map[k]] end
            if v.is_a?(Array) then map[k] |= v else map[k] |= [v] end
            if map[k].count == 1 then map[k] = map[k].first end
          end
        }
        event.cancel()
        "
        map_action => "create"
      }
    }
  }

With the first aggregation, I want to end the aggregation with "end_of_task". After that I can check if a tag is present in my event.
If the tag is not present, the event should be aggregated again with the same task id.
Now my problem: if no "end_of_task" part arrives after the second aggregation, I want the "inactivity_timeout" to push this event after 10 seconds as a new event (push_map_as_event_on_timeout). But this doesn't happen; the event is never sent to Elasticsearch. I can't declare the timeout options on the second aggregation again, because it has the same task id as the first. Is this a bug? Where is the problem?

numeric task_id is not accepted and exception is hard to understand

with this file in /tmp/aggregate.data:

{ "pid": 1, "action" : "connect",  "user" : "a" }
{ "pid": 1, "action" : "transfer" }
{ "pid": 2, "action" : "connect",  "user" : "b" }
{ "pid": 2, "action" : "transfer" }
{ "pid": 1, "action" : "disconnect" }
{ "pid": 2, "action" : "disconnect" }

and this configuration:

input {
        file {
                path => "/tmp/aggregate.data"
                type => "test"
                start_position => beginning
                sincedb_path => "/dev/null"
                codec => "json"
        }
}

filter {
        if [action] == "connect" {
                aggregate {
                        task_id => "%{pid}"
                        code => "map['user'] = event['user']"
                        map_action => "create"
                }
        }

        if [action] == "transfer" {
                aggregate {
                        task_id => "%{pid}"
                        code => "event['user'] = map['user']"
                }
        }

        if [action] == "disconnect" {
                aggregate {
                        task_id => "%{pid}"
                        code => "event['user'] = map['user']"
                        end_of_task => true
                }
        }
}

output {
        stdout {
                codec => rubydebug
        }
}

the filterworker throws the exception

Exception in filterworker {"exception"=>#<NoMethodError: undefined method `empty?' for 1:Fixnum>, "backtrace"=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-aggregate-0.1.4/lib/logstash/filters/aggregate.rb:200:in `filter'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/filters/base.rb:163:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/filters/base.rb:160:in `multi_filter'", "(eval):151:in `cond_func_4'", "org/jruby/RubyArray.java:1613:in `each'", "(eval):148:in `cond_func_4'", "(eval):93:in `filter_func'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:219:in `filterworker'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:157:in `start_filters'"], :level=>:error}

It would be nice if a number were accepted as the task_id.

A workaround is possible by either adding:

       mutate {
           convert => [ "pid", "string" ]
       }

or using something like

   task_id => "_%{pid}"

Events are not aggregated properly [logstash-filter-aggregate (2.7.2)]

Hi, I am not sure if this is a bug or a wrong configuration. I want to aggregate the contents of the "from" and "to" events from a sendmail log into one event.

About 3% of the existing pairs are not aggregated in the output.

Here is my config:

filter {
	if ([message] =~ /from=/) {
		grok {
			match => {"message" => "%{SYSLOGTIMESTAMP:[from][date]} mail sm-mta\[%{BASE10NUM:[from][smmta]}\]: %{DATA:hash}: from=<%{DATA:[from][mail]}> ..."}
		}
		aggregate {
			task_id => "%{hash}"
			code => "map['[from]'] = event.get('[from]')"
			map_action => "create"
			push_map_as_event_on_timeout => true
			timeout_task_id_field => "hash"
			inactivity_timeout => 240
			timeout => 61516800
			timeout_tags => ["aggregate_timeout"]
		}
		if "aggregate_timeout" not in [tags] {
			drop{}
		}
	}
	else {
		grok {
			match => {"message" => "%{SYSLOGTIMESTAMP:[to][date]} mail sm-mta\[%{BASE10NUM:[to][smmta]}\]: %{DATA:hash}: to=%{DATA:[to][mail]}, ..."}
		}
		aggregate {
			task_id => "%{hash}"
			code => "event.set('[from]', map['[from]'])"
			map_action => "update"
			end_of_task => true
		}
	}
}

Is there a maximum of open maps that the aggregate filter can hold?

logstash 6.1.1
logstash-filter-aggregate 2.7.2

NoMethodError with Logstash 5.6.4

I upgraded to LS 5.6.4 and with no configuration change on the aggregate filter, I see this error:

[2017-11-15T18:53:48,011][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `[]' for nil:NilClass>, :backtrace=>["/usr/local/Cellar/logstash/5.6.4/libexec/vendor/bundle/jruby/1.9/gems/logstash-filter-aggregate-2.7.1/lib/logstash/filters/aggregate.rb:267:in `extract_previous_map_as_event'", "/usr/local/Cellar/logstash/5.6.4/libexec/vendor/bundle/jruby/1.9/gems/logstash-filter-aggregate-2.7.1/lib/logstash/filters/aggregate.rb:303:in `flush'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/filter_delegator.rb:63:in `flush'", "(eval):19:in `initialize'", "org/jruby/RubyProc.java:281:in `call'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/pipeline.rb:551:in `flush_filters'", "org/jruby/RubyArray.java:1613:in `each'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/pipeline.rb:550:in `flush_filters'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/pipeline.rb:592:in `flush_filters_to_batch'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/pipeline.rb:392:in `worker_loop'", "/usr/local/Cellar/logstash/5.6.4/libexec/logstash-core/lib/logstash/pipeline.rb:342:in `start_workers'"]}

Event order is not right when aggregating multiple lines

  • Version: 6.1.2
  • Operating System: centos6

Hi! I want to build JSON data from my application's log, so I am trying to use the ELK stack with my Java application, but I have a problem aggregating multiple lines that share the same task id into one JSON document.
Let me explain with an example.

My application logs its information across several different lines, and multiple tasks run at the same time, so the log lines of different tasks are mixed, for example:

12:30:03.02 task1 endpoint : /board/write
12:30:01.02 task1 client_ip : 1.1.1.1
12:30:03.02 task2 endpoint : /board/read
12:30:04.02 task1 data : {title:"hello world", content:"hellow elk"}
12:30:05.02 task3 result : success
12:30:06.02 task2 data : {uid:"1", token:"asdf"}
12:30:07.02 task2 result : success
12:30:08.02 task1 result : fail
............
..skip...
............
12:52:00.01 task1 endpoint : /board/write
12:52:00.18 task1 client_ip : 2.2.2.2
12:52:00.22 task1 data : {title:"hello world2", content:"hellow elk2"}
12:52:01.02 task2 client_ip : 3.3.3.3
12:52:01.86 task1 result : success

I have a log like the above, and I want to build JSON data from it using Logstash.
{
task_id:"1"
information : [
endpoint:"/board/write",
client_ip:"1.1.1.1",
data:"{title:"hello world", content:"hellow elk"}",
result:"fail"
]
start_time:"12:30:03.02"
end_time:"12:30:08.02"
}
...........
..skip..
...........
{
task_id:"1"
information : [
endpoint:"/board/write"
client_ip:"2.2.2.2",
data:"{title:"hello world2", content:"hellow elk2"}",
result:"success"
]
start_time:"12:52:00.01"
end_time:"12:52:01.86"
}

So I use the aggregate filter: if a line has a 'result', end_of_task is true; otherwise, I set the key and value (e.g. client_ip and 2.2.2.2) in the map. And it works.

But sometimes it produces a result like this:

{
task_id:"1"
information : [null]
start_time:null
end_time:"12:30:08.02"
}
...........
..skip..
...........
{
task_id:"1"
information : [
endpoint:"/board/write",
client_ip:"1.1.1.1",
data:"{title:"hello world", content:"hellow elk"}",
result:"fail"
endpoint:"/board/write"
client_ip:"2.2.2.2",
data:"{title:"hello world2", content:"hellow elk2"}",
result:"success"
]
start_time:"12:30:03.02"
end_time:"12:52:01.86"
}

I checked the Logstash debug log and confirmed that the order of the lines is already shuffled at 'pipeline receive'.
Please give me a hint or an opinion!
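The shuffling is expected with more than one pipeline worker: the aggregate filter documentation asks for a single filter worker so that all lines of one task are processed in order by the same thread. A hedged suggestion rather than a guaranteed fix for every ordering problem, set in logstash.yml:

# logstash.yml
pipeline.workers: 1

The same effect is available on the command line as bin/logstash -w 1.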

How to add nested fields in the aggregate?

I have the following data:
{ "key":1, "field1":.., .."fieldn":.., "address":"1 street"}
{ "key":1, "field1":.., .."fieldn":.., "address":"2 street"}
...
{ "key":1, "field1":.., .."fieldn":.., "address":"n street"}
My intended output is:
{ "key":1, "field1":.., .."fieldn":.., "address":{"address":"1 street",...,"address":"n street"}}

I know I can use the following to get the intended output:
aggregate {
  task_id => "%{key}"
  code => "
    ...
    map['address'] ||= []
    map['address'] << {'address' => event.get('address')}
    ...
  "
}
But the resulting document does not have a nested address, just an ordinary object property named address.
How do I make it a nested document instead of a property?

Thanks
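The aggregate filter only builds the array; whether Elasticsearch treats address as nested is decided by the index mapping, and dynamic mapping maps an array of objects as a plain object field, never as nested. It has to be declared explicitly, for example through an index template. A sketch only; the template name, index pattern, and mapping type are illustrative and the exact syntax depends on the Elasticsearch version:

PUT _template/aggregated_addresses
{
  "template": "my-index-*",
  "mappings": {
    "doc": {
      "properties": {
        "address": { "type": "nested" }
      }
    }
  }
}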

Aggregate filter has problem with JDBC input plugin with paging enabled

It looks like the filter has a problem with the JDBC input plugin when jdbc_paging_enabled => true, since it flushes at the paging size. With paging enabled, the SQL is basically broken into multiple SQL statements of the paging size (using rownum). Aggregation still works, but now you cannot determine the last record (you have multiple of them).
Is there any way out of this, other than disabling paging?

Aggregate filter not run at all

I'm trying to carry forward a year that is specified infrequently in my logs, and it seemed like the aggregate filter would work beautifully for this. However, the filter doesn't seem to actually do anything (I added the add_tag setting to see whether the code was the only problem, and the tag wasn't added either). Other blocks in the same part of the conditional run fine (I get the 'one' and 'two' lines printed to stdout given the correct input).

if ([timestamp] !~ /.+/) {
   drop {}
} else if ([message] =~ /^started/) {
   ruby {code=>"puts 'one'"}
   aggregate {
      task_id => "%{path}"
      code => "map['year'] = Time.at(event['timestamp'].to_f).year"
      add_tag => ['hi']
   }
} else {
   ruby {code=>"puts 'two'"}
   aggregate {
      task_id => "%{path}"
       code => "event['timestamp'].gsub!(/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/,map['year'].to_s+' \1')"
       map_action => "update"
       add_tag => ['there']
   }
}
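One thing to rule out, assuming this runs on Logstash 5.x: the code blocks above use the old event['timestamp'] style of field access, which (as the "error aggregate in logstash 5.0" issue above shows) was disabled in 5.0 in favour of event.get and event.set. If the code raises, the filter logs an aggregate exception and add_tag is never applied, which would match the symptom. A sketch of the first aggregate block rewritten for the newer API:

aggregate {
  task_id => "%{path}"
  code => "map['year'] = Time.at(event.get('timestamp').to_f).year"
  add_tag => ['hi']
}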

Logstash 5.6.1 incremental refresh is failing, but clean refresh works fine!

Hi all;

I am facing this Logstash 5.6.1 issue in the Prod environment, but it works fine in QA.

1- Incremental refresh (Oracle drivers) using the parameter :sql_last_value is failing. It loads only one record, as per the debug log.

[2017-09-25T10:01:15,081][INFO ][logstash.inputs.jdbc ] (5.768000s) SELECT count(*) "COUNT" FROM (SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > 1506309300) "T1" FETCH NEXT 1 ROWS ONLY

2- Incremental refresh using the same configuration file works fine in QA. When I started testing this in Prod, it fails. The setup is the same and the Java version is also the same.

3- Complete refresh using logstash configuration parameter "clean_run => true " works fine in production and QA. 

4- One more piece of info: I manually edited the "logstash-ESWATER.lastrun" file and put in the value for the incremental refresh. It works fine in QA but fails in PROD.
5- The configuration, debug log and command are given below. Please help.

Logstash configuration file
===========================
input {
    jdbc {
        jdbc_validate_connection => true
        jdbc_connection_string => "jdbc:oracle:thin:@server40:1521/RptPWAMI"
        jdbc_user => "reporting"
        jdbc_password => "trapRABaWa756uvE"
        jdbc_driver_library => "/home/oracle/logstash/logstash-5.4.0/ojdbc7.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        statement => "SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > :sql_last_value"
        use_column_value => true
        tracking_column => "device_consumption_epoc"
        tracking_column_type => "numeric"
        record_last_run => true
        last_run_metadata_path => "/home/oracle/logstash/logstash-5.4.0/config/water/logstash-ESWATER.lastrun"
#        schedule => "0 6 * * *"
#         clean_run => true
 }
}
filter {
if [mlatitude] and [mlongitude] {
mutate {
add_field => ["[geoip][meterlocation]","%{mlongitude}"]
add_field => ["[geoip][meterlocation]","%{mlatitude}"]
}
}
mutate {
 convert => [ "[geoip][meterlocation]", "float" ]
}
if [glatitude] and [glongitude] {
mutate {
add_field => ["[geoip][gatewaylocation]","%{glongitude}"]
add_field => ["[geoip][gatewaylocation]","%{glatitude}"]
}
}
mutate {
 convert => [ "[geoip][gatewaylocation]", "float" ]
}
}
output {
    #stdout { codec => rubydebug }
     elasticsearch {
      index => "water_consumption"
      document_type => "meter_reads"
       }
}

Logstash debug Command
=======================
./logstash -f /home/oracle/logstash/logstash-5.4.0/config/water/logstashwater.conf --debug

Logstash debug log
=======================
..
[2017-09-25T10:01:08,586][DEBUG][logstash.outputs.elasticsearch] Found existing Elasticsearch template. Skipping template management {:name=>"logstash"}
[2017-09-25T10:01:08,587][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1"]}
[2017-09-25T10:01:08,589][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>10, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1250}
[2017-09-25T10:01:08,668][INFO ][logstash.pipeline        ] Pipeline main started
[2017-09-25T10:01:08,675][DEBUG][logstash.agent           ] Starting puma
[2017-09-25T10:01:08,676][DEBUG][logstash.agent           ] Trying to start WebServer {:port=>9600}
[2017-09-25T10:01:08,678][DEBUG][logstash.api.service     ] [api-service] start
[2017-09-25T10:01:08,696][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2017-09-25T10:01:13,670][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline
[2017-09-25T10:01:15,081][INFO ][logstash.inputs.jdbc     ] (5.768000s) SELECT count(*) "COUNT" FROM (SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > 1506309300) "T1" FETCH NEXT 1 ROWS ONLY
[2017-09-25T10:01:15,083][DEBUG][logstash.inputs.jdbc     ] Executing JDBC query {:statement=>"SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > :sql_last_value", :parameters=>{:sql_last_value=>1506309300}, :count=>30}
[2017-09-25T10:01:18,670][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline
[2017-09-25T10:01:21,191][INFO ][logstash.inputs.jdbc     ] (6.107000s) SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > 1506309300
[2017-09-25T10:01:21,281][DEBUG][logstash.inputs.jdbc     ] closing {:plugin=>"LogStash::Inputs::Jdbc"}
[2017-09-25T10:01:21,281][DEBUG][logstash.pipeline        ] Input plugins stopped! Will shutdown filter/output workers.
[2017-09-25T10:01:21,282][DEBUG][logstash.pipeline        ] filter received {"event"=>{"devicefirmware"=>nil, "buildingname"=>"Mazaya-4", "devicedailyconsumptionobiscode"=>"8-0:2.8.0*255", "devicescount"=>#..
..
..
"gatewaytimestamplastonline"=>2017-09-24T23:43:28.000Z, "devicetypename"=>"Diehl/Hydrometer Hydrus DN 15-20", "devicegroupname"=>"643-WADI AL SAFA 2", "billingcycleid"=>"D31"}}
[2017-09-25T10:01:21,447][DEBUG][logstash.pipeline        ] Shutdown waiting for worker thread #<Thread:0x49fcc629>
..
..
<Thread:0x5533263d>
[2017-09-25T10:01:21,487][DEBUG][logstash.pipeline        ] Shutdown waiting for worker thread #logstash <Thread:0x53715b8e>
[2017-09-25T10:01:21,487][DEBUG][logstash.pipeline        ] Shutdown waiting for worker thread #<Thread:0x46bb120e>
[2017-09-25T10:01:21,488][DEBUG][logstash.filters.mutate  ] closing {:plugin=>"LogStash::Filters::Mutate"}
[2017-09-25T10:01:21,489][DEBUG][logstash.filters.mutate  ] closing {:plugin=>"LogStash::Filters::Mutate"}
[2017-09-25T10:01:21,489][DEBUG][logstash.outputs.elasticsearch] closing {:plugin=>"LogStash::Outputs::ElasticSearch"}
[2017-09-25T10:01:21,489][DEBUG][logstash.outputs.elasticsearch] Stopping sniffer
[2017-09-25T10:01:21,489][DEBUG][logstash.outputs.elasticsearch] Stopping resurrectionist
[2017-09-25T10:01:21,552][DEBUG][logstash.outputs.elasticsearch] Waiting for in use manticore connections
[2017-09-25T10:01:21,552][DEBUG][logstash.outputs.elasticsearch] Closing adapter #<LogStash::Outputs::ElasticSearch::HttpClient::ManticoreAdapter:0x4fd5d428>
[2017-09-25T10:01:21,553][DEBUG][logstash.pipeline        ] Pipeline main has been shutdown
[2017-09-25T10:01:21,682][DEBUG][logstash.instrument.periodicpoller.os] PeriodicPoller: Stopping
[2017-09-25T10:01:21,683][DEBUG][logstash.instrument.periodicpoller.jvm] PeriodicPoller: Stopping
[2017-09-25T10:01:21,684][DEBUG][logstash.instrument.periodicpoller.persistentqueue] PeriodicPoller: Stopping
[2017-09-25T10:01:21,684][DEBUG][logstash.instrument.periodicpoller.deadletterqueue] PeriodicPoller: Stopping
[2017-09-25T10:01:21,693][WARN ][logstash.agent           ] stopping pipeline {:id=>"main"}
[2017-09-25T10:01:21,694][DEBUG][logstash.pipeline        ] Closing inputs
[2017-09-25T10:01:21,695][DEBUG][logstash.inputs.jdbc     ] stopping {:plugin=>"LogStash::Inputs::Jdbc"}
[2017-09-25T10:01:21,700][DEBUG][logstash.pipeline        ] Closed inputs


Without debug 
==================
oracle@bin]$ ./logstash -f /home/oracle/logstash/logstash-5.4.0/config/water/logstashwater.conf
..
[2017-09-25T10:26:46,034][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1"]}
[2017-09-25T10:26:46,036][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>10, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1250}
[2017-09-25T10:26:46,113][INFO ][logstash.pipeline ] Pipeline main started
[2017-09-25T10:26:46,141][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2017-09-25T10:26:52,951][INFO ][logstash.inputs.jdbc ] (6.124000s) SELECT * FROM ESWATERV5 WHERE device_consumption_epoc > 1506310200
[2017-09-25T10:26:53,131][WARN ][logstash.agent ] stopping pipeline {:id=>"main"}

Add data to aggregate from a log line without specific task_id

Hello everyone,

Given the following events:

event1 = {
  "queueid" => "ABC1234",
  "process" => "qmgr",
  "from" => "[email protected]"
}

event2 = {
  "queue_id" => "ABC1234",
  "process" => "smtp",
  "to" => "[email protected]",
  "messageid" => "1517305644.29809.1@Organization"
}

event3 = {
  "process" => "spamd",
  "messageid" => "1517305644.29809.1@Organization",
  "spam_result" => "Yes"
}

Would it be possible to aggregate them into something like this?:

finalEvent = {
  "queueid" => "ABC1234",
  "from" => "[email protected]",
  "to" => "[email protected]",
  "messageid" => "1517305644.29809.1@Organization",
  "spam_result" => "Yes"
}

I'm able to aggregate the first two events by using 'queueid' as the task_id, but I'm not sure whether it is possible to add the info from event3 to this aggregate based on its 'messageid' field.
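In case it helps, here is the rough, untested kind of bridging I had in mind (this is only a sketch; it assumes the Logstash 5+ event API and the field names shown above):

filter {
  # events 1 and 2: accumulate by queueid and copy the map back onto the event,
  # so that event2 ends up carrying the 'from' gathered from event1
  if [queueid] {
    aggregate {
      task_id => "%{queueid}"
      code => "
        map['from'] ||= event.get('from')
        map['to']   ||= event.get('to')
        event.set('from', map['from']) if map['from']
        event.set('to',   map['to'])   if map['to']
      "
    }
  }

  # event2 carries both queueid and messageid: open a second map keyed on
  # messageid and copy everything gathered so far into it
  if [queueid] and [messageid] {
    aggregate {
      task_id => "%{messageid}"
      code => "
        map['queueid'] = event.get('queueid')
        map['from']    = event.get('from')
        map['to']      = event.get('to')
      "
    }
  }

  # event3 only has messageid: enrich it from the second map and close the task
  if [process] == "spamd" {
    aggregate {
      task_id => "%{messageid}"
      code => "
        event.set('queueid', map['queueid'])
        event.set('from',    map['from'])
        event.set('to',      map['to'])
      "
      end_of_task => true
    }
  }
}

The idea is simply to use the event that carries both identifiers (event2) as a bridge between the two maps; whether that is the intended way to use the plugin is exactly what I am asking.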

Thanks in advance!

Enable threadsafe = false flag like in multiline filter?

Hi

I was looking into the new features of ELK and I noticed that they've added the option for a plugin to specify that it won't work with multiple workers. From https://github.com/logstash-plugins/logstash-filter-multiline/blob/master/lib/logstash/filters/multiline.rb:

    # this filter cannot be parallelized because message order
    # cannot be guaranteed across threads, line #2 could be processed
    # before line #1
    @threadsafe = false

Would it be good to add such a feature to aggregate?
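To make it concrete, I imagine it would be a small change in the aggregate filter class, mirroring the multiline code quoted above (this is only a sketch, not the actual plugin source):

class LogStash::Filters::Aggregate < LogStash::Filters::Base
  def initialize(params)
    super
    # sketch: declare the filter non-threadsafe so that Logstash does not run
    # it with several workers, since aggregate maps rely on the events of a
    # task being processed in order
    @threadsafe = false
  end
end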

SSLSocket#session= is not supported

When installing, this warning is printed

root@rcl-nas:logstash# ./bin/plugin install logstash-filter-aggregate
Validating logstash-filter-aggregate
Installing logstash-filter-aggregate
WARNING: SSLSocket#session= is not supported
Installation successful

Performance issue with aggregate Plugin of Logstash

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version: 5.3.0
  • Operating System: RHEL
  • Config File (if you have sensitive info, please remove it):
  • Sample Data:
  • Steps to Reproduce:

Hi,

I am facing performance issue when using aggregate plugin in logstash.

Scenario :
When we use the aggregate plugin with multiple Logstash workers, some fields come out as null values because events are distributed across workers and the plugin can no longer correlate them by the unique task id. Processing is fast in that setup, but the null values make the output unusable, so we dropped down to one worker when using the aggregate plugin.

With one worker it takes almost twice as long to process, and application performance is drastically reduced.

Could anyone suggest an alternative for improving processing performance on the Logstash side?
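For reference, this is how we currently force the single worker (these are standard Logstash 5.x settings, nothing aggregate-specific; the config path is just an example):

# logstash.yml
pipeline.workers: 1

# or equivalently on the command line
bin/logstash -f /etc/logstash/conf.d/pipeline.conf -w 1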

Thanks in Advance.

gem

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version:
  • Operating System:
  • Config File (if you have sensitive info, please remove it):
  • Sample Data:
  • Steps to Reproduce:

Re-start of logstash dies when no data were provided for some aggregate task_id patterns

Hi again,

Sorry for the many issues. I am testing aggregate 2.5.1 with Logstash 2.4.1.

Here is a scenario I experience where a restart of Logstash dies immediately after reading the .aggregate map file.

  1. start Logstash with A.conf, which includes task_id patterns T1 and T2, but provide data only for T1 (a simplified sketch of A.conf is shown after this list)
  2. stop Logstash, which generates the .aggregate map file
  3. re-start Logstash with A.conf; it dies after reading the .aggregate file
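To make the scenario concrete, A.conf is essentially this (a simplified sketch; the field names, the code, and the maps path are made up):

filter {
  # task_id pattern T1 (data was provided for this one)
  aggregate {
    task_id => "%{t1_id}"
    code => "map['count'] ||= 0; map['count'] += 1"
    aggregate_maps_path => "/tmp/aggregate_maps"
  }
  # task_id pattern T2 (no data was provided for this one before the stop)
  aggregate {
    task_id => "%{t2_id}"
    code => "map['count'] ||= 0; map['count'] += 1"
  }
}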

So I suspect that the .aggregate file does not contain map values for T2, which causes a conflict when the restarted Logstash reads the file. I bet you will know exactly what the issue is.

Below is the logstash error log for your information:

log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
NoMethodError: undefined method `length' for nil:NilClass
     remove_expired_maps at /opt/logstash/vendor/local_gems/4b05611b/logstash-filter-aggregate-2.5.1/lib/logstash/filters/aggregate.rb:592
             synchronize at org/jruby/ext/thread/Mutex.java:149
     remove_expired_maps at /opt/logstash/vendor/local_gems/4b05611b/logstash-filter-aggregate-2.5.1/lib/logstash/filters/aggregate.rb:590
                   flush at /opt/logstash/vendor/local_gems/4b05611b/logstash-filter-aggregate-2.5.1/lib/logstash/filters/aggregate.rb:560
              initialize at (eval):248
                    call at org/jruby/RubyProc.java:281
           flush_filters at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.1-java/lib/logstash/pipeline.rb:436
                    each at org/jruby/RubyArray.java:1613
           flush_filters at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.1-java/lib/logstash/pipeline.rb:435
  flush_filters_to_batch at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.1-java/lib/logstash/pipeline.rb:467
             worker_loop at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.1-java/lib/logstash/pipeline.rb:227
           start_workers at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.4.1-java/lib/logstash/pipeline.rb:201

Thanks.

Logstash-aggregate 2.4.0 does not generate aggregate_maps_path file when it stops

Hi,

I'm a big fan of the logstash-aggregate filter and we're using it in a production system. As for this topic, I can see that aggregate 2.5.1 seems to have fixed this bug, but only for Logstash 2.4 or later. However, we want to keep the existing Logstash 2.3.2 for the time being and hope to get this fix as well, as a sort of aggregate 2.4.1. I'm wondering whether this is a quick fix and doable sooner or later.

Many thanks!

Feature request: different timeout values by task_id

Currently the timeout parameter is global across multiple aggregate tasks with different task_ids. I would like to request that this be changed so that different task_ids can have different timeout values.

Currently I have low-volume tasks where I would like to set the timeout to a large number, and conversely there are high-volume tasks where I would like to set the timeout smaller to reduce memory usage.
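For illustration, this is the kind of configuration I would like to be able to write, with a different timeout per aggregate block (the task_id fields and code are just examples):

aggregate {
  task_id => "%{low_volume_id}"
  code => "map['events'] ||= 0; map['events'] += 1"
  timeout => 86400   # low-volume task: keep the map around for a day
}

aggregate {
  task_id => "%{high_volume_id}"
  code => "map['events'] ||= 0; map['events'] += 1"
  timeout => 300     # high-volume task: expire quickly to keep memory usage down
}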

Help me please with ruby code for aggregate complex multiline postgresql log

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.
I'm sorry, but https://discuss.elastic.co/c/logstash has a character limit and I can't post my question there.
Fabien Baligand kindly helped me once in this topic https://discuss.elastic.co/t/specific-grok-filter-for-multi-line-postgresql-log/56286/12, for which I thank him a lot.

But unfortunately, after a while, I realized that I did not get the right result. My logs proved to be more complicated than I had imagined, so I will describe my problem more fully.
My logs look like this:

Aug 11 11:34:53 server5 marker2[21881]: [35-1] 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: operator: SQL_QUERY1 pdo_stmt_00000004
Aug 11 11:34:53 server5 marker2[21881]: [36-1] 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.032 ms
Aug 11 11:34:53 server8 marker3[19238]: [29-1] 2016-08-11 11:34:53.020 MSK User: user2 Database: my_db2 Host: 192.168.50.11(17050) Proc ID: 19238 app_name2 LOG: operator: SQL_QUERY1 pdo_stmt_00000002
Aug 11 11:34:53 server5 marker2[21878]: [85-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.118 ms, разбор pdo_stmt_0000000d: SQL_QUERY2
Aug 11 11:34:53 server8 marker3[19238]: [30-1] 2016-08-11 11:34:53.020 MSK User: user2 Database: my_db2 Host: 192.168.50.11(17050) Proc ID: 19238 app_name2 LOG: duration: 0.036 ms
Aug 11 11:34:53 server5 marker2[21877]: [125-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: operator: SQL_QUERY1 pdo_stmt_00000013
Aug 11 11:34:53 server5 marker2[21877]: [126-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.039 ms
Aug 11 11:34:53 server5 marker2[21872]: [181-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.123 ms, разбор pdo_stmt_0000001d: SQL_QUERY3_part1
Aug 11 11:34:53 server5 marker2[21872]: [181-2] SQL_QUERY3_part2$a
Aug 11 11:34:53 server1 marker10[29370]: [18189-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(26091) Proc ID: 29370 [n/d] LOG: operator: SQL_QUERY4
Aug 11 11:34:53 server5 marker2[21881]: [37-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.057 ms, разбор pdo_stmt_00000005: SQL_QUERY4$a$b$c
Aug 11 11:34:53 server5 marker2[21878]: [86-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.335 ms, LOG Bind pdo_stmt_0000000d: SQL_QUERY2
Aug 11 11:34:53 server5 marker2[21878]: [87-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: do pdo_stmt_0000000d: SQL_QUERY2
Aug 11 11:34:53 server1 marker10[29370]: [18190-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(26091) Proc ID: 29370 [n/d] LOG: duration: 0.247 ms
Aug 11 11:34:53 server5 marker2[21878]: [88-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.067 ms
Aug 11 11:34:53 server5 marker2[21877]: [127-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.056 ms, разбор pdo_stmt_00000014:
Aug 11 11:34:53 server5 marker2[21877]: [127-2] SQL_QUERY5_part1
Aug 11 11:34:53 server5 marker2[21877]: [127-3] SQL_QUERY5_part2
Aug 11 11:34:53 server12 marker10[16792]: [15470-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61603) Proc ID: 16792 [n/d] LOG: operator: SQL_QUERY8_part1
Aug 11 11:34:53 server12 marker10[16792]: [15470-2] SQL_QUERY8_part2
Aug 11 11:34:53 server12 marker10[16792]: [15470-3] SQL_QUERY8_part3
Aug 11 11:34:53 server12 marker10[16792]: [15470-4] SQL_QUERY8_part4
Aug 11 11:34:53 server5 marker2[21877]: [127-4] SQL_QUERY5_part3
Aug 11 11:34:53 server5 marker2[21877]: [127-5] SQL_QUERY5_part4
Aug 11 11:34:53 server5 marker2[21877]: [127-6] SQL_QUERY5_part5
Aug 11 11:34:53 server5 marker2[21877]: [127-7] SQL_QUERY5_part6
Aug 11 11:34:53 server5 marker2[21872]: [182-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.262 ms, LOG Bind pdo_stmt_0000001d: SQL_QUERY3_part1
Aug 11 11:34:53 server5 marker2[21872]: [182-2] SQL_QUERY3_part2$a
Aug 11 11:34:53 server5 marker2[21872]: [182-3] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 DETAILS: parametrs: $a = 'xyz'
Aug 11 11:34:53 server5 marker2[21872]: [183-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: do pdo_stmt_0000001d: SQL_QUERY3_part1
Aug 11 11:34:53 server5 marker2[21872]: [183-2] SQL_QUERY3_part2$a
Aug 11 11:34:53 server5 marker2[21872]: [183-3] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 DETAILS: parametrs: $a = 'xyz'
Aug 11 11:34:53 server5 marker2[21881]: [38-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.172 ms, LOG Bind pdo_stmt_00000005: SQL_QUERY4$a$b$c
Aug 11 11:34:53 server5 marker2[21881]: [38-2] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 DETAILS: parametrs: $a = '123', $b = '456', $c = '789'
Aug 11 11:34:53 server5 marker2[21878]: [89-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: operator: SQL_QUERY1 pdo_stmt_0000000d
Aug 11 11:34:53 server5 marker2[21881]: [39-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: do pdo_stmt_00000005: SQL_QUERY4$a$b$c
Aug 11 11:34:53 server12 marker10[16792]: [15471-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61603) Proc ID: 16792 [n/d] LOG: duration: 0.371 ms
Aug 11 11:34:53 server12 marker10[15948]: [20762-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61135) Proc ID: 15948 [n/d] LOG: operator: SQL_QUERY6_part1
Aug 11 11:34:53 server12 marker10[15948]: [20762-2] SQL_QUERY6_part2
Aug 11 11:34:53 server12 marker10[15948]: [20762-3] SQL_QUERY6_part3
Aug 11 11:34:53 server12 marker10[15948]: [20762-4] SQL_QUERY6_part4
Aug 11 11:34:53 server5 marker2[21881]: [39-2] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 DETAILS: parametrs: $a = '123', $b = '456', $c = '789'
Aug 11 11:34:53 server5 marker2[21878]: [90-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.037 ms
Aug 11 11:34:53 server5 marker2[21872]: [184-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.056 ms
Aug 11 11:34:53 server5 marker2[21881]: [40-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.048 ms
Aug 11 11:34:53 server14 marker10[7185]: [14663-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(39067) Proc ID: 7185 [n/d] LOG: operator: SQL_QUERY7
Aug 11 11:34:53 server5 marker2[21877]: [128-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.097 ms, LOG Bind pdo_stmt_00000014:
Aug 11 11:34:53 server5 marker2[21877]: [128-2] SQL_QUERY5_part1
Aug 11 11:34:53 server5 marker2[21877]: [128-3] SQL_QUERY5_part2
Aug 11 11:34:53 server5 marker2[21877]: [128-4] SQL_QUERY5_part3
Aug 11 11:34:53 server5 marker2[21877]: [128-5] SQL_QUERY5_part4
Aug 11 11:34:53 server5 marker2[21877]: [128-6] SQL_QUERY5_part5
Aug 11 11:34:53 server5 marker2[21877]: [128-7] SQL_QUERY5_part6
Aug 11 11:34:53 server5 marker2[21877]: [129-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: do pdo_stmt_00000014:
Aug 11 11:34:53 server5 marker2[21877]: [129-2] SQL_QUERY5_part1
Aug 11 11:34:53 server5 marker2[21877]: [129-3] SQL_QUERY5_part2
Aug 11 11:34:53 server5 marker2[21877]: [129-4] SQL_QUERY5_part3
Aug 11 11:34:53 server5 marker2[21877]: [129-5] SQL_QUERY5_part4
Aug 11 11:34:53 server5 marker2[21877]: [129-6] SQL_QUERY5_part5
Aug 11 11:34:53 server5 marker2[21877]: [129-7] SQL_QUERY5_part6
Aug 11 11:34:53 server5 marker2[21877]: [130-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.039 ms
Aug 11 11:34:53 server5 marker2[21872]: [185-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: operator: SQL_QUERY1 pdo_stmt_0000001d
Aug 11 11:34:53 server5 marker2[21872]: [186-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.040 ms
Aug 11 11:34:53 server12 marker10[15948]: [20763-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61135) Proc ID: 15948 [n/d] LOG: duration: 0.971 ms
Aug 11 11:34:53 server14 marker10[7185]: [14664-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(39067) Proc ID: 7185 [n/d] LOG: duration: 0.889 ms

and I want the following result:

composition | result

[35-1] + [36-1] | Aug 11 11:34:53 server5 marker2[21881]: [35-1] 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: operator: SQL_QUERY1 pdo_stmt_00000004 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.032 ms
[29-1] + [30-1] | Aug 11 11:34:53 server8 marker3[19238]: [29-1] 2016-08-11 11:34:53.020 MSK User: user2 Database: my_db2 Host: 192.168.50.11(17050) Proc ID: 19238 app_name2 LOG: operator: SQL_QUERY1 pdo_stmt_00000002 2016-08-11 11:34:53.020 MSK User: user2 Database: my_db2 Host: 192.168.50.11(17050) Proc ID: 19238 app_name2 LOG: duration: 0.036 ms
[85-1] | Aug 11 11:34:53 server5 marker2[21878]: [85-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.118 ms, разбор pdo_stmt_0000000d: SQL_QUERY2
[86-1] | Aug 11 11:34:53 server5 marker2[21878]: [86-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.335 ms, LOG Bind pdo_stmt_0000000d: SQL_QUERY2
[87-1] + [88-1] | Aug 11 11:34:53 server5 marker2[21878]: [87-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: do pdo_stmt_0000000d: SQL_QUERY2 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.067 ms
[125-1] + [126-1] | Aug 11 11:34:53 server5 marker2[21877]: [125-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: operator: SQL_QUERY1 pdo_stmt_00000013 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.039 ms
[181-1] + [181-2] | Aug 11 11:34:53 server5 marker2[21872]: [181-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.123 ms, разбор pdo_stmt_0000001d: SQL_QUERY3_part1 SQL_QUERY3_part2$a
[18189-1] + [18190-1] | Aug 11 11:34:53 server1 marker10[29370]: [18189-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(26091) Proc ID: 29370 [n/d] LOG: operator: SQL_QUERY4 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(26091) Proc ID: 29370 [n/d] LOG: duration: 0.247 ms
[37-1] | Aug 11 11:34:53 server5 marker2[21881]: [37-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.057 ms, разбор pdo_stmt_00000005: SQL_QUERY4$a$b$c
[127-(1-7)] | Aug 11 11:34:53 server5 marker2[21877]: [127-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.056 ms, разбор pdo_stmt_00000014: SQL_QUERY5_part1 SQL_QUERY5_part2 SQL_QUERY5_part3 SQL_QUERY5_part4 SQL_QUERY5_part5 SQL_QUERY5_part6
[15470-(1-4)] + [15471-1] | Aug 11 11:34:53 server12 marker10[16792]: [15470-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61603) Proc ID: 16792 [n/d] LOG: operator: SQL_QUERY8_part1 SQL_QUERY8_part2 SQL_QUERY8_part3 SQL_QUERY8_part4 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61603) Proc ID: 16792 [n/d] LOG: duration: 0.371 ms
[182-(1-3)] | Aug 11 11:34:53 server5 marker2[21872]: [182-1] 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.262 ms, LOG Bind pdo_stmt_0000001d: SQL_QUERY3_part1 SQL_QUERY3_part2$a 2016-08-11 11:34:53.022 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 DETAILS: parametrs: $a = 'xyz'
[183-(1-3)] + [184-1] | Aug 11 11:34:53 server5 marker2[21872]: [183-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: do pdo_stmt_0000001d: SQL_QUERY3_part1 SQL_QUERY3_part2$a 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 DETAILS: parametrs: $a = 'xyz' 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.056 ms
[38-(1-2)] | Aug 11 11:34:53 server5 marker2[21881]: [38-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.172 ms, LOG Bind pdo_stmt_00000005: SQL_QUERY4$a$b$c 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 DETAILS: parametrs: $a = '123', $b = '456', $c = '789'
[39-(1-2)] + [40-1] | Aug 11 11:34:53 server5 marker2[21881]: [39-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: do pdo_stmt_00000005: SQL_QUERY4$a$b$c 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 DETAILS: parametrs: $a = '123', $b = '456', $c = '789' 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.048 ms
[89-1]+[90-1] | Aug 11 11:34:53 server5 marker2[21878]: [89-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: operator: SQL_QUERY1 pdo_stmt_0000000d 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62679) Proc ID: 21878 app_name3 LOG: duration: 0.037 ms
[20762-(1-4)]+[20763-1] | Aug 11 11:34:53 server12 marker10[15948]: [20762-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61135) Proc ID: 15948 [n/d] LOG: operator: SQL_QUERY6_part1 SQL_QUERY6_part2 SQL_QUERY6_part3 SQL_QUERY6_part4 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(61135) Proc ID: 15948 [n/d] LOG: duration: 0.971 ms
[14663-1]+[14664-1] | Aug 11 11:34:53 server14 marker10[7185]: [14663-1] 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(39067) Proc ID: 7185 [n/d] LOG: operator: SQL_QUERY7 2016-08-11 11:34:53.021 MSK User: user3 Database: my_db3 Host: 192.168.5.34(39067) Proc ID: 7185 [n/d] LOG: duration: 0.889 ms
[128-(1-7)] | Aug 11 11:34:53 server5 marker2[21877]: [128-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.097 ms, LOG Bind pdo_stmt_00000014: SQL_QUERY5_part1 SQL_QUERY5_part2 SQL_QUERY5_part3 SQL_QUERY5_part4 SQL_QUERY5_part5 SQL_QUERY5_part6
[129-(1-7)] + [130-1] | Aug 11 11:34:53 server5 marker2[21877]: [129-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: do pdo_stmt_00000014: SQL_QUERY5_part1 SQL_QUERY5_part2 SQL_QUERY5_part3 SQL_QUERY5_part4 SQL_QUERY5_part5 SQL_QUERY5_part6 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62677) Proc ID: 21877 app_name4 LOG: duration: 0.039 ms
[185-1]+[186-1] | Aug 11 11:34:53 server5 marker2[21872]: [185-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: operator: SQL_QUERY1 pdo_stmt_0000001d 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.12(62667) Proc ID: 21872 app_name5 LOG: duration: 0.040 ms

And if possible, it would be better if this:

[35-1] + [36-1] | Aug 11 11:34:53 server5 marker2[21881]: [35-1] 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: operator: SQL_QUERY1 pdo_stmt_00000004 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.032 ms
looked like this:
[35-1] + [36-1] | Aug 11 11:34:53 server5 marker2[21881]: [35-1] 2016-08-11 11:34:53.021 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: operator: SQL_QUERY1 pdo_stmt_00000004 duration: 0.032 ms

And this:

[39-(1-2)] + [40-1] | Aug 11 11:34:53 server5 marker2[21881]: [39-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: do pdo_stmt_00000005: SQL_QUERY4$a$b$c 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 DETAILS: parametrs: $a = '123', $b = '456', $c = '789' 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: duration: 0.048 ms

looked like this:

[39-(1-2)] + [40-1] | Aug 11 11:34:53 server5 marker2[21881]: [39-1] 2016-08-11 11:34:53.023 MSK User: user1 Database: my_db1 Host: 192.168.7.11(57557) Proc ID: 21881 app_name1 LOG: do pdo_stmt_00000005: SQL_QUERY4$a$b$c DETAILS: parametrs: $a = '123', $b = '456', $c = '789' duration: 0.048 ms

Please help me and I will be very grateful to you.
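For what it's worth, here is as far as I got on my own. It is a rough, untested sketch that only glues continuation lines such as [127-2]..[127-7] onto their first line; the grok pattern and field names are my own guesses, and the pairing of operator/duration lines is not handled at all:

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:ts} %{HOSTNAME:pg_host} %{DATA:marker}\[%{POSINT:pid}\]: \[%{POSINT:stmt}-%{POSINT:part}\] %{GREEDYDATA:body}" }
  }

  # one task per (host, pid, statement number): all continuation lines share it
  aggregate {
    task_id => "%{pg_host}_%{pid}_%{stmt}"
    code => "
      map['full'] ||= ''
      map['full'] << ' ' unless map['full'].empty?
      map['full'] << event['body'].to_s
      event['full_statement'] = map['full']
    "
  }
}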

aggregate 2.4.0 issue when using with split plugin

Hi again,

I'm not sure whether this issue is somehow related to the other one I raised.

Here is my sample code to reproduce it. Sorry, it might not be a very concise example. When the 2.4.0 aggregate plugin is used with split, it looks like it does not pass the maps over to the split events. The same code works correctly with the older version. Everything was tested with the same Logstash version, 2.3.2. Please correct me if the code is missing something.

With the following configuration running, I provided this on standard input (two lines of records):
t1 f1
t2 f2

I then checked whether s_data gets the correct value assigned before the split, which should be "value". But the result shows nil for the s_data field.

input {
    stdin {}
}

filter {

    grok { match => { "message" => "%{DATA:t_id} %{DATA:f_id}$" } }

    aggregate {
        task_id => "%{t_id}"
        code => "
                if map['f_ids']
                    map['f_ids'] << event['f_id']
                else
                    map['f_ids'] = [event['f_id']]
                end
                event['f_ids'] = map['f_ids']
                "
    }

    aggregate {
        task_id => "%{f_id}"
        code => "
                map['data'] = 'value'
                event['data'] = map['data']
                "
    }

    if [f_id] == "f2" {
        split {
            field => "f_ids"
            target => "s_id"
        }
        aggregate {
            task_id => "%{s_id}"
            code => "
                    event['s_data'] = map['data']
                    "
        }
    }

}

output {
    stdout { codec => rubydebug }
}

Filter does not always check map correctly after timeout

inactivity_timeout is not working as I expected. With the configuration below, I try to make Logstash maintain a heartbeat state in the map. I expect Logstash to evict the map at timeout, but maybe it does not. Logstash does not always recover from a timeout: it stays in a state where it no longer reacts to new activity by tagging an event with 'heartbeatStateChanged' and instead remains inactive.

  • Version: logstash-filter-aggregate 2.6.0, Logstash 5.3.1
  • Operating System: CentOS Linux release 7.3.1611
  • Config File:
    /etc/logstash/logstash.yml: pipeline.workers is commented out (# pipeline.workers: 1), which makes Logstash run with "pipeline.workers"=>16
     if [host] and [application] {
         aggregate {
             task_id => "%{host}%{application}"
             code => "
                 if map['heartbeat'] != 'OK'
                     event.tag('heartbeatStateChanged')
                     event.set('heartbeat', 'OK')
                     event.set('heartbeat_severity', 1)
                 end
                 map['heartbeat'] = 'OK'
                 map['host'] = event.get('host')
                 map['application'] = event.get('application')
                 map['component'] = 'main'
                 map['component_id'] = 'main'
             "
             timeout_code => "
                 event.tag('heartbeatStateChanged')
                 event.set('heartbeat_severity', 2)
                 event.set('heartbeat', 'ALARM')
                 event.set('message', '2 minutes elapsed since the last heartbeat.')
             "
             push_map_as_event_on_timeout => true
             inactivity_timeout => 120
             timeout => 999999999
         }
     } 
  • Steps to Reproduce:
    Feed log lines to Logstash from Filebeat. Sometimes a timeout occurs. After the timeout, new activity occurs with the same task_id. Then Logstash should tag 'heartbeatStateChanged' but it does not always do so.

Aggregate seemingly not working

Hey, I was wondering if someone could tell me what I'm doing wrong with my aggregate config here. I keep getting all the messages separately, and I can't figure out for the life of me why they're not aggregating:

Example input:

INFO  [2016-03-11 08:13:05] : @0123456789123.jpg Found file!
WARN  [2016-03-11 08:13:05] : @0123456789123.jpg Upscaling (is 500x500, must be 500x500)
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg No swatch to crop
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Skipped Image derivative: 2000 (image too small)
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Skipped Image derivative: 1500 (image too small)
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Skipped Image derivative: 900 (image too small)
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Made Image derivative: 500
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Made Image derivative: 450
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Made Image derivative: 400
DEBUG [2016-03-11 08:13:05] : @0123456789123.jpg Made Image derivative: 300
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 275
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 215
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 180
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 150
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 100
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 80
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 75
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 70
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 60
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 45
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 40
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 30
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 18
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Made Image derivative: 16
DEBUG [2016-03-11 08:13:06] : @0123456789123.jpg Backing up to /data/backup
INFO  [2016-03-11 08:13:06] : @0123456789123.jpg It took 1.2s to complete operations

input { stdin {} }

filter {
  grok { match => ["message", "(?<severity>[A-Z]+)\s{1,4}\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] : (@(?<file>\d{13}(_AV\d*)?(\.\w{2,4})?))? ?(?<rest>.+)"] }
    # Timestamp matching and override
    date { match => ["timestamp", "yyyy-MM-dd HH:mm:ss"] }
    mutate {
      replace => { "message" => "%{rest}" }
      remove_field => ["rest"]
    }

    if [severity] == "FATAL" { mutate { add_field => { "error" => "%{message}" } } }

    if [message] =~ "Found file" {
      aggregate {
        task_id => "%{file}"
        code => ""
        map_action => "create"
      }
    }

    # Create tags based on whether it has a swatch
    if [message] =~ "Swatch found" { mutate { add_field => { "swatch" => "true"  } } }
    if [message] =~ "No swatch"    { mutate { add_field => { "swatch" => "false" } } }

    # Detect if upscaled, add tag
    if [message] =~ "Upscaling" {
      grok {
        match => ["message", "is (?<upscaled_from>\d+x\d+)"]
        add_tag => ["upscaled"]
      }
    }

    # For derivatives, get the type and size of derivative
    if [message] =~ "derivative: " {
      grok { match => ["message", "(?<derivative>(Image|Swatch)) derivative: (?<size>\d+(_SW)?)"] }

      aggregate {
        task_id => "%{file}"
        code => "map['d'] ||= {}; map['d']['%{derivative}'] ||= {}; map['d']['%{derivative}']['%{size}'] = ('%{message}' !~ /^Skipped/)"
        map_action => "update"
      }
    }

    if ([message] =~ "It took") or ([severity] == "ERROR") {
      if [severity] == "ERROR" {
        grok {
          match => ["message", "to errors: (?<error>.+)"]
          add_field => { "error" => "%{error}" }
        }
      } else {
        grok { match => ["message", "It took (?<processing_time>[\d\.]+?)s"] }
      }

      aggregate {
        task_id => "%{file}"
        code => ""
        # timeout => 90
        map_action => "update"
        end_of_task => true
      }

    }
}

output { stdout { codec => rubydebug } }

Map failure after too many log lines

Hey :)

it seems like there is a problem when the logfile exceeds a specific number of lines.
I made a config where I search the lines for a word; when it's found, I insert a count into the map.
In the final event I add a field with the value of the count.

I tried it with 3 different files.

test.txt has 1000006 lines and works as intended.
second_text.txt has 1980448 lines, and the pattern I look for is at the end of the file. In the final event the field is not included.
third_text.txt has 1980448 lines, and the pattern I look for is in the first half of the file. In the final event the field is not included.

I attached everything needed.

test.txt
third_text.txt
second_text.txt
logstash_conf.txt

Unfortunately we have a lot of Logfiles which exceed 3000000 lines per file.

Could you look into it? Thanks!

EOFError: End of file reached

Hi,

My Logstash seemed too slow for the consumer, so I restarted it. When I restarted it, I got this error:

[2017-12-06T09:18:18,165][ERROR][logstash.pipeline        ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<EOFError: End of file reached>, :backtrace=>["org/jruby/RubyMarshal.java:149:in `load'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-aggregate-2.7.0/lib/logstash/filters/aggregate.rb:134:in `block in register'", "org/jruby/RubyIO.java:1156:in `open'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-aggregate-2.7.0/lib/logstash/filters/aggregate.rb:134:in `block in register'", "org/jruby/ext/thread/Mutex.java:148:in `synchronize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-aggregate-2.7.0/lib/logstash/filters/aggregate.rb:93:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:388:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:399:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:399:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:801:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:409:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:333:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:293:in `block in start'"], :thread=>"#<Thread:0x37a72b73@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:290 run>"}

I use large timeout and inactivity_timeout values; is there a limit on the size of the map file? When I deleted the map file and restarted, everything worked again. For now, I have reduced the timeout value to see whether that helps.

  • Version: Logstash 6.0.0
  • Operating System: RHEL7
  • Options: timeout => 604800 (1 week) ; inactivity_timeout => 86400 (1 day) ; push_map_as_event_on_timeout => false

Problems with multiple worker threads

Does the following look like user error or a bug? With one worker, I see expected results:

$ ~/Downloads/logstash-2.0.0/bin/logstash -f /tmp/logstash.conf -w1 < /tmp/test.log
Default settings used: Filter workers: 1
Logstash startup completed
{
       "message" => "start: foo\n1\n2\n3\nfinished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:10:02.312Z",
          "host" => "test",
         "count" => 4
}
{
       "message" => "start: bar\n1\n2\n3\n4\n5\nfinished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:10:02.313Z",
          "host" => "test",
         "count" => 6
}
{
       "message" => "start: baz\n1\nfinished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:10:02.313Z",
          "host" => "test",
         "count" => 2
}
Logstash shutdown completed

With multiple workers, results are all over the place:

$ ~/Downloads/logstash-2.0.0/bin/logstash -f /tmp/logstash.conf -w3 < /tmp/test.log
Default settings used: Filter workers: 3
Logstash startup completed
{
       "message" => "start: foo\n2\n3\nfinished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:18:13.349Z",
          "host" => "test",
         "count" => 3
}
{
       "message" => "start: bar\n2\n1\n3\n4\n5\n1\nfinished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:18:13.352Z",
          "host" => "test",
         "count" => 7
}
{
       "message" => "finished",
      "@version" => "1",
    "@timestamp" => "2015-11-13T19:18:13.351Z",
          "host" => "test"
}
Logstash shutdown completed

test.log:

start: foo
1
2
3
finished
start: bar
1
2
3
4
5
finished
start: baz
1
finished

logstash.conf:

input {
  stdin {}
}
output {
  stdout {
    codec => rubydebug { metadata => true }
  }
}
filter {
  mutate {
    replace => {
      'host' => 'test'
    }
  }
  if [message] =~ /^start/ {
    aggregate {
      task_id => '%{host}:test'
      code => "map['msgs'] = [event['message']]; map['count'] = 0;"
      map_action => 'create'
    }
    drop {}
  }
  else {
    aggregate {
      task_id => '%{host}:test'
      code => "map['msgs'] << event['message']; map['count'] += 1;"
      map_action => 'update'
    }
  }
  if [message] =~ /^finish/ {
    aggregate {
      task_id => '%{host}:test'
      code => 'event["message"] = map["msgs"].join("\n"); event["count"] = map["count"];'
      map_action => 'update'
      end_of_task => true
    }
  }
  else {
    drop {}
  }
}

External code usage

Hi

I was wondering whether it would be possible to add a feature for including external code into the aggregate code option. I've done a quick test with include and it works: I can use new functions from an external file.

The reason for this is that I currently have a lot of code that does checks, merging on the map, and calculations (like time durations). It is getting very messy, and I would like to create some general functions so that I can pass the whole event and map and do the necessary calculations. The code would then not look like this:

map['foo'] ||={}; map['foo2'] ||={}; map['start'] = 0; map['foo'] = event['stop'] - map['start'] if map['start']; map['start'] = event['start'] if event['start'] and map['start'] == 0

but like:

start_stop_calculate(map, event)

What do you think about such an idea? A basic include is trivial, but ensuring that only proper functions can be called is more complicated (i.e. moving the aggregate call to a separate class).
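For illustration, here is roughly what my quick test looked like (the file path and helper name are just examples; this relies on a plain Ruby require, not on any aggregate option):

# /etc/logstash/ruby/aggregate_helpers.rb
def start_stop_calculate(map, event)
  # remember the first 'start' we see, then compute the duration once 'stop' arrives
  map['start'] ||= event['start'] if event['start']
  map['duration'] = event['stop'] - map['start'] if event['stop'] and map['start']
end

# and in the Logstash config:
filter {
  aggregate {
    task_id => "%{task_id}"
    code => "
      require '/etc/logstash/ruby/aggregate_helpers'
      start_stop_calculate(map, event)
    "
  }
}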

Feature requests

Here's the use case I'm trying to handle. Assume there are messages like:

something interesting happened: cool-metric:42
something interesting happened: cool-metric:2

Another case is handling messages sent via syslog not intended for per-line processing:

  1. The kernel messages when a system boots. I'd prefer to see those as one (or at least fewer) documents in Elasticsearch.
  2. Redirecting a shell script's output to logger, especially when there are chunks that should be read as a whole (e.g. output from diff).

Features I would like this or a similar plugin to provide:

  1. Be able to specify a count or interval for grouping events, the way the collate plugin does, for cases where there is no start & stop event. This plugin has timeout, but instead of dropping the msgs I would like the option to aggregate them (see the sketch after this list).
  2. Have the filter automatically define a count (e.g. [@metadata][aggregate][count] = 2) and maybe interesting numbers from the msg (e.g. [@metadata][aggregate][cool-metric][sum] = 44). Maybe this can be accomplished via code, but I wonder how well that works in practice with larger code blocks.
  3. Define which fields get merged and how. Initially I was thinking of joining message by newline, but it could be made configurable some day if needed. I may not care about the original message at all and prefer something like this for the collated message: "%{message} occurred %{[@metadata][collate][count]} times". Again, maybe this can be handled via code.
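Roughly the configuration I am imagining; none of the options below exist today, they are invented names purely to illustrate the request:

aggregate {
  task_id => "%{host}"
  # invented options, for illustration only:
  # flush the map as one event after 50 events or 30 seconds, whichever comes
  # first, join the 'message' fields with a newline, and expose an automatic
  # [@metadata][aggregate][count] on the pushed event
  count => 50
  interval => 30
  merge_fields => { "message" => "\n" }
}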

Do these features fall in the scope of this plugin? If not, any thoughts on what I should call a new plugin? Is there another effort to create such a plugin? Thanks!

Can I change task_id in flight?

This is maybe not the correct place to ask, but I found it very hard to find examples of how to use aggregate. I have a case where I want to follow an event after its unique identifier changes. Look at these logs:

Feb 21 11:01:31 m20 sshd[11097]: Connection from 172.18.37.40 port 38944 on 172.18.5.20 port 22
[...]
Feb 21 11:01:32 m20 sshd[11097]: User child is on pid 11104
Feb 21 11:01:32 m20 sshd[11104]: Starting session: shell on pts/1 for pan from 172.18.37.40 port 38944 id 0

I want to aggregate on the pid, but as you can see, the last log line has a new pid. Is it possible to do this?
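To show what I mean, here is a rough, untested sketch of handing the map over when the pid changes (the grok patterns and field names are just guesses, and it assumes the Logstash 5+ event API):

filter {
  grok {
    match => { "message" => "sshd\[%{POSINT:pid}\]: %{GREEDYDATA:body}" }
  }

  # accumulate everything under the current pid
  aggregate {
    task_id => "%{pid}"
    code => "map['lines'] ||= []; map['lines'] << event.get('message')"
  }

  # the 'User child is on pid NNN' line links the old pid to the new one:
  # copy what was collected so far into a new map keyed on the child pid
  if [body] =~ /User child is on pid/ {
    grok { match => { "body" => "User child is on pid %{POSINT:child_pid}" } }
    aggregate {
      task_id => "%{pid}"
      code => "event.set('parent_lines', map['lines'])"
    }
    aggregate {
      task_id => "%{child_pid}"
      code => "map['lines'] = event.get('parent_lines') || []"
    }
  }
}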

Thanks!

Unexpected behaviour of timeout parameter

I'm experiencing unexpected behavior of the timeout parameter.

I've set the logstash parameter as follows:

timeout => 86400

That should correspond to 1 day, but I'm seeing it expire about 1 hour and 20 minutes after the task is created.

My questions about this parameter are:

  1. For a given task_id, does it matter which aggregate block contains the timeout parameter (e.g. the create, update, or end_of_task block)? Is it OK if it is set in multiple aggregate blocks?
  2. If there are multiple tasks (different task_ids) with different desired timeout parameters could they interfere with each other somehow?
  3. What happens if timeout is set to 0?

As a side note, I believe the info provided here is out of date, as the default is 1800, not 0:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html#plugins-filters-aggregate-timeout

I'm using logstash v2.3.2
