logstash-plugins / logstash-input-kafka Goto Github PK

Kafka input for Logstash

License: Apache License 2.0

Ruby 90.24% Shell 9.76%

logstash-input-kafka's Introduction

Logstash Plugin

This is a plugin for Logstash.

It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

Kafka Input Plugin Has Moved

This Kafka Input Plugin is now a part of the Kafka Integration Plugin. This project remains open for backports of fixes from that project to the 9.x series where possible, but issues should first be filed on the integration plugin.

Logging

Kafka logs do not respect the Log4J2 root logger level and defaults to INFO, for other levels, you must explicitly set the log level in your Logstash deployment's log4j2.properties file, e.g.:

logger.kafka.name=org.apache.kafka
logger.kafka.appenderRef.console.ref=console
logger.kafka.level=debug

Documentation

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html

Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one central location.

For formatting code or config example, you can use the asciidoc [source,ruby] directive
For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide

Need Help?

Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.

Developing

1. Plugin Developement and Testing

Code

To get started, you'll need JRuby with the Bundler gem installed.
Create a new plugin or clone and existing from the GitHub logstash-plugins organization. We also provide example plugins.
Install dependencies

bundle install
rake install_jars

Test

Update your dependencies

bundle install
rake install_jars

Run tests

bundle exec rspec

2. Running your unpublished Plugin in Logstash

2.1 Run in a local Logstash clone

Edit Logstash Gemfile and add the local plugin path, for example:

gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"

Install plugin

# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify

Run Logstash with your plugin

bin/logstash -e 'filter {awesome {}}'

At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.

2.2 Run in an installed Logstash

You can use the same 2.1 method to run your plugin in an installed Logstash by editing its Gemfile and pointing the :path to your local plugin development directory or you can build the gem and install it using:

Build your plugin gem

gem build logstash-filter-awesome.gemspec

Install the plugin from the Logstash home

# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify

Start Logstash and proceed to test the plugin

Contributing

All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.

Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.

It is more important to the community that you are able to contribute.

For more information about contributing, see the CONTRIBUTING file.

logstash-input-kafka's People

Contributors

Stargazers

Watchers

Forkers

joekiller suyograo talevy colinsurprenant ph semaifour camilosierrah kargreat mspiegle dmvk hugoren hotienvu cooper6581 deepbluesnow guyboertje brightyang karanpradhan anupchat davidbyrneinsight dedemorton gquintana jamstah roongr2k7 dadanielz xylakant fignewtons pbkdf3 ingafeick mnikhil-git tsouza treziac phirov juanpaulo hamza2404 consulthys saginoam 347154468 dlyoungerman wuyang308 fhopf original-brownbear jackice svrc muwiess gurvindersingh jsvd omribromberg gnuhpc jerryjxj snowch young8 rogervaas robbavey dkelley-accretive shihp kwame998 ujet huojuntao adslqa yaauie basseljabak1 natbusa nnordrum makubi v-a payscale lepek proteusvacuum webmat jrask karenzone yubobo lde raistlinzx synhershko qiu957919102 appcxb luis-arias-forex shaharmor shjshj aikoven jackdavidson banno w32-blaster skbly7 arenard vbohata hackery hyetpang yourheroonly prabhur24 llllb irhawks whpei mathant i-e-java simuhunluo falbanese occidere hovercross

logstash-input-kafka's Issues

Logstash not able to shutdown successfully on ES 2.0 with Kafka input

sudo /etc/init.d/logstash stop

Killing logstash (pid 28592) with SIGTERM 
Waiting logstash (pid 28592) to die... 
Waiting logstash (pid 28592) to die... 
Waiting logstash (pid 28592) to die... 
Waiting logstash (pid 28592) to die... 
Waiting logstash (pid 28592) to die... 
logstash stop failed; still running.

ps -ef | grep -i java

logstash 28592 1 99 23:21 pts/0 01:36:24 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Djava.rmi.server.hostname=some_host -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9994 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Xmx4g -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Djava.rmi.server.hostname=some_host -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9994 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/conf.d -l /var/log/logstash/logstash.log

sudo /etc/init.d/logstash status

logstash is running

End up having to sudo kill -9 28592

tail logstash.log

All there is in the logs are:

{:timestamp=>"2015-11-14T00:02:51.597000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:02:51-08:00", {"input_to_filter"=>18, "filter_to_output"=>1, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:02:56.598000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:02:56-08:00", {"input_to_filter"=>18, "filter_to_output"=>1, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:01.598000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:01-08:00", {"input_to_filter"=>19, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:06.599000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:06-08:00", {"input_to_filter"=>20, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:11.599000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:11-08:00", {"input_to_filter"=>20, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:16.601000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:16-08:00", {"input_to_filter"=>19, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:21.601000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:21-08:00", {"input_to_filter"=>20, "filter_to_output"=>2, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:26.602000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:26-08:00", {"input_to_filter"=>20, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:31.603000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:31-08:00", {"input_to_filter"=>19, "filter_to_output"=>1, "outputs"=>[]}], :level=>:warn} 
{:timestamp=>"2015-11-14T00:03:36.604000-0800", :message=>["INFLIGHT_EVENTS_REPORT", "2015-11-14T00:03:36-08:00", {"input_to_filter"=>0, "filter_to_output"=>0, "outputs"=>[]}], :level=>:warn}

The above suggests that the Kafka input is no longer sending anything through, but LS is not able to shutdown successfully.

Logstash config:

input {
  kafka {
      zk_connect => "hosts"
      consumer_threads => "1"
      group_id => "group"
      topic_id => "topic"
      tags => "fromKafka"
  }
}

filter {
   if "fromKafka" in [tags]
   {
     metrics
     {
        meter => "events"
        add_tag => "kafka_metric"
        flush_interval => "30"
     }
   }

   if "lumberjack_metric" in [tags] or "syslog_metric" in [tags] or "kafka_metric" in [tags]
   {
     mutate
     {
        add_field => { "host" => "%{message}" }
        remove_field => [ "message" ]
     }
   }
}

output {
   if "lumberjack_metric" in [tags] or "syslog_metric" in [tags] or "kafka_metric" in [tags]
   {
      elasticsearch
      {
         hosts => ["hosts"]
         workers => "2"
      }
   }
   else
   {
      elasticsearch {
      hosts => ["hosts"]
      workers => "10"
      user => "user"
      password => "password"
      }
    }
}

This is what I get processing a simple message:
{
"message" => "testing 1 2 3",
"tags" => "_jsonparsefailurewhatever",
"@Version" => "1",
"@timestamp" => "2015-04-16T22:10:07.726Z",
"type" => "access"
}

If I change the kafka input to stdin I get what you would expect:
{
"message" => "testing 1 2 3",
"@Version" => "1",
"@timestamp" => "2015-04-16T22:09:34.144Z",
"host" => "devfp03",
"tags" => [
[0] "whatever"
]
}

Possibly related we experience frequent logstash hangs when we are getting input from kafka topics.

Can this plugin read from Kafka in a round-robin way?

I have 3 partitions: 0, 1, 2. So the messages can be classified as 0, 1, 2.

eg:

1 message in partition 0: 0

3 messages in partition 1: 111

2 messages in partition 2: 22

How to make the logstash to consume messages in the order 012x12x1x
(x means no messages at that time). The order of consumed messages look
like: 012121.

There is partition.assignment.strategy in Kafka consumer configs (http://kafka.apache.org/documentation.html#consumerconfigs). I am looking for some plugins which implements this config to read data from kafka, reorder them, and write to kafka again.

Can this plugin read from kafka in a round-robin way?
Any idea?

test issue

refactor shutdown sequence to use plugin.stop

Change shutdown sequence to be triggered by plugin.stop instead of ShutdownSignal exception.
Also remove any calls to: shutdown, finished, finished?, running? or terminating?
This depends on elastic/logstash#3895

Logstash (1.5.3) is not able to get data from remote kafka

I am trying to read some log messages from kafka. Below is my logstash config:-

input {
    kafka {
        zk_connect => "kafka:2181"
        group_id => "logstash"
        topic_id => "logstash_logs"
        reset_beginning => false
        consumer_threads => 1
    }
}

output {
  elasticsearch {
    index => "test-2015-08-18"
  }
}

My zookeeper.properties in kafka look like below:-

dataDir=/tmp/kafka/
clientPort=2181
maxClientCnxns=0

The topic logstash_logs was created with the below command in the kafka machine:-

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic logstash_logs

On running logstash I am getting the below error in console:-

log4j, [2015-08-19T10:41:42.315]  WARN: kafka.client.ClientUtils$: Fetching topic metadata with correlation id 5 for topics [Set(logstash_logs)] from broker [id:0,host:localhost,port:9092] failed
java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
    at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
log4j, [2015-08-19T10:41:42.315]  WARN: kafka.consumer.ConsumerFetcherManager$LeaderFinderThread: [logstash_logstash-indexer-1439961100662-86d0d679-leader-finder-thread], Failed to find leader for Set([logstash_logs,0])
kafka.common.KafkaException: fetching topic metadata for topics [Set(logstash_logs)] from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
Caused by: java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
    at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
    ... 3 more

I am able to telent kafka 2181 and also able to connect to kafka from kafka-web-console running from logstash machines. I am also able to connect to kafka if I place a kafka in the same machine as logstash and make z_connect => "localhost:2181". The problem seems to be there only when kafka is running remote.

Environment:-

Logstash - 1.5.3 on Debain Linux 7 x86_64
Kafka - 2.10-0.8.2.1 on Debian Linux 8 x 86_64

To check if there is an issue with OS Version I placed logstash also in the same machine as in kafka and then also it worked fine with z_connect => "localhost:2181".

test 2

test for integration

Decorate metadata field name

Currently, if you set the decorate_events flag, the Kafka metadata is put in the kafka field, where the field name is not configurable. With the introduction of the @metadata field in Logstash 1.5, maybe this data should go into [@metadata][kafka].

SSL for Kafka 0.9

Kafka 0.9 support TLS/SSL (https://issues.apache.org/jira/browse/KAFKA-1684). We've been getting some customer interest in supporting this feature in our input/output plugins.

Add support SASL/Kerberos authentication

Kafka support SASL/Kerberos, can you add SASL/Kerberos authentication support in the logstash-input-kafka

http://kafka.apache.org/documentation.html#security_sasl

Kafka.consumer.RangeAssignor: No broker partitions consumed by consumer thread logstash_logstash-indexer

I have an ELK Set-up in which a logstash is pushing data to Kafka and another logstash is pulling data from Kafka.

Below are my Kafka Input Config:-

input {
    kafka {
        zk_connect => "kafka:2181"
        group_id => "logstash"
        topic_id => "logstash_logs"
        reset_beginning => false
        consumer_threads => 3
    }
}

I have gone through this issue & I have 3 partitions & 1 replica for my logstash topic. partitions for my logstash topic. Kafka is a single node cluster with both zookeeper and broker running the same server.

After starting the logstash I am seeing the below warnings and logstash is not pulling any data from Kafka:-

'[DEPRECATED] use `require 'concurrent'` instead of `require 'concurrent_ruby'`
log4j, [2015-10-27T00:11:50.471]  WARN: kafka.consumer.RangeAssignor: No broker partitions consumed by consumer thread logstash_logstash-indexer-1445884909915-d6a99924-2 for topic logstash_logs
log4j, [2015-10-27T00:11:50.471]  WARN: kafka.consumer.RangeAssignor: No broker partitions consumed by consumer thread logstash_logstash-indexer-1445884909915-d6a99924-0 for topic logstash_logs
log4j, [2015-10-27T00:11:50.471]  WARN: kafka.consumer.RangeAssignor: No broker partitions consumed by consumer thread logstash_logstash-indexer-1445884909915-d6a99924-1 for topic logstash_logs
Logstash startup completed

Kafka's server.properties looks like below:-

broker.id=0
port=9092
advertised.host.name=kafka
num.network.threads=1
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/log/kafka
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000

Kafka's zookeeper.properties looks like below:-

dataDir=/tmp/kafka/
clientPort=2181
maxClientCnxns=0

I tried restarting the Kafka as well. After restarting Kafka saw the below INFO logs:-

[2015-10-27 11:42:04,619] INFO Loading logs. (kafka.log.LogManager)
[2015-10-27 11:42:04,673] INFO Recovering unflushed segment 97126428 in log logstash_logs-2. (kafka.log.Log)
[2015-10-27 11:42:06,104] INFO Completed load of log logstash_logs-2 with log end offset 97455132 (kafka.log.Log)
[2015-10-27 11:42:06,117] INFO Recovering unflushed segment 102920474 in log logstash_logs-1. (kafka.log.Log)
[2015-10-27 11:42:07,508] INFO Completed load of log logstash_logs-1 with log end offset 103493961 (kafka.log.Log)
[2015-10-27 11:42:07,519] INFO Recovering unflushed segment 99665013 in log logstash_logs-0. (kafka.log.Log)
[2015-10-27 11:42:08,036] INFO Completed load of log logstash_logs-0 with log end offset 99824837 (kafka.log.Log)
[2015-10-27 11:42:08,042] INFO Logs loading complete. (kafka.log.LogManager)
[2015-10-27 11:42:08,043] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2015-10-27 11:42:08,046] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2015-10-27 11:42:08,074] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2015-10-27 11:42:08,075] INFO [Socket Server on Broker 0], Started (kafka.network.SocketServer)
[2015-10-27 11:42:08,152] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2015-10-27 11:42:08,219] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2015-10-27 11:42:08,426] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2015-10-27 11:42:08,438] INFO Registered broker 0 at path /brokers/ids/0 with address kafka:9092. (kafka.utils.ZkUtils$)
[2015-10-27 11:42:08,450] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2015-10-27 11:42:08,589] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [logstash_logs,1],[logstash_logs,2],[logstash_logs,0] (kafka.server.ReplicaFetcherManager)
[2015-10-27 11:42:08,621] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [logstash_logs,1],[logstash_logs,2],[logstash_logs,0] (kafka.server.ReplicaFetcherManager)

Environment:-

Logstash - 1.5.3 - Tried with Logstash 1.5.4 and observed the same issue.
Kafka - 0.8.2.1
Debian 7 - 64 Bit OS

Just to add it was working fine with this config. Suddenly after restarting logstash it stopped working. The topic has about 60M logstash event messages when I started seeing this issue.

single threaded parsing?

First of all I'm not a ruby / logstash expert, so I might be missing something.

When I set consumer_threads = 3, it will add three consumers to consumer group (triggers rebalance) and we should get better troughput.

However, it seems to me, that message parsing, still runs in a single thread, no matter how many consumer threads we have.

https://github.com/logstash-plugins/logstash-input-kafka/blob/master/lib/logstash/inputs/kafka.rb#L149

Deserialization is the most expensive operation here, therefore we wouldn't be able to scale up a single logstash instance ...

Yes, there is still an option to scale out, but if I'm right, this is obvious bottleneck, which can be easily fixed.

Am I missing something?

Thanks,
D.

problems trying bin/plugin logstash-input-kafka .. suggests logstash-core dependency, which doesn't exist

I was trying to get logstash1.5 working in docker, using https://github.com/roblayton/docker-logstash as a template and editing the run.sh in that repo just to run logstash on creation of the container and using a -v flag in docker to point to the logstash.conf (see lower down) on my host, as well as linking into various other containers (zookeeper and elasticsearch).

docker run -v /home/me/Development/Tools/data/logstash/config/:/etc/logstash/conf.d/ -d -p 5043:5043 -p 9292:9292 --name logstash1.5 --link elasticsearch:es --link zookeeper:zk -t roblayton/docker-logstash

This gives the following error:
Using milestone 1 input plugin 'kafka'. This plugin should work, but would benefit from use by folks like you. Please let us know if you find bugs or have suggestions on how to improve this plugin. For more information on plugin milestones, see http://logstash.net/docs/1.5.0.beta1/plugin-milestones {:level=>:warn}
Using milestone 1 filter plugin 'ruby'. This plugin should work, but would benefit from use by folks like you. Please let us know if you find bugs or have suggestions on how to improve this plugin. For more information on plugin milestones, see http://logstash.net/docs/1.5.0.beta1/plugin-milestones {:level=>:warn}
log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
Failed to install template: Connection refused {:level=>:error}
kafka client threw exception, restarting {:exception=>#<KafkaError: Got ZkException: Unable to connect to zookeeper server within timeout: 6000>, :level=>:warn}
kafka client threw exception, restarting {:exception=>#<KafkaError: Got ZkException: Unable to connect to zookeeper server within timeout: 6000>, :level=>:warn}

I then thought that maybe the logstash1.5 beta was missing the logstash-input-kafka plugin and from within the logstash container I attempted to install that plugin. I realised as part of that process that an earlier version of the plugin was in fact installed (0.1.5), but the install failed to install the newer version because of some unresolved dependencies on a logstash-core plugin, which doesn't appear to exist yet. Any suggestions on a possible fix for this?

root@f6f3610dc8d5:/opt/logstash# bin/plugin install logstash-input-kafka
validating logstash-input-kafka >= 0
Valid logstash plugin. Continuing...
removing existing plugin before installation
Successfully uninstalled logstash-input-kafka-0.1.5
Gem::UnsatisfiableDependencyError: Unable to resolve dependency: logstash-input-kafka (= 0.1.11) requires logstash-core (< 2.0.0, >= 1.4.0)
resolve_for at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/dependency_resolver.rb:144
resolve_for at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/dependency_resolver.rb:194
resolve at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/dependency_resolver.rb:87
resolve at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/request_set.rb:134
resolve_dependencies at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/dependency_installer.rb:408
install at /opt/logstash/vendor/jruby/lib/ruby/shared/rubygems/dependency_installer.rb:345
execute at /opt/logstash/lib/logstash/pluginmanager/install.rb:60
run at /opt/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.3/lib/clamp/command.rb:67
execute at /opt/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.3/lib/clamp/subcommand/execution.rb:11
run at /opt/logstash/lib/logstash/runner.rb:144
call at org/jruby/RubyProc.java:271
run at /opt/logstash/lib/logstash/runner.rb:171
call at org/jruby/RubyProc.java:271
initialize at /opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.18/lib/stud/task.rb:12
root@f6f3610dc8d5:/opt/logstash# bin/plugin install logstash-core
validating logstash-core >= 0
Plugin does not exist 'logstash-core'. Aborting

This is the logstash.conf file that works on logstash 1.4 against kafka 0.8.1.1
input {
kafka {
zk_connect => "127.0.0.1:2181"
group_id => "logstash"
topic_id => "request"
reset_beginning => false
consumer_threads => 1
queue_size => 20
rebalance_max_retries => 4
rebalance_backoff_ms => 2000
consumer_timeout_ms => -1
consumer_restart_on_error => true
consumer_restart_sleep_ms => 0
codec => json
}
}
input {
kafka {
zk_connect => "127.0.0.1:2181"
group_id => "logstash"
topic_id => "response"
reset_beginning => false
consumer_threads => 1
queue_size => 20
rebalance_max_retries => 4
rebalance_backoff_ms => 2000
consumer_timeout_ms => -1
consumer_restart_on_error => true
consumer_restart_sleep_ms => 0
codec => json
}
}

filter {
if [response][compressed] {
ruby {
code => "
require 'zlib';
temp_payload = event['response']['payload']
#buf_payload = new_payload.pack('C_') if new_payload.is_a?(Array)
#puts compressYN
#puts new_payload.class
event['response']['payload_decoded']= Zlib::Inflate.inflate(temp_payload.to_a.pack('C_'))
"
}
}
}

output {

stdout { codec => rubydebug }

elasticsearch {
    protocol => "http"
    host => "localhost"
    index => "logs"

}
}

Thanks,
Colum

Audit existing defaults for Kafka consumer

Need to audit and verify if default settings makes sense with performance numbers. Especially configurations like fetch_message_max_bytes, :queue_size and :consumer_threads

Stalling thread preventing closing of logstash

I get this, which prevents logstash from closing:

{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"total"=>0}, "STALLING_THREADS"=>{["LogStash::Inputs::Kafka", {"zk_connect"=>"10.187.147.57:2181,10.187.147.58:2181,10.35.132.117:2181", "topic_id"=>"rsyslog_logstash2", "reset_beginning"=>"false", "type"=>"kafka"}]=>[{"thread_id"=>19, "name"=>"<kafka", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-2.0.3/lib/logstash/inputs/kafka.rb:139:in `pop'"}]}}

Add support to read avro messages encoded with Confluent's kafkaavroserializer

I have avro messages encoded with kafkaavroserializer. I am not able to read them to elasticsearch using kafka input plugin.

I did classpath the kafka avro decoder (the entire serializer jars along with its dependencies) too. But I got an error that said that the plugin was not found.

do you happen to know this error or issue? do you have any thoughts to overcome this issue?

Kafka threads unable to subscribe peg CPU usage

Steps to reproduce:
logstash 2.1.1
jdk 1.8_51
kafka 0.8.1

Set your partitions on kafka to 20 (or whatever number you can exceed consumers on).
Start a 21st logstash consumer
CPU usages for Kafka inputs will all use the maximum CPU.

top - 20:00:29 up 2:33, 1 user, load average: 8.08, 8.10, 8.09
Tasks: 235 total, 9 running, 226 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy,100.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8059392k total, 1595996k used, 6463396k free, 69176k buffers
Swap: 1048572k total, 0k used, 1048572k free, 882964k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29574 logstash 39 19 4614m 362m 15m R 50.2 4.6 42:10.56 <kafka
29569 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:29.97 <kafka
29570 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:13.39 <kafka
29571 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:12.77 <kafka
29572 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:03.24 <kafka
29573 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:13.51 <kafka
29575 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:11.28 <kafka
29576 logstash 39 19 4614m 362m 15m R 49.9 4.6 42:19.94 <kafka

High CPU usage in logstash 2.0

Hi, we are using logstash 2.0 to collect logs. But we found it consumed too much CPU resource, over 300% (8 cores).

Here is the config:

input {
    kafka {
        group_id => "logstash"
        topic_id => "logstash"
        zk_connect => "192.168.1.101:2181,192.168.1.102:2181,192.168.1.103:2181/kafkalog"
    }
}

output {
    elasticsearch {
        hosts => [ "192.168.1.201:9200","192.168.1.202:9200","192.168.1.203:9200"]
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
        index_type => "%{type}"
        workers => 24
        template_overwrite => true
    }
}

Here is the top for logstash process:

45520 admin     20   0 5145m 770m  13m R 95.4  1.2   3919:16 <kafka
16493 admin     20   0 5145m 770m  13m S  9.9  1.2  10:00.87 <kafka
16494 admin     20   0 5145m 770m  13m R  8.0  1.2  10:22.48 <kafka
45611 admin     20   0 5145m 770m  13m S  6.0  1.2 219:50.52 <kafka
45536 admin     20   0 5145m 770m  13m R  4.0  1.2 295:32.40 >output
45548 admin     20   0 5145m 770m  13m S  4.0  1.2 105:37.91 >elasticsearch.
45608 admin     20   0 5145m 770m  13m S  4.0  1.2  47:13.84 >elasticsearch.
45635 admin     20   0 5145m 770m  13m S  4.0  1.2  47:29.15 >elasticsearch.
45638 admin     20   0 5145m 770m  13m S  4.0  1.2  47:09.60 >elasticsearch.
45363 admin     20   0 5145m 770m  13m S  2.0  1.2  88:07.72 java
45364 admin     20   0 5145m 770m  13m S  2.0  1.2  88:04.00 java
45365 admin     20   0 5145m 770m  13m S  2.0  1.2  88:13.39 java
45366 admin     20   0 5145m 770m  13m S  2.0  1.2  88:14.11 java
45367 admin     20   0 5145m 770m  13m S  2.0  1.2  88:12.28 java
45368 admin     20   0 5145m 770m  13m S  2.0  1.2  88:05.51 java
45372 admin     20   0 5145m 770m  13m S  2.0  1.2  39:54.07 java
45373 admin     20   0 5145m 770m  13m S  2.0  1.2  41:29.71 java
45543 admin     20   0 5145m 770m  13m S  2.0  1.2 107:13.18 >elasticsearch.
45544 admin     20   0 5145m 770m  13m S  2.0  1.2 106:05.68 >elasticsearch.
45545 admin     20   0 5145m 770m  13m S  2.0  1.2 105:52.78 >elasticsearch.
45546 admin     20   0 5145m 770m  13m S  2.0  1.2 105:43.77 >elasticsearch.
45549 admin     20   0 5145m 770m  13m S  2.0  1.2 106:16.93 >elasticsearch.
45550 admin     20   0 5145m 770m  13m S  2.0  1.2 105:44.84 >elasticsearch.
45552 admin     20   0 5145m 770m  13m S  2.0  1.2 106:15.47 >elasticsearch.
45554 admin     20   0 5145m 770m  13m S  2.0  1.2 106:38.47 >elasticsearch.
45555 admin     20   0 5145m 770m  13m S  2.0  1.2 106:31.77 >elasticsearch.
45557 admin     20   0 5145m 770m  13m S  2.0  1.2 105:34.25 >elasticsearch.
45558 admin     20   0 5145m 770m  13m R  2.0  1.2 105:55.13 >elasticsearch.
45561 admin     20   0 5145m 770m  13m S  2.0  1.2 106:28.72 >elasticsearch.
45562 admin     20   0 5145m 770m  13m S  2.0  1.2 106:50.73 >elasticsearch.

Here is the jstack of thread-id 45520:

"<kafka" daemon prio=10 tid=0x00007ff37435b800 nid=0xb1d0 runnable [0x00007ff37bbe6000]
   java.lang.Thread.State: RUNNABLE
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
        at rubyjit.LogStash::Inputs::Base$$stop?_5ecc17de0faba55421c72ac5c66b2d232a0c2171273061103.__file__(/home/admin/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.0.0-java/lib/logstash/inputs/base.rb:89)
        at rubyjit.LogStash::Inputs::Base$$stop?_5ecc17de0faba55421c72ac5c66b2d232a0c2171273061103.__file__(/home/admin/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.0.0-java/lib/logstash/inputs/base.rb)
        at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:141)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
        at org.jruby.ast.FCallNoArgNode.interpret(FCallNoArgNode.java:31)
        at org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
        at org.jruby.ast.WhileNode.interpret(WhileNode.java:127)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
        at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
        at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
        at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
        at org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
        at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
        at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
        at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
        at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
        at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
        at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
        at org.jruby.ast.RescueNode.executeBody(RescueNode.java:221)
        at org.jruby.ast.RescueNode.interpret(RescueNode.java:116)
        at org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
        at org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
        at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
        at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
        at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
        at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:203)
        at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
        at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
        at org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
        at org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
        at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
        at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
        at org.jruby.runtime.Block.call(Block.java:101)
        at org.jruby.RubyProc.call(RubyProc.java:290)
        at org.jruby.RubyProc.call(RubyProc.java:228)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)

Decorating events assumes message field is a string

When decorate_events is set to true, msg_size is set by calling bytesize on the event's message field. This assumes that this field is always a string, and fails if it is a hash instead, as in my particular use-case.

Kafka logstash input do not continue from where it left off on restart

This is highly undesirable, the reason we are publishing data to kafka is to ensure the consumers can be taken down and can come back up in an asynchronous fashion. Is this a limitation of this plugin? or simply a configuration issue.

Also where does the kafka input save the kafka position per partition ?

I am using the latest version of the plugin with the following kafka config

input {
    kafka {
        zk_connect => "zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181"
        topic_id => "logs"
        consumer_threads => 1
        consumer_restart_on_error => true
        consumer_restart_sleep_ms => 100
        decorate_events => true
    type => "logs"
    }
}

Java Event can't convert ArrayJavaProxy when decorate_events is true and key exists.

Line 182 tries to set event["kafka"] to a ruby hash.
Unfortunately the key is an array of bytes. JRuby wraps this in an ArrayJavaProxy.
Java Event does not like this.

Fix: stringify the message_and_metadata.key in the same way as the message_and_metadata.message

Logstash-input-kafka does not build properly

Hi,
if you bundle to have all the dependencies installed from a fresh repo, you get actually the next error:

Installing sinatra 1.4.5
Installing stud 0.0.19
Installing polyglot 0.3.5
Installing treetop 1.4.15
Using logstash 1.5.0.beta1 from git://github.com/elasticsearch/logstash.git (at 1.5)
Installing logstash-codec-json 0.1.5
Installing logstash-codec-plain 0.1.4
[ERROR] Failed to execute goal on project logstash-input-kafka: Could not resolve dependencies for project rubygems:logstash-input-kafka:gem:0.1.8: Failed to collect dependencies at rubygems:logstash:gem:[1.4.0,2.0.0): No versions available for rubygems:logstash:gem:[1.4.0,2.0.0) within specified range -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

it looks like there is an issue with the dependency management, might be the jar-dependencies as it got a new minor version (that might have broken something here).

more details (this is using an old Gemfile.lock for this repo):

Resolving dependencies...
Bundler could not find compatible versions for gem "jar-dependencies":
  In snapshot (Gemfile.lock):
    jar-dependencies (0.1.2)

  In Gemfile:
    logstash (>= 0) java depends on
      jar-dependencies (= 0.1.7) java

Print an error when kafka client threw exception multiple times

We are running Logstash in quiet mode because Logstash can be really verbose.
In quiet mode, if the Kafka client can't connect, nothing is printed out.

It would be great to add an error message if the Kafka client threw an exception multiple times in a row (configurable ?). That way the quiet mode will hide intermittent failures but will print an error when there's possibly an unrecoverable issue.

Last consumer overwriting the previous one. Does logstash kafka plugin support multiple instances of logstash?

Hi,

When I start 3 instances of logstash all with the same kafka input (same group, different consumer id), it seems that the last logstash instance overwrites the previous consumer in zookeeper.

zookeeper output below -
ls /consumers/logstash/ids
[logstash_logstash-1448849297652-a7ccb59a]

then when I start another logstash instance the consumer is overwritten, rather than added to the list.

ls /consumers/logstash/ids
[logstash_logstash-1448847917934-adf01861]

I'm sure the logstash-kafka input supports multiple loadbalanced logstash servers. maybe I'm doing something wrong.

Here is my config .. you can see that I have removed the consumerid and group id to use the defaults to get the base scenario working.

kafka {
zk_connect => '{{ KAFKA_ZOOKEEPER_CONNECT }}'
topic_id => 'test'
reset_beginning => true
consumer_threads => 1
}

Logstash throws an exception upon terminating

{:exception=>#<LogStash::ShutdownSignal: LogStash::ShutdownSignal>, :backtrace=>["org/jruby/ext/thread/SizedQueue.java:133:in push'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-0.1.14/lib/logstash/inputs/kafka.rb:172:inqueue_event'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-codec-binary/lib/logstash/codecs/binary.rb:26:in decode'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-0.1.14/lib/logstash/inputs/kafka.rb:163:inqueue_event'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-0.1.14/lib/logstash/inputs/kafka.rb:133:in run'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/pipeline.rb:177:ininputworker'", "/home/cisco/workspace/naas/naas-core/logging/third-party/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/pipeline.rb:171:in `start_input'"], :level=>:error}

Test failure: something about maven?

Jenkins says

It was installed into ./vendor/bundle
validating ../logstash-input-kafka/logstash-input-kafka-0.1.0.gem >= 0
valid logstash plugin. Continueing...
[ERROR] Failed to execute goal on project logstash-input-kafka: Could not resolve dependencies for project rubygems:logstash-input-kafka:gem:0.1.0: The following artifacts could not be resolved: javax.jms:jms:jar:1.1, com.sun.jdmk:jmxtools:jar:1.2.1, com.sun.jmx:jmxri:jar:1.2.1: Could not transfer artifact javax.jms:jms:jar:1.1 from/to java.net (https://maven-repository.dev.java.net/nonav/repository): No connector available to access repository java.net (https://maven-repository.dev.java.net/nonav/repository) of type legacy using the available factories WagonRepositoryConnectorFactory -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Errno::ENOENT: No such file or directory - /home/jenkins/workspace/logstash_input_kafka_commit/logstash/vendor/plugins/jruby/1.9/gems/logstash-input-kafka-0.1.0/deps.lst

Logstash with kafka at 100% CPU, kill -9

I am using Kafka plugin in logstash to read items and push to ES. After processing 3M messages, I can see the logstash process remains at 100% CPU continuously, although it is doing nothing.
kill did not kill the process. Had to do kill -3.

Here's my config:

    input {
            beats {
                    port => 5048
                            type => 'hit'
            }
            kafka {
                    zk_connect => "10.35.132.117:2181,10.187.147.57:2181,10.187.147.58:2181"
                            topic_id => "rsyslog_logstash"
                            reset_beginning => false
                            type => "kafka"
            }
    }

    filter {
                    drop {}
            }

            if [type] == "kafka" {
                    date {
                            match => [ "timestamp", "ISO8601" ]
                            locale => "en"
                    }
            }

            metrics {
                    meter => "events"
                            add_tag => "metric"
            }
    }
    output {
            if "metric" in [tags] {
                    stdout {
                            codec => line {
                                    format => "rate: %{[events][rate_1m]}"
                            }
                    }
            }

            if [fingerprint] {
                    elasticsearch {
                            hosts =>  ["10.35.76.37","10.35.132.143","10.35.132.142"]
                                    index => "hits-%{+YYYY.MM.dd}"
                                    template => "./conf/hits.json"
                                    template_name => "hits"
                                    document_type => "hit"
                                    template_overwrite => true
                                    manage_template => true
                                    flush_size => "1000"
                    }
            } else {
            }
    }

reset_beginning is unexpectedly destructive

The documentation on reset_beginning is misleading. reset_beginning deletes the consumer group zk entry no matter what which can lead to unpredictable behavior with multiple consumers in a group.

Support SSL

Request for SSL support for Kafka. Now supported in version 0.9.

https://cwiki.apache.org/confluence/display/KAFKA/Deploying+SSL+for+Kafka

Support Kafka 0.10 client

Kafka 0.10 exposes a new max.poll.records setting that can help with excessive rebalancing behaviors in Kafka.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records

This is an enhancement request for the Kafka plugin to support Kafka 0.10 client not just to get the latest fixes but also to expose this new option via the Kafka input plugin.

Document format of arrays such as topics in kafka input 5.0

From here I see that the topics are now in an array format.

Does that mean it looks like this?

topics => [ {blah => topic1}, {blah => topic2} ]
Or this?
topics => [ {topic1}, {topic2} ]

Some examples would be helpful
What is the key for the arrays?

kafka init deadlock issue

I wanted to open this issue on the input side as this deadlock can theoretically affect the input or the output plugin:

logstash-plugins/logstash-output-kafka#8

I'm looking for guidance in the aforementioned issue.

auto_commit_interval_ms default should be the same as Kafka new consumer configuration default

As part of the validation work done in #88

auto_commit_interval_ms -> 5000 (Note:  For this one, Kafka's new default is 5000, but ours is 10)

The auto_commit_interval_ms default for our Kafka input plugin is set to 10, but Kafka's new consumer default for this same parameter is 5000.

Documentation on default Kafka consumer configuration values

The 0.9 input plugin has a lot of configuration parameters that are essentially mapped to the Kafka consumer counterparts. But the majority of them do not have a default value set - but there are actually defaults for these in the Kafka consumer as noted here. So the table of parameters with no default values (when you click on them, most of them say There is no default value for this setting.) can be confusing to users.

https://www.elastic.co/guide/en/logstash/5.0/plugins-inputs-kafka.html#_synopsis_21

Proposed change:

Instead of "There is no default value for this setting.", maybe we can just link to http://kafka.apache.org/documentation.html#newconsumerconfigs and ask them to check the Kafka documentation for the default.

Or actually put the Kafka default values in place.

auto_offset_reset -> latest
check_crcs -> true
auto_commit_interval_ms -> 5000 (Note:  For this one, Kafka's new default is 5000, but ours is 10)
connections_max_idle_ms -> 540000
fetch_max_wait_ms -> 500
fetch_min_bytes -> 1
heartbeat_interval_ms -> 3000
max_partition_fetch_bytes -> 1048576
partition_assignment_strategy -> org.apache.kafka.clients.consumer.RangeAssignor
receive_buffer_bytes -> 32768
reconnect_backoff_ms -> 50
request_timeout_ms -> 30000
retry_backoff_ms -> 100

The better solution is probably to simply cross reference Kafka's documentation in case they change their defaults at a later time.

Also on the page (https://www.elastic.co/guide/en/logstash/5.0/plugins-inputs-kafka.html#_synopsis_21), we are linking to the beginning of the consumer config section with the 0.80 settings.

Kafka consumer configuration: http://kafka.apache.org/documentation.html#consumerconfigs

Maybe we can change it to

Kafka consumer configuration (see New Consumer Configs section for 0.9 settings): http://kafka.apache.org/documentation.html#consumerconfigs

Continuous exception when kafka servers are restarted

When I stopped and started all kafka & zookeeper servers, I started receiving continuous exception from logstash.

[2016-01-04T08:50:07.425] WARN: kafka.consumer.ConsumerFetcherManager$LeaderFinderThread: [logstash_rdl04910app06-1451895739775-36bdd0f3-leader-finder-thread], Failed to find leader for Set([rsyslog_logstash,0], [rsyslog_logstash,2], [rsyslog_logstash,1])
kafka.common.KafkaException: fetching topic metadata for topics [Set(rsyslog_logstash)] from broker [ArrayBuffer()] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
log4j, [2016-01-04T08:50:07.628] WARN: kafka.consumer.ConsumerFetcherManager$LeaderFinderThread: [logstash_rdl04910app06-1451895739775-36bdd0f3-leader-finder-thread], Failed to find leader for Set([rsyslog_logstash,0], [rsyslog_logstash,2], [rsyslog_logstash,1])
kafka.common.KafkaException: fetching topic metadata for topics [Set(rsyslog_logstash)] from broker [ArrayBuffer()] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
log4j, [2016-01-04T08:50:07.832] WARN: kafka.consumer.ConsumerFetcherManager$LeaderFinderThread: [logstash_rdl04910app06-1451895739775-36bdd0f3-leader-finder-thread], Failed to find leader for Set([rsyslog_logstash,0], [rsyslog_logstash,2], [rsyslog_logstash,1])
kafka.common.KafkaException: fetching topic metadata for topics [Set(rsyslog_logstash)] from broker [ArrayBuffer()] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
log4j, [2016-01-04T08:50:08.035] WARN: kafka.consumer.ConsumerFetcherManager$LeaderFinderThread: [logstash_rdl04910app06-1451895739775-36bdd0f3-leader-finder-thread], Failed to find leader for Set([rsyslog_logstash,0], [rsyslog_logstash,2], [rsyslog_logstash,1])
kafka.common.KafkaException: fetching topic metadata for topics [Set(rsyslog_logstash)] from broker [ArrayBuffer()] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

I had to kill the instance.

On the zookeeper log as well, I was seeing continuous exceptions:

    [2016-01-04 08:52:11,652] WARN Ignoring exception (org.apache.zookeeper.server.NIOServerCnxnFactory)
    java.io.IOException: Too many open files
            at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
            at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
            at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:188)
            at java.lang.Thread.run(Thread.java:745)
    [2016-01-04 08:52:11,653] WARN Ignoring exception (org.apache.zookeeper.server.NIOServerCnxnFactory)
    java.io.IOException: Too many open files
            at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
            at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
            at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:188)
            at java.lang.Thread.run(Thread.java:745)
    [2016-01-04 08:52:11,653] WARN Ignoring exception (org.apache.zookeeper.server.NIOServerCnxnFactory)
    java.io.IOException: Too many open files
            at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
            at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
            at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:188)
            at java.lang.Thread.run(Thread.java:745)
    [2016-01-04 08:52:11,653] WARN Ignoring exception (org.apache.zookeeper.server.NIOServerCnxnFactory)
    java.io.IOException: Too many open files
            at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
            at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
            at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:188)
            at java.lang.Thread.run(Thread.java:745)

Commit offsets to kafka rather than zookeeper

Hi,

Is it possible to change the offset commit policy to commit to kafka rather than zookeeper?
https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka

If not, is this feature being considered?

Thanks

Setting session_timeout_ms causes plugin to fail with unrecoverable error

Running LS 2.3.2 with kafka input plugin 3.0.2.

When attempting to set a session_timeout_ms to the input configuration:

input{
kafka{
topics => ["test"]
session_timeout_ms => "60000"
}
}

Logstash fails with the error:

Starting pipeline {:id=>"main", :pipeline_workers=>4, :batch_size=>125, :batch_delay=>5, :max_inflight=>500, :level=>:info, :file=>"logstash/pipeline.rb", :line=>"188", :method=>"start_workers"}
Pipeline main started {:file=>"logstash/agent.rb", :line=>"465", :method=>"start_pipeline"}
Unable to create Kafka consumer from given configuration {:kafka_error_message=>org.apache.kafka.common.KafkaException: Failed to construct kafka consumer, :level=>:error, :file=>"logstash/inputs/kafka.rb", :line=>"213", :method=>"create_consumer"}
A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::Kafka topics=>["test"], session_timeout_ms=>"60000", codec=><LogStash::Codecs::Plain charset=>"UTF-8">, auto_commit_interval_ms=>"10", bootstrap_servers=>"localhost:9092", client_id=>"logstash", consumer_threads=>1, enable_auto_commit=>"true", group_id=>"logstash", key_deserializer_class=>"org.apache.kafka.common.serialization.StringDeserializer", value_deserializer_class=>"org.apache.kafka.common.serialization.StringDeserializer", poll_timeout_ms=>100, ssl=>false>
  Error: uncaught throw Failed to construct kafka consumer in thread 0x2ffc
  Exception: ArgumentError
  Stack: org/jruby/RubyKernel.java:1283:in `throw'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:214:in `create_consumer'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:142:in `run'
org/jruby/RubyFixnum.java:275:in `times'
org/jruby/RubyEnumerator.java:274:in `each'
org/jruby/RubyEnumerable.java:757:in `map'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:142:in `run'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:342:in `inputworker'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:336:in `start_input' {:level=>:error, :file=>"logstash/pipeline.rb", :line=>"353", :method=>"inputworker"}
Unable to create Kafka consumer from given configuration {:kafka_error_message=>org.apache.kafka.common.KafkaException: Failed to construct kafka consumer, :level=>:error, :file=>"logstash/inputs/kafka.rb", :line=>"213", :method=>"create_consumer"}
A plugin had an unrecoverable error. Will restart this plugin.

Also tried setting it to a number instead of a string (just in case the documentation is incorrect), doesn't work either:

Unable to create Kafka consumer from given configuration {:kafka_error_message=>org.apache.kafka.common.config.ConfigException: Invalid value 60000 for configuration session.timeout.ms: Expected value to be an number., :level=>:error, :file=>"logstash/inputs/kafka.rb", :line=>"213", :method=>"create_consumer"}
A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::Kafka topics=>["test"], session_timeout_ms=>60000, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, auto_commit_interval_ms=>"10", bootstrap_servers=>"localhost:9092", client_id=>"logstash", consumer_threads=>1, enable_auto_commit=>"true", group_id=>"logstash", key_deserializer_class=>"org.apache.kafka.common.serialization.StringDeserializer", value_deserializer_class=>"org.apache.kafka.common.serialization.StringDeserializer", poll_timeout_ms=>100, ssl=>false>
  Error: uncaught throw Invalid value 60000 for configuration session.timeout.ms: Expected value to be an number. in thread 0x2ffc
  Exception: ArgumentError
  Stack: org/jruby/RubyKernel.java:1283:in `throw'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:214:in `create_consumer'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:142:in `run'
org/jruby/RubyFixnum.java:275:in `times'
org/jruby/RubyEnumerator.java:274:in `each'
org/jruby/RubyEnumerable.java:757:in `map'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-3.0.2/lib/logstash/inputs/kafka.rb:142:in `run'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:342:in `inputworker'
/Users/User_Name/ELK/ElasticStack_2_0/logstash-2.3.2/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:336:in `start_input' {:level=>:error, :file=>"logstash/pipeline.rb", :line=>"353", :method=>"inputworker"}
Unable to create Kafka consumer from given configuration {:kafka_error_message=>org.apache.kafka.common.config.ConfigException: Invalid value 60000 for configuration session.timeout.ms: Expected value to be an number., :level=>:error, :file=>"logstash/inputs/kafka.rb", :line=>"213", :method=>"create_consumer"}

Update to latest 0.8.2.1 Kafka

Currently Kafka 0.8.2.0 RC1 is out -- we should test and upgrade to 0.8.2.0 when it is GA

https://mail-archives.apache.org/mod_mbox/kafka-dev/201501.mbox/%3CCAFc58G_-6N2HWRr4Ddu-TeX%3Dmx3QgJziZXfxzMbdTab6Tv3xzA%40mail.gmail.com%3E

Update: We are at version 0.8.2.1

0.9 consumer doesn't add tags or fields

My conf is

  kafka {
    topics => ["eventlog"]
    bootstrap_servers => "localhost:9092"
    tags => ["kafka"]
    add_field => {"foo" => "bar"}
    codec => json {}
  }

But the tag and field are missing from resulting event.
I'm using beta 4

rake install_jars task fails

Trying to install jars using the rake task and seems like this method was removed in the latest jar-dependency version.

suyog@machine:~/ws/elastic/ls_plugins/logstash-input-kafka (plugin-api-v1)$ rake install_jars
rake aborted!
NoMethodError: undefined method `vendor_jars!' for #<Jars::Installer:0x332820f4>
/Users/suyog/ws/elastic/ls_plugins/logstash-input-kafka/Rakefile:14:in `(root)'
Tasks: TOP => install_jars
(See full trace by running task with --trace)

Provide Support for reading lz4 compressed Topics

Currently, when reading from an lz4 compressed topic from Kafka 0.8.2 results in the following error:

Kafka::Consumer caught exception: Java::KafkaCommon::UnknownCodecException
3 is an unknown compression codec

It appears that there is only support for reading from gzip and snappy compression for input, though lz4 seems supported in the output plugin.

Does it support logstash1.4.2?

Does it support logstash1.4.2?
Or only support 1.5?

Consumer group not reading from all partitions?

Hi there,

I have a setup with a kafka topic that has 6 partitions and 2 different logstash servers forming a consumer group of that topic and basically just outputting it to ElasticSearch. I configured the input currently with 3 workers on each server and a unique consumer group name.

The setup works fine for a few days. After that, however, it apparently is not consuming from all partitions anymore. I can see gaps in ElasticSearch that are pretty consistent with missing messages from certain partitions. I'm not a 100% sure if that is the issue, however, as the producer to this topic is not producing with a round-robin partitioner (see logstash-plugins/logstash-output-kafka#17), it is hard to tell.

Please let me know what information you need for debugging this issue. I can provide complete logstash configs if that would be helpful.

Additional Kafka Message Metadata?

I'm attempting to use Flume 1.6 and a spooldir source to stream a directory of logs through a broker and into logstash kafka input. One of the header values passed from the Flume spooldir source is the file path and name of the file picked up. I'm expecting this to be exposed somehow upon reception in logstash so that I can extract it and make parsing and routing decisions based on the logfile type.

Are the metadata options exposed with decorate turned on all that's available? How can I see the kafka message and its content at a deeper level in logstash? I don't see it with debug on.

Looking a the plugin code leads me to think I could add things here, but not sure if there are other options available. I added one that showed me the cryptic kafka message, but that doesn't help until it's decoded.

event['kafka'] = {'msg_size' => message_and_metadata.message.size,
'topic' => message_and_metadata.topic,
'consumer_group' => @group_id,
'partition' => message_and_metadata.partition,
'message' => message_and_metadata.message, <-- I added this
'key' => message_and_metadata.key}
end

Thanks for pointers. I'd think this would be a common use case for folks using Flume 1.6 to ship logs to logstash using kafka.

Doug

rspec tests need to be rewriten using the logstash parts

Improve performance for Kafka consumer

Some users have reported that 1.5.x version of this plugin being slower than 1.4.2 which is very weird. We'd like to get to the bottom of this. Starting with an audit of existing code for obvious issues, and then attaching a profiler would help find hotspots.

See ref: elastic/logstash#3477 (comment)

Add option to define custom deserializer class in input

We should add an option to directly load a custom deserializer class in the input (the deserializer should be in the classpath). You can use Logstash codec to workaround this (see logstash-codec-avro for example), but it would be more convenient to expose this functionality directly to the end user.

config :deserializer_class, :validate => :string, :default => 'kafka.serializer.StringEncoder'

Logstash initialization warning

When I start logstash, I get the following warning:

     [2016-01-04T08:22:54.930]  WARN: kafka.client.ClientUtils$: Fetching topic metadata with correlation id 144 for topics [Set(rsyslog_logstash)] from broker [id:3,host:rdl04910app05,port:9092] failed
    java.nio.channels.ClosedChannelException
            at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
            at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
            at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
            at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
            at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
            at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
            at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
            at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
    log4j, [2016-01-04T08:22:55.184]  WARN: kafka.client.ClientUtils$: Fetching topic metadata with correlation id 145 for topics [Set(rsyslog_logstash)] from broker [id:3,host:rdl04910app05,port:9092] failed
    java.nio.channels.ClosedChannelException
            at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
            at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
            at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
            at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
            at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
            at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
            at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
            at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

There's no message in the topic. Logstash isn't really doing anything.

decorate_events results in nil map using symbols

decorate_events is using symbols to populate the event['kafka'] map instead of strings which results in the kafka values being nil instead of being populated.

{:timestamp=>"2014-12-16T20:38:49.997000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>81, :exception=>#<NoMethodError: undefined method `[]' for nil:NilClass>, :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.1.9-java/lib/logstash/outputs/elasticsearch.rb:446:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.1.9-java/lib/logstash/outputs/elasticsearch.rb:444:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.1.9-java/lib/logstash/outputs/elasticsearch.rb:442:in `flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.18/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1341:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.18/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.18/lib/stud/buffer.rb:112:in `buffer_initialize'", "org/jruby/RubyKernel.java:1501:in `loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.18/lib/stud/buffer.rb:110:in `buffer_initialize'"], :level=>:warn}

Support Kafka 0.9 consumer API for performance and security features

It looks like we will be reworking the logstash-input-kafka to support upcoming Kafka 0.9 release that has performance improvements for the consumer. This ticket is created to track our progress.