blizzard / node-rdkafka
Node.js bindings for librdkafka
License: MIT License
This might be a feature request, and it might not be possible.
My use case needs 'simple consumer' like behavior. I don't want Kafka to manage consumer group rebalancing, and I won't be doing any offset commits. I do want consumers to be able to provide topic, partition, and offset from which they will begin consuming, so that clients can manage their own offsets. When the protocol eventually supports consuming based on event timestamps, I'd like to support that as well.
Is this possible? I see KafkaConsumer.prototype.assign is a function, but I don't see anything using it, and my attempts to do so leave me in Erroneous state from librdkafka. I think this is either because the offsets aren't actually passed to librdkafka's assign, or because of some internal subscribed state that is not being updated.
Is doing something like consumer.assign([ { topic: 'test', partition: 0, offset: 1102583 } ]) possible?
Also, can I disable automatic consumer group rebalancing? I'm looking for a way to override the rebalance_cb, but I don't see one.
I am trying to have my consumer start from the smallest/earliest offset when none is stored yet, using the option "auto.offset.reset": "smallest", as described in Configuration.md, but I am getting the error No such configuration property: "auto.offset.reset". Other configuration options do work, e.g. "auto.commit.enable": false.
I would like to perform each message commit to Kafka manually, I'm using the commit function like this :
consumer.commit({ topic: metadata.topic, partition: metadata.partition, offset: metadata.offset }, function (err, data) { ... });
Could you tell me if this is a good way of using it?
The problem is that even when the callback fires without errors, the consumer consumes the last committed message again after a restart (I restart either by calling disconnect and then reconnecting, or simply by calling process.exit()).
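One possible explanation, assuming this version of the binding passes the offset to librdkafka unchanged: in Kafka, a committed offset is the position of the next message to read, not of the last message processed. Committing metadata.offset as-is would then make the consumer start again at that same message. A minimal sketch of the adjustment (toCommitPosition is a hypothetical helper, not part of node-rdkafka):

```javascript
// Hypothetical helper: builds the argument for consumer.commit() from the
// metadata of the message that was just processed.
function toCommitPosition(metadata) {
  return {
    topic: metadata.topic,
    partition: metadata.partition,
    // Kafka convention: the committed offset is the NEXT offset to consume.
    offset: metadata.offset + 1
  };
}

const lastProcessed = { topic: 'test', partition: 0, offset: 41 };
console.log(toCommitPosition(lastProcessed));
// prints { topic: 'test', partition: 0, offset: 42 }
```

If the binding already adds 1 internally, this helper would skip a message, so it is worth checking against the version in use.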
That's weird, but I get a native crash simply by running npm test in the repository when using Node.js 6.0.0. It's not completely clear where it is coming from, but here's the relevant part of the crash dump:
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Application Specific Information:
abort() called
*** error for object 0x7f9602e014f0: pointer being freed was not allocated
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff8f610f06 __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff9cd414ec pthread_kill + 90
2 libsystem_c.dylib 0x00007fff9e9cb6e7 abort + 129
3 libsystem_malloc.dylib 0x00007fff8aef5041 free + 425
4 node-librdkafka.node 0x0000000109855578 RdKafka::ConfImpl::~ConfImpl() + 14 (rdkafkacpp_int.h:228)
5 node-librdkafka.node 0x0000000109849ffb NodeKafka::Connection::~Connection() + 67 (connection.cc:67)
6 node-librdkafka.node 0x000000010984fa36 NodeKafka::Producer::~Producer() + 14 (producer.cc:40)
7 node 0x0000000107ddde43 v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) + 163
8 node 0x0000000107dde171 v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 49
9 node 0x0000000107deb35f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 1871
10 node 0x0000000107dea79d v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) + 717
11 node 0x0000000107dae6bb v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) + 107
12 node 0x0000000107ff31bb v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) + 139
Seems like the 'RdKafka::Conf' object deletion crashes internally - perhaps some of the properties were garbage-collected by v8?
Installing on windows does not work, and there is no way to make it work currently.
This is a low priority feature, but keeping it here so it can be tracked properly.
It seems some interfaces use topic when referring to a string topic name, and others use topic_name. E.g. consumer assign() expects the assignment object to use topic and assignments() returns an object with topic, but message objects from consume() calls use topic_name. I understand that sometimes a topic can be a Topic instance, and sometimes just a string topic name, but it'd be nice if the usage of the two was consistent.
I'm reusing the message metadata for future assign() calls, and I'd like to not have to translate between topic and topic_name.
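Until the naming is consistent, the translation can live in one small function. This is only a sketch of a workaround; toAssignment is a hypothetical helper, and the field names are the ones mentioned in this issue.

```javascript
// Hypothetical helper: converts message metadata (which may carry either
// `topic` or `topic_name`) into the shape that assign() expects.
function toAssignment(message) {
  const topic = message.topic !== undefined ? message.topic : message.topic_name;
  return {
    topic: topic,
    partition: message.partition,
    // Copied as-is; add 1 here instead if you want to resume AFTER this message.
    offset: message.offset
  };
}

console.log(toAssignment({ topic_name: 'test', partition: 0, offset: 1102583 }));
// prints { topic: 'test', partition: 0, offset: 1102583 }
```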
I see that Consumer::Unsubscribe returns RdKafka::ERR__STATE if it is called on a not-yet-subscribed consumer. However, there doesn't seem to be a way to infer subscription state via the node client. Should I manage this state myself, or can we add a consumer.subscriptions() method to get the current subscriptions? And/or, perhaps Consumer::Unsubscribe should just be a no-op if the consumer has no current subscriptions.
While looking into this, I noticed that consumer.subscribe() takes a cb, but as far as I can tell it is not invoked. Should it be?
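For the "manage this state myself" option, a thin wrapper is probably enough. The sketch below assumes the wrapped consumer exposes subscribe(topics) and unsubscribe(); SubscriptionTracker and its subscriptions() method are hypothetical names, and a stand-in object replaces the real consumer so the example runs without a broker.

```javascript
// Hypothetical wrapper that remembers what was subscribed, so unsubscribe()
// becomes a no-op when there is nothing to unsubscribe from.
class SubscriptionTracker {
  constructor(consumer) {
    this.consumer = consumer;
    this.topics = [];
  }

  subscribe(topics) {
    this.consumer.subscribe(topics);
    this.topics = topics.slice();
  }

  // Returns a copy so callers cannot mutate the tracked state.
  subscriptions() {
    return this.topics.slice();
  }

  // Returns true if an unsubscribe was actually issued.
  unsubscribe() {
    if (this.topics.length === 0) {
      return false;
    }
    this.consumer.unsubscribe();
    this.topics = [];
    return true;
  }
}

// Stand-in consumer (assumption: the real one has the same two methods).
const fakeConsumer = { subscribe: () => {}, unsubscribe: () => {} };
const tracker = new SubscriptionTracker(fakeConsumer);
console.log(tracker.unsubscribe()); // prints false
tracker.subscribe(['test']);
console.log(tracker.subscriptions()); // prints [ 'test' ]
console.log(tracker.unsubscribe()); // prints true
```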
Calling getMetadata(), we are getting an invalid list for the ISRs:
{"orig_broker_id":0,"orig_broker_name":"sasl_ssl://kafka01-prod01.messagehub.services.us-south.bluemix.net:9093/0","topics":[{"name":"mh-nodejs-console-sample-topic","partitions":[{"id":0,"leader":0,"replicas":[0,1,4],"isrs":[null,null,null,1]}]}],"brokers":[{"id":2,"host":"kafka03-prod01.messagehub.services.us-south.bluemix.net","port":9093},{"id":4,"host":"kafka05-prod01.messagehub.services.us-south.bluemix.net","port":9093},{"id":1,"host":"kafka02-prod01.messagehub.services.us-south.bluemix.net","port":9093},{"id":3,"host":"kafka04-prod01.messagehub.services.us-south.bluemix.net","port":9093},{"id":0,"host":"kafka01-prod01.messagehub.services.us-south.bluemix.net","port":9093}]}
At the time, all the replicas were in-sync according to Kafka
Even when creating a producer Topic object with code like this
// Create a topic object for the Producer to allow passing topic settings
var topicOpts = { 'request.required.acks': -1 };
var topic = producer.Topic(topicName, topicOpts);
console.log('Topic object created with opts ' + JSON.stringify(topicOpts));
producer.once('delivery-report', deliveryReportListener);
var value = new Buffer('This is a test message #' + counter);
var key = 'key';
counter++;
var partition = -1;
producer.produce(topic, partition, value, key);
// Keep calling poll() to get delivery reports
the produce request over the wire is sending acks=1 (the default).
We also tried setting the property value as a string, '-1' or 'all'. No joy.
Note that the property set in code is validated, i.e. an Error is thrown if the name or value is unknown. However, the valid values 0 and -1 are not honored.
We verified this by enabling tracing on the broker's KafkaApis logger.
sample log4j conf line:
log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender
sample broker log:
[2016-11-29 17:53:10,103] TRACE [KafkaApi-0] Handling request:{api_key=0,api_version=1,correlation_id=114,client_id=mh-node-console-sample-producer} -- {acks=1,timeout=5000,topic_data=[{topic=mh-nodejs-console-sample-topic,data=[{partition=0,record_set=java.nio.HeapByteBuffer[pos=0 lim=56 cap=56]}]}]} from connection 192.168.1.67:9092-192.168.1.67:52624;securityProtocol:PLAINTEXT,principal:User:ANONYMOUS (kafka.server.KafkaApis)
Message Structure
Messages that are returned by the KafkaConsumer have the following structure.
{
message: new Buffer('hi'), // message contents as a Buffer <--- should be 'value' not 'message'
size: 2, // size of the message, in bytes
topic: 'librdtesting-01', // topic the message comes from
offset: 1337, // offset the message was read from
partition: 1, // partition the message was on
key: 'someKey' // key of the message if present
}
This is just an FYI, since I think it's a bug in librdkafka, not here. When I create many processes with consumers that belong to the same consumer group, I sometimes get a librdkafka assert on partition assignment: confluentinc/librdkafka#761. For now I've worked around the problem by commenting out the code that sets up the rebalance_cb, as that seems to fix it, and by using my own fork, but it would be nice to understand it better and fix it properly. Unfortunately I have no particular steps to reproduce and no small test case.
When sending a message with an empty key (""), it is received as an undefined key by the Javascript consumer and a null key by the Java consumer. It should be received as an empty string.
Also currently, the producer doesn't allow sending messages with a null payload (can be done in Java). Similarly the consumer sees messages with a null payload (sent from Java) as empty.
If conf.rebalance_cb is set to true, the consumer does not emit the rebalance event, which could be used to mimic seek by using assign, as suggested by @edenhill.
BTW, with conf.rebalance_cb = true, the consumer calls an undefined unassign JS function at kafka-consumer.js line 54:
self.unassign(e.assignment);
The array of topicPartitions in the committed(...) callback is always of zero length.
This happens after both a sync and an async commit, e.g.
consumer.commit(message, function(err) {
  t.ifError(err);
  consumer.committed(10000, function(err, tp) {
    t.ifError(err);
    t.equal(1, tp.length); ///booom
  });
});
The test is commented out in PR #66.
Hi we found a really mind boggling problem
Using the following sample program, the key is sent with a length of 0 on Ubuntu 14.04.5
(just retested on a clean VM!)
Note that the code works on both Mac OS (10.11) and Ubuntu 16.04
var Kafka = require('../');
var producer = new Kafka.Producer({
//'debug' : 'all',
'metadata.broker.list': ...,
'dr_cb': true
});
console.log('producer created');
//Connect to the broker manually
producer.connect();
var counter = 0;
producer.on('delivery-report', function(report) {
// Report of delivery statistics here:
console.log(report);
counter++;
if (counter === 10) producer.disconnect();
});
//Any errors we encounter, including connection errors
producer.on('error', function(err) {
console.error('Error from producer');
console.error(err);
});
//Wait for the ready event before proceeding
producer.on('ready', function() {
console.log('producer ready');
// Create a Topic object with any options our Producer
// should use when writing to that topic.
var topic = producer.Topic('testtopic', {
// Make the Kafka broker acknowledge our message (optional)
'request.required.acks': 1
});
console.log('topic created');
for (var i = 0; i < 10; i++) {
var message = new Buffer('message' +i);
var key = "k-"+i;
var partition = -1;
producer.produce(topic, partition, message, key);
}
for (var j = 0; j < 50; j++) {
setTimeout(function() {
producer.poll();
}, 1000);
}
});
we could see that in NAN_METHOD(Producer::NodeProduce)
in producer.cpp
if we add a debug after the key is created, around line 330:
// This will just go out of scope and we don't send it anywhere,
// since it is copied there is no need to delete it
key = &keyString;
}
//add debug printf:
printf("key->c_str=%s key->size=%d\n", key->c_str(), key->size());
Producer* producer = ObjectWrap::Unwrap<Producer>(info.This());
then we can see that key->c_str() will print the expected string ... but key->size() is 0 !!
sample output:
ubuntu@ubuntu:~/node-rdkafka/mickael-examples$ node producer2.js
producer created
producer ready
topic created
key->c_str=k-0 key->size=0
key->c_str=k-1 key->size=0
key->c_str=k-2 key->size=0
key->c_str=k-3 key->size=0
key->c_str=k-4 key->size=0
key->c_str=k-5 key->size=0
key->c_str=k-6 key->size=0
key->c_str=k-7 key->size=0
key->c_str=k-8 key->size=0
key->c_str=k-9 key->size=0
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic', partition: 0, offset: 0, key: null }
{ topic_name: 'testtopic',
partition: 0,
offset: 171866,
key: null }
After having seen this on a real machine, I would not believe it if I didn't replicate in a VM myself.
Verified with node 6.9.1 and 6.8.0 and 4.6.1
I noticed that when commit is called while connection to kafka is lost, the function never returns or throws any error.
How should I recover from this state?
I could set a timeout and assume the connection is lost if the function does not return within that period and try to establish connection with a different kafka server. However, the disconnect function does not return either when connection is lost. How do I perform the necessary clean up without the disconnect function?
Thanks in advance
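The timeout idea described above could look roughly like this. It is a generic sketch, not something node-rdkafka provides: withTimeout is a hypothetical helper, and the stand-in commit below simply never calls back, to imitate the lost connection.

```javascript
// Hypothetical helper: runs a callback-style operation and reports an error
// if the operation has not called back within timeoutMs.
function withTimeout(operation, timeoutMs, callback) {
  let settled = false;

  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    callback(new Error('Timed out after ' + timeoutMs + ' ms'));
  }, timeoutMs);

  operation((err, result) => {
    if (settled) return; // late reply after the timeout fired: ignore it
    settled = true;
    clearTimeout(timer);
    callback(err, result);
  });
}

// Stand-in for a commit that hangs because the connection is lost.
const hangingCommit = (cb) => { /* never calls cb */ };

withTimeout(hangingCommit, 100, (err) => {
  if (err) console.error(err.message); // prints "Timed out after 100 ms"
});
```

With a real consumer the call would be wrapped as withTimeout((cb) => consumer.commit(position, cb), 5000, handler). Note that this only detects the hang; it does not cancel the underlying native call, and it does not solve disconnect not returning either.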
Hi,
While doing some failure tests, I've discovered that when I use this library and it can't connect to a broker during application startup (i.e. the first connect; we supply multiple brokers via options), the Producer.produce() method will work the first time and then, upon a subsequent call to the same method, the Node.js thread will hang until an error about all brokers being offline is emitted.
Is this something that's related to node-rdkafka, or could this actually be librdkafka behaving this way?
Thanks!
Here's a relevant part of the crash report:
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
VM Regions Near 0:
-->
__TEXT 00000001022f8000-00000001032f9000 [ 16.0M] r-x/rwx SM=COW /Users/USER/*
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libc++abi.dylib 0x00007fff9c7afb1e __dynamic_cast + 34
1 node-librdkafka.node 0x00000001044a1bea RdKafka::Topic::create(RdKafka::Handle*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, RdKafka::Conf*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) + 232 (TopicImpl.cpp:102)
2 node-librdkafka.node 0x000000010448fb0e NodeKafka::Topic::New(Nan::FunctionCallbackInfo<v8::Value> const&) + 1380 (topic.cc:33)
3 node-librdkafka.node 0x000000010448ff4d Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&) + 131 (nan_callbacks_12_inl.h:175)
4 node 0x000000010245f8b5 v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) + 373
5 node 0x00000001024b901f v8::internal::Builtin_HandleApiCallConstruct(int, v8::internal::Object**, v8::internal::Isolate*) + 1375
6 ??? 0x000004463bb0961b 0 + 4699695650331
Unfortunately I couldn't create a small test case that reproduces this reliably, but the cause seems quite clear. When produce and disconnect calls are placed with very unlucky timing, we get a race:
1. disconnect is called, and the ProducerDisconnect work is placed.
2. produce is called. Since we update _isConnected only after the disconnect has happened, the produce call goes to maybeTopic, then to native code, and gets all the way to the Topic creation shown in the crash report above.
I think the easiest solution is to update the JS _isConnected property before the actual disconnect happens; then all these racy code paths will be protected. However, maybe it's better to invest time in fixing those races at the native level. I'm not sure which path you want to choose.
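To illustrate the JS-level option: the flag is flipped synchronously, before the asynchronous disconnect work is queued, so any produce call that arrives afterwards is rejected in JavaScript and never reaches native code. GuardedProducer is a hypothetical wrapper written for this sketch (it assumes the wrapped producer is already connected), and fakeProducer stands in for the real one.

```javascript
// Hypothetical wrapper showing the "flip the flag first" ordering.
class GuardedProducer {
  constructor(producer) {
    this.producer = producer;
    this.accepting = true; // assumption: the producer is already connected
  }

  produce(...args) {
    if (!this.accepting) {
      throw new Error('Producer is disconnecting or disconnected');
    }
    return this.producer.produce(...args);
  }

  disconnect(cb) {
    // Flip the flag BEFORE the async disconnect is scheduled, closing the
    // window in which produce() could still reach native code.
    this.accepting = false;
    this.producer.disconnect(cb);
  }
}

// Stand-in producer so the sketch runs without a broker.
const fakeProducer = {
  produce: () => true,
  disconnect: (cb) => setImmediate(cb)
};

const guarded = new GuardedProducer(fakeProducer);
guarded.produce('test', -1, Buffer.from('hello'), 'key');
guarded.disconnect(() => console.log('Disconnected'));
try {
  guarded.produce('test', -1, Buffer.from('too late'), 'key');
} catch (e) {
  console.log(e.message);
}
```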
I expected consumer.position() to return a non-empty array when connected and consuming.
However, even in a consumer.on('data', function() handler, while messages are being received, the array is always empty.
The addition of the opaque is great (performance/flexibility) but:
var value = new Buffer('...');
var key = 'key';
producer.produce('topic', null, value, key, {value: value, key: key});
It feels a lot more natural to get the opaque populated automatically when the user has not passed in a custom value. This does not cause any performance penalty, as it reuses the opaque mechanism.
Although people generally consider the key to be small, in Kafka it is actually the same datatype as the payload and can be arbitrarily big. So reusing the opaque to pass it along with the payload makes sense.
Since the driver uses Nan::AsyncQueueWorker for background job scheduling, we end up using the built-in libuv thread pool. That means that by default there are only 4 threads in the pool, and this number can't be increased beyond 128 threads.
Each consumer in flowing mode submits a ConsumerConsumeLoop job, which is blocking, so it occupies one background thread completely. This means the number of consumers per process is limited to 4 (3 actually, since we need to keep at least 1 thread in the pool free for other tasks).
Another possible way to use up all the threads in the pool is to call consume(cb) in non-flowing mode many times on a topic with no messages coming in: each call creates a ConsumerConsume work item that occupies a thread from the pool until a message arrives, so it blocks all the other operations that could be going on (producing, metadata requests, etc.).
I'm wondering whether you think this might be a problem. In our use case we don't fork a worker per consumer group; instead we create all of the consumers in every worker process and rely on Kafka rebalancing to assign individual partitions to workers, since that is easier and better for failover (every worker is replaceable by any other one).
Do you think this will hit you too at some point? I've created this issue mostly to understand your thoughts on this.
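For reference, the pool size is controlled by libuv's UV_THREADPOOL_SIZE environment variable, which has to be set before the pool is first used (for example when launching the process). The arithmetic above can be sketched as follows; flowingConsumerBudget is a hypothetical helper, and the number of reserved threads is a judgment call.

```javascript
// Hypothetical helper: how many blocking flowing-mode consumers fit in the
// libuv pool if `reservedThreads` are kept free for other async work.
function flowingConsumerBudget(env, reservedThreads) {
  const parsed = parseInt(env.UV_THREADPOOL_SIZE, 10);
  const poolSize = Number.isNaN(parsed) ? 4 : parsed; // libuv default is 4
  return Math.max(0, poolSize - reservedThreads);
}

console.log(flowingConsumerBudget({}, 1)); // prints 3
console.log(flowingConsumerBudget({ UV_THREADPOOL_SIZE: '16' }, 1)); // prints 15
console.log(flowingConsumerBudget(process.env, 1)); // depends on the environment
```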
If there has been an error in the delivery report, currently a plain Error object is passed to the delivery-report callback and there's no way to correlate it with the exact message that was sent. The Error object should have all the same properties that a normal delivery report contains.
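What is being asked for could look like this on the JS side. It is a sketch of the idea only: toDeliveryError is a hypothetical helper, and the field names are the ones that appear in the delivery reports quoted elsewhere on this page.

```javascript
// Hypothetical helper: copies the delivery report fields onto the Error so
// the failure can be matched to the message that caused it.
function toDeliveryError(err, report) {
  return Object.assign(err, {
    topic_name: report.topic_name,
    partition: report.partition,
    offset: report.offset,
    key: report.key
  });
}

const failed = toDeliveryError(
  new Error('Local: Message timed out'),
  { topic_name: 'testtopic', partition: 0, offset: -1, key: 'k-3' }
);
console.log(failed.message, failed.topic_name, failed.key);
// prints Local: Message timed out testtopic k-3
```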
As discussed in #62 (comment), if the testcase called "should produce a message to a Topic object" is executed before "should get 100% deliverability", the latter testcase hangs.
If in the first testcase we change the topic object to be a string, then the second testcase doesn't hang.
We've also managed to reproduce this issue in a small testcase script (see enclosed). It's basically the same logic from the tests but inlined and without any mocha deps to be run with node directly. Note that if we enable DEBUG the script doesn't hang. Like in the e2e tests, changing the topic object to a string makes it pass.
I'm trying to produce keyed messages to Kafka but for some reason the key value I get back during delivery report events is null
produce({ message: data.content, key: data.key, topic: ... }, ... )
{"topic_name":"vpn","partition":0,"offset":1137763,"key":null}
Any ideas?
rdkafka has support for seeking to a specific offset using rd_kafka_seek(). It would be nice to expose this in node-rdkafka as well. I could not see this supported anywhere today.
My use case: I want to get a history of messages, for example the last N logs in a particular topic.
So I need to know the last offset in order to use this.assign().
How can I do that cleanly?
:)
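Once the partition's low and high offsets are known, the starting point is simple arithmetic. In this sketch, lowOffset is assumed to be the earliest offset still available and highOffset the offset the next produced message would get; how to obtain them from the client is outside this sketch, and startOffsetForLastN is a hypothetical helper.

```javascript
// Hypothetical helper: offset to assign() from in order to read the last
// `n` messages of a partition, clamped to what is still available.
function startOffsetForLastN(lowOffset, highOffset, n) {
  return Math.max(lowOffset, highOffset - n);
}

const start = startOffsetForLastN(0, 1137764, 100);
console.log({ topic: 'vpn', partition: 0, offset: start });
// prints { topic: 'vpn', partition: 0, offset: 1137664 }
```

The resulting object has the shape used with assign() elsewhere on this page.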
waiting for librdkafka 0.92
e2e test excluded temporarily
Hi, we noticed that a message as received by a consumer has no key field.
Looking at the code, it seems not that complicated to add; we'll open a PR.
The Consumer instance emits an error event. As per the examples and documentation, this is an error originating from Kafka and we should handle it as we see fit.
I just wanted to confirm that the above statement is correct, because in Node the error event means that the runtime entered an unrecoverable state and the process should be terminated.
If that's the case, would you consider renaming the event to kafkaError for clarity and disambiguation?
Hello guys,
I try to install node-rdkafka
$ npm i node-rdkafka
And get the following error
CC(target) Release/obj.target/librdkafka/deps/librdkafka/src/rdkafka_sasl.o
../deps/librdkafka/src/rdkafka_sasl.c:35:23: fatal error: sasl/sasl.h: No such file or directory
#include <sasl/sasl.h>
^
compilation terminated.
deps/librdkafka.target.mk:147: recipe for target 'Release/obj.target/librdkafka/deps/librdkafka/src/rdkafka_sasl.o' failed
make: *** [Release/obj.target/librdkafka/deps/librdkafka/src/rdkafka_sasl.o] Error 1
make: Leaving directory '/home/tristan/Documents/node_modules/node-rdkafka/build'
gyp ERR! build error
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:276:23)
gyp ERR! stack at emitTwo (events.js:106:13)
gyp ERR! stack at ChildProcess.emit (events.js:191:7)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
gyp ERR! System Linux 3.16.0-4-amd64
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/tristan/Documents/node_modules/node-rdkafka
gyp ERR! node -v v7.4.0
gyp ERR! node-gyp -v v3.4.0
gyp ERR! not ok
I use node 7.4.0 and npm 4.0.5
Hello,
I am developing a kafka-avro library that depends on your library and adds a layer on top for Avro schema validation and serialization. The Producer.Topic class returns a magic object which has no properties; I can see it is used with rdkafka and the local bindings.
I'd like to kindly ask whether you could expose the topic name as a string on the returned instance, so I can use it on my end to apply Avro schema validation and serialization.
Because of that, I had to change the signature of the produce() method to 5 arguments to also include the topic name as a string.
Please consider this a feature request, I suppose.
Although cited in https://raw.githubusercontent.com/edenhill/librdkafka/2213fb29f98a7a73f22da21ef85e0783f6fd67c4/CONFIGURATION.md, they throw an error as per below.
/node-rdkafka/lib/client.js:59
this._client = new SubClientType(globalConf, topicConf);
^
Error: No such configuration property: "sasl.username"
at Error (native)
at Producer.Client (/Users/eshao/wsp/tty1/workers/node_modules/node-rdkafka/lib/client.js:59:18)
at new Producer (/Users/eshao/wsp/tty1/workers/node_modules/node-rdkafka/lib/producer.js:71:10)
at Object.<anonymous> (/Users/eshao/wsp/tty1/workers/k2.js:9:16)
at Module._compile (module.js:541:32)
at Object.Module._extensions..js (module.js:550:10)
at Module.load (module.js:458:32)
at tryModuleLoad (module.js:417:12)
at Function.Module._load (module.js:409:3)
at Module.runMain (module.js:575:10)
Am in the process of testing PR #42 as a fix for Issue #5, and ran into this:
const Kafka = require('node-rdkafka');
const producer = new Kafka.Producer({
'metadata.broker.list': 'localhost:9092',
});
producer.connect(undefined, () => {
producer.produce({
topic: 'test',
message: '{"f": 1}'
});
});
nodejs: ../node_modules/nan/nan_object_wrap.h:33: static T* Nan::ObjectWrap::Unwrap(v8::Local<v8::Object>) [with T = NodeKafka::Topic]: Assertion `object->InternalFieldCount() > 0' failed.
Aborted (core dumped)
node-rdkafka compiled at commit 62e53c22cafe93b9697fc9ff6fc0daa7c2670da2
.
I had made a minor change in #22 that I thought would fix the default partitioning behavior, but some subsequent tests seem to indicate that this fix hasn't worked.
Manually setting partition to -1 in the produce function works, but when not specifying a partition, all my messages are still being put into partition 0.
Note that I need to verify that I haven't done something boneheaded in my own code (like using the wrong branch) and will review in more detail this evening, but I wanted to get this in here in case anyone else sees this issue. I'm not experienced in C++, but I will see if I can at least add some unit tests for the behaviour.
Hi!
I was trying to use commit() explicitly, like this:
var stream = consumer.getReadStream(config.kafka_topic, {
fetchSize: 10
});
stream.on('data', msgs => {
msgs.map(msg => {
console.log('commiting...')
consumer.commit(msg, err => {
console.log(' commited!', err)
})
})
})
I noticed that the 'commited' callback takes quite some time to be called (2-3 seconds for each one).
Is it correct to send commits like this?
If so, why would it be so slow?
Thanks!!
For certain event rates (for me it happens at about 10 events per second) some contention occurs and the consumer in flowing mode starts to lag. The lag keeps growing and seems unbounded. I'm not sure you'll be able to reproduce it with this exact test case; if not, please try playing with the number of events produced in each loop iteration, as there's some sweet spot where the contention happens.
My limited investigation shows that the amount of time the ConsumerLoop worker waits after reaching the end of a partition has a critical impact on this issue.
Here's a test case:
const consumer = new kafka.KafkaConsumer({
'group.id': 'my_test_consumer_group5 ',
'metadata.broker.list': 'localhost:9092',
'enable.auto.commit': 'false',
'client.id': '1234567'
}, {
'auto.offset.reset': 'largest'
});
consumer.connect();
consumer
.on('ready', () => {
consumer.consume([ 'test_dc.mediawiki.revision-create' ], (e, kafkaMessage) => {
if (e) {
console.log(e);
}
console.log('Message', new Date(), new Date(kafkaMessage.value.toString()), new Date() - new Date(kafkaMessage.value.toString()));
});
});
Producer script:
#!/bin/bash
while :
do
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
date | kafkacat -b localhost:9092 -t test_dc.mediawiki.revision-create -p 0
sleep 1
done
Example output:
Message Wed Dec 14 2016 15:51:03 GMT-0800 (PST) Wed Dec 14 2016 15:50:57 GMT-0800 (PST) 6564
Message Wed Dec 14 2016 15:51:04 GMT-0800 (PST) Wed Dec 14 2016 15:50:58 GMT-0800 (PST) 6568
Message Wed Dec 14 2016 15:51:04 GMT-0800 (PST) Wed Dec 14 2016 15:50:58 GMT-0800 (PST) 6568
Message Wed Dec 14 2016 15:51:04 GMT-0800 (PST) Wed Dec 14 2016 15:50:58 GMT-0800 (PST) 6568
Message Wed Dec 14 2016 15:51:05 GMT-0800 (PST) Wed Dec 14 2016 15:51:00 GMT-0800 (PST) 5573
Message Wed Dec 14 2016 15:51:06 GMT-0800 (PST) Wed Dec 14 2016 15:51:00 GMT-0800 (PST) 6576
Message Wed Dec 14 2016 15:51:06 GMT-0800 (PST) Wed Dec 14 2016 15:51:00 GMT-0800 (PST) 6576
Message Wed Dec 14 2016 15:51:07 GMT-0800 (PST) Wed Dec 14 2016 15:51:00 GMT-0800 (PST) 7580
Message Wed Dec 14 2016 15:51:07 GMT-0800 (PST) Wed Dec 14 2016 15:51:01 GMT-0800 (PST) 6581
Message Wed Dec 14 2016 15:51:08 GMT-0800 (PST) Wed Dec 14 2016 15:51:01 GMT-0800 (PST) 7581
Message Wed Dec 14 2016 15:51:08 GMT-0800 (PST) Wed Dec 14 2016 15:51:01 GMT-0800 (PST) 7581
Message Wed Dec 14 2016 15:51:09 GMT-0800 (PST) Wed Dec 14 2016 15:51:01 GMT-0800 (PST) 8585
Message Wed Dec 14 2016 15:51:09 GMT-0800 (PST) Wed Dec 14 2016 15:51:02 GMT-0800 (PST) 7585
Message Wed Dec 14 2016 15:51:09 GMT-0800 (PST) Wed Dec 14 2016 15:51:02 GMT-0800 (PST) 7585
Kafka version 0.9.0.1, node-rdkafka version 0.6.1
Hi!
Is it also possible to build a consumer that polls for data with a timeout?
From documentation and tests I can see the infinite loop ("flowing mode") and the non flowing mode (in which a "hard" consume is called in a certain interval).
"consumer.consume" does not offer a timeout parameter, so it is blocking, right?
Is there also a possibility to use your library with a poll-like-consume like "consumer.consume(timeout)" that might either return a message or timeout and return null in that case?
Thanks
Tino
The current delivery-report object contains key, offset, topic ... but this is not enough to figure out which produced message it refers to, unless you wait for every single report after sending a single message.
Looking under the covers, node-rdkafka is using the full dr_cb_msg librdkafka callback, but does not put the message payload in the delivery report.
Investigated with @mimaison
I was wondering if someone could explain to me the synchronous nature of the produce function and why it is synchronous. Let's take an example. If I were to set 'request.required.acks' to -1 (i.e. write to all replicas before responding), does that mean that the produce function will be blocking until kafka returns with a success?
By looking at the code I see one other possibility. That is you are synchronously writing to a write stream, and then relying on the 'delivery-report' event to notify of actual success. If this is the case, then I can just provide a function in the 'dr_cb' parameter when creating the producer, correct? This callback will let me know the result from the kafka response.
Using a callback that is global to a Producer instance instead of a callback for a single request means that I cannot link a success message from a 'delivery-report' back to the specific call chain that initiated the request. Am I correct? If I am wrong here, an example would be great on how this can be achieved.
Can someone please explain to me when this driver will give control back to the user? I.e., what does the synchronous call do? Thanks again for your help. I really appreciate someone taking the time to clear this up.
EDIT: One more question. The new producer uses 'acks' in place of 'request.required.acks'. Does the C++ module use the old producer (v8 or earlier) or is the documentation just out of date?
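On linking a delivery report back to the call that produced it: one way this could work is with a small registry of pending callbacks, keyed by a token that travels with the message. The sketch below is generic JavaScript. DeliveryTracker is a hypothetical class, and it assumes the token can be attached to the message (for example via the opaque argument mentioned elsewhere on this page) and comes back on the report.

```javascript
// Hypothetical registry that maps a per-message token to the callback of
// the call chain that produced the message.
class DeliveryTracker {
  constructor() {
    this.pending = new Map();
    this.nextId = 0;
  }

  // Registers a callback and returns the token to send with the message.
  track(callback) {
    const id = ++this.nextId;
    this.pending.set(id, callback);
    return id;
  }

  // Called from the global delivery-report handler with the token that
  // came back on the report. Returns false for unknown tokens.
  settle(id, err, report) {
    const callback = this.pending.get(id);
    if (!callback) return false;
    this.pending.delete(id);
    callback(err, report);
    return true;
  }
}

const tracker = new DeliveryTracker();
const token = tracker.track((err, report) => {
  console.log('delivered to partition', report.partition);
});
// ...produce the message with `token` attached, then in the report handler:
tracker.settle(token, null, { partition: 0, offset: 42 });
// prints delivered to partition 0
```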
Hi,
I've been trying to resolve an issue when adding consumer scripts with the rebalance callback. I get
{"message":"Local: Erroneous state","code":-172,"errno":-172,"origin":"kafka"}
errors. This is what I get when running in gdb.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fe7780 (LWP 22887)]
0x00007ffff490b884 in NodeKafka::Consumer::NodeGetAssignments (info=...) at ../src/consumer.cc:493
493 Nan::New<v8::String>(part->topic()).ToLocalChecked());
(I'm not entirely sure this is the reason)
See os section of package.json
Following the latest update, producer.produce() now only accepts topic as a string. This prevents us from setting topic properties like request.required.acks.
Looking at the code, it all boils down to the way produce() evaluates whether it's called from the old API or the new one.
I expect that after a producer/consumer is disconnected it should destroy all RdKafka objects and let the application exit, however it doesn't happen.
Code example:
const Kafka = require('node-rdkafka');
const producer = new Kafka.Producer({
'metadata.broker.list': 'localhost:9092'
});
producer.connect(undefined, () => {
producer.produce({
topic: 'test_dc.resource_change',
message: 'test'
}, () => {
producer.disconnect(() => {
console.log('Disconnected');
});
});
});
Versions: kafkaOS X El Capitan
Expected behaviour: application sends one message to the topic and shuts down.
Actual behaviour: application sends one message to the topic but continues running.
A bit of investigation: gdb shows that we have 8 threads running after disconnect was called:
Id Target Id Frame
* 1 Thread 0x1203 of process 55925 0x00007fff8f611eca in kevent () from /usr/lib/system/libsystem_kernel.dylib
2 Thread 0x1303 of process 55925 0x00007fff8f60afae in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
3 Thread 0x1403 of process 55925 0x00007fff8f60afae in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
4 Thread 0x1503 of process 55925 0x00007fff8f60afae in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
5 Thread 0x1603 of process 55925 0x00007fff8f60afae in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
6 Thread 0x1703 of process 55925 0x00007fff8f610db6 in __psynch_cvwait () from /usr/lib/system/libsystem_kernel.dylib
7 Thread 0x1803 of process 55925 0x00007fff8f61107a in select$DARWIN_EXTSN () from /usr/lib/system/libsystem_kernel.dylib
8 Thread 0x1903 of process 55925 0x00007fff8f61107a in select$DARWIN_EXTSN () from /usr/lib/system/libsystem_kernel.dylib
Threads 1-5 are normal, so whatever prevents shutdown is in threads 6-8. I suspect it's thread 6. Here's its backtrace:
#0 0x00007fff8f610db6 in __psynch_cvwait () from /usr/lib/system/libsystem_kernel.dylib
#1 0x00007fff9cd3f728 in _pthread_cond_wait () from /usr/lib/system/libsystem_pthread.dylib
#2 0x000000010079630b in uv_cond_wait ()
#3 0x000000010078a2ab in worker ()
#4 0x0000000100796000 in uv.thread_start ()
#5 0x00007fff9cd3e99d in _pthread_body () from /usr/lib/system/libsystem_pthread.dylib
#6 0x00007fff9cd3e91a in _pthread_start () from /usr/lib/system/libsystem_pthread.dylib
#7 0x00007fff9cd3c351 in thread_start () from /usr/lib/system/libsystem_pthread.dylib
#8 0x0000000000000000 in ?? ()
Any ideas where the source of the problem could be?
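One way to narrow this down from the JavaScript side is to list what Node.js itself still considers active once disconnect has returned. The following is a minimal sketch, not something from the original report: describeActiveResources is a hypothetical helper, and it assumes a Node.js version that provides process.getActiveResourcesInfo() (an experimental API).

```javascript
// Sketch only: `describeActiveResources` is a hypothetical helper, not part of node-rdkafka.
// Assumes a Node.js version that provides process.getActiveResourcesInfo().
function describeActiveResources() {
  if (typeof process.getActiveResourcesInfo !== 'function') {
    return {}; // API not available on this Node.js version
  }
  // Count the active resources by type (for example timers or sockets).
  const counts = {};
  for (const type of process.getActiveResourcesInfo()) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Usage idea: call it from the disconnect callback, e.g.
//   producer.disconnect(() => console.log(describeActiveResources()));
console.log(describeActiveResources());
```

As far as I know, this only reports resources that Node.js tracks itself. Handles that a native addon registers directly with libuv would not appear, so an empty result in a process that keeps running would itself point towards the native side.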
In kafka-consumer.js, consume([topics]) calls subscribe([topics]), so client code like this
consumer.assign([{ topic:topicName, partition:0, offset:2950}]);
consumer.consume([ topicName ]);
does not work, because the assignment (i.e. an unbalanced consumer with manual partition assignment) is overridden by the hidden subscribe (i.e. causing a shift to the balanced consumer).
The consumer API should allow both of these use cases:
consumer.assign([{ topic:topicName, partition:0, offset:2950}]);
consumer.consume(); //invoke the loop
and
consumer.subscribe([topics]);
consumer.consume(); //invoke the loop
The consume method should not require the list of topics again once they have been subscribed or assigned.
This behavior would be more consistent with the Java client, though removing the argument from consume currently invokes consumeOne.
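The two use cases above can be sketched as a small wrapper. This is only an illustration of the proposed behaviour, not the library's actual API: startConsuming is a hypothetical helper, and it assumes a consumer whose consume() can be called without a topic list. A stub object stands in for the real consumer so the sketch runs without a broker.

```javascript
// Sketch only: `startConsuming` is a hypothetical helper, not part of node-rdkafka.
// It assumes a consumer exposing assign(), subscribe() and an argument-free
// consume(), as proposed above.
function startConsuming(consumer, options) {
  const hasAssignments = Array.isArray(options.assignments) && options.assignments.length > 0;
  const hasTopics = Array.isArray(options.topics) && options.topics.length > 0;

  // Exactly one of the two modes must be chosen.
  if (hasAssignments === hasTopics) {
    throw new Error('Provide either assignments or topics, but not both');
  }

  if (hasAssignments) {
    // Manual assignment: the caller manages partitions and offsets.
    consumer.assign(options.assignments);
  } else {
    // Balanced consumer: partitions are handed out by the consumer group.
    consumer.subscribe(options.topics);
  }

  // No topic list is passed, so a manual assignment is not replaced
  // by a hidden subscribe().
  consumer.consume();
}

// Stub that records calls instead of talking to Kafka.
const calls = [];
const stubConsumer = {
  assign: (assignments) => calls.push(['assign', assignments]),
  subscribe: (topics) => calls.push(['subscribe', topics]),
  consume: (...args) => calls.push(['consume', args.length]),
};

startConsuming(stubConsumer, {
  assignments: [{ topic: 'test', partition: 0, offset: 2950 }],
});
console.log(calls.map((call) => call[0]).join(' -> '));
```

Rejecting a call that passes both assignments and topics is a design choice of this sketch. It keeps the manual and balanced modes from being mixed silently.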
Hello, I was wondering whether topic creation is exposed, along with the ability to configure partitions and replication factor on create?
Also, is topic altering exposed, along with the ability to change the number of partitions when altering a topic?
I see a topic creation function in the producer, but it also says that it only creates / manages this object in V8, so I am not sure it actually creates the topic in Kafka.
Hello, I have created a script based on your tutorial to produce and consume a single message on a single topic.
https://gist.github.com/thanpolas/ed14e3db69646fefe268639ae069bbd5
When I run the script I get weird and unpredictable outcomes, sometimes there will be no data incoming at all, sometimes it will take several seconds to get the produced outcome (~25") and sometimes it'll work immediately. See screenshot of outcomes here: http://than.pol.as/iuB3
I understand that not cleanly shutting down consumers might be a reason why I observe these behaviors. Can you confirm that?
During development it is inevitable that a consumer will shut down uncleanly. Is there a solution for this, even if it is only for the development environment?
Following your latest updates, we are not able to consume messages anymore.
This is the sample we used:
var Kafka = require('../');
var topic = 'testtopic';
var consumer = new Kafka.KafkaConsumer({
  'debug': 'all',
  'metadata.broker.list': 'localhost:9092',
  'group.id': 'node-rdkafka-consumer' + new Date().getTime(),
  'enable.auto.commit': false
}, {
  'auto.offset.reset': 'latest'
});
// Flowing mode
consumer.connect();
consumer.on('event.log', function(log) {
  console.log(log);
});
consumer.on('data', function(m) {
  console.log('Received a message:');
  console.log(' message: ' + m.payload.toString());
  console.log(' key: ' + m.key);
  console.log(' topic: ' + m.topic);
  console.log(' offset: ' + m.offset);
  console.log(' partition: ' + m.partition);
});
consumer.on('ready', function() {
  console.log('ready');
  consumer.consume([topic]);
});
We've tried adding calls to subscribe(), adding a callback to consume(), nothing seems to work.
Running with debug set to all, we can see it's actually fetching messages but the 'data' callback is never invoked.
{ severity: 7,
fac: 'SEND',
message: 'mickael-ThinkPad-W530:9092/0: Sent FetchRequest (v1, 68 bytes @ 0, CorrId 46)' }
{ severity: 7,
fac: 'RECV',
message: 'mickael-ThinkPad-W530:9092/0: Received FetchResponse (v1, 351 bytes, CorrId 46, rtt 91.23ms)' }
{ severity: 7,
fac: 'FETCH',
message: 'mickael-ThinkPad-W530:9092/0: Topic testtopic [0] MessageSet size 310, error "Success", MaxOffset 20, Ver 3/3' }
{ severity: 7,
fac: 'CONSUME',
message: 'mickael-ThinkPad-W530:9092/0: Enqueue 10 messages on testtopic [0] fetch queue (qlen 1, v3)' }
{ severity: 7,
fac: 'FETCH',
message: 'mickael-ThinkPad-W530:9092/0: Fetch reply: Success' }
Using the standard API, consumer.subscribe(topics), when is the consumer actually "ready" to consume those topics? Since there is no relevant callback, is manually parsing librdkafka's log events the only way?
consumer.unsubscribe() first. Assuming commits are turned off, does that mean we could miss some messages between the call to unsubscribe() and the next call to subscribe(topics)?
Thank you.
Hi,
My setup is as follows: I have two instances of Kafka running in docker containers, a producer script and a consumer script.
When I kill the first broker and restart the producer / consumer scripts, they time out and the "ready" event is never emitted.
{"message":"Local: Timed out","code":-185,"errno":-185,"origin":"kafka"}
The issue is that when calling rd_kafka_metadata, there is a moment when both brokers' state is NOT up, and the function that decides which broker to talk to, rd_kafka_broker_any, returns the first broker. rd_kafka_metadata then returns a timeout error (because that broker is down).
In the JS code's connect method, we return and never emit the "ready" event.
This error seems to be fixed on librdkafka's side in the current master version, as rd_kafka_metadata was partly rewritten.
Did you observe this behavior during your tests, or am I doing something wrong?
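Until a fix lands, an application can at least avoid waiting forever for "ready". The following is a minimal sketch of that idea, not part of node-rdkafka: waitForReady is a hypothetical helper, the 'event.error' event name is an assumption about what the client emits on failure, and a plain EventEmitter stands in for the real client so the sketch runs without a broker.

```javascript
const { EventEmitter } = require('events');

// Sketch only: `waitForReady` is a hypothetical helper, not part of node-rdkafka.
// It resolves when the client emits the ready event and rejects on an error
// event or when the timeout expires. The default event names are assumptions.
function waitForReady(client, timeoutMs, readyEvent = 'ready', errorEvent = 'event.error') {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      client.removeListener(readyEvent, onReady);
      client.removeListener(errorEvent, onError);
    };
    const onReady = (info) => {
      cleanup();
      resolve(info);
    };
    const onError = (err) => {
      cleanup();
      reject(err);
    };
    // Give up if neither event arrives in time.
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Client not ready after ${timeoutMs} ms`));
    }, timeoutMs);

    client.once(readyEvent, onReady);
    client.once(errorEvent, onError);
  });
}

// Demo with a stand-in emitter that never becomes ready.
const silentClient = new EventEmitter();
waitForReady(silentClient, 50).catch((err) => console.log(err.message));
```

With a real client, the call would sit right after connect(), and the rejection handler is where a retry or a clean exit could go.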