kxsystems / kafka

kdb+ to Apache Kafka adapter, for pub/sub

Home Page: https://code.kx.com/q/interfaces

License: Apache License 2.0

C 69.30% q 26.98% Batchfile 1.38% Makefile 2.34%
q kdb kafka interface

kafka's Introduction

kfk – Kafka for kdb+

kfk is a thin kdb+ wrapper around the librdkafka C API for Kafka. It is part of the Fusion for kdb+ interface collection.
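A minimal publish example (a sketch only, assuming a broker on localhost:9092; the calls match those used in the issues below):

\l kfk.q
producer:.kfk.Producer enlist[`metadata.broker.list]!enlist `localhost:9092;
topic:.kfk.Topic[producer;`test;()!()];            / attach a topic handle
.kfk.Pub[topic;.kfk.PARTITION_UA;string .z.p;""];  / publish with an empty key
.kfk.ClientDel producer;                           / clean up on shutdown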

This interface is supported for the following platforms:

  • 32- & 64-bit macOS and Linux
  • 64-bit Windows

New to kdb+?

Kdb+ is the world’s fastest time-series database, optimized for ingesting, analyzing and storing massive amounts of structured data. To get started with kdb+, see https://code.kx.com/q for downloads and developer information. For general information, visit https://kx.com/

API Documentation

👉 API reference

Installation Documentation

👉 Install guide

Example Setup

👉 Example setup guide

Performance and Tuning

👉 edenhill/librdkafka/wiki/How-to-decrease-message-latency

There are numerous configuration options and it is best to find settings that suit your needs and setup.
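For example, a latency-oriented starting point might look like the following (values illustrative only, drawn from configurations used in the issues below):

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);
  (`queue.buffering.max.ms;`1);  / flush the producer queue quickly
  (`fetch.wait.max.ms;`10)       / reduce consumer fetch latency
  );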

Status

This interface is provided under an Apache 2.0 license.

If you find issues with the interface or have feature requests please raise an issue.

To contribute to this project, please follow the contribution guide.

kafka's People

Contributors

5jt, awilson-kx, charlieskelton-kx, cmccarthy1, mshimizu-kx, nmcdonnell-kx, sshanks-kx, sv


kafka's Issues

Addition of per client `errorcb` functionality

Raised internally

Is your feature request related to a problem? Please describe.
Similar to #31: given the close association between the consumecb and errorcb callbacks, expose a method to allow error callbacks to be handled at a more granular level.

Describe the solution you'd like
The functionality change should be in line with the method provided through the .kfk.Subscribe function overhaul (#47).

Describe alternatives you've considered
Few alternatives have been considered to this point.

Additional resource
No additional resources are deemed necessary at this point; the issue will be updated if any become relevant.

Application of AssignOffsets results in purge of current assignment.

Describe the bug
When AssignOffsets is applied to a specified topic with partition/offset pairs, all previous assignments are voided. This is the result of a call to rd_kafka_assign within the C API, which purges all previous assignments. In reality the function should purge only the assignments associated with the specified topic/partitions (if they exist) and add the new assignments with the specified offsets, leaving all others as previously defined.

To Reproduce

client:.kfk.Consumer[cfg];
.kfk.AssignAdd[client;((count partitions)#`topic1)!"j"$partitions]
.kfk.AssignAdd[client;((count partitions)#`topic2)!"j"$partitions]
.kfk.AssignOffsets[client;`topic2;(partitions)!(count partitions)#0j]

Or

client:.kfk.Consumer[cfg];
.kfk.AssignAdd[client;((count partitions)#`topic1)!"j"$partitions]
.kfk.AssignOffsets[client;`topic2;(partitions)!(count partitions)#0j]

Each of the above results in the following table from a call to .kfk.Assignment:

topic  partition offset metadata
--------------------------------
topic2 *         0      ""

Expected behavior
The above should have resulted in:

topic  partition offset metadata
--------------------------------
topic1 *         -1001  ""
topic2 *         0      ""

Deleting one of the many producers in a process results in segfault

You hit a segfault if you try to delete a producer in a process with more than one producer.

producer :.kfk.Producer enlist[`metadata.broker.list]!enlist `localhost:9012;
producer2:.kfk.Producer enlist[`metadata.broker.list]!enlist `localhost:9013;

.kfk.ClientDel producer;

q process can hang on exit if kafka topic/client is not cleaned up

Creating multiple clients and topics with the same config can cause the q process to hang if the topic is not cleaned up before exit.

Here is a reproducible case:

\l kfk.q
producer1: .kfk.Producer[enlist[`metadata.broker.list]!enlist `172.30.4.232:9010];
topic1: .kfk.Topic[producer1;`test;()!()];
.kfk.ClientDel producer1;

producer2: .kfk.Producer[enlist[`metadata.broker.list]!enlist `172.30.4.232:9010];
.kfk.Topic[producer2;`test;()!()];
.kfk.Topic[producer2;`test;()!()];

Try exiting the process after running this script.

Allow to set timestamp for message at publish time

Is your feature request related to a problem? Please describe.
It would be helpful if we could set the timestamp of a message; this would make it possible to query offsets by time, potentially resulting in simplified recovery logic.

Describe the solution you'd like
Allow .kfk.Pub to take a timestamp as a parameter, which underneath can call librdkafka's rd_kafka_producev() API.

Additional resource
confluentinc/librdkafka#1016
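For illustration, a hypothetical call shape (the name and trailing timestamp parameter below are assumptions, not part of the interface):

/ current signature: .kfk.Pub[topic;partition;data;key]
/ hypothetical extension mapping the extra argument to rd_kafka_producev()
.kfk.PubTs[topic;.kfk.PARTITION_UA;-8!"payload";"";.z.p]  / .kfk.PubTs is hypothetical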

Docs should be included with release

The new docs folder won't currently be added to a release build; it should be included in the future while the docs are there.
See the .travis.yml example area, e.g.:
elif [[ $TRAVIS_OS_NAME == "windows" ]]; then
7z a -tzip $FILE_NAME README.md install.bat LICENSE q examples;
elif [[ $TRAVIS_OS_NAME == "linux" || $TRAVIS_OS_NAME == "osx" ]]; then
tar -zcvf $FILE_NAME README.md install.sh LICENSE q examples;

Support for rd_kafka_consume_batch

Internal Feature Request

Is your feature request related to a problem? Please describe.
Within the librdkafka C API there appears to be functionality to allow the batched send/receive of data. In many use cases this is preferable to continuous consumption/sending of data.

Describe the solution you'd like
Provide a logical mechanism for q to expose the use of the librdkafka C function below:

  • rd_kafka_consume_batch

Describe alternatives you've considered
There isn't currently an alternative mechanism within the structure of the interface to provide this functionality

Additional resource

Support for rd_kafka_produce_batch

Internal Feature Request

Is your feature request related to a problem? Please describe.
Within the librdkafka C API there appears to be functionality to allow the batched send/receive of data. In many use cases this is preferable to continuous consumption/sending of data.

Describe the solution you'd like
Provide a logical mechanism for q to expose the use of the librdkafka C function below:

  • rd_kafka_produce_batch

Describe alternatives you've considered
There isn't currently an alternative mechanism within the structure of the interface to provide this functionality

Additional resource
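For illustration, a hypothetical batched publish from q might look as follows (the function name is an assumption; only per-message .kfk.Pub exists today):

msgs:(-8!1;-8!2;-8!3);                          / example payloads
.kfk.PubBatch[topic;.kfk.PARTITION_UA;msgs;""]  / .kfk.PubBatch is hypothetical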

Bad config can cause consumer not to get data

Recreation

Run test_consumer.q and subscribe to data. Now quit and edit test_consumer.q to add an extra config entry:

(`queue.buffering.max.ms;`1);

You will get no data, and on exit you will see:

"(4i;\"CONFWARN\";\"[thrd:app]: Configuration property queue.buffering.max.ms is a producer property and will be ignored by this consumer instance\")"

SASL/Kerberos authentication

I am looking to connect to a topic that requires SASL authentication.
Would I be able to do this using this library? If yes, what would an example look like?
Thank you.

Flexible location for kafka lib

Makefile has

KAFKA_ROOT     = ${HOME}
KFK_INCLUDE    = ${KAFKA_ROOT}/include

When building, the Makefile looks for the Kafka libs in the home directory.
A change could be made to allow a Kafka lib installed anywhere to be used.
The README also needs updating to inform users that they can use this when building.

Expose `.kfk.throttlecb` and `.kfk.errorcb` to the interface

Is your feature request related to a problem? Please describe.
As highlighted tangentially in #32, an error on the Kafka side prints information about the error to standard out but does not provide the ability to interact with it. Similarly, throttle callbacks are not supported in the current implementation of the interface but should be.

Describe the solution you'd like
Implement the callback functionality necessary such that the q functions .kfk.errorcb and .kfk.throttlecb can be exposed, allowing users to handle errors and react when a non-zero throttle time is received from a broker.

Describe alternatives you've considered
This cannot be handled other than by exposing the functions at the C level, with the addition of appropriate q functions to call.

Additional resource
An initial implementation of this has been completed, although it is still in beta.

  • Callback definitions are here, and called by the client definition function here
  • q code additions are here
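For illustration, user-defined callbacks might look something like the following once exposed (a sketch only; the argument lists are assumptions based on the beta implementation referenced above):

/ assumed signature: client id, error code, reason string
.kfk.errorcb:{[cid;err;reason]
  -1 "error on client ",string[cid],": ",string[err]," ",reason;}
/ assumed signature: client id, broker name, broker id, throttle time (ms)
.kfk.throttlecb:{[cid;bname;bid;throttle]
  -1 "client ",string[cid]," throttled by ",bname," for ",string[throttle],"ms";}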

Modify `.kfk.consumecb` functionality to allow per topic/client pair granularity

Internally raised issue

Is your feature request related to a problem? Please describe.
The addition of support for consumption on a per-topic basis in #47 allows consumption to be managed based on topic. However, consumption can occur from the same topic name on different clients.

Describe the solution you'd like
Potentially key the topic-specific callback logic to a multi-key lookup, allowing flexibility in the granularity at which the various callbacks can be invoked.

Describe alternatives you've considered
It is possible for a user, within the functions specific to the topic, to add per-client logic as applicable. This issue should as such be seen as a potential enhancement, with the caveat that a lookup on topic consumption across two keys may eat too heavily into the consumecb function for it to remain scalable.

Additional resource
No additional resources are available for this issue at this time. Further resources will be provided in comments if necessary.
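One possible shape for such a lookup, keyed on client and topic (a sketch only; it assumes the message dictionary carries both a client and a topic field, and registercb is a hypothetical helper):

cbmap:([client:`int$();topic:`symbol$()] cb:());     / (client;topic) -> callback
registercb:{[cid;top;f] `cbmap upsert (cid;top;f);}  / hypothetical helper
.kfk.consumecb:{[msg]
  f:exec cb from cbmap where client=msg`client,topic=msg`topic;
  if[count f;(first f) msg];}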

Getting error when trying to load kfk.q

Describe the bug
Installed the kdb+ Kafka interface on Windows using WSL and Ubuntu - everything looked OK. When running q and using the load command
\l kfk.q

I get the following error:
KDB+ 3.6 2019.04.02 Copyright (C) 1993-2019 Kx Systems
w32/ 8()core 4095MB wlee use7410wlee2 192.168.1.196 NONEXPIRE

Welcome to kdb+ 32bit edition
For support please see http://groups.google.com/d/forum/personal-kdbplus
Tutorials can be found at http://code.kx.com
To exit, type \
To remove this startup msg, edit q.q
q)\l kfk.q
'The specified module could not be found.
[2] \wsl$\Ubuntu-20.04\home\wlee\kdb_kafka\kfk.q:74: .kfk,:(`$3_'string funcs[;0])!LIBPATH@/:funcs
^
[0] ()

Expected behavior
The library should load without error.

Desktop (please complete the following information):
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

Apologies, I'm new to kdb+ and Kafka - but I would have expected this to work out of the box. Any help would be appreciated.

message header and message type

  1. As a producer, is it possible to add a message type and message header to the message if we are publishing multiple message types on the same topic?

  2. As a consumer, is it possible to include the message header and message type in the message, so we can distinguish the type/header information without going into the data payload? Currently, the consumer only sees certain information in the callback function.

Mechanism to allow a user to unsubscribe from individual topics

Internal Feature Request

Is your feature request related to a problem? Please describe.
v1.4.0 of this interface added the ability to subscribe to multiple topics; however, the unsubscribe functionality is asymmetric to this, allowing only unsubscription from all topics at once.

Describe the solution you'd like
The underlying functionality which controls unsubscription within the current API calls the librdkafka function rd_kafka_unsubscribe. This is essentially the same as calling subscribe with no topics, which doesn't provide sufficient granularity.

This is outlined here.

Describe alternatives you've considered
Exposure of the assignment interface appears to allow a level of granularity that would begin to allow this to be achieved. This allows a user to remove the consumption of data from particular topic/partition pairs and could form the basis for a solution.

Additional resource
Prototypes for the C code which may be useful for exposure of the Assignment function calls are available here
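As a sketch of the assignment-based alternative described above (assuming a complementary .kfk.AssignDel call, matching the .kfk.AssignAdd usage shown elsewhere in these issues):

/ drop topic1's partitions from the current assignment, leaving other
/ topics intact; effectively a per-topic unsubscribe
.kfk.AssignDel[client;((count partitions)#`topic1)!"j"$partitions]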

Update docs with latest info

Now that the docs are within the repo and associated with a release, check and update them with additions from recent releases (e.g. kfk.queryWatermark) and all possible callbacks.

Deleting a client and recreating it causes segfault

Looks like .kfk.ClientDel is not cleaning up resources properly. Deleting a client and then recreating one with the same config results in a segfault. Please see below for a reproducible case.

\l kfk.q
\c 20 200

kfk_cfg:(!) . flip(
    (`metadata.broker.list;`172.31.1.104:9011);
    (`group.id;`0)
    );

client1: .kfk.Consumer kfk_cfg;
.kfk.Sub[client1;`foo; enlist .kfk.PARTITION_UA];

system "sleep 2";
.kfk.Unsub client1;
.kfk.ClientDel client1;

client2: .kfk.Consumer kfk_cfg;

Consumer group reassignment of offset not working

Describe the bug
For an existing consumer group with stored offsets, resetting the offset to the latest is not working; it still reads from the stored offset.

To Reproduce

  1. First, set up a consumer group (say test), run for a few messages, and commit those offsets: .kfk.Sub[client;topicName;(enlist 6h$1)!(enlist .kfk.OFFSET.STORED)];

  2. Now restart the process and assign a specific partition and latest offset with the same group test: .kfk.Sub[client;topicName;(enlist 6h$1)!(enlist .kfk.OFFSET.END)];

Expected behavior
It should only start reading if there are new messages on `topicName; however, it starts reading from the last committed offset.

Desktop (please complete the following information):

  • OS: linux 2.6
  • KDB+ banner information KDB+ 3.6 2019.09.19
  • .kfk.Version : 16842992i

Additional context
Relevant config parameter: "auto.offset.reset:latest"

Thread starvation when consuming persisted messages

Client Raised Issue

Describe the bug
When consuming persisted data with large numbers of messages, thread starvation can occur as the main thread is blocked until consumption is completed.

To Reproduce

  • Create a Kafka topic with, say, a few hundred thousand messages persisted. This will starve the q main thread.
  • Set the consumer to read from the beginning: (`auto.offset.reset;`earliest);
  • Make group_id a random string. It has to be unique; we do not use consumer groups in many use cases, so random is fine.

Settings:

group_id:20?.Q.a
kfk_cfg:(!) . flip(
    (`metadata.broker.list;`localhost:9100);  
    (`group.id;group_id);
    (`statistics.interval.ms;`10000);
    (`enable.auto.commit;`false);
    (`auto.offset.reset;`earliest);
    (`queue.buffering.max.ms;`1);
    (`fetch.wait.max.ms;`10)
    );

Expected behavior
There should be a mechanism by which to interrupt the consumption process or an option to allow this consumption to take place in a more controlled manner.
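One mitigation available at the q level is to cap how many messages are drained per poll cycle so the main thread yields between batches (a sketch; availability of .kfk.MaxMsgsPerPoll depends on your interface version):

.kfk.MaxMsgsPerPoll 10000  / limit messages consumed per poll callback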

Segfault when `group.id is not defined for a consumer

Internally Raised Issue

Describe the bug
In the rare case that a user attempts to pass a consumer config without `group.id into .kfk.Consumer, the q session will segfault due to restrictions in librdkafka.

To Reproduce

q) .kfk.Consumer[enlist[`metadata.broker.list]!enlist `localhost:9011]

Expected behaviour
The interface should error out with an appropriate error describing the problem

Screenshots
No applicable screenshots

Desktop (please complete the following information):

  • The issue is independent of operating system; seen on macOS/Linux/Windows

Additional context
No additional context necessary
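Until the interface guards against this, a defensive wrapper on the q side is straightforward (a sketch only; mkconsumer is a hypothetical helper):

/ refuse to create a consumer when `group.id is absent from the config
mkconsumer:{[cfg] $[`group.id in key cfg;.kfk.Consumer cfg;'"group.id required"]};
client:mkconsumer `metadata.broker.list`group.id!(`localhost:9011;`0)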

Support for rd_kafka_consume_batch_queue

Internal Feature Request

Is your feature request related to a problem? Please describe.
Within the librdkafka C API there appears to be functionality to allow the batched send/receive of data. In many use cases this is preferable to continuous consumption/sending of data.

Describe the solution you'd like
Provide a logical mechanism for q to expose the use of the librdkafka C function below:

  • rd_kafka_consume_batch_queue

Describe alternatives you've considered
There isn't currently an alternative mechanism within the structure of the interface to provide this functionality

Additional resource

Stale consumers after commit

Describe the bug
Committing an offset as a member of a consumer group during a rebalancing of the group can cause a consumer to become stale.

To Reproduce

cat stale_con.q

\l kfk.q
OFFSET_LOG:() ; MSGS:()
\c 5000 5000
commit:{ .kfk.CommitOffsets[0i;`test1;;1b] exec partition!offset from MSGS where offset = (max;offset)fby partition ; `COMMITED set .z.p ;  }
.kfk.offsetcb: {[cid;err;offsets] if[not err like "Success" ; .ms.sys.message "offsetcb not success" ; OFFSET_LOG,:(cid;err;offsets) ; `commit set { } ]; }
.kfk.consumecb:{ x[`rcvtime]:.z.p ; MSGS,:: enlist x _ `data  ; `MSG set x  }
cfg:(!) . flip(
  (`metadata.broker.list;`$"localhost:9092");
  (`bootstrap.servers;`$"localhost:9092");
  (`group.id;`$"test_consumer_group_1");
  (`enable.auto.commit;`false);
  (`enable.auto.offset.store;`false);
  (`auto.offset.reset;`latest);
  (`session.timeout.ms;`60000);
  );
.kfk.Consumer cfg
.kfk.Sub[0i;`test1;enlist[.kfk.PARTITION_UA]!enlist[.kfk.OFFSET.END] ]

cat other_cons.q

\l kfk.q
OFFSET_LOG:() ; MSGS:()
\c 5000 5000
system"sleep 2"
commit:{ .kfk.CommitOffsets[0i;`test1;;1b] exec partition!offset from MSGS where offset = (max;offset)fby partition ; `COMMITED set .z.p ;  }
.kfk.offsetcb: {[cid;err;offsets] if[not err like "Success" ; .ms.sys.message "offsetcb not success" ; OFFSET_LOG,:(cid;err;offsets) ]; }
.kfk.consumecb:{ x[`rcvtime]:.z.p ; MSGS,:: enlist x _ `data  ; `MSG set x ; }
cfg:(!) . flip(
  (`metadata.broker.list;`$"localhost:9092");
  (`bootstrap.servers;`$"localhost:9092");
  (`group.id;`$"test_consumer_group_1");
  (`enable.auto.commit;`false);
  (`enable.auto.offset.store;`false);
  (`auto.offset.reset;`latest);
  (`session.timeout.ms;`60000);
  );
clients:{ .kfk.Consumer cfg } each til 10
{ .kfk.Sub[x;`test1;enlist[.kfk.PARTITION_UA]!enlist[.kfk.OFFSET.END] ] } each clients

Steps:

  1. Have a process producing on your topic
  2. Start stale_con.q
  3. Start other_cons.q once the stale one is up and running
  4. Manually run commit[] on the stale_con.q process in quick succession
  5. If “offsetcb not success” is not seen, restart the other_cons.q process and try again
  6. After receiving the "Offset commit failed - Specified group generation id is not valid" error from offsetcb, the consumer won't consume any more messages

Expected behavior
Consumer groups should continue to consume messages from an appropriate location following all rebalancing events

.kfk.Metadata lists a topic even if it's deleted using .kfk.TopicDel

Deleting a topic using .kfk.TopicDel doesn't remove it from the list of topics returned by .kfk.Metadata.

  client: .kfk.Consumer[`metadata.broker.list`group.id!`localhost:9092`0];
  .kfk.Metadata[client]`topics;
  topic: .kfk.Topic[client; `new_topic; ()!()];
  .kfk.Metadata[client]`topics;
  .kfk.TopicDel topic;
  .kfk.Metadata[client]`topics;

Enable manual offset store by rd_kafka_offset_store()

By default, the client automatically stores the offset+1 of a message just prior to passing the message to the application. This becomes an issue if the application crashes after the Kafka client has handed over the message but before the application has actually processed it. By enabling rd_kafka_offset_store() we can manually store the offset once the application has really acknowledged the message.
https://github.com/edenhill/librdkafka/wiki/Consumer-offset-management
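A sketch of how this could look from q (assumptions only: .kfk.OffsetStore stands for the requested binding, which does not exist today, and processMsg stands for application logic), with enable.auto.offset.store set to false in the consumer config:

.kfk.consumecb:{[msg]
  processMsg msg;                                               / hypothetical app logic
  .kfk.OffsetStore[client;msg`topic;msg`partition;msg`offset];} / hypothetical binding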

Failure to connect to a broker does not throw a q error

Internally raised issue

Describe the bug
If the q process fails to connect because the Kafka broker is invalid, it does not throw a q error; instead it prints Kafka errors and returns the client id that would have been used. This is invalid behaviour.

To Reproduce
Run the following in a valid q session initialised with q kfk.q:

q).kfk.Consumer[`metadata.broker.list`group.id!`foobar`0]

Expected behavior
Evaluation of the above should error out with an appropriate message indicating that the connection was unsuccessful.

Screenshots
There are no relevant screenshots necessary for this

Desktop (please complete the following information):
This issue is OS independent and a result of implementation decisions.

Additional context
There is no additional context for this issue necessary

Stale consumers after commit offsets

Internally raised issue

Describe the bug
Committing an offset as a member of a consumer group during a rebalance event for that group can cause the consumer to become stale, resulting in the consumer no longer receiving messages.

To Reproduce
The following scripts can be used to reproduce the issue. (Note that the localhost/port needs to be set in accordance with your Kafka installation.)

cat stale_con.q
//load kafka
OFFSET_LOG:() ; MSGS:()
\c 5000 5000
commit:{ .kfk.CommitOffsets[0i;`test1;;1b] exec partition!offset from MSGS where offset = (max;offset)fby partition ; `COMMITED set .z.p ;  }
.kfk.offsetcb: {[cid;err;offsets] if[not err like "Success" ; 0N!"offsetcb not success" ; OFFSET_LOG,:(cid;err;offsets) ; `commit set { } ]; }
.kfk.consumecb:{ x[`rcvtime]:.z.p ; MSGS,:: enlist x _ `data  ; `MSG set x  }
cfg:(!) . flip(
  (`metadata.broker.list;`$"localhost:port");
  (`bootstrap.servers;`$"localhost:port");
  (`group.id;`$"test_consumer_group_1");
  (`enable.auto.commit;`false);
  (`enable.auto.offset.store;`false);
  (`auto.offset.reset;`latest);
  (`session.timeout.ms;`60000);
  );
.kfk.Consumer cfg
.kfk.Sub[0i;`test1;enlist[.kfk.PARTITION_UA]!enlist[.kfk.OFFSET.END] ]
cat other_cons.q
OFFSET_LOG:() ; MSGS:()
\c 5000 5000
system"sleep 2"
commit:{ .kfk.CommitOffsets[0i;`test1;;1b] exec partition!offset from MSGS where offset = (max;offset)fby partition ; `COMMITED set .z.p ;  }
.kfk.offsetcb: {[cid;err;offsets] if[not err like "Success" ; 0N!"offsetcb not success" ; OFFSET_LOG,:(cid;err;offsets) ]; }
.kfk.consumecb:{ x[`rcvtime]:.z.p ; MSGS,:: enlist x _ `data  ; `MSG set x ; }
cfg:(!) . flip(
  (`metadata.broker.list;`$"localhost:port");
  (`bootstrap.servers;`$"localhost:port");
  (`group.id;`$"test_consumer_group_1");
  (`enable.auto.commit;`false);
  (`enable.auto.offset.store;`false);
  (`auto.offset.reset;`latest);
  (`session.timeout.ms;`60000);
  );
clients:{ .kfk.Consumer cfg } each til 10
{ .kfk.Sub[x;`test1;enlist[.kfk.PARTITION_UA]!enlist[.kfk.OFFSET.END] ] } each clients

Steps to reproduce:

  1. Have a process producing on the topic `test1
  2. Start stale_con.q
  3. Start other_cons.q once the stale one is up and running
  4. Manually run commit[] on the stale_con.q process in quick succession
  5. If “offsetcb not success” is not seen, restart the other_cons.q process and try again
  6. After receiving the "Offset commit failed - Specified group generation id is not valid" error from offsetcb, the consumer won't consume any more messages

Expected behavior
If an offset commit is unsuccessful, the consumer should be able to retry the commit, or configuration should be available to allow this.

Desktop (please complete the following information):

q).kfk.VersionSym[]
`1.4.2
Kdb: 4.0 2020.10.02
Kx kafka release: v1.4.0

mismatch error when pub/sub on the same process

The kdb+ process gets a 'mismatch error when it is publishing and subscribing in the same process:

q)'mismatch
  [0]  /home/nion/work/marketgrid/install/lib/TorQ/code/common/kfk.q:110: .kfk.statcb:
 s:.j.k j;if[all `ts`time in key s;s[`ts]:-10957D+`timestamp$s[`ts]*1000;s[`time]:-10957D+`timestamp$1000000000*s[`time]];
 .kfk.stats,::enlist s;
           ^
 delete from `.kfk.stats where i<count[.kfk.stats]-100;}
q.kfk))s
name              | "rdkafka#consumer-2"
client_id         | "rdkafka"
type              | "consumer"
ts                | 1970.01.01D00:31:07.068489000
time              | 2020.07.27D03:41:17.000000000
replyq            | 0f
msg_cnt           | 0f
msg_size          | 0f
msg_max           | 0f
msg_size_max      | 0f
simple_cnt        | 0f
metadata_cache_cnt| 2f
brokers           | `centos8.marketgridsystems.com:9092/0`localhost:9092/bootstrap`GroupCoordinator!+`name`nodeid`nodename`source`state`stateage`outbuf_cnt`outbuf_msg_cnt`waitresp_cnt`waitresp_msg_cnt`tx`txbytes`txerrs`txretries`req_timeouts`rx`rxbytes`rxerrs`rxcorriderrs`rxpartial`zbuf_grow`buf_grow`wakeups`connects`disconnects`int_latency`outbuf_latency`rtt`throttle`req`toppars!(("centos8.marketgridsystems.com:9092/0";"localhost:9092/bootstrap";"GroupCoordinator");0 -1 -1f;("centos8.marketg..
topics            | `CHGW`RECN!+`topic`age`metadata_age`batchsize`batchcnt`partitions!(("CHGW";"RECN");29984 29984f;29985 29985f;(`min`max`avg`sum`stddev`p50`p75`p90`p95`p99`p99_99`outofrange`hdrsize`cnt!0 0 0 0 0 0 0 0 0 0 0 0 14448 0f;`min`max`avg`sum`stddev`p50`p75`p90`p95`p99`p99_99`outofrange`hdrsize`cnt!0 0 0 0 0 0 0 0 0 0 0 0 14448 0f);(`min`max`avg`sum`stddev`p50`p75`p90`p95`p99`p99_99`outofrange`hdrsize`cnt!0 0 0 0 0 0 0 0 0 0 0 0 8304 0f;`min`max`avg`sum`stddev`p50`p75`p90`p95`p99`p..
cgrp              | `state`stateage`join_state`rebalance_age`rebalance_cnt`rebalance_reason`assignment_size!("up";29991f;"started";29984f;4f;"group rejoin";2f)
tx                | 2804f
tx_bytes          | 368192f
rx                | 2803f
rx_bytes          | 352831f
txmsgs            | 0f
txmsg_bytes       | 0f
rxmsgs            | 1f
rxmsg_bytes       | 786f
q.kfk)).kfk.stats
name                 client_id type       ts                            time                          replyq msg_cnt msg_size msg_max msg_size_max simple_cnt metadata_cache_cnt brokers                                                                                                                                                                                                                                                                                                                         ..
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------..
"rdkafka#producer-1" "rdkafka" "producer" 1970.01.01D00:30:47.066212000 2020.07.27D03:40:57.000000000 0      0       0        100000  1073741824   0          1                  `centos8.marketgridsystems.com:9092/0`localhost:9092/bootstrap!+`name`nodeid`nodename`source`state`stateage`outbuf_cnt`outbuf_msg_cnt`waitresp_cnt`waitresp_msg_cnt`tx`txbytes`txerrs`txretries`req_timeouts`rx`rxbytes`rxerrs`rxcorriderrs`rxpartial`zbuf_grow`buf_grow`wakeups`connects`disconnects`int_latency`outbuf_latency..
"rdkafka#producer-1" "rdkafka" "producer" 1970.01.01D00:30:57.066443000 2020.07.27D03:41:07.000000000 0      0       0        100000  1073741824   0          1                  `centos8.marketgridsystems.com:9092/0`localhost:9092/bootstrap!+`name`nodeid`nodename`source`state`stateage`outbuf_cnt`outbuf_msg_cnt`waitresp_cnt`waitresp_msg_cnt`tx`txbytes`txerrs`txretries`req_timeouts`rx`rxbytes`rxerrs`rxcorriderrs`rxpartial`zbuf_grow`buf_grow`wakeups`connects`disconnects`int_latency`outbuf_latency..
"rdkafka#producer-1" "rdkafka" "producer" 1970.01.01D00:31:07.066591000 2020.07.27D03:41:17.000000000 0      0       0        100000  1073741824   0          1                  `centos8.marketgridsystems.com:9092/0`localhost:9092/bootstrap!+`name`nodeid`nodename`source`state`stateage`outbuf_cnt`outbuf_msg_cnt`waitresp_cnt`waitresp_msg_cnt`tx`txbytes`txerrs`txretries`req_timeouts`rx`rxbytes`rxerrs`rxcorriderrs`rxpartial`zbuf_grow`buf_grow`wakeups`connects`disconnects`int_latency`outbuf_latency..

q.kfk))key s
`name`client_id`type`ts`time`replyq`msg_cnt`msg_size`msg_max`msg_size_max`simple_cnt`metadata_cache_cnt`brokers`topics`cgrp`tx`tx_bytes`rx`rx_bytes`txmsgs`txmsg_bytes`rxmsgs`rxmsg_bytes

q.kfk))cols .kfk.stats
`name`client_id`type`ts`time`replyq`msg_cnt`msg_size`msg_max`msg_size_max`simple_cnt`metadata_cache_cnt`brokers`topics`tx`tx_bytes`rx`rx_bytes`txmsgs`txmsg_bytes`rxmsgs`rxmsg_bytes

You can see that the cgrp column is not produced in the s variable.

What I did was add this line - it forces the column to be added if it's not present. I am not sure if it's always produced.

if[not `cgrp in key s;s[`cgrp]:()];
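For context, the suggested guard would sit in .kfk.statcb just before the stats table is appended to (placement reconstructed from the error trace above):

.kfk.statcb:{[j]
  s:.j.k j;
  if[all `ts`time in key s;
    s[`ts]:-10957D+`timestamp$s[`ts]*1000;
    s[`time]:-10957D+`timestamp$1000000000*s[`time]];
  if[not `cgrp in key s;s[`cgrp]:()];  / pad missing column to avoid 'mismatch
  .kfk.stats,::enlist s;
  delete from `.kfk.stats where i<count[.kfk.stats]-100;}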

I can see that as a consumer it produces that column, but as a producer it doesn't.

But reading carefully on the website, it states: "Multiple clients, publishers, and subscribers in the same process".

https://kx.com/blog/kdb-interface-kafka/

There are a lot of examples out there that have this feature...

https://stackoverflow.com/questions/37889760/can-a-kafka-client-to-play-multiple-role-both-consumer-and-producer

Thanks

Nion

calling .kfk.ClientMemberId on a producer client causes segfault

Describe the bug
Running .kfk.ClientMemberId on a client id associated with a producer causes a segfault. This is a known issue, but a fix should be possible by maintaining a map from client id to handle type.

To Reproduce

\l kfk.q
cfg:(!) . flip(
  (`metadata.broker.list;`$"localhost:port");
  (`bootstrap.servers;`$"localhost:port");
  (`group.id;`$"test_consumer_group_1");
  (`enable.auto.commit;`false);
  (`enable.auto.offset.store;`false);
  (`auto.offset.reset;`latest);
  (`session.timeout.ms;`60000)
  );
producer_id:.kfk.Producer cfg
.kfk.ClientMemberId producer_id

Expected behavior
Given this is not supported behaviour on the librdkafka side, the interface should error out if a user attempts to run this with a producer handle.

Desktop (please complete the following information):
This has been reproduced on both Linux and macOS, and should be independent of the version of librdkafka and the Kx kafka interface.

kafka using mTLS for kdb+

We are currently using the kdb+ Kafka lib from Kx and it is working fine so far; however, we would like to implement mTLS to improve the security of our Kafka messages.

The ability to enable mTLS encryption and authentication - probably the basic options such as those shown on this website:
https://docs.confluent.io/operator/current/co-authenticate.html

This is quite an important feature for using kdb+, and hopefully it can be supported.

Consumer not working on osx (librdkafka 1.5.2) due to invalid config

Looks like librdkafka 1.5.2 has a strict configuration checker. It's unhappy if you set the delivery report message callback (rd_kafka_conf_set_dr_msg_cb) for a consumer process:

"(4i;\"CONFWARN\";\"[thrd:app]: Configuration property dr_msg_cb is a producer property and will be ignored by this consumer instance\")"

The weird part is that this is not just a warning; it seems to break the consumer, and the consumer doesn't receive any messages. Making the following change in kfk.c fixes the issue:

-  rd_kafka_conf_set_dr_msg_cb(conf,drcb);
+  if('p' == xg )
+    rd_kafka_conf_set_dr_msg_cb(conf,drcb);

There is also a warning on the producer process:

"(4i;\"CONFWARN\";\"[thrd:app]: Configuration property offset_commit_cb is a consumer property and will be ignored by this producer instance\")"

Process hangs on publish if the producer is deleted

Trying to publish a message on a topic whose associated producer has been deleted hangs the process:

producer: .kfk.Producer enlist[`metadata.broker.list]!enlist `localhost:9011;
random: .kfk.Topic[producer;`test;()!()];
.kfk.ClientDel producer;
.kfk.Pub[random; -1i; -8!"hello world" ;""]

Exposure for method for per subscription callback functions

Internally raised issue

Is your feature request related to a problem? Please describe.
Largely as a result of interface behaviour prior to v1.4.0, the function .kfk.consumecb assumes that all messages received on the connection between q and Kafka are handled in the same manner; this is an oversight.

Describe the solution you'd like
Provide an example or easy-to-use implementation which allows users to modify .kfk.consumecb on a per-subscription basis, ideally without impacting the current behaviour of the interface.

Potentially something along the lines of a dictionary mapping logic to topic, which can be passed to a wrapper around .kfk.consumecb (see the sketch at the end of this issue):

dict:`test1`test2!({[msg]test1_logic ... };{test2_logic ...})

Describe alternatives you've considered
There are no obvious alternatives at the C level, although the above logic will need to be thought out more broadly.

Additional resource
There are no additional resources that are relevant for this issue at this time
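A minimal sketch of the dictionary-dispatch idea above (handler bodies are placeholders):

handlers:`test1`test2!(
  {[msg] -1 "test1 msg at offset ",string msg`offset;};
  {[msg] -1 "test2 msg at offset ",string msg`offset;});
.kfk.consumecb:{[msg]
  t:msg`topic;
  $[t in key handlers;handlers[t] msg;-1 "no handler for topic ",string t];}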

Bottleneck in publishing on a tight while loop

Internally raised issue

Describe the bug

  • Publishing data in a tight while loop without polling for delivery reports functions as expected (hitting queue full at around 100k messages sent)
  • Polling after every publish, however, blocks on the C function call (rd_kafka_poll) within 3k messages. Increasing the system buffer size does not appear to change how quickly this behaviour arises.

To Reproduce

consumer script

\l kfk.q

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9011);
  (`group.id;`0);
  (`queue.buffering.max.ms;`2);
  (`enable.partition.eof;`0)
  );
client:.kfk.Consumer[kfk_cfg];

data:();
.kfk.consumecb:{[msg]
  msg[`data]:"c"$msg[`data];
  msg[`rcvtime]:.z.p;
  data,::enlist msg;}

.kfk.Sub[client;`random;enlist .kfk.PARTITION_UA];

producer script

\l kfk.q
kfk_cfg:(!) . flip(
  (`metadata.broker.list; `localhost:9011);
  (`queue.buffering.max.ms;`10)
  );
producer:.kfk.Producer[kfk_cfg];

random:.kfk.Topic[producer;`random;()!()];

n:0;
run:{
  .kfk.Pub[random;.kfk.PARTITION_UA; raze string -8!n+:1;""];
  //.kfk.Poll[producer; 1; 100];
  show .kfk.OutQLen producer;
  };

show "Publishing...";
while[1b; run[]];

Expected behavior
Producer should not block in this scenario.

Screenshots
No applicable screenshots to explain this scenario further

Desktop (please complete the following information):
Behaviour has been seen in a variety of Linux environments and on MacOS so should be reproducible across multiple environments

Additional context
No applicable additional context
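A possible workaround for the per-publish blocking described above (a sketch; skip this if your build of kfk.q already drives .z.ts) is to poll on the q timer rather than after every publish:

.z.ts:{.kfk.Poll[producer;0;100];};  / poll for delivery reports periodically
\t 100                               / every 100ms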

README.md documentation links

Describe the bug
README.md ends with a documentation section, though I thought there was a link to the real documentation in the intro. It's not clear what the final documentation section is trying to say: it has two links, then a bit of info about Linux Kafka systems.
There should probably be a main documentation section near the start, with a link to the code.kx.com docs, to make it clearer where the docs are, and the end doc links should be tidied up.

.kfk.Metadata returns topic information as lists of dictionaries instead of tables

When you invoke .kfk.Metadata, the topics return is a list of dictionaries with aligned keys, but it is not typed as 98h. This makes it impossible to run select or exec on that data to filter out the topic you might be interested in.

client:.kfk.Consumer[`metadata.broker.list`group.id!(`$"localhost:9092";`0)];
info: .kfk.Metadata client
type info `topics
/=> 0h
type each info`topics
/=> 99h 99h
key each info`topics
/=> topic err partitions
/=> topic err partitions

The same is then true for the partitions key nested under each topic dictionary. I am using version 1.5.0 with kdb+ 3.6 2019.11.13 on macOS 10.15.7.

KDB+ 3.6 2019.11.13 Copyright (C) 1993-2019 Kx Systems
m64/ 4()core 8192MB
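In the meantime, a list of conforming dictionaries can be collapsed into a table on the q side (a workaround sketch, not a fix):

t:raze enlist each info`topics;  / enlist turns each dict into a 1-row table
select topic,err from t where topic=`new_topic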

Issue with connecting AWS MSK kafka broker URL

Describe the bug
I was using the kdb+ Kafka wrapper to send data to AWS MSK. When you create a broker on AWS MSK, it generates a URL containing "-", e.g. b-1.aws.east-us.awsmks.com. Due to the hyphen in the URL, q throws an error when we try to pass the broker URL in the config.
To Reproduce
Please use any URL which has a - in it as a broker URL, e.g.:

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`b-1.aws.east-us.awsmks.com:9092);
  (`statistics.interval.ms;`10000);
  (`queue.buffering.max.ms;`1);
  (`fetch.wait.max.ms;`10)
  );

Expected behavior
q should not restrict such URLs from being used as Kafka broker URLs.
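For what it's worth, the restriction comes from q symbol literals (which cannot contain hyphens) rather than from the interface itself; casting from a string sidesteps it:

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`$"b-1.aws.east-us.awsmks.com:9092");  / `$ forms a symbol from any string
  (`statistics.interval.ms;`10000);
  (`queue.buffering.max.ms;`1);
  (`fetch.wait.max.ms;`10)
  );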

Failed to get current position offset

Describe the bug
When using the function .kfk.PositionOffsets, it does not return the current offset on the topic from Kafka until kdb+ has processed a message.

To Reproduce

/- note we do not commit offset
/- check the current position offset on the topic - which should be 1 because there is 1 message.
.kfk.PositionOffsets[kfk_consumer;`queue;0i]
topic partition offset metadata
-------------------------------
queue 0         -1001  ""      
/- subscribe to the topic and process the message
{[x] .kfk.Subscribe[kfk_consumer;x;enlist .kfk.PARTITION_UA;upd x]} each kfk_consumer_topics;
/- check the current position offset on the topic
.kfk.PositionOffsets[kfk_consumer;`queue;enlist 0i]
topic partition offset metadata
-------------------------------
queue 0         1      ""      

Expected behavior
The function should provide the current position offset on the topic without having to subscribe and process messages, so we can determine where the current offset is and remain in replay mode until we catch up to the current message.

Desktop (please complete the following information):

  • OS:
    No LSB modules are available.
    Distributor ID: Debian
    Description: Debian GNU/Linux 11 (bullseye)
    Release: 11
    Codename: bullseye
  • KDB+ banner information
    KDB+ 4.1t 2021.11.04 Copyright (C) 1993-2021 Kx Systems
    l64/ 8()core 31578MB nion debian 127.0.1.1 EXPIRE 2023.04.16 carta.com DEV TMP #76484

Issue with special character

Description
While using the kdb+ Kafka wrapper to send data to AWS MSK, we encountered an error with a special character. When we create an AWS MSK broker, it generates a URL which contains the hyphen "-", e.g. b-1.aws.east-us.awsmks.com. Due to the hyphen in the URL, the q command throws an error when we try to pass the broker URL in the configuration.
To Reproduce
Please use any URL which has a hyphen - in it as a broker URL, e.g.:

kfk_cfg:(!) . flip(
  (`metadata.broker.list;`b-2.poc-cluster-1.lergsf.kafka.ap-southeast-1.amazonaws.com:9092);
  (`statistics.interval.ms;`10000);
  (`queue.buffering.max.ms;`1);
  (`fetch.wait.max.ms;`10)
  );

Expected behavior
q should not restrict the use of AWS URLs with special characters, including hyphens.
