turtlesoupy / haskakafka
Kafka bindings for Haskell
License: MIT License
Hi,
This may not be an issue with haskakafka itself, but being relatively new to Haskell in general, I don't know exactly what's going on. Apologies if this is the wrong place.
So I created a simple consumer that consumes and prints all messages in a topic until exhaustion, then prints the number of messages received. At least, that's what I'm trying to achieve. See code.
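A minimal sketch of the kind of consumer loop described above, not the poster's actual program: all haskakafka names and argument orders here (withKafkaConsumer, consumeMessageBatch, messagePayload) are assumptions from memory and may differ across versions, and the broker address, topic, and batch parameters are made up for illustration.

```haskell
import Haskakafka

-- Hedged sketch: consume batches from one partition until an error
-- (e.g. end of topic) comes back, printing payloads and counting
-- messages. Signatures below are assumptions, not authoritative.
main :: IO ()
main = do
  result <- withKafkaConsumer
              []                   -- Kafka config overrides
              []                   -- topic config overrides
              "localhost:9092"     -- broker (assumed)
              "ttp"                -- topic
              0                    -- partition
              KafkaOffsetBeginning
              (\_kafka topic -> loop topic 0)
  print result
  where
    loop topic n = do
      ems <- consumeMessageBatch topic 0 1000 100  -- partition, timeout ms, max msgs (assumed order)
      case ems of
        Left err -> do
          -- One would expect a Left KafkaError here at end of topic,
          -- not a segmentation fault.
          print err
          return n
        Right msgs -> do
          mapM_ (print . messagePayload) msgs
          loop topic (n + length msgs)
```

This requires a running broker, so treat it as a reading aid rather than a runnable test case.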
But when I launch it:
$ ghc consumer.hs && ./consumer
[1 of 1] Compiling Main ( consumer.hs, consumer.o )
Linking consumer ...
...
[messages]
...
Segmentation fault (core dumped)
$
Note the segmentation fault? I suppose that's not the expected behavior. I do see all messages printed out correctly, though.
I then tried with the consumer script provided by Kafka, which does the same thing as my program, and there nothing seems wrong:
$ kafka-console-consumer.sh --zookeeper localhost:2181 --topic ttp --from-beginning
...
[messages]
...
^CConsumed 200 messages
$
I checked Kafka's server log, and with both programs I get:
[2015-08-09 08:35:21,556] INFO Closing socket connection to /0:0:0:0:0:0:0:1. (kafka.network.Processor)
[2015-08-09 08:35:21,564] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
at kafka.network.MultiSend.writeTo(Transmission.scala:101)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
at kafka.network.Processor.write(SocketServer.scala:472)
at kafka.network.Processor.run(SocketServer.scala:342)
at java.lang.Thread.run(Thread.java:745)
To be precise, in the case of the Kafka-provided consumer, the ERROR and subsequent lines appear after issuing the ^C. There is an error, but since the log is the same in both cases, I suppose nothing is wrong on the server side.
After a bit more debugging, the segmentation fault appears on the consumeMessageBatch line when there are no more messages to fetch. So it consumes every message, loops one last time, and boom. I would have expected consumeMessageBatch to return a Left KafkaError in that case, but instead it produces a segmentation fault.
I have the same behavior with consumeMessage.
Am I misusing the library in some way?
EDIT: I do not have the same behavior with consumeMessage; that was a false alarm.
A few functions have moved or been renamed as of 0.9.0.100. At least:
- rd_kafka_position renamed to rd_kafka_committed
- rd_kafka_position has a new signature
Making the above changes enables compilation.
There are possibly other changes too; I'll have a browse through the librdkafka > 0.9.0.99 commits to double-check before sending over a PR.
Seems like it's easier to just speak the binary protocol rather than use FFI to the C bindings.
https://github.com/tylerholien/milena
It still needs a lot of work, and I know y'all are kicking around Hailstorm, but if this is still on your radar, help would be greatly appreciated.
Cheers and good luck either way!
λ> example
Done producing messages, here was our config:
Kafka config: fromList [("batch.num.messages","1000"),("broker.address.family","any"),("broker.address.ttl","300000"),("client.id","rdkafka"),("compression.codec","none"),("delivery.report.only.error","false"),("fetch.error.backoff.ms","500"),("fetch.message.max.bytes","1048576"),("fetch.min.bytes","1"),("fetch.wait.max.ms","100"),("internal.termination.signal","0"),("log_cb","0x7f2d20e3e230"),("log_level","6"),("message.max.bytes","4000000"),("message.send.max.retries","2"),("metadata.request.timeout.ms","60000"),("open_cb","0x7f2d20e561b0"),("queue.buffering.max.messages","100000"),("queue.buffering.max.ms","1000"),("queued.max.messages.kbytes","1000000"),("queued.min.messages","100000"),("receive.message.max.bytes","100000000"),("retry.backoff.ms","100"),("socket.keepalive.enable","false"),("socket.max.fails","3"),("socket.receive.buffer.bytes","0"),("socket.send.buffer.bytes","0"),("socket.timeout.ms","50000"),("socket_cb","0x7f2d20e47e00"),("statistics.interval.ms","0"),("topic.metadata.refresh.fast.cnt","10"),("topic.metadata.refresh.fast.interval.ms","250"),("topic.metadata.refresh.interval.ms","10000"),("topic.metadata.refresh.sparse","false")]
Topic config: fromList [("auto.commit.enable","true"),("auto.commit.interval.ms","60000"),("auto.offset.reset","largest"),("enforce.isr.cnt","0"),("message.timeout.ms","300000"),("offset.store.method","file"),("offset.store.path","."),("offset.store.sync.interval.ms","-1"),("produce.offset.report","false"),("request.required.acks","1"),("request.timeout.ms","50000")]
Woo, payload was Hello world
Woohoo, we got 7 messages
*** Exception: RdKafkaRespErrT.toEnum: Cannot match 32556
λ>
It works to a point (I assume this point):
setLogLevel kafka KafkaLogCrit
Is there something wrong in Haskakafka/InternalRdKafka.chs, perhaps?
Hi Thomas,
In your example I see that withKafkaConsumer has a partition parameter to lock the consumer to a particular partition. Yet then, to call consumeMessage inside it, I have to pass the partition again.
Wouldn't it be clearer if the lambda function for withKafkaConsumer had the type
Kafka -> KafkaTopic -> Int -> IO a  -- where Int is the assigned partition number
instead of
Kafka -> KafkaTopic -> IO a
?
Cheers,
Alexey
I'm wondering if there's a way to set the partitioner for a topic. I didn't see any mechanism in Haskakafka for specifying the default partitioner in librdkafka, which is set to random by default:
https://github.com/edenhill/librdkafka/blob/547e3bac8ee7f4714406db429d35ee51c449024e/src/rdkafka_topic.c#L195-L197
I've got an application in which I'd like to use the consistent partitioner:
https://github.com/edenhill/librdkafka/blob/547e3bac8ee7f4714406db429d35ee51c449024e/src/rdkafka_msg.c#L335-L341
It'd be nice if I could just add an entry to the ConfigOverrides I pass into newKafkaTopic:
newKafkaTopic kafka topicName [("partitioner", "consistent")]
I turned off a broker and was caught off guard: I got no error that a message wasn't delivered when I called produceMessage. I think this is because this Kafka library is inherently asynchronous. I'm now looking into the underlying librdkafka library to see how synchronous message production can be done. I see people doing this with a delivery-report callback that you can wait on (with some timeout).
I also see some of that callback code in Haskakafka's internal module, but I'm not sure it's plumbed through the API completely. Also, my expertise in Haskell FFI is incomplete, so I need to do more reading on my end.
Anyway, I just wanted to open this issue so users can discuss ways forward. Or maybe I missed something and what I want to do is straightforward.
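The "wait on a delivery report" idea can be expressed as a small generic pattern, independent of any Kafka API: block on an MVar that the completion callback fills. Everything below is a self-contained illustration; `sendAsync` is a hypothetical stand-in for a produce call that invokes a callback, which is exactly the hook this issue is asking haskakafka to expose.

```haskell
import Control.Concurrent.MVar

-- Turn an async send-with-callback into a synchronous call: the caller
-- blocks on an MVar that the delivery-report callback fills in.
-- `sendAsync` is a hypothetical stand-in, not a haskakafka function.
produceSync :: ((Either String () -> IO ()) -> IO ()) -> IO (Either String ())
produceSync sendAsync = do
  done <- newEmptyMVar
  sendAsync (putMVar done)  -- the callback stores the delivery result
  takeMVar done             -- block until the broker acks (or errors)

main :: IO ()
main = do
  -- Simulate a transport that immediately reports success via the callback.
  r <- produceSync (\report -> report (Right ()))
  print r  -- Right ()
```

A real version would wrap the takeMVar in System.Timeout.timeout so a lost delivery report can't block the producer forever.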
When I produce tons of messages, not all of them make their way to the Kafka queue, even if I call drainOutQueue before my program terminates.
I am sure that produceMessage is called exactly as many times as needed, but I see far fewer messages in the queue (up to 2/3 of them are lost somehow).
Could it be an implementation bug, or should I flush Kafka's buffers somehow before terminating the program?
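For context, librdkafka's documented shutdown pattern is to keep polling until its out-queue reports zero pending messages; anything still buffered client-side when the process exits is lost. A sketch of that loop follows, where `outQueueLength` and `pollEvents` are hypothetical Haskell wrappers over rd_kafka_outq_len and rd_kafka_poll, not haskakafka's actual API.

```haskell
-- Hedged sketch: service delivery callbacks until librdkafka's out-queue
-- is empty. `outQueueLength` and `pollEvents` are hypothetical names
-- standing in for rd_kafka_outq_len and rd_kafka_poll.
flushAll :: Kafka -> IO ()
flushAll kafka = do
  pending <- outQueueLength kafka
  if pending > 0
    then do
      _ <- pollEvents kafka 100  -- serve callbacks for up to 100 ms
      flushAll kafka
    else return ()
```

If drainOutQueue returns before this condition holds, that would explain the missing messages.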
I am using Ubuntu 14.04 with GHC 7.8.4. When I ran cabal install haskakafka, it didn't work straight out of the box.
First, I had to run "sudo apt-get install librdkafka-dev" to fix the "* Missing C library: rdkafka" error.
Then I got the next error:
dist/build/Haskakafka/InternalRdKafka.chs.h:2:21: fatal error: rdkafka.h: No such file or directory
#include "rdkafka.h"
I had to add the librdkafka include path to my shell environment:
export C_INCLUDE_PATH=/usr/include/librdkafka
If these two steps could be added to the README, it would save other people's time. Thanks.
Please don't hard-code the link search path /usr/local/lib in your Cabal file! That setting causes the build to fail on NixOS because the path doesn't exist on that distribution.
It would be great if GHC 7.8.3 were supported.
I get this error when I pass ("auto.offset.reset", "latest") in the extra Kafka config properties of the runConsumer function.
librdkafka seems to implement this:
https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
Am I missing something?
Hi,
Thanks for the library. I receive a segfault when calling fetchBrokerMetadata per the example in the docs. The issue seems to be in InternalRdKafka.hs.
{#sizeof rd_kafka_metadata_broker_t#}
This line calculates the size of rd_kafka_metadata_broker_t to be 20. However, the size is really 24 with gcc 4.9.2 on Debian. The result is that calling peekElemOff brokersPtr i in getMetadata in Haskakafka.hs creates a bogus RdKafkaMetadataBrokerT. The subsequent peekCString (host'RdKafkaMetadataBrokerT bmd) in constructBrokerMetadata is the line that causes the segfault.
There's an issue for this in haskell/c2hs: haskell/c2hs#129. They state that, with c2hs's current design, it's not possible to compute the sizes correctly in this instance, due to limitations in what Foreign.Storable makes available.
I looked at other packages for deriving Storable instances, but they had the same issues. What do you suggest? I'm currently explicitly inserting the values in a local version of haskakafka.
Similarly, the size of rd_kafka_metadata_topic_t should be 32. That one doesn't cause a segfault, though, just an enum conversion error.
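To sanity-check those numbers, the ABI's layout rules can be replayed by hand: each field starts at the next offset aligned to its own requirement, and the total size is rounded up to a multiple of the widest alignment. A small self-contained check, where the field sizes and alignments assume a typical 64-bit platform and the field orders are taken from librdkafka's headers as I recall them:

```haskell
-- Recompute C struct sizes the way a 64-bit ABI does: place each field at
-- the next offset satisfying its alignment, then round the total up to the
-- largest field alignment. c2hs's 20 corresponds to skipping this padding.
structSize :: [(Int, Int)] -> Int  -- (size, alignment) for each field, in order
structSize fields = roundUp maxAlign (foldl place 0 fields)
  where
    maxAlign = maximum (1 : map snd fields)
    place offset (sz, al) = roundUp al offset + sz
    roundUp al n = ((n + al - 1) `div` al) * al

main :: IO ()
main = do
  -- rd_kafka_metadata_broker_t: int32_t id, char *host, int port
  print (structSize [(4, 4), (8, 8), (4, 4)])          -- 24
  -- rd_kafka_metadata_topic_t: char *topic, int partition_cnt,
  --   rd_kafka_metadata_partition_t *partitions, rd_kafka_resp_err_t err
  print (structSize [(8, 8), (4, 4), (8, 8), (4, 4)])  -- 32
```

The 4 bytes of padding after the int32 id (so the pointer lands on an 8-byte boundary) plus 4 bytes of tail padding account for the 24 vs 20 discrepancy.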
I've managed to get haskakafka to build and, apparently, to run. A key addition was to add the lines
extra-lib-dirs:
- /usr/local/lib
to the stack.yaml.
Running the example in Haskakafka.Examples, I see the error:
Woo, payload was Hello world
Woohoo, we got 10 messages
testkafka-exe: RdKafkaRespErrT.toEnum: Cannot match 32570
The tests in tests also fail with this message. Any insight into why this might be?
Edit: I saw #9 after writing this. Is there a sanctioned version of librdkafka to pull?
I am trying to consume from multiple Kafka topics/partitions using forkIO threads, but it looks like the timeout set for one consumer affects the others. The whole application seems to be blocked for the timeout period every time there is no message in any of the topics/partitions.
Is it a bug, or am I doing something incorrectly?