turtlesoupy / haskakafka
Kafka bindings for Haskell
License: MIT License
Hi,
This may not be an issue with haskakafka itself, but being relatively new to Haskell in general, I don't know exactly what's going on. Apologies if this is the wrong place.
So I created a simple consumer that consumes and prints all messages in a topic until exhaustion, then prints the number of messages received. At least, that's what I'm trying to achieve. See code.
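A minimal sketch of the kind of consumer loop described above, not the poster's actual program: all haskakafka names and argument orders here (withKafkaConsumer, consumeMessageBatch, messagePayload) are assumptions from memory and may differ across versions, and the broker address, topic, and batch parameters are made up for illustration.

```haskell
import Haskakafka

-- Hedged sketch: consume batches from one partition until an error
-- (e.g. end of topic) comes back, printing payloads and counting
-- messages. Signatures below are assumptions, not authoritative.
main :: IO ()
main = do
  result <- withKafkaConsumer
              []                   -- Kafka config overrides
              []                   -- topic config overrides
              "localhost:9092"     -- broker (assumed)
              "ttp"                -- topic
              0                    -- partition
              KafkaOffsetBeginning
              (\_kafka topic -> loop topic 0)
  print result
  where
    loop topic n = do
      ems <- consumeMessageBatch topic 0 1000 100  -- partition, timeout ms, max msgs (assumed order)
      case ems of
        Left err -> do
          -- One would expect a Left KafkaError here at end of topic,
          -- not a segmentation fault.
          print err
          return n
        Right msgs -> do
          mapM_ (print . messagePayload) msgs
          loop topic (n + length msgs)
```

This requires a running broker, so treat it as a reading aid rather than a runnable test case.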
But when I launch it:
$ ghc consumer.hs && ./consumer
[1 of 1] Compiling Main ( consumer.hs, consumer.o )
Linking consumer ...
...
[messages]
...
Segmentation fault (core dumped)
$
Note the segmentation fault? I suppose that's not the expected behavior. I do see all messages printed out correctly, though.
I then tried with the consumer script provided by Kafka, which does the same thing as my program, and there nothing seems wrong:
$ kafka-console-consumer.sh --zookeeper localhost:2181 --topic ttp --from-beginning
...
[messages]
...
^CConsumed 200 messages
$
I checked Kafka's server log, and with both programs I get:
[2015-08-09 08:35:21,556] INFO Closing socket connection to /0:0:0:0:0:0:0:1. (kafka.network.Processor)
[2015-08-09 08:35:21,564] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
at kafka.network.MultiSend.writeTo(Transmission.scala:101)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
at kafka.network.Processor.write(SocketServer.scala:472)
at kafka.network.Processor.run(SocketServer.scala:342)
at java.lang.Thread.run(Thread.java:745)
To be precise, in the case of the Kafka-provided consumer, the ERROR and subsequent lines appear after issuing the ^C. There is an error, but since the log is the same in both cases, I suppose nothing is wrong on the server side.
After a bit more debugging, the segmentation fault appears on the consumeMessageBatch line when there are no more messages to fetch. So it consumes every message, loops one last time, and boom. I would have expected consumeMessageBatch to return a Left KafkaError in that case, but instead it produces a segmentation fault.
I have the same behavior with consumeMessage.
Am I misusing the library in some way?
EDIT: I do not have the same behavior with consumeMessage; that was a false alarm.
A few functions have moved or been renamed as of 0.9.0.100. At least:
- rd_kafka_position renamed to rd_kafka_committed
- rd_kafka_position has a new signature
Making the above changes enables compilation.
There are possibly other changes too; I'll have a browse through the librdkafka > 0.9.0.99 commits to double-check before sending over a PR.
Seems like it's easier to just speak the binary protocol rather than use FFI to the C bindings.
https://github.com/tylerholien/milena
It still needs a lot of work, and I know y'all are kicking around Hailstorm, but if this is still on your radar, help would be greatly appreciated.
Cheers and good luck either way!
λ> example
Done producing messages, here was our config:
Kafka config: fromList [("batch.num.messages","1000"),("broker.address.family","any"),("broker.address.ttl","300000"),("client.id","rdkafka"),("compression.codec","none"),("delivery.report.only.error","false"),("fetch.error.backoff.ms","500"),("fetch.message.max.bytes","1048576"),("fetch.min.bytes","1"),("fetch.wait.max.ms","100"),("internal.termination.signal","0"),("log_cb","0x7f2d20e3e230"),("log_level","6"),("message.max.bytes","4000000"),("message.send.max.retries","2"),("metadata.request.timeout.ms","60000"),("open_cb","0x7f2d20e561b0"),("queue.buffering.max.messages","100000"),("queue.buffering.max.ms","1000"),("queued.max.messages.kbytes","1000000"),("queued.min.messages","100000"),("receive.message.max.bytes","100000000"),("retry.backoff.ms","100"),("socket.keepalive.enable","false"),("socket.max.fails","3"),("socket.receive.buffer.bytes","0"),("socket.send.buffer.bytes","0"),("socket.timeout.ms","50000"),("socket_cb","0x7f2d20e47e00"),("statistics.interval.ms","0"),("topic.metadata.refresh.fast.cnt","10"),("topic.metadata.refresh.fast.interval.ms","250"),("topic.metadata.refresh.interval.ms","10000"),("topic.metadata.refresh.sparse","false")]
Topic config: fromList [("auto.commit.enable","true"),("auto.commit.interval.ms","60000"),("auto.offset.reset","largest"),("enforce.isr.cnt","0"),("message.timeout.ms","300000"),("offset.store.method","file"),("offset.store.path","."),("offset.store.sync.interval.ms","-1"),("produce.offset.report","false"),("request.required.acks","1"),("request.timeout.ms","50000")]
Woo, payload was Hello world
Woohoo, we got 7 messages
*** Exception: RdKafkaRespErrT.toEnum: Cannot match 32556
λ>
It works to a point (I assume this point):
setLogLevel kafka KafkaLogCrit
Is there something wrong in Haskakafka/InternalRdKafka.chs, perhaps?
Hi Thomas,
In your example I see that withKafkaConsumer has a partition parameter to lock the consumer to a particular partition. Yet then, to call consumeMessage inside it, I have to pass the partition again.
Wouldn't it be clearer if the lambda function for withKafkaConsumer had the type
Kafka -> KafkaTopic -> Int -> IO a  -- where Int is the assigned partition number
instead of
Kafka -> KafkaTopic -> IO a
?
Cheers,
Alexey
I'm wondering if there's a way to set the partitioner for a topic. I didn't see any mechanism in Haskakafka for specifying the default partitioner in librdkafka, which is set to random by default:
https://github.com/edenhill/librdkafka/blob/547e3bac8ee7f4714406db429d35ee51c449024e/src/rdkafka_topic.c#L195-L197
I've got an application in which I'd like to use the consistent partitioner:
https://github.com/edenhill/librdkafka/blob/547e3bac8ee7f4714406db429d35ee51c449024e/src/rdkafka_msg.c#L335-L341
It'd be nice if I could just add an entry to the ConfigOverrides I pass into newKafkaTopic:
newKafkaTopic kafka topicName [("partitioner", "consistent")]
I turned off a broker and was caught off guard: I got no error that a message wasn't delivered when I called produceMessage. I think this is because this Kafka library is inherently asynchronous. I'm now looking into the underlying librdkafka library to see how synchronous message production can be done. I see people doing this with a delivery-report callback that you can wait on (with some timeout).
I also see some of that callback code in Haskakafka's internal module, but I'm not sure it's plumbed through the API completely. Also, my expertise in Haskell FFI is incomplete, so I need to do more reading on my end.
Anyway, I just wanted to open this issue so users can discuss ways forward. Or maybe I missed something and what I want to do is straightforward.
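The "wait on a delivery report" idea can be expressed as a small generic pattern, independent of any Kafka API: block on an MVar that the completion callback fills. Everything below is a self-contained illustration; `sendAsync` is a hypothetical stand-in for a produce call that invokes a callback, which is exactly the hook this issue is asking haskakafka to expose.

```haskell
import Control.Concurrent.MVar

-- Turn an async send-with-callback into a synchronous call: the caller
-- blocks on an MVar that the delivery-report callback fills in.
-- `sendAsync` is a hypothetical stand-in, not a haskakafka function.
produceSync :: ((Either String () -> IO ()) -> IO ()) -> IO (Either String ())
produceSync sendAsync = do
  done <- newEmptyMVar
  sendAsync (putMVar done)  -- the callback stores the delivery result
  takeMVar done             -- block until the broker acks (or errors)

main :: IO ()
main = do
  -- Simulate a transport that immediately reports success via the callback.
  r <- produceSync (\report -> report (Right ()))
  print r  -- Right ()
```

A real version would wrap the takeMVar in System.Timeout.timeout so a lost delivery report can't block the producer forever.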
When I produce tons of messages, not all of them make their way to the Kafka queue, even if I call drainOutQueue before my program terminates.
I am sure that produceMessage is called exactly as many times as needed, but I see far fewer messages in the queue (up to 2/3 of them are lost somehow).
Could it be an implementation bug, or should I flush Kafka's buffers somehow before terminating the program?
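For context, librdkafka's documented shutdown pattern is to keep polling until its out-queue reports zero pending messages; anything still buffered client-side when the process exits is lost. A sketch of that loop follows, where `outQueueLength` and `pollEvents` are hypothetical Haskell wrappers over rd_kafka_outq_len and rd_kafka_poll, not haskakafka's actual API.

```haskell
-- Hedged sketch: service delivery callbacks until librdkafka's out-queue
-- is empty. `outQueueLength` and `pollEvents` are hypothetical names
-- standing in for rd_kafka_outq_len and rd_kafka_poll.
flushAll :: Kafka -> IO ()
flushAll kafka = do
  pending <- outQueueLength kafka
  if pending > 0
    then do
      _ <- pollEvents kafka 100  -- serve callbacks for up to 100 ms
      flushAll kafka
    else return ()
```

If drainOutQueue returns before this condition holds, that would explain the missing messages.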
I am using Ubuntu 14.04 with GHC 7.8.4. When I ran cabal install haskakafka, it didn't work straight out of the box.
First, I had to run "sudo apt-get install librdkafka-dev" to fix the "* Missing C library: rdkafka" error.
Then I got the next error:
dist/build/Haskakafka/InternalRdKafka.chs.h:2:21: fatal error: rdkafka.h: No such file or directory
#include "rdkafka.h"
I had to add the librdkafka include path to my shell environment:
export C_INCLUDE_PATH=/usr/include/librdkafka
If these two steps could be added to the README, it would save other people's time. Thanks.
Please don't hard-code the link search path /usr/local/lib in your Cabal file! That setting causes the build to fail on NixOS because the path doesn't exist on that distribution.
It would be great if GHC 7.8.3 were supported.
I get this error when I pass ("auto.offset.reset", "latest") in the extra Kafka config properties of the runConsumer function.
librdkafka seems to implement this:
https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
Am I missing something?
Hi,
Thanks for the library. I receive a segfault when calling fetchBrokerMetadata per the example in the docs. The issue seems to be in InternalRdKafka.hs.
{#sizeof rd_kafka_metadata_broker_t#}
This line calculates the size of rd_kafka_metadata_broker_t to be 20. However, the size is really 24 with gcc 4.9.2 on Debian. The result is that calling peekElemOff brokersPtr i in getMetadata in Haskakafka.hs creates a bogus RdKafkaMetadataBrokerT. The subsequent peekCString (host'RdKafkaMetadataBrokerT bmd) in constructBrokerMetadata is the line that causes the segfault.
There's an issue for this in haskell/c2hs: haskell/c2hs#129. They state that, with c2hs's current design, it's not possible to compute the sizes correctly in this instance, due to limitations in what Foreign.Storable makes available.
I looked at other packages for deriving Storable instances, but they had the same issues. What do you suggest? I'm currently explicitly inserting the values in a local version of haskakafka.
Similarly, the size of rd_kafka_metadata_topic_t should be 32. That one doesn't cause a segfault, though, just an enum conversion error.
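To sanity-check those numbers, the ABI's layout rules can be replayed by hand: each field starts at the next offset aligned to its own requirement, and the total size is rounded up to a multiple of the widest alignment. A small self-contained check, where the field sizes and alignments assume a typical 64-bit platform and the field orders are taken from librdkafka's headers as I recall them:

```haskell
-- Recompute C struct sizes the way a 64-bit ABI does: place each field at
-- the next offset satisfying its alignment, then round the total up to the
-- largest field alignment. c2hs's 20 corresponds to skipping this padding.
structSize :: [(Int, Int)] -> Int  -- (size, alignment) for each field, in order
structSize fields = roundUp maxAlign (foldl place 0 fields)
  where
    maxAlign = maximum (1 : map snd fields)
    place offset (sz, al) = roundUp al offset + sz
    roundUp al n = ((n + al - 1) `div` al) * al

main :: IO ()
main = do
  -- rd_kafka_metadata_broker_t: int32_t id, char *host, int port
  print (structSize [(4, 4), (8, 8), (4, 4)])          -- 24
  -- rd_kafka_metadata_topic_t: char *topic, int partition_cnt,
  --   rd_kafka_metadata_partition_t *partitions, rd_kafka_resp_err_t err
  print (structSize [(8, 8), (4, 4), (8, 8), (4, 4)])  -- 32
```

The 4 bytes of padding after the int32 id (so the pointer lands on an 8-byte boundary) plus 4 bytes of tail padding account for the 24 vs 20 discrepancy.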
I've managed to get haskakafka to build and, apparently, to run. A key addition was to add the lines
extra-lib-dirs:
- /usr/local/lib
to the stack.yaml.
Running the example in Haskakafka.Examples, I see the error:
Woo, payload was Hello world
Woohoo, we got 10 messages
testkafka-exe: RdKafkaRespErrT.toEnum: Cannot match 32570
The tests in tests also fail with this message. Any insight into why this might be?
Edit: I saw #9 after writing this. Is there a sanctioned version of librdkafka to pull?
I am trying to consume from multiple Kafka topics/partitions using forkIO threads, but it looks like the timeout set for one consumer affects the others. The whole application seems to be blocked for the timeout period every time there is no message in any of the topics/partitions.
Is it a bug, or am I doing something incorrectly?