
Comments (5)

edenhill commented on August 27, 2024

I'm unable to reproduce your test results, but I've added a new tool to the examples directory that is better suited for performance testing: examples/rdkafka_performance.c

On an Intel Core i7 machine running a local Kafka 0.7.2 broker (with default settings, storing the Kafka logs on an in-memory filesystem), I'm getting the following performance figures:

Producer
Sending 1,000,000 messages of 200 bytes each with no copying or memduping of the payload:

# ./rdkafka_performance -P -t perf1 -p 0 -s 200 -c 1000000
...
% 1000000 messages and 200000000 bytes sent in 8342ms: 119000 msgs/s and 23975.00 Mb/s

Same test but with payload memduping and freeing:

# ./rdkafka_performance -P -t perf1 -p 0 -s 200 -c 1000000 -D
....
% 1000000 messages and 200000000 bytes sent in 8390ms: 119000 msgs/s and 23837.00 Mb/s

So I don't think the extra memduping and allocations are an issue.
Either way, it is really up to the application to decide whether the data needs to be duplicated before handing it off to librdkafka; see the sketch below.
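
To make the ownership choice concrete, here is a minimal sketch of the two models. It uses the flag names and rd_kafka_produce() signature from the newer librdkafka API for illustration (the legacy API discussed in this thread differs), and the helper names are made up:

    #include <stdlib.h>
    #include <string.h>
    #include <librdkafka/rdkafka.h>

    /* Sketch only: rkt is an already-created rd_kafka_topic_t and the
     * return value of rd_kafka_produce() is ignored for brevity. */
    static void produce_with_copy(rd_kafka_topic_t *rkt,
                                  const char *payload, size_t len) {
            /* F_COPY: librdkafka memdups the payload internally, so the
             * caller keeps ownership and may reuse or free it right away. */
            rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                             (void *)payload, len, NULL, 0, NULL);
    }

    static void produce_without_copy(rd_kafka_topic_t *rkt, size_t len) {
            /* F_FREE: ownership of the malloc'ed buffer is handed to
             * librdkafka, which free()s it after delivery; no extra copy. */
            char *payload = malloc(len);
            memset(payload, 'x', len);
            rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_FREE,
                             payload, len, NULL, 0, NULL);
    }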

(Note: with profiling enabled it shows that 92% of the time is spent calculating checksums on the payload)

Consumer
Receiving 1,000,000 messages of 200 bytes each and just throwing them away (offsets are stored to a local file, without fsync(), for each received message):

# ./rdkafka_performance -C -t perf1 -c 1000000 -i 1000
....
% 1000000 messages and 200000000 bytes received in 9406ms: 106000 msgs/s and 21263.00 Mb/s
% Average application fetch latency: 3us

An important consumer configuration setting for this kind of throughput is conf.consumer.replyq_low_thres:

        /* Tell rdkafka to (try to) maintain 100000 messages
         * in its internal receive buffers. This is to avoid
         * application -> rdkafka -> broker  per-message ping-pong
         * latency. */
        conf.consumer.replyq_low_thres = 100000;
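
(Side note for later readers: conf.consumer.replyq_low_thres belongs to the legacy configuration struct used in this thread. My understanding is that in later librdkafka versions the closest equivalent is the queued.min.messages property, set through the string-based configuration API, roughly like this sketch:)

    #include <stdio.h>
    #include <librdkafka/rdkafka.h>

    /* Sketch for later librdkafka versions: "queued.min.messages" plays
     * the same local-prefetch role as replyq_low_thres above. */
    static rd_kafka_conf_t *make_consumer_conf(void) {
            char errstr[512];
            rd_kafka_conf_t *conf = rd_kafka_conf_new();
            if (rd_kafka_conf_set(conf, "queued.min.messages", "100000",
                                  errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                    fprintf(stderr, "%% %s\n", errstr);
            return conf;
    }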

It would be interesting if you could re-run your tests using the rdkafka_performance tool instead (or base your implementation on rdkafka_performance's configuration struct).

Please also indicate the network latency between the rdkafka producer and the Kafka broker.


rngadam commented on August 27, 2024

Wow, OK, these are really surprising results to me...

 ./rdkafka_performance -P -t perf1 -p 0 -s 200 -c 1000000 -b 192.168.11.250

Results:

% 1000000 messages and 200000000 bytes sent in 31744ms: 31000 msgs/s and 6300.00 Mb/s
% 1000000 messages and 200000000 bytes sent in 31736ms: 31000 msgs/s and 6301.00 Mb/s

Latency:

ping 192.168.11.250
PING 192.168.11.250 (192.168.11.250) 56(84) bytes of data.
64 bytes from 192.168.11.250: icmp_req=1 ttl=64 time=0.546 ms
64 bytes from 192.168.11.250: icmp_req=2 ttl=64 time=0.375 ms
64 bytes from 192.168.11.250: icmp_req=3 ttl=64 time=0.346 ms
64 bytes from 192.168.11.250: icmp_req=4 ttl=64 time=0.301 ms

Memory copies:

 ./rdkafka_performance -P -t perf1 -p 0 -s 200 -c 1000000 -b 192.168.11.250 -D

Results:

% 1000000 messages and 200000000 bytes sent in 31629ms: 31000 msgs/s and 6323.00 Mb/s
% 1000000 messages and 200000000 bytes sent in 31895ms: 31000 msgs/s and 6270.00 Mb/s

Larger messages (100 times larger payload, 100 times fewer messages):

no copy:

% 10000 messages and 200000000 bytes sent in 18801ms: 0 msgs/s and 10637.00 Mb/s

copy:

% 10000 messages and 200000000 bytes sent in 18605ms: 0 msgs/s and 10749.00 Mb/s

I'd need to dig into your test to convince myself that it's doing what it's supposed to be doing ;).


rngadam commented on August 27, 2024

Thanks for taking the time to look at this! I'll re-open if I find something interesting.


edenhill commented on August 27, 2024

Uhm, those throughput numbers are actually in KB/s, not MB/s... sorry about that ;)
So the msgs/s calculation is correct, but the throughput is really about 6 MB/s and 10 MB/s respectively in your tests (quick sanity check below).
I'll push a fix.
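
For reference, dividing the reported byte counts by the wall-clock times of the runs above:

    200,000,000 bytes / 31.744 s ≈  6,300,000 bytes/s ≈  6.3 MB/s   (printed as "6300.00 Mb/s")
    200,000,000 bytes / 18.801 s ≈ 10,637,000 bytes/s ≈ 10.6 MB/s   (printed as "10637.00 Mb/s")
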
With that small embarrassment out of the way:

Your initial msgs/s numbers are about 100 times lower than in your latest test. Can it all be attributed to the differences between rdkafka_example and rdkafka_performance? I'm not so sure.

What kind of network connection do you have between the producer and the broker? If it's 100 Mbit, the throughput looks reasonable (the link is saturated when using the bigger message sizes), but if it's a gigabit link I would expect much higher throughput and would have to look into that.
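
A quick back-of-the-envelope check on the saturation point (my arithmetic, not from the original measurements):

    10.6 MB/s * 8 bits/byte ≈ 85 Mbit/s   (the 20 kB-message runs, close to 100 Mbit line rate)
     6.3 MB/s * 8 bits/byte ≈ 50 Mbit/s   (the 200-byte runs, limited by per-message overhead)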

Running Kafka locally and storing logs on an SSD, with 20 kB messages, gives:
% 100000 messages and 2000000000 bytes sent in 17172ms: 5882 msgs/s and 116.47 Mb/s

Running the same test but storing logs in memory gives:
% 100000 messages and 2000000000 bytes sent in 11615ms: 9090 msgs/s and 172.19 Mb/s

The JVM (for ZooKeeper & Kafka) consumes 99% CPU while rdkafka_performance consumes about 45%.
The system I/O wait during the SSD test was about 7%.


rngadam commented on August 27, 2024

I think the initial test wasn't really comparable; I was running everything (services and the test program) on my (relatively slow) laptop, and since these operations seem to be mostly CPU-bound, it's to be expected that the initially reported results would be low compared to running over the LAN with two hosts (one of them with a faster disk).

I think it's really interesting that checksumming is the bottleneck; I wonder how much of a performance improvement could be gained by optimizing that particular operation.
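
For context (my understanding, not something stated in this thread): the Kafka 0.7 wire format carries a CRC32 of each message payload, so the per-message checksum cost boils down to something like the sketch below; zlib's crc32() is shown for illustration, librdkafka's internal implementation may differ.

    #include <stddef.h>
    #include <stdint.h>
    #include <zlib.h>

    /* Per-message checksum as required by the Kafka 0.7 message format:
     * a CRC32 over the payload bytes. */
    static uint32_t message_checksum(const void *payload, size_t len) {
            return (uint32_t)crc32(0L, (const unsigned char *)payload,
                                   (unsigned int)len);
    }

Since the wire format fixes the checksum algorithm, the obvious tweak would be a faster CRC32 implementation (e.g. slicing-by-8) rather than a different algorithm.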

In any case, I have since moved away from Kafka (at least for now), because on the consumer side I couldn't find a good, event-based library for Node.js. The best library I could find does continuous polling (cainus/Prozess#25), so it isn't as nice as Redis pub/sub support.

