Hi, I am a bit curious about the latency result in ...

Question about Quick Benchmark Results

ebay commented on July 22, 2024

Question about Quick Benchmark Results

from nuraft.

Comments (6)

greensky00 commented on July 22, 2024

Hi @Steamgjk

As you can see in the benchmark program, the client and the leader run in the same process. More precisely, there is no separate client: the benchmark program itself is both client and server and invokes the Raft API directly, in order to measure pure Raft performance:

NuRaft/tests/bench/raft_bench.cxx

Lines 303 to 304 in f004f4c

 ptr<raft_result> ret =
     args->stuff_.raft_instance_->append_entries( {msg} );

Hence, there is no network cost between client and leader, and each replication can be done within a single RTT.

Steamgjk commented on July 22, 2024

Hi, @greensky00
I am wondering whether you have benchmark results with more than 3 replicas. According to my perf test:
with 3 replicas, the throughput reaches ~33K ops/second,
but with 9 replicas, the throughput drops to ~20K ops/second.
Is this a normal result, or did I misconfigure something?

greensky00 commented on July 22, 2024

Hi @Steamgjk
We haven't published results with more replicas, but that is expected behavior: the amount of data the leader has to transfer over the network is proportional to the number of followers.

Steamgjk commented on July 22, 2024

Hi, @greensky00. Do you think it is bounded by network bandwidth or by something else? I am using n1-standard-32 VMs as replicas, and their bandwidth is 32 Gbps. That seems quite large and should be sufficient. So I think it may be more reasonable to attribute the bottleneck to CPU, because replicas need to serialize, deserialize, and process more messages when there are more replicas. What do you think is the most convincing explanation: CPU, bandwidth, or something else?
https://cloud.google.com/compute/docs/machine-types

greensky00 commented on July 22, 2024

@Steamgjk
You can monitor the leader's CPU usage during the test; if it reaches 3200% (all 32 vCPUs saturated), you can say it is CPU-bound. But I don't believe it uses that much CPU. Network bandwidth is also unlikely to be the cause, as the payload size (256 bytes) is small compared to the TCP window size (64KB by default). With simple math, the total outbound throughput of the leader is 256 bytes * 20K ops/s * 8 followers ~= 40 MB/s with 9 replicas.

Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), this is not random, independent data broadcasting. Replication must be strictly and globally ordered, which means log X can't be sent to a follower before logs up to X-1 have been sent. With more followers there are more packets for "sending logs up to X-1", so those packets likely wait longer in the network queue, and consequently "sending log X" has to wait longer as well.

Steamgjk commented on July 22, 2024

Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), this is not random, independent data broadcasting. Replication must be strictly and globally ordered, which means log X can't be sent to a follower before logs up to X-1 have been sent. With more followers there are more packets for "sending logs up to X-1", so those packets likely wait longer in the network queue, and consequently "sending log X" has to wait longer as well.

I agree. Serialization/deserialization should be the bottleneck.
