Question about Quick Benchmark Results (nuraft, OPEN)

ebay commented on July 22, 2024
Question about Quick Benchmark Results

from nuraft.

Comments (6)

greensky00 commented on July 22, 2024

Hi @Steamgjk

As you can see in the benchmark program, the client and the leader run in the same process. More precisely, there is no separate client: the benchmark program itself is both client and server and invokes the Raft API directly, in order to measure pure Raft performance:

ptr<raft_result> ret =
args->stuff_.raft_instance_->append_entries( {msg} );

Hence, there is no network cost between client and leader, and each replication can be done within a single RTT.

Steamgjk commented on July 22, 2024

Hi, @greensky00
I am wondering whether you have any benchmark results with more than 3 replicas. In my perf test,
with 3 replicas, the throughput reaches ~33K ops/second,
but with 9 replicas, it drops to ~20K ops/second.
Is this a normal result, or did I misconfigure something?

greensky00 commented on July 22, 2024

Hi @Steamgjk
We don't publish results with more replicas, but that is expected behavior, as the amount of data that the leader has to transfer over the network is proportional to the number of followers.

Steamgjk commented on July 22, 2024

Hi, @greensky00. Do you think it is bounded by network bandwidth or by something else? I am using n1-standard-32 VMs as replicas, and their bandwidth is 32 Gbps. I feel that is quite large and should be sufficient. So I think it may be more reasonable to attribute the bottleneck to CPU, because replicas need to serialize, deserialize, and process more messages when there are more replicas. What do you think is a convincing explanation: CPU, bandwidth, or something else?
https://cloud.google.com/compute/docs/machine-types

greensky00 commented on July 22, 2024

@Steamgjk
You can monitor the CPU usage of the leader during the test, and if it reaches 3200% (all 32 vCPUs busy), you can say it is CPU-bound. But I don't believe it uses that much CPU. Network bandwidth is also not likely the cause, as the payload size (256 bytes) is small compared to the TCP window size (64KB by default). With simple math, the total outbound throughput of the leader is 256 bytes * 20K ops/s * 8 followers ~= 40 MB/s only, even for 9 replicas.
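
The arithmetic above can be checked with a few lines of C++. This is only a sketch under stated assumptions: the function names are hypothetical, it counts one 256-byte payload per follower per operation, and it ignores Raft and TCP headers as well as acknowledgements. The link speed passed to `link_utilization` is an assumed figure for the VM, not something measured in this thread.

```cpp
#include <cstdint>

// Payload bytes per second leaving the leader, assuming every operation ships
// one copy of the payload to each follower (headers and ACKs are ignored).
constexpr std::uint64_t leader_egress_bytes_per_sec(std::uint64_t payload_bytes,
                                                    std::uint64_t ops_per_sec,
                                                    std::uint64_t replicas) {
    return payload_bytes * ops_per_sec * (replicas - 1);  // the leader is excluded
}

// The same figure expressed as a fraction of a link of `link_gbps` Gbit/s.
constexpr double link_utilization(std::uint64_t bytes_per_sec, double link_gbps) {
    return static_cast<double>(bytes_per_sec) * 8.0 / (link_gbps * 1e9);
}

// 256 B * 20,000 ops/s * 8 followers = 40,960,000 B/s, i.e. about 41 MB/s.
static_assert(leader_egress_bytes_per_sec(256, 20000, 9) == 40960000,
              "matches the back-of-the-envelope estimate");
```

Under these assumptions, `link_utilization(leader_egress_bytes_per_sec(256, 20000, 9), 32.0)` comes out at roughly 0.01, that is, about 1% of a 32 Gbps link, which is consistent with the claim that raw bandwidth is unlikely to be the limit.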

Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), it is not a random and independent data broadcasting. Replication should be strictly and globally ordered, which means log X can't be sent to a follower without sending logs up to X-1. Since there are more packets for "sending logs up to X-1" with more followers, those increased packets are likely waiting for more time in the network queue, and consequently, "sending log X" should wait for more time as well.
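
The ordering argument can be illustrated with a toy model. The sketch below is not NuRaft code: it assumes a single FIFO send queue on the leader, one packet per follower per log entry, and a fixed, made-up service time per packet (`packet_us`). The function name is hypothetical as well.

```cpp
#include <cstdint>

// Toy model of strictly ordered replication through one FIFO queue.
// Entries are enqueued in index order, one packet per follower, and every
// packet occupies the queue for `packet_us` microseconds (an illustrative
// constant, not a measurement). Returns the time at which the last packet
// of entry `log_index` (1-based) has left the queue.
std::uint64_t send_done_us(std::uint64_t log_index,
                           std::uint64_t followers,
                           std::uint64_t packet_us) {
    std::uint64_t clock_us = 0;
    for (std::uint64_t idx = 1; idx <= log_index; ++idx) {
        // Entry idx cannot start before all packets of entries < idx are out.
        for (std::uint64_t f = 0; f < followers; ++f) {
            clock_us += packet_us;
        }
    }
    return clock_us;
}
```

With a hypothetical 2 microseconds per packet, `send_done_us(100, 2, 2)` gives 400 and `send_done_us(100, 8, 2)` gives 1600: in this model the wait for log X grows linearly with the number of followers. The drop reported in this thread (about 33K to 20K ops/second) is much smaller than 4x, so the model overstates the effect; presumably batching, pipelining, and costs that do not depend on the follower count also play a role.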

Steamgjk commented on July 22, 2024

> Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), it is not a random and independent data broadcasting. Replication should be strictly and globally ordered, which means log X can't be sent to a follower without sending logs up to X-1. Since there are more packets for "sending logs up to X-1" with more followers, those increased packets are likely waiting for more time in the network queue, and consequently, "sending log X" should wait for more time as well.

I agree with that. Serialization/deserialization should be the bottleneck.
