Comments (3)
Hi @Steamgjk
It depends on 1) your workload and 2) how many cores your machine has. On many-core machines, more threads help achieve better CPU utilization. If your workload is enough to fully utilize 4 threads (so that CPU usage is nearly 400%) and you have more than 4 cores, then increasing the thread pool size will improve performance. Otherwise, the improvement will be marginal.
Examples use raft_launcher, which internally creates asio_listener_ (line 39 in b1f7c07).
There are a few different Raft settings in the benchmark; see NuRaft/tests/bench/raft_bench.cxx, lines 195 to 202 in b1f7c07.
If you set the same parameters, there should be no performance difference.
from nuraft.
Thanks for the explanation, @greensky00
I just ran a benchmark test on Google Cloud, directly using the bench program in the repo. I am using 3 replica VMs, each of type n1-standard-32 (i.e., each VM has 32 cores), and accordingly asio_opt.thread_pool_size_ = 32. The test result is as follows: the max throughput is only 34.3K/second, and the p50 latency is 845 us.
Then I ran a low-load test; with a 1K/second load, the p50 latency is 362 us.
In my cluster, the ping latency is around 250~300 us, so one RTT should be around that value. Considering that message serialization/deserialization also takes some time, the low-load latency (362 us) seems fine to me. (Being more critical, there is still some inconsistency with your reported bench results: your RTT is 180 us and you reach a median of 187 us, while my RTT is around 250 us but I get 362 us.) However, I am a little concerned about the throughput number: there is still a gap of nearly 6K/second between my result (34.3K/second) and your reported bench results (around 40K/second with 16 client threads). And according to your bench report, the replicas only have 8 cores, which means I am using more powerful VMs but getting worse results. Do you have any idea why? What black magic can we use to further improve the throughput?
@Steamgjk
The numbers on the benchmark result page are just for reference, and of course, the performance will vary according to the environment.
And note that the workload generated by the benchmark program will not be CPU-bound unless the network is super-fast. That means having 32 cores does not help improve performance. Given that your network environment is a bit slower than that of our data center, the discrepancy of 34.3K vs. 40K seems reasonable.
Regarding p50 latency, the number on the results page was measured with a single client thread at max target throughput. You can increase your target throughput to a big number (say, 1M) and re-measure it:
raft_bench 1 10.128.0.59:12345 120 1000000 1 256 10.128.0.73:12345 10.128.0.28:12345
If you use multiple client threads and a higher throughput, a longer p50 latency is expected. Even though client threads call the Raft API independently in parallel, each request is assigned a unique Raft log index number, and replication must be done in exactly that order. That means unlucky requests have to wait for the completion of the replication (including commit) of Raft logs with smaller index numbers, and this wait time is reflected in the latencies.