Comments (3)
Hi @Steamgjk
It depends on 1) your workload and 2) how many cores your machine has. On many-core machines, more threads help achieve better CPU utilization. If your workload is enough to fully utilize 4 threads (so that CPU usage is nearly 400%) and you have more than 4 cores, then increasing the thread pool size will improve performance. Otherwise, the improvement will be marginal.
Examples use raft_launcher, which internally creates asio_listener_ (line 39 in b1f7c07).
There are a few different Raft settings in the benchmark; see NuRaft/tests/bench/raft_bench.cxx, lines 195 to 202 in b1f7c07.
If you set the same parameters, there should be no performance difference.
from nuraft.
Thanks for the explanation, @greensky00
I just ran a benchmark test on Google Cloud, directly using the bench program in the repo. I am using 3 replica VMs, each of type n1-standard-32 (i.e., each VM has 32 cores), and accordingly asio_opt.thread_pool_size_ = 32. The test result is as follows: the max throughput is only 34.3K/second, and the p50 latency is 845 us.
Then I ran a low-load test; with a 1K/second load, the p50 latency is 362 us.
In my cluster, the ping latency is around 250~300 us, so one RTT should be around that value. Considering that message serialization/deserialization also takes some time, the low-load latency (362 us) seems fine to me. (Being more critical, there is still some inconsistency with your reported bench results: your RTT is 180 us and you reach a median of 187 us, while my RTT is around 250 us but I get 362 us.) However, I am a little concerned about the throughput number: there is still a gap of nearly 6K/second between my result (34.3K/second) and your reported bench results (around 40K/second with 16 client threads). And according to your bench report, the replicas only have 8 cores, which means I am using more powerful VMs but getting worse results. Do you have any idea why? What black magic can we use to further improve the throughput?
@Steamgjk
The numbers on the benchmark result page are just for reference, and of course, the performance will vary according to the environment.
And note that the workload generated by the benchmark program will not be CPU-bound unless the network is super-fast. That means having 32 cores does not help improve performance. Given that your network environment is a bit slower than that of our data center, the discrepancy of 34.3K vs. 40K seems reasonable.
Regarding p50 latency, the number on the results page was measured with a single client thread at max target throughput. You can increase your target throughput to a big number (say, 1M) and re-measure it:
raft_bench 1 10.128.0.59:12345 120 1000000 1 256 10.128.0.73:12345 10.128.0.28:12345
If you use multiple client threads and a higher throughput, a longer p50 latency is expected. Even though client threads call the Raft API independently in parallel, each request is assigned a unique Raft log index number, and replication must be done in exactly that order. That means unlucky requests have to wait for the completion of the replication (including commit) of Raft logs with smaller index numbers, and this wait time is reflected in the latencies.