Comments (19)
Hi @michaelqxd,
First of all, adding S2 will not succeed if S2 is offline at the time you add the server.
Anyway, let's assume that S2 has been successfully added.
S1 is the leader, which periodically sends heartbeats to all followers. If S2 is offline, there will be no response from S2, so S1 knows that S2 is down. Once S2 comes back online, it will respond to the heartbeat, and both S1 and S2 will know that the other node is alive.
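As a rough illustration of the heartbeat idea described above, here is a minimal sketch of how a leader could track follower liveness from heartbeat responses. This is a toy model and not NuRaft's implementation: the class `LivenessTracker`, its methods, and the timeout parameter are all hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy model, not NuRaft code: the leader records when each
// follower last answered a heartbeat and treats prolonged silence as "down".
// Timestamps are assumed to come from a monotonic clock.
class LivenessTracker {
public:
    explicit LivenessTracker(uint64_t timeout_ms) : timeout_ms_(timeout_ms) {
        assert(timeout_ms > 0);
    }

    // Register a follower at the moment it joins the group.
    void add_follower(int id, uint64_t now_ms) { last_resp_ms_[id] = now_ms; }

    // Called when a heartbeat response from follower `id` arrives.
    // Responses from unknown servers are ignored.
    void on_heartbeat_response(int id, uint64_t now_ms) {
        auto it = last_resp_ms_.find(id);
        if (it != last_resp_ms_.end()) it->second = now_ms;
    }

    // A follower counts as alive if it is known and has responded
    // within the timeout window.
    bool is_alive(int id, uint64_t now_ms) const {
        auto it = last_resp_ms_.find(id);
        return it != last_resp_ms_.end() &&
               now_ms - it->second <= timeout_ms_;
    }

private:
    uint64_t timeout_ms_;
    std::map<int, uint64_t> last_resp_ms_;
};
```

In this sketch a follower that stops responding simply ages out of the timeout window, and a single later response marks it alive again, which mirrors the offline/online behavior described for S2.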
from nuraft.
Thanks for your response.
Yes, I think S1 needs to periodically send heartbeats to all followers, but this does not seem to be implemented in NuRaft. Do you mean the application needs to implement it?
Thanks
from nuraft.
Heartbeat is one of the fundamentals of Raft, so of course NuRaft has it. You can adjust the heartbeat interval here:
NuRaft/include/libnuraft/raft_params.hxx
Line 303 in 9e4efd3
from nuraft.
I tried this case:
1. Start S1:
   ./calc_server 1 127.0.0.1:1000
2. Add S2 (S2 is not up at this time):
   add 2 127.0.0.1:1001
3. Bring S2 up:
   ./calc_server 2 127.0.0.1:1001
4. Check the server list in S1 and S2:
   calc 1> ls
   server id 1: 127.0.0.1:1000 (LEADER)
   calc 2> ls
   server id 2: 127.0.0.1:1001 (LEADER)
In this case, S2 was not added to the cluster.
from nuraft.
Hi @michaelqxd,
As I already mentioned, adding a server succeeds only when the server to be added is online, since it requires a kind of handshake (synchronizing logs, configurations, ...). So you need to do step 3 first, and then do step 2.
Once the server has been added, S2 will stay in the list even after you kill S2.
By the way, please note that the list shown by ls represents the members of the group, not the list of "live" nodes.
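The two points above (adding requires the target to be online, and membership is separate from liveness) can be sketched with a small toy model. This is only an illustration under simplifying assumptions, not NuRaft code: `ToyGroup` and all of its methods are hypothetical, and the real handshake (log and configuration synchronization) is reduced to a simple online check.

```cpp
#include <cassert>
#include <set>

// Hypothetical toy model, not NuRaft code: membership and liveness are
// tracked as two independent sets.
class ToyGroup {
public:
    explicit ToyGroup(int leader_id) {
        assert(leader_id > 0);  // toy convention: server IDs are positive
        members_.insert(leader_id);
        online_.insert(leader_id);
    }

    // Simulate a server process starting (up = true) or being killed.
    void set_online(int id, bool up) {
        if (up) online_.insert(id);
        else online_.erase(id);
    }

    // Adding needs a handshake with the new server, so it fails
    // when that server is offline.
    bool add_server(int id) {
        if (online_.count(id) == 0) return false;
        members_.insert(id);
        return true;
    }

    // What a member listing reports: configured members, alive or not.
    bool is_member(int id) const { return members_.count(id) != 0; }
    bool is_online(int id) const { return online_.count(id) != 0; }

private:
    std::set<int> members_;
    std::set<int> online_;
};
```

In this model, calling `add_server(2)` before `set_online(2, true)` fails, while killing server 2 after a successful add leaves it a member, which matches the ordering advice of doing step 3 before step 2.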
from nuraft.
@greensky00 Following up on the question, does NuRaft support a follower joining an existing cluster? Say a new node S3 comes up: how does S3 request to join the (S1, S2) cluster if S3 knows the leader's IP and port, and the leader at this point doesn't know that S3 is going to join?
from nuraft.
Hi @xuluna,
You can add a new follower to an existing, running cluster. But that new follower should either 1) be empty or 2) have consistent Raft log and state machine data (which don't need to be up to date).
To add S3 to an existing cluster, you first need to figure out who the current leader is, and then call the add_srv API on the leader, passing the info (including address and port) of S3.
There is also an API to get the ID of the current leader from any node:
NuRaft/include/libnuraft/raft_server.hxx
Line 322 in 9e4efd3
from nuraft.
@greensky00 Thank you for your reply. My understanding is that when S3 is launched, it has its own instance of raft_server, and at that point it forms a single-node Raft cluster by itself. S3 doesn't know about the cluster {S1, S2} yet, since the leader has not added S3 to it. Calling get_leader() on S3's own raft_server instance will return S3 itself, because S3 is the only node in the cluster {S3}. My question was: how does S3 add itself to the cluster {S1, S2}?
from nuraft.
@xuluna Now I get your point. There is no way for S3 to know about any existing clusters by itself. Someone (outside Raft) should be aware of it and coordinate these membership changes.
At eBay, we have a separate Coordinator component that manages adding servers to and removing servers from Raft clusters. That is not part of Raft, and that kind of cluster reorganization does not happen frequently.
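To make the coordination flow concrete, here is a minimal sketch of what an external coordinator might do: ask any known member who the leader is, then ask the leader to add the new server. This is a toy model, not eBay's Coordinator and not NuRaft's API. `ToyNode` and `coordinator_add` are hypothetical names, and replication of the new configuration is simplified to a direct copy.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical toy model, not NuRaft code.
struct ToyNode {
    int leader_id;          // who this node believes the current leader is
    std::set<int> members;  // group membership as this node sees it
};

// The coordinator knows one existing member (`contact_id`) and the ID of
// the server that wants to join (`new_id`). Returns true on success.
bool coordinator_add(std::map<int, ToyNode>& cluster,
                     int contact_id, int new_id) {
    assert(new_id != contact_id);  // the joining server cannot vouch for itself

    auto contact = cluster.find(contact_id);
    if (contact == cluster.end()) return false;  // unknown contact node

    // Step 1: ask the contact who the leader is.
    auto leader = cluster.find(contact->second.leader_id);
    if (leader == cluster.end()) return false;  // leader not reachable

    // Step 2: ask the leader to add the new server.
    if (!leader->second.members.insert(new_id).second) {
        return false;  // already a member
    }

    // Simplification: a real Raft group replicates the new configuration
    // through the log; here it is copied to every node directly.
    const std::set<int> updated = leader->second.members;
    for (auto& entry : cluster) entry.second.members = updated;

    // The new server learns the leader and the membership when it joins.
    cluster[new_id] = ToyNode{leader->first, updated};
    return true;
}
```

The point of the sketch is that the joining server never discovers the cluster on its own; the knowledge of both sides lives in the coordinator.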
from nuraft.
@greensky00 Thanks for your answer!
from nuraft.
Hello, when we were testing this today with the echo_server example program, we were able to add servers locally over localhost. But when we tried to add a remote server by providing its IP address and port, it did not work. (We were unable to add any remote servers to the cluster.) Can you provide any advice?
from nuraft.
Hi @farrerm, when you executed the echo server binary, did you also use the real IP address instead of localhost? For example:
./echo_server 1 10.20.30.40:12345
If it doesn't work even though you used the correct IP address (both when launching the binary and when adding the server), it is usually a port binding issue. The 10.20.30.40:12345 endpoint itself should be accessible from outside the server.
from nuraft.
Hi, thanks very much for the response! Yes, we did figure out the communication problem, which was caused by firewalls. It was not related to the NuRaft code at all.
May I ask another question? Our next task is to modify the echo_server code so that any server can propose log entries, not just the leader. I was reviewing the code last night, and I think this would work by having a follower send a message to the leader. The message would contain the log entry. After the leader receives it, the leader inserts the message in its log, and then the cluster replicates the log entry.
Am I thinking of this the right way? Is there an easier way to do it? I think I saw some functions for sending messages between servers and for handling messages. I think this would be like simulating a "client" message. Thanks!
from nuraft.
@farrerm
Yes, that will work. The only concern is that it requires an additional network hop (for redirecting from the follower to the leader).
There is an option that does this automatically:
NuRaft/include/libnuraft/raft_params.hxx
Line 439 in f495e3a
However, we don't use this option internally. It has not been used for a long time, so I'm not sure whether it still works.
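The forwarding approach discussed above, and the extra hop it costs, can be sketched as follows. This is a toy model under simplifying assumptions, not NuRaft's forwarding implementation: `ToyServer` and `propose` are hypothetical names, and replication is reduced to copying the leader's log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not NuRaft code.
struct ToyServer {
    int id;
    bool is_leader;
    std::vector<std::string> log;
};

// A client hands `entry` to the server at index `via`. Returns the number
// of network hops used before the entry reached the leader's log, or -1
// if no leader is available.
int propose(std::vector<ToyServer>& servers, std::size_t via,
            const std::string& entry) {
    assert(via < servers.size());
    int hops = 1;  // client -> receiving server
    ToyServer* target = &servers[via];

    if (!target->is_leader) {
        // The follower forwards the request to the leader.
        target = nullptr;
        for (auto& s : servers) {
            if (s.is_leader) { target = &s; break; }
        }
        if (target == nullptr) return -1;
        ++hops;  // follower -> leader redirect
    }

    // Only the leader appends to its log.
    target->log.push_back(entry);

    // Simplification: replication is modeled as copying the leader's log.
    for (auto& s : servers) {
        if (&s != target) s.log = target->log;
    }
    return hops;
}
```

Proposing through the leader costs one hop in this model and proposing through a follower costs two, which is the trade-off mentioned above.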
from nuraft.
Hi, thank you so much! Your suggestion works! Followers can enter messages with the "msg" command. Initially the terminal says "replication failed", but then the message commits and is printed by the leader.
Ok, I think we are going to do something with NuRaft for our class project!
from nuraft.
By the way, I am going to apply for a summer internship at eBay!
from nuraft.
Great, hope to see you here :)
from nuraft.
Hi, we're still working on a peer-to-peer chat app using NuRaft. We started with echo_server, and we are trying to implement leaving the chat. We found a function, remove_srv(), that can remove a server from the cluster. Based on the documentation, the server in question must be active when it is removed, or it will be "force removed". But we did not observe the forced removal happen. Also, is a forced removal undesirable? Thanks!
from nuraft.
Hi @farrerm
Forced removal will happen after 5 heartbeats, if the leadership is still valid (i.e., the leader can communicate with a majority of servers). You will see a log like this:
[09:40:07.147 676] [tid fc77] [WARN] [handle_timeout.cxx:51, check_srv_to_leave_timeout()]
server to be removed 3, response timeout 4032 ms. force remove now
Forced removal is not recommended, as the force-removed server doesn't know whether it has been abandoned or not. Once that removed server restarts, it tries to run a leader election on a cluster it no longer belongs to (so the quorum size may be different). NuRaft already has some safeguards against this situation, but I'm not 100% sure about them.
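The rule described above (force removal after 5 heartbeats, provided the leader still has a majority) can be sketched as a small counter. This is a toy model, not NuRaft's actual logic: `PendingRemoval` and its parameters are hypothetical, and it assumes one call per heartbeat interval while the removal is pending.

```cpp
#include <cassert>

// Hypothetical toy model, not NuRaft code: counts consecutive heartbeat
// intervals in which the server being removed failed to respond.
class PendingRemoval {
public:
    explicit PendingRemoval(int heartbeat_limit = 5)
        : limit_(heartbeat_limit) {
        assert(limit_ > 0);
    }

    // Call once per heartbeat interval while the removal is pending.
    // Returns true when the leader should stop waiting and force-remove.
    bool on_heartbeat(bool target_responded, bool leader_has_majority) {
        if (target_responded) {
            missed_ = 0;  // the server is reachable; removal can proceed normally
            return false;
        }
        ++missed_;
        // Force removal only while the leadership is still valid.
        return missed_ >= limit_ && leader_has_majority;
    }

    int missed() const { return missed_; }

private:
    int limit_;
    int missed_ = 0;
};
```

With the default limit, the first four silent intervals return false and the fifth returns true, as long as the leader can still reach a majority.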
from nuraft.