
Does NuRaft support dynamic peer discovery? (nuraft issue, OPEN)

ebay commented on July 22, 2024
Does NuRaft support dynamic peer discovery?

from nuraft.

Comments (19)

greensky00 commented on July 22, 2024

Hi @michaelqxd ,

First of all, adding S2 will not succeed if S2 is offline at the time you add the server.

Anyway, let's assume that S2 has been successfully added.

S1 is the leader, which periodically sends heartbeats to all followers. If S2 is offline, S1 gets no response from S2, so S1 knows that S2 is down. Once S2 comes back online, it responds to the heartbeat, and both S1 and S2 know that the other node is alive.
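
The heartbeat-based liveness check described above can be sketched roughly as follows. This is a small self-contained C++ model, not NuRaft code: `LivenessTracker`, `on_heartbeat_response`, `is_alive`, and the timeout parameter are hypothetical names chosen for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical model of a leader tracking follower liveness.
// None of these names come from NuRaft.
class LivenessTracker {
public:
    explicit LivenessTracker(int64_t timeout_ms) : timeout_ms_(timeout_ms) {
        assert(timeout_ms > 0);
    }

    // Record that peer `id` answered a heartbeat at time `now_ms`.
    void on_heartbeat_response(int id, int64_t now_ms) {
        last_seen_ms_[id] = now_ms;
    }

    // A peer counts as alive if it answered within the timeout window.
    bool is_alive(int id, int64_t now_ms) const {
        auto it = last_seen_ms_.find(id);
        if (it == last_seen_ms_.end()) return false;  // never responded
        return now_ms - it->second <= timeout_ms_;
    }

private:
    int64_t timeout_ms_;
    std::unordered_map<int, int64_t> last_seen_ms_;  // peer id -> last response time
};
```

In this model a follower that has never answered is treated as down, which matches the case where S2 has not been started yet.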

michaelqxd commented on July 22, 2024

Thanks for your response,

I agree that S1 needs to periodically send heartbeats to all followers, but I don't think this is implemented in NuRaft. Do you mean the application needs to implement it?

Thanks

greensky00 commented on July 22, 2024

Heartbeat is one of the fundamentals of Raft, so of course NuRaft has it. You can adjust the heartbeat interval here:

// Heartbeat interval, in millisecond.

michaelqxd commented on July 22, 2024

I tried this case:

  1. Start S1
    ./calc_server 1 127.0.0.1:1000

  2. Add S2 (S2 is not up at this time)
    add 2 127.0.0.1:1001

  3. Bring S2 up
    ./calc_server 2 127.0.0.1:1001

  4. Check the server list on S1 and S2
    calc 1> ls
    server id 1: 127.0.0.1:1000 (LEADER)

    calc 2> ls
    server id 2: 127.0.0.1:1001 (LEADER)

In this case, S2 was not added to the cluster.

greensky00 commented on July 22, 2024

Hi @michaelqxd
As I already mentioned, adding a server succeeds only when the server to be added is online, since it requires a kind of handshake (synchronizing logs, configurations, ...). So you need to do step 3 first, and then do step 2.

Once the server has been added, S2 will stay in the list even after you kill S2.

By the way, please note that the list printed by ls represents the members of the group, not the list of "live" nodes.
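
To illustrate the difference between group membership and liveness, here is a minimal self-contained C++ sketch. It is not NuRaft code: `Cluster`, `set_online`, `add_member`, `is_member`, and `is_online` are hypothetical names, and the real handshake (log and configuration sync) is reduced to a simple "is the target online" check.

```cpp
#include <set>

// Hypothetical model: membership is tracked separately from liveness.
class Cluster {
public:
    // Simulate a server process starting or stopping.
    void set_online(int id, bool online) {
        if (online) {
            online_.insert(id);
        } else {
            online_.erase(id);
        }
    }

    // Adding needs a handshake, so it only succeeds if the target is online.
    bool add_member(int id) {
        if (online_.count(id) == 0) return false;
        members_.insert(id);
        return true;
    }

    bool is_member(int id) const { return members_.count(id) != 0; }
    bool is_online(int id) const { return online_.count(id) != 0; }

private:
    std::set<int> members_;  // what a command like `ls` would list
    std::set<int> online_;   // which servers are currently running
};
```

In this model, adding server 2 before it is started fails, while killing it after a successful add leaves it in the member list.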

xuluna commented on July 22, 2024

@greensky00 Following up on the question: does NuRaft support a follower joining an existing cluster? Say a new node S3 is up. How does S3 request to join the (S1, S2) cluster if S3 knows the leader's IP and port, and the leader at this point doesn't know that S3 is going to join?

greensky00 commented on July 22, 2024

Hi @xuluna,

You can add a new follower to an existing, running cluster. But that new follower should either 1) be empty or 2) have a consistent Raft log and state machine data (they don't need to be up-to-date).

To add S3 to an existing cluster, you first need to figure out who the current leader is, and then call the add_srv API on the leader, passing the info (including address and port) of S3.

There is also an API to get the ID of the current leader from any node:

int32 get_leader() const { return leader_; }
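
The two-step flow above (ask any node for the leader, then add the new server on the leader) might look roughly like this. The sketch is a self-contained model and does not use the NuRaft library: `Node` and `join_cluster` are hypothetical, and the `get_leader` and `add_srv` members here only mirror the idea of the APIs named in this thread, not their real signatures.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one Raft node; not NuRaft's raft_server.
struct Node {
    int id;
    int leader_id;                       // this node's view of the current leader
    std::map<int, std::string> members;  // id -> endpoint (kept on the leader only here)

    int get_leader() const { return leader_id; }

    // Only the leader accepts membership changes in this model.
    bool add_srv(int new_id, const std::string& endpoint) {
        if (id != leader_id) return false;
        return members.emplace(new_id, endpoint).second;
    }
};

// Ask any known member who the leader is, then add the new server there.
bool join_cluster(std::map<int, Node>& cluster, int contact_id,
                  int new_id, const std::string& endpoint) {
    auto contact = cluster.find(contact_id);
    if (contact == cluster.end()) return false;
    auto leader = cluster.find(contact->second.get_leader());
    if (leader == cluster.end()) return false;
    return leader->second.add_srv(new_id, endpoint);
}
```

Note that the caller of `join_cluster` must already know at least one member of the existing cluster.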

xuluna commented on July 22, 2024

@greensky00 Thank you for your reply. My understanding is that when S3 is launched, it has its own instance of raft_server and, at that point, forms a single-node Raft cluster by itself. S3 doesn't know about the cluster {S1, S2} yet, since the leader has not added S3 to it. Calling raft_server->get_leader() on S3's own instance returns S3 itself, because S3 is the only member of the cluster {S3}. My question was: how does S3 add itself to the cluster {S1, S2}?

greensky00 commented on July 22, 2024

@xuluna Now I get your point. There is no way for S3 to learn about any existing cluster by itself. Someone (outside Raft) has to be aware of it and coordinate these membership changes.

At eBay, we have a separate Coordinator component that manages adding servers to and removing servers from Raft clusters. It is not part of Raft, and that kind of cluster reorganization does not happen frequently.

xuluna commented on July 22, 2024

@greensky00 Thanks for your answer!

farrerm commented on July 22, 2024

Hello, when we were testing this today with the echo_server example program, we were able to add servers locally over localhost. But when we tried to add a remote server by providing its IP address and port, it did not work. (We were unable to add any remote servers to the cluster.) Can you provide any advice?

greensky00 commented on July 22, 2024

Hi @farrerm, when you executed the echo server binary, did you also pass the real IP address instead of localhost? For example: ./echo_server 1 10.20.30.40:12345. If it doesn't work even though you use the correct IP address (both when launching and when adding the server), it is usually a port binding issue. The 10.20.30.40:12345 endpoint itself must be accessible from outside the server.

farrerm commented on July 22, 2024

Hi, thanks very much for the response! Yes, we did figure out the communication problem, which was caused by firewalls. It was not related to the NuRaft code at all.

May I ask another question? Our next task is to modify the echo_server code so that any server can propose log entries, not just the leader. I was reviewing the code last night, and I think this would work by having a follower send a message to the leader. The message would contain the log entry. After the leader receives it, the leader inserts the message into its log, and then the cluster replicates the log entry.

Am I thinking about this the right way? Is there an easier way to do it? I think I saw some functions for sending messages between servers and for handling messages. I think this would be like simulating a "client" message. Thanks!

greensky00 commented on July 22, 2024

@farrerm
Yes, that will work. The only concern is that it requires an additional network hop (for redirecting from the follower to the leader).

There is an option that does this automatically:

bool auto_forwarding_;

but we don't use this option internally. It has not been used for a long time, so I'm not sure whether it still works.
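
The forwarding idea and its extra hop can be sketched with a small self-contained C++ model. This is not NuRaft code and does not show how `auto_forwarding_` is implemented: `Server` and its `append` method are hypothetical, and replication to followers is left out so that only the redirect path is visible.

```cpp
#include <string>
#include <vector>

// Hypothetical model of follower-to-leader forwarding; not NuRaft code.
struct Server {
    int id;
    bool is_leader;
    Server* leader;                // how this server reaches the leader (nullptr if unknown)
    std::vector<std::string> log;  // only the leader's log is filled in this model

    // Returns the number of network hops used, or -1 if the entry
    // could not be delivered to a leader.
    int append(const std::string& entry) {
        if (is_leader) {
            log.push_back(entry);  // the leader appends directly
            return 0;
        }
        if (leader == nullptr) return -1;  // no known leader to forward to
        int hops = leader->append(entry);  // forward the entry
        return hops < 0 ? -1 : hops + 1;   // count the redirect as one hop
    }
};
```

A follower call returns 1 here, which is the additional hop mentioned above, while a call made directly on the leader returns 0.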

farrerm commented on July 22, 2024

Hi, thank you so much! Your suggestion works! Followers can enter messages with the "msg" command. Initially the terminal says "replication failed", but then the message commits and is printed by the leader.

Ok, I think we are going to do something with NuRaft for our class project!

farrerm commented on July 22, 2024

By the way, I am going to apply for a summer internship at eBay!

greensky00 commented on July 22, 2024

Great, hope to see you here :)

farrerm commented on July 22, 2024

Hi, we're still working on a peer-to-peer chat app using NuRaft. We started with echo_server. We are trying to implement leaving the chat. We found a function, remove_srv(), that can remove a server from the cluster. Based on the documentation here, the server in question must be active when it is removed, or it will be "force removed". But we did not observe the "force remove" happen. Also, is "force remove" undesirable? Thanks!

greensky00 commented on July 22, 2024

Hi @farrerm
Forced removal will happen after 5 heartbeats, if the leadership is still valid (i.e., the leader can communicate with a majority of servers). You will see a log like this:

 [09:40:07.147 676] [tid fc77] [WARN] [handle_timeout.cxx:51, check_srv_to_leave_timeout()]
server to be removed 3, response timeout 4032 ms. force remove now

Force remove is not recommended, as the "force removed" server doesn't know whether it has been abandoned or not. Once that removed server restarts, it tries to start a leader election on a cluster it no longer belongs to (so the quorum size may be different). NuRaft already has some safeguards against this situation, but I'm not 100% sure.
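
The forced-removal rule described above can be modeled with a short self-contained C++ sketch. It is not NuRaft's implementation: `LeaveTracker` and `on_heartbeat` are hypothetical names, the limit of 5 comes from this thread, and not counting heartbeats while the leader lacks a majority is an assumption made for this example.

```cpp
#include <cassert>

// Hypothetical model of the forced-removal rule; not NuRaft code.
class LeaveTracker {
public:
    explicit LeaveTracker(int max_missed) : max_missed_(max_missed) {
        assert(max_missed > 0);
    }

    // Call once per heartbeat while a removal request is pending.
    // Returns true once the server has been removed from the group.
    bool on_heartbeat(bool target_responded, bool leader_has_majority) {
        if (removed_) return true;
        if (target_responded) {  // graceful path: the target acknowledged
            removed_ = true;
            return true;
        }
        if (!leader_has_majority) return false;  // leadership not valid: never force
        if (++missed_ >= max_missed_) {          // too many unanswered heartbeats
            removed_ = true;
            forced_ = true;
        }
        return removed_;
    }

    bool forced() const { return forced_; }

private:
    int max_missed_;
    int missed_ = 0;
    bool removed_ = false;
    bool forced_ = false;
};
```

With `LeaveTracker t(5)`, five unanswered heartbeats under a valid leadership end in a forced removal, while a single acknowledged heartbeat ends in a graceful one.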
