goraft / raft
UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
License: MIT License
When reading the log, if a line is corrupt, truncate the log back to the last correct entry.
When the server is a candidate, it should step down to a follower if it receives an AppendEntriesRequest from a leader with an equal term.
/cc: @xiangli-cmu
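The step-down rule above can be sketched as follows; the types and names here are illustrative stand-ins, not goraft's actual API:

```go
package main

import "fmt"

// Hypothetical, simplified state machine illustrating the rule: a
// candidate that sees an AppendEntries request whose term is >= its own
// steps down to follower, because a valid leader already exists for
// that term.

type state string

const (
	Follower  state = "follower"
	Candidate state = "candidate"
)

type node struct {
	term  uint64
	state state
}

// onAppendEntries applies the step-down rule for an incoming request term.
func (n *node) onAppendEntries(reqTerm uint64) {
	if reqTerm < n.term {
		return // stale leader; ignore
	}
	if reqTerm > n.term {
		n.term = reqTerm
	}
	// Equal or newer term: a leader exists, so a candidate steps down.
	if n.state == Candidate {
		n.state = Follower
	}
}

func main() {
	n := &node{term: 5, state: Candidate}
	n.onAppendEntries(5) // equal term from a current leader
	fmt.Println(n.state) // follower
}
```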
I start up my raft instance and run a test that starts sending data. I then ctrl-c to kill the running raft state machine. I have a channel that catches the ctrl-c and runs raft.Stop(), and I then use Running() to make sure raft is not running anymore.
I then start my server and raft recovers the log. When I start sending data with the same test, I get the following error.
The error is resolved by deleting the log and starting again. Maybe corruption? Not totally sure what's going on.
panic: server unable to send signal to commit channel

goroutine 9 [running]:
github.com/goraft/raft.(*server).processAppendEntriesResponse(0xc2000f1100, 0xc2000dad80)
	/go/src/github.com/goraft/raft/server.go:919 +0x60c
github.com/goraft/raft.(*server).leaderLoop(0xc2000f1100)
	/go/src/github.com/goraft/raft/server.go:705 +0x52a
github.com/goraft/raft.(*server).loop(0xc2000f1100)
	/go/src/github.com/goraft/raft/server.go:500 +0x33e
created by github.com/goraft/raft.(*server).Start
	/go/src/github.com/goraft/raft/server.go:420 +0x71e

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc200000300)
	/usr/local/go/src/pkg/runtime/zsema_darwin_amd64.c:165 +0x2e
sync.(*WaitGroup).Wait(0xc2000c7600)
	/usr/local/go/src/pkg/sync/waitgroup.go:109 +0xf2
main.main()
	/go/src/github.com/repo/main.go:31 +0x1e6

goroutine 2 [syscall]:

goroutine 4 [syscall]:
os/signal.loop()
	/usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
	/usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x2f

goroutine 5 [chan receive]:
github.com/repo/router.(*Router).sigMonitor(0xc2000db000)
	/go/src/github.com/repo/router/router.go:138 +0x196
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:80 +0xeb

goroutine 6 [IO wait]:
net.runtime_pollWait(0x58ef00, 0x72, 0x0)
	/usr/local/go/src/pkg/runtime/znetpoll_darwin_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2000cc1a0, 0x23, 0xc2000b6c60)
	/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).accept(0xc2000cc120, 0x335ef0, 0x0, 0xc2000b6c60, 0x23, ...)
	/usr/local/go/src/pkg/net/fd_unix.go:385 +0x2c1
net.(*TCPListener).AcceptTCP(0xc200000338, 0x18, 0xc2000f2010, 0xf55f7)
	/usr/local/go/src/pkg/net/tcpsock_posix.go:229 +0x45
net.(*TCPListener).Accept(0xc200000338, 0x0, 0x0, 0x0, 0xc2000b6bd0, ...)
	/usr/local/go/src/pkg/net/tcpsock_posix.go:239 +0x25
net/http.(*Server).Serve(0xc2000b35f0, 0xc2000da100, 0xc200000338, 0x0, 0x0, ...)
	/usr/local/go/src/pkg/net/http/server.go:1542 +0x85
net/http.(*Server).ListenAndServe(0xc2000b35f0, 0x0, 0x0)
	/usr/local/go/src/pkg/net/http/server.go:1532 +0x9e
github.com/repo/router.(*Router).listenAndServe(0xc2000db000)
	/go/src/github.com/repo/router/router.go:126 +0x2c
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:81 +0x102

goroutine 8 [chan receive]:
github.com/goraft/raft.(*server).send(0xc2000f1100, 0x274920, 0xc20011c060, 0xc20011c060, 0xc200000001, ...)
	/go/src/github.com/goraft/raft/server.go:515 +0xde
github.com/goraft/raft.(*server).Do(0xc2000f1100, 0xc2000f5a50, 0xc20011c060, 0x19, 0x40, ...)
	/go/src/github.com/goraft/raft/server.go:768 +0x56
github.com/repo/router.(*Router).register(0xc2000db000, 0xc2001184d0, 0x8, 0x0, 0x0, ...)
	/go/src/github.com/repo/router/router.go:221 +0x145
github.com/repo/router.(*Router).parseMessage(0xc2000db000, 0xc20011c030, 0x2, 0x2)
	/go/src/github.com/repo/router/router.go:202 +0x120
github.com/repo/router.(*Router).messageFilter(0xc2000db000)
	/go/src/github.com/repo/router/router.go:192 +0x318
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:83 +0x147

goroutine 22 [select]:
github.com/goraft/raft.func·004()
	/go/src/github.com/goraft/raft/server.go:793 +0x2cc
created by github.com/goraft/raft.(*server).processCommand
	/go/src/github.com/goraft/raft/server.go:803 +0x478

exit status 2
Consolidate all state changes and command execution to the event loop in server.go.
I hope we can do better logging and error handling.
log.Println only prints timestamps at second precision, which is not accurate enough for our debugging purposes.
We should think more carefully about which failures should panic and which are only temporary errors.
The current debugging method I use is to set the heartbeat and election timeouts very short (several milliseconds) and keep killing the leader while sending commands to the current leader.
Once we have better logging and panics, we can more easily trace down the sequence and cause of a problem.
/cc: @xiangli-cmu
go-raft doesn't follow the package naming conventions that golang.org recommends. Essentially, our base name should be raft instead of go-raft.
It would be nice to fix this, and I was thinking we could move the repo into an organization at the same time: github.com/goraft/raft.
That way the package URL doesn't get longer, it still conveys that this is a Go library, and using an org signals that there is a community of people behind the project.
@benbjohnson What do you think about this? I reserved the goraft Org already and would be happy to give you admin access.
A restarted node can learn the committed index by receiving the first heartbeat from the current leader (but it is slow to replay all the logs after the server has started).
Maybe we can keep the committed index and flush it to disk every few seconds?
If you create a 2-node cluster and kill the follower process, you can still issue writes to the leader. I would expect that once the leader discovers the follower is down, it would fall back to trying to elect a new leader, fail because there aren't enough nodes to form a majority, and therefore reject writes.
I let it run for about 30 minutes and writes were still being accepted.
If you stop both nodes and only start the leader, the leader correctly rejects writes.
If the leader dies after the client sends a command to it, the client will not get a callback from the "leader" it connected to. The command may be committed (the new leader received the append entries request before the previous leader died) or may be deleted (the new leader did not receive the append entries request).
How to notify the client?
When starting the server, the API should be simplified:
- A Server.Start() method which starts the node as a follower.
- A JoinCommand in the Raft library with a command name of raft:join.
- A LeaveCommand in the Raft library with a command name of raft:leave.
We may want to combine the code in the current Server.Initialize() into Server.Start(), since it doesn't make sense to have two calls to initialize. It'd be nice to add a Server.Initialize(name, connectionString string) method that issues the self-join.
/cc: @xiangli-cmu
getEntriesAfter panics with the message "raft: Index is beyond end of log: 10 20".
In general, the locking around log operations is super sketchy. I suspect, for instance, that the entirety of https://github.com/goraft/raft/blob/master/server.go#L904-920 ought to be protected by a single lock acquisition/release.
Append() should queue entries in memory until SetCommitIndex() has been set to an index higher than the entry's.
I haven't looked into whether or not these are legitimate races or test-only races, but they should be cleaned up so go test -race is useful.
I've just spent part of my evening reading the source code. I notice in server.go that an error is returned in some cases, while panic(...) is called in other error cases. Here's an example.
A Go blog article states:
The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.
My question is: is there a specific reason why the raft library uses panic(...) as opposed to errors? Does it distinguish between the two somehow?
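For illustration, here is the convention the blog post describes, applied to a stand-in for getEntriesAfter (the log, the signature, and the wrapper name are all hypothetical, not the library's real code):

```go
package main

import (
	"errors"
	"fmt"
)

// A toy in-memory log standing in for the raft log.
var entries = []int{10, 20, 30}

// getEntriesAfter panics internally on a bad index, mirroring the style
// the issue describes.
func getEntriesAfter(i int) []int {
	if i > len(entries) {
		panic(fmt.Sprintf("raft: Index is beyond end of log: %d %d", len(entries), i))
	}
	return entries[i:]
}

// EntriesAfter is the exported boundary: it recovers the internal panic
// and converts it into an explicit error return value.
func EntriesAfter(i int) (out []int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = errors.New(fmt.Sprint(r))
		}
	}()
	return getEntriesAfter(i), nil
}

func main() {
	if _, err := EntriesAfter(10); err != nil {
		fmt.Println("error:", err)
	}
}
```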
The state for the server (currentTerm, votedFor) and peers (last log index/term) needs to be written to disk:
- {"peers":[...]}
- ioutil.ReadFile()
- ioutil.WriteFile()
- os.Rename()
/cc: @xiangli-cmu
RequestVote RPC to Server.
We should return the committed index of the command to the client. The client will get more power to do sync work. What do you think about this?
With the join command, all the peers will append the join command.
Suppose we have 3 nodes, and node 1 is the first leader.

Node 2 joins Leader 1:
Leader: peer 2. Restart: replay join 2 -> peer 2
Peer 2: none. Restart: replay join 2 -> none

Node 2 and Node 3 join Leader 1:
Leader: peer 2, peer 3. Restart: replay join 2 -> peer 2; replay join 3 -> peer 3
Peer 2: peer 3. Restart: replay join 2 -> none; replay join 3 -> peer 3
Peer 3: peer 2. Restart: replay join 2 -> peer 3; replay join 3 -> peer 2

Here is the problem: the peers will not add the leader to their logs.
The setState function locks the state mutex, which isn't unlocked when the event handlers are called. As a result, if an event handler calls server.State() in response to a StateChange or LeaderChange event, the application can deadlock.
Example gist: https://gist.github.com/nemothekid/8576383
I think this can be fixed either by unlocking the mutex before the events are fired, or by moving the event callbacks inside a defer.
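A minimal sketch of the first proposed fix: copy the new state and the handler list while holding the lock, release it, then fire the callbacks. All names here are illustrative stand-ins for the library's types:

```go
package main

import (
	"fmt"
	"sync"
)

// server is a toy model of the deadlock scenario: State() takes the
// same mutex that setState holds.
type server struct {
	mu       sync.Mutex
	state    string
	onChange []func(old, new string)
}

func (s *server) State() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

func (s *server) setState(newState string) {
	s.mu.Lock()
	old := s.state
	s.state = newState
	// Copy handlers so the slice can't change under us after unlocking.
	handlers := append([]func(old, new string){}, s.onChange...)
	s.mu.Unlock() // released BEFORE any callback runs

	for _, h := range handlers {
		h(old, newState) // a handler may now safely call s.State()
	}
}

func main() {
	s := &server{state: "follower"}
	s.onChange = append(s.onChange, func(old, new string) {
		// Would deadlock in the original code; safe here.
		fmt.Println("now:", s.State())
	})
	s.setState("leader")
}
```

Moving the callbacks inside a defer after an earlier unlock achieves the same ordering; the key invariant is that no user callback ever runs while the state mutex is held.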
Uncommitted entries should be written to disk. When a node becomes leader with an inconsistent log index and commit index, then a noop should be performed to advance the commit index.
/cc: @ongardie
Refactor the log and commands to allow for generic binary encoding.
- A map[uint64]chan bool on Server to track notifications of entry commits.
- Server.commitCenter()
See conversation with @xiangli:
The do() function should fail if the server is not the leader.
cc: @ongardie
The external interface to the library needs to be pared down significantly. A client using go-raft should only need to access a couple of functions on the Server, such as Start(), Stop(), etc. Everything else should be internal-only.
- Server.Do(*Command)
- AppendEntries RPC before being committed to log

Some applications might want the leader to trigger a sync event at a given interval. So I think we can provide an API to let the application register a sync event and have the leader send it out.
@benbjohnson
Add log compaction through snapshotting or some other means.
See the log compaction section of the Raft paper for reference.
Replace the callback system in raft.Server to use the following interface instead:
type Transporter interface {
	SendAppendEntriesRequest(server *Server, peer *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error)
	SendVoteRequest(server *Server, peer *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error)
}
cc: @tadglines
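As an illustration of how an application might satisfy this interface, here is a toy loopback implementation useful in unit tests; the stand-in type definitions exist only so the sketch compiles on its own (the real Server, Peer, and request/response structs live in the raft package):

```go
package main

import "fmt"

// Minimal stand-in types so the sketch is self-contained.
type Server struct{}
type Peer struct{ Name string }
type AppendEntriesRequest struct{ Term uint64 }
type AppendEntriesResponse struct {
	Term    uint64
	Success bool
}
type RequestVoteRequest struct{ Term uint64 }
type RequestVoteResponse struct {
	Term        uint64
	VoteGranted bool
}

// Transporter mirrors the proposed interface above.
type Transporter interface {
	SendAppendEntriesRequest(server *Server, peer *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error)
	SendVoteRequest(server *Server, peer *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error)
}

// loopbackTransporter "delivers" every request locally and always
// succeeds; a real implementation would serialize the request and send
// it over HTTP or another transport.
type loopbackTransporter struct{}

func (t *loopbackTransporter) SendAppendEntriesRequest(s *Server, p *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error) {
	return &AppendEntriesResponse{Term: req.Term, Success: true}, nil
}

func (t *loopbackTransporter) SendVoteRequest(s *Server, p *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error) {
	return &RequestVoteResponse{Term: req.Term, VoteGranted: true}, nil
}

func main() {
	var tr Transporter = &loopbackTransporter{}
	resp, _ := tr.SendAppendEntriesRequest(&Server{}, &Peer{Name: "n2"}, &AppendEntriesRequest{Term: 3})
	fmt.Println(resp.Success)
}
```

Pushing the transport behind an interface like this also makes it easy to inject partitions and delays in tests.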
Add documentation that describes:
Now, while Log.open() tries to restore the LogEntries from the log file, readBytes doesn't add the size of the lengths before each entry plus 1 newline, which is 9 bytes for each entry.
The Server.do() method should not time out while waiting for responses. Instead it should wait until an entry is committed or until it is notified of its demotion.
Reported by @xiangli-cmu.
Currently, when we remove a node, we write a log entry.
So if a node is removed and then comes back with the same name, it will be removed again by the earlier log entry, or it will receive a snapshot that does not contain itself in the configuration.
To solve this, we probably need to let the removed node send a join command that bypasses the previous remove command.
The use case is that the application may want to remove a node if it does not reply to AppendEntries for minutes. But if the node recovers later, we may still want it to rejoin.
@benbjohnson any ideas?
@xiangli-cmu @philips I added Drone.io as the CI server since Travis has been flaky for us. I also added coveralls.io support, so we now have a badge on the README.md.
You can see the coveralls page here:
https://coveralls.io/r/goraft/raft
Snapshots are in our top 5 least covered files:
I'm working on improving test coverage there.
The timer has several race conditions surrounding the initialization of timer goroutines.
- Server within a lock
- Log
When committing, the log is locked. This is a disk operation, and sometimes it may take a long time.
During that operation, our current implementation cannot send heartbeats. This delays heartbeats and leads to election timeouts.
We need to fix a possible unlimited wait in stopHeartbeat. Initial attempt was via #186
Via @xiangli-cmu I have found the root cause of the problem. Here is why there is a deadlock:
When the leader calls removePeer it is holding the log lock, since it enters removePeer via setCommitIndex. The leader then sends a stop signal and waits for it to be received. If the peer is actually in flush(), it also needs to acquire the log lock in p.server.log.getEntriesAfter. So a deadlock happens.
/cc: @ongardie
There are duplicate events sending to the dispatcher. I will clean this up soon.
} else if entry.index == lastEntry.index && entry.index <= lastEntry.index {
return fmt.Errorf("raft.Log: Cannot append entry with earlier index in the same term (%x:%x <= %x:%x)", entry.term, entry.index, lastEntry.term, lastEntry.index)
}
Should the first condition be entry.term == lastEntry.term?
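If the intent was indeed entry.term == lastEntry.term (which is what the error message describes, and which makes the entry.index <= lastEntry.index clause meaningful rather than redundant), the check would read like the sketch below; the types here are illustrative, not the actual raft.Log code:

```go
package main

import "fmt"

// logEntry is a minimal stand-in for the library's entry type.
type logEntry struct {
	term, index uint64
}

// validateAppend rejects an entry whose index does not advance within
// the same term, matching the wording of the error message.
func validateAppend(entry, lastEntry logEntry) error {
	if entry.term < lastEntry.term {
		return fmt.Errorf("raft.Log: Cannot append entry with earlier term")
	} else if entry.term == lastEntry.term && entry.index <= lastEntry.index {
		return fmt.Errorf("raft.Log: Cannot append entry with earlier index in the same term (%x:%x <= %x:%x)",
			entry.term, entry.index, lastEntry.term, lastEntry.index)
	}
	return nil
}

func main() {
	last := logEntry{term: 2, index: 5}
	fmt.Println(validateAppend(logEntry{term: 2, index: 5}, last) != nil) // rejected
	fmt.Println(validateAppend(logEntry{term: 2, index: 6}, last) == nil) // accepted
}
```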
When the server is a candidate, it can also step down upon receiving an AppendEntries request from the new leader.
The internalTimer in raft.Timer is not closed when internalTimer.Stop() is called. This needs to be done explicitly.
When the snapshot gets large, sending the whole snapshot to a slow follower will not be a good idea.
Should we do incremental snapshots, or keep the previous log for a longer time after the snapshot?
It would be good to have public access to this channel or a way to poll len(server.c) so we can tell if the queue is getting backed up.
Based on my testing, in many cases all the peers tend to time out at the same time.
This may be caused by all the servers being created at the same time (does the election timeout use time.Now() as the seed?). Can we add more randomness, such as the server name?
04:59:00.588118 Name: 9, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588150 Name: 9, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588187 Name: 3, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588230 Name: 3, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588266 Name: 5, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588282 Name: 5, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588317 Name: 4, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588334 Name: 4, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588369 Name: 2, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588383 Name: 2, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588419 Name: 6, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588434 Name: 6, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588471 Name: 7, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588486 Name: 7, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588521 Name: 1, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588535 Name: 1, State: candidate, Term: 31629, Index: 483708 start Select
Commands should wait indefinitely until committed unless an application-specific timeout is set.
The library currently times out after 1 second:
https://github.com/goraft/raft/blob/master/server.go#L751
/cc: @xiangli-cmu
I will try to draw a picture of the lock dependencies, and try to change Mutex to RWMutex where needed.