goraft / raft
UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
License: MIT License
When reading the log, if a line is corrupt, truncate the log back to the last correct entry.
When the server is a candidate, it should step down to a follower if it receives an AppendEntriesRequest from a leader with an equal term.
/cc: @xiangli-cmu
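The step-down rule above can be sketched as follows; the types and names here are illustrative stand-ins, not goraft's actual API:

```go
package main

import "fmt"

// Hypothetical, simplified state machine illustrating the rule: a
// candidate that sees an AppendEntries request whose term is >= its own
// steps down to follower, because a valid leader already exists for
// that term.

type state string

const (
	Follower  state = "follower"
	Candidate state = "candidate"
)

type node struct {
	term  uint64
	state state
}

// onAppendEntries applies the step-down rule for an incoming request term.
func (n *node) onAppendEntries(reqTerm uint64) {
	if reqTerm < n.term {
		return // stale leader; ignore
	}
	if reqTerm > n.term {
		n.term = reqTerm
	}
	// Equal or newer term: a leader exists, so a candidate steps down.
	if n.state == Candidate {
		n.state = Follower
	}
}

func main() {
	n := &node{term: 5, state: Candidate}
	n.onAppendEntries(5) // equal term from a current leader
	fmt.Println(n.state) // follower
}
```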
I start up my raft instance and run a test that starts sending data. I then ctrl-c to kill the running raft state machine. I have a channel that catches the ctrl-c and runs raft.Stop(), and I then use Running() to make sure raft is not running anymore.
I then start my server and raft recovers the log. When I start sending data with the same test, I get the following error.
The error is resolved by deleting the log and starting again. Maybe corruption? Not totally sure what's going on.
panic: server unable to send signal to commit channel

goroutine 9 [running]:
github.com/goraft/raft.(*server).processAppendEntriesResponse(0xc2000f1100, 0xc2000dad80)
	/go/src/github.com/goraft/raft/server.go:919 +0x60c
github.com/goraft/raft.(*server).leaderLoop(0xc2000f1100)
	/go/src/github.com/goraft/raft/server.go:705 +0x52a
github.com/goraft/raft.(*server).loop(0xc2000f1100)
	/go/src/github.com/goraft/raft/server.go:500 +0x33e
created by github.com/goraft/raft.(*server).Start
	/go/src/github.com/goraft/raft/server.go:420 +0x71e

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc200000300)
	/usr/local/go/src/pkg/runtime/zsema_darwin_amd64.c:165 +0x2e
sync.(*WaitGroup).Wait(0xc2000c7600)
	/usr/local/go/src/pkg/sync/waitgroup.go:109 +0xf2
main.main()
	/go/src/github.com/repo/main.go:31 +0x1e6

goroutine 2 [syscall]:

goroutine 4 [syscall]:
os/signal.loop()
	/usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
	/usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x2f

goroutine 5 [chan receive]:
github.com/repo/router.(*Router).sigMonitor(0xc2000db000)
	/go/src/github.com/repo/router/router.go:138 +0x196
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:80 +0xeb

goroutine 6 [IO wait]:
net.runtime_pollWait(0x58ef00, 0x72, 0x0)
	/usr/local/go/src/pkg/runtime/znetpoll_darwin_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2000cc1a0, 0x23, 0xc2000b6c60)
	/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).accept(0xc2000cc120, 0x335ef0, 0x0, 0xc2000b6c60, 0x23, ...)
	/usr/local/go/src/pkg/net/fd_unix.go:385 +0x2c1
net.(*TCPListener).AcceptTCP(0xc200000338, 0x18, 0xc2000f2010, 0xf55f7)
	/usr/local/go/src/pkg/net/tcpsock_posix.go:229 +0x45
net.(*TCPListener).Accept(0xc200000338, 0x0, 0x0, 0x0, 0xc2000b6bd0, ...)
	/usr/local/go/src/pkg/net/tcpsock_posix.go:239 +0x25
net/http.(*Server).Serve(0xc2000b35f0, 0xc2000da100, 0xc200000338, 0x0, 0x0, ...)
	/usr/local/go/src/pkg/net/http/server.go:1542 +0x85
net/http.(*Server).ListenAndServe(0xc2000b35f0, 0x0, 0x0)
	/usr/local/go/src/pkg/net/http/server.go:1532 +0x9e
github.com/repo/router.(*Router).listenAndServe(0xc2000db000)
	/go/src/github.com/repo/router/router.go:126 +0x2c
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:81 +0x102

goroutine 8 [chan receive]:
github.com/goraft/raft.(*server).send(0xc2000f1100, 0x274920, 0xc20011c060, 0xc20011c060, 0xc200000001, ...)
	/go/src/github.com/goraft/raft/server.go:515 +0xde
github.com/goraft/raft.(*server).Do(0xc2000f1100, 0xc2000f5a50, 0xc20011c060, 0x19, 0x40, ...)
	/go/src/github.com/goraft/raft/server.go:768 +0x56
github.com/repo/router.(*Router).register(0xc2000db000, 0xc2001184d0, 0x8, 0x0, 0x0, ...)
	/go/src/github.com/repo/router/router.go:221 +0x145
github.com/repo/router.(*Router).parseMessage(0xc2000db000, 0xc20011c030, 0x2, 0x2)
	/go/src/github.com/repo/router/router.go:202 +0x120
github.com/repo/router.(*Router).messageFilter(0xc2000db000)
	/go/src/github.com/repo/router/router.go:192 +0x318
created by github.com/repo/router.(*Router).Start
	/go/src/github.com/repo/router/router.go:83 +0x147

goroutine 22 [select]:
github.com/goraft/raft.func·004()
	/go/src/github.com/goraft/raft/server.go:793 +0x2cc
created by github.com/goraft/raft.(*server).processCommand
	/go/src/github.com/goraft/raft/server.go:803 +0x478

exit status 2
Consolidate all state changes and command execution to the event loop in server.go.
I hope we can do better logging and error handling.
log.Println only prints timestamps at second precision, which is not accurate enough for our debugging purposes.
We should think more carefully about which failures should panic and which are only temporary errors.
The current debugging method I use is to set the heartbeat and election timeouts very short (several milliseconds) and keep killing the leader while sending commands to the current leader.
Once we have better logging and panics, we can more easily trace down the sequence and cause of a problem.
/cc: @xiangli-cmu
go-raft doesn't follow the package naming conventions that golang.org recommends. Essentially, our base name should be raft instead of go-raft.
It would be nice to fix this, and I was thinking we could move the repo into an organization at the same time: github.com/goraft/raft.
That way the package URL doesn't get longer, it still conveys that this is a Go library, and using an org signals that there is a community of people behind the project.
@benbjohnson What do you think about this? I reserved the goraft Org already and would be happy to give you admin access.
A restarted node can learn the committed index by receiving the first heartbeat from the current leader (but it is slow to replay all the logs after the server has started).
Maybe we can keep the committed index and flush it to disk every few seconds?
If you create a 2-node cluster and kill the follower process, you can still issue writes to the leader. I would expect that once the leader discovers the follower is down, it would fall back to trying to elect a new leader, fail because there aren't enough nodes to form a majority, and therefore reject writes.
I let it run for about 30 minutes and writes were still being accepted.
If you stop both nodes and only start the leader, the leader correctly rejects writes.
If the leader dies after the client sends a command to it, the client will not get a callback from the "leader" it connected to. The command may be committed (the new leader received the append entries request before the previous leader died) or may be deleted (the new leader did not receive the append entries request).
How to notify the client?
When starting the server, the API should be simplified:
- A Server.Start() method which starts the node as a follower.
- A JoinCommand in the Raft library with a command name of raft:join.
- A LeaveCommand in the Raft library with a command name of raft:leave.
We may want to combine the code in the current Server.Initialize() into Server.Start(), since it doesn't make sense to have two calls to initialize. It'd be nice to add a Server.Initialize(name, connectionString string) method that issues the self-join.
/cc: @xiangli-cmu
getEntriesAfter panics with the message "raft: Index is beyond end of log: 10 20".
In general, the locking around log operations is super sketchy. I suspect, for instance, that the entirety of https://github.com/goraft/raft/blob/master/server.go#L904-920 ought to be protected by a single lock acquisition/release.
Append() should queue entries in memory until SetCommitIndex() has been set to an index higher than the entry's.
I haven't looked into whether or not these are legitimate races or test-only races, but they should be cleaned up so go test -race is useful.
I've just spent part of my evening reading the source code. I notice in server.go that an error is returned in some cases, while panic(...) is called in other error cases. Here's an example.
A Go blog article states:
The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.
My question is: is there a specific reason why the raft library uses panic(...) as opposed to errors? Does it distinguish between the two somehow?
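For illustration, here is the convention the blog post describes, applied to a stand-in for getEntriesAfter (the log, the signature, and the wrapper name are all hypothetical, not the library's real code):

```go
package main

import (
	"errors"
	"fmt"
)

// A toy in-memory log standing in for the raft log.
var entries = []int{10, 20, 30}

// getEntriesAfter panics internally on a bad index, mirroring the style
// the issue describes.
func getEntriesAfter(i int) []int {
	if i > len(entries) {
		panic(fmt.Sprintf("raft: Index is beyond end of log: %d %d", len(entries), i))
	}
	return entries[i:]
}

// EntriesAfter is the exported boundary: it recovers the internal panic
// and converts it into an explicit error return value.
func EntriesAfter(i int) (out []int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = errors.New(fmt.Sprint(r))
		}
	}()
	return getEntriesAfter(i), nil
}

func main() {
	if _, err := EntriesAfter(10); err != nil {
		fmt.Println("error:", err)
	}
}
```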
The state for the server (currentTerm, votedFor) and peers (last log index/term) needs to be written to disk:
- {"peers":[...]}
- ioutil.ReadFile()
- ioutil.WriteFile()
- os.Rename()
/cc: @xiangli-cmu
RequestVote RPC to Server.
We should return the committed index of the command to the client. The client will get more power to do sync work. What do you think about this?
With the join command, all the peers will append the join command.
Suppose we have 3 nodes, and node 1 is the first leader.

Node 2 joins Leader 1:
Leader: peer 2. Restart: replay join 2 -> peer 2
Peer 2: none. Restart: replay join 2 -> none

Node 2 and Node 3 join Leader 1:
Leader: peer 2, peer 3. Restart: replay join 2 -> peer 2; replay join 3 -> peer 3
Peer 2: peer 3. Restart: replay join 2 -> none; replay join 3 -> peer 3
Peer 3: peer 2. Restart: replay join 2 -> peer 3; replay join 3 -> peer 2

Here is the problem: the peers will not add the leader to their logs.
The setState function locks the state mutex, which isn't unlocked when the event handlers are called. As a result, if an event handler calls server.State() in response to a StateChange or LeaderChange event, the application can deadlock.
Example gist: https://gist.github.com/nemothekid/8576383
I think this can be fixed either by unlocking the mutex before the events are fired, or by moving the event callbacks inside a defer.
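A minimal sketch of the first proposed fix: copy the new state and the handler list while holding the lock, release it, then fire the callbacks. All names here are illustrative stand-ins for the library's types:

```go
package main

import (
	"fmt"
	"sync"
)

// server is a toy model of the deadlock scenario: State() takes the
// same mutex that setState holds.
type server struct {
	mu       sync.Mutex
	state    string
	onChange []func(old, new string)
}

func (s *server) State() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

func (s *server) setState(newState string) {
	s.mu.Lock()
	old := s.state
	s.state = newState
	// Copy handlers so the slice can't change under us after unlocking.
	handlers := append([]func(old, new string){}, s.onChange...)
	s.mu.Unlock() // released BEFORE any callback runs

	for _, h := range handlers {
		h(old, newState) // a handler may now safely call s.State()
	}
}

func main() {
	s := &server{state: "follower"}
	s.onChange = append(s.onChange, func(old, new string) {
		// Would deadlock in the original code; safe here.
		fmt.Println("now:", s.State())
	})
	s.setState("leader")
}
```

Moving the callbacks inside a defer after an earlier unlock achieves the same ordering; the key invariant is that no user callback ever runs while the state mutex is held.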
Uncommitted entries should be written to disk. When a node becomes leader with an inconsistent log index and commit index, then a noop should be performed to advance the commit index.
/cc: @ongardie
Refactor the log and commands to allow for generic binary encoding.
- A map[uint64]chan bool on Server to track notifications of entry commits.
- Server.commitCenter()
See conversation with @xiangli:
The do() function should fail if the server is not the leader.
cc: @ongardie
The external interface to the library needs to be pared down significantly. A client using go-raft should only need to access a couple of functions on the Server, such as Start(), Stop(), etc. Everything else should be internal-only.
- Server.Do(*Command)
- AppendEntries RPC before being committed to log

Some applications might want the leader to trigger a sync event at a given interval. So I think we can provide an API to let the application register a sync event and have the leader send it out.
@benbjohnson
Add log compaction through snapshotting or some other means.
See the log compaction section of the Raft paper for reference.
Replace the callback system in raft.Server to use the following interface instead:
type Transporter interface {
	SendAppendEntriesRequest(server *Server, peer *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error)
	SendVoteRequest(server *Server, peer *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error)
}
cc: @tadglines
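As an illustration of how an application might satisfy this interface, here is a toy loopback implementation useful in unit tests; the stand-in type definitions exist only so the sketch compiles on its own (the real Server, Peer, and request/response structs live in the raft package):

```go
package main

import "fmt"

// Minimal stand-in types so the sketch is self-contained.
type Server struct{}
type Peer struct{ Name string }
type AppendEntriesRequest struct{ Term uint64 }
type AppendEntriesResponse struct {
	Term    uint64
	Success bool
}
type RequestVoteRequest struct{ Term uint64 }
type RequestVoteResponse struct {
	Term        uint64
	VoteGranted bool
}

// Transporter mirrors the proposed interface above.
type Transporter interface {
	SendAppendEntriesRequest(server *Server, peer *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error)
	SendVoteRequest(server *Server, peer *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error)
}

// loopbackTransporter "delivers" every request locally and always
// succeeds; a real implementation would serialize the request and send
// it over HTTP or another transport.
type loopbackTransporter struct{}

func (t *loopbackTransporter) SendAppendEntriesRequest(s *Server, p *Peer, req *AppendEntriesRequest) (*AppendEntriesResponse, error) {
	return &AppendEntriesResponse{Term: req.Term, Success: true}, nil
}

func (t *loopbackTransporter) SendVoteRequest(s *Server, p *Peer, req *RequestVoteRequest) (*RequestVoteResponse, error) {
	return &RequestVoteResponse{Term: req.Term, VoteGranted: true}, nil
}

func main() {
	var tr Transporter = &loopbackTransporter{}
	resp, _ := tr.SendAppendEntriesRequest(&Server{}, &Peer{Name: "n2"}, &AppendEntriesRequest{Term: 3})
	fmt.Println(resp.Success)
}
```

Pushing the transport behind an interface like this also makes it easy to inject partitions and delays in tests.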
Add documentation that describes:
Now, while Log.open() tries to restore the LogEntries from the log file, readBytes doesn't add the size of the lengths before each entry plus 1 newline, which is 9 bytes for each entry.
The Server.do() method should not time out while waiting for responses. Instead it should wait until an entry is committed or until it is notified of its demotion.
Reported by @xiangli-cmu.
Currently, when we remove a node, we write a log entry.
So if a node is removed and then comes back with the same name, it will be removed again by the earlier log entry, or it will receive a snapshot that does not contain itself in the configuration.
To solve this, we probably need to let the removed node send a join command that bypasses the previous remove command.
The use case is that the application may want to remove a node if it does not reply to AppendEntries for minutes. But if the node recovers later, we may still want it to rejoin.
@benbjohnson any ideas?
@xiangli-cmu @philips I added Drone.io as the CI server since Travis has been flaky for us. I also added coveralls.io support, so we now have a badge on the README.md.
You can see the coveralls page here:
https://coveralls.io/r/goraft/raft
Snapshots are in our top 5 least covered files:
I'm working on improving test coverage there.
The timer has several race conditions surrounding the initialization of timer goroutines.
- Server within a lock
- Log
When committing, the log is locked. This is a disk operation, and sometimes it may take a long time.
During that operation, our current implementation cannot send heartbeats. This delays heartbeats and leads to election timeouts.
We need to fix a possible unlimited wait in stopHeartbeat. Initial attempt was via #186
Via @xiangli-cmu I have found the root cause of the problem. Here is why there is a deadlock:
When the leader calls removePeer it is holding the log lock, since it enters removePeer via setCommitIndex. The leader then sends a stop signal and waits for it to be received. If the peer is actually in flush(), it also needs to acquire the log lock in p.server.log.getEntriesAfter. So a deadlock happens.
/cc: @ongardie
There are duplicate events sending to the dispatcher. I will clean this up soon.
} else if entry.index == lastEntry.index && entry.index <= lastEntry.index {
return fmt.Errorf("raft.Log: Cannot append entry with earlier index in the same term (%x:%x <= %x:%x)", entry.term, entry.index, lastEntry.term, lastEntry.index)
}
Should the first condition be entry.term == lastEntry.term?
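If the intent was indeed entry.term == lastEntry.term (which is what the error message describes, and which makes the entry.index <= lastEntry.index clause meaningful rather than redundant), the check would read like the sketch below; the types here are illustrative, not the actual raft.Log code:

```go
package main

import "fmt"

// logEntry is a minimal stand-in for the library's entry type.
type logEntry struct {
	term, index uint64
}

// validateAppend rejects an entry whose index does not advance within
// the same term, matching the wording of the error message.
func validateAppend(entry, lastEntry logEntry) error {
	if entry.term < lastEntry.term {
		return fmt.Errorf("raft.Log: Cannot append entry with earlier term")
	} else if entry.term == lastEntry.term && entry.index <= lastEntry.index {
		return fmt.Errorf("raft.Log: Cannot append entry with earlier index in the same term (%x:%x <= %x:%x)",
			entry.term, entry.index, lastEntry.term, lastEntry.index)
	}
	return nil
}

func main() {
	last := logEntry{term: 2, index: 5}
	fmt.Println(validateAppend(logEntry{term: 2, index: 5}, last) != nil) // rejected
	fmt.Println(validateAppend(logEntry{term: 2, index: 6}, last) == nil) // accepted
}
```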
When the server is a candidate, it can also step down upon receiving an AppendEntries request from the new leader.
The internalTimer in raft.Timer is not closed when internalTimer.Stop() is called. This needs to be done explicitly.
When the snapshot gets large, sending the whole snapshot to a slow follower will not be a good idea.
Should we do incremental snapshots, or keep the previous log for a longer time after the snapshot?
It would be good to have public access to this channel or a way to poll len(server.c) so we can tell if the queue is getting backed up.
Based on my testing, in many cases all the peers tend to time out at the same time.
This may be caused by all the servers being created at the same time (does the election timeout use time.Now() as the seed?). Can we add more randomness, such as the server name?
04:59:00.588118 Name: 9, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588150 Name: 9, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588187 Name: 3, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588230 Name: 3, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588266 Name: 5, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588282 Name: 5, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588317 Name: 4, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588334 Name: 4, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588369 Name: 2, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588383 Name: 2, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588419 Name: 6, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588434 Name: 6, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588471 Name: 7, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588486 Name: 7, State: candidate, Term: 31629, Index: 483708 start Select
04:59:00.588521 Name: 1, State: follower, Term: 31628, Index: 483708 start Candiate
04:59:00.588535 Name: 1, State: candidate, Term: 31629, Index: 483708 start Select
Commands should wait indefinitely until committed unless an application-specific timeout is set.
The library currently times out after 1 second:
https://github.com/goraft/raft/blob/master/server.go#L751
/cc: @xiangli-cmu
I will try to draw a picture of the lock dependencies, and try to change Mutex to RWMutex where needed.