xraft's Introduction


A Raft implementation by XnnYygn.

I wanted to build something with the Netty framework, and I found Raft. Raft is interesting. It was the first distributed consensus algorithm I learned; I read the paper and implemented almost all of Raft's features, including

  • Leader election and log replication
  • Membership change (single-server change)
  • Log compaction

All of these features are implemented in xraft-core. Client interaction, I thought, should be a feature of a service built on xraft-core. So far, I have built a simple key-value store on xraft-core, called xraft-kvstore. It supports the GET and SET commands.


To test xraft with xraft-kvstore, you can download xraft and run xraft-kvstore and xraft-kvstore-cli.


Java 1.8+ is required to run xraft. You can run java -version to check the version of Java on your computer.


You can get the compiled xraft in releases.

Run Server

xraft-kvstore, under the bin directory, is the command that runs the xraft kvstore server.

To demonstrate an xraft cluster with 3 nodes (memory log mode),

  • node A, host localhost, port raft node 2333, port kvstore 3333
  • node B, host localhost, port raft node 2334, port kvstore 3334
  • node C, host localhost, port raft node 2335, port kvstore 3335

start the servers with the commands below.

Terminal A

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i A -p2 3333

Terminal B

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i B -p2 3334

Terminal C

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i C -p2 3335

Since the minimum election timeout is 3 seconds, if you cannot execute all 3 commands within 3 seconds, you will get errors like failed to connect ..... Once all nodes have started, the errors will disappear.

After startup, you will see messages such as become leader or current leader is xxx, which show that the cluster has started and leader election succeeded.

Run Client

Run xraft-kvstore-cli with the cluster configuration. The client does not connect to any node in the cluster when it starts, so it is fine to run the client before the cluster starts.

$ bin/xraft-kvstore-cli -gc A,localhost,3333 B,localhost,3334 C,localhost,3335

It starts an interactive console. Press TAB twice to list the available commands. For this demonstration, first run

> kvstore-get x

and you should get the result null. Then run

> kvstore-set x 1

Nothing will be printed. Now you can run get again.

> kvstore-get x

1 should be printed.

New Service

How do you create a new service based on xraft-core?

For a more detailed implementation of a new service, see the source code of xraft-kvstore.


xraft uses Maven as its build system.

$ mvn clean compile install

To package xraft-kvstore

$ cd xraft-kvstore
$ mvn package assembly:single

About PreVote

If you are looking for the Raft PreVote optimization, please check the develop branch.

Consistency in xraft-kvstore

To keep the implementation simple, xraft-kvstore just reads the value from the concurrent hash map, which may be stale. The develop branch has an optimization called readindex that offers consistent reads. If you need consistent reads or want to know how to implement them, please refer to the develop branch.


This project is licensed under the MIT License.




xraft's Issues



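  1. The actual heartbeat interval is 2 seconds. Is one second perhaps too short for the heartbeat to execute in time?
  2. The kvstore-set command returns extremely slowly. My guess is that heartbeat messages travel over the same channel, so the set command does not get executed.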

VotedFor is not stored when a node is a candidate and receives an AppendEntriesRpc

Here the candidate should store the message sender, i.e., the current leader, in votedFor.

becomeFollower(rpc.getTerm(), null, rpc.getLeaderId(), true);

This issue can result in two leaders with the same term in the following scenario.
Consider a cluster of 5 nodes. Nodes 1 to 3 become candidates with term 1. Nodes 4 and 5 are followers.

Node 1 receives votes from Nodes 4 and 5 and becomes the leader. However, Nodes 2 and 3 remain candidates.
Node 1 receives an AppendEntry request and sends messages to Nodes 2 and 3.
Nodes 2 and 3 step down to followers and set votedFor to null (by the code above). They have not written any log entries yet.
Node 4 restarts and becomes a candidate with term 1. It requests votes from Nodes 2 and 3.
Nodes 2 and 3 find that their votedFor is null and that their logs are not newer than Node 4's. Thus, they vote for Node 4.
Node 4 becomes the leader.

Now both Node 1 and Node 4 are leaders with term 1, which violates Raft's safety property.
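
The scenario can be reproduced with a toy model of the vote-granting rule. This is only a sketch under stated assumptions, not xraft's code: the class VoteState and its methods are hypothetical names, the log up-to-date check is omitted, and the flag clearVoteOnStepDown stands in for passing null as votedFor in becomeFollower.

```java
// Toy model of Raft's "one vote per term" rule. Hypothetical names, not xraft classes.
final class VoteState {
    private int term;
    private String votedFor; // null means no vote has been cast in this term
    private final boolean clearVoteOnStepDown; // true models the reported behavior

    VoteState(String selfId, int term, boolean clearVoteOnStepDown) {
        this.term = term;
        this.votedFor = selfId; // a candidate has already voted for itself
        this.clearVoteOnStepDown = clearVoteOnStepDown;
    }

    // Candidate steps down when a leader with the same or a newer term contacts it.
    void onAppendEntries(int leaderTerm, String leaderId) {
        if (leaderTerm < term) {
            return; // stale leader, ignore
        }
        term = leaderTerm;
        votedFor = clearVoteOnStepDown ? null : leaderId;
    }

    // Grants a vote only if none was cast for a different node in this term.
    boolean onRequestVote(int candidateTerm, String candidateId) {
        if (candidateTerm < term) {
            return false;
        }
        if (candidateTerm > term) {
            term = candidateTerm;
            votedFor = null; // a new term starts with no vote cast
        }
        if (votedFor == null || votedFor.equals(candidateId)) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        VoteState reported = new VoteState("N2", 1, true);
        reported.onAppendEntries(1, "N1");
        System.out.println(reported.onRequestVote(1, "N4")); // true: second leader possible in term 1

        VoteState proposed = new VoteState("N2", 1, false);
        proposed.onAppendEntries(1, "N1");
        System.out.println(proposed.onRequestVote(1, "N4")); // false: vote for term 1 is already bound
    }
}
```

With votedFor kept non-null after stepping down, Node 4 cannot collect the votes of Nodes 2 and 3 in term 1 and has to move to a higher term.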






[ERROR] Failed to execute goal on project xraft-kvstore: Could not resolve dependencies for project in.xnnyygn.xraft:xraft-kvstore:jar:0.1.0-SNAPSHOT: Could not find artifact in.xnnyygn.xraft:xraft-core:jar:0.1.0-SNAPSHOT -> [Help 1]

While trying to build the source code, execute this command:
cd xraft-kvstore && mvn package assembly:single

I got this error:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13.622 s
[INFO] Finished at: 2020-09-28T10:16:39Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project xraft-kvstore: Could not resolve dependencies for project in.xnnyygn.xraft:xraft-kvstore:jar:0.1.0-SNAPSHOT: Could not find artifact in.xnnyygn.xraft:xraft-core:jar:0.1.0-SNAPSHOT -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]


Am I missing something that I need to import?


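Take A, B, and C as the cluster.
1. Generate three logs, log-0, log-10, and log-15. At this point a snapshot has already been generated for log-15.
2. Shut down A, B, and C, and delete B's log.
3. Start A and C. Nodes A and C begin an election and the term increases (say, to 9), but the snapshot's term has not increased (say, it is still 7).
4. Now B restarts and begins installing the snapshot. Once the snapshot is installed successfully, B starts appending log entries.
When appending entries, the term in the message is compared with the term of the local snapshot, but the local term comes from the snapshot that was just installed.

AbstractLog - previous log index matches snapshot's last included index, but term not (expected 40, actual 36)

As a result, the next log generation is not produced quickly; node B cannot update its snapshot and can only serve from the current snapshot.
If kvstore-set x 12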



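Suppose there are nodes 1, 2, 3, 4, and 5.
Node 1 is the leader.
A client writes a message, and the leader successfully replicates it to nodes 2 and 3 (at this point the uncommitted entry is held in memory).
The leader advances commitIndex and commits the entry (the commit has not yet been propagated to nodes 2 and 3).

Suppose that node 1 now goes offline and node 2 restarts (the uncommitted entry in its memory is gone). In the following election node 2 can become leader (this is possible because nodes 4 and 5 will support it). This causes the entry that node 1 committed in the previous term to be overturned.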

Duplicate vote responses can elect an illegal leader without a quorum

I noticed that the RequestVoteResult message does not contain a field identifying which node the result came from.
Besides, when processing a RequestVoteResult, the node only checks the vote count. This means a node can become leader as soon as it receives enough granted votes, rather than when it is supported by enough nodes.

Consider a scenario with 5 nodes. Node 1 becomes a candidate and requests a vote from Node 2. Node 2 responds with two duplicate RequestVoteResult messages (due to a network error or some other fault). Node 1 can then become leader even though it has the support of only one other node!

I think it is a bug.
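
One way to make the tally robust against duplicates is to count distinct voters instead of messages. The sketch below assumes the sender's ID is available when a result is processed, which, as noted above, RequestVoteResult does not currently provide; VoteTally, recordGrantedVote, and the voterId parameter are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Counts granted votes per distinct node. Hypothetical helper, not part of xraft.
final class VoteTally {
    private final int clusterSize;
    private final Set<String> voters = new HashSet<>();

    VoteTally(int clusterSize, String selfId) {
        this.clusterSize = clusterSize;
        voters.add(selfId); // the candidate votes for itself
    }

    // Records a granted vote and reports whether a strict majority of
    // distinct nodes now supports the candidate.
    boolean recordGrantedVote(String voterId) {
        voters.add(voterId); // a duplicate from the same node does not change the set
        return voters.size() > clusterSize / 2;
    }

    int distinctVotes() {
        return voters.size();
    }

    public static void main(String[] args) {
        VoteTally tally = new VoteTally(5, "N1");
        System.out.println(tally.recordGrantedVote("N2")); // false: 2 of 5
        System.out.println(tally.recordGrantedVote("N2")); // false: still 2 of 5
        System.out.println(tally.recordGrantedVote("N3")); // true: 3 of 5
    }
}
```

A set is used rather than a counter so that retransmitted or duplicated results are idempotent.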



Missing check for `result.getTerm() == role.getTerm()` in `doProcessRequestVoteResult()` can result in two leaders with the same term.


In the NodeImpl.doProcessRequestVoteResult() method, the code does not check that result.getTerm() equals role.getTerm(). This can lead to multiple leaders with the same term, which is not allowed in the Raft protocol. As shown in the figure below, this issue can be caused by fast election timeouts on N1.

It's unclear whether the code has any mechanisms to prevent the problem, such as filtering smaller term responses or using messageId to ignore stale responses in channelRead(). Can you confirm whether this is a bug or if there is any existing prevention mechanism?


private void doProcessRequestVoteResult(RequestVoteResult result) {
    // step down if result's term is larger than current term
    if (result.getTerm() > role.getTerm()) {
        becomeFollower(result.getTerm(), null, null, true);
        return;
    }
    // check role
    if (role.getName() != RoleName.CANDIDATE) {
        logger.debug("receive request vote result and current role is not candidate, ignore");
        return;
    }
    // do nothing if not vote granted
    if (!result.isVoteGranted()) {
        return;
    }
    int currentVotesCount = ((CandidateNodeRole) role).getVotesCount() + 1;
    int countOfMajor = /* elided */;
    logger.debug("votes count {}, major node count {}", currentVotesCount, countOfMajor);
    if (currentVotesCount > countOfMajor / 2) {
        // become leader
        logger.info("become leader, term {}", role.getTerm());
        changeToRole(new LeaderNodeRole(role.getTerm(), scheduleLogReplicationTask()));
        context.log().appendEntry(role.getTerm()); // no-op log
        context.connector().resetChannels(); // close all inbound channels
    } else {
        // update votes count
        changeToRole(new CandidateNodeRole(role.getTerm(), currentVotesCount, scheduleElectionTimeout()));
    }
}
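
The missing check can be expressed as a small decision function. This is a sketch of one possible guard, not a patch to NodeImpl: VoteResultFilter, Action, and classify are hypothetical names, and the inputs are passed as plain values instead of the role and result objects.

```java
// Decides what to do with a vote result. Hypothetical helper, not xraft code.
final class VoteResultFilter {
    enum Action { STEP_DOWN, IGNORE, COUNT }

    static Action classify(int resultTerm, int currentTerm,
                           boolean isCandidate, boolean voteGranted) {
        if (resultTerm > currentTerm) {
            return Action.STEP_DOWN; // a newer term exists, become follower
        }
        if (resultTerm < currentTerm) {
            return Action.IGNORE; // stale reply to an earlier election
        }
        if (!isCandidate || !voteGranted) {
            return Action.IGNORE;
        }
        return Action.COUNT; // same term, still candidate, vote granted
    }

    public static void main(String[] args) {
        // N1 started an election in term 1, timed out, and is now a candidate in term 2.
        // A late granted reply for term 1 must not count toward term 2.
        System.out.println(classify(1, 2, true, true)); // IGNORE
        System.out.println(classify(2, 2, true, true)); // COUNT
        System.out.println(classify(3, 2, true, false)); // STEP_DOWN
    }
}
```

Only the second branch is new relative to the method quoted above; the other branches mirror its existing checks.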

ls: ./lib: No such file or directory

Running bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i A -p2 3333 inside the xraft/xraft-kvstore/src subdirectory results in the following error:

ls: ./lib: No such file or directory
Error: -cp requires class path specification

How do I resolve this?

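Does the kvstore get operation search backward from the end of the log?

Reading the source, I see that it gets the value directly from the in-memory map.
Shouldn't it really search backward from commitIndex until the matching key is found, and then take, among the data returned by a majority of followers, the data with the latest term?
Could the author implement this logic in the source code?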

public void get(CommandRequest<GetCommand> commandRequest) {
    String key = commandRequest.getCommand().getKey();
    logger.debug("get {}", key);
    byte[] value = /* elided */;
    // TODO view from node state machine
    commandRequest.reply(new GetCommandResponse(value));
}



java.util.ConcurrentModificationException in `List<Entry> entries` of `MemoryEntrySequence`

When network latency is high, a kvstore-set operation may be processed between sending the AppendEntriesRpc message and receiving the AppendEntriesResultMessage. The entries in the AppendEntriesRpc are a view of entrySequence. By the time the AppendEntriesResultMessage is received, the list backing that view has been modified, so reading the entries triggers java.util.ConcurrentModificationException.

The kvstore-set operation results in appending a log entry to the entrySequence:

public GeneralEntry appendEntry(int term, byte[] command) {
    GeneralEntry entry = new GeneralEntry(entrySequence.getNextLogIndex(), term, command);
    // the entry is appended to entrySequence here (call elided)
    return entry;
}

Processing the AppendEntriesResultMessage involves reading the entries whose backing list has since changed:

public int getLastEntryIndex() {
    return this.entries.isEmpty() ? this.prevLogIndex : this.entries.get(this.entries.size() - 1).getIndex();
}

Log (note that the timestamps are fake):

2022-06-04 13:36:54.100 [node] INFO  node.NodeImpl - become leader, term 1
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - node n1, role state changed -> LeaderNodeRole{term=1, logReplicationTask=LogReplicationTask{delay=999}}
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - receive request vote result and current role is not candidate, ignore
** 2022-06-04 13:36:54.101 [nioEventLoopGroup-5-1] DEBUG server.Service - set x **
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n2 is replicating, skip replication task
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n3 is replicating, skip replication task
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification( ~[?:?]
        at java.util.ArrayList$SubList.size( ~[?:?]
        at java.util.AbstractCollection.isEmpty( ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$ ~[?:?]
        at$TrustedFutureInterruptibleTask.runInterruptibly( ~[guava-32.0.0-jre.jar:?]
        at ~[guava-32.0.0-jre.jar:?]
        at ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
        at java.util.concurrent.ThreadPoolExecutor$ [?:?]
        at [?:?]
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification( ~[?:?]
        at java.util.ArrayList$SubList.size( ~[?:?]
        at java.util.AbstractCollection.isEmpty( ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6( ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$ ~[?:?]
        at$TrustedFutureInterruptibleTask.runInterruptibly( ~[guava-32.0.0-jre.jar:?]
        at ~[guava-32.0.0-jre.jar:?]
        at ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
        at java.util.concurrent.ThreadPoolExecutor$ [?:?]
        at [?:?]
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server

A possible fix

Create a copy of the subList into a new List:

diff --git a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/ b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/
index 9d48010..4c2293a 100644
--- a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/
+++ b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/
@@ -22,7 +22,7 @@ public class MemoryEntrySequence extends AbstractEntrySequence {
     protected List<Entry> doSubList(int fromIndex, int toIndex) {
-        return entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset);
+        return new ArrayList<>(entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset));
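
The behavior behind the exception and the effect of the copy can be shown with the JDK alone. The following is a small stand-alone demonstration, not xraft code; SubListViewDemo and viewFailsAfterAppend are hypothetical names, and plain strings stand in for log entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Shows that an ArrayList.subList view breaks once the backing list grows,
// while a copy of the view does not. Stand-alone demo, not xraft code.
final class SubListViewDemo {

    // Returns true if reading the batch after an append throws
    // ConcurrentModificationException.
    static boolean viewFailsAfterAppend(boolean copyBeforeAppend) {
        List<String> entries = new ArrayList<>(Arrays.asList("e1", "e2", "e3"));
        List<String> view = entries.subList(1, 3);
        List<String> batch = copyBeforeAppend ? new ArrayList<>(view) : view;

        entries.add("e4"); // structural modification of the backing list

        try {
            batch.isEmpty(); // same call as in getLastEntryIndex()
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(viewFailsAfterAppend(false)); // true: the view is invalidated
        System.out.println(viewFailsAfterAppend(true));  // false: the copy is independent
    }
}
```

Note that the demo is single-threaded: the exception is caused by any structural change to the backing list after the view was created, so copying removes this particular failure but does not by itself make the list safe for concurrent access.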

Xraft-kvstore does not satisfy linearizability

The read operation in the kvstore (i.e. the kvstore-get command) can be processed by followers without a quorum round trip, which violates linearizability (e.g. by reading stale values).

public void get(CommandRequest<GetCommand> commandRequest) {
    String key = commandRequest.getCommand().getKey();
    logger.debug("get {}", key);
    byte[] value = /* elided */;
    // TODO view from node state machine
    commandRequest.reply(new GetCommandResponse(value));
}

I further discovered that the develop branch implements the readindex protocol to guarantee linearizable reads. But the docs do not explain the consistency levels, so few users will find the develop branch, and users might assume the kvstore satisfies linearizability because it is based on Raft.

A possible fix

For reference, Consul has three read modes: default (the leader processes reads without a quorum), consistent (like the develop branch of Xraft-kvstore), and stale (like the main branch).

It would be better for Xraft's documentation/README to state the consistency modes explicitly. Specifically, explain that the kvstore-get operation may read stale values, and direct users who need consistent reads to the develop branch. Additionally, if feasible, Xraft could offer a default level akin to Consul's for users seeking a compromise between availability and consistency.
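
To make the three modes concrete, here is a rough sketch of how a read could be gated per mode. Everything in it is an assumption for illustration: ReadModeSketch, ReadMode, and the two BooleanSupplier hooks do not exist in xraft, and quorumConfirms is only a stand-in for a readindex-style round trip (a real implementation would also wait until the state machine has applied up to the read index).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of per-mode read gating. None of these names exist in xraft.
final class ReadModeSketch {
    enum ReadMode { STALE, DEFAULT, CONSISTENT }

    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final BooleanSupplier isLeader;       // local role check
    private final BooleanSupplier quorumConfirms; // stand-in for a quorum round trip

    ReadModeSketch(BooleanSupplier isLeader, BooleanSupplier quorumConfirms) {
        this.isLeader = isLeader;
        this.quorumConfirms = quorumConfirms;
    }

    void put(String key, String value) {
        store.put(key, value);
    }

    // Returns the value (null if absent), or throws if this node may not
    // serve the read in the requested mode.
    String get(String key, ReadMode mode) {
        switch (mode) {
            case STALE:
                break; // any node answers from its local state
            case DEFAULT:
                if (!isLeader.getAsBoolean()) {
                    throw new IllegalStateException("not leader");
                }
                break;
            case CONSISTENT:
                if (!isLeader.getAsBoolean() || !quorumConfirms.getAsBoolean()) {
                    throw new IllegalStateException("leadership not confirmed by quorum");
                }
                break;
        }
        return store.get(key);
    }

    public static void main(String[] args) {
        ReadModeSketch follower = new ReadModeSketch(() -> false, () -> false);
        follower.put("x", "1");
        System.out.println(follower.get("x", ReadMode.STALE)); // 1
    }
}
```

In this sketch a follower can serve only STALE reads, a leader that cannot reach a quorum can still serve DEFAULT reads, and CONSISTENT reads require both conditions.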






