xraft's Introduction

xraft

A Raft implementation by XnnYygn.

I wanted to build something with the Netty framework, and I found Raft. Raft is interesting. As it was the first distributed consensus algorithm I learned, I read the paper and implemented almost all of the features of Raft, including

  • Leader election and log replication
  • Membership change (one server change)
  • Log compaction

All of these features are implemented in xraft-core. Client interaction in Raft, I thought, should be a feature of a service built on xraft-core. So far, I have made a simple key-value store based on xraft-core, called xraft-kvstore. It supports the GET and SET commands.

Demonstration

To test xraft with xraft-kvstore, download xraft and run xraft-kvstore and xraft-kvstore-cli.

Prerequisites

Java 1.8+ is required to run xraft. You can run java -version to check the version of Java on your computer.

Download

You can get compiled xraft in releases.

Run Server

xraft-kvstore under the bin directory is the command that runs the xraft kvstore server.

To demonstrate an xraft cluster with 3 nodes (memory log mode),

  • node A, host localhost, Raft node port 2333, kvstore port 3333
  • node B, host localhost, Raft node port 2334, kvstore port 3334
  • node C, host localhost, Raft node port 2335, kvstore port 3335

start the servers with the commands below.

Terminal A

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i A -p2 3333

Terminal B

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i B -p2 3334

Terminal C

$ bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i C -p2 3335

Since the minimum election timeout is 3 seconds, if you cannot execute all 3 commands within 3 seconds you will see errors like failed to connect ..... Once all nodes have started, the errors will disappear.

After startup, you will see messages like become leader or current leader is xxx, which show that the cluster has started and leader election succeeded.
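
Each -gc argument in the commands above describes one node as an id,host,port triple. The following is a minimal sketch of how such triples could be parsed and validated. GroupConfigSketch, NodeEndpoint, parse, and parseAll are hypothetical names chosen for illustration and are not taken from the xraft source.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupConfigSketch {

    /** One cluster member as written on the command line: id,host,port. */
    static final class NodeEndpoint {
        final String id;
        final String host;
        final int port;

        NodeEndpoint(String id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return id + "@" + host + ":" + port;
        }
    }

    /** Parses a single "id,host,port" triple and rejects malformed input. */
    static NodeEndpoint parse(String raw) {
        String[] parts = raw.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected id,host,port but got: " + raw);
        }
        int port;
        try {
            port = Integer.parseInt(parts[2].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port is not a number: " + raw, e);
        }
        return new NodeEndpoint(parts[0].trim(), parts[1].trim(), port);
    }

    /** Parses every triple in order, as they would appear after -gc. */
    static List<NodeEndpoint> parseAll(String... raws) {
        List<NodeEndpoint> result = new ArrayList<>();
        for (String raw : raws) {
            result.add(parse(raw));
        }
        return result;
    }

    public static void main(String[] args) {
        List<NodeEndpoint> group =
                parseAll("A,localhost,2333", "B,localhost,2334", "C,localhost,2335");
        System.out.println(group);
    }
}
```

The same triple shape appears in the client command, where the ports are the kvstore ports rather than the Raft node ports.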

Run Client

Run xraft-kvstore-cli with the cluster configuration. The client does not connect to any node in the cluster at startup, so it is fine to run the client before the cluster starts.

$ bin/xraft-kvstore-cli -gc A,localhost,3333 B,localhost,3334 C,localhost,3335

This starts an interactive console. Press TAB twice to list the available commands. For this demonstration, first run

> kvstore-get x

and you should get the result null. Then run

> kvstore-set x 1

Nothing will be printed. Now run get again.

> kvstore-get x

1 should be printed.

New Service

How to create new service based on xraft-core?

For more detailed implementation of new service, see the source code of xraft-kvstore.

Build

xraft uses Maven as its build system.

$ mvn clean compile install

To package xraft-kvstore

$ cd xraft-kvstore
$ mvn package assembly:single

About PreVote

If you are looking for the Raft optimization PreVote, please check the develop branch.

Consistency in xraft-kvstore

To keep the implementation simple, xraft-kvstore just reads the value from a concurrent hash map, which could be a stale value. There is an optimization in the develop branch called readindex that offers consistent reads. If you need consistent reads or want to know how to implement them, please refer to the develop branch.
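
One way a plain map read can return a stale value is when the replica answering the read has applied fewer committed entries than another replica. The following is a minimal sketch of that situation under this assumption. Replica, SetEntry, and applyUpTo are hypothetical names for illustration, not xraft classes, and the sketch does not show the readindex optimization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleReadSketch {

    /** A committed SET command in the replicated log. */
    static final class SetEntry {
        final String key;
        final String value;

        SetEntry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** One replica: a map plus the number of log entries applied to it so far. */
    static final class Replica {
        private final Map<String, String> map = new ConcurrentHashMap<>();
        private int appliedCount = 0;

        /** Applies entries that have not been applied yet, up to the given count. */
        void applyUpTo(List<SetEntry> log, int count) {
            while (appliedCount < count) {
                SetEntry entry = log.get(appliedCount);
                map.put(entry.key, entry.value);
                appliedCount++;
            }
        }

        /** Local read with no coordination with other replicas. */
        String get(String key) {
            return map.get(key);
        }
    }

    public static void main(String[] args) {
        List<SetEntry> log = new ArrayList<>();
        log.add(new SetEntry("x", "1"));
        log.add(new SetEntry("x", "2"));

        Replica upToDate = new Replica();
        Replica lagging = new Replica();
        upToDate.applyUpTo(log, 2); // has applied both entries
        lagging.applyUpTo(log, 1);  // has applied only the first entry

        System.out.println("up-to-date replica reads x = " + upToDate.get("x"));
        System.out.println("lagging replica reads x = " + lagging.get("x"));
    }
}
```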

License

This project is licensed under the MIT License.

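About the book 《分布式一致性算法开发实战》 and related discussion

In May 2020 I published a book titled 《分布式一致性算法开发实战》 (roughly, "Distributed Consensus Algorithms: Development in Practice"). Most of the code in the book is based on this project; put another way, I finished this project first and then wrote the book. If you are interested in the project's architecture, its design choices, or the algorithm itself, you are welcome to read the book, where I explain these in detail.

If you have questions about the book or the code design, or just want to discuss, feel free to open a topic in the discussion area of the book's Douban page. I check it regularly and reply.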

xraft's People

Contributors

dependabot[bot], hsuchungyuan, xnnyygn, xnnyygn2

xraft's Issues

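About heartbeats

If the heartbeat interval is set to 1 second, two things happen:

  1. The actual heartbeat interval is 2 seconds. My guess is that 1 second is too short for the task to run in time.
  2. The kvstore-set command returns extremely slowly. My guess is that heartbeat messages travel over the same channel, so the set command does not get executed.

If the heartbeat interval is set to 10 seconds (with the timeout increased accordingly), neither of these occurs, and the actual heartbeat interval is also 10 seconds.
I am not sure whether this is a problem with my machine.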

VotedFor is not stored when a node is candidate and receives an AppendEntriesRpc

Here the candidate should store the message sender, i.e., the current leader, in votedFor.

becomeFollower(rpc.getTerm(), null, rpc.getLeaderId(), true);

This issue can result in two leaders with the same term in the following scenario.
Consider a cluster of 5 nodes. Nodes 1 to 3 become candidates with term 1. Nodes 4 and 5 are followers.

  1. Node 1 receives votes from Nodes 4 and 5 and becomes the leader. Nodes 2 and 3 still remain candidates.
  2. Node 1 receives an AppendEntry request and sends messages to Nodes 2 and 3.
  3. Nodes 2 and 3 step down to followers and set votedFor to null (by the code above). They have not written any log entries yet.
  4. Node 4 restarts and becomes a candidate with term 1. It requests votes from Nodes 2 and 3.
  5. Nodes 2 and 3 find that their votedFor is null and that their logs are not newer than Node 4's, so they vote for Node 4.
  6. Node 4 becomes the leader.

Now both Node 1 and Node 4 are leaders with term 1, which violates Raft's safety property.
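
The effect of the votedFor value in the scenario above can be sketched in isolation. This is a minimal model, assuming a node grants a vote in its current term only when votedFor is null or already equals the requester. NodeState, stepDownForLeader, and onRequestVote are hypothetical names, and log comparison is deliberately left out.

```java
public class VotedForSketch {

    /** Minimal per-node election state; log comparison is deliberately omitted. */
    static final class NodeState {
        int term;
        String votedFor; // null means no vote has been cast in this term

        NodeState(int term, String votedFor) {
            this.term = term;
            this.votedFor = votedFor;
        }

        /**
         * A candidate steps down after hearing from a leader.
         * recordLeaderAsVote = false mimics passing null as votedFor;
         * recordLeaderAsVote = true stores the leader id as the issue proposes.
         */
        void stepDownForLeader(int rpcTerm, String leaderId, boolean recordLeaderAsVote) {
            term = rpcTerm;
            votedFor = recordLeaderAsVote ? leaderId : null;
        }

        /** Grants at most one vote per term. */
        boolean onRequestVote(int candidateTerm, String candidateId) {
            if (candidateTerm < term) {
                return false; // stale candidate
            }
            if (candidateTerm > term) {
                term = candidateTerm; // new term, so the old vote no longer applies
                votedFor = null;
            }
            if (votedFor == null || votedFor.equals(candidateId)) {
                votedFor = candidateId;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Node 2 is a candidate in term 1 and has voted for itself.
        NodeState cleared = new NodeState(1, "2");
        cleared.stepDownForLeader(1, "1", false);
        System.out.println("votedFor cleared, grants vote to node 4: "
                + cleared.onRequestVote(1, "4"));

        NodeState recorded = new NodeState(1, "2");
        recorded.stepDownForLeader(1, "1", true);
        System.out.println("votedFor recorded, grants vote to node 4: "
                + recorded.onRequestVote(1, "4"));
    }
}
```

In this model, leaving the node's existing self-vote untouched when the term is unchanged would also cause the second request to be refused.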

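It would be better if the source code were organized by chapter.

So far the biggest drawback of the book, to me, is that the code is too disorganized: the code examples use many methods that the text has not introduced beforehand. If there is a second edition, I hope this can be fixed, and it would be even better if the source code were organized by chapter.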

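A question about Chapter 2

After reading your Chapter 2 and the short version of the Raft paper, I have a question:
In term 2, the client's request is sent to S1, and S1 replicates the log entry to a majority of nodes but crashes before committing it. At that point the client should consider the request unsuccessful, right? But in figure (e), once S1 commits an entry from term 4, the yellow entry gets committed as well, which means the client's request was in fact committed. Is this a kind of inconsistency?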

image

[ERROR] Failed to execute goal on project xraft-kvstore: Could not resolve dependencies for project in.xnnyygn.xraft:xraft-kvstore:jar:0.1.0-SNAPSHOT: Could not find artifact in.xnnyygn.xraft:xraft-core:jar:0.1.0-SNAPSHOT -> [Help 1]

While trying to build the source code, I executed this command:
cd xraft-kvstore && mvn package assembly:single

I got this error:

[INFO] ------------------------------------------------------------------------                                                    
[INFO] BUILD FAILURE                                                                                                               
[INFO] ------------------------------------------------------------------------                                                    
[INFO] Total time:  13.622 s                                                                                                      
[INFO] Finished at: 2020-09-28T10:16:39Z                                                                                           
[INFO] ------------------------------------------------------------------------                                                    
[ERROR] Failed to execute goal on project xraft-kvstore: Could not resolve dependencies for project in.xnnyygn.xraft:xraft-kvstore:jar:0.1.0-SNAPSHOT: Could not find artifact in.xnnyygn.xraft:xraft-core:jar:0.1.0-SNAPSHOT -> [Help 1]
[ERROR]                                                                                                                            
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.                                                
[ERROR] Re-run Maven using the -X switch to enable full debug logging.                                                             
[ERROR]                                                                                                                            
[ERROR] For more information about the errors and possible solutions, please read the following articles:                          
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

xraft-compile-error

Am I missing an import...?

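A question about a problem I am running into

Take A, B, and C as the cluster.
  1. Three log generations, log-0, log-10, and log-15, are created, and a snapshot has already been generated for log-15.
  2. A, B, and C all go down, and B's logs are deleted.
  3. A and C are started and begin an election. The term increases (say to 9), but the snapshot's term has not increased (say it is 7).
  4. B restarts and begins installing the snapshot. After the snapshot is installed, it starts appending log entries.
Problem:
When appending entries, the term in the message is compared with the term of the local snapshot. The local term comes from the snapshot that was just installed, i.e., 7, while the term of the entries being appended comes from the latest log term, 9.

The following error is then reported continuously:
AbstractLog - previous log index matches snapshot's last included index, but term not (expected 40, actual 36)

B has to wait for client operations to advance the log until a new log generation is created. Only after moving past the current log generation to the next one can B receive new entries and recover.
If shouldGenerateSnapshot is too large, the next log generation will not be created soon, and B cannot update its snapshot and can only serve requests from the current snapshot.
If kvstore-set x 12 is run,
the value can be read from A but not from B.

The cause of this problem is that the nodes went down ungracefully: the log term was updated but the snapshot term was not. Could a mechanism be added to handle this? [???]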

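If uncommitted log entries are kept in memory, can entries be overturned even after they are committed?

Suppose there are nodes 1, 2, 3, 4, and 5.
Node 1 is the leader.
A client writes a message, and the leader successfully replicates it to nodes 2 and 3 (the uncommitted entry is held in memory at this point).
The leader advances commitIndex and commits the entry (the commit has not yet been propagated to nodes 2 and 3).

Suppose node 1 now goes offline and node 2 restarts (so the uncommitted entry in its memory is gone). In the following election node 2 becomes leader (which is possible, because nodes 4 and 5 will vote for it). This would overturn the entry that node 1 committed in the previous term.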

Duplicate vote response can make illegal leader without a quorum

I noticed that the RequestVoteResult message does not contain a field identifying which node the result came from.
Besides, when processing RequestVoteResult, the node only checks the number of votes. This means a node can become leader as soon as it receives enough granted-vote messages, rather than when it is supported by enough distinct nodes.

Consider a scenario with 5 nodes. Node 1 becomes a candidate and requests a vote from Node 2. Node 2 sends two duplicate RequestVoteResult messages (due to a network error or some other fault). Node 1 can then become leader even though only one other node supports it!

I think it is a bug.
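
One way to make duplicate responses harmless is to track the set of distinct voters instead of a counter. The following is a minimal sketch, assuming the sender's id is available when the result is processed; the issue notes that RequestVoteResult has no such field, so voterId here is a hypothetical input. VoteTally and its methods are illustrative names, not xraft classes.

```java
import java.util.HashSet;
import java.util.Set;

public class VoteTallySketch {

    /** Tracks which distinct nodes have granted a vote in the current election. */
    static final class VoteTally {
        private final int clusterSize;
        private final Set<String> voters = new HashSet<>();

        VoteTally(int clusterSize, String selfId) {
            this.clusterSize = clusterSize;
            voters.add(selfId); // a candidate votes for itself
        }

        /** Records a granted vote; returns false if this voter was already counted. */
        boolean recordGrantedVote(String voterId) {
            return voters.add(voterId);
        }

        /** True once strictly more than half of the cluster has granted a vote. */
        boolean hasMajority() {
            return voters.size() > clusterSize / 2;
        }

        int distinctVotes() {
            return voters.size();
        }
    }

    public static void main(String[] args) {
        VoteTally tally = new VoteTally(5, "1");
        tally.recordGrantedVote("2");
        tally.recordGrantedVote("2"); // duplicate delivery of the same response
        System.out.println("distinct votes: " + tally.distinctVotes()
                + ", majority: " + tally.hasMajority());
    }
}
```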

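The code in the book and the repository are inconsistent

Hello. The code in the book and in the repository starts to diverge in the snapshot part, and diverges even more in the cluster membership change part. Should the repository be taken as the reference?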

Missing check for `result.getTerm() == role.getTerm()` in `doProcessRequestVoteResult()` can result in two leaders with the same term.

Description

In the NodeImpl.doProcessRequestVoteResult() method, the code does not check that result.getTerm() equals role.getTerm(). This can lead to multiple leaders with the same term, which is not allowed in the Raft protocol. As shown in the figure below, this issue can be triggered by fast election timeouts on N1.

It's unclear whether the code has any mechanisms to prevent the problem, such as filtering smaller term responses or using messageId to ignore stale responses in channelRead(). Can you confirm whether this is a bug or if there is any existing prevention mechanism?

xraft-bug1

private void doProcessRequestVoteResult(RequestVoteResult result) {
    // step down if result's term is larger than current term
    if (result.getTerm() > role.getTerm()) {
        becomeFollower(result.getTerm(), null, null, true);
        return;
    }
    // check role
    if (role.getName() != RoleName.CANDIDATE) {
        logger.debug("receive request vote result and current role is not candidate, ignore");
        return;
    }
    // do nothing if not vote granted
    if (!result.isVoteGranted()) {
        return;
    }
    int currentVotesCount = ((CandidateNodeRole) role).getVotesCount() + 1;
    int countOfMajor = context.group().getCountOfMajor();
    logger.debug("votes count {}, major node count {}", currentVotesCount, countOfMajor);
    role.cancelTimeoutOrTask();
    if (currentVotesCount > countOfMajor / 2) {
        // become leader
        logger.info("become leader, term {}", role.getTerm());
        resetReplicatingStates();
        changeToRole(new LeaderNodeRole(role.getTerm(), scheduleLogReplicationTask()));
        context.log().appendEntry(role.getTerm()); // no-op log
        context.connector().resetChannels(); // close all inbound channels
    } else {
        // update votes count
        changeToRole(new CandidateNodeRole(role.getTerm(), currentVotesCount, scheduleElectionTimeout()));
    }
}
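
The missing guard described in this issue can be sketched as a small decision function. This is a minimal illustration, assuming the result carries the term in which the vote was granted; classify and the Action enum are hypothetical names and are not part of NodeImpl.

```java
public class StaleVoteFilterSketch {

    /** What a node could do with an incoming vote result. */
    enum Action { STEP_DOWN, IGNORE, COUNT }

    /**
     * Decides how to treat a vote result given the node's current term and role.
     * The second check is the one the issue reports as missing.
     */
    static Action classify(int resultTerm, int currentTerm,
                           boolean isCandidate, boolean voteGranted) {
        if (resultTerm > currentTerm) {
            return Action.STEP_DOWN; // someone is ahead of us
        }
        if (resultTerm < currentTerm) {
            return Action.IGNORE; // reply to an earlier election, must not be counted
        }
        if (!isCandidate || !voteGranted) {
            return Action.IGNORE;
        }
        return Action.COUNT;
    }

    public static void main(String[] args) {
        // A vote granted in term 1 arrives after the candidate moved on to term 2.
        System.out.println(classify(1, 2, true, true));
        // A vote granted in the current term is counted.
        System.out.println(classify(2, 2, true, true));
    }
}
```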

ls: ./lib: No such file or directory

Running bin/xraft-kvstore -gc A,localhost,2333 B,localhost,2334 C,localhost,2335 -m group-member -i A -p2 3333 inside the xraft/xraft-kvstore/src subdirectory results in the following error:

ls: ./lib: No such file or directory
Error: -cp requires class path specification

How do I resolve this?

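Should the kvstore get operation search backwards from the end of the log?

From the source code, get reads directly from the in-memory map.
Shouldn't it really search backwards from commitIndex until it finds the key, and take the value with the newest term among the data returned by a majority of followers?
Could the author implement this logic in the source code?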

public void get(CommandRequest<GetCommand> commandRequest) {
    String key = commandRequest.getCommand().getKey();
    logger.debug("get {}", key);
    byte[] value = this.map.get(key);
    // TODO view from node state machine
    commandRequest.reply(new GetCommandResponse(value));
}

java.util.ConcurrentModificationException in `List<Entry> entries` of `MemoryEntrySequence`

When the network latency is high, a kvstore-set operation may be inserted between sending the AppendEntriesRpc message and receiving the AppendEntriesResultMessage. The entries in the AppendEntriesRpc are a view of entrySequence. By the time the AppendEntriesResultMessage is received, the list backing the entries in the AppendEntriesRpc has been concurrently modified, which triggers java.util.ConcurrentModificationException.

The kvstore-set operation results in appending a log entry to the entrySequence:

public GeneralEntry appendEntry(int term, byte[] command) {
    GeneralEntry entry = new GeneralEntry(entrySequence.getNextLogIndex(), term, command);
    entrySequence.append(entry);
    return entry;
}

Processing AppendEntriesResultMessage involves reading the entries that have been concurrently changed:

public int getLastEntryIndex() {
    return this.entries.isEmpty() ? this.prevLogIndex : this.entries.get(this.entries.size() - 1).getIndex();
}

Log (note that the timestamps are fake):

2022-06-04 13:36:54.100 [node] INFO  node.NodeImpl - become leader, term 1
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - node n1, role state changed -> LeaderNodeRole{term=1, logReplicationTask=LogReplicationTask{delay=999}}
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.100 [node] DEBUG node.NodeImpl - receive request vote result and current role is not candidate, ignore
** 2022-06-04 13:36:54.101 [nioEventLoopGroup-5-1] DEBUG server.Service - set x **
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - replicate log
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n2 is replicating, skip replication task
2022-06-04 13:36:54.101 [node] DEBUG node.NodeImpl - node n3 is replicating, skip replication task
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1415) ~[?:?]
        at java.util.ArrayList$SubList.size(ArrayList.java:1155) ~[?:?]
        at java.util.AbstractCollection.isEmpty(AbstractCollection.java:91) ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex(AppendEntriesRpc.java:77) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult(NodeImpl.java:694) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6(NodeImpl.java:646) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.lang.Thread.run(Thread.java:1589) [?:?]
2022-06-04 13:36:54.101 [monitor] WARN  node.NodeImpl - failure
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1415) ~[?:?]
        at java.util.ArrayList$SubList.size(ArrayList.java:1155) ~[?:?]
        at java.util.AbstractCollection.isEmpty(AbstractCollection.java:91) ~[?:?]
        at in.xnnyygn.xraft.core.rpc.message.AppendEntriesRpc.getLastEntryIndex(AppendEntriesRpc.java:77) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.doProcessAppendEntriesResult(NodeImpl.java:694) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at in.xnnyygn.xraft.core.node.NodeImpl.lambda$onReceiveAppendEntriesResult$6(NodeImpl.java:646) ~[xraft-core-0.1.0-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) ~[guava-32.0.0-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-32.0.0-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.lang.Thread.run(Thread.java:1589) [?:?]
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server
2022-06-04 13:36:54.120 [shutdown] INFO  server.Server - stopping server

A possible fix

Copy the subList into a new List:

diff --git a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
index 9d48010..4c2293a 100644
--- a/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
+++ b/xraft-core/src/main/java/in/xnnyygn/xraft/core/log/sequence/MemoryEntrySequence.java
@@ -22,7 +22,7 @@ public class MemoryEntrySequence extends AbstractEntrySequence {
 
     @Override
     protected List<Entry> doSubList(int fromIndex, int toIndex) {
-        return entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset);
+        return new ArrayList<>(entries.subList(fromIndex - logIndexOffset, toIndex - logIndexOffset));
     }
 
     @Override
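
The failure and the proposed fix can be reproduced outside xraft with a plain ArrayList. The following is a minimal single-threaded sketch: a subList view becomes unusable once the backing list is structurally modified, while a copy made before the modification stays readable. SubListViewSketch and failsAfterAppend are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SubListViewSketch {

    /**
     * Takes a slice of a list, appends to the backing list, then reads the slice.
     * Returns true if the read throws ConcurrentModificationException.
     */
    static boolean failsAfterAppend(boolean copySlice) {
        List<String> entries = new ArrayList<>();
        entries.add("entry-1");
        entries.add("entry-2");

        List<String> view = entries.subList(0, 2);
        // Either keep the live view or take an independent copy of it.
        List<String> slice = copySlice ? new ArrayList<>(view) : view;

        entries.add("entry-3"); // structural modification of the backing list

        try {
            slice.isEmpty();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("live view fails: " + failsAfterAppend(false));
        System.out.println("copied slice fails: " + failsAfterAppend(true));
    }
}
```

The copy costs an extra allocation per call, which is the trade-off of the fix shown in the diff.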

Xraft-kvstore does not satisfy linearizability

The read operation in the kvstore (i.e., the kvstore-get command) can be processed by followers without a quorum round trip, which violates linearizability (e.g., by reading stale values).

public void get(CommandRequest<GetCommand> commandRequest) {
    String key = commandRequest.getCommand().getKey();
    logger.debug("get {}", key);
    byte[] value = this.map.get(key);
    // TODO view from node state machine
    commandRequest.reply(new GetCommandResponse(value));
}

I further discovered that the develop branch implements the readindex protocol to guarantee linearizability for reads. But the documentation does not explain the consistency levels, so few users will find the develop branch, and users might assume the kvstore satisfies linearizability because it is based on Raft.

A possible fix

For reference, Consul has three read modes: default (the leader processes reads without a quorum), consistent (like the develop branch of Xraft-kvstore), and stale (like the main branch).

It would be better for Xraft's documentation/README to state the consistency modes explicitly. Specifically, it should explain that the kvstore-get operation may read stale values and direct users who need consistent reads to the develop branch. Additionally, if feasible, Xraft could offer a default level, akin to Consul's, for users seeking a compromise between availability and consistency.

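In the book, a CANDIDATE also has an election timeout task

Why does a CANDIDATE also have an election timeout task? Is it to prevent the whole vote-request process from taking too long?
If the vote requests get no response for a long time, does the node increment its term by 1 and start a new round of voting?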

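A question about the first NoOp entry after a successful election

I am writing my own implementation following your book, with some differences in code structure. I have mostly finished the election and log replication parts; snapshots are not implemented yet (in principle, snapshots do not seem to be required).

My question: if all three nodes are starting for the first time, then after a leader is elected and sends the first NoOp entry to the followers, should success in the follower's reply be true or false? My code currently replies false, because the checkPreviousLogMatched method cannot find the entry for the previous index. After commenting out the snapshot-related parts of your code, it also replies false, and I do not know whether this is normal.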
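
For comparison, the previous-entry check as described in the Raft paper can be sketched as follows. This is a minimal model, assuming log indexes start at 1 and that a leader whose log was empty before the NoOp entry sends a previous index of 0. previousLogMatches is a hypothetical helper, and xraft's own checkPreviousLogMatched may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrevLogCheckSketch {

    /**
     * Follower-side consistency check for AppendEntries.
     * terms.get(i) holds the term of the entry at log index i + 1.
     */
    static boolean previousLogMatches(List<Integer> terms, int prevLogIndex, int prevLogTerm) {
        if (prevLogIndex == 0) {
            return true; // nothing precedes the first entry, so nothing can mismatch
        }
        if (prevLogIndex < 0 || prevLogIndex > terms.size()) {
            return false; // the follower does not have the preceding entry
        }
        return terms.get(prevLogIndex - 1) == prevLogTerm;
    }

    public static void main(String[] args) {
        List<Integer> emptyLog = new ArrayList<>();
        // First NoOp entry in a fresh cluster: previous index 0, previous term 0.
        System.out.println("fresh follower accepts: " + previousLogMatches(emptyLog, 0, 0));

        List<Integer> log = Arrays.asList(1, 1, 2);
        System.out.println("matching previous entry: " + previousLogMatches(log, 3, 2));
        System.out.println("term mismatch: " + previousLogMatches(log, 3, 1));
    }
}
```

Under these assumptions, a follower with an empty log accepts the first entry, so a reply of false in that case would point at how the index-0 case is handled.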
