pikalabs / floyd

A Raft consensus implementation that is simple and understandable

License: GNU General Public License v3.0

Makefile 7.13% C++ 90.72% C 1.45% Shell 0.71%
consensus library raft raft-protocol

floyd's People

Contributors

baotiao, catkang, flabby, kernelmaker, leviathan1995, wenduo


floyd's Issues

support multi group in floyd

Multiple groups could break up the big lock in floyd. However, we need to consider how different keys would be assigned to the different groups.
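For illustration, a minimal sketch of one possible key-to-group mapping (RaftGroup and the class below are hypothetical, not floyd code): hashing the key fixes its group, so each group serializes only its own keys under its own lock.

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-group state; in floyd this would wrap one raft
// instance with its own log, lock, and peer threads.
struct RaftGroup { /* ... */ };

class MultiGroupFloyd {
 public:
  explicit MultiGroupFloyd(size_t group_num) : groups_(group_num) {}

  // Route a key to a fixed group by hash, so each group only ever
  // contends on its own lock, never on a global one.
  RaftGroup* GroupForKey(const std::string& key) {
    size_t idx = std::hash<std::string>()(key) % groups_.size();
    return &groups_[idx];
  }

 private:
  std::vector<RaftGroup> groups_;
};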

Build problem

CentOS 7.2
After successfully building floyd, pink, slash, and rocksdb, building the simple and redis examples fails:
Makefile:9: Warning: slash path missing, using default
Makefile:16: Warning: slash path missing, using default
Makefile:23: Warning: rocksdb path missing, using default
g++ -pg -g -O2 -ggdb3 -pipe -fPIC -W -Wwrite-strings -Wpointer-arith -Wreorder -Wswitch -Wsign-promo -Wredundant-decls -Wformat -D_GNU_SOURCE -D__STDC_FORMAT_MACROS -std=c++11 -gdwarf-2 -Wno-redundant-decls -Wno-unused-variable -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -I../../.. -I/root/downloads/floyd/floyd/example/simple/third/slash -I/root/downloads/floyd/floyd/example/simple/third/pink -I/root/downloads/floyd/floyd/example/simple/third/rocksdb/include -o t t.o -I../../../ -I../../third/rocksdb/include -I../../third/slash/ -I../../third/pink/ -L../../lib/ -L../../third/slash/slash/lib/ -L../../third/rocksdb/ -L../../third/pink/pink/lib/ -lfloyd -lpink -lslash -lrocksdb -lsnappy -lprotobuf -lz -lbz2 -lrt -lssl -lcrypto -lpthread
../../third/rocksdb//librocksdb.a(db_impl.o): In function `ZSTD_Supported':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
../../third/rocksdb//librocksdb.a(column_family.o): In function `ZSTD_Supported':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
../../third/rocksdb//librocksdb.a(format.o): In function `ZSTD_Uncompress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:775: undefined reference to `ZSTD_createDCtx'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:778: undefined reference to `ZSTD_decompress_usingDict'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:779: undefined reference to `ZSTD_freeDCtx'
../../third/rocksdb//librocksdb.a(format.o): In function `LZ4_Uncompress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:611: undefined reference to `LZ4_createStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:618: undefined reference to `LZ4_decompress_safe_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:619: undefined reference to `LZ4_freeStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:611: undefined reference to `LZ4_createStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:618: undefined reference to `LZ4_decompress_safe_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:619: undefined reference to `LZ4_freeStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:614: undefined reference to `LZ4_setStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:614: undefined reference to `LZ4_setStreamDecode'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `ZSTD_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:738: undefined reference to `ZSTD_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:742: undefined reference to `ZSTD_createCCtx'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:745: undefined reference to `ZSTD_compress_usingDict'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:746: undefined reference to `ZSTD_freeCCtx'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:546: undefined reference to `LZ4_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:551: undefined reference to `LZ4_createStream'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:559: undefined reference to `LZ4_compress_fast_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:565: undefined reference to `LZ4_freeStream'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4HC_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:665: undefined reference to `LZ4_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:670: undefined reference to `LZ4_createStreamHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:671: undefined reference to `LZ4_resetStreamHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:676: undefined reference to `LZ4_loadDictHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:681: undefined reference to `LZ4_compress_HC_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:687: undefined reference to `LZ4_freeStreamHC'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:554: undefined reference to `LZ4_loadDict'
collect2: error: ld returned 1 exit status
make: *** [t] Error 1

Why is the commit index persisted?

Maybe the commit index shouldn't be persisted in RaftMeta; instead, it will be set correctly by the new leader after its first log entry has been appended on a quorum.
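For reference, a sketch of how a leader can derive the commit index from volatile state alone (names here are illustrative, not floyd's actual members): take the majority-replicated match index and commit it only if that entry is from the current term, per section 5.4.2 of the Raft paper. This is why the value can always be reconstructed after a restart.

#include <algorithm>
#include <cstdint>
#include <vector>

// Recompute the commit index from the leader's volatile matchIndex[].
// `term_at` is a hypothetical lookup from log index to entry term.
// A return of 0 means "keep the previous commit index".
uint64_t RecomputeCommitIndex(std::vector<uint64_t> match_index,  // copy: we sort it
                              uint64_t leader_last_index,
                              uint64_t current_term,
                              uint64_t (*term_at)(uint64_t index)) {
  match_index.push_back(leader_last_index);  // the leader counts as a replica
  std::sort(match_index.begin(), match_index.end());
  // The low median of the sorted indexes is replicated on a majority.
  uint64_t candidate = match_index[(match_index.size() - 1) / 2];
  if (term_at(candidate) == current_term) {
    return candidate;  // safe to commit: entry is from the current term
  }
  return 0;
}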

remove_server node-removal bug

When running the remove_server test case to delete a node, we found that calling the RemoveServer interface always crashes. Looking at the code, replaying the final remove-node command ends up calling the function rocksdb::Status FloydApply::MembershipChange(const std::string& ip_port, bool add).
(screenshot omitted)

Build problem

Building with libprotoc 3.4.0 reports errors.
Which version is required?
Update: the problem above has been solved; I ran ./proto/pr.sh.
One suggestion: ./proto/pr.sh should be run before the makefile; otherwise it is easy to end up with version mismatches.

Build error

src/http_conn.cc: In member function ‘bool pink::HttpConn::BuildRequestHeader()’:
src/http_conn.cc:333:32: error: invalid static_cast from type ‘int64_t* {aka long long int*}’ to type ‘long int*’
static_cast<long*>(&tmp));

Down or slow nodes cause poor availability

Down or slow nodes cause huge drops in performance, and the problem becomes more serious over time.

The PeerThread for an abnormal node needs more and more time to fetch the entries it must send, and this cost grows over time.

Meanwhile, that thread holds the global context mutex, preventing any other PeerThread from advancing the commit index.

So from the user's perspective performance gets worse and worse, until the cluster is effectively unavailable.
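A sketch of one mitigation, assuming a single context mutex as described above: copy the fields a peer round needs while holding the lock, then release it before the slow log reads and the RPC, so a lagging peer can no longer stall the others.

#include <cstdint>
#include <mutex>

struct Context {
  std::mutex mu;
  uint64_t current_term = 0;
  uint64_t commit_index = 0;
};

// Illustrative peer-loop body: snapshot what we need under the lock,
// then do the expensive work (reading entries, network I/O) unlocked.
void PeerRound(Context* ctx, uint64_t next_index) {
  uint64_t term, commit;
  {
    std::lock_guard<std::mutex> guard(ctx->mu);
    term = ctx->current_term;
    commit = ctx->commit_index;
  }  // lock released: slow disk/network work below can't block others

  // ReadEntries(next_index, ...);          // hypothetical slow log read
  // SendAppendEntries(term, commit, ...);  // hypothetical RPC
  (void)term; (void)commit; (void)next_index;
}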

support Lock operation

In our use cases, the get/set interface sometimes cannot satisfy our requirements, for example in pika_hub and efan. We want to use floyd for leader election, so it would be best to have a distributed lock interface.
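A rough sketch of what such an interface could look like (a hypothetical class, not floyd's API; note that the v2.0.5 proto shown later on this page already defines kTryLock/kUnLock entry types with holder and lease_end fields):

#include <cstdint>
#include <string>

// Hypothetical distributed-lock interface layered on the replicated
// log: TryLock/UnLock would be proposed as kTryLock/kUnLock entries,
// so every replica agrees on the holder and on the lease deadline.
class LockService {
 public:
  virtual ~LockService() = default;

  // Acquire `name` for `holder` until now + lease_ms. Internally this
  // would append a kTryLock entry and wait until it is applied; it
  // returns true only if no unexpired lease by another holder exists.
  virtual bool TryLock(const std::string& name, const std::string& holder,
                       uint64_t lease_ms) = 0;

  // Release the lock if `holder` still owns it (a kUnLock entry).
  virtual bool UnLock(const std::string& name, const std::string& holder) = 0;
};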

Membership change may cause poor availability

A new server that has not yet been added to the cluster will constantly send RequestVote RPCs with ever-higher term numbers to the others, and this triggers new Raft elections.

But the new server cannot receive any AppendEntries RPC, so it times out, repeats the same process, and causes election after election.

Certainly, this results in very poor availability.
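The usual remedy, described in the Raft dissertation (floyd does not implement it, per this issue), is a Pre-Vote phase: a would-be candidate first asks whether peers would grant it a vote, and nobody increments a term. A receiver-side sketch:

#include <cstdint>

// Pre-Vote check on the receiver side (sketch). Grant a pre-vote only
// if we have not heard from a live leader recently and the requester's
// log is at least as up-to-date. Crucially, neither side bumps
// current_term, so an unjoined server cannot disrupt the cluster.
bool HandlePreVote(uint64_t req_last_log_term, uint64_t req_last_log_index,
                   uint64_t my_last_log_term, uint64_t my_last_log_index,
                   bool heard_from_leader_within_election_timeout) {
  if (heard_from_leader_within_election_timeout) return false;
  if (req_last_log_term != my_last_log_term)
    return req_last_log_term > my_last_log_term;
  return req_last_log_index >= my_last_log_index;
}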

optimize the log library in floyd

Right now the logging in floyd is logger, originally copied from leveldb's log. It has some weaknesses: when we restart the node the old log is deleted, it doesn't support splitting logs by date, and so on.
Since we didn't pay much attention to it during early development, I think we need to optimize it now.
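As one illustration of the "split by date, keep old files" behavior, a minimal sketch (not floyd's logger API):

#include <ctime>
#include <fstream>
#include <string>

// Sketch of a date-split logger: append to <prefix>.YYYY-MM-DD and
// reopen when the day changes, so restarts never delete old logs.
class DailyLogger {
 public:
  explicit DailyLogger(std::string prefix) : prefix_(std::move(prefix)) {}

  void Write(const std::string& line) {
    std::string today = Today();
    if (today != day_) {                                 // day changed: rotate
      day_ = today;
      if (out_.is_open()) out_.close();
      out_.open(prefix_ + "." + day_, std::ios::app);    // append, keep old file
    }
    out_ << line << '\n';
  }

 private:
  static std::string Today() {
    char buf[11];
    std::time_t t = std::time(nullptr);
    std::strftime(buf, sizeof(buf), "%Y-%m-%d", std::localtime(&t));
    return buf;
  }
  std::string prefix_, day_;
  std::ofstream out_;
};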

A small question about the implementation of void Peer::RequestVoteRPC()

When RequestVote RPCs are issued in parallel, the current implementation turns the candidate back into a follower as soon as any single peer returns VOTE_DENIED. But according to the paper, shouldn't the candidate become LEADER as soon as a majority return VOTE_GRANTED? When VOTE_DENIED is returned, wouldn't it be enough to simply record it? Thanks.
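For comparison, a sketch of the tallying the paper describes, with illustrative names: a denial forces step-down only when it carries a higher term; otherwise it is merely recorded, and a majority of grants wins.

#include <cstddef>
#include <cstdint>

enum class VoteResult { kGranted, kDenied };

// Tally one RequestVote reply (sketch). `granted` should start at 1
// for the candidate's vote for itself. Returns true once the
// candidate may become leader.
bool OnVoteReply(VoteResult result, uint64_t reply_term,
                 uint64_t* current_term, bool* is_candidate,
                 size_t* granted, size_t cluster_size) {
  if (reply_term > *current_term) {   // stale candidate: must step down
    *current_term = reply_term;
    *is_candidate = false;
    return false;
  }
  if (result == VoteResult::kGranted && ++*granted > cluster_size / 2) {
    return true;                      // majority reached: become leader
  }
  return false;                       // denial: just recorded, keep waiting
}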

quit peer thread when node become follower

Every node keeps its peer threads running even after it becomes a follower. It would be better to quit the peer threads when the node becomes a follower and start them again when it becomes leader, since only the leader needs peer threads to send heartbeats.
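A sketch of tying peer-thread lifetime to role changes, with a hypothetical PeerThread that has idempotent Start/Stop:

#include <vector>

enum class Role { kLeader, kFollower, kCandidate };

// Hypothetical peer thread; Start/Stop are assumed idempotent.
struct PeerThread {
  void Start() { /* spawn heartbeat/replication loop */ }
  void Stop()  { /* signal the loop to exit and join it */ }
};

void OnRoleChange(Role new_role, std::vector<PeerThread*>* peers) {
  for (PeerThread* p : *peers) {
    if (new_role == Role::kLeader) {
      p->Start();   // leader: begin heartbeats and replication
    } else {
      p->Stop();    // follower/candidate: nothing to send
    }
  }
}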

optimize the read operation

Right now a read operation requires an AppendEntries RPC. Both the Raft paper and the Paxos papers suggest using a leader lease to optimize read operations; we can call it Raft with lease.
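A minimal sketch of the lease bookkeeping, assuming the lease is renewed whenever a heartbeat round is acknowledged by a majority; within the lease window no other leader can have been elected, so the leader may answer reads locally (in practice clock drift needs an extra safety margin):

#include <chrono>

// Lease-based read check (sketch). After a heartbeat acknowledged by a
// majority, the leader extends the lease to heartbeat_start + timeout.
class LeaderLease {
 public:
  void Renew(std::chrono::steady_clock::time_point heartbeat_start,
             std::chrono::milliseconds election_timeout) {
    expire_ = heartbeat_start + election_timeout;
  }
  bool CanServeLocalRead() const {
    return std::chrono::steady_clock::now() < expire_;
  }
 private:
  std::chrono::steady_clock::time_point expire_{};  // epoch: no lease yet
};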

change the variable name

Some variable names do not match the Raft paper; for example, last_apply_index is called last_applied in the paper.
Renaming them would make the code more readable.

add a tool to construct log

We want to test some extreme cases, such as 5 nodes holding 5 different logs; the cluster should then elect as leader the node whose last entry has the largest term or, when terms are equal, the node with the longest log.
Right now we cover these cases by running simple tests and hoping the nodes end up with different logs, but that is not good enough. We need a tool that constructs logs directly so the tests are precise.

Is a callback like apply() supported?

From the API list it seems that only key-value style reads and writes are supported. Is there an open interface that lets users define their own behavior after an entry is successfully applied by the Raft state machine? Much complex logic cannot be expressed through key-value reads and writes.
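One common way to expose such a hook, sketched here (this is not floyd's actual API): the user registers a callback that the apply thread invokes for every committed entry.

#include <cstdint>
#include <functional>
#include <string>

// Hypothetical apply hook: invoked on every node after an entry is
// committed, letting users run arbitrary state-machine logic instead
// of (or in addition to) the built-in key-value writes.
using ApplyCallback =
    std::function<void(uint64_t index, const std::string& entry_data)>;

class StateMachine {
 public:
  void RegisterApplyCallback(ApplyCallback cb) { cb_ = std::move(cb); }

  // Called by the apply thread once an entry is known to be committed.
  void Apply(uint64_t index, const std::string& entry_data) {
    if (cb_) cb_(index, entry_data);
  }

 private:
  ApplyCallback cb_;
};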

KvServer code in the example folder cannot build successfully

kv_server.cc:156:31: error: ‘class floyd::Floyd’ has no member named ‘DirtyWrite’
Status result = floyd_->DirtyWrite(request.key(), request.value());

kv_server.cc:139:47: error: no matching function for call to ‘floyd::Floyd::GetServerStatus(std::string&)’
bool ret = floyd_->GetServerStatus(value);

lots of bugs exist

v2.0.5 is not compatible with v2.0.3

v2.0.5 is not compatible with v2.0.3 in the proto definition, as follows:

message Entry {
  enum OpType {
    kRead = 0;
    kWrite = 1;
    kDelete = 2;
    kTryLock = 4;
    kUnLock = 5;
  }
  required OpType optype = 1;
  // used in key value operator
  required uint64 term = 2;
  required string key = 3;
  optional bytes value = 4;
  // used in lock and unlock
  optional bytes holder = 5;
  optional uint64 lease_end = 6;
}

Is there a bug in int FloydImpl::ReplyRequestVote?

The comment, like the paper, says:
// if votedfor is null or candidateId, and candidate's log is at least as up-to-date
// as receiver's log, grant vote

But the RPC implementation logic does not seem to check vote_for_ip and vote_for_port inside FloydContext at all. In fact, when a follower becomes a candidate it first sets vote_for_ip and vote_for_port to itself; if a RequestVote RPC arrives at that point, it should return false directly, indicating that it has already voted in this round (for itself).

Am I understanding this correctly? Thanks.
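For reference, a sketch of the check the paper's rule implies, written against the vote_for_ip/vote_for_port fields the issue mentions (the log up-to-date comparison from section 5.4.1 still has to follow it):

#include <string>

// Receiver-side votedFor check (sketch). Grant only if we have not
// voted this term, or we already voted for this exact candidate.
bool MayGrantVote(const std::string& vote_for_ip, int vote_for_port,
                  const std::string& candidate_ip, int candidate_port) {
  if (vote_for_ip.empty()) return true;  // votedFor is null this term
  return vote_for_ip == candidate_ip && vote_for_port == candidate_port;
}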

improve floyd performance

I wrote a simple test to measure the performance of floyd.

write 10000 cost time microsecond(us) 8136502, qps 1229
      Node           | Role    |   Term    | CommitIdx |    Leader         |  VoteFor          | LastLogTerm | LastLogIdx | LastApplyIdx |
      127.0.0.1:8901      leader          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8902    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8903    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8904    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8905    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000

write 10000 cost time microsecond(us) 8325906, qps 1201
      Node           | Role    |   Term    | CommitIdx |    Leader         |  VoteFor          | LastLogTerm | LastLogIdx | LastApplyIdx |
      127.0.0.1:8901      leader          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8902    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8903    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8904    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8905    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000

The QPS is only about 1200, which is too low, so we had better improve it.

Testing the add_server1 case: a newly added node cannot join the cluster

While testing the add_server1 case, we found that if the existing cluster's log is large, the new node frequently fails to join the cluster normally. The new node never receives a heartbeat, so it keeps becoming a Candidate and trying to start new elections.
Investigation shows that when the new node joins, too many log entries are sent in a single batch, so other requests cannot be processed in time and time out.
(screenshot omitted)

Two requests:
1) Could you provide a public interface to configure
uint64_t append_entries_size_once;
uint64_t append_entries_count_once;
(see the sketch after this list)?
2) When a new node is added, the log is always copied and replayed from the very beginning. Is there any plan to optimize this?
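A sketch of the kind of options struct request 1) asks for, using the two field names from the issue (the defaults shown are invented for illustration): the peer thread would stop filling an AppendEntries request once either limit is reached, keeping single RPCs bounded.

#include <cstdint>

// Sketch of user-visible batching knobs; defaults are hypothetical.
struct Options {
  uint64_t append_entries_size_once = 1024 * 1024;  // max bytes per RPC
  uint64_t append_entries_count_once = 128;         // max entries per RPC
};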

Read critical?

Hi, while reading the Floyd code I noticed that Floyd appends a read as a log entry to the peers to avoid dirty reads. After the read log entry is applied, (1) context_->apply_cond.SignalAll(); wakes up the client blocked on the read, and (2) the client then reads the k-v from rocksdb and returns it to the user. But this seems to leave a corner case?

Between steps (1) and (2) there is a small time window. If a new update command for the same key is applied during that window, the read does not return stale data, but it does return data newer than the read's position in the log.

Of course, this problem cannot occur with a single client.
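One way to close the window described above, sketched with hypothetical types: have the apply thread capture the value for the pending read at the moment the read entry is applied, instead of waking the client to re-read rocksdb afterwards.

#include <condition_variable>
#include <mutex>
#include <string>

// Sketch: the apply thread fills in the result for a pending read at
// the moment the read entry is applied, so no later update can slip
// in between the wake-up and the rocksdb lookup.
struct PendingRead {
  std::mutex mu;
  std::condition_variable cond;
  bool done = false;
  std::string value;
};

// Called by the apply thread, still inside the apply critical section.
void ApplyReadEntry(PendingRead* r, const std::string& current_value) {
  std::lock_guard<std::mutex> guard(r->mu);
  r->value = current_value;  // snapshot at the entry's log position
  r->done = true;
  r->cond.notify_all();
}

// Called by the client thread issuing the read.
std::string WaitForRead(PendingRead* r) {
  std::unique_lock<std::mutex> lock(r->mu);
  r->cond.wait(lock, [r] { return r->done; });
  return r->value;
}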

replace string:ip with node

There are many string(ip):int(port) pairs in the code, which is ugly.
I think we should replace them with a Node struct.
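Presumably something like the following is meant (a sketch):

#include <string>

// Sketch of the suggested Node struct, replacing loose ip/port pairs.
struct Node {
  std::string ip;
  int port = 0;

  bool operator==(const Node& other) const {
    return ip == other.ip && port == other.port;
  }
  std::string ToString() const { return ip + ":" + std::to_string(port); }
};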

Multi-threaded Write() support

CmdRequest cmd;
BuildWriteRequest(key, value, &cmd);
CmdResponse response;
Status s = DoCommand(cmd, &response);
if (!s.ok()) {
  return s;
}

Could multi-threaded writes be supported here, merging multiple writes into a single batch the way leveldb does?
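A sketch of the leveldb-style write grouping this suggests (all names hypothetical): writers queue up, and the writer at the front merges everything currently queued into one proposal, so a single Raft round commits many writes.

#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical leveldb-style group commit: each writer enqueues itself;
// the writer at the front drains every queued request into one batch,
// proposes it as a single raft entry, then wakes the merged writers.
struct Writer {
  std::string key, value;
  bool done = false;
  std::condition_variable cv;
};

class WriteQueue {
 public:
  void Write(Writer* w) {
    std::unique_lock<std::mutex> lock(mu_);
    queue_.push_back(w);
    // Wait until we are at the front, or another writer committed us.
    while (!w->done && w != queue_.front()) w->cv.wait(lock);
    if (w->done) return;               // merged into an earlier batch

    // We are the batch leader: collect everything queued right now.
    std::vector<Writer*> batch(queue_.begin(), queue_.end());
    lock.unlock();
    ProposeAsOneEntry(batch);          // hypothetical: one raft append
    lock.lock();

    for (Writer* done_w : batch) {     // wake the writers we merged
      queue_.pop_front();
      if (done_w != w) { done_w->done = true; done_w->cv.notify_one(); }
    }
    if (!queue_.empty()) queue_.front()->cv.notify_one();  // next leader
  }

 private:
  void ProposeAsOneEntry(const std::vector<Writer*>&) { /* ... */ }
  std::mutex mu_;
  std::deque<Writer*> queue_;
};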

test with jepsen

It is hard to prove that a consensus implementation is correct; there is a joke that proving a consensus algorithm correct is harder than implementing one.
Many consensus implementations now pass the Jepsen test to build confidence in their correctness, so floyd needs to pass Jepsen too.

Separate read and write operation

I just read the floyd code and the Raft paper. According to the paper, read operations are forwarded to the leader, and the leader holds the whole committed log, that is, the complete meta info.

So is it possible to separate read operations from write operations? In that case, for a read operation the leader would just return to the client without waiting for AppendEntries responses from a majority of servers. @CatKang @baotiao
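Dropping the quorum round entirely would not be safe (a deposed leader could serve stale data), but the ReadIndex scheme from the Raft dissertation comes close: one heartbeat round with no log append confirms leadership, then the read is served locally. A sketch with hypothetical helpers:

#include <cstdint>
#include <string>

// Hypothetical helpers, not floyd's actual API; bodies are placeholders.
static bool BroadcastHeartbeatAndWaitMajority() { return true; }  // still leader?
static void WaitApplied(uint64_t /*index*/) {}    // block until apply >= index
static uint64_t CurrentCommitIndex() { return 0; }
static std::string LocalGet(const std::string&) { return ""; }

// ReadIndex-style linearizable read on the leader (sketch): no log
// entry is appended; one heartbeat round proves we are still leader,
// then the read is answered locally once the state machine catches up.
bool ReadIndexGet(const std::string& key, std::string* value) {
  uint64_t read_index = CurrentCommitIndex();    // 1. snapshot commit index
  if (!BroadcastHeartbeatAndWaitMajority()) {    // 2. confirm leadership
    return false;                                //    deposed: client must retry
  }
  WaitApplied(read_index);                       // 3. wait for apply to catch up
  *value = LocalGet(key);                        // 4. answer from local state
  return true;
}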
