pikalabs / floyd

A Raft consensus implementation that is simple and understandable

License: GNU General Public License v3.0

Makefile 7.13% C++ 90.72% C 1.45% Shell 0.71%
consensus library raft raft-protocol

floyd's People

Contributors

baotiao, catkang, flabby, kernelmaker, leviathan1995, wenduo


floyd's Issues

support multi group in floyd

Multiple groups could break up the big lock in floyd. However, we need to consider how different keys would be assigned to the different groups.
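For illustration, a minimal sketch of one possible key-to-group mapping (RaftGroup and the class below are hypothetical, not floyd code): hashing the key fixes its group, so each group serializes only its own keys under its own lock.

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-group state; in floyd this would wrap one raft
// instance with its own log, lock, and peer threads.
struct RaftGroup { /* ... */ };

class MultiGroupFloyd {
 public:
  explicit MultiGroupFloyd(size_t group_num) : groups_(group_num) {}

  // Route a key to a fixed group by hash, so each group only ever
  // contends on its own lock, never on a global one.
  RaftGroup* GroupForKey(const std::string& key) {
    size_t idx = std::hash<std::string>()(key) % groups_.size();
    return &groups_[idx];
  }

 private:
  std::vector<RaftGroup> groups_;
};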

Build problem

CentOS 7.2
After successfully building floyd, pink, slash, and rocksdb, building the simple and redis examples fails:
Makefile:9: Warning: slash path missing, using default
Makefile:16: Warning: slash path missing, using default
Makefile:23: Warning: rocksdb path missing, using default
g++ -pg -g -O2 -ggdb3 -pipe -fPIC -W -Wwrite-strings -Wpointer-arith -Wreorder -Wswitch -Wsign-promo -Wredundant-decls -Wformat -D_GNU_SOURCE -D__STDC_FORMAT_MACROS -std=c++11 -gdwarf-2 -Wno-redundant-decls -Wno-unused-variable -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -I../../.. -I/root/downloads/floyd/floyd/example/simple/third/slash -I/root/downloads/floyd/floyd/example/simple/third/pink -I/root/downloads/floyd/floyd/example/simple/third/rocksdb/include -o t t.o -I../../../ -I../../third/rocksdb/include -I../../third/slash/ -I../../third/pink/ -L../../lib/ -L../../third/slash/slash/lib/ -L../../third/rocksdb/ -L../../third/pink/pink/lib/ -lfloyd -lpink -lslash -lrocksdb -lsnappy -lprotobuf -lz -lbz2 -lrt -lssl -lcrypto -lpthread
../../third/rocksdb//librocksdb.a(db_impl.o): In function `ZSTD_Supported':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
../../third/rocksdb//librocksdb.a(column_family.o): In function `ZSTD_Supported':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:84: undefined reference to `ZSTD_versionNumber'
../../third/rocksdb//librocksdb.a(format.o): In function `ZSTD_Uncompress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:775: undefined reference to `ZSTD_createDCtx'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:778: undefined reference to `ZSTD_decompress_usingDict'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:779: undefined reference to `ZSTD_freeDCtx'
../../third/rocksdb//librocksdb.a(format.o): In function `LZ4_Uncompress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:611: undefined reference to `LZ4_createStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:618: undefined reference to `LZ4_decompress_safe_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:619: undefined reference to `LZ4_freeStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:611: undefined reference to `LZ4_createStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:618: undefined reference to `LZ4_decompress_safe_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:619: undefined reference to `LZ4_freeStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:614: undefined reference to `LZ4_setStreamDecode'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:614: undefined reference to `LZ4_setStreamDecode'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `ZSTD_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:738: undefined reference to `ZSTD_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:742: undefined reference to `ZSTD_createCCtx'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:745: undefined reference to `ZSTD_compress_usingDict'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:746: undefined reference to `ZSTD_freeCCtx'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:546: undefined reference to `LZ4_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:551: undefined reference to `LZ4_createStream'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:559: undefined reference to `LZ4_compress_fast_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:565: undefined reference to `LZ4_freeStream'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4HC_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:665: undefined reference to `LZ4_compressBound'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:670: undefined reference to `LZ4_createStreamHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:671: undefined reference to `LZ4_resetStreamHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:676: undefined reference to `LZ4_loadDictHC'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:681: undefined reference to `LZ4_compress_HC_continue'
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:687: undefined reference to `LZ4_freeStreamHC'
../../third/rocksdb//librocksdb.a(block_based_table_builder.o): In function `LZ4_Compress':
/root/downloads/floyd/floyd/third/rocksdb/./util/compression.h:554: undefined reference to `LZ4_loadDict'
collect2: error: ld returned 1 exit status
make: *** [t] Error 1

Why is the commit index persisted?

Maybe the commit index shouldn't be persisted in RaftMeta; instead, it will be set correctly by the new leader after its first log entry has been appended on a quorum.
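For reference, a sketch of how a leader can derive the commit index from volatile state alone (names here are illustrative, not floyd's actual members): take the majority-replicated match index and commit it only if that entry is from the current term, per section 5.4.2 of the Raft paper. This is why the value can always be reconstructed after a restart.

#include <algorithm>
#include <cstdint>
#include <vector>

// Recompute the commit index from the leader's volatile matchIndex[].
// `term_at` is a hypothetical lookup from log index to entry term.
// A return of 0 means "keep the previous commit index".
uint64_t RecomputeCommitIndex(std::vector<uint64_t> match_index,  // copy: we sort it
                              uint64_t leader_last_index,
                              uint64_t current_term,
                              uint64_t (*term_at)(uint64_t index)) {
  match_index.push_back(leader_last_index);  // the leader counts as a replica
  std::sort(match_index.begin(), match_index.end());
  // The low median of the sorted indexes is replicated on a majority.
  uint64_t candidate = match_index[(match_index.size() - 1) / 2];
  if (term_at(candidate) == current_term) {
    return candidate;  // safe to commit: entry is from the current term
  }
  return 0;
}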

remove_server node-removal bug

When running the remove_server test case to delete a node, we found that calling the RemoveServer interface always crashes. Looking at the code, replaying the final remove-node command ends up calling the function rocksdb::Status FloydApply::MembershipChange(const std::string& ip_port, bool add).
(screenshot omitted)

Build problem

Building with libprotoc 3.4.0 reports errors.
Which version is required?
Update: the problem above has been solved; I ran ./proto/pr.sh.
One suggestion: ./proto/pr.sh should be run before the makefile; otherwise it is easy to end up with version mismatches.

Build error

src/http_conn.cc: In member function ‘bool pink::HttpConn::BuildRequestHeader()’:
src/http_conn.cc:333:32: error: invalid static_cast from type ‘int64_t* {aka long long int*}’ to type ‘long int*’
static_cast<long*>(&tmp));

Down or slow nodes cause poor availability

Down or slow nodes cause huge drops in performance, and the problem becomes more serious over time.

The PeerThread for an abnormal node needs more and more time to fetch the entries it must send, and this cost grows over time.

Meanwhile, that thread holds the global context mutex, preventing any other PeerThread from advancing the commit index.

So from the user's perspective performance gets worse and worse, until the cluster is effectively unavailable.
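A sketch of one mitigation, assuming a single context mutex as described above: copy the fields a peer round needs while holding the lock, then release it before the slow log reads and the RPC, so a lagging peer can no longer stall the others.

#include <cstdint>
#include <mutex>

struct Context {
  std::mutex mu;
  uint64_t current_term = 0;
  uint64_t commit_index = 0;
};

// Illustrative peer-loop body: snapshot what we need under the lock,
// then do the expensive work (reading entries, network I/O) unlocked.
void PeerRound(Context* ctx, uint64_t next_index) {
  uint64_t term, commit;
  {
    std::lock_guard<std::mutex> guard(ctx->mu);
    term = ctx->current_term;
    commit = ctx->commit_index;
  }  // lock released: slow disk/network work below can't block others

  // ReadEntries(next_index, ...);          // hypothetical slow log read
  // SendAppendEntries(term, commit, ...);  // hypothetical RPC
  (void)term; (void)commit; (void)next_index;
}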

support Lock operation

In our use cases, the get/set interface sometimes cannot satisfy our requirements, for example in pika_hub and efan. We want to use floyd for leader election, so it would be best to have a distributed lock interface.
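A rough sketch of what such an interface could look like (a hypothetical class, not floyd's API; note that the v2.0.5 proto shown later on this page already defines kTryLock/kUnLock entry types with holder and lease_end fields):

#include <cstdint>
#include <string>

// Hypothetical distributed-lock interface layered on the replicated
// log: TryLock/UnLock would be proposed as kTryLock/kUnLock entries,
// so every replica agrees on the holder and on the lease deadline.
class LockService {
 public:
  virtual ~LockService() = default;

  // Acquire `name` for `holder` until now + lease_ms. Internally this
  // would append a kTryLock entry and wait until it is applied; it
  // returns true only if no unexpired lease by another holder exists.
  virtual bool TryLock(const std::string& name, const std::string& holder,
                       uint64_t lease_ms) = 0;

  // Release the lock if `holder` still owns it (a kUnLock entry).
  virtual bool UnLock(const std::string& name, const std::string& holder) = 0;
};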

Membership change may cause poor availability

A new server that has not yet been added to the cluster will constantly send RequestVote RPCs with ever-higher term numbers to the others, and this triggers new Raft elections.

But the new server cannot receive any AppendEntries RPC, so it times out, repeats the same process, and causes election after election.

Certainly, this results in very poor availability.
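The usual remedy, described in the Raft dissertation (floyd does not implement it, per this issue), is a Pre-Vote phase: a would-be candidate first asks whether peers would grant it a vote, and nobody increments a term. A receiver-side sketch:

#include <cstdint>

// Pre-Vote check on the receiver side (sketch). Grant a pre-vote only
// if we have not heard from a live leader recently and the requester's
// log is at least as up-to-date. Crucially, neither side bumps
// current_term, so an unjoined server cannot disrupt the cluster.
bool HandlePreVote(uint64_t req_last_log_term, uint64_t req_last_log_index,
                   uint64_t my_last_log_term, uint64_t my_last_log_index,
                   bool heard_from_leader_within_election_timeout) {
  if (heard_from_leader_within_election_timeout) return false;
  if (req_last_log_term != my_last_log_term)
    return req_last_log_term > my_last_log_term;
  return req_last_log_index >= my_last_log_index;
}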

optimize the log library in floyd

Right now the logging in floyd is logger, originally copied from leveldb's log. It has some weaknesses: when we restart the node the old log is deleted, it doesn't support splitting logs by date, and so on.
Since we didn't pay much attention to it during early development, I think we need to optimize it now.
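As one illustration of the "split by date, keep old files" behavior, a minimal sketch (not floyd's logger API):

#include <ctime>
#include <fstream>
#include <string>

// Sketch of a date-split logger: append to <prefix>.YYYY-MM-DD and
// reopen when the day changes, so restarts never delete old logs.
class DailyLogger {
 public:
  explicit DailyLogger(std::string prefix) : prefix_(std::move(prefix)) {}

  void Write(const std::string& line) {
    std::string today = Today();
    if (today != day_) {                                 // day changed: rotate
      day_ = today;
      if (out_.is_open()) out_.close();
      out_.open(prefix_ + "." + day_, std::ios::app);    // append, keep old file
    }
    out_ << line << '\n';
  }

 private:
  static std::string Today() {
    char buf[11];
    std::time_t t = std::time(nullptr);
    std::strftime(buf, sizeof(buf), "%Y-%m-%d", std::localtime(&t));
    return buf;
  }
  std::string prefix_, day_;
  std::ofstream out_;
};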

A small question about the implementation of void Peer::RequestVoteRPC()

When RequestVote RPCs are issued in parallel, the current implementation turns the candidate back into a follower as soon as any single peer returns VOTE_DENIED. But according to the paper, shouldn't the candidate become LEADER as soon as a majority return VOTE_GRANTED? When VOTE_DENIED is returned, wouldn't it be enough to simply record it? Thanks.
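For comparison, a sketch of the tallying the paper describes, with illustrative names: a denial forces step-down only when it carries a higher term; otherwise it is merely recorded, and a majority of grants wins.

#include <cstddef>
#include <cstdint>

enum class VoteResult { kGranted, kDenied };

// Tally one RequestVote reply (sketch). `granted` should start at 1
// for the candidate's vote for itself. Returns true once the
// candidate may become leader.
bool OnVoteReply(VoteResult result, uint64_t reply_term,
                 uint64_t* current_term, bool* is_candidate,
                 size_t* granted, size_t cluster_size) {
  if (reply_term > *current_term) {   // stale candidate: must step down
    *current_term = reply_term;
    *is_candidate = false;
    return false;
  }
  if (result == VoteResult::kGranted && ++*granted > cluster_size / 2) {
    return true;                      // majority reached: become leader
  }
  return false;                       // denial: just recorded, keep waiting
}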

quit peer thread when node become follower

Every node keeps its peer threads running even after it becomes a follower. It would be better to quit the peer threads when the node becomes a follower and start them again when it becomes leader, since only the leader needs peer threads to send heartbeats.
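A sketch of tying peer-thread lifetime to role changes, with a hypothetical PeerThread that has idempotent Start/Stop:

#include <vector>

enum class Role { kLeader, kFollower, kCandidate };

// Hypothetical peer thread; Start/Stop are assumed idempotent.
struct PeerThread {
  void Start() { /* spawn heartbeat/replication loop */ }
  void Stop()  { /* signal the loop to exit and join it */ }
};

void OnRoleChange(Role new_role, std::vector<PeerThread*>* peers) {
  for (PeerThread* p : *peers) {
    if (new_role == Role::kLeader) {
      p->Start();   // leader: begin heartbeats and replication
    } else {
      p->Stop();    // follower/candidate: nothing to send
    }
  }
}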

optimize the read operation

Right now a read operation requires an AppendEntries RPC. Both the Raft paper and the Paxos papers suggest using a leader lease to optimize read operations; we can call it Raft with lease.
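A minimal sketch of the lease bookkeeping, assuming the lease is renewed whenever a heartbeat round is acknowledged by a majority; within the lease window no other leader can have been elected, so the leader may answer reads locally (in practice clock drift needs an extra safety margin):

#include <chrono>

// Lease-based read check (sketch). After a heartbeat acknowledged by a
// majority, the leader extends the lease to heartbeat_start + timeout.
class LeaderLease {
 public:
  void Renew(std::chrono::steady_clock::time_point heartbeat_start,
             std::chrono::milliseconds election_timeout) {
    expire_ = heartbeat_start + election_timeout;
  }
  bool CanServeLocalRead() const {
    return std::chrono::steady_clock::now() < expire_;
  }
 private:
  std::chrono::steady_clock::time_point expire_{};  // epoch: no lease yet
};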

change the variable name

Some variable names do not match the Raft paper; for example, last_apply_index is called last_applied in the paper.
Renaming them would make the code more readable.

add a tool to construct log

We want to test some extreme cases, such as 5 nodes holding 5 different logs; the cluster should then elect as leader the node whose last entry has the largest term or, when terms are equal, the node with the longest log.
Right now we cover these cases by running simple tests and hoping the nodes end up with different logs, but that is not good enough. We need a tool that constructs logs directly so the tests are precise.

Is a callback like apply() supported?

From the API list it seems that only key-value style reads and writes are supported. Is there an open interface that lets users define their own behavior after an entry is successfully applied by the Raft state machine? Much complex logic cannot be expressed through key-value reads and writes.
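One common way to expose such a hook, sketched here (this is not floyd's actual API): the user registers a callback that the apply thread invokes for every committed entry.

#include <cstdint>
#include <functional>
#include <string>

// Hypothetical apply hook: invoked on every node after an entry is
// committed, letting users run arbitrary state-machine logic instead
// of (or in addition to) the built-in key-value writes.
using ApplyCallback =
    std::function<void(uint64_t index, const std::string& entry_data)>;

class StateMachine {
 public:
  void RegisterApplyCallback(ApplyCallback cb) { cb_ = std::move(cb); }

  // Called by the apply thread once an entry is known to be committed.
  void Apply(uint64_t index, const std::string& entry_data) {
    if (cb_) cb_(index, entry_data);
  }

 private:
  ApplyCallback cb_;
};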

KvServer code in the example folder cannot build successfully

kv_server.cc:156:31: error: ‘class floyd::Floyd’ has no member named ‘DirtyWrite’
Status result = floyd_->DirtyWrite(request.key(), request.value());

kv_server.cc:139:47: error: no matching function for call to ‘floyd::Floyd::GetServerStatus(std::string&)’
bool ret = floyd_->GetServerStatus(value);

lots of bugs exist

v2.0.5 is not compatible with v2.0.3

v2.0.5 is not compatible with v2.0.3 in the proto definition, as follows:

message Entry {
  enum OpType {
    kRead = 0;
    kWrite = 1;
    kDelete = 2;
    kTryLock = 4;
    kUnLock = 5;
  }
  required OpType optype = 1;
  // used in key value operator
  required uint64 term = 2;
  required string key = 3;
  optional bytes value = 4;
  // used in lock and unlock
  optional bytes holder = 5;
  optional uint64 lease_end = 6;
}

Is there a bug in int FloydImpl::ReplyRequestVote?

The comment, like the paper, says:
// if votedfor is null or candidateId, and candidate's log is at least as up-to-date
// as receiver's log, grant vote

But the RPC implementation logic does not seem to check vote_for_ip and vote_for_port inside FloydContext at all. In fact, when a follower becomes a candidate it first sets vote_for_ip and vote_for_port to itself; if a RequestVote RPC arrives at that point, it should return false directly, indicating that it has already voted in this round (for itself).

Am I understanding this correctly? Thanks.
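For reference, a sketch of the check the paper's rule implies, written against the vote_for_ip/vote_for_port fields the issue mentions (the log up-to-date comparison from section 5.4.1 still has to follow it):

#include <string>

// Receiver-side votedFor check (sketch). Grant only if we have not
// voted this term, or we already voted for this exact candidate.
bool MayGrantVote(const std::string& vote_for_ip, int vote_for_port,
                  const std::string& candidate_ip, int candidate_port) {
  if (vote_for_ip.empty()) return true;  // votedFor is null this term
  return vote_for_ip == candidate_ip && vote_for_port == candidate_port;
}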

improve floyd performance

I wrote a simple test to measure the performance of floyd.

write 10000 cost time microsecond(us) 8136502, qps 1229
      Node           | Role    |   Term    | CommitIdx |    Leader         |  VoteFor          | LastLogTerm | LastLogIdx | LastApplyIdx |
      127.0.0.1:8901      leader          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8902    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8903    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8904    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000
      127.0.0.1:8905    follower          2      10000       127.0.0.1:8901         127.0.0.1:8901            2      10000      10000

write 10000 cost time microsecond(us) 8325906, qps 1201
      Node           | Role    |   Term    | CommitIdx |    Leader         |  VoteFor          | LastLogTerm | LastLogIdx | LastApplyIdx |
      127.0.0.1:8901      leader          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8902    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8903    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8904    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000
      127.0.0.1:8905    follower          2      20000       127.0.0.1:8901         127.0.0.1:8901            2      20000      20000

The QPS is only about 1200, which is too low, so we had better improve it.

Testing the add_server1 case: a newly added node cannot join the cluster

While testing the add_server1 case, we found that if the existing cluster's log is large, the new node frequently fails to join the cluster normally. The new node never receives a heartbeat, so it keeps becoming a Candidate and trying to start new elections.
Investigation shows that when the new node joins, too many log entries are sent in a single batch, so other requests cannot be processed in time and time out.
(screenshot omitted)

Two requests:
1) Could you provide a public interface to configure
uint64_t append_entries_size_once;
uint64_t append_entries_count_once;
(see the sketch after this list)?
2) When a new node is added, the log is always copied and replayed from the very beginning. Is there any plan to optimize this?
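A sketch of the kind of options struct request 1) asks for, using the two field names from the issue (the defaults shown are invented for illustration): the peer thread would stop filling an AppendEntries request once either limit is reached, keeping single RPCs bounded.

#include <cstdint>

// Sketch of user-visible batching knobs; defaults are hypothetical.
struct Options {
  uint64_t append_entries_size_once = 1024 * 1024;  // max bytes per RPC
  uint64_t append_entries_count_once = 128;         // max entries per RPC
};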

Read critical?

Hi, while reading the Floyd code I noticed that Floyd appends a read as a log entry to the peers to avoid dirty reads. After the read log entry is applied, (1) context_->apply_cond.SignalAll(); wakes up the client blocked on the read, and (2) the client then reads the k-v from rocksdb and returns it to the user. But this seems to leave a corner case?

Between steps (1) and (2) there is a small time window. If a new update command for the same key is applied during that window, the read does not return stale data, but it does return data newer than the read's position in the log.

Of course, this problem cannot occur with a single client.
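One way to close the window described above, sketched with hypothetical types: have the apply thread capture the value for the pending read at the moment the read entry is applied, instead of waking the client to re-read rocksdb afterwards.

#include <condition_variable>
#include <mutex>
#include <string>

// Sketch: the apply thread fills in the result for a pending read at
// the moment the read entry is applied, so no later update can slip
// in between the wake-up and the rocksdb lookup.
struct PendingRead {
  std::mutex mu;
  std::condition_variable cond;
  bool done = false;
  std::string value;
};

// Called by the apply thread, still inside the apply critical section.
void ApplyReadEntry(PendingRead* r, const std::string& current_value) {
  std::lock_guard<std::mutex> guard(r->mu);
  r->value = current_value;  // snapshot at the entry's log position
  r->done = true;
  r->cond.notify_all();
}

// Called by the client thread issuing the read.
std::string WaitForRead(PendingRead* r) {
  std::unique_lock<std::mutex> lock(r->mu);
  r->cond.wait(lock, [r] { return r->done; });
  return r->value;
}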

replace string:ip with node

There are many string(ip):int(port) pairs in the code, which is ugly.
I think we should replace them with a Node struct.
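Presumably something like the following is meant (a sketch):

#include <string>

// Sketch of the suggested Node struct, replacing loose ip/port pairs.
struct Node {
  std::string ip;
  int port = 0;

  bool operator==(const Node& other) const {
    return ip == other.ip && port == other.port;
  }
  std::string ToString() const { return ip + ":" + std::to_string(port); }
};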

Multi-threaded Write() support

CmdRequest cmd;
BuildWriteRequest(key, value, &cmd);
CmdResponse response;
Status s = DoCommand(cmd, &response);
if (!s.ok()) {
  return s;
}

Could multi-threaded writes be supported here, merging multiple writes into a single batch the way leveldb does?
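A sketch of the leveldb-style write grouping this suggests (all names hypothetical): writers queue up, and the writer at the front merges everything currently queued into one proposal, so a single Raft round commits many writes.

#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical leveldb-style group commit: each writer enqueues itself;
// the writer at the front drains every queued request into one batch,
// proposes it as a single raft entry, then wakes the merged writers.
struct Writer {
  std::string key, value;
  bool done = false;
  std::condition_variable cv;
};

class WriteQueue {
 public:
  void Write(Writer* w) {
    std::unique_lock<std::mutex> lock(mu_);
    queue_.push_back(w);
    // Wait until we are at the front, or another writer committed us.
    while (!w->done && w != queue_.front()) w->cv.wait(lock);
    if (w->done) return;               // merged into an earlier batch

    // We are the batch leader: collect everything queued right now.
    std::vector<Writer*> batch(queue_.begin(), queue_.end());
    lock.unlock();
    ProposeAsOneEntry(batch);          // hypothetical: one raft append
    lock.lock();

    for (Writer* done_w : batch) {     // wake the writers we merged
      queue_.pop_front();
      if (done_w != w) { done_w->done = true; done_w->cv.notify_one(); }
    }
    if (!queue_.empty()) queue_.front()->cv.notify_one();  // next leader
  }

 private:
  void ProposeAsOneEntry(const std::vector<Writer*>&) { /* ... */ }
  std::mutex mu_;
  std::deque<Writer*> queue_;
};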

test with jepsen

It is hard to prove that a consensus implementation is correct; there is a joke that proving a consensus algorithm correct is harder than implementing one.
Many consensus implementations now pass the Jepsen test to build confidence in their correctness, so floyd needs to pass Jepsen too.

Separate read and write operation

I just read the floyd code and the Raft paper. According to the paper, read operations are forwarded to the leader, and the leader holds the whole committed log, that is, the complete meta info.

So is it possible to separate read operations from write operations? In that case, for a read operation the leader would just return to the client without waiting for AppendEntries responses from a majority of servers. @CatKang @baotiao
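Dropping the quorum round entirely would not be safe (a deposed leader could serve stale data), but the ReadIndex scheme from the Raft dissertation comes close: one heartbeat round with no log append confirms leadership, then the read is served locally. A sketch with hypothetical helpers:

#include <cstdint>
#include <string>

// Hypothetical helpers, not floyd's actual API; bodies are placeholders.
static bool BroadcastHeartbeatAndWaitMajority() { return true; }  // still leader?
static void WaitApplied(uint64_t /*index*/) {}    // block until apply >= index
static uint64_t CurrentCommitIndex() { return 0; }
static std::string LocalGet(const std::string&) { return ""; }

// ReadIndex-style linearizable read on the leader (sketch): no log
// entry is appended; one heartbeat round proves we are still leader,
// then the read is answered locally once the state machine catches up.
bool ReadIndexGet(const std::string& key, std::string* value) {
  uint64_t read_index = CurrentCommitIndex();    // 1. snapshot commit index
  if (!BroadcastHeartbeatAndWaitMajority()) {    // 2. confirm leadership
    return false;                                //    deposed: client must retry
  }
  WaitApplied(read_index);                       // 3. wait for apply to catch up
  *value = LocalGet(key);                        // 4. answer from local state
  return true;
}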
