ebay / NuRaft
C++ implementation of Raft core logic as a replication library
License: Apache License 2.0
Hi! Recently, I got the following error https://gist.github.com/alesapin/0dbbc2616a6b88d6a14467fd78dc922a on server startup. This specific error is not important; the real issue is that starting background threads from a partially initialized object doesn't look good:
https://github.com/eBay/NuRaft/blob/master/src/raft_server.cxx#L252-L257
In my case, the background threads were fast enough to do some work and call my callback, which checks whether the current node is the leader, and all of this happened before the raft_server constructor had finished. Such behavior makes the raft_server object very dangerous not only for user callbacks but also for its own code running in background threads. Maybe we can introduce some kind of startup() method that starts all background threads only after the object is fully initialized?
Basically, it's easy to see that a flexible quorum is safe if it's a static config. But is Leader Completeness guaranteed when Qc and Qe are dynamically adjustable?
For example, suppose I have a 5-node cluster with Qc(3) and Qe(3) (the default algorithm) and then change the config to Qc(5) and Qe(1). How can the cluster always elect a valid leader when there are potentially 2 nodes with incomplete Raft logs, yet a leader can be established by its own vote alone?
I asked the same question in the cornerstone GitHub.
Just wanted to get your input on how we can implement the following scenario.
Consider I have 5 servers (S1, S2, S3, S4, S5) and 7 tasks running on these servers. Each task needs 2 replicas.
For task T1, I can have S1, S2, S3 where S1 is the leader.
For task T2, I can have S2, S1, S3 where S2 is the leader.
And so on. It's possible that one of the servers goes down; in that case I should be able to substitute one of the existing servers, e.g. if S1 goes down, I should be able to use S4 instead.
Any guidelines or pointers would be really appreciated.
For remediating a node, we clone data and logs to a new node and add it via the leader. However, if there are uncommitted logs, the new node (to be added) first commits those logs and then joins the existing cluster.
In handle_join_cluster_req(), we reset sm_commit_index_ to initial_commit_index_, which rolls back sm_commit_index_ and causes duplicate commits of the same log entries. That may be fine for a state machine that allows idempotent writes, but not for others. We should do it selectively.
Currently, the skip_initial_election_timeout_ option lives in the init_options struct, which can only be configured through the raft_server constructor; it's not possible to configure it through raft_launcher. It would be better to move skip_initial_election_timeout_ to raft_params.
Please review my PR #142 together with some other changes.
I looked at raft_bench.cxx and noticed that worker_func calls append_entries, but the lock is only used to guard numops. When we spawn multiple threads, will these threads call append_entries concurrently? And in that case, is append_entries thread-safe (I guess so)?
Is there any way to subscribe to commit events in the state_machine? As far as I can see there is no callback for it: https://github.com/eBay/NuRaft/blob/master/include/libnuraft/callback.hxx. StateMachineExecution seems suitable, but I need to get the result of the commit (from the state_machine) in my callback.
Also, I'm not sure: is it safe to make such a callback directly from the state_machine's commit method? According to https://github.com/ClickHouse-Extras/NuRaft/blob/master/src/handle_commit.cxx#L270-L272 it seems so, but it would be much better if you could confirm this.
What is the recommended way to execute read-only requests on the followers? It seems unreliable to check get_committed_log_idx() == get_leader_committed_log_idx(). Is there any way to directly ask the leader for its committed_log_idx?
Hi,
I see that in the calculator example, after bringing up 3 nodes, we run add-server 2 and 3 from node 1, which makes node 1 the leader. In this case, if a remote server is not up, add-server will fail. Once the cluster forms, even if nodes go down and come back, they are added back to the cluster automatically.
Is there a way to provide the list of remote nodes' details to each node so that, once they come up, each node keeps trying to add the others until it either accepts another node as leader or becomes the leader itself? Similar to the phase in the previous case where all the nodes form the cluster, and a node that goes down and comes back is added back automatically.
Thanks
Hi there, thanks for your great work. I'd like to know: have you implemented the membership change feature? I think it's important in a production environment.
Thanks.
Is there a way to reject a client's request, for example when it tries to perform an operation that is invalid or wrong?
Let's say we have 3 servers: S1, S2, and S3, and suppose we have no leader now. S1's and S2's priorities are higher than S3's.
However, assume we have a wrong config: the election timeout for S1 and S2 is much longer than S3's. Here is what will happen in that case:
Problem: S1 and S2 will never reach the election timeout, so they never reduce their target priority.
When adding a server for the first time, after initialization, the other node's server is initialized and running as well, but the log says:
handle_join_leave.cxx:handle_add_srv_req:70: previous config has not committed yet
I've just built NuRaft on OS X 11.3. The election priority test fails as follows:
[ .... ] leader election priority test
=== TEST MESSAGE (BEGIN) ===
time: 2021-04-28 01:07:50.099266
thread: b79a
in: leader_election_priority_test()
at: /Users/Ben/dev/NuRaft/tests/unit/raft_server_test.cxx:943
value of: s2.raftServer->is_leader()
expected: true
actual: false
[ FAIL ] leader election priority test (178.1 ms)
[01:07:50.100 570] [tid b79a] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
[01:07:50.100 898] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]
=== Critical info (given by user): 0 bytes ===
[01:07:50.100 970] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
[01:07:51.270 971] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]
Thread b79a (crashed here)
#0 0x000000010f7f2c10 in SimpleLoggerMgr::logStackBacktrace(unsigned long) at logger.cc:390
#1 0x000000010f7f32b4 in SimpleLoggerMgr::handleSegAbort(int) at logger.cc:451
#2 0x00007fff2072ad7d in _sigtramp() at libsystem_platform.dylib
#3 0x0000000114416298 in 0x0() at ???
#4 0x00007fff2063a411 in abort() at libsystem_c.dylib
#5 0x000000010f7e7bd0 in TestSuite::reportTestResult(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) at test_common.h:1358
#6 0x000000010f7e1572 in main at raft_server_test.cxx:2389
#7 0x00007fff20700f3d in start() at libdyld.dylib
#8 0x0000000000000002 in 0x0() at ???
[ABORT] Flushed all logs safely.
./runtests.sh: line 9: 892 Abort trap: 6 ./tests/raft_server_test --abort-on-failure
Hi,
Has NuRaft been used in large-scale production?
For the code segment https://github.com/eBay/NuRaft/blob/master/src/handle_append_entries.cxx#L856-L862 in raft_server::handle_append_entries_resp(), I have a question about the else case (i.e. p->get_next_log_idx() <= resp.get_next_idx()). Why is it required to move one log backward (p->set_next_log_idx(p->get_next_log_idx() - 1)) instead of doing the same thing as the if case (p->set_next_log_idx(resp.get_next_idx()))?
Question as in the title. It would help a node join the cluster faster by recovering state from the last snapshot instead of applying a large number of log entries.
#0 0x0000000000418961 in SimpleLoggerMgr::logStackBacktrace(unsigned long) at /home/xmly/NuRaft/examples/logger.cc:390
#1 0x0000000000418f74 in SimpleLoggerMgr::handleSegAbort(int) at /home/xmly/NuRaft/examples/logger.cc:451
#2 0x00000000000366d0 in __restore_rt() at sigaction.c:?
#3 0x00007f381686164b in gsignal() at ??:0
#4 0x00007f3816863450 in abort() at ??:0
#5 0x00007f38173da055 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv() at ??:0
#6 0x000000000008fc46 in __cxxabiv1::__terminate(void (*)()) at /usr/src/debug/gcc-7.1.1-20170622/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/eh_terminate.cc:51
#7 0x000000000008fc91 in std::terminate() at ??:?
#8 0x000000000008fed4 in __cxa_throw() at ??:?
#9 0x000000000043b1ae in void asio::detail::throw_exception<std::system_error>(std::system_error const&) at ??:?
#10 0x000000000043b482 in asio::detail::do_throw_error(std::error_code const&, char const*) at /home/xmly/NuRaft/asio/asio/include/asio/detail/impl/throw_error.ipp:49
#11 0x00000000004341c3 in asio::detail::throw_error(std::error_code const&, char const*) at /home/xmly/NuRaft/asio/asio/include/asio/detail/throw_error.hpp:41
#12 0x0000000000423aba in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&) at /usr/include/c++/7/bits/shared_ptr_base.h:710
#13 0x000000000040adc7 in calc_server::init_raft(std::shared_ptr<nuraft::state_machine>) at /home/xmly/NuRaft/examples/example_common.hxx:183
#14 0x0000000000409148 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() at /usr/include/c++/7/bits/shared_ptr_base.h:681
#15 0x00007f381684b4da in __libc_start_main() at ??:0
#16 0x000000000040977a in _start() at ??:?
Hi, does NuRaft support configuration change using joint consensus, or is there a plan to support it in the future?
The same code is deployed on 3 machines: {1, 2, 3}. I try add_srv on 1 and can add only one more machine (either 2 or 3); when adding the second one, it gives error -7. The code is the same, so why is adding 2, for example, successful, but adding one more (3) unsuccessful?
There is an optimization in cornerstone that introduces leader expiration to prevent a node from considering itself the leader forever when it's isolated (see this); do you want to port it?
Hi!
In my state machine implementation, I'm always using the single latest snapshot. I've encountered the following case:
2021.05.05 21:35:46.899607 [ 1861 ] {} <Information> RaftInstance: creating a snapshot for index 1100
2021.05.05 21:35:46.899616 [ 1861 ] {} <Information> RaftInstance: create snapshot idx 1100 log_term 1
2021.05.05 21:35:46.899634 [ 1861 ] {} <Debug> KeeperStateMachine: Creating snapshot 1100
2021.05.05 21:35:46.899668 [ 1861 ] {} <Debug> KeeperStateMachine: In memory snapshot 1100 created, queueing task to flash to disk
2021.05.05 21:35:46.900040 [ 1861 ] {} <Information> RaftInstance: create snapshot idx 1100 log_term 1 done: 411 us elapsed
2021.05.05 21:35:46.922391 [ 1862 ] {} <Debug> RaftInstance: send snapshot peer 3, peer log idx: 465, my starting idx: 991, my log idx: 1104, last_snapshot_log_idx: 1000
2021.05.05 21:35:46.922406 [ 1862 ] {} <Debug> RaftInstance: previous sync_ctx exists 0x7f9eccdcf418, offset 1, snp idx 1000, user_ctx (nil)
2021.05.05 21:35:46.922417 [ 1862 ] {} <Debug> RaftInstance: peer: 3, obj_idx: 1, user_snp_ctx (nil)
2021.05.05 21:35:46.922428 [ 1862 ] {} <Debug> KeeperStateMachine: Reading snapshot 1000 obj_id 1
2021.05.05 21:35:46.947817 [ 1855 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 1100 with path /home/robot-clickhouse/db/coordination/snapshots/snapshot_1100.bin
DB::Exception: Required to read snapshot with last log index 1000, but our last log index is 1100
Thus, the leader server had already finished creating the 1100-idx snapshot, but was requested to read the 1000-idx snapshot. What can we do in this case? It seems that the return code of read_logical_snp_obj is ignored in the library code.
Another performance improvement is to use unique_ptr instead of shared_ptr for buffers; maybe worth porting as well, check this.
Hello, how can I make the cluster config persistent? For example, suppose I have 3 server instances online that are already part of a cluster, and all of them go offline at the same time. Is there a way to make them reconnect to the same cluster, with the first one to reconnect becoming the leader and the others followers?
To achieve that, is the only thing I need to do to reimplement the in-memory state_mgr (state_mgr.hxx) so that it saves to disk instead of using memory?
Is there any other project that uses NuRaft as a library? I want to see how they incorporated NuRaft for building other distributed systems.
Hi!
As far as I understood from the examples and issues, NuRaft cluster startup should look like this:
1. Start raft_server on all nodes.
2. Choose some node (S_l) to add the other nodes, according to whatever rules.
3. On S_l, call add_srv for all other nodes.
4. On all other nodes, just wait until we are added to the cluster by S_l.
But this algorithm has several complexities on the user side. For example, what should we do if S_l crashes after it was chosen to add the other nodes? Probably we should use some timeouts and try to choose another node to be S_l'. Or what should we do if one of the follower nodes doesn't respond to add_srv? Probably start without it, but try to add it in the background.
It would be much simpler if there were a way to start all nodes with some fixed configuration, with all leader/follower communication handled on Raft's side. This approach is described in the original paper:
When servers start up, they begin as followers. A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats (AppendEntries RPCs that carry no log entries) to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.
Are there any fundamental reasons why it's not implemented?
[ .... ] multiple config change test
=== TEST MESSAGE (BEGIN) ===
time: 2020-10-22 16:36:46.954628
thread: c858
in: multiple_config_change_test()
at: /home/development/Public/NuRaft/tests/unit/raft_server_test.cxx:788
value of: configs_out.size()
expected: 3
actual: 2
[ FAIL ] multiple config change test (728.2 ms)
[16:36:46.955 825] [tid c858] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
[16:36:46.956 037] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
=== Critical info (given by user): 0 bytes ===
[16:36:46.956 167] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
[16:36:47.196 321] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
[ .... ] remove node test
=== TEST MESSAGE (BEGIN) ===
time: 2020-10-22 16:48:42.305966
thread: 1d15
in: remove_node_test()
at: /home/development/Public/NuRaft/tests/unit/raft_server_test.cxx:481
value of: configs.size()
expected: 2
actual: 3
info: id = 1
[ FAIL ] remove node test (522.2 ms)
[16:48:42.306 822] [tid 1d15] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
[16:48:42.307 030] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]
=== Critical info (given by user): 0 bytes ===
[16:48:42.307 167] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
[16:48:42.535 961] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]
Both cases seem related to remove_srv(), and the operation succeeds or fails depending on timing.
Getting the following crash randomly on Ubuntu 18.04. Any fix or mitigation?
munmap_chunk(): invalid pointer
Thread 21 "nuraft_w_0" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff74ae8700 (LWP 21466)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Sun Mar 28 06:37:01 UTC 2021
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff5d4c921 in __GI_abort () at abort.c:79
#2 0x00007ffff5d95967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff5ec2b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff5d9c9da in malloc_printerr (str=str@entry=0x7ffff5ec4720 "munmap_chunk(): invalid pointer") at malloc.c:5342
#4 0x00007ffff5da3fbc in munmap_chunk (p=0x7fff480018e0) at malloc.c:2846
#5 __GI___libc_free (mem=0x7fff480018f0) at malloc.c:3127
#6 0x00005555565fa90e in OPENSSL_free (orig_ptr=0x7fff480018f8) at external/boringssl/src/crypto/mem.c:154
#7 0x000055555657e70d in bio_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/pair.c:144
#8 0x000055555657c3a9 in BIO_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/bio.c:103
#9 0x00005555566c610c in asio::ssl::detail::engine::~engine (this=0x555558680e70, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/detail/impl/engine.ipp:66
#10 asio::ssl::detail::stream_core::~stream_core (this=0x555558680e70, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/detail/stream_core.hpp:54
#11 0x00005555566c621a in asio::ssl::stream<asio::basic_stream_socket<asio::ip::tcp>&>::~stream (this=0x555558680e68, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/stream.hpp:120
#12 nuraft::asio_rpc_client::~asio_rpc_client (this=0x555558680e10, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/src/asio_service.cxx:854
#13 0x0000555556714b3e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555558680e00) at /usr/include/c++/7/bits/shared_ptr_base.h:154
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684
#15 std::__shared_ptr<nuraft::rpc_client, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123
#16 std::__shared_ptr<nuraft::rpc_client, (__gnu_cxx::_Lock_policy)2>::operator= (__r=..., this=0x5555586adc50) at /usr/include/c++/7/bits/shared_ptr_base.h:1213
#17 std::shared_ptr<nuraft::rpc_client>::operator= (__r=..., this=0x5555586adc50) at /usr/include/c++/7/bits/shared_ptr.h:319
#18 nuraft::peer::recreate_rpc (this=0x5555586adc30, config=std::shared_ptr<nuraft::srv_config> (use count 3, weak count 0) = {...}, ctx=...) at /home/azureuser/barrel/thirdparty/NuRaft/src/peer.cxx:205
#19 0x000055555670eb85 in nuraft::raft_server::request_prevote (this=this@entry=0x55555872b490) at /home/azureuser/barrel/thirdparty/NuRaft/src/handle_vote.cxx:80
#20 0x000055555670a4db in nuraft::raft_server::handle_election_timeout (this=0x55555872b490) at /home/azureuser/barrel/thirdparty/NuRaft/src/handle_timeout.cxx:288
#21 0x00005555566c1034 in std::__invoke_impl<void, void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (__f=<optimized out>) at /usr/include/c++/7/bits/invoke.h:60
#22 std::__invoke<void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (__fn=@0x7fff74abcef0: 0x5555566b3990 <_timer_handler_(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>) at /usr/include/c++/7/bits/invoke.h:95
#23 std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::__call<void, std::error_code const&, 0ul, 1ul>(std::tuple<std::error_code const&>&&, std::_Index_tuple<0ul, 1ul>) (__args=..., this=0x7fff74abcef0) at /usr/include/c++/7/functional:467
#24 std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::operator()<std::error_code const&, void>(std::error_code const&) (this=0x7fff74abcef0) at /usr/include/c++/7/functional:551
#25 asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>::operator()() (this=0x7fff74abcef0) at
/home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/bind_handler.hpp:64
#26 asio::asio_handler_invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, ...) (function=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/handler_invoke_hook.hpp:68
#27 asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, st
d::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std
::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std
::error_code)>&) (context=..., function=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#28 asio::detail::handler_work<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, asio::system_executor>::complete<asio::detail::bind
er1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nu
raft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nur
aft::delayed_task>&, std::error_code)>&) (this=<synthetic pointer>, handler=..., function=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/handler_work.hpp:81
#29 asio::detail::wait_handler<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)> >::do_complete(void*, asio::detail::scheduler_operat
ion*, std::error_code const&, unsigned long) (owner=0x55555860be90, base=0x7fff080133a0) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/wait_handler.hpp:71
#30 0x00005555566bed25 in asio::detail::scheduler_operation::complete (bytes_transferred=<optimized out>, ec=..., owner=0x55555860be90, this=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/as
io/detail/scheduler_operation.hpp:39
#31 asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., this=0x55555860be90) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/impl/scheduler.ipp:400
#32 asio::detail::scheduler::run (this=0x55555860be90, ec=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/impl/scheduler.ipp:153
#33 0x00005555566b3dc5 in asio::io_context::run (this=0x55555860bc20) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/impl/io_context.ipp:61
#34 nuraft::asio_service_impl::worker_entry (this=0x55555860bc20) at /home/azureuser/barrel/thirdparty/NuRaft/src/asio_service.cxx:1563
#35 0x00007ffff74e26df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#36 0x00007ffff7bbb6db in start_thread (arg=0x7fff74ae8700) at pthread_create.c:463
#37 0x00007ffff5e2d71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) frame 7
#7 0x000055555657e70d in bio_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/pair.c:144
144 external/boringssl/src/crypto/bio/pair.c: No such file or directory.
(gdb) p *bio
$7 = {method = 0x555556f911e0 <methods_biop>, init = 0, shutdown = 1, flags = 0, retry_reason = 0, num = 0, references = 0, ptr = 0x7fff480018f8, next_bio = 0x0, num_read = 0, num_write = 0}
(gdb) frame 6
#6 0x00005555565fa90e in OPENSSL_free (orig_ptr=0x7fff480018f8) at external/boringssl/src/crypto/mem.c:154
154 external/boringssl/src/crypto/mem.c: No such file or directory.
(gdb) p orig_ptr
$8 = (void *) 0x7fff480018f8
(gdb) p (struct bio_bio_st *)orig_ptr
$9 = (struct bio_bio_st *) 0x7fff480018f8
(gdb) p *(struct bio_bio_st *)orig_ptr
$10 = {peer = 0x0, closed = 0, len = 0, offset = 0, size = 0, buf = 0x0, request = 0}
(gdb) info local
ptr = 0x7fff480018f0
size = 56
(gdb) p *(struct bio_bio_st *)0x7fff480018f0
$11 = {peer = 0x0, closed = 0, len = 0, offset = 0, size = 0, buf = 0x0, request = 0}
Let's say we have 2 servers, S1 and S2. S1 is up and is the leader; S2 is not up yet (a new server).
We add S2 to the cluster while S2 is still down.
When S2 comes up, how can S1 know that S2 is up, or S2 know that S1 is up?
Thanks
Hi! It seems possible that NuRaft will run both methods concurrently (gdb trace: https://gist.github.com/alesapin/973313f4d89f76634b4b1dd7e653e8ff). My code was not ready for this and I got a deadlock :)
Is this expected behavior? Of course I'll fix my code, but maybe it would be simpler to manage this on the NuRaft side? Both the create and save snapshot methods write data to disk. In my case it's not a problem, but I can imagine cases where it could be painful.
In the benchmark results file, it's suggested that as the payload size increases, the replication throughput increases as well. But I noticed that the unit becomes MB/s instead of the previous ops/s. Can you explain how this value is calculated? Thank you.
Hi @greensky00,
Just to confirm: when followers receive more than one request from the leader, or when the leader has more than one request to send to followers, will the requests be sent one by one, or batched into one message?
Hi @greensky00,
Thank you for your great work!
I found that any of the members can become the leader. I have 4 members (s0, s1, s2, s3), and I want only s0, s1, and s2 to participate in voting. How can I avoid s3 becoming a leader?
I see that the list command lists the configured servers in the cluster, but they are not necessarily up at that time.
How can I list only the servers that are up?
Is there any API to get the details of down followers and track cluster membership?
How can I add a callback function to be called when any follower goes down or comes up?
Hi @greensky00,
I am considering using NuRaft for a two-node cluster and see that there is a special option, auto_adjust_quorum_for_small_cluster_, which changes both the commit and election quorum sizes to 1 once a node is offline.
I tried this option, but once the offline node comes back online, if there is a network partition between the two nodes, both nodes consider themselves the leader.
May I know whether eBay uses two-node clusters with this option, and how you would handle the network-partition case? Thanks!
Hi,
In the case of auto_forwarding_ = true, can the follower's append_entries only forward messages to the leader one by one?
When using auto forwarding for appending entries, the accepted flag doesn't seem to get updated, whereas without auto-forwarding (current node = leader) the flag is updated. This leads my code to think there is an error, because the accepted flag is false when everything is actually fine.
I think it just needs presult->accept() to be called when checking whether the original request was accepted, so that the result's accept flag is also updated.
NuRaft/src/handle_user_cmd.cxx
Lines 180 to 182 in 17aab8d
changed to:
if (resp->get_accepted()) {
resp_ctx = resp->get_ctx();
presult->accept();
}
It could also be that my understanding of auto-forwarding is wrong and this is the expected behaviour?
Thanks,
James.
libnuraft.so: undefined symbol: SSL_CTX_free with libssl-dev v1.1.1 on Ubuntu 18.
Hi, I am a bit curious about the latency results in https://github.com/eBay/NuRaft/blob/master/docs/bench_results.md
The network RTT is about 180 microseconds. Raft needs two RTTs for one request to be committed (Client->Leader->Follower->Leader->Client). In that case, the median latency should be much larger than 180 microseconds, so why are they almost the same (187 microseconds)?
Hi, and first of all thank you for the very cool library. We've connected NuRaft with RocksDB as the WAL/state machine backend. It seems to work well and is pretty fast, but we occasionally run into issues that are likely our fault, though we're having trouble debugging them.
After a few tens of thousands of calls to append_entries in async mode, there is an instance where the provided callback is never called. As far as we can tell the data is committed, but since we never receive a response, we can't make safe assumptions about the state of the system anymore.
We call append_entries like this:
void PeshkaCtrl::asyncAppendEntries(
const std::shared_ptr<nuraft::buffer> &logs,
cutils::handler_t<void(cutils::ResponseCode &rc)> handler) const
{
LOGTRACE(m_Logger, "asyncAppendEntries start");
m_RaftServer->append_entries({logs})->when_ready(
[this, handler=std::move(handler)]
(const nuraft::cmd_result<std::shared_ptr<nuraft::buffer>> &res,
const std::shared_ptr<std::exception> &ex) mutable
{
if (res.get_accepted())
{
LOGTRACE(m_Logger, "asyncAppendEntries accepted");
cutils::post(handler, cutils::ResponseCode());
}
else
{
std::string message = ex ? ex->what() : res.get_result_str();
LOGTRACE(m_Logger, "asyncAppendEntries error ", message);
cutils::post(handler, cutils::ResponseCode(enums::Response::ERROR_PERMANENT, message));
}
}
);
}
I'm attaching some (trace-level) log messages of an example where this has happened: logs.zip
You will observe that the last call to asyncAppendEntries (shown above) never receives a matching "done" or "error" message. I'm not sure whether the NuRaft trace logs reveal where the problem is; I've stared at them for hours and I just can't see it.
It could be a bug in our state machine or WAL implementations, but as far as our unit tests go, we think those are ok. We believe the last transaction is actually committed, since when we restart the leader and a new one is elected, things appear to progress as expected.
Please let us know if you can spot any irregularities in those NuRaft log messages. We're happy to provide any other information or code that may be useful. Your help is very much appreciated!
Edit: I should add that we build NuRaft against boost::asio, which we use quite extensively in our code base.
Edit: Boost version 1.71, NuRaft 1.2.
Hi team!
Has NuRaft's consistency been rigorously verified? BTW, how many nodes at most did eBay use in a production-scale environment?
Thanks very much!
I have two nodes in the cluster {1, 2}, and the leader is 1. Now 2 is disconnected, and an error message like this is shown (on 1):
Error: raft_server.cxx:check_leadership_validity:856: 1 nodes (out of 2, 2 including learners) are not responding longer than 2500 ms, at least 2 nodes (including leader) should be alive to proceed commit
Error: raft_server.cxx:check_leadership_validity:858: will yield the leadership of this node
Is it by design that Raft cannot work with only one node?
thread: 8a08
in: bench_main()
at: /home/node3/NuRaft-master001/tests/bench/raft_bench.cxx:379
value of: add_servers(stuff, config)
expected: 0
actual: -1
Hi! I'm testing my ZooKeeper-like system based on NuRaft using the Jepsen framework. Recently I found a very strange behavior where NuRaft decided to roll back already committed log entries:
2021.03.24 11:23:01.057713 [ 388 ] {} <Information> RaftInstance: rollback logs: 509 - 530, commit idx req 400, quick 530, sm 530, num log entries 1, current count 0
2021.03.24 11:23:01.057726 [ 388 ] {} <Warning> RaftInstance: rollback quick commit index from 530 to 508
2021.03.24 11:23:01.057740 [ 388 ] {} <Warning> RaftInstance: rollback sm commit index from 530 to 508
2021.03.24 11:23:01.057751 [ 388 ] {} <Information> RaftInstance: rollback log 530
2021.03.24 11:23:01.057766 [ 388 ] {} <Information> RaftInstance: rollback log 529
2021.03.24 11:23:01.057780 [ 388 ] {} <Information> RaftInstance: rollback log 528
2021.03.24 11:23:01.057788 [ 388 ] {} <Information> RaftInstance: rollback log 527
2021.03.24 11:23:01.057794 [ 388 ] {} <Information> RaftInstance: rollback log 526
2021.03.24 11:23:01.057801 [ 388 ] {} <Information> RaftInstance: rollback log 525
2021.03.24 11:23:01.057807 [ 388 ] {} <Information> RaftInstance: rollback log 524
2021.03.24 11:23:01.057813 [ 388 ] {} <Information> RaftInstance: rollback log 523
2021.03.24 11:23:01.057828 [ 388 ] {} <Information> RaftInstance: rollback log 522
2021.03.24 11:23:01.057860 [ 388 ] {} <Information> RaftInstance: rollback log 521
2021.03.24 11:23:01.057867 [ 388 ] {} <Information> RaftInstance: rollback log 520
2021.03.24 11:23:01.057873 [ 388 ] {} <Information> RaftInstance: rollback log 519
2021.03.24 11:23:01.057879 [ 388 ] {} <Information> RaftInstance: rollback log 518
2021.03.24 11:23:01.057885 [ 388 ] {} <Information> RaftInstance: rollback log 517
2021.03.24 11:23:01.057891 [ 388 ] {} <Information> RaftInstance: rollback log 516
2021.03.24 11:23:01.057897 [ 388 ] {} <Information> RaftInstance: rollback log 515
2021.03.24 11:23:01.057903 [ 388 ] {} <Information> RaftInstance: rollback log 514
2021.03.24 11:23:01.057909 [ 388 ] {} <Information> RaftInstance: rollback log 513
2021.03.24 11:23:01.057915 [ 388 ] {} <Information> RaftInstance: rollback log 512
2021.03.24 11:23:01.057920 [ 388 ] {} <Information> RaftInstance: rollback log 511
2021.03.24 11:23:01.057926 [ 388 ] {} <Information> RaftInstance: rollback log 510
2021.03.24 11:23:01.057931 [ 388 ] {} <Information> RaftInstance: rollback log 509
2021.03.24 11:23:01.057937 [ 388 ] {} <Information> RaftInstance: overwrite at 509
2021.03.24 11:23:01.209754 [ 388 ] {} <Information> RaftInstance: receive a config change from leader at 509
2021.03.24 11:23:01.209816 [ 388 ] {} <Debug> RaftInstance: [after OVWR] log_idx: 510, count: 1
As far as I understand, this should never happen in Raft, and the comment here states as much: https://github.com/ebay/NuRaft/blob/master/src/handle_append_entries.cxx#L631-L645. So isn't this a bug?
Also, I've found this interesting comment:
https://github.com/ebay/NuRaft/blob/master/src/handle_append_entries.cxx#L696-L717.
Full trace logs from all three nodes:
Related ClickHouse/ClickHouse#21677
I use sqlite3 to store log entries, but I do not understand how to implement the log_store interface, including these issues:
private:
    static ptr<log_entry> make_clone(const ptr<log_entry>& entry);
    // Dummy entry for index 0.
    ptr<log_entry> dummy_entry_;
    db_entry_helper entry_helper;   // database access helper
    std::atomic<ulong> start_idx_;  // the first log index
    std::atomic<ulong> last_idx_;   // the last log index

bool indb_log_store::compact(ulong last_log_index) {
    info("compact log from [%lu] to [%lu], cur last_log_index: [%lu]",
         start_idx_.load(), last_log_index, last_idx_.load());
    entry_helper.removeRange("id", start_idx_.load(), last_log_index);
    // WARNING:
    // Even though nothing has been erased,
    // we should set `start_idx_` to the new index.
    start_idx_ = last_log_index + 1;
    auto last = entry_helper.selectLast("id");
    if (!last->empty()) {
        last_idx_ = last->at(0).id_;
    } else {
        last_idx_ = last_log_index + 1;
    }
    info("start_idx: [%lu], last_idx: [%lu]", start_idx_.load(), last_idx_.load());
    return true;  // the bool return value was missing in the original
}
// Node 3 log (leader log):
2019-11-12T18:51:13.792_873+08:00 [c313] [INFO] trying to sync snapshot with last index 15 to peer 1, its last log idx 9 [handle_snapshot_sync.cxx:111, create_sync_snapshot_req()]
2019-11-12T18:51:13.794_468+08:00 [7067] [WARN] peer 1 declined snapshot: p->get_next_log_idx(): 10, log_store_->next_slot(): 17 [handle_snapshot_sync.cxx:317, handle_install_snapshot_resp()]
2019-11-12T18:51:13.795_047+08:00 [a3ce] [WARN] declined append: peer 1, prev next log idx 12, resp next 12, new next log idx 11 [handle_append_entries.cxx:676, handle_append_entries_resp()]
2019-11-12T18:51:13.795_317+08:00 [a3ce] [ERRO] bad log_idx 10 for retrieving the term value, will ignore this log req [raft_server.cxx:1063, term_for_log()]
2019-11-12T18:51:13.795_322+08:00 [a3ce] [ERRO] last snapshot 0x7ff0ec003fe0, log_idx 10, snapshot last_log_idx 15 [raft_server.cxx:1066, term_for_log()]
2019-11-12T18:51:13.795_325+08:00 [a3ce] [ERRO] log_store_->start_index() 11 [raft_server.cxx:1068, term_for_log()]
2019-11-12T18:51:13.795_412+08:00 [c313] [WARN] declined append: peer 1, prev next log idx 11, resp next 12, new next log idx 10
// Node 1 log (follower logs):
2019-11-12T18:51:13.814_681+08:00 [42d1] [WARN] [LOG XX] req log idx: 11, req log term: 1, my last log idx: 11, my log (11) term: 0 [handle_append_entries.cxx:391, handle_append_entries()]
2019-11-12T18:51:13.814_683+08:00 [42d1] [WARN] deny, req term 1, my term 1, req log idx 11, my log idx 11 [handle_append_entries.cxx:398, handle_append_entries()]
2019-11-12T18:51:13.814_685+08:00 [42d1] [WARN] snp idx 15 term 1 [handle_append_entries.cxx:402, handle_append_entries()]
2019-11-12T18:51:13.814_845+08:00 [8b66] [ERRO] bad log_idx 10 for retrieving the term value, will ignore this log req [raft_server.cxx:1063, term_for_log()]
2019-11-12T18:51:13.814_848+08:00 [8b66] [ERRO] last snapshot 0x7f213c0013c0, log_idx 10, snapshot last_log_idx 15 [raft_server.cxx:1066, term_for_log()]
2019-11-12T18:51:13.814_850+08:00 [8b66] [ERRO] log_store_->start_index() 16 [raft_server.cxx:1068, term_for_log()]
Hi, thank you for the great library!
I'm experimenting with NuRaft in the ClickHouse DBMS. I've implemented a ZooKeeper-like prototype on top of NuRaft and am now running simple tests. In one of them we have three NuRaft servers and repeatedly block them from each other using iptables. After removing the iptables rules, a server built in debug mode aborted with the following error:
2021.01.28 13:51:58.114048 [ 55 ] {} <Fatal> RaftInstance: socket is already in use, race happened on connection to node1:44444
2021.01.28 13:51:58.115156 [ 121 ] {} <Fatal> BaseDaemon: ########################################
2021.01.28 13:51:58.115315 [ 121 ] {} <Fatal> BaseDaemon: (version 21.2.1.1, build id: 0EE2AC6B59408A6EC7E3DA283C8FE6471D28ADFA) (from thread 55) (no query) Received signal Aborted (6)
2021.01.28 13:51:58.115397 [ 121 ] {} <Fatal> BaseDaemon:
2021.01.28 13:51:58.115495 [ 121 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f5aadad418b 0x7f5aadab3859 0x7f5aadab3729 0x7f5aadac4f36 0x1eb92f74 0x1eb8e029 0x1eb9b1db 0x1eba01d0 0x1eba0074 0x1eb9bfb1 0x1eb9ba3b 0x1eb9e388 0x1eb9e345 0x1eb9e322 0x1eb9e2e6 0x1eb9e2b0 0x1eb9e25d 0x1eb9e120 0x1eb9e0c9 0x1eb9ddc4 0x1eb9da0f 0x1eb7caf5 0x1eb9cc02 0x1eb7caf5 0x1eb7bf82 0x1eb7ba8e 0x1eb760ae 0x1eb72937
2021.01.28 13:51:58.115725 [ 121 ] {} <Fatal> BaseDaemon: 4. raise @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115810 [ 121 ] {} <Fatal> BaseDaemon: 5. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115870 [ 121 ] {} <Fatal> BaseDaemon: 6. ? @ 0x25729 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115929 [ 121 ] {} <Fatal> BaseDaemon: 7. ? @ 0x36f36 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.162884 [ 121 ] {} <Fatal> BaseDaemon: 8. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1110: nuraft::asio_rpc_client::set_busy_flag(bool) @ 0x1eb92f74 in /usr/bin/clickhouse
2021.01.28 13:51:58.208419 [ 121 ] {} <Fatal> BaseDaemon: 9. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1016: nuraft::asio_rpc_client::send(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&) @ 0x1eb8e029 in /usr/bin/clickhouse
2021.01.28 13:51:58.254000 [ 121 ] {} <Fatal> BaseDaemon: 10. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1162: nuraft::asio_rpc_client::connected(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>) @ 0x1eb9b1db in /usr/bin/clickhouse
2021.01.28 13:51:58.299755 [ 121 ] {} <Fatal> BaseDaemon: 11. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/type_traits:3617: decltype(*(std::__1::forward<std::__1::shared_ptr<nuraft::asio_rpc_client>&>(fp0)).*fp(std::__1::forward<std::__1::shared_ptr<nuraft::req_msg>&>(fp1), std::__1::forward<std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&>(fp1), std::__1::forward<boost::system::error_code const&>(fp1), std::__1::forward<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>(fp1))) std::__1::__invoke<void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client>&, std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&, void>(void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client>&, std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&) @ 0x1eba01d0 in /usr/bin/clickhouse
2021.01.28 13:51:58.344898 [ 121 ] {} <Fatal> BaseDaemon: 12. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/functional:2857: std::__1::__bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>, __is_valid_bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >::value>::type std::__1::__apply_functor<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void 
(std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, 0ul, 1ul, 2ul, 3ul, 4ul, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >(void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >&, std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul, 4ul>, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>&&) @ 0x1eba0074 in /usr/bin/clickhouse
2021.01.28 13:51:58.390103 [ 121 ] {} <Fatal> BaseDaemon: 13. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/functional:2890: std::__1::__bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>, __is_valid_bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >::value>::type std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void 
(std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&>::operator()<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>(boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&) @ 0x1eb9bfb1 in /usr/bin/clickhouse
2021.01.28 13:51:58.435141 [ 121 ] {} <Fatal> BaseDaemon: 14. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/connect.hpp:565: boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >::operator()(boost::system::error_code, int) @ 0x1eb9ba3b in /usr/bin/clickhouse
2021.01.28 13:51:58.481436 [ 121 ] {} <Fatal> BaseDaemon: 15. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/bind_handler.hpp:66: boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>::operator()() @ 0x1eb9e388 in /usr/bin/clickhouse
2021.01.28 13:51:58.530046 [ 121 ] {} <Fatal> BaseDaemon: 16. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/handler_invoke_hook.hpp:70: void boost::asio::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, ...) @ 0x1eb9e345 in /usr/bin/clickhouse
2021.01.28 13:51:58.575332 [ 121 ] {} <Fatal> BaseDaemon: 17. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void 
(std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&>&) @ 0x1eb9e322 in /usr/bin/clickhouse
2021.01.28 13:51:58.620160 [ 121 ] {} <Fatal> BaseDaemon: 18. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/connect.hpp:614: void boost::asio::detail::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::executor, boost::asio::ip::tcp, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, 
boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >*) @ 0x1eb9e2e6 in /usr/bin/clickhouse
2021.01.28 13:51:58.664685 [ 121 ] {} <Fatal> BaseDaemon: 19. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> > >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, 
boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >&) @ 0x1eb9e2b0 in /usr/bin/clickhouse
2021.01.28 13:51:58.709550 [ 121 ] {} <Fatal> BaseDaemon: 20. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/bind_handler.hpp:108: void boost::asio::detail::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, 
boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>*) @ 0x1eb9e25d in /usr/bin/clickhouse
2021.01.28 13:51:58.754456 [ 121 ] {} <Fatal> BaseDaemon: 21. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> 
>(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&) @ 0x1eb9e120 in /usr/bin/clickhouse
2021.01.28 13:51:58.802236 [ 121 ] {} <Fatal> BaseDaemon: 22. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/io_object_executor.hpp:120: void boost::asio::detail::io_object_executor<boost::asio::executor>::dispatch<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, std::__1::allocator<void> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&&, std::__1::allocator<void> 
const&) const @ 0x1eb9e0c9 in /usr/bin/clickhouse
2021.01.28 13:51:58.847350 [ 121 ] {} <Fatal> BaseDaemon: 23. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_work.hpp:74: void boost::asio::detail::handler_work<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::asio::detail::io_object_executor<boost::asio::executor>, boost::asio::detail::io_object_executor<boost::asio::executor> >::complete<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> 
>(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >&) @ 0x1eb9ddc4 in /usr/bin/clickhouse
2021.01.28 13:51:58.893035 [ 121 ] {} <Fatal> BaseDaemon: 24. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/reactive_socket_connect_op.hpp:102: boost::asio::detail::reactive_socket_connect_op<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::asio::detail::io_object_executor<boost::asio::executor> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) @ 0x1eb9da0f in /usr/bin/clickhouse
2021.01.28 13:51:58.936458 [ 121 ] {} <Fatal> BaseDaemon: 25. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/scheduler_operation.hpp:41: boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long) @ 0x1eb7caf5 in /usr/bin/clickhouse
2021.01.28 13:51:58.980078 [ 121 ] {} <Fatal> BaseDaemon: 26. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/epoll_reactor.ipp:778: boost::asio::detail::epoll_reactor::descriptor_state::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) @ 0x1eb9cc02 in /usr/bin/clickhouse
2021.01.28 13:51:59.025735 [ 121 ] {} <Fatal> BaseDaemon: 27. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/scheduler_operation.hpp:41: boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long) @ 0x1eb7caf5 in /usr/bin/clickhouse
2021.01.28 13:51:59.072552 [ 121 ] {} <Fatal> BaseDaemon: 28. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/scheduler.ipp:447: boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) @ 0x1eb7bf82 in /usr/bin/clickhouse
2021.01.28 13:51:59.116394 [ 121 ] {} <Fatal> BaseDaemon: 29. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/scheduler.ipp:200: boost::asio::detail::scheduler::run(boost::system::error_code&) @ 0x1eb7ba8e in /usr/bin/clickhouse
2021.01.28 13:51:59.160091 [ 121 ] {} <Fatal> BaseDaemon: 30. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/io_context.ipp:63: boost::asio::io_context::run() @ 0x1eb760ae in /usr/bin/clickhouse
2021.01.28 13:51:59.203132 [ 121 ] {} <Fatal> BaseDaemon: 31. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1557: nuraft::asio_service_impl::worker_entry() @ 0x1eb72937 in /usr/bin/clickhouse
2021.01.28 13:52:00.008916 [ 121 ] {} <Fatal> BaseDaemon: Calculated checksum of the binary: F8D26D56372D0BFB6D3F28EA91D86317. There is no information about the reference checksum.
2021.01.28 13:52:08.255602 [ 1 ] {} <Fatal> Application: Child process was terminated by signal 6.
The error is not stable and reproduces rarely. I also found some thread sanitizer alerts:
https://gist.github.com/alesapin/560f5efadbf0e622d725cb7cdca7b7c2 and https://gist.github.com/alesapin/7e68b299a678489a04590f975e08753e.
I made a rough diff between echo_server and bench. How can I modify it to test different payload sizes?
I have a cluster (auto_forwarding_ enabled) with 3 nodes: node1 (priority 3), node2 (priority 2), and node3 (priority 1). When I simulate a network outage for node1, node2 and node3 elect a new leader quite fast and everything continues to work well. But very rarely something strange happens and they cannot make any progress for about 5-10 minutes.
Logs from node2: https://gist.github.com/alesapin/502b6abea98d54cf83eed3b87e7e1aa7 and from node3: https://gist.github.com/alesapin/f0abe77e6f323f9068138d5b26084af8. It seems like node3 asks node2 to append the same log entry (idx 152) again and again; node2 successfully processes the entry and responds to node3, yet node3 receives the response and then makes the same request again.
I'm just trying to understand what is going on. Why does it resolve without any help after 10 minutes? It happens very rarely; maybe it's related to auto_forwarding_? I'll try to reproduce without it.