xiaomi / rdsn

This repository has been migrated to https://github.com/apache/incubator-pegasus/tree/master/rdsn

License: Other

CMake 1.18% Batchfile 0.02% Shell 1.36% Python 0.45% C 0.43% C++ 95.42% Thrift 1.14%
distributed-service-framework

rdsn's Introduction


Please send all pull requests to https://github.com/imzhenyu/rdsn for automatic integration with the latest version. We will periodically update this repo. Thank you.

Top Links

  • [Case] RocksDB made replicated using rDSN!
  • [Tutorial] Build a counter service with built-in tools (e.g., codegen, auto-test, fault injection, bug replay, tracing)
  • [Tutorial] Build a scalable and reliable counter service with built-in replication support
  • [Tutorial] Build a perfect failure detector with progressively added system complexity
  • [Tutorial] Plug in my own network implementation for higher performance
  • Installation

Robust Distributed System Nucleus (rDSN) is a framework for quickly building robust distributed systems. It has a microkernel for pluggable components, including applications, distributed frameworks, devops tools, and local runtime/resource providers, enabling their independent development and seamless integration. The project was originally developed for Microsoft Bing and has since been adopted in production both inside and outside Microsoft.

rDSN can be used as:

  • an enhanced event-driven RPC library (comparable to libevent, Thrift, and gRPC)
  • a production Paxos framework to quickly turn a local component (e.g., rocksdb) into an online service with replication, partitioning, failure recovery, and reconfiguration support
  • a scale-out and fail-over framework for stateless services such as Memcached
  • more, as you can imagine

Key features:
  • reduced system complexity via microkernel architecture: applications, frameworks (e.g., replication, scale-out, fail-over), local runtime libraries (e.g., network libraries, locks), and tools are all pluggable modules in a microkernel, enabling independent development and seamless integration (modules are therefore reusable and transparently benefit each other) (see rDSN Architecture)
  • auto-handled distributed system challenges: built-in frameworks achieve scalability, reliability, availability, consistency, etc. for the applications (see rDSN service model)
  • transparent tooling support: a dedicated tool API for tool development; built-in pluggable tools for understanding, testing, debugging, and monitoring the upper applications and frameworks (see rDSN Architecture)
  • late resource binding with a global deploy-time view: tailor the module instances and their connections on demand, with controllable system complexity and resource mapping (e.g., run all nodes in one simulator for testing, allocate CPU resources appropriately to avoid resource contention, debug with progressively added system complexity) (see rDSN Configuration)
Distributed frameworks
  • a production Paxos framework to quickly turn a local component (e.g., rocksdb) into an online service with replication, partitioning, failure recovery, and reconfiguration support
  • a scale-out and fail-over framework for stateless services such as Memcached
Local runtime libraries
  • network libraries on Linux/Windows supporting rDSN/Thrift/HTTP messages at the same time
  • asynchronous disk IO on Linux/Windows
  • locks, rwlocks, semaphores
  • task queues
  • timer services
  • performance counters
  • loggers (high-perf, screen)
Devops tools
  • nativerun and fastrun enable native deployment on Windows and Linux
  • simulator debugs multiple nodes in a single process without worrying about timeouts
  • explorer extracts task-level dependencies automatically
  • tracer dumps logs for how requests are processed across tasks/nodes
  • profiler shows detailed task-level performance data (e.g., queue-time, exec-time)
  • fault-injector mimics data center failures to expose bugs early
  • global-checker enables cross-node assertion
  • replayer reproduces the bugs for easier root cause analysis
  • built-in web studio to visualize task-level performance and dependency information
Other distributed providers and libraries
  • remote file copy
  • perfect failure detector
  • multi-master perfect failure detector

License and Support

rDSN is provided on Windows and Linux under the MIT open source license. You can use the "issues" tab on GitHub to report bugs.

rdsn's People

Contributors

0xfdi, 0xflotus, acelyc111, andylin-hao, capfei, chenqshmily, empiredan, foreverneverer, gehafearless, giantking, glglwty, goksyli, hycdong, imzhenyu, levy5307, lishenglong, loveheat, mcfatealan, mentoswang, neverchanje, qinzuoyan, sdxshen, shengofsun, smityz, vagetablechicken, xiaotz, ykwd, zhangyifan27, zhongchaoqiang, zjc95


rdsn's Issues

Weekly Digest (17 August, 2019 - 24 August, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 4 issues were created.
Of these, 0 issues have been closed and 4 issues are still open.

OPEN ISSUES

💚 #300 fix: use derror rather than dwarn for failed network bootstrap, by neverchanje
💚 #299 feat(split): parent replica prepare states, by hycdong
💚 #298 feat(throttle): support size-based write throttling, by neverchanje
💚 #297 feat(dup): implement duplication_sync on meta server side, by neverchanje


PULL REQUESTS

Last week, 5 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 3 pull requests were updated.
💛 #299 feat(split): parent replica prepare states, by hycdong
💛 #298 feat(throttle): support size-based write throttling, by neverchanje
💛 #297 feat(dup): implement duplication_sync on meta server side, by neverchanje

MERGED PULL REQUEST

Last week, 2 pull requests were merged.
💜 #296 http: improvement on http api, by neverchanje
💜 #291 split: parent replica create child replica, by hycdong


COMMITS

Last week there were 2 commits.
🛠️ split: parent replica create child replica (#291) by hycdong
🛠️ http: improvement on http api (#296) by neverchanje


CONTRIBUTORS

Last week there were 2 contributors.
👤 hycdong
👤 neverchanje


STARGAZERS

Last week there were no stargazers.


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository XiaoMi/rdsn to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

http server can't handle "Expect: 100-continue"

While adding the CPU profiling feature in #290, we hit a bug where the SVG graph shows function addresses but no function names.

The current workaround is to recommend google/pprof instead of the pprof perl script shipped with gperftools.

The root cause is that the pprof perl script uses the curl command in FetchSymbols.
Because the FetchSymbols request carries a fairly large payload, curl adds "Expect: 100-continue" to the request when the HTTP version is not pinned (with --http1.0 this path is skipped; in practice both HTTP/1.1 and HTTP/2 go through the expect 100-continue handshake).
rdsn's http parser does not support this mechanism.

We can consider supporting it when the rdsn http server is improved in the future.
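
Below is a minimal sketch, assuming a simple header map and a raw send callback (not rdsn's actual http server API), of what supporting this mechanism boils down to: when a request carries "Expect: 100-continue", the server emits an interim "100 Continue" response before the client sends the body.

    #include <functional>
    #include <map>
    #include <string>

    // Hypothetical helper, not part of rdsn: emit the interim response when the
    // client asked for it. Real servers should also treat header names
    // case-insensitively, which is omitted here for brevity.
    void maybe_send_100_continue(const std::map<std::string, std::string> &headers,
                                 const std::function<void(const std::string &)> &send)
    {
        auto it = headers.find("Expect");
        if (it != headers.end() && it->second == "100-continue") {
            // Interim response; the final status line still follows once the body
            // has been received and processed.
            send("HTTP/1.1 100 Continue\r\n\r\n");
        }
    }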

idl: replace bin/Linux/thrift with official thrift compiler

What we need to do:

  • thirdparty/build-thirdparty.sh: build the thrift compiler together with the thrift lib.
  • compile_thrift.py: use the thrift compiler built in thirdparty to generate the thrift code.

Make sure the whole project can still be built via ./run.sh build.

dynamically balance partitions across disks within a replica server

This feature will be needed once partition split (#69) is added to our project.

For example:

  1. we have a cluster with 4 nodes
  2. there is a table with 4 partitions in the cluster
  3. we have 8 disks on each node

Then, ideally, each node will host 3 replicas (with 3-way replication, 4 partitions × 3 replicas / 4 nodes = 3), and these replicas will be placed on 3 disks, leaving the remaining 5 disks empty.

When the table is split, the replica counts on these disks become (2, 2, 2, 0, 0, 0, 0, 0), which is not balanced.

In addition, we may also need a command to move partitions from one disk to another.
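
As a rough illustration of the placement policy this implies, here is a hedged sketch (names are invented, not rdsn's real disk-management API) that picks the data directory currently holding the fewest replicas when a new replica, e.g. a child produced by split, is created:

    #include <algorithm>
    #include <string>
    #include <vector>

    // Hypothetical view of one data directory and how many replicas it holds.
    struct disk_dir
    {
        std::string path;
        int replica_count = 0;
    };

    // Pick the least-loaded disk, or nullptr if there are no disks configured.
    const disk_dir *pick_least_loaded_disk(const std::vector<disk_dir> &disks)
    {
        if (disks.empty())
            return nullptr;
        return &*std::min_element(disks.begin(), disks.end(),
                                  [](const disk_dir &a, const disk_dir &b) {
                                      return a.replica_count < b.replica_count;
                                  });
    }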

Weekly Digest (6 October, 2019 - 13 October, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 2 issues were created.
Of these, 2 issues have been closed and 0 issues are still open.

CLOSED ISSUES

❤️ #327 fix(backup): delay clearing obsoleted backup when it's still checkpointing, by vagetablechicken
❤️ #326 refactor: remove useless functions from binary_reader, by neverchanje

NOISY ISSUE

🔈 #327 fix(backup): delay clearing obsoleted backup when it's still checkpointing, by vagetablechicken
It received 2 comments.


PULL REQUESTS

Last week, 6 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 1 pull request was updated.
💛 #320 feat(dup): protect private log from missing when duplication is enabled, by neverchanje

MERGED PULL REQUEST

Last week, 5 pull requests were merged.
💜 #327 fix(backup): delay clearing obsoleted backup when it's still checkpointing, by vagetablechicken
💜 #326 refactor: remove useless functions from binary_reader, by neverchanje
💜 #324 refactor: move replay related codes to mutation_log_replay, by neverchanje
💜 #323 refactor: move aio tests out from service_api_c, by neverchanje
💜 #321 feat(http): add http interface for get_app_envs, by levy5307


COMMITS

Last week there were 5 commits.
🛠️ refactor: move aio tests out from service_api_c (#323) by neverchanje
🛠️ refactor: remove useless functions from binary_reader (#326) by neverchanje
🛠️ fix(coldbackup): delay clean request when chkpting (#327) by vagetablechicken
🛠️ feat(http): add http interface for get_app_envs (#321) by levy5307
🛠️ refactor: move replay related codes to mutation_log_replay (#324) by neverchanje


CONTRIBUTORS

Last week there were 3 contributors.
👤 neverchanje
👤 vagetablechicken
👤 levy5307


STARGAZERS

Last week there were no stargazers.


RELEASES

Last week there were no releases.



don't build shared libraries

Currently replica_server, meta_server, Poco, and boost are built as dynamic libraries; we can change them to static libs for easier deployment.
If we change all libraries to static libraries, we can link tcmalloc and try to resolve issue #28.

bug of shared log's size calculation

The shared log's size is calculated as "global_end_offset - global_start_offset". When all shared log files are deleted, "global_start_offset" is reset to 0 while "global_end_offset" remains unchanged, so we get an incorrect "size" for the shared log.

Please see where "_global_start_offset" is assigned for details.

Setting "_global_start_offset" to "_global_end_offset" may fix this, but we need to test it.

docs: update the official docs

Because rdsn originated as a Microsoft project, we have hardly modified the documentation since we open-sourced Xiaomi's internal fork of rdsn last year. XiaoMi/rdsn has now moved far ahead of microsoft/rdsn, which is barely maintained by its original author (the latest commit is from Aug 2017), so most of the documents are stale (e.g., the Windows support).

We need to bring the documentation up to date with the actual current state of the project.

Weekly Digest (8 September, 2019 - 15 September, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 4 issues were created.
Of these, 0 issues have been closed and 4 issues are still open.

OPEN ISSUES

💚 #312 feat(dup): implement ship_mutation stage and mutation_batch, by neverchanje
💚 #311 refactor: rename disk_aio to aio_context, by neverchanje
💚 #310 refactor: remove empty_aio_provider and posix aio_provider, by neverchanje
💚 #309 feat(split): child replica learn parent prepare list and checkpoint, by hycdong


PULL REQUESTS

Last week, 5 pull requests were created, updated or merged.

OPEN PULL REQUEST

Last week, 1 pull request was opened.
💚 #312 feat(dup): implement ship_mutation stage and mutation_batch, by neverchanje

UPDATED PULL REQUEST

Last week, 4 pull requests were updated.
💛 #311 refactor: rename disk_aio to aio_context, by neverchanje
💛 #310 refactor: remove empty_aio_provider and posix aio_provider, by neverchanje
💛 #309 feat(split): child replica learn parent prepare list and checkpoint, by hycdong
💛 #302 refactor: introduce mutation_log::replay_block, by neverchanje


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there was 1 stargazer.
netroby
You are the star! 🌟


RELEASES

Last week there were no releases.



Weekly Digest (1 September, 2019 - 8 September, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 1 issue was created.
It is closed now.

CLOSED ISSUES

❤️ #307 build: use official thrift, cmake set default 3rdlibs, curl without idn, by vagetablechicken

NOISY ISSUE

🔈 #307 build: use official thrift, cmake set default 3rdlibs, curl without idn, by vagetablechicken
It received 1 comment.


PULL REQUESTS

Last week, 6 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 2 pull requests were updated.
💛 #302 refactor: reimplement mutation_log::replay in a block by block way, by neverchanje
💛 #298 feat(throttle): support size-based write throttling, by neverchanje

MERGED PULL REQUEST

Last week, 4 pull requests were merged.
💜 #307 build: use official thrift, cmake set default 3rdlibs, curl without idn, by vagetablechicken
💜 #305 test: add unit tests for task, by neverchanje
💜 #304 feat(dup): add interface mutation_duplicator & duplication procedure, by neverchanje
💜 #299 feat(split): parent replica prepare states, by hycdong


COMMITS

Last week there were 4 commits.
🛠️ feat(split): parent replica prepare states (#299) by hycdong
🛠️ feat(dup): add interface mutation_duplicator & duplication procedure (#304) by neverchanje
🛠️ build: change thrift compiler, set default 3rdlibs (#307) by vagetablechicken
🛠️ add unit tests for task (#305) by neverchanje


CONTRIBUTORS

Last week there were 3 contributors.
👤 hycdong
👤 neverchanje
👤 vagetablechicken


STARGAZERS

Last week there was 1 stargazer.
IngrownMink4
You are the star! 🌟


RELEASES

Last week there were no releases.



remove shared log

The shared log is mainly used for sequential writes to disk, which is an excellent design for spinning disks, or for SSDs when "fsync" is necessary.
However, currently in rdsn the data is stored on SSDs without "fsync", so we can try to remove the shared log for better performance and a simpler architecture.

Weekly Digest (29 September, 2019 - 6 October, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 2 issues were created.
Of these, 0 issues have been closed and 2 issues are still open.

OPEN ISSUES

💚 #324 refactor: remove replay related codes to mutation_log_replay, by neverchanje
💚 #323 refactor: rename task::is_empty to is_callback_empty & move aio tests…, by neverchanje


PULL REQUESTS

Last week, 4 pull requests were created, updated or merged.

OPEN PULL REQUEST

Last week, 2 pull requests were opened.
💚 #324 refactor: remove replay related codes to mutation_log_replay, by neverchanje
💚 #323 refactor: rename task::is_empty to is_callback_empty & move aio tests…, by neverchanje

UPDATED PULL REQUEST

Last week, 2 pull requests were updated.
💛 #321 feat(http): add http interface for get_app_envs, by levy5307
💛 #320 feat(dup): protect private log from missing when duplication is enabled, by neverchanje


COMMITS

Last week there were no commits.


CONTRIBUTORS

Last week there were no contributors.


STARGAZERS

Last week there were no stargazers.


RELEASES

Last week there were no releases.



core in replica::update_local_configuration()

Core background

Time: 2018-05-11 01:52
Cluster: c3srv-stat
Node: 10.142.11.54
Version: 1.8.0 (dba2265cf29435729fa6d2a1e4e3e22b71b7d74f) Release
Core file: /home/core/core.replica.replica.98100.1525974558
Log file: /home/work/app/pegasus/c3srv-stat/replica/log/log.495.txt

Core stack

Core was generated by `/home/work/app/pegasus/c3srv-stat/replica/package/bin/pegasus_server config.ini'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fadb38741d7 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fadb38741d7 in raise () from /lib64/libc.so.6
#1  0x00007fadb38758c8 in abort () from /lib64/libc.so.6
#2  0x00007fadb685fd8e in dsn_coredump () at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/service_api_c.cpp:176
#3  0x00007fadb67999d0 in dsn::replication::replica::update_local_configuration (this=this@entry=0x7fac401658d0, config=..., same_ballot=same_ballot@entry=true)
    at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_config.cpp:738
#4  0x00007fadb67f3d5a in dsn::replication::replica::on_add_learner (this=0x7fac401658d0, request=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_learn.cpp:1377
#5  0x00007fadb67687ed in dsn::replication::replica_stub::on_add_learner (this=0x1c36770, request=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_stub.cpp:973
#6  0x00007fadb678406d in bool dsn::serverlet<dsn::replication::replica_stub>::register_rpc_handler<dsn::replication::group_check_request>(dsn::task_code, char const*, void (dsn::replication::replica_stub::*)(dsn::replication::group_check_request const&), dsn::gpid)::{lambda(void*, void*)#1}::_FUN(void*, void*) () at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/include/dsn/cpp/serverlet.h:184
#7  0x00007fadb68ba157 in run (req=<optimized out>, this=0x7fac640069b0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/include/dsn/tool-api/task.h:249
#8  dsn::rpc_request_task::exec (this=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/include/dsn/tool-api/task.h:276
#9  0x00007fadb68b95b9 in dsn::task::exec_internal (this=this@entry=0x7fa253c2ac44) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:195
#10 0x00007fadb694a8ed in dsn::task_worker::loop (this=0x1ba5b00) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:323
#11 0x00007fadb694aab9 in dsn::task_worker::run_internal (this=0x1ba5b00) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:302
#12 0x00007fadb41cc600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#13 0x00007fadb4a45dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fadb393673d in clone () from /lib64/libc.so.6
(gdb) f 4
#4  0x00007fadb67f3d5a in dsn::replication::replica::on_add_learner (this=0x7fac401658d0, request=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_learn.cpp:1377
1377	in /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_learn.cpp
(gdb) p this._config
$4 = {_vptr.replica_configuration = 0x7fadb6c46570 <vtable for dsn::replication::replica_configuration+16>, pid = {_value = {u = {app_id = 8, partition_index = 5}, value = 21474836488}}, ballot = 27, 
  primary = {_addr = {u = {v4 = {type = 1, padding = 0, port = 43801, ip = 177081397}, uri = {type = 1, uri = 190139702928883712}, group = {type = 1, group = 190139702928883712}, 
        value = 760558811715534849}}}, status = dsn::replication::partition_status::PS_POTENTIAL_SECONDARY, learner_signature = 115964116994, __isset = {pid = true, ballot = true, primary = true, 
    status = true, learner_signature = true}}

Code at replica_config.cpp:738:

dassert(false, "invalid execution path");

replica closing makes invalid access to _app

Cause

A replica that hits an error needs to close, and replica::close() closes _app.
But other threads (gc / manual_compact / checkpoint) may still access _app, which leads to a core dump.
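
One possible direction, shown only as a hedged sketch with simplified types (not the real replica class), is to have the background tasks take the _app pointer under a lock and skip the work when the replica is closing, instead of dereferencing a member that replica::close() may be tearing down concurrently:

    #include <memory>
    #include <mutex>

    struct app_base
    {
        long last_durable_decree() const { return 0; } // placeholder
    };

    struct replica_like
    {
        std::mutex _app_lock;
        std::shared_ptr<app_base> _app; // reset by close()
        bool _closing = false;

        // Called from gc / manual_compact / checkpoint threads.
        long safe_last_durable_decree()
        {
            std::lock_guard<std::mutex> guard(_app_lock);
            if (_closing || !_app)
                return -1; // caller skips this replica
            return _app->last_durable_decree();
        }
    };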

Core stack

(gdb) bt
#0  dsn::replication::replica::last_durable_decree (this=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica.cpp:357
#1  0x00007f0215f1760f in dsn::replication::replica_stub::on_gc (this=0x1cd9a70) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_stub.cpp:1488
#2  0x00007f021606c28e in dsn::timer_task::exec (this=0x7f00c0006140) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:479
#3  0x00007f021606d199 in dsn::task::exec_internal (this=this@entry=0x7f00c0006140) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:195
#4  0x00007f02160ff38d in dsn::task_worker::loop (this=0x1c2e320) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:323
#5  0x00007f02160ff559 in dsn::task_worker::run_internal (this=0x1c2e320) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:302
#6  0x00007f0213967600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#7  0x00007f02141e0dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f02130d173d in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007f0215f1760f in dsn::replication::replica_stub::on_gc (this=0x1cd9a70) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_stub.cpp:1488
1488	in /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_stub.cpp
(gdb) p r._obj._config
$5 = {_vptr.replica_configuration = 0x7f0216403ed0 <vtable for dsn::replication::replica_configuration+16>, pid = {_value = {u = {app_id = 2, partition_index = 3124}, value = 13417477832706}}, 
  ballot = 4, primary = {static s_invalid_address = {static s_invalid_address = <same as static member of an already seen type>, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, 
          uri = 0}, group = {type = 0, group = 0}, value = 0}}, _addr = {v4 = {type = 1, padding = 0, port = 34801, ip = 177090572}, uri = {type = 1, uri = 190149554362662912}, group = {type = 1, 
        group = 190149554362662912}, value = 760598217450651649}}, status = dsn::replication::partition_status::PS_ERROR, learner_signature = 17179869186, __isset = {pid = true, ballot = true, 
    primary = true, status = true, learner_signature = true}}
(gdb)

Related logs:

D2018-05-15 07:08:35.530 (1526339315530452861 123eb) replica.rep_long4.0405000208ad6b5f: replica_learn.cpp:936:on_copy_remote_state_completed(): [email protected]:34801: on_copy_remote_state_completed[0000000400000002]: learnee = 10.142.48.12:34801, learn_duration = 292299 ms, copy remote state done, err = ERR_TIMEOUT, copy_file_count = 2, copy_file_size = 0, copy_time_used = 29260 ms, local_committed_decree = 2871353, app_committed_decree = 2871353, app_durable_decree = 2869256, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
D2018-05-15 07:08:35.530 (1526339315530511263 123b9) replica.replica2.040700040000bee7: replica_learn.cpp:1147:on_learn_remote_state_completed(): [email protected]:34801: on_learn_remote_state_completed[0000000400000002]: learnee = 10.142.48.12:34801, learn_duration = 292299 ms, err = ERR_TIMEOUT, local_committed_decree = 2871353, app_committed_decree = 2871353, app_durable_decree = 2869256, current_learning_status = replication::learner_status::LearningWithoutPrepare
E2018-05-15 07:08:35.530 (1526339315530524828 123b9) replica.replica2.040700040000bee7: replica_learn.cpp:1170:handle_learning_error(): [email protected]:34801: handle_learning_error[0000000400000002]: learnee = 10.142.48.12:34801, learn_duration = 292299 ms, err = ERR_TIMEOUT, local_error
D2018-05-15 07:08:35.530 (1526339315530639621 123b9) replica.replica2.040700040000bee7: replica_config.cpp:900:update_local_configuration(): [email protected]:34801: status change replication::partition_status::PS_POTENTIAL_SECONDARY @ 4 => replication::partition_status::PS_ERROR @ 4, pre(2871353, 2871353), app(2871353, 2869256), duration = 292300 ms, replica_configuration(pid=2.3124, ballot=4, primary=10.142.48.12:34801, status=2, learner_signature=17179869186)
D2018-05-15 07:08:35.530 (1526339315530650745 123b9) replica.replica2.040700040000bee7: replica_config.cpp:909:update_local_configuration(): [email protected]:34801: being close ...

json decode problem

Using a Shell built at the current version (9377c4c) to access an old-version (<=1.8.1) cluster:

>>>app_disk test -d
[Parameters]
app_name: test
detailed: true

[Result]
ERROR: decode perf counter info from node 10.132.42.1:35801 failed, result = {"result":"OK","timestamp":1528639103,"timestamp_str":"2018-06-10 21:58:23","counters":[{"name":"replica*app.pegasus*disk.storage.sst(MB)@1.13","type":"NUMBER","value":0},{"name":"replica*app.pegasus*disk.storage.sst(MB)@1.2","type":"NUMBER","value":0},{"name":"replica*app.pegasus*disk.storage.sst(MB)@1.3","type":"NUMBER","value":0},{"name":"replica*app.pegasus*[email protected]","type":"NUMBER","value":9},{"name":"replica*app.pegasus*[email protected]","type":"NUMBER","value":10},{"name":"replica*app.pegasus*[email protected]","type":"NUMBER","value":2}]}

partition split

partition split

The main consideration

The split action should preferably take effect at the same decree on the primary and the secondaries. That way, for any given write request, we can clearly determine whether the parent partition or the child partition is responsible for it.

If no fixed decree is specified, a given request may be handled by different partitions on different replicas, which gets messy.
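
The ownership rule implied here can be sketched as follows, assuming (as in Pegasus) that keys are routed by hash modulo the partition count and that a split doubles the partition count; this is an illustration, not the actual split implementation:

    #include <cstdint>

    // Which partition index owns a key under a given partition count.
    int owning_partition(uint64_t key_hash, int partition_count)
    {
        return static_cast<int>(key_hash % partition_count);
    }

    // After a split from old_partition_count to 2 * old_partition_count,
    // indexes >= old_partition_count belong to the newly created children.
    // Writes before the split decree are applied by the parent; writes after
    // it can be routed with this check.
    bool child_owns_key(uint64_t key_hash, int old_partition_count)
    {
        return owning_partition(key_hash, old_partition_count * 2) >= old_partition_count;
    }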

One approach: 2PC split

If we turn the split action into a mutation and run it through the consensus protocol, the replicas can reach agreement on exactly which decree the split happens at. In fact this is exactly what the consensus protocol does: it maintains the replicated state machine between primary and secondaries, so treating the split as just another state change between them is very natural.

The mutation-replay problem

  1. For a split message that has already been committed, we only need to require that the message be flushed/durable. Then a replica will not replay this mutation when it restarts.

  2. If the replica group's configuration changes, the mutation will not be replayed either. Our consensus protocol guarantees this:

    • An uncommitted mutation only gets its ballot bumped and goes through 2PC again, overwriting the original message.
    • A committed message will not be replayed because of a replica-group change.
  3. Even setting that aside: if every split mutation carries a partition version, i.e. "this split expects the partition count to change from X to Y", then even if a mutation were executed twice it seemingly would have no effect, because when applying the mutation we can check whether the split is still necessary.

    Moreover, thanks to the consensus protocol, even if this mutation is replayed, its behavior is exactly the same across the three replicas.

How the primary applies the "split mutation"

  1. First, like every write mutation, it is prepared; once all replicas have prepared it, the primary can commit it.

  2. At commit time, the replica must:

    • initialize a new, empty replica
    • clone the storage_engine; for rocksdb this means creating hardlinks
    • clone the private log
    • clone all replica-related data structures: replica config, primary context, secondary context, cold_backup_context, replica_config
    • the tasks referenced by those contexts need a more careful discussion: regular recurring tasks should be restarted as new tasks (e.g., the various timers, cold-backup RPCs, duplication tasks), while some other tasks should simply be cleared

    The replica's commit procedure is a bit like forking a process: copy the memory image and adjust the kernel data structures. What we need to do is copy the data and adjust the various control behaviors.

    This commit procedure may look tedious, but whatever split scheme we use, once one replica is divided into two, we need to think carefully about how all the corresponding control behaviors should change.

    This is also why I like 2PC: splitting at exactly the same decree lets us reason about and control the behavior of the parent and child before and after the split.

  3. After the primary commits, the split is finished.

How a secondary applies the "split mutation"

The secondary's split procedure is basically the same as the primary's. But because this is 2PC, when the primary has finished the split the secondaries have not split yet.

Only when another mutation is issued afterwards will the secondaries' commit be advanced. That mutation may be an ordinary write or a group check, but it must be issued by the parent; if it came from the child, it would be rejected because the other replicas do not have that partition yet. We need to handle this case specially:

  • In the prepare phase, if a partition is reported missing because its partition version is too low, the primary should delay and retry instead of kicking out the secondary.

Other issues during split

  1. We need to tweak clientlet so that the child partition is marked as not yet accessed by any thread, because the parent and the child may be dispatched to different thread pools.

Replica-group changes during a split

Because all state changes of a replica are linearized, we can discuss the cases precisely:

  • If the change happens before the split, it has nothing to do with the split.
  • If the change happens after the split:
    • First, the split has already been committed, so no replay will happen.
    • Second, if a secondary is promoted to primary, we need to consider whether this mutation has been committed on that secondary. If it has, there is no impact; if it has not, committing it will make that secondary split as well.

A learner from before the split and a primary from after the split

Imagine a scenario where the learner's decree is from before the split while the primary is already past it. In this case a naive learn would leave the learner unable to handle the "split mutation".

We can handle this simply: if the partition versions do not match, force a "learn from scratch".

Another option is to skip the split control command.

Interaction with the meta server

The meta server can send the split message either through config sync or via a separate command.

When the primary finishes the split, it needs to notify the meta server that the split is complete. The meta server then updates remote storage, refreshing both the parent's and the child's partition configuration; before that point, the child's partition configuration is empty.

During config sync, if some partitions are missing on a replica server because the partition versions do not match, that can be ignored for the moment. Since the split mutation has gone through two-phase commit, it is guaranteed to take effect and will not be lost.

Interaction with the client

This should stay consistent with the previous design.

Optimizations

  1. The replica clone blocking the write thread pool: this should be doable on a background thread.
  2. Keep writes flowing during the split: we only need to ensure that mutations after the split are temporarily not committed.
    • We could relax the read/write consistency requirement: reply success to the client as soon as all three replicas have prepared (implementing this now may require interface changes).

Another approach: split as a control command (TBD)

A rough thought: without a synchronization point, the split gets messy during prepare. And if it also follows a "notify the secondaries and wait for replies" flow, it is not much different from 2PC anyway.

We also need to consider the cases where the primary finishes the split, notifies the meta server to modify the partition config, and then crashes.

Discussion of how the MetaServer handles partitions in DDD state

Background

The following situation occasionally occurs in production:

  1. The cluster is healthy. Suppose some partition's config is as follows (the first node is the primary):
    Config[C,B,A], LastDrop[]
  2. ReplicaServer node A goes down (full disk, bad disk, machine crash, etc.). Because of the replica_assign_delay_ms_for_dropouts limit, the MetaServer does not immediately add a replacement replica:
    Config[C,B], LastDrop[A]
  3. Another ReplicaServer node B also goes down (possibly because of a full disk, a core dump during learn, etc.). The partition now has only one copy left, and the MetaServer immediately adds a secondary on a new node. If there are many replicas or a large amount of data, the learn process may be slow, so for a while the partition has only one replica:
    Config[C], LastDrop[A,B]
  4. Node A restarts successfully, the ReplicaServer recovers, and a replica is added back:
    Config[C,A], LastDrop[B]
  5. If the cluster is stopped at this point, then because the order in which nodes stop is nondeterministic, the config the MetaServer finally records in Zookeeper may be:
    Config[], LastDrop[B,A,C]
  6. For some reason (perhaps corrupted data) node A has to be kicked out, so the whole cluster is restarted without node A. The partition now enters DDD state, but the last two nodes in LastDrop are A and C. Under the current policy, a primary can only be elected after both of the last two nodes in LastDrop have come back. Since A cannot be started, the error "last dropped node A hasn't come back yet" appears, and manual operator intervention is required.

The above is just one example. In fact, whenever a partition enters DDD state and one of the last two nodes in LastDrop cannot be started normally, the partition ends up in a DDD state that requires manual intervention. With the nodes of a production cluster being started and stopped repeatedly, this situation is easy to run into.

If there are many such partitions, manual intervention becomes a heavy workload; recovering them one by one by hand is not realistic. So we should consider whether the recovery process for this situation can be automated.

If, of the last two nodes in LastDrop, one has recovered and the other is the node being kicked out, recovery can actually be automated, e.g., by directly choosing the recovered node as the primary (a sketch of this check appears after this section). But we need to prove that such a choice definitely does not lose data.

Or, as a fallback, even without full automation, we could provide an automatic diagnosis tool in the Shell that lists the partitions stuck in DDD state needing manual intervention, suggests the operator action, and lets the user confirm and pick a suitable plan. That would greatly reduce the operational workload while still guaranteeing data correctness.
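
A hedged sketch of the diagnosis rule proposed above, with illustrative types only (not the MetaServer's real code): for a DDD partition, if exactly one of the last two dropped nodes is alive and the other is the node being decommissioned, the alive one can be suggested as the new primary, subject to the operator's confirmation.

    #include <algorithm>
    #include <string>
    #include <vector>

    struct ddd_partition
    {
        std::vector<std::string> last_drops; // oldest ... newest
    };

    bool suggest_primary(const ddd_partition &p,
                         const std::vector<std::string> &alive_nodes,
                         const std::string &decommissioned_node,
                         std::string &primary_out)
    {
        if (p.last_drops.size() < 2)
            return false;
        const std::string &a = p.last_drops[p.last_drops.size() - 2];
        const std::string &b = p.last_drops.back();
        auto alive = [&](const std::string &n) {
            return std::find(alive_nodes.begin(), alive_nodes.end(), n) != alive_nodes.end();
        };
        // One of the last two dropped nodes is back, the other is the node we
        // intend to kick out: propose the recovered node as primary.
        if (alive(a) && b == decommissioned_node) {
            primary_out = a;
            return true;
        }
        if (alive(b) && a == decommissioned_node) {
            primary_out = b;
            return true;
        }
        return false;
    }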

Refactoring the rdsn perf-counter lib

Problems to solve:

1. remote commands

The remote commands related to perf counters are currently spread across two classes, pegasus_counter_updater and perf_counters; without cleaning up the code, newcomers have no way to discover this.

2. The perf-counter library's code is scattered across many places in both the pegasus and rdsn projects

Here is how the related code is spread around:

  • pegasus/src/pegasus_counter_updater.h
  • rdsn/src/core/tools/common
  • rdsn/src/core/core/perf_counters.cpp
  • rdsn/include/tool-api/perf_counters.h

As a newcomer it is hard to trace a concrete call path through this tangle. We need to consolidate the monitoring-related code under a single path.

3. pegasus_io_service and shared_io_service:

Because the original developers were not comfortable with rdsn's task model, they used boost io_service to write timer code that periodically sends requests to the falcon agent and periodically updates percentile_value.

4. One perf counter interface, and four damned perf counter implementations...

  • simple_perf_counter_v2_atomic (src/core/tools/common)
  • simple_perf_counter_v2_fast (src/core/tools/common)
  • simple_perf_counter (src/core/tools/common)
  • pegasus_perf_counter (pegasus/src/server), the implementation we currently use

These four implementations do not actually give us any room to tune: apart from pegasus_perf_counter, the others all have various problems. Keeping four implementations only makes the code harder to understand.

5. Some code-coupling problems, including the pegasus_counter_updater class being responsible for too much.

It is currently responsible for:

  • HTTP communication via libevent
  • building falcon requests
  • sending requests to the designated falcon agent
  • monitoring memory usage via mem_stats (who knows why pegasus_counter_updater is in charge of memory monitoring)
  • updating the brief server stats

This class takes on too much work; that is not good software design.

6. Generic perf counter types

When a perf counter is defined today, the code does not show what type it actually uses:

     ::dsn::perf_counter_wrapper _pfc_scan_qps;

Of course the variable name suggests it is a counter, but only by reading the initialization code do we learn that it is of COUNTER_TYPE_RATE type:

    _pfc_scan_qps.init_app_counter(
        "app.pegasus", buf, COUNTER_TYPE_RATE, "statistic the qps of SCAN request");

It is hard to explain to newcomers what a "counter rate" type is. One option is to write documentation; another is to adopt industry-standard naming from the start:

Meters

A meter measures the rate of events over time (e.g., “requests per second”).
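
For illustration, here is a minimal sketch of what such a meter (roughly what COUNTER_TYPE_RATE provides) boils down to, written independently of rdsn's perf_counter interface: increments accumulate, and a periodic sampler converts the delta into events per second.

    #include <atomic>
    #include <chrono>
    #include <cstdint>

    class meter
    {
    public:
        // Record n events; safe to call from many threads.
        void mark(int64_t n = 1) { _count.fetch_add(n, std::memory_order_relaxed); }

        // Called by a periodic sampler; returns events per second since the
        // previous call.
        double sample()
        {
            auto now = std::chrono::steady_clock::now();
            int64_t count = _count.load(std::memory_order_relaxed);
            double seconds = std::chrono::duration<double>(now - _last_time).count();
            double rate = seconds > 0 ? (count - _last_count) / seconds : 0.0;
            _last_time = now;
            _last_count = count;
            return rate;
        }

    private:
        std::atomic<int64_t> _count{0};
        int64_t _last_count = 0;
        std::chrono::steady_clock::time_point _last_time = std::chrono::steady_clock::now();
    };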

core in dsn::thrift_message_parser::parse_message

time: 20190717 11:39:15
cluster: tjwqtst-staging
node: tj1-hadoop-pegasus-tst-ts09
version: 1.11.5
CALL [replica-server] [10.38.162.236:31801] succeed: Pegasus Server 1.11.5 (ba0661d17a96143164d7a0a5c17bb88c0c1dd44d) Release, Started at 2019-07-17 11:39:15
core file: core.replica.asio.3.120767.1563334709
code:
https://github.com/XiaoMi/rdsn/blob/709ea4117fd31b2bd2788dddb1b41f94e8307210/src/core/tools/common/thrift_message_parser.cpp
call stack:

#0  0x00007f19d0d7d1d7 in raise () from /lib64/libc.so.6
#1  0x00007f19d0d7e8c8 in abort () from /lib64/libc.so.6
#2  0x00007f19d48b743e in dsn_coredump () at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:73
#3  0x00007f19d493e843 in dsn::thrift_message_parser::parse_message (thrift_header=..., message_data=...)
    at /home/wutao1/pegasus-release/rdsn/src/core/tools/common/thrift_message_parser.cpp:275
#4  0x00007f19d493eb13 in dsn::thrift_message_parser::get_message_on_receive (this=0x50a07ee10, reader=0x59bc4a80, read_next=@0x7f19bc46e04c: 4096)
    at /home/wutao1/pegasus-release/rdsn/src/core/tools/common/thrift_message_parser.cpp:72
#5  0x00007f19d495a5ff in operator() (length=<optimized out>, __closure=0x7f19bc46e090, ec=...) at /home/wutao1/pegasus-release/rdsn/src/core/tools/common/asio_rpc_session.cpp:114
#6  operator() (this=0x7f19bc46e090) at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/bind_handler.hpp:127
#7  asio_handler_invoke<boost::asio::detail::binder2<dsn::tools::asio_rpc_session::do_read(int)::__lambda2, boost::system::error_code, long unsigned int> > (function=...)
    at /home/wutao1/boost_1_58_0/output/include/boost/asio/handler_invoke_hook.hpp:69
#8  invoke<boost::asio::detail::binder2<dsn::tools::asio_rpc_session::do_read(int)::__lambda2, boost::system::error_code, long unsigned int>, dsn::tools::asio_rpc_session::do_read(int)::__lambda2> (context=..., function=...) at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#9  boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, dsn::tools::asio_rpc_session::do_read(int)::__lambda2>::do_complete(boost::asio::detail::io_service_impl *, boost::asio::detail::operation *, const boost::system::error_code &, std::size_t) (owner=<optimized out>, base=<optimized out>)
    at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/reactive_socket_recv_op.hpp:110
#10 0x000000000074fec9 in complete (bytes_transferred=<optimized out>, ec=..., owner=..., this=<optimized out>)
    at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/task_io_service_operation.hpp:38
#11 do_run_one (ec=..., this_thread=..., lock=..., this=0x29c4620) at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/impl/task_io_service.ipp:372
#12 boost::asio::detail::task_io_service::run (this=0x29c4620, ec=...) at /home/wutao1/boost_1_58_0/output/include/boost/asio/detail/impl/task_io_service.ipp:149
#13 0x00007f19d4952cc6 in run (this=<optimized out>, ec=...) at /home/wutao1/boost_1_58_0/output/include/boost/asio/impl/io_service.ipp:66
#14 operator() (__closure=0x299d930) at /home/wutao1/pegasus-release/rdsn/src/core/tools/common/asio_net_provider.cpp:73
#15 _M_invoke<> (this=0x299d930) at /home/wutao1/app/include/c++/4.8.2/functional:1732
#16 operator() (this=0x299d930) at /home/wutao1/app/include/c++/4.8.2/functional:1720
#17 std::thread::_Impl<std::_Bind_simple<dsn::tools::asio_network_provider::start(dsn::rpc_channel, int, bool)::__lambda2()> >::_M_run(void) (this=0x299d918)
    at /home/wutao1/app/include/c++/4.8.2/thread:115
#18 0x00007f19d16d5600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#19 0x00007f19d2348dc5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f19d0e3f73d in clone () from /lib64/libc.so.6

Code refactoring and simplification

Done

  • layer2 app model: d1df52f
  • c api of perf counter: a55bdca
  • c api of CLI: 87194db
  • c api of layer1/global checker; dmodule: d8ac3cf
  • thread pool code
  • error code
  • task code
  • gpid
  • build process
  • rpc address: #4
  • memory allocator: #25
  • rpc: #27
  • task tracker: #41
  • task: #48
  • future task: #53
  • configuration: #54
  • rename: #57
  • memory provider: #60
  • io-per-queue: #62 #93
  • rpc session: #116
  • fastrun: #117
  • perf counter: #119
  • clientlet: #121
  • test suites: #127
  • perf counter: #134
  • nfs: #142
  • file system: #148
  • rpc message: #151
  • partition resolver: #205

TBD

  • Move the CLI interface from core into dist (#156)
  • Remove protobuf support from serialization, so that all the xxx.types.h headers can be deleted and only _types.h is kept (#161, #164)
  • Remove tools/repli (#162)
  • Split the random-number interface out into utility (#163)
  • Split the clock interface out into utility
  • Merge the logger into utility, so that utility can become a library independent of rdsn
  • Remove some stale features: start_delay, and local_hash in rpc_message
  • Completely remove the memory provider and switch to tcmalloc
  • Split the C interfaces of task, rpc, and file into three modules (concurrent, rpc, aio) to remove the coupling between them
  • Move the uri resolver out of the rpc engine and into the client library
  • Remove support for group address and turn the rpc engine into a generic, easy-to-understand rpc library
  • Clean up the bootstrap process: at a minimum, move thread-pool creation from the configuration file into source code, and work towards making task a general-purpose thread-pool library
  • Merge failure_detector and failure_detector_multimaster
  • Clean up the register_component_provider mechanism: put each registration in its own module instead of piling them up in tool_api.h
  • Measure the performance impact and, if possible, delete the shared log
  • Sort out last_durable_decree, last_committed_decree, and last_flushed_decree
  • Sort out the various odd workarounds added for table deletion
  • Put simple kv and rocksdb into a common directory as storage_engines, pushing towards the merge of pegasus and rDSN; keeping pegasus and rdsn separate is not good for the project's development or for PRs
  • Clean up the C++ client: try to use rdsn's task mechanism in the asynchronous interface layer; enable conditional compilation to shrink the client library; if possible use SWIG to generate all client SDKs uniformly, or reduce the C++ client to a simple C API that can support other languages such as PHP
  • Reorganize the tests so that tests live closer to the logic code
  • Introduce the various sanitizers, hook up valgrind, and add code-coverage tooling
  • Reorganize the thrift interfaces, separating them by meta_server, replica_server, base, etc.
  • Use rpc_holder to simplify the related logic code
  • Clean up the build process so that thrift-generated code goes under the builder directory instead of being checked in as source files
  • Build the official thrift compiler directly rather than downloading a hacked-up compiler
  • Rename the C++ gpid to gpid_t
  • Reduce the invasive modifications to rocksdb
  • Replace simple_logger with https://github.com/gabime/spdlog

bug(rpc): server core when close socket in thrift_message_parser::get_message_on_receive

core

cluster: tjwqtst-staging
node: 10.38.162.227
version: Pegasus Server 1.9.0 (972fc0148f89935e828181a7a3cd6b0c69150467) Release

log

/home/work/app/pegasus/tjwqtst-staging/replica/log/log.502.txt

D2018-06-28 23:41:28.254 (1530200488254061413 e6f2) replica.io-thrd.59122: network.cpp:619:on_server_session_accepted(): server session accepted, remote_client = 10.38.166.13:26882, current_count = 777
E2018-06-28 23:41:28.254 (1530200488254075072 e6ef) replica.io-thrd.59119: network.cpp:244:prepare_parser(): invalid header type, remote_client = 10.38.166.13:26882, header_type = '\FF\00\00('
E2018-06-28 23:41:28.254 (1530200488254165882 e6ef) replica.io-thrd.59119: asio_rpc_session.cpp:128:operator()(): asio read from 10.38.166.13:26882 failed
D2018-06-28 23:41:28.254 (1530200488254175468 e6ef) replica.io-thrd.59119: network.cpp:639:on_server_session_disconnected(): server session disconnected, remote_client = 10.38.166.13:26882, current_count = 776
D2018-06-28 23:41:28.254 (1530200488254583273 e6f0) replica.io-thrd.59120: network.cpp:619:on_server_session_accepted(): server session accepted, remote_client = 10.38.166.13:26883, current_count = 777
E2018-06-28 23:41:28.254 (1530200488254598616 e6ef) replica.io-thrd.59119: network.cpp:244:prepare_parser(): invalid header type, remote_client = 10.38.166.13:26883, header_type = '\00\1E\00\06'
E2018-06-28 23:41:28.254 (1530200488254618606 e6ef) replica.io-thrd.59119: asio_rpc_session.cpp:128:operator()(): asio read from 10.38.166.13:26883 failed
D2018-06-28 23:41:28.254 (1530200488254625148 e6ef) replica.io-thrd.59119: network.cpp:639:on_server_session_disconnected(): server session disconnected, remote_client = 10.38.166.13:26883, current_count = 776
... ... 
D2018-06-28 23:41:31.278 (1530200491278023227 e6f2) replica.io-thrd.59122: asio_rpc_session.cpp:102:operator()(): asio read from 10.118.45.2:49038 failed: End of file
D2018-06-28 23:41:31.278 (1530200491278032958 e6f2) replica.io-thrd.59122: network.cpp:639:on_server_session_disconnected(): server session disconnected, remote_client = 10.118.45.2:49038, current_count = 823
W2018-06-28 23:41:31.278 (1530200491278143007 e6f2) replica.io-thrd.59122: asio_rpc_session.cpp:189:safe_close(): asio socket shutdown failed, error = Bad file descriptor
W2018-06-28 23:41:31.278 (1530200491278148003 e6f2) replica.io-thrd.59122: asio_rpc_session.cpp:192:safe_close(): asio socket close failed, error = Bad file descriptor

core

gdb /home/work/app/pegasus/tjwqtst-staging/replica/package/bin/replica_server /home/core/core.replica.asio.3.59078.1530200491

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/work/app/pegasus/tjwqtst-staging/replica/package/bin/pegasus_server confi'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007ffab56dd1d7 in raise () from /lib64/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007ffab56dd1d7 in raise () from /lib64/libc.so.6
#1  0x00007ffab56de8c8 in abort () from /lib64/libc.so.6
#2  0x00007ffab64af9f5 in tcmalloc::Log (mode=mode@entry=tcmalloc::kCrash, filename=filename@entry=0x7ffab64c5cee "src/tcmalloc.cc", line=line@entry=332, a=..., b=..., c=..., d=...)
    at src/internal_logging.cc:118
#3  0x00007ffab64a4564 in (anonymous namespace)::InvalidFree (ptr=<optimized out>) at src/tcmalloc.cc:332
#4  0x00007ffab8c2497d in weak_release (this=0xffffffff) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:160
#5  release (this=0xffffffff) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:147
#6  ~shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/detail/shared_count.hpp:443
#7  ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/shared_ptr.hpp:323
#8  dsn::thrift_message_parser::parse_message (thrift_header=..., message_data=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/thrift_message_parser.cpp:255
#9  0x00007ffab8c24cc7 in dsn::thrift_message_parser::get_message_on_receive (this=0x32b94d9a0, reader=0x10275da10, read_next=@0x7ffaa0dce0cc: 4096)
    at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/thrift_message_parser.cpp:72
#10 0x00007ffab8c2eb47 in operator() (length=<optimized out>, __closure=0x7ffaa0dce110, ec=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/asio_rpc_session.cpp:119
#11 operator() (this=0x7ffaa0dce110) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/bind_handler.hpp:127
#12 asio_handler_invoke<boost::asio::detail::binder2<dsn::tools::asio_rpc_session::do_read(int)::__lambda1, boost::system::error_code, long unsigned int> > (function=...)
    at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/handler_invoke_hook.hpp:69
#13 invoke<boost::asio::detail::binder2<dsn::tools::asio_rpc_session::do_read(int)::__lambda1, boost::system::error_code, long unsigned int>, dsn::tools::asio_rpc_session::do_read(int)::__lambda1> (
    context=..., function=...) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#14 boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, dsn::tools::asio_rpc_session::do_read(int)::__lambda1>::do_complete(boost::asio::detail::io_service_impl *, boost::asio::detail::operation *, const boost::system::error_code &, std::size_t) (owner=<optimized out>, base=<optimized out>)
    at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/reactive_socket_recv_op.hpp:110
#15 0x0000000000584540 in complete (bytes_transferred=<optimized out>, ec=..., owner=..., this=<optimized out>)
    at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/task_io_service_operation.hpp:38
#16 do_run_one (ec=..., this_thread=..., lock=..., this=0x2ccc7e0) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/impl/task_io_service.ipp:372
#17 boost::asio::detail::task_io_service::run (this=0x2ccc7e0, ec=...) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/detail/impl/task_io_service.ipp:149
#18 0x00007ffab8c13bef in run (this=0x2c9f9c8) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/asio/impl/io_service.ipp:59
#19 operator() (__closure=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/asio_net_provider.cpp:69
#20 _M_invoke<> (this=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/functional:1732
#21 operator() (this=<optimized out>) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/functional:1720
#22 std::thread::_Impl<std::_Bind_simple<dsn::tools::asio_network_provider::start(dsn::rpc_channel, int, bool, dsn::io_modifer&)::__lambda1()> >::_M_run(void) (this=<optimized out>)
    at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/thread:115
#23 0x00007ffab6035600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#24 0x00007ffab6ca2dc5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007ffab579f73d in clone () from /lib64/libc.so.6
(gdb) 
(gdb) f 8
#8  dsn::thrift_message_parser::parse_message (thrift_header=..., message_data=...) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/thrift_message_parser.cpp:255
255	/home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/tools/common/thrift_message_parser.cpp: No such file or directory.
(gdb) p dsn_hdr
$1 = (dsn::message_header *) 0x73a23f844
(gdb) p *dsn_hdr
$2 = {hdr_type = 1413892180, hdr_version = 0, hdr_length = 192, hdr_crc32 = 0, body_length = 79, body_crc32 = 0, id = 26, trace_id = 0, rpc_name = "RPC_RRDB_RRDB_GET", '\000' <repeats 30 times>, 
  rpc_code = {local_code = 0, local_hash = 0}, gpid = {_value = {u = {app_id = 49, partition_index = 1}, value = 4294967345}}, context = {u = {is_request = 1, is_forwarded = 0, unused = 0, 
      serialize_format = 1, is_forward_supported = 0, parameter_type = 0, parameter = 0}, context = 65}, from_address = {static s_invalid_address = {
      static s_invalid_address = <same as static member of an already seen type>, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, 
        value = 0}}, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, value = 0}}, client = {timeout_ms = 0, thread_hash = 388032, 
    partition_hash = 0}, server = {error_name = '\000' <repeats 47 times>, error_code = {local_code = 0, local_hash = 0}}}
(gdb) f 6
#6  ~shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/detail/shared_count.hpp:443
443	/home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/detail/shared_count.hpp: No such file or directory.
(gdb) p this
$3 = (boost::detail::shared_count * const) <synthetic pointer>
(gdb) p *this
$4 = {pi_ = 0xffffffff}
(gdb) f 7
#7  ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/shared_ptr.hpp:323
323	/home/work/qinzuoyan/software/boost_1_58_0/output/include/boost/smart_ptr/shared_ptr.hpp: No such file or directory.
(gdb) p this
$5 = (boost::shared_ptr<dsn::binary_reader_transport> * const) <synthetic pointer>
(gdb) p *this
$6 = {px = <optimized out>, pn = {pi_ = 0xffffffff}}
(gdb) p stream
$8 = {<dsn::binary_reader> = {_vptr.binary_reader = 0x9014d0 <vtable for dsn::rpc_read_stream+16>, _blob = {_holder = {<std::__shared_ptr<char, (__gnu_cxx::_Lock_policy)2>> = {
          _M_ptr = 0x6e1402000 "THFT", _M_refcount = {_M_pi = 0x18c4eafc0}}, <No data fields>}, _buffer = 0x6e1402000 "THFT", _data = 0x6e1402d16 "\200\001", _length = 79}, _size = 79, 
    _ptr = 0x6e1402d33 "\f", _remaining_size = 50}, _msg = 0x73a23f780}
(gdb) p stream._msg
$9 = (dsn_message_t) 0x73a23f780
(gdb) p ('dsn::message_ex'*)stream._msg
$10 = (dsn::message_ex *) 0x73a23f780
(gdb) p *('dsn::message_ex'*)stream._msg
$11 = {<dsn::ref_counter> = {_vptr.ref_counter = 0x7ffab8f165b0 <vtable for dsn::message_ex+16>, _magic = 3735928559, _counter = {<std::__atomic_base<long>> = {
        _M_i = 0}, <No data fields>}}, <dsn::extensible_object<dsn::message_ex, 4>> = {<dsn::extensible> = {_ptr = 0x73a23f7a8, _count = 4}, static INVALID_SLOT = <optimized out>, 
    static INVALID_VALUE = <optimized out>, _extensions = {0, 0, 0, 0}, static s_extensionDeletors = {0x0, 0x0, 0x0, 0x0}, static s_nextExtensionIndex = {<std::__atomic_base<unsigned int>> = {
        _M_i = 1}, <No data fields>}}, <dsn::callocator_object<dsn::tls_trans_malloc, dsn::tls_trans_free>> = {<No data fields>}, header = 0x73a23f844, 
  buffers = {<std::_Vector_base<dsn::blob, std::allocator<dsn::blob> >> = {_M_impl = {<std::allocator<dsn::blob>> = {<__gnu_cxx::new_allocator<dsn::blob>> = {<No data fields>}, <No data fields>}, 
        _M_start = 0x1b073f540, _M_finish = 0x1b073f590, _M_end_of_storage = 0x1b073f590}}, <No data fields>}, io_session = {_obj = 0x0}, to_address = {static s_invalid_address = {
      static s_invalid_address = <same as static member of an already seen type>, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, 
        value = 0}}, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, value = 0}}, server_address = {static s_invalid_address = {
      static s_invalid_address = <same as static member of an already seen type>, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, 
        value = 0}}, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, value = 0}}, local_rpc_code = {_internal_code = 0}, hdr_format = {
    _internal_code = 0}, send_retry_count = 0, dl = {_next = 0x73a23f810, _prev = 0x73a23f810}, static _id = {<std::__atomic_base<unsigned long>> = {_M_i = 63889502}, <No data fields>}, _rw_index = 1, 
  _rw_offset = 0, _rw_committed = false, _is_read = true, static s_local_hash = 0}
(gdb) p *(('dsn::message_ex'*)stream._msg).header
$12 = {hdr_type = 1413892180, hdr_version = 0, hdr_length = 192, hdr_crc32 = 0, body_length = 79, body_crc32 = 0, id = 26, trace_id = 0, rpc_name = "RPC_RRDB_RRDB_GET", '\000' <repeats 30 times>, 
  rpc_code = {local_code = 0, local_hash = 0}, gpid = {_value = {u = {app_id = 49, partition_index = 1}, value = 4294967345}}, context = {u = {is_request = 1, is_forwarded = 0, unused = 0, 
      serialize_format = 1, is_forward_supported = 0, parameter_type = 0, parameter = 0}, context = 65}, from_address = {static s_invalid_address = {
      static s_invalid_address = <same as static member of an already seen type>, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, 
        value = 0}}, _addr = {v4 = {type = 0, padding = 0, port = 0, ip = 0}, uri = {type = 0, uri = 0}, group = {type = 0, group = 0}, value = 0}}, client = {timeout_ms = 0, thread_hash = 388032, 
    partition_hash = 0}, server = {error_name = '\000' <repeats 47 times>, error_code = {local_code = 0, local_hash = 0}}}
(gdb) 

It looks like the memory of trans_ptr in thrift_message_parser::parse_message() has been corrupted.

improve load balance

  • consider balance when placing replicas on SSDs: do not place too many replicas of one table on one SSD
  • consider balance when dispatching to replication threads: do not dispatch too many replicas of one table to one replication thread

Weekly Digest (15 September, 2019 - 22 September, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 4 issues were created.
Of these, 2 issues have been closed and 2 issues are still open.

OPEN ISSUES

💚 #317 feat(dup): implement procedure load_from_private_log, by neverchanje
💚 #316 add http interface for get_app_envs, by levy5307

CLOSED ISSUES

❤️ #315 feat(dup): verify private log validity before starting to duplicate, by neverchanje
❤️ #314 feat: support table-level slow query on meta server, by levy5307


PULL REQUESTS

Last week, 9 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 1 pull request was updated.
💛 #317 feat(dup): implement procedure load_from_private_log, by neverchanje

MERGED PULL REQUEST

Last week, 8 pull requests were merged.
💜 #315 feat(dup): verify private log validity before starting to duplicate, by neverchanje
💜 #314 feat: support table-level slow query on meta server, by levy5307
💜 #312 feat(dup): implement ship_mutation stage and mutation_batch, by neverchanje
💜 #311 refactor: rename disk_aio to aio_context, by neverchanje
💜 #310 refactor: remove empty_aio_provider and posix aio_provider, by neverchanje
💜 #309 feat(split): child replica learn parent prepare list and checkpoint, by hycdong
💜 #302 refactor: introduce mutation_log::replay_block, by neverchanje
💜 #298 feat(throttle): support size-based write throttling, by neverchanje


COMMITS

Last week there were 8 commits.
🛠️ feat: support table-level slow query on meta server (#314) by levy5307
🛠️ feat(split): child replica learn parent prepare list and checkpoint (#309) by hycdong
🛠️ feat(dup): verify private log validity before starting to duplicate (#315) by neverchanje
🛠️ feat(dup): implement ship_mutation stage and mutation_batch (#312) by neverchanje
🛠️ feat(throttle): support size-based write throttling (#298) by neverchanje
🛠️ refactor: rename disk_aio to aio_context (#311) by neverchanje
🛠️ refactor: introduce mutation_log::replay_block (#302) by neverchanje
🛠️ refactor: remove empty_aio_provider and posix aio_provider (#310) by neverchanje


CONTRIBUTORS

Last week there were 3 contributors.
👤 levy5307
👤 hycdong
👤 neverchanje


STARGAZERS

Last week there were no stargazers.


RELEASES

Last week there were no releases.



bug: remove log GC from critical path

Currently, on every dsn_logv call, simple_logger checks whether the number of log files exceeds the limit; if so, that single logv call performs a filesystem remove operation. If a file cannot be deleted, the next logv call has to try again, which can cause a serious performance degradation on the critical path and even lead to replica failures.

Failed to remove garbage log file /home/work/app/pegasus/c3srv-feedprofile2/replica/log/log.438.txt
(the same line is repeated on every subsequent logv call)
    // TODO: move gc out of critical path
    // Remove the oldest log files until the number on disk is within the limit;
    // note this runs synchronously inside the logging call.
    while (_index - _start_index > _max_number_of_log_files_on_disk) {
        std::stringstream str2;
        str2 << "log." << _start_index++ << ".txt";
        auto dp = utils::filesystem::path_combine(_log_dir, str2.str());
        if (::remove(dp.c_str()) != 0) {
            // Removal failed: roll back the index so the next logv call retries.
            printf("Failed to remove garbage log file %s\n", dp.c_str());
            _start_index--;
            break;
        }
    }

Replica failure:

E2019-05-11 21:58:20.075 (1557583100075617036 1966c) replica.local_app6.04009626ef59471f: replica_stub.cpp:1348:response_client(): 2.118@: read fail: client = , code = RPC_RRDB_RRDB_GET, timeout = 1000, status = Unknown, error = ERR_OBJECT_NOT_FOUND
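
One way to fix this is to take the file removal off the logging path entirely and hand it to a background worker. A minimal sketch, assuming a plain std::thread worker rather than rdsn's own task framework (all names below are hypothetical, not the actual simple_logger change):

    #include <condition_variable>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <thread>

    // Hypothetical sketch: the logging path only enqueues the file name; a single
    // background thread performs the potentially slow remove().
    class log_file_gc
    {
    public:
        log_file_gc() : _worker([this] { run(); }) {}

        ~log_file_gc()
        {
            {
                std::lock_guard<std::mutex> g(_mu);
                _stop = true;
            }
            _cv.notify_one();
            _worker.join();
        }

        // Called from the dsn_logv critical path: O(1), never touches the filesystem.
        void enqueue_removal(std::string path)
        {
            {
                std::lock_guard<std::mutex> g(_mu);
                _pending.push_back(std::move(path));
            }
            _cv.notify_one();
        }

    private:
        void run()
        {
            std::unique_lock<std::mutex> lk(_mu);
            while (true) {
                _cv.wait(lk, [this] { return _stop || !_pending.empty(); });
                if (_pending.empty())
                    return; // _stop was set and the queue is drained
                std::string path = std::move(_pending.front());
                _pending.pop_front();
                lk.unlock();
                if (std::remove(path.c_str()) != 0) {
                    // Failure is reported but no longer blocks or retries on the write path.
                    std::fprintf(stderr, "Failed to remove garbage log file %s\n", path.c_str());
                }
                lk.lock();
            }
        }

        std::mutex _mu;
        std::condition_variable _cv;
        std::deque<std::string> _pending;
        bool _stop = false;
        std::thread _worker; // constructed last, so the members above are ready when run() starts
    };

With something like this, simple_logger would call enqueue_removal(dp) where it currently calls ::remove(dp.c_str()), keeping the logv path free of filesystem operations.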

dup: reduce average total_duplicate_latency

What we need to do:

[ ] flush the private log more frequently when duplication is enabled than when it is not
[ ] make mutation log loading restart from the last global_start_offset when an error (such as ERR_INVALID_DATA) is encountered, as sketched below
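
A rough sketch of the second item, with hypothetical error codes and a stubbed replay function (the real rdsn signatures differ):

    #include <cstdint>

    // Hypothetical stand-ins for rdsn's error codes and replay routine.
    enum class err_t { ERR_OK, ERR_INVALID_DATA, ERR_HANDLE_EOF };

    // Stub for illustration only: pretends to replay mutations starting at `offset`
    // and reports how far it got.
    err_t replay_private_log_from(int64_t offset, int64_t *end_offset)
    {
        *end_offset = offset;
        return err_t::ERR_OK;
    }

    // On a corrupted block (ERR_INVALID_DATA), restart loading from the last
    // global_start_offset instead of failing the whole duplication.
    err_t load_private_log(int64_t start_offset, int64_t global_start_offset)
    {
        int64_t end = 0;
        err_t err = replay_private_log_from(start_offset, &end);
        if (err == err_t::ERR_INVALID_DATA && start_offset != global_start_offset) {
            err = replay_private_log_from(global_start_offset, &end);
        }
        return err;
    }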

Weekly Digest (22 September, 2019 - 29 September, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 3 issues were created.
Of these, 0 issues have been closed and 3 issues are still open.

OPEN ISSUES

💚 #321 feat(http): add http interface for get_app_envs, by levy5307
💚 #320 feat(dup): protect private log from missing when duplication is enabled, by neverchanje
💚 #319 feat(split): child replica apply private logs, in-memory mutations and catch up parent, by hycdong

NOISY ISSUE

🔈 #321 feat(http): add http interface for get_app_envs, by levy5307
It received 1 comment.


PULL REQUESTS

Last week, 4 pull requests were created, updated or merged.

OPEN PULL REQUEST

Last week, 1 pull request was opened.
💚 #320 feat(dup): protect private log from missing when duplication is enabled, by neverchanje

UPDATED PULL REQUEST

Last week, 2 pull requests were updated.
💛 #321 feat(http): add http interface for get_app_envs, by levy5307
💛 #319 feat(split): child replica apply private logs, in-memory mutations and catch up parent, by hycdong

MERGED PULL REQUEST

Last week, 1 pull request was merged.
💜 #317 feat(dup): implement procedure load_from_private_log, by neverchanje


COMMITS

Last week there was 1 commit.
🛠️ feat(dup): implement procedure load_from_private_log (#317) by neverchanje


CONTRIBUTORS

Last week there was 1 contributor.
👤 neverchanje


STARGAZERS

Last week there were no stargazers.


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository XiaoMi/rdsn to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆

clang build error

While building rdsn with clang-3.9 I encountered some errors:

/home/mi/git/release/pegasus/rdsn/include/dsn/tool-api/task.h:590:10: error: 'dsn::aio_task::enqueue' hides overloaded virtual function [-Werror,-Woverloaded-virtual]
    void enqueue(error_code err, size_t transferred_size);
         ^
/home/mi/git/release/pegasus/rdsn/include/dsn/tool-api/task.h:197:18: note: hidden overloaded virtual function 'dsn::task::enqueue' declared here: different number of parameters
      (0 vs 2)
    virtual void enqueue();
In file included from /home/mi/git/release/pegasus/rdsn/src/dist/block_service/local/local_service.cpp:6:
/home/mi/git/release/pegasus/rdsn/src/dist/block_service/local/local_service.h:82:20: error: private field '_local_service' is not used [-Werror,-Wunused-private-field]
    local_service *_local_service;
/home/mi/git/release/pegasus/rdsn/src/core/tests/autoref_ptr_test.cpp:140:9: error: explicitly moving variable of type 'dsn::ref_ptr<SelfAssign>' to itself [-Werror,-Wself-move]
    var = std::move(var);
    ~~~ ^           ~~~
In file included from /home/mi/git/release/pegasus/rdsn/src/dist/replication/meta_server/meta_state_service_utils.cpp:32:
/home/mi/git/release/pegasus/rdsn/src/dist/replication/meta_server/meta_state_service_utils_impl.h:73:1: error: 'operation' defined as a struct here but previously declared as a
      class [-Werror,-Wmismatched-tags]
struct operation : pipeline::environment

They do not actually seem to be serious problems.
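
For reference, the fixes these warnings call for are mostly mechanical; a small illustrative sketch (simplified types, not the actual rdsn code):

    #include <cstddef>

    // -Woverloaded-virtual: a derived overload with a different signature hides the
    // base-class virtual; bring the base overload back into scope (or rename one).
    struct task
    {
        virtual ~task() = default;
        virtual void enqueue() {}
    };

    struct aio_task : task
    {
        using task::enqueue; // un-hide the zero-argument overload
        void enqueue(int err, std::size_t transferred_size) { (void)err; (void)transferred_size; }
    };

    // -Wmismatched-tags: use the same class-key in the declaration and definition.
    struct operation;   // previously: class operation;
    struct operation {};

    // -Wself-move: drop `var = std::move(var);` in the test, or guard against
    //              self-assignment inside ref_ptr's move operator.
    // -Wunused-private-field: delete the unused `_local_service` member.

    int main()
    {
        aio_task t;
        t.enqueue();      // base overload remains callable
        t.enqueue(0, 16); // derived overload
    }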

blob is unsafe

blob is unsafe to use as a shared buffer, since it can also be constructed without taking ownership of the internal buffer:

void assign(const char *buffer, int offset, unsigned int length);
blob(const char *buffer, int offset, unsigned int length);

This can easily lead to incorrect use, which may in turn cause a double free.
We should remove the two functions above and use dsn::string_view instead.
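
To illustrate the hazard with a hypothetical blob-like type (not rdsn's actual implementation): when the same interface can either own or merely point at a buffer, callers cannot tell which lifetime rules apply, and a buffer can end up freed twice or read after it is gone.

    #include <memory>

    // Hypothetical stand-in for a blob-like type whose constructors may or may not
    // take ownership; the type alone does not reveal which case you are holding.
    class buffer_ref
    {
    public:
        // Owning: shares the buffer's lifetime, safe to copy and keep.
        buffer_ref(std::shared_ptr<char> holder, unsigned int length)
            : _holder(std::move(holder)), _data(_holder.get()), _length(length) {}

        // Non-owning: just records the raw pointer (the dangerous overload).
        buffer_ref(const char *buffer, unsigned int length)
            : _data(buffer), _length(length) {}

        const char *data() const { return _data; }
        unsigned int length() const { return _length; }

    private:
        std::shared_ptr<char> _holder; // empty in the non-owning case
        const char *_data = nullptr;
        unsigned int _length = 0;
    };

    int main()
    {
        buffer_ref dangling("", 0u);
        {
            char local[16] = "short-lived";
            dangling = buffer_ref(local, sizeof(local)); // non-owning view of a stack buffer
        }
        // `dangling.data()` now points at dead stack memory. The mirror-image mistake
        // is two owners both believing they must free the same heap buffer, i.e. a
        // double free. A string_view-style type makes non-ownership explicit instead.
        (void)dangling;
    }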

rpc_address::ipv4_from_host() returns 0 if host resolution fails

Similarly, rpc_address::ipv4_from_network_interface() returns 0 if no suitable IP address is found.

As a result, rpc_address::assign_ipv4() and rpc_address::assign_ipv4_local_address() may assign an unexpected IP (0.0.0.0), which can cause tricky problems.

This needs to be fixed.
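
A small sketch of the guard the fix implies, using the function name from the issue but with made-up signatures (resolution is assumed to return 0 on failure):

    #include <cstdint>
    #include <cstdio>

    // Assumed behaviour, per the issue: returns the host's IPv4 address, or 0 if
    // resolution fails. (Stubbed below so the example is self-contained.)
    uint32_t ipv4_from_host(const char *hostname);

    // Surface the failure to the caller instead of silently assigning 0.0.0.0.
    bool try_assign_ipv4(const char *hostname, uint32_t *out_ip)
    {
        uint32_t ip = ipv4_from_host(hostname);
        if (ip == 0) {
            std::fprintf(stderr, "failed to resolve host '%s', refusing to assign 0.0.0.0\n", hostname);
            return false;
        }
        *out_ip = ip;
        return true;
    }

    uint32_t ipv4_from_host(const char *) { return 0; } // stub: always fails

    int main()
    {
        uint32_t ip = 0;
        if (!try_assign_ipv4("no-such-host.invalid", &ip)) {
            return 1; // caller reports the error instead of continuing with an all-zero address
        }
    }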

Weekly Digest (25 August, 2019 - 1 September, 2019)

Here's the Weekly Digest for XiaoMi/rdsn:


ISSUES

Last week 4 issues were created.
Of these, 1 issue has been closed and 3 issues are still open.

OPEN ISSUES

💚 #305 test: add unit tests for task, by neverchanje
💚 #304 feat(dup): add interface mutation_duplicator & duplication procedure, by neverchanje
💚 #302 refactor: reimplement mutation_log::replay in a block by block way, by neverchanje

CLOSED ISSUES

❤️ #303 build: remove MY_PROJ_INC_PATH, by vagetablechicken


PULL REQUESTS

Last week, 7 pull requests were created, updated or merged.

UPDATED PULL REQUEST

Last week, 4 pull requests were updated.
💛 #304 feat(dup): add interface mutation_duplicator & duplication procedure, by neverchanje
💛 #302 refactor: reimplement mutation_log::replay in a block by block way, by neverchanje
💛 #299 feat(split): parent replica prepare states, by hycdong
💛 #298 feat(throttle): support size-based write throttling, by neverchanje

MERGED PULL REQUEST

Last week, 3 pull requests were merged.
💜 #303 build: remove MY_PROJ_INC_PATH, by vagetablechicken
💜 #300 fix(network): use derror rather than dwarn for failed network bootstrap, by neverchanje
💜 #297 feat(dup): implement duplication_sync on meta server side, by neverchanje


COMMITS

Last week there were 3 commits.
🛠️ build: remove MY_PROJ_INC_PATH (#303) by vagetablechicken
🛠️ fix(network): use derror rather than dwarn for failed network bootstrap (#300) by neverchanje
🛠️ feat(dup): implement duplication_sync on meta server side (#297) by neverchanje


CONTRIBUTORS

Last week there were 2 contributors.
👤 vagetablechicken
👤 neverchanje


STARGAZERS

Last week there was 1 stargazer.
stone-wind
You are the star! 🌟


RELEASES

Last week there were no releases.


That's all for last week, please 👀 Watch and Star the repository XiaoMi/rdsn to receive next weekly updates. 😃

You can also view all Weekly Digests by clicking here.

Your Weekly Digest bot. 📆
