apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.

Home Page: https://kvrocks.apache.org/

License: Apache License 2.0

CMake 1.43% Dockerfile 0.05% Shell 0.11% C++ 66.23% Python 0.84% Go 31.23% Lua 0.09%
redis kv namespace redis-cluster distributed database

kvrocks's Introduction

kvrocks_logo



Apache Kvrocks is a distributed key-value NoSQL database that uses RocksDB as its storage engine and is compatible with the Redis protocol. Kvrocks aims to decrease memory cost and increase capacity compared to Redis. The design of replication and storage was inspired by rocksplicator and blackwidow.

Kvrocks has the following key features:

  • Redis Compatible: Users can access Apache Kvrocks via any Redis client.
  • Namespace: Similar to Redis SELECT, but each namespace is accessed with its own token.
  • Replication: Asynchronous replication using a binlog, like MySQL.
  • High Availability: Supports Redis Sentinel to fail over when the master or a slave fails.
  • Cluster: Centralized management but accessible via any Redis cluster client.

Thanks to designers Lingyu Tian and Shili Fan for contributing the logo of Kvrocks.

Who uses Kvrocks

You can find Kvrocks users at the Users page.

Users are encouraged to add themselves to the Users page. Either leave a comment on the "Who is using Kvrocks" issue, or send a pull request directly to add your company or organization's information and logo.

Build and run Kvrocks

Prerequisite

# Ubuntu / Debian
sudo apt update
sudo apt install -y git build-essential cmake libtool python3 libssl-dev

# CentOS / RedHat
sudo yum install -y centos-release-scl-rh
sudo yum install -y git devtoolset-11 autoconf automake libtool libstdc++-static python3 openssl-devel
# download and install cmake via https://cmake.org/download
wget https://github.com/Kitware/CMake/releases/download/v3.26.4/cmake-3.26.4-linux-x86_64.sh -O cmake.sh
sudo bash cmake.sh --skip-license --prefix=/usr
# enable gcc and make in devtoolset-11
source /opt/rh/devtoolset-11/enable

# openSUSE / SUSE Linux Enterprise
sudo zypper install -y gcc11 gcc11-c++ make wget git autoconf automake python3 curl cmake

# Arch Linux
sudo pacman -Sy --noconfirm autoconf automake python3 git wget which cmake make gcc

# macOS
brew install git cmake autoconf automake libtool openssl
# please link openssl by force if it still cannot be found after installing
brew link --force openssl

Build

It is as simple as:

$ git clone https://github.com/apache/kvrocks.git
$ cd kvrocks
$ ./x.py build # `./x.py build -h` to check more options;
               # especially, `./x.py build --ghproxy` will fetch dependencies via ghproxy.com.

To build with TLS support, you'll need OpenSSL development libraries (e.g. libssl-dev on Debian/Ubuntu) and run:

$ ./x.py build -DENABLE_OPENSSL=ON

To build with Lua instead of LuaJIT, run:

$ ./x.py build -DENABLE_LUAJIT=OFF

Running Kvrocks

$ ./build/kvrocks -c kvrocks.conf

Running Kvrocks using Docker

$ docker run -it -p 6666:6666 apache/kvrocks --bind 0.0.0.0
# or get the nightly image:
$ docker run -it -p 6666:6666 apache/kvrocks:nightly

Please visit Apache Kvrocks on DockerHub for additional details about images.

Connect Kvrocks service

$ redis-cli -p 6666

127.0.0.1:6666> get a
(nil)
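
Because Kvrocks speaks the Redis protocol, any Redis client library works as well as redis-cli. Below is a minimal, illustrative sketch assuming a local instance on the default port 6666 and the third-party go-redis library; neither the library nor the address is mandated by Kvrocks.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Connect to a running Kvrocks instance exactly as if it were Redis.
	rdb := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6666"})
	defer rdb.Close()

	// Plain Redis commands work unchanged.
	if err := rdb.Set(ctx, "hello", "kvrocks", 0).Err(); err != nil {
		panic(err)
	}
	val, err := rdb.Get(ctx, "hello").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("hello =", val) // prints: hello = kvrocks
}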

Running test cases

$ ./x.py build --unittest
$ ./x.py test cpp # run C++ unit tests
$ ./x.py test go # run Golang (unit and integration) test cases

Supported platforms

  • Linux
  • macOS

Namespace

Namespaces are used to isolate data between users. Unlike Redis, where all databases can be accessed with requirepass, Kvrocks uses one token per namespace. requirepass is regarded as the admin token, and only the admin token is allowed to access the namespace command, as well as some commands like config, slaveof, bgsave, etc. See the Namespace page for more details.

# add token
127.0.0.1:6666> namespace add ns1 my_token
OK

# update token
127.0.0.1:6666> namespace set ns1 new_token
OK

# list namespace
127.0.0.1:6666> namespace get *
1) "ns1"
2) "new_token"
3) "__namespace"
4) "foobared"

# delete namespace
127.0.0.1:6666> namespace del ns1
OK
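
On the client side, a namespace is selected simply by authenticating with that namespace's token (for example redis-cli -a my_token, or the password field of a client library); keys written under different tokens are isolated from one another. A small, hedged sketch using go-redis follows; the library choice and address are assumptions, and my_token refers to the token added in the example above.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Authenticate with the per-namespace token rather than the admin requirepass;
	// every key this client touches lives inside namespace "ns1".
	ns1 := redis.NewClient(&redis.Options{
		Addr:     "127.0.0.1:6666",
		Password: "my_token", // token created via: namespace add ns1 my_token
	})
	defer ns1.Close()

	if err := ns1.Set(ctx, "counter", 1, 0).Err(); err != nil {
		panic(err)
	}
	n, err := ns1.Incr(ctx, "counter").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("counter in ns1 =", n) // prints: counter in ns1 = 2
}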

Cluster

Kvrocks implements a proxyless, centralized cluster solution whose access method is fully compatible with Redis cluster clients, so you can use any Redis cluster SDK to access a Kvrocks cluster. For more details, see the Kvrocks Cluster Introduction.
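
As an illustration, the sketch below points a standard Redis cluster client (go-redis's ClusterClient, chosen here only as an example) at a few assumed Kvrocks node addresses; slot routing and MOVED redirections are handled by the client library, with no Kvrocks-specific code.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// The addresses below are placeholders for the nodes of a Kvrocks cluster.
	cluster := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"10.0.0.1:6666", "10.0.0.2:6666", "10.0.0.3:6666"},
	})
	defer cluster.Close()

	if err := cluster.Set(ctx, "user:42", "alice", 0).Err(); err != nil {
		panic(err)
	}
	val, err := cluster.Get(ctx, "user:42").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("user:42 =", val)
}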

Documents

Documents are hosted at the official website.

Tools

  • To manage Kvrocks clusters for failover, scaling up/down and more, use kvrocks-controller
  • To export the Kvrocks monitor metrics, use kvrocks_exporter
  • To migrate from Redis to Kvrocks, use RedisShake
  • To migrate from Kvrocks to Redis, use kvrocks2redis built via ./x.py build

Contributing

The Kvrocks community welcomes all forms of contribution. You can find out how to get involved on the Community and How to Contribute pages.

Performance

Hardware

  • CPU: 48 cores Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  • Memory: 32 GiB
  • NET: Intel Corporation I350 Gigabit Network Connection
  • DISK: 2TB NVMe Intel SSD DC P4600

Benchmark Client: multi-threaded redis-benchmark (unstable branch)

1. Commands QPS

kvrocks: workers = 16, benchmark: 8 threads/ 512 conns / 128 payload

latency: 99.9% < 10ms

image

2. QPS on different payloads

kvrocks: workers = 16, benchmark: 8 threads/ 512 conns

latency: 99.9% < 10ms

image

3. QPS on different workers

kvrocks: workers = 16, benchmark: 8 threads/ 512 conns / 128 payload

latency: 99.9% < 10ms

image

License

Apache Kvrocks is licensed under the Apache License Version 2.0. See the LICENSE file for details.

Social Media

WeChat official account

kvrocks's People

Contributors

aleksraiden, caipengbo, chriszmf, colinchamber, ellutionist, enjoy-binbin, git-hulk, gogim1, guoxiangcn, infdahai, iocing, jackwener, jihuayu, jjz921024, karelrooted, kay011, maochongxin, maplefu, maximsmolskiy, pragmatwice, shangxiaoxiong, shooterit, tanruixiang, tisonkun, torwig, wyattjia, xiaobiaozhao, yangsx-1, zevin02, zncleon


kvrocks's Issues

Replication basically does not work

  1. masterA has data and slaveA runs slaveof against it; the data syncs over. But after flushall on masterA, slaveA is not cleared accordingly. I verified with dbsize and scan on both nodes.
  2. masterB has no data; after slaveB syncs and comes up, data written on masterB is not replicated to slaveB.
  3. After slaveof, the slave node often reports the following errors, even though the data directory was emptied before the slave was started:
E0216 13:35:53.319345 26935 storage.cc:703] [storage] Failed to delete dir: IO error: file rmdir: ./kvrocksdatamaster/backup/meta: Directory not empty
E0216 13:35:53.415446 26935 replication.cc:584] [replication] Failed to restore backup while IO error: No such file or directoryWhile opening a file for sequentially reading: ./kvrocksdatamaster/backup/meta/5: No such file or directory
Segmentation fault (core dumped)

kvrocks2redis works weird

[What did I do?]
When I use kvrocks2redis to migrate data to a Redis server, the auth part does not work.
1. kvrocks server configured with:
requirepass=mypwd_kvrocks
# masterauth
2. kvrocks2redis configured with:
kvrocks xx.xx.xx.xx 6666 mypwd_kvrocks
namespace.ns1 xx.xx.xx.xx 6379 mypwd_redis 1

Meanwhile, Redis server is ready with port 6379 & pwd 'mypwd_redis'.
And I did this on kvrocks server:
./redis-cli -h xx.xx.xx.xx -p 6666 -a mypwd_kvrocks namespace add ns1 mypwd_ns1
./redis-cli -h xx.xx.xx.xx -p 6666 -a mypwd_ns1 set key1 abcdefg
./redis-cli -h xx.xx.xx.xx -p 6666 -a mypwd_ns1 lpush key2 a b c

[ERROR LOG]
E1221 16:13:32.715910 99362 sync.cc:58] auth got invalid response
E1221 16:13:32.717257 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.718616 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.719998 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.721364 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.722718 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.724040 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.
E1221 16:13:32.725356 99427 redis_writer.cc:90] [kvrocks2redis] redis select db failed: -NOAUTH Authentication required.

[Question~]
Am I configuring kvrocks & kvrocks2redis correctly? Why does the migration fail, with AUTH errors in the log?

Segmentation fault

After running for a while we often hit a segmentation fault. I captured a core dump and looked at it: the problem seems to be inside evbuffer_drain, where the chain pointer is dereferenced while null. Searching online suggests that BEV_OPT_THREADSAFE should be passed to bufferevent_socket_new, so I suspect that is the cause.
image

Support column families in KVRocks

Hi,

I couldn't find any way of setting up new column families in Kvrocks. Is there a way to set up a new namespace in Kvrocks and execute commands (redis-cli --pipe) against the new column family?

Thanks,
Aniruddha

[BUG] Docker image is dead

I tried the docker version indicated in the readme:
docker run -it -p 6666:6666 bitleak/kvrocks

But it's failing because the docker image is not available anymore on docker hub.

Is there any reason why?

[BUG] dbsize returns wrong value

##################
redis
##################
127.0.0.1:6379> keys *
1) "a"
2) "tntchecker"
3) "b"
127.0.0.1:6379> dbsize                      <========== keys count
(integer) 3
127.0.0.1:6379> info keyspace
# Keyspace
db0:keys=3,expires=0,avg_ttl=0              <======== keys=3

##################
kvrocks
##################
127.0.0.1:6666> keys *
1) "a"
2) "tntchecker"
3) "b"
127.0.0.1:6666> dbsize                      <========== keys count
(integer) 0                                 <======= always returns 0!!
127.0.0.1:6666> info keyspace
# Keyspace
# Last scan db time: Thu Jan 1 00:00:00 1970
db0:keys=0,expires=0,avg_ttl=0,expired=0    <===== keys=0
sequence:38283437
used_db_size:603247462
max_db_size:0
used_percent: 0%
disk_capacity:322110992384
used_disk_size:161157144576
used_disk_percent: 50%

Any plan to support SSL?

libevent provides SSL support via bufferevent_ssl.h and OpenSSL. I guess it is not hard to support SSL. Is there any plan to support SSL?

Hangs on startup when running in Docker

I added prints at the very start of main, but there is no output at all.
The gdb backtrace is as follows:

[Switching to thread 1 (Thread 0x7fd29b678300 (LWP 148))]
#0  0x00007fd29ae4d50d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fd29ae4d50d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fd29ae48e5b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007fd29ae48d28 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000005874a9 in malloc_mutex_lock_final (mutex=0xb224e0 <init_lock>) at include/jemalloc/internal/mutex.h:155
#4  je_malloc_mutex_lock_slow (mutex=mutex@entry=0xb224e0 <init_lock>) at src/mutex.c:85
#5  0x00000000005524f0 in malloc_mutex_lock (tsdn=0x0, mutex=0xb224e0 <init_lock>) at include/jemalloc/internal/mutex.h:221
#6  0x000000000040d821 in malloc_init_hard () at src/jemalloc.c:1739
#7  0x0000000000556961 in malloc_init () at src/jemalloc.c:223
#8  imalloc_init_check (sopts=<optimized out>, dopts=<optimized out>) at src/jemalloc.c:2229
#9  imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2260
#10 calloc (num=1, size=32) at src/jemalloc.c:2494
#11 0x00007fd29b05c550 in _dlerror_run () from /lib64/libdl.so.2
#12 0x00007fd29b05c058 in dlsym () from /lib64/libdl.so.2
#13 0x00007fd29b25fbbc in sysconf (name=name@entry=30) at libsysconfcpus.c:108
#14 0x00000000005881a6 in os_page_detect () at src/pages.c:429
#15 je_pages_boot () at src/pages.c:600
#16 0x000000000040d756 in malloc_init_hard_a0_locked () at src/jemalloc.c:1518
#17 0x000000000040d882 in malloc_init_hard () at src/jemalloc.c:1750
#18 0x000000000055575d in malloc_init () at src/jemalloc.c:223
#19 imalloc_init_check (sopts=<optimized out>, dopts=<optimized out>) at src/jemalloc.c:2229
#20 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2260
#21 je_malloc_default (size=16) at src/jemalloc.c:2289
#22 0x00007fd29a350ecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#23 0x00007fd29a820c27 in google::FlagRegisterer::FlagRegisterer<std::string>(char const*, char const*, char const*, std::string*, std::string*) () from /lib64/libgflags.so.2.2
#24 0x00007fd29a8143b3 in _GLOBAL__sub_I_gflags.cc () from /lib64/libgflags.so.2.2
#25 0x00007fd29b471973 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#26 0x00007fd29b46315a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#27 0x0000000000000001 in ?? ()
#28 0x00007ffdce76a93c in ?? ()
#29 0x0000000000000000 in ?? ()

The strace output is as follows:

arch_prctl(ARCH_SET_FS, 0x7f0ef453d300) = 0
mprotect(0x7f0ef2c94000, 16384, PROT_READ) = 0
mprotect(0x7f0ef2eb3000, 4096, PROT_READ) = 0
mprotect(0x7f0ef31b5000, 4096, PROT_READ) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0ef453a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0ef4539000
mprotect(0x7f0ef349f000, 32768, PROT_READ) = 0
mprotect(0x7f0ef36d2000, 4096, PROT_READ) = 0
mprotect(0x7f0ef3f1a000, 4096, PROT_READ) = 0
mprotect(0x7f0ef38f4000, 4096, PROT_READ) = 0
mprotect(0x7f0ef3afa000, 4096, PROT_READ) = 0
mprotect(0x7f0ef3d02000, 4096, PROT_READ) = 0
mprotect(0x7f0ef4122000, 4096, PROT_READ) = 0
mprotect(0x7f0ef4325000, 4096, PROT_READ) = 0
mprotect(0xb0d000, 8192, PROT_READ)     = 0
mprotect(0x7f0ef4548000, 4096, PROT_READ) = 0
munmap(0x7f0ef4543000, 15371)           = 0
set_tid_address(0x7f0ef453d5d0)         = 223
set_robust_list(0x7f0ef453d5e0, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7f0ef3d0a820, [], SA_RESTORER|SA_SIGINFO, 0x7f0ef3d135f0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f0ef3d0a8b0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f0ef3d135f0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
futex(0x7f0ef34bb8ac, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f0ef34bb8b8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
readlink("/etc/malloc.conf", 0x7ffe7608c6e0, 4096) = -1 ENOENT (No such file or directory)
futex(0x7f0ef41230b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0xb22580, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 223 detached
 <detached ...>

build error with master

Hi,

I'm getting the following build error when trying to build from git.

[ 87%] Building CXX object CMakeFiles/kvrocks2redis.dir/tools/kvrocks2redis/parser.cc.o
[ 88%] Linking CXX executable kvrocks2redis
/usr/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
make[2]: *** [kvrocks2redis] Error 1
make[1]: *** [CMakeFiles/kvrocks2redis.dir/all] Error 2
make: *** [all] Error 2

  • OS: Red Hat Enterprise Linux Server release 7.2 (Maipo)

To Reproduce
Steps to reproduce the behavior:

  1. sh build.sh build

[NEW] set a max size in MB and max connections per namespace

It would be really great to be able to set a max size (ideally in MB, or as a number of keys) and a max number of connections per namespace. The use case is a shared instance that can be used safely, with quotas, by a lot of users.

Thanks for considering this :)

Find a suitable size for SST files

Key-value sizes can differ widely between workloads, and using 256 MiB as the SST file size can make data loading inefficient (large index/filter blocks) when key-values are very small. Kvrocks supports a user-defined SST file size in the config (rocksdb.target_file_size_base), but it is still tedious and inconvenient to tune different sizes for different instances, so we want to periodically auto-adjust the SST size in-flight based on the average key-value size. For example:

case 1: 0 ~ 128 bytes, Block Size = 2 KiB, SST size = 16 MiB

case 2: 128 ~ 1024 bytes, Block Size = 8 KiB, SST size = 64 MiB

case 3: 1 KiB ~ 32 KiB, Block Size = 32 KiB, SST size = 128 MiB

case 4: 32 KiB ~ 256 KiB, Block Size = 256 KiB, SST size = 256 MiB

case 5: 256 KiB ~ 512 KiB, Block Size = 512 KiB, SST size = 512 MiB

case 6: > 512 KiB, Block Size = 1 MiB, SST size = 1 GiB

TODO:

  • develop the feature

  • code review

  • benchmark between fixed and dynamic SST size(32bytes/128bytes/512bytes/1k/16k/64k/256k/1M)

  • how about dynamic block size? which may affect the performance of the block cache hit rate
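
As a rough, hypothetical illustration of the tiering above (not Kvrocks code), the Go sketch below picks a block size and an SST target size from the observed average key-value size; the thresholds simply mirror the cases listed in this issue.

package main

import "fmt"

// sstTier maps an average key-value size (in bytes) to the block size and SST
// target file size suggested by the cases above. The numbers are the proposal's
// examples, not tuned recommendations.
func sstTier(avgKVBytes int64) (blockSize, sstSize int64) {
	const KiB, MiB = int64(1) << 10, int64(1) << 20
	switch {
	case avgKVBytes < 128:
		return 2 * KiB, 16 * MiB
	case avgKVBytes < 1*KiB:
		return 8 * KiB, 64 * MiB
	case avgKVBytes < 32*KiB:
		return 32 * KiB, 128 * MiB
	case avgKVBytes < 256*KiB:
		return 256 * KiB, 256 * MiB
	case avgKVBytes < 512*KiB:
		return 512 * KiB, 512 * MiB
	default:
		return 1 * MiB, 1024 * MiB
	}
}

func main() {
	// e.g. an instance whose average key-value size is about 300 bytes
	block, sst := sstTier(300)
	fmt.Printf("block=%dKiB sst=%dMiB\n", block>>10, sst>>20) // block=8KiB sst=64MiB
}

In a real implementation the average would come from RocksDB statistics, and the result would feed rocksdb.target_file_size_base (and possibly the block size) on the fly.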

rename flushall to flushdb

For compatibility with the Redis protocol, you should implement the flushdb command. Currently this functionality is provided by a command called flushall.

[QUESTION] hset multiple arguments not supported?

@@redis
127.0.0.1:6379> hset me lang english sex male
(integer) 2
127.0.0.1:6379> hgetall me
1) "lang"
2) "english"
3) "sex"
4) "male"
127.0.0.1:6379>

===========================================================
@@kvrocks

127.0.0.1:6666> hset me lang english sex male
(error) ERR wrong number of arguments <====================
127.0.0.1:6666>

127.0.0.1:9188> info

Server

version:999.999.999
git_sha1:18d71d0d
os:Linux 4.19.0-16-cloud-amd64 x86_64
gcc_version:8.3.0
arch_bits:64
process_id:13
tcp_port:9188
uptime_in_seconds:313
uptime_in_days:0

Clients

connected_clients:1
monitor_clients:0
...........

RocksDB

estimate_keys[default]:1
block_cache_usage[default]:0
block_cache_pinned_usage[defa
......

Illegal instruction error when running on server (Ubuntu)

Describe the bug
Hi, great project. We're trying to move from redis to kvrocks for temporary KV storage. We have kvrocks successfully running on our local machines but it couldn't run on our servers in DigitalOcean.

Running the binary directly throws error: Illegal instruction on start.
Running in container (hulkdev/kvrocks) on the server fails with no error or logs. Works in local.

  • OS: Ubuntu
  • Version 20.04
  • kvrocks v1.1.32

Thank you.

[BUG] docker image crash

Describe the bug
Hey!
Tried to run kvrocks with your docker image, but it doesn't start or produce any logs. The Docker daemon on Mac and Ubuntu (20.0.3 & 20.10.6) both report that the container exited with code 132.
So I cloned the 2.0 branch, built from source, and built the docker image fresh; then it worked like a charm, it logs Version: 2.0.0 @000b3627 and redis-cli is fine with it.
Did I miss anything while starting the docker container from the README instructions?

To Reproduce
Steps to reproduce the behavior:

  1. docker run -it -p 6666:6666 bitleak/kvrocks

docker inspect ...

"State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 132,
            "Error": "",
            "StartedAt": "2021-05-03T09:02:17.8104824Z",
            "FinishedAt": "2021-05-03T09:02:17.8757551Z"
        },

Supports new configuration

  • min-slaves-to-write: deny writes when too few slaves can keep up with the master
  • min-slaves-max-lag: the maximum replication lag beyond which a slave is regarded as unable to keep up with the master (a sketch of the combined check follows below)
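
A hedged Go sketch of that rule (names and structure are illustrative only, not Kvrocks code): a write is allowed only if at least min-slaves-to-write replicas acknowledged replication within min-slaves-max-lag.

package replcheck

import (
	"errors"
	"time"
)

// Replica records when a slave last acknowledged replicated data.
type Replica struct {
	Name    string
	LastAck time.Time
}

// CanWrite returns nil only if at least minSlavesToWrite replicas have acked
// within minSlavesMaxLag of now; otherwise the write should be rejected.
func CanWrite(replicas []Replica, minSlavesToWrite int, minSlavesMaxLag time.Duration, now time.Time) error {
	good := 0
	for _, r := range replicas {
		if now.Sub(r.LastAck) <= minSlavesMaxLag {
			good++
		}
	}
	if good < minSlavesToWrite {
		return errors.New("NOREPLICAS not enough good replicas to write")
	}
	return nil
}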

[QUESTION] Should we use backup or checkpoint of rocksdb for kvrocks backup

Currently, we use the checkpoint feature of rocksdb to implement kvrocks full synchronization, but kvrocks backup still uses rocksdb's backup engine. As we know, rocksdb backup costs a lot of disk bandwidth and space, but it supports incremental backups and storing into HDFS, which are good features. So I think we should decide whether to use backup or checkpoint of rocksdb for kvrocks backup. WDYT? @git-hulk @karelrooted @Alfejik

deadlock

We found a deadlock. It does not reproduce with the original code, but it shows up in my modified code roughly once every few days; reading the original code, I think it may have the same latent risk. Below are the stacks at the time of the deadlock; please correct me if anything is wrong, thanks!

Thread 7 (Thread 0x7fe5e47f8700 (LWP 97)):
#0  0x00007fe5f8f4ef4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fe5f8f4ad02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fe5f8f4ac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004437d8 in __gthread_mutex_lock(pthread_mutex_t*) ()
#4  0x0000000000443b10 in std::mutex::lock() ()
#5  0x00000000004db93a in std::lock_guard<std::mutex>::lock_guard(std::mutex&) ()
#6  0x00000000004f7731 in Server::UnSubscribeChannel(std::string const&, Redis::Connection*) ()
#7  0x00000000004a7eee in Redis::Connection::UnSubscribeAll() ()
#8  0x00000000004a723d in Redis::Connection::~Connection() ()
#9  0x00000000005223bd in Worker::FreeConnectionByID(int, unsigned long) ()
#10 0x00000000005235b2 in Worker::KickoutIdleClients(int) ()
#11 0x000000000052158e in Worker::TimerCB(int, short, void*) ()

Thread 2 (Thread 0x7fe5e7fff700 (LWP 129)):
#0  0x00007fe5f8f4ef4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fe5f8f4ad02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fe5f8f4ac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004437d8 in __gthread_mutex_lock(pthread_mutex_t*) ()
#4  0x0000000000443b10 in std::mutex::lock() ()
#5  0x00000000004efc87 in std::unique_lock<std::mutex>::lock() ()
#6  0x00000000004ee65d in std::unique_lock<std::mutex>::unique_lock(std::mutex&) ()
#7  0x0000000000522665 in Worker::Reply(int, std::string const&) ()
#8  0x00000000004f7288 in Server::PublishMessage(std::string const&, std::string const&) ()
#9  0x000000000048d06f in Redis::CommandPublish::Execute(Server*, Redis::Connection*, std::string*) ()

1) One worker thread (thread 2) holds the server lock pubsub_channels_mu_ and is waiting for the worker lock conns_mu_.
Meanwhile a timer fires to clean up timed-out connections (thread 7); it holds the worker lock conns_mu_ and, while destructing a connection, waits for the server lock pubsub_channels_mu_. The two threads are stuck there.

2) I don't fully understand the behavior of KickoutIdleClients. It is supposed to clean up connections in conns_ and monitor_conns_, yet it doesn't iterate over the monitor_conns_ map, while FreeConnectionByID does look into monitor_conns_. Besides, wouldn't it be better to directly erase the matching entries from conns_ and monitor_conns_ here and append them to to_be_killed_conns, so that FreeConnectionByID doesn't need to take the lock at all?
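
The suggestion in (2) amounts to the classic "collect under the lock, act after releasing it" pattern, which also removes the inverted lock order described in (1). A generic Go sketch of the pattern, purely illustrative and unrelated to the actual C++ code:

package worker

import (
	"sync"
	"time"
)

type conn struct {
	id       int
	lastSeen time.Time
}

type worker struct {
	mu    sync.Mutex
	conns map[int]*conn
}

// kickoutIdle removes idle connections from the map while holding the lock,
// but closes them only after the lock is released, so the close path (which
// may take other locks, e.g. a pub/sub lock) never nests inside the conns lock.
func (w *worker) kickoutIdle(timeout time.Duration) {
	var toKill []*conn

	w.mu.Lock()
	now := time.Now()
	for id, c := range w.conns {
		if now.Sub(c.lastSeen) > timeout {
			toKill = append(toKill, c)
			delete(w.conns, id)
		}
	}
	w.mu.Unlock()

	for _, c := range toKill {
		c.close() // safe to take other locks here: no lock is held anymore
	}
}

func (c *conn) close() { /* unsubscribe channels, free resources, ... */ }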

[NEW] Support redis cluster mode

1 Background

Currently, we have discarded support for codis mode. If we want to deploy a kvrocks cluster, we need to pre-shard it and deploy a proxy such as twemproxy to route requests. This cluster solution was also used before redis supported cluster mode, but it is inconvenient for failover, linear scaling, and management; moreover, routing every request through a proxy causes performance loss due to the extra network hop. So we hope to implement a feature similar to redis cluster mode.

However, fully supporting redis cluster mode in kvrocks does not seem like a good idea, because it is hard to implement redis cluster management and the gossip protocol without bugs; we would have to upgrade the server whenever bugs are found, and after all it took many years to make redis cluster work well. To still get the benefits of redis cluster mode, we can support just enough redis cluster commands to let redis cluster SDKs (such as jedis) or redis cluster proxies (such as redis-cluster-proxy or predixy) work well.

2 Implementation

2.1 Basic version

Firstly, every node needs to know the entire cluster topology and its own identifier. When clients (redis-cli, an SDK, or a proxy) connect to any node of the kvrocks cluster, they can learn the slot distribution by sending the CLUSTER NODES command and then route requests accordingly, and kvrocks should reply MOVED when a client sends a request to the wrong server. To achieve that, we should implement a CLUSTER SETNODES command to set the entire cluster topology on every node, and provide the CLUSTER NODES command so that a kvrocks node looks like redis.

  • CLUSTER NODES
    The return of this command is the same format of redis cluster.
    Example:

    07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
    67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002 master - 0 1426238316232 2 connected 5461-10922
    
  • CLUSTER SETNODES $ALL_NODES_INFO $VERSION FORCE
    $ALL_NODES_INFO format: $node_id $ip $port $role $master_node_id $slot_range. $role is master or slave; $master_node_id is the master's node id if the node is a slave, otherwise this field is -. A slave needn't set $slot_range.
    Example:

    07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 30004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca
    67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1 30002 master - 5461-10922
    

    $VERSION means the current cluster topology version, which is stored or generated by a management component such as a config server or meta-server.
    Kvrocks can determine its own node id by comparing its own ip and port against the received node info.

From here on this works well: we simply notify every node of the kvrocks cluster via CLUSTER SETNODES when scaling (slot data migration) or failing over. Clients (redis-cli, SDKs, proxies) will reacquire the cluster topology when they find their requests rejected or answered with MOVED because they routed them using a stale topology. To avoid missed notifications, we can also re-push the full cluster topology periodically at a longer interval.

*Please note: this cluster solution does not itself handle scaling or failover for kvrocks.

2.2 Advanced version

In the basic version we always send every node the entire cluster topology, even when only one node's info changes. If the cluster is very large the message becomes big, so we again pay a network communication cost, and the cluster management component may not be able to keep up. So we may need to support incremental change notifications: when the topology changes, we send a single lightweight command to update it on the kvrocks server. To implement incremental notifications we must use a version to identify the current topology; otherwise kvrocks could keep a wrong cluster topology after missing a single change message. Kvrocks applies a command only when its version is valid compared with the $VERSION carried in the command; if not, kvrocks must return an error, and we then set the whole cluster topology again with the CLUSTER SETNODES command.
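
A hedged sketch of one possible version rule (illustrative only; the exact validity check is up to the implementation): an incremental update is applied only when it is based on the topology version the node currently holds, otherwise the node rejects it and the management component falls back to a full CLUSTER SETNODES.

package cluster

import "fmt"

// Topology is a simplified stand-in for a node's view of the cluster.
type Topology struct {
	Version int64
	Nodes   map[string]string // node id -> "ip port role master slots" (illustrative)
}

// ApplyIncremental applies a single-node change only if it carries the next
// expected version; on mismatch the caller should resend the full topology
// with CLUSTER SETNODES.
func (t *Topology) ApplyIncremental(version int64, nodeID, nodeInfo string) error {
	if version != t.Version+1 {
		return fmt.Errorf("topology version %d rejected (current %d): resend full topology", version, t.Version)
	}
	t.Nodes[nodeID] = nodeInfo
	t.Version = version
	return nil
}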

There are some commands we may need to support.

  • (optional) CLUSTER VERSION
    Get the version of the node, so that we can check whether its version is right.
  • (optional) CLUSTER SETNODEID $NODE_ID
    Explicitly tell the node its node id.
  • (optional) CLUSTER SETNODE $NODE_ID $NODE_INFO $VERSION
    Only update one node info
  • (optional) CLUSTER DELNODE $NODE_ID $VERSION
    Delete one node of cluster
  • (optional) CLUSTER REPLICATE $NODE_ID
    Just like slaveof, similar with redis command.
  • (optional) CLUSTER MIGRATE $SLOT_ID $NODE_ID
    Migrate all keys belonging to the slot to another node
  • (optional) CLUSTER SETSLOT $SLOT_ID $NODE_ID $VERSION
    Only update one slot distribution, that is useful to avoid too many big messages when we scale cluster.
  • (optional) CLUSTER SLOTS/INFO/KEYSLOT/GETKEYSINSLOT/COUNTKEYSINSLOT
    To make kvrocks cluster similar with redis cluster

3 Advantages and disadvantages

3.1 Advantages

  • It is easy to implement and introduces fewer bugs.
  • Cluster logic is kept out of the server, so we needn't change kvrocks code when optimizing fault detection or failover for kvrocks instances.

3.2 Disadvantages

  • Topology change notifications are sent by another component, so clients may observe some latency when the cluster topology changes, even though such changes are low-frequency.
  • It is not perfect; we still need a cluster management component to handle tasks such as failover and scaling.

4 Extra works

  • More redis cluster commands to manage kvrocks cluster conveniently just like redis cluster.
  • Cluster management component. To make operating a kvrocks cluster easy, we may provide a cluster management component that deals with linear scaling, failover, and general management.

RocksDB optimization

Compression type

  • Support multiple compression types

Parameters

  • enable enable_pipelined_write if having more CPUs
  • use index_type: kTwoLevelIndexSearch to make index and filter blocks smaller
  • enable pin_top_level_index_and_filter, pin_l0_filter_and_index_blocks_in_cache
  • for cache_index_and_filter_blocks_with_high_priority, to avoid too high
  • NewBloomFilterPolicy(bits_per_key, false) i.e. BloomFilterPolicy::kAutoBloom
  • data_block_index_type: BlockBasedTableOptions::kDataBlockBinaryAndHash
  • Set BlockBasedTableOptions::optimize_filters_for_memory to true, jemalloc_usable_size is safe if no bugs, redis adopts it
  • memtable_prefix_bloom_size_ratio and memtable_whole_key_filtering, to enable memtable filter.
  • CompressionOptions::max_dict_bytes for lz4 and zstd

Data partition by ColumnFamily

  • One type one ColumnFamily

Build fails

cd src && make all
You need to run this command from the toplevel of the working tree.
make[1]: Entering directory '/root/kvrocks/src'

Hint: Fetching submodules from the remote
git submodule init
You need to run this command from the toplevel of the working tree.
make[1]: *** [SUBMODULES] Error 1
make[1]: Leaving directory '/root/kvrocks/src'
make: *** [all] Error 2

[NEW] Optimize TCP keepalive detection time

The problem/use-case that the feature addresses
On Linux, the default TCP keepalive detection time is too long, so in practice it is not useful for detecting some problems.

Description of the feature
We need to set the tcp-keepalive interval and probes if the socket keepalive is disabled

Alternatives you've considered
Maybe we can also provide a config option to set keepalive, just like Redis.

Additional information
NULL
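
For illustration only (Go, not Kvrocks code): the portable part of such tuning enables keepalive on the accepted socket and shortens the keepalive period, while the probe count (TCP_KEEPCNT) would need a platform-specific setsockopt that the standard library does not expose.

package keepalive

import (
	"net"
	"time"
)

// tuneKeepalive enables TCP keepalive on a connection and shortens the
// keepalive period so dead peers are detected much sooner than with the
// Linux default idle time of 7200 seconds.
func tuneKeepalive(c net.Conn) error {
	tcp, ok := c.(*net.TCPConn)
	if !ok {
		return nil // not a TCP connection, nothing to do
	}
	if err := tcp.SetKeepAlive(true); err != nil {
		return err
	}
	return tcp.SetKeepAlivePeriod(30 * time.Second)
}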

[NEW] TTL of member

redis> SADD mybasket goods-id
redis> expire mybasket goods-id 10
redis>	1
redis>	expire mybasket goods-id1 goods-id2 goods-id3 10
redis>	3
redis>	sadd myset mem10
redis>	1
redis>	expire myset mem10 5
redis>	1
redis>	ttl myset mem10
redis>	3
redis>	ttl myset mem10
redis>	-2
redis>	smembers myset
redis>	(empty list or set)

This is a feature of redis-enterprise only. (supported data type: SET, ZSET, HASH)

Support slaveof against a Redis master

Because of our actual business needs, kvrocks has to be able to slaveof a redis instance. I have already implemented a minimal path for this:

  • Support redis's psync protocol
  • Parse redis RDB files
  • Load the RDB contents into kvrocks
  • Periodically send replconf ack offset
    The code already runs in our test environment, though it will be a while before it goes to production; redis slaveof kvrocks is also planned as a next step. I would like to know whether, once polished, this code could be contributed to this repository, since other companies may have similar needs.

Build failure

Centos7

[root@i-h60b6jz2 kvrocks]# make -j4
cd src && make all
make[1]: Entering directory '/root/kvrocks/src'
CC main.o
CC server.o
LINK kvrocks
/root/kvrocks/external/glog/.libs/libglog.a(libglog_la-logging.o): In function '__static_initialization_and_destruction_0(int, int)':
logging.cc:(.text+0x6299): undefined reference to 'google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)'
logging.cc:(.text+0x634b): undefined reference to 'google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)'
logging.cc:(.text+0x63f5): undefined reference to 'google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)'
logging.cc:(.text+0x649f): undefined reference to 'google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)'
logging.cc:(.text+0x64dc): undefined reference to 'google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)'
/root/kvrocks/external/glog/.libs/libglog.a(libglog_la-logging.o):logging.cc:(.text+0x657e): more undefined references to `google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)' follow
collect2: error: ld returned 1 exit status
make[1]: *** [kvrocks] Error 1
make[1]: Leaving directory '/root/kvrocks/src'
make: *** [all] Error 2

Support encryption at rest

It would be nice to support encryption at rest. CockroachDB's implementation for RocksDB may be a good starting point.

Incompatibility with hscan and some other incompatibilities.

There is a python ORM library for redis called walrus with a very rich set of tests, which I found helpful in identifying compatibility problems with stock redis: https://github.com/coleifer/walrus
I was able to run these tests against kvrocks, and they found some issues.

Here is the output of the test run.
https://pastebin.com/mFfEjWPT

I have verified that stock redis passes all the tests.

The tweaks I had to make to the tests were to pass password='foobared' in two places: runtests.py does a call to Redis().Info() which needs the password set, and
walrus/tests/base.py, which also needs the password passed into the Redis() call.

I'd cut a bug for each issue, but I think odds are that once you get the tests running locally you'll be able to find other incompatibilities and fix them.

[BUG] Unable to build.

Attempting to compile as per instructions on Readme. Unable to build.

image

./db/version_edit.h:86:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]

  • OS: Ubuntu 20.04
  • GCC/G++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
  • GNU Make 4.2.1

To Reproduce
Steps to reproduce the behavior:

  1. Build as per instructions on README

QPS comparison with PIKA

I compared performance against pika. Take the SET command below as an example, with no slave attached:

redis-benchmark -t set -n 100000 -r 100000 -d 102400 -h host -p port

pika's QPS is stable at around 2000+, while kvrocks gets roughly 900+. Are there any build or tuning suggestions to reach performance close to pika's?
