bilibili / overlord

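Overlord is a memcache and redis & redis-cluster proxy and cluster management toolkit written in Go by bilibili, aiming to provide an automated, highly available caching service solution.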

Home Page: https://www.bilibili.com

License: MIT License

Languages: Go 88.06%, Shell 0.99%, Python 2.01%, Makefile 0.05%, JavaScript 1.59%, HTML 0.09%, Vue 6.69%, SCSS 0.52%
Topics: go, cache, cache-proxy, memcached, memcache, redis, redis-cluster

overlord's Introduction

Overlord

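Overlord is a memcache and redis & redis-cluster proxy and cluster management toolkit written in Go by bilibili, aiming to provide an automated, highly available caching service solution. It consists of the following components:

  • proxy: a lightweight, highly available cache proxy module that supports proxying memcache and redis. It is comparable to twemproxy, with the difference that it supports redis-cluster and can disguise itself as cluster mode.
  • platform: includes the apiserver, the mesos framework & executor, jobs for cluster node task management, and more.
  • GUI: a web management interface whose dashboard makes cluster management visual and convenient, including creating and deleting clusters, scaling up and down, and adding or removing nodes.
  • anzi: a data synchronization tool for redis-cluster that can run as a service and work together with the apiserver.
  • enri: a cluster management tool for redis-cluster that can flexibly create clusters, migrate slots, and so on.

Overlord is used in production at bilibili.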

Document

Simplified Chinese

GUI

[Image: GUI]

Architecture

[Image: architecture]

Cache-Platform

An automated cache node management platform built on mesos and etcd.

[Image: cache-platform architecture]


Please report bugs, concerns, and suggestions via issues, or join QQ group 716486124 to discuss problems around the source code.

overlord's People

Contributors

dependabot[bot], everpcpc, felixhao, gutan, hawkingrei, hbprotoss, karenchuang, lintanghui, linyinsheng, liuhao1024, wangrzneu, wayslog

overlord's Issues

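Multi-level cache

motivation

For cache disaster recovery, we need to support a second-level cache. The second level may be a slower kind of cache, such as redis with persistence or a KV store. The ultimate goal is to prevent a DB avalanche caused by cache breakdown.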

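Resource injection issue

The host and port here come from the HTTP request and are used directly as the address for network access, which makes resource injection possible and could be used to attack internal network systems.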

ip := c.Param("ip")
port := c.Param("port")
cmd := c.PostForm("command")
args := strings.Split(cmd, " ")
rcmd, err := svc.Execute(fmt.Sprintf("%s:%s", ip, port), args[0], args[1:]...)

Perhaps this does not need to be considered for a system that is only used internally? If so, please ignore this.
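One possible mitigation is to validate the request parameters before building the address. The sketch below is not overlord's code: `validateTarget`, `mustParseCIDRs`, and the `10.0.1.0/24` allowlist are hypothetical names and values chosen for illustration. It accepts only a literal IP inside an allowlisted subnet and a port in the valid TCP range.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// allowedNets is a hypothetical allowlist of cache-node subnets.
// A real deployment would load it from configuration.
var allowedNets = mustParseCIDRs("10.0.1.0/24")

// mustParseCIDRs parses CIDR strings and panics on a malformed entry.
func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// validateTarget returns "ip:port" only if ip is a literal IP inside the
// allowlist and port is a number in the range 1-65535.
func validateTarget(ip, port string) (string, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return "", fmt.Errorf("invalid ip %q", ip)
	}
	p, err := strconv.Atoi(port)
	if err != nil || p < 1 || p > 65535 {
		return "", fmt.Errorf("invalid port %q", port)
	}
	for _, n := range allowedNets {
		if n.Contains(parsed) {
			// Rebuild the address from the parsed values, not the raw input.
			return net.JoinHostPort(parsed.String(), strconv.Itoa(p)), nil
		}
	}
	return "", fmt.Errorf("ip %s is not in the allowlist", ip)
}

func main() {
	inputs := [][2]string{
		{"10.0.1.2", "21000"},     // allowlisted cache node
		{"169.254.169.254", "80"}, // valid IP, but outside the allowlist
		{"evil.example", "6379"},  // hostname, not a literal IP
	}
	for _, in := range inputs {
		addr, err := validateTarget(in[0], in[1])
		fmt.Printf("addr=%q err=%v\n", addr, err)
	}
}
```

In the handler quoted above, the result of such a check would replace the `fmt.Sprintf("%s:%s", ip, port)` argument, and the request would be rejected when validation fails.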

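Multi-write feature

motivation

To support smooth migration and warm-up for memcache, overlord should be able to automatically forward write requests to the new nodes.
However, this multi-write feature has a major limitation:

Request reordering can scramble the data and produce dirty data, and the likelihood of this is quite high.

How to solve the problem above still needs thought on our part.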

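Merge the memcache text/binary protocols

Client implementations of the memcache protocol vary widely, and both binary and text are in use. The current approach is to use two different ports, each assigned to one protocol. However, the two protocols can actually be made compatible through parser detection.
Changes required:

  • Modify the parser so that it can instantiate both kinds of request
  • Modify NodeConn so that its read/write operations support both
  • Modify the pipeline mode to make setq and similar commands simpler
  • Add unit tests covering text and binary protocol requests arriving in random order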
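The parser-detection idea can be sketched as follows. This is an illustration, not overlord's parser: the `protocol` type, `detect`, and `detectBytes` are hypothetical names. It relies on the fact that every memcached binary-protocol request starts with the magic byte 0x80, while text-protocol commands start with an ASCII letter, so peeking at one byte is enough to choose a parser.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

type protocol int

const (
	protoText protocol = iota
	protoBinary
)

// magicRequest is the first byte of every memcached binary-protocol request.
const magicRequest = 0x80

// detect peeks at the first byte without consuming it, so the parser that is
// chosen afterwards still sees the complete request.
func detect(r *bufio.Reader) (protocol, error) {
	b, err := r.Peek(1)
	if err != nil {
		return protoText, err
	}
	if b[0] == magicRequest {
		return protoBinary, nil
	}
	return protoText, nil
}

// detectBytes is a convenience wrapper for demos and tests.
func detectBytes(data []byte) (protocol, error) {
	return detect(bufio.NewReader(bytes.NewReader(data)))
}

func main() {
	requests := [][]byte{
		[]byte("get foo\r\n"),    // text protocol
		{0x80, 0x00, 0x00, 0x03}, // start of a binary request header
	}
	for _, req := range requests {
		p, err := detectBytes(req)
		fmt.Println(p == protoBinary, err)
	}
}
```

Detecting once per connection is cheaper, while detecting per request is what a unit test with randomly interleaved text and binary requests would exercise.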

[Bug Report] miss makezero in slice init

I was running a GitHub Action that runs the makezero linter against top GitHub Go repos.

see issues alingse/go-linter-runner#1

and the github actions output https://github.com/alingse/go-linter-runner/actions/runs/9243212242/job/25427060632

====================================================================================================
append to slice `ofid` with non-zero initialized length at https://github.com/bilibili/overlord/blob/master/platform/mesos/scheduler.go#L243:10
====================================================================================================

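ofid := make([]ms.OfferID, len(offers)) should be ofid := make([]ms.OfferID, 0, len(offers)), because append adds new elements after the len(offers) zero values that the first form already contains.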
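A minimal, self-contained illustration of the pattern the linter flags. It uses plain strings instead of `ms.OfferID`, and `buggy` and `fixed` are hypothetical function names, not code from scheduler.go.

```go
package main

import "fmt"

// buggy mirrors the reported pattern: the slice starts with len(ids) zero
// values, and append places the real elements after them.
func buggy(ids []string) []string {
	out := make([]string, len(ids))
	for _, id := range ids {
		out = append(out, id)
	}
	return out
}

// fixed allocates length 0 with capacity len(ids), so append fills the slice
// from the start without reallocating.
func fixed(ids []string) []string {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, id)
	}
	return out
}

func main() {
	ids := []string{"offer-1", "offer-2"}
	fmt.Printf("%q\n", buggy(ids)) // ["" "" "offer-1" "offer-2"]
	fmt.Printf("%q\n", fixed(ids)) // ["offer-1" "offer-2"]
}
```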

Exception when lettuce calls overlord

java 1.8
lettuce 2.0.4

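Test code

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;
import io.lettuce.core.resource.DefaultClientResources;

public class OverLordTest {
    public static void main(String[] args) {
        // Address of the overlord proxy
        String nodes = "192.168.17.25:27020";
        DefaultClientResources clientResources = DefaultClientResources.builder()
                .ioThreadPoolSize(10)
                .computationThreadPoolSize(10)
                .build();

        ClusterClientOptions clusterClient = ClusterClientOptions.builder()
                .autoReconnect(true)
                .pingBeforeActivateConnection(true)
                .build();

        RedisClusterClient client = RedisClusterClient.create(clientResources, RedisURI.create("redis://" + nodes));
        client.setOptions(clusterClient);

        // connect() loads the cluster partitions; this is where the exception is thrown
        StatefulRedisClusterConnection<String, String> connect = client.connect();
        RedisAdvancedClusterCommands<String, String> sync = connect.sync();

        String set = sync.set("a", "b");
    }
}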

Exception raised

16:17:14.263 [main] DEBUG io.lettuce.core.protocol.DefaultEndpoint - [channel=0xe829f2e8, /192.168.17.1:54639 -> /192.168.17.25:27020, epid=0x2] close()
Exception in thread "main" io.lettuce.core.RedisException: Cannot retrieve initial cluster partitions from initial URIs [RedisURI [host='192.168.17.25', port=27020]]
    at io.lettuce.core.cluster.RedisClusterClient.loadPartitions(RedisClusterClient.java:790)
    at io.lettuce.core.cluster.RedisClusterClient.initializePartitions(RedisClusterClient.java:761)
    at io.lettuce.core.cluster.RedisClusterClient.connectClusterImpl(RedisClusterClient.java:500)
    at io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:339)
    at io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:316)
    at com.example.importpsr.OverLordTest.main(OverLordTest.java:34)

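Hot key and key status analysis

motivation

We now need to collect metrics on the proxy side while also recording data such as hot keys, and rely on other middleware to analyze the distribution of hot keys, slow keys, and so on.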

Support for mesos hostname

If mesos is not configured to use the IP address as its hostname, hostnames are written into nodes.conf, which may cause redis to report itself by hostname in the output of the cluster nodes command.
Example:

10.0.1.2:21000> cluster nodes
0000000000000000000000000000000000000537 test-node2:21000@31000 myself,master - 0 1587906151000 437 connected 0-8192
0000000000000000000000000000000000000540 10.0.1.3:21001@31001 slave 0000000000000000000000000000000000000537 0 1587906150010 437 connected
0000000000000000000000000000000000000539 10.0.1.3:21000@31000 master - 0 1587906152000 436 connected 8193-16383
0000000000000000000000000000000000000538 10.0.1.2:21001@31001 slave 0000000000000000000000000000000000000539 0 1587906153021 436 connected

This could cause the consistency check to fail indefinitely.
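One way to spot the affected nodes is to scan the `cluster nodes` output for address fields that are not literal IPs. The sketch below is an assumption about how such a check could look, not overlord's consistency check; `hostnameNodes` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostnameNodes returns the IDs of nodes whose address field in `cluster nodes`
// output is not a literal IP. The address field has the form host:port@cport.
func hostnameNodes(output string) []string {
	var ids []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank or malformed line
		}
		addr := fields[1]
		// Drop the cluster bus port suffix ("@31000").
		if i := strings.IndexByte(addr, '@'); i >= 0 {
			addr = addr[:i]
		}
		host, _, err := net.SplitHostPort(addr)
		if err != nil || net.ParseIP(host) == nil {
			ids = append(ids, fields[0])
		}
	}
	return ids
}

func main() {
	output := `0000000000000000000000000000000000000537 test-node2:21000@31000 myself,master - 0 1587906151000 437 connected 0-8192
0000000000000000000000000000000000000540 10.0.1.3:21001@31001 slave 0000000000000000000000000000000000000537 0 1587906150010 437 connected`
	fmt.Println(hostnameNodes(output))
}
```

Run against the example above, only the node that reports itself as test-node2 is returned.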

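overlord: error when restarting cluster nodes / deleting a cluster has no effect

Problem description:
1. Restarting the nodes of a created cluster through the web UI fails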
2019/07/22 10:43:46 events_generated.go:30: [INFO] Task localhost:31004-redis_c_b-8 is in state TASK_ERROR with message 'Total resources cpus(allocated: sh001):0.25; mem(allocated: sh001):40; ports(allocated: sh001):[31004-31004] required by task and its executor is more than available cpus(allocated: sh001):2.25; mem(allocated: sh001):6848; disk(allocated: sh001):233952; ports(allocated: sh001):[31007-32000]'
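2. Deleting a cluster through the web UI only removes the cluster record and stops the mesos task; the actual cache processes are still running in the operating system

Thoughts:
Problem 1: the error suggests that the requested resources do not match the offered resources, yet the port number should be the same across a restart. Does the mesos API have a parameter that avoids the port conflict, or do we need to kill the task ourselves before starting it again?
Problem 2: do we need to implement the logic for stopping the process ourselves in the executor?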
