
influx-proxy's Introduction

InfluxDB Proxy

This project adds a basic high availability layer to InfluxDB.

NOTE: influx-proxy must be built with Go 1.5+; UDP is not implemented.

Why

We used InfluxDB Relay before, but it does not meet some of our needs. We use Grafana to visualize time series data, so we have to add InfluxDB datasources to Grafana, and we have to change the datasource configuration whenever an InfluxDB instance goes down. We also need to transfer data across IDCs, but Relay does not support gzip. Finally, it is inconvenient to analyze data by connecting to several different InfluxDB instances. Therefore, we made InfluxDB Proxy.

Features

  • Supports gzip.
  • Supports queries.
  • Filters some dangerous InfluxQL statements.
  • Transparent to clients, behaving like a single cluster.
  • Caches data to a file when a write fails, then retries the write later.

Requirements

  • Redis-server
  • Python >= 2.7

Usage

$ # install redis-server
$ yum install redis
$ # start redis-server on 6379 port
$ redis-server --port 6379 &
$ # Install influxdb-proxy to your $GOPATH/bin
$ go get -u github.com/eleme/influx-proxy
$ # Edit config.py and execute it
$ python config.py
$ # Start influx-proxy!
$ $GOPATH/bin/influxdb-proxy -redis localhost:6379

Configuration

An example configuration file is provided as config.py. We use config.py to generate the configuration and store it in Redis.

Description

The architecture is fairly simple: one InfluxDB Proxy process and two or more InfluxDB processes. The proxy routes HTTP requests to the InfluxDB servers according to their measurements.

The setup should look like this:

        ┌─────────────────┐
        │writes & queries │
        └─────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │               │
         │InfluxDB Proxy │
         │  (only http)  │
         │               │
         └───────────────┘
                 │
                 ▼
        ┌─────────────────┐
        │   measurements  │
        └─────────────────┘
          │              │
        ┌─┼──────────────┘
        │ └──────────────┐
        ▼                ▼
  ┌──────────┐      ┌──────────┐
  │          │      │          │
  │ InfluxDB │      │ InfluxDB │
  │          │      │          │
  └──────────┘      └──────────┘

Measurement matching rules (a minimal lookup sketch follows this list):

  • Exact match first. For instance, for a measurement named cpu.load, if KEYMAPS contains both cpu and cpu.load keys, the backends mapped to cpu.load are used.

  • Prefix match second. For instance, for a measurement named cpu.load, if KEYMAPS only contains the cpu key, the backends mapped to cpu are used.
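A minimal sketch of this exact-then-prefix lookup, assuming KEYMAPS is available as a map from key to backend names; the map contents and the longest-prefix preference are illustrative assumptions, not taken from the project:

package main

import (
	"fmt"
	"strings"
)

// lookupBackends resolves a measurement name to its backends:
// exact match first, then the longest matching prefix.
func lookupBackends(keymaps map[string][]string, measurement string) ([]string, bool) {
	// Exact match first.
	if bs, ok := keymaps[measurement]; ok {
		return bs, true
	}
	// Then prefix match, preferring the longest prefix (an assumption for this sketch).
	best := ""
	for key := range keymaps {
		if strings.HasPrefix(measurement, key) && len(key) > len(best) {
			best = key
		}
	}
	if best == "" {
		return nil, false
	}
	return keymaps[best], true
}

func main() {
	keymaps := map[string][]string{
		"cpu":      {"influxdb-a"},
		"cpu.load": {"influxdb-a", "influxdb-b"},
	}
	fmt.Println(lookupBackends(keymaps, "cpu.load")) // exact match wins
	fmt.Println(lookupBackends(keymaps, "cpu.idle")) // falls back to the "cpu" prefix
}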

Query Commands

Unsupported commands

The following commands are forbidden.

  • DELETE
  • DROP
  • GRANT
  • REVOKE

Supported commands

Only queries matching one of the following patterns are supported (a sketch of how they can be checked follows the list).

  • .*where.*time
  • show.*from
  • show.*measurements
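A minimal sketch of how these pattern lists can be applied to an incoming query, combining the forbidden and supported lists above; word boundaries and case-insensitive matching are assumptions for the sketch, not taken from the project:

package main

import (
	"errors"
	"fmt"
	"regexp"
)

var errQueryForbidden = errors.New("query forbidden")

// Forbidden statements and supported query shapes, adapted from this README.
var (
	forbidden = []*regexp.Regexp{
		regexp.MustCompile(`(?i)\bdelete\b`),
		regexp.MustCompile(`(?i)\bdrop\b`),
		regexp.MustCompile(`(?i)\bgrant\b`),
		regexp.MustCompile(`(?i)\brevoke\b`),
	}
	obligated = []*regexp.Regexp{
		regexp.MustCompile(`(?i).*where.*time`),
		regexp.MustCompile(`(?i)show.*from`),
		regexp.MustCompile(`(?i)show.*measurements`),
	}
)

// checkQuery rejects forbidden statements and only lets through
// queries that match at least one supported pattern.
func checkQuery(q string) error {
	for _, re := range forbidden {
		if re.MatchString(q) {
			return errQueryForbidden
		}
	}
	for _, re := range obligated {
		if re.MatchString(q) {
			return nil
		}
	}
	return errQueryForbidden
}

func main() {
	fmt.Println(checkQuery("SELECT cpu_load FROM cpu WHERE time > now() - 1m")) // <nil>
	fmt.Println(checkQuery("DROP MEASUREMENT cpu"))                             // query forbidden
}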

License

MIT.

influx-proxy's People

Contributors

moooofly, shell909090

influx-proxy's Issues

💡 Thoughts on the implementation of the Query function

From reading the code, the function below appears to be the core entry point for queries. Could a per-goroutine / per-query-instance design be considered to improve performance? What does everyone think?

cluster.go

func (ic *InfluxCluster) Query(w http.ResponseWriter, req *http.Request) (err error) {
	atomic.AddInt64(&ic.stats.QueryRequests, 1)
	defer func(start time.Time) {
		atomic.AddInt64(&ic.stats.QueryRequestDuration, time.Since(start).Nanoseconds())
	}(time.Now())

	switch req.Method {
	case "GET", "POST":
	default:
		w.WriteHeader(400)
		w.Write([]byte("illegal method"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	// TODO: all query in q?
	q := strings.TrimSpace(req.FormValue("q"))
	if q == "" {
		w.WriteHeader(400)
		w.Write([]byte("empty query"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	err = ic.query_executor.Query(w, req)
	if err == nil {
		return
	}

	err = ic.CheckQuery(q)
	if err != nil {
		w.WriteHeader(400)
		w.Write([]byte("query forbidden"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	key, err := GetMeasurementFromInfluxQL(q)
	if err != nil {
		log.Printf("can't get measurement: %s\n", q)
		w.WriteHeader(400)
		w.Write([]byte("can't get measurement"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	apis, ok := ic.GetBackends(key)
	if !ok {
		log.Printf("unknown measurement: %s,the query is %s\n", key, q)
		w.WriteHeader(400)
		w.Write([]byte("unknown measurement"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	// same zone first, other zone. pass non-active.
	// TODO: better way?

	for _, api := range apis {
		if api.GetZone() != ic.Zone {
			continue
		}
		if !api.IsActive() || api.IsWriteOnly() {
			continue
		}
		err = api.Query(w, req)
		if err == nil {
			return
		}
	}

	for _, api := range apis {
		if api.GetZone() == ic.Zone {
			continue
		}
		if !api.IsActive() {
			continue
		}
		err = api.Query(w, req)
		if err == nil {
			return
		}
	}

	w.WriteHeader(400)
	w.Write([]byte("query error"))
	atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
	return
}

Error handling in functions

As shown below, the function is declared with an err return value, yet the implementation does not seem to pass errors from lower layers up to the caller.

func (ic *InfluxCluster) LoadConfig() (err error) {
	backends, bas, err := ic.loadBackends()
	if err != nil {
		return
	}

	m2bs, err := ic.loadMeasurements(backends)
	if err != nil {
		return
	}

	ic.lock.Lock()
	orig_backends := ic.backends
	ic.backends = backends
	ic.bas = bas
	ic.m2bs = m2bs
	ic.lock.Unlock()

	for name, bs := range orig_backends {
		err = bs.Close()
		if err != nil {
			log.Printf("fail in close backend %s", name)
		}
	}
	return
}

Moreover, one call site looks like this:

func main() {
	initLog()

...
	ic.LoadConfig()
...
}

Since the caller never reads the return value, why is the function declared this way?
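A minimal sketch of what actually consuming the return value in main could look like; the fatal log on failure is illustrative, not the project's current behavior:

func main() {
	initLog()

...
	// Consume the error instead of discarding it, so a broken
	// configuration stops the proxy at startup.
	if err := ic.LoadConfig(); err != nil {
		log.Fatalf("load config failed: %s", err)
	}
...
}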

On where query validity checks should happen

Since most of our cases are unsharded, the processing path can be a simple forward. In that case, however, the query statement is no longer validated on the proxy; errors are handled only after the backend server reports them.

Pros: excellent query-dispatch performance.
Cons: query correctness must be guaranteed by the caller, and errors are only handled after they occur on the backend server (pending review).

A performance hypothesis about regex usage

A hypothesis that the regexes cause a performance loss remains to be verified by profiling.

cluster.go:

func (ic *InfluxCluster) CheckQuery(q string) (err error) {
	ic.lock.RLock()
	defer ic.lock.RUnlock()

	if len(ic.ForbiddenQuery) != 0 {
		for _, fq := range ic.ForbiddenQuery { // amplification factor
			if fq.MatchString(q) { // hot spot
				return ErrQueryForbidden
			}
		}
	}

	if len(ic.ObligatedQuery) != 0 {
		for _, pq := range ic.ObligatedQuery { // same as above
			if pq.MatchString(q) {
				return
			}
		}
		return ErrQueryForbidden
	}

	return
}

According to the business logic, each incoming SQL statement must first be checked to see whether it may be executed (the unit tests for drop and the like are commented out), and this function implements the check with regexes. According to the analysis in [1], this approach costs performance, and the cost is amplified when traffic is high. Verifying this hypothesis requires more complete benchmark tests; a sketch follows the reference below.

Reference:

[1] Go代码调优利器-火焰图 (flame graphs as a Go tuning tool)
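A minimal benchmark sketch for verifying this hypothesis with go test -bench, matching a typical query against a list of patterns the way CheckQuery does per request; the patterns and the query string are illustrative, and case-insensitive matching is an assumption:

package backend

import (
	"regexp"
	"testing"
)

// Illustrative patterns, modeled on the "Supported commands" list.
var obligatedQuery = []*regexp.Regexp{
	regexp.MustCompile(`(?i).*where.*time`),
	regexp.MustCompile(`(?i)show.*from`),
	regexp.MustCompile(`(?i)show.*measurements`),
}

// BenchmarkCheckQuery measures the per-request cost of running every
// obligated pattern against one query, the worst case for CheckQuery.
func BenchmarkCheckQuery(b *testing.B) {
	q := "SELECT cpu_load FROM cpu WHERE time > now() - 1m"
	for i := 0; i < b.N; i++ {
		for _, re := range obligatedQuery {
			re.MatchString(q)
		}
	}
}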

On function naming

$ grep -rin LoadJson .
Binary file ./bin/influx-proxy matches
./service/main.go:45:func LoadJson(configfile string, cfg interface{}) (err error) {
./service/main.go:81:   err = LoadJson(ConfigFile, &cfg)

As shown above, the LoadJson function is only used during initialization, so it is better not to use Go's exported naming style for it.
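A minimal sketch of the suggested rename, keeping the same signature; the body is a plausible JSON loader, not the project's actual implementation:

package service

import (
	"encoding/json"
	"io/ioutil"
)

// loadJson is only used during initialization inside this package,
// so an unexported name is sufficient. The body is illustrative.
func loadJson(configfile string, cfg interface{}) (err error) {
	data, err := ioutil.ReadFile(configfile)
	if err != nil {
		return
	}
	return json.Unmarshal(data, cfg)
}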

Unit test issues

  • Why do the unit tests still pass when the TCP connection fails?
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 new measurement: load.cpu
2017/08/15 14:05:18 new measurement: test
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /write?db=test1
2017/08/15 14:05:18 handler any get url: /write?db=write_only
2017/08/15 14:05:18 handler any get url: /write?db=test2
2017/08/15 14:05:19 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:20 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 unknown measurement: test,the query is SELECT cpu_load from test WHERE time > now() - 1m
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=+select+cpu_load+from+cpu+WHERE+time+%3E+now%28%29+-+1m
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=+select+cpu_load+from+%22cpu.load%22+WHERE+time+%3E+now%28%29+-+1m
2017/08/15 14:05:20 unknown measurement: load.cpu,the query is select cpu_load from "load.cpu" WHERE time > now() - 1m
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=SHOW+tag+keys+from+%22cpu%22+
2017/08/15 14:05:20 write meta: 8
2017/08/15 14:05:20 write meta: 0
2017/08/15 14:05:20 http backend write test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /write?db=test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /write?db=test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /query?db=test
2017/08/15 14:05:20 handler any get url: /ping
PASS
ok      github.com/eleme/influx-proxy/backend   5.030s
  • Should the unit tests be simplified by removing the networked parts, so that they only describe interfaces and measure function performance? (A sketch using net/http/httptest follows the next item.)

  • Do we need end-to-end tests to cover the interaction between the proxy and InfluxDB that the unit tests cannot describe?
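A minimal sketch of how a test could stand up an in-process fake InfluxDB with net/http/httptest instead of dialing a port that may not be listening; the handler and assertions are illustrative, not the project's tests, and a real test would point its backend at ts.URL:

package backend

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestPingAgainstFakeInfluxDB shows the isolation pattern: no real
// network dependency, so a refused connection cannot hide failures.
func TestPingAgainstFakeInfluxDB(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/ping" {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer ts.Close()

	resp, err := http.Get(ts.URL + "/ping")
	if err != nil {
		t.Fatalf("ping failed: %s", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}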

RFC: requirements for the rc1 release

Write logic (a hash-ring lookup sketch follows this list):

Use the measurement to get the corresponding hash ring.
Use the measurement to get the corresponding backend groups.
Use measurement + sorted key to find the backend-group index on the hash ring.
Use that index to find the corresponding backends, then write the data to each of them in turn.
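A minimal sketch of the hash-ring lookup described above, assuming FNV hashing and a fixed number of virtual nodes per backend group; the function name, shardCount parameter, and ring layout are illustrative assumptions, not the project's implementation:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringLookup maps measurement+sortedKey to a backend-group index on a
// hash ring with virtual nodes. A real implementation would build the
// ring once per measurement instead of on every call.
func ringLookup(measurement, sortedKey string, shardCount int) int {
	const virtualNodes = 128

	type vnode struct {
		hash  uint32
		group int
	}
	ring := make([]vnode, 0, shardCount*virtualNodes)
	for g := 0; g < shardCount; g++ {
		for v := 0; v < virtualNodes; v++ {
			h := fnv.New32a()
			fmt.Fprintf(h, "%d-%d", g, v)
			ring = append(ring, vnode{h.Sum32(), g})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })

	h := fnv.New32a()
	h.Write([]byte(measurement + "," + sortedKey))
	key := h.Sum32()

	// Walk clockwise to the first virtual node at or after the key.
	i := sort.Search(len(ring), func(i int) bool { return ring[i].hash >= key })
	if i == len(ring) {
		i = 0
	}
	return ring[i].group
}

func main() {
	// Choose which of 3 backend groups receives a point for cpu.load,host=a.
	fmt.Println(ringLookup("cpu.load", "host=a", 3))
}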

Query logic:

Find the backend groups for the measurement.
If there is only one backend group, there is only one shard: pick the first available backend, send the request, return the data, and the request is done. If the request fails, use the next available backend. If every backend fails, return a failed request.
If there are multiple backend groups, there are multiple shards: replicate the request to every shard and handle each one as above. If any shard request fails, the whole request is considered failed. After fetching data from the shards, the proxy performs a second-stage computation on the results.

The following functions should be implemented first (a merge sketch follows this list):

count, mean, sum, min, max
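A minimal sketch of the second-stage computation on the proxy, assuming each shard can return partial count/sum/min/max values; the shardResult type is made up for illustration, and mean is derived from the merged sum and count rather than averaged directly:

package main

import "fmt"

// shardResult holds the partial aggregates returned by one shard.
// The type is illustrative, not part of the project.
type shardResult struct {
	Count int64
	Sum   float64
	Min   float64
	Max   float64
}

// merge combines per-shard partials into the final count, sum, min,
// max and mean that the proxy would return to the client.
func merge(parts []shardResult) (count int64, sum, min, max, mean float64) {
	for i, p := range parts {
		count += p.Count
		sum += p.Sum
		if i == 0 || p.Min < min {
			min = p.Min
		}
		if i == 0 || p.Max > max {
			max = p.Max
		}
	}
	if count > 0 {
		mean = sum / float64(count)
	}
	return
}

func main() {
	parts := []shardResult{
		{Count: 10, Sum: 25, Min: 1, Max: 9},
		{Count: 5, Sum: 20, Min: 2, Max: 12},
	}
	fmt.Println(merge(parts))
}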

Project goal:

The goal of the project is to be high-performance and reliable.

Dead code summary

func (hs *HttpService) HandlerPing(w http.ResponseWriter, req *http.Request) {
	defer req.Body.Close()
	version, err := hs.ic.Ping()
	if err != nil {
		panic("WTF")
		return // this line is unreachable code !
	}
...
}

On type usage

While reading cluster_test.go, I found the following code snippet:

func CreateTestInfluxCluster() (ic *InfluxCluster, err error) {
	redisConfig := &RedisConfigSource{}
...
	cfg.WriteOnly = 1
...
	return
}

For cfg.WriteOnly = 1 here, wouldn't a bool type be more reasonable?
