
influx-proxy's Introduction

InfluxDB Proxy

This project adds a basic high availability layer to InfluxDB.

NOTE: influx-proxy must be built with Go 1.5+; UDP is not implemented.

Why

We used InfluxDB Relay before, but it does not meet some of our needs. We use Grafana to visualize time series data, so we have to add InfluxDB datasources to Grafana, and we have to change the datasource configuration whenever an InfluxDB instance goes down. We also need to transfer data across IDCs, but Relay does not support gzip. Finally, it is inconvenient to analyze data by connecting to several different InfluxDB instances. Therefore, we made InfluxDB Proxy.

Features

  • Supports gzip.
  • Supports queries.
  • Filters some dangerous InfluxQL statements.
  • Transparent to clients, behaving like a single cluster.
  • Caches data to a file when a write fails, then retries the write later.

Requirements

  • Redis-server
  • Python >= 2.7

Usage

$ # install redis-server
$ yum install redis
$ # start redis-server on 6379 port
$ redis-server --port 6379 &
$ # Install influxdb-proxy to your $GOPATH/bin
$ go get -u github.com/eleme/influx-proxy
$ # Edit config.py and execute it
$ python config.py
$ # Start influx-proxy!
$ $GOPATH/bin/influxdb-proxy -redis localhost:6379

Configuration

An example configuration file is provided as config.py. We use config.py to generate the configuration and store it in Redis.

Description

The architecture is fairly simple: one InfluxDB Proxy process and two or more InfluxDB processes. The proxy routes HTTP requests to the InfluxDB servers according to their measurements.

The setup should look like this:

        ┌─────────────────┐
        │writes & queries │
        └─────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │               │
         │InfluxDB Proxy │
         │  (only http)  │
         │               │
         └───────────────┘
                 │
                 ▼
        ┌─────────────────┐
        │   measurements  │
        └─────────────────┘
          │              │
        ┌─┼──────────────┘
        │ └──────────────┐
        ▼                ▼
  ┌──────────┐      ┌──────────┐
  │          │      │          │
  │ InfluxDB │      │ InfluxDB │
  │          │      │          │
  └──────────┘      └──────────┘

Measurement matching rules (a minimal lookup sketch follows this list):

  • Exact match first. For instance, for a measurement named cpu.load, if KEYMAPS contains both cpu and cpu.load keys, the backends mapped to cpu.load are used.

  • Prefix match second. For instance, for a measurement named cpu.load, if KEYMAPS only contains the cpu key, the backends mapped to cpu are used.
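A minimal sketch of this exact-then-prefix lookup, assuming KEYMAPS is available as a map from key to backend names; the map contents and the longest-prefix preference are illustrative assumptions, not taken from the project:

package main

import (
	"fmt"
	"strings"
)

// lookupBackends resolves a measurement name to its backends:
// exact match first, then the longest matching prefix.
func lookupBackends(keymaps map[string][]string, measurement string) ([]string, bool) {
	// Exact match first.
	if bs, ok := keymaps[measurement]; ok {
		return bs, true
	}
	// Then prefix match, preferring the longest prefix (an assumption for this sketch).
	best := ""
	for key := range keymaps {
		if strings.HasPrefix(measurement, key) && len(key) > len(best) {
			best = key
		}
	}
	if best == "" {
		return nil, false
	}
	return keymaps[best], true
}

func main() {
	keymaps := map[string][]string{
		"cpu":      {"influxdb-a"},
		"cpu.load": {"influxdb-a", "influxdb-b"},
	}
	fmt.Println(lookupBackends(keymaps, "cpu.load")) // exact match wins
	fmt.Println(lookupBackends(keymaps, "cpu.idle")) // falls back to the "cpu" prefix
}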

Query Commands

Unsupported commands

The following commands are forbidden.

  • DELETE
  • DROP
  • GRANT
  • REVOKE

Supported commands

Only queries matching one of the following patterns are supported (a sketch of how they can be checked follows the list).

  • .*where.*time
  • show.*from
  • show.*measurements
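A minimal sketch of how these pattern lists can be applied to an incoming query, combining the forbidden and supported lists above; word boundaries and case-insensitive matching are assumptions for the sketch, not taken from the project:

package main

import (
	"errors"
	"fmt"
	"regexp"
)

var errQueryForbidden = errors.New("query forbidden")

// Forbidden statements and supported query shapes, adapted from this README.
var (
	forbidden = []*regexp.Regexp{
		regexp.MustCompile(`(?i)\bdelete\b`),
		regexp.MustCompile(`(?i)\bdrop\b`),
		regexp.MustCompile(`(?i)\bgrant\b`),
		regexp.MustCompile(`(?i)\brevoke\b`),
	}
	obligated = []*regexp.Regexp{
		regexp.MustCompile(`(?i).*where.*time`),
		regexp.MustCompile(`(?i)show.*from`),
		regexp.MustCompile(`(?i)show.*measurements`),
	}
)

// checkQuery rejects forbidden statements and only lets through
// queries that match at least one supported pattern.
func checkQuery(q string) error {
	for _, re := range forbidden {
		if re.MatchString(q) {
			return errQueryForbidden
		}
	}
	for _, re := range obligated {
		if re.MatchString(q) {
			return nil
		}
	}
	return errQueryForbidden
}

func main() {
	fmt.Println(checkQuery("SELECT cpu_load FROM cpu WHERE time > now() - 1m")) // <nil>
	fmt.Println(checkQuery("DROP MEASUREMENT cpu"))                             // query forbidden
}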

License

MIT.

influx-proxy's People

Contributors

moooofly, shell909090

influx-proxy's Issues

💡 Thoughts on the implementation of the Query function

From reading the code, the function below appears to be the core entry point for queries. Could a per-goroutine / per-query-instance design be considered to improve performance? What does everyone think?

cluster.go

func (ic *InfluxCluster) Query(w http.ResponseWriter, req *http.Request) (err error) {
	atomic.AddInt64(&ic.stats.QueryRequests, 1)
	defer func(start time.Time) {
		atomic.AddInt64(&ic.stats.QueryRequestDuration, time.Since(start).Nanoseconds())
	}(time.Now())

	switch req.Method {
	case "GET", "POST":
	default:
		w.WriteHeader(400)
		w.Write([]byte("illegal method"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	// TODO: all query in q?
	q := strings.TrimSpace(req.FormValue("q"))
	if q == "" {
		w.WriteHeader(400)
		w.Write([]byte("empty query"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	err = ic.query_executor.Query(w, req)
	if err == nil {
		return
	}

	err = ic.CheckQuery(q)
	if err != nil {
		w.WriteHeader(400)
		w.Write([]byte("query forbidden"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	key, err := GetMeasurementFromInfluxQL(q)
	if err != nil {
		log.Printf("can't get measurement: %s\n", q)
		w.WriteHeader(400)
		w.Write([]byte("can't get measurement"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	apis, ok := ic.GetBackends(key)
	if !ok {
		log.Printf("unknown measurement: %s,the query is %s\n", key, q)
		w.WriteHeader(400)
		w.Write([]byte("unknown measurement"))
		atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
		return
	}

	// same zone first, other zone. pass non-active.
	// TODO: better way?

	for _, api := range apis {
		if api.GetZone() != ic.Zone {
			continue
		}
		if !api.IsActive() || api.IsWriteOnly() {
			continue
		}
		err = api.Query(w, req)
		if err == nil {
			return
		}
	}

	for _, api := range apis {
		if api.GetZone() == ic.Zone {
			continue
		}
		if !api.IsActive() {
			continue
		}
		err = api.Query(w, req)
		if err == nil {
			return
		}
	}

	w.WriteHeader(400)
	w.Write([]byte("query error"))
	atomic.AddInt64(&ic.stats.QueryRequestsFail, 1)
	return
}

Error handling in functions

As shown below, the function is declared with an err return value, yet the implementation does not seem to pass errors from lower layers up to the caller.

func (ic *InfluxCluster) LoadConfig() (err error) {
	backends, bas, err := ic.loadBackends()
	if err != nil {
		return
	}

	m2bs, err := ic.loadMeasurements(backends)
	if err != nil {
		return
	}

	ic.lock.Lock()
	orig_backends := ic.backends
	ic.backends = backends
	ic.bas = bas
	ic.m2bs = m2bs
	ic.lock.Unlock()

	for name, bs := range orig_backends {
		err = bs.Close()
		if err != nil {
			log.Printf("fail in close backend %s", name)
		}
	}
	return
}

Moreover, one call site looks like this:

func main() {
	initLog()

...
	ic.LoadConfig()
...
}

Since the caller never reads the return value, why is the function declared this way?
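A minimal sketch of what actually consuming the return value in main could look like; the fatal log on failure is illustrative, not the project's current behavior:

func main() {
	initLog()

...
	// Consume the error instead of discarding it, so a broken
	// configuration stops the proxy at startup.
	if err := ic.LoadConfig(); err != nil {
		log.Fatalf("load config failed: %s", err)
	}
...
}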

On where query validity checks should happen

Since most of our cases are unsharded, the processing path can be a simple forward. In that case, however, the query statement is no longer validated on the proxy; errors are handled only after the backend server reports them.

Pros: excellent query-dispatch performance.
Cons: query correctness must be guaranteed by the caller, and errors are only handled after they occur on the backend server (pending review).

A performance hypothesis about regex usage

A hypothesis that the regexes cause a performance loss remains to be verified by profiling.

cluster.go:

func (ic *InfluxCluster) CheckQuery(q string) (err error) {
	ic.lock.RLock()
	defer ic.lock.RUnlock()

	if len(ic.ForbiddenQuery) != 0 {
		for _, fq := range ic.ForbiddenQuery { // amplification factor
			if fq.MatchString(q) { // hot spot
				return ErrQueryForbidden
			}
		}
	}

	if len(ic.ObligatedQuery) != 0 {
		for _, pq := range ic.ObligatedQuery { // same as above
			if pq.MatchString(q) {
				return
			}
		}
		return ErrQueryForbidden
	}

	return
}

According to the business logic, each incoming SQL statement must first be checked to see whether it may be executed (the unit tests for drop and the like are commented out), and this function implements the check with regexes. According to the analysis in [1], this approach costs performance, and the cost is amplified when traffic is high. Verifying this hypothesis requires more complete benchmark tests; a sketch follows the reference below.

Reference:

[1] Go代码调优利器-火焰图 (flame graphs as a Go tuning tool)
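A minimal benchmark sketch for verifying this hypothesis with go test -bench, matching a typical query against a list of patterns the way CheckQuery does per request; the patterns and the query string are illustrative, and case-insensitive matching is an assumption:

package backend

import (
	"regexp"
	"testing"
)

// Illustrative patterns, modeled on the "Supported commands" list.
var obligatedQuery = []*regexp.Regexp{
	regexp.MustCompile(`(?i).*where.*time`),
	regexp.MustCompile(`(?i)show.*from`),
	regexp.MustCompile(`(?i)show.*measurements`),
}

// BenchmarkCheckQuery measures the per-request cost of running every
// obligated pattern against one query, the worst case for CheckQuery.
func BenchmarkCheckQuery(b *testing.B) {
	q := "SELECT cpu_load FROM cpu WHERE time > now() - 1m"
	for i := 0; i < b.N; i++ {
		for _, re := range obligatedQuery {
			re.MatchString(q)
		}
	}
}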

On function naming

$ grep -rin LoadJson .
Binary file ./bin/influx-proxy matches
./service/main.go:45:func LoadJson(configfile string, cfg interface{}) (err error) {
./service/main.go:81:   err = LoadJson(ConfigFile, &cfg)

As shown above, the LoadJson function is only used during initialization, so it is better not to use Go's exported naming style for it.
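A minimal sketch of the suggested rename, keeping the same signature; the body is a plausible JSON loader, not the project's actual implementation:

package service

import (
	"encoding/json"
	"io/ioutil"
)

// loadJson is only used during initialization inside this package,
// so an unexported name is sufficient. The body is illustrative.
func loadJson(configfile string, cfg interface{}) (err error) {
	data, err := ioutil.ReadFile(configfile)
	if err != nil {
		return
	}
	return json.Unmarshal(data, cfg)
}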

Unit test issues

  • Why do the unit tests still pass when the TCP connection fails?
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 read meta error: EOF
2017/08/15 14:05:18 new measurement: load.cpu
2017/08/15 14:05:18 new measurement: test
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /ping
2017/08/15 14:05:18 handler any get url: /write?db=test1
2017/08/15 14:05:18 handler any get url: /write?db=write_only
2017/08/15 14:05:18 handler any get url: /write?db=test2
2017/08/15 14:05:19 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 read meta error: EOF
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:19 handler any get url: /ping
2017/08/15 14:05:20 http error: Get http://127.0.0.1:53643/ping: dial tcp 127.0.0.1:53643: getsockopt: connection refused
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 read meta error: EOF
2017/08/15 14:05:20 unknown measurement: test,the query is SELECT cpu_load from test WHERE time > now() - 1m
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=+select+cpu_load+from+cpu+WHERE+time+%3E+now%28%29+-+1m
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=+select+cpu_load+from+%22cpu.load%22+WHERE+time+%3E+now%28%29+-+1m
2017/08/15 14:05:20 unknown measurement: load.cpu,the query is select cpu_load from "load.cpu" WHERE time > now() - 1m
2017/08/15 14:05:20 handler any get url: /query?db=test1&q=SHOW+tag+keys+from+%22cpu%22+
2017/08/15 14:05:20 write meta: 8
2017/08/15 14:05:20 write meta: 0
2017/08/15 14:05:20 http backend write test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /write?db=test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /write?db=test
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /ping
2017/08/15 14:05:20 handler any get url: /query?db=test
2017/08/15 14:05:20 handler any get url: /ping
PASS
ok      github.com/eleme/influx-proxy/backend   5.030s
  • Should the unit tests be simplified by removing the networked parts, so that they only describe interfaces and measure function performance? (A sketch using net/http/httptest follows the next item.)

  • Do we need end-to-end tests to cover the interaction between the proxy and InfluxDB that the unit tests cannot describe?
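A minimal sketch of how a test could stand up an in-process fake InfluxDB with net/http/httptest instead of dialing a port that may not be listening; the handler and assertions are illustrative, not the project's tests, and a real test would point its backend at ts.URL:

package backend

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestPingAgainstFakeInfluxDB shows the isolation pattern: no real
// network dependency, so a refused connection cannot hide failures.
func TestPingAgainstFakeInfluxDB(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/ping" {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer ts.Close()

	resp, err := http.Get(ts.URL + "/ping")
	if err != nil {
		t.Fatalf("ping failed: %s", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}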

RFC: requirements for the rc1 release

Write logic (a hash-ring lookup sketch follows this list):

Use the measurement to get the corresponding hash ring.
Use the measurement to get the corresponding backend groups.
Use measurement + sorted key to find the backend-group index on the hash ring.
Use that index to find the corresponding backends, then write the data to each of them in turn.
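A minimal sketch of the hash-ring lookup described above, assuming FNV hashing and a fixed number of virtual nodes per backend group; the function name, shardCount parameter, and ring layout are illustrative assumptions, not the project's implementation:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringLookup maps measurement+sortedKey to a backend-group index on a
// hash ring with virtual nodes. A real implementation would build the
// ring once per measurement instead of on every call.
func ringLookup(measurement, sortedKey string, shardCount int) int {
	const virtualNodes = 128

	type vnode struct {
		hash  uint32
		group int
	}
	ring := make([]vnode, 0, shardCount*virtualNodes)
	for g := 0; g < shardCount; g++ {
		for v := 0; v < virtualNodes; v++ {
			h := fnv.New32a()
			fmt.Fprintf(h, "%d-%d", g, v)
			ring = append(ring, vnode{h.Sum32(), g})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })

	h := fnv.New32a()
	h.Write([]byte(measurement + "," + sortedKey))
	key := h.Sum32()

	// Walk clockwise to the first virtual node at or after the key.
	i := sort.Search(len(ring), func(i int) bool { return ring[i].hash >= key })
	if i == len(ring) {
		i = 0
	}
	return ring[i].group
}

func main() {
	// Choose which of 3 backend groups receives a point for cpu.load,host=a.
	fmt.Println(ringLookup("cpu.load", "host=a", 3))
}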

Query logic:

Find the backend groups for the measurement.
If there is only one backend group, there is only one shard: pick the first available backend, send the request, return the data, and the request is done. If the request fails, use the next available backend. If every backend fails, return a failed request.
If there are multiple backend groups, there are multiple shards: replicate the request to every shard and handle each one as above. If any shard request fails, the whole request is considered failed. After fetching data from the shards, the proxy performs a second-stage computation on the results.

The following functions should be implemented first (a merge sketch follows this list):

count, mean, sum, min, max
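A minimal sketch of the second-stage computation on the proxy, assuming each shard can return partial count/sum/min/max values; the shardResult type is made up for illustration, and mean is derived from the merged sum and count rather than averaged directly:

package main

import "fmt"

// shardResult holds the partial aggregates returned by one shard.
// The type is illustrative, not part of the project.
type shardResult struct {
	Count int64
	Sum   float64
	Min   float64
	Max   float64
}

// merge combines per-shard partials into the final count, sum, min,
// max and mean that the proxy would return to the client.
func merge(parts []shardResult) (count int64, sum, min, max, mean float64) {
	for i, p := range parts {
		count += p.Count
		sum += p.Sum
		if i == 0 || p.Min < min {
			min = p.Min
		}
		if i == 0 || p.Max > max {
			max = p.Max
		}
	}
	if count > 0 {
		mean = sum / float64(count)
	}
	return
}

func main() {
	parts := []shardResult{
		{Count: 10, Sum: 25, Min: 1, Max: 9},
		{Count: 5, Sum: 20, Min: 2, Max: 12},
	}
	fmt.Println(merge(parts))
}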

Project goal:

The goal of the project is to be high-performance and reliable.

Dead code summary

func (hs *HttpService) HandlerPing(w http.ResponseWriter, req *http.Request) {
	defer req.Body.Close()
	version, err := hs.ic.Ping()
	if err != nil {
		panic("WTF")
		return // this line is unreachable code !
	}
...
}

On type usage

While reading cluster_test.go, I found the following code snippet:

func CreateTestInfluxCluster() (ic *InfluxCluster, err error) {
	redisConfig := &RedisConfigSource{}
...
	cfg.WriteOnly = 1
...
	return
}

For cfg.WriteOnly = 1 here, wouldn't a bool type be more reasonable?
