doc's People

Contributors

laiwei

doc's Issues

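Duplicate monitoring items in templates

Template A is applied to machines m1, m2 and m3.

Template B inherits from template A and is applied to machine m3.

If templates A and B contain the same monitoring item, what is the effect on m3? My understanding is that, for identical monitoring items, template B overrides template A.

Is that correct?

And if template B does not inherit from template A, what effect do the duplicate monitoring items have on m3?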

ops-updater management issue

2015/05/28 23:05:26 heartbeat.go:64: { [<Name:falcon-agent, Version:3.1.4, Tarball:http://127.0.0.1:2000/falcon, Md5:8c2662002401eb724e6aa177b31bb2d6, Cmd:start>]}
2015/05/28 23:05:26 start.go:89: wget -q 8c2662002401eb724e6aa177b31bb2d6/falcon-agent-3.1.4.tar.gz.md5 -O falcon-agent-3.1.4.tar.gz.md5 fail exit status 4

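If an md5 is configured on the meta side, the updater log contains output like the above (see the wget line),
and the updater cannot update the agent properly.

Also, if the agent package name is not changed, is it impossible to update the agent properly?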

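Agent ignore configuration has no effect

Entries in the configuration can be set to true or false, but when an entry is set to false the data is still not collected; data is collected only after the entry is deleted.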

Error when starting the dashboard

Arbiter(self).run()

File "/home/work/open-falcon/dashboard/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 203, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/home/work/open-falcon/dashboard/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 298, in halt
self.stop()
File "/home/work/open-falcon/dashboard/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 341, in stop
self.reap_workers()
File "/home/work/open-falcon/dashboard/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 452, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>

Format of a plugin's standard output

[root@open-falcon-hbs01:/path/to/plugins/plugin/sys/ntp]#./600_ntp.py
[{"endpoint": "open-falcon-hbs01.bj", "tags": "", "timestamp": 1431349763, "metric": "sys.ntp.offset", "value": 0.73699999999999999, "counterType": "GAUGE", "step": 600}]

Question 1: the output above is wrapped in [...]. Are these square brackets required?
Question 2: how should the value for the tags key be filled in?
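
For reference, the output shown above is a JSON array holding one object per metric. The following is a minimal sketch of a plugin that prints the same shape; it does not settle either question. The helper name build_metric is hypothetical, and tags is left as an empty string exactly as in the original output, because its expected format is the open question here.

```python
import json
import socket
import time


def build_metric(metric, value, step, endpoint=None, tags="",
                 counter_type="GAUGE", timestamp=None):
    """Build one metric object with the same keys as the output above.

    Hypothetical helper: the key names are copied from the sample output,
    everything else (defaults, argument names) is an assumption.
    """
    return {
        "endpoint": endpoint or socket.gethostname(),
        "tags": tags,
        "timestamp": int(time.time() if timestamp is None else timestamp),
        "metric": metric,
        "value": value,
        "counterType": counter_type,
        "step": step,
    }


if __name__ == "__main__":
    # A single metric is still wrapped in a list, mirroring the sample output.
    items = [build_metric("sys.ntp.offset", 0.737, step=600)]
    print(json.dumps(items))
```

Building a list first and serializing it once keeps the output valid JSON no matter how many metrics the plugin reports.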

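Monitoring workflow

For example, if I want to monitor nginx, what should my workflow be, how should the expressions be written, and how should I describe my monitoring items?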

Several questions about sending alarms

Point 1:
1432691504.713948 [0 10.3.5.26:53515] "RPOP" "/queue/user/sms"
1432691504.714129 [0 10.3.5.26:53520] "RPOP" "/queue/user/mail"
Why do I only see alarm issuing RPOP against redis? According to the wiki, shouldn't alarm push to redis and the sender pop from it?

Point 2:
In the alarm configuration:
"queue": {
    "sms": "/sms",
    "mail": "/mail"
},
"userSmsQueue": "/queue/user/sms",
"userMailQueue": "/queue/user/mail"
What is each of these four queues used for?
Also:
"api": {
    "portal": "http://falcon.example.com",
    "uic": "http://uic.example.com",
    "links": "http://link.example.com"
}
When are these API addresses called, and what are they used for?

Help: portal port 5050 cannot be opened, OS: ubuntu1410

Portal is where alarm strategies are configured.
/open-falcon/portal# ./control tail
==> var/app.log <==
cursor = self.execute(*a, **kw)
File "/home/work/open-falcon/portal/frame/store.py", line 43, in execute
curso
Traceback (most recent call last):
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
self.handle_request(listener, req, client, addr)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/home/work/open-falcon/portal/env/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/work/open-falcon/portal/web/controller/home.py", line 17, in home_get
vs, total = HostGroup.query(page, limit, query, me)
File "/home/work/open-falcon/portal/web/model/host_group.py", line 39, in query
vs = cls.select_vs(where=where, params=params, page=page, limit=limit, order='grp_name')
File "/home/work/open-falcon/portal/web/model/bean.py", line 90, in select_vs
rows = cls.select(where=where, params=params, order=order, limit=limit, page=page, offset=offset)
File "/home/work/open-falcon/portal/web/model/bean.py", line 86, in select
return db.query_all(sql, params)
File "/home/work/open-falcon/portal/frame/store.py", line 78, in query_all
cursor = self.execute(*a, **kw)
File "/home/work/open-falcon/portal/frame/store.py", line 43, in execute
cursor = self.get_conn().cursor()
AttributeError: 'NoneType' object has no attribute 'cursor'

you must use ./env/bin/python2

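A minimal entry set?!

Observations

  • Wow, yet another team has built its own cloud operations platform in-house
  • But so far the legendary platforms of BAT and Yahoo! have amounted to advertising and impressive screenshots
  • and I have never heard of anyone outside those companies actually getting them to run

Analysis

  • DevOps platforms contain too much hard-wired, company-specific business logic
  • so much that it cannot easily be separated from the platform
  • which in turn means the platform cannot be installed and configured to work in other business scenarios

Suggestions

  • Provide detailed instructions for installing, debugging and using it on Mac/Windows/Linux
  • Provide a minimal usable set for small teams with fewer than 10 hosts
  • Remember, Google also grew from a two-person team
  • The more small startup teams o-falcon gets into, the more likely it is to become the core operations platform of a Google-scale company ;-)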

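judge makes alarm decisions when it receives data from transfer, so how do I configure an alarm for agent.alive?

judge first fetches the full list of strategies from hbs and then waits for transfer to forward data. Each time it receives a data point forwarded by transfer, it immediately finds the Strategy and Expression associated with that data point and evaluates the threshold.

For the alive type, the agent sends alive while it is running. If the agent is gone, judge presumably never receives any alive data. How can judge raise an alarm for this type in that case?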
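
The difficulty described above is that a check driven purely by incoming data never runs once the data stops. A generic way around this is to record when each endpoint last reported and to scan that table on a timer. The sketch below illustrates only that idea; it is not a description of how judge or any other Open-Falcon component behaves, and all names in it are hypothetical.

```python
def find_silent_endpoints(last_seen, now, timeout):
    """Return endpoints whose last report is older than `timeout` seconds.

    last_seen: dict mapping endpoint name -> Unix timestamp of its last report
    now:       current Unix timestamp
    timeout:   allowed silence in seconds before an endpoint counts as down
    """
    return sorted(ep for ep, ts in last_seen.items() if now - ts > timeout)


if __name__ == "__main__":
    # m1 reported 60 s ago, m2 reported 120 s ago; the timeout is 90 s.
    last_seen = {"m1": 100, "m2": 40}
    print(find_silent_endpoints(last_seen, now=160, timeout=90))
```

The scan has to be triggered by a clock rather than by arriving data, which is the piece a purely event-driven evaluator lacks.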

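Stuck installing the Dashboard

After installing the Dashboard according to the documentation, it fails on startup. Any guidance would be appreciated, thanks.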

 ./control start
I require gunicorn but it's not installed.  Aborting.

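How often should business-collected data be posted to the agent?

For data that the business collects itself, should each data point be posted to the local agent as soon as it is collected, or should a certain number be accumulated before posting?
The documentation recommends that plugins accumulate 500 items before sending. Is accumulating a batch before posting also recommended for business-collected data?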
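
To make the second option concrete, here is a minimal sketch of accumulating items and posting them in batches. The batch size of 500 is the figure quoted above for plugins; whether it suits business data is the open question. The push URL is an assumption (port 1988 matches the agent's HTTP listen address seen elsewhere on this page, and the /v1/push path is assumed), and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed address of the local agent's push endpoint.
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def _http_post(url, body):
    """POST a JSON body and discard the response."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


def push_in_batches(items, batch_size=500, url=AGENT_PUSH_URL, send=None):
    """Send `items` in batches; return the number of requests made.

    `send` can be replaced with a stub for testing without a running agent.
    """
    send = send or _http_post
    requests_made = 0
    for batch in chunked(items, batch_size):
        send(url, json.dumps(batch).encode("utf-8"))
        requests_made += 1
    return requests_made
```

With 1200 accumulated items and the default batch size, this makes three requests carrying 500, 500 and 200 items.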

Help wanted: add host cannot find the machine. OS: ubuntu1410

1-------------
/home/work/open-falcon/agent# vim cfg.json

{
"debug": true,
"hostname": "",
"ip": "",
"plugin": {
"enabled": false,
"dir": "./plugin",
"git": "https://coding.net/ulricqin/plugin.git",
"logs": "./logs"
},
"heartbeat": {
"enabled": true,
"addr": "",
"interval": 60,
"timeout": 1000
},
"transfer": {
"enabled": true,
"addr": "127.0.0.1:8433",
"interval": 60,
"timeout": 1000
},
"http": {
"enabled": true,
"listen": ":1988"
},
"collector": {
"ifacePrefix": ["eth", "em"]

Set enabled to true under heartbeat --> done, and restarted with ./control restart
and configure the rpc address of hbs <-- could you tell me where this is configured?
2-------------
drwxr-xr-x 2 root root 4096 5月 22 14:05 var/
root@ub14IP239monitorMI:~/open-falcon/hbs# vim cfg.json

{
"debug": true,
"database": "falcon_portal:abc1234@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
"hosts": "",
"maxIdle": 100,
"listen": ":6030",
"trustable": [""],
"http": {
"enabled": true,
"listen": "0.0.0.0:6031"
}
}


Should I use this one:
"database": "falcon_portal:abc1234@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
or this one:
"database": "root:abc1234@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",

mysql -uroot -pabc1234 -h127.0.0.1 connects to mysql normally.

mysql -ufalcon_portal -pabc1234 -h127.0.0.1 also connects to mysql normally.

"~ the agent can then heartbeat with hbs": is there a command to check this? I don't know where my configuration is wrong. Any pointers would be appreciated, thanks.
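
In the agent cfg.json above, heartbeat.enabled is true but heartbeat.addr is an empty string, while the hbs cfg.json listens on :6030. The following is a minimal sketch of a sanity check, assuming that heartbeat.addr is the field meant to hold the hbs RPC address (an assumption based on the field name). The function names are hypothetical and are not part of any Open-Falcon tool.

```python
import json
import socket


def heartbeat_problems(cfg):
    """Return a list of human-readable problems in the heartbeat section."""
    hb = cfg.get("heartbeat", {})
    problems = []
    if not hb.get("enabled"):
        problems.append("heartbeat.enabled is not true")
    addr = hb.get("addr", "")
    if not addr:
        problems.append("heartbeat.addr is empty")
    elif ":" not in addr:
        problems.append("heartbeat.addr has no port")
    return problems


def can_connect(addr, timeout=1.0):
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, _, port = addr.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False


if __name__ == "__main__":
    with open("cfg.json") as fh:
        cfg = json.load(fh)
    for problem in heartbeat_problems(cfg):
        print(problem)
```

A successful can_connect call only shows that something accepts TCP connections at that address; it does not prove that heartbeats are being processed.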

No data can be queried in the dashboard

Check the graph configuration:
cat graph_backends.txt
"graph-00 127.0.0.1:6070"

query log:
root@zabbix-master:/home/work/open-falcon/query# ./control tail
2015/05/22 14:42:31 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:45 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:45 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:48 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:48 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:51 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:43:51 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:44:02 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:44:02 [E] query one fail: unknown port tcp/6070" [graph.go:42]
2015/05/22 14:44:02 [E] query one fail: unknown port tcp/6070" [graph.go:42]

Check the graph listening port:
netstat -ntlp | grep 6070
tcp6 0 0 :::6070 :::* LISTEN 30171/falcon-graph

Check the graph log:
../graph/control tail
2015/05/22 14:03:46 cfg.go:80: read config file: cfg.json successfully
2015/05/22 14:03:46 db.go:27: g.InitDB, ok
2015/05/22 14:03:46 index.go:14: index:Start, ok
2015/05/22 14:03:46 main.go:23: 30171 register signal notify
2015/05/22 14:03:46 http.go:87: http listening 0.0.0.0:6071
2015/05/22 14:03:46 rpc.go:53: rpc listening 0.0.0.0:6070
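
The trailing `"` in `unknown port tcp/6070"` hints that the quotation marks in graph_backends.txt may be read as part of the address. The following is a minimal sketch of that effect, assuming the file holds one `name host:port` pair per line. The parser is hypothetical and is not the query component's code.

```python
def parse_backend(line):
    """Split a 'name host:port' line into (name, host, port).

    Raises ValueError when the port is not purely numeric, which is what
    happens if surrounding quotes are left in the line.
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected 'name host:port', got %r" % line)
    name, addr = fields
    host, _, port = addr.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("bad address %r" % addr)
    return name, host, int(port)


if __name__ == "__main__":
    for raw in ['graph-00 127.0.0.1:6070', '"graph-00 127.0.0.1:6070"']:
        try:
            print(parse_backend(raw))
        except ValueError as exc:
            print("rejected:", exc)
```

With the quoted line, the last field becomes `127.0.0.1:6070"`, so the port is no longer a number.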

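Storage backend

Hi, in your article I read: "transfer currently supports three business backends: judge, graph and opentsdb. judge is a high-performance alarm evaluation component that we developed, graph is a high-performance data storage, archiving and query component that we developed, and opentsdb is an open-source time series data storage service."

Where can I find more information about your graph storage engine? Also, is your time granularity fixed (whisper, for example, allows custom retentions = 10s:6h,1min:6d,10min:1800d)?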
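
As a worked example of what the whisper-style retention string quoted above means, the sketch below expands each `precision:duration` pair into seconds per point and number of points kept. The parser is a hypothetical illustration written for this note and says nothing about how graph stores data. It identifies units by their first letter only, which is an assumption that covers s, m/min, h, d, w and y.

```python
import re

# Seconds per unit, keyed by the first letter of the unit name (assumption).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400,
                "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert strings such as '10s', '1min' or '1800d' to seconds."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text.strip().lower())
    if not match or match.group(2)[0] not in UNIT_SECONDS:
        raise ValueError("cannot parse %r" % text)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)[0]]


def parse_retentions(spec):
    """Return a list of (seconds_per_point, number_of_points) tuples."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


if __name__ == "__main__":
    print(parse_retentions("10s:6h,1min:6d,10min:1800d"))
    # → [(10, 2160), (60, 8640), (600, 259200)]
```

So the quoted policy keeps 2160 points at 10-second resolution, 8640 points at 1-minute resolution and 259200 points at 10-minute resolution.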

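A question about graph data sharding

Is the data sharded by host or by monitoring item? Is all the data for one endpoint stored on the same graph instance? When graph and transfer instances are added or removed dynamically, how is the sharded data handled? Is it load-balanced automatically?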
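
To make the trade-off behind these questions concrete, here is a generic consistent-hashing sketch. Whether Open-Falcon shards this way, and what it uses as the shard key, is exactly what is being asked; nothing below should be read as its actual behavior. The class, the node names and the `endpoint/metric` key format are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """A small consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        """Place `replicas` virtual points for `node` on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._points, (self._hash("%s#%d" % (node, i)), node))

    def lookup(self, key):
        """Return the node owning `key`: the first point at or after its hash."""
        idx = bisect.bisect(self._points, (self._hash(key), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]


if __name__ == "__main__":
    keys = ["host%d/cpu.idle" % i for i in range(1000)]
    ring = HashRing(["graph-00", "graph-01"])
    before = {k: ring.lookup(k) for k in keys}
    ring.add("graph-02")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print("%d of %d keys moved to the new node" % (len(moved), len(keys)))
```

In this scheme, adding a node moves only the keys that the new node takes over, but the history already written for those keys stays on the old node unless it is migrated separately.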

Error starting alarm

2015/05/22 16:06:25 cfg.go:81: read config file: cfg.json successfully
2015/05/22 16:06:25 [config.go:285] [W] open /home/work/open-falcon/alarm/conf/app.conf: no such file or directory
2015/05/22 16:06:25 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command
2015/05/22 16:06:25 [app.go:103] [I] http server Running on 0.0.0.0:7070
2015/05/22 16:06:26 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command
2015/05/22 16:06:27 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command
2015/05/22 16:06:28 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command
2015/05/22 16:06:29 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command
2015/05/22 16:06:30 reader.go:52: get alarm event from redis fail: ERR wrong number of arguments for 'brpop' command

redis is running and can be reached normally via telnet:
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 11312/redis-server

Please help.
