yunlzheng / prometheus-book

Prometheus Operations Guide

Home Page: https://yunlzheng.gitbook.io/prometheus-book/

prometheus book gitbook prometheus2 devops kubernetes alertmanager promql metrics grafana

prometheus-book's Issues

How do I match variables when adding a Grafana panel?

up{instance="172.29.50.175:9256",region="ap-southeast-1"}
namedprocess_namegroup_num_procs{instance="172.29.50.175:9256",region="ap-southeast-1"}

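I have several exporters installed on the same server, each listening on a different port. In Grafana I defined a $HOST variable that extracts the IP from the instance label. When a panel only needs port 9100, for example node_load1{instance="$HOST:9100"}, I decoded the generated query URL and found it has the form query=node_load1{instance="1.1.1.1|2.2.2.2|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15, whereas what I actually want to match is query=node_load1{instance=~"1.1.1.1:9100|2.2.2.2:9100|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15. How should the expression be written in this case?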
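
One way to reason about the question above: Prometheus anchors label regexes at both ends, so the alternation has to be grouped before the port is appended. The Python sketch below only mimics that anchoring with re.fullmatch; the helper names are hypothetical, and the corresponding PromQL matcher would presumably be instance=~"($HOST):9100", assuming $HOST is a multi-value variable that expands to a |-separated list as in the decoded URL.

```python
import re


def grouped_instance_pattern(hosts, port):
    """Build '(host1|host2|...):port' so the port applies to every host."""
    alternatives = "|".join(re.escape(host) for host in hosts)
    return f"({alternatives}):{port}"


def anchored_match(pattern, value):
    """Mimic Prometheus label matching, which anchors the regex at both ends."""
    return re.fullmatch(pattern, value) is not None


hosts = ["1.1.1.1", "2.2.2.2", "3.3.3.3"]
grouped = grouped_instance_pattern(hosts, 9100)

# Ungrouped form seen in the decoded URL: the port binds to the last host only.
ungrouped = "1.1.1.1|2.2.2.2|3.3.3.3:9100"

print(anchored_match(grouped, "1.1.1.1:9100"))
print(anchored_match(ungrouped, "1.1.1.1:9100"))
```

Without the group, only the last alternative carries the :9100 suffix, which is why the first two hosts stop matching.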

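The Alertmanager high-availability chapter gives an incorrect configuration example

Suppose we have three AM instances, A, B, and C, running on ports 8081, 8082, and 8083 respectively. Instance A should then be configured with peers 8082 and 8083, instance B with 8081 and 8083, and instance C with 8081 and 8082.

With the configuration shown in the chapter, the cluster can easily fail to bootstrap when instance A has a problem.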
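
The peer layout described in this issue can be sketched as follows. This is a minimal illustration, assuming the three ports are the cluster listen addresses on 127.0.0.1 (the host is an assumption, not something stated in the issue); the helper names are hypothetical, while --cluster.listen-address and --cluster.peer are the Alertmanager clustering flags.

```python
def peers_for_each(instances):
    """Map every instance address to the list of all *other* addresses."""
    return {me: [other for other in instances if other != me] for me in instances}


def cluster_flags(listen, peers):
    """Render the clustering flags for one Alertmanager instance."""
    flags = [f"--cluster.listen-address={listen}"]
    flags += [f"--cluster.peer={peer}" for peer in peers]
    return flags


# Assumed addresses for instances A, B, and C.
instances = ["127.0.0.1:8081", "127.0.0.1:8082", "127.0.0.1:8083"]

for listen, peers in peers_for_each(instances).items():
    print("alertmanager " + " ".join(cluster_flags(listen, peers)))
```

Because every instance lists both of the others, any single instance can be down while the remaining two still find each other.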

A question about Alertmanager with WeChat Work

I am using the following versions:
alertmanager-0.18.0.linux-amd64
prometheus-2.11.1.linux-amd64
My alert template, wechat.tmpl, is as follows:

{{ define "wechat.tmpl" }}
{{ range .Alerts }}
========start==========
Alerting program: prometheus_alert
Severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}

The logs show no errors:
level=debug ts=2019-07-17T12:53:33.495Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]
level=debug ts=2019-07-17T12:53:38.496Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]

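WeChat Work alerts are delivered, but the message body is empty.

I tried other templates and also downgraded Alertmanager by one version, with the same result. What could be causing this?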

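A question about saturation

Hello,

What is the difference between saturation in Google's four golden signals and saturation in the USE method?

I still don't quite understand it from the article. Thanks.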

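Chapter 5: Data and Visualization

Section 1: Using Console Templates
In "读者已经对Prometheus已经有了一个相对完成的认识", "相对完成" (relatively finished) should be "相对完整" (relatively complete).
In "但是确定也很明显", "确定" (certain) should be "缺点" (drawback).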

typo issue in one md file.

In prometheus-book/exporter/custom_app_support_prometheus.md

Please change "Sring" to "Spring" in the title.

Hobby

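Can the Tomcat service on Windows be monitored?

I could only find the Linux version of jmx-exporter, and using it on Windows produced some errors.

Also, if I only need to check whether the service is running and whether the port is listening, blackbox_exporter (Windows) could be used. Is there any installation documentation for it?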

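How do I compute the difference between a metric's current value and its value 5 minutes ago?

I want to monitor how much a metric fluctuates, for example by comparing the current CPU usage with the CPU usage 5 minutes ago and alerting when the difference exceeds some absolute amount or some percentage. Is there a built-in function for this, or do I have to compute the difference between the two values myself with a Prometheus query (PromQL)?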
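
As a hedged pointer rather than a definitive answer: PromQL has an offset modifier, so an expression of the form metric - metric offset 5m yields the absolute change. The Python sketch below only illustrates the arithmetic behind the absolute and percentage difference; the function name, the sample data, and the 300-second lookback are all hypothetical.

```python
def change_vs_past(samples, now, lookback=300):
    """Compare the value at `now` with the value `lookback` seconds earlier.

    `samples` is a list of (unix_timestamp, value) pairs sorted by time.
    Returns (absolute_difference, percentage_difference_or_None).
    """
    def value_at(t):
        # Use the most recent sample at or before time t.
        candidates = [value for ts, value in samples if ts <= t]
        if not candidates:
            raise ValueError(f"no sample at or before t={t}")
        return candidates[-1]

    current = value_at(now)
    past = value_at(now - lookback)
    diff = current - past
    # The percentage change is undefined when the past value is zero.
    pct = diff / past * 100 if past != 0 else None
    return diff, pct


# Hypothetical CPU usage samples, in percent.
cpu_usage = [(0, 40.0), (150, 55.0), (300, 50.0)]
print(change_vs_past(cpu_usage, now=300))
```

An alert condition would then compare either returned number against a threshold.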

Error when installing Prometheus with Docker

Hello!
Installing with your command keeps failing with the following error:
[root@vm-ecs-104 ~]# docker run -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/prometheus/prometheus.yml\\\" to rootfs \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged\\\" at \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged/etc/prometheus/prometheus.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

I checked the /etc/prometheus directory on the host and found that prometheus.yml had been created as a directory rather than a file. I suggest updating the instructions.

Also, I recommend using Linux for the demonstrations; many of the installation packages shown are for macOS.
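
The symptom reported here is consistent with Docker's -v behavior of creating a missing host path as a directory. A minimal pre-flight check, assuming you want to verify the host path before running the container, could look like this; the function name is hypothetical.

```python
import os


def bind_mount_source_kind(path):
    """Classify a host path before using it as a bind-mount source."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "missing"


kind = bind_mount_source_kind("/etc/prometheus/prometheus.yml")
if kind != "file":
    # A missing path would be created as a directory by `docker run -v`,
    # and a directory cannot be mounted onto the config file in the image.
    print(f"prometheus.yml is {kind}; create it as a regular file first")
```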

Use 'apps/v1' to replace 'extensions/v1beta1' for example-app deployment

The latest Kubernetes release is now 1.18, and newer versions require a new apiVersion value for the example-app Deployment: use 'apps/v1' instead of 'extensions/v1beta1'.

Path: use-operator-manage-prometheus.md

current:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-app

new:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app

In resolved email alerts, the value of {{ $labels.value }} is not the post-recovery value. How can this be fixed?

Alert email

[1] Firing
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU failure Source

Recovery email

[1] Resolved
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU failure

Attached configuration file
rules:
groups:
- name: os-cpu
  rules:
  - alert: NodeCpuUsage
    expr: ceil (100 - (avg(irate(node_cpu{mode='idle'}[5m])) by (instance) * 100)) > 90
    for: 1m
    labels:
      type: "cpu"
    annotations:
      summary: "{{ $labels.instance }} CPU failure"
      description: "CPU usage above 90%, current value: {{ $value}}%"
This problem has been bothering me for a long time...

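Issue with the setup example in the "Alertmanager High Availability" chapter

Thanks to the author for the clear and concise explanation of how the gossip protocol lets multiple Alertmanager instances deduplicate alerts coming from the same Prometheus instance.
I suggest also mentioning how Alertmanager, in a full-mesh setup, deduplicates identical alerts coming from different Prometheus instances, namely: "Alertmanager treats two alerts as identical only if all of their labels are exactly the same". The reason for adding this is that two Prometheus servers backing each other up usually set external_labels to mark where metrics come from (especially to avoid conflicts when using remote write). In that scenario, alert_relabel_configs must be configured to rewrite the differing labels to the same value before the alerts are sent to Alertmanager.

For related discussion and explanation, see: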
prometheus/alertmanager#1448
https://www.robustperception.io/high-availability-prometheus-alerting-and-notification

Adding this would make the explanation of the mechanism more complete. Just a suggestion :D
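
The deduplication point in this issue can be illustrated with a small sketch. It assumes an alert's identity is its full label set, as the issue states; the "replica" label name and both helper functions are hypothetical, and in a real deployment the differing label would be rewritten or dropped on the Prometheus side via alert_relabel_configs rather than in code like this.

```python
def alert_identity(labels, drop=()):
    """Identity of an alert: its complete label set, minus any dropped labels."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in drop))


def deduplicate(alerts, drop=()):
    """Keep one alert per identity, preserving first-seen order."""
    seen = {}
    for labels in alerts:
        seen.setdefault(alert_identity(labels, drop), labels)
    return list(seen.values())


# The same alert sent by two Prometheus replicas with different external_labels.
alerts = [
    {"alertname": "NodeDown", "instance": "10.0.0.1:9100", "replica": "prom-a"},
    {"alertname": "NodeDown", "instance": "10.0.0.1:9100", "replica": "prom-b"},
]

print(len(deduplicate(alerts)))                     # label sets differ
print(len(deduplicate(alerts, drop=("replica",))))  # label sets agree
```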

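The "Suppressing alert notifications" description in Chapter 3 (alert handling) is seriously wrong

The description of inhibition reads: "When an already-sent alert notification matches the target_match and target_match_re rules, and a new alert satisfies source_match or the defined matching rules, and the labels listed in equal are identical between the already-sent alert and the new alert, inhibition is triggered and the new alert is not sent."
It should be exactly the opposite: source_match matches alerts that already exist, while target_match matches the newly fired alerts that are to be inhibited.
The correct statement is: once the rule takes effect, when an existing (already-sent) alert matches the source_match and source_match_re rules, and a new alert satisfies target_match or target_match_re, and the labels listed in equal are identical between the existing alert and the new alert, inhibition is triggered and the new alert is not sent.

See the comments in the official Prometheus documentation: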

# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]

# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]

# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: [<labelname>, ... ] ]
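
The semantics in the quoted comments can be sketched roughly as below. This is a simplified illustration, not Alertmanager's implementation: it handles exact-match matchers only (no *_re variants), and the function names and sample alerts are hypothetical.

```python
def labels_match(labels, matchers):
    """True if every matcher name/value pair is present in the labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, active_alerts, rule):
    """Decide whether `target` is muted by some other active source alert."""
    # The alert to be muted must satisfy target_match.
    if not labels_match(target, rule["target_match"]):
        return False
    for source in active_alerts:
        if source is target:
            continue
        # Some existing alert must satisfy source_match ...
        if not labels_match(source, rule["source_match"]):
            continue
        # ... and agree with the target on every label listed in `equal`.
        if all(source.get(n) == target.get(n) for n in rule.get("equal", [])):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["instance"],
}
critical = {"alertname": "NodeDown", "severity": "critical", "instance": "a"}
warning = {"alertname": "HighLoad", "severity": "warning", "instance": "a"}

print(is_inhibited(warning, [critical, warning], rule))
```

Here the existing critical alert is the source and the warning on the same instance is the muted target, which is the direction the issue argues for.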
