rfmoz / grafana-dashboards
Grafana dashboards
License: Apache License 2.0
The "Disk Detail" panel has a graph called "Disk IOs Current In Progress". The query it uses is:
irate(node_disk_io_now{instance=~"$node:$port",job=~"$job"}[5m])
However, node_disk_io_now is not a counter, it's a gauge:
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
The io_now value is field 9, documented here:
Field 9 -- # of I/Os currently in progress
The only field that should go to zero. Incremented as requests are
given to appropriate struct request_queue and decremented as they finish.
Therefore I believe the irate(...) wrapper needs to be removed. This is already an instantaneous snapshot of the outstanding I/O requests.
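For reference, the corrected panel query would simply drop the irate() wrapper and graph the gauge directly (keeping the dashboard's existing $node/$port/$job variables):
node_disk_io_now{instance=~"$node:$port",job=~"$job"}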
Hi,
I am using the query below for my new alerts to monitor FS usage. However, this query shows all the nodes that have the client running on them. I am trying to filter only the postgres nodes. Can anyone help tweak the query below to achieve the desired result?
(1 - (node_filesystem_free_bytes{device!~'rootfs'} / node_filesystem_size_bytes{device!~'rootfs'})) * 100 * on(instance) group_left(nodename) (node_uname_info)
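One possible tweak (a sketch; it assumes the postgres hosts are identifiable by a nodename pattern such as a "postgres" prefix, which is my assumption) is to filter on the node_uname_info side of the join:
(1 - (node_filesystem_free_bytes{device!~'rootfs'} / node_filesystem_size_bytes{device!~'rootfs'})) * 100 * on(instance) group_left(nodename) (node_uname_info{nodename=~"postgres.*"})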
My prometheus.yml looks like this:
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
  - job_name: 'node'
    # Override the global default and scrape targets from this job every 10 seconds.
    scrape_interval: 10s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090','cadvisor:8080','node-exporter:9100']
  - job_name: 'servers'
    static_configs:
      - targets: ['server1','server2']
        labels:
          group: 'production'
When I import the dashboard, I can only select the Prometheus node itself. How can I view server1 and server2 stats with this dashboard?
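One possible cause (an assumption on my part, not a confirmed diagnosis): the 'servers' targets omit the node_exporter port, so the dashboard's instance-based variables never match a host:port pair. Adding the default node_exporter port would look like:
- job_name: 'servers'
  static_configs:
    - targets: ['server1:9100','server2:9100']
      labels:
        group: 'production'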
Hello,
In our use case, we imported the node-exporter-full dashboard to monitor our platform.
As shown in the screenshot, we got negative values during the upgrade/downtime, so we are considering switching to another expression to avoid the unexpected negative values:
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle",instance=~"${instanceIP}:${PORT_NODE_EXPORTER}"}[5m])) * 100)
Do you consider this an issue or a bug on your side?
And how would you suggest solving it?
Thanks in advance!
hi Ricardo, first of all - thank you for this excellent dashboard.
On my servers, some of the node_exporters are on port 9100 and some are on port 80.
I discovered that when I use the Host dropdown to select a host where the metrics are on port 80, the dashboards are all empty (no data). I enabled the port dropdown with the following change:
19505c19505
< "hide": 2,
---
> "hide": 0,
When I do that, and select port 80 from the dropdown, then the dashboard charts get populated.
Is this an issue with the regex on line 19513 of the json file?
We basically need the port to track the host appropriately.
here is the relevant section of my prometheus.yml - you can scrape my URLs to test
# node_exporter metrics from various servers
- job_name: node
  scrape_interval: 60s
  static_configs:
    - targets:
        - localhost:9100
    - targets:
        - devops.fywss.com:80
        - thrash.fywss.com:80
        - home.fywss.com:80
Cheers
Steve
I noticed that if the port is not the same on all targets, the dashboard picks only one of them as the port and displays only that one, presenting the targets with different ports as 'no data points'.
My workaround was to have a consistent node-exporter port for all hosts.
I deployed node-exporter on Kubernetes. To enable the netstat and vmstat collectors, should I add args parameters at container startup? My configuration is:
- image: prom/node-exporter:latest
  imagePullPolicy: IfNotPresent
  name: prometheus-node-exporter
  args:
    - --collector.netstat.fields=(.*) --collector.vmstat.fields=(.*) --collector.interrupts
It starts without problems, but the graphs are still not filled in. Please advise.
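A likely fix (my assumption: each entry in a Kubernetes args list is passed to the container as a single argument, so several flags must not share one item) is to split the flags into separate list items:
args:
  - --collector.netstat.fields=(.*)
  - --collector.vmstat.fields=(.*)
  - --collector.interrupts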
Hi,
For the HAProxy Grafana dashboard, is it possible to build a graph for the top URLs (http_request) accessed and their response codes?
From the metrics gathered by Prometheus, I can't seem to find http_request being captured as part of the stats. How can this be enabled?
In Prometheus 2.0's changelog, it says that the function count_scalar was removed, so some dashboards which use this function may throw an error. Which function can I use to replace it? Maybe 'absent'?
The error thrown is "grafana unknown function with name count_scalar" for one of the graphs.
The solution is to replace count_scalar(...) with scalar(count(...)).
credit: https://github.com/grafana/grafana-plugins/issues/45
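For example (an illustrative expression, not necessarily the one in the affected graph), a query like count_scalar(node_uname_info{job=~"$job"}) would become:
scalar(count(node_uname_info{job=~"$job"}))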
This is my yml config, what is the problem?
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['192.168.0.17:9100']
        labels:
          instance: 192.168.0.17:9100
The new version appends the unit to the metric name, i.e. node_boot_time becomes node_boot_time_seconds.
Therefore the entire dashboard is not working.
Hope you find an automated way to rename them ;)
Grafana 7.2 (released 23 Sep 2020) introduces a new variable for Prometheus queries, $__rate_interval - see doc link.
This is intended to get rid of the problems around irate() and rate() queries missing spikes where the graph interval skips over them.
To apply this on the node exporter full dashboard, you'd change every instance of irate(....[5m]) to rate(...[$__rate_interval]).
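For example (an illustrative query; the dashboard has many panels, and I'm using the CPU idle expression as a stand-in), the change looks like:
irate(node_cpu_seconds_total{mode="idle",instance=~"$node:$port",job=~"$job"}[5m])
becomes:
rate(node_cpu_seconds_total{mode="idle",instance=~"$node:$port",job=~"$job"}[$__rate_interval])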
The way it works: $__rate_interval is the graph step (the time interval between horizontal data points) plus the Prometheus scrape interval (set in the data source definition), with a minimum of four scrape intervals.
Remember that rate() calculates the rate between the first and last data points contained within the window. So, say you are scraping node_exporter at 1-minute intervals. Then rate(...[6m]) contains 6 data points, and calculates the rate over the 5-minute period between the first and last point in the window.
Consider various zoom levels in Grafana, with different intervals between data points on the X axis:
rate(...[2m])
rate(...[6m])
rate(...[61m])
In each case, the rate correctly calculates the average over the time period between two data points. Spikes are never missed - although of course if you're averaging a spike over a longer time period then the peak shown will be lower.
The only downside I can see for doing this is that it will make the dashboard only usable with grafana 7.2 and later.
Hi, the "Used RAM Memory counter" is not working with node exporter 0.16+.
It's just showing 0. I believe it's because of "node_memory_MemAvailable_bytes" that does not exist in the new version as far as im aware.
Hello,
I am having a weird issue with the titles on the gauges. A Grafana instance with an older version of Node Exporter doesn't seem to have this issue. There's a screenshot at the link below:
https://snipboard.io/Iao1qX.jpg
Let me know if any more information is needed.
Hello,
Network speed is measured in bits per second, not bytes. The metrics collected by node_exporter are given in bytes, for example:
node_network_receive_bytes_total
This needs to be multiplied by 8 to get the real value in bits.
To be more specific, the "Network Traffic Basic" panel has the following query:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
This query shows the data as it's given by the node_exporter, in other words in bytes.
The query should be:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])*8
As a consequence of this, the unit used to show the data on the left Y axis needs to be changed to "bits/sec" under "data rate":
I can change the node variable to allow multi-value, but it breaks the dashboard: no value shows any data (relates to #58).
Some of the disk I/O graphs would benefit from more accurate Y-axis labels, units, and/or titles. I have put the background info in this article:
https://brian-candler.medium.com/interpreting-prometheus-metrics-for-linux-disk-i-o-utilization-4db53dfedcfc
Ideally the stats would align with what "iostat" reports - I can prepare a PR if you like.
I've just tried to use v16 on a new install (node_exporter 0.18.1) - all I get is the top 'filter bar' but no graphs etc.
Everything seems to work as expected with v15.
Let me know if there's any extra debugging info I can provide...
To get a fully working dashboard, this is the right start command. The documentation contains brackets instead of quotes:
nohup ./node_exporter --collector.netstat.fields=".*" --collector.vmstat.fields=".*" --collector.interrupts &
When using meaningful instance labels there is only a name, and no port number, in the instance label. Unfortunately, the existing node_exporter dashboard does not work without a port.
I suggest using the instance label directly.
Also, given that job and node are not multi-select, these can use = rather than =~ for matching (more efficient, and less likely to trip up on regexp metacharacters).
I've made these changes in 12486 which is a direct fork of 1860.
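As an illustration of both suggestions (a sketch based on one of the dashboard's typical CPU queries; exact panels vary), a query such as:
irate(node_cpu_seconds_total{mode="user",instance=~"$node:$port",job=~"$job"}[5m])
would become:
irate(node_cpu_seconds_total{mode="user",instance="$node",job="$job"}[5m])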
Hi, I'm using this dashboard (3894) and it works great. I'm running the exporter as a sidecar on Kubernetes. The problem is that when Kubernetes restarts the deployment, it gives it a random name and the Grafana dashboard gets a "Multiple series" error. Does anyone know how I can fix this error?
Is it possible to do so? If "yes", how?
On a fresh install, no metrics load and none of the dropdowns work.
I'm using the following versions:
node_exporter, version 0.18.1 (branch: , revision: )
prometheus, version 2.18.1 (branch: non-git, revision: non-git)
Grafana version 7.0.0
My prometheus.yml is very simple:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
I've confirmed there are no readily apparent issues between Prometheus/Grafana/node_exporter - I'm able to create charts in Grafana from the node_exporter metrics stored in Prometheus.
Attaching a screenshot of what I see.
Is there any way to have the dashboard display the uptime value in days instead of weeks?
Thanks
Hi,
I have installed node-exporter using apt install on all my servers, and I can ONLY get data from my Debian servers (Debian 10) and NOT from my Ubuntu servers (18.04) at all.
Does anyone have the same issue?
Please let me know how to make it work with Ubuntu servers.
Thanks.
Hi,
I could not find whether your node_exporter_full dashboard is compatible with node_exporter 1.0.0 or higher; can you tell me if it is?
I click the Job selector and all the jobs are displayed; then I click any of the jobs to select it, but the top one always stays active in the result. Any ideas?
Hey there, a more stable label to template off might be node_exporter_build_info. There is a version label on that as well, which might be useful 🤷♂️
Hi,
Thanks for all your hard work! It's nice not to have to create all those panels from scratch!
I have a question about the CPU usage PromQL. I've noticed that on multi-CPU systems, if the Y-axis is set to autoscale, the CPU usage in the CPU panels sums to N*100, where N is the number of CPUs.
Depending on axis scaling to make the graph come out to 100% caused me to question the whole formula, so I hunted around the web and found this blog post: https://movio.co/blog/prometheus-lighting-the-way/ with a different formula that comes out to 100(%) even when autoscaled.
So for example for 'user' time, instead of:
sum by (instance)(irate(node_cpu_seconds_total{mode="user",instance="$node",job="$job"}[5m])) * 100
this:
avg(irate(node_cpu_seconds_total{mode='user',instance="$node",job="$job"}[5m])) * 100
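For intuition (a worked example of my own, with hypothetical numbers): on a 4-core machine fully busy in user mode, each core's irate() is about 1 CPU-second per second, so the sum across cores is about 4 and the sum formula reads 400%, while the avg formula reads 100% regardless of core count.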
Thanks again!
C.
The haproxy dashboard uses the haproxy_server_connections_total, haproxy_frontend_connections_total and haproxy_backend_connections_total metrics, but these metrics seem not to exist.
When I query Prometheus directly, I get nothing. They also don't show up with the autocomplete feature, so it's not just that the metrics are 0.
All other metrics work just fine (even stuff like haproxy_server_connection_errors_total), and the dashboard is really useful :-)
I've also searched for the metric names in the https://github.com/prometheus/haproxy_exporter source code but found nothing.
So where are these metrics coming from?
In node-exporter-full, some graphs have resolution 1/4 (which really hides important detail) and some have resolution 1/2; sometimes even mixed on the same graph.
Personally I'd prefer 1/1, but I can understand people wanting a faster draw time with 1/2. However if that's the case, it would still be better to make it 1/2 consistently everywhere.
I can't quite work out what's going on in the JSON. Some items have a small step, and some huge:
$ grep '"step"' node-exporter-full.json | sort | uniq -c
2 "step": 2
1 "step": 20
82 "step": 240
107 "step": 4
4 "step": 480
9 "step": 8
1 "step": 1800
14 "step": 240
11 "step": 900
Hello,
I have set up this dashboard with 2 Prometheus datasources.
When I switch from one datasource to the other, the jobs are not updated.
Thanks.
Hi, thanks for your amazing dashboard.
We use it for our HAProxy installation. Unfortunately, it seems that several metric names have changed, therefore we had to compensate via the following config:
metric_relabel_configs:
  - source_labels:
      - __name__
      - proxy
    target_label: frontend
    action: replace
    regex: (haproxy_frontend_.*);(.*)
    replacement: ${2}
  - source_labels:
      - __name__
      - proxy
    target_label: backend
    action: replace
    regex: (haproxy_backend_.*);(.*)
    replacement: ${2}
  - source_labels:
      - __name__
      - proxy
    target_label: backend
    action: replace
    regex: (haproxy_server_.*);(.*)
    replacement: ${2}
  - source_labels:
      - __name__
    target_label: __name__
    regex: haproxy_process_jobs
    replacement: haproxy_up
  - source_labels:
      - __name__
    target_label: __name__
    regex: haproxy_backend_status
    replacement: haproxy_backend_up
  - source_labels:
      - __name__
    target_label: __name__
    regex: haproxy_server_connection_attempts_total
    replacement: haproxy_server_connections_total
  - source_labels:
      - __name__
    target_label: __name__
    regex: haproxy_server_status
    replacement: haproxy_server_up
  - regex: proxy
    action: labeldrop
Without this workaround your dashboard doesn't even find any hosts, backends, or frontends. The main reason is that HAProxy's built-in Prometheus exporter labels frontends and backends as "proxy".
Despite the relabeling, I wasn't able to provide a replacement for:
thank you!
"RAM used" gauge improperly reports RAM allocation.
It can make you think applications could run out of memory, which is not the case.
My proposal for the formula is the following, which aligns with the way RAM used is calculated in the "Memory basic" Panel.
(node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"} - (node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"} + node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"})) / (node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"}) * 100
One reason why I still have the host stats dashboard is because it has this neat little table of "Filesystem Fill Up Time" which (tries to?) compute the time at which the filesystem will fill up.
I don't think it's working very well because the results are just off here. But it got me thinking about how this could be implemented and whether you'd be interested in adding this to the dashboard...
The hosts stats dashboard uses this formula:
(node_filesystem_size_bytes{job='node',instance='$instance'} - node_filesystem_free_bytes{job='node',instance='$instance'}) / deriv(node_filesystem_free_bytes{job='node',instance='$instance',fstype!='rootfs',mountpoint!~'/(run|var).*',mountpoint!=''}[3d]) > 0
This blog post suggests instead just using the derivative as a base:
(deriv(node_filesystem_free{device=~"/dev/sd.*",instance=~"$node:.*"}[4h]) > 0)
I would suggest using node_filesystem_avail_bytes in any case, as that is the user-visible metric that will detect actual failures in userspace...
I'm not very familiar with Prometheus formulas, so I'm not sure how it works. I suspect it just doesn't, because it gives me negative numbers here (they don't show up) or absurd estimates (293481462547366 years for a 99% full disk), etc.
Yet this could be an interesting addition.
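For what it's worth, a common idiom for "time until full" (a sketch of my own, assuming the dashboard's usual $node/$port/$job variables, not a formula from either dashboard) divides the available bytes by the rate at which they are shrinking:
node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job"} / -deriv(node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job"}[3d]) > 0
This yields an estimate in seconds, and the > 0 filter hides filesystems whose free space is currently growing.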
I was able to load your dashboard after changing the datasource name, but I cannot get any hosts to appear.
I verified that all of my other dashboards work, where the label="Host:" matches the name="host", which I also changed.
How can I debug this to get your dashboard to work?
This is my node_uname_info on Rancher with a custom hostname:
node_uname_info{domainname="(none)",endpoint="https",instance="c1",job="node-exporter",machine="x86_64",namespace="monitoring",nodename="c1",pod="node-exporter-l922f",release="5.3.0-19-generic",service="node-exporter",sysname="Linux",version="#20~18.04.2-Ubuntu SMP Tue Oct 22 18:09:07 UTC 2019"}
--
Your dashboard expects all nodenames to be host:port, which means I can't view any data...
Besides replacing "$node:$port" with "$node" in my personal setup, I had to get rid of the *100 in this line:
https://github.com/rfrail3/grafana-dashboards/blob/master/prometheus/node-exporter-freebsd.json#L399
"SWAP Used" would show 265%, but IRL 2,6%:
$ freecolor -m
Physical : [######################.............] 63% (2451/3846)
Swap : [##################################.] 97% (751/768)
$ freebsd-version
11.3-RELEASE-p10
I am running Node Exporter 0.16.0-rc.2 on this Linux system:
Linux XXXXXXXXXXX 3.10.0-693.11.6.el7.x86_64 GNU/Linux
producing these metrics: NodeExporter_Metrics.txt
The Grafana dashboard Node Exporter Full 0.16 shows me "CPU System Load" gauges appearing red with values > 100%.
How should this be interpreted?
Hi team,
I couldn't identify the copyright information. I am not sure if the copyright info in the License file is accurate. If it is not, would you mind providing the copyright information, maybe in a copyright notice file?
Hi, I'm new to Prometheus and Grafana.
I just set this up on my server and have no idea how to use the JSONs in this repo. Can you kindly guide me?
P.S. I came from
https://grafana.com/grafana/dashboards/1860
The panel "Network Traffic Basic" does not work very well for hosts that had multiple containers running which then stopped.
Should we add sum(irate(.....)) to the queries?
Hi
would it be possible to add panels for the processes collector:
# HELP node_processes_max_processes Number of max PIDs limit
# TYPE node_processes_max_processes gauge
node_processes_max_processes 32768
# HELP node_processes_max_threads Limit of threads in the system
# TYPE node_processes_max_threads gauge
node_processes_max_threads 30441
# HELP node_processes_pids Number of PIDs
# TYPE node_processes_pids gauge
node_processes_pids 185
# HELP node_processes_state Number of processes in each state.
# TYPE node_processes_state gauge
node_processes_state{state="S"} 185
# HELP node_processes_threads Allocated threads in system
# TYPE node_processes_threads gauge
node_processes_threads 212
This will require starting node_exporter with --collector.processes, because this collector is disabled by default.
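If such panels are added, one possible gauge expression (an illustrative sketch built from the metrics above, not an existing dashboard panel) would show PID slot usage as a percentage:
node_processes_pids{instance=~"$node:$port",job=~"$job"} / node_processes_max_processes{instance=~"$node:$port",job=~"$job"} * 100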