ntp_exporter

This is a Prometheus exporter that, when running on a node, checks the drift of that node's clock against a given NTP server or servers.

The following metrics are supported:

  • ntp_build_info
  • ntp_drift_seconds
  • ntp_stratum
  • ntp_rtt_seconds
  • ntp_reference_timestamp_seconds
  • ntp_root_delay_seconds
  • ntp_root_dispersion_seconds
  • ntp_root_distance_seconds
  • ntp_precision_seconds
  • ntp_leap
  • ntp_scrape_duration_seconds
  • ntp_server_reachable

Unlike the node-exporter's time module, this exporter does not require an NTP daemon on localhost that it can talk to: it only looks at the system clock and talks to the configured NTP server(s).

Installation

Compile with make && make install or docker build. The binary can also be installed with go install:

go install github.com/sapcc/ntp_exporter@latest

We also publish pre-built images on the GitHub Container Registry:

docker pull ghcr.io/sapcc/ntp_exporter:v2.2.0

Usage

Common command-line options:

-ntp.source string
   source of information about ntp server (cli / http). (default "cli")
-version
   Print version information.
-web.listen-address string
   Address on which to expose metrics and web interface. (default ":9559")
-web.telemetry-path string
   Path under which to expose metrics. (default "/metrics")

Mode 1: Fixed NTP server

By default, or when the option -ntp.source cli is specified, the NTP server and connection options are defined by command-line options:

-ntp.measurement-duration duration
   Duration of measurements in case of high (>10ms) drift. (default 30s)
-ntp.high-drift duration
   High drift threshold. (default 10ms)
-ntp.protocol-version int
   NTP protocol version to use. (default 4)
-ntp.server string
   NTP server to use (required).

Command-line usage example:

ntp_exporter -ntp.server ntp.example.com -web.telemetry-path "/probe" -ntp.measurement-duration "5s" -ntp.high-drift "50ms"
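A matching Prometheus scrape configuration could look like the following sketch (the job name, target host, and telemetry path are illustrative assumptions, not part of the exporter):

```yaml
scrape_configs:
  - job_name: ntp_exporter
    metrics_path: /probe          # matches -web.telemetry-path above
    static_configs:
      - targets: ['example.com:9559']
```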

Mode 2: Variable NTP server

When the option -ntp.source http is specified, the NTP server and connection options are obtained from the query parameters on each GET /metrics HTTP request:

  • target: NTP server to use
  • protocol: NTP protocol version (2, 3 or 4)
  • duration: duration of measurements in case of high drift
  • high-drift: high-drift threshold that triggers multiple measurements

For example:

$ curl 'http://localhost:9559/metrics?target=ntp.example.com&protocol=4&duration=10s&high-drift=100ms'
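In this mode one exporter instance can probe many NTP servers, similar to the blackbox_exporter pattern. A sketch of a Prometheus scrape config that relabels each target into the `target` query parameter (job name, hosts, and exporter address are illustrative):

```yaml
scrape_configs:
  - job_name: ntp_probe
    metrics_path: /metrics
    params:
      protocol: ['4']
      duration: ['10s']
      high-drift: ['100ms']
    static_configs:
      - targets:
          - ntp1.example.com
          - ntp2.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target      # NTP server to probe
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9559       # address of the ntp_exporter itself
```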

Frequently asked questions (FAQ)

Is there a metric for checking that the exporter is working?

Several people have suggested adding a metric like ntp_up that's always 1, so that people can alert on absent(ntp_up) or something like that. This is not necessary. Prometheus already generates such a metric by itself during scraping. A suitable alert expression could look like

up{job="ntp_exporter",instance="example.com:9559"} != 1 or absent(up{job="ntp_exporter",instance="example.com:9559"})

but the concrete labels will vary depending on your setup and scrape configuration.
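As a concrete (illustrative) alerting rule built from that expression — the label values, duration, and severity are placeholders for your own setup:

```yaml
groups:
  - name: ntp_exporter
    rules:
      - alert: NtpExporterDown
        expr: up{job="ntp_exporter"} != 1 or absent(up{job="ntp_exporter"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ntp_exporter on {{ $labels.instance }} is not being scraped"
```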

ntp_exporter's People

Contributors

auhlig, dependabot[bot], dhalturin, elcomtik, lindhor, majewsky, rafouf69, renovate-bot, renovate[bot], sorenisanerd, supersandro2000, talal, tdabasinskas, voigts

ntp_exporter's Issues

Missing metrics

I wonder if you would consider adding additional metrics to more closely match what the node_exporter provides. If I understand your codebase correctly, you already get additional values back from the NTP server as part of the Query response, i.e. these values (with node_exporter's corresponding metric in parentheses):

  • RTT (node_ntp_rtt)
  • ReferenceTime (node_ntp_reference_timestamp_seconds)
  • RootDelay (node_ntp_root_delay)
  • RootDispersion (node_ntp_root_dispersion)
  • Leap (node_ntp_leap)

You might also want to add

  • Precision
  • KissCode (as a label)

I also want to say I really like your approach of being able to remotely poll NTP servers, blackbox-style. That is very useful and a great feature compared to node_exporter. It is also a big benefit that ntp_exporter runs on different operating systems, whereas node_exporter only runs on Unix-like ones.

Couldn't get NTP drift

I have all UDP ports open and my docker run command is as follows:

docker run -d -p 9102:9100 \
  --name ntp-exporter \
  --restart unless-stopped \
  ntp-exporter:latest \
  -ntp.server=0.amazon.pool.ntp.org

No matter what I put as the -ntp.server I get constant timeouts:

8/3/2017 3:11:33 PM INFO[0000] Starting ntp_exporter v1.0-7-gf28483e source="main.go:58"
8/3/2017 3:11:33 PM INFO[0000] Listening on :9100 source="main.go:74"
8/3/2017 3:35:22 PM ERRO[1429] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:52130->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:35:52 PM ERRO[1459] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:35646->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:36:07 PM ERRO[1474] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:51390->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:36:37 PM ERRO[1504] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:41850->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:37:07 PM ERRO[1534] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:55438->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:37:22 PM ERRO[1549] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:51386->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:37:52 PM ERRO[1579] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:36797->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:38:37 PM ERRO[1624] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:40700->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:38:52 PM ERRO[1639] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:50951->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:39:22 PM ERRO[1669] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:54118->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:39:37 PM ERRO[1684] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:32964->167.160.84.183:123: i/o timeout source="collector.go:73"
8/3/2017 3:39:52 PM ERRO[1699] couldn't get NTP drift: couldn't get NTP drift: read udp 10.42.0.73:39937->167.160.84.183:123: i/o timeout source="collector.go:73"

Am I doing something wrong? I really want this to work.

Metric when an NTP server is unreachable

An NTP server can be unreachable by the NTP exporter for various reasons. Currently, if that happens, the NTP exporter returns an empty response.

Would it be relevant to add a new metric whose value is 1 when the NTP server is reachable and 0 when it is unreachable?

[Feature] Select active ntp peer automatically

Context: If I am not mistaken, the -ntp.server option must be passed to the exporter to generate metrics based on that NTP server. The problem arises when a server has more than one NTP peer and we cannot know in advance which one is active.

Question: Is it possible to add to the exporter the ability to detect which peer is active on the server and generate the metrics based on that one?

false alarms for high clock drift

The clock drift measurement works reasonably well, but some data points are so far off from the preceding and following data points that a measurement error is very likely (see screenshot below). We get a small number of false alarms in our prod systems as a result of this.

Idea for a solution: If the clock drift is unusually big (e.g. > 10 ms), take multiple measurements and submit the median value.


(Screenshot from 2017-02-01: clock drift graph with isolated outlier data points.)

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Vulnerabilities

Renovate has not found any CVEs on osv.dev.

Detected dependencies

github-actions
.github/workflows/checks.yaml
  • actions/checkout v4
  • actions/setup-go v5
  • golang/govulncheck-action v1
  • reviewdog/action-misspell v1
.github/workflows/ci.yaml
  • actions/checkout v4
  • actions/setup-go v5
  • golangci/golangci-lint-action v6
  • actions/checkout v4
  • actions/setup-go v5
.github/workflows/codeql.yaml
  • actions/checkout v4
  • actions/setup-go v5
  • github/codeql-action v3
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/container-registry-ghcr.yaml
  • actions/checkout v4
  • docker/login-action v3
  • docker/metadata-action v5
  • docker/setup-qemu-action v3
  • docker/setup-buildx-action v3
  • docker/build-push-action v5
.github/workflows/goreleaser.yaml
  • actions/checkout v4
  • actions/setup-go v5
  • goreleaser/goreleaser-action v5
gomod
go.mod
  • go 1.22
  • github.com/beevik/ntp v1.4.2
  • github.com/prometheus/client_golang v1.19.1
  • github.com/sapcc/go-api-declarations v1.11.2
  • github.com/sapcc/go-bits v0.0.0-20240516084938-1c041b7a84ce@1c041b7a84ce
  • go.uber.org/automaxprocs v1.5.3


add `exporter_version` label

Two nice metrics are missing:

ntp_version 1.1.1
ntp_up{server="127.0.0.1"} 1

Would be nice to keep track of what's installed where (I have a few hundred of 'em) and also detect whether the exporter is actually running. Right now, on hosts with chrony it just returns nothing.
