Comments (12)
This is very different from the existing implementation of pod_start_total_duration_seconds. Waiting for @dashpole or others from sig-instrumentation to give some advice on the best way to record one-time per-pod metrics like this.
from kubernetes.
cc: @ruiwen-zhao for review.
from kubernetes.
/sig instrumentation
from kubernetes.
/sig node
from kubernetes.
To bring up the previous discussion around metric cardinality: adding both pod name and node name as metric labels might introduce too much cardinality. We need to come up with a way to address this.
cc @SergeyKanzhelev @logicalhan @dashpole
from kubernetes.
Thank you Ruiwen. On the cardinality issue, I have some comments:
- Kubernetes already has some metrics from the scheduler that include the pod name, namespace, and node name as metric labels.
- kubernetes/pkg/kubelet/metrics/collectors/log_metrics.go, lines 32 to 34 in 06b813f: Kubernetes has another metric, kubelet_container_log_filesystem_used_bytes, that also uses pod name and namespace as metric labels.
- KSM also exports pod metrics in Prometheus format, and some of those metrics have pod name and namespace as metric labels: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/pod-metrics.md.
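To make the cardinality concern concrete, here is a minimal sketch that counts the distinct label combinations (that is, time series) a metric would produce with and without per-pod labels. The pod, namespace, and node names are made up for illustration and are not taken from any real cluster.

```python
# Hypothetical pods as (namespace, pod, node); 300 pods spread over 3 nodes.
pods = [("default", f"web-{i}", f"node-{i % 3}") for i in range(300)]


def distinct_series(samples, label_indexes):
    """Count distinct label combinations, i.e. the number of time series."""
    return len({tuple(sample[i] for i in label_indexes) for sample in samples})


# Labelled by node only: one series per node.
per_node = distinct_series(pods, [2])

# Labelled by namespace + pod + node: one series per pod.
per_pod = distinct_series(pods, [0, 1, 2])

print(per_node, per_pod)
```

The per-pod count also keeps growing as pods are replaced, since every new pod name starts a new series.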
from kubernetes.
@yujuhong Yes. pod_start_total_duration_seconds is a distribution over all pods on the node, whereas this feature proposes a new gauge metric that provides the exact startup time for a single pod to become ready.
@dashpole Hi David, could you please provide some insights here? Thanks!
from kubernetes.
A few questions to get the discussion started:
- Why a gauge instead of a histogram? A gauge is OK when looking at a single stream, or if you want to graph the average. But durations are often best represented by a histogram, as you can graph percentiles, or show a distribution. But if you graph a bunch of gauges, you will just see lots of lines on the graph, which isn't that helpful.
- Does this need to be in the kubelet? IIUC, this metric is produced by watching pods, and emitting a metric when it becomes ready for the first time. It doesn't need any special knowledge that the kubelet has, right?
- How long would the metric exist for? The startup will occur at the very beginning of the pod's life (in a single instant). Most pod-level metrics exist for the lifetime of the pod, but doing that would mean any aggregation would be less meaningful. Averaging the startup time of all currently-running pods in the cluster won't tell you if pod startup is currently slow. We could emit the metric for an arbitrary amount of time (e.g. 5 minutes), but that risks a scraper missing a pod entirely.
Bikeshedding: From the names, kubelet_pod_full_startup_duration_seconds vs pod_start_total_duration_seconds, I wouldn't know what the difference is. Would pod_ready_duration_seconds or pod_first_ready_duration_seconds be better?
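As a rough illustration of the gauge-versus-histogram question above, the sketch below buckets a set of startup durations the way a cumulative histogram would and reads percentiles off the buckets. The bucket bounds and durations are assumptions chosen for the example, not the kubelet's actual values, and the quantile helper is simplified: it returns the bucket's upper bound, whereas Prometheus's histogram_quantile interpolates within the bucket.

```python
import bisect

# Hypothetical bucket upper bounds in seconds (not the kubelet's real buckets).
BUCKETS = [1, 2, 5, 10, 30, 60]


def cumulative_counts(durations, buckets=BUCKETS):
    """Return cumulative "less than or equal" counts; the last entry is +Inf."""
    counts = [0] * (len(buckets) + 1)
    for d in durations:
        # bisect_left finds the first bucket whose upper bound is >= d.
        counts[bisect.bisect_left(buckets, d)] += 1
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    return cumulative


def quantile_upper_bound(cumulative, q, buckets=BUCKETS):
    """Upper bound of the first bucket that covers quantile q."""
    rank = q * cumulative[-1]
    for bound, count in zip(buckets, cumulative):
        if count >= rank:
            return bound
    return float("inf")


# Made-up startup durations (seconds) for ten pods on one node.
durations = [0.8, 1.5, 1.7, 3.0, 4.2, 4.9, 8.0, 12.0, 25.0, 70.0]
cumulative = cumulative_counts(durations)
p50 = quantile_upper_bound(cumulative, 0.5)
p99 = quantile_upper_bound(cumulative, 0.99)
print(cumulative, p50, p99)
```

With this shape the number of bucket series stays fixed at len(BUCKETS) + 1 per node no matter how many pods start, while a per-pod gauge adds one series for every pod.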
from kubernetes.
@dashpole Hi David thank you for the comment.
Why a gauge instead of a histogram? A gauge is OK when looking at a single stream, or if you want to graph the average. But durations are often best represented by a histogram, as you can graph percentiles, or show a distribution. But if you graph a bunch of gauges, you will just see lots of lines on the graph, which isn't that helpful.
I want to use a gauge because I want to record the exact startup time of the pod, and it will allow users to know the exact time it takes for their pods to become ready to serve. With the pod-level metric, users could also group them together under the workload (e.g. deployment).
Does this need to be in the kubelet? IIUC, this metric is produced by watching pods, and emitting a metric when it becomes ready for the first time. It doesn't need any special knowledge that the kubelet has, right?
I chose the kubelet because it already tracks the status of each pod in pod_startup_latency_tracker and watches for status changes of each pod. Also, the kubelet is usually the first layer to process the pod status, and it is a stable component (compared to other components in the cluster such as kube-state-metrics, where I often see out-of-memory issues). Do you have any recommendation for other places to add such a metric?
How long would the metric exist for? The startup will occur at the very beginning of the pod's life (in a single instant). Most pod-level metrics exist for the lifetime of the pod, but doing that would mean any aggregation would be less meaningful. Averaging the startup time of all currently-running pods in the cluster won't tell you if pod startup is currently slow. We could emit the metric for an arbitrary amount of time (e.g. 5 minutes), but that risks a scraper missing a pod entirely.
For "Most pod-level metrics exist for the lifetime of the pod, but doing that would mean any aggregation would be less meaningful", can you provide more context here to help me understand? Thanks!
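One possible reading of the quoted concern, sketched with made-up numbers: if each pod's startup gauge is exported for the pod's whole lifetime, an average over all running pods is dominated by pods that started long ago and hides a recent slowdown. The sample data and the 5-minute window below are assumptions for illustration only.

```python
# Hypothetical samples: (minutes since the pod first became ready, startup seconds).
# Eight old pods started quickly; two pods started in the last few minutes, slowly.
pods = [(600, 4.0)] * 8 + [(2, 40.0)] * 2


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Average over every currently running pod (gauge kept for the pod's lifetime).
lifetime_avg = mean([startup for _, startup in pods])

# Average over pods that became ready within the last 5 minutes only.
recent_avg = mean([startup for age, startup in pods if age <= 5])

print(lifetime_avg, recent_avg)
```

Restricting to a short window shows the slowdown, but it relies on the scraper collecting the sample before the window closes, which is the trade-off raised in the quoted comment.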
From the names, kubelet_pod_full_startup_duration_seconds vs pod_start_total_duration_seconds, I wouldn't know what the difference is. Would pod_ready_duration_seconds or pod_first_ready_duration_seconds be better?
pod_first_ready_duration_seconds looks good to me!
from kubernetes.
Does this need to be in the kubelet? IIUC, this metric is produced by watching pods, and emitting a metric when it becomes ready for the first time. It doesn't need any special knowledge that the kubelet has, right?
Would something like kube-state-metrics be more suitable for this?
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
from kubernetes.
/assign @JeffLuoo
/assign
/triage accepted
from kubernetes.