maksim-paskal / aks-node-termination-handler
Gracefully handle Azure Virtual Machines shutdown within Kubernetes
License: Apache License 2.0
I'm mostly asking this to see if there is interest for this, I haven't dug deep enough to know how complex this might turn out. If there is interest I'd be happy to assist with development of this feature.
I think the requirements are pretty simple?
Are there any plans to add eviction event notifications to the Prometheus server?
We are using aks-node-termination-handler with the args
-taint.node
-taint.effect=NoExecute
After an Azure Spot termination event, all pods on the node are terminated without regard to their PodDisruptionBudget.
Hey, I was wondering if the webhook client should already honor the HTTPS_PROXY
environment variable or not?
I set
env:
- name: "HTTPS_PROXY"
  value: "http://someProxy:somePort"
args:
- "-webhook.url=https://myWebhook"
- "-webhook.template-file=/files/webhook.json"
- "-webhook.contentType=application/json"
- "-webhook.method=POST"
- "-webhook.timeout=30s"
and it does seem to be used, since if I do not set the NO_PROXY
correctly the pod cannot start.
But the webhook requests seem to not go to the configured proxy.
Can a proxy be configured for the webhook client already, if so how?
Else this would be a feature request if possible :)
First of all thank you for your work on this project!
I wanted to enable metric scraping for Prometheus using a PodMonitor. However, I encountered an issue with the PodMonitor in the following configuration:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  namespace: kube-system
  name: podmonitor-aks-node-termination-handler
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: aks-node-termination-handler
  podMetricsEndpoints:
  - port: 17923
    path: /metrics
    interval: 15s
The problem is that a properly defined PodMonitor should point to the port by name rather than by number - https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#podmetricsendpoint
Could you please add the port definition for the container in DaemonSet so that the appropriate port with a name is created for the container in the pod? I'm attaching an example patch for the DaemonSet below:
kubectl patch daemonset aks-node-termination-handler --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/ports", "value": [{"containerPort": 17923, "name": "metrics", "protocol": "TCP"}]}]'
Thank you
Hi,
When running in OpenShift, there are no VirtualMachineScaleSets (only VirtualMachines), and for that reason, the DaemonSet is crashing (attached logs below).
Could OpenShift support be added?
{"file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:55","func":"main.main","level":"info","msg":"Starting 1.0.13-d8d5a71-1707463489...","time":"2024-03-07T10:48:45Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert/alert.go:29","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert.Init","level":"warning","msg":"not sending Telegram message, no token","time":"2024-03-07T10:48:45Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client/client.go:45","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client.Init","level":"info","msg":"No kubeconfig file use incluster","time":"2024-03-07T10:48:45Z"}
{"error":"error in getting azure resource name: azure:///subscriptions/dd6b40ef-de5f-4649-95a7-bd2337c71900/resourceGroups/ocp-azure-uat-euw-8npmn-rg/providers/Microsoft.Compute/virtualMachines/master-1: azureProviderID not valid","file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:86","func":"main.main","level":"fatal","msg":"","time":"2024-03-07T10:48:45Z"}
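The fatal log line above shows a provider ID that ends in virtualMachines/master-1, with no virtualMachineScaleSets segment. As a rough sketch of what accepting both shapes could look like, the hypothetical resourceName function below (not the project's code) parses either form. It assumes the scale-set form ends in virtualMachineScaleSets/<name>/virtualMachines/<instance> and that scheduled events name such a node as <name>_<instance>; it also assumes a standalone VM is identified by its plain name.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Scale-set instance, as used by AKS node pools.
	vmssRe = regexp.MustCompile(`(?i)^azure:///subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Compute/virtualMachineScaleSets/([^/]+)/virtualMachines/([^/]+)$`)
	// Standalone virtual machine, as in the OpenShift log above.
	vmRe = regexp.MustCompile(`(?i)^azure:///subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Compute/virtualMachines/([^/]+)$`)
)

// resourceName is a hypothetical helper. It returns the name under which
// the node is assumed to appear in the scheduled events Resources list.
func resourceName(providerID string) (string, error) {
	if m := vmssRe.FindStringSubmatch(providerID); m != nil {
		return m[1] + "_" + m[2], nil
	}
	if m := vmRe.FindStringSubmatch(providerID); m != nil {
		return m[1], nil
	}
	return "", fmt.Errorf("unrecognized azure provider ID: %q", providerID)
}

func main() {
	ids := []string{
		// Placeholder subscription and resource group, scale-set form.
		"azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachineScaleSets/aks-spotcpu2m8-23666972-vmss/virtualMachines/862",
		// Standalone VM form, copied from the log above.
		"azure:///subscriptions/dd6b40ef-de5f-4649-95a7-bd2337c71900/resourceGroups/ocp-azure-uat-euw-8npmn-rg/providers/Microsoft.Compute/virtualMachines/master-1",
	}
	for _, id := range ids {
		name, err := resourceName(id)
		fmt.Println(name, err)
	}
}
```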
Hi,
Currently, when Azure sends eventType FREEZE, aks-node-termination-handler drains all pods and stops watching for new events.
The issue we see is that Azure does not take that worker node down, so the VM scale set does not create a new worker node.
The worker remains in an unschedulable state and is still charged.
Is it possible, for the FREEZE state alone, to keep watching for events after the drain and, when a new event arrives indicating unfreeze/normal, uncordon that worker node and keep watching for new events?
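The requested FREEZE behavior can be read as a small decision rule. The sketch below is only an illustration of that rule, not the handler's implementation: freezeAction, Action, and ScheduledEvent are hypothetical names, and it assumes that a finished Freeze event simply disappears from the scheduled events list.

```go
package main

import "fmt"

// ScheduledEvent holds only the fields of a scheduled event that this sketch needs.
type ScheduledEvent struct {
	EventType string
	Resources []string
}

// Action is what the watcher would do after one poll.
type Action string

const (
	ActionNone     Action = "none"
	ActionDrain    Action = "drain"
	ActionUncordon Action = "uncordon"
)

// freezeAction is a hypothetical decision function for Freeze events only.
// Other event types would keep their existing handling and are out of scope.
func freezeAction(resource string, cordonedForFreeze bool, events []ScheduledEvent) Action {
	pending := false
	for _, ev := range events {
		if ev.EventType != "Freeze" {
			continue
		}
		for _, r := range ev.Resources {
			if r == resource {
				pending = true
			}
		}
	}
	switch {
	case pending && !cordonedForFreeze:
		return ActionDrain // a Freeze is announced and the node is still schedulable
	case !pending && cordonedForFreeze:
		return ActionUncordon // assumed: the Freeze is over once it is no longer listed
	default:
		return ActionNone // nothing changed, keep polling
	}
}

func main() {
	node := "aks-spotcpu2m8-23666972-vmss_862"
	freeze := []ScheduledEvent{{EventType: "Freeze", Resources: []string{node}}}

	fmt.Println(freezeAction(node, false, freeze))
	fmt.Println(freezeAction(node, true, freeze))
	fmt.Println(freezeAction(node, true, nil))
}
```

The point of the design is that the watcher never exits after a Freeze: every poll yields one of the three actions, so the node can return to service without a replacement being created.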
Hello,
What are the requirements for using this tool? For example, do we need to enable "Instance termination" on the scale set of the AKS node pool? If so, how do you recommend enabling it? I do not see an option for it in "az aks create".
Hello Maksim,
We tried to build from the Dockerfile, but it fails with:
lstat aks-node-termination-handler: no such file or directory
Hi, I have set values this way:
aks-node-termination-handler:
  image: image
  imagePullPolicy: Always
  args:
  - "-webhook.url=https://myhook"
  - "-webhook.template='node_termination_event{node=\"{{ .Node }}\"} 1'"
  env: []
  priorityClassName: "system-node-critical"
k get pod
NAME                                 READY   STATUS    RESTARTS   AGE
aks-node-termination-handler-4r776   1/1     Running   0          8m16s
aks-node-termination-handler-g6x25   1/1     Running   0          8m16s
aks-node-termination-handler-ncccj   1/1     Running   0          8m16s
aks-node-termination-handler-tgrzr   1/1     Running   0          8m16s
aks-node-termination-handler-wc2kf   1/1     Running   0          8m17s
aks-node-termination-handler-xc6dt   1/1     Running   0          8m16s
---
k get pod aks-node-termination-handler-tgrzr -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
...
  containers:
  - args:
    - -webhook.url=https://myhook
    - -webhook.template='node_termination_event{node="{{ .Node }}"} 1'
    env:
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
In the logs I am getting:
aks-node-termination-handler-r49nz aks-node-termination-handler {"error":"error in sending to webhook: StatusCode=400: http result not OK","file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events/events.go:140","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events.readEndpoint","level":"error","msg":"error in alerts.Send","time":"2024-01-22T13:47:52Z"}
When I send a message to the channel via curl, it succeeds:
curl -X POST --data-urlencode "payload={\"channel\": \"#mychannel\", \"username\": \"webhookbot\", \"text\": \"This is posted to #my-channel-here and comes from a bot named webhookbot.\", \"icon_emoji\": \":ghost:\"}" https://myhoook
ok%
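One detail visible in the pod spec above is that the single quotes are part of the argument value, since no shell removes them. Assuming the handler renders the template with Go's text/template (the {{ .Node }} syntax suggests this, but it is not confirmed here), the sketch below shows what body such a template would produce. The render helper and the payload type are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// payload is a hypothetical stand-in for the data passed to the template.
type payload struct {
	Node string
}

// render executes a text/template string against data and returns the result.
func render(tmpl string, data interface{}) (string, error) {
	t, err := template.New("webhook").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// The argument value exactly as it appears in the pod spec, quotes included.
	arg := `'node_termination_event{node="{{ .Node }}"} 1'`

	body, err := render(arg, payload{Node: "aks-node-1"})
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(body)
}
```

The rendered body keeps the surrounding single quotes and is plain text, whereas the working curl call sends a form-encoded payload= field containing JSON. That difference may be worth ruling out when chasing the StatusCode=400.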
Duplicated Prometheus metric
aks_node_termination_handler_scheduled_events_total{type="Freeze"} 100
Logs
{"level":"info","msg":"Excluded event Freeze by user config","time":"2023-06-30T21:47:28Z"}
{"level":"info","msg":"{\"DocumentIncarnation\":7,\"Events\":[{\"EventId\":\"FE88BA46-96D4-432E-8AED-89A0F0E52D99\",\"EventStatus\":\"Scheduled\",\"EventType\":\"Freeze\",\"ResourceType\":\"VirtualMachine\",\"Resources\":[\"aks-spotcpu2m8-23666972-vmss_862\"],\"NotBefore\":\"Fri, 30 Jun 2023 22:00:16 GMT\",\"Description\":\"\",\"EventSource\":\"Platform\",\"DurationInSeconds\":30}]}","time":"2023-06-30T21:47:33Z"}
{"level":"info","msg":"Excluded event Freeze by user config","time":"2023-06-30T21:47:33Z"}
{"level":"info","msg":"{\"DocumentIncarnation\":7,\"Events\":[{\"EventId\":\"FE88BA46-96D4-432E-8AED-89A0F0E52D99\",\"EventStatus\":\"Scheduled\",\"EventType\":\"Freeze\",\"ResourceType\":\"VirtualMachine\",\"Resources\":[\"aks-spotcpu2m8-23666972-vmss_862\"],\"NotBefore\":\"Fri, 30 Jun 2023 22:00:16 GMT\",\"Description\":\"\",\"EventSource\":\"Platform\",\"DurationInSeconds\":30}]}","time":"2023-06-30T21:47:38Z"}
{"level":"info","msg":"Excluded event Freeze by user config","time":"2023-06-30T21:47:38Z"}
{"level":"info","msg":"{\"DocumentIncarnation\":7,\"Events\":[{\"EventId\":\"FE88BA46-96D4-432E-8AED-89A0F0E52D99\",\"EventStatus\":\"Scheduled\",\"EventType\":\"Freeze\",\"ResourceType\":\"VirtualMachine\",\"Resources\":[\"aks-spotcpu2m8-23666972-vmss_862\"],\"NotBefore\":\"Fri, 30 Jun 2023 22:00:16 GMT\",\"Description\":\"\",\"EventSource\":\"Platform\",\"DurationInSeconds\":30}]}","time":"2023-06-30T21:47:43Z"}
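The logs above repeat the same EventId on every poll, which would explain a counter of 100 if each poll increments it. A possible fix, sketched below under that assumption, is to count each EventId only once. The eventCounter type is hypothetical and keeps its own map instead of a real Prometheus counter so that the example stays self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleDoc is the scheduled events document copied from the logs above.
const sampleDoc = `{"DocumentIncarnation":7,"Events":[{"EventId":"FE88BA46-96D4-432E-8AED-89A0F0E52D99","EventStatus":"Scheduled","EventType":"Freeze","ResourceType":"VirtualMachine","Resources":["aks-spotcpu2m8-23666972-vmss_862"],"NotBefore":"Fri, 30 Jun 2023 22:00:16 GMT","Description":"","EventSource":"Platform","DurationInSeconds":30}]}`

// document mirrors only the fields of the response that this sketch needs.
type document struct {
	DocumentIncarnation int
	Events              []struct {
		EventId   string
		EventType string
	}
}

// eventCounter is a hypothetical stand-in for the metric: it counts every
// EventId once, no matter how many polls return it.
type eventCounter struct {
	seen   map[string]bool
	counts map[string]int
}

func newEventCounter() *eventCounter {
	return &eventCounter{seen: map[string]bool{}, counts: map[string]int{}}
}

// observe parses one polled document and counts only unseen events.
func (c *eventCounter) observe(raw []byte) error {
	var doc document
	if err := json.Unmarshal(raw, &doc); err != nil {
		return err
	}
	for _, ev := range doc.Events {
		if c.seen[ev.EventId] {
			continue // already counted on an earlier poll
		}
		c.seen[ev.EventId] = true
		c.counts[ev.EventType]++
	}
	return nil
}

func main() {
	c := newEventCounter()
	// Simulate three polls that return the same document, as in the logs.
	for i := 0; i < 3; i++ {
		if err := c.observe([]byte(sampleDoc)); err != nil {
			fmt.Println("parse error:", err)
			return
		}
	}
	fmt.Println("Freeze events counted:", c.counts["Freeze"])
}
```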