Version Upgrades require node reboots
In the Terraform code, we only watch orchestrator_version (the Kubernetes version on the nodes) and run the upgrade command when it changes.
We will not use kubernetes_version, which applies to the control plane. The control plane should be upgraded first, before the node pools.
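A minimal sketch of that trigger logic, written in Python rather than Terraform to keep it short. The function names and parameters are hypothetical, and versions are assumed to be plain dotted numbers such as 1.27.3. The only real CLI pieces it relies on are az aks nodepool upgrade and its --resource-group, --cluster-name, --name and --kubernetes-version flags. It returns a command only when the node pool version changes, and refuses to proceed if the control plane is still behind the requested node version.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.27.3' into (1, 27, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan_node_pool_upgrade(
    current_nodes: str,
    desired_nodes: str,
    control_plane: str,
    resource_group: str,
    cluster: str,
    pool: str,
) -> Optional[List[str]]:
    """Return the upgrade command to run, or None when nothing changed."""
    if parse_version(desired_nodes) == parse_version(current_nodes):
        # orchestrator_version is unchanged: no upgrade, no reboots.
        return None
    if parse_version(desired_nodes) > parse_version(control_plane):
        # Node pools must not run ahead of the control plane.
        raise ValueError("upgrade the control plane before the node pools")
    return [
        "az", "aks", "nodepool", "upgrade",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", pool,
        "--kubernetes-version", desired_nodes,
    ]
```

The command is returned as an argument list, so a caller could pass it to subprocess.run without shell quoting concerns.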
Background
AKS does not automatically reboot nodes to pick up newer images, to avoid potentially breaking user workloads.
Per "Apply security and kernel updates to Linux nodes in Azure Kubernetes Service (AKS)":
To protect your clusters, security updates are automatically applied to Linux nodes in AKS. These updates include OS security fixes or kernel updates. Some of these updates require a node reboot to complete the process. AKS doesn't automatically reboot these Linux nodes to complete the update process.
and later…
Some security updates, such as kernel updates, require a node reboot to finalize the process. A Linux node that requires a reboot creates a file named /var/run/reboot-required. This reboot process doesn't happen automatically.
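As an illustration, a node-side check for that marker file could look like the sketch below. The function name needs_reboot is hypothetical; the path is the one quoted above.

```python
from pathlib import Path

# Marker file that a Linux node creates when an update needs a reboot.
REBOOT_MARKER = Path("/var/run/reboot-required")


def needs_reboot(marker: Path = REBOOT_MARKER) -> bool:
    """Return True when the reboot-required marker file is present."""
    return marker.exists()
```

Run on a node, needs_reboot() with no arguments checks the default path. The marker parameter exists only so the check can be exercised against a different file.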
Additionally, updates apply to existing nodes, but new nodes will run the old image:
Unattended upgrades apply updates to the Linux node OS, but the image used to create nodes for your cluster remains unchanged. If a new Linux node is added to your cluster, the original image is used to create the node.
Best practice - managed upgrades
For zero-downtime upgrades, run az aks upgrade…
which will
- Create new nodes with the latest updates
- Make sure no new workloads land on the old node, leaving it with only its existing workloads (cordoning)
- Move existing workloads from the old node to the new node (draining)
- When finished
  - Restart old nodes
  - New nodes can take new workloads
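The sequence above can be pictured with a toy simulation. This is only a model of the steps as listed, not of AKS internals. The Node class and rolling_upgrade function are hypothetical, and it assumes one node is handled at a time and that cordoning applies to the node being drained (the usual Kubernetes meaning of cordon).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    updated: bool                      # running the latest updates?
    schedulable: bool = True           # False while cordoned
    pods: List[str] = field(default_factory=list)


def rolling_upgrade(nodes: List[Node]) -> List[Node]:
    """Walk the listed steps for each node; return the final node list."""
    result: List[Node] = []
    for old in nodes:
        new = Node(name=f"{old.name}-new", updated=True)  # create new node
        old.schedulable = False                           # cordon old node
        new.pods, old.pods = old.pods, []                 # drain onto new node
        old.updated = True                                # restart old node
        old.schedulable = True                            # uncordon after restart
        result.extend([old, new])
    return result
```

Every pod stays assigned to some node throughout, which is why workloads see no downtime in this model.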
Details described in this doc