julie-ng / cloudkube-aks-clusters

3 Clusters, 1 Repo. Opinionated infrastructure as code for my Azure Kubernetes clusters for running demo apps.

License: MIT License

Makefile 19.15% HCL 80.85%
terraform azure kubernetes infra-as-code

cloudkube-aks-clusters's Issues

Reboot nodes on cluster version upgrade

Version Upgrades require node reboots

In the Terraform code, we'll just watch orchestrator_version (the Kubernetes version on the nodes) and run the upgrade command when it changes.

We will not use kubernetes_version, which applies to the control plane and should be upgraded before the node pools.
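One way the "run the upgrade command when the version changes" idea could be wired up is sketched below. This is an assumption for illustration, not the repo's actual code: it uses a `null_resource` (from the hashicorp/null provider) whose `triggers` map holds the node version, plus a `local-exec` provisioner that calls the Azure CLI. All variable names are hypothetical.

```hcl
# Hypothetical inputs; the real repo may name or source these differently.
variable "resource_group_name" { type = string }
variable "cluster_name" { type = string }
variable "node_pool_name" { type = string }
variable "orchestrator_version" { type = string }

# Re-created (and its provisioner re-run) whenever the node version changes.
resource "null_resource" "upgrade_nodes" {
  triggers = {
    orchestrator_version = var.orchestrator_version
  }

  # Assumes the Azure CLI is installed and logged in where Terraform runs.
  provisioner "local-exec" {
    command = <<-EOT
      az aks nodepool upgrade \
        --resource-group ${var.resource_group_name} \
        --cluster-name ${var.cluster_name} \
        --name ${var.node_pool_name} \
        --kubernetes-version ${var.orchestrator_version}
    EOT
  }
}
```

Note that a provisioner on a `null_resource` also runs when the resource is first created, and the node pool version cannot be newer than the control plane version, so the control plane has to be upgraded first.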

Background

AKS does not automatically reboot nodes so that they actually use newer images. This avoids potentially breaking user workloads.

Per Apply security and kernel updates to Linux nodes in Azure Kubernetes Service (AKS)

To protect your clusters, security updates are automatically applied to Linux nodes in AKS. These updates include OS security fixes or kernel updates. Some of these updates require a node reboot to complete the process. AKS doesn't automatically reboot these Linux nodes to complete the update process.

and later…

Some security updates, such as kernel updates, require a node reboot to finalize the process. A Linux node that requires a reboot creates a file named /var/run/reboot-required. This reboot process doesn't happen automatically.

Additionally, updates apply only to existing nodes. New nodes will still run the old image:

Unattended upgrades apply updates to the Linux node OS, but the image used to create nodes for your cluster remains unchanged. If a new Linux node is added to your cluster, the original image is used to create the node.

Best practice - managed upgrades

For zero-downtime upgrades, run az aks upgrade…, which will:

  • Create new nodes with the latest updates
  • Make sure no new workloads land on a new node, reserving it for the existing workloads (cordoning)
  • Move existing workloads to this new node (draining)
  • When finished
    • Restart old nodes
    • New nodes can take new workloads

Details described in this doc

Use Azure `mode` to separate user & system node pools

Problem

Currently the default_node_pool block is used for the system node pool. But this defaults to user mode.

Consequence

AKS node pools have a mode property that can be either system or user. Beyond semantics, a system pool can be reserved for critical system pods with the CriticalAddonsOnly=true:NoSchedule taint.

Docs reference: Manage system node pools in Azure Kubernetes Service (AKS) > System and user node pools

Changes required

  • Swap default_node_pool block and the kubernetes_cluster_node_pool resource definitions.
  • Will probably destroy and re-create the cluster.
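A minimal sketch of what the split could look like with the azurerm provider, where the cluster's `default_node_pool` serves as the system pool and a separate `azurerm_kubernetes_cluster_node_pool` with `mode = "User"` carries application workloads. Resource names, VM sizes, and node counts are placeholders rather than values from this repo, and `only_critical_addons_enabled` is shown on the assumption that the system pool should carry the `CriticalAddonsOnly=true:NoSchedule` taint.

```hcl
# Hypothetical inputs for illustration only.
variable "resource_group_name" { type = string }
variable "location" { type = string }

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "example-aks"

  # The default node pool acts as the system pool.
  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D2s_v3" # placeholder size
    node_count = 1

    # Taints the pool with CriticalAddonsOnly=true:NoSchedule.
    only_critical_addons_enabled = true
  }

  identity {
    type = "SystemAssigned"
  }
}

# Separate pool for application workloads.
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2s_v3" # placeholder size
  node_count            = 1
  mode                  = "User"
}
```

Depending on the provider version, changing the default node pool in place may force the cluster to be replaced, which matches the expectation above that the cluster will probably be destroyed and re-created.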

Remove Azure Pod Identity

Reason

Per Azure Pod Identity project repo, Pod Identity will be replaced by workload identity federation (preview) in the future.

❗ IMPORTANT: As mentioned in the announcement, we are planning to replace AAD Pod Identity with Azure Workload Identity. Going forward, we will no longer add new features to this project in favor of Azure Workload Identity. However, we will continue patching critical bugs and security vulnerabilities until further notice.

While Workload Identities are in preview, I won't try that in this change yet.

New Identity/Credential Options

So taking pod identity off the table, we have:

  • Workload Based Access Control needed, e.g. multi-tenant security boundary? Use Service Principals
  • Single-tenant security boundary? I'm going to use the Kubelet identity on the nodes that's actually shared by all workloads.
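For the Kubelet identity option, a rough sketch of granting that identity read access to Key Vault is shown below. It assumes the vault uses access policies rather than Azure RBAC, and every variable name is hypothetical. The Kubelet identity's object ID would typically come from the cluster resource's `kubelet_identity` block.

```hcl
# Hypothetical inputs; in practice the object ID could be read from
# azurerm_kubernetes_cluster.<name>.kubelet_identity[0].object_id
variable "key_vault_id" { type = string }
variable "tenant_id" { type = string }
variable "kubelet_object_id" { type = string }

# Lets the shared Kubelet identity read secrets and certificates,
# e.g. the TLS certificate consumed by the ingress controller.
resource "azurerm_key_vault_access_policy" "kubelet" {
  key_vault_id = var.key_vault_id
  tenant_id    = var.tenant_id
  object_id    = var.kubelet_object_id

  # Permission casing depends on the azurerm provider version
  # (title case in 3.x, lowercase in 2.x).
  secret_permissions      = ["Get"]
  certificate_permissions = ["Get"]
}
```

Because every workload on the nodes shares this identity, any pod can read what the policy allows, which is why this option only fits a single-tenant security boundary.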

Code Changes

  • Rip out the pod identity workload and all the extra YAML that the Ingress controller needs just for the TLS secret.
  • Upgrade to latest Azure Key Vault CSI v1.1.0 release from 2 March - preparing to test workload identity. Managed add-ons are often not latest version.
