

jacobweinstock commented on August 28, 2024

Final update:

It appears that versions of EKSA before v0.19 did not have this behavior: they did not roll the control plane nodes when the only worker node group configuration was added or removed. For better or worse, that was a bug and not the intended behavior, and v0.19 has "fixed" it. We discussed this internally and will not be pursuing any change to this behavior; the docs will be updated to make this clear. One last note on why we won't be changing this behavior (this will be in the docs too): going from a control-plane-only cluster to a cluster with a control plane and worker node(s) changes the nature and fundamental behavior of the control plane nodes, and a change like this has significant internal code, behavior, and spec consequences. These are a couple of the reasons we have decided not to change the current v0.19 behavior.

from eks-anywhere.

jacobweinstock commented on August 28, 2024

Update:
When a single-node cluster is created, the single node is configured so that workload pods are permitted to run on it. This means there is no taint or label prohibiting this: the node is created without the taint node-role.kubernetes.io/control-plane:NoSchedule and without the label node-role.kubernetes.io/control-plane=.

When a single-node cluster is scaled out with a worker node group, the control plane node "reverts" to being just a control plane node. This means the taint node-role.kubernetes.io/control-plane:NoSchedule and the label node-role.kubernetes.io/control-plane= are added to the control plane spec. CAPI sees this as a spec change and triggers a rollout: "Rolling out Control Plane machines: Machine [machine object] needs rollout: Machine InitConfiguration or JoinConfiguration are outdated".
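The taint behavior described above can be modeled roughly as follows. This is a minimal sketch, not EKS Anywhere or CAPI code: control_plane_taints and effective_node_taints are hypothetical helpers, and the model assumes (following the kubeadm/CAPI documentation for the taints field) that an unset taints list lets kubeadm apply its default control-plane taint, while an explicit empty list leaves the node untainted.

```python
from typing import List, Optional, Tuple

# A taint modeled as (key, value, effect); the control-plane taint has no value.
Taint = Tuple[str, str, str]

# kubeadm's default taint for control plane nodes.
DEFAULT_CP_TAINT: Taint = ("node-role.kubernetes.io/control-plane", "", "NoSchedule")


def control_plane_taints(worker_node_groups: int) -> Optional[List[Taint]]:
    """Hypothetical: the taints value written into the control plane spec.

    No worker node groups (single-node cluster): an explicit empty list, so the
    node stays schedulable for workloads. Otherwise: unset (None), which leaves
    kubeadm free to apply its default control-plane taint.
    """
    return [] if worker_node_groups == 0 else None


def effective_node_taints(taints: Optional[List[Taint]]) -> List[Taint]:
    """Taints the node ends up with after the assumed kubeadm defaulting."""
    if taints is None:
        return [DEFAULT_CP_TAINT]
    return list(taints)


before = control_plane_taints(worker_node_groups=0)
after = control_plane_taints(worker_node_groups=1)
print(before, effective_node_taints(before))  # [] []
print(after, effective_node_taints(after))
# None [('node-role.kubernetes.io/control-plane', '', 'NoSchedule')]
```

The point of the model is that the spec value itself changes (from an explicit empty list to unset) when the first worker node group is added, even though no taint is written into the spec by hand.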

In the code/spec this is the difference between:

Single node cluster control plane spec (kubectl get kcp -o yaml):

initConfiguration:
  localAPIEndpoint: {}
  nodeRegistration:
    imagePullPolicy: IfNotPresent
    kubeletExtraArgs:
      anonymous-auth: "false"
      provider-id: PROVIDER_ID
      read-only-port: "0"
      tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    taints: []
joinConfiguration:
  bottlerocketAdmin: {}
  bottlerocketBootstrap: {}
  bottlerocketControl: {}
  discovery: {}
  nodeRegistration:
    ignorePreflightErrors:
    - DirAvailable--etc-kubernetes-manifests
    imagePullPolicy: IfNotPresent
    kubeletExtraArgs:
      anonymous-auth: "false"
      provider-id: PROVIDER_ID
      read-only-port: "0"
      tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    taints: []
  pause: {}
  proxy: {}
  registryMirror: {}

Control plane spec after scaling out the single-node cluster with a worker node group (kubectl get kcp -o yaml):

initConfiguration:
  localAPIEndpoint: {}
  nodeRegistration:
    imagePullPolicy: IfNotPresent
    kubeletExtraArgs:
      anonymous-auth: "false"
      provider-id: PROVIDER_ID
      read-only-port: "0"
      tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
joinConfiguration:
  bottlerocketAdmin: {}
  bottlerocketBootstrap: {}
  bottlerocketControl: {}
  discovery: {}
  nodeRegistration:
    ignorePreflightErrors:
    - DirAvailable--etc-kubernetes-manifests
    imagePullPolicy: IfNotPresent
    kubeletExtraArgs:
      anonymous-auth: "false"
      provider-id: PROVIDER_ID
      read-only-port: "0"
      tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  pause: {}
  proxy: {}
  registryMirror: {}

Diff:

--- singleNode.yaml     2024-04-18 12:56:22.396824833 -0600
+++ scaledout.yaml      2024-04-18 12:56:31.960841460 -0600
@@ -7,7 +7,6 @@
       provider-id: PROVIDER_ID
       read-only-port: "0"
       tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
-    taints: []
 joinConfiguration:
   bottlerocketAdmin: {}
   bottlerocketBootstrap: {}
@@ -22,7 +21,6 @@
       provider-id: PROVIDER_ID
       read-only-port: "0"
       tls-cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
-    taints: []
   pause: {}
   proxy: {}
   registryMirror: {}

In code, this is the difference between a nil value and an empty []corev1.Taint slice in the Taints field. Ref: https://github.com/abhay-krishna/cluster-api/blob/f2c51dfbb9cd4dc60718f1ea7218b2ebe43bd0b3/bootstrap/kubeadm/api/v1beta1/kubeadm_types.go#L387
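The nil-versus-empty distinction matters because the two values serialize differently, so any comparison of the rendered specs sees a change. The sketch below uses Python's json module; render_node_registration and needs_rollout are hypothetical helpers that mimic what the diff shows (unset taints omitted, empty taints kept as []). They are not the CAPI marshaller or CAPI's actual rollout logic.

```python
import json
from typing import Any, Dict, List, Optional


def render_node_registration(taints: Optional[List[Dict[str, str]]]) -> str:
    """Hypothetical renderer for a nodeRegistration block.

    Mirrors the diff above: None -> the taints field is omitted entirely,
    [] -> the field is kept as an explicit empty list.
    """
    reg: Dict[str, Any] = {"imagePullPolicy": "IfNotPresent"}
    if taints is not None:
        reg["taints"] = taints
    return json.dumps({"nodeRegistration": reg}, sort_keys=True)


def needs_rollout(current: str, desired: str) -> bool:
    """Simplified stand-in for a spec comparison: any rendered difference counts."""
    return current != desired


single_node = render_node_registration([])   # taints: []
scaled_out = render_node_registration(None)  # taints omitted
print(single_node)  # {"nodeRegistration": {"imagePullPolicy": "IfNotPresent", "taints": []}}
print(scaled_out)   # {"nodeRegistration": {"imagePullPolicy": "IfNotPresent"}}
print(needs_rollout(single_node, scaled_out))  # True
```

Go has the same distinction natively: reflect.DeepEqual reports a nil slice and an empty slice of the same type as unequal, which is why the two specs are not interchangeable.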


jacobweinstock commented on August 28, 2024

I also observed that an eksctl anywhere upgrade cluster command fails at the command line, but the new node does make it into the cluster.

kubectl get nodes
NAME                STATUS   ROLES           AGE    VERSION
<new worker node>   Ready    <none>          145m   v1.27.4-eks-cedffd4
<original CP node>  Ready    control-plane   21h    v1.27.4-eks-cedffd4

The workload cluster is left in a bad state, though, and subsequent lifecycle commands fail with:

❌ Validation failed	{"validation": "control plane ready", "error": "1 control plane replicas are unavailable", "remediation": "ensure control plane nodes and pods for cluster workload-test are Ready"}

This is because the control plane needs to be rolled, but there is no available hardware to roll it onto.

kubectl get kcp -n eksa-system workload-cluster
NAME               CLUSTER            INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
workload-cluster   workload-cluster   true          true                   2          1       1         1              4h   v1.27.11-eks-1-27-25
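The counts in the kubectl get kcp output above fit a rollout that created a second control plane machine which never became ready. The sketch below is a hypothetical model, not the actual EKS Anywhere validation: ControlPlaneStatus, validate_control_plane_ready and rollout_blocked are illustrative names. It assumes that unavailable replicas are computed as replicas minus ready replicas, and that the rollout surges by one machine (maxSurge: 1, assumed here as the KCP default), so one spare machine must exist before the old one can be removed.

```python
from dataclasses import dataclass


@dataclass
class ControlPlaneStatus:
    """Subset of the KCP columns shown by kubectl get kcp."""
    replicas: int
    ready: int

    @property
    def unavailable(self) -> int:
        # Assumption: unavailable = replicas - ready.
        return self.replicas - self.ready


def validate_control_plane_ready(status: ControlPlaneStatus) -> None:
    """Hypothetical check that fails like the 'control plane ready' validation."""
    if status.unavailable > 0:
        raise RuntimeError(
            f"{status.unavailable} control plane replicas are unavailable"
        )


def rollout_blocked(spare_hardware: int, max_surge: int = 1) -> bool:
    """A surging rollout needs max_surge spare machines before it can proceed."""
    return spare_hardware < max_surge


# Values taken from the kubectl get kcp output above.
status = ControlPlaneStatus(replicas=2, ready=1)
print(status.unavailable)  # 1
try:
    validate_control_plane_ready(status)
except RuntimeError as err:
    print(err)  # 1 control plane replicas are unavailable

# Assumption: no free hardware remains, as described in the comment above.
print(rollout_blocked(spare_hardware=0))  # True
```

Under these assumptions, the cluster stays stuck until a spare machine is provided for the surge replica. That is consistent with subsequent lifecycle commands failing validation.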

