
eks-operator's Introduction

Rancher

This file is auto-generated from README-template.md, please make any changes there.


Rancher is an open source container management platform built for organizations that deploy containers in production. Rancher makes it easy to run Kubernetes everywhere, meet IT requirements, and empower DevOps teams.

Latest Release

  • v2.8
    • Latest - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:latest - Read the full release notes.
    • Stable - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:stable - Read the full release notes.
  • v2.7
    • Latest - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
    • Stable - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
  • v2.6
    • Latest - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.
    • Stable - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.

To get automated notifications of our latest release, you can watch the announcements category in our forums, or subscribe to the RSS feed https://forums.rancher.com/c/announcements.rss.

Quick Start

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher

Open your browser to https://localhost

Installation

See Installing/Upgrading Rancher for all installation options.

Minimum Requirements

  • Operating Systems
    • Please see Support Matrix for specific OS versions for each Rancher version. Note that the link will default to the support matrix for the latest version of Rancher. Use the left navigation menu to select a different Rancher version.
  • Hardware & Software

Using Rancher

To learn more about using Rancher, please refer to our Rancher Documentation.

Source Code

This repo is a meta-repo used for packaging and contains the majority of the Rancher codebase. For other Rancher projects and modules, see go.mod for the full list.

Rancher also includes other open source libraries and projects, see go.mod for the full list.

Build configuration

Refer to the build docs on how to customize the building and packaging of Rancher.

Support, Discussion, and Community

If you need any help with Rancher, please join us at either our Rancher forums or Slack, where most of our team hangs out.

Please submit any Rancher bugs, issues, and feature requests to rancher/rancher.

For security issues, please first check our security policy and email [email protected] instead of posting a public issue in GitHub. You may (but are not required to) use the GPG key located on Keybase.

License

Copyright (c) 2014-2024 Rancher Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

eks-operator's People

Contributors

aiyengar2, alexander-demicev, cbron, chiukapoor, cmurphy, dependabot[bot], furkatgofurov7, jakefhyde, jiaqiluo, kevinjoiner, krunalhinguu, macedogm, mbologna, mjura, oxr463, phillipsj, richardcase, rmweir, salasberryfin, superseb, yiannistri


eks-operator's Issues

Imported Cluster: Upgrade of Nodegroup k8s version not succeeding

Rancher version:

v2.7-10d02ae6827fd549ae0f373be84a5e327999925b-head
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Upgrade of the Nodegroup k8s version does not succeed for an imported EKS cluster, and the node group remains at the same version (lower than the control plane version).
The cluster does not go into the Updating state in the UI after the upgrade is triggered.

The same operation works on 2.7.5/2.7.6, eks-operator:v1.2.0
Found while validating rancher/rancher#42496

Steps

  • Import EKS cluster with single/multiple nodegroups (created using Launch template) and wait for it to be Active
  • Upgrade cluster k8s version (Control plane only) and let it complete successfully
  • Edit cluster and select node group upgrade checkbox and Save

Expected Result
Imported Cluster: Upgrade of Nodegroup k8s version must complete successfully

Logs (2.7-head):
No EKS-operator logs are generated for the operation

Rancher:

2023/08/21 09:19:24 [INFO] change detected for cluster [c-hg4p4], updating EKSClusterConfig
2023/08/21 09:19:34 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:19:35 [INFO] change detected for cluster [c-hg4p4], updating spec
2023/08/21 09:19:35 [INFO] change detected for cluster [c-hg4p4], updating EKSClusterConfig
2023/08/21 09:19:36 [ERROR] Error during subscribe websocket: close sent
2023/08/21 09:19:50 [ERROR] Error during subscribe websocket: close sent
W0821 09:19:55.703187      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
W0821 09:20:12.164038      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineHealthCheck is deprecated; use cluster.x-k8s.io/v1beta1 MachineHealthCheck
W0821 09:21:56.294592      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineDeployment is deprecated; use cluster.x-k8s.io/v1beta1 MachineDeployment
W0821 09:22:30.958953      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineSet is deprecated; use cluster.x-k8s.io/v1beta1 MachineSet
2023/08/21 09:22:54 [ERROR] Error during subscribe websocket: close sent
2023/08/21 09:23:07 [ERROR] Error during subscribe websocket: close sent
W0821 09:24:09.213292      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Cluster is deprecated; use cluster.x-k8s.io/v1beta1 Cluster
2023/08/21 09:24:35 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:24:35 [INFO] cluster [c-hg4p4] matches upstream, skipping spec sync
2023/08/21 09:26:36 [ERROR] Error during subscribe websocket: close sent
W0821 09:29:16.705076      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
2023/08/21 09:29:35 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:29:35 [INFO] cluster [c-hg4p4] matches upstream, skipping spec sync

Logs (2.7.5):
EKS-operator:

time="2023-08-21T09:33:39Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:34:10Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:34:41Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:35:11Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:35:42Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:36:12Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:36:43Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:37:13Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:37:44Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:38:14Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:38:45Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:39:15Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:39:46Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:40:16Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:40:47Z" level=info msg="cluster [c-hd9z6] finished updating"

Increase project maintainability

This epic will track tasks for increasing project maintainability:

  • Enable github workflows for this repository
  • Add golang-ci lint workflow
  • #92
  • Add new task to makefile for building operator binary
  • Create AWS services mocks to use for unit tests
  • Set up a basic unit test suite
  • #93
  • #94
  • #95
  • #96
  • #97
  • #98
  • #99
  • #100

During migration, EKS clusters don't reconnect to Hosted

SURE-6587

Issue description:

We are currently in the process of migrating a "nonprod" environment from self-hosted to Hosted Rancher. We are following the process defined at https://confluence.suse.com/pages/viewpage.action?spaceKey=Hosted&title=Migration+to+Rancher+Hosted+Prime. After following the procedure to reconfigure the EKS clusters, the clusters do not show up as healthy in the UI and we see an error that the agents are disconnected. RKE clusters running on EC2 VMs look fine.

Business impact:

Migration to Hosted is blocked for non-prod, and this also blocks the migration for production.

Troubleshooting steps:

Turned on debugging, viewed and collected logs, but nothing is evident in the logs.

Repro steps:

Unknown, but at a high level:

  1. Create a self-hosted Rancher environment with EKS cluster(s)
  2. Follow the steps at https://confluence.suse.com/pages/viewpage.action?spaceKey=Hosted&title=Migration+to+Rancher+Hosted+Prime to migrate to Hosted

Workaround:

Is a workaround available and implemented? unknown
What is the workaround:

Actual behavior:

After the migration process EKS clusters are not healthy (agent disconnected)

Expected behavior:

After migration, EKS clusters are healthy

Files, logs, traces:

See JIRA

Additional notes:

We also ran the websocket test to Hosted from a node within the cluster and that worked fine. The customer does not believe there are any firewall rules in place that are interfering with the agent to server communication.

K8s upgrade sync from AWS to Rancher fails

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc3

Rancher is installed using Helm.
This bug was also tested on eks-operator HEAD version 20231017.

Steps to Reproduce:

  1. Create an EKS cluster with 1.25 using Rancher.
  2. Upgrade k8s cluster to 1.26 from AWS console.
  3. Wait for the upgrade to finish and check Rancher.

Actual Results:
Upgrade fails with the following error:

 Controller.FailureMessage{ClusterName:"pvala-eks-sync-regular", Message_:"Unsupported Kubernetes minor version update from 1.26 to 1.25", NodegroupName:""} 

Expected Results:
The upgrade should show as successful in Rancher.

Notes:
There seems to be no other way to revert the change from Rancher. Upgrading to 1.26 from Rancher does not work either.
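The failure message suggests the operator compares the minor version in the Rancher spec (still 1.25) against the upstream control plane version (now 1.26) and refuses what it sees as a downgrade. Below is a minimal Go sketch of that kind of minor-version guard; it is an illustration under that assumption, not the eks-operator's actual code.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf parses a "major.minor" Kubernetes version string such as "1.25".
func minorOf(v string) (int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unexpected version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// validateUpgrade rejects any update that would move the cluster to a lower
// minor version than the one already running upstream.
func validateUpgrade(upstream, desired string) error {
	up, err := minorOf(upstream)
	if err != nil {
		return err
	}
	want, err := minorOf(desired)
	if err != nil {
		return err
	}
	if want < up {
		return fmt.Errorf("Unsupported Kubernetes minor version update from %s to %s", upstream, desired)
	}
	return nil
}

func main() {
	// Upstream was upgraded to 1.26 in the AWS console, but the spec still
	// says 1.25, so the sync is treated as a (forbidden) downgrade.
	fmt.Println(validateUpgrade("1.26", "1.25"))
}

Under a check of this shape, the sync can only succeed once the spec's kubernetesVersion is raised to at least the upstream version, which matches the observed behavior that the change cannot be reverted from Rancher.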

Cluster YAML

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    clusters.management.cattle.io/ke-last-refresh: '1697638731'
    field.cattle.io/creatorId: user-q5x25
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: 1.26.9-eks-f8587cb
  creationTimestamp: '2023-10-18T13:50:32Z'
  finalizers:
    - wrangler.cattle.io/mgmt-cluster-remove
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 47
  labels:
    cattle.io/creator: norman
    provider.cattle.io: eks
  managedFields:
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/creatorId: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cattle.io/creator: {}
        f:spec:
          .: {}
          f:displayName: {}
          f:dockerRootDir: {}
          f:eksConfig:
            .: {}
            f:amazonCredentialSecret: {}
            f:displayName: {}
            f:imported: {}
            f:kmsKey: {}
            f:kubernetesVersion: {}
            f:loggingTypes: {}
            f:privateAccess: {}
            f:publicAccess: {}
            f:region: {}
            f:secretsEncryption: {}
            f:securityGroups: {}
            f:serviceRole: {}
            f:subnets: {}
            f:tags: {}
          f:enableClusterAlerting: {}
          f:enableClusterMonitoring: {}
          f:enableNetworkPolicy: {}
          f:internal: {}
          f:windowsPreferedCluster: {}
        f:status:
          .: {}
          f:appliedEnableNetworkPolicy: {}
      manager: Go-http-client
      operation: Update
      time: '2023-10-18T13:50:32Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:authz.management.cattle.io/creator-role-bindings: {}
            f:authz.management.cattle.io/initial-sync: {}
            f:clusters.management.cattle.io/ke-last-refresh: {}
            f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
            f:lifecycle.cattle.io/create.cluster-provisioner-controller: {}
            f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
            f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
            f:management.cattle.io/current-cluster-controllers-version: {}
          f:finalizers:
            .: {}
            v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
            v:"controller.cattle.io/cluster-provisioner-controller": {}
            v:"controller.cattle.io/cluster-scoped-gc": {}
            v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
            v:"wrangler.cattle.io/mgmt-cluster-remove": {}
          f:labels:
            f:provider.cattle.io: {}
        f:spec:
          f:agentImageOverride: {}
          f:answers: {}
          f:clusterSecrets: {}
          f:description: {}
          f:desiredAgentImage: {}
          f:desiredAuthImage: {}
          f:eksConfig:
            f:ebsCSIDriver: {}
            f:nodeGroups: {}
            f:publicAccessSources: {}
          f:fleetWorkspaceName: {}
          f:localClusterAuthEndpoint:
            .: {}
            f:enabled: {}
        f:status:
          f:agentFeatures:
            .: {}
            f:embedded-cluster-api: {}
            f:fleet: {}
            f:monitoringv1: {}
            f:multi-cluster-management: {}
            f:multi-cluster-management-agent: {}
            f:provisioningv2: {}
            f:rke2: {}
          f:agentImage: {}
          f:aksStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:rbacEnabled: {}
            f:upstreamSpec: {}
          f:allocatable:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:apiEndpoint: {}
          f:appliedAgentEnvVars: {}
          f:appliedPodSecurityPolicyTemplateId: {}
          f:appliedSpec:
            .: {}
            f:agentImageOverride: {}
            f:answers: {}
            f:clusterSecrets: {}
            f:description: {}
            f:desiredAgentImage: {}
            f:desiredAuthImage: {}
            f:displayName: {}
            f:eksConfig:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:enableClusterAlerting: {}
            f:enableClusterMonitoring: {}
            f:enableNetworkPolicy: {}
            f:internal: {}
            f:localClusterAuthEndpoint:
              .: {}
              f:enabled: {}
            f:windowsPreferedCluster: {}
          f:authImage: {}
          f:caCert: {}
          f:capabilities:
            .: {}
            f:loadBalancerCapabilities: {}
          f:capacity:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:conditions: {}
          f:driver: {}
          f:eksStatus:
            .: {}
            f:generatedNodeRole: {}
            f:managedLaunchTemplateID: {}
            f:managedLaunchTemplateVersions:
              .: {}
              f:dp: {}
            f:privateRequiresTunnel: {}
            f:securityGroups: {}
            f:subnets: {}
            f:upstreamSpec:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:virtualNetwork: {}
          f:gkeStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:upstreamSpec: {}
          f:limits:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:linuxWorkerCount: {}
          f:nodeCount: {}
          f:provider: {}
          f:requested:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:serviceAccountTokenSecret: {}
          f:version:
            .: {}
            f:buildDate: {}
            f:compiler: {}
            f:gitCommit: {}
            f:gitTreeState: {}
            f:gitVersion: {}
            f:goVersion: {}
            f:major: {}
            f:minor: {}
            f:platform: {}
      manager: rancher
      operation: Update
      time: '2023-10-18T14:18:51Z'
  name: c-l948b
  resourceVersion: '19636'
  uid: fc4e2fe0-84a8-4505-99b4-4503752e02d9
spec:
  agentImageOverride: ''
  answers: {}
  clusterSecrets: {}
  description: ''
  desiredAgentImage: ''
  desiredAuthImage: ''
  displayName: pvala-eks-sync-regular
  dockerRootDir: /var/lib/docker
  eksConfig:
    amazonCredentialSecret: cattle-global-data:cc-mkpjd
    displayName: pvala-eks-sync-regular
    ebsCSIDriver: null
    imported: false
    kmsKey: ''
    kubernetesVersion: '1.25'
    loggingTypes: []
    nodeGroups:
      - desiredSize: 1
        diskSize: 20
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.medium
        labels: {}
        launchTemplate: null
        maxSize: 2
        minSize: 1
        nodeRole: >-
          arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
        nodegroupName: dp
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - subnet-08d337145e51d3a3a
          - subnet-0813d93b5f49f4648
        tags: {}
        userData: null
        version: '1.25'
    privateAccess: false
    publicAccess: true
    publicAccessSources:
      - 0.0.0.0/0
    region: ap-south-1
    secretsEncryption: false
    securityGroups: []
    serviceRole: ''
    subnets: []
    tags: {}
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  windowsPreferedCluster: false
status:
  agentFeatures:
    embedded-cluster-api: false
    fleet: false
    monitoringv1: false
    multi-cluster-management: false
    multi-cluster-management-agent: true
    provisioningv2: false
    rke2: false
  agentImage: rancher/rancher-agent:v2.8.0-rc1
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: 1930m
    memory: 3388356Ki
    pods: '17'
  apiEndpoint: https://DB9EB28E21D8176978B3B1663EB4F1D3.gr7.ap-south-1.eks.amazonaws.com
  appliedAgentEnvVars:
    - name: CATTLE_SERVER_VERSION
      value: v2.8.0-rc1
    - name: CATTLE_INSTALL_UUID
      value: fa38b840-8b99-4b12-9727-216877a1b36a
    - name: CATTLE_INGRESS_IP_DOMAIN
      value: sslip.io
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ''
  appliedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets: {}
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: ''
    eksConfig:
      amazonCredentialSecret: cattle-global-data:cc-mkpjd
      displayName: pvala-eks-sync-regular
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.25'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 2
          minSize: 1
          nodeRole: ''
          nodegroupName: dp
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets: []
          tags: {}
          userData: ''
          version: '1.25'
      privateAccess: false
      publicAccess: true
      publicAccessSources: []
      region: ap-south-1
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ''
  caCert: >-
    LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJR2FQcmJUb1JJbDB3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TXpFd01UZ3hNelV5TWpkYUZ3MHpNekV3TVRVeE16VTNNamRhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURETGlURkFid1dTcGsxRGlNdVNCOHNGU2NtbkgydHVQT3ZRNmRkSDRLM1c5djNhM1g3VWtDWUVCK3MKQld2cWxubnN6SGR3U2dFR2hzcHJNbDh2NW1scmxMYU03QS9vM2xGY0hieEE1aDlTVG9kejNMY3F3SUlLTk94cwpKK0FYKzdUZ1hlZ01ncmVkMlJaTXJ0bGo2SnpVM1NiYi9lSnVwbkwxLyt3NVhFcVpNdFpuclQ3UUhscWprZjJGCmREZTZRbWttcEg3NTEvb3Z4SUhjQ2tRdFMvUDV1WSs3WEc3L2Y4dExCWGNacCsrMEV3WGpEUHo2VnBWTnUwK1cKMytKUXF6bWVjdzhCUnUxclIxU2xxVHIvYUwxTTc1cHNFdWp1RkIxOUk3YzBIc0RQOTBxb3cxcFp2NW5BSFRzMApOem9HZkpGbjI5ZTJMZ3ZhU1l4ejRNZlRoQkRKQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJSUjdJMFRPVE41V3cxSVZCR2hZTnNJZmZMVzdUQVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQUlPbVZ4c0IwaQpJZzlPeHk0SG9MeFVGL0JsL3U4SFI3dVUxRVNNSERRbUFxL2dPZWJTT2tjR0hjWEkzMDh1UDNKOXMwdHRBNS96CnBUSXcrdi9sM3AxU1NZSzV3ekRXR2IxR01tYU5EcnFPYjhJSU80SEVNNmRyK0pxM1IxQTVyRXpOU0JKL1RrTjMKRGZEK1RKeFRRcDYzdXBlMUFKN0U2S3kvOTc3YyttUC90eFcyRUxiQVd6T08veFJ0MDdWbFE1ZDJ3cFJjYzJWMAppNzNZdHhXTlM3czZnbkI3cHlLaERIdjNFbGlWRDEyNFJyNGhzeHllWWtJZW1mQnpVc2JPR0FrOU9Fams4Q2t4CnJzbTI4Ym8rRWpxM1cyZlRBc05sY1ltRE01anNnUE9CQ2wwOThSdzkxOTgxbWZ5aXdFT3ZTRzdtbEtnNGppdDQKd1JqQWExRzFsaUVzCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  capabilities:
    loadBalancerCapabilities: {}
  capacity:
    cpu: '2'
    memory: 3943364Ki
    pods: '17'
  conditions:
    - lastUpdateTime: ''
      status: 'True'
      type: Pending
    - lastUpdateTime: '2023-10-18T14:01:51Z'
      status: 'True'
      type: Provisioned
    - lastUpdateTime: '2023-10-18T14:05:44Z'
      status: 'True'
      type: Waiting
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: BackingNamespaceCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: DefaultProjectCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: SystemProjectCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: InitialRolesPopulated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: CreatorMadeOwner
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: NoDiskPressure
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: NoMemoryPressure
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: SecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: ServiceAccountSecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: RKESecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: ACISecretsMigrated
    - lastUpdateTime: '2023-10-18T14:05:37Z'
      status: 'True'
      type: Connected
    - lastUpdateTime: '2023-10-18T14:15:26Z'
      message: >-
        controller.FailureMessage{ClusterName:"pvala-eks-sync-regular",
        Message_:"Unsupported Kubernetes minor version update from 1.26 to
        1.25", NodegroupName:""}
      status: 'False'
      type: Updated
    - lastUpdateTime: '2023-10-18T14:05:44Z'
      status: 'True'
      type: Ready
    - lastUpdateTime: '2023-10-18T14:04:45Z'
      status: 'True'
      type: GlobalAdminsSynced
    - lastUpdateTime: '2023-10-18T14:04:45Z'
      status: 'True'
      type: SystemAccountCreated
    - lastUpdateTime: '2023-10-18T14:04:48Z'
      status: 'True'
      type: AgentDeployed
  driver: EKS
  eksStatus:
    generatedNodeRole: >-
      arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
    managedLaunchTemplateID: lt-0a8341e09aabf99c4
    managedLaunchTemplateVersions:
      dp: '2'
    privateRequiresTunnel: null
    securityGroups: null
    subnets:
      - subnet-08d337145e51d3a3a
      - subnet-0813d93b5f49f4648
    upstreamSpec:
      amazonCredentialSecret: cattle-global-data:cc-mkpjd
      displayName: pvala-eks-sync-regular
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.26'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 2
          minSize: 1
          nodeRole: >-
            arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
          nodegroupName: dp
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - subnet-08d337145e51d3a3a
            - subnet-0813d93b5f49f4648
          tags: {}
          userData: null
          version: '1.25'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: ap-south-1
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    virtualNetwork: vpc-0e0eac4e0f5e4a67c
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: '0'
    memory: 340Mi
    pods: '0'
  linuxWorkerCount: 1
  nodeCount: 1
  provider: eks
  requested:
    cpu: 325m
    memory: 140Mi
    pods: '7'
  serviceAccountTokenSecret: cluster-serviceaccounttoken-5jsjw
  version:
    buildDate: '2023-09-18T22:09:26Z'
    compiler: gc
    gitCommit: 2e0997b730999ec918be56b8ed945394905673c5
    gitTreeState: clean
    gitVersion: v1.26.9-eks-f8587cb
    goVersion: go1.20.8
    major: '1'
    minor: 26+
    platform: linux/amd64

Screenshot from 2023-10-18 19-54-26

(SURE-5550) Cannot create new node group in EKS cluster

Issue description:

The customer has a downstream EKS cluster they created through Rancher. They were previously able to add two node groups to this EKS cluster from the Rancher UI. But when they try to add another node group from the UI they are running into this error:

Controller.FailureMessage{ClusterName:"", Message_:"You do not have access to a default security group in VPC vpc-3bbc495c. Specify a security group, and try again.", NodegroupName:""}

I saw this GH issue and asked the customer if they are using their own launch templates. They said they are using the Rancher managed launch template, just like they did with the other node groups they were able to add previously.

Unable to upgrade NodeGroup k8s version from Rancher

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc3

Rancher is installed using Helm.

Steps to Reproduce:

  1. Create/Import an EKS cluster.
  2. Upgrade the ControlPlane K8s version either from Rancher or AWS Console, does not matter.
  3. Upgrade the NodeGroup K8s version from Rancher

Actual Results:
Rancher does not do anything. It does not go into Updating state and NodeGroups are not upgraded.

Expected Results:
Rancher must be able to upgrade the NodeGroups K8s version.

EKS: Rancher managed launch template information not being displayed for Nodegroup

Setup

Rancher version: v2.7.7-rc4
Browser: Chrome Latest

Describe the bug
Rancher managed launch template information not being displayed for Nodegroup. The same is being displayed in case of Custom Launch template.

To Reproduce

  • Start to create downstream EKS cluster
  • Can keep all the settings default and create Nodegroup using Rancher managed launch template (default behavior)
  • Wait for the cluster to be Active
  • Edit cluster and check Launch template information for the above created Nodegroup

Expected Result
The Nodegroup must display the launch template information as "Rancher Managed Launch Template" and the appropriate template version, so that the user is able to edit Nodegroup details
(check v2.7.5 screenshot below)

Screenshots
Rancher managed LT:
image

Custom LT:
image

Rancher managed LT (v2.7.5):
image

Found while validating rancher/rancher#42496

cc: @rancher/highlander

[SURE-5880] Can not deploy 1.23 eks clusters

Issue description:

Cannot provision or upgrade to 1.23+ with EKS clusters even though these versions are listed as supported

Business impact:

Scheduled upgrades of many environments (including prod) are blocked

Troubleshooting steps:

Reproduced
Checked the eks kontainer-engine in 2.6.10: https://sourcegraph.com/github.com/rancher/[email protected]/-/blob/pkg/kontainer-engine/drivers/eks/eks_driver.go?L46&subtree=true

Repro steps:

Attempt to deploy a 1.23 EKS cluster using Rancher 2.6.10 and region us-east-1 (though I believe the behavior is the same for other regions)

Alternatively

Deploy 1.22 EKS cluster
Attempt to upgrade to 1.23

Workaround:

Is a workaround available and implemented? no

We believe if we capture the API request for a 1.22 upgrade and replace the k8s version we should be able to push a 1.23 upgrade, but we need to test before proposing to Pfizer

Actual behavior:

Cannot deploy or upgrade to 1.23

Expected behavior:

As 1.23 is listed as supported in the support matrix, in the driver's manifest, and available for the region in EKS, we expect to be able to deploy and upgrade to it in Rancher UI

EKS on Rancher does not go into updating state when a change is made from cloud console

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc4

Steps to Reproduce:

  1. Import/Create an EKS cluster
  2. From AWS Console, add/delete nodegroup or upgrade the k8s version.
  3. Check the EKS cluster on Rancher

Actual Results:
The EKS cluster on Rancher remains Active while the cluster is updating in the AWS console. After the update completes in the AWS console, waiting a few minutes and refreshing Rancher makes the new changes visible.

Expected Results:
EKS cluster on Rancher must go into Updating state while the cluster is updating in AWS console and become Active once all the updates have been made.

Launch templates are not cleaned-up on cluster deletion

If I create an EKS cluster using Rancher and then later delete it, the launch templates that were created are not deleted.

This causes issues if you try to create a cluster with the same name. Also, if you don't clean up old launch templates you will eventually run into the AWS launch template limit.
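Until the operator cleans these up, the leftover templates have to be removed out of band. Below is a minimal sketch using aws-sdk-go, assuming you already know the Rancher-managed template ID (for example, the value recorded under the cluster's eksStatus.managedLaunchTemplateID); the ID and region here are placeholders.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)

	// Placeholder: the launch template ID Rancher recorded for the deleted cluster.
	templateID := "lt-0123456789abcdef0"

	// Delete the orphaned Rancher-managed launch template.
	if _, err := svc.DeleteLaunchTemplate(&ec2.DeleteLaunchTemplateInput{
		LaunchTemplateId: aws.String(templateID),
	}); err != nil {
		log.Fatalf("deleting launch template %s: %v", templateID, err)
	}
	log.Printf("deleted leftover launch template %s", templateID)
}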

Creating Node groups with same name must be disallowed

Rancher version:

v2.7-2e50d9f2f8725dccb40f28c8be2d8c2bb99fa59a-head
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Creation of node groups with the same name is currently allowed. The number of nodes is set to the value provided during the new node group's creation.

Steps

  • Provision EKS cluster with one node group having x number of nodes
  • Edit cluster and add new node group with same name and ~x number of nodes

Expected Result
Creating Node groups with same name must be disallowed

Found while validating rancher/rancher#42496

(SURE-4066) EKS cluster via terraform fails when using a custom AMI in a launch template

SURE-4066

Issue description:

When using a custom launch template with an AMI ID, the creation of the cluster via terraform will fail with an AWS exception, due to an amiType being specified in the CreateNodegroup call [2].

The eks-operator appears to default to specifying an amiType when image_id is null [1], as would be the case when the AMI is specified in the launch template.

In the Rancher UI, a describe of the launch template is done first, and the image_id for the cluster is pre-populated as a convenience. When using terraform this requires manual AWS provider actions, but is also unexpected.

Another approach could be to perform this describe in the eks-operator itself, so that the correct CreateNodegroup call can be made, i.e. if the launch template defines an AMI, don't specify an amiType in the call.
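A rough sketch of what that check could look like with aws-sdk-go is shown below. This illustrates the suggested behavior rather than the operator's current code: describe the referenced launch template version, and only pass an amiType of CUSTOM when the template defines its own ImageId, otherwise leave amiType unset.

package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// resolveAMIType decides what amiType to send in CreateNodegroup. If the launch
// template version already defines an ImageId, EKS only accepts "CUSTOM";
// otherwise amiType is left nil so the EKS default applies.
// Illustrative helper only, not code from the eks-operator.
func resolveAMIType(ec2Svc *ec2.EC2, ltID, ltVersion string) (*string, error) {
	out, err := ec2Svc.DescribeLaunchTemplateVersions(&ec2.DescribeLaunchTemplateVersionsInput{
		LaunchTemplateId: aws.String(ltID),
		Versions:         []*string{aws.String(ltVersion)},
	})
	if err != nil {
		return nil, err
	}
	if len(out.LaunchTemplateVersions) > 0 &&
		out.LaunchTemplateVersions[0].LaunchTemplateData != nil &&
		out.LaunchTemplateVersions[0].LaunchTemplateData.ImageId != nil {
		// The launch template brings its own AMI: only CUSTOM is valid here.
		return aws.String("CUSTOM"), nil
	}
	// No AMI in the template: let the EKS default (e.g. AL2_x86_64) apply.
	return nil, nil
}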

Repro steps:

Create a launch template and specify an AMI in it (the customer also specifies userdata and a security group, however this should not be necessary).
Create the EKS cluster using terraform (example attached) using the launch template ID.

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Specify image_id also for the eks_config_v2 to match what is in the launch template (this is a convenience the Rancher UI adds). In my interpretation, the value of image_id doesn't appear to matter, as the launch template takes precedence [3].

Actual behavior:

Cluster fails to provision, causing an AWS API exception

Expected behavior:

Cluster is created as in the Rancher UI

Files, logs, traces:

You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template

[1] https://github.com/rancher/eks-operator/blob/master/controller/nodegroup.go#L242-L248

[2] https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html#AmazonEKS-CreateNodegroup-request-amiType

[3] https://github.com/rancher/eks-operator/blob/master/controller/eks-cluster-config-handler.go#L867

(SURE-5616) Intermittently imports of EKS clusters never finish/or finish with Error

See also rancher/terraform-provider-rancher2#1003

Issue description:

Sometimes importing an EKS cluster finishes in seconds; other times it never completes (saying "Still creating..." for 30 min and then timing out), even though the cluster is active in the Rancher instance.

Business impact:

Not being able to import EKS clusters.

Troubleshooting steps:

Importing EKS cluster to Rancher v2 using this module:
importing-eks-cluster-to-rancherv2-terraform

The code support used:

╰─$ cat import_eks.tf provider.tf rancher2.tf vars.tf
resource "rancher2_cluster" "eks-national-instruments-two" {
  name        = "eks-national-instruments-two"
  description = "eks-national-instruments-two"
  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.aws.id
    name                = var.aws_eks_name
    region              = var.aws_region
    imported            = true
  }
}
terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = ">= 1.21.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.10.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_url
  token_key = var.rancher2_token_key
  bootstrap = false
}

resource "rancher2_cloud_credential" "aws" {
  name = "aws"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}
variable "rancher2_token_key" {
  default = "token-XXXX:XXXXXX"
}

variable "aws_access_key" {
  default = "my-aws-access-key"
}

variable "aws_secret_key" {
  default = "my-aws-secret-key"
}

variable "rancher_url" {
  default = "https://tadeulatest.support.rancher.space/"
}

variable "aws_eks_name" {
  default = "eks-national-instruments-two"
}

variable "aws_region" {
  default = "sa-east-1"
}

variable "aws_eks_service_role" {
  default = "eks-worker-node"
}

Repro steps:

╰─$ terraform init
╰─$ terraform plan --out=plan
╰─$ terraform apply "plan"

Workaround:

Is a workaround available and implemented? yes

What is the workaround:
Only destroying and applying TF again fixes the issue, and then the import is successful.

Actual behavior:

It took 30 minutes to finish terraform apply "plan" and it ended with an error.

rancher2_cluster.eks-national-instruments-two: Still creating... [29m30s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [29m40s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [29m50s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [30m0s elapsed]
╷
│ Error: [ERROR] waiting for cluster (c-zgnjt) to be created: timeout while waiting for state to become 'pending' (last state: 'active', timeout: 30m0s)
│
│   with rancher2_cluster.eks-national-instruments-two,
│   on import_eks.tf line 1, in resource "rancher2_cluster" "eks-national-instruments-two":
│    1: resource "rancher2_cluster" "eks-national-instruments-two" {
│
╵

Expected behavior:

Import the EKS cluster without timeout and errors.

Additional notes:

$ terraform version
Terraform v1.3.4

New imported EKS cluster is not accessible from Cluster Management tab

SURE-5920

Issue description: Access to a newly created imported EKS downstream cluster from Cluster Management ==> Clusters ==> "Cluster Name" hangs on loading.

Access to: https:///dashboard/c/_/manager/provisioning.cattle.io.cluster/fleet-default/ fails.

Access to: https:///dashboard/c/cluster-id/explorer works.

The web browser dev tools show: "Wait for The cluster to become available. It's possible the cluster was created. We suggest checking the clusters page before trying to create another. done immediately."

Troubleshooting steps:

The EKS DS cluster runs in the same VPC as Rancher. From ☰ > EXPLORE CLUSTER, access to the cluster works fine. The cluster runs fine; some applications have been deployed with Fleet. (CF access-explorecluster-ok)

The access to the cluster from ☰ > Cluster Management is stuck on loading (CF image1-notloading).

The error in the web browser dev tools is: "TypeError: Cannot read properties of null (reading 'filter') at f.showEskNodeGroupWarning" (CF image2-error-clustermanagement)

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Edit the cluster's configuration and perform a minor change; for instance, add a user. The socket connection is then established.

Actual behavior:

EKS cluster is not accessible from Cluster Management.

Expected behavior:

Access to the DS cluster should be working.

Files, logs, traces:

$ curl -s -i -N \
> --http1.1 \
> -H "Connection: Upgrade" \
> -H "Upgrade: websocket" \
> -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
> -H "Sec-WebSocket-Version: 13" \
> -H "Authorization: Bearer $TOKEN" \
> -H "Host: $FQDN" \
> -k https://$fqdn/v3/subscribe
HTTP/1.1 101 Switching Protocols

Date: Tue, 14 Feb 2023 18:09:07 GMT
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Accept: qGEgH3En71di5rrssAZTmtRTyFk= �{"name":"ping","data":{}}�{"name":"ping","data":{}}�{"name":"ping","data":{}}^C

Error in logs while performing Nodegroup addition

Rancher version:

Rancher version: v2.7.7-rc4
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Errors appear in the logs while performing a Nodegroup addition. However, the operations complete successfully.

time="2023-08-25T12:28:33Z" level=error msg="Error removing metadata from failure message: message body not formatted as expected"
time="2023-08-25T12:28:33Z" level=error msg="Error recording ekscc [] failure message: resource name may not be empty"

time="2023-08-25T12:28:33Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:28:34Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:29:04Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:29:34Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:30:05Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:30:35Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:31:06Z" level=info msg="cluster [c-kc8p8] finished updating"

Found while validating rancher/rancher#42496

[ENHANCEMENT] - Add tagging of Volumes when EKS provisions cluster

What kind of request is this (question/bug/enhancement/feature request):
Feature enhancement

Steps to reproduce (least amount of steps as possible):
Set Tag Enforcement on EC2 volume resource, provision EKS cluster with RMC filling out (all) tags as required.

Result:
RMC is not adding the volume tag, so the cluster fails to come up with the following error:

{
  "allowed": false,
  "explicitDeny": true,
  "matchedStatements": {
    "items": [
      {
        "statementId": "ScpEnforceTagsApp",
        "effect": "DENY",
        "principalGroups": {
          "items": []
        },
        "actions": {
          "items": [
            {
              "value": "ec2:RunInstances"
            }
          ]
        },
        "resources": {
          "items": [
            {
              "value": "arn:aws:ec2:::instance/"
            },
            {
              "value": "arn:aws:ec2:::volume/"
            }
          ]
        },

Other details that may be helpful:
It would be useful to get a widget similar to how AWS allows tag application in the launch template somewhere in the RMC 'wizard' for EKS creation. A single webpage module that allows me to create the key:value tag and then asks me what this tag applies to (three fields, two text input, last as a multiselect). This would consolidate the number of times I repeat the same tags throughout the wizard.
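For reference, EC2 launch templates can already carry tag specifications for volumes as well as instances, which is roughly the mechanism this enhancement would need the EKS provisioning flow to populate. A small aws-sdk-go sketch follows; the template name and tags are made-up examples.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)

	// Example tags; a tag-enforcement SCP on ec2:RunInstances is satisfied only
	// if both the instance and its volumes are tagged at launch.
	tags := []*ec2.Tag{{Key: aws.String("Environment"), Value: aws.String("dev")}}

	_, err := svc.CreateLaunchTemplate(&ec2.CreateLaunchTemplateInput{
		LaunchTemplateName: aws.String("example-eks-nodegroup-lt"),
		LaunchTemplateData: &ec2.RequestLaunchTemplateData{
			TagSpecifications: []*ec2.LaunchTemplateTagSpecificationRequest{
				{ResourceType: aws.String("instance"), Tags: tags},
				{ResourceType: aws.String("volume"), Tags: tags},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}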

Environment information

  • Rancher version RMC 2.5.7
  • Installation option (single install/HA): Single

Cluster information

  • Cluster type: Cloud / AWS managed
  • Machine type Ec2 / EKS managed
  • Kubernetes version 1.18

gz#16840

Create a setup for running E2E tests

Create a setup for running E2E tests; this includes the following things:

[Feature] K8s 1.27 support

issue: rancher/rancher#41840

  • k8s deps bump to 0.27.x required.
  • need to use latest 1.5.0 rc tag available for rke
  • other dependencies
github.com/rancher/wrangler v1.1.1-0.20230831050635-df1bd5aae9df
github.com/rancher/lasso v0.0.0-20230830164424-d684fdeb6f29
github.com/rancher/fleet/pkg/apis v0.0.0-20230901075223-437edb7091f5

  • after that, the new changes need to be vendored in rancher

PR's:

(SURE-5259) EKS Provision Failure w/ Rancher2 Terraform Provider

Seems very similar to SURE-4066

Issue description:

Can't provision an EKS downstream cluster using the Rancher2 TF provider. This is the error we see in the Rancher UI on the cluster as well as inside the eks-config-operator pod logs (it is spammed continuously):

time="2022-09-13T23:43:46Z" level=error msg="error syncing 'cattle-global-data/c-ksl6p': handler eks-controller: InvalidParameterException: Launch template details can't be null for Custom ami type node group\n{\n ย RespMetadata: {\n ย  ย StatusCode: 400,\n ย  ย RequestID: \"eca6e2f1-42d5-411f-b2fc-716404d09d13\"\n ย },\n ย Message_: \"Launch template details can't be null for Custom ami type node group\"\n}, requeuing" 

CloudTrail on the AWS backend is reporting this error for the EventType: UpdateNodegroupVersion

Error code

InvalidParameterException

Event Record.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDA4SBL6SADYCHEP64QO",
        "arn": "arn:aws:iam::863380606983:user/srvamr-btcsapid",
        "accountId": "1111111111111",
        "accessKeyId": "XXXXXXXXXXXXXXXXXXXXXXXX",
        "userName": "srvamr-btcsapid"
    },
    "eventTime": "2022-09-02T19:42:50Z",
    "eventSource": "eks.amazonaws.com",
    "eventName": "UpdateNodegroupVersion",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "148.168.40.5",
    "userAgent": "aws-sdk-go/1.36.7 (go1.16.4; linux; amd64)",
    "errorCode": "InvalidParameterException",
    "requestParameters": {
        "nodegroupName": "pdcs-dev1d-harim-v2-090122-ng1",
        "clientRequestToken": "D9CB6CAB-3459-4E09-89F0-DE4CF3BB6CAE",
        "name": "pdcs-dev1d-harim-v2-090122",
        "version": "1.21"
    },
    "responseElements": {
        "message": "Launch template details can't be null for Custom ami type node group"
    },
    "requestID": "9b15e35e-8349-4f9f-9586-62e94e253308",
    "eventID": "6bbdb28c-3b56-4011-90a6-27f791fdf035",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "863380606983",
    "eventCategory": "Management"
}

The payload for UpdateNodegroupVersion is expected to include LaunchTemplate details, which this payload is missing.

see: https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateNodegroupVersion.html

The user is using a Launch Template with a custom AMI. It appears that the problem might be because the payload is adding "version": "1.21" as you can see above. The AWS documentation says: "If you specify launchTemplate, and your launch template uses a custom AMI, then don't specify releaseVersion, or the node group update will fail."

So we are wondering if this is the problem here. If so, why is the version being supplied in the payload when it shouldn't be? The user said that when he manually uses the following payload/method with the AWS SDK then it works fine:

response = eks.update_nodegroup_version(
  clusterName = cluster_name,
  nodegroupName = nodeGroup,
  launchTemplate = {
    'name': launchTemplateName,
  },
  force=True
)
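For comparison, the same call made through aws-sdk-go (the SDK shown in the CloudTrail user agent) would look roughly like the sketch below: the launch template reference is passed and Version/ReleaseVersion are omitted entirely, which is what the quoted AWS documentation requires for launch templates that use a custom AMI. The cluster, node group, and template names are placeholders.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/eks"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := eks.New(sess)

	// For a node group whose launch template carries a custom AMI, pass only
	// the launch template reference and omit Version/ReleaseVersion.
	_, err := svc.UpdateNodegroupVersion(&eks.UpdateNodegroupVersionInput{
		ClusterName:   aws.String("example-cluster"),
		NodegroupName: aws.String("example-nodegroup"),
		LaunchTemplate: &eks.LaunchTemplateSpecification{
			Name: aws.String("example-launch-template"),
		},
		Force: aws.Bool(true),
	})
	if err != nil {
		log.Fatal(err)
	}
}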

The user said this was working fine 3 weeks ago. Did something change with the AWS SDK or how we handle it? We did notice this, but we are not sure if it is related:
#72

When we looked at the user's AWS console, the EKS cluster was healthy, the Node Groups were created and healthy as well.

Business impact:

This is a blocker for them as they are not able to get their automation to work

Troubleshooting steps:

I tried to reproduce the problem in-house. I can get a vanilla EKS cluster to work fine. I am currently trying to test it with Launch Templates and a custom AMI like the user, but I am running into some other issues (AWS permissions which I'm working to get resolved). Once my permissions in AWS get fixed, I'm hoping I can reproduce the problem.

Workaround:

Is a workaround available and implemented? yes/no
What is the workaround:

Actual behavior:

EKS cluster with Launch Template and custom AMI is not provisioned successfully

Expected behavior:

EKS cluster with Launch Template and custom AMI should be provisioned successfully

You cannot specify an AMI Type other than CUSTOM error when creating EKS cluster

Rancher Version: v2.7.2
EKS Cluster Version: 1.24

When attempting to create (either within Rancher or using the Terraform rancher provider) an EKS version 1.24 cluster using a launch template in which an AMI is defined, I am getting the error below. When using a launch template where an AMI is not defined, the launch template works correctly and the node group gets created.

Waiting for API to be available:controller.FailureMessage{ClusterName:"rm-eks-03", Message_:"You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template.", NodegroupName:"rancher_archbox_node_group_amz_linux"}

Using the same launch template version I was able to create a node group using the AWS console (outside of Rancher) but with Rancher I got the error above. This leads me to believe that there is a bug within the EKS operator.

The launch template that is causing this error is.

{
    "LaunchTemplateVersions": [
        {
            "LaunchTemplateId": "lt-XXXXXXXXXXXXXXXXXX",
            "LaunchTemplateName": "rancher_archbox_node_group_amz_linux_lt",
            "VersionNumber": 7,
            "CreateTime": "2023-05-01T17:01:22.000Z",
            "CreatedBy": "XXXXXXXXXXXXXXXXXXXX",
            "DefaultVersion": false,
            "LaunchTemplateData": {
                "BlockDeviceMappings": [
                    {
                        "DeviceName": "/dev/xvda",
                        "Ebs": {
                            "VolumeSize": 200,
                            "VolumeType": "gp2"
                        }
                    }
                ],
                "ImageId": "ami-07bccaac087171156",
                "InstanceType": "t3.xlarge",
                "KeyName": "rancher-managed-cluster-ssh-key",
                "UserData": "TUlNRS1WZXJzaW9uOiAxLjAKQ2XXXXXXXXXXXXXXXX1ZQk9VTkRBUlk9PSIKCi0tPT1NWUJPVU5EQVJZPT0KQ29udGVudC1UeXBlOiB0ZXh0L3gtc2hlbGxzY3JpcHQ7IGNoYXJzZXQ9InVzLWFzY2lpIgoKIyEvYmluL2Jhc2gKL2V0Yy9la3MvYm9vdHN0cmFwLnNoIC0tYXBpc2VydmVyLWVuZHBvaW50ICdodHRwczovL0VDMUU5NEREODVEREI4OTc1RjFCOEQ0NEM3NkU2NDA0LmdyNy51cy13ZXN0LTIuZWtzLmFtYXpvbmF3cy5jb20nIC0tYjY0LWNsdXN0ZXItY2EgJ0xTMHRMUzFDUlVkSlRpQkRSVkpVU1VaSlEwRlVSUzB0TFMwdENrMUpTVU12YWtORFFXVmhaMEYzU1VKQlowbENRVVJCVGtKbmEzRm9hMmxIT1hjd1FrRlJjMFpCUkVGV1RWSk5kMFZSV1VSV1VWRkVSWGR3Y21SWFNtd0tZMjAxYkdSSFZucE5RalJZUkZSSmVrMUVVWGxOVkVVeVRWUkJkMDVHYjXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDFjMG8xVkN0bFV6QXJNMlkzYzBsbmQzTmFOMVJDVWxVMFdubEtTREJqUTJFM2VtUmFOa281VGtjd1dEaHdTa3BsQ2pWRWRHbHNPR1pOUjBjdk1YQllSbEJLY1c4dk1HZFZVVlZuWm5oWlZqSXdOWEF2VkRZMFdsZGFiMU1yVm10RWFUVkZWbGxMVlRkSFN6WXdPSFV3TlVnS1ZVRlRXVmd6YlhKMk1ub3lNVUZGY0hCR1dHNUtRbWxzTmpWQlRtNVVUbUpxWm5CU2VsZFNUSGhQU1V4R1ZEUndLMFJsV1hneFR5OVNRVTVXXXXXXXXXXXXXXXXXXXXXXXXY2tWWVVrdFhXVkpoY1hSVk9VdFFjQ3RvY3pNM1pFUXJSMFEyYUZKRVYzaFVVRnBHUzFNeFdTOTJDa0p2VEM5VmNYVjFOMUJ5YUZsNWNWQm1WRTFGTm1aa2RubG1WMFpxYm5saFVUQktkall3TlRsVWExZG9Xa2REVVhoMmNqSk1PSGxCVnpOWmVXUjJWa3dLUXpGUlBRb3RMUzB0TFVWT1JDQkRSVkpVU1VaSlEwRlVSUzB0TFMwdENnPT0nICdybS1la3MtMDEnCgotLT09TVlCT1VOREFSWT09LS1c",
                "TagSpecifications": [
                    {
                        "ResourceType": "instance",
                        "Tags": [
                            {
                                "Key": "Service",
                                "Value": "rancher"
                            },
                            {
                                "Key": "Managed",
                                "Value": "TF"
                            },
                            {
                                "Key": "Name",
                                "Value": "rancher_archbox_node_group_amz_linux_lt"
                            },
                            {
                                "Key": "Environment",
                                "Value": "dev"
                            },
                            {
                                "Key": "Owner",
                                "Value": "xxxxxxxxxxxx"
                            }
                        ]
                    }
                ],
                "SecurityGroupIds": [
                    "sg-XXXXXXXXXXXXXXXXX"
                ]
            }
        }
    ]
}

The AMI ami-07bccaac087171156 is amazon-eks-node-1.24-v20230411.

My assumption is that when Rancher makes the API call to AWS, it is setting the nodegroup AMI type to something other than CUSTOM, https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType. When I don't specify an AMI in the launch template, it looks like Rancher sets the AMI type to AL2_x86_64.

Hosted Rancher migration rollback process not working - "Cluster agent is not connected"

SURE-6154

Issue description:

Rancher Hosted Prime currently offers customers the ability to migrate their existing self-hosted Rancher Manager to our hosted environment. The Backup operator is used to take a customer backup and restore it into a new hosted environment. During this process, the server name needs to be updated to a "rancher.cloud" URL and all downstream clusters need to connect to the new URL. In the event something goes wrong, we need the ability to roll back any changes made to the downstream clusters so that they point back to the customer's self-managed Rancher Manager. This rollback process, which worked in older versions (v2.5.x and earlier), seems to no longer work in v2.6.9.

Business impact:

High risk when migrating customers to Hosted; the possibility that there is no rollback may deter or prevent customers from migrating to Hosted.

Troubleshooting steps:

In addition to trying to run the "kubectl apply -f ..." command to reconfigure the cattle-cluster-agent, we also tried doing a "kubectl delete -f ..." as well as manually removing the cattle-system, cattle-fleet-system, and cattle-impersonation-system namespaces. Tried restarting the rancher and cluster agent pods. Turned on debug logging.

Repro steps:

  1. Set up a new environment (v2.6.9) or use an existing one with a provisioned downstream EKS cluster. Used v1.22.17.
  2. Take a backup of the environment using the Rancher Backup operator (install it if it hasn't already been installed).
  3. Set up a second Rancher Manager environment running 2.6.9; this should have a different server name URL. Don't run Rancher (yet).
  4. Restore the backup taken from the first environment.
  5. Update the server-name setting to the name of your second environment.
  6. Install Rancher Manager on the second environment.
  7. Run the cluster registration command on the EKS cluster obtained from the second environment's UI. This should reconfigure the downstream cluster to talk to the second environment instead of the first environment. The first environment should show that the cluster agent is not connected and the second environment should show the cluster as healthy.
  8. Rollback - Run the cluster registration command on the EKS cluster obtained from the first environment's UI. This should reconfigure the cluster to point back to the original Rancher Manager environment. However, this does not appear to work and the cluster remains in a disconnected state.

Workaround:

Is workararound available and implemented? unknown
What is the workaround:

Actual behavior:

After attempting rollback, the original Rancher Manager environment shows "Cluster agent is not connected" error for the downstream cluster:

Expected behavior:
After running the kubectl apply -f to reconfigure the cattle-cluster-agent, we expect the agent to reconnect to the original Rancher Manager server and not show any errors. If a new process is needed to roll back a cluster, we need to know what that new process is.

Add support for AWS EKS custom vpc cni due to IP Exhaustion

Is your feature request related to a problem? Please describe.

There is a well-known issue with many clusters or large clusters utilizing AWS EKS as a downstream cluster, where every node adds many secondary routable IP addresses, and before you know it you have exhausted your VPC routable subnets.

There are a few different options that AWS offers and one of those options is "custom networking":
https://aws.github.io/aws-eks-best-practices/networking/custom-networking/

This uses a set of subnets in non-routable CG-NAT space (i.e. 100.64.0.0/10). It is NOT applied to the EKS cluster itself, but rather is an aws-node configuration that has to be updated at provisioning time in order to avoid post-provision node rotation.

The request is to add support for the VPC Custom CNI.

Describe the solution you'd like

The request is to add support for creating EKS clusters through the Rancher provider using the EKS cluster add-ons, as shown in this example from the AWS Terraform provider:
https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

# Excerpt - in the referenced example these arguments are set on the EKS module.
cluster_addons = {
  coredns    = {}
  kube-proxy = {}
  vpc-cni = {
    # Specify the VPC CNI addon should be deployed before compute to ensure
    # the addon is configured before data plane compute resources are created
    # See README for further details
    before_compute = true
    most_recent    = true # To ensure access to the latest settings provided
    configuration_values = jsonencode({
      env = {
        # Reference https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking
        AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
        ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

        # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
        ENABLE_PREFIX_DELEGATION = "true"
        WARM_PREFIX_TARGET       = "1"
      }
    })
  }
}

resource "kubectl_manifest" "eni_config" {
  for_each = zipmap(local.azs, slice(module.vpc.private_subnets, 3, 6))

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [
        module.eks.cluster_primary_security_group_id,
        module.eks.node_security_group_id,
      ]
      subnet = each.value
    }
  })
}

As you will notice, the subnets are NOT passed in at cluster creation time in the above AWS example (the private subnets list is sliced and the non-routable subnets are applied via ENIConfig objects as a post-cluster step), so using the existing Rancher eks_config_v2 will not work as-is without some changes to it.

Describe alternatives you've considered

I have done a manual POC of the steps described in this Terraform and I am currently considering implementing the code outlined in this blog:
https://medium.com/webstep/dont-let-your-eks-clusters-eat-up-all-your-ip-addresses-1519614e9daa

Since this works in a manual setup, I believe it will work as a "post-processing" set of Terraform; however, it is really inefficient to have to "roll" every node at the end of provisioning in order for this to work (I confirmed this is needed at the end of my manual POC).
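
A minimal sketch of those manual custom-networking steps, assuming placeholder subnet/security-group IDs and the us-east-2a zone name; it mirrors the AWS guidance rather than anything Rancher does today:

# Enable custom networking on the VPC CNI and tell it how to find ENIConfigs.
kubectl -n kube-system set env daemonset aws-node AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl -n kube-system set env daemonset aws-node ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# One ENIConfig per availability zone, pointing at a non-routable (100.64.0.0/10) subnet.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-2a
spec:
  securityGroups:
    - sg-0123456789abcdef0            # placeholder node security group
  subnet: subnet-0123456789abcdef0    # placeholder non-routable subnet
EOF

# Existing nodes then have to be recycled ("rolled") so new pods pick up IPs
# from the non-routable subnets - the inefficiency called out above.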

Additional context

I have met with AWS regarding this as well. Using this custom-networking solution was one of the easier recommended options, and they provided the link to their Terraform example as the source.

Cluster not editable while nodegroup version upgrade is in progress

Rancher version:

v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
eks-operator:v1.3.0-rc4

Describe the bug
Cluster not editable while sequential nodegroup version upgrade is in progress
Found while validating #209

Steps

  • Provision an EKS cluster with a version lower than the highest supported one (e.g. 1.26), with two node groups: test1 & test2
  • Upgrade the EKS cluster to a higher version (at this step, only the control plane is upgraded) and the UI displays - "A new cluster version has been selected. Once completed you may come back and upgrade the node version."
  • Of the two node groups, upgrade only test1 to the control-plane version
  • The cluster goes into the Updating state and node group information is not visible/editable (the cluster YAML at this point is included below; a kubectl sketch for inspecting this state follows the list)
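
A hedged sketch for inspecting the node group versions directly on the management Cluster object while the UI reports the cluster as Updating (the cluster name is taken from the YAML below):

kubectl get clusters.management.cattle.io c-sp94j \
  -o jsonpath='{range .spec.eksConfig.nodeGroups[*]}{.nodegroupName}{"\t"}{.version}{"\n"}{end}'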

Screenshots
(screenshots omitted)

Cluster YAML:

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    clusters.management.cattle.io/ke-last-refresh: '1697814813'
    field.cattle.io/creatorId: user-blvw4
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: 1.27.6-eks-f8587cb
  creationTimestamp: '2023-10-20T11:41:55Z'
  finalizers:
    - wrangler.cattle.io/mgmt-cluster-remove
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 171
  labels:
    cattle.io/creator: norman
    provider.cattle.io: eks
  managedFields:
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/creatorId: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cattle.io/creator: {}
        f:spec:
          .: {}
          f:displayName: {}
          f:dockerRootDir: {}
          f:eksConfig:
            .: {}
            f:amazonCredentialSecret: {}
            f:displayName: {}
            f:imported: {}
            f:kmsKey: {}
            f:kubernetesVersion: {}
            f:loggingTypes: {}
            f:privateAccess: {}
            f:publicAccess: {}
            f:region: {}
            f:secretsEncryption: {}
            f:securityGroups: {}
            f:serviceRole: {}
            f:subnets: {}
            f:tags: {}
          f:enableClusterAlerting: {}
          f:enableClusterMonitoring: {}
          f:enableNetworkPolicy: {}
          f:internal: {}
          f:windowsPreferedCluster: {}
        f:status:
          .: {}
          f:appliedEnableNetworkPolicy: {}
      manager: Go-http-client
      operation: Update
      time: '2023-10-20T15:11:26Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:authz.management.cattle.io/creator-role-bindings: {}
            f:authz.management.cattle.io/initial-sync: {}
            f:clusters.management.cattle.io/ke-last-refresh: {}
            f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
            f:lifecycle.cattle.io/create.cluster-provisioner-controller: {}
            f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
            f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
            f:management.cattle.io/current-cluster-controllers-version: {}
          f:finalizers:
            .: {}
            v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
            v:"controller.cattle.io/cluster-provisioner-controller": {}
            v:"controller.cattle.io/cluster-scoped-gc": {}
            v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
            v:"wrangler.cattle.io/mgmt-cluster-remove": {}
          f:labels:
            f:provider.cattle.io: {}
        f:spec:
          f:agentImageOverride: {}
          f:answers: {}
          f:clusterSecrets: {}
          f:description: {}
          f:desiredAgentImage: {}
          f:desiredAuthImage: {}
          f:eksConfig:
            f:ebsCSIDriver: {}
            f:nodeGroups: {}
            f:publicAccessSources: {}
          f:fleetWorkspaceName: {}
          f:localClusterAuthEndpoint:
            .: {}
            f:enabled: {}
        f:status:
          f:agentFeatures:
            .: {}
            f:embedded-cluster-api: {}
            f:fleet: {}
            f:monitoringv1: {}
            f:multi-cluster-management: {}
            f:multi-cluster-management-agent: {}
            f:provisioningv2: {}
            f:rke2: {}
          f:agentImage: {}
          f:aksStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:rbacEnabled: {}
            f:upstreamSpec: {}
          f:allocatable:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:apiEndpoint: {}
          f:appliedAgentEnvVars: {}
          f:appliedPodSecurityPolicyTemplateId: {}
          f:appliedSpec:
            .: {}
            f:agentImageOverride: {}
            f:answers: {}
            f:clusterSecrets: {}
            f:description: {}
            f:desiredAgentImage: {}
            f:desiredAuthImage: {}
            f:displayName: {}
            f:eksConfig:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:enableClusterAlerting: {}
            f:enableClusterMonitoring: {}
            f:enableNetworkPolicy: {}
            f:internal: {}
            f:localClusterAuthEndpoint:
              .: {}
              f:enabled: {}
            f:windowsPreferedCluster: {}
          f:authImage: {}
          f:caCert: {}
          f:capabilities:
            .: {}
            f:loadBalancerCapabilities: {}
          f:capacity:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:conditions: {}
          f:driver: {}
          f:eksStatus:
            .: {}
            f:generatedNodeRole: {}
            f:managedLaunchTemplateID: {}
            f:managedLaunchTemplateVersions:
              .: {}
              f:test1: {}
              f:test2: {}
            f:privateRequiresTunnel: {}
            f:securityGroups: {}
            f:subnets: {}
            f:upstreamSpec:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:virtualNetwork: {}
          f:gkeStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:upstreamSpec: {}
          f:limits:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:linuxWorkerCount: {}
          f:nodeCount: {}
          f:provider: {}
          f:requested:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:serviceAccountTokenSecret: {}
          f:version:
            .: {}
            f:buildDate: {}
            f:compiler: {}
            f:gitCommit: {}
            f:gitTreeState: {}
            f:gitVersion: {}
            f:goVersion: {}
            f:major: {}
            f:minor: {}
            f:platform: {}
      manager: rancher
      operation: Update
      time: '2023-10-20T15:14:07Z'
  name: c-sp94j
  resourceVersion: '312219'
  uid: abd5460d-dfe1-438f-a865-69b740f6eeef
spec:
  agentImageOverride: ''
  answers: {}
  clusterSecrets: {}
  description: ''
  desiredAgentImage: ''
  desiredAuthImage: ''
  displayName: cpinjani-eks28
  dockerRootDir: /var/lib/docker
  eksConfig:
    amazonCredentialSecret: cattle-global-data:cc-rhnmv
    displayName: cpinjani-eks28
    ebsCSIDriver: null
    imported: false
    kmsKey: ''
    kubernetesVersion: '1.27'
    loggingTypes: []
    nodeGroups:
      - desiredSize: 1
        diskSize: 40
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.large
        labels: {}
        launchTemplate: null
        maxSize: 1
        minSize: 1
        nodeRole: >-
          arn:aws:iam::<REDACTED>
        nodegroupName: test1
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - <REDACTED>
        tags: {}
        userData: null
        version: null
      - desiredSize: 1
        diskSize: 20
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.medium
        labels: {}
        launchTemplate: null
        maxSize: 1
        minSize: 1
        nodeRole: >-
          arn:aws:iam::<REDACTED>
        nodegroupName: test2
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - <REDACTED>
        tags: {}
        userData: null
        version: '1.26'
    privateAccess: false
    publicAccess: true
    publicAccessSources:
      - 0.0.0.0/0
    region: us-east-2
    secretsEncryption: false
    securityGroups: []
    serviceRole: ''
    subnets: []
    tags: {}
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  windowsPreferedCluster: false
status:
  agentFeatures:
    embedded-cluster-api: false
    fleet: false
    monitoringv1: false
    multi-cluster-management: false
    multi-cluster-management-agent: true
    provisioningv2: false
    rke2: false
  agentImage: rancher/rancher-agent:v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: 7720m
    memory: 25184044Ki
    pods: '122'
  apiEndpoint: <REDACTED>
  appliedAgentEnvVars:
    - name: CATTLE_SERVER_VERSION
      value: v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
    - name: CATTLE_INSTALL_UUID
      value: f2e99488-620a-4edb-ac35-aa0d5247798f
    - name: CATTLE_INGRESS_IP_DOMAIN
      value: sslip.io
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ''
  appliedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets: {}
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: ''
    eksConfig:
      amazonCredentialSecret: cattle-global-data:cc-rhnmv
      displayName: cpinjani-eks28
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.27'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 40
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.large
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test1
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets:
            - <REDACTED>
          tags: {}
          userData: ''
          version: '1.27'
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test2
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets:
            - <REDACTED>
          tags: {}
          userData: ''
          version: '1.26'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: us-east-2
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ''
  caCert: >-
    <REDACTED>
  capabilities:
    loadBalancerCapabilities: {}
  capacity:
    cpu: '8'
    memory: 28012332Ki
    pods: '122'
  conditions:
    - lastUpdateTime: ''
      status: 'True'
      type: Pending
    - lastUpdateTime: '2023-10-20T11:51:13Z'
      status: 'True'
      type: Provisioned
    - lastUpdateTime: '2023-10-20T11:54:57Z'
      status: 'True'
      type: Waiting
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: BackingNamespaceCreated
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: DefaultProjectCreated
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: SystemProjectCreated
    - lastUpdateTime: '2023-10-20T11:41:56Z'
      status: 'True'
      type: InitialRolesPopulated
    - lastUpdateTime: '2023-10-20T11:41:56Z'
      status: 'True'
      type: CreatorMadeOwner
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: NoDiskPressure
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: NoMemoryPressure
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: SecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: ServiceAccountSecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: RKESecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: ACISecretsMigrated
    - lastUpdateTime: '2023-10-20T14:16:32Z'
      status: 'True'
      type: Connected
    - lastUpdateTime: '2023-10-20T15:13:34Z'
      status: Unknown
      type: Updated
    - lastUpdateTime: '2023-10-20T14:16:37Z'
      status: 'True'
      type: Ready
    - lastUpdateTime: '2023-10-20T11:53:57Z'
      status: 'True'
      type: GlobalAdminsSynced
    - lastUpdateTime: '2023-10-20T11:53:58Z'
      status: 'True'
      type: SystemAccountCreated
    - lastUpdateTime: '2023-10-20T11:54:00Z'
      status: 'True'
      type: AgentDeployed
  driver: EKS
  eksStatus:
    generatedNodeRole: >-
      arn:aws:iam::<REDACTED>
    managedLaunchTemplateID: <REDACTED>
    managedLaunchTemplateVersions:
      test1: '4'
      test2: '5'
    privateRequiresTunnel: null
    securityGroups: null
    subnets:
      - <REDACTED>
    upstreamSpec:
      amazonCredentialSecret: cattle-global-data:cc-rhnmv
      displayName: cpinjani-eks28
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.27'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 40
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.large
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test1
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - <REDACTED>
          tags: {}
          userData: null
          version: null
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test2
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - <REDACTED>
          tags: {}
          userData: null
          version: '1.26'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: us-east-2
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    virtualNetwork: <REDACTED>
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: '0'
    memory: 340Mi
    pods: '0'
  linuxWorkerCount: 4
  nodeCount: 4
  provider: eks
  requested:
    cpu: 700m
    memory: 140Mi
    pods: '14'
  serviceAccountTokenSecret: cluster-serviceaccounttoken-jjr5t
  version:
    buildDate: '2023-09-18T22:12:19Z'
    compiler: gc
    gitCommit: b6911bf9eade7d8ca7dd82af5e80626965829947
    gitTreeState: clean
    gitVersion: v1.27.6-eks-f8587cb
    goVersion: go1.20.8
    major: '1'
    minor: 27+
    platform: linux/amd64

Sequential upgrade of multiple node groups version not completing successfully

Rancher version:

2.7-head - 95f0b50
eks-operator:v1.2.2-rc3

Installation option: Docker
Proxy/Cert Details: Self-signed

Information about the Cluster
Kubernetes version: 1.25
Cluster Type: Downstream EKS cluster

User Information
What is the role of the user logged in? Standard user

Describe the bug
Sequential upgrade of multiple node group versions does not complete successfully. A simultaneous upgrade of the node groups completes successfully.

Steps

  1. Provision an EKS cluster with a version lower than the highest supported one, with more than one node group
  2. Upgrade the EKS cluster to a higher version (at this step, only the control plane is upgraded) and the UI displays - "A new cluster version has been selected. Once completed you may come back and upgrade the node version."
  3. Of the two node groups, upgrade only one to the control-plane version
  4. Now upgrade the remaining node group to the control-plane version using the Rancher UI (a CLI check of the resulting AWS state is sketched below)
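
A hedged AWS CLI check for what AWS actually reports after step 4 (cluster and node group names are placeholders):

aws eks describe-nodegroup \
  --cluster-name <cluster-name> \
  --nodegroup-name <second-nodegroup> \
  --query 'nodegroup.[version,status]' \
  --output text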

Expected Result
Sequential upgrades of multiple node group versions must complete successfully.

Screenshots
After step 4, below is the cluster state on AWS:
(screenshot omitted)

Rancher shows the operation as completed, but the other node group is still on the lower version and Cluster Edit still shows an upgrade available.
(screenshot omitted)

cc: @rancher/highlander

PRs

(SURE-4552) Ability to configure agent deployments per cluster

Ability to configure agent deployments per cluster

See also rancher/rancher#41035

Business case

AWS - EKS clusters utilized two node groups. One of the node groups leveraged EC2 on-demand instances and the other spot instances (within auto-scaling groups). The on-demand node group's nodes were assigned a custom taint (i.e., platform/node-lifecycle=NoSchedule) to ensure that only specific application workloads were scheduled on those particular nodes. Alternatively, for the more ephemeral / least critical application workloads, the preference was to schedule them onto the spot instances / node groups. Therefore, with the default configuration, when the cattle-cluster-agent pods were scheduled they were always assigned to spot instances because of the custom taint above, causing various flavors of instability in the Rancher Manager UI for that cluster. https://rancher.com/docs/rancher/v2.5/en/cluster-provisioning/rke-clusters/rancher-agents/#scheduling-rules

The solution would be to provide a method for setting custom parameters to override cattle-cluster-agent and fleet-agent settings at a per-cluster level, or when deploying Rancher via any of the supported deployment options (i.e., Helm). We are focusing on the per-cluster configuration but should acknowledge the original ask.

There are two major asks for configuring agent deployments:

  • Setting tolerations and affinity rules (Rancher Federal)
  • Setting resource limits

As Rancher does not have a unified cluster interface for every type of cluster, consideration needs to be given to how we present these options for each distribution, and we may wish to do a phased rollout starting with the distributions that the customers requesting this functionality use.

We should also add RKE1 to any MVP as well.

Original Requests:

Provide a taint directly on the target "on-demand" nodes of "cattle.io/cluster-agent=true:NoSchedule", given that there is already an existing toleration for that scenario. Outside of patching the cattle-cluster-agent deployment directly, or leveraging GitOps solutions to synchronize diffs from an underlying git repo (i.e., via kustomize/k8s manifests), there was no alternative solution we could think of that would guarantee a subsequent process (i.e., a Rancher upgrade) wouldn't override said patch somewhere down the line and blow away the updates.

The cattle-cluster-agent gets deployed where there are not sufficient resources, so the customer wants to define resource limits. They also mention that it is good Kubernetes practice to always operate with resource limits.

Resource limits CAN be set on cattle-cluster-agent using a Kubernetes patch, as Rancher Federal does, but it is cumbersome and should be settable in the Rancher UI. A sketch of that kind of manual patch is shown below.
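
A minimal sketch of the manual patch described above as cumbersome, assuming the taint from the business case; a Rancher upgrade may later overwrite the change, which is exactly the fragility the request calls out:

# Strategic merge patch: note this REPLACES the deployment's existing tolerations list.
kubectl -n cattle-system patch deployment cattle-cluster-agent --patch '
spec:
  template:
    spec:
      tolerations:
        - key: "platform/node-lifecycle"
          operator: "Exists"
          effect: "NoSchedule"
'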

Panic while sync of node group with status - Create failed

Rancher Server Setup
Rancher version: 2.7-head - fbca7c3
Installation option: Helm Chart
If Helm Chart, Kubernetes Cluster and version: EKS 1.25

Information about the Cluster
Kubernetes version: EKS with Kubernetes 1.25
Cluster Type (Downstream): Hosted = EKS

User Information
What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) - Admin

Describe the bug
Rancher crashes while syncing a node group whose AWS status is "Create failed".

To Reproduce

  • Add valid AWS cloud credentials
  • Provision a downstream EKS cluster with a node group that uses a Rancher-created launch template for the nodes and the t3.large instance type
  • On the AWS console, add a new node group whose nodes use the above launch template's default version 1 (not the latest version); an equivalent AWS CLI call is sketched after these steps
  • Node group creation fails on AWS, and on the next sync Rancher crashes
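
A hedged AWS CLI equivalent of the console step above (all values are placeholders); the node group is created from version 1 of the Rancher-managed launch template rather than its latest version:

aws eks create-nodegroup \
  --cluster-name <cluster-name> \
  --nodegroup-name extra-ng \
  --node-role arn:aws:iam::<account-id>:role/<node-instance-role> \
  --subnets subnet-0123456789abcdef0 \
  --launch-template id=<rancher-managed-launch-template-id>,version=1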

Logs

2023-05-29T10:39:13.278461607Z 2023/05/29 10:39:13 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:39:14.330149095Z 2023/05/29 10:39:14 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:44:14.278471912Z 2023/05/29 10:44:14 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:44:15.346803789Z 2023/05/29 10:44:15 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:49:15.279370474Z 2023/05/29 10:49:15 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:49:16.355974571Z 2023/05/29 10:49:16 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:54:16.278451114Z 2023/05/29 10:54:16 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:54:17.323399780Z 2023/05/29 10:54:17 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:59:17.271275337Z 2023/05/29 10:59:17 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:59:18.826879608Z E0529 10:59:18.826702      33 runtime.go:79] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
2023-05-29T10:59:18.826923005Z goroutine 5355 [running]:
2023-05-29T10:59:18.826929941Z k8s.io/apimachinery/pkg/util/runtime.logPanic({0x428e3a0?, 0xc0074b2dc8})
2023-05-29T10:59:18.826934975Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
2023-05-29T10:59:18.826939771Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00deb6a90?})
2023-05-29T10:59:18.826944525Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
2023-05-29T10:59:18.826948931Z panic({0x428e3a0, 0xc0074b2dc8})
2023-05-29T10:59:18.826954038Z  /usr/lib64/go/1.19/src/runtime/panic.go:884 +0x212
2023-05-29T10:59:18.826960880Z github.com/rancher/eks-operator/controller.BuildUpstreamClusterState({0xc00a4a4ee0, 0xc}, {0xc01982d488, 0x14}, 0xc011179618, {0xc0117c36d0, 0x2, 0xc00a4a4f79?}, {0x4fa5c80, 0xc00b9ad908}, ...)
2023-05-29T10:59:18.826966881Z  /go/pkg/mod/github.com/rancher/[email protected]/controller/eks-cluster-config-handler.go:861 +0x1729
2023-05-29T10:59:18.826972822Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.BuildEKSUpstreamSpec({0x4f9ccf8?, 0xc00a145f80?}, 0xc0034eca80)
2023-05-29T10:59:18.826978467Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/eks_upstream_spec.go:83 +0x525
2023-05-29T10:59:18.826987235Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.getComparableUpstreamSpec({0x4f9ccf8, 0xc00a145f80}, {0x7fa430e02c60, 0xc00019f730}, 0xc0034eca80)
2023-05-29T10:59:18.826992305Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:288 +0xec
2023-05-29T10:59:18.826996827Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).refreshClusterUpstreamSpec(0xc005a07220, 0xc0034eca80, {0x45b2d0b, 0x3})
2023-05-29T10:59:18.827001264Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:155 +0xd1
2023-05-29T10:59:18.827005685Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).onClusterChange(0xc005a07220, {0xc00fb45e60, 0x7}, 0xc0034eca80)
2023-05-29T10:59:18.827010308Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:79 +0x165
2023-05-29T10:59:18.827014755Z github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3.FromClusterHandlerToHandler.func1({0xc00fb45e60?, 0x0?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.827019430Z  /go/src/github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3/cluster.go:105 +0x44
2023-05-29T10:59:18.827026065Z github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x4514280?, {0xc00fb45e60?, 0x4626c7d?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.827030854Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x38
2023-05-29T10:59:18.827035753Z github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000d6e3c0, {0xc00fb45e60, 0x7}, {0x4f75a68, 0xc00a382a80})
2023-05-29T10:59:18.827040398Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x23f
2023-05-29T10:59:18.827045367Z github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000e4c630, {0xc00fb45e60, 0x7})
2023-05-29T10:59:18.827050000Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:232 +0x93
2023-05-29T10:59:18.827070669Z github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000e4c630, {0x39a6ce0?, 0xc00deb6a90?})
2023-05-29T10:59:18.827075678Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:213 +0x105
2023-05-29T10:59:18.827081198Z github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000e4c630)
2023-05-29T10:59:18.827085961Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:190 +0x46
2023-05-29T10:59:18.827090760Z github.com/rancher/lasso/pkg/controller.(*controller).runWorker(0xc0043ef6a0?)
2023-05-29T10:59:18.827095466Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:179 +0x25
2023-05-29T10:59:18.827100555Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0a28326e?)
2023-05-29T10:59:18.827105380Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
2023-05-29T10:59:18.827110588Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x38737574617473?, {0x4f67880, 0xc00426c210}, 0x1, 0xc001339b00)
2023-05-29T10:59:18.827115298Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
2023-05-29T10:59:18.827133385Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xa646e756f662073?, 0x3b9aca00, 0x0, 0x54?, 0x1422001006a3d19b?)
2023-05-29T10:59:18.827139588Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
2023-05-29T10:59:18.827144575Z k8s.io/apimachinery/pkg/util/wait.Until(0x6874242a64657470?, 0x616974696e692065?, 0x2073656d616e206c?)
2023-05-29T10:59:18.827149867Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92 +0x25
2023-05-29T10:59:18.827155012Z created by github.com/rancher/lasso/pkg/controller.(*controller).run
2023-05-29T10:59:18.827159906Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:147 +0x2a7
2023-05-29T10:59:18.830401178Z panic: runtime error: index out of range [0] with length 0 [recovered]
2023-05-29T10:59:18.830422679Z  panic: runtime error: index out of range [0] with length 0
2023-05-29T10:59:18.830436773Z 
2023-05-29T10:59:18.830443958Z goroutine 5355 [running]:
2023-05-29T10:59:18.830449778Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00deb6a90?})
2023-05-29T10:59:18.830455640Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
2023-05-29T10:59:18.830460955Z panic({0x428e3a0, 0xc0074b2dc8})
2023-05-29T10:59:18.830465881Z  /usr/lib64/go/1.19/src/runtime/panic.go:884 +0x212
2023-05-29T10:59:18.830471711Z github.com/rancher/eks-operator/controller.BuildUpstreamClusterState({0xc00a4a4ee0, 0xc}, {0xc01982d488, 0x14}, 0xc011179618, {0xc0117c36d0, 0x2, 0xc00a4a4f79?}, {0x4fa5c80, 0xc00b9ad908}, ...)
2023-05-29T10:59:18.830477522Z  /go/pkg/mod/github.com/rancher/[email protected]/controller/eks-cluster-config-handler.go:861 +0x1729
2023-05-29T10:59:18.830482562Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.BuildEKSUpstreamSpec({0x4f9ccf8?, 0xc00a145f80?}, 0xc0034eca80)
2023-05-29T10:59:18.830487640Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/eks_upstream_spec.go:83 +0x525
2023-05-29T10:59:18.830492466Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.getComparableUpstreamSpec({0x4f9ccf8, 0xc00a145f80}, {0x7fa430e02c60, 0xc00019f730}, 0xc0034eca80)
2023-05-29T10:59:18.830497365Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:288 +0xec
2023-05-29T10:59:18.830502458Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).refreshClusterUpstreamSpec(0xc005a07220, 0xc0034eca80, {0x45b2d0b, 0x3})
2023-05-29T10:59:18.830519214Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:155 +0xd1
2023-05-29T10:59:18.830524923Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).onClusterChange(0xc005a07220, {0xc00fb45e60, 0x7}, 0xc0034eca80)
2023-05-29T10:59:18.830529821Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:79 +0x165
2023-05-29T10:59:18.830535053Z github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3.FromClusterHandlerToHandler.func1({0xc00fb45e60?, 0x0?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.830539798Z  /go/src/github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3/cluster.go:105 +0x44
2023-05-29T10:59:18.830544900Z github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x4514280?, {0xc00fb45e60?, 0x4626c7d?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.830549981Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x38
2023-05-29T10:59:18.830554887Z github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000d6e3c0, {0xc00fb45e60, 0x7}, {0x4f75a68, 0xc00a382a80})
2023-05-29T10:59:18.830559665Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x23f
2023-05-29T10:59:18.830564310Z github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000e4c630, {0xc00fb45e60, 0x7})
2023-05-29T10:59:18.830569051Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:232 +0x93
2023-05-29T10:59:18.830573755Z github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000e4c630, {0x39a6ce0?, 0xc00deb6a90?})
2023-05-29T10:59:18.830578516Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:213 +0x105
2023-05-29T10:59:18.830583918Z github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000e4c630)
2023-05-29T10:59:18.830589085Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:190 +0x46
2023-05-29T10:59:18.830594238Z github.com/rancher/lasso/pkg/controller.(*controller).runWorker(0xc0043ef6a0?)
2023-05-29T10:59:18.830599133Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:179 +0x25
2023-05-29T10:59:18.830604091Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0a28326e?)
2023-05-29T10:59:18.830609038Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
2023-05-29T10:59:18.830614177Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x38737574617473?, {0x4f67880, 0xc00426c210}, 0x1, 0xc001339b00)
2023-05-29T10:59:18.830625830Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
2023-05-29T10:59:18.830631545Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xa646e756f662073?, 0x3b9aca00, 0x0, 0x54?, 0x1422001006a3d19b?)
2023-05-29T10:59:18.830637133Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
2023-05-29T10:59:18.830646011Z k8s.io/apimachinery/pkg/util/wait.Until(0x6874242a64657470?, 0x616974696e692065?, 0x2073656d616e206c?)
2023-05-29T10:59:18.830651251Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92 +0x25
2023-05-29T10:59:18.830656007Z created by github.com/rancher/lasso/pkg/controller.(*controller).run
2023-05-29T10:59:18.830660908Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:147 +0x2a7

Result
Panic while syncing a node group with status "Create failed".

Screenshots
(screenshot omitted)

Cluster provisioning logs entries not logged on UI

Rancher Server Setup
Rancher version: 2.7-head - fbca7c3
Installation option: Helm Chart
If Helm Chart, Kubernetes Cluster and version: EKS 1.25

Information about the Cluster
Kubernetes version: EKS with Kubernetes 1.25
Cluster Type (Downstream): Hosted = EKS

User Information
What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) - Admin

Describe the bug
Cluster provisioning log entries are not logged in the UI.
The issue exists for other hosted providers as well.

To Reproduce
  • Add valid AWS cloud credentials
  • Provision a downstream EKS cluster
  • Check the Cluster > Provisioning Log tab (an alternative way to view provisioning progress is sketched after these steps)
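
A hedged workaround sketch (pod names may differ by chart version): while the UI tab stays empty, the eks-operator pod logs on the Rancher management cluster are an alternative source of provisioning progress.

kubectl -n cattle-system get pods | grep eks
kubectl -n cattle-system logs <eks-operator-pod-name> -f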

Result
User is unable to check provisioning logs.

Screenshots
(screenshot omitted)
