
eks-operator's Introduction

Rancher

This file is auto-generated from README-template.md, please make any changes there.


Rancher is an open source container management platform built for organizations that deploy containers in production. Rancher makes it easy to run Kubernetes everywhere, meet IT requirements, and empower DevOps teams.

Latest Release

  • v2.8
    • Latest - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:latest - Read the full release notes.
    • Stable - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:stable - Read the full release notes.
  • v2.7
    • Latest - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
    • Stable - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
  • v2.6
    • Latest - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.
    • Stable - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.

To get automated notifications of our latest release, you can watch the announcements category in our forums, or subscribe to the RSS feed https://forums.rancher.com/c/announcements.rss.

Quick Start

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher

Open your browser to https://localhost

Installation

See Installing/Upgrading Rancher for all installation options.

Minimum Requirements

  • Operating Systems
    • Please see Support Matrix for specific OS versions for each Rancher version. Note that the link will default to the support matrix for the latest version of Rancher. Use the left navigation menu to select a different Rancher version.
  • Hardware & Software

Using Rancher

To learn more about using Rancher, please refer to our Rancher Documentation.

Source Code

This repo is a meta-repo used for packaging and contains the majority of the Rancher codebase. For other Rancher projects and modules, see go.mod for the full list.

Rancher also includes other open source libraries and projects, see go.mod for the full list.

Build configuration

Refer to the build docs on how to customize the building and packaging of Rancher.

Support, Discussion, and Community

If you need any help with Rancher, please join us at either our Rancher forums or Slack, where most of our team hangs out.

Please submit any Rancher bugs, issues, and feature requests to rancher/rancher.

For security issues, please first check our security policy and email [email protected] instead of posting a public issue in GitHub. You may (but are not required to) use the GPG key located on Keybase.

License

Copyright (c) 2014-2024 Rancher Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

eks-operator's People

Contributors

aiyengar2, alexander-demicev, cbron, chiukapoor, cmurphy, dependabot[bot], furkatgofurov7, jakefhyde, jiaqiluo, kevinjoiner, krunalhinguu, macedogm, mbologna, mjura, oxr463, phillipsj, richardcase, rmweir, salasberryfin, superseb, yiannistri


eks-operator's Issues

Imported Cluster: Upgrade of Nodegroup k8s version not succeeding

Rancher version:

v2.7-10d02ae6827fd549ae0f373be84a5e327999925b-head
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Upgrade of the Nodegroup k8s version does not succeed for an imported EKS cluster, and the node group remains at the same version (lower than the control plane version).
The cluster does not go into the Updating state in the UI after the upgrade is triggered.

The same operation works on 2.7.5/2.7.6, eks-operator:v1.2.0
Found while validating rancher/rancher#42496

Steps

  • Import EKS cluster with single/multiple nodegroups (created using Launch template) and wait for it to be Active
  • Upgrade cluster k8s version (Control plane only) and let it complete successfully
  • Edit cluster and select node group upgrade checkbox and Save

Expected Result
Imported Cluster: Upgrade of Nodegroup k8s version must complete successfully

Logs (2.7-head):
No EKS-operator logs are generated for the operation

Rancher:

2023/08/21 09:19:24 [INFO] change detected for cluster [c-hg4p4], updating EKSClusterConfig
2023/08/21 09:19:34 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:19:35 [INFO] change detected for cluster [c-hg4p4], updating spec
2023/08/21 09:19:35 [INFO] change detected for cluster [c-hg4p4], updating EKSClusterConfig
2023/08/21 09:19:36 [ERROR] Error during subscribe websocket: close sent
2023/08/21 09:19:50 [ERROR] Error during subscribe websocket: close sent
W0821 09:19:55.703187      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
W0821 09:20:12.164038      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineHealthCheck is deprecated; use cluster.x-k8s.io/v1beta1 MachineHealthCheck
W0821 09:21:56.294592      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineDeployment is deprecated; use cluster.x-k8s.io/v1beta1 MachineDeployment
W0821 09:22:30.958953      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineSet is deprecated; use cluster.x-k8s.io/v1beta1 MachineSet
2023/08/21 09:22:54 [ERROR] Error during subscribe websocket: close sent
2023/08/21 09:23:07 [ERROR] Error during subscribe websocket: close sent
W0821 09:24:09.213292      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Cluster is deprecated; use cluster.x-k8s.io/v1beta1 Cluster
2023/08/21 09:24:35 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:24:35 [INFO] cluster [c-hg4p4] matches upstream, skipping spec sync
2023/08/21 09:26:36 [ERROR] Error during subscribe websocket: close sent
W0821 09:29:16.705076      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
2023/08/21 09:29:35 [INFO] checking cluster [c-hg4p4] upstream state for changes
2023/08/21 09:29:35 [INFO] cluster [c-hg4p4] matches upstream, skipping spec sync

Logs (2.7.5):
EKS-operator:

time="2023-08-21T09:33:39Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:34:10Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:34:41Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:35:11Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:35:42Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:36:12Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:36:43Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:37:13Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:37:44Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:38:14Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:38:45Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:39:15Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:39:46Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:40:16Z" level=info msg="waiting for cluster [c-hd9z6] to update nodegroups [ranchernodes1]"
time="2023-08-21T09:40:47Z" level=info msg="cluster [c-hd9z6] finished updating"

Increase project maintainability

This epic will track tasks for increasing project maintainability:

  • Enable github workflows for this repository
  • Add golang-ci lint workflow
  • #92
  • Add new task to makefile for building operator binary
  • Create AWS services mocks to use for unit tests
  • Set up a basic unit test suite
  • #93
  • #94
  • #95
  • #96
  • #97
  • #98
  • #99
  • #100

During migration, EKS clusters don't reconnect to Hosted

SURE-6587

Issue description:

We are currently in the process of migrating a "nonprod" environment from self-hosted to Hosted Rancher. We are following the process defined at https://confluence.suse.com/pages/viewpage.action?spaceKey=Hosted&title=Migration+to+Rancher+Hosted+Prime. After following the procedure to reconfigure the EKS clusters, the clusters do not show up as healthy in the UI and we see an error that the agents are disconnected. RKE clusters running on EC2 VMs look fine.

Business impact:

Migration to Hosted is blocked for non-prod, and this also blocks the migration for production.

Troubleshooting steps:

Turned on debugging, viewed and collected logs, but nothing is evident in the logs.

Repro steps:

Unknown, but at a high level:

  1. Create a self-hosted Rancher environment with EKS cluster(s)
  2. Follow the steps at https://confluence.suse.com/pages/viewpage.action?spaceKey=Hosted&title=Migration+to+Rancher+Hosted+Prime to migrate to Hosted

Workaround:

Is a workaround available and implemented? unknown
What is the workaround:

Actual behavior:

After the migration process EKS clusters are not healthy (agent disconnected)

Expected behavior:

After migration, EKS clusters are healthy

Files, logs, traces:

See JIRA

Additional notes:

We also ran the websocket test to Hosted from a node within the cluster and that worked fine. The customer does not believe there are any firewall rules in place that are interfering with the agent to server communication.

K8s upgrade sync from AWS to Rancher fails

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc3

Rancher is installed using Helm.
This bug was also tested on eks-operator HEAD version 20231017.

Steps to Reproduce:

  1. Create an EKS cluster with 1.25 using Rancher.
  2. Upgrade k8s cluster to 1.26 from AWS console.
  3. Wait for the upgrade to finish and check Rancher.

Actual Results:
Upgrade fails with the following error:

 Controller.FailureMessage{ClusterName:"pvala-eks-sync-regular", Message_:"Unsupported Kubernetes minor version update from 1.26 to 1.25", NodegroupName:""} 

Expected Results:
The upgrade should show as successful in Rancher.

Notes:
There seems to be no other way to revert the change from Rancher. Upgrading to 1.26 from Rancher does not work either.
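The failure message suggests the operator compares the minor version in the Rancher spec (still 1.25) against the upstream control plane version (now 1.26) and refuses what it sees as a downgrade. Below is a minimal Go sketch of that kind of minor-version guard; it is an illustration under that assumption, not the eks-operator's actual code.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf parses a "major.minor" Kubernetes version string such as "1.25".
func minorOf(v string) (int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unexpected version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// validateUpgrade rejects any update that would move the cluster to a lower
// minor version than the one already running upstream.
func validateUpgrade(upstream, desired string) error {
	up, err := minorOf(upstream)
	if err != nil {
		return err
	}
	want, err := minorOf(desired)
	if err != nil {
		return err
	}
	if want < up {
		return fmt.Errorf("Unsupported Kubernetes minor version update from %s to %s", upstream, desired)
	}
	return nil
}

func main() {
	// Upstream was upgraded to 1.26 in the AWS console, but the spec still
	// says 1.25, so the sync is treated as a (forbidden) downgrade.
	fmt.Println(validateUpgrade("1.26", "1.25"))
}

Under a check of this shape, the sync can only succeed once the spec's kubernetesVersion is raised to at least the upstream version, which matches the observed behavior that the change cannot be reverted from Rancher.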

Cluster YAML

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    clusters.management.cattle.io/ke-last-refresh: '1697638731'
    field.cattle.io/creatorId: user-q5x25
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: 1.26.9-eks-f8587cb
  creationTimestamp: '2023-10-18T13:50:32Z'
  finalizers:
    - wrangler.cattle.io/mgmt-cluster-remove
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 47
  labels:
    cattle.io/creator: norman
    provider.cattle.io: eks
  managedFields:
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/creatorId: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cattle.io/creator: {}
        f:spec:
          .: {}
          f:displayName: {}
          f:dockerRootDir: {}
          f:eksConfig:
            .: {}
            f:amazonCredentialSecret: {}
            f:displayName: {}
            f:imported: {}
            f:kmsKey: {}
            f:kubernetesVersion: {}
            f:loggingTypes: {}
            f:privateAccess: {}
            f:publicAccess: {}
            f:region: {}
            f:secretsEncryption: {}
            f:securityGroups: {}
            f:serviceRole: {}
            f:subnets: {}
            f:tags: {}
          f:enableClusterAlerting: {}
          f:enableClusterMonitoring: {}
          f:enableNetworkPolicy: {}
          f:internal: {}
          f:windowsPreferedCluster: {}
        f:status:
          .: {}
          f:appliedEnableNetworkPolicy: {}
      manager: Go-http-client
      operation: Update
      time: '2023-10-18T13:50:32Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:authz.management.cattle.io/creator-role-bindings: {}
            f:authz.management.cattle.io/initial-sync: {}
            f:clusters.management.cattle.io/ke-last-refresh: {}
            f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
            f:lifecycle.cattle.io/create.cluster-provisioner-controller: {}
            f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
            f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
            f:management.cattle.io/current-cluster-controllers-version: {}
          f:finalizers:
            .: {}
            v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
            v:"controller.cattle.io/cluster-provisioner-controller": {}
            v:"controller.cattle.io/cluster-scoped-gc": {}
            v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
            v:"wrangler.cattle.io/mgmt-cluster-remove": {}
          f:labels:
            f:provider.cattle.io: {}
        f:spec:
          f:agentImageOverride: {}
          f:answers: {}
          f:clusterSecrets: {}
          f:description: {}
          f:desiredAgentImage: {}
          f:desiredAuthImage: {}
          f:eksConfig:
            f:ebsCSIDriver: {}
            f:nodeGroups: {}
            f:publicAccessSources: {}
          f:fleetWorkspaceName: {}
          f:localClusterAuthEndpoint:
            .: {}
            f:enabled: {}
        f:status:
          f:agentFeatures:
            .: {}
            f:embedded-cluster-api: {}
            f:fleet: {}
            f:monitoringv1: {}
            f:multi-cluster-management: {}
            f:multi-cluster-management-agent: {}
            f:provisioningv2: {}
            f:rke2: {}
          f:agentImage: {}
          f:aksStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:rbacEnabled: {}
            f:upstreamSpec: {}
          f:allocatable:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:apiEndpoint: {}
          f:appliedAgentEnvVars: {}
          f:appliedPodSecurityPolicyTemplateId: {}
          f:appliedSpec:
            .: {}
            f:agentImageOverride: {}
            f:answers: {}
            f:clusterSecrets: {}
            f:description: {}
            f:desiredAgentImage: {}
            f:desiredAuthImage: {}
            f:displayName: {}
            f:eksConfig:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:enableClusterAlerting: {}
            f:enableClusterMonitoring: {}
            f:enableNetworkPolicy: {}
            f:internal: {}
            f:localClusterAuthEndpoint:
              .: {}
              f:enabled: {}
            f:windowsPreferedCluster: {}
          f:authImage: {}
          f:caCert: {}
          f:capabilities:
            .: {}
            f:loadBalancerCapabilities: {}
          f:capacity:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:conditions: {}
          f:driver: {}
          f:eksStatus:
            .: {}
            f:generatedNodeRole: {}
            f:managedLaunchTemplateID: {}
            f:managedLaunchTemplateVersions:
              .: {}
              f:dp: {}
            f:privateRequiresTunnel: {}
            f:securityGroups: {}
            f:subnets: {}
            f:upstreamSpec:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:virtualNetwork: {}
          f:gkeStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:upstreamSpec: {}
          f:limits:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:linuxWorkerCount: {}
          f:nodeCount: {}
          f:provider: {}
          f:requested:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:serviceAccountTokenSecret: {}
          f:version:
            .: {}
            f:buildDate: {}
            f:compiler: {}
            f:gitCommit: {}
            f:gitTreeState: {}
            f:gitVersion: {}
            f:goVersion: {}
            f:major: {}
            f:minor: {}
            f:platform: {}
      manager: rancher
      operation: Update
      time: '2023-10-18T14:18:51Z'
  name: c-l948b
  resourceVersion: '19636'
  uid: fc4e2fe0-84a8-4505-99b4-4503752e02d9
spec:
  agentImageOverride: ''
  answers: {}
  clusterSecrets: {}
  description: ''
  desiredAgentImage: ''
  desiredAuthImage: ''
  displayName: pvala-eks-sync-regular
  dockerRootDir: /var/lib/docker
  eksConfig:
    amazonCredentialSecret: cattle-global-data:cc-mkpjd
    displayName: pvala-eks-sync-regular
    ebsCSIDriver: null
    imported: false
    kmsKey: ''
    kubernetesVersion: '1.25'
    loggingTypes: []
    nodeGroups:
      - desiredSize: 1
        diskSize: 20
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.medium
        labels: {}
        launchTemplate: null
        maxSize: 2
        minSize: 1
        nodeRole: >-
          arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
        nodegroupName: dp
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - subnet-08d337145e51d3a3a
          - subnet-0813d93b5f49f4648
        tags: {}
        userData: null
        version: '1.25'
    privateAccess: false
    publicAccess: true
    publicAccessSources:
      - 0.0.0.0/0
    region: ap-south-1
    secretsEncryption: false
    securityGroups: []
    serviceRole: ''
    subnets: []
    tags: {}
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  windowsPreferedCluster: false
status:
  agentFeatures:
    embedded-cluster-api: false
    fleet: false
    monitoringv1: false
    multi-cluster-management: false
    multi-cluster-management-agent: true
    provisioningv2: false
    rke2: false
  agentImage: rancher/rancher-agent:v2.8.0-rc1
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: 1930m
    memory: 3388356Ki
    pods: '17'
  apiEndpoint: https://DB9EB28E21D8176978B3B1663EB4F1D3.gr7.ap-south-1.eks.amazonaws.com
  appliedAgentEnvVars:
    - name: CATTLE_SERVER_VERSION
      value: v2.8.0-rc1
    - name: CATTLE_INSTALL_UUID
      value: fa38b840-8b99-4b12-9727-216877a1b36a
    - name: CATTLE_INGRESS_IP_DOMAIN
      value: sslip.io
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ''
  appliedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets: {}
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: ''
    eksConfig:
      amazonCredentialSecret: cattle-global-data:cc-mkpjd
      displayName: pvala-eks-sync-regular
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.25'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 2
          minSize: 1
          nodeRole: ''
          nodegroupName: dp
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets: []
          tags: {}
          userData: ''
          version: '1.25'
      privateAccess: false
      publicAccess: true
      publicAccessSources: []
      region: ap-south-1
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ''
  caCert: >-
    LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJR2FQcmJUb1JJbDB3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TXpFd01UZ3hNelV5TWpkYUZ3MHpNekV3TVRVeE16VTNNamRhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURETGlURkFid1dTcGsxRGlNdVNCOHNGU2NtbkgydHVQT3ZRNmRkSDRLM1c5djNhM1g3VWtDWUVCK3MKQld2cWxubnN6SGR3U2dFR2hzcHJNbDh2NW1scmxMYU03QS9vM2xGY0hieEE1aDlTVG9kejNMY3F3SUlLTk94cwpKK0FYKzdUZ1hlZ01ncmVkMlJaTXJ0bGo2SnpVM1NiYi9lSnVwbkwxLyt3NVhFcVpNdFpuclQ3UUhscWprZjJGCmREZTZRbWttcEg3NTEvb3Z4SUhjQ2tRdFMvUDV1WSs3WEc3L2Y4dExCWGNacCsrMEV3WGpEUHo2VnBWTnUwK1cKMytKUXF6bWVjdzhCUnUxclIxU2xxVHIvYUwxTTc1cHNFdWp1RkIxOUk3YzBIc0RQOTBxb3cxcFp2NW5BSFRzMApOem9HZkpGbjI5ZTJMZ3ZhU1l4ejRNZlRoQkRKQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJSUjdJMFRPVE41V3cxSVZCR2hZTnNJZmZMVzdUQVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQUlPbVZ4c0IwaQpJZzlPeHk0SG9MeFVGL0JsL3U4SFI3dVUxRVNNSERRbUFxL2dPZWJTT2tjR0hjWEkzMDh1UDNKOXMwdHRBNS96CnBUSXcrdi9sM3AxU1NZSzV3ekRXR2IxR01tYU5EcnFPYjhJSU80SEVNNmRyK0pxM1IxQTVyRXpOU0JKL1RrTjMKRGZEK1RKeFRRcDYzdXBlMUFKN0U2S3kvOTc3YyttUC90eFcyRUxiQVd6T08veFJ0MDdWbFE1ZDJ3cFJjYzJWMAppNzNZdHhXTlM3czZnbkI3cHlLaERIdjNFbGlWRDEyNFJyNGhzeHllWWtJZW1mQnpVc2JPR0FrOU9Fams4Q2t4CnJzbTI4Ym8rRWpxM1cyZlRBc05sY1ltRE01anNnUE9CQ2wwOThSdzkxOTgxbWZ5aXdFT3ZTRzdtbEtnNGppdDQKd1JqQWExRzFsaUVzCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  capabilities:
    loadBalancerCapabilities: {}
  capacity:
    cpu: '2'
    memory: 3943364Ki
    pods: '17'
  conditions:
    - lastUpdateTime: ''
      status: 'True'
      type: Pending
    - lastUpdateTime: '2023-10-18T14:01:51Z'
      status: 'True'
      type: Provisioned
    - lastUpdateTime: '2023-10-18T14:05:44Z'
      status: 'True'
      type: Waiting
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: BackingNamespaceCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: DefaultProjectCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: SystemProjectCreated
    - lastUpdateTime: '2023-10-18T13:50:33Z'
      status: 'True'
      type: InitialRolesPopulated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: CreatorMadeOwner
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: NoDiskPressure
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: NoMemoryPressure
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: SecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: ServiceAccountSecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: RKESecretsMigrated
    - lastUpdateTime: '2023-10-18T13:50:36Z'
      status: 'True'
      type: ACISecretsMigrated
    - lastUpdateTime: '2023-10-18T14:05:37Z'
      status: 'True'
      type: Connected
    - lastUpdateTime: '2023-10-18T14:15:26Z'
      message: >-
        controller.FailureMessage{ClusterName:"pvala-eks-sync-regular",
        Message_:"Unsupported Kubernetes minor version update from 1.26 to
        1.25", NodegroupName:""}
      status: 'False'
      type: Updated
    - lastUpdateTime: '2023-10-18T14:05:44Z'
      status: 'True'
      type: Ready
    - lastUpdateTime: '2023-10-18T14:04:45Z'
      status: 'True'
      type: GlobalAdminsSynced
    - lastUpdateTime: '2023-10-18T14:04:45Z'
      status: 'True'
      type: SystemAccountCreated
    - lastUpdateTime: '2023-10-18T14:04:48Z'
      status: 'True'
      type: AgentDeployed
  driver: EKS
  eksStatus:
    generatedNodeRole: >-
      arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
    managedLaunchTemplateID: lt-0a8341e09aabf99c4
    managedLaunchTemplateVersions:
      dp: '2'
    privateRequiresTunnel: null
    securityGroups: null
    subnets:
      - subnet-08d337145e51d3a3a
      - subnet-0813d93b5f49f4648
    upstreamSpec:
      amazonCredentialSecret: cattle-global-data:cc-mkpjd
      displayName: pvala-eks-sync-regular
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.26'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 2
          minSize: 1
          nodeRole: >-
            arn:aws:iam::879933548321:role/pvala-eks-sync-regular-node-instan-NodeInstanceRole-NHFFyeQKO33Q
          nodegroupName: dp
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - subnet-08d337145e51d3a3a
            - subnet-0813d93b5f49f4648
          tags: {}
          userData: null
          version: '1.25'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: ap-south-1
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    virtualNetwork: vpc-0e0eac4e0f5e4a67c
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: '0'
    memory: 340Mi
    pods: '0'
  linuxWorkerCount: 1
  nodeCount: 1
  provider: eks
  requested:
    cpu: 325m
    memory: 140Mi
    pods: '7'
  serviceAccountTokenSecret: cluster-serviceaccounttoken-5jsjw
  version:
    buildDate: '2023-09-18T22:09:26Z'
    compiler: gc
    gitCommit: 2e0997b730999ec918be56b8ed945394905673c5
    gitTreeState: clean
    gitVersion: v1.26.9-eks-f8587cb
    goVersion: go1.20.8
    major: '1'
    minor: 26+
    platform: linux/amd64

Screenshot from 2023-10-18 19-54-26

(SURE-5550) Cannot create new node group in EKS cluster

Issue description:

The customer has a downstream EKS cluster they created through Rancher. They were previously able to add two node groups to this EKS cluster from the Rancher UI. But when they try to add another node group from the UI they are running into this error:

Controller.FailureMessage{ClusterName:"", Message_:"You do not have access to a default security group in VPC vpc-3bbc495c. Specify a security group, and try again.", NodegroupName:""}

I saw this GH issue and asked the customer if they are using their own launch templates. They said they are using the Rancher managed launch template, just like they did with the other node groups they were able to add previously.

Unable to upgrade NodeGroup k8s version from Rancher

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc3

Rancher is installed using Helm.

Steps to Reproduce:

  1. Create/Import an EKS cluster.
  2. Upgrade the ControlPlane K8s version either from Rancher or AWS Console, does not matter.
  3. Upgrade the NodeGroup K8s version from Rancher

Actual Results:
Rancher does not do anything. It does not go into Updating state and NodeGroups are not upgraded.

Expected Results:
Rancher must be able to upgrade the NodeGroups K8s version.

EKS: Rancher managed launch template information not being displayed for Nodegroup

Setup

Rancher version: v2.7.7-rc4
Browser: Chrome Latest

Describe the bug
Rancher managed launch template information not being displayed for Nodegroup. The same is being displayed in case of Custom Launch template.

To Reproduce

  • Start to create downstream EKS cluster
  • Can keep all the settings default and create Nodegroup using Rancher managed launch template (default behavior)
  • Wait for the cluster to be Active
  • Edit cluster and check Launch template information for the above created Nodegroup

Expected Result
The Nodegroup must display the launch template information as "Rancher Managed Launch Template" and the appropriate template version, so that the user is able to edit Nodegroup details
(check v2.7.5 screenshot below)

Screenshots
Rancher managed LT:
image

Custom LT:
image

Rancher managed LT (v2.7.5):
image

Found while validating rancher/rancher#42496

cc: @rancher/highlander

[SURE-5880] Can not deploy 1.23 eks clusters

Issue description:

Cannot provision or upgrade to 1.23+ with EKS clusters even though these versions are listed as supported

Business impact:

Scheduled upgrades of many environments (including prod) are blocked

Troubleshooting steps:

Reproduced
Checked the eks kontainer-engine in 2.6.10: https://sourcegraph.com/github.com/rancher/[email protected]/-/blob/pkg/kontainer-engine/drivers/eks/eks_driver.go?L46&subtree=true

Repro steps:

Attempt to deploy a 1.23 EKS cluster using Rancher 2.6.10 and region us-east-1 (though I believe the behavior is the same for other regions)

Alternatively

Deploy 1.22 EKS cluster
Attempt to upgrade to 1.23

Workaround:

Is a workaround available and implemented? no

We believe if we capture the API request for a 1.22 upgrade and replace the k8s version we should be able to push a 1.23 upgrade, but we need to test before proposing to Pfizer

Actual behavior:

Cannot deploy or upgrade to 1.23

Expected behavior:

As 1.23 is listed as supported in the support matrix, in the driver's manifest, and available for the region in EKS, we expect to be able to deploy and upgrade to it in Rancher UI

EKS on Rancher does not go into updating state when a change is made from cloud console

Versions:
Rancher: v2.8.0-rc1
EKS Operator: rancher-eks-operator:103.0.0+up1.3.0-rc4

Steps to Reproduce:

  1. Import/Create an EKS cluster
  2. From AWS Console, add/delete nodegroup or upgrade the k8s version.
  3. Check the EKS cluster on Rancher

Actual Results:
The EKS cluster on Rancher remains Active while the cluster is updating in the AWS console. After the update completes in the AWS console, waiting a few minutes and refreshing Rancher makes the new changes visible.

Expected Results:
EKS cluster on Rancher must go into Updating state while the cluster is updating in AWS console and become Active once all the updates have been made.

Launch templates are not cleaned-up on cluster deletion

If I create an EKS cluster using Rancher and then later delete it, the launch templates that were created are not deleted.

This causes issues if you try to create a cluster with the same name. Also, if you don't clean up old launch templates you will eventually run into the AWS launch template limit.
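Until the operator cleans these up, the leftover templates have to be removed out of band. Below is a minimal sketch using aws-sdk-go, assuming you already know the Rancher-managed template ID (for example, the value recorded under the cluster's eksStatus.managedLaunchTemplateID); the ID and region here are placeholders.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)

	// Placeholder: the launch template ID Rancher recorded for the deleted cluster.
	templateID := "lt-0123456789abcdef0"

	// Delete the orphaned Rancher-managed launch template.
	if _, err := svc.DeleteLaunchTemplate(&ec2.DeleteLaunchTemplateInput{
		LaunchTemplateId: aws.String(templateID),
	}); err != nil {
		log.Fatalf("deleting launch template %s: %v", templateID, err)
	}
	log.Printf("deleted leftover launch template %s", templateID)
}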

Creating Node groups with same name must be disallowed

Rancher version:

v2.7-2e50d9f2f8725dccb40f28c8be2d8c2bb99fa59a-head
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Creation of node groups with the same name is currently allowed. The number of nodes is set to the value provided during the new node group's creation.

Steps

  • Provision EKS cluster with one node group having x number of nodes
  • Edit cluster and add new node group with same name and ~x number of nodes

Expected Result
Creating Node groups with same name must be disallowed

Found while validating rancher/rancher#42496

(SURE-4066) EKS cluster via terraform fails when using a custom AMI in a launch template

SURE-4066

Issue description:

When using a custom launch template with an AMI ID, the creation of the cluster via terraform will fail with an AWS exception, due to an amiType being specified in the CreateNodegroup call [2].

The eks-operator appears to default to specifying an amiType when image_id is null [1], as would be the case when the AMI is specified in the launch template.

In the Rancher UI, a describe of the launch template is done first, and the image_id for the cluster is pre-populated as a convenience. When using terraform this requires manual AWS provider actions, but is also unexpected.

Another approach could be to perform this describe in the eks-operator itself, so that the correct CreateNodegroup call can be made, i.e. if the launch template defines an AMI, don't specify an amiType in the call.
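A rough sketch of what that check could look like with aws-sdk-go is shown below. This illustrates the suggested behavior rather than the operator's current code: describe the referenced launch template version, and only pass an amiType of CUSTOM when the template defines its own ImageId, otherwise leave amiType unset.

package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// resolveAMIType decides what amiType to send in CreateNodegroup. If the launch
// template version already defines an ImageId, EKS only accepts "CUSTOM";
// otherwise amiType is left nil so the EKS default applies.
// Illustrative helper only, not code from the eks-operator.
func resolveAMIType(ec2Svc *ec2.EC2, ltID, ltVersion string) (*string, error) {
	out, err := ec2Svc.DescribeLaunchTemplateVersions(&ec2.DescribeLaunchTemplateVersionsInput{
		LaunchTemplateId: aws.String(ltID),
		Versions:         []*string{aws.String(ltVersion)},
	})
	if err != nil {
		return nil, err
	}
	if len(out.LaunchTemplateVersions) > 0 &&
		out.LaunchTemplateVersions[0].LaunchTemplateData != nil &&
		out.LaunchTemplateVersions[0].LaunchTemplateData.ImageId != nil {
		// The launch template brings its own AMI: only CUSTOM is valid here.
		return aws.String("CUSTOM"), nil
	}
	// No AMI in the template: let the EKS default (e.g. AL2_x86_64) apply.
	return nil, nil
}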

Repro steps:

Create a launch template and specify an AMI in it (the customer also specifies userdata and a security group, however this should not be necessary).
Create the EKS cluster using terraform (example attached) using the launch template ID.

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Specify image_id also for the eks_config_v2 to match what is in the launch template (this is a convenience the Rancher UI adds). In my interpretation, the value of image_id doesn't appear to matter, as the launch template takes precedence [3].

Actual behavior:

Cluster fails to provision, causing an AWS API exception

Expected behavior:

Cluster is created as in the Rancher UI

Files, logs, traces:

You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template

[1] https://github.com/rancher/eks-operator/blob/master/controller/nodegroup.go#L242-L248

[2] https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html#AmazonEKS-CreateNodegroup-request-amiType

[3] https://github.com/rancher/eks-operator/blob/master/controller/eks-cluster-config-handler.go#L867

(SURE-5616) Intermittently imports of EKS clusters never finish/or finish with Error

See also rancher/terraform-provider-rancher2#1003

Issue description:

Sometimes importing an EKS cluster finishes in seconds; other times it never completes (saying "Still creating..." for 30 min and then timing out), even though the cluster is active in the Rancher instance.

Business impact:

Not being able to import EKS clusters.

Troubleshooting steps:

Importing EKS cluster to Rancher v2 using this module:
importing-eks-cluster-to-rancherv2-terraform

The code support used:

╰─$ cat import_eks.tf provider.tf rancher2.tf vars.tf
resource "rancher2_cluster" "eks-national-instruments-two" {
  name        = "eks-national-instruments-two"
  description = "eks-national-instruments-two"
  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.aws.id
    name                = var.aws_eks_name
    region              = var.aws_region
    imported            = true
  }
}
terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = ">= 1.21.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.10.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_url
  token_key = var.rancher2_token_key
  bootstrap = false
}

resource "rancher2_cloud_credential" "aws" {
  name = "aws"
  amazonec2_credential_config {
    access_key = var.aws_access_key
    secret_key = var.aws_secret_key
  }
}
variable "rancher2_token_key" {
  default = "token-XXXX:XXXXXX"
}

variable "aws_access_key" {
  default = "my-aws-access-key"
}

variable "aws_secret_key" {
  default = "my-aws-secret-key"
}

variable "rancher_url" {
  default = "https://tadeulatest.support.rancher.space/"
}

variable "aws_eks_name" {
  default = "eks-national-instruments-two"
}

variable "aws_region" {
  default = "sa-east-1"
}

variable "aws_eks_service_role" {
  default = "eks-worker-node"
}

Repro steps:

╰─$ terraform init
╰─$ terraform plan --out=plan
╰─$ terraform apply "plan"

Workaround:

Is a workaround available and implemented? yes

What is the workaround:
Only destroying and applying TF again fixes the issue, and then the import is successful.

Actual behavior:

It took 30 minutes to finish terraform apply "plan" and it ended with an error.

rancher2_cluster.eks-national-instruments-two: Still creating... [29m30s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [29m40s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [29m50s elapsed]
rancher2_cluster.eks-national-instruments-two: Still creating... [30m0s elapsed]
╷
│ Error: [ERROR] waiting for cluster (c-zgnjt) to be created: timeout while waiting for state to become 'pending' (last state: 'active', timeout: 30m0s)
│
│   with rancher2_cluster.eks-national-instruments-two,
│   on import_eks.tf line 1, in resource "rancher2_cluster" "eks-national-instruments-two":
│    1: resource "rancher2_cluster" "eks-national-instruments-two" {
│
╵

Expected behavior:

Import the EKS cluster without timeout and errors.

Additional notes:

$ terraform version
Terraform v1.3.4

New imported EKS cluster is not accessible from Cluster Management tab

SURE-5920

Issue description: Access to a newly created imported EKS downstream cluster from Cluster Management ==> Clusters ==> "Cluster Name" hangs on loading.

Access to: https:///dashboard/c/_/manager/provisioning.cattle.io.cluster/fleet-default/ fails.

Access to: https:///dashboard/c/cluster-id/explorer works.

The web browser dev tools show: "Wait for The cluster to become available. It's possible the cluster was created. We suggest checking the clusters page before trying to create another. done immediately."

Troubleshooting steps:

The EKS DS cluster runs in the same VPC as Rancher. From ☰ > EXPLORE CLUSTER, access to the cluster works fine. The cluster runs fine; some applications have been deployed with Fleet. (CF access-explorecluster-ok)

The access to the cluster from ☰ > Cluster Management is stuck on loading (CF image1-notloading).

The error in the web browser dev tools is: "TypeError: Cannot read properties of null (reading 'filter') at f.showEskNodeGroupWarning" (CF image2-error-clustermanagement)

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Edit the cluster's configuration and perform a minor change; for instance, add a user. The socket connection is then established.

Actual behavior:

EKS cluster is not accessible from Cluster Management.

Expected behavior:

Access to the DS cluster should be working.

Files, logs, traces:

$ curl -s -i -N \
> --http1.1 \
> -H "Connection: Upgrade" \
> -H "Upgrade: websocket" \
> -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
> -H "Sec-WebSocket-Version: 13" \
> -H "Authorization: Bearer $TOKEN" \
> -H "Host: $FQDN" \
> -k https://$fqdn/v3/subscribe
HTTP/1.1 101 Switching Protocols

Date: Tue, 14 Feb 2023 18:09:07 GMT
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Accept: qGEgH3En71di5rrssAZTmtRTyFk= �{"name":"ping","data":{}}�{"name":"ping","data":{}}�{"name":"ping","data":{}}^C

Error in logs while performing Nodegroup addition

Rancher version:

Rancher version: v2.7.7-rc4
eks-operator: v1.2.2-rc3

Cluster Type: Downstream EKS cluster

Describe the bug
Errors appear in the logs while performing a Nodegroup addition. However, the operations complete successfully.

time="2023-08-25T12:28:33Z" level=error msg="Error removing metadata from failure message: message body not formatted as expected"
time="2023-08-25T12:28:33Z" level=error msg="Error recording ekscc [] failure message: resource name may not be empty"

time="2023-08-25T12:28:33Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:28:34Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:29:04Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:29:34Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:30:05Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:30:35Z" level=info msg="waiting for cluster [c-kc8p8] to update nodegroups [test2]"
time="2023-08-25T12:31:06Z" level=info msg="cluster [c-kc8p8] finished updating"

Found while validating rancher/rancher#42496

[ENHANCEMENT] - Add tagging of Volumes when EKS provisions cluster

What kind of request is this (question/bug/enhancement/feature request):
Feature enhancement

Steps to reproduce (least amount of steps as possible):
Set Tag Enforcement on EC2 volume resource, provision EKS cluster with RMC filling out (all) tags as required.

Result:
RMC is not adding the volume tag, so the cluster fails to come up with the following error:

{
  "allowed": false,
  "explicitDeny": true,
  "matchedStatements": {
    "items": [
      {
        "statementId": "ScpEnforceTagsApp",
        "effect": "DENY",
        "principalGroups": {
          "items": []
        },
        "actions": {
          "items": [
            {
              "value": "ec2:RunInstances"
            }
          ]
        },
        "resources": {
          "items": [
            {
              "value": "arn:aws:ec2:::instance/"
            },
            {
              "value": "arn:aws:ec2:::volume/"
            }
          ]
        },

Other details that may be helpful:
It would be useful to get a widget similar to how AWS allows tag application in the launch template somewhere in the RMC 'wizard' for EKS creation. A single webpage module that allows me to create the key:value tag and then asks me what this tag applies to (three fields, two text input, last as a multiselect). This would consolidate the number of times I repeat the same tags throughout the wizard.
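For reference, EC2 launch templates can already carry tag specifications for volumes as well as instances, which is roughly the mechanism this enhancement would need the EKS provisioning flow to populate. A small aws-sdk-go sketch follows; the template name and tags are made-up examples.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)

	// Example tags; a tag-enforcement SCP on ec2:RunInstances is satisfied only
	// if both the instance and its volumes are tagged at launch.
	tags := []*ec2.Tag{{Key: aws.String("Environment"), Value: aws.String("dev")}}

	_, err := svc.CreateLaunchTemplate(&ec2.CreateLaunchTemplateInput{
		LaunchTemplateName: aws.String("example-eks-nodegroup-lt"),
		LaunchTemplateData: &ec2.RequestLaunchTemplateData{
			TagSpecifications: []*ec2.LaunchTemplateTagSpecificationRequest{
				{ResourceType: aws.String("instance"), Tags: tags},
				{ResourceType: aws.String("volume"), Tags: tags},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}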

Environment information

  • Rancher version RMC 2.5.7
  • Installation option (single install/HA): Single

Cluster information

  • Cluster type: Cloud / AWS managed
  • Machine type Ec2 / EKS managed
  • Kubernetes version 1.18

gz#16840

Create a setup for running E2E tests

Create a setup for running E2E tests; this includes the following things:

[Feature] K8s 1.27 support

issue: rancher/rancher#41840

  • k8s deps bump to 0.27.x required.
  • need to use latest 1.5.0 rc tag available for rke
  • other dependencies
github.com/rancher/wrangler v1.1.1-0.20230831050635-df1bd5aae9df
github.com/rancher/lasso v0.0.0-20230830164424-d684fdeb6f29
github.com/rancher/fleet/pkg/apis v0.0.0-20230901075223-437edb7091f5

  • after that, the new changes need to be vendored in rancher

PR's:

(SURE-5259) EKS Provision Failure w/ Rancher2 Terraform Provider

Seems very similar to SURE-4066

Issue description:

Can't provision an EKS downstream cluster using the Rancher2 TF provider. This is the error we see in the Rancher UI on the cluster as well as inside the eks-config-operator pod logs (it is spammed continuously):

time="2022-09-13T23:43:46Z" level=error msg="error syncing 'cattle-global-data/c-ksl6p': handler eks-controller: InvalidParameterException: Launch template details can't be null for Custom ami type node group\n{\n ย RespMetadata: {\n ย  ย StatusCode: 400,\n ย  ย RequestID: \"eca6e2f1-42d5-411f-b2fc-716404d09d13\"\n ย },\n ย Message_: \"Launch template details can't be null for Custom ami type node group\"\n}, requeuing" 

CloudTrail on the AWS backend is reporting this error for the EventType: UpdateNodegroupVersion

Error code

InvalidParameterException

Event Record.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDA4SBL6SADYCHEP64QO",
        "arn": "arn:aws:iam::863380606983:user/srvamr-btcsapid",
        "accountId": "1111111111111",
        "accessKeyId": "XXXXXXXXXXXXXXXXXXXXXXXX",
        "userName": "srvamr-btcsapid"
    },
    "eventTime": "2022-09-02T19:42:50Z",
    "eventSource": "eks.amazonaws.com",
    "eventName": "UpdateNodegroupVersion",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "148.168.40.5",
    "userAgent": "aws-sdk-go/1.36.7 (go1.16.4; linux; amd64)",
    "errorCode": "InvalidParameterException",
    "requestParameters": {
        "nodegroupName": "pdcs-dev1d-harim-v2-090122-ng1",
        "clientRequestToken": "D9CB6CAB-3459-4E09-89F0-DE4CF3BB6CAE",
        "name": "pdcs-dev1d-harim-v2-090122",
        "version": "1.21"
    },
    "responseElements": {
        "message": "Launch template details can't be null for Custom ami type node group"
    },
    "requestID": "9b15e35e-8349-4f9f-9586-62e94e253308",
    "eventID": "6bbdb28c-3b56-4011-90a6-27f791fdf035",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "863380606983",
    "eventCategory": "Management"
}

The payload for UpdateNodegroupVersion is expected to include LaunchTemplate details, which this payload is missing.

see: https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateNodegroupVersion.html

The user is using a Launch Template with a custom AMI. It appears that the problem might be because the payload is adding "version": "1.21" as you can see above. The AWS documentation says: "If you specify launchTemplate, and your launch template uses a custom AMI, then don't specify releaseVersion, or the node group update will fail."

So we are wondering if this is the problem here. If so, why is the version being supplied in the payload when it shouldn't be? The user said that when he manually uses the following payload/method with the AWS SDK then it works fine:

response = eks.update_nodegroup_version(
  clusterName = cluster_name,
  nodegroupName = nodeGroup,
  launchTemplate = {
    'name': launchTemplateName,
  },
  force=True
)
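For comparison, the same call made through aws-sdk-go (the SDK shown in the CloudTrail user agent) would look roughly like the sketch below: the launch template reference is passed and Version/ReleaseVersion are omitted entirely, which is what the quoted AWS documentation requires for launch templates that use a custom AMI. The cluster, node group, and template names are placeholders.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/eks"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := eks.New(sess)

	// For a node group whose launch template carries a custom AMI, pass only
	// the launch template reference and omit Version/ReleaseVersion.
	_, err := svc.UpdateNodegroupVersion(&eks.UpdateNodegroupVersionInput{
		ClusterName:   aws.String("example-cluster"),
		NodegroupName: aws.String("example-nodegroup"),
		LaunchTemplate: &eks.LaunchTemplateSpecification{
			Name: aws.String("example-launch-template"),
		},
		Force: aws.Bool(true),
	})
	if err != nil {
		log.Fatal(err)
	}
}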

The user said this was working fine 3 weeks ago. Did something change with the AWS SDK or how we handle it? We did notice this, but we are not sure if it is related:
#72

When we looked at the user's AWS console, the EKS cluster was healthy, the Node Groups were created and healthy as well.

Business impact:

This is a blocker for them as they are not able to get their automation to work

Troubleshooting steps:

I tried to reproduce the problem in-house. I can get a vanilla EKS cluster to work fine. I am currently trying to test it with Launch Templates and a custom AMI like the user, but I am running into some other issues (AWS permissions which I'm working to get resolved). Once my permissions in AWS get fixed, I'm hoping I can reproduce the problem.

Workaround:

Is a workaround available and implemented? yes/no
What is the workaround:

Actual behavior:

EKS cluster with Launch Template and custom AMI is not provisioned successfully

Expected behavior:

EKS cluster with Launch Template and custom AMI should be provisioned successfully

You cannot specify an AMI Type other than CUSTOM error when creating EKS cluster

Rancher Version: v2.7.2
EKS Cluster Version: 1.24

When attempting to create (either within Rancher or using the Terraform rancher provider) an EKS version 1.24 cluster using a launch template in which an AMI is defined, I am getting the error below. When using a launch template where an AMI is not defined, the launch template works correctly and the node group gets created.

Waiting for API to be available:controller.FailureMessage{ClusterName:"rm-eks-03", Message_:"You cannot specify an AMI Type other than CUSTOM, when specifying an image id in your launch template.", NodegroupName:"rancher_archbox_node_group_amz_linux"}

Using the same launch template version I was able to create a node group using the AWS console (outside of Rancher) but with Rancher I got the error above. This leads me to believe that there is a bug within the EKS operator.

The launch template that is causing this error is.

{
    "LaunchTemplateVersions": [
        {
            "LaunchTemplateId": "lt-XXXXXXXXXXXXXXXXXX",
            "LaunchTemplateName": "rancher_archbox_node_group_amz_linux_lt",
            "VersionNumber": 7,
            "CreateTime": "2023-05-01T17:01:22.000Z",
            "CreatedBy": "XXXXXXXXXXXXXXXXXXXX",
            "DefaultVersion": false,
            "LaunchTemplateData": {
                "BlockDeviceMappings": [
                    {
                        "DeviceName": "/dev/xvda",
                        "Ebs": {
                            "VolumeSize": 200,
                            "VolumeType": "gp2"
                        }
                    }
                ],
                "ImageId": "ami-07bccaac087171156",
                "InstanceType": "t3.xlarge",
                "KeyName": "rancher-managed-cluster-ssh-key",
                "UserData": "TUlNRS1WZXJzaW9uOiAxLjAKQ2XXXXXXXXXXXXXXXX1ZQk9VTkRBUlk9PSIKCi0tPT1NWUJPVU5EQVJZPT0KQ29udGVudC1UeXBlOiB0ZXh0L3gtc2hlbGxzY3JpcHQ7IGNoYXJzZXQ9InVzLWFzY2lpIgoKIyEvYmluL2Jhc2gKL2V0Yy9la3MvYm9vdHN0cmFwLnNoIC0tYXBpc2VydmVyLWVuZHBvaW50ICdodHRwczovL0VDMUU5NEREODVEREI4OTc1RjFCOEQ0NEM3NkU2NDA0LmdyNy51cy13ZXN0LTIuZWtzLmFtYXpvbmF3cy5jb20nIC0tYjY0LWNsdXN0ZXItY2EgJ0xTMHRMUzFDUlVkSlRpQkRSVkpVU1VaSlEwRlVSUzB0TFMwdENrMUpTVU12YWtORFFXVmhaMEYzU1VKQlowbENRVVJCVGtKbmEzRm9hMmxIT1hjd1FrRlJjMFpCUkVGV1RWSk5kMFZSV1VSV1VWRkVSWGR3Y21SWFNtd0tZMjAxYkdSSFZucE5RalJZUkZSSmVrMUVVWGxOVkVVeVRWUkJkMDVHYjXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDFjMG8xVkN0bFV6QXJNMlkzYzBsbmQzTmFOMVJDVWxVMFdubEtTREJqUTJFM2VtUmFOa281VGtjd1dEaHdTa3BsQ2pWRWRHbHNPR1pOUjBjdk1YQllSbEJLY1c4dk1HZFZVVlZuWm5oWlZqSXdOWEF2VkRZMFdsZGFiMU1yVm10RWFUVkZWbGxMVlRkSFN6WXdPSFV3TlVnS1ZVRlRXVmd6YlhKMk1ub3lNVUZGY0hCR1dHNUtRbWxzTmpWQlRtNVVUbUpxWm5CU2VsZFNUSGhQU1V4R1ZEUndLMFJsV1hneFR5OVNRVTVXXXXXXXXXXXXXXXXXXXXXXXXY2tWWVVrdFhXVkpoY1hSVk9VdFFjQ3RvY3pNM1pFUXJSMFEyYUZKRVYzaFVVRnBHUzFNeFdTOTJDa0p2VEM5VmNYVjFOMUJ5YUZsNWNWQm1WRTFGTm1aa2RubG1WMFpxYm5saFVUQktkall3TlRsVWExZG9Xa2REVVhoMmNqSk1PSGxCVnpOWmVXUjJWa3dLUXpGUlBRb3RMUzB0TFVWT1JDQkRSVkpVU1VaSlEwRlVSUzB0TFMwdENnPT0nICdybS1la3MtMDEnCgotLT09TVlCT1VOREFSWT09LS1c",
                "TagSpecifications": [
                    {
                        "ResourceType": "instance",
                        "Tags": [
                            {
                                "Key": "Service",
                                "Value": "rancher"
                            },
                            {
                                "Key": "Managed",
                                "Value": "TF"
                            },
                            {
                                "Key": "Name",
                                "Value": "rancher_archbox_node_group_amz_linux_lt"
                            },
                            {
                                "Key": "Environment",
                                "Value": "dev"
                            },
                            {
                                "Key": "Owner",
                                "Value": "xxxxxxxxxxxx"
                            }
                        ]
                    }
                ],
                "SecurityGroupIds": [
                    "sg-XXXXXXXXXXXXXXXXX"
                ]
            }
        }
    ]
}

The AMI ami-07bccaac087171156 is amazon-eks-node-1.24-v20230411.

My assumption is that when Rancher makes the API call to AWS, it is setting the nodegroup AMI type to something other than CUSTOM, https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType. When I don't specify an AMI in the launch template, it looks like Rancher sets the AMI type to AL2_x86_64.

Hosted Rancher migration rollback process not working - "Cluster agent is not connected"

SURE-6154

Issue description:

Rancher Hosted Prime currently offers customers the ability to migrate their existing self-hosted Rancher Manager to our hosted environment. The Backup operator is used to take a customer backup and restore it into a new hosted environment. During this process, the server name needs to be updated to a "rancher.cloud" URL and all downstream clusters need to connect to the new URL. In the event something goes wrong, we need the ability to roll back any changes made to the downstream clusters so that they point back to the customer's self-managed Rancher Manager. This rollback process, which worked in older versions (v2.5.x and earlier), seems to no longer work in v2.6.9.

Business impact:

High risk when migrating customers to Hosted; the possibility that there is no rollback may deter or prevent customers from migrating to Hosted.

Troubleshooting steps:

In addition to trying to run the "kubectl apply -f ..." command to reconfigure the cattle-cluster-agent, we also tried doing a "kubectl delete -f ..." as well as manually removing the cattle-system, cattle-fleet-system, and cattle-impersonation-system namespaces. Tried restarting the rancher and cluster agent pods. Turned on debug logging.

Repro steps:

  1. Set up a new environment (v2.6.9) or use an existing one with a provisioned downstream EKS cluster. Used v1.22.17.
  2. Take a backup of the environment using the Rancher Backup operator (install it if it hasn't already been installed).
  3. Set up a second Rancher Manager environment running 2.6.9; this should have a different server name URL. Don't run Rancher (yet).
  4. Restore the backup taken from the first environment.
  5. Update the server-name setting to the name of your second environment.
  6. Install Rancher Manager on the second environment.
  7. Run the cluster registration command on the EKS cluster obtained from the second environment's UI. This should reconfigure the downstream cluster to talk to the second environment instead of the first environment. The first environment should show that the cluster agent is not connected and the second environment should show the cluster as healthy.
  8. Rollback - Run the cluster registration command on the EKS cluster obtained from the first environment's UI. This should reconfigure the cluster to point back to the original Rancher Manager environment. However, this does not appear to work and the cluster remains in a disconnected state.

Workaround:

Is workararound available and implemented? unknown
What is the workaround:

Actual behavior:

After attempting rollback, the original Rancher Manager environment shows "Cluster agent is not connected" error for the downstream cluster:

Expected behavior:
After running the kubectl apply -f to reconfigure the cattle-cluster-agent, we expect the agent to reconnect to the original Rancher Manager server and not show any errors. If a new process is needed to roll back a cluster, we need to know what that new process is.

Add support for AWS EKS custom vpc cni due to IP Exhaustion

Is your feature request related to a problem? Please describe.

There is a well-known issue with many clusters or large clusters utilizing AWS EKS as a downstream cluster, where every node adds many secondary routable IP addresses, and before you know it you have exhausted your VPC routable subnets.

There are a few different options that AWS offers and one of those options is "custom networking":
https://aws.github.io/aws-eks-best-practices/networking/custom-networking/

This uses a set of subnets in non-routable CG-NAT space (i.e. 100.64.0.0/10). It is NOT applied to the EKS cluster itself, but rather is an aws-node configuration that has to be updated at provisioning time in order to avoid post-provision node rotation.

The request is to add support for the VPC Custom CNI.

Describe the solution you'd like

The request is to add support for creating EKS clusters through the Rancher provider using the EKS cluster add-ons, as shown in this example from the AWS Terraform provider:
https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

# Excerpt - in the referenced example these arguments are set on the EKS module.
cluster_addons = {
  coredns    = {}
  kube-proxy = {}
  vpc-cni = {
    # Specify the VPC CNI addon should be deployed before compute to ensure
    # the addon is configured before data plane compute resources are created
    # See README for further details
    before_compute = true
    most_recent    = true # To ensure access to the latest settings provided
    configuration_values = jsonencode({
      env = {
        # Reference https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking
        AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
        ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

        # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
        ENABLE_PREFIX_DELEGATION = "true"
        WARM_PREFIX_TARGET       = "1"
      }
    })
  }
}

resource "kubectl_manifest" "eni_config" {
  for_each = zipmap(local.azs, slice(module.vpc.private_subnets, 3, 6))

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata = {
      name = each.key
    }
    spec = {
      securityGroups = [
        module.eks.cluster_primary_security_group_id,
        module.eks.node_security_group_id,
      ]
      subnet = each.value
    }
  })
}

As you will notice, the subnets are NOT passed in at cluster creation time in the above AWS example (the private subnets list is sliced and the non-routable subnets are applied via ENIConfig objects as a post-cluster step), so using the existing Rancher eks_config_v2 will not work as-is without some changes to it.

Describe alternatives you've considered

I have done a manual POC of the steps described in this Terraform and I am currently considering implementing the code outlined in this blog:
https://medium.com/webstep/dont-let-your-eks-clusters-eat-up-all-your-ip-addresses-1519614e9daa

Since this works in a manual setup, I believe it will work as a "post-processing" set of Terraform; however, it is really inefficient to have to "roll" every node at the end of provisioning in order for this to work (I confirmed this is needed at the end of my manual POC).
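
A minimal sketch of those manual custom-networking steps, assuming placeholder subnet/security-group IDs and the us-east-2a zone name; it mirrors the AWS guidance rather than anything Rancher does today:

# Enable custom networking on the VPC CNI and tell it how to find ENIConfigs.
kubectl -n kube-system set env daemonset aws-node AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl -n kube-system set env daemonset aws-node ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# One ENIConfig per availability zone, pointing at a non-routable (100.64.0.0/10) subnet.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-2a
spec:
  securityGroups:
    - sg-0123456789abcdef0            # placeholder node security group
  subnet: subnet-0123456789abcdef0    # placeholder non-routable subnet
EOF

# Existing nodes then have to be recycled ("rolled") so new pods pick up IPs
# from the non-routable subnets - the inefficiency called out above.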

Additional context

I have met with AWS regarding this as well. Using this custom-networking solution was one of the easier recommended options, and they provided the link to their Terraform example as the source.

Cluster not editable while nodegroup version upgrade is in progress

Rancher version:

v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
eks-operator:v1.3.0-rc4

Describe the bug
Cluster not editable while sequential nodegroup version upgrade is in progress
Found while validating #209

Steps

  • Provision an EKS cluster with a version lower than the highest supported one (e.g. 1.26), with two node groups: test1 & test2
  • Upgrade the EKS cluster to a higher version (at this step, only the control plane is upgraded) and the UI displays - "A new cluster version has been selected. Once completed you may come back and upgrade the node version."
  • Of the two node groups, upgrade only test1 to the control-plane version
  • The cluster goes into the Updating state and node group information is not visible/editable (the cluster YAML at this point is included below; a kubectl sketch for inspecting this state follows the list)
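
A hedged sketch for inspecting the node group versions directly on the management Cluster object while the UI reports the cluster as Updating (the cluster name is taken from the YAML below):

kubectl get clusters.management.cattle.io c-sp94j \
  -o jsonpath='{range .spec.eksConfig.nodeGroups[*]}{.nodegroupName}{"\t"}{.version}{"\n"}{end}'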

Screenshots
(screenshots omitted)

Cluster YAML:

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    authz.management.cattle.io/initial-sync: 'true'
    clusters.management.cattle.io/ke-last-refresh: '1697814813'
    field.cattle.io/creatorId: user-blvw4
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: 1.27.6-eks-f8587cb
  creationTimestamp: '2023-10-20T11:41:55Z'
  finalizers:
    - wrangler.cattle.io/mgmt-cluster-remove
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
  generateName: c-
  generation: 171
  labels:
    cattle.io/creator: norman
    provider.cattle.io: eks
  managedFields:
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:field.cattle.io/creatorId: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cattle.io/creator: {}
        f:spec:
          .: {}
          f:displayName: {}
          f:dockerRootDir: {}
          f:eksConfig:
            .: {}
            f:amazonCredentialSecret: {}
            f:displayName: {}
            f:imported: {}
            f:kmsKey: {}
            f:kubernetesVersion: {}
            f:loggingTypes: {}
            f:privateAccess: {}
            f:publicAccess: {}
            f:region: {}
            f:secretsEncryption: {}
            f:securityGroups: {}
            f:serviceRole: {}
            f:subnets: {}
            f:tags: {}
          f:enableClusterAlerting: {}
          f:enableClusterMonitoring: {}
          f:enableNetworkPolicy: {}
          f:internal: {}
          f:windowsPreferedCluster: {}
        f:status:
          .: {}
          f:appliedEnableNetworkPolicy: {}
      manager: Go-http-client
      operation: Update
      time: '2023-10-20T15:11:26Z'
    - apiVersion: management.cattle.io/v3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:authz.management.cattle.io/creator-role-bindings: {}
            f:authz.management.cattle.io/initial-sync: {}
            f:clusters.management.cattle.io/ke-last-refresh: {}
            f:lifecycle.cattle.io/create.cluster-agent-controller-cleanup: {}
            f:lifecycle.cattle.io/create.cluster-provisioner-controller: {}
            f:lifecycle.cattle.io/create.cluster-scoped-gc: {}
            f:lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: {}
            f:management.cattle.io/current-cluster-controllers-version: {}
          f:finalizers:
            .: {}
            v:"controller.cattle.io/cluster-agent-controller-cleanup": {}
            v:"controller.cattle.io/cluster-provisioner-controller": {}
            v:"controller.cattle.io/cluster-scoped-gc": {}
            v:"controller.cattle.io/mgmt-cluster-rbac-remove": {}
            v:"wrangler.cattle.io/mgmt-cluster-remove": {}
          f:labels:
            f:provider.cattle.io: {}
        f:spec:
          f:agentImageOverride: {}
          f:answers: {}
          f:clusterSecrets: {}
          f:description: {}
          f:desiredAgentImage: {}
          f:desiredAuthImage: {}
          f:eksConfig:
            f:ebsCSIDriver: {}
            f:nodeGroups: {}
            f:publicAccessSources: {}
          f:fleetWorkspaceName: {}
          f:localClusterAuthEndpoint:
            .: {}
            f:enabled: {}
        f:status:
          f:agentFeatures:
            .: {}
            f:embedded-cluster-api: {}
            f:fleet: {}
            f:monitoringv1: {}
            f:multi-cluster-management: {}
            f:multi-cluster-management-agent: {}
            f:provisioningv2: {}
            f:rke2: {}
          f:agentImage: {}
          f:aksStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:rbacEnabled: {}
            f:upstreamSpec: {}
          f:allocatable:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:apiEndpoint: {}
          f:appliedAgentEnvVars: {}
          f:appliedPodSecurityPolicyTemplateId: {}
          f:appliedSpec:
            .: {}
            f:agentImageOverride: {}
            f:answers: {}
            f:clusterSecrets: {}
            f:description: {}
            f:desiredAgentImage: {}
            f:desiredAuthImage: {}
            f:displayName: {}
            f:eksConfig:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:enableClusterAlerting: {}
            f:enableClusterMonitoring: {}
            f:enableNetworkPolicy: {}
            f:internal: {}
            f:localClusterAuthEndpoint:
              .: {}
              f:enabled: {}
            f:windowsPreferedCluster: {}
          f:authImage: {}
          f:caCert: {}
          f:capabilities:
            .: {}
            f:loadBalancerCapabilities: {}
          f:capacity:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:conditions: {}
          f:driver: {}
          f:eksStatus:
            .: {}
            f:generatedNodeRole: {}
            f:managedLaunchTemplateID: {}
            f:managedLaunchTemplateVersions:
              .: {}
              f:test1: {}
              f:test2: {}
            f:privateRequiresTunnel: {}
            f:securityGroups: {}
            f:subnets: {}
            f:upstreamSpec:
              .: {}
              f:amazonCredentialSecret: {}
              f:displayName: {}
              f:ebsCSIDriver: {}
              f:imported: {}
              f:kmsKey: {}
              f:kubernetesVersion: {}
              f:loggingTypes: {}
              f:nodeGroups: {}
              f:privateAccess: {}
              f:publicAccess: {}
              f:publicAccessSources: {}
              f:region: {}
              f:secretsEncryption: {}
              f:securityGroups: {}
              f:serviceRole: {}
              f:subnets: {}
              f:tags: {}
            f:virtualNetwork: {}
          f:gkeStatus:
            .: {}
            f:privateRequiresTunnel: {}
            f:upstreamSpec: {}
          f:limits:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:linuxWorkerCount: {}
          f:nodeCount: {}
          f:provider: {}
          f:requested:
            .: {}
            f:cpu: {}
            f:memory: {}
            f:pods: {}
          f:serviceAccountTokenSecret: {}
          f:version:
            .: {}
            f:buildDate: {}
            f:compiler: {}
            f:gitCommit: {}
            f:gitTreeState: {}
            f:gitVersion: {}
            f:goVersion: {}
            f:major: {}
            f:minor: {}
            f:platform: {}
      manager: rancher
      operation: Update
      time: '2023-10-20T15:14:07Z'
  name: c-sp94j
  resourceVersion: '312219'
  uid: abd5460d-dfe1-438f-a865-69b740f6eeef
spec:
  agentImageOverride: ''
  answers: {}
  clusterSecrets: {}
  description: ''
  desiredAgentImage: ''
  desiredAuthImage: ''
  displayName: cpinjani-eks28
  dockerRootDir: /var/lib/docker
  eksConfig:
    amazonCredentialSecret: cattle-global-data:cc-rhnmv
    displayName: cpinjani-eks28
    ebsCSIDriver: null
    imported: false
    kmsKey: ''
    kubernetesVersion: '1.27'
    loggingTypes: []
    nodeGroups:
      - desiredSize: 1
        diskSize: 40
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.large
        labels: {}
        launchTemplate: null
        maxSize: 1
        minSize: 1
        nodeRole: >-
          arn:aws:iam::<REDACTED>
        nodegroupName: test1
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - <REDACTED>
        tags: {}
        userData: null
        version: null
      - desiredSize: 1
        diskSize: 20
        ec2SshKey: null
        gpu: false
        imageId: null
        instanceType: t3.medium
        labels: {}
        launchTemplate: null
        maxSize: 1
        minSize: 1
        nodeRole: >-
          arn:aws:iam::<REDACTED>
        nodegroupName: test2
        requestSpotInstances: false
        resourceTags: {}
        spotInstanceTypes: null
        subnets:
          - <REDACTED>
        tags: {}
        userData: null
        version: '1.26'
    privateAccess: false
    publicAccess: true
    publicAccessSources:
      - 0.0.0.0/0
    region: us-east-2
    secretsEncryption: false
    securityGroups: []
    serviceRole: ''
    subnets: []
    tags: {}
  enableClusterAlerting: false
  enableClusterMonitoring: false
  enableNetworkPolicy: false
  fleetWorkspaceName: fleet-default
  internal: false
  localClusterAuthEndpoint:
    enabled: false
  windowsPreferedCluster: false
status:
  agentFeatures:
    embedded-cluster-api: false
    fleet: false
    monitoringv1: false
    multi-cluster-management: false
    multi-cluster-management-agent: true
    provisioningv2: false
    rke2: false
  agentImage: rancher/rancher-agent:v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
  aksStatus:
    privateRequiresTunnel: null
    rbacEnabled: null
    upstreamSpec: null
  allocatable:
    cpu: 7720m
    memory: 25184044Ki
    pods: '122'
  apiEndpoint: <REDACTED>
  appliedAgentEnvVars:
    - name: CATTLE_SERVER_VERSION
      value: v2.8-136e9ccd054866dd0f504abfa592dc2b519179c9-head
    - name: CATTLE_INSTALL_UUID
      value: f2e99488-620a-4edb-ac35-aa0d5247798f
    - name: CATTLE_INGRESS_IP_DOMAIN
      value: sslip.io
  appliedEnableNetworkPolicy: false
  appliedPodSecurityPolicyTemplateId: ''
  appliedSpec:
    agentImageOverride: ''
    answers: {}
    clusterSecrets: {}
    description: ''
    desiredAgentImage: ''
    desiredAuthImage: ''
    displayName: ''
    eksConfig:
      amazonCredentialSecret: cattle-global-data:cc-rhnmv
      displayName: cpinjani-eks28
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.27'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 40
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.large
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test1
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets:
            - <REDACTED>
          tags: {}
          userData: ''
          version: '1.27'
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: ''
          gpu: false
          imageId: ''
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test2
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: []
          subnets:
            - <REDACTED>
          tags: {}
          userData: ''
          version: '1.26'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: us-east-2
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    enableClusterAlerting: false
    enableClusterMonitoring: false
    enableNetworkPolicy: null
    internal: false
    localClusterAuthEndpoint:
      enabled: false
    windowsPreferedCluster: false
  authImage: ''
  caCert: >-
    <REDACTED>
  capabilities:
    loadBalancerCapabilities: {}
  capacity:
    cpu: '8'
    memory: 28012332Ki
    pods: '122'
  conditions:
    - lastUpdateTime: ''
      status: 'True'
      type: Pending
    - lastUpdateTime: '2023-10-20T11:51:13Z'
      status: 'True'
      type: Provisioned
    - lastUpdateTime: '2023-10-20T11:54:57Z'
      status: 'True'
      type: Waiting
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: BackingNamespaceCreated
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: DefaultProjectCreated
    - lastUpdateTime: '2023-10-20T11:41:55Z'
      status: 'True'
      type: SystemProjectCreated
    - lastUpdateTime: '2023-10-20T11:41:56Z'
      status: 'True'
      type: InitialRolesPopulated
    - lastUpdateTime: '2023-10-20T11:41:56Z'
      status: 'True'
      type: CreatorMadeOwner
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: NoDiskPressure
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: NoMemoryPressure
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: SecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: ServiceAccountSecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: RKESecretsMigrated
    - lastUpdateTime: '2023-10-20T11:41:57Z'
      status: 'True'
      type: ACISecretsMigrated
    - lastUpdateTime: '2023-10-20T14:16:32Z'
      status: 'True'
      type: Connected
    - lastUpdateTime: '2023-10-20T15:13:34Z'
      status: Unknown
      type: Updated
    - lastUpdateTime: '2023-10-20T14:16:37Z'
      status: 'True'
      type: Ready
    - lastUpdateTime: '2023-10-20T11:53:57Z'
      status: 'True'
      type: GlobalAdminsSynced
    - lastUpdateTime: '2023-10-20T11:53:58Z'
      status: 'True'
      type: SystemAccountCreated
    - lastUpdateTime: '2023-10-20T11:54:00Z'
      status: 'True'
      type: AgentDeployed
  driver: EKS
  eksStatus:
    generatedNodeRole: >-
      arn:aws:iam::<REDACTED>
    managedLaunchTemplateID: <REDACTED>
    managedLaunchTemplateVersions:
      test1: '4'
      test2: '5'
    privateRequiresTunnel: null
    securityGroups: null
    subnets:
      - <REDACTED>
    upstreamSpec:
      amazonCredentialSecret: cattle-global-data:cc-rhnmv
      displayName: cpinjani-eks28
      ebsCSIDriver: null
      imported: false
      kmsKey: ''
      kubernetesVersion: '1.27'
      loggingTypes: []
      nodeGroups:
        - desiredSize: 1
          diskSize: 40
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.large
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test1
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - <REDACTED>
          tags: {}
          userData: null
          version: null
        - desiredSize: 1
          diskSize: 20
          ec2SshKey: null
          gpu: false
          imageId: null
          instanceType: t3.medium
          labels: {}
          launchTemplate: null
          maxSize: 1
          minSize: 1
          nodeRole: >-
            arn:aws:iam::<REDACTED>
          nodegroupName: test2
          requestSpotInstances: false
          resourceTags: {}
          spotInstanceTypes: null
          subnets:
            - <REDACTED>
          tags: {}
          userData: null
          version: '1.26'
      privateAccess: false
      publicAccess: true
      publicAccessSources:
        - 0.0.0.0/0
      region: us-east-2
      secretsEncryption: false
      securityGroups: []
      serviceRole: ''
      subnets: []
      tags: {}
    virtualNetwork: <REDACTED>
  gkeStatus:
    privateRequiresTunnel: null
    upstreamSpec: null
  limits:
    cpu: '0'
    memory: 340Mi
    pods: '0'
  linuxWorkerCount: 4
  nodeCount: 4
  provider: eks
  requested:
    cpu: 700m
    memory: 140Mi
    pods: '14'
  serviceAccountTokenSecret: cluster-serviceaccounttoken-jjr5t
  version:
    buildDate: '2023-09-18T22:12:19Z'
    compiler: gc
    gitCommit: b6911bf9eade7d8ca7dd82af5e80626965829947
    gitTreeState: clean
    gitVersion: v1.27.6-eks-f8587cb
    goVersion: go1.20.8
    major: '1'
    minor: 27+
    platform: linux/amd64

Sequential upgrade of multiple node groups version not completing successfully

Rancher version:

2.7-head - 95f0b50
eks-operator:v1.2.2-rc3

Installation option: Docker
Proxy/Cert Details: Self-signed

Information about the Cluster
Kubernetes version: 1.25
Cluster Type: Downstream EKS cluster

User Information
What is the role of the user logged in? Standard user

Describe the bug
Sequential upgrade of multiple node group versions does not complete successfully. A simultaneous upgrade of the node groups completes successfully.

Steps

  1. Provision an EKS cluster with a version lower than the highest supported one, with more than one node group
  2. Upgrade the EKS cluster to a higher version (at this step, only the control plane is upgraded) and the UI displays - "A new cluster version has been selected. Once completed you may come back and upgrade the node version."
  3. Of the two node groups, upgrade only one to the control-plane version
  4. Now upgrade the remaining node group to the control-plane version using the Rancher UI (a CLI check of the resulting AWS state is sketched below)
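
A hedged AWS CLI check for what AWS actually reports after step 4 (cluster and node group names are placeholders):

aws eks describe-nodegroup \
  --cluster-name <cluster-name> \
  --nodegroup-name <second-nodegroup> \
  --query 'nodegroup.[version,status]' \
  --output text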

Expected Result
Sequential upgrades of multiple node group versions must complete successfully.

Screenshots
After step 4, below is the cluster state on AWS:
(screenshot omitted)

Rancher shows the operation as completed, but the other node group is still on the lower version and Cluster Edit still shows an upgrade available.
(screenshot omitted)

cc: @rancher/highlander

PRs

(SURE-4552) Ability to configure agent deployments per cluster

Ability to configure agent deployments per cluster

See also rancher/rancher#41035

Business case

AWS - EKS clusters utilized two node groups. One of the node groups leveraged EC2 on-demand instances and the other spot instances (within auto-scaling groups). The on-demand node group's nodes were assigned a custom taint (i.e., platform/node-lifecycle=NoSchedule) to ensure that only specific application workloads were scheduled on those particular nodes. Alternatively, for the more ephemeral / least critical application workloads, the preference was to schedule them onto the spot instances / node groups. Therefore, with the default configuration, when the cattle-cluster-agent pods were scheduled they were always assigned to spot instances because of the custom taint above, causing various flavors of instability in the Rancher Manager UI for that cluster. https://rancher.com/docs/rancher/v2.5/en/cluster-provisioning/rke-clusters/rancher-agents/#scheduling-rules

The solution would be to provide a method for setting custom parameters to override cattle-cluster-agent and fleet-agent settings at a per-cluster level, or when deploying Rancher via any of the supported deployment options (i.e., Helm). We are focusing on the per-cluster configuration but should acknowledge the original ask.

There are two major asks for configuring agent deployments:

  • Setting tolerations and affinity rules (Rancher Federal)
  • Setting resource limits

As Rancher does not have a unified cluster interface for every type of cluster, consideration needs to be given to how we present these options for each distribution, and we may wish to do a phased rollout starting with the distributions that the customers requesting this functionality use.

We should also add RKE1 to any MVP as well.

Original Requests:

Provide a taint directly on the target "on-demand" nodes of "cattle.io/cluster-agent=true:NoSchedule", given that there is already an existing toleration for that scenario. Outside of patching the cattle-cluster-agent deployment directly, or leveraging GitOps solutions to synchronize diffs from an underlying git repo (i.e., via kustomize/k8s manifests), there was no alternative solution we could think of that would guarantee a subsequent process (i.e., a Rancher upgrade) wouldn't override said patch somewhere down the line and blow away the updates.

The cattle-cluster-agent gets deployed where there are not sufficient resources, so the customer wants to define resource limits. They also mention that it is good Kubernetes practice to always operate with resource limits.

Resource limits CAN be set on cattle-cluster-agent using a Kubernetes patch, as Rancher Federal does, but it is cumbersome and should be settable in the Rancher UI. A sketch of that kind of manual patch is shown below.
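
A minimal sketch of the manual patch described above as cumbersome, assuming the taint from the business case; a Rancher upgrade may later overwrite the change, which is exactly the fragility the request calls out:

# Strategic merge patch: note this REPLACES the deployment's existing tolerations list.
kubectl -n cattle-system patch deployment cattle-cluster-agent --patch '
spec:
  template:
    spec:
      tolerations:
        - key: "platform/node-lifecycle"
          operator: "Exists"
          effect: "NoSchedule"
'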

Panic while sync of node group with status - Create failed

Rancher Server Setup
Rancher version: 2.7-head - fbca7c3
Installation option: Helm Chart
If Helm Chart, Kubernetes Cluster and version: EKS 1.25

Information about the Cluster
Kubernetes version: EKS with Kubernetes 1.25
Cluster Type (Downstream): Hosted = EKS

User Information
What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) - Admin

Describe the bug
Rancher crashes while syncing a node group whose AWS status is "Create failed".

To Reproduce

  • Add valid AWS cloud credentials
  • Provision a downstream EKS cluster with a node group that uses a Rancher-created launch template for the nodes and the t3.large instance type
  • On the AWS console, add a new node group whose nodes use the above launch template's default version 1 (not the latest version); an equivalent AWS CLI call is sketched after these steps
  • Node group creation fails on AWS, and on the next sync Rancher crashes
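
A hedged AWS CLI equivalent of the console step above (all values are placeholders); the node group is created from version 1 of the Rancher-managed launch template rather than its latest version:

aws eks create-nodegroup \
  --cluster-name <cluster-name> \
  --nodegroup-name extra-ng \
  --node-role arn:aws:iam::<account-id>:role/<node-instance-role> \
  --subnets subnet-0123456789abcdef0 \
  --launch-template id=<rancher-managed-launch-template-id>,version=1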

Logs

2023-05-29T10:39:13.278461607Z 2023/05/29 10:39:13 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:39:14.330149095Z 2023/05/29 10:39:14 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:44:14.278471912Z 2023/05/29 10:44:14 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:44:15.346803789Z 2023/05/29 10:44:15 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:49:15.279370474Z 2023/05/29 10:49:15 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:49:16.355974571Z 2023/05/29 10:49:16 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:54:16.278451114Z 2023/05/29 10:54:16 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:54:17.323399780Z 2023/05/29 10:54:17 [INFO] cluster [c-mc9lr] matches upstream, skipping spec sync
2023-05-29T10:59:17.271275337Z 2023/05/29 10:59:17 [INFO] checking cluster [c-mc9lr] upstream state for changes
2023-05-29T10:59:18.826879608Z E0529 10:59:18.826702      33 runtime.go:79] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
2023-05-29T10:59:18.826923005Z goroutine 5355 [running]:
2023-05-29T10:59:18.826929941Z k8s.io/apimachinery/pkg/util/runtime.logPanic({0x428e3a0?, 0xc0074b2dc8})
2023-05-29T10:59:18.826934975Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
2023-05-29T10:59:18.826939771Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00deb6a90?})
2023-05-29T10:59:18.826944525Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
2023-05-29T10:59:18.826948931Z panic({0x428e3a0, 0xc0074b2dc8})
2023-05-29T10:59:18.826954038Z  /usr/lib64/go/1.19/src/runtime/panic.go:884 +0x212
2023-05-29T10:59:18.826960880Z github.com/rancher/eks-operator/controller.BuildUpstreamClusterState({0xc00a4a4ee0, 0xc}, {0xc01982d488, 0x14}, 0xc011179618, {0xc0117c36d0, 0x2, 0xc00a4a4f79?}, {0x4fa5c80, 0xc00b9ad908}, ...)
2023-05-29T10:59:18.826966881Z  /go/pkg/mod/github.com/rancher/[email protected]/controller/eks-cluster-config-handler.go:861 +0x1729
2023-05-29T10:59:18.826972822Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.BuildEKSUpstreamSpec({0x4f9ccf8?, 0xc00a145f80?}, 0xc0034eca80)
2023-05-29T10:59:18.826978467Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/eks_upstream_spec.go:83 +0x525
2023-05-29T10:59:18.826987235Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.getComparableUpstreamSpec({0x4f9ccf8, 0xc00a145f80}, {0x7fa430e02c60, 0xc00019f730}, 0xc0034eca80)
2023-05-29T10:59:18.826992305Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:288 +0xec
2023-05-29T10:59:18.826996827Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).refreshClusterUpstreamSpec(0xc005a07220, 0xc0034eca80, {0x45b2d0b, 0x3})
2023-05-29T10:59:18.827001264Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:155 +0xd1
2023-05-29T10:59:18.827005685Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).onClusterChange(0xc005a07220, {0xc00fb45e60, 0x7}, 0xc0034eca80)
2023-05-29T10:59:18.827010308Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:79 +0x165
2023-05-29T10:59:18.827014755Z github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3.FromClusterHandlerToHandler.func1({0xc00fb45e60?, 0x0?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.827019430Z  /go/src/github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3/cluster.go:105 +0x44
2023-05-29T10:59:18.827026065Z github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x4514280?, {0xc00fb45e60?, 0x4626c7d?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.827030854Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x38
2023-05-29T10:59:18.827035753Z github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000d6e3c0, {0xc00fb45e60, 0x7}, {0x4f75a68, 0xc00a382a80})
2023-05-29T10:59:18.827040398Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x23f
2023-05-29T10:59:18.827045367Z github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000e4c630, {0xc00fb45e60, 0x7})
2023-05-29T10:59:18.827050000Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:232 +0x93
2023-05-29T10:59:18.827070669Z github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000e4c630, {0x39a6ce0?, 0xc00deb6a90?})
2023-05-29T10:59:18.827075678Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:213 +0x105
2023-05-29T10:59:18.827081198Z github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000e4c630)
2023-05-29T10:59:18.827085961Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:190 +0x46
2023-05-29T10:59:18.827090760Z github.com/rancher/lasso/pkg/controller.(*controller).runWorker(0xc0043ef6a0?)
2023-05-29T10:59:18.827095466Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:179 +0x25
2023-05-29T10:59:18.827100555Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0a28326e?)
2023-05-29T10:59:18.827105380Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
2023-05-29T10:59:18.827110588Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x38737574617473?, {0x4f67880, 0xc00426c210}, 0x1, 0xc001339b00)
2023-05-29T10:59:18.827115298Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
2023-05-29T10:59:18.827133385Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xa646e756f662073?, 0x3b9aca00, 0x0, 0x54?, 0x1422001006a3d19b?)
2023-05-29T10:59:18.827139588Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
2023-05-29T10:59:18.827144575Z k8s.io/apimachinery/pkg/util/wait.Until(0x6874242a64657470?, 0x616974696e692065?, 0x2073656d616e206c?)
2023-05-29T10:59:18.827149867Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92 +0x25
2023-05-29T10:59:18.827155012Z created by github.com/rancher/lasso/pkg/controller.(*controller).run
2023-05-29T10:59:18.827159906Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:147 +0x2a7
2023-05-29T10:59:18.830401178Z panic: runtime error: index out of range [0] with length 0 [recovered]
2023-05-29T10:59:18.830422679Z  panic: runtime error: index out of range [0] with length 0
2023-05-29T10:59:18.830436773Z 
2023-05-29T10:59:18.830443958Z goroutine 5355 [running]:
2023-05-29T10:59:18.830449778Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00deb6a90?})
2023-05-29T10:59:18.830455640Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
2023-05-29T10:59:18.830460955Z panic({0x428e3a0, 0xc0074b2dc8})
2023-05-29T10:59:18.830465881Z  /usr/lib64/go/1.19/src/runtime/panic.go:884 +0x212
2023-05-29T10:59:18.830471711Z github.com/rancher/eks-operator/controller.BuildUpstreamClusterState({0xc00a4a4ee0, 0xc}, {0xc01982d488, 0x14}, 0xc011179618, {0xc0117c36d0, 0x2, 0xc00a4a4f79?}, {0x4fa5c80, 0xc00b9ad908}, ...)
2023-05-29T10:59:18.830477522Z  /go/pkg/mod/github.com/rancher/[email protected]/controller/eks-cluster-config-handler.go:861 +0x1729
2023-05-29T10:59:18.830482562Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.BuildEKSUpstreamSpec({0x4f9ccf8?, 0xc00a145f80?}, 0xc0034eca80)
2023-05-29T10:59:18.830487640Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/eks_upstream_spec.go:83 +0x525
2023-05-29T10:59:18.830492466Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.getComparableUpstreamSpec({0x4f9ccf8, 0xc00a145f80}, {0x7fa430e02c60, 0xc00019f730}, 0xc0034eca80)
2023-05-29T10:59:18.830497365Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:288 +0xec
2023-05-29T10:59:18.830502458Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).refreshClusterUpstreamSpec(0xc005a07220, 0xc0034eca80, {0x45b2d0b, 0x3})
2023-05-29T10:59:18.830519214Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:155 +0xd1
2023-05-29T10:59:18.830524923Z github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher.(*clusterRefreshController).onClusterChange(0xc005a07220, {0xc00fb45e60, 0x7}, 0xc0034eca80)
2023-05-29T10:59:18.830529821Z  /go/src/github.com/rancher/rancher/pkg/controllers/management/clusterupstreamrefresher/cluster_upstream_refresher.go:79 +0x165
2023-05-29T10:59:18.830535053Z github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3.FromClusterHandlerToHandler.func1({0xc00fb45e60?, 0x0?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.830539798Z  /go/src/github.com/rancher/rancher/pkg/generated/controllers/management.cattle.io/v3/cluster.go:105 +0x44
2023-05-29T10:59:18.830544900Z github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x4514280?, {0xc00fb45e60?, 0x4626c7d?}, {0x4f75a68?, 0xc0034eca80?})
2023-05-29T10:59:18.830549981Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedcontroller.go:29 +0x38
2023-05-29T10:59:18.830554887Z github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000d6e3c0, {0xc00fb45e60, 0x7}, {0x4f75a68, 0xc00a382a80})
2023-05-29T10:59:18.830559665Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/sharedhandler.go:75 +0x23f
2023-05-29T10:59:18.830564310Z github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc000e4c630, {0xc00fb45e60, 0x7})
2023-05-29T10:59:18.830569051Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:232 +0x93
2023-05-29T10:59:18.830573755Z github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc000e4c630, {0x39a6ce0?, 0xc00deb6a90?})
2023-05-29T10:59:18.830578516Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:213 +0x105
2023-05-29T10:59:18.830583918Z github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc000e4c630)
2023-05-29T10:59:18.830589085Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:190 +0x46
2023-05-29T10:59:18.830594238Z github.com/rancher/lasso/pkg/controller.(*controller).runWorker(0xc0043ef6a0?)
2023-05-29T10:59:18.830599133Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:179 +0x25
2023-05-29T10:59:18.830604091Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0a28326e?)
2023-05-29T10:59:18.830609038Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x3e
2023-05-29T10:59:18.830614177Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x38737574617473?, {0x4f67880, 0xc00426c210}, 0x1, 0xc001339b00)
2023-05-29T10:59:18.830625830Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xb6
2023-05-29T10:59:18.830631545Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xa646e756f662073?, 0x3b9aca00, 0x0, 0x54?, 0x1422001006a3d19b?)
2023-05-29T10:59:18.830637133Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x89
2023-05-29T10:59:18.830646011Z k8s.io/apimachinery/pkg/util/wait.Until(0x6874242a64657470?, 0x616974696e692065?, 0x2073656d616e206c?)
2023-05-29T10:59:18.830651251Z  /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92 +0x25
2023-05-29T10:59:18.830656007Z created by github.com/rancher/lasso/pkg/controller.(*controller).run
2023-05-29T10:59:18.830660908Z  /go/pkg/mod/github.com/rancher/[email protected]/pkg/controller/controller.go:147 +0x2a7

Result
Panic while syncing a node group with status "Create failed".

Screenshots
(screenshot omitted)

Cluster provisioning logs entries not logged on UI

Rancher Server Setup
Rancher version: 2.7-head - fbca7c3
Installation option: Helm Chart
If Helm Chart, Kubernetes Cluster and version: EKS 1.25

Information about the Cluster
Kubernetes version: EKS with Kubernetes 1.25
Cluster Type (Downstream): Hosted = EKS

User Information
What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) - Admin

Describe the bug
Cluster provisioning log entries are not logged in the UI.
The issue exists for other hosted providers as well.

To Reproduce
  • Add valid AWS cloud credentials
  • Provision a downstream EKS cluster
  • Check the Cluster > Provisioning Log tab (an alternative way to view provisioning progress is sketched after these steps)
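
A hedged workaround sketch (pod names may differ by chart version): while the UI tab stays empty, the eks-operator pod logs on the Rancher management cluster are an alternative source of provisioning progress.

kubectl -n cattle-system get pods | grep eks
kubectl -n cattle-system logs <eks-operator-pod-name> -f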

Result
User is unable to check provisioning logs.

Screenshots
(screenshot omitted)
