rancher / aks-operator

Azure Kubernetes Service operator for Rancher

License: Apache License 2.0


aks-operator's Introduction

Rancher

This file is auto-generated from README-template.md, please make any changes there.


Rancher is an open source container management platform built for organizations that deploy containers in production. Rancher makes it easy to run Kubernetes everywhere, meet IT requirements, and empower DevOps teams.

Latest Release

  • v2.8
    • Latest - v2.8.3 - rancher/rancher:v2.8.3 / rancher/rancher:latest - Read the full release notes.
    • Stable - v2.8.3 - rancher/rancher:v2.8.3 / rancher/rancher:stable - Read the full release notes.
  • v2.7
    • Latest - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
    • Stable - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
  • v2.6
    • Latest - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.
    • Stable - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.

To get automated notifications of our latest release, you can watch the announcements category in our forums, or subscribe to the RSS feed https://forums.rancher.com/c/announcements.rss.

Quick Start

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher

Open your browser to https://localhost
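On recent Rancher versions (v2.6 and later) the initial admin password is generated at container startup. A minimal sketch of retrieving it, with the container ID below as a placeholder:

# Find the Rancher server container ID
sudo docker ps | grep rancher/rancher

# Print the generated bootstrap password (Rancher v2.6+)
sudo docker logs <container-id> 2>&1 | grep "Bootstrap Password:"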

Installation

See Installing/Upgrading Rancher for all installation options.

Minimum Requirements

  • Operating Systems
    • Please see Support Matrix for specific OS versions for each Rancher version. Note that the link will default to the support matrix for the latest version of Rancher. Use the left navigation menu to select a different Rancher version.
  • Hardware & Software

Using Rancher

To learn more about using Rancher, please refer to our Rancher Documentation.

Source Code

This repo is a meta-repo used for packaging and contains the majority of the Rancher codebase. For other Rancher projects and modules, see go.mod for the full list.

Rancher also includes other open source libraries and projects; see go.mod for the full list.

Build configuration

Refer to the build docs on how to customize the building and packaging of Rancher.

Support, Discussion, and Community

If you need any help with Rancher, please join us at either our Rancher forums or Slack, where most of our team hangs out.

Please submit any Rancher bugs, issues, and feature requests to rancher/rancher.

For security issues, please first check our security policy and email [email protected] instead of posting a public issue in GitHub. You may (but are not required to) use the GPG key located on Keybase.

License

Copyright (c) 2014-2024 Rancher Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

aks-operator's People

Contributors

a-blender, aiyengar2, alexander-demicev, cbron, chiukapoor, cmurphy, dependabot[bot], furkatgofurov7, harrisonwaffel, kevinjoiner, kinarashah, macedogm, mbologna, mjura, oats87, oxr463, paynejacob, phillipsj, rajan2, richardcase, rmweir, salasberryfin, smallteeths, superseb


aks-operator's Issues

Deprecation warnings for cluster.x-k8s.io in logs

Rancher version:

v2.7-0b2f7fa0f878c3402ddf416a5b622327949b9348-head
aks-operator:v1.1.3-rc1

Cluster Type: Downstream AKS cluster
This issue may also exist for other hosted providers.

Describe the bug
Deprecation warnings for cluster.x-k8s.io/v1alpha3 in Rancher pod logs while performing CRUD operations

Logs

W0810 08:17:11.341671      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Cluster is deprecated; use cluster.x-k8s.io/v1beta1 Cluster
W0810 08:17:38.902696      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineSet is deprecated; use cluster.x-k8s.io/v1beta1 MachineSet
W0810 08:19:13.335649      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineDeployment is deprecated; use cluster.x-k8s.io/v1beta1 MachineDeployment
W0810 08:19:58.448549      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineHealthCheck is deprecated; use cluster.x-k8s.io/v1beta1 MachineHealthCheck
W0810 08:21:18.464801      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
W0810 08:23:16.904796      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineSet is deprecated; use cluster.x-k8s.io/v1beta1 MachineSet
W0810 08:25:48.450469      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineHealthCheck is deprecated; use cluster.x-k8s.io/v1beta1 MachineHealthCheck
W0810 08:26:10.343897      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Cluster is deprecated; use cluster.x-k8s.io/v1beta1 Cluster
W0810 08:26:34.337982      38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineDeployment is deprecated; use cluster.x-k8s.io/v1beta1 MachineDeployment
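To confirm which cluster.x-k8s.io API versions the Rancher (local) cluster actually serves, a quick check along these lines can help when triaging the warnings (a sketch; it assumes the CAPI CRDs are installed under their upstream names):

# List the cluster.x-k8s.io API versions served by the management cluster
kubectl api-versions | grep cluster.x-k8s.io

# Show which versions the Cluster CRD serves
kubectl get crd clusters.cluster.x-k8s.io -o jsonpath='{.spec.versions[*].name}{"\n"}'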

Found while validating rancher/rancher#42496

Support creating release PR when release.yaml is empty

Following what was done with the other hosted providers, edit the update-rancher-charts script to support creating a release PR against rancher/charts when release.yaml is empty, since this file is cleaned periodically.

(SURE-5339) Upgrading downstream AKS clusters not working for both control plane and worker node pools.

Issue description:

Upgrading downstream AKS clusters not working for both control plane and worker node pools

Business impact:

Being unable to upgrade downstream clusters means they cannot follow the release schedule.

Troubleshooting steps:

Re-create environment in AKS and import into Rancher

Repro steps:

step 1: attempt to upgrade downstream 
step 2

Workaround:

Is a workaround available and implemented? no
What is the workaround: n/a

Actual behavior:

Throws back errors due to not being able to fulfill a particular parameter

Expected behavior:

Upgrade process completes successfully on downstream clusters

(SURE-4646) Rancher deletes unmanaged objects

Issue description:

Rancher deletes VNET created outside of Rancher upon cluster deletion if VNET is empty

Business impact:

We manage the VNET with Terraform, and some peering to our on-prem networks is set up manually by the Network Team. If Rancher deletes the VNET, we need to recreate it, and the Network Team needs to repair the peering.

Reproduction steps:

Delete AKS cluster using VNET not created by Rancher but only used by Rancher AKS cluster

Customer reports these are the steps taken:

  1. Create VNET on Azure with Terraform
  2. Create new kubernetes cluster with Terraform and import as a downstream cluster into Rancher
  3. Delete imported downstream cluster and see VNET destroyed

Workaround:

We recommended that the customer tag VNET objects to skip deletion, but they said locking resources in Azure would be problematic.

Actual behavior:

Rancher deletes VNET objects if there are no resources using it after a cluster is deleted

Expected behavior:

Rancher ignores Azure VNET objects not created by Rancher

Unable to deploy AKS cluster due to Rancher generating and appending characters to the node resource group which is too long.

SURE-4913

Issue description:

The customer is unable to deploy an AKS cluster because Rancher generates and appends characters to the cluster name.

Business impact:

Unable to deploy AKS cluster using Rancher. 

Troubleshooting steps:

The Rancher UI does not offer an option to provide a node resource group name. In the backend, Rancher automatically generates characters, appends them to the name provided for the cluster resource group, and produces a node resource group name that is too long.

Error failed to create cluster: containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="InvalidParameter" Message="The length of the node resource group name is too long. The maximum length is 80 and the length of the value provided is 87. Please see https://aka.ms/aks-naming-rules for more details." Target="name" 

We can create an AKS cluster with a shorter name, but this is against the naming conventions of the CLASS infrastructure. For example, the resource group CLASS specified was "RG-Test-Kafka-RemoteService-AKS-3". Rancher combines it with the cluster name and region name (MC_RG-Test-Kafka-RemoteService-AKS-3__), which becomes too long, and the cluster deployment fails.

Repro steps:

Create an AKS cluster with a longer name and also provide a longer cluster resource group name. This will cause the cluster deployment to fail.

Workaround:

Is a workaround available and implemented? yes  
What is the workaround: Provide a shorter name, but this cannot be implemented as it's against the CLASS naming convention.

Actual behavior:

The customer is unable to deploy an AKS cluster because Rancher generates and appends characters to the node resource group name, making it too long.

Expected behavior:

Rancher should provide an option to pass the node resource group name in the Rancher UI while deploying an AKS cluster. This option is available in the Azure CLI, and the customer expects the same from the Rancher side.
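For reference, the Azure CLI already exposes this setting directly; a hedged sketch of the equivalent call (the cluster and node resource group names below are placeholders, reusing the resource group from the report):

# az aks create accepts an explicit node resource group name, avoiding the
# auto-generated MC_<resource-group>_<cluster>_<region> name that gets too long
az aks create \
  --resource-group RG-Test-Kafka-RemoteService-AKS-3 \
  --name example-aks-cluster \
  --node-resource-group RG-Test-Kafka-AKS-Nodes \
  --node-count 1 \
  --generate-ssh-keys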

Rancher cannot import AKS behind http-proxy & vpn connection to Azure

SURE-6717

Issue description:

Rancher provisions the AKS cluster using the UI, but it does not finish importing it correctly into Rancher. In the Rancher UI, the AKS cluster is stuck on "waiting for API to be available".

Business impact:

Cannot manage AKS clusters from Rancher UI, as the cluster is created but not successfully imported into Rancher. The service account token for the sa "cattle" isn't created.

Troubleshooting steps:

Provisioned an AKS cluster in a different region using the Azure console and imported it into Rancher -> KO
Provisioned an AKS cluster in a different region using the Rancher UI -> KO; the cluster is provisioned but not imported into Rancher
Created the "cattle" service account token with kubectl commands, since it is not created during the Rancher agent deployment, but the cattle-cluster-agent pod isn't aware of it (see the sketch below)
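For context on that manual token step: on Kubernetes 1.24 and later, token Secrets are no longer auto-generated for service accounts, so one has to be created by hand. A minimal sketch, assuming the "cattle" service account lives in the cattle-system namespace:

# Create a long-lived token Secret bound to the "cattle" service account
# (Kubernetes 1.24+ no longer auto-generates these)
kubectl -n cattle-system apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: cattle-token
  namespace: cattle-system
  annotations:
    kubernetes.io/service-account.name: cattle
type: kubernetes.io/service-account-token
EOF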

Repro steps:

Set up an AKS cluster using the Rancher UI.
AKS cluster settings: attached cluster.yml
The AKS cluster stays pending, waiting for the API to be available, because no service account token is created during the Rancher agent deployment on the AKS cluster.

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Add the following parameters to the cattle-cluster-agent deployment; it will then remain stable. However, any other operation/installation on the cluster fails due to no connectivity to the AKS API.

Actual behavior:

The AKS cluster is not manageable through Rancher, whether Rancher-provisioned or Rancher-imported.

Expected behavior:

The "cattle" service account's token is created, the cattle-cluster-agent can talk to the API, and the AKS cluster is imported successfully.

Files, logs, traces:

Attached Rancher pod logs, Aks-operator logs, cluster.yml and

(SURE-3392) [RFE] Enable possibility to set private DNS on AKS provisioning

Request description:

Enable the possibility to set private DNS on AKS provisioning cluster.

Actual behavior:

Currently, there is no setting in Rancher to set a private DNS for AKS cluster.

Expected behavior:

Add the possibility to set private DNS on AKS provisioning cluster.

Additional notes:

Customer network infrastructure is more or less the same as the "hub and spoke" on the page: https://docs.microsoft.com/en-us/azure/aks/private-clusters#hub-and-spoke-with-custom-dns

Customer DNS is centralized in the hub network, and they want to run AKS clusters in spoke networks.
To get this to work, they need to be able to point the AKS clusters to the central private DNS zone. Otherwise, the created AKS cluster's Kubernetes API endpoint address won't resolve for the VMs, and the cluster will fail to provision.

Additional Azure documentation on how to set a custom private DNS zone: https://docs.microsoft.com/en-us/azure/aks/private-clusters#create-a-private-aks-cluster-with-a-custom-private-dns-zone
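For comparison, the Azure CLI supports this directly through --private-dns-zone; a hedged sketch of what the equivalent provisioning call looks like (all names and resource IDs are placeholders; a user-assigned identity is required for a custom zone):

# Create a private AKS cluster that uses an existing custom private DNS zone
az aks create \
  --resource-group example-rg \
  --name example-private-aks \
  --enable-private-cluster \
  --private-dns-zone "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/privateDnsZones/privatelink.<region>.azmk8s.io" \
  --enable-managed-identity \
  --assign-identity <user-assigned-identity-resource-id> \
  --generate-ssh-keys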


Cannot modify AKS if nodepool is == 12 chars long

SURE-6333

Issue description:

Microsoft's naming convention allows 1-12 characters for its Linux node pools, whereas in Rancher the limit is 1-11 characters.

If you create a node pool 12 characters long in the Azure portal or Az CLI (either system or user), the AKS configuration can no longer be changed, because it triggers this Rancher node pool naming limit.

Business impact:

The AKS cluster is stuck and cannot be configured. Node pool, Kubernetes version, or any other configuration changes cannot be applied, because "Linux node pool names must be 1-11 characters." is always shown in the Rancher UI.

Repro steps:

Create an AKS cluster from Rancher, and create an agent pool.
In the Azure portal, create a 2nd node pool for the AKS cluster with a name 12 characters long (see the sketch below). Then, in the Rancher console, the cluster cannot be configured (change labels, upgrade the k8s version, add additional node pools, change/remove current node pools).
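A hedged sketch of that second step with the Azure CLI (resource group and cluster names are placeholders; the 12-character pool name is just an example):

# Add a second node pool whose name is exactly 12 characters
# (accepted by AKS, but rejected by the Rancher UI validation)
az aks nodepool add \
  --resource-group example-rg \
  --cluster-name example-aks-cluster \
  --name nodepool12ch \
  --node-count 1 \
  --mode User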

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Create an additional node pool with an 11-character name, and then delete the previous node pool with the 12-character name.

Actual behavior:

Node pools created with a 12-character name in AKS via the Azure portal (since Rancher does not allow it) prevent the cluster configuration from being changed.

The cluster can still be managed as a k8s cluster (deploy applications, storage, monitoring, create projects & namespaces, assign users as cluster members / cluster users / project members / project owners, and so on), but node pool changes, k8s version upgrades, and cluster label updates are not possible; they all fail with the error "Linux node pool names must be 1-11 characters."

Expected behavior:

Node pools in Rancher can be created with a 12-character name. Even when a node pool is created outside of Rancher (Azure portal or Az CLI) with a 12-character name, the cluster remains manageable through Rancher.

Files, logs, traces:
Attached are screenshots of the bug.

Additional notes:

https://learn.microsoft.com/en-us/azure/aks/use-multiple-node-pools#add-a-node-pool

https://learn.microsoft.com/en-us/azure/aks/use-system-pools?tabs=azure-cli#limitations

(SURE-4222) 'clusterRegistrationToken' API endpoint changes breaks customer deployment pipeline

Issue description:

The customer deploys new clusters to AKS using an Azure DevOps pipeline and imports the new clusters using the following curl command

# Step 1: Grab the cluster_id
export rancher_cluster_id=`curl -k $RANCHER_URL/clusters -H 'content-type: application/json' \
-H "Authorization: Bearer $RANCHER_BEARER_TOKEN" | jq --arg CLUSTER_NAME_JQ "$CLUSTER_NAME" '.data[] | select(.name==$CLUSTER_NAME_JQ) | .id'`

# Step 2: Grab the correct registration command using the aforementioned cluster_id
# The resulting command is run on the control plane to import the cluster
 curl -k $RANCHER_URL/clusterregistrationtoken -H 'content-type: application/json' \
-H "Authorization: Bearer $RANCHER_BEARER_TOKEN" \
-d '{"type":"clusterRegistrationToken","clusterId":'$rancher_cluster_id'}' | jq '.command'

After upgrading to v2.6.3, step 2 no longer works. The 'clusterRegistrationToken' endpoint now returns 'null' when accessing the '.command' key.

Actual behavior:

The following curl command returns a 'null' value for Rancher 2.6.3

curl -k $RANCHER_URL/clusterregistrationtoken -H 'content-type: application/json' \
-H "Authorization: Bearer $RANCHER_BEARER_TOKEN" \
-d '{"type":"clusterRegistrationToken","clusterId":'$rancher_cluster_id'}' | jq '.command'
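One possible adjustment, offered as an assumption rather than a confirmed fix, is to read the command back from the token object after it has been created, since the field may only be populated on an existing token:

# Assumption: fetch the registration command from the v3 token listing for the cluster
# (assumes $rancher_cluster_id holds the bare ID, e.g. extracted with jq -r '.id')
curl -sk "$RANCHER_URL/v3/clusterregistrationtokens?clusterId=$rancher_cluster_id" \
-H 'content-type: application/json' \
-H "Authorization: Bearer $RANCHER_BEARER_TOKEN" | jq -r '.data[0].command'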

 

Expected behavior:

The aforementioned curl command should return the registration command as it does in Rancher 2.5.8

Increase project maintainability

This epic will track tasks for increasing project maintainability:

  • Enable github workflows for this repository
  • Add golang-ci lint workflow
  • #108
  • Add new task to makefile for building operator binary
  • Create Azure services mocks to use for unit tests
  • Set up a basic unit test suite
  • Cover Azure services interactions with unit tests
  • #109
  • #110
  • Create a setup for running E2E tests
  • Start adding E2E tests
  • #111
  • #112

(SURE-6076) AKS TF-provisioned private cluster import is failing

Issue description:

AKS private cluster import fails with the error: "cannot detect cluster [xxxx] upstream DNS prefix"

Business impact:

Not being able to manage AKS clusters through Rancher.

Troubleshooting steps:

Import a private AKS cluster using the Rancher new AKS cluster registration command -> KO (DNS error message)
Create a new private AKS cluster using the Rancher UI, and use 'az command-invoke' to register the AKS cluster in Rancher -> KO (same error message)
Create a new private AKS cluster using the Azure portal, and use the Rancher UI to import the cluster, without success. Used 'az command-invoke' from the az CLI to import the cluster, with no success.
Create a new private AKS cluster using TF, and use the Rancher UI to import the cluster -> KO
Verified that Rancher is reachable (curl to the Rancher server URL and name resolvable) from within a pod of a newly TF-provisioned private AKS cluster
Set Rancher to debug mode (it showed the error message, but no further evidence)

Repro steps:

Provision a private AKS cluster with TF, using azure_rm_provider, with a custom vnet and a custom private DNS zone
Import the cluster into Rancher from the UI

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Import the AKS clusters as custom clusters

Actual behavior:

AKS private clusters cannot be imported into Rancher.

Expected behavior:

AKS private clusters with custom networking (existing vnet) and custom private DNS zone can be imported into Rancher.

Files, logs, traces:

Attached is the Rancher debug log, traces of DNS and AKS reachability from on-prem, and the Rancher reachability test from Azure.

Additional notes:

Customer's Rancher server is hosted in an on-premises environment, with an on-prem network.
AKS clusters are private, and deployed in private existing networks (vnet) and custom DNS zone.  On-premises network is connected to Azure networks via ExpressRoute
AKS clusters can be directly managed from the customer's on-prem environment using kubectl with the kubeconfig credentials downloaded via az cli.
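For what it's worth, the DNS-related fields the error message refers to can be inspected directly on the Azure side (a sketch with placeholder names):

# Show the DNS prefix and FQDNs Azure reports for the private cluster
az aks show \
  --resource-group example-rg \
  --name example-private-aks \
  --query "{dnsPrefix: dnsPrefix, fqdn: fqdn, privateFqdn: privateFqdn}" \
  --output table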

(SURE-5510) Unable to upgrade an imported AKS cluster from Rancher GUI

Issue description:

We are trying to upgrade an imported AKS cluster (1.22.15 to 1.23.8) from the Rancher GUI, but unfortunately nothing happens and the aks-config-operator crashes into a "CrashLoopBackOff" state.

Repro steps:

Create an AKS cluster (1.22.11) via Terraform or the Azure portal
Rancher 2.6.8 > import existing cluster > upgrade the imported cluster from 1.22.11 to 1.22.15

Workaround:
Is a workaround available and implemented? yes
What is the workaround:

steps :

  1. upgrade control planes from Azure GUI and

  2. upgrade workers from Rancher GUI.

Actual behavior:

The upgrade does not happen and the aks-config-operator crashes.

Expected behavior:

The upgrade should happen without any issues.

Files, logs, traces:

[aks-config-operator-7dc976c58c-n8cvx] time="2022-11-02T15:25:45Z" level=info msg="Checking configuration for cluster [azuk8s-devops-aks-dev]"
[aks-config-operator-7dc976c58c-n8cvx] time="2022-11-02T15:25:46Z" level=info msg="Updating kubernetes version for cluster [azuk8s-devops-aks-dev]"
[aks-config-operator-7dc976c58c-n8cvx] E1102 15:25:47.010451 8 runtime.go:79] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
[aks-config-operator-7dc976c58c-n8cvx] goroutine 31 [running]:
[aks-config-operator-7dc976c58c-n8cvx] k8s.io/apimachinery/pkg/util/runtime.logPanic({0x13bd1a0, 0x170e640})
[aks-config-operator-7dc976c58c-n8cvx]   /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x85
[aks-config-operator-7dc976c58c-n8cvx] k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000b83c0})
[aks-config-operator-7dc976c58c-n8cvx]   /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
[aks-config-operator-7dc976c58c-n8cvx] panic({0x13bd1a0, 0x170e640}

Additional notes:

Github issue: rancher/rancher#37863

spec/aksConfig displaying null values for imported cluster

Rancher version:

v2.7-0b2f7fa0f878c3402ddf416a5b622327949b9348-head
aks-operator:v1.1.3-rc1

Cluster Type: Downstream AKS cluster
Issue also exists for other hosted providers.

Describe the bug
spec.aksConfig displays null values for most of the parameters of an imported cluster.
Values are displayed after editing the cluster from the UI.

Steps

  • Import Downstream AKS cluster in Rancher UI and wait for it to be Active
  • Click 'View YAML' for the imported cluster

Screenshots

(screenshot attached)

After Edit cluster: (screenshot attached)

Found while validating rancher/rancher#42496

(SURE-4131) Error handling and logging for creating AKS cluster without permissions to create tags

See also rancher/rancher#36982

Issue description:

A user attempted to create an AKS cluster with two tags. It appears the user's account does not have permissions to create tags on an AKS cluster, and Rancher's AKS operator was stuck in a loop retrying tag creation every 1-2 seconds.

Business impact:

No system feedback on why provisioning AKS cluster is stuck

Troubleshooting steps:

Viewed aks-config-operator logs (see attached)

Repro steps:

Create an AKS cluster and specify one or more tags. Make sure the Azure account used does not have permissions to create tags on an AKS cluster.

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Manually add tags to the AKS cluster using the Azure portal, or remove the tags from the AKS cluster in the Rancher UI (see the CLI sketch below).
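A CLI equivalent of the portal workaround, sketched with az (the cluster name matches the one in the logs below; the resource group, tag keys, and values are hypothetical):

# Apply the desired tags to the AKS managed cluster resource directly,
# merging them with any existing tags
az resource tag \
  --resource-group example-rg \
  --name aks-dev-canadacentral \
  --resource-type Microsoft.ContainerService/managedClusters \
  --is-incremental \
  --tags environment=dev owner=platform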

Actual behavior:

The AKS cluster is stuck waiting to provision; we also see this throttling error (likely due to aggressive attempts to update tags):

time="2022-02-21T23:10:00Z" level=error msg="error syncing 'cattle-global-data/c-w9h6r': handler aks-controller: containerservice.ManagedClustersClient#UpdateTags: Failure sending request: StatusCode=0 -- Original Error: Code=\"PatchResourceGroupError\" Message=\"Reconcile resource group failed with client error. Details: ResourceGroupReconciler retry failed: Category: ClientError; SubCode: SubscriptionRequestsThrottled; Dependency: Azure Resource Group; OrginalError: Code=\\\"SubscriptionRequestsThrottled\\\" Message=\\\"Number of write requests for subscription 'e7739666-3f36-4271-97ed-eff50e997132' exceeded the limit of '1200' for time interval '01:00:00'. Please try again after '303' seconds.\\\"; AKSTeam: Unknown, Retriable: false.\", requeuing"

Seeing continuous attempts to update tags:

time="2022-02-22T00:19:31Z" level=info msg="Checking configuration for cluster [aks-dev-canadacentral]"
time="2022-02-22T00:19:32Z" level=info msg="Updating tags for cluster [aks-dev-canadacentral]"
time="2022-02-22T00:19:34Z" level=info msg="Checking configuration for cluster [aks-dev-canadacentral]"
time="2022-02-22T00:19:35Z" level=info msg="Updating tags for cluster [aks-dev-canadacentral]"
time="2022-02-22T00:19:37Z" level=info msg="Checking configuration for cluster [aks-dev-canadacentral]"
time="2022-02-22T00:19:38Z" level=info msg="Updating tags for cluster [aks-dev-canadacentral]"

Expected behavior:

aks-config-operator should log why it is not able to update tags when it fails.
aks-config-operator can continue to re-attempt updating tags, but it should use an incremental backoff so it doesn't throttle the whole Azure subscription.
An error should be propagated to the user in the UI, something along the lines of: "Cluster could not be provisioned. Failed to update tags. Error: "

(SURE-2565) Support for managed identity in AKS

See also rancher/rancher#27559

We ran into an issue when using Rancher to provision Windows nodes with managed identities for Azure resources (formerly known as Managed Service Identity, MSI).

The Rancher provisioning script only handles authentication with a service principal and will not work properly with a managed identity.

Relevant code: https://github.com/rancher/rke-tools/blob/a23ff70c7a1ae0b8ec5c91bc56d51b0ad9f541ad/windows/cloud-provider.psm1#L70

Currently, we have to patch the file at runtime to use az login --identity instead of az login --service-principal

Steps to reproduce (fewest steps possible):

Create an AKS cluster without specifying any Service Principal

Result:

An AKS cluster is created using the --enable-managed-identity flag.

Other details that may be helpful:

The feature is stable in AKS: https://docs.microsoft.com/en-us/azure/aks/use-managed-identity
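For reference, a hedged sketch of the two pieces involved, creating the cluster with a managed identity and authenticating with it instead of a service principal (names are placeholders):

# Create the AKS cluster with a system-assigned managed identity (no service principal)
az aks create \
  --resource-group example-rg \
  --name example-aks-cluster \
  --enable-managed-identity \
  --generate-ssh-keys

# On a VM that has a managed identity attached, sign in without SP credentials
az login --identity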

Environment information

Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.4.4
Installation option (single install/HA): single container

The UI presents mandatory fields to enter the SP for AKS; I propose adding a radio button where the user can choose to create a managed identity instead.

[Feature] K8s 1.27 support

issue: rancher/rancher#41840

  • k8s deps bump to 0.27.x required.
  • bump rancher machine to v0.15.0-rancher103
  • need to use latest 1.5.0 rc tag available for rke
  • other dependencies
github.com/rancher/wrangler v1.1.1-0.20230831050635-df1bd5aae9df
github.com/rancher/lasso v0.0.0-20230830164424-d684fdeb6f29
github.com/rancher/fleet/pkg/apis v0.0.0-20230901075223-437edb7091f5
github.com/rancher/norman v0.0.0-20230831160711-5de27f66385d
  • after that, the new changes need to be vendored into rancher

IHME receives an error when importing an AKS cluster

SURE-6357

Issue description:

We are attempting to import an existing AKS cluster to Rancher to make the management interface more consistent for users. When trying to import the cluster, we get the following message: 

Validation failed in API: a Config field must be set is required

Steps to reproduce

  • From the Rancher dashboard, click import existing
  • Select Azure AKS
  • An example of how this was filled out is in the screenshot attached to the case
  • Click Create
  • Error is presented

They tried switching off the AKS driver from cluster mgmt., enabling it again, and attempting the steps again, and the same error is presented.

Workaround:

Is a workaround available and implemented? no

Actual behavior:

The user receives an error when trying to import an AKS cluster.

Expected behavior:

The user can import a new AKS cluster.

Additional notes:

We do have an active Azure AKS driver in Cluster mgmt > Drivers, and that is the error message we’re seeing when we hit “create” in the import page. The user I’m using has global admin permissions.

Related GitHub issue:

rancher/rancher#33753

Unable to change Node pool mode

Rancher version:

v2.7-0b2f7fa0f878c3402ddf416a5b622327949b9348-head
aks-operator:v1.1.3-rc1

Cluster Type: Downstream AKS cluster

Describe the bug
In rancher/dashboard#8105, the user is now allowed to add new System/User node pools.
When editing the node pools of an already running Active cluster, the user can toggle a node pool's mode from User to System and vice versa, but the operation does not succeed (see the sketch below).
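For reference, the same change is a single call in the Azure CLI, which can help confirm whether the limitation is on the Rancher side (a sketch with placeholder names):

# Toggle an existing node pool between User and System mode
az aks nodepool update \
  --resource-group example-rg \
  --cluster-name example-aks-cluster \
  --name nodepool1 \
  --mode System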

Screenshots

(screenshot attached)

Found while validating rancher/rancher#42496

PRs

Charts

[BUG] Trying to edit AKS cluster after credentials expiration/deletion/invalidation does not return a descriptive error message

Rancher Server Setup

  • Rancher version: 2.7.5-rc1
  • Installation option: Helm
    • If Helm Chart, Kubernetes Cluster and version: AKS with Kubernetes v1.25.5

Information about the Cluster

  • Kubernetes version: v1.26.0
  • Cluster Type: AKS cluster

User Information

  • What is the role of the user logged in? Admin

Describe the bug
This issue was found during validation of version 2.7.5-rc1.

When trying to edit a cluster, either created from the Rancher UI or imported from Azure, that was originally provisioned with a set of credentials that is no longer valid (deleted/invalidated/expired), the error message is not very useful and does not point to the source of the problem. This makes it hard to debug if the user is not aware that the credentials are unavailable or obsolete.

The error in the UI shows:

[object Object]

To Reproduce

  • Configure a set of credentials creds1 for Azure
  • Using creds1, provision a new AKS cluster from Rancher UI or import an existing cluster created with Azure
  • Delete/Invalidate creds1
  • From Cluster Management try to edit the configuration of the cluster: Cluster Management -> Edit Config -> Next: Configure Cluster

Result
Error will display: [object Object]

Expected Result
The error includes context of what went wrong. Something like Invalid cloud credentials.

Screenshots

(screenshot attached)

Additional context
This issue has been identified for AKS clusters, but may also exist for the other cloud providers.

Resource group not getting cleaned up after cluster removal

Setup
Rancher version: v2.7.5-rc5, aks-operator:v1.1.1

Describe the bug
Resource group not getting cleaned up after cluster removal

To Reproduce

  • Provision an AKS cluster with the Rancher UI with all default configuration (provide a new name for the resource group)
  • Once the cluster is up & running, check the resource groups on AKS
  • Two resource groups must have been created (with the name provided above): one for infrastructure and one for the AKS cluster
  • Delete the cluster from the Rancher UI

Result
The resource group in which the AKS cluster resides is not cleaned up after cluster removal.

Expected Result
Entities created by Rancher must be cleaned up after cluster removal
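Until that happens, the leftover group can be removed by hand; a hedged sketch (the resource group name is a placeholder, and the command deletes everything inside the group):

# Delete the resource group Rancher left behind
az group delete --name example-aks-resource-group --yes --no-wait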

(SURE-5504) Failed to upgrade Kubernetes version of imported AKS from Rancher UI

Issue description:

Failed to upgrade Kubernetes version of imported AKS from Rancher UI.

Repro steps:

  • step1:

Create Kubernetes 1.21.7 cluster with RKE 1.3.3 on Amazon Linux 2 EC2 instance.

Install Rancher 2.6.3 and upgrade it to 2.6.8.

  • step2:

Create AKS cluster with this command.

> az aks create -g <resource group name> -n <cluster name> --kubernetes-version 1.22.15 --enable-managed-identity --node-count 1 --enable-addons monitoring --enable-msi-auth-for-monitoring --generate-ssh-keys 
  • step3:

Import AKS cluster from Rancher UI.

Import existing -> Azure AKS

  • step4:

Upgrade Kubernetes version of AKS from Rancher UI.

Cluster Management -> Edit Config

Kubernetes version: 1.22.15 -> 1.23.8

Rancher UI shows Kubernetes version v1.23.8 (upgrade.png).

But the Azure PowerShell console shows v1.22.15.

> kubectl get node
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-70363388-vmss000000   Ready    agent   31m   v1.22.15 

A Linux shell also shows v1.22.15 with the downloaded kubeconfig.

$ kubectl --kubeconfig=ds-azure.yaml get node
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-70363388-vmss000000   Ready    agent   35m   v1.22.15 
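To confirm what Azure itself reports for the control plane, and which upgrades it considers available, something along these lines can be run (placeholder names):

# Control plane version as reported by Azure
az aks show --resource-group example-rg --name example-aks-cluster --query kubernetesVersion -o tsv

# Upgrade paths AKS considers available for this cluster
az aks get-upgrades --resource-group example-rg --name example-aks-cluster --output table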

Workaround:

Is a workaround available and implemented? no

Actual behavior:

The Kubernetes version is not upgraded.

Expected behavior:

The Kubernetes version is upgraded.

Files, logs, traces:

aks-config-operator.log, upgrade.png

(SURE-5437) AzureGov Cloud Credentials fail to create/provisioning broken

Describe the bug

When creating a cloud credential for Azure using the AzureUSGovernmentCloud environment option, the creation fails. When inspecting the logs in the browser, there is an error referencing 'SubscriptionId not found'. The subscription ID has been validated to work, as the service principal used in the credential was created with it.

This error is not present when using standard Azure. It is suspected that AzureGov endpoints are not correctly set internally on a dependent tool.

To Reproduce

Create a service principal in AzureGov using:

az ad sp create-for-rbac \
  --name "<Rancher ServicePrincipal name>" \
  --role "Contributor" \
  --scopes "/subscriptions/<subscription Id>"

Sign into the Rancher UI > Cluster Management -> Cloud Credentials and create a new Azure credential:

Select AzureUSGovernmentCloud for the environment
Plug the generated appId into the Client ID field
Plug the generated password into the Client Secret field
Plug the subscription ID into the Subscription ID field
Click Create, see a non-descriptive error
Open the console, inspect Network, and fire the command again
Inspect the response to see the RESTful error describing the unknown subscription ID

Expected Result

Credential Created Successfully

Additional context

When the cloud credential creation is bypassed using Terraform, the credential itself fails to work when creating a cluster and repeats the same 'subscription id not found' error.
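To rule out the credentials themselves, the same service principal can be exercised against the Government cloud with the Azure CLI; a hedged sketch (all values are placeholders):

# Point the CLI at the Azure US Government cloud and sign in with the same SP
az cloud set --name AzureUSGovernment
az login --service-principal \
  --username "<appId>" \
  --password "<password>" \
  --tenant "<tenantId>"

# Confirm the subscription is visible in this cloud
az account list --output table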

Not able to use own vnet with kubenet

It seems that using one's own vnet and subnet is possible only with Azure CNI.
Is there a reason for that?
We would like to use our own vnet and subnets with the kubenet CNI.
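For comparison, the Azure CLI itself allows kubenet with an existing subnet; a hedged sketch (the subnet resource ID is a placeholder, and the cluster identity needs permissions on the subnet):

# Create an AKS cluster with the kubenet plugin on an existing subnet
az aks create \
  --resource-group example-rg \
  --name example-aks-cluster \
  --network-plugin kubenet \
  --vnet-subnet-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>" \
  --generate-ssh-keys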


[SURE-7333] Downstream AKS `Cluster Management > Cluster details` page not displaying accurate message during update

SURE-7333

Setup

  • Rancher version: 2.7.0
  • Browser type & version: Chrome

Describe the bug

  • When updating a downstream AKS cluster, the Cluster Management > Cluster Details page does not display an accurate message.

To Reproduce

  1. Fresh install of 2.7.0
  2. Provision a downstream AKS cluster
  3. Update downstream AKS cluster
  4. Navigate to Cluster Management > Cluster Details page
  5. Reproduced

Result

  • The following message is seen:
    This resource is currently in a transitioning state, but there isn't a detailed message available.

Expected Result

  • Expected to show a more detailed message, as seen in the AKS operator logs:
    time="2023-03-07T02:10:50Z" level=info msg="Waiting for cluster [REDACTED] to update node pool [REDACTED]"
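In the meantime, the detailed message can be read from the operator directly; a minimal sketch, assuming the deployment is named aks-config-operator (matching the pod names seen in other reports here) and runs in the cattle-system namespace of the Rancher (local) cluster:

# Follow the AKS operator logs to see the node pool update progress messages
kubectl -n cattle-system logs deploy/aks-config-operator -f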

Screenshots

(screenshot attached: Screen Shot 2023-03-06 at 6 10 24 PM)
