
Comments (34)

snasovich avatar snasovich commented on June 27, 2024 11

QA validation is complete and rancher/charts#3700 has just been merged, so rancher-provisioning-capi version 103.2.0+up0.0.1 is now released. All default-configured, non-airgapped Rancher v2.8.0-v2.8.2 deployments will automatically update to this fixed chart version, which should resolve the issue.

Any users who applied the workaround of downgrading the chart to 103.0.0+up0.0.1 (e.g. by following the instructions in #44929 (comment)) should now be free to roll back the workaround.

Important Note: Rancher automatically refreshes chart data every 6 hours. To force an immediate refresh, please follow these steps:

  1. Select local cluster
  2. Open "Apps" -> "Repositories"
  3. Locate and check Rancher from the list of displayed repos
  4. Select "Refresh" and wait for repo status to update to "Active"
  5. rancher-provisioning-capi will upgrade to version 103.2.0+up0.0.1 shortly

Keeping the issue open for some time to ensure the fix works for all affected users.
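
For anyone unsure which state their installation is in, here is a minimal Python sketch that maps the chart versions mentioned in this thread to their reported status. It only encodes what has been said here; the names `CHART_STATUS` and `chart_status` are made up for illustration.

```python
# Versions of the rancher-provisioning-capi chart named in this thread,
# with the status reported for each one here.
CHART_STATUS = {
    "103.0.0+up0.0.1": "pre-upgrade (the workaround's downgrade target)",
    "103.1.0+up0.1.0": "affected (reported broken after the CAPI upgrade)",
    "103.2.0+up0.0.1": "fixed (rolls back the CAPI version upgrade)",
}


def chart_status(version: str) -> str:
    """Return the status reported in this thread for a chart version."""
    return CHART_STATUS.get(version.strip(), "unknown (not discussed in this thread)")


if __name__ == "__main__":
    for v in ("103.0.0+up0.0.1", "103.1.0+up0.1.0", "103.2.0+up0.0.1"):
        print(v, "->", chart_status(v))
```

The installed version can be read from the Installed Apps list and passed to `chart_status`.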

from rancher.

m4rCsi avatar m4rCsi commented on June 27, 2024 9

We had the same issue. Suddenly, out of nowhere, across all our rancher clusters and "downstream" clusters, we saw the same issue.

After a long investigation, we came to the same conclusion (i.e., the capi upgrade from 1.4.4 to 1.5.5 is the cause).

We weren't (and aren't) sure what is the easiest way to pin it to 1.4.4. Every time we downgraded, it auto-upgraded itself back to the broken 1.5.5 version. What we ended up doing in the interest of speed was:

  • Going into the Apps - Repositories Section
  • Changing The Repository Named "Rancher" from Branch "release-v2.8" to a318ef65fddf66b44c468d4a2636930ef39a88fd
  • Going to Installed Apps
  • Downgrading rancher-provisioning-capi. ( from 103.1.0+up0.1.0 to 103.0.0+up0.0.1 )

Maybe that will help someone else work around this until a proper fix has been found.

If someone knows how to pin it to 1.4.4 in a better way, please let us know :)

from rancher.

hansbogert avatar hansbogert commented on June 27, 2024 8

Setting the capi-controller-manager to version 1.4.4 (it was 1.5.5) lets me deploy new clusters with correct health status. Deploying new clusters had not been possible for roughly 10 hours.

from rancher.

nickvth avatar nickvth commented on June 27, 2024 2

same here

from rancher.

josh383451 avatar josh383451 commented on June 27, 2024 2

> QA validations were completed and rancher/charts#3700 has just been merged meaning rancher-provisioning-capi version 103.2.0+up0.0.1 is now released and all default-configured non-airgap v2.8.0-v2.8.2 Rancher deployments will automatically update to this fixed version of chart and the issue should be fixed.
>
> Any users that applied the workaround to downgrade the chart to 103.0.0+up0.0.1 (e.g. by following instructions in #44929 (comment)) should now be free to rollback the workaround.
>
> Keeping the issue open for some time to ensure the fix works for all affected users.

Can confirm this is working when provisioning AWS EC2 instances using 103.2.0+up0.0.1
image

from rancher.

daleckystepan avatar daleckystepan commented on June 27, 2024 1

@richardcase no labels at all on secret app-kubeconfig

from rancher.

richardcase avatar richardcase commented on June 27, 2024 1

Thanks @nickvth & @daleckystepan . The lack of labels appears to be the issue. Just to let you know we are looking at how to resolve this.

from rancher.

josh383451 avatar josh383451 commented on June 27, 2024 1

> We had the same issue. Suddenly, out of nowhere, across all our rancher clusters and "downstream" clusters, we saw the same issue.
>
> After a long investigation, we came to the same conclusion (i.e., the capi upgrade from 1.4.4 to 1.5.5 is the cause).
>
> We weren't (and aren't) sure what is the easiest way to pin it to 1.4.4. Every time we downgraded, it auto-upgraded itself back to the broken 1.5.5 version. What we ended up doing in the interest of speed was:
>
> * Going into the Apps - Repositories Section
> * Changing The Repository Named "Rancher" from Branch "release-v2.8" to a318ef65fddf66b44c468d4a2636930ef39a88fd
> * Going to Installed Apps
> * Downgrading `rancher-provisioning-capi`. ( from 103.1.0+up0.1.0 to 103.0.0+up0.0.1 )
>
> Maybe that will help someone as well to workaround this until a proper fix has been found.
>
> If someone knows how to pin it to 1.4.4 in a better way, please let us know :)

Can confirm this is working for new cluster provisioning with AWS EC2

from rancher.

zackbradys avatar zackbradys commented on June 27, 2024 1

I can confirm that the above fix worked for existing clusters and provisioning new clusters.

Additionally, an alternative fix would be to redeploy rancher with useBundledSystemChart: true, which will redeploy the capi-controller-manager and any other related resources. I haven't tried manually labeling the cluster, but others stated earlier that it worked as well.

rancher-cluster-screenshot

from rancher.

nickvth avatar nickvth commented on June 27, 2024 1

Workaround: deploy Rancher with useBundledSystemChart=true. This may always be the recommended approach if you don't want every merge/push to the release-v2.8 git branch to update your cluster.

Configure Rancher server to use the packaged copy of Helm system charts. The system charts repository contains all the catalog items required for features such as monitoring, logging, alerting and global DNS. These Helm charts are located in GitHub, but since you are in an air gapped environment, using the charts that are bundled within Rancher is much easier than setting up a Git mirror.

After that:

  • Going to Installed Apps
  • Downgrading rancher-provisioning-capi. ( from 103.1.0+up0.1.0 to 103.0.0+up0.0.1 )
  • No new version available

from rancher.

snasovich avatar snasovich commented on June 27, 2024 1

@kingnarmer, please use the workaround from #44929 (comment), as it pins the charts repo to the commit before this problematic chart update was released.

As an overall update, we're working on releasing a new version of the chart that essentially rolls back the CAPI version upgrade, which should address the issue. It is currently undergoing QA, and rancher/charts#3700 is the PR to release this fixed chart.

from rancher.

Denys-Janrain-L avatar Denys-Janrain-L commented on June 27, 2024 1

My list of installed apps is always empty; I don't know why.
So only the first two steps of the workaround worked for me. After that I had to open the Rancher local console (through the browser) and run:
helm -n cattle-provisioning-capi-system rollback rancher-provisioning-capi 1
which rolled it back to 103.0.0+up0.0.1
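
Revision 1 happened to be the 103.0.0+up0.0.1 install in this case, but that may not hold everywhere. As a rough sketch (not an official procedure), the revision can be looked up from the JSON printed by `helm -n cattle-provisioning-capi-system history rancher-provisioning-capi -o json`. The helper name `revision_for_chart` and the sample data below are hypothetical, and the sketch assumes each history entry carries `revision` and `chart` fields.

```python
import json


def revision_for_chart(history_json: str, chart_version: str):
    """Return the newest revision whose chart field ends with chart_version.

    history_json is assumed to be the output of `helm history ... -o json`:
    a JSON list of entries with at least "revision" and "chart" keys.
    Returns None when no revision used that chart version.
    """
    entries = json.loads(history_json)
    matches = [
        entry["revision"]
        for entry in entries
        if entry.get("chart", "").endswith(chart_version)
    ]
    return max(matches) if matches else None


# Illustrative history only, not captured from a real cluster.
sample_history = json.dumps([
    {"revision": 1, "chart": "rancher-provisioning-capi-103.0.0+up0.0.1", "status": "superseded"},
    {"revision": 2, "chart": "rancher-provisioning-capi-103.1.0+up0.1.0", "status": "deployed"},
])

if __name__ == "__main__":
    print(revision_for_chart(sample_history, "103.0.0+up0.0.1"))
```

The number it returns is what would go at the end of the `helm rollback` command above.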

from rancher.

Oats87 avatar Oats87 commented on June 27, 2024 1

> Unfortunately, this did not work. I also had added the cluster name label to the kubeconfig secret under fleet-default. At the moment, I still cannot provision a custom RKE2 cluster. Is it possible I'm doing something wrong here? The rancher-provisioning-capi was updated to the latest version as well.
>
> image

Your screenshot shows that your cluster is waiting for a worker node to be registered (and on top of that, your cluster does not have a worker listed in your machine list).

from rancher.

bagutzu avatar bagutzu commented on June 27, 2024

+1

from rancher.

nickvth avatar nickvth commented on June 27, 2024

Looks like it's broken after the upgrade to https://github.com/rancher/charts/tree/release-v2.8/charts/rancher-provisioning-capi/103.1.0%2Bup0.1.0

from rancher.

fplantinga-guida avatar fplantinga-guida commented on June 27, 2024

Same here

from rancher.

daleckystepan avatar daleckystepan commented on June 27, 2024

I also tried to downgrade but it is quickly updated again.

from rancher.

richardcase avatar richardcase commented on June 27, 2024

@daleckystepan (or anyone else seeing this issue) - could you look at the secret that contains the kubeconfig for one of the clusters and see what labels it has? I'd be interested if there is one called cluster.x-k8s.io/cluster-name.

from rancher.

nickvth avatar nickvth commented on June 27, 2024

No label cluster.x-k8s.io/cluster-name @richardcase

kg secret donald-prod-1-kubeconfig  -o yaml
apiVersion: v1
data:
  token: *****
  value: *****
kind: Secret
metadata:
  creationTimestamp: "2024-02-02T09:42:29Z"
  name: donald-prod-1-kubeconfig
  namespace: fleet-default
  ownerReferences:
  - apiVersion: provisioning.cattle.io/v1
    kind: Cluster
    name: donald-prod-1
    uid: a9e0e6f8-ba10-4a5d-9ed5-ded4077631dd
  resourceVersion: "142716571"
  uid: 5aacaca0-1ea7-4849-a055-cced99c75d4c
type: Opaque
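
A minimal Python sketch of the check being asked for here: given a Secret manifest loaded as a dict (for example from `kubectl get secret ... -o json`), report whether the `cluster.x-k8s.io/cluster-name` label is missing. The helper names are hypothetical, and using the owning Cluster's name as the label value is an assumption based on the ownerReferences shown above.

```python
CLUSTER_NAME_LABEL = "cluster.x-k8s.io/cluster-name"


def owning_cluster_name(secret: dict):
    """Return the name of the Cluster owner reference, or None if absent."""
    for ref in secret.get("metadata", {}).get("ownerReferences", []):
        if ref.get("kind") == "Cluster":
            return ref.get("name")
    return None


def missing_cluster_name_label(secret: dict) -> bool:
    """True when the secret does not carry the cluster-name label."""
    labels = secret.get("metadata", {}).get("labels") or {}
    return CLUSTER_NAME_LABEL not in labels


# Trimmed copy of the manifest posted above (data fields omitted).
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {
        "name": "donald-prod-1-kubeconfig",
        "namespace": "fleet-default",
        "ownerReferences": [
            {
                "apiVersion": "provisioning.cattle.io/v1",
                "kind": "Cluster",
                "name": "donald-prod-1",
            }
        ],
    },
    "type": "Opaque",
}

if __name__ == "__main__":
    if missing_cluster_name_label(secret):
        print("label missing; owning cluster:", owning_cluster_name(secret))
```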

from rancher.

daleckystepan avatar daleckystepan commented on June 27, 2024

I added label manually and it seems to be working.

from rancher.

Oats87 avatar Oats87 commented on June 27, 2024

> We had the same issue. Suddenly, out of nowhere, across all our rancher clusters and "downstream" clusters, we saw the same issue.
>
> After a long investigation, we came to the same conclusion (i.e., the capi upgrade from 1.4.4 to 1.5.5 is the cause).
>
> We weren't (and aren't) sure what is the easiest way to pin it to 1.4.4. Every time we downgraded, it auto-upgraded itself back to the broken 1.5.5 version. What we ended up doing in the interest of speed was:
>
>   • Going into the Apps - Repositories Section
>   • Changing The Repository Named "Rancher" from Branch "release-v2.8" to a318ef65fddf66b44c468d4a2636930ef39a88fd
>   • Going to Installed Apps
>   • Downgrading rancher-provisioning-capi. ( from 103.1.0+up0.1.0 to 103.0.0+up0.0.1 )
>
> Maybe that will help someone as well to workaround this until a proper fix has been found.
>
> If someone knows how to pin it to 1.4.4 in a better way, please let us know :)

This is the most effective way I can think of right now to pin the version of the chart so that it does not get inadvertently upgraded.

> I added label manually and it seems to be working.

Adding the label manually to the kubeconfig secrets is also a short-term solution, but if Rancher deems the kubeconfig invalid (e.g. the token is no longer valid or the server-url has changed), it will recreate that kubeconfig secret sans the label.

from rancher.

atsai1220 avatar atsai1220 commented on June 27, 2024

This was described in the migration notes from 1.4 to 1.5: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/book/src/developer/providers/migrations/v1.4-to-v1.5.md#other

The generated kubeconfig by the Control Plane providers must be labelled with the key-value pair cluster.x-k8s.io/cluster-name=${CLUSTER_NAME}. This is required for the CAPI managers caches to store and retrieve them for the required operations.

This was the PR that propagated the change to Rancher 2.8.x environments. rancher/charts#3688
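
For those applying the label by hand, here is a small Python sketch that assembles the `kubectl label` invocation for one kubeconfig secret. The function name `label_command` is made up for illustration, the example values come from the secret posted earlier in this thread, and it assumes the label value should be the cluster's name as the migration note describes.

```python
import shlex


def label_command(namespace: str, secret_name: str, cluster_name: str) -> list:
    """Build the kubectl argument list that adds the cluster-name label."""
    return [
        "kubectl", "-n", namespace,
        "label", "secret", secret_name,
        f"cluster.x-k8s.io/cluster-name={cluster_name}",
    ]


if __name__ == "__main__":
    cmd = label_command("fleet-default", "donald-prod-1-kubeconfig", "donald-prod-1")
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to try; the list can be handed to `subprocess.run` once the output looks right. As noted elsewhere in this thread, Rancher may recreate the secret without the label, so this is only a stopgap.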

from rancher.

nickvth avatar nickvth commented on June 27, 2024

> This was described in the migration notes from 1.4 to 1.5: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/book/src/developer/providers/migrations/v1.4-to-v1.5.md#other
>
> The generated kubeconfig by the Control Plane providers must be labelled with the key-value pair cluster.x-k8s.io/cluster-name=${CLUSTER_NAME}. This is required for the CAPI managers caches to store and retrieve them for the required operations.
>
> This was the PR that propagated the change to Rancher 2.8.x environments. rancher/charts#3688

Thanks for sharing, but 2.8.3 is not released yet. So why was this change propagated?

from rancher.

atsai1220 avatar atsai1220 commented on June 27, 2024

> > This was described in the migration notes from 1.4 to 1.5: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/book/src/developer/providers/migrations/v1.4-to-v1.5.md#other
> >
> > The generated kubeconfig by the Control Plane providers must be labelled with the key-value pair cluster.x-k8s.io/cluster-name=${CLUSTER_NAME}. This is required for the CAPI managers caches to store and retrieve them for the required operations.
> >
> > This was the PR that propagated the change to Rancher 2.8.x environments. rancher/charts#3688
>
> Thanks for sharing, but 2.8.3 is not released yet. So why was this change propagated?

I have the same question. We will look into useBundledSystemChart=true in the future to prevent surprises.

from rancher.

pvlkov avatar pvlkov commented on June 27, 2024

Thank you for providing a quick fix. We will use the label workaround for now and upgrade to 2.8.3 as soon as it's out.

from rancher.

kingnarmer avatar kingnarmer commented on June 27, 2024

Unfortunately, neither workaround worked for me.

  • I downgraded rancher-provisioning-capi from 103.1.0+up0.1.0 to 103.0.0+up0.0.1 via the Rancher GUI (Apps -> Installed Apps). It was fine for a few minutes, then it came back.

  • Setting useBundledSystemChart=true on the existing Rancher installation had no effect.

I'd appreciate help on how to mitigate this.

from rancher.

sulaimantok avatar sulaimantok commented on June 27, 2024

Same here. Using this workaround also worked for me.

from rancher.

daleckystepan avatar daleckystepan commented on June 27, 2024

I tried to change CATTLE_SYSTEM_CATALOG to bundled, but it probably has no effect if Rancher is already installed. Is there any other way to prevent those online updates and make them more transparent and manageable for us?

from rancher.

avthart avatar avthart commented on June 27, 2024

> I tried to change CATTLE_SYSTEM_CATALOG to bundled, but it probably has no effect if Rancher is already installed. Is there any other way to prevent those online updates and make them more transparent and manageable for us?

Either wait for the fix, or manually roll back rancher-provisioning-capi using Rancher Apps.

from rancher.

qhris avatar qhris commented on June 27, 2024

The capi upgrade also makes it impossible to provision new clusters, for the same reason.

We verified that the workaround of setting the label on the kubeconfig secret works.
We also tested Rancher 2.8.3-rc6, and it works.

from rancher.

romarioschneider avatar romarioschneider commented on June 27, 2024

same issue

from rancher.

dylanthepodman avatar dylanthepodman commented on June 27, 2024

Unfortunately, this did not work. I also had added the cluster name label to the kubeconfig secret under fleet-default. At the moment, I still cannot provision a custom RKE2 cluster. Is it possible I'm doing something wrong here? The rancher-provisioning-capi was updated to the latest version as well.

image

from rancher.

snasovich avatar snasovich commented on June 27, 2024

@dylanthepodman , thank you for reporting this. Most likely you're running into some different issue. Was provisioning working OK on the same setup earlier?

from rancher.

dylanthepodman avatar dylanthepodman commented on June 27, 2024

> > Unfortunately, this did not work. I also had added the cluster name label to the kubeconfig secret under fleet-default. At the moment, I still cannot provision a custom RKE2 cluster. Is it possible I'm doing something wrong here? The rancher-provisioning-capi was updated to the latest version as well.
> >
> > image
>
> Your screenshot shows that your cluster is waiting for a worker node to be registered (and on top of that, your cluster does not have a worker listed in your machine list).

I tried, and it is working now. Thank you for mentioning this to me.

Interestingly enough, I tried this same setup in Rancher v2.6.12 and it did not have this problem.

from rancher.
