Comments (21)
I could manually fix it:
- go to the Harvester embedded Rancher and get the kubeconfig
- update the kubeconfig in the Harvester credential in the cattle-global-data namespace in the local cluster (the one running Rancher); they are probably named hv-cred
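A sketch of that manual fix with kubectl against the local (Rancher) cluster. The secret name cc-xxxxx and the data key are assumptions based on how the Harvester node driver stores cloud credentials; list the secrets in cattle-global-data first and verify on your own install:

```shell
# Find the Harvester cloud-credential secret (name "cc-xxxxx" is a placeholder):
kubectl -n cattle-global-data get secrets

# Inspect the kubeconfig currently stored in the credential
# (the data key name is an assumption about the Harvester node driver):
kubectl -n cattle-global-data get secret cc-xxxxx \
  -o jsonpath='{.data.harvestercredentialConfig-kubeconfigContent}' | base64 -d

# Replace it with the fresh kubeconfig fetched from the embedded Rancher:
kubectl -n cattle-global-data patch secret cc-xxxxx --type merge \
  -p "{\"data\":{\"harvestercredentialConfig-kubeconfigContent\":\"$(base64 -w0 < harvester-kubeconfig.yaml)\"}}"
```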
from rancher.
I was able to replicate the issue by setting the token expiry to 10 minutes as shared by @m-ildefons. The Rancher deployed is v2.8.3 and the Harvester version is v1.2.1. I noticed that when the token associated with the downstream cluster expires, the connection between Rancher and Harvester is disrupted. This token expiry value is set to 30 days by default, as documented here. The change was introduced in Rancher 2.8, as mentioned in this PR.
I guess this issue has not been observed in earlier versions of Rancher, since the token TTL used to be infinite, as documented here. It appears the expiry was introduced for security reasons, for performance improvements, and to avoid accumulating too many unexpired tokens.
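For reference, the default above can be sanity-checked with a little arithmetic; the setting name in the comment below is taken from the Rancher 2.8 docs:

```shell
# On the local (Rancher) cluster, the relevant global setting can be read with:
#   kubectl get settings.management.cattle.io kubeconfig-default-token-ttl-minutes
# Its documented default value, 43200 minutes, works out to the 30 days cited above:
TTL_MINUTES=43200
echo "$(( TTL_MINUTES / 60 / 24 )) days"
```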
I've managed to reproduce the issue in a test environment:
- Install Harvester (tested with v1.2.1, likely irrelevant)
- Install Rancher (tested with v2.8.2, testing with other versions TBD)
- Import Harvester cluster
- Upload a suitable cloud image and create a VM network, so VMs can be created
- To shorten the time to reproduce, set the default token TTL in Rancher to e.g. 10 minutes. This is a global config setting in Rancher.
- Create a Cloud Credential for the Harvester cluster
- Create a K8s cluster with the Harvester cluster as infrastructure provider, using the previously created Cloud Credential for authentication
- Wait until the default token TTL has elapsed. The token associated with the Cloud Credential will expire and eventually be removed, but the Cloud Credential itself will remain. This does not cause an error just yet, though.
- Scale the K8s cluster created above up or down. This operation will fail with behavior and errors similar to the reported problem.
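The TTL-shortening step above can be done from kubectl as well as from the UI. A sketch, assuming the setting name from the Rancher docs and that the Token resource carries an expiresAt field (verify both against your Rancher version):

```shell
# On the local (Rancher) cluster; the value is in minutes (10 = ten minutes):
kubectl patch settings.management.cattle.io kubeconfig-default-token-ttl-minutes \
  --type merge -p '{"value":"10"}'

# Watch the token created for the Cloud Credential expire
# (the expiresAt column is an assumption about the Token resource's fields):
kubectl get tokens.management.cattle.io \
  -o custom-columns=NAME:.metadata.name,EXPIRES:.expiresAt
```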
I'm not sure if and how OIDC interacts here, since it wasn't in my test environment. Since the original bug report does not include any mention of an external identity provider and it makes my test environment simpler, I'll focus on locally provided users.
As a workaround, I suggest creating Cloud Credentials associated with a token that has no expiration date. To do that, set the maximum token TTL and the default token TTL settings (both global settings in Rancher) to 0. Then create the Cloud Credentials to be used to create a K8s cluster on Harvester, and create the K8s cluster using these Cloud Credentials.
To recover an existing cluster, adjust the maximum token TTL and the default token TTL to 0, create a new Cloud Credential for the Harvester cluster, and edit the YAML for the cluster so that .spec.cloudCredentialSecretName points to the new Cloud Credential.
The K8s cluster will eventually recover and any outstanding scaling operations will complete. The old Cloud Credentials can be disposed of afterwards.
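The recovery steps can be sketched with kubectl as follows. The setting names (auth-token-max-ttl-minutes, kubeconfig-default-token-ttl-minutes) are taken from the Rancher docs, and the cluster and credential names are placeholders:

```shell
# On the local (Rancher) cluster: disable token expiry (0 = no TTL):
kubectl patch settings.management.cattle.io auth-token-max-ttl-minutes \
  --type merge -p '{"value":"0"}'
kubectl patch settings.management.cattle.io kubeconfig-default-token-ttl-minutes \
  --type merge -p '{"value":"0"}'

# After creating a fresh Cloud Credential (secret name cc-new12345 is a
# placeholder), point the existing cluster at it:
kubectl -n fleet-default patch clusters.provisioning.cattle.io my-cluster \
  --type merge -p '{"spec":{"cloudCredentialSecretName":"cattle-global-data:cc-new12345"}}'
```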
maybe related to #44929 ?
Seems to occur even after the fix for #44929, both when scaling and when creating a new cluster.
And I am on rancher v2.8.2
Looking at the created job (for a worker node scale-up):
"args": [
  "--driver-download-url=https://<host>/assets/docker-machine-driver-harvester",
  "--driver-hash=a9c2847eff3234df6262973cf611a91c3926f3e558118fcd3f4197172eda3434",
  "--secret-namespace=fleet-default",
  "--secret-name=staging-pool-worker-bbfc2798-d5jsj-machine-state",
  "rm",
  "-y",
  "--update-config",
  "staging-pool-worker-bbfc2798-d5jsj"
]
The first thing the driver tries is to delete the non-existing pod, and it fails. I would expect a create instead. I just don't know where this command is generated.
@bpedersen2 do you have rancher running inside a nested VM or in the same kubernetes cluster of Harvester itself?
Following the manual fix steps by getting the kubeconfig and manually updating the secret in Rancher worked for me!
> @bpedersen2 do you have rancher running inside a nested VM or in the same kubernetes cluster of Harvester itself?

No, it is running standalone.
What I observe is that the token in harvester changes.
Rancher is configured to use OIDC, and in the rancher logs I get
Error refreshing token principals, skipping: oauth2: "invalid_grant" "Token is not active"
2024/04/02 11:43:26 [ERROR] [keycloak oidc] GetPrincipal: error creating new http client: oauth2: "invalid_grant" "Token is not active"
2024/04/02 11:43:26 [ERROR] error syncing 'user-XXX': handler mgmt-auth-userattributes-controller: oauth2: "invalid_grant" "Token is not active", requeuing
With a local user, it seems to work
I re-registered the Harvester cluster using a non-OIDC admin account, and now the connection seems to be stable again. It looks like a token expiration problem to me.
I have the same problem:
Failed creating server [fleet-default/rke2-rc-control-plane-2aae5bdf-2m48z] of kind (HarvesterMachine) for machine rke2-rc-control-plane-5b74797746x4dpcs-ncdxf in infrastructure provider: CreateError:
Downloading driver from https://HOST/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
Trying to access option which does not exist
THIS ***WILL*** CAUSE UNEXPECTED BEHAVIOR
Type assertion did not go smoothly to string for key
Running pre-create checks...
Error with pre-create check: "the server has asked for the client to provide credentials (get settings.harvesterhci.io server-version)"
The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag.
Rancher v2.8.2
Dashboard v2.8.0
Helm v2.16.8-rancher2
Machine v0.15.0-rancher106
Harvester: v1.2.1
I have been stuck in a loop for many hours:
A new VM is created, it errors, the VM is deleted, then a new VM is created, errors again, and is deleted again...
> I could manually fix it:
> - go to the Harvester embedded Rancher and get the kubeconfig
> - update the kubeconfig in the Harvester credential in the cattle-global-data namespace in the local cluster (the one running Rancher); they are probably named hv-cred

OK, that worked for me. I have Rancher with users provided by Active Directory.
Now I have this error:
Failed deleting server [fleet-default/rke2-rc-control-plane-3fba9236-dxptf] of kind (HarvesterMachine) for machine rke2-rc-control-plane-77f9455c9dx9xgsk-4kcwf in infrastructure provider: DeleteError:
Downloading driver from https://HOST/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
About to remove rke2-rc-control-plane-3fba9236-dxptf
WARNING: This action will delete both local reference and remote instance.
Error removing host "rke2-rc-control-plane-3fba9236-dxptf": the server has asked for the client to provide credentials (get virtualmachines.kubevirt.io rke2-rc-control-plane-3fba9236-dxptf)
Hi,
thanks for this bug report. May I ask which Harvester versions you were using, @bpedersen2, @sarahhenkens and when you last updated them?
I am on Harvester 1.2.1 and Rancher 2.8.3 (and waiting for 1.2.2 to be able to upgrade to 1.3.x eventually).
Ran into the same issue today as @dawid10353. I'm running Harvester 1.2.1 and Rancher 2.8.2.
Could you please check the expiry of your API access tokens? There needs to be a kubeconfig token that isn't expired and is associated with the Harvester cluster.
It would also be helpful to know what you were trying to do when you observed the problems and at which step in the process the problems started to occur.
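One way to check the token expiry from the local cluster; the userId and expiresAt columns are assumptions about the management.cattle.io Token resource, so verify the field names against your Rancher version:

```shell
# List API/kubeconfig tokens with their expiry; an empty EXPIRES means no TTL:
kubectl get tokens.management.cattle.io \
  -o custom-columns=NAME:.metadata.name,USER:.userId,EXPIRES:.expiresAt
```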
Reading the original issue #41919: it introduces the default TTL value (30 days for kubeconfig tokens) to securely manage tokens for users (or headless users for programmatic purposes).
However, Harvester cloud credentials, which are used for authenticating and authorizing the Rancher cluster to manage downstream Harvester clusters, should not use the unified TTL applied in this case, since the token is not a user token but an internal mechanism.
cc @ibrokethecloud @bk201 @Vicente-Cheng @m-ildefons