Comments (12)
This could be related: for a long time we have been getting this log entry on the Bitbucket side:
[auth_basic:error] [pid xxxxxx] [client xxx.xxx.xxx.xxx:yyyyy] AH01617: user xxxxxx: authentication failure for "/bitbucket/xxxxx/xxxx/some-repo.git/info/refs": Password Mismatch
This happens quite often and is polluting the server logs considerably.
from fleet.
I was not able to reproduce it. I tried like this:
- Deployed Rancher 2.7.9 on k3s v1.26.10+k3s2
- Deployed private Bitbucket repos both locally and on an RKE2 downstream cluster
- Upgraded later to 2.8.2
No issues found.
from fleet.
OK. After approximately 75 minutes I was able to see some logs similar to the ones described.
For a brief moment the UI also displayed the errors, but they disappeared shortly afterwards.
Pasting the logs here:
time="2024-03-07T15:18:32Z" level=debug msg="Enqueueing gitjob fleet-local/bitbucket-local in 15 seconds"
time="2024-03-07T15:18:33Z" level=error msg="Error fetching latest commit: Get \"https://bitbucket.org/fleet-test-bitbucket/bitbucket-fleet-test/info/refs?service=git-upload-pack\": context deadline exceeded"
time="2024-03-07T15:18:33Z" level=debug msg="Enqueueing gitjob fleet-default/bit-butcket-local in 15 seconds"
E0307 15:18:44.117346 7 leaderelection.go:327] error retrieving resource lock cattle-fleet-system/gitjob: Get "https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-system/configmaps/gitjob": context deadline exceeded
I0307 15:18:44.117363 7 leaderelection.go:280] failed to renew lease cattle-fleet-system/gitjob: timed out waiting for the condition
W0307 15:18:57.116748 7 reflector.go:456] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0307 15:18:57.116804 7 reflector.go:456] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: watch of *v1.Job ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0307 15:18:57.116807 7 reflector.go:456] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0307 15:18:57.116807 7 reflector.go:456] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: watch of *v1.GitJob ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
E0307 15:18:57.116841 7 leaderelection.go:303] Failed to release lock: Put "https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-system/configmaps/gitjob": http2: client connection lost
time="2024-03-07T15:18:57Z" level=fatal msg="leaderelection lost for gitjob"
from fleet.
We have over 150 gitjobs running on about 15 downstream clusters. This issue arises in just a few minutes and persists.
from fleet.
Thanks @rajeshneo. Just a couple of questions:
- Is the error gone if you force update the gitjobs?
- Could you please increase the polling interval per git repo?
from fleet.
It becomes active for 5 minutes or so and then returns to the error state with the same error message. I also tried changing the polling interval and setting the FLEET_CLUSTER_ENQUEUE_DELAY variable, but saw no improvement.
from fleet.
Sorry, but I was not able to reproduce this issue consistently using k3s as the main cluster and RKE2 as the downstream one. I was able to see the UI error message after disconnecting and reconnecting the clusters, but there were no logs pointing to an actual disconnection from the repos, and the errors disappeared either when forcing an update or when extending the pollingInterval to over 45 seconds.
from fleet.
@mmartin24 Not sure if there are other users who are facing this as well. For now, I have downgraded my Fleet to version 0.8.0 and everything is healthy again.
from fleet.
> @mmartin24 Not sure if there are other users who are facing this as well. For now, I have downgraded my Fleet to version 0.8.0 and everything is healthy again.
To my knowledge, this is the only ticket so far. I can see it has an internal Jira issue and has already been queued to be addressed.
from fleet.
Maybe this timeout is too small? https://github.com/rancher/fleet/blob/main/pkg/git/lsremote.go#L140C1-L140C31
Let's make it configurable.
from fleet.
Seeing the same thing here with git (cloud GitLab) on 2.8.2, so it's not specific to Bitbucket. We have another cluster on 2.8.3 that we will evaluate shortly.
from fleet.
Running on RKE2 1.28.8 and Rancher 2.8.3. This problem started appearing with Rancher 2.8.2, I believe. Fleet version is fleet:103.1.4+up0.9.4
Most of the pollingInterval values on the Fleet repos here have been set to 5 minutes since the beginning. That does not help avoid this problem.
A configurable timeout might actually help here.
from fleet.