jenkins-x-plugins / jx-kh-check
A custom kuberhealthy check for verifying the health of a Jenkins X installation
License: Apache License 2.0
I wanted to change the token used by the jx git operator. Running
jx admin operator
succeeded, as can be seen below, but the health check still shows healthy=false in Octant, and
jx health get status --all-namespaces --watch
shows:
NAME          NAMESPACE  STATUS  ERROR MESSAGE
jx-bot-token  jx         ERROR   failed to verify bot account with https://api.github.com/user, response status 401 Unauthorized code 401
Changing to the jx namespace to verify:
jx ns jx
Now using namespace 'jx' on server ''.
jx verify ingress -b
now verifying docker registry ingress setup in dir .
jx gitops webhook update --warn-on-fail
W1102 13:39:26.780733 2046 warnings.go:67] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Checking hooks for repository jtf-ops/jx3-20201023
Found matching hook for url http://hook-jx.jx.docure.ai/hook
boot Job pod jx-boot-500f167bc44fda69928bce3-l64cg has Succeeded
boot Job jx-boot-500f167bc44fda69928bce3 has Succeeded
I'm seeing the following in helmfile.yaml:
- chart: kuberhealthy/kuberhealthy
- chart: jx3/jx-kh-check
- chart: jx3/jx-kh-check
  name: health-checks-jx
- chart: jx3/jx-kh-check
  name: health-checks-install
after running:
jx gitops helmfile add --chart kuberhealthy/kuberhealthy
jx gitops helmfile add --chart jx3/jx-kh-check
jx gitops helmfile add --chart jx3/jx-kh-check --name health-checks-jx
jx gitops helmfile add --chart jx3/jx-kh-check --name health-checks-install
Is the "- chart: jx3/jx-kh-check" entry without a name: needed?
Given:
jxSecrets:
enabled: true
cluster:
enabled: true
the template charts/jx-kh-check/templates/jx-secrets-check.yaml renders incorrectly:
---
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: jx-secrets
  annotations:
    docs.jenkins-x.io: https://jenkins-x.io/v3/admin/troubleshooting/secrets/
spec:
  runInterval: <no value>
  timeout: <no value>
  podSpec:
    securityContext:
      runAsUser: 999
      fsGroup: 999
    containers:
    - env:
      image: <no value>:<no value>
      imagePullPolicy: IfNotPresent
      name: main
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
    serviceAccountName: jx-secrets-sa
The "- env:" line is missing its value (an empty array): it should either read "- env: []" or be removed entirely.
On a new AKS cluster, I get this error:
time="2021-01-12T12:49:05Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-12T12:49:05Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
starting jx-webhooks health checks
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12080b4]
goroutine 1 [running]:
main.Options.findErrors(0x15b0f80, 0xc000341430, 0x0, 0x22, 0x0, 0x0, 0x0)
/workspace/source/cmd/jx-secrets/main.go:68 +0x114
main.main()
/workspace/source/cmd/jx-secrets/main.go:32 +0x107
stream closed
(commit ref for the cluster: jx3-gitops-repositories/jx3-terraform-azure@8688dcb)
Another crash, in the same context as the one above:
time="2021-01-13T21:10:11Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-13T21:10:11Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-01-13T21:10:11Z" level=info msg="Check time limit set to: 14m47.427936725s"
time="2021-01-13T21:10:11Z" level=info msg="Check pod is running on node: aks-default-15766151-vmss000000"
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for node to reach minimum age: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for kube proxy to be ready: error getting kuberhealthy pod: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=info msg="Running DNS status checker"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check testing hostname: kubernetes.default"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check determined that kubernetes.default was OK."
2021/01/13 21:10:11 checkClient: DEBUG: Reporting SUCCESS
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with error length of:0
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with ok state of:true
2021/01/13 21:10:11 checkClient: INFO: Using kuberhealthy reporting URL:http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus
2021/01/13 21:10:11 checkClient: DEBUG: Making POST request to kuberhealthy:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11f1473]
goroutine 1 [running]:
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.sendReport(0x202a688, 0x0, 0x0, 0x1, 0xc0003265e8, 0xc0002edd74)
/build/pkg/checks/external/checkclient/main.go:99 +0x4e3
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.ReportSuccess(0xc0002eddf0, 0xc0003265a0)
/build/pkg/checks/external/checkclient/main.go:44 +0x7e
main.reportKHSuccess(0xc0002eddc8, 0xc0002edd70)
/build/cmd/dns-resolution-check/main.go:182 +0x2d
main.(*Checker).Run(0xc0002f5ee0, 0xc00026fe40, 0xc0002eded0, 0x2)
/build/cmd/dns-resolution-check/main.go:161 +0x204
main.main()
/build/cmd/dns-resolution-check/main.go:119 +0x3ce
stream closed
I have a multi-cluster Jenkins X setup on GKE; the installation followed the multi-cluster example documentation.
I know that part of this setup was to disable webhooks, but the kuberhealthy checks still seem to be enabled and I'm not sure how to disable them permanently. Because of that, the jx-webhook-events pods are failing with the following error:
time="2022-04-05T08:59:18Z" level=info msg="Found instance namespace: jx"
time="2022-04-05T08:59:18Z" level=info msg="Kuberhealthy is located in the jx namespace."
starting jx-webhook-events health checks
FATAL: failed to list source repositories: failed to find local endpoints: failed to find endpoints hook in namespace jx: endpoints "hook" not found
I tried to disable jx-webhook-events by adding a custom values.yaml to the helmfiles/jx directory and attaching it to the jxgh/jx-kh-check chart in helmfile.yaml, with the following content:
jxWebhookEvents:
  enabled: false
  image:
    repository: ghcr.io/jenkins-x/jx-webhook-events
  runInterval: 5m # The interval that Kuberhealthy will run your check on
  timeout: 2m # After this much time, Kuberhealthy will kill your check and consider it "failed"
jxWebhooks:
  enabled: false
  image:
    repository: ghcr.io/jenkins-x/jx-webhooks
  runInterval: 90s # The interval that Kuberhealthy will run your check on
  timeout: 2m # After this much time, Kuberhealthy will kill your check and consider it "failed"
After the PR was merged, the webhook ConfigMaps were removed from config-root, but the kuberhealthy deployment does not seem to have picked up the change: the pods are still being deployed and failing in the jx namespace.
Also, jx health status -A returns the following errors:
NAME                      NAMESPACE        STATUS  ERROR MESSAGE
certmanager-tls           kuberhealthy     OK
daemonset                 kuberhealthy     OK
deployment                kuberhealthy     OK
dns-status-internal       kuberhealthy     OK
jx-bot-token              jx               ERROR   Check execution error: jx/jx-bot-token: failed to see pod running within timeout
jx-install                jx-git-operator  OK
jx-pod-status             kuberhealthy     OK
jx-secrets                kuberhealthy     OK
jx-webhook                jx               OK
jx-webhook-events         jx               ERROR   Check execution error: jx/jx-webhook-events: timed out waiting for checker pod to report in
network-connection-check  kuberhealthy     OK
pod-restarts              kuberhealthy     OK
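With many checks in the list, it can help to filter for the failing rows. The Go sketch below is not a jx feature: failingChecks is a hypothetical helper that assumes the whitespace-separated NAME NAMESPACE STATUS ERROR MESSAGE layout shown above and treats every STATUS other than OK as a failure. Because it splits on whitespace, runs of spaces inside a message are collapsed to one.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkStatus is one row of the status table; Message may contain spaces.
type checkStatus struct {
	Name, Namespace, Status, Message string
}

// failingChecks returns the rows whose STATUS column is not OK, assuming
// whitespace-separated columns in the order NAME NAMESPACE STATUS MESSAGE.
func failingChecks(table string) []checkStatus {
	var failed []checkStatus
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || fields[0] == "NAME" {
			continue // skip blank or short lines and the header row
		}
		if fields[2] == "OK" {
			continue
		}
		failed = append(failed, checkStatus{
			Name:      fields[0],
			Namespace: fields[1],
			Status:    fields[2],
			Message:   strings.Join(fields[3:], " "),
		})
	}
	return failed
}

func main() {
	out := `NAME NAMESPACE STATUS ERROR MESSAGE
daemonset kuberhealthy OK
jx-bot-token jx ERROR Check execution error: jx/jx-bot-token: failed to see pod running within timeout
jx-webhook jx OK`
	for _, c := range failingChecks(out) {
		fmt.Printf("%s/%s: %s\n", c.Namespace, c.Name, c.Message)
	}
}
```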