
jenkins-x-plugins / jx-kh-check


A custom kuberhealthy check for verifying the health of a Jenkins X installation

License: Apache License 2.0

Dockerfile 0.87% Makefile 38.03% Shell 11.96% Go 49.14%
hacktoberfest

jx-kh-check's People

Contributors

jenkins-x-bot, jenkins-x-bot-test, jrx-sjg, jstrachan, msvticket, rawlingsj

jx-kh-check's Issues

Question about entries in helmfile.yaml

I'm seeing the entries below in helmfile.yaml:

- chart: kuberhealthy/kuberhealthy
- chart: jx3/jx-kh-check
- chart: jx3/jx-kh-check
  name: health-checks-jx
- chart: jx3/jx-kh-check
  name: health-checks-install

after running

jx gitops helmfile add --chart kuberhealthy/kuberhealthy
jx gitops helmfile add --chart jx3/jx-kh-check
jx gitops helmfile add --chart jx3/jx-kh-check --name health-checks-jx
jx gitops helmfile add --chart jx3/jx-kh-check --name health-checks-install

Is the "- chart: jx3/jx-kh-check" entry without a name: needed?

jx-webhook-events pods are failing in multi-cluster setup

I have a multi-cluster Jenkins X setup on GKE; the installation was done according to the multi-cluster example documentation.

I know that part of this setup was to disable webhooks, but it seems that the kuberhealthy checks are still enabled and I'm not sure how to disable them permanently. Because of that, the jx-webhook-events pods are failing with the error below:

time="2022-04-05T08:59:18Z" level=info msg="Found instance namespace: jx"
time="2022-04-05T08:59:18Z" level=info msg="Kuberhealthy is located in the jx namespace."
starting jx-webhook-events health checks
FATAL: failed to list source repositories: failed to find local endpoints: failed to find endpoints hook in namespace jx: endpoints "hook" not found

I have tried to disable jx-webhook-events by adding a custom values.yaml to the helmfiles/jx dir and attaching it to the jxgh/jx-kh-check chart in helmfile.yaml with the content below:

jxWebhookEvents:
  enabled: false
  image:
    repository: ghcr.io/jenkins-x/jx-webhook-events
  runInterval: 5m # The interval that Kuberhealthy will run your check on
  timeout: 2m # After this much time, Kuberhealthy will kill your check and consider it "failed"
jxWebhooks:
  enabled: false
  image:
    repository: ghcr.io/jenkins-x/jx-webhooks
  runInterval: 90s # The interval that Kuberhealthy will run your check on
  timeout: 2m # After this much time, Kuberhealthy will kill your check and consider it "failed"
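
For reference, attaching a values file to a release in helmfile.yaml usually looks something like the fragment below. This is only a sketch: the release name is taken from the other issue on this page and is an assumption here, and the path assumes the custom values.yaml sits next to the helmfile in helmfiles/jx.

```yaml
# helmfiles/jx/helmfile.yaml (fragment, not the full file)
releases:
- chart: jxgh/jx-kh-check
  name: health-checks-jx   # assumed release name; use whatever name the release already has
  values:
  - values.yaml            # the custom file holding the jxWebhookEvents / jxWebhooks overrides
```

If the same chart is installed more than once under different release names, each release that should pick up the overrides would need its own values entry.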

The webhook ConfigMaps were removed from the config-root after the PR was merged, but it looks like the kuberhealthy deployment was not updated with this change, and the pods are still being deployed and failing in the jx namespace.

Also, jx health status -A returns the errors below:

NAME                          NAMESPACE                     STATUS                        ERROR MESSAGE
certmanager-tls               kuberhealthy                  OK                   
daemonset                     kuberhealthy                  OK                   
deployment                    kuberhealthy                  OK                   
dns-status-internal           kuberhealthy                  OK                   
jx-bot-token                  jx                            ERROR                         Check execution error: jx/jx-bot-token: failed to see pod running within timeout
jx-install                    jx-git-operator               OK                   
jx-pod-status                 kuberhealthy                  OK                   
jx-secrets                    kuberhealthy                  OK                   
jx-webhook                    jx                            OK                   
jx-webhook-events             jx                            ERROR                         Check execution error: jx/jx-webhook-events: timed out waiting for checker pod to report in
network-connection-check      kuberhealthy                  OK                   
pod-restarts                  kuberhealthy                  OK   

jx-bot-token shows Healthy=false

I wanted to change the token used for the jx git operator.

jx admin operator succeeded, as can be seen below, but the health check still shows healthy=false in Octant, and

jx health get status --all-namespaces --watch shows

NAME                          NAMESPACE                     STATUS                        ERROR MESSAGE
jx-bot-token                  jx                            ERROR                         failed to verify bot account with https://api.github.com/user, response status 401 Unauthorized code 401
changing to the jx namespace to verify
jx ns jx
Now using namespace 'jx' on server ''.
jx verify ingress -b
now verifying docker registry ingress setup in dir .
jx gitops webhook update --warn-on-fail
W1102 13:39:26.780733    2046 warnings.go:67] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Checking hooks for repository jtf-ops/jx3-20201023
Found matching hook for url http://hook-jx.jx.docure.ai/hook
boot Job pod jx-boot-500f167bc44fda69928bce3-l64cg has Succeeded
boot Job jx-boot-500f167bc44fda69928bce3 has Succeeded

dns-status-internal health check crashes

Another crash, same context as here.

time="2021-01-13T21:10:11Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-13T21:10:11Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-01-13T21:10:11Z" level=info msg="Check time limit set to: 14m47.427936725s"
time="2021-01-13T21:10:11Z" level=info msg="Check pod is running on node: aks-default-15766151-vmss000000"
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for node to reach minimum age: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for kube proxy to be ready: error getting kuberhealthy pod: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=info msg="Running DNS status checker"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check testing hostname: kubernetes.default"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check determined that kubernetes.default was OK."
2021/01/13 21:10:11 checkClient: DEBUG: Reporting SUCCESS
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with error length of:0
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with ok state of:true
2021/01/13 21:10:11 checkClient: INFO: Using kuberhealthy reporting URL:http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus
2021/01/13 21:10:11 checkClient: DEBUG: Making POST request to kuberhealthy:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11f1473]
goroutine 1 [running]:
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.sendReport(0x202a688, 0x0, 0x0, 0x1, 0xc0003265e8, 0xc0002edd74)
   /build/pkg/checks/external/checkclient/main.go:99 +0x4e3
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.ReportSuccess(0xc0002eddf0, 0xc0003265a0)
   /build/pkg/checks/external/checkclient/main.go:44 +0x7e
main.reportKHSuccess(0xc0002eddc8, 0xc0002edd70)
   /build/cmd/dns-resolution-check/main.go:182 +0x2d
main.(*Checker).Run(0xc0002f5ee0, 0xc00026fe40, 0xc0002eded0, 0x2)
   /build/cmd/dns-resolution-check/main.go:161 +0x204
main.main()
   /build/cmd/dns-resolution-check/main.go:119 +0x3ce
stream closed
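
The two "forbidden" errors in the log above say that the service account system:serviceaccount:kuberhealthy:default cannot get pods in the kuberhealthy namespace. A minimal sketch of RBAC objects that would grant that one permission is shown below. The Role and RoleBinding name check-pod-reader is hypothetical, and the log does not establish whether fixing the permission also resolves the nil pointer panic in sendReport.

```yaml
# Sketch only: lets the default service account read pods in the kuberhealthy namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: check-pod-reader        # hypothetical name
  namespace: kuberhealthy
rules:
- apiGroups: [""]               # core API group, matching: API group "" in the error
  resources: ["pods"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: check-pod-reader        # hypothetical name
  namespace: kuberhealthy
subjects:
- kind: ServiceAccount
  name: default                 # the account named in the error message
  namespace: kuberhealthy
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: check-pod-reader
```

In a GitOps-managed cluster, a change like this would normally go through the chart or repository that owns the check rather than being applied by hand.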

Many health check pods crash

On a new AKS cluster, I've got this error:

time="2021-01-12T12:49:05Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-12T12:49:05Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
starting jx-webhooks health checks
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12080b4]
goroutine 1 [running]:
main.Options.findErrors(0x15b0f80, 0xc000341430, 0x0, 0x22, 0x0, 0x0, 0x0)
    /workspace/source/cmd/jx-secrets/main.go:68 +0x114
main.main()
    /workspace/source/cmd/jx-secrets/main.go:32 +0x107
stream closed

(commit ref for the cluster: jx3-gitops-repositories/jx3-terraform-azure@8688dcb)
