GithubHelp home page GithubHelp logo

Comments (3)

dprotaso avatar dprotaso commented on August 25, 2024 1

What I discovered in slack is that the webhook crashes after the logs - thus our liveness probes are failing.

From kubernetes/kubernetes#114872 (comment)

The delay on start protects against a skewed clock component taking over an active lease because it has a wrong view of the current time (or because the active lease component has a wrong view of the current time). This PR appears to drop that protection and make us vulnerable to that again. Is that right?

So essentially we need to adjust our probing to ensure we accomodate that when we have an expired lease we won't have a new leader until the second retry and the lease duration has passed since the elector has started.

Since our default lease duration is 60s that means we'll need to wait at least that long.

from serving.

dprotaso avatar dprotaso commented on August 25, 2024 1

/triage accepted

from serving.

mukulgit123 avatar mukulgit123 commented on August 25, 2024

@dprotaso helped with debugging this issue and we tried below:

Enabled debug logging on webhook deployment by following this

We also added below argument to the webhook deployment.

      - args:
        - -v=4

We checked the logs and found below:

I0527 12:39:17.441659       1 merged_client_builder.go:121] Using in-cluster configuration
2024/05/27 12:39:17 Registering 2 clients
2024/05/27 12:39:17 Registering 2 informer factories
2024/05/27 12:39:17 Registering 3 informers
2024/05/27 12:39:17 Registering 3 controllers
{"severity":"debug","timestamp":"2024-05-27T12:39:17.540Z","caller":"logging/config.go:116","message":"Successfully created the logger."}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.540Z","caller":"logging/config.go:117","message":"Logging level set to: debug"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.540Z","logger":"net-istio-webhook","caller":"profiling/server.go:65","message":"Profiling enabled: false","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.545Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:47","message":"Running with Standard leader election","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/prometheus_exporter.go:52","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_webhook prometheus 5000000000 <nil>  false 41478 0.0.0.0}. Start the server for Prometheus exporter.","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_webhook prometheus 5000000000 <nil>  false 41478 0.0.0.0}","commit":"fcad131-dirty"}
I0527 12:39:17.560657       1 reflector.go:221] Starting reflector *v1.ConfigMap (0s) from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.560673       1 reflector.go:257] Listing and watching *v1.ConfigMap from k8s.io/[email protected]/tools/cache/reflector.go:169
{"severity":"info","timestamp":"2024-05-27T12:39:17.560Z","logger":"net-istio-webhook","caller":"sharedmain/main.go:283","message":"Starting configuration manager...","commit":"fcad131-dirty"}
I0527 12:39:17.660674       1 shared_informer.go:285] caches populated
{"level":"info","ts":1716813557.660793,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
I0527 12:39:17.660953       1 reflector.go:221] Starting reflector *v1.MutatingWebhookConfiguration (10h0m0s) from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.660965       1 reflector.go:257] Listing and watching *v1.MutatingWebhookConfiguration from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.661140       1 reflector.go:221] Starting reflector *v1.ValidatingWebhookConfiguration (10h0m0s) from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.661148       1 reflector.go:257] Listing and watching *v1.ValidatingWebhookConfiguration from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.661218       1 reflector.go:221] Starting reflector *v1.Secret (10h0m0s) from k8s.io/[email protected]/tools/cache/reflector.go:169
I0527 12:39:17.661226       1 reflector.go:257] Listing and watching *v1.Secret from k8s.io/[email protected]/tools/cache/reflector.go:169
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.838Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:420","message":"Adding to queue config.webhook.istio.networking.internal.knative.dev (depth: 2)","commit":"fcad131-dirty","knative.dev/key":"/config.webhook.istio.networking.internal.knative.dev"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.843Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:420","message":"Adding to queue webhook.istio.networking.internal.knative.dev (depth: 2)","commit":"fcad131-dirty","knative.dev/key":"/webhook.istio.networking.internal.knative.dev"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:220","message":"Informers have been synced, unblocking admission webhooks.","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"sharedmain/main.go:311","message":"Starting controllers...","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.configmapwebhook.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_bf929e35-d818-4a6f-9d9e-573e7924657e\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:513","message":"Processing from queue config.webhook.istio.networking.internal.knative.dev (depth: 0)","commit":"fcad131-dirty"}
I0527 12:39:17.861619       1 shared_informer.go:285] caches populated
I0527 12:39:17.861650       1 shared_informer.go:285] caches populated
I0527 12:39:17.861654       1 shared_informer.go:285] caches populated
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.webhookcertificates.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_5fc9e68f-b2d8-4446-9aa8-c157f8ec4f29\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 0)","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.defaultingwebhook.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_190f0c2f-86ff-41e2-a16e-9ff5d2faeb92\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:513","message":"Processing from queue webhook.istio.networking.internal.knative.dev (depth: 0)","commit":"fcad131-dirty"}
I0527 12:39:17.862139       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01...
I0527 12:39:17.862375       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01...
I0527 12:39:17.862445       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01...
I0527 12:39:17.938291       1 leaderelection.go:352] lock is held by ip-10-68-193-123_023c8900-8001-4a0e-8071-6244c6313cdd and has not yet expired
I0527 12:39:17.938312       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01
I0527 12:39:17.938404       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:17.938412       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
I0527 12:39:17.938469       1 leaderelection.go:352] lock is held by ip-10-68-201-34_dddd68b1-eccc-47b9-9a6b-9d718a628bfa and has not yet expired
I0527 12:39:17.938479       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01
I0527 12:39:30.132824       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:30.132851       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
I0527 12:39:34.633445       1 leaderelection.go:352] lock is held by ip-10-68-193-123_023c8900-8001-4a0e-8071-6244c6313cdd and has not yet expired
I0527 12:39:34.633471       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01
I0527 12:39:35.971552       1 leaderelection.go:352] lock is held by ip-10-68-201-34_dddd68b1-eccc-47b9-9a6b-9d718a628bfa and has not yet expired
I0527 12:39:35.971571       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01
I0527 12:39:46.912546       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:46.912568       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
{"severity":"warn","timestamp":"2024-05-27T12:39:48.649Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:176","message":"server key missing","commit":"fcad131-dirty"}
2024/05/27 12:39:48 http: TLS handshake error from 10.68.203.196:46328: tls: no certificates configured
{"severity":"warn","timestamp":"2024-05-27T12:39:48.649Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:176","message":"server key missing","commit":"fcad131-dirty"}
2024/05/27 12:39:48 http: TLS handshake error from 10.68.203.196:46312: tls: no certificates configured

After we figured out there is something off with the lease we ran below command to check for respective lease status during pod restarts:

watch -d -n1 kubectl get leases -n net-istio-webhook.configmapwebhook.00-of-01 -o yaml

This brought us to the conclusion that lease is having an older timestamp is not getting renewed.

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2023-06-01T14:18:27Z"
  name: net-istio-webhook.webhookcertificates.00-of-01
  namespace: knative-serving
  resourceVersion: "1950716655"
  uid: 26f8b702-2896-4d69-b69c-87c469bc05cf
spec:
  acquireTime: "2024-05-23T11:44:34.479253Z"
  holderIdentity: ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87
  leaseDurationSeconds: 60
  leaseTransitions: 2244
  renewTime: "2024-05-23T11:44:44.575179Z"

Dave then pointed out that this is probably a bug with go client and pointed to this fix which has already been introduced in the recent version of knative.

Below workaround was suggested which resolved the issue for us.

  1. Delete the lease which is not getting renewed.
  2. Restart the webhook pods/deployment which is failing

Result:

  1. webhook-certs get successfully populated
  2. MutatingWebhookConfiguration recieves the caBundle for the same
  3. Webhook pods start successfully.

@dprotaso Thanks for the great work and help on this so far!! 🥇 🏆

from serving.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.