
cert-controller's People

Contributors

acpana, adammw, adrianludwin, astefanutti, bdun1013, brycecr, dependabot[bot], fedepaol, gliptak, haiyanmeng, jaydipgabani, jorturfer, lpcalisi, maxsmythe, ritazh, shomron, sozercan, step-security-bot, stijndehaes, virrages, willbeason, yiqigao217


cert-controller's Issues

Configure certificate validity duration

We are using the cert-controller library to bootstrap the webhook server in our cluster. We'd like to reduce the certificate validity duration to comply with our security policies.

I see there is already an open issue for this. I can raise a pull request that adds an option to either configure the validity or keep the default of 10 years.

Allow cert-controller to restart if a secret was changed

We made this change in HNC's version of cert-controller and it reduced the initial startup time from >100s to about 10s. See kubernetes-retired/multi-tenancy@b070055.

I'd like to make the same change here - if the --cert-restart-on-secret-refresh flag is set (name is negotiable), then cert-controller will call os.Exit() when it updates a secret. This should only happen after initial installation or every 10y.

@maxsmythe , @ritazh , any thoughts on this?

Allow for coordinated rotation of keys across multiple pods

This is blocked on #27

We should have the ability for one centralized process to manage the key rotation, so that it can be done in a gradual manner.

The alternative would be to have some sort of leader election to figure out which pod is managing key rotation.

We may want to support both models, given that different consumers may have different hosting schemes and availability requirements.

Failed to wait for cert-rotator caches to sync in non-leader elected instances

In the case the cert controller is added to a non-leader manager, i.e., with CertRotator.RequireLeaderElection set to false, it fails with the following error message:

{"level":"error","ts":"2023-12-12T18:04:44.726776367Z","caller":"controller/controller.go:203","msg":"Could not wait for Cache to sync","controller":"cert-rotator","error":"failed to wait for cert-rotator caches to sync: timed out waiting for cache to be synced for Kind *v1.Secret","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:203\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}

Recommended way to configure/run in multi-replica setting

I'm looking at using this awesome library in my admission webhook after a long search.

I'm curious if the library has any builtin mechanisms to coordinate first-time cert provisioning or renewals when the webhook itself is deployed as a ReplicaSet with >1 instances (and they race each other and end up with different certs or have write-write conflict on webhookconfiguration caBundle field)?

Or is this concern inherently not valid (maybe because Secrets eventually propagate and processes restart etc)?

CI

  • linter
  • unit test
  • e2e
  • PR gate

Ready channel is never signaled on non-leaders

With the addition of #45, the cert-controller can be set to run only in the leader - instructing the leader to be responsible for the certificate injection and management.

But how can we send the same signal to the followers?
With the current implementation, the ready channel will never be signaled.
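
One possible direction, sketched here under assumptions rather than taken from cert-controller's code, is for a follower to poll for the condition it actually cares about (for example, the cert files being present) and close its own ready channel once that condition holds. The names `signalWhenReady` and `followerBecomesReady` are hypothetical, and the `check` callback is a stand-in for whatever readiness test a follower would use.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// signalWhenReady polls check until it returns true, then closes ready.
// If ctx is cancelled first, it returns ctx.Err() and leaves ready open.
// Hypothetical helper: not part of cert-controller.
func signalWhenReady(ctx context.Context, interval time.Duration, check func() bool, ready chan struct{}) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			close(ready)
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// followerBecomesReady simulates a follower whose certs show up on the
// third poll and reports whether the ready channel ended up closed.
func followerBecomesReady() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	polls := 0
	certsPresent := func() bool { // stand-in for a real file or Secret check
		polls++
		return polls >= 3
	}

	ready := make(chan struct{})
	if err := signalWhenReady(ctx, 10*time.Millisecond, certsPresent, ready); err != nil {
		return false
	}
	select {
	case <-ready:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("follower ready:", followerBecomesReady())
}
```

Whether followers should watch the mounted files or the Secret itself is a separate design question that this sketch does not answer.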

Add config options to control validity duration for generated certs

Add config options for:

  • Setting the validity duration of generated certs
  • Lookahead time for the regenerate-when-expiry-is-near trigger

We should put off doing this until more frequent cert rotations are safe WRT availability. The work for doing so is listed in the "Allow Setting Cert Validity Duration" milestone.
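
As a rough illustration of how the two options could interact, the sketch below assumes a hypothetical `rotationPolicy` type with `Validity` and `Lookahead` fields; these are not cert-controller's actual option names. A cert is treated as due for regeneration once the current time plus the lookahead reaches its expiry.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// issued is a fixed reference time used only by this example.
var issued = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

// rotationPolicy is a hypothetical config type for this sketch.
type rotationPolicy struct {
	Validity  time.Duration // how long a generated cert is valid
	Lookahead time.Duration // how far before expiry to regenerate
}

// needsRenewal reports whether a cert expiring at notAfter should be
// regenerated at time now, i.e. whether now+Lookahead has reached notAfter.
func (p rotationPolicy) needsRenewal(notAfter, now time.Time) bool {
	return !now.Add(p.Lookahead).Before(notAfter)
}

func main() {
	p := rotationPolicy{Validity: 90 * day, Lookahead: 30 * day}
	notAfter := issued.Add(p.Validity)

	for _, age := range []time.Duration{10 * day, 70 * day} {
		fmt.Printf("cert age %v: renew=%v\n", age, p.needsRenewal(notAfter, issued.Add(age)))
	}
}
```

With a 90-day validity and a 30-day lookahead, a cert in this sketch becomes due for renewal from day 60 onward, so shorter validity periods directly increase how often rotation happens.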

configurable certificate validity

We are using gatekeeper with the automatic certificate management provided by cert-controller. We'd like to be able to configure the duration for certificate validity (and likely lookahead interval) to align with our internal policies.

If you are open to this change, I am happy to create a PR for it.

Delay when the certs are mounted and available for use

I'm using cert-controller in one of my projects to bootstrap a mutating webhook. I've configured the rotator using the example provided in the docs. Interestingly, in most CI runs and local tests I see a delay before the certs are available in the mount. In a few instances it takes up to 1m30s before the certs are ready in the mount path. The delay could be because the Kubernetes Secret update is delayed and the mount republish is missed on the first attempt.

Is this known behavior? Is that why the struct has a RestartOnSecretRefresh property?

	github.com/open-policy-agent/cert-controller v0.2.0
	k8s.io/kubernetes v1.21.2
	sigs.k8s.io/controller-runtime v0.9.2

Usage:

	// Make sure certs are generated and valid if cert rotation is enabled.
	setupFinished := make(chan struct{})
	if !disableCertRotation {
		entryLog.Info("setting up cert rotation")
		if err := rotator.AddRotator(mgr, &rotator.CertRotator{
			SecretKey: types.NamespacedName{
				Namespace: util.GetNamespace(),
				Name:      secretName,
			},
			CertDir:        webhookCertDir,
			CAName:         caName,
			CAOrganization: caOrganization,
			DNSName:        dnsName,
			IsReady:        setupFinished,
			Webhooks:       webhooks,
		}); err != nil {
			entryLog.Error(err, "unable to set up cert rotation")
			os.Exit(1)
		}
	} else {
		close(setupFinished)
	}

Downtime after a caBundle update until Secret propagation to pod

Based on my experimentation, the kubelet's latency in reflecting updates to a watched Secret (configMapAndSecretChangeDetectionStrategy=Watch) in a container's filesystem ranges from 30 to 100 seconds (i.e. it is not instant), regardless of whether the cluster is minikube, kind, GKE or kubeadm.

Does this mean that, until the container running the webhook (which automates certificate management with the cert-controller package) sees the updated Secret, the webhook is effectively down? This library updates the WebhookConfiguration's .caBundle field with the new CA cert, which takes effect instantly, so it will not match the served TLS certificate for another minute or so.

Is this a known issue, or is it something the current design already accounts for? (Maybe I'm reading it incorrectly.)

rotator.AddRotator doesn't exit when the process is terminated

If you use rotator.AddRotator to build a rotator.ReconcileWH and add it to the controller manager, it uses a context.Background() which is never cancelled. This means the Watch added to the controller manager is never terminated, causing the controller manager to wait its entire Options.GracefulShutdownTimeout (default: 30s) before exiting after SIGTERM/SIGINT.

https://github.com/open-policy-agent/cert-controller/blob/master/pkg/rotator/rotator.go#L110

Can rotator.AddRotator be made to accept a context and pass it through so the controller can exit quickly & gracefully?

Or am I using it wrong?
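
To illustrate the request, here is a minimal sketch of a long-running loop that takes a caller-supplied context and returns as soon as that context is cancelled. It uses only the standard library; `watchLoop` and `stopsOnCancel` are hypothetical names and do not reflect how rotator.AddRotator or the controller manager are actually implemented.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watchLoop is a hypothetical stand-in for a long-running watch.
// It handles events until ctx is cancelled or the events channel is closed.
func watchLoop(ctx context.Context, events <-chan string, handle func(string)) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev, ok := <-events:
			if !ok {
				return nil
			}
			handle(ev)
		}
	}
}

// stopsOnCancel starts watchLoop with a cancellable context, cancels it,
// and reports whether the loop returned context.Canceled within two seconds.
func stopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	go func() {
		done <- watchLoop(ctx, make(chan string), func(string) {})
	}()

	cancel() // simulates the manager's context ending on SIGTERM/SIGINT

	select {
	case err := <-done:
		return err == context.Canceled
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("loop stopped on cancel:", stopsOnCancel())
}
```

Had the loop been started with context.Background() instead, nothing would ever signal it to stop, which matches the shutdown delay described above.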

The webhook does not start because of the certFile check when g8r is deployed out of cluster

I have a special scenario in which g8r is deployed outside the cluster, and I configure the --kubeconfig option in controller-runtime so that g8r watches user behavior in the cluster of my choice.

In this case, cert-controller generates the CA and updates the Secret, which lives in the remote cluster. However, local files in certDir, such as tls.crt, are not updated. So, because of the certFile check below, the webhook does not start.

	// ensureCertsMounted ensure the cert files exist.
	func (cr *CertRotator) ensureCertsMounted() {
		checkFn := func() (bool, error) {
			certFile := cr.CertDir + "/" + certName
			_, err := os.Stat(certFile)
			if err == nil {
				return true, nil
			}
			return false, nil
		}
		if err := wait.ExponentialBackoff(wait.Backoff{
			Duration: 1 * time.Second,
			Factor:   2,
			Jitter:   1,
			Steps:    10,
		}, checkFn); err != nil {
			crLog.Error(err, "max retries for checking certs existence")
			close(cr.certsNotMounted)
			return
		}
		crLog.Info(fmt.Sprintf("certs are ready in %s", cr.CertDir))
		close(cr.certsMounted)
	}

I wonder whether checking tls.crt in the Secret would be better. The caBundle injected into the webhook is actually based on the Secret, not on the certFile. Alternatively, we could add some sync logic for when the CA file in the Secret differs from the local file. I think the latter is better.
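
The sync idea could look roughly like the sketch below, which compares bytes taken from the Secret with a local file and rewrites the file only when they differ. `syncCertFile` is a hypothetical helper, not something cert-controller provides, and fetching the Secret data from the remote cluster is assumed to happen elsewhere.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// syncCertFile makes the file at path match fromSecret.
// It returns true if the file was created or rewritten, and false if the
// local copy already matched. Hypothetical helper for illustration only.
func syncCertFile(path string, fromSecret []byte) (bool, error) {
	local, err := os.ReadFile(path)
	if err == nil && bytes.Equal(local, fromSecret) {
		return false, nil // already in sync
	}
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return false, err // unexpected read failure
	}
	if err := os.WriteFile(path, fromSecret, 0o600); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "tls.crt")
	// The byte strings stand in for the tls.crt value read from the Secret.
	for _, data := range []string{"cert-v1", "cert-v1", "cert-v2"} {
		changed, err := syncCertFile(path, []byte(data))
		fmt.Printf("data=%s changed=%v err=%v\n", data, changed, err)
	}
}
```

A real implementation would also need to sync the key and CA files and decide how to handle a partially written set, which this sketch leaves out.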

CA and Server certificate potentially get updated before ValidatingWebhookConfiguration

Is it not a problem that the ValidatingWebhookConfiguration and the Secret are updated independently of each other? I think this can lead to a state where the CA has already been renewed but the ValidatingWebhookConfiguration still has the old CA, so calls to the webhook would fail.

I have not actually run into problems with this. I only looked at the code and thought that it might become a problem. Or am I missing something?

Question on usefulness of RestartOnSecretRefresh

Follow up to #44, it appears that 4842e47 added the RestartOnSecretRefresh, which restarts the process (os.Exit(0)) every time refreshCerts() is called, to update the Secret.

That said, with default kubelet configurations, Kubernetes typically takes up to about 1 minute to deliver the updated Secret to the kubelet (easily reproducible on minikube, kind, or a GKE cluster).

Since the delivery of updated Secret to the Pod is not instant (or even a duration that can be considered quick), what makes the os.Exit(0) useful if the kubelet will still serve the old Secret upon the restart?

cc: @stijndehaes

Create a new release that supports K8s 1.22+

/cc @maxsmythe
/cc @ritazh

Hey Max + Rita, I see that we've recently gotten rid of the pre-v1 APIs we were using to access CRDs (I'd forgotten that we weren't on v1 yet). Does it make sense to cut a v0.3.0 release so there's a stable tag for this repo that supports K8s 1.22+?

Thanks!

Support multiple dnsNames

In KEDA we are integrating this solution as a "default, not safe" option for cert management. Our problem is that we have three components (the admission webhooks, the operator, and the metrics server), and all of them should share the certs because we want to use them for some internal communication.
With cert-manager or another solution, we can create a single certificate with multiple DNS names and share the same Secret, which secures all internal communication as well as the webhooks and API services. For new adopters, however, requiring a third-party tool could be a problem, which is why we want to introduce this project for certificate management in non-production environments. Not being able to set multiple DNS names blocks us.

I'll open a PR with the changes in case you think this is useful
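
For context, the standard library already supports several DNS names on one certificate through the `DNSNames` field of `x509.Certificate`. The sketch below generates a self-signed certificate that way; `selfSignedCert` and the service names are hypothetical and only illustrate the idea, not how cert-controller builds its certs.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert returns a self-signed certificate valid for all dnsNames
// for the given number of days. Hypothetical helper for illustration only.
func selfSignedCert(dnsNames []string, validDays int) (*x509.Certificate, error) {
	if len(dnsNames) == 0 {
		return nil, errors.New("at least one DNS name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: dnsNames[0]},
		DNSNames:              dnsNames, // one SAN entry per name
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(time.Duration(validDays) * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
	}
	// Template and parent are the same, so the cert is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Hypothetical service names for three components sharing one cert.
	names := []string{
		"operator.example-ns.svc",
		"admission-webhooks.example-ns.svc",
		"metrics-apiserver.example-ns.svc",
	}
	cert, err := selfSignedCert(names, 365)
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Printf("%s valid: %v\n", n, cert.VerifyHostname(n) == nil)
	}
}
```

Each component can then serve the same certificate under its own service name, since hostname verification succeeds for any name listed in the SAN extension.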
