kube-image-keeper's Introduction

kube-image-keeper (kuik)


kube-image-keeper (a.k.a. kuik, which is pronounced /kwɪk/, like "quick") is a container image caching system for Kubernetes. It saves the container images used by your pods in its own local registry so that these images remain available if the original becomes unavailable.

Upgrading

From 1.6.0 to 1.7.0

ACTION REQUIRED

To follow Helm 3 best practices, we moved the cachedimage and repository custom resource definitions from the Helm templates directory to the dedicated crds directory. This will cause the cachedimage CRD to be deleted during the 1.7.0 upgrade.

We advise you to uninstall your Helm release, clean up the remaining custom resources by removing their finalizers, then reinstall kuik in 1.7.0.

You may also recreate the custom resource definitions right after the upgrade to 1.7.0 using:

kubectl apply -f https://raw.githubusercontent.com/enix/kube-image-keeper/main/helm/kube-image-keeper/crds/cachedimage-crd.yaml
kubectl apply -f https://raw.githubusercontent.com/enix/kube-image-keeper/main/helm/kube-image-keeper/crds/repository-crd.yaml

Why and when is it useful?

At Enix, we manage production Kubernetes clusters both for our internal use and for various customers; sometimes on premises, sometimes in various clouds, public or private. We regularly run into image availability issues, for instance:

  • the registry is unavailable or slow;
  • a critical image was deleted from the registry (by accident or because of a misconfigured retention policy);
  • the registry has pull quotas (or other rate-limiting mechanisms) and temporarily won't let us pull more images.

(The last point is a well-known challenge when pulling lots of images from the Docker Hub, and becomes particularly painful when private Kubernetes nodes access the registry through a single NAT gateway!)

We needed a solution that would:

  • work across a wide range of Kubernetes versions, container engines, and image registries,
  • preserve Kubernetes' out-of-the-box image caching behavior and image pull policies,
  • have fairly minimal requirements,
  • and be easy and quick to install.

We investigated other options, and we didn't find any that would quite fit our requirements, so we wrote kuik instead.

Prerequisites

  • A Kubernetes cluster¹ (duh!)
  • Admin permissions²
  • cert-manager³
  • Helm⁴ >= 3.2.0
  • CNI plugin with port-mapper⁵ enabled
  • In a production environment, we definitely recommend that you use persistent⁶ storage

¹A local development cluster like minikube or KinD is fine.
²In addition to its own pods, kuik needs to register a MutatingWebhookConfiguration.
³kuik uses cert-manager to issue and configure its webhook certificate. You don't need to configure cert-manager in a particular way (you don't even need to create an Issuer or ClusterIssuer). It's alright to just kubectl apply the YAML as shown in the cert-manager installation instructions.
⁴If you prefer to install with "plain" YAML manifests, we'll tell you how to generate these manifests.
⁵Most CNI plugins these days enable port-mapper out of the box, so this shouldn't be an issue, but we're mentioning it just in case.
⁶You can use kuik without persistence, but if the pod running the registry gets deleted, you will lose your cached images. They will be automatically pulled again when needed, though.

Supported Kubernetes versions

kuik has been developed for, and tested with, Kubernetes 1.24 to 1.28; but the code doesn't use any deprecated (or new) feature or API, and should work with newer versions as well.

How it works

When a pod is created, kuik's mutating webhook rewrites its images on the fly to point to the local caching registry, adding a localhost:{port}/ prefix (the port is 7439 by default, and is configurable). This means that you don't need to modify the image references in the manifests or Helm charts used to deploy your applications; kuik takes care of it.
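For illustration, here is a sketch of the rewrite for a single container (the container name and image tag are arbitrary examples):

# original container spec
containers:
  - name: web
    image: nginx:1.25

# after kuik's mutating webhook (with the default port)
containers:
  - name: web
    image: localhost:7439/nginx:1.25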

On localhost:{port}, there is an image proxy that serves images from kuik's caching registry (when the images have been cached) or directly from the original registry (when the images haven't been cached yet).

One controller watches pods, and when it notices new images, it creates CachedImage custom resources for these images.

Another controller watches these CachedImage custom resources, and copies images from source registries to kuik's caching registry accordingly. When images come from a private registry, the controller uses the imagePullSecrets from the CachedImage spec, which are copied from the pod that produced the CachedImage.

Here is what our images look like when using kuik:

$ kubectl get pods -o custom-columns=NAME:metadata.name,IMAGES:spec.containers[*].image
NAME                   IMAGES
debugger               localhost:7439/registrish.s3.amazonaws.com/alpine
factori-0              localhost:7439/factoriotools/factorio:1.1
nvidiactk-b5f7m        localhost:7439/nvcr.io/nvidia/k8s/container-toolkit:v1.12.0-ubuntu20.04
sshd-8b8c6cfb6-l2tc9   localhost:7439/ghcr.io/jpetazzo/shpod
web-8667899c97-2v88h   localhost:7439/nginx
web-8667899c97-89j2h   localhost:7439/nginx
web-8667899c97-fl54b   localhost:7439/nginx

The kuik controllers keep track of how many pods use a given image. When an image isn't used anymore, it is flagged for deletion, and removed one month later. This expiration delay can be configured. You can see kuik's view of your images by looking at the CachedImages custom resource:

$ kubectl get cachedimages
NAME                                                       CACHED   EXPIRES AT             PODS COUNT   AGE
docker.io-dockercoins-hasher-v0.1                          true     2023-03-07T10:50:14Z                36m
docker.io-factoriotools-factorio-1.1                       true                            1            4m1s
docker.io-jpetazzo-shpod-latest                            true     2023-03-07T10:53:57Z                9m18s
docker.io-library-nginx-latest                             true                            3            36m
ghcr.io-jpetazzo-shpod-latest                              true                            1            36m
nvcr.io-nvidia-k8s-container-toolkit-v1.12.0-ubuntu20.04   true                            1            29m
registrish.s3.amazonaws.com-alpine-latest                                                  1            35m

Architecture and components

In kuik's namespace, you will find:

  • a Deployment to run kuik's controllers,
  • a DaemonSet to run kuik's image proxy,
  • a StatefulSet to run kuik's image cache (a Deployment is used instead when this component runs in HA mode).

The image cache will obviously require a bit of disk space to run (see Garbage collection and limitations below). Otherwise, kuik's components are fairly lightweight in terms of compute resources. This shows CPU and RAM usage with the default setup, featuring two controllers in HA mode:

$ kubectl top pods
NAME                                             CPU(cores)   MEMORY(bytes)
kube-image-keeper-0                              1m           86Mi
kube-image-keeper-controllers-5b5cc9fcc6-bv6cp   1m           16Mi
kube-image-keeper-controllers-5b5cc9fcc6-tjl7t   3m           24Mi
kube-image-keeper-proxy-54lzk                    1m           19Mi

(Architecture diagram)

Metrics

Refer to the dedicated documentation.

Installation

  1. Make sure that you have cert-manager installed. If not, check its installation page (it's fine to use the kubectl apply one-liner, and no further configuration is required).
  2. Install kuik's Helm chart from our charts repository:
helm upgrade --install \
     --create-namespace --namespace kuik-system \
     kube-image-keeper kube-image-keeper \
     --repo https://charts.enix.io/

That's it!

Our container images are available across multiple registries for reliability. You can find them on GitHub Container Registry, Quay, and Docker Hub.

CAUTION: If you use a storage backend that runs in the same cluster as kuik but in a different namespace, be sure to exclude the storage backend's pods from kuik (see Pod filtering below). Failure to do so may create an interdependency, making it impossible to start both kuik and its storage backend if either encounters an issue.

Installation with plain YAML files

You can use Helm to generate plain YAML files and then deploy these YAML files with kubectl apply or whatever you want:

helm template --namespace kuik-system \
     kube-image-keeper kube-image-keeper \
     --repo https://charts.enix.io/ \
     > /tmp/kuik.yaml
kubectl create namespace kuik-system
kubectl apply -f /tmp/kuik.yaml --namespace kuik-system

Configuration and customization

If you want to change, e.g., the expiration delay or the port number used by the proxy, or enable persistence (with a PVC) for the registry cache, you can do that with standard Helm values.

You can see the full list of parameters (along with their meaning and default values) in the chart's values.yaml file, or on kuik's page on the Artifact Hub.

For instance, to extend the expiration delay to 3 months (90 days), you can deploy kuik like this:

helm upgrade --install \
     --create-namespace --namespace kuik-system \
     kube-image-keeper kube-image-keeper \
     --repo https://charts.enix.io/ \
     --set cachedImagesExpiryDelay=90

Advanced usage

Pod filtering

There are 3 ways to tell kuik which pods it should manage (or, conversely, which ones it should ignore).

  • If a pod has the label kube-image-keeper.enix.io/image-caching-policy=ignore, kuik will ignore the pod (it will not rewrite its image references).
  • If a pod is in an ignored Namespace, it will also be ignored. Namespaces can be ignored by setting the Helm value controllers.webhook.ignoredNamespaces (kube-system and the kuik namespace are ignored regardless of the value of this parameter). (Note: this feature relies on the NamespaceDefaultLabelName feature gate to work.)
  • Finally, kuik will only work on pods matching a specific selector. By default, the selector is empty, which means "match all pods". The selector can be set with the Helm value controllers.webhook.objectSelector.matchExpressions (see the sketch after this list).
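For illustration, here is a sketch of Helm values combining namespace filtering and a pod selector (the namespace name and the tier label are made-up examples):

controllers:
  webhook:
    ignoredNamespaces:
      - build-pipelines
    objectSelector:
      matchExpressions:
        - key: tier
          operator: In
          values:
            - backend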

This logic isn't implemented by the kuik controllers or webhook directly, but through Kubernetes' standard webhook object selectors. In other words, these parameters end up in the MutatingWebhookConfiguration template to filter which pods get presented to kuik's webhook. When the webhook rewrites the images for a pod, it adds a label to that pod, and the kuik controllers then rely on that label to know which CachedImages resources to create.

Keep in mind that kuik will ignore pods scheduled in its own namespace or in the kube-system namespace, as recommended in the Kubernetes documentation (Avoiding operating on the kube-system namespace):

It is recommended to exclude the namespace where your webhook is running with a namespaceSelector. [...] Accidentally mutating or rejecting requests in the kube-system namespace may cause the control plane components to stop functioning or introduce unknown behavior.

Image pull policy

In the case of a container configured with imagePullPolicy: Never, the container will always be filtered out, as it makes no sense to cache an image that will never be pulled and is always read from disk.

In the case of a container configured with imagePullPolicy: Always, or with the latest tag, or with no tag (which defaults to latest), the container is filtered out by default in order to preserve the default Kubernetes behavior, which is to always pull the newest version of the image (and therefore not use kuik's cache). This can be disabled by setting the value controllers.webhook.ignorePullPolicyAlways to false.
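For instance, to let kuik also cache images for containers using imagePullPolicy: Always or the latest tag, set the following Helm value:

controllers:
  webhook:
    ignorePullPolicyAlways: false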

Cache persistence

Persistence is disabled by default. You can enable it by setting the Helm value registry.persistence.enabled=true. This will create a PersistentVolumeClaim with a default size of 20 GiB. You can change that size by setting the value registry.persistence.size. Keep in mind that enabling persistence isn't enough to provide high availability of the registry! If you want kuik to be highly available, please refer to the high availability guide.
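For example, to enable persistence with a 50 GiB volume (the size is arbitrary), you can deploy kuik like this:

helm upgrade --install \
     --create-namespace --namespace kuik-system \
     kube-image-keeper kube-image-keeper \
     --repo https://charts.enix.io/ \
     --set registry.persistence.enabled=true \
     --set registry.persistence.size=50Gi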

Note that persistence requires your cluster to have some PersistentVolumes. If you don't have PersistentVolumes, kuik's registry Pod will remain Pending and your images won't be cached (but they will still be served transparently by kuik's image proxy).

Retain policy

Sometimes, you want images to stay cached even when they are not used anymore (for instance, when you run a workload for a fixed amount of time, stop it, and run it again later). You can prevent CachedImages from expiring by manually setting the spec.retain flag to true, as shown below:

apiVersion: kuik.enix.io/v1alpha1
kind: CachedImage
metadata:
  name: docker.io-library-nginx-1.25
spec:
  retain: true # here
  sourceImage: nginx:1.25
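If you prefer a one-liner, the same flag can be set with kubectl patch (using the CachedImage name from the example above):

kubectl patch cachedimage docker.io-library-nginx-1.25 --type merge -p '{"spec":{"retain":true}}'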

Multi-arch cluster / Non-amd64 architectures

By default, kuik only caches the amd64 variant of an image. To cache more/other architectures, you need to set the architectures field in your Helm values.

Example:

architectures: [amd64, arm]

Kuik only caches the architectures that are actually available for a given image; it will not fail if one of the requested architectures doesn't exist.

No manual action is required when migrating an amd64-only cluster from v1.3.0 to v1.4.0.

Corporate proxy

To configure kuik to work behind a corporate proxy, you can set the well-known http_proxy and https_proxy environment variables (both uppercase and lowercase variants work) through the Helm values proxy.env and controllers.env, as shown below:

controllers:
  env:
    - name: http_proxy
      value: https://proxy.mycompany.org:3128
    - name: https_proxy
      value: https://proxy.mycompany.org:3128
proxy:
  env:
    - name: http_proxy
      value: https://proxy.mycompany.org:3128
    - name: https_proxy
      value: https://proxy.mycompany.org:3128

Be aware that both the proxy and the controllers need to access the Kubernetes API, so you may also need to define the no_proxy variable to exclude the Kubernetes API, in case it is not reachable through your corporate proxy (which is usually the case).
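As a sketch, a no_proxy entry could look like this (the API server address 10.96.0.1 is only an example and depends on your cluster; the same entry goes under proxy.env as well):

controllers:
  env:
    - name: no_proxy
      value: 10.96.0.1,.svc,.cluster.local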

Insecure registries & self-signed certificates

In some cases, you may want to use images from self-hosted registries that are insecure (without TLS, or with an invalid certificate, for instance) or that use a self-signed certificate. By default, kuik will not cache images from those registries for security reasons, even if you configured your container runtime (e.g. Docker, containerd) to allow them. However, you can choose to trust a list of insecure registries to pull from with the Helm value insecureRegistries. If you use a self-signed certificate, you can store the root certificate authority in a Secret and reference it with the Helm value rootCertificateAuthorities. Here is an example using those two values:

insecureRegistries:
  - http://some-registry.com
  - https://some-other-registry.com

rootCertificateAuthorities:
  secretName: some-secret
  keys:
    - root.pem

You can of course use as many insecure registries or root certificate authorities as you want. In the case of a self-signed certificate, you can either use the insecureRegistries or the rootCertificateAuthorities value, but trusting the root certificate will always be more secure than allowing insecure registries.
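The Secret referenced by rootCertificateAuthorities can be created from a PEM file, for instance (assuming kuik is installed in the kuik-system namespace as shown earlier, and reusing the names from the example above):

kubectl create secret generic some-secret \
     --namespace kuik-system \
     --from-file=root.pem=./root.pem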

Registry UI

For debugging purposes, it may be useful to access the registry through a UI. This can be achieved by enabling the registry UI with the value registryUI.enabled=true. The UI is not exposed publicly through an Ingress; you will need to open a port-forward to port 80. You can set a custom username and password with the values registryUI.auth.username (default is admin) and registryUI.auth.password (empty by default).
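For instance, assuming kuik runs in the kuik-system namespace and the UI Service is named kube-image-keeper-registry-ui (the actual Service name may differ depending on your release name), you can reach the UI with a port-forward:

kubectl port-forward --namespace kuik-system svc/kube-image-keeper-registry-ui 8080:80

Then browse http://localhost:8080.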

Garbage collection and limitations

When a CachedImage expires because it is not used anymore by the cluster, the image is deleted from the registry. However, since kuik uses Docker's registry, this only deletes reference files like tags. It doesn't delete blobs, which account for most of the used disk space. Garbage collection allows removing those blobs and freeing up space. The garbage collection job can be scheduled with the registry.garbageCollectionSchedule value, which uses a cron-like format. It is disabled by default, because running garbage collection without persistence would just wipe out the cache registry.
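For example, to run garbage collection every Sunday at 2am (an arbitrary schedule), you can set the following Helm value:

registry:
  garbageCollectionSchedule: "0 2 * * 0"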

Garbage collection can only run while the registry is read-only (or stopped), otherwise image corruption may happen. (This is described in the registry documentation.) Before running garbage collection, kuik stops the registry. During that time, all image pulls are automatically proxied to the source registry, so garbage collection is mostly transparent for cluster nodes.

Reminder: since garbage collection recreates the cache registry pod, if you run garbage collection without persistence, this will wipe out the cache registry. It is not recommended for production setups!

Currently, if the cache gets deleted, the status.isCached field of CachedImages isn't updated automatically, which means that kubectl get cachedimages will incorrectly report that images are cached. However, you can trigger a controller reconciliation with the following command, which will pull all images again:

kubectl annotate cachedimages --all --overwrite "timestamp=$(date +%s)"

Known issues

Conflicts with other mutating webhooks

Kuik's core functionality is to intercept pod creation events and modify container image references to enable image caching. However, some Kubernetes operators create pods autonomously and don't expect their image references to be modified (for example cloudnative-pg). The unexpected rewriting of the pod .spec.containers[].image field can then lead to an infinite reconciliation loop, because the operator's expected target container image is endlessly rewritten by kuik's MutatingWebhookConfiguration. In that case, you may want to disable kuik for specific pods using the following Helm values:

controllers:
  webhook:
    objectSelector:
      matchExpressions:
        - key: cnpg.io/podRole
          operator: NotIn
          values:
            - instance

Private images are a bit less private

Imagine the following scenario:

  • pods A and B use a private image, example.com/myimage:latest
  • pod A correctly references imagePullSecrets, but pod B does not

On a normal Kubernetes cluster (without kuik), if pods A and B are on the same node, then pod B will run correctly, even though it doesn't reference imagePullSecrets, because the image gets pulled when starting pod A, and once it's available on the node, any other pod can use it. However, if pods A and B are on different nodes, pod B won't start, because it won't be able to pull the private image. Some folks may use that to segregate sensitive images to specific nodes, using a combination of taints, tolerations, or node selectors.

However, when using kuik, once an image has been pulled and stored in kuik's registry, it becomes available to any node in the cluster. This means that using taints, tolerations, etc. to limit sensitive images to specific nodes won't work anymore.

Cluster autoscaling delays

With kuik, all image pulls (except in the namespaces excluded from kuik) go through kuik's registry proxy, which runs on each node thanks to a DaemonSet. When a node gets added to a Kubernetes cluster (for instance, by the cluster autoscaler), a kuik registry proxy Pod gets scheduled on that node, but it takes a brief moment to start. During that time, image pulls on that node will fail. Thanks to Kubernetes' automatic retry mechanisms, they will eventually succeed, but on new nodes you may see Pods in ErrImagePull or ImagePullBackOff status for a minute before everything works correctly. If you are using cluster autoscaling and trying to achieve very fast scale-up times, this is something that you may want to keep in mind.

Garbage collection issue

We use Docker Distribution in Kuik, along with the integrated garbage collection tool. There is a bug that occurs when untagged images are pushed into the registry, causing it to crash. It's possible to end up in a situation where the registry is in read-only mode and becomes unusable. Until a permanent solution is found, we advise keeping the value registry.garbageCollection.deleteUntagged set to false.
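In Helm values, that corresponds to the following:

registry:
  garbageCollection:
    deleteUntagged: false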

Images with digest

As of today, there is no way to manage container images referenced by digest. The rationale behind this limitation is that a digest is a hash of the image manifest, and the manifest contains the registry URL associated with the image. Thus, pushing the image to another registry (our cache registry) changes its digest, and as a consequence it is no longer referenced by its original digest. Digest validation prevents pushing a manifest with an invalid digest. Therefore, we currently ignore all images referenced by digest; those images will neither be rewritten nor cached, to prevent kuik from malfunctioning.

kube-image-keeper's People

Contributors

denniskern, dependabot[bot], donch, jperville, jpetazzo, nicolasgouze, paullaffitte, rdegez, rucciva, semantic-release-bot, tassatux, zig


kube-image-keeper's Issues

ImagePullBackOff error on Graviton Node in EKS

I have a mixed-node cluster (Linux Intel / Linux Graviton / Windows).
The container doesn't support Windows, so I've excluded the namespaces for that and set the node selector to Linux only.
The Intel nodes run fine on Bottlerocket and Amazon Linux (the generic AWS images).
On the ARM node, the kube-image-keeper pod starts and all seems fine (logs attached), but none of the images load.


OS: Bottlerocket OS 1.16.1 (aws-k8s-1.26)
EKS 1.26
App: quay.io/enix/kube-image-keeper:1.4.0


Name:         grafana-k8s-monitoring-prometheus-node-exporter-b6kcq
Namespace:    monitoring
Priority:     0
Node:         ip-10-22-1-49.eu-west-2.compute.internal/10.22.1.49
Start Time:   Thu, 14 Dec 2023 12:11:27 +0000
Labels:       app.kubernetes.io/component=metrics
              app.kubernetes.io/instance=grafana-k8s-monitoring
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=prometheus-node-exporter
              app.kubernetes.io/part-of=prometheus-node-exporter
              app.kubernetes.io/version=1.7.0
              controller-revision-hash=5b6c77bd
              helm.sh/chart=prometheus-node-exporter-4.24.0
              kuik.enix.io/images-rewritten=true
              pod-template-generation=3
Annotations:  cattle.io/timestamp: 2023-12-14T12:11:27Z
              cluster-autoscaler.kubernetes.io/safe-to-evict: true
              original-image-node-exporter: quay.io/prometheus/node-exporter:v1.7.0
Status:       Pending
IP:           10.22.1.49
IPs:
  IP:           10.22.1.49
Controlled By:  DaemonSet/grafana-k8s-monitoring-prometheus-node-exporter
Containers:
  node-exporter:
    Container ID:  
    Image:         localhost:7439/quay.io/prometheus/node-exporter:v1.7.0
    Image ID:      
    Port:          9100/TCP
    Host Port:     9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --path.rootfs=/host/root
      --path.udev.data=/host/root/run/udev/data
      --web.listen-address=[$(HOST_IP)]:9100
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOST_IP:  0.0.0.0
    Mounts:
      /host/proc from proc (ro)
      /host/root from root (ro)
      /host/sys from sys (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  
  root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  
QoS Class:         BestEffort
Node-Selectors:    kubernetes.io/os=linux
Tolerations:       :NoSchedule op=Exists
                   node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                   node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                   node.kubernetes.io/not-ready:NoExecute op=Exists
                   node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                   node.kubernetes.io/unreachable:NoExecute op=Exists
                   node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason   Age                     From     Message
  ----    ------   ----                    ----     -------
  Normal  BackOff  4m25s (x510 over 119m)  kubelet  Back-off pulling image "localhost:7439/quay.io/prometheus/node-exporter:v1.7.0"

Error on reconcile when source image was deleted

I believe the following are the steps to reproduce:

  1. Run a pod with image X, to completion
  2. Delete the source image
  3. Restart kuik's pods (persistence is turned on, pvc attached)
  4. Run kubectl annotate cachedimages --all --overwrite "timestamp=$(date +%s)"
  5. Controller repeatedly fails with the following:
2023-04-04T05:34:25.004Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": ""}
2023-04-04T05:34:25.004Z	INFO	controller-runtime.manager.controller.cachedimage	caching image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:27.790Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-6qklp-jf98p", "namespace": "env0-agent-pr11529"}
2023-04-04T05:34:27.801Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-6qklp-jf98p", "namespace": "env0-agent-pr11529", "cachedImage": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:27.801Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-6qklp-jf98p", "namespace": "env0-agent-pr11529"}
2023-04-04T05:34:30.490Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-jts2r-7fcsg", "namespace": "env0-agent-pr11529"}
2023-04-04T05:34:30.504Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-jts2r-7fcsg", "namespace": "env0-agent-pr11529", "cachedImage": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:30.504Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "env0-jts2r-7fcsg", "namespace": "env0-agent-pr11529"}
2023-04-04T05:34:37.714Z	INFO	controller-runtime.manager.controller.cachedimage	image cached	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.721Z	INFO	controller-runtime.manager.controller.cachedimage	reconciled cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.721Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": ""}
2023-04-04T05:34:37.721Z	INFO	controller-runtime.manager.controller.cachedimage	caching image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.750Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.754Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": ""}
2023-04-04T05:34:37.754Z	INFO	controller-runtime.manager.controller.cachedimage	caching image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.765Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.771Z	INFO	controller-runtime.manager.controller.cachedimage	reconciled cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.771Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": ""}
2023-04-04T05:34:37.771Z	INFO	controller-runtime.manager.controller.cachedimage	caching image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.784Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:37.789Z	INFO	controller-runtime.manager.controller.cachedimage	reconciled cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-532aabb", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-532aabb"}
2023-04-04T05:34:39.162Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-775f533", "namespace": ""}
2023-04-04T05:34:39.163Z	INFO	controller-runtime.manager.controller.cachedimage	caching image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-775f533", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-775f533"}
2023-04-04T05:34:39.416Z	ERROR	controller-runtime.manager.controller.cachedimage	failed to cache image	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "ghcr.io-env0-deployment-service-pr11529-sha-775f533", "namespace": "", "sourceImage": "ghcr.io/env0/deployment-service:pr11529-sha-775f533", "error": "could not find source image"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99

This also causes a noticeable memory increase

S3 support for storage

Hi!

Thanks for this great project :)
Do you think it could be possible to use S3 as a storage backend instead of a PVC ?

ImagePullSecrets from ServiceAccounts are not considered for CachedImage

Hi,

first of all, thank you for this great tool.

Currently kuik uses the imagePullSecrets: list from the pod spec in order to fill the imagePullSecrets in the CachedImage.
Unfortunately it is also possible to specify imagePullSecrets attached to the serviceAccount which gets attached to the pod.

example pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    ...
  labels:
    ...
  name: xyz
  namespace: default
spec:
  containers:
  - name: first
    ...
  serviceAccount: pod-sa
  serviceAccountName: pod-sa
  ...

example pod-sa:

apiVersion: v1
imagePullSecrets:
- name: my-pull-secret
kind: ServiceAccount
metadata:
  name: pod-sa
  namespace: default
  ...

Unfortunately different external helm charts we have to use, use this feature to define the imagePullSecrets.

It would be great if kuik would also read the imagePullSecrets from defined ServiceAccounts.

Infinite loop of caching failed on some images and all following new images

Hi, I'm currently running kuik 1.20 with an S3 backend and noticed several images failed to be cached with the following event:

Events:
  Type     Reason       Age                     From                    Message
  ----     ------       ----                    ----                    -------
  Normal   Caching      6m11s (x300 over 3d9h)  cachedimage-controller  Start caching image docker.io/bitnami/redis:6.2.10-debian-11-r13
  Warning  CacheFailed  6m9s (x300 over 3d9h)   cachedimage-controller  Failed to cache image docker.io/bitnami/redis:6.2.10-debian-11-r13, reason: POST http://kube-image-keeper-registry:5000/v2/docker.io/bitnami/redis/blobs/uploads/: unexpected status code 405 Method Not Allowed: Method not allowed

Restarting the controller or registry pod doesn't seem to fix it. I've also tried deleting the cached image but same result.

The only workaround that seems to solve it was to uninstall kuik, delete the S3 bucket, and reinstall it. But as soon as an image fails to be cached, all new images can no longer be cached.

Proxy pod takes 30+ seconds to start

The proxy takes quite a long time to start (~34 seconds), presumably because of the large number of CRDs installed in the cluster. During this time, no pod can start on the node; they fail with the error:

Failed to pull image "localhost:7439/gitlab/gitlab-runner-helper:arm64-latest": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/gitlab/gitlab-runner-helper:arm64-latest": failed to resolve reference "localhost:7439/gitlab/gitlab-runner-helper:arm64-latest": failed to do request: Head "http://localhost:7439/v2/gitlab/gitlab-runner-helper/manifests/arm64-latest": dial tcp [::1]:7439: connect: connection refused

Proxy logs:

$ kubectl -n kube-image-keeper logs kube-image-keeper-proxy-wd8lb
I0609 14:50:40.754613       1 main.go:41] using in-cluster configuration
I0609 14:50:40.754796       1 main.go:48] starting 
I0609 14:50:40.956686       1 request.go:597] Waited for 185.529955ms due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/sfn.aws.crossplane.io/v1alpha1?timeout=32s
<cut>
I0609 14:51:15.156560       1 request.go:597] Waited for 34.383470032s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/backup.aws.upbound.io/v1beta1?timeout=32s
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] GET    /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] POST   /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] PUT    /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] PATCH  /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] HEAD   /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] OPTIONS /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] DELETE /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] CONNECT /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] TRACE  /v2/*catch-all            --> github.com/enix/kube-image-keeper/internal/proxy.(*Proxy).Serve.func2 (5 handlers)
[GIN-debug] [WARNING] You trusted all proxies, this is NOT safe. We recommend you to set a value.
Please check https://pkg.go.dev/github.com/gin-gonic/gin#readme-don-t-trust-all-proxies for details.
[GIN-debug] Listening and serving HTTP on :8082

We have more than 1k CRDs:

$ kubectl get crd | wc -l
1138

Is there any way to speed up proxy start somehow?

No support of kubernetes.io/dockercfg secret type

When using kube-image-keeper with GitLab Kubernetes Runner, the latter fails to pull any image, with the following error in the corresponding kube-image-keeper proxy pod:

I0523 14:22:43.455311       1 server.go:113] "proxying request" repository="gitlab/gitlab-runner-helper" originRegistry="public.ecr.aws"
I0523 14:22:43.460211       1 server.go:116] "cached image is not available, proxying origin" originRegistry="public.ecr.aws" error="404 Not Found"
I0523 14:22:43.460239       1 server.go:200] "listing CachedImages" repositoryLabel="public.ecr.aws-gitlab-gitlab-runner-helper"
[GIN] 2023/05/23 - 14:22:43 | 401 |   30.418805ms |    10.23.24.186 | HEAD     "/v2/public.ecr.aws/gitlab/gitlab-runner-helper/manifests/x86_64-latest"
Error #01: invalid secret: missing .dockerconfigjson key

The root cause might be that the GitLab runner uses the kubernetes.io/dockercfg secret type of imagePullSecret for its runner pods, which has a .dockercfg key instead of .dockerconfigjson.

Would it be possible to add support of kubernetes.io/dockercfg secret type?

What is kuik pod finalizer function?

Hi,

Finalizers on pods sound a bit scary from an operational standpoint, as they could prevent a (very) large number of pods from being terminated in the event of the underlying controller being unavailable or removed.
While considering kube-image-keeper for our production environments, I was hoping you could shed some light on the function of pod.kuik.enix.io/finalizer in kuik architecture.

Thanks again for sharing and maintaining this very promising project!

Garbage collector error

Hello,

We are facing a problem with the garbage collector which prevents the registry from starting:

Log of the GC container:
failed to garbage collect: failed to mark: filesystem: filesystem: failed to retrieve tags unknown repository name=xxxxx

The failing GC container prevents the registry container from starting.

Image caching fails when multiple imagePullSecrets for the same registry

Hello,

I found the following issue when evaluating kube-image-keeper 1.4.0 (and 1.5.0-beta.1):

  • if a cachedimage has multiple imagePullSecrets for the same registry, only the first imagePullSecret is tried and the image pull may fail, even if another imagePullSecret contains working credentials.

Let me give some context.
At our organization we host our docker images on the gitlab.com managed registry.

The structure of our organization docker registry looks like this:

$ tree 
.
└── registry.gitlab.com
    └── myorganization
        ├── clients
        │   ├── myclient1
        │   ├── myclient2
        │   └── myclient3
        └── platform
            └── docker

8 directories, 0 files

To prevent myclient2 from accessing the docker images from myclient1, we create group access tokens to give access to docker images hosted under each client, plus another group access token to give access to myorganization/platform/docker .
Those group access tokens are used in imagePullSecrets in the cluster.

Note : all these group access tokens use the same registry (registry.gitlab.com) but give access to different subsets of the registry.

Here is how the default service account look like on our cluster:

$ kubectl get sa default -o yaml
apiVersion: v1
imagePullSecrets:
- name: docker-client-all     # this secret gives access to registry.gitlab.com/myorganization/clients/*
- name: docker-allinone      # and this one gives access to registry.gitlab.com/myorganization/platform/docker/*
kind: ServiceAccount
metadata:
  labels:
    name: default
  name: default
  namespace: applications
secrets:
- name: default-token-zzng2

Now the symptoms of the bug.

Kube-image-keeper v1.4.0 is installed on a kubernetes 1.26.6 cluster using the helm chart (slightly modified).

In the application namespaces:

  • all pods which have an image in registry.gitlab.com/myorganization/platform/docker are stuck in ImagePullBackoff
  • all pods which have an image in registry.gitlab.com/myorganization/clients/myclient pull their image from the kuik registry and become Running

Here is the kubectl describe output of the broken cachedimage:

$ kubectl describe cachedimage registry.gitlab.com-myorganization-platform-docker-myapp-2.2.2
Name:         registry.gitlab.com-myorganization-platform-docker-myapp-2.2.2
Namespace:    
Labels:       kuik.enix.io/repository=registry.gitlab.com-myorganization-platform-docker-myapp
Annotations:  <none>
API Version:  kuik.enix.io/v1alpha1
Kind:         CachedImage
Metadata:
  Creation Timestamp:  2023-12-19T14:42:32Z
  Finalizers:
    cachedimage.kuik.enix.io/finalizer
  .... # snip
  Resource Version:  120137242
  UID:               07b31cca-715f-45a9-8055-ef5eb7859ecb
Spec:
  Pull Secret Names:
    docker-client-all
    docker-allinone
  Pull Secrets Namespace:  applications
  Source Image:            registry.gitlab.com/myorganization/platform/docker/myapp:2.2.2
Status:
  Used By:
    Count:  1
    Pods:
      Namespaced Name:  applications/myapp-68dc7944b9-pt4wv
Events:
  Type     Reason       Age                  From                    Message
  ----     ------       ----                 ----                    -------
  Warning  CacheFailed  11m (x12 over 12m)   cachedimage-controller  Failed to cache image registry.gitlab.com/myorganization/platform/docker/myapp:2.2.2, reason: GET https://registry.gitlab.com/v2/myorganization/platform/docker/myapp/manifests/2.2.2: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:myorganization/platform/docker/myapp ProjectPath: Type:repository]]
  Normal   Caching      2m7s (x22 over 12m)  cachedimage-controller  Start caching image registry.gitlab.com/myorganization/platform/docker/myapp:2.2.2

My theory is that the controller tried to pull the docker image using the first imagePullSecret (docker-client-all) and gave up. If the controller had fallen back to trying the second imagePullSecret (docker-allinone), it could have cached the image successfully.

If I manually edit the cachedimage to remove the first imagePullSecret (docker-client-all) then kuik successfully pulls the docker image and the myapp pods become Running.

ImagePullSecret discovery: wildcard never matches

Using a private registry, e.g. xxx.jfrog.io
The secret used to pull images from this repo is formatted as below:

.dockerconfigjson: '{"auths": {"*.jfrog.io": {"username":"mylogin","password":"xxx","email":"foo@bar"}}}'

the wildcard does not seem to be used to discover the correct credential to use.

If I change the wildcard to the complete registry hostname, e.g. demo-api.jfrog.io, everything works fine.

AWS ECR auth doesn't work while proxy on cache miss

When an image is not found in the cache ("cached image is not available, proxying origin"), the proxy fails to authenticate correctly to ECR.
E0804 13:50:46.022737 1 server.go:159] could not proxy registry: invalid character 'N' looking for beginning of value
The response from ECR is "Not Authorized", which the proxy then tries to parse as JSON (hence the invalid character 'N').

The problem only occurs when proxying directly to ECR on a cache miss; at the same time, ECR auth works correctly for image caching (the image is eventually cached).

Cache failed for private Harbor registry

Hi,

I installed latest (1.3.0) kube-image-keeper Helm chart with basically default values.yaml except I added my own expressions for the webhook selector.
I tested the example from the repository and everything works fine. The images from docker and quay are said to be cached.

But for the deployments that use images from my private Harbor image registry, let's call it https://harbor.a.b.c, things are not working. My deployments use imagePullSecrets to pull images from there.

From the events I see things like this:

Warning  CacheFailed  16m (x12 over 16m)  cachedimage-controller  
Failed to cache image harbor.a.b.c/project/image:tag, reason: could not find source image

Warning  Failed           19m (x4 over 20m)   kubelet            
Failed to pull image "localhost:7439/harbor.a.b.c/project/image:tag": 
	rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/harbor.a.b.c/project/image:tag": 
	failed to resolve reference "localhost:7439/harbor.a.b.c/project/image:tag": 
	pulling from host localhost:7439 failed with status code [manifests tag]: 401 Unauthorized

and regarding kube-image-keeper logs I see things like this:

controller:
2023-10-06T12:30:07.109Z        ERROR   controller.cachedimage  Reconciler error        
{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "harbor.a.b.c/project/image:tag", 
"namespace": "", "error": "could not find source image"}

proxy:
Error #01: Get "https://harbor.a.b.c/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority

In the Helm chart I don't see any special options regarding private registries, certificates, secrets, etc.
What should I do and is this supported?
Also if you need more information please say so.

Thanks in advance!

Error while proxying to origin for image with dash

For an ECR image name with a dash in the hostname part, for example
"541517799999.dkr.ecr.us-east-2.amazonaws.com/foo/bar:tag"

proxy fails with error:

I0726 20:43:12.023260 1 server.go:138] "proxying request" repository="foo/bar" originRegistry="541517799999.dkr.ecr.us-east:2.amazonaws.com"
I0726 20:43:12.281235 1 server.go:141] "cached image is not available, proxying origin" originRegistry="541517799999.dkr.ecr.us-east:2.amazonaws.com" error="307 Temporary Redirect"

It breaks the hostname part into
541517799999.dkr.ecr.us-east:2.amazonaws.com
when it should be
541517799999.dkr.ecr.us-east-2.amazonaws.com

Tested on 1.3.0-beta.1

Digest invalid when explicitly using sha256 tagged images

Seeing a ton of uncached images, all relating to ones I have manually pinned to a particular sha256 digest.

These are the logs I am seeing:

time="2023-04-14T00:08:40.017201297Z" level=error msg="response completed with error" err.code="digest invalid" err.message="provided digest did not match uploaded content" go.version=go1.16.15 http.request.contenttype="application/vnd.oci.image.manifest.v1+json" http.request.host="kube-image-keeper-service:5000" http.request.id=f58ef57a-f2af-4aa5-ad11-0ee8f0483ac6 http.request.method=PUT http.request.remoteaddr="10.244.3.244:54752" http.request.uri="/v2/ghcr.io/authelia/authelia/manifests/sha256:a9544d198f430812cf419c7c3fe80c64af22473c84782b8815a053ef35aa0791" http.request.useragent="go-containerregistry/v0.6.0" http.response.contenttype="application/json; charset=utf-8" http.response.duration="891.682µs" http.response.status=400 http.response.written=98 vars.name="ghcr.io/authelia/authelia" vars.reference="sha256:a9544d198f430812cf419c7c3fe80c64af22473c84782b8815a053ef35aa0791"
  Type     Reason       Age                  From                    Message
  ----     ------       ----                 ----                    -------
  Normal   Caching      10m (x32 over 4h9m)  cachedimage-controller  Start caching image ghcr.io/authelia/authelia:master@sha256:a9544d198f430812cf419c7c3fe80c64af22473c84782b8815a053ef35aa0791
  Warning  CacheFailed  10m (x32 over 4h8m)  cachedimage-controller  Failed to cache image ghcr.io/authelia/authelia:master@sha256:a9544d198f430812cf419c7c3fe80c64af22473c84782b8815a053ef35aa0791, reason: PUT http://kube-image-keeper-service:5000/v2/ghcr.io/authelia/authelia/manifests/sha256:a9544d198f430812cf419c7c3fe80c64af22473c84782b8815a053ef35aa0791: DIGEST_INVALID: provided digest did not match uploaded content

The webhook certificate secret is not named correctly in the Helm chart

Inside the Helm chart here: https://artifacthub.io/packages/helm/enix/kube-image-keeper?modal=template&template=webhook-certificate.yaml

It seems that the secret referenced by the certificate is supposed to be kube-image-keeper-webhook-server-cert, however, that is not the case.

It is created with just the name webhook-server-cert, and I found this out as I deployed another Helm chart which also uses that name, and they fought with each other.

I tried to set the fullnameOverride (https://artifacthub.io/packages/helm/enix/kube-image-keeper?modal=template&template=_helpers.tpl) but that did not help either. Something is up with the template.

Doesn't appear to be working - TLS bad certificate

I just deployed this into my Kubernetes cluster that is running k3s 1.26.1 and Calico CNI 3.25 with eBPF/DSR.

Helm values

    controllers:
      image:
        repository: quay.io/enix/kube-image-keeper
    proxy:
      image:
        repository: quay.io/enix/kube-image-keeper
    registry:
      image:
        repository: public.ecr.aws/docker/library/registry
      persistence:
        enabled: true
        storageClass: ceph-filesystem
        size: 20Gi

Resources

❯ k get po -n kube-system | rg kube-
kube-image-keeper-0                                     1/1     Running   0               39m
kube-image-keeper-controllers-6ddc99bfb9-7c2t4          1/1     Running   0               39m
kube-image-keeper-controllers-6ddc99bfb9-vrv4h          1/1     Running   0               39m
kube-image-keeper-proxy-8hszk                           1/1     Running   0               39m
kube-image-keeper-proxy-9p2cw                           1/1     Running   0               39m
kube-image-keeper-proxy-gzdvx                           1/1     Running   0               39m
kube-image-keeper-proxy-l64nh                           1/1     Running   0               39m
kube-image-keeper-proxy-ntrt8                           1/1     Running   0               39m
kube-image-keeper-proxy-r62cf                           1/1     Running   0               39m

❯ k get cachedimages -A
No resources found

❯ k get certificates -A
NAMESPACE               NAME                                     READY   SECRET                                   AGE
kube-system             kuik-serving-cert                        True    webhook-server-cert

Logs

I notice that when I try to restart any pod or deploy a new one, these errors happen:

kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:05:07.813Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:05:07.814Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:06:15.900Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:06:15.901Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:06:47 http: TLS handshake error from 192.168.42.12:34158: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:07:32.832Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:07:32.832Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:07:50 http: TLS handshake error from 192.168.42.12:52718: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:08:52.861Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:08:52.862Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:09:58.798Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:09:58.799Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:11:08.860Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:11:08.861Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:12:20.916Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:12:20.917Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:13:22.915Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:13:22.915Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:14:31.927Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:14:31.928Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:15:54.815Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:15:54.816Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:15 http: TLS handshake error from 192.168.42.12:45374: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:15 http: TLS handshake error from 192.168.42.12:45384: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:16 http: TLS handshake error from 192.168.42.12:45390: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:16 http: TLS handshake error from 192.168.42.12:45398: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:17:16.856Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:17:16.856Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:49 http: TLS handshake error from 192.168.42.12:47174: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:17:49 http: TLS handshake error from 192.168.42.12:47176: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:18:19.858Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:18:19.858Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:19:42.904Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:19:42.904Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:20:59.841Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:20:59.842Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023/02/03 14:21:40 http: TLS handshake error from 192.168.42.11:41622: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:21:45.054Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:21:45.054Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:22:15.854Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:22:15.855Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:23:12.017Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:23:12.018Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:23:33.792Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:23:33.793Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023/02/03 14:23:42 http: TLS handshake error from 192.168.42.11:46590: remote error: tls: bad certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:24:21.961Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:24:21.961Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:24:43.812Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:24:43.813Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:25:31.030Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:25:31.031Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:26:10.861Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:26:10.861Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-0 kube-image-keeper time="2023-02-03T14:26:16.098319022Z" level=info msg="PurgeUploads starting: olderThan=2023-01-27 14:26:16.098137541 +0000 UTC m=-603059.989528036, actuallyDelete=true"
kube-image-keeper-0 kube-image-keeper time="2023-02-03T14:26:16.098560456Z" level=info msg="Purge uploads finished.  Num deleted=0, num errors=0"
kube-image-keeper-0 kube-image-keeper time="2023-02-03T14:26:16.098602693Z" level=info msg="Starting upload purge in 24h0m0s" go.version=go1.16.15 instance.id=4ef434ec-57d5-4cda-b732-c77407757e00 service=registry version="v2.8.1+unknown"
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:26:38.059Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-vrv4h cache-manager 2023-02-03T14:26:38.059Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:27:16.814Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
kube-image-keeper-controllers-6ddc99bfb9-7c2t4 cache-manager 2023-02-03T14:27:16.814Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate

Let me know if there's any other information you need.

unexpected status code 401 Unauthorized: Not Authorized

Hi, everyone

Thanks for this great project.

I installed kube-image-keeper on my EKS.
I'm getting a "401 Unauthorized" error when deploying a Pod with an image hosted in ECR.

  • Contents
    2023-09-12T11:33:53.701Z INFO controller.pod cachedimage patched {"reconciler group": "", "reconciler kind": "Pod", "name": "PODNAME", "namespace": "ns", "cachedImage": {"name":"MYID.dkr.ecr.ap-northeast-2.amazonaws.com-REPONAME-lastest"}, "sourceImage": "MYID.dkr.ecr.ap-northeast-2.amazonaws.com/REPONAME:lastest"}
    2023-09-12T11:33:53.710Z ERROR controller.cachedimage failed to cache image {"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "MYID.dkr.ecr.ap-northeast-2.amazonaws.com-REPONAME-lastest", "namespace": "", "sourceImage": "MYID.dkr.ecr.ap-northeast-2.amazonaws.com/REPONAME:lastest", "error": "GET https://MYID.dkr.ecr.ap-northeast-2.amazonaws.com/v2/REPONAME/manifests/lastest: unexpected status code 401 Unauthorized: Not Authorized\n"}
    2023-09-12T11:33:53.710Z ERROR controller.cachedimage Reconciler error {"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "MYID.dkr.ecr.ap-northeast-2.amazonaws.com-REPONAME-lastest", "namespace": "", "error": "GET https://MYID.dkr.ecr.ap-northeast-2.amazonaws.com/v2/REPONAME/manifests/lastest: unexpected status code 401 Unauthorized: Not Authorized\n"}

I checked the IAM permissions attached to the EKS cluster; it can use ECR.
I looked through the issues and various documents, but I couldn't find a solution.

Please tell me how to solve it.
Thank you.
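One direction to investigate, assuming kuik reuses the Pod's imagePullSecrets to authenticate against the source registry (on EKS, nodes usually authenticate to ECR through their IAM role, which the kuik controllers cannot use by themselves), is to reference a docker-registry Secret containing an ECR token from the Pod. A minimal, hypothetical sketch:

# Hypothetical sketch: reference a docker-registry Secret for ECR from the Pod.
# "ecr-pull-secret" is a made-up name; it would have to be created from an ECR
# authorization token (e.g. via `aws ecr get-login-password`) and refreshed
# regularly, since ECR tokens expire after 12 hours.
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # hypothetical
spec:
  imagePullSecrets:
    - name: ecr-pull-secret     # hypothetical Secret name
  containers:
    - name: app
      image: MYID.dkr.ecr.ap-northeast-2.amazonaws.com/REPONAME:lastest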

[Question] Best practices for using kuik in a multi-tenancy cluster

Hello!
Can you please tell me about the best practices?
Currently, we have started installing new (multi-tenant) clusters on hardware where there are no stateful apps and no dedicated storage.
I would like to understand how you solve this problem, since image caches on clusters are stateful.

P.S. I operate a GitLab Registry, and our k8s engineers say that kube-image-keeper will not help us in the situation where our GitLab registry is down.

Cheers,
GT

Webhook TLS handshake error

Hi, I'm seeing this log on the controller, and at the same time, images won't be rewritten. What could possibly be the cause?

2023/08/18 02:03:43 http: TLS handshake error from 10.251.1.83:60588: remote error: tls: bad certificate

Registry Persistent - PersistentVolumeClaim - Access Modes

It would be useful, on deployments where the PVC can support ReadWriteMany (like the EFS CSI driver on EKS), to be able to specify the access mode. This would allow us to use a PVC with the AWS native CSI drivers for both S3 and EFS without having to specify the credentials in the deployment.

This would also allow us to tell the deployment that the PVC is HA, where the StorageClass supports it.
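As an illustration only, the requested setting might look something like this in the chart values; the accessModes key below is hypothetical and is exactly what this issue is asking for:

registry:
  persistence:
    enabled: true
    storageClass: efs-sc        # hypothetical: an EFS-backed StorageClass
    size: 20Gi
    accessModes:                # hypothetical key requested by this issue
      - ReadWriteMany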

MinIO integration doesn't work if release name is *not* kube-image-keeper

What I do:

Install kuik with MinIO integration, using a release name of kuik, like this:

helm upgrade --install --namespace kuik --create-namespace \
--repo https://charts.enix.io/ kuik kube-image-keeper \
--set minio.enabled=true \
--set minio.auth.existingSecret=minio-root-auth

kubectl create secret generic minio-root-auth --namespace kuik \
--from-literal=root-username=root --from-literal=root-password=very.secret.password

What I expect to see:

kuik starts fine

What I see instead:

The minio provisioning pod fails to start:

$ kubectl describe pod kuik-minio-provisioning-vpzlg | tail
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    7m17s                 default-scheduler  Successfully assigned kuik/kuik-minio-provisioning-vpzlg to scw-cli-k8s-happy-swirles-default-489a13773af9
  Warning  FailedMount  66s (x11 over 7m18s)  kubelet            MountVolume.SetUp failed for volume "users-secret-0" : secret "kube-image-keeper-minio-registry-users" not found
  Warning  FailedMount  66s (x11 over 7m18s)  kubelet            MountVolume.SetUp failed for volume "registry-keys" : secret "kube-image-keeper-s3-registry-keys" not found

And the secrets seem to have an extra kuik- prefix:

$ kubectl get secrets
NAME                                              TYPE                 DATA   AGE
kuik-kube-image-keeper-minio-registry-passwords   Opaque               1      6m39s
kuik-kube-image-keeper-minio-registry-users       Opaque               1      6m39s
kuik-kube-image-keeper-registry-http-secret       Opaque               1      6m39s
kuik-kube-image-keeper-s3-registry-keys           Opaque               2      6m39s
kuik-kube-image-keeper-webhook-server-cert        kubernetes.io/tls    3      13m
minio-root-auth                                   Opaque               2      6m52s
sh.helm.release.v1.kuik.v1                        helm.sh/release.v1   1      13m
sh.helm.release.v1.kuik.v2                        helm.sh/release.v1   1      6m40s

Theory:

I see that the Secret name (defined in minio-registry-users.yaml) builds on {{ include "kube-image-keeper.fullname" . }}.

I don't know for sure where the MinIO provisioning pod gets the Secret name from. Maybe from the subchart values, specifically minio.provisioning.usersExistingSecrets, which defaults to [ kube-image-keeper-minio-registry-users ] in the kuik chart's default values (a possible workaround sketch follows the notes below).

Further notes:

I don't know if this is easily fixable.

  • Ideal scenario: find a way to inject that secret name into the minio subchart.
  • Less ideal but still cool scenario: in the kuik chart, don't use kube-image-keeper.fullname; just use a common value between the kuik chart and the minio subchart.
  • Less less ideal scenario: just document this as a shortcoming (and/or issue a warning in NOTES.txt when using a different release name).
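If the theory above is correct, an untested workaround might be to point the MinIO subchart at the secret names that actually get created with the release prefix, for example:

minio:
  enabled: true
  auth:
    existingSecret: minio-root-auth
  provisioning:
    usersExistingSecrets:
      # matches the release-prefixed secret observed above; the "registry-keys"
      # volume would presumably need a similar override
      - kuik-kube-image-keeper-minio-registry-users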

no arm64 proxy image

When using this project in a cluster with arm64 nodes, the kube-image-keeper-proxy pods fail to run on these nodes (wrong exec format).

This would require building a multi-arch image.

PS: awesome project nevertheless !

Registry with PVC should run as a Deployment

When the registry is configured with filesystem persistence, it should not run as a StatefulSet but as a Deployment, just like in the stateless mode.

In this persistence mode, the registry is not a scalable resource: only one Pod and its PVC can run. Relying on a StatefulSet is dangerous, as this controller makes decisions that do not favor availability and is a poor fit for a "pet" Pod that is critical to the cluster.
For example, if a Node is shut down or crashes without being drained, the registry replica will be stuck in the Terminating state and not be rescheduled in the cluster, amplifying the incident.

Running as a Deployment with spec.strategy.type: Recreate would allow the registry to quickly recover. The CSI driver will take care of the multi-attachment protection and volume fencing.
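For illustration, here is a minimal sketch of what the persistent registry could look like as a Deployment; the names and the mount path below are assumptions, not values taken from the chart:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-image-keeper-registry            # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate        # the old Pod is gone before the new one mounts the PVC
  selector:
    matchLabels:
      app: kube-image-keeper-registry
  template:
    metadata:
      labels:
        app: kube-image-keeper-registry
    spec:
      containers:
        - name: registry
          image: public.ecr.aws/docker/library/registry
          volumeMounts:
            - name: data
              mountPath: /var/lib/registry    # default storage path of the registry image
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: kube-image-keeper-registry-data   # hypothetical PVC name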

Support for Ephemeral Containers

This may be considered as a very low priority. I don't know if any operator makes use of ephemeral containers yet.

Currently kuik will not rewrite image locations for ephemeralContainers:

❯ kubectl run test --image=nginx --restart=Never

❯ kubectl debug -it test --image=busybox --target=test

❯ kubectl get pod test -o jsonpath="{.spec.containers[*].image} ; {.spec.ephemeralContainers[*].image}"
localhost:7439/nginx ; busybox

Upgrading from version 1.2 to 1.3 makes Pods stuck on Terminating

Hey All,

I want to start by saying that we really like this Kuik solution and think you are doing a great job.

Yesterday we upgraded to version 1.3 and started noticing that pods being deleted were stuck on Terminating.

After a little bit of code digging, we noticed that the finalizer was removed in version 1.3. This of course leaves terminating pods stuck, since nothing handles the finalizers anymore.

So we deleted all the finalizers from the old pods, but I think you need to add a migration section to the documentation, and maybe bump the major version to 2 since this is a breaking change.

Thank you

(Or maybe we are completely wrong and missed something)

http: server gave HTTP response to HTTPS client

Hi,

I'm facing the issue "http: server gave HTTP response to HTTPS client" with the deployment on my cluster.
I've already made a successful deployment on another cluster, but this one fails with this kind of error:

1s          Normal    BackOff                  pod/xxx-xxx-documentation-7fdc5f5684-qdxr9       Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
1s          Warning   Failed                   pod/xxx-xxx-documentation-7fdc5f5684-qdxr9       Error: ImagePullBackOff
1s          Warning   Failed                   pod/xxx-xxx-ui-56bd49bc88-gl9pd                  Error: ImagePullBackOff
0s          Normal    Pulling                  pod/xxx-xxx-uploadcache-bbb4c4f46-dj55j          Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-uploadcache-bbb4c4f46-dj55j          Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-uploadcache-bbb4c4f46-dj55j          Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-webserver-cbc4f5597-fvdzf            Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-webserver-cbc4f5597-fvdzf            Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-webserver-cbc4f5597-fvdzf            Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-postgres-init-btsff                  Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres:13-2.15"
0s          Normal    BackOff                  pod/xxx-xxx-cfssl-f579d77c-vtmxx                 Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-cfssl:1.0.15"
0s          Normal    BackOff                  pod/xxx-xxx-rabbitmq-5c4466b794-tzj46            Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-cfssl-f579d77c-vtmxx                 Error: ImagePullBackOff
0s          Warning   Failed                   pod/xxx-xxx-rabbitmq-5c4466b794-tzj46            Error: ImagePullBackOff
0s          Warning   Failed                   pod/xxx-xxx-postgres-init-btsff                  Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres:13-2.15": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres:13-2.15": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres:13-2.15": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres/manifests/13-2.15": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-postgres-init-btsff                  Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-redis-68f67dbcc8-z7c6m               Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-redis:2023.1.2"
0s          Normal    BackOff                  pod/xxx-xxx-storage-774d44ffb4-2gttm             Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Normal    Pulling                  pod/xxx-xxx-matchengine-579969b75d-rbwp8         Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-storage-774d44ffb4-2gttm             Error: ImagePullBackOff
0s          Normal    BackOff                  pod/xxx-xxx-registration-669bff77b4-vqd6z        Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-registration-669bff77b4-vqd6z        Error: ImagePullBackOff
0s          Warning   Failed                   pod/xxx-xxx-matchengine-579969b75d-rbwp8         Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-matchengine-579969b75d-rbwp8         Error: ErrImagePull
0s          Warning   Failed                   pod/xxx-xxx-redis-68f67dbcc8-z7c6m               Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-redis:2023.1.2": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-redis:2023.1.2": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-redis:2023.1.2": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-redis/manifests/2023.1.2": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-redis-68f67dbcc8-z7c6m               Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-scan-859f5d7866-kcrtb                Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-scan-859f5d7866-kcrtb                Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-scan-859f5d7866-kcrtb                Error: ErrImagePull
0s          Normal    BackOff                  pod/xxx-xxx-authentication-5c549bddc8-nb7cm      Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Normal    Pulling                  pod/xxx-xxx-documentation-7fdc5f5684-qdxr9       Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-authentication-5c549bddc8-nb7cm      Error: ImagePullBackOff
0s          Normal    Pulling                  pod/xxx-xxx-ui-56bd49bc88-gl9pd                  Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-ui-56bd49bc88-gl9pd                  Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-ui-56bd49bc88-gl9pd                  Error: ErrImagePull
0s          Warning   Failed                   pod/xxx-xxx-documentation-7fdc5f5684-qdxr9       Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
1s          Warning   Failed                   pod/xxx-xxx-documentation-7fdc5f5684-qdxr9       Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-bomengine-65756f9557-mpv4n           Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-bomengine-65756f9557-mpv4n           Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-bomengine-65756f9557-mpv4n           Error: ErrImagePull
0s          Normal    Pulling                  pod/xxx-xxx-jobrunner-d759c4474-trbw9            Pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-jobrunner-d759c4474-trbw9            Failed to pull image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to resolve reference "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4": failed to do request: Head "https://localhost:7439/v2/docker.io/xxxsoftware/xxx-postgres-waiter/manifests/1.0.4": http: server gave HTTP response to HTTPS client
0s          Warning   Failed                   pod/xxx-xxx-jobrunner-d759c4474-trbw9            Error: ErrImagePull
0s          Normal    BackOff                  pod/xxx-xxx-webapp-logstash-5fbbf6986c-jfd9t     Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-webapp-logstash-5fbbf6986c-jfd9t     Error: ImagePullBackOff
0s          Normal    BackOff                  pod/xxx-xxx-uploadcache-bbb4c4f46-dj55j          Back-off pulling image "localhost:7439/docker.io/xxxsoftware/xxx-postgres-waiter:1.0.4"
0s          Warning   Failed                   pod/xxx-xxx-uploadcache-bbb4c4f46-dj55j          Error: ImagePullBackOff

Here is my helm values file:

installCRD: true
registry:
  persistence:
    enabled: true
    storageClass: gp2
    size: 30Gi

kubectl get cachedimages -A
NAME                                                            CACHED   EXPIRES AT   PODS COUNT   AGE
docker.io-xxxsoftware-xxx-authentication-2023.1.2   true                  1            11m
docker.io-xxxsoftware-xxx-bomengine-2023.1.2        true                  1            11m
docker.io-xxxsoftware-xxx-cfssl-1.0.15              true                  1            11m
docker.io-xxxsoftware-xxx-documentation-2023.1.2    true                  1            11m
docker.io-xxxsoftware-xxx-jobrunner-2023.1.2        true                  1            11m
docker.io-xxxsoftware-xxx-logstash-1.0.26           true                  1            11m
docker.io-xxxsoftware-xxx-matchengine-2023.1.2      true                  1            11m
docker.io-xxxsoftware-xxx-nginx-2.0.31              true                  1            11m
docker.io-xxxsoftware-xxx-postgres-13-2.15          true                  1            11m
docker.io-xxxsoftware-xxx-postgres-waiter-1.0.4     true                  13           11m
docker.io-xxxsoftware-xxx-redis-2023.1.2            true                  1            11m
docker.io-xxxsoftware-xxx-registration-2023.1.2     true                  1            11m
docker.io-xxxsoftware-xxx-scan-2023.1.2             true                  1            11m
docker.io-xxxsoftware-xxx-storage-2023.1.2          true                  1            11m
docker.io-xxxsoftware-xxx-upload-cache-1.0.34       true                  1            11m
docker.io-xxxsoftware-xxx-webapp-2023.1.2           true                  1            11m
docker.io-xxxsoftware-xxx-webui-2023.1.2            true                  1            11m
docker.io-xxxsoftware-rabbitmq-1.2.15                     true                  1            11m

kubectl get all -n kube-image-keeper
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/kube-image-keeper-0                              1/1     Running   0          12m
pod/kube-image-keeper-controllers-56569ffdff-48q2p   1/1     Running   0          12m
pod/kube-image-keeper-controllers-56569ffdff-bkphk   1/1     Running   0          12m
pod/kube-image-keeper-proxy-5fwf9                    1/1     Running   0          12m
pod/kube-image-keeper-proxy-6nnzb                    1/1     Running   0          12m
pod/kube-image-keeper-proxy-lgw4s                    1/1     Running   0          12m

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kube-image-keeper-service   ClusterIP   172.20.68.157   <none>        5000/TCP   12m
service/kuik-webhook-service        ClusterIP   172.20.84.15    <none>        443/TCP    12m

NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/kube-image-keeper-proxy   3         3         3       3            3           <none>          12m

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-image-keeper-controllers   2/2     2            2           12m

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/kube-image-keeper-controllers-56569ffdff   2         2         2       12m

NAME                                 READY   AGE
statefulset.apps/kube-image-keeper   1/1     12m

NAME                                                 SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/kube-image-keeper-garbage-collection   0 0 * * 0   False     0        <none>          12m

Can you tell me which infos I can gather to help you ?

Thanks for your help and thanks for the great tool ! :)

Support for mutable tags (like `latest`)

Current situation

Kuik saves the current version of an image in its registry and serves this copy as long as it is cached. This is a great feature from an availability perspective and works perfectly for immutable tags.

imagePullPolicy: Never

  • kuik works as expected (it is not used at all)

imagePullPolicy: IfNotPresent

  • kuik works as expected for immutable tags
  • kuik works as expected for mutable tags (no new version ever downloaded)

BUT: IfNotPresent is not useful for mutable tags.

imagePullPolicy: Always

  • kuik works as expected for immutable tags (as they never ever change again)
  • kuik does not work as expected for mutable tags (as they are expected to change)
    After all :latest is mutable too.

Sometimes mutable tags are needed

As written above, mutable tags are useful and sometimes even needed in some environments (used together with imagePullPolicy: Always).
It helps to use the latest version of images (e.g. postgres:15), which might contain relevant security fixes, and pulling that new image is as easy as killing a running pod.

If kuik is installed (and configured to be active for such images), however, the situation is different: once kuik has a cached version of an image, it never gets an update (as long as the image is in active use and therefore not garbage collected).

So again: from an availability perspective kuik is great, but from a security and usability perspective there are some pitfalls.

Examples to illustrate the current situation

If one deploys a statefulset for an image like postgres:15 (PostgreSQL database), kube-image-keeper will cache the postgres:15 image the moment a corresponding pod gets created. This exact image is stored inside the kuik registry.
Now if postgres:15 gets an update, which might be important for security reasons, and a developer tries to upgrade the pods, the cached version will be used and it won't be updated to the newer, security-fixed version of postgres:15.
And that person has to watch the log output in depth to find out that there was no update.

For mutable tags like :latest the situation can be even worse, as a developer assumes imagePullPolicy: Always. But unfortunately the image never gets an update while kube-image-keeper is actively caching it. This behavior is clearly completely different from the expected default behavior of imagePullPolicy: Always.

Avoiding single points of failure

One could argue that using imagePullPolicy: Always is bad anyway, because it introduces a single point of failure (the image registry). But kube-image-keeper is able to solve this SPOF situation.

And therefore I would like to present an idea on how to improve kube-image-keeper's capabilities to solve this.

An idea how to fix those issues

The proxy component of kube-image-keeper could implement a mechanism that checks the upstream registry for updates, along with the ability to re-download an update if one is available.
The basic code already seems to be there, as kuik checks for (and might also download) an update when one manually deletes the CachedImage object.

Clearly there is a need to still use the already cached (and maybe outdated) version in some cases and this is what makes kuik outstanding:

  • If the upstream image registry is not reachable (a wisely chosen timeout is needed).
  • If the image is not available (anymore) in the upstream registry.
  • If the provided imagePullSecrets are currently not working.

Different update modes might be possible: there could be several options to decide whether kuik should check for image updates, and it might also be possible to configure this during the kuik installation (a hypothetical values sketch follows the list):

  • Always check for updates
  • Check for updates if the last check was some configurable minutes, hours or even days ago
  • Never check for updates (current behavior)
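For illustration only, such a setting could be surfaced at install time along these lines; every key and value below is hypothetical and does not exist in kuik today:

cachedImages:
  updatePolicy: Interval        # hypothetical: Always | Interval | Never (current behavior)
  updateCheckInterval: 6h       # hypothetical: minimum delay between checks against the upstream registry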

To be clear: those checks should only be made when there is a request from the container runtime for that image anyway; this should not be a recurring background job. That way kuik does not make needlessly many calls to the upstream image registry.

Similar issue

A similar request was also made on the Kubernetes issue board in kubernetes/kubernetes#111822. There the author proposed a new imagePullPolicy named IfAvailable, which would update an image if the image registry is available and the image itself has an update; otherwise it would deliver the already present version.
I think that using kube-image-keeper with these enhancements would solve the said problem and also improve availability even further.

Pulling pod Docker images when one of them is hosted on ECR

Hi guys,

I would like to use kuik to cache some public linkerd proxy sidecar container images (cr.l5d.io/linkerd/proxy). But the image of our app is hosted on ECR, and I get a 401 when the controller tries to pull from it.

Caching ECR images in kuik doesn't really matter to me; the only thing I want is to pull the linkerd images from the kuik cache.

Do you have any idea how I could cache only some pod images and not the others?

Port numbers in URLs break image caching

Hi,
We're having problems with a private repo that has an unusual port number; kube-image-keeper seems to mistake the port number for part of the hostname and fails to resolve it.

Otherwise kube-image-keeper is working great for public repos and for private repos whose hostnames have no port.

Our internal repo is on port 5050, but most of the time we get something like this (I've changed the name of the repo for privacy, but the real name has the same number of dots and only letters):

cachedimage-controller: Start caching image gitlab.dev.local-5050/devops/internal-wiki:main
cachedimage-controller: Failed to cache image gitlab.dev.local-5050/devops/internal-wiki:main, reason: Get "https://gitlab.dev.local-5050/v2/": dial tcp: lookup gitlab.dev.local-5050 on 172.20.0.10:53: no such host; Get "http://gitlab.dev.local-5050/v2/": dial tcp: lookup gitlab.dev.local-5050 on 172.20.0.10:53: no such host

Unfortunately changing the port number isn't currently an option, so it would be great if this worked with arbitrary ports.

Thanks,
Mike

Open port 10250 vulnerability leading to remote code execution

Hi.

I found an IP address associated with you on Shodan with Kubernetes port 10250 exposed, which is susceptible to Remote Code Execution (RCE). I'm searching for the best place to report this other than directly to Enix SAS on GitHub.
Usually I report this kind of finding for bug bounty purposes, but since I couldn't find your bug bounty program, I'm reporting it here instead.

This is the vulnerable IP address:


The vulnerable pod and container are blue-fc745d5bb-crx99 │ default │ color:

└─$ kubeletctl -s 45.140.108.163 exec "/bin/sh" -p "blue-fc745d5bb-crx99" -c "color"
/ # id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)


I hope you can fix it.

Regards

Nan

https://hackerone.com/nanwn

Support for registry 307 response (S3 storage)

Right now it looks like the proxy doesn't support 307 responses.

"cached image is not available, proxying origin" originRegistry="541519999999.dkr.ecr.us-west-2.amazonaws.com" error="307 Temporary Redirect"

When using S3 storage, the registry by default responds with a 307 redirect pointing directly to the S3 bucket, to avoid proxying image layers through the service.
https://docs.docker.com/registry/spec/api/#pulling-a-layer

It would be great if the kube-image-keeper proxy could support 307 and forward it to the client, so that the client fetches the image directly from S3. It's not only faster, but also cheaper (S3->EC2 traffic is free, while EC2->EC2 across AZs is not).

As a workaround, for now it's possible to disable redirects by setting the following helm values:

    registry:
      env:
      - name: REGISTRY_STORAGE_REDIRECT_DISABLE
        value: "true"

How to configure the proxy bind address ?

I am evaluating kuik v1.4.0 on different Kubernetes clusters.

It works very well on minikube (minikube v1.31.1, k8s v1.26.6, containerd runtime)
but I am having hostPort/hostIP issues on a kubespray-deployed VM (kubespray v2.22.1, k8s v1.26.5, crio runtime) or on minikube with crio runtime.

In both cases, I deploy kuik on the cluster using the helm chart from this project.

Symptoms

Here are the symptoms on kubespray cluster with crio runtime:

  • all kuik pods are up and pass their readiness probe
  • pods which use cached images have status ErrImagePull or ImagePullBackOff
  • if I kubectl describe one of the pod, I can see the following message: pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host

I believe that the reason is an unfixed cri-o issue: cri-o/cri-o#1804, but this is too complex for me to fix myself. To sum up the problem, the proxy DaemonSet is configured with hostPort: 7439 and hostIP: 127.0.0.1, but port-forwarding from a pod to the host is currently broken with cri-o.

As a workaround, I would like to be able to run the proxy DaemonSet listening on 127.0.0.1:7439 using the hostNetwork.
Today, the proxy DaemonSet listens on port 8082 on all interfaces (and it is hardcoded: https://github.com/enix/kube-image-keeper/blob/v1.4.0/internal/proxy/server.go#L108-L110).

Would you accept a pull request to make the proxy bind address configurable (with defaults compatible with the existing behavior)? That would happily work around my issue, and the helm chart could be updated to listen on the hostNetwork as an alternative to the current version that uses hostIP/hostPort.
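For illustration, the hostNetwork alternative mentioned above could look roughly like this on the proxy DaemonSet; this is a hypothetical excerpt, and the bind-address flag does not exist in v1.4.0, which is the point of the request:

# Hypothetical DaemonSet excerpt: run the proxy on the host network and bind it
# to loopback only, instead of relying on hostPort/hostIP port-mapping
# (which is what breaks with cri-o here).
spec:
  template:
    spec:
      hostNetwork: true
      containers:
        - name: proxy
          image: quay.io/enix/kube-image-keeper
          args:
            - --bind-address=127.0.0.1:7439   # hypothetical flag proposed by this issue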

Some troubleshooting

My minikube start command-line:

minikube start \
  --driver=virtualbox \
  --host-only-cidr=192.168.99.1/24 \
  --memory=10240 \
  --cpus=8 \
  --kubernetes-version=1.26.6 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --docker-opt bip=172.17.0.1/20 \
  --extra-config=kubelet.authentication-token-webhook=true \
  --extra-config=kubelet.authorization-mode=Webhook \
  --extra-config=kubelet.max-pods=110 \
  --extra-config=apiserver.enable-admission-plugins=AlwaysPullImages,PodNodeSelector \
  --extra-config=scheduler.bind-address=0.0.0.0 \
  --extra-config=controller-manager.bind-address=0.0.0.0 \
  --addons ingress \
  --addons storage-provisioner \
  --container-runtime=cri-o

Note the --container-runtime=cri-o option (if not specified, the runtime will be containerd, which works).

Then I apply kuik on the cluster using helm as usual.

Pod status on the cluster

# kube-image-keeper pods are up
$ kubectl get pod -n kube-image-keeper
NAME                                             READY   STATUS    RESTARTS      AGE
kube-image-keeper-controllers-5f69d66fdc-tbgfg   1/1     Running   1 (14h ago)   14h
kube-image-keeper-controllers-5f69d66fdc-xx4fc   1/1     Running   0             14h
kube-image-keeper-proxy-rg876                    1/1     Running   0             14h
kube-image-keeper-registry-0                     1/1     Running   0             14h

# create a test deployment which use a cached docker image
$ kubectl create deployment mydeploy --image docker.io/busybox -- nc -lp 1337

# after waiting a bit, the mydeploy pod cannot pull its image
$ kubectl get pod -l app=mydeploy
NAME                       READY   STATUS             RESTARTS   AGE
mydeploy-8b8f68f58-q72pd   0/1     ImagePullBackOff   0          7m40s

$ kubectl describe pod -l app=mydeploy | tail -n3
  Warning  Failed     5m48s (x4 over 7m57s)  kubelet            Error: ErrImagePull
  Warning  Failed     5m37s (x6 over 7m57s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m2s (x16 over 7m57s)  kubelet            Back-off pulling image "localhost:7439/docker.io/busybox"

If I directly try to pull the image inside the VM:

# crictl pull localhost:7439/docker.io/busybox
E1213 11:22:38.682320   22294 remote_image.go:242] "PullImage from image service failed" err="rpc error: code = Unknown desc = pinging container registry localhost:7439: Get \"http://localhost:7439/v2/\": dial tcp 127.0.0.1:7439: connect: no route to host" image="localhost:8082/docker.io/busybox"
FATA[0012] pulling image: rpc error: code = Unknown desc = pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host 

# curl -sSL -x '' --fail localhost:7439
curl: (7) Failed connect to localhost:7439; No route to host

# using the proxy pod IP address works
# curl -sSL -x '' --fail 10.233.105.77:7439
curl: (22) The requested URL returned error: 404 Not Found

Caching the latest docker images for a deployment without expiry?

Hello enix,

We are considering using kuik to keep docker images used in a customer environment available even if the docker images are pruned from our own registry in the future. Kuik would allow the customer to keep running its old applications regardless of our image pruning policy.

Currently, kuik works very well when deployments have >0 replicas:

  • every docker image from our docker registry is cached by kuik as soon as the first pod uses it
  • the cached docker images are not expiring (according to kubectl get cachedimages) because at least 1 pod is using it

Now let's pretend that the customer scales a deployment down to 0 replicas, then goes to sleep for 6 months.
At some point, the docker image used by the customer is pruned from our registry and expires from kuik because:

  • no pod was using it, so the cachedimage has an expiration timestamp
  • the kuik garbage collector has run and pruned the tag from the registry (because the timestamp expired)

Now the customer wants to scale its deployment back to >0 replicas, but the image is gone both from our registry and from kuik. What can we do?

My naive idea was that maybe kuik could be configured to cache, without an expiration timestamp, the docker images associated with the last N ReplicaSets of a deployment, even when a ReplicaSet is scaled to 0 replicas.

What would be your thoughts about it?

Registry lock not released after a garbage collection job error

Hi,

I've faced an issue following a garbage collection job error.

Error
The images weren't cached anymore.
The error was: Failed to cache image {IMAGE_NAME} unexpected status code 405 Method Not Allowed: Method not allowed

Cause
The garbage collection job locks the registry (read-only) at the beginning of its process (kubectl set env deploy kube-image-keeper-registry REGISTRY_STORAGE_MAINTENANCE_READONLY='{"enabled":true}').
The garbage collection job failed with the following error: failed to garbage collect: failed to mark: s3aws: s3aws: failed to retrieve tags unknown repository name={IMAGE_NAME} terminate
The job stopped and never ran the last command, which releases the registry lock (kubectl set env deploy kube-image-keeper-registry REGISTRY_STORAGE_MAINTENANCE_READONLY-).

Evolution
The job should release the lock before being killed.

Intermittent Failed to pull image

I'm trying to deploy kube-image-keeper in AWS EKS using a standalone MinIO installation.

I had no issue getting it configured, and the registry is able to connect and store images in MinIO. However, some of the EKS nodes fail, at what seems to be random, to pull images from the registry with the following error.

Failed to pull image "localhost:7439/docker.io/grafana/promtail:2.7.4": failed to pull and unpack image "localhost:7439/docker.io/grafana/promtail:2.7.4": failed to copy: httpReadSeeker: failed open: failed to do request: Get "http://minio.minio.svc.cluster.local:9000/registry/docker/registry/v2/blobs/sha256/78/78e4198d60e924f58ae4e9dbcf647dfe0fc72eb594fba3a9f0f35b98eeb3a759/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxx%2F20240112%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240112T191123Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=341bf92fdbf7d9ccf87859afbb419c4f407527ea8222228fd2ea8f8ae0d4e64e": dial tcp: lookup minio.minio.svc.cluster.local: no such host

Some nodes are able to pull the same image without any issue, and the networking setup of all the nodes is the same. I've verified that the node is able to access the service.

If I use the out-of-the-box MinIO implementation, everything seems to work fine; the only difference seems to be that MinIO is in its own namespace.

MutatingWebHook generates invalid annotations

With a container having a long name, the MutatingWebHook will generate an invalid Pod object.

Example Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
        image: nginx

The ReplicaSet is stuck and no workload can start:

Error creating: Pod "test-57ccc945b8-g94d9" is invalid: metadata.annotations: Invalid value: `"original-image-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa":` name part must be no more than 63 characters

kube-image-keeper-registry-0 goes into a crash back-off when using AWS EBS persistent storage.

Sometimes, when the kube-image-keeper-registry-0 pod is restarted, it goes into a crash back-off with something like this at the end of its logs:

garbage-collector docker.elastic.co/beats/filebeat: marking blob sha256:89732bc7504122601f40269fc9ddfb70982e633ea9caf641ae45736f2846b004
garbage-collector docker.io/jgraph/drawio
garbage-collector manifest eligible for deletion: sha256:fb2a84c7a2e04d4ea2e5aa0c57385e0e61dd3c7c5ea559a09d5a3a2cca6de28f 

I haven't found any errors in the logs, but it always ends with "garbage-collector manifest eligible for deletion"

My workaround is to delete the PVC and then restart the pod again. So there must be something on the volume that breaks it.

We are deploying it using the kube-image-keeper helm chart v1.4.0 from https://charts.enix.io/

It is hosted in EKS with k8s version 1.24, and uses the following storage class:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: encrypted-ebs
parameters:
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Any help would be greatly appreciated

how to specify the secret for pulling images

I'm trying to set up kuik on one of my k8s clusters and I keep getting errors saying that authentication is required when the controller tries to pull images from Artifactory, so I would like to know how to configure the imagePullSecret.

2023-05-25T19:49:30.958Z ERROR controller.cachedimage failed to cache image {"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "iafgprod-coeinfonuagique-stable-docker-virtual.jfrog.io-k8s-platform-velero-1.9.0", "namespace": "", "sourceImage": "iafgprod-coeinfonuagique-stable-docker-virtual.jfrog.io/k8s-platform/velero:1.9.0", "error": "GET https://iafgprod-coeinfonuagique-stable-docker-virtual.jfrog.io/artifactory/api/docker/coeinfonuagique-stable-docker-virtual/v2/token?scope=repository%3Ak8s-platform%2Fvelero%3Apull&service=iafgprod-coeinfonuagique-stable-docker-virtual.jfrog.io: : Authentication is required"}

The command below is what I tried, but I'm still getting the same error message. I have already verified that the secret mentioned below exists and is working.

helm upgrade --install --create-namespace --namespace kuik-system kube-image-keeper kube-image-keeper --repo https://charts.enix.io/ --set registry.imagePullSecrets[0].name=iafgprod-coeinfonuagique-stable-docker-virtual
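For readability, the --set flag above is equivalent to the following values file; whether registry.imagePullSecrets is the right setting for the controllers' pulls is precisely the open question here:

registry:
  imagePullSecrets:
    - name: iafgprod-coeinfonuagique-stable-docker-virtual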

Arm64 part of multi-arch images is not served/cached

Currently, multi-arch images fail to start on arm64 nodes when kuik is enabled. Deeper investigation revealed that the local Docker registry does not contain and/or serve the arm64 layers of multi-arch images like library/docker.

Cached image's manifest (note that only the amd64 architecture is present):

docker pull kube-image-keeper-registry.dev-kube-image-keeper.svc.cluster.local:5000/docker.io/library/docker:20-dind
docker manifest inspect --verbose kube-image-keeper-registry.dev-kube-image-keeper.svc.cluster.local:5000/docker.io/library/docker:20-dind --insecure
{
	"Ref": "kube-image-keeper-registry.dev-kube-image-keeper.svc.cluster.local:5000/docker.io/library/docker:20-dind",
	"Descriptor": {
		"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
		"digest": "sha256:11a8556b63283fb6edb98fc990166476bc2e14b33164039ab6132b27d84882d8",
		"size": 3251,
		"platform": {
			"architecture": "amd64",
			"os": "linux"
		}
	},
	"SchemaV2Manifest": {
		"schemaVersion": 2,
		"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
		"config": {
			"mediaType": "application/vnd.docker.container.image.v1+json",
			"size": 11319,
			"digest": "sha256:c0e053541b0eec2ee002f3862daedf169d40ff5e65d2a11206ebb1dd883acc13"
		},
		"layers": [
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 3397490,
				"digest": "sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 2014629,
				"digest": "sha256:db1d8fde5ab00eb91a63f1f2ceb21e27bf1b3e1bcb622bb9a3828478273df8cb"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 90,
				"digest": "sha256:f3759a44eb9f3e3577a539da15a245f36a7436c3bcc726a0a4c0e677c835a6ba"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 14117997,
				"digest": "sha256:1465fd8ca4ab3bbf2ff1fd0a79ecbb543f6c91c45c4e6643f12bf8f172af186b"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 16001759,
				"digest": "sha256:95c1bc752a8b6c0d7c1951d978f7a9d88f0473002f2b6fe9c05052dfa1d64144"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 16384833,
				"digest": "sha256:f5ce33fb9a21fd7c58a485b4048bb20b38c48518028ebbc87b46da23e62cd48e"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 548,
				"digest": "sha256:4d37eec86745434c6e5165a35a2012abbfd788ef455e7a6f3aeb282e2ef4839e"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 1020,
				"digest": "sha256:49d0e23604c0500f6a93a56a7e0951c8b8264e5822db51d8b4105d6c1a373374"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 167,
				"digest": "sha256:ac56a29f30eb929bd39086c9e11300ca62e2e446ae9d5d557e6723ded96d1ccc"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 7025508,
				"digest": "sha256:0184c8e99504ad3ef831f05b75e81580f06a9620c9b41627c5d92825baee5dcf"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 1319,
				"digest": "sha256:23430c064101ef52f3b552e39750e61826d539d4e98b43d88c805e504a591291"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 53901045,
				"digest": "sha256:e2a9b00eaf87e56919b17afcd93f6862875ce653ab4de6d7a3d5c916ef833944"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 1048,
				"digest": "sha256:004faac3bc8662333a4426e89bb9524953842efa0d3355697d3e5961fd2abeba"
			},
			{
				"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
				"size": 2814,
				"digest": "sha256:765a9f2e07176c06faf302ae4fdcbf43e75986f2fed0d7d0ca0973cc54276225"
			}
		]
	}
}

Original image manifest served by Docker Hub (both amd64 and arm64 are present):

docker image pull library/docker:20-dind
docker manifest inspect --verbose library/docker:20-dind
[
	{
		"Ref": "docker.io/library/docker:20-dind@sha256:11a8556b63283fb6edb98fc990166476bc2e14b33164039ab6132b27d84882d8",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:11a8556b63283fb6edb98fc990166476bc2e14b33164039ab6132b27d84882d8",
			"size": 3251,
			"platform": {
				"architecture": "amd64",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 11319,
				"digest": "sha256:c0e053541b0eec2ee002f3862daedf169d40ff5e65d2a11206ebb1dd883acc13"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 3397490,
					"digest": "sha256:8a49fdb3b6a5ff2bd8ec6a86c05b2922a0f7454579ecc07637e94dfd1d0639b6"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 2014629,
					"digest": "sha256:db1d8fde5ab00eb91a63f1f2ceb21e27bf1b3e1bcb622bb9a3828478273df8cb"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 90,
					"digest": "sha256:f3759a44eb9f3e3577a539da15a245f36a7436c3bcc726a0a4c0e677c835a6ba"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 14117997,
					"digest": "sha256:1465fd8ca4ab3bbf2ff1fd0a79ecbb543f6c91c45c4e6643f12bf8f172af186b"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 16001759,
					"digest": "sha256:95c1bc752a8b6c0d7c1951d978f7a9d88f0473002f2b6fe9c05052dfa1d64144"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 16384833,
					"digest": "sha256:f5ce33fb9a21fd7c58a485b4048bb20b38c48518028ebbc87b46da23e62cd48e"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 548,
					"digest": "sha256:4d37eec86745434c6e5165a35a2012abbfd788ef455e7a6f3aeb282e2ef4839e"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1020,
					"digest": "sha256:49d0e23604c0500f6a93a56a7e0951c8b8264e5822db51d8b4105d6c1a373374"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 167,
					"digest": "sha256:ac56a29f30eb929bd39086c9e11300ca62e2e446ae9d5d557e6723ded96d1ccc"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 7025508,
					"digest": "sha256:0184c8e99504ad3ef831f05b75e81580f06a9620c9b41627c5d92825baee5dcf"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1319,
					"digest": "sha256:23430c064101ef52f3b552e39750e61826d539d4e98b43d88c805e504a591291"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 53901045,
					"digest": "sha256:e2a9b00eaf87e56919b17afcd93f6862875ce653ab4de6d7a3d5c916ef833944"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1048,
					"digest": "sha256:004faac3bc8662333a4426e89bb9524953842efa0d3355697d3e5961fd2abeba"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 2814,
					"digest": "sha256:765a9f2e07176c06faf302ae4fdcbf43e75986f2fed0d7d0ca0973cc54276225"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/docker:20-dind@sha256:f52db26a8460b6e4da8ff56ce235570fcb81fcef685bf1aebaa0290da6494599",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:f52db26a8460b6e4da8ff56ce235570fcb81fcef685bf1aebaa0290da6494599",
			"size": 3251,
			"platform": {
				"architecture": "arm64",
				"os": "linux",
				"variant": "v8"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 11334,
				"digest": "sha256:3abfbe5cf820b38df0a10d53261bc2b376ec5ff6680795a4b3eee1e70aed937d"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 3342848,
					"digest": "sha256:08409d4172603f40b56eb6b76240a1e6bd78baa0e96590dc7ff76c5f1a093af2"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 2025067,
					"digest": "sha256:00b32f8d4d5f789274c3df956d1a71dbda3c0d8eb267b33c53aa6de02b3e49fc"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 92,
					"digest": "sha256:249b647e77d07a61a6a56de29194440506db84b546bd7ad42c73b8f4388a30cd"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 12836224,
					"digest": "sha256:70115aff51274d0d1ff79c0b64a2364768dafee3c4f858d14c65e1653fea4259"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 14441515,
					"digest": "sha256:75265b819afd64316c90682b8f16ecb72b9a828eb3535892f735cfe778ad9ad1"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 14835063,
					"digest": "sha256:7e3009834ea7bfadd539c5f6f8679ff926b95c69041b9490d15d1f5584dccf3f"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 545,
					"digest": "sha256:ceae6ddcdf8c2486b41a2d8c467867d68a46d3dc4cca0b8ee3d1e81b68b87a6a"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1019,
					"digest": "sha256:bf932719bc1435f4e06003bae3823a2eebb71ad2c5e2e5e0327a1f2ce0ac2d29"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 166,
					"digest": "sha256:8b6a26a6efce5f5993747f0eaf938c5ba0ad4989ccdc527d783296447186b423"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 7245237,
					"digest": "sha256:40af53c00db16857c71785cabd9051fe6a84accfd8877f9ace9769152a95c0ec"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1322,
					"digest": "sha256:620f16fecdb17140029feeda0f58e4553c89f382e3f27731fb0ca33cd7d4ee51"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 49320282,
					"digest": "sha256:bddcb496053fdf47ac8750b107cc14ea095ae5957ba8eeec32ec3d5f4857be12"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 1050,
					"digest": "sha256:f14a1a3a1065ccb47908b268e7f5117a9250523b2c989888a893b21c94040496"
				},
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 2815,
					"digest": "sha256:fc5829b166b80d4ae63da17dd529566392c5237f1496f3f9998e844d08d02c69"
				}
			]
		}
	}
]

The pod's spec that was used to obtain the output above:

apiVersion: v1
kind: Pod
metadata:
  name: dind
  labels:
    app: dind
    kube-image-keeper.enix.io/image-caching-policy: "ignore"
spec:
  containers:
    - image: docker:20-dind
      command:
        - "sleep"
        - "604800"
      imagePullPolicy: IfNotPresent
      name: dind
      env:
        # - name: DOCKER_HOST
        #   value: tcp://localhost:2375
        - name: DOCKER_BUILDKIT
          value: "1"
        - name: DOCKER_DRIVER
          value: overlay2
        - name: DOCKER_TLS_CERTDIR
          value: ""
      volumeMounts:
        - name: dind-storage
          mountPath: /var/lib/docker
        - name: docker-socket
          mountPath: /var/run
    - name: docker
      image: docker:20-dind
      securityContext:
        privileged: true
      args:
        - "--tls=false"
        - "--insecure-registry=kube-image-keeper-registry.dev-kube-image-keeper.svc.cluster.local:5000"
      volumeMounts:
        - name: dind-storage
          mountPath: /var/lib/docker
        - name: docker-socket
          mountPath: /var/run
  nodeSelector:
    kubernetes.io/os: "linux"
    kubernetes.io/arch: "arm64"
  volumes:
    - name: dind-storage
      emptyDir: {}
    - name: docker-socket
      emptyDir: {}

Please modify controller deployment manifest to set proxy variables

I am currently installing and testing kube-image-keeper in my corporate environment, and my Kubernetes cluster VMs are hosted behind a proxy. I've attempted to add the http_proxy, https_proxy, and no_proxy environment variables to the controller deployment, but they don't seem to take effect.

Could you please provide additional information? In cases where the image is not available in the local registry, which component is responsible for pulling the image from an upstream registry and pushing it to the local registry? If the controllers are responsible for this task, how can I incorporate the proxy variables into the respective controller pods?

I tried to edit the controller manifest directly, but the controller then crashes with the error below:

2023/10/17 10:19:12 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2023-10-17T10:19:22.373Z ERROR Failed to get API Group-Resources {"error": "Get \"https://10.43.0.1:443/api?timeout=32s\": net/http: TLS handsha
2023-10-17T10:19:22.373Z ERROR setup unable to start manager {"error": "Get \"https://10.43.0.1:443/api?timeout=32s\": net/http: TLS handshake
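
The component that pulls a missing image from the upstream registry and pushes it into the local one is the controller (the cache-manager container), so the controller deployment is indeed where the proxy variables belong. The TLS handshake error above is consistent with API server traffic itself being sent through the proxy, which is why NO_PROXY must cover the in-cluster API server address (10.43.0.1 here) and the cluster-internal suffixes. A minimal sketch, assuming the chart's default namespace and deployment name and a placeholder proxy address:

# Names and the proxy address below are assumptions; adjust them to your environment.
kubectl -n kuik-system set env deployment/kube-image-keeper-controllers \
  HTTP_PROXY=http://proxy.corp.example:3128 \
  HTTPS_PROXY=http://proxy.corp.example:3128 \
  NO_PROXY=10.43.0.1,10.43.0.0/16,.svc,.cluster.local,localhost,127.0.0.1

Note that a change applied with kubectl set env will be overwritten by the next helm upgrade; if the chart exposes extra environment variables for the controllers, setting them through the Helm values is the more durable option.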

sourceImage URL with custom port issue

We are experiencing an issue with caching images from a private GitLab registry operating on a custom port (the registry is hosted at gitlab.example.com:5050). The error encountered is as follows:

Failed to cache image gitlab.example.com-5050/group/image/101-tests:87cb4002, due to the following reason: Get 'https://gitlab.example.com-5050/v2/': dial TCP: lookup gitlab.example.com-5050 on 10.3.0.10:53: no such host found; Get 'http://gitlab.example.com-5050/v2/': dial TCP: lookup gitlab.example.com-5050 on 10.3.0.10:53: no such host found.

It appears that the issue lies in the CachedImage custom resource: the ':' between the registry host and its port is replaced with '-' in the 'sourceImage' field, so the hostname no longer resolves. For reference, here is an example of the resource (redacted), followed by a corrected sketch:

apiVersion: kuik.enix.io/v1alpha1
kind: CachedImage
metadata:
  creationTimestamp: '2023-11-29T14:18:02Z'
  finalizers:
    - cachedimage.kuik.enix.io/finalizer
  generation: 2
  labels:
    kuik.enix.io/repository: 1dbfc7ee32cce8bf16cb315a49504c0ff229eabcf51bd9baf71c6f36
status:
  usedBy:
    count: 1
    pods:
      - namespacedName: test/preview-test-migration-x7bqg
spec:
  pullSecretNames:
    - preview-48131-test-registry
  pullSecretsNamespace: test
  sourceImage: >-
    gitlab.example.com-5050/group/image/101-tests:87cb4002
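
For comparison, a sketch of what would be expected instead: replacing ':' with '-' is only appropriate in the resource name (Kubernetes object names cannot contain ':'), while the 'sourceImage' field must keep the original 'host:port' reference so that DNS resolution and the pull succeed. The resource name below is purely illustrative:

apiVersion: kuik.enix.io/v1alpha1
kind: CachedImage
metadata:
  # Sanitizing ':' to '-' is fine here, since object names cannot contain ':'
  name: gitlab.example.com-5050-group-image-101-tests-87cb4002
spec:
  pullSecretNames:
    - preview-48131-test-registry
  pullSecretsNamespace: test
  # ...but the image reference itself must keep the registry port intact:
  sourceImage: gitlab.example.com:5050/group/image/101-tests:87cb4002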

Feature Request: disable Image Cache based on regex

Hi, thank you very much for creating this tool. It is very useful.

However, I'm having a bit of a hiccup when using this tool with a pod that is watched by argocd-image-updater. The pod consists of multiple containers, mixing public images with an internally-developed image. argocd-image-updater is configured to update the pod automatically whenever the internally-developed image changes, but once kube-image-keeper rewrites the image references, argocd-image-updater no longer recognizes the internally-developed image. I can of course exclude this pod from caching entirely, but that also means the public images won't be cached, and those are the ones that usually take longest to download.

So it would be great if there were a regex-based mechanism to exclude specific images from caching, perhaps by defining the regex in an annotation, as illustrated below.
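
A purely hypothetical shape for such a feature (the annotation key and its semantics below do not exist in kuik today; they only illustrate the request): a per-pod annotation carrying a regex, so that matching images are left untouched while the remaining containers are still cached.

apiVersion: v1
kind: Pod
metadata:
  name: mixed-images
  annotations:
    # Hypothetical annotation: containers whose image matches this regex
    # would not be rewritten (and therefore not cached) by kuik.
    kube-image-keeper.enix.io/ignore-images-regex: '^registry\.internal\.example\.com/.*'
spec:
  containers:
    - name: app        # internal image watched by argocd-image-updater: left as-is
      image: registry.internal.example.com/team/app:1.2.3
    - name: sidecar    # public image: still cached
      image: nicolaka/netshoot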

Metrics

Hi again,

It would be great if there were Prometheus metrics that could be scraped, along with a ServiceMonitor that could be enabled in the Helm chart.
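
A hedged sketch of what enabling that could look like with the Prometheus Operator, assuming the controllers expose a /metrics endpoint on a port named "metrics" and carry the label below (both are guesses for illustration, not the chart's actual values):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-image-keeper-controllers
  namespace: kuik-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-image-keeper   # assumed label, adjust to the chart's actual labels
  endpoints:
    - port: metrics   # assumed port name exposing /metrics
      interval: 30s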

Unable to use kube-image-keeper - Calico in eBPF doesn't support Host Ports

System

  • Ubuntu version: 22.04
  • k3s version: 1.26.1
  • Kube Image Keeper version: 1.0.1
  • Cert-manager version: 1.11.0

Helm values

controllers:
  image:
    repository: quay.io/enix/kube-image-keeper
  webhook:
    objectSelector:
      matchExpressions:
        - key: kube-image-keeper.enix.io/image-cache
          operator: In
          values: ["enabled"]
proxy:
  image:
    repository: quay.io/enix/kube-image-keeper
registry:
  image:
    repository: public.ecr.aws/docker/library/registry
  persistence:
    enabled: true
    storageClass: ceph-filesystem
    size: 20Gi

Test command

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot --labels "kube-image-keeper.enix.io/image-cache=enabled"

YAML of generated pod

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 1a6d1ef12006195eabcb97ea295b9f14ab9eccbd7b8788dc8b64e0dbb2398ee7
    cni.projectcalico.org/podIP: 10.42.152.222/32
    cni.projectcalico.org/podIPs: 10.42.152.222/32
    original-image-tmp-shell: nicolaka/netshoot
  creationTimestamp: "2023-02-06T20:00:03Z"
  finalizers:
  - pod.kuik.enix.io/finalizer
  labels:
    kube-image-keeper.enix.io/image-cache: enabled
    kuik.enix.io/images-rewritten: "true"
  name: tmp-shell
  namespace: default
  resourceVersion: "31901361"
  uid: f77f1bc3-9e3a-4913-8bac-6f6f69be605c
spec:
  containers:
  - image: localhost:7439/nicolaka/netshoot
    imagePullPolicy: Always
    name: tmp-shell
    resources: {}
    stdin: true
    stdinOnce: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    tty: true
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2gp2j
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-0
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 20
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 20
  volumes:
  - name: kube-api-access-2gp2j
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-06T20:00:03Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-06T20:00:03Z"
    message: 'containers with unready status: [tmp-shell]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-06T20:00:03Z"
    message: 'containers with unready status: [tmp-shell]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-06T20:00:03Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: localhost:7439/nicolaka/netshoot
    imageID: ""
    lastState: {}
    name: tmp-shell
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: 'rpc error: code = Unknown desc = failed to pull and unpack image
          localhost:7439/nicolaka/netshoot:latest": failed to resolve reference "localhost:7439/nicolaka/netshoot:latest":
          failed to do request: Head "http://localhost:7439/v2/nicolaka/netshoot/manifests/latest":
          dial tcp 127.0.0.1:7439: connect: connection refused'
        reason: ErrImagePull
  hostIP: 192.168.42.10
  phase: Pending
  podIP: 10.42.152.222
  podIPs:
  - ip: 10.42.152.222
  qosClass: BestEffort
  startTime: "2023-02-06T20:00:03Z"

Events

❯ k describe pod tmp-shell
Name:             tmp-shell
Namespace:        default
Priority:         0
Service Account:  default
Node:             k8s-0/192.168.42.10
Start Time:       Mon, 06 Feb 2023 15:00:03 -0500
Labels:           kube-image-keeper.enix.io/image-cache=enabled
                  kuik.enix.io/images-rewritten=true
Annotations:      cni.projectcalico.org/containerID: 1a6d1ef12006195eabcb97ea295b9f14ab9eccbd7b8788dc8b64e0dbb2398ee7
                  cni.projectcalico.org/podIP: 10.42.152.222/32
                  cni.projectcalico.org/podIPs: 10.42.152.222/32
                  original-image-tmp-shell: nicolaka/netshoot
Status:           Pending
IP:               10.42.152.222
IPs:
  IP:  10.42.152.222
Containers:
  tmp-shell:
    Container ID:
    Image:          localhost:7439/nicolaka/netshoot
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2gp2j (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-2gp2j:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  6s               default-scheduler  Successfully assigned default/tmp-shell to k8s-0
  Normal   Pulling    5s               kubelet            Pulling image "localhost:7439/nicolaka/netshoot"
  Warning  Failed     5s               kubelet            Failed to pull image "localhost:7439/nicolaka/netshoot": rpc error: code = Unknown desc = failed to pull and unpack image "localhost:7439/nicolaka/netshoot:latest": failed to resolve reference "localhost:7439/nicolaka/netshoot:latest": failed to do request: Head "http://localhost:7439/v2/nicolaka/netshoot/manifests/latest": dial tcp 127.0.0.1:7439: connect: connection refused
  Warning  Failed     5s               kubelet            Error: ErrImagePull
  Normal   BackOff    4s (x2 over 5s)  kubelet            Back-off pulling image "localhost:7439/nicolaka/netshoot"
  Warning  Failed     4s (x2 over 5s)  kubelet            Error: ImagePullBackOff

Cached Images

✖ k get cachedimages -A
NAME                                                                                                                CACHED   EXPIRES AT             PODS COUNT   AGE
docker.io-nicolaka-netshoot-latest                                                                                  true                            1            5m57s

Logs of kube-image-keeper

❯ stern -n kuik-system kube-image
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.295Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.296Z	INFO	controller-runtime.manager.controller.pod	adding finalizer	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.607Z	ERROR	controller-runtime.manager.controller.pod	Reconciler error	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "error": "Operation cannot be fulfilled on pods \"tmp-shell\": the object has been modified; please apply your changes to the latest version and try again"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.BackoffUntil
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.JitterUntil
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager k8s.io/apimachinery/pkg/util/wait.UntilWithContext
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.607Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.607Z	INFO	controller-runtime.manager.controller.pod	adding finalizer	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.733Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": ""}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.733Z	INFO	controller-runtime.manager.controller.cachedimage	caching image{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.746Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.748Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "cachedImage": "docker.io-nicolaka-netshoot-latest", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.748Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.748Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.751Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": ""}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.751Z	INFO	controller-runtime.manager.controller.cachedimage	caching image{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "GET /v2/ HTTP/1.1" 200 2 "" "Go-http-client/1.1"
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "HEAD /v2/docker.io/nicolaka/netshoot/manifests/latest HTTP/1.1" 200 3258 "" "go-containerregistry/v0.6.0"
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.741363587Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=b2526ee2-a358-4873-8959-3533d2c42b7b http.request.method=GET http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=2.899653ms http.response.status=200 http.response.written=2
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.745966131Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=a68b214a-54ea-4245-9e6c-de5d3b0df0ae http.request.method=HEAD http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/docker.io/nicolaka/netshoot/manifests/latest" http.request.useragent="go-containerregistry/v0.6.0" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=4.100674ms http.response.status=200 http.response.written=3258
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.758Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "cachedImage": "docker.io-nicolaka-netshoot-latest", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.758Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.763Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "GET /v2/ HTTP/1.1" 200 2 "" "Go-http-client/1.1"
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "HEAD /v2/docker.io/nicolaka/netshoot/manifests/latest HTTP/1.1" 200 3258 "" "go-containerregistry/v0.6.0"
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "GET /v2/ HTTP/1.1" 200 2 "" "Go-http-client/1.1"
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.767Z	INFO	controller-runtime.manager.controller.cachedimage	reconciled cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.768Z	INFO	controller-runtime.manager.controller.cachedimage	reconciling cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": ""}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.768Z	INFO	controller-runtime.manager.controller.cachedimage	caching image{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.772Z	INFO	controller-runtime.manager.controller.cachedimage	image already present in cache, ignoring	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:17.778Z	INFO	controller-runtime.manager.controller.cachedimage	reconciled cachedimage	{"reconciler group": "kuik.enix.io", "reconciler kind": "CachedImage", "name": "docker.io-nicolaka-netshoot-latest", "namespace": "", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-0 kube-image-keeper 10.42.152.247 - - [06/Feb/2023:20:03:17 +0000] "HEAD /v2/docker.io/nicolaka/netshoot/manifests/latest HTTP/1.1" 200 3258 "" "go-containerregistry/v0.6.0"
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.759935355Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=5b7191a5-634d-4038-9d8c-6e1856e280db http.request.method=GET http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=6.610741ms http.response.status=200 http.response.written=2
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.763121611Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=7c3b9363-6ea0-4bd2-8349-ce35d67d5a08 http.request.method=HEAD http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/docker.io/nicolaka/netshoot/manifests/latest" http.request.useragent="go-containerregistry/v0.6.0" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=2.824996ms http.response.status=200 http.response.written=3258
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.770377364Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=c4f8fed4-48d6-4874-b423-23e67f68c0c3 http.request.method=GET http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration="517.371µs" http.response.status=200 http.response.written=2
kube-image-keeper-0 kube-image-keeper time="2023-02-06T20:03:17.772022859Z" level=info msg="response completed" go.version=go1.16.15 http.request.host="kube-image-keeper-service:5000" http.request.id=09051b63-1f64-480b-8db1-f9f0ea281de2 http.request.method=HEAD http.request.remoteaddr="10.42.152.247:39176" http.request.uri="/v2/docker.io/nicolaka/netshoot/manifests/latest" http.request.useragent="go-containerregistry/v0.6.0" http.response.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.response.duration=1.416951ms http.response.status=200 http.response.written=3258
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:18.122Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:18.146Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "cachedImage": "docker.io-nicolaka-netshoot-latest", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:18.146Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:19.173Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:19.200Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "cachedImage": "docker.io-nicolaka-netshoot-latest", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:19.200Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:20.181Z	INFO	controller-runtime.manager.controller.pod	reconciling pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:20.204Z	INFO	controller-runtime.manager.controller.pod	cachedimage patched	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default", "cachedImage": "docker.io-nicolaka-netshoot-latest", "sourceImage": "nicolaka/netshoot"}
kube-image-keeper-controllers-6c5b6d4d47-66p4c cache-manager 2023-02-06T20:03:20.204Z	INFO	controller-runtime.manager.controller.pod	reconciled pod	{"reconciler group": "", "reconciler kind": "Pod", "name": "tmp-shell", "namespace": "default"}
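
The ErrImagePull above matches the issue title: the kubelet pulls the rewritten image through localhost:7439, which relies on the kuik proxy DaemonSet publishing a host port on every node, and Calico's eBPF data plane does not implement host ports, so nothing answers on 127.0.0.1:7439. A quick way to confirm this from an affected node (namespace and port are the ones used elsewhere in this report):

# The proxy DaemonSet and its pods are running...
kubectl -n kuik-system get daemonset
kubectl -n kuik-system get pods -o wide
# ...yet from the node itself nothing listens on the published host port:
curl -sI http://localhost:7439/v2/ || echo "host port 7439 unreachable on this node"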

How to figure out the number of calls the controller is making to a registry?

We installed kuik on our AKS cluster, and the dashboard also works. Can I use a metric to figure out the number of calls made to the registry?

Earlier, we used kube-fledged, which cost us money because it made too many calls to JFrog. To avoid that with kuik, I would like to see metrics on how many times it has called a registry and downloaded an image into the cache.
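
A hedged sketch of the kind of query such metrics would enable; the metric name below is a placeholder and would have to be replaced with whatever the controllers actually expose on their /metrics endpoint:

# PromQL, hypothetical metric name: upstream pulls performed while filling the cache, per registry, over one day
sum by (registry) (increase(kuik_controller_images_cached_total[24h]))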
