canonical / cluster-api-bootstrap-provider-microk8s

This project forked from alexsjones/cluster-api-bootstrap-provider-microk8s

19 stars · 4 watchers · 14 forks · 16.03 MB

This project offers a cluster API bootstrap provider controller that manages the node provision of a MicroK8s cluster.

Home Page: https://microk8s.io

Languages: Go 88.59%, Makefile 4.30%, Dockerfile 0.55%, Shell 6.56%
Topics: cluster-api, kubernetes, microk8s

cluster-api-bootstrap-provider-microk8s's People

Contributors

alexsjones, beliaev-maksim, eaudetcobello, ktsakalozos, lferran, neoaggelos, oscr, sachinkumarsingh092, skatsaounis, vishvikkrishnan


cluster-api-bootstrap-provider-microk8s's Issues

Add-ons Specified in Manifest Are Not Installing

In the manifest templates, under kind: MicroK8sControlPlane -> spec.controlPlaneConfig.initConfiguration.addons, I believe I am specifying the values correctly for the add-ons listed on this documentation page. However, the vast majority of the add-ons I add to the manifest do not install correctly, or at all (with the exception of the ones used in the samples). Even when running microk8s enable <addon-name> directly on the master node, some manual intervention is needed to satisfy prerequisites; for example, Mayastor requires HugePages to be enabled before installation.

Therefore, would it be possible to add the capability to run an array of custom commands from the CAPI manifest, both before and after MicroK8s is installed on the initial control plane node? This would let such add-ons be installed correctly straight from the CAPI manifest, removing the manual steps currently needed on the initial master node. A sketch of what this could look like is shown below.
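For reference, the shape requested here matches the preRunCommands and postRunCommands fields under initConfiguration that appear in the MicroK8sControlPlane manifest of a later issue on this page. A minimal sketch, with the HugePages step as a hypothetical example of an add-on prerequisite:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: MicroK8sControlPlane
metadata:
  name: example-control-plane
spec:
  controlPlaneConfig:
    initConfiguration:
      addons:
      - dns
      # Commands run before MicroK8s is installed on the node
      preRunCommands:
      - |
        echo 1024 > /proc/sys/vm/nr_hugepages
      # Commands run after MicroK8s is installed on the node
      postRunCommands:
      - |
        touch /var/run/postRunCommands.done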

Allow HTTPS for Snap Store Proxy Configuration in CAPI MicroK8s Bootstrap Script

Currently, in the preruncmd script for configuring the Snap Store proxy within the Cluster API MicroK8s Bootstrap Provider, the Snap Store proxy domain is hardcoded to use HTTP. I propose that this should be configurable to allow the use of HTTPS as well.

Steps to Reproduce

Deploy a Kubernetes cluster using Cluster API with MicroK8s.
Configure the Snap Store proxy using the snapstoreProxyDomain and snapstoreProxyId parameters.
Observe the preruncmd script attempting to use HTTP to configure the Snap Store proxy.

Expected Behavior

The preruncmd script should allow the Snap Store proxy to be configured using HTTPS. The URL should be fully configurable as part of the CRD.
Actual Behavior

The preruncmd script currently hardcodes the use of HTTP when configuring the Snap Store proxy:

while ! curl -sL http://"${1}"/v2/auth/store/assertions | snap ack /dev/stdin ; do
  echo "Failed to ACK store assertions, will retry"
  sleep 5
done

Proposed Change

Modify the preruncmd script to allow the full URL to be specified as part of the CRD, enabling the use of HTTPS if desired. For example, the script could be updated to:

while ! curl -sL "${1}"/v2/auth/store/assertions | snap ack /dev/stdin ; do
  echo "Failed to ACK store assertions, will retry"
  sleep 5
done

This change allows users to specify either HTTP or HTTPS in the snapstoreProxyDomain parameter.
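For illustration, a hedged sketch of what the CRD could look like once the full URL is configurable; it reuses the initConfiguration fields shown in the manifests later on this page, and the URL and store ID below are placeholders:

initConfiguration:
  # Full URL including the scheme, instead of a bare domain
  snapstoreProxyDomain: "https://airgapped-snaps.capi.demo"
  snapstoreProxyId: "<store-id>"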

Cluster Secrets <cluster-name>-ca and <cluster-name>-jointoken not generating

Using the MicroK8s bootstrap and MicroK8s control plane providers with the vSphere infrastructure provider, the <cluster-name>-ca and <cluster-name>-jointoken secrets are not generated. Is this expected behavior? The only logs I see say that the <cluster-name>-ca secret is missing, but nothing explains why it was not generated. I did not see this behavior when using AWS as the infrastructure provider.

When I run kubectl describe cluster <cluster-name>, I see a status of Waiting for control plane provider to indicate the control plane has been initialized.

Using a Multipass VM on Ubuntu 22.04
kubectl client version: v1.28.3
kubectl server version: v1.27.6
clusterctl version: v1.5.3
microk8s version: v1.27.6 revision 6070
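A quick way to check whether the bootstrap provider created the secrets at all (a diagnostic sketch; substitute the real cluster name and namespace, and note that the controller namespace and deployment names assume a default clusterctl installation):

# List the CA and join-token secrets the bootstrap provider is expected to create
kubectl get secret <cluster-name>-ca <cluster-name>-jointoken -n <namespace>
# Inspect the bootstrap provider controller logs for errors
kubectl logs -n capi-microk8s-bootstrap-system deployment/capi-microk8s-bootstrap-controller-manager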

modules:final is never executed in Cloud-init of Multi-node Microk8s cluster

We disable the default Calico and then use Helm to install Cilium as the CNI layer.
When deploying a 3-control-plane-node MicroK8s cluster, the first node comes up correctly. cloud-init-output logs:

[2024-05-13 22:09:59] Cloud-init v. 22.4.2-0ubuntu0~22.04.1 running 'modules:config' at Mon, 13 May 2024 22:09:59 +0000. Up 10.82 seconds.
[2024-05-13 22:09:59] Begin run command: snap set system proxy.http="http://<PRIVATE_IP>:8000/" proxy.https="http://PRIVATE_IP:8000/"
[2024-05-13 22:10:00] End run command: exit(0)
[2024-05-13 22:10:01] Cloud-init v. 22.4.2-0ubuntu0~22.04.1 running 'modules:final' at Mon, 13 May 2024 22:10:01 +0000. Up 13.24 seconds.
[2024-05-13 22:10:01] + /capi-scripts/00-disable-host-services.sh

However, the second and the third node get stuck in cloud init. cloud-init-output logs:

[2024-05-13 20:24:31] Cloud-init v. 22.4.2-0ubuntu0~22.04.1 running 'modules:config' at Mon, 13 May 2024 20:24:31 +0000. Up 11.66 seconds.
[2024-05-13 20:24:31] Begin run command: snap set system proxy.http="http://PRIVATE_IP:8000/" proxy.https="http://PRIVATE_IP:8000/"
[2024-05-13 20:24:32] End run command: exit(0)

According to the cloud-init log of the first node, Cloud-init v. 22.4.2-0ubuntu0~22.04.1 running 'modules:final' should appear after End run command: exit(0).
Checking microk8s status on both the 2nd and 3rd nodes shows:

microk8s is not running. Use microk8s inspect for a deeper inspection.
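On the stuck nodes, the following generic cloud-init checks can help narrow down where modules:final stalls (a diagnostic sketch, not specific to this provider):

# Show which cloud-init stage is currently running or blocked
cloud-init status --long
# Follow the boot output as it progresses (or stops progressing)
tail -f /var/log/cloud-init-output.log
# Full module-level trace
less /var/log/cloud-init.log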

ProviderID not updated on worker nodes

It is the job of the MicroK8s control plane provider to set the ProviderID on the nodes so that they move from the Provisioned to the Running state. Sometimes, however, this happens too slowly, or not at all, for the worker nodes, for example in a single-master, single-node cluster.
A good solution would be to move this task from the control plane to the bootstrap provider.
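The symptom can be confirmed by listing the providerID each node currently carries (a quick sketch; any node without spec.providerID set stays in the Provisioned state):

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER-ID:.spec.providerID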

Race errors while deleting the cluster

Summary

Sometimes, while machines in the cluster are being deleted during scale-down[1], the final machine (particularly a control plane machine) gets stuck in the Deleting phase. This can be traced back to the node the machine is associated with being deleted without the nodeRef being passed on. This hinders deletion of the machine, since there is no reference to the node when it finally comes to its proper cleanup.

This does not happen often, so it is most probably a race condition. I tested and observed this on OpenStack, so we need to dig into it further, test on different clouds, and try to reproduce it.
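When a machine gets stuck this way, the missing nodeRef is visible directly on the Machine object (a diagnostic sketch):

kubectl get machines -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,NODE:.status.nodeRef.name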

Unable to create CAPI clusters with microk8s 1.28 version

When creating a 3-control-plane, 1-worker node cluster, the first CP node comes up fine but the other two CP machines stay in the Provisioning state. In the AWS console, all three CP machines are created.
(Screenshot: AWS console showing the three control plane machines, 2023-10-17.)

The same happens for a 1-CP, 1-worker node cluster: the CP node comes up fine and the worker node never joins the cluster. When logging into the worker node, microk8s status says that the node has joined the master.

Let me know if any logs are required

Migrate logger from `zap` to `klog`

Proposal: Migrate the logger from zap to klog, the industry-standard logger used in Cluster API (CAPI) and all the other provider (CAPx) projects. Using klog gives more granular control over logging, since it supports the --v verbosity flag in the component's arguments. klog will also bring this provider more in line with the other CAPx projects.

Generating MicroK8s manifests with clusterctl

Hi everybody,
I tried to use the MicroK8s bootstrap provider together with the Hetzner infrastructure provider. However, when using clusterctl generate, the generated manifests used kubeadm rather than MicroK8s. Do we need to enable feature flags to use this? It would be great if we could figure this out; I'd be happy to document it in a PR afterwards. The documentation only uses pregenerated manifests, so it is not very descriptive on this point.

Thanks in advance!

Use InitFlags() for command line flags

Proposal: In the current implementation, command-line flags such as MetricsBindAddress, LeaderElection, etc. are initialized directly in the main() function. We can use a separate InitFlags() function to initialize all the command-line flags. This approach will make it easier to add more command-line flags in the future, will help keep the main() function clean, and will bring this provider more in line with the other CAPx projects.

snap store proxy configuration not getting applied on CAPI worker nodes

The cluster spec is as follows:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: microk8s-maas
  namespace: default
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: MicroK8sControlPlane
    name: microk8s-maas-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: MaasCluster
    name: microk8s-maas
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: MaasCluster
metadata:
  name: microk8s-maas
  namespace: default
spec:
  dnsDomain: maas
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: MicroK8sControlPlane
metadata:
  name: microk8s-maas-control-plane
  namespace: default
spec:
  controlPlaneConfig:
    clusterConfiguration:
      portCompatibilityRemap: true
    initConfiguration:
      IPinIP: true
      snapstoreProxyDomain: "airgapped-snaps.capi.demo"
      snapstoreProxyId: "Rn0rGwTAE8uRE28ZOgSXiluKZXgMVKzi"
      addons:
      - dns
      joinTokenTTLInSecs: 9000
      preRunCommands:
      - |
        touch preRunCommands.done
      postRunCommands:
      - |
        touch postRunCommands.done
  machineTemplate:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: MaasMachineTemplate
      name: microk8s-maas-control-plane
  replicas: 3
  version: v1.27.11
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: MaasMachineTemplate
metadata:
  name: microk8s-maas-control-plane
  namespace: default
spec:
  template:
    spec:
      image: ubuntu/jammy
      minCPU: 1
      minMemory: 2048
      resourcePool: null
      tags:
      - controller
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: microk8s-maas-md-0
  namespace: default
spec:
  clusterName: microk8s-maas
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: MicroK8sConfigTemplate
          name: microk8s-maas-md-0
      clusterName: microk8s-maas
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: MaasMachineTemplate
        name: microk8s-maas-md-0
      version: 1.27.11
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: MaasMachineTemplate
metadata:
  name: microk8s-maas-md-0
  namespace: default
spec:
  template:
    spec:
      image: ubuntu/jammy
      minCPU: 1
      minMemory: 2048
      resourcePool: null
      tags:
      - worker
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: MicroK8sConfigTemplate
metadata:
  name: microk8s-maas-md-0
  namespace: default
spec:
  template:
    spec:
      initConfiguration:
        snapstoreProxyDomain: "airgapped-snaps.capi.demo"
        snapstoreProxyId: "Rn0rGwTAE8uRE28ZOgSXiluKZXgMVKzi"

However, on the worker nodes, cloud-init-output.log indicates that the snap store proxy was not configured:

+ /capi-scripts/00-configure-snapstore-http-proxy.sh  
+ [[ '' != '' ]]
+ [[ '' != '' ]]
+ /capi-scripts/00-configure-snapstore-proxy.sh  
+ '[' 2 -ne 2 ']'
+ '[' -z '' ']'
+ echo 'Using the default snapstore'
Using the default snapstore
+ exit 0

This feature works as expected on the control plane nodes.

It should be possible to configure the snap store proxy on the worker nodes as well.
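Whether the proxy actually got applied on a given node can be checked with snap itself (a sketch, assuming the proxy is registered through the standard snap set core proxy.store mechanism):

# Prints the configured store ID; if no proxy is configured, snap reports the option as unset
snap get core proxy.store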

Support watchNamespace flag for controller

The current implementation of the bootstrap-microk8s controller does not support a watchNamespace flag. This has no effect on clusters deployed using the upstream deployment model, where the controllers and webhooks are a singleton instance.
However, in setups where controllers and webhooks are separate entities and a separate controller runs per namespace, this creates issues: there is no way to restrict a controller to watching objects in its own namespace only. This can lead to reconcilers from different namespaces creating resources in each other's namespaces.

Supporting watchNamespace will be helpful for downstream use and will bring this provider more in line with the other CAPx providers.

Fix the issue with the secrets using clusterctl 1.5

ISSUE DESCRIPTION

Using the versions below:

cluster-api capi-system CoreProvider v1.2.4 v1.5.0
infrastructure-openstack capo-system InfrastructureProvider v0.7.1 v0.7.3

The cluster faces the issue below due to PR [0]:

[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},} watch on cluster default/test-ci-cluster: failed to create cluster accessor: error fetching REST client config for remote cluster \"default/test-ci-cluster\": failed to retrieve kubeconfig secret for Cluster default/test-ci-cluster: Secret \"test-ci-cluster-kubeconfig\" not found" controller="machine" controllerGroup="cluster.x-k8s.io"

Workaround

Downgrade the providers using clusterctl:

clusterctl upgrade apply --core cluster-api:v1.2.4
clusterctl upgrade apply --infrastructure openstack:v0.7.1

Fix

As per the doc [1], the MicroK8s bootstrap provider needs to be fixed to use the capiv1beta1.ClusterNameLabel label in order to work with the latest version of clusterctl.

[0] kubernetes-sigs/cluster-api#8940
[1] kubernetes-sigs/cluster-api#9080

Enable specifying machine tags for CAPI-MAAS infrastructure cluster template

The cluster template for MAAS does not allow specifying machine tags to target a node for the cluster deployment; currently it is only possible to specify the resource pool. However, this feature is already available on the infra provider - ref: spectrocloud/cluster-api-provider-maas@bb8204b

The cluster template and .rc files for MAAS need to allow specifying machine tags for the control-plane and worker machines. Example for worker machine tag:

cluster-template-maas.rc:

export WORKER_MACHINE_TAGS="cloud,compute"

cluster-template-maas.yaml:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: MaasMachineTemplate
metadata:
  name: microk8s-maas-md-0
  namespace: default
spec:
  template:
    spec:
      image: ubuntu/jammy
      minCPU: 1
      minMemory: 2048
      resourcePool: bare-metal-pool
      tags:
      - cloud
      - compute
