
Kubernetes Cluster API Provider Proxmox (CAPPVE)


What is Cluster API Provider Proxmox

Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management.

Proxmox is an open-source virtualization platform for running QEMU VMs and LXC containers.

The Cluster API Provider Proxmox allows a Proxmox host or cluster to respond to infrastructure requests from Cluster API configurations.

Project Status

This project is currently under active development. Breaking changes may occur, so please use it at your own risk.

Getting Started

Before installation, check the Requirements section to ensure your Proxmox environment is ready.

Installation

Add the following provider configuration to ~/.cluster-api/clusterctl.yaml

providers:
  - name: "proxmox"
    url: "https://github.com/launchboxio/cluster-api-provider-proxmox/releases/latest/infrastructure-components.yaml"
    type: "InfrastructureProvider"

We can then install the provider alongside Cluster API:

clusterctl init --infrastructure proxmox

Requirements

  • NFS storage volume for snippets (#8)
  • Proxmox deployed as a cluster (#9)

Credentials

Proxmox API

export PM_API_URL=https://proxmox:8006/api2/json
export PM_API_TOKEN_ID=user@pve!cluster-api
export PM_API_TOKEN_SECRET="xxxxxxxxxxx-xxxxx-xxxx-xxxx"
export NAMESPACE="my-cluster"

kubectl create secret generic proxmox \
  --from-literal=api_url="${PM_API_URL}" \
  --from-literal=token_id="${PM_API_TOKEN_ID}" \
  --from-literal=token_secret="${PM_API_TOKEN_SECRET}" \
  -n "${NAMESPACE}"

Storage

export HOST="0.0.0.0:22"
export USER="user_id"
export PASS="password"
export NAMESPACE="my-cluster"

kubectl create secret generic storage-access \
  --from-literal=host="${HOST}" \
  --from-literal=user="${USER}" \
  --from-literal=password="${PASS}" \
  -n "${NAMESPACE}"

Template

This provider requires that a base cloud-init template be created, which it can use to start and configure Kubernetes nodes. At the moment, only Ubuntu 22.04 has been tested, but other Ubuntu versions may work.

SSH to one of the Proxmox nodes and perform the following:

export TEMPLATE_ID=XXXX
export STORAGE="storage"
export DISK_SIZE="32G"

wget https://cloud-images.ubuntu.com/releases/jammy/release/ubuntu-22.04-server-cloudimg-amd64.img # Ubuntu 22.04 (jammy), the tested version
qm create "${TEMPLATE_ID}" --memory 2048 --net0 virtio,bridge=vmbr0 # Change other configuration if needed
qm importdisk "${TEMPLATE_ID}" ubuntu-22.04-server-cloudimg-amd64.img "${STORAGE}"
qm set "${TEMPLATE_ID}" --scsihw virtio-scsi-pci --scsi0 "${STORAGE}:vm-${TEMPLATE_ID}-disk-0"
qm set "${TEMPLATE_ID}" --serial0 socket --vga serial0
qm set "${TEMPLATE_ID}" --ide2 "${STORAGE}:cloudinit"
qm set "${TEMPLATE_ID}" --boot c --bootdisk scsi0
qm resize "${TEMPLATE_ID}" scsi0 "${DISK_SIZE}"
qm template "${TEMPLATE_ID}" # Convert the VM into a reusable template

This template can then be used as a base for launching the Kubernetes nodes


cluster-api-provider-proxmox's Issues

Validate any major / minor increases also update metadata.yaml

Commits with feat() or BREAKING will increase the minor or major version, per semantic-release. This requires an additional update to metadata.yaml so that the new version is described as satisfying the required CAPI contract.

We should have a job that does something like:

  • semantic-release --dry-run (to parse commits for new expected version)
  • verify that metadata.yaml has a corresponding contract section
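A rough sketch of such a check, assuming semantic-release's dry run prints a "next release version" line and metadata.yaml lists releaseSeries entries with adjacent major/minor keys (both assumptions about the exact formats):

NEXT=$(npx semantic-release --dry-run 2>&1 \
  | sed -n 's/.*[Tt]he next release version is \([0-9][0-9.]*\).*/\1/p')
[ -n "${NEXT}" ] || { echo "No release pending, nothing to verify"; exit 0; }
MAJOR=$(echo "${NEXT}" | cut -d. -f1)
MINOR=$(echo "${NEXT}" | cut -d. -f2)
# Fail if metadata.yaml has no releaseSeries entry for the upcoming version
grep -A1 "major: ${MAJOR}" metadata.yaml | grep -q "minor: ${MINOR}" || {
  echo "metadata.yaml is missing a contract entry for v${MAJOR}.${MINOR}" >&2
  exit 1
}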

Double VM Creation

Once in a while, the operator creates a VM, fails to create the instance due to "changes were made to this object", and after reconciliation it creates yet another VM.

We need to better encapsulate VM creation and storage of the ID to prevent orphaned nodes
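One possible shape for the fix, sketched below: record the new VM's ID on the ProxmoxMachine status and persist it before any step that can conflict, so a re-reconcile adopts the existing VM instead of cloning another. The Status.VmID field and the cloneTemplate / waitForTask helpers are illustrative, not the provider's actual API:

if machine.Status.VmID == 0 {
  // No VM recorded yet: clone one, then persist its ID immediately.
  vmid, task, err := r.cloneTemplate(ctx, machine) // hypothetical helper
  if err != nil {
    return ctrl.Result{}, err
  }
  machine.Status.VmID = vmid
  if err := r.Status().Update(ctx, machine); err != nil {
    // A conflict here retries before a second VM can ever be created.
    return ctrl.Result{}, err
  }
  return r.waitForTask(ctx, machine, task) // hypothetical helper
}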

Error when unlinking drive

The following error occurs whenever a VM is deleted. The deletion is able to reconcile fully without issue, but this error shows up once in the logs. We should identify the cause and resolve it.

"bad request: 400 Parameter verification failed. - {\"ide2\":\"unable to apply pending change ide2 : command '/usr/bin/ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/<redacted>_id_rsa root@<redacted> zfs destroy -r default/vm-102-cloudinit' failed: exit code 1\\n\"}"}

Pluggable snippet storage

Snippet storage is currently only set up for the ZFS over iSCSI backend used in the lab. The storage layer should be abstracted to support any storage engine supported by Proxmox.

Question: Should we support any non-shared storage options?
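A sketch of what that abstraction could look like (interface and method names are illustrative; context is from the standard library):

// SnippetStorage abstracts where rendered cloud-init snippets live, so
// backends other than the lab's ZFS over iSCSI can be plugged in.
type SnippetStorage interface {
  // WriteSnippet stores content under name and returns the volume
  // reference Proxmox should use (e.g. "local:snippets/<name>").
  WriteSnippet(ctx context.Context, name string, content []byte) (string, error)
  // DeleteSnippet removes a previously written snippet.
  DeleteSnippet(ctx context.Context, name string) error
}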

Clone to target node in one step

When a VM is created but not yet finished initializing, the operator may move the node if it re-reconciles the resource. Rather than create, migrate, then initialize, we should add the target node to the clone() operation:

task, err := template.Clone(&proxmox.VirtualMachineCloneOptions{
  // ...
  Target: targetNode,
})

Creating VM: Operation timed out

The following error occurs sometimes after VM creation

2023-09-20T09:30:42-04:00       INFO    Creating VM     {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"testing-control-plane-s2p2b","namespace":"default"}, "namespace": "default", "name": "testing-control-plane-s2p2b", "reconcileID": "ef5c9622-6993-44d2-bd1e-2d63d8df2bdd"}
2023-09-20T09:30:53-04:00       ERROR   Task didn't complete in time    {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"testing-control-plane-s2p2b","namespace":"default"}, "namespace": "default", "name": "testing-control-plane-s2p2b", "reconcileID": "ef5c9622-6993-44d2-bd1e-2d63d8df2bdd", "error": "the operation has timed out"}

This should be handled by either adopting the created VM appropriately or destroying it and reattempting reconciliation. As-is, it leaves orphaned VMs running on the Proxmox hosts.
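A sketch of the adoption path (the helper names and the timeout sentinel are illustrative): on a wait timeout, requeue and re-inspect rather than treating the VM as never created:

if err := waitForClone(ctx, task); err != nil {
  if errors.Is(err, errTaskTimeout) { // hypothetical sentinel for the timeout
    // The clone may still finish; requeue and re-check the task status on
    // the next reconcile instead of cloning a second VM.
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
  }
  // Any other failure: destroy the half-created VM so reconciliation can
  // start from a clean slate.
  return ctrl.Result{}, destroyVM(ctx, machine) // hypothetical helper
}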

Worker nodes

Control plane nodes are currently being deployed (albeit slowly). Worker node configurations need to be updated so that workers come online and register.

Install qemu-agent

The qemu-guest-agent should be installed on instances so that IP addresses and other information can be reported to Proxmox.
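Two pieces are likely needed, sketched here; the exact integration points are assumptions:

# On the Proxmox template: expose the agent device to the guest
qm set "${TEMPLATE_ID}" --agent enabled=1

# In the generated cloud-init user-data: install and start the agent
#cloud-config
packages:
  - qemu-guest-agent
runcmd:
  - systemctl enable --now qemu-guest-agent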

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the main branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, the semantic-release usage docs, FAQ, and support channels can help. If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Missing package.json file.

A package.json file at the root of your project is required to release on npm.

Please follow the npm guideline to create a valid package.json file.


Good luck with your project ✨

Your semantic-release bot 📦🚀
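Since this is a Go project that doesn't publish to npm, one common resolution (an assumption here, not taken from this repository) is to declare semantic-release's plugin list explicitly so the npm plugin is never loaded, e.g. in a .releaserc.json:

{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/github"
  ]
}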

Non-clustered Proxmox support

By default the operator uses /cluster/ endpoints for getting information about the environment. If a user wants to use a non-clustered deployment, we should support all operations against a single node.

LXC Investigation

For the sake of simplicity, the operator currently only supports launching instances with QEMU. Proxmox supports running LXC containers, which may be a better alternative for running Kubernetes instances.

We should investigate the requirements and configuration to support launching Kubernetes images through LXC

Store infrastructure metadata in /var/lib/cloud

Most userdata scripts use variables like {{ ds.meta_data["instance-id"] }} to reference infrastructure settings. We should add the following parameters to the discoverable metadata (a sketch follows the list):

  • Instance ID: vm.ID or proxmox://{{ vm.ID }}
  • Datacenter: cluster.Name
  • Machine ID: machine.ID
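For illustration, the snippet written to /var/lib/cloud could then look like this (keys mirror the list above; exact names are up for discussion):

instance-id: proxmox://{{ vm.ID }}
datacenter: {{ cluster.Name }}
machine-id: {{ machine.ID }}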

Cluster Templates

As part of the Cluster API contract, we should provide several ProxmoxCluster templates for users to generate with clusterctl generate cluster (a usage example follows the list):

  • Simple base cluster
  • Deploying with SSH keys
  • Deploying with multiple networks
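With templates published, generating a cluster would look something like this (the --flavor names would match the proposed templates; none are released yet):

clusterctl generate cluster my-cluster \
  --infrastructure proxmox \
  --flavor multi-network \
  --kubernetes-version v1.27.3 \
  --control-plane-machine-count 3 \
  --worker-machine-count 3 > my-cluster.yaml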

Validate PRs have updated manifests when required

If types are updated but make manifests isn't run locally, the Go types and the generated CRDs end up out of sync.

There should be a CI step that executes make manifests and verifies there is no diff.
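A sketch of that step, assuming the generated CRDs live under the usual operator-sdk config/ directory:

make manifests
git diff --exit-code -- config/ || {
  echo "generated manifests are stale: run 'make manifests' and commit" >&2
  exit 1
}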

Volume cleanup

When VMs are deleted, TrueNAS (used for ZFS over iSCSI) fails to remove the created vm-XXX-cloudinit storage volume due to "dataset is busy" errors. We had added a time.Sleep(time.Second * 10) to try to wait it out, but volumes are still being orphaned. This then causes issues when launching new VMs that re-use an ID.

We should ensure that when ProxmoxMachines are cleaned up, the cloud-init volumes are also removed from the given storage
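Instead of a fixed sleep, the cleanup could poll until the dataset is free, e.g. with apimachinery's wait helpers (deleteCloudInitVolume and isDatasetBusy are hypothetical helpers):

err := wait.PollUntilContextTimeout(ctx, 2*time.Second, time.Minute, true,
  func(ctx context.Context) (bool, error) {
    err := deleteCloudInitVolume(ctx, vmID)
    if err == nil {
      return true, nil // volume removed
    }
    if isDatasetBusy(err) {
      return false, nil // still busy; keep polling
    }
    return false, err // a real failure; stop
  })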

Cluster Deletion Protection

To prevent accidental deletion of a cluster, users should be able to add protection to the cluster.

When a cluster is created, if spec.deletionProtection: true, then we attach a no-op finalizer to the cluster. When a delete request occurs on the cluster, we simply return from reconcile.

To delete the cluster, a user must first update the cluster with spec.deletionProtection: false, which will then allow reconciliation to complete.
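A sketch of the guard in the reconciler (the finalizer name and the Spec.DeletionProtection field are illustrative; persisting the finalizer change via r.Update is elided):

const deletionProtectionFinalizer = "infrastructure.cluster.x-k8s.io/deletion-protection"

if !cluster.DeletionTimestamp.IsZero() {
  if cluster.Spec.DeletionProtection {
    // Deletion requested on a protected cluster: refuse to proceed until
    // spec.deletionProtection is set to false.
    return ctrl.Result{}, nil
  }
  // Protection lifted: drop the no-op finalizer so deletion completes.
  controllerutil.RemoveFinalizer(cluster, deletionProtectionFinalizer)
} else if cluster.Spec.DeletionProtection {
  controllerutil.AddFinalizer(cluster, deletionProtectionFinalizer)
}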

We also need to document the deletion semantics: once metadata.deletionTimestamp is set it cannot be removed, so the finalizer can only block a pending deletion, not reverse it.

Handle VM creation failure

Occasionally when VMs are cloned in Proxmox, they can fail due to invalid configurations on the VM.

  • These failures should be captured and handled so that reconciliation can recover
  • Errors should be reported to the ProxmoxMachine CRD

Event logging

Actions performed by the operator should be stored as events on the ProxmoxMachine resource
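A sketch using the standard client-go event recorder (wiring r.Recorder into the reconciler, and the corev1/fmt imports, are assumed):

r.Recorder.Event(machine, corev1.EventTypeNormal, "VMCreated",
  fmt.Sprintf("created VM %d on node %s", vmID, targetNode))
r.Recorder.Event(machine, corev1.EventTypeWarning, "CloneFailed", err.Error())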

Node selection mechanism

Currently, all VMs in a given deployment use the same machine template, which has a single spec.targetNode. If this field is left blank, the controller should have a configurable mechanism for selecting the appropriate node to target
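One candidate default, sketched below, is a "most free memory" strategy over the nodes the Proxmox API reports (the NodeStatus type and its fields stand in for whatever the client library returns):

// selectTargetNode picks the node with the most unused memory.
func selectTargetNode(nodes []NodeStatus) string {
  var best string
  var bestFree uint64
  for _, n := range nodes {
    if free := n.MaxMem - n.Mem; free > bestFree {
      best, bestFree = n.Node, free
    }
  }
  return best
}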
