
Deploy OKD4 (OpenShift) on Hetzner Cloud

License: MIT License


hcloud-okd4's Introduction


hcloud-okd4

Deploy OKD4 (OpenShift) on Hetzner Cloud using Hashicorp Packer, Terraform and Ansible.

Current status

The Hetzner Cloud does not fulfill the I/O performance/latency requirements of etcd, even when using local SSDs (instead of Ceph storage). This can cause various problems during cluster bootstrap. You can check the I/O performance with etcdctl check perf.
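If etcdctl is not at hand, a crude fsync-latency probe with dd can give a first impression (a sketch, not the official check; etcd's documented guidance is that the 99th percentile of WAL fsync should stay below roughly 10 ms, and 512 bytes approximates a small WAL write):

```shell
# Write 1000 x 512-byte blocks, syncing each write to disk (oflag=dsync).
# dd's summary line reports throughput; very low values here hint that
# etcd's fsync latency requirements will not be met on this volume.
dd if=/dev/zero of=etcd-probe.bin bs=512 count=1000 oflag=dsync
rm -f etcd-probe.bin
```

This is only a rough screen; for a real verdict, run etcdctl check perf (or fio with fdatasync) against the actual etcd data directory.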

Because of this, OpenShift on hcloud is only suitable for small test environments. Please do not use it for production clusters.

Architecture

The deployment defaults to a single node cluster.

  • 1x Master Node (CX41)
  • 1x Loadbalancer (LB11)
  • 1x Bootstrap Node (CX41) - deleted after cluster bootstrap
  • 1x Ignition Node (CX11) - deleted after cluster bootstrap

Usage

Build toolbox

To ensure a proper build environment, we first create a toolbox container.

make fetch
make build

If you do not want to build the container yourself, it is also available on quay.io.
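A minimal sketch of pulling the prebuilt image (the exact image path and tag on quay.io are assumptions, not confirmed by this README — check the registry for the real ones):

```shell
# Pull the prebuilt toolbox instead of building it locally.
# NOTE: image path/tag below are assumptions; verify on quay.io.
docker pull quay.io/slauger/hcloud-okd4:latest
```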

Run toolbox

Use the following command to start the container.

make run

All the following commands will be executed inside the container.

Set Version

Set a target version, or use the latest_version target to fetch the latest available version.

export OPENSHIFT_RELEASE=$(make latest_version)
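Alternatively, pin an explicit release instead of resolving the latest one (the version string is only an example):

```shell
# Pin a known release instead of querying for the latest one.
export OPENSHIFT_RELEASE=4.12.0-0.okd-2023-03-05-022504
echo "$OPENSHIFT_RELEASE"
```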

Create your install-config.yaml

---
apiVersion: v1
baseDomain: 'example.com'
metadata:
  name: 'okd4'
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR:
platform:
  none: {}
pullSecret: '{"auths":{"none":{"auth": "none"}}}'
sshKey: ssh-rsa AABBCC... Some_Service_User
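A quick sanity check of the file before generating manifests can catch indentation mistakes early (a sketch; assumes python3 with PyYAML is available, which is not guaranteed inside the toolbox):

```shell
# Parse install-config.yaml; prints OK only when the YAML is well-formed.
python3 -c 'import yaml; yaml.safe_load(open("install-config.yaml")); print("install-config.yaml: OK")'
```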

Create cluster manifests

make generate_manifests

Create ignition config

make generate_ignition

Set required environment variables

# terraform variables
export TF_VAR_dns_domain=okd4.example.com
export TF_VAR_dns_zone_id=14758f1afd44c09b7992073ccf00b43d

# credentials for hcloud
export HCLOUD_TOKEN=14758f1afd44c09b7992073ccf00b43d14758f1afd44c09b7992073ccf00b43d

# credentials for cloudflare
export [email protected]
export CLOUDFLARE_API_KEY=14758f1afd44c09b7992073ccf00b43d

Create Fedora CoreOS image

Build a Fedora CoreOS hcloud image with Packer and embed the hcloud user data source (http://169.254.169.254/hetzner/v1/userdata).

make hcloud_image
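Once a server boots from the resulting image, the embedded Ignition source simply reads the user data from Hetzner's metadata service; from inside a running instance this can be inspected with:

```shell
# The metadata endpoint is only reachable from inside a hcloud server.
curl -s http://169.254.169.254/hetzner/v1/userdata
```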

Build infrastructure with Terraform

make infrastructure BOOTSTRAP=true

Wait for the bootstrap to complete

make wait_bootstrap

Cleanup bootstrap and ignition node

make infrastructure

Finish the installation process

make wait_completion

Sign Worker CSRs

CSRs of the master nodes are signed automatically by the bootstrap node during cluster bootstrap. CSRs from worker nodes must be signed manually.

make sign_csr
sleep 60
make sign_csr

This step is not necessary if you set replicas_worker to zero.
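The make target is typically a thin wrapper around approving pending CSRs with oc; a manual equivalent (a sketch, assuming a working kubeconfig — the exact Makefile recipe may differ):

```shell
# Approve all pending CSRs; run again after ~60s, because node-serving
# CSRs only appear once the node-client CSRs have been approved.
oc get csr -o name | xargs -r oc adm certificate approve
```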

Deployment of OCP

It is also possible to deploy OCP (with Red Hat CoreOS) instead of OKD. Just export DEPLOYMENT_TYPE=ocp. For example:

export DEPLOYMENT_TYPE=ocp
export OPENSHIFT_RELEASE=4.6.35
make fetch build run

You can also select the latest version from a specific channel via:

export OCP_RELEASE_CHANNEL=stable-4.11
export OPENSHIFT_RELEASE=$(make latest_version)
make fetch build run

To set up OCP, a pull secret in your install-config.yaml is required; it can be obtained from cloud.redhat.com.

Enforce Firewall rules

Because the Terraform module from Hetzner is currently unable to apply firewall rules that reference hosts deployed at the same time, you have to apply the rules afterwards.

To do that, visit the Hetzner Web Console and apply the okd-master firewall rule to all hosts with the label okd.io/master: true, okd-base to hosts with the label okd.io/node: true, and okd-ingress to all nodes with the okd.io/ingress: true label. Since Terraform ignores firewall changes, this will not interfere with your existing state.

Note: This keeps hosts pingable but isolates them completely from the internet, making the cluster reachable only through the load balancer. If you require direct SSH access, you can add another rule to those nodes that allows access to port 22.

Cloudflare API Token

Check out this issue for details on how to obtain an API token for the Cloudflare API.


hcloud-okd4's People

Contributors

dependabot-preview[bot], dependabot[bot], ebartz, nohal, renovate-bot, schuemann, sisheogorath, slauger, vvro


hcloud-okd4's Issues

Stuck on TASK [wait until ssh is available]

Hi there,
So I'm trying to use this repo to set up OKD4 on hcloud.
The Cloudflare and Hetzner tokens are correct, and resources get created.

But it's stuck waiting for SSH to become available, and when I go into the server console on hcloud, the requests to api-int. and ignition. don't work.

What version of okd shall I use? Were there any breaking changes?

Which platform to select when running make generate_manifests

Hi,

I'm trying to create an OKD4 cluster on hcloud, but I don't know what I have to select when I run 'make generate_manifests'.

bash-5.1# make generate_manifests
mkdir config
cp install-config.yml config/install-config.yml
openshift-install create manifests --dir=config
? SSH Public Key <none>
? Platform  [Use arrows to move, enter to select, type to filter, ? for more help]
> aws
  azure
  gcp
  libvirt
  openstack
  ovirt
  vsphere

Can you help me?

missing variable CLOUDFLARE_API_TOKEN

Hello there,

Great guide, only missing important info about the API token creation. I just spent a good 24 hours trying to get this to work - hope this helps others avoid the same pain!

  1. Create an API token following the instructions from here:
  2. From the Hashicorp guide:
  • Create an environment variable named CLOUDFLARE_API_TOKEN and set it to your Cloudflare API token.
    $ export CLOUDFLARE_API_TOKEN=Oo-bF...
  3. On the container the following variables are required:
export TF_VAR_dns_domain=example.com
export TF_VAR_dns_zone_id=myzoneid
export HCLOUD_TOKEN=my_hetzner_cloud_token
export [email protected]
export CLOUDFLARE_API_KEY=cloudflare_GLOBAL_api_key
  4. This additional variable is required:
export CLOUDFLARE_API_TOKEN=Oo-bF...
  5. If you don't export the CLOUDFLARE_API_TOKEN variable, the Cloudflare API does not create DNS records and you get the following error when running make infrastructure BOOTSTRAP=true:
Error: failed to create DNS record: error from makeRequest: HTTP status 403: Authentication error
 
   with cloudflare_record.dns_a_apps_wc,
   on dns.tf line 34, in resource "cloudflare_record" "dns_a_apps_wc":
   34: resource "cloudflare_record" "dns_a_apps_wc" {

Thank you

TTL issues with default hcloud resolvers

The default resolvers from hcloud have a very strange caching mechanism and ignore the TTL from Cloudflare. A newly created record (e.g. api-int.cluster-id.basedomain.tld) only becomes resolvable after about 60 minutes. This record is required for the initial boot of all master/worker nodes, so we need to somehow override the DNS servers provided by DHCP and switch to custom DNS servers / Cloudflare (during the first boot!).
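One possible direction (an assumption, not a tested fix from this repo) is a NetworkManager drop-in shipped via the Ignition config, since Fedora CoreOS manages networking with NetworkManager:

```ini
# /etc/NetworkManager/conf.d/90-custom-dns.conf
# Override the DHCP-provided hcloud resolvers with Cloudflare's
# for all domains, so freshly created records resolve immediately.
[global-dns-domain-*]
servers=1.1.1.1,1.0.0.1
```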

Not able to create Image

Hello,

any idea how to fix this issue during the image creation?

==> hcloud: Provisioning with shell script: /tmp/packer-shell2847920557
==> hcloud: + mkdir /source
==> hcloud: + mount -t tmpfs -o size=2G none /source
==> hcloud: + cd /source
==> hcloud: + curl -sfL
==> hcloud: + unxz
==> hcloud: curl: no URL specified!
==> hcloud: curl: try 'curl --help' or 'curl --manual' for more information
==> hcloud: unxz: (stdin): File format not recognized
==> hcloud: Provisioning step had errors: Running the cleanup provisioner, if present...
==> hcloud: Destroying server...
==> hcloud: Deleting temporary SSH key...
Build 'hcloud' errored after 1 minute 12 seconds: Script exited with non-zero exit status: 1. Allowed exit codes are: [0]

make toolbox not working (anymore?)

Hello,

first: thank you for opening up your achievements to the world! 👍

I just cloned this repo and started from scratch (as written in readme), and on the second command "make toolbox" I got following error:

[root@void hcloud-okd4]# make toolbox
make: *** No rule to make target 'toolbox'.  Stop.

I'd guess it is just not there anymore.

Kind Regards,
Nico

Stuck at DNS entry for etcd

Fantastic project. Thanks for the great effort.

Today I faced a problem with creating DNS entries at cloudflare.

Error: Failed to create record: error from makeRequest: HTTP status 400: content "{\"result\":null,\"success\":false,\"errors\":[{\"code\":1004,\"message\":\"DNS Validation Error\",\"error_chain\":[{\"code\":9101,\"message\":\"service is a required data field.\"}]}],\"messages\":[]}"

on dns.tf line 52, in resource "cloudflare_record" "dns_srv_etcd":
52: resource "cloudflare_record" "dns_srv_etcd" {
make: *** [Makefile:86: infrastructure] Error 1
bash-5.0#

The previous commit worked like a charm.

Any hint?

Resolve bootstrap issue

The API on the bootstrap node is available, but the bootstrap gets stuck. Needs further debugging.

Output of openshift-install

openshift-install --dir=config/ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.4.0-0.okd-2020-03-21-112849
DEBUG Built from commit fc790034704d5e279eabacd833d3e90c76815978
INFO Waiting up to 20m0s for the Kubernetes API at https://api.ocp4.example.com:6443...
INFO API v1.17.1 up
INFO Waiting up to 40m0s for bootstrapping to complete...

Cloudflare Loadbalancer

2020-03-22 22 46 23

Log of bootstrap node

...
MΓ€r 22 22:21:37 static.73.XXX.YYY.ZZ.clients.your-server.de hyperkube[1139]: E0322 22:21:37.918962
1139 pod_workers.go:191] Error syncing pod caffd628b85147b99baa533a31992b96
("bootstrap-kube-controller-manager-static.73.XXX.YYY.ZZ.clients.your-server.de_kube-system(caffd628b85147b99baa533a31992b96)"),
skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "back-off 5m0s restarting failed
container=kube-controller-manager 
pod=bootstrap-kube-controller-manager-static.73.XXX.YYY.ZZ.clients.your-server.de_kube-system(caffd628b85147b99baa533a31992b96)"⁣
...

Terraform Error: Failed to query available provider packages

First thanks for your work on this project ! :)

When running make infrastructure BOOTSTRAP=true I get the following error okd.log
:

...
Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/hcloud: provider registry registry.terraform.io does not have a
provider named registry.terraform.io/hashicorp/hcloud


Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/cloudflare: provider registry registry.terraform.io does not have a
provider named registry.terraform.io/hashicorp/cloudflare

make: *** [Makefile:86: infrastructure] Error 1

I already tried some changes to the Terraform versions.tf, but without success.

Provide a license file

It seems that part of the idea of this repository is to be able to learn from and extend it. Therefore it would be nice if it provided a license file, so it's clear what to expect from the repository.

For example, I love to mirror useful repositories, and I'm currently looking into this repository to extend it with hcloud firewall rules and the like, which I would love to contribute back. But for now, I can't do anything with it without a license. :/

adjusting replicas does not seem to work

Hi Simon,

first of all, this project is really awesome! Thanks for putting all this together.

One thing that does not seem to work for me is having more nodes. I thought that it would be done by specifying a number of workers/masters like this:

---
apiVersion: v1
baseDomain: 'example.com'
metadata:
  name: 'okd4'
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 4
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
  machineCIDR:
platform:
  none: {}
pullSecret: '{"auths":{"none":{"auth": "none"}}}'
sshKey: ssh-rsa AABBCC... Some_Service_User

But that did not do the trick for me. Am I missing something?

make fetch doesnt work

root@dev-server:~/hcloud-okd4# make fetch
wget -O openshift-install-linux-none.tar.gz https://github.com/openshift/okd/releases/download/none/openshift-install-linux-none.tar.gz
--2022-08-31 10:02:15-- https://github.com/openshift/okd/releases/download/none/openshift-install-linux-none.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/okd-project/okd/releases/download/none/openshift-install-linux-none.tar.gz [following]
--2022-08-31 10:02:15-- https://github.com/okd-project/okd/releases/download/none/openshift-install-linux-none.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 404 Not Found
2022-08-31 10:02:16 ERROR 404: Not Found.

make wait_bootstrap fails: FATAL loading kubeconfig

Hello @slauger,

thanks again for creating this! (And fixing the docs, great work!)

I found another issue.

On the current master, the Makefile instructs "wait_bootstrap" to:

wait_bootstrap:
        openshift-install --dir=config/ wait-for bootstrap-complete --log-level=debug

But the kubeconfig is saved in ignition/.

I'll probably open a PR later today, but for now I locally changed "--dir=config/" to "--dir=ignition/", which works totally fine.

Thanks again for your great work!

Kind Regards,
Nico

Dependency updates for OKD/OCP releases

When a new OKD/OCP version is released, the defined version in the Makefile should be updated automatically.

OKD_RELEASE=4.5.0-0.okd-2020-07-29-070316
FCOS_RELEASE=32.20200715.3.0
OCP_RELEASE=4.5.4
RHCOS_RELEASE=4.5.2

Example code for obtaining the latest release of OKD:

-bash$ curl -sL --header 'Accept:application/json' https://origin-release.svc.ci.openshift.org/graph\?channel\=stable | jq '.nodes[0]'
{
  "version": "4.4.0-0.okd-2020-08-20-195325",
  "payload": "registry.svc.ci.openshift.org/origin/release@sha256:4ac5ae870017264ac3842b654a9dc4617e11efb2c5b6d0c12d4c52e85830835f"
}

Ignition fails with error creating file resolv.conf

After starting make wait_bootstrap with COREOS_RELEASE=36.20220820.3.0, the initial boot fails and stops in emergency mode, visible only on the Hetzner console:

CRITICAL: Ignition failed: failed to create files: failed to create files: error creating /sysroot/etc/resolv.conf: error creating file "/sysroot/etc/resolv.conf": A non regular file exists there already an overwrite is false

Generating "/run/initramfs/rdosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdosreport.txt" to a USB stick or /boot after mounting them and attach it to a bug report.

Press Enter for maintenance
(or press Control-D to continue)

There is a link /sysroot/etc/resolv.conf -> ../run/systemd/resolve/stub-resolve.conf.

Workaround: Press Enter, delete this link and reboot:

rm /sysroot/etc/resolv.conf
reboot
exit

packer does not deploy authorized_keys to image

When calling make infrastructure BOOTSTRAP=true, the machines bootstrap01, ignition01 and master01 are being created.

After waiting for the SSH connection to be available, the SSH connection fails with Permission denied (publickey).

After investigating, I couldn't find any place where the SSH key would be set in the base image by packer during make hcloud_image.

Can anybody tell me what I am missing?

master bootstrapping failing - no admitted ingress for route

Hello,

it seems like the installation process is failing at the master bootstrapping step.
Any idea what could cause this issue of a master node not willing to complete bootstrapping? (Left it running for > 1day and it still wasn't "finished")

I do appreciate any help!

Container logs of the master node are attached
master-container-logs.zip

bash-5.1# cat install-config.yaml
apiVersion: v1
baseDomain: k8s.hnbg.elsysweyr.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}

  replicas: 3
metadata:
  creationTimestamp: null
  name: prod-hnbg-public-services
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.200.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
publish: External
pullSecret: '*pullsecret removed*'
sshKey: |
 '*sshkeys removed*' 

This command below seems to be hanging without returning.

[core@master01 ~]$ journalctl -b -f -u bootkube.service

No result for this command:

[core@master01 ~]$ for pod in $(sudo podman ps -a -q); do sudo podman logs $pod; done
[core@master01 ~]$ 

Installation CLI log:

openshift-install --dir=ignition/ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.12.0-0.okd-2023-03-05-022504 
DEBUG Built from commit 7c2530226516a12c37f10bc14e070f66c0f27930 
INFO Waiting up to 20m0s (until 9:28PM) for the Kubernetes API at https://api.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com:6443... 
DEBUG Loading Agent Config...                      
DEBUG Still waiting for the Kubernetes API: Get "https://api.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com:6443/version": EOF 
INFO API v1.25.0-2655+18eadcaadf0be7-dirty up     
DEBUG Loading Install Config...                    
DEBUG   Loading SSH Key...                         
DEBUG   Loading Base Domain...                     
DEBUG     Loading Platform...                      
DEBUG   Loading Cluster Name...                    
DEBUG     Loading Base Domain...                   
DEBUG     Loading Platform...                      
DEBUG   Loading Networking...                      
DEBUG     Loading Platform...                      
DEBUG   Loading Pull Secret...                     
DEBUG   Loading Platform...                        
DEBUG Using Install Config loaded from state file  
INFO Waiting up to 30m0s (until 9:41PM) for bootstrapping to complete... 

E0319 21:22:25.249802    6491 reflector.go:140] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: Get "https://api.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=11315&timeoutSeconds=529&watch=true": http2: client connection lost - error from a previous attempt: unexpected EOF
W0319 21:23:41.366132    6491 reflector.go:347] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

W0319 21:34:24.516370    6491 reflector.go:347] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0319 21:36:11.202971    6491 reflector.go:347] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server 
ERROR OAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com in route oauth-openshift in namespace openshift-authentication 
ERROR OAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication 
ERROR OAuthServerDeploymentDegraded:               
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: route "openshift-authentication/oauth-openshift": status does not have a valid host address 
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.103.225:443/healthz": dial tcp 172.30.103.225:443: connect: connection refused 
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready 
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this) 
ERROR Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.103.225:443/healthz": dial tcp 172.30.103.225:443: connect: connection refused 
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found 
ERROR ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 1 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods). 
ERROR WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this) 
INFO Cluster operator baremetal Disabled is False with :  
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected 
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected 
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected 
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected 
ERROR Cluster operator console Degraded is True with DefaultRouteSync_FailedAdmitDefaultRoute::RouteHealth_RouteNotAdmitted::SyncLoopRefresh_FailedIngress: DefaultRouteSyncDegraded: no ingress for host console-openshift-console.apps.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com in route console in namespace openshift-console 
ERROR RouteHealthDegraded: console route is not admitted 
ERROR SyncLoopRefreshDegraded: no ingress for host console-openshift-console.apps.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com in route console in namespace openshift-console 
ERROR Cluster operator console Available is False with RouteHealth_RouteNotAdmitted: RouteHealthAvailable: console route is not admitted 
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required 
ERROR Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.) 
INFO Cluster operator ingress Progressing is True with Reconciling: ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 2 updated replica(s) are available... 
INFO ).                                           
INFO Not all ingress controllers are available.   
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-756d8b77f9-qxqhm" cannot be scheduled: 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling. Pod "router-default-756d8b77f9-g777c" cannot be scheduled: 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller) 
INFO Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:  
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer 
INFO Cluster operator insights Disabled is False with AsExpected:  
INFO Cluster operator insights SCAAvailable is Unknown with :  
ERROR Cluster operator kube-controller-manager Degraded is True with GarbageCollector_Error: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host 
ERROR Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas 
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas 
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
INFO Cluster operator network ManagementStateDegraded is False with :  
INFO Cluster operator network Progressing is True with Deploying: Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready 
ERROR Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline 
INFO Use the following commands to gather logs from the cluster 
INFO openshift-install gather bootstrap --help    
ERROR Bootstrap failed to complete: timed out waiting for the condition 
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane. 
make: *** [Makefile:99: wait_bootstrap] Error 5
bash-5.1# 
bash-5.1# make wait_bootstrap
openshift-install --dir=ignition/ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.12.0-0.okd-2023-03-05-022504 
DEBUG Built from commit 7c2530226516a12c37f10bc14e070f66c0f27930 
INFO Waiting up to 20m0s (until 10:02PM) for the Kubernetes API at https://api.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com:6443... 
DEBUG Loading Agent Config...                      
INFO API v1.25.0-2655+18eadcaadf0be7-dirty up     
DEBUG Loading Install Config...                    
DEBUG   Loading SSH Key...                         
DEBUG   Loading Base Domain...                     
DEBUG     Loading Platform...                      
DEBUG   Loading Cluster Name...                    
DEBUG     Loading Base Domain...                   
DEBUG     Loading Platform...                      
DEBUG   Loading Networking...                      
DEBUG     Loading Platform...                      
DEBUG   Loading Pull Secret...                     
DEBUG   Loading Platform...                        
DEBUG Using Install Config loaded from state file  
INFO Waiting up to 30m0s (until 10:12PM) for bootstrapping to complete... 
^Cmake: *** [Makefile:99: wait_bootstrap] Interrupt

bash-5.1# make wait_bootstrap
openshift-install --dir=ignition/ wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer 4.12.0-0.okd-2023-03-05-022504 
DEBUG Built from commit 7c2530226516a12c37f10bc14e070f66c0f27930 
INFO Waiting up to 20m0s (until 10:03PM) for the Kubernetes API at https://api.prod-hnbg-public-services.k8s.hnbg.elsysweyr.com:6443... 
DEBUG Loading Agent Config...                      
INFO API v1.25.0-2655+18eadcaadf0be7-dirty up     
DEBUG Loading Install Config...                    
DEBUG   Loading SSH Key...                         
DEBUG   Loading Base Domain...                     
DEBUG     Loading Platform...                      
DEBUG   Loading Cluster Name...                    
DEBUG     Loading Base Domain...                   
DEBUG     Loading Platform...                      
DEBUG   Loading Networking...                      
DEBUG     Loading Platform...                      
DEBUG   Loading Pull Secret...                     
DEBUG   Loading Platform...                        
DEBUG Using Install Config loaded from state file  
INFO Waiting up to 30m0s (until 10:13PM) for bootstrapping to complete... 

Running the following command on the extracted logs should give you a good overview:

[core@master01 ~]$ sudo tail -f /var/log/containers/* | grep -e "\(error\|fail\)"
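The grep pattern above keeps any line containing "error" or "fail". A minimal, self-contained sketch of the same filter on made-up sample lines (the file path and log content here are purely illustrative):

```shell
#!/bin/sh
# Write a few hypothetical container log lines, then apply the same
# grep pattern used above to keep only error/failure lines.
printf '%s\n' \
  'I0305 pod started' \
  'E0305 error: connection refused' \
  'W0305 failed to pull image' > /tmp/sample-container.log

grep -e "\(error\|fail\)" /tmp/sample-container.log
```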

Packer builder missing

When building the hcloud image, the build fails with the following error:

make hcloud_image
if [ "okd" == "okd" ]; then (cd packer && packer build -var fcos_url=https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/37.20221127.3.0/x86_64/fedora-coreos-37.20221127.3.0-qemu.x86_64.qcow2.xz hcloud-fcos.json); fi
Error: Failed to initialize build "hcloud"

error initializing builder 'hcloud': Unknown builder hcloud


make: *** [Makefile:89: hcloud_image] Error 1

To fix this, you need to install the plugin first:

packer plugins install github.com/hashicorp/hcloud

Question for @slauger: should this be part of the Dockerfile or of the Makefile?
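One possible answer, as a hedged sketch rather than the project's confirmed approach: bake the plugin into the toolbox image, so every container started from it can build immediately. Assuming the Dockerfile already installs Packer in an earlier layer, a single additional instruction would suffice:

```dockerfile
# Sketch only: pre-install the hcloud builder plugin at image build time
# so `make hcloud_image` works out of the box inside the toolbox.
# Assumes packer itself is installed in an earlier layer of this Dockerfile.
RUN packer plugins install github.com/hashicorp/hcloud
```

Putting it in the Dockerfile keeps the Makefile target idempotent and avoids a network fetch on every build run; the Makefile alternative would re-run the install (a no-op once cached) before each `packer build`.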

make infrastructure fails

The make infrastructure BOOTSTRAP=true command fails with the following errors:

╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_ignition[0],
│   on dns.tf line 1, in resource "cloudflare_record" "dns_a_ignition":
│    1: resource "cloudflare_record" "dns_a_ignition" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_api,
│   on dns.tf line 10, in resource "cloudflare_record" "dns_a_api":
│   10: resource "cloudflare_record" "dns_a_api" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_api_int,
│   on dns.tf line 18, in resource "cloudflare_record" "dns_a_api_int":
│   18: resource "cloudflare_record" "dns_a_api_int" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_apps,
│   on dns.tf line 26, in resource "cloudflare_record" "dns_a_apps":
│   26: resource "cloudflare_record" "dns_a_apps" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_apps_wc,
│   on dns.tf line 34, in resource "cloudflare_record" "dns_a_apps_wc":
│   34: resource "cloudflare_record" "dns_a_apps_wc" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_a_etcd[0],
│   on dns.tf line 42, in resource "cloudflare_record" "dns_a_etcd":
│   42: resource "cloudflare_record" "dns_a_etcd" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with cloudflare_record.dns_srv_etcd[0],
│   on dns.tf line 52, in resource "cloudflare_record" "dns_srv_etcd":
│   52: resource "cloudflare_record" "dns_srv_etcd" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with module.bootstrap.cloudflare_record.dns-a[0],
│   on modules/hcloud_coreos/dns.tf line 1, in resource "cloudflare_record" "dns-a":
│    1: resource "cloudflare_record" "dns-a" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with module.master.cloudflare_record.dns-a[0],
│   on modules/hcloud_coreos/dns.tf line 1, in resource "cloudflare_record" "dns-a":
│    1: resource "cloudflare_record" "dns-a" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with module.ignition.cloudflare_record.dns-a[0],
│   on modules/hcloud_instance/main.tf line 16, in resource "cloudflare_record" "dns-a":
│   16: resource "cloudflare_record" "dns-a" {
│
╵
╷
│ Error: failed to create DNS record: error from makeRequest: HTTP status 400: content "{\"success\":false,\"errors\":[{\"code\":7003,\"message\":\"Could not route to \\/zones\\/CLOUDFLARE_API_TOKEN\\/dns_records, perhaps your object identifier is invalid?\"},{\"code\":7000,\"message\":\"No route for that URI\"}],\"messages\":[],\"result\":null}"
│
│   with module.ignition.cloudflare_record.dns-aaaa[0],
│   on modules/hcloud_instance/main.tf line 25, in resource "cloudflare_record" "dns-aaaa":
│   25: resource "cloudflare_record" "dns-aaaa" {
│
╵
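Note that the request URL in these errors contains the literal string CLOUDFLARE_API_TOKEN where the Cloudflare zone ID should appear, which suggests the zone ID was set to the token name (or the variable was never expanded) instead of the zone's actual identifier. As a hedged sketch of what the records need (the variable and resource names here are illustrative, not necessarily those used by this repository's dns.tf):

```hcl
# Illustrative sketch only - names and values are assumptions.
# zone_id must be the zone's 32-character hex ID (shown on the zone's
# Overview page in the Cloudflare dashboard), NOT the API token that is
# used for authentication.
variable "cloudflare_zone_id" {
  type        = string
  description = "Cloudflare zone ID (32-char hex), not the API token"
}

resource "cloudflare_record" "dns_a_api" {
  zone_id = var.cloudflare_zone_id
  name    = "api"
  type    = "A"
  value   = "203.0.113.10" # example address
}
```

The API token authenticates the request; the zone ID routes it to /zones/&lt;zone_id&gt;/dns_records, which is why a wrong identifier yields the "No route for that URI" response above.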
