redhat-cop / ocp4-helpernode
ocp4-helpernode's Introduction

OCP4 Helper Node

Red Hat support cannot assist with problems with this repo. For issues, please open a GitHub issue.

This playbook helps set up an "all-in-one" node, that has all the infrastructure/services in order to install OpenShift 4. After you run the playbook, you'll be ready to begin the installation process.

A lot of OpenShift 4 specific jargon is used throughout this doc, so please visit the official documentation page to get familiar with OpenShift 4.

⚠️ This playbook was originally written with a bare metal UPI install in mind

This playbook assumes the following:

  1. You're on a Network that has access to the internet.
  2. The network you're on does NOT have DHCP (you can disable installing DHCP on the helper).
  3. The ocp4-helpernode will be your LB/DHCP/PXE/DNS and HTTP server.
  4. You still have to do the OpenShift Install steps by hand.
  5. I used CentOS 7/8, but RHEL 7/8 will work as well.
  6. You will be running the openshift-install command from the ocp4-helpernode.

Below is a high-level diagram of how the ocp4-helpernode fits into your network.

(Diagram: the ocp4-helpernode and its services in the network)

It's important to note that you can delegate DNS to this ocp4-helpernode if you don't want to use it as your main DNS server. You will have to delegate $CLUSTERID.$DOMAIN to this helper node.

For example, if you want a $CLUSTERID of ocp4 and you have a $DOMAIN of example.com, then you will delegate ocp4.example.com to this ocp4-helpernode.
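If your upstream DNS server is BIND, that delegation is just an NS record (plus a glue A record) in the parent zone. A minimal sketch, assuming the helper hostname and the 192.168.7.77 address used in the examples later on this page:

; in the parent example.com zone -- delegate ocp4.example.com to the helper
ocp4     IN  NS  helper.example.com.
helper   IN  A   192.168.7.77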

Using this playbook

The following are high-level steps on how to use this playbook. There are more detailed instructions in the "quickstarts" section.

Prereqs

⚠️ NOTE If using RHEL 7, you will need to enable the rhel-7-server-rpms and the rhel-7-server-extras-rpms repos. If you're using RHEL 8, you will need to enable rhel-8-for-x86_64-baseos-rpms, rhel-8-for-x86_64-appstream-rpms, and ansible-2.9-for-rhel-8-x86_64-rpms.
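On RHEL, enabling those repos is done with subscription-manager; for example, on RHEL 8:

subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
subscription-manager repos --enable=ansible-2.9-for-rhel-8-x86_64-rpms

(Use the RHEL 7 repo IDs listed above instead if you're on RHEL 7.)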

Install a CentOS 7 or CentOS 8 server with this recommended setup:

  • 4 vCPUs
  • 4 GB of RAM
  • 30GB HD
  • Static IP

There is a sample kickstart file for EL 7 and EL 8 that is used during testing, if you'd like to automate the initial install of the OS.

Once the base OS is installed, install EPEL

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(rpm -E %rhel).noarch.rpm

Next install ansible and git, then clone this repo.

yum -y install ansible git
git clone https://github.com/redhat-cop/ocp4-helpernode
cd ocp4-helpernode

Setup your Environment Vars

Inside that dir there is an example vars.yaml file under docs/examples/vars.yaml; copy it and modify it to match your network and environment (the example one assumes a /24).

cp docs/examples/vars.yaml .

⚠️ NOTE, currently this playbook assumes/is limited to a /24 network

See the vars.yaml documentation page for more info about what you can define. There are different options, depending on what you're doing. For example, if you're doing a static ip install vs a dhcp install.
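For orientation, a trimmed-down vars.yaml looks roughly like the following. The values are illustrative placeholders taken from the examples elsewhere on this page; the real example file carries more masters/workers and additional options.

disk: vda
helper:
  name: "helper"
  ipaddr: "192.168.7.77"
  networkifacename: "eth1"
dns:
  domain: "example.com"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "52:54:00:a5:50:2c"
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "52:54:00:2a:44:b3"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "52:54:00:5d:50:ed"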

Run the playbook

Once you've edited your vars.yaml file, run the playbook:

ansible-playbook -e @vars.yaml tasks/main.yml

Helper Script

You can run this script and its options to display helpful information about the install and other post-install goodies.

/usr/local/bin/helpernodecheck
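The script takes a subcommand; the ones referenced elsewhere on this page include install-info, nfs-info, and nfs-setup (treat this as an illustrative rather than exhaustive list). For example:

/usr/local/bin/helpernodecheck install-info
/usr/local/bin/helpernodecheck nfs-info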

Install OpenShift 4 UPI

Now you're ready to follow the OCP4 UPI install doc

Quickstarts

The following are quickstarts. These are written using libvirt, but are generic enough to be used in BareMetal or other Virtualized Environments.

Contributing

Please see the contributing doc for more details.

ocp4-helpernode's People

Contributors

arcprabh, bpradipt, canit00, christianh814, cs-zhang, danielvaknin, etoews, gauthiersiri, jforce, krishvoor, lautou, miabbott, operst, prajyot-parab, prb112, ptux, rarguello, rcarrata, rishikumarray, sajauddin, salanisor, samueltauil, schabrolles, sydefree, therevoman, tosmi, tribalnightowl, walidshaari, yussufsh, zmc

ocp4-helpernode's Issues

RHCOS installation for OCP 4.3 fails at first boot

Hi Team,

While booting the RHCOS machine for OCP 4.3, the system shows the following messages and the boot fails with these errors:

can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
iscsid[606]: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName.
iscsid[606]: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi

dracut-initqueue[271]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[271]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[271]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[271]: Warning: dracut-initqueue timeout - starting timeout scripts

Master node fails to come up due to SSL_ERROR_SYSCALL

Hi,
I am trying a UPI deployment with a mix of VMs (bootstrap and master) and bare metals (workers).
All of the VMs and the bare metal are on the same network, however, only the helper node and worker nodes have an additional interface with direct access to the external network.

So far, my bootstrap node is coming up just fine; however, when I try to bring up the master node after installation, it hangs with the following console logs:

[ 35.681457] systemd[1]: Starting dracut pre-mount hook...
Starting dracut pre-mount hook...
[ 35.698322] systemd[1]: Started dracut pre-mount hook.
[ OK ] Started dracut pre-mount hook.
[ 36.785229] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #11
[ 36.789469] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF
[ *** ] A start job is running for Ignition (fetch) (38s / no limit)[ 41.790268] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #12
[ 41.794543] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF
[** ] A start job is running for Ignition (fetch) (43s / no limit)[ 46.795065] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #13
[ 46.799328] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF
[ ***] A start job is running for Ignition (fetch) (48s / no limit)[ 51.800174] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #14
[ 51.804033] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF
[ *] A start job is running for Ignition (fetch) (53s / no limit)[ 56.804746] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #15
[ 56.810427] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF
[    ] A start job is running for Ignition (fetch) (58s / no limit)[ 61.806363] ignition[706]: GET https://api-int.ocp4.example.com:22623/config/master: attempt #16
[ 61.810235] ignition[706]: GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF

On the helper node, I tried making the same request as that of the master node:
[root@helper ocp4]# curl -vk https://api-int.ocp4.example.com:22623/

  • Trying 192.168.67.130...
  • TCP_NODELAY set
  • Connected to api-int.ocp4.example.com (192.168.67.130) port 22623 (#0)
  • ALPN, offering h2
  • ALPN, offering http/1.1
  • successfully set certificate verify locations:
  • CAfile: /etc/pki/tls/certs/ca-bundle.crt
    CApath: none
  • TLSv1.3 (OUT), TLS handshake, Client hello (1):
  • OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api-int.ocp4.example.com:22623
    curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api-int.ocp4.example.com:22623

The bootstrap didn't capture anything in its log:
[core@bootstrap ~]$ journalctl -b -f -u bootkube.service
-- Logs begin at Wed 2020-05-06 17:33:42 UTC. --

Any suggestions on what is causing this issue?

OCP4 failed to pull image from private registry x509: certificate signed by unknown authority

I just created a private registry and want to pull an image from it. Could anyone help please? Thanks a lot.
Steps:
on MISC machine
1.openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 3650 -out domain.crt -subj "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.xxxxx.com"
2.cp /etc/crts/domain.crt /etc/pki/ca-trust/source/anchors/
3.update-ca-trust extract
4.mkdir -p /data/registry
5.cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    layerinfo: inmemory
  filesystem:
    rootdirectory: /data/registry
  delete:
    enabled: true
http:
  addr: :8443
  tls:
    certificate: /etc/crts/domain.crt
    key: /etc/crts/domain.key
EOF
6.systemctl restart docker
7.systemctl enable docker-distribution
8.systemctl restart docker-distribution
9.add 127.0.0.1 registry.xxxxx.com:8443 to /etc/hosts
10.docker login registry.xxxxxx.com:8443 -u root -p xxxxxx
11.podman login registry.xxxxxxx.com:8443 -u root -p xxxxx
12.docker pull nginx:latest
13.docker tag xxxx(imageID) registry.xxxxxx.com:8443/nginx:latest
14.docker push registry.xxxxxx.com:8443/nginx:latest
15.make nginx.yaml as
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.xxxxxx.com:8443/nginx:latest
        #image: nginx:latest
        ports:
        - containerPort: 80
      serviceAccount: default
      serviceAccountName: default
16.oc create secret generic reg-secret --from-file=.dockerconfigjson=/root/.docker/config.json --type=kubernetes.io/dockerconfigjson
17.oc secrets link default reg-secret --for=pull
18.ssh core@worker0 and add MISC IP with registry.xxxxxx.com //all nodes
19.oc apply -f nginx.yaml

Result:
Failed to pull image "registry.xxxxxx.com:8443/nginx:latest": rpc error: code = Unknown desc = error pinging docker registry registry.xxxxxx.com:8443: Get https://registry.xxxxxx.com:8443/v2/: x509: certificate signed by unknown authority

RFE: Playbook assumes root

You should be able to run the playbook as a non-root user by simply adding become: yes and making -K a requirement.
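A minimal sketch of what that would look like, assuming the change lands in the play header of tasks/main.yml (placement here is illustrative, not an actual patch):

# tasks/main.yml -- escalate privileges for the whole play
- hosts: all
  become: yes
  ...

# then run as a regular user and let Ansible prompt for the sudo password (-K)
ansible-playbook -K -e @vars.yaml tasks/main.yml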

Autoboot for coreos nodes

I could not find your devel repo, so here is a minor suggestion for your quickstart: with this change I don't have to manually choose PXE.

Author: David Bellizzi [email protected]
Date: Tue Apr 14 10:39:45 2020 -0400

Remove manual boot step from quickstart

diff --git a/docs/quickstart.md b/docs/quickstart.md
index 5f4c6e0..4fc2171 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -61,7 +61,7 @@ EL 7
virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096
--disk path=/var/lib/libvirt/images/ocp4-aHelper.qcow2,bus=virtio,size=30
--os-variant centos7.0 --network network=openshift4,model=virtio
---boot menu=on --location /var/lib/libvirt/ISO/CentOS-7-x86_64-Minimal-1810.iso
+--boot hd,network,menu=on --location /var/lib/libvirt/ISO/CentOS-7-x86_64-Minimal-1810.iso
--initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" --noautoconsole


@@ -70,7 +70,7 @@ __EL 8__
virt-install --name="ocp4-aHelper" --vcpus=2 --ram=4096 \
--disk path=/var/lib/libvirt/images/ocp4-aHelper.qcow2,bus=virtio,size=50 \
--os-variant centos8 --network network=openshift4,model=virtio \
---boot menu=on --location /var/lib/libvirt/ISO/CentOS-8-x86_64-1905-dvd1.iso \
+--boot hd,network,menu=on --location /var/lib/libvirt/ISO/CentOS-8-x86_64-1905-dvd1.iso \
--initrd-inject helper-ks.cfg --extra-args "inst.ks=file:/helper-ks.cfg" --noautoconsole

@@ -194,7 +194,7 @@ mkdir ~/ocp4
cd ~/ocp4


-Create a place to store your pull-secret
+Create a place to store your pull-secret

mkdir -p ~/.openshift
@@ -290,7 +290,7 @@ chmod o+r /var/www/html/ignition/*.ign

Install VMs

-Launch virt-manager, and boot the VMs into the boot menu; and select PXE. The vms should boot into the proper PXE profile, based on their IP address.
+Launch virt-manager, and boot the VMs. The vms should boot into the proper PXE profile, based on their IP address.

Boot/install the VMs in the following order

named refuses to start

Hrmm.... How do I go about troubleshooting this?

failed: [localhost] (item=named) => {"ansible_loop_var": "item", "changed": false, "item": "named", "msg": "Unable to start service named: Job for named.service failed because the control process exited with error code. See \"systemctl status named.service\" and \"journalctl -xe\" for details.\n"}
ok: [localhost] => (item=haproxy)
ok: [localhost] => (item=httpd)
ok: [localhost] => (item=rpcbind)
ok: [localhost] => (item=nfs-server)
ok: [localhost] => (item=nfs-lock)
ok: [localhost] => (item=nfs-idmap)

RUNNING HANDLER [restart bind] ****************************************************************************************************************************************

RUNNING HANDLER [restart dhcpd] ***************************************************************************************************************************************

PLAY RECAP ************************************************************************************************************************************************************
localhost                  : ok=40   changed=9    unreachable=0    failed=1    skipped=5    rescued=0    ignored=0

[root@rhel7-ocph ocp4-helpernode]# systemctl status named.service
● named.service - Berkeley Internet Name Domain (DNS)
   Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-04-02 18:00:01 EDT; 22s ago
  Process: 20775 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi (code=exited, status=1/FAILURE)

Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: _default/0.104.172.in-addr.arpa/IN: bad name (check-names)
Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: zone localhost.localdomain/IN: loaded serial 0
Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: zone localhost/IN: loaded serial 0
Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: zone 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN: loaded serial 0
Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0
Apr 02 18:00:01 rhel7-ocph.lab bash[20775]: zone 0.in-addr.arpa/IN: loaded serial 0
Apr 02 18:00:01 rhel7-ocph.lab systemd[1]: named.service: control process exited, code=exited status=1
Apr 02 18:00:01 rhel7-ocph.lab systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
Apr 02 18:00:01 rhel7-ocph.lab systemd[1]: Unit named.service entered failed state.
Apr 02 18:00:01 rhel7-ocph.lab systemd[1]: named.service failed.

Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated"

I got 4.3.8 installed and everything is working fine.

However, when I go to upgrade, I get the following error.

$ oc adm upgrade --to-latest=true
Updating to latest version 4.3.13

$ oc describe clusterversion
Name:         version
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2020-04-28T00:47:17Z
  Generation:          2
  Resource Version:    953549
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 f67845db-4ecb-4c74-a8c1-73bda2820f2f
Spec:
  Channel:     stable-4.3
  Cluster ID:  af5828d9-24ea-4234-805e-b1d595c528e9
  Desired Update:
    Force:    false
    Image:    quay.io/openshift-release-dev/ocp-release@sha256:e1ebc7295248a8394afb8d8d918060a7cc3de12c491283b317b80b26deedfe61
    Version:  4.3.13
  Upstream:   https://api.openshift.com/api/upgrades_info/v1/graph
Status:
  Available Updates:
    Force:    false
    Image:    quay.io/openshift-release-dev/ocp-release@sha256:e1ebc7295248a8394afb8d8d918060a7cc3de12c491283b317b80b26deedfe61
    Version:  4.3.13
  Conditions:
    Last Transition Time:  2020-04-28T01:36:39Z
    Message:               Done applying 4.3.8
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-04-29T23:18:39Z
    Message:               Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated": Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [hostmount-anyuid]
    Reason:                UpgradePreconditionCheckFailed
    Status:                True
    Type:                  Failing
    Last Transition Time:  2020-04-29T23:18:24Z
    Message:               Unable to apply 4.3.13: it may not be safe to apply this update
    Reason:                UpgradePreconditionCheckFailed
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-04-28T00:50:29Z
    Status:                True
    Type:                  RetrievedUpdates
    Last Transition Time:  2020-04-28T02:27:51Z
    Message:               Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [hostmount-anyuid]
    Reason:                DefaultSecurityContextConstraints_Mutated
    Status:                False
    Type:                  Upgradeable
  Desired:
    Force:    false
    Image:    quay.io/openshift-release-dev/ocp-release@sha256:e1ebc7295248a8394afb8d8d918060a7cc3de12c491283b317b80b26deedfe61
    Version:  4.3.13
  History:
    Completion Time:    <nil>
    Image:              quay.io/openshift-release-dev/ocp-release@sha256:e1ebc7295248a8394afb8d8d918060a7cc3de12c491283b317b80b26deedfe61
    Started Time:       2020-04-29T23:18:24Z
    State:              Partial
    Verified:           true
    Version:            4.3.13
    Completion Time:    2020-04-28T01:36:39Z
    Image:              quay.io/openshift-release-dev/ocp-release@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42
    Started Time:       2020-04-28T00:47:25Z
    State:              Completed
    Verified:           false
    Version:            4.3.8
  Observed Generation:  2
  Version Hash:         lnZzahlL8hk=
Events:                 <none>

The SCC was altered by this line in nfs-provisioner-setup.sh.

Is that line effectively preventing the upgrade?
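For anyone comparing notes: the object named in that condition can be dumped with a plain oc query (a generic diagnostic, not something from this repo) to see what changed:

oc get scc hostmount-anyuid -o yaml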

Ansible error in generate_ssh_keys.yaml

I added the ocp4_helpernode to my CI and it was working great, but yesterday I started to see this error:

Installed:
ansible.noarch 0:2.4.2.0-2.el7 git.x86_64 0:1.8.3.1-21.el7_7

Dependency Installed:
PyYAML.x86_64 0:3.10-11.el7
libyaml.x86_64 0:0.1.4-11.el7_0
perl.x86_64 4:5.16.3-294.el7_6
perl-Carp.noarch 0:1.26-244.el7
perl-Encode.x86_64 0:2.51-7.el7
perl-Error.noarch 1:0.17020-2.el7
perl-Exporter.noarch 0:5.68-3.el7
perl-File-Path.noarch 0:2.09-2.el7
perl-File-Temp.noarch 0:0.23.01-3.el7
perl-Filter.x86_64 0:1.49-3.el7
perl-Getopt-Long.noarch 0:2.40-3.el7
perl-Git.noarch 0:1.8.3.1-21.el7_7
perl-HTTP-Tiny.noarch 0:0.033-3.el7
perl-PathTools.x86_64 0:3.40-5.el7
perl-Pod-Escapes.noarch 1:1.04-294.el7_6
perl-Pod-Perldoc.noarch 0:3.20-4.el7
perl-Pod-Simple.noarch 1:3.28-4.el7
perl-Pod-Usage.noarch 0:1.63-3.el7
perl-Scalar-List-Utils.x86_64 0:1.27-248.el7
perl-Socket.x86_64 0:2.010-4.el7
perl-Storable.x86_64 0:2.45-3.el7
perl-TermReadKey.x86_64 0:2.30-20.el7
perl-Text-ParseWords.noarch 0:3.29-4.el7
perl-Time-HiRes.x86_64 4:1.9725-3.el7
perl-Time-Local.noarch 0:1.2300-2.el7
perl-constant.noarch 0:1.27-2.el7
perl-libs.x86_64 4:5.16.3-294.el7_6
perl-macros.x86_64 4:5.16.3-294.el7_6
perl-parent.noarch 1:0.225-244.el7
perl-podlators.noarch 0:2.5.1-3.el7
perl-threads.x86_64 0:1.87-4.el7
perl-threads-shared.x86_64 0:1.43-6.el7
python-babel.noarch 0:0.9.6-8.el7
python-backports.x86_64 0:1.0-8.el7
python-backports-ssl_match_hostname.noarch 0:3.5.0.1-1.el7
python-cffi.x86_64 0:1.6.0-5.el7
python-enum34.noarch 0:1.0.4-1.el7
python-httplib2.noarch 0:0.9.2-1.el7
python-idna.noarch 0:2.4-1.el7
python-ipaddress.noarch 0:1.0.16-2.el7
python-jinja2.noarch 0:2.7.2-4.el7
python-markupsafe.x86_64 0:0.11-10.el7
python-paramiko.noarch 0:2.1.1-9.el7
python-passlib.noarch 0:1.6.5-2.el7
python-ply.noarch 0:3.4-11.el7
python-pycparser.noarch 0:2.14-1.el7
python-setuptools.noarch 0:0.9.8-7.el7
python-six.noarch 0:1.9.0-2.el7
python2-cryptography.x86_64 0:1.7.2-2.el7
python2-jmespath.noarch 0:0.9.0-3.el7
python2-pyasn1.noarch 0:0.1.9-7.el7
rsync.x86_64 0:3.1.2-6.el7_6.1
sshpass.x86_64 0:1.06-2.el7

Complete!
Cloning into 'ocp4-helpernode'...
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.

The error appears to have been in '/root/ocp4-helpernode/tasks/generate_ssh_keys.yaml': line 7, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  • openssh_keypair:
    ^ here

The error appears to have been in '/root/ocp4-helpernode/tasks/generate_ssh_keys.yaml': line 7, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  • openssh_keypair:
    ^ here

exception type: <class 'ansible.errors.AnsibleParserError'>
exception: no action detected in task. This often indicates a misspelled module name, or incorrect module path.

The error appears to have been in '/root/ocp4-helpernode/tasks/generate_ssh_keys.yaml': line 7, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  • openssh_keypair:
    ^ here

I'm invoking it as

#!/bin/sh
yum -y install ansible git
git clone https://github.com/RedHatOfficial/ocp4-helpernode
cp vars.yaml ocp4-helpernode/
cd ocp4-helpernode
ansible-playbook -e @vars.yaml tasks/main.yml

My vars.yaml

---
disk: vda
helper:
  name: "helper"
  ipaddr: "192.168.7.77"
  networkifacename: "eth1"
dns:
  domain: "foo.bar.com"
  clusterid: "cim6xufuaruaidi"
  forwarder1: "10.193.159.254"
  forwarder2: "10.102.76.214"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "52:54:00:a5:50:2c"
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "52:54:00:2a:44:b3"
  - name: "master1"
    ipaddr: "192.168.7.22"
    macaddr: "52:54:00:80:13:25"
  - name: "master2"
    ipaddr: "192.168.7.23"
    macaddr: "52:54:00:98:92:db"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "52:54:00:5d:50:ed"
  - name: "worker1"
    ipaddr: "192.168.7.12"
    macaddr: "52:54:00:31:5f:f7"

Error while running ansible-playbook command.

Hi Team, we are getting the error below while running the following command. We have edited the main.yml file for DHCP configuration.
ansible-playbook -e @vars.yaml /usr/ocp4-helpernode/tasks/main.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not
match 'all'

PLAY [all] *****************************************************************************************************
skipping: no hosts matched

PLAY RECAP *****************************************************************************************************

imageregistry operator is not activated by default (at least on ppc64le)

Here is the output of the oc get configs.imageregistry.operator.openshift.io -o yaml we got on Power (ppc64le) after having installed OCP 4.3.18:

apiVersion: v1
items:
- apiVersion: imageregistry.operator.openshift.io/v1
  kind: Config
  metadata:
    creationTimestamp: "2020-05-25T14:23:48Z"
    finalizers:
    - imageregistry.operator.openshift.io/finalizer
    generation: 1
    name: cluster
    resourceVersion: "20362"
    selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/cluster
    uid: 1fc4933c-d0d7-41cd-b404-eb592fa443f4
  spec:
    defaultRoute: false
    disableRedirect: false
    httpSecret: a0f3f4a62c08efb15777319adc01524ea6a51437c2e48a3e076b7d81815a04fa37469116b2d8dcdaaad42e35829de30da76344b257b48a9e03fc2daeee67c878
    logging: 2
    managementState: Removed
    proxy:
      http: ""
      https: ""
      noProxy: ""
    readOnly: false
    replicas: 1
    requests:
      read:
        maxInQueue: 0
        maxRunning: 0
        maxWaitInQueue: 0s
      write:
        maxInQueue: 0
        maxRunning: 0
        maxWaitInQueue: 0s
    storage: {}
  status:
    conditions:
    - lastTransitionTime: "2020-05-25T14:47:48Z"
      message: All registry resources are removed
      reason: Removed
      status: "False"
      type: Progressing
    - lastTransitionTime: "2020-05-25T14:23:49Z"
      message: The registry is removed
      reason: Removed
      status: "True"
      type: Available
    - lastTransitionTime: "2020-05-25T14:23:49Z"
      status: "False"
      type: Degraded
    - lastTransitionTime: "2020-05-25T14:23:49Z"
      message: The registry is removed
      reason: Removed
      status: "True"
      type: Removed
    observedGeneration: 1
    readyReplicas: 0
    storage: {}
    storageManaged: false
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

It seems that managementState: Removed (don’t know if it is the same for x86 on ocp4.3...).
But this means we should first patch this in order to activate the operator and have the registry deployed.

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState": "Managed"}}'

This patch could be added to the documentation provided by the helpernodecheck nfs-info command, as it should always be like this (managementState should be Managed).

Then we can patch the storage to use a real PV ...

oc patch configs.imageregistry.operator.openshift.io cluster --type=json -p '[{"op": "remove", "path": "/spec/storage/emptyDir" }]'
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"pvc":{ "claim": "registry-pvc"}}}}'

Reverse zonefile and other issues

Found some issues and made some improvements:

  • corrected the helper IP in the reverse zone file
  • corrected the forward zone file
  • configured rsyslog for HAProxy logging

ppc64le conditionals cause problems on ARM

I'm using the helpernode for services on a mini k8s ARM cluster. Because it's k8s and not OpenShift, some of the other issues don't apply here, but the ppc64le conditionals cause problems when trying to use this on the ARM architecture. Not sure if this will affect any other architectures, but I'll add more detail next time I'm on my ARM cluster.

ssh config file from generate_ssh_keys.yaml broken

While testing this on a new cluster, I had some systems I wanted to access via IP. The new ~/.ssh/config file has the line
Host *
which means anything will get appended with
HostName %h.{{ dns.clusterid }}.{{ dns.domain }}

I propose this be changed to a Jinja template... and instead of *, just list the first part of the hostnames from the vars file.
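A rough sketch of what such a template could look like, using the bootstrap/masters/workers structure from vars.yaml (a hypothetical template, not the repo's actual fix):

{% for host in [bootstrap] + masters + workers %}
Host {{ host.name }}
    HostName {{ host.name }}.{{ dns.clusterid }}.{{ dns.domain }}
    User core
{% endfor %}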

oc login fails with error: x509: certificate signed by unknown authority

Everything installs correctly, but I just now started getting this error in my wrapper around oc login.
I'm using CentOS 7. Am I missing a step?

[root@helper ~]# cat ocp_login.sh
#!/bin/sh
export KUBECONFIG=/root/ocp4/auth/kubeconfig
token=$(cat /root/ocp4/auth/kubeadmin-password)
oc login -u kubeadmin -p $token --loglevel 10
[root@helper ~]#

[root@helper ~]# sh ocp_login.sh
I0714 09:36:23.216976 30088 loader.go:375] Config loaded from file: /root/ocp4/auth/kubeconfig
I0714 09:36:23.217413 30088 round_trippers.go:423] curl -k -v -XHEAD 'https://api.cinofnb5arcsvlx.foo.bar.com:6443/' I0714 09:36:23.231324 30088 round_trippers.go:443] HEAD https://api.cinofnb5arcsvlx.foo.bar.com:6443/ 403 Forbidden in 13 milliseconds I0714 09:36:23.231368 30088 round_trippers.go:449] Response Headers:
I0714 09:36:23.231387 30088 round_trippers.go:452] Content-Type: application/json
I0714 09:36:23.231405 30088 round_trippers.go:452] X-Content-Type-Options: nosniff
I0714 09:36:23.231420 30088 round_trippers.go:452] Content-Length: 186
I0714 09:36:23.231437 30088 round_trippers.go:452] Date: Tue, 14 Jul 2020 16:36:23 GMT
I0714 09:36:23.231454 30088 round_trippers.go:452] Audit-Id: ed375ebc-74b7-4011-906b-c187a9520bd2
I0714 09:36:23.231494 30088 request_token.go:86] GSSAPI Enabled
I0714 09:36:23.231570 30088 round_trippers.go:423] curl -k -v -XGET -H "X-Csrf-Token: 1" 'https://api.cinofnb5arcsvlx.foo.bar.com:6443/.well-known/oauth-authorization-server' I0714 09:36:23.232984 30088 round_trippers.go:443] GET https://api.cinofnb5arcsvlx.foo.bar.com:6443/.well-known/oauth-authorization-server 200 OK in 1 milliseconds I0714 09:36:23.233011 30088 round_trippers.go:449] Response Headers:
I0714 09:36:23.233024 30088 round_trippers.go:452] Content-Type: application/json
I0714 09:36:23.233036 30088 round_trippers.go:452] Content-Length: 645
I0714 09:36:23.233047 30088 round_trippers.go:452] Date: Tue, 14 Jul 2020 16:36:23 GMT
I0714 09:36:23.233058 30088 round_trippers.go:452] Audit-Id: 7d232745-341b-4287-8b83-8fcf9673e8dd
I0714 09:36:23.233313 30088 round_trippers.go:423] curl -k -v -XHEAD 'https://oauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com' I0714 09:36:23.297012 30088 round_trippers.go:443] HEAD https://oauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com in 63 milliseconds I0714 09:36:23.297038 30088 round_trippers.go:449] Response Headers:
I0714 09:36:23.297048 30088 request_token.go:438] falling back to kubeconfig CA due to possible x509 error: x509: certificate signed by unknown authority
I0714 09:36:23.297147 30088 round_trippers.go:423] curl -k -v -XGET -H "X-Csrf-Token: 1" 'https://oauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com/oauth/authorize?client_id=openshift-challenging-client&code_challenge=4WW6yutcHmWMLurbzHrHmToOsmGyaPu2kT4-6jyEMtY&code_challenge_method=S256&redirect_uri=https%3A%2F%2Foauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com%2Foauth%2Ftoken%2Fimplicit&response_type=code' I0714 09:36:23.307680 30088 round_trippers.go:443] GET https://oauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com/oauth/authorize?client_id=openshift-challenging-client&code_challenge=4WW6yutcHmWMLurbzHrHmToOsmGyaPu2kT4-6jyEMtY&code_challenge_method=S256&redirect_uri=https%3A%2F%2Foauth-openshift.apps.cinofnb5arcsvlx.foo.bar.com%2Foauth%2Ftoken%2Fimplicit&response_type=code in 10 milliseconds I0714 09:36:23.307709 30088 round_trippers.go:449] Response Headers:
I0714 09:36:23.308572 30088 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, /" -H "User-Agent: oc/openshift (linux/amd64) kubernetes/2576e48" 'https://api.cinofnb5arcsvlx.foo.bar.com:6443/api/v1/namespaces/openshift/configmaps/motd' I0714 09:36:23.310040 30088 round_trippers.go:443] GET https://api.cinofnb5arcsvlx.foo.bar.com:6443/api/v1/namespaces/openshift/configmaps/motd 403 Forbidden in 1 milliseconds I0714 09:36:23.310060 30088 round_trippers.go:449] Response Headers:
I0714 09:36:23.310157 30088 round_trippers.go:452] X-Content-Type-Options: nosniff
I0714 09:36:23.310182 30088 round_trippers.go:452] Content-Length: 303
I0714 09:36:23.310190 30088 round_trippers.go:452] Date: Tue, 14 Jul 2020 16:36:23 GMT
I0714 09:36:23.310198 30088 round_trippers.go:452] Audit-Id: 5563e4f4-7430-41fd-b45d-ed1fe2a9e802
I0714 09:36:23.310205 30088 round_trippers.go:452] Content-Type: application/json
I0714 09:36:23.310242 30088 request.go:1017] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps "motd" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "openshift"","reason":"Forbidden","details":{"name":"motd","kind":"configmaps"},"code":403}
F0714 09:36:23.310871 30088 helpers.go:114] error: x509: certificate signed by unknown authority

[root@helper ~]# cat vars.yaml

disk: vda
ssh_gen_key: false
helper:
  name: "helper"
  ipaddr: "192.168.1.77"
  networkifacename: "eth1"
dns:
  domain: "foo.bar.com"
  clusterid: "cinofnb5arcsvlx"
  forwarder1: "10.193.159.254"
  forwarder2: "10.102.76.214"
dhcp:
  router: "192.168.1.1"
  bcast: "192.168.1.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.1.10"
  poolend: "192.168.1.30"
  ipid: "192.168.1.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.1.20"
  macaddr: "52:54:00:82:43:eb"
masters:
  - name: "master0"
    ipaddr: "192.168.1.21"
    macaddr: "52:54:00:63:e6:c3"
  - name: "master1"
    ipaddr: "192.168.1.22"
    macaddr: "52:54:00:2a:c3:42"
  - name: "master2"
    ipaddr: "192.168.1.23"
    macaddr: "52:54:00:47:21:02"
workers:
  - name: "worker0"
    ipaddr: "192.168.1.11"
    macaddr: "52:54:00:8c:ae:92"
  - name: "worker1"
    ipaddr: "192.168.1.12"
    macaddr: "52:54:00:92:3e:4e"

[root@helper ~]# cat install-config.yaml
apiVersion: v1
baseDomain: foo.bar.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: cinofnb5arcsvlx
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"cloud.openshift.com": ...
sshKey: 'ssh-rsa ...

Unable to start service haproxy

I was following the doc below to prepare the bastion node, but I am getting the following error:

failed: [localhost] (item=haproxy) => {"ansible_loop_var": "item", "changed": false, "item": "haproxy", "msg": "Unable to start service haproxy: Job for haproxy.service failed because the control process exited with error code.\nSee \"systemctl status haproxy.service\" and \"journalctl -xe\" for details.\n"}
.
.
.
localhost                  : ok=29   changed=20   unreachable=0    failed=1    skipped=29   rescued=1    ignored=0

Doc: https://github.com/RedHatOfficial/ocp4-helpernode
Here is the full log:

[root@ocpbastion ocp4-helpernode]# ansible-playbook -e @vars-ppc64le.yaml tasks/main.yml

PLAY [all] *****************************************************************************

TASK [Gathering Facts] *****************************************************************
ok: [localhost]

TASK [fail] ****************************************************************************
[WARNING]: conditional statements should not include jinja2 templating delimiters such
as {{ }} or {% %}. Found: item is search('{{ chars }}')
skipping: [localhost] => (item=aus.stglabs.ibm.com)
skipping: [localhost] => (item=ocpbastion)
skipping: [localhost] => (item=ocpbootstrap)

TASK [fail] ****************************************************************************
[WARNING]: conditional statements should not include jinja2 templating delimiters such
as {{ }} or {% %}. Found: item.name is search('{{ chars }}')
skipping: [localhost] => (item={'name': 'ocpmaster1', 'ipaddr': 'XX.XX.XX.221'})
skipping: [localhost] => (item={'name': 'ocpmaster2', 'ipaddr': 'XX.XX.XX.223'})
skipping: [localhost] => (item={'name': 'ocpmaster3', 'ipaddr': 'XX.XX.XX.224'})
skipping: [localhost] => (item={'name': 'ocpworker1', 'ipaddr': 'XX.XX.XX.225'})
skipping: [localhost] => (item={'name': 'ocpworker2', 'ipaddr': 'XX.XX.XX.226'})
skipping: [localhost] => (item={'name': 'ocpworker3', 'ipaddr': 'XX.XX.XX.227'})

TASK [fail] ****************************************************************************
skipping: [localhost]

TASK [file] ****************************************************************************
changed: [localhost]

TASK [openssh_keypair] *****************************************************************
changed: [localhost]

TASK [blockinfile] *********************************************************************
changed: [localhost] => (item={'name': 'ocpmaster1', 'ipaddr': 'XX.XX.XX.221'})
changed: [localhost] => (item={'name': 'ocpmaster2', 'ipaddr': 'XX.XX.XX.223'})
changed: [localhost] => (item={'name': 'ocpmaster3', 'ipaddr': 'XX.XX.XX.224'})
changed: [localhost] => (item={'name': 'ocpworker1', 'ipaddr': 'XX.XX.XX.225'})
changed: [localhost] => (item={'name': 'ocpworker2', 'ipaddr': 'XX.XX.XX.226'})
changed: [localhost] => (item={'name': 'ocpworker3', 'ipaddr': 'XX.XX.XX.227'})

TASK [blockinfile] *********************************************************************
changed: [localhost] => (item={'name': 'ocpbootstrap'})

TASK [assert] **************************************************************************
fatal: [localhost]: FAILED! => {
    "assertion": false,
    "changed": false,
    "evaluated_to": false,
    "msg": "Assertion failed"
}

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [set_fact] ************************************************************************
skipping: [localhost]

TASK [set_fact] ************************************************************************
skipping: [localhost]

TASK [set_fact] ************************************************************************
skipping: [localhost]

TASK [set_fact] ************************************************************************
skipping: [localhost]

TASK [set_fact] ************************************************************************
skipping: [localhost]

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [set_fact] ************************************************************************
ok: [localhost]

TASK [Install needed packages] *********************************************************
changed: [localhost]

TASK [Install packages for DHCP/PXE install] *******************************************
skipping: [localhost]

TASK [Install additional package for Intel platforms] **********************************
skipping: [localhost]

TASK [Write out dhcp file] *************************************************************
skipping: [localhost]

TASK [Write out named file] ************************************************************
changed: [localhost]

TASK [Installing DNS Serialnumber generator] *******************************************
changed: [localhost]

TASK [Set zone serial number] **********************************************************
changed: [localhost]

TASK [Setting serial number as a fact] *************************************************
ok: [localhost]

TASK [Write out "aus.stglabs.ibm.com" zone file] ***************************************
changed: [localhost]

TASK [Write out reverse zone file] *****************************************************
changed: [localhost]

TASK [Write out haproxy config file] ***************************************************
changed: [localhost]

TASK [Copy httpd conf file] ************************************************************
changed: [localhost]

TASK [Create apache directories for installing] ****************************************
changed: [localhost] => (item=/var/www/html/install)
changed: [localhost] => (item=/var/www/html/ignition)

TASK [Delete OCP4 files, if requested, to download again] ******************************
skipping: [localhost] => (item=/usr/local/src/openshift-client-linux.tar.gz)
skipping: [localhost] => (item=/usr/local/src/openshift-install-linux.tar.gz)
skipping: [localhost] => (item=/var/www/html/install/bios.raw.gz)
skipping: [localhost] => (item=/var/lib/tftpboot/rhcos/initramfs.img)
skipping: [localhost] => (item=/var/lib/tftpboot/rhcos/kernel)

TASK [Downloading OCP4 installer Bios] *************************************************
changed: [localhost]

TASK [Start firewalld service] *********************************************************
ok: [localhost]

TASK [Open up firewall ports] **********************************************************
changed: [localhost] => (item=67/udp)
changed: [localhost] => (item=53/tcp)
changed: [localhost] => (item=53/udp)
changed: [localhost] => (item=443/tcp)
changed: [localhost] => (item=80/tcp)
changed: [localhost] => (item=8080/tcp)
changed: [localhost] => (item=6443/tcp)
changed: [localhost] => (item=6443/udp)
changed: [localhost] => (item=22623/tcp)
changed: [localhost] => (item=22623/udp)
changed: [localhost] => (item=9000/tcp)
changed: [localhost] => (item=69/udp)
changed: [localhost] => (item=111/tcp)
changed: [localhost] => (item=2049/tcp)
changed: [localhost] => (item=20048/tcp)
changed: [localhost] => (item=50825/tcp)
changed: [localhost] => (item=53248/tcp)

TASK [Best effort SELinux repair - DNS] ************************************************
changed: [localhost]

TASK [Best effort SELinux repair - Apache] *********************************************
changed: [localhost]

TASK [Create NFS export directory] *****************************************************
changed: [localhost]

TASK [Copy NFS export conf file] *******************************************************
changed: [localhost]

TASK [Create TFTP config] **************************************************************
skipping: [localhost]

TASK [generate netboot entry for grub2] ************************************************
skipping: [localhost]

TASK [Create TFTP RHCOS dir] ***********************************************************
skipping: [localhost]

TASK [SEBool allow haproxy connect any port] *******************************************
changed: [localhost]

TASK [Copy over files needed for TFTP] *************************************************
skipping: [localhost]

TASK [Downloading OCP4 installer initramfs] ********************************************
skipping: [localhost]

TASK [Downloading OCP4 installer kernel] ***********************************************
skipping: [localhost]

TASK [Set the default tftp file] *******************************************************
skipping: [localhost]

TASK [Set the bootstrap specific tftp file] ********************************************
skipping: [localhost]

TASK [Set the master specific tftp files] **********************************************
skipping: [localhost] => (item={'name': 'ocpmaster1', 'ipaddr': 'XX.XX.XX.221'})
skipping: [localhost] => (item={'name': 'ocpmaster2', 'ipaddr': 'XX.XX.XX.223'})
skipping: [localhost] => (item={'name': 'ocpmaster3', 'ipaddr': 'XX.XX.XX.224'})

TASK [Set the worker specific tftp files] **********************************************
skipping: [localhost] => (item={'name': 'ocpworker1', 'ipaddr': 'XX.XX.XX.225'})
skipping: [localhost] => (item={'name': 'ocpworker2', 'ipaddr': 'XX.XX.XX.226'})
skipping: [localhost] => (item={'name': 'ocpworker3', 'ipaddr': 'XX.XX.XX.227'})

TASK [create grub.cfg] *****************************************************************
skipping: [localhost]

TASK [generate grub entry (bootstrap)] *************************************************
skipping: [localhost]

TASK [generate grub entry (masters)] ***************************************************
skipping: [localhost] => (item={'name': 'ocpmaster1', 'ipaddr': 'XX.XX.XX.221'})
skipping: [localhost] => (item={'name': 'ocpmaster2', 'ipaddr': 'XX.XX.XX.223'})
skipping: [localhost] => (item={'name': 'ocpmaster3', 'ipaddr': 'XX.XX.XX.224'})

TASK [generate grub entry (workers)] ***************************************************
skipping: [localhost] => (item={'name': 'ocpworker1', 'ipaddr': 'XX.XX.XX.225'})
skipping: [localhost] => (item={'name': 'ocpworker2', 'ipaddr': 'XX.XX.XX.226'})
skipping: [localhost] => (item={'name': 'ocpworker3', 'ipaddr': 'XX.XX.XX.227'})

TASK [Installing TFTP Systemd helper] **************************************************
skipping: [localhost]

TASK [Installing TFTP Systemd unit file] ***********************************************
skipping: [localhost]

TASK [Systemd daemon reload] ***********************************************************
skipping: [localhost]

TASK [Starting services] ***************************************************************
changed: [localhost] => (item=named)
failed: [localhost] (item=haproxy) => {"ansible_loop_var": "item", "changed": false, "item": "haproxy", "msg": "Unable to start service haproxy: Job for haproxy.service failed because the control process exited with error code.\nSee \"systemctl status haproxy.service\" and \"journalctl -xe\" for details.\n"}
ok: [localhost] => (item=httpd)
changed: [localhost] => (item=rpcbind)
changed: [localhost] => (item=nfs-server)

RUNNING HANDLER [restart bind] *********************************************************

RUNNING HANDLER [restart haproxy] ******************************************************

RUNNING HANDLER [restart httpd] ********************************************************

RUNNING HANDLER [restart nfs] **********************************************************

PLAY RECAP *****************************************************************************
localhost                  : ok=29   changed=20   unreachable=0    failed=1    skipped=29   rescued=1    ignored=0

vars-ppc64le.yaml contents:

[root@ocpbastion ocp4-helpernode]# cat vars-ppc64le.yaml
---
disk: vda
staticips: true
helper:
  name: "ocpbastion"
  ipaddr: "XX.XX.XX.236"
dns:
  domain: "XXXXXexampleXXXXX.com"
  clusterid: "ocp4"
  forwarder1: "XX.XX.XX.200"
  forwarder2: "XX.XX.XX.50"
bootstrap:
  name: "ocpbootstrap"
  ipaddr: "XX.XX.XX.233"
masters:
  - name: "ocpmaster1"
    ipaddr: "XX.XX.XX.221"
  - name: "ocpmaster2"
    ipaddr: "XX.XX.XX.223"
  - name: "ocpmaster3"
    ipaddr: "XX.XX.XX.224"
workers:
  - name: "ocpworker1"
    ipaddr: "XX.XX.XX.225"
  - name: "ocpworker2"
    ipaddr: "XX.XX.XX.226"
  - name: "ocpworker3"
    ipaddr: "XX.XX.XX.227"

ppc64le: true
ocp_bios: "https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/4.3/4.3.18/rhcos-4.3.18-ppc64le-metal.ppc64le.raw.gz"
ocp_initramfs: "https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/4.3/4.3.18/rhcos-4.3.18-ppc64le-installer-initramfs.ppc64le.img"
ocp_install_kernel: "https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/4.3/4.3.18/rhcos-4.3.18-ppc64le-installer-kernel-ppc64le"
ocp_client: "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.3.18/openshift-client-linux-4.3.18.tar.gz"
ocp_installer: "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.3.18/openshift-install-linux-4.3.18.tar.gz"

Any idea what could be the issue?

quickstart snippet for modifying registry with storageclass

Add a comment to the quickstart indicating that on platforms with an existing StorageClass the following patch command will modify the registry config to request a claim.

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"pvc":{"claim":""}}}}'

OC login failed in master node

I have installed RHCOS. I have logged in to the master node, where oc login is failing with the error below.
[core@master0 openshift]$ oc login -u system:admin --insecure-skip-tls-verify
Server [https://localhost:8443]: https://api.ocp4.bacadgsl.com:6443
error: net/http: TLS handshake timeout

Also, please find below the output of the oc command.
[core@master0 openshift]$ oc version
Client Version: 4.3.7-202003130552-62c8cca

**[core@master0 openshift]$ oc status
error: Missing or incomplete configuration info. Please login or point to an existing, complete config file:

Via the command-line flag --config
Via the KUBECONFIG environment variable
In your home directory as ~/.kube/config
To view or setup config directly use the 'config' command.**

Please help to resolve the same.

Why require Internet?

I'd be interested in using this with an offline install. I see the vars.yaml with URLs in it. Could I simply put those in my own location and remove that dependency, or are there other reasons I need this?

If so, I'm wondering if a script to download those, drop them in files/ and change the variable to point to that would be enough to drop the dependency. I'd just copy the result into my Ansible code.
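The download locations are plain variables, so in principle they can be pointed at an internal mirror. A hedged sketch, reusing the variable names that appear in the ppc64le vars example further down this page (the mirror hostname and paths here are made up):

ocp_bios: "http://mirror.internal.example.com/rhcos/bios.raw.gz"
ocp_initramfs: "http://mirror.internal.example.com/rhcos/initramfs.img"
ocp_install_kernel: "http://mirror.internal.example.com/rhcos/kernel"
ocp_client: "http://mirror.internal.example.com/ocp/openshift-client-linux.tar.gz"
ocp_installer: "http://mirror.internal.example.com/ocp/openshift-install-linux.tar.gz"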

Not enough memory to load specified image

Have been using ocp4-helpernode for quite some time, thank you! It has been great.

Today, I needed to redeploy to an existing environment fresh since I lost access to the previous deployment artifacts.

The issue I've been running into, regardless of the OS on the helpernode, is that I am unable to get the bootstrap node bootstrapped. The error received on the console is Not enough memory to load specified image.

Took a pcap and there seems to be a negotiation failure with the TFTP service, as can be seen in the screenshots. I will rerun a fresh test and confirm.

helpernode OS CentOS Linux release 7.7.1908 (Core)

Screenshot from 2020-03-29 15-44-33
Screenshot from 2020-03-29 15-38-20

Please add ip=eno3:dhcp option to the dhcp server config files

Hi, please allow the addition of the ip=eno3:dhcp option to the dhcp server config files to allow for more flexibility in the bare metal upi deployment.

I envision that it could be an option in vars.yaml. That would be very helpful.

e.g.

default menu.c32
prompt 1
timeout 9
ONTIMEOUT 1
menu title ######## PXE Boot Menu ########
label 1
menu label ^1) Install Bootstrap Node
menu default
kernel rhcos/kernel
append initrd=rhcos/initramfs.img nomodeset ip=eno3:dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://10.0.0.250:8080/install/bios.raw.gz coreos.inst.ignition_url=http://10.0.0.250:8080/ignition/bootstrap.ign

Greetings and thanks.

Error AnsibleUndefinedVariable: 'registry_host' is undefined even though setup_registry is not configured

My vars.yml:

staticips: true
disk: vda
helper:
  name: "helper"
  ipaddr: "192.168.8.217"
  networkifacename: "ens192"
dns:
  domain: "domain.com"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.8.1"
  bcast: "192.168.8.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.8.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.8.219"
  macaddr: "52:54:00:60:72:67"
masters:
  - name: "master0"
    ipaddr: "192.168.8.210"
    macaddr: "52:54:00:e7:9d:67"
  - name: "master1"
    ipaddr: "192.168.8.211"
    macaddr: "52:54:00:80:16:23"
  - name: "master2"
    ipaddr: "192.168.8.212"
    macaddr: "52:54:00:d5:1c:39"
workers:
  - name: "worker0"
    ipaddr: "192.168.8.200"
    macaddr: "52:54:00:f4:26:a1"
  - name: "worker1"
    ipaddr: "192.168.8.224"
    macaddr: "52:54:00:82:90:00"
other:
  - name: "non-cluster-vm"
    ipaddr: "192.168.8.217"
#setup_registry:
#  deploy: false
#  autosync_registry: false
#  registry_image: docker.io/library/registry:2
#  local_repo: "ocp4/openshift4"
#  product_repo: "openshift-release-dev"
#  release_name: "ocp-release"
#  release_tag: "4.4.9-x86_64"

When running ansible-playbook -e @vars.yaml -e staticips=true tasks/main.yml I got the error below.

TASK [Write out "domain.com" zone file] ******************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'registry_host' is undefined"}

For a quick resolution I removed these lines in template/zonefile.j2 and it ran successfully.

{%if setup_registry %}
; Create entry for the local registry
{{ registry_host }}	IN	A	{{ helper.ipaddr }}
;
{% endif %}
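An alternative to deleting the block would be to guard it on the variable actually being defined; something like the following (an illustrative workaround, not necessarily the fix the maintainers adopted):

{% if setup_registry is defined and setup_registry.deploy %}
; Create entry for the local registry
{{ registry_host }}    IN    A    {{ helper.ipaddr }}
;
{% endif %}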

unable to create directory to provision new pv: mkdir /persistentvolumes/openshift-image-registry-registry-pvc-pvc-8617bc9a-0321-44d7-9192-5823df3e277f: permission denied

Hello!

I added an extra disk to the helpernode and mounted it at /exports, because I was under the impression that all the NFS storage would be there.
Later I used "helpernodecheck nfs-setup" to setup nfs provisioner which did not work due to this error:

I0413 12:00:09.119057 1 controller.go:987] provision "openshift-image-registry/registry-pvc" class "nfs-storage-provisioner": started
I0413 12:00:09.123036 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-image-registry", Name:"registry-pvc", UID:"8617bc9a-0321-44d7-9192-5823df3e277f", APIVersion:"v1", ResourceVersion:"325736", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "openshift-image-registry/registry-pvc"
W0413 12:00:09.124249 1 controller.go:746] Retrying syncing claim "openshift-image-registry/registry-pvc" because failures 4 < threshold 15
E0413 12:00:09.124289 1 controller.go:761] error syncing claim "openshift-image-registry/registry-pvc": failed to provision volume with StorageClass "nfs-storage-provisioner": unable to create directory to provision new pv: mkdir /persistentvolumes/openshift-image-registry-registry-pvc-pvc-8617bc9a-0321-44d7-9192-5823df3e277f: permission denied
I0413 12:00:09.124336 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-image-registry", Name:"registry-pvc", UID:"8617bc9a-0321-44d7-9192-5823df3e277f", APIVersion:"v1", ResourceVersion:"325736", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "nfs-storage-provisioner": unable to create directory to provision new pv: mkdir /persistentvolumes/openshift-image-registry-registry-pvc-pvc-8617bc9a-0321-44d7-9192-5823df3e277f: permission denied

Where is this location defined? Should this be under /exports?

Document helpernodecheck

There is a lot of information/configuration that can be done using /usr/local/bin/helpernodecheck ... this should probably be documented.

Help to generate some machineconfig file (like chrony configuration)

We are currently evaluating the possibility of using this playbook to also generate some useful MachineConfig files, like NTP (chrony) configuration.

  • The user can specify the chrony.conf they need via a chrony_conf variable in vars.yaml:
chrony_conf: |
  server pool.ntp.org iburst
  driftfile /var/lib/chrony/drift
  makestep 1.0 3
  rtcsync
  logdir /var/log/chrony
  • a template can be used to generate the MachineConfig files for master and worker, with chrony_conf encoded in base64 (a sketch of such a generated file follows at the end of this issue)

  • the resulting YAML files will be available in a machineconfig folder.

  • helpernodecheck install-info could be updated with the following:

Quickstart Notes:
        mkdir ~/install
        cd ~/install
        vi install-config.yaml
->      openshift-install create manifests
->      cp ../machineconfig/* ./manifests
        openshift-install create ignition-configs
        cp *.ign /var/www/html/ignition/
        restorecon -vR /var/www/html/

Modification seems to work on our side (tested with OCP4.3 on ppc64le PowerVM) and we are ready to propose a PR if you are interested.
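For context, a generated worker MachineConfig could look roughly like the sketch below. This is illustrative only: the object name, role label, Ignition version, and file mode are assumptions, and the base64 placeholder stands in for the encoded chrony_conf content.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-chrony
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,<base64 of chrony_conf>
        filesystem: root
        mode: 420
        path: /etc/chrony.conf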

AnsibleUndefinedVariable

I am getting the error below while running the playbook with this command:
ansible-playbook -e vars.yaml tasks/main.yml
my vars.yaml file is
disk: vda
helper:
  name: "helper"
  ipaddr: "192.168.2.109"
  networkifacename: "enp0s3"
dns:
  domain: "openshift.example.com"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.2.1"
  bcast: "192.168.2.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.2.10"
  poolend: "192.168.2.30"
  ipid: "192.168.2.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.2.108"
  macaddr: "08:00:27:07:0b:3f"
masters:
  - name: "master"
    ipaddr: "192.168.2.115"
    macaddr: "08:00:27:00:be:7d"
  - name: "master2"
    ipaddr: "192.168.2.121"
    macaddr: " 08:00:27:2d:b6:26"
  - name: "master3"
    ipaddr: "192.168.2.122"
    macaddr: "08:00:27:df:7b:d2"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "52:54:00:f4:26:a1"

And the error is:

[root@helper ocp4-helpernode]# ansible-playbook -e vars.yaml tasks/main.yml

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Install needed packages] *************************************************
ok: [localhost]

TASK [Install packages for DHCP/PXE install] ***********************************
ok: [localhost]

TASK [Write out dhcp file] *****************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dhcp' is undefined"}
to retry, use: --limit @/root/ocp4-helpernode/tasks/main.retry

PLAY RECAP *********************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=1

I configured static IPs on all my nodes (worker, master, and bootstrap).
Please suggest if I am doing any wrong steps; if not, please help with this issue.

Help is appreciated
Thanks

Bare metal install: PXE Boot not working on Intel NUCs

I get all the way through the install instances part. https://github.com/RedHatOfficial/ocp4-helpernode/blob/master/docs/bmquickstart.md#install-instances

I'm setting up the Intel NUCs' BIOS in the following way:

  • Disable secure boot
  • Enable UEFI boot
  • Boot Option #1: UEFI: PXE IPv4 Intel(R) Ethernet, disable the others
  • Uncheck all of the Fast Boot, Boot Network Devices Last, etc.

The NUC will get to the point of:
"Checking media presence"
"Media Present"
"Start PXE over IPv4 on MAC: xxxxxxx"

and after ~1 min or so, it will error out with "A bootable device has not been detected".

I've double checked the install and all of the files are in place, the webserver is running and serving the necessary files, etc.

I have noticed in other PXE boot tutorials that the filename mentioned in the DHCP config is always something like 'BOOTX64.efi'.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/preparing-for-a-network-install_installing-rhel-as-an-experienced-user#configuring-a-tftp-server-for-uefi-based-clients_preparing-for-a-network-install

Does the bare metal not support UEFI PXE boot?
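For reference, a hedged sketch of the kind of dhcpd.conf logic those tutorials use to hand UEFI clients an EFI loader instead of pxelinux.0. The file name is an assumption (whatever shim/GRUB image gets copied into the TFTP root), and the statements would need to be merged into the existing config by hand rather than blindly appended:

# write the fragment somewhere safe, then merge it into /etc/dhcp/dhcpd.conf by hand
# (the option definition goes at the top, the if/else where filename is currently set)
# and restart dhcpd
cat > /tmp/uefi-pxe-fragment.conf <<'EOF'
option arch code 93 = unsigned integer 16;
if option arch = 00:07 or option arch = 00:09 {
    filename "BOOTX64.EFI";
} else {
    filename "pxelinux.0";
}
EOF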

Having problems getting master nodes up

Hello. Thanks for the great helpernode. I'm setting up on bare metal in my local network: the helper, bootstrap, and masters are all VMs (using VMware ESXi 6), and the workers are physical servers. I managed to follow your guides using the DHCP method. The bootstrap is up, but not a single master comes up.

This is what journalctl -f on the bootstrap shows:

Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:57:58.143974    2106 kubelet.go:1958] SyncLoop (PLEG): "bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system(1f97af20f1d578c1694b03302bc0d81e)", event: &pleg.PodLifecycleEvent{ID:"1f97af20f1d578c1694b03302bc0d81e", Type:"ContainerDied", Data:"2162eb651e8dd9fcd583f6086b71f92b6e31461f8c8f7a6ff5f0befab16fea5d"}
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:57:58.144610    2106 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:57:58.146953    2106 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node bootstrap.ocp4.beu.local
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:57:58.146997    2106 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node bootstrap.ocp4.beu.local
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:57:58.147017    2106 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node bootstrap.ocp4.beu.local
Jul 06 16:57:58 bootstrap.ocp4.beu.local crio[2045]: time="2020-07-06 16:57:58.154221463Z" level=info msg="Attempting to create container: kube-system/bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local/kube-controller-manager" id=72cf8507-25bb-4182-801a-a2c7a9666e4b
Jul 06 16:57:58 bootstrap.ocp4.beu.local crio[2045]: time="2020-07-06 16:57:58.154303689Z" level=warning msg="error reserving ctr name k8s_kube-controller-manager_bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system_1f97af20f1d578c1694b03302bc0d81e_1 for id 16e64f3194c5151b36abf3fd92b76440abfcfdeb314b641cb39ca0bd8165dd41: name is reserved"
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:57:58.154503    2106 remote_runtime.go:200] CreateContainer in sandbox "e7cd80c1d291e9b3dfadb8004fb65554630e832420672361d1068f272b166f3e" from runtime service failed: rpc error: code = Unknown desc = error reserving ctr name k8s_kube-controller-manager_bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system_1f97af20f1d578c1694b03302bc0d81e_1 for id 16e64f3194c5151b36abf3fd92b76440abfcfdeb314b641cb39ca0bd8165dd41: name is reserved
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:57:58.154573    2106 kuberuntime_manager.go:803] container start failed: CreateContainerError: error reserving ctr name k8s_kube-controller-manager_bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system_1f97af20f1d578c1694b03302bc0d81e_1 for id 16e64f3194c5151b36abf3fd92b76440abfcfdeb314b641cb39ca0bd8165dd41: name is reserved
Jul 06 16:57:58 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:57:58.154605    2106 pod_workers.go:191] Error syncing pod 1f97af20f1d578c1694b03302bc0d81e ("bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system(1f97af20f1d578c1694b03302bc0d81e)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CreateContainerError: "error reserving ctr name k8s_kube-controller-manager_bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local_kube-system_1f97af20f1d578c1694b03302bc0d81e_1 for id 16e64f3194c5151b36abf3fd92b76440abfcfdeb314b641cb39ca0bd8165dd41: name is reserved"
Jul 06 16:57:58 bootstrap.ocp4.beu.local crio[2045]: time="2020-07-06 16:57:58.181005800Z" level=info msg="Removed container 2162eb651e8dd9fcd583f6086b71f92b6e31461f8c8f7a6ff5f0befab16fea5d: kube-system/bootstrap-kube-controller-manager-bootstrap.ocp4.beu.local/kube-controller-manager" id=5c72b969-3cb4-4dc3-aaa2-6710a664252b
Jul 06 16:57:58 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2520] failed to fetch discovery: Unauthorized
Jul 06 16:57:58 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2521] failed to fetch discovery: Unauthorized
Jul 06 16:57:58 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2522] failed to fetch discovery: Unauthorized
Jul 06 16:57:58 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2523] failed to fetch discovery: Unauthorized
Jul 06 16:57:59 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2524] failed to fetch discovery: Unauthorized
Jul 06 16:57:59 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2525] failed to fetch discovery: Unauthorized
Jul 06 16:57:59 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2526] failed to fetch discovery: Unauthorized
Jul 06 16:57:59 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2527] failed to fetch discovery: Unauthorized
Jul 06 16:57:59 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2528] failed to fetch discovery: Unauthorized
Jul 06 16:58:00 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2529] failed to fetch discovery: Unauthorized
Jul 06 16:58:00 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2530] failed to fetch discovery: Unauthorized
Jul 06 16:58:00 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2531] failed to fetch discovery: Unauthorized
Jul 06 16:58:00 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2532] failed to fetch discovery: Unauthorized
Jul 06 16:58:00 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2533] failed to fetch discovery: Unauthorized
Jul 06 16:58:01 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2534] failed to fetch discovery: Unauthorized
Jul 06 16:58:01 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2535] failed to fetch discovery: Unauthorized
Jul 06 16:58:01 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2536] failed to fetch discovery: Unauthorized
Jul 06 16:58:01 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2537] failed to fetch discovery: Unauthorized
Jul 06 16:58:01 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2538] failed to fetch discovery: Unauthorized
Jul 06 16:58:02 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2539] failed to fetch discovery: Unauthorized
Jul 06 16:58:02 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2540] failed to fetch discovery: Unauthorized
Jul 06 16:58:02 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:02.469085    2106 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 06 16:58:02 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:02.471765    2106 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:02 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:02.471823    2106 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:02 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:02.471842    2106 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:02 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2541] failed to fetch discovery: Unauthorized
Jul 06 16:58:02 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2542] failed to fetch discovery: Unauthorized
Jul 06 16:58:02 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2543] failed to fetch discovery: Unauthorized
Jul 06 16:58:03 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2544] failed to fetch discovery: Unauthorized
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.189483    2106 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.190033    2106 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.191881    2106 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.191919    2106 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.191934    2106 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.191884    2106 kubelet_node_status.go:486] Recording NodeHasSufficientMemory event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.192024    2106 kubelet_node_status.go:486] Recording NodeHasNoDiskPressure event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: I0706 16:58:03.192035    2106 kubelet_node_status.go:486] Recording NodeHasSufficientPID event message for node bootstrap.ocp4.beu.local
Jul 06 16:58:03 bootstrap.ocp4.beu.local crio[2045]: time="2020-07-06 16:58:03.197565000Z" level=info msg="Attempting to create container: kube-system/bootstrap-kube-scheduler-bootstrap.ocp4.beu.local/kube-scheduler" id=0489d019-fc17-459c-9a8c-d108db357a04
Jul 06 16:58:03 bootstrap.ocp4.beu.local crio[2045]: time="2020-07-06 16:58:03.197629567Z" level=warning msg="error reserving ctr name k8s_kube-scheduler_bootstrap-kube-scheduler-bootstrap.ocp4.beu.local_kube-system_b667a15b69c990109211e0bde7f14f40_1 for id c5c9f665d07a0315eff68252905ed4db58837c31bffa67383be7b22836de07d0: name is reserved"
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:58:03.198619    2106 remote_runtime.go:200] CreateContainer in sandbox "b1899005b7bafbce48aa9f48a28a3df597be389214a75e2ac87994ad4b7b2ee4" from runtime service failed: rpc error: code = Unknown desc = error reserving ctr name k8s_kube-scheduler_bootstrap-kube-scheduler-bootstrap.ocp4.beu.local_kube-system_b667a15b69c990109211e0bde7f14f40_1 for id c5c9f665d07a0315eff68252905ed4db58837c31bffa67383be7b22836de07d0: name is reserved
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:58:03.198688    2106 kuberuntime_manager.go:803] container start failed: CreateContainerError: error reserving ctr name k8s_kube-scheduler_bootstrap-kube-scheduler-bootstrap.ocp4.beu.local_kube-system_b667a15b69c990109211e0bde7f14f40_1 for id c5c9f665d07a0315eff68252905ed4db58837c31bffa67383be7b22836de07d0: name is reserved
Jul 06 16:58:03 bootstrap.ocp4.beu.local hyperkube[2106]: E0706 16:58:03.198727    2106 pod_workers.go:191] Error syncing pod b667a15b69c990109211e0bde7f14f40 ("bootstrap-kube-scheduler-bootstrap.ocp4.beu.local_kube-system(b667a15b69c990109211e0bde7f14f40)"), skipping: failed to "StartContainer" for "kube-scheduler" with CreateContainerError: "error reserving ctr name k8s_kube-scheduler_bootstrap-kube-scheduler-bootstrap.ocp4.beu.local_kube-system_b667a15b69c990109211e0bde7f14f40_1 for id c5c9f665d07a0315eff68252905ed4db58837c31bffa67383be7b22836de07d0: name is reserved"
Jul 06 16:58:03 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2545] failed to fetch discovery: Unauthorized
Jul 06 16:58:03 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2546] failed to fetch discovery: Unauthorized
Jul 06 16:58:03 bootstrap.ocp4.beu.local bootkube.sh[2125]: [#2547] failed to fetch discovery: Unauthorized

BTW, I don't understand how the certificates for https://api-int.ocp4.beu.local:22623/config/master work. The masters keep saying the certs are invalid, and even a curl to that URL fails certificate validation.
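For what it's worth, a hedged way to see exactly which certificate the machine config server is presenting on that port (hostname taken from the logs above). That endpoint is served with a certificate signed by the cluster's internally generated CA, so a plain curl without that CA will always complain:

# show subject, issuer, and validity dates of the cert served on the MCS port
echo | openssl s_client -connect api-int.ocp4.beu.local:22623 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates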

ansible-playbook command fails to fetch the required binaries

With networkifacename assigned to an interface that is connected to the external network, the ansible-playbook command fails to fetch the required binaries:

# ansible-playbook -e @vars.yaml tasks/main.yml fails while downloading the packages:
TASK [Downloading OCP4 installer Bios] **************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "dest": "/var/www/html/install/bios.raw.gz", "elapsed": 10, "gid": 0, "group": "root", "mode": "0555", "msg": "Request failed: <urlopen error [Errno -2] Name or service not known>", "owner": "root", "secontext": "system_u:object_r:httpd_sys_content_t:s0", "size": 175505408, "state": "file", "uid": 0, "url": "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/4.3.0/rhcos-4.3.0-x86_64-metal.raw.gz"}

How to reproduce?

  • Reboot a physical machine and, once the console is up, try wget; it works fine.
  • Provide an interface that is connected to the external network in vars.yaml and run the ansible-playbook command.
  • ansible-playbook fails with the error mentioned above.
  • After executing the above command, running wget independently also fails.

Most probably the playbook is modifying the network configuration in some way that causes this behavior.
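A hedged way to check whether the DNS changes are the culprit, given that the playbook leaves the helper resolving through its own named instance (127.0.0.1 in /etc/resolv.conf):

cat /etc/resolv.conf
# does the local named forward external names correctly?
dig +short mirror.openshift.com @127.0.0.1
# compare against an upstream resolver (8.8.8.8 is just an example)
dig +short mirror.openshift.com @8.8.8.8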

Also, for PXE boot to work, I am assuming all the machines need to be connected to the public network via one interface and to a private network via another interface, and that the interfaces connected to the private network are configured by the DHCP server. Is that correct?

RFE: Install helm binary

Since Helm 3 is coming to OCP v4, go ahead and install the binary so it's available.

(For example, the Splunk install uses Helm for now.)
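A hedged sketch of what the install could look like, assuming the OpenShift mirror's helm client path (the URL layout is an assumption; verify it before relying on it):

# download the linux-amd64 helm client from the OpenShift mirror and make it executable
curl -L -o /usr/local/bin/helm \
  https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64
chmod +x /usr/local/bin/helm
helm version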

Compact Cluster Support

Support for "Compact" clusters (3 nodes that are both master and worker) is coming in 4.5

Currently, this playbook doesn't support such design. the HAproxy config must be edited to account for the router running on the masters.

Other things to consider is DNS and DHCP configurations.

failed to provision volume with StorageClass "nfs-storage-provisioner": unable to create directory to provision new pv: permission denied

Thanks a lot for providing this helpernode code! It's given us a huge head start on deploying OpenShift 4.

I'm having a problem that's very similar to #35. I also added a separate disk to my VM and mounted it at /export. The exception being that the fix for that issue didn't work for me.

$ cat /etc/exports
/export	*(rw,sync,root_squash)

$ showmount -e localhost
Export list for localhost:
/export *

I deployed the nfs-provisioner

$ helpernodecheck nfs-setup
Now using project "nfs-provisioner" on server "https://api.sandbox.edpay.nz:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app django-psql-example

to build a new example application in Python. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node

Already on project "nfs-provisioner" on server "https://api.sandbox.edpay.nz:6443".
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
securitycontextconstraints.security.openshift.io/hostmount-anyuid added to: ["system:serviceaccount:nfs-provisioner:nfs-client-provisioner"]
deployment.extensions/nfs-client-provisioner created
storageclass.storage.k8s.io/nfs-storage-provisioner created
storageclass.storage.k8s.io/nfs-storage-provisioner annotated
Now using project "default" on server "https://api.sandbox.edpay.nz:6443".

Deployment started; you should monitor it with "oc get pods -n nfs-provisioner"

Everything is fine with that.

$ oc get pods -n nfs-provisioner
NAME                                      READY   STATUS    RESTARTS   AGE
nfs-client-provisioner-7c84684645-grwj7   1/1     Running   0          76s

I create the registry PVC but get an error in the events.

$ oc create -f /usr/local/src/registry-pvc.yaml -n openshift-image-registry
persistentvolumeclaim/registry-pvc created

$ oc describe pvc registry-pvc -n openshift-image-registry
Name:          registry-pvc
Namespace:     openshift-image-registry
StorageClass:  nfs-storage-provisioner
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: nfs-storage
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason                Age                   From                                                                                      Message
  ----     ------                ----                  ----                                                                                      -------
  Normal   Provisioning          39s (x4 over 2m24s)   nfs-storage_nfs-client-provisioner-7c84684645-grwj7_bdf62238-8371-11ea-b836-0a580a800207  External provisioner is provisioning volume for claim "openshift-image-registry/registry-pvc"
  Warning  ProvisioningFailed    39s (x4 over 2m24s)   nfs-storage_nfs-client-provisioner-7c84684645-grwj7_bdf62238-8371-11ea-b836-0a580a800207  failed to provision volume with StorageClass "nfs-storage-provisioner": unable to create directory to provision new pv: mkdir /persistentvolumes/openshift-image-registry-registry-pvc-pvc-52fb821f-61b9-48b1-b23b-8099223c5840: permission denied
  Normal   ExternalProvisioning  13s (x11 over 2m25s)  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "nfs-storage" or manually created by system administrator

I'm deploying OpenShift 4.3.8 and this VM happens to be Oracle Linux 7 so the user is nfsnobody.

$ oc version
Client Version: 4.3.8
Server Version: 4.3.8
Kubernetes Version: v1.16.2

$ cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="7.6"
ID="ol"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Oracle Linux Server 7.6"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:6:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.6

$ ll -d /export
drwxrwxrwx. 3 nfsnobody nfsnobody 4096 Mar  2 17:10 /export

$ sudo restorecon -Rv /export
restorecon reset /export context system_u:object_r:unlabeled_t:s0->system_u:object_r:usr_t:s0
restorecon reset /export/lost+found context system_u:object_r:unlabeled_t:s0->system_u:object_r:usr_t:s0

$ ls -ldZ /export/
drwxrwxrwx. nfsnobody nfsnobody system_u:object_r:usr_t:s0       /export/

I can hop into the nfs-client-provisioner pod and poke around a bit.

$ oc project nfs-provisioner
Already on project "nfs-provisioner" on server "https://api.sandbox.edpay.nz:6443".

$ oc rsh pod/nfs-client-provisioner-7c84684645-grwj7

/ # whoami
root

/ # ls -ld /persistentvolumes/
drwxrwxrwx    3 nobody   nobody        4096 Mar  2 04:10 /persistentvolumes/

/ # ls /persistentvolumes/
lost+found

/ # touch /persistentvolumes/hello.txt
touch: /persistentvolumes/hello.txt: Permission denied

/ # mkdir /persistentvolumes/test
mkdir: can't create directory '/persistentvolumes/test': Permission denied

So /export is owned by nfsnobody as it should be in OL 7 but /persistentvolumes is owned by nobody in the nfs-client-provisioner pod.

  1. Could that be the source of the "permission denied" issue?
  2. If so, any suggestions on how to troubleshoot or fix it?
  3. If not, any ideas on what else it could be?

I tried changing the owner of /export to nobody and poked around in the pod again but /persistentvolumes is owned by 99 in the nfs-client-provisioner pod.

$ sudo chown nfsnobody:nfsnobody -R /export

$ oc rsh pod/nfs-client-provisioner-7c84684645-grwj7

/ # ls -al
...
drwxrwxrwx    3 99       99            4096 Mar  2 04:10 persistentvolumes

/ # ls /persistentvolumes/
ls: can't open '/persistentvolumes/': Permission denied

I reverted back to /export being owned by nfsnobody.

Can OCP4.3 support ppc64le baremetal server?

We have several IBM POWER8/POWER9 OpenPOWER servers (whose bootloader is petitboot).

When we deploy CoreOS on them, the install process completes successfully. But when the server reboots, it cannot boot from the hard disk, as petitboot cannot parse out the configuration.

BTW, if we deploy CoreOS on KVM-based VMs on one of these servers, it loads successfully, as the VMs use GRUB2 as the bootloader.

So, the question is: are OpenPOWER bare metal servers supported by OCP 4.3 and higher? Thanks!

RFE: Network Interface should be dynamic, unless set

Right now, a user must set the interface name under helper.networkifacename when setting up the vars.yaml file.

I propose we set this to the default interface provided by ansible using {{ ansible_default_ipv4.interface }}

Although I think we should have a mechanism to let the user override this. It should go...

  • The playbook will use {{ ansible_default_ipv4.interface }} if helper.networkifacename is not provided
  • The playbook will use helper.networkifacename if provided, overriding {{ ansible_default_ipv4.interface }}

RFE: Update for EL8

This playbook should be updated for EL (RHEL/CentOS) 8.

The side effect would be using a newer version of ansible.

RFE: Create ssh-keys

Have the playbook generate SSH keys. This will give the user the option to use this one (or they can use another one).

  • Use openssh_keypair (ONLY available in ansible 2.8)
  • You can also use ssh-keygen -t rsa -b 4096 -C "root@helper" -N "" -f ~/.ssh/id_rsa as well

NFS Fails to work on OCP 4.4.3 - The Deployment "nfs-client-provisioner" is invalid

When running helpernodecheck nfs-setup I don't remember seeing any errors, but I couldn't get PVCs to create new PVs.

I did some investigating, and ran these commands:

[root@helper src]# oc project nfs-provisioner
Already on project "nfs-provisioner" on server "https://api.ocp4.melrose.wawak.org:6443".
[root@helper src]# oc version
Client Version: openshift-clients-4.3.0-201910250623-88-g6a937dfe
Server Version: 4.4.3
Kubernetes Version: v1.17.1
[root@helper src]# oc create -f nfs-provisioner-deployment.yaml
The Deployment "nfs-client-provisioner" is invalid:
* spec.selector: Required value
* spec.template.metadata.labels: Invalid value: map[string]string{"app":"nfs-client-provisioner"}: `selector` does not match template `labels`
[root@helper src]# oc project
Using project "nfs-provisioner" on server "https://api.ocp4.melrose.wawak.org:6443".
[root@helper src]# oc get all
No resources found in nfs-provisioner namespace.

I'm not sure exactly what's going on, but there seems to be some sort of incompatibility between 4.4.3 and the nfs-provisioner stuff.

Do you need a must gather? This environment worked perfectly with the nfs-client and 4.3.
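For context, the error is standard apps/v1 Deployment validation: spec.selector is now mandatory and must match the pod template labels. A hedged sketch of the stanza that would need to appear in nfs-provisioner-deployment.yaml (label value taken from the error message; the rest of the file stays as generated):

# print the stanza the apps/v1 API is asking for; merge it into the Deployment spec by hand
cat <<'EOF'
spec:
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
EOF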

Helper name resolution config always adds FQDN

To use ocp4_helpernode in my CI, I had to multihome the helper by adding another interface in the kickstart file and then specifying eth1 in the vars.

network  --bootproto=static --device=eth0 --gateway= 1.2.3.1 --ip=1.2.3.4 --nameserver=5.6.7.8 --netmask=255.255.252.0 --ipv6=auto --activate
network  --bootproto=static --device=eth1 --gateway=NETWORK.1 --ip=NETWORK.77 --nameserver=5.6.7.8 --netmask=255.255.255.0 --ipv6=auto --activate

Note: NETWORK is replaced by the CI

All that works great, but I found I had to go in after running your Ansible and set the resolver config to:

[root@helper ~]# cat /etc/resolv.conf
search ci3dmymdepagxtz.foo.bar.com
nameserver 127.0.0.1
nameserver 5.6.7.8
[root@helper ~]#

When I'm on the helper and try to ssh -i ~/.ssh/key it says:

ssh: Could not resolve hostname .ci3dmymdepagxtz.foo.bar.com: Name or service not known

But if I just use worker0, it works.

RFE: Add a container registry

Add the ability to host a container registry. This can be helpful for disconnected installs or any ancillary container images one may want to host.

Here are some steps for a registry setup.

After those steps you can create a systemd service unit file to start it on boot:

#
# Copy (chmod 664) to /etc/systemd/system/poc-registry.service
#
# systemctl daemon-reload
# systemctl start poc-registry
# systemctl status poc-registry
# systemctl enable poc-registry
#

[Unit]
Description=OpenShift POC HTTP for PXE Config
After=network.target syslog.target

[Service]
Type=simple
TimeoutStartSec=5m
ExecStartPre=-/usr/bin/podman rm "poc-registry"

ExecStart=/usr/bin/podman run   --name poc-registry -p 5000:5000 \
                                -v /opt/registry/data:/var/lib/registry:z \
                                -v /opt/registry/auth:/auth:z \
                                -e "REGISTRY_AUTH=htpasswd" \
                                -e "REGISTRY_AUTH_HTPASSWD_REALM=Realm" \
                                -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
                                -v /opt/registry/certs:/certs:z \
                                -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
                                -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
                                docker.io/library/registry:2

ExecReload=-/usr/bin/podman stop "poc-registry"
ExecReload=-/usr/bin/podman rm "poc-registry"
ExecStop=-/usr/bin/podman stop "poc-registry"
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
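The unit above assumes /opt/registry/{auth,certs,data} already exist, along with an htpasswd file and a TLS key/cert matching the paths it mounts. A hedged sketch of that prep (user name, password, and certificate subject are placeholders):

mkdir -p /opt/registry/{auth,certs,data}
yum -y install httpd-tools podman
# htpasswd file the unit mounts at /auth (bcrypt, as the registry image expects)
htpasswd -bBc /opt/registry/auth/htpasswd dummyuser dummypassword
# self-signed cert/key matching the REGISTRY_HTTP_TLS_* paths in the unit
openssl req -newkey rsa:4096 -nodes -sha256 \
  -keyout /opt/registry/certs/domain.key \
  -x509 -days 365 -out /opt/registry/certs/domain.crt \
  -subj "/CN=registry.example.com"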

PROPOSAL: Make the helpernode HA

Proposal

Use Red Hat Cluster Suite (pacemaker/corosync) to make the helpernode truly HA.

Background

The helpernode was originally intended as a "teaching tool" to learn how to install OpenShift 4. I believe it's time to grow it beyond that and make it a viable tool to run in production.

failed to initialize the cluster: Cluster operator monitoring is still updating

Seeing install errors between CI runs of yesterday and today. Nothing has changed in my scripts, but I see that OCP 4.3.8 is now being used. I moved from CentOS 7 to CentOS 8, but no joy.

ssh [email protected]
helper.ciirzlitbtzrx8b.barnacle.foo.com
[root@helper ~]# tail ocp4/.openshift_install.log
time="2020-04-08T09:35:34-07:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.3.0: 100% complete, waiting on authentication, monitoring"
time="2020-04-08T09:35:35-07:00" level=debug msg="Cluster is initialized"
time="2020-04-08T09:35:35-07:00" level=info msg="Waiting up to 10m0s for the openshift-console route to be created..."
time="2020-04-08T09:35:35-07:00" level=debug msg="Route found in openshift-console namespace: console"
time="2020-04-08T09:35:35-07:00" level=debug msg="Route found in openshift-console namespace: downloads"
time="2020-04-08T09:35:35-07:00" level=debug msg="OpenShift console route is created"
time="2020-04-08T09:35:35-07:00" level=info msg="Install complete!"
time="2020-04-08T09:35:35-07:00" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/ocp4/auth/kubeconfig'"
time="2020-04-08T09:35:35-07:00" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.ciirzlitbtzrx8b.barnacle.foo.com"
time="2020-04-08T09:35:35-07:00" level=info msg="Login to the console with user: kubeadmin, password: jua6j-ZggE9-nzyCA-2CYIa"
[root@helper ~]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0     True        False         False      43h
cloud-credential                           4.3.0     True        False         False      44h
cluster-autoscaler                         4.3.0     True        False         False      43h
console                                    4.3.0     True        False         False      43h
dns                                        4.3.0     True        False         False      44h
image-registry                             4.3.0     True        False         False      43h
ingress                                    4.3.0     True        False         False      43h
insights                                   4.3.0     True        False         False      44h
kube-apiserver                             4.3.0     True        False         False      44h
kube-controller-manager                    4.3.0     True        False         False      44h
kube-scheduler                             4.3.0     True        False         False      44h
machine-api                                4.3.0     True        False         False      44h
machine-config                             4.3.0     True        False         False      44h
marketplace                                4.3.0     True        False         False      43h
monitoring                                 4.3.0     True        False         False      24h
network                                    4.3.0     True        False         False      44h
node-tuning                                4.3.0     True        False         False      43h
openshift-apiserver                        4.3.0     True        False         False      43h
openshift-controller-manager               4.3.0     True        False         False      44h
openshift-samples                          4.3.0     True        False         False      43h
operator-lifecycle-manager                 4.3.0     True        False         False      44h
operator-lifecycle-manager-catalog         4.3.0     True        False         False      44h
operator-lifecycle-manager-packageserver   4.3.0     True        False         False      24h
service-ca                                 4.3.0     True        False         False      44h
service-catalog-apiserver                  4.3.0     True        False         False      43h
service-catalog-controller-manager         4.3.0     True        False         False      43h
storage                                    4.3.0     True        False         False      43h

ssh [email protected]
[root@helper ~]# hostname -f

helper.cias5f8wlj0kea8.barnacle.foo.com
[root@helper ~]# tail ocp4/.openshift_install.log
time="2020-04-09T23:52:31-07:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.3.8: 100% complete, waiting on monitoring"
time="2020-04-09T23:55:34-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2020-04-09T23:58:35-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2020-04-10T00:04:30-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2020-04-10T00:10:36-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2020-04-10T00:10:48-07:00" level=info msg="Cluster operator insights Disabled is False with : "
time="2020-04-10T00:10:48-07:00" level=info msg="Cluster operator monitoring Available is False with : "
time="2020-04-10T00:10:48-07:00" level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
time="2020-04-10T00:10:48-07:00" level=error msg="Cluster operator monitoring Degraded is True with UpdatingPrometheusK8SFailed: Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus: expected 2 replicas, updated 1 and available 1"
time="2020-04-10T00:10:48-07:00" level=fatal msg="failed to initialize the cluster: Cluster operator monitoring is still updating"
[root@helper ~]#  oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.8     True        False         False      5h39m
cloud-credential                           4.3.8     True        False         False      5h55m
cluster-autoscaler                         4.3.8     True        False         False      5h46m
console                                    4.3.8     True        False         False      5h41m
dns                                        4.3.8     True        False         False      5h50m
image-registry                             4.3.8     True        False         False      5h47m
ingress                                    4.3.8     True        False         False      5h46m
insights                                   4.3.8     True        False         False      5h51m
kube-apiserver                             4.3.8     True        False         False      5h50m
kube-controller-manager                    4.3.8     True        False         False      5h48m
kube-scheduler                             4.3.8     True        False         False      5h48m
machine-api                                4.3.8     True        False         False      5h51m
machine-config                             4.3.8     True        False         False      5h50m
marketplace                                4.3.8     True        False         False      5h46m
monitoring                                           False       True          True       5h42m
network                                    4.3.8     True        False         False      5h51m
node-tuning                                4.3.8     True        False         False      5h48m
openshift-apiserver                        4.3.8     True        False         False      5h47m
openshift-controller-manager               4.3.8     True        False         False      5h49m
openshift-samples                          4.3.8     True        False         False      5h46m
operator-lifecycle-manager                 4.3.8     True        False         False      5h48m
operator-lifecycle-manager-catalog         4.3.8     True        False         False      5h48m
operator-lifecycle-manager-packageserver   4.3.8     True        False         False      5h47m
service-ca                                 4.3.8     True        False         False      5h51m
service-catalog-apiserver                  4.3.8     True        False         False      5h48m
service-catalog-controller-manager         4.3.8     True        False         False      5h48m
storage                                    4.3.8     True        False         False      5h47m

how to set ocp node hostname?

How do I set the OCP node hostname? Via the DHCP server? I added a new node and a new DHCP line, but the hostname is the default, localhost.

/etc/dhcp/dhcpd.conf:

host worker11 { hardware ethernet 00:50:56:b7:60:88; fixed-address 172.31.20.88; }
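One hedged option, assuming ISC dhcpd on the helper: the host stanza can also hand the node its hostname via option host-name, which RHCOS should pick up from DHCP at boot:

# replace the existing host worker11 stanza with something like this, then restart dhcpd
cat <<'EOF'
host worker11 {
  hardware ethernet 00:50:56:b7:60:88;
  fixed-address 172.31.20.88;
  option host-name "worker11";
}
EOF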

Approach to create rhcos master & worker machine

Hi,

I have created the RHCOS bootstrap machine using the "bootstrap.ign" file. Now I want to create a master and a worker machine, but the steps for this are not clearly described anywhere. Can you please confirm whether we can follow the approach below:
"Create the master and worker machines in the same way the bootstrap node was created, but provide "master.ign" for the master machine and "worker.ign" for the worker machine during first boot."

Or please suggest if we need to follow any other method.

RFE: Move RHCOS installation target disk to the masters/workers/bootstrap section

Currently there's a global setting for which disk RHCOS is installed to. This setting eventually makes its way to the PXE config as kernel parameters.

We should have this as a specific setting to each node section. Example:

bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "52:54:00:60:72:67"
  disk: vda
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "52:54:00:e7:9d:67"
    disk: vda
  - name: "master1"
    ipaddr: "192.168.7.22"
    macaddr: "52:54:00:80:16:23"
    disk: vda
  - name: "master2"
    ipaddr: "192.168.7.23"
    macaddr: "52:54:00:d5:1c:39"
    disk: vda
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "52:54:00:f4:26:a1"
    disk: vda
  - name: "worker1"
    ipaddr: "192.168.7.12"
    macaddr: "52:54:00:82:90:00"
    disk: vda
  - name: "worker2"
    ipaddr: "192.168.7.13"
    macaddr: "52:54:00:8e:10:34"
    disk: vda

This way you can have a "mixed disk" environment (for example, one of your workers has NVMe disks, or you're using FC and each node has a UUID as the disk name).
