
Comments (17)

canit00 avatar canit00 commented on June 9, 2024

@niteshnarayanlal TY! Is the system you're running this playbook on dual-homed?

from ocp4-helpernode.

niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

I think we can say that. My setup has 2 NICs at the moment, and both of them are connected to the external network via the same router.


canit00 avatar canit00 commented on June 9, 2024

Can you please provide the output of netstat -rn and ip route so I can test on my end?


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

[root@lab filetranspiler]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.x.xx.xxx     0.0.0.0         UG        0 0          0 eno1
0.0.0.0         10.x.xx.xxx     0.0.0.0         UG        0 0          0 eno2
10.x.0.0        0.0.0.0         255.255.224.0   U         0 0          0 eno1
10.x.0.0        0.0.0.0         255.255.224.0   U         0 0          0 eno2
10.xx.0.0       0.0.0.0         255.255.0.0     U         0 0          0 cni-podman0

[root@lab filetranspiler]# ip route
default via 10.x.xx.xxx dev eno1 proto dhcp metric 100
default via 10.x.xx.xxx dev eno2 proto dhcp metric 101
10.x.0.0/19 dev eno1 proto kernel scope link src 10.x.x.xxx metric 100
10.x.0.0/19 dev eno2 proto kernel scope link src 10.x.xx.xxx metric 101
10.xx.0.0/16 dev cni-podman0 proto kernel scope link src 10.xx.0.1 linkdown
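
The two default routes above differ only in their metric, and the kernel prefers the lower one (eno1 at 100). As a minimal sketch, assuming you want to pull that ordering out of pasted ip route output, something like the following could work. The function name default_routes is hypothetical, and it reads the text on stdin rather than calling ip itself:

```shell
# default_routes: read `ip route` output on stdin and print
# "<device> <metric>" for each default route, lowest metric first.
default_routes() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0            # a route without a metric is treated as 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1)
      }
      print dev, metric
    }
  ' | sort -k2,2n
}

# Usage on a live system (not run here):
#   ip route | default_routes
```

The first line printed is the interface that outbound traffic without a more specific route would normally use.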

Please note: For now in my setup, I have manually downloaded all the packages and removed the downloading sections from main.yaml to proceed.

Also, can you please confirm:
For the PXE boot to work, I am assuming all the machines need to be connected to the public network via one interface and to a private network via another. The interfaces that are connected to the private network are going to be configured by the DHCP server. Is that correct?


canit00 avatar canit00 commented on June 9, 2024

You can configure DHCP to listen on any interface of the helper node, as long as the control plane and compute nodes can connect to it. And your assumption is correct: when the RHCOS nodes are bootstrapped, they will need Internet access to download the container images needed to deploy and run OpenShift. If they do not have Internet access, you will need to download the offline install.


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

Thanks for the response.
As per the vars.yaml documentation:

"dns.forwarder1 - This will be set up as the DNS forwarder. This is usually one of the corporate (or "upstream") DNS servers.
dns.forwarder2 - This will be set up as the second DNS forwarder. This is usually one of the corporate (or "upstream") DNS servers."

In my case, I have a couple of DNS servers that I picked from resolv.conf (these are on the same network that is providing internet connectivity to the helper).
However, with the above configuration openshift bootstrap installation fails to fetch the version.

[root@lab ocp4]# openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer 4.3.8
DEBUG Built from commit f7a2f7cf9ec3201bb8c9ebb677c05d21c72e3cc5
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: EOF
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: EOF

Following is the state of haproxy:

[root@lab ~]# netstat -nltupe | grep -E ':6443|:22623|:80|:443'
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 0 114077 44421/haproxy
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 0 114078 44421/haproxy
tcp 0 0 0.0.0.0:22623 0.0.0.0:* LISTEN 0 114076 44421/haproxy
tcp 0 0 0.0.0.0:6443 0.0.0.0:* LISTEN 0 114075 44421/haproxy
tcp6 0 0 :::8080 :::* LISTEN 0 136814 49309/httpd
[root@lab ~]# systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2020-04-13 16:25:38 EDT; 1h 13min ago
Main PID: 44420 (haproxy)
Tasks: 2 (limit: 409922)
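
Note that a substring pattern such as ':80' also matches ':8080', which is why the httpd line shows up in the netstat output above. A minimal sketch of a stricter filter, assuming the column layout shown in that output, follows. The function name listeners_on is hypothetical, and it reads pasted netstat -nltupe output on stdin:

```shell
# listeners_on PORT...: read `netstat -nltupe` output on stdin and print
# "<port> <pid/program>" for TCP listeners bound to exactly those ports.
listeners_on() {
  awk -v ports=" $* " '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, parts, ":")       # local address; the port is the last piece
      port = parts[n]
      if (index(ports, " " port " ")) print port, $NF
    }
  '
}

# Usage on a live system (not run here):
#   netstat -nltupe | listeners_on 6443 22623 80 443
```

With the output above, this would list the four haproxy listeners and skip the httpd one on 8080.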

Any suggestions on what might be missing in the haproxy configuration that the helper node does not set up?


christianh814 avatar christianh814 commented on June 9, 2024

By default the haproxy is configured to listen to ALL interfaces on the helpernode https://github.com/RedHatOfficial/ocp4-helpernode/blob/master/templates/haproxy.cfg.j2#L69

You can check the backends by visiting http://${HELPER_IP}:9000 and see the stats page. At the VERY least you should see the bootstrap turn green.

If not, you may have to debug by running ssh core@bootstrap and then the journalctl command that the motd tells you to run.
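
The same check can be sketched from the command line. HAProxy's stats page can generally be exported as CSV by appending ;csv to the stats URI; whether the helper node's stats listener serves it at the path shown in the usage comment is an assumption. The function name haproxy_status is hypothetical. It looks the columns up by header name instead of hard-coding positions:

```shell
# haproxy_status: read HAProxy stats CSV on stdin and print
# "<proxy>/<server>: <status>" for every row.
haproxy_status() {
  awk -F, '
    NR == 1 {
      sub(/^# /, "", $1)              # header line starts with "# pxname"
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NF < 2 { next }                   # skip blank lines
    { print $(col["pxname"]) "/" $(col["svname"]) ": " $(col["status"]) }
  '
}

# Usage on a live system (assumed URL, not run here):
#   curl -s "http://${HELPER_IP}:9000/;csv" | haproxy_status
```

A bootstrap row reporting DOWN here corresponds to the "red" state in the web UI.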


canit00 avatar canit00 commented on June 9, 2024

@niteshnarayanlal sorry Nitesh been out sick.. I'll find time to review the info and get back to you asap.


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

@canit00 sure no problem. Please take care.
@christianh814 I don't think the bootstrap node is even installed. I did check http://${HELPER_IP}:9000. It shows a couple of things:

  • In bootstrap, Layer4 connection problem: Connection refused
  • The backend status is DOWN.

I suspected that SELinux could be a problem, so I explicitly allowed the two ports that are used.
semanage port -a -t http_port_t -p tcp 6443
semanage port -a -t http_port_t -p tcp 22623
But nothing changed.

Backend row from the stats page:
Cum. sessions: 213
Queue time: 0 ms, Connect time: 0 ms, Total time: 0 ms
Response bytes in: 0
Connection resets during transfers: 0 client, 0 server
Status: DOWN for 41m17s


canit00 avatar canit00 commented on June 9, 2024

@niteshnarayanlal I'm setting up an offline environment to help you troubleshoot. However, if you don't mind providing the installation logs, I will be more than happy to review them for you. The only catch is that you'd need to anonymize IPs/hostnames. These instructions give the details of the command below. Basically, point it at the installation directory, where it can find the kubeconfig and the other files the install generated.

As said, in the meantime, I will set up a disconnected environment. Cheers!

openshift-install gather bootstrap --dir=/ocp4/


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

Removing my comment as it is irrelevant. I was not using the right ssh command.
I have access to the bootstrap node.
One quick question: I checked the output of http://${HELPER_IP}:9000 and it shows:

bootstrap row from the stats page:
Cum. sessions: 0
Avg over last 1024 success. conn.: Queue time: 0 ms, Connect time: 0 ms, Total time: 0 ms
Connection resets during transfers: 0 client, 0 server
Status: DOWN for 4h10m, L4CON in 0ms
Layer4 connection problem: Connection refused
Failed Health Checks: 1

I am wondering why it is reporting "connection refused". Is that expected or something is wrong?


christianh814 avatar christianh814 commented on June 9, 2024

> @canit00 I was able to move forward with the installation; the error that was preventing the bootstrap installation was "device is busy".
>
> However, I am unable to access the bootstrap node from my helper:
> [root@helper ocp4]# ssh [email protected]
> ssh: Could not resolve hostname 10.19.159.142.ocp4.example.com: Name or service not known

This is because you need to do ssh bootstrap. There is a ~/.ssh/config that "automagically" connects you using the ssh key generated in the playbook.


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

@christianh814 Yeah just removed that comment. Sorry about the noise.


canit00 avatar canit00 commented on June 9, 2024

The reason you can't resolve hosts within ocp4.example.com is due to your /etc/resolv.conf

Typically when the helper node is deployed successfully your resolv.conf would look similar to this pointing to itself.

cat /etc/resolv.conf 
# Generated by NetworkManager
search ocp4.example.com virt.lab.eng.bos.redhat.com
nameserver 127.0.0.1

Given the forwarders listed in your vars.yaml:

---
disk: sdb
helper:
  name: "dhcp"
  ipaddr: "192.168.1.13"
dns:
  domain: "example.com"
  clusterid: "ocp4"
  forwarder1: "10.19.152.212"
  forwarder2: "10.19.42.41"

It should in turn configure the lab forwarders as follows:

grep -A4 Foward /etc/named.conf 
	/* Fowarders */
	forward only;
	forwarders { 10.19.152.212; 10.19.42.41; };

Please note the typo in /etc/named.conf: Fowarders should be Forwarders. It is a non-issue, and I will get it corrected.
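
A minimal sketch for checking whether a host's resolver points at itself, as in the resolv.conf example above, follows. The function names first_nameserver and uses_local_dns are hypothetical, and both read resolv.conf content on stdin:

```shell
# first_nameserver: read resolv.conf content on stdin and print the
# address on the first "nameserver" line (the one the resolver tries first).
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }'
}

# uses_local_dns: succeed if the first nameserver is the loopback address,
# i.e. the host resolves through its own named instance.
uses_local_dns() {
  [ "$(first_nameserver)" = "127.0.0.1" ]
}

# Usage on a live system (not run here):
#   uses_local_dns < /etc/resolv.conf && echo "resolving via local named"
```

If the check fails on the helper node, names under ocp4.example.com are being looked up on the upstream servers, which do not know about them.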


christianh814 avatar christianh814 commented on June 9, 2024

> Removing my comment as it is irrelevant. I was not using the right ssh command.
> I have access to the bootstrap node.
> One quick question I checked the output of http://${HELPER_IP}:9000 and it shows:
>
> bootstrap row from the stats page:
> Cum. sessions: 0
> Avg over last 1024 success. conn.: Queue time: 0 ms, Connect time: 0 ms, Total time: 0 ms
> Connection resets during transfers: 0 client, 0 server
> Status: DOWN for 4h10m, L4CON in 0ms
> Layer4 connection problem: Connection refused
> Failed Health Checks: 1
>
> I am wondering why it is reporting "connection refused". Is that expected or something is wrong?

It depends where you are in the bootstrap sequence. In the beginning (while it's setting itself up) it should be "red" (in the web UI). Then it'll turn "green" once it's up and bootstrapping the masters. It'll turn "red" again when the masters are finished bootstrapping because the bootstrap node hands over the rest of the install responsibility to the masters.

As @canit00 said, you can send over the logs gathered by openshift-install gather bootstrap --dir=./ocp4/. Or log into the bootstrap node and run the journalctl command it tells you to run.


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

> The reason you can't resolve hosts within ocp4.example.com is due to your /etc/resolv.conf
>
> Typically when the helper node is deployed successfully your resolv.conf would look similar to this pointing to itself.
>
> cat /etc/resolv.conf
> # Generated by NetworkManager
> search ocp4.example.com virt.lab.eng.bos.redhat.com
> nameserver 127.0.0.1
>
> Due to your forwarders listed in the vars.yaml
>
> ---
> disk: sdb
> helper:
>   name: "dhcp"
>   ipaddr: "192.168.1.13"

That's the thing, I don't have a private network.
I have one single network that also provides internet connectivity.
I hope it should be possible to deploy a cluster without an explicit physical network.

> dns:
>   domain: "example.com"
>   clusterid: "ocp4"
>   forwarder1: "10.19.152.212"

This is the IP address of the helper node; not specifying this leads to
[root@helper~]# ssh core@bootstrap
ssh: Could not resolve hostname bootstrap.ocp4.example.com: Name or service not known
Hence, along with the network DNS IP, I have also specified this.

>   forwarder2: "10.19.42.41"
>
> It should in turn configure the lab forwarders as follows:
>
> grep -A4 Foward /etc/named.conf
> /* Fowarders */
> forward only;
> forwarders { 10.19.152.212; 10.19.42.41; };
>
> Please note that there is a **typo** `Fowarders` when it should be `Forwarders` in /etc/named.conf but is a non issue and will get it corrected.

I was able to get the bootstrap, master and worker node installed with ssh access from the helper. However, the bootstrap node for some reason started throwing the following error:

Starting etcd certificate signer...
Error: error creating container storage: the container name "etcd-signer" is already in use by "961d53487462d2750395b0d9fb678864cf58908baa5361b38fc346a7ce644970". You have to remove that container to be able to reuse that name.: that name is already in use
Apr 15 22:47:39 dhcp159-142.lab systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
Apr 15 22:47:39 dhcp159-142.lab.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 15 22:47:44 dhcp159-142.lab.com systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Apr 15 22:47:44 dhcp159-142.lab.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 20.


niteshnarayanlal avatar niteshnarayanlal commented on June 9, 2024

The initial issue that I reported here was that the helper node could not fetch files after running the ansible-playbook command, specifically when the helper node has multiple networks. This was because resolv.conf was getting overwritten with only a 'nameserver 127.0.0.1' entry.
However, I am not able to reproduce this issue consistently, so I am closing it. Thank you for your support.
More recently, I have been failing to bring up the master node: it gets stuck with 'GET error: Get https://api-int.ocp4.example.com:22623/config/master: EOF'. I will probably open a different issue for that.
