Comments (14)
/cc @mcornea @achuzhoy @celebdor
from dev-scripts.
Thanks for the report. Can you provide the keepalived logs from all masters, please?
@hardys logs from the keepalived containers
Worth mentioning that both master and worker nodes end up in NotReady status:
oc get nodes
NAME STATUS ROLES AGE VERSION
master-0 NotReady master 3d2h v1.13.4+1ad602308
master-1 NotReady master 3d2h v1.13.4+1ad602308
master-2 NotReady master 3d2h v1.13.4+1ad602308
worker-0 NotReady worker 2d22h v1.13.4+1ad602308
I hit this in my deployment as well, but only master-2 went to NotReady. Otherwise it is exactly the problem described in this issue.
@yprokule, it seems there is an L2 connectivity issue for 192.168.123.5 (as ping doesn't work).
- Could you please run the same test for 192.168.123.6 (DNS) and 192.168.123.10 (INGRESS)?
- Could you please attach the output of the ARP table (arp -a) from all nodes?
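The reachability test requested above could be scripted roughly as follows. This is only a sketch: the function name `check_vips` and the `PING` override variable are hypothetical and not part of dev-scripts. The addresses are the ones named in this thread.

```shell
# Hypothetical helper: ping each address once and report the result.
# PING is an assumed override hook so the loop can be dry-run without
# network access; it defaults to the real ping binary.
check_vips() {
    for vip in "$@"; do
        if ${PING:-ping} -c1 -W2 "$vip" >/dev/null 2>&1; then
            echo "$vip reachable"
        else
            echo "$vip unreachable"
        fi
    done
}
```

On each node this might be run as `check_vips 192.168.123.5 192.168.123.6 192.168.123.10`, followed by `arp -a`. Any non-zero exit from ping, including a local failure such as `connect: Invalid argument`, is reported as unreachable.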
master-0
[root@master-0 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.029 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms
[root@master-0 ~]# ping -c1 192.168.123.5
connect: Invalid argument
master-1
[root@master-1 ~]# ping -c1 192.168.123.5
PING 192.168.123.5 (192.168.123.5) 56(84) bytes of data.
64 bytes from 192.168.123.5: icmp_seq=1 ttl=64 time=0.174 ms
--- 192.168.123.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.174/0.174/0.174/0.000 ms
[root@master-1 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.213 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.213/0.213/0.213/0.000 ms
master-2
[root@master-2 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.163 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.163/0.163/0.163/0.000 ms
[root@master-2 ~]# ping -c1 192.168.123.5
PING 192.168.123.5 (192.168.123.5) 56(84) bytes of data.
64 bytes from 192.168.123.5: icmp_seq=1 ttl=64 time=0.030 ms
--- 192.168.123.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.030/0.030/0.030/0.000 ms
On the other nodes:
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.413449 121307 kubelet.go:2273] node "master-2" not found
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.497345 121307 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.498449 121307 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope
@yprokule, I think I found something.
master-0 doesn't hold the API VIP (192.168.123.5), but I can still see the following host entry in its routing table:
192.168.123.5 dev ens4 proto kernel scope link src 192.168.123.5 metric 101
So when master-0 tries to send any packet to 192.168.123.5, the network stack fails with 'connect: Invalid argument'.
I deleted the 192.168.123.5 route from master-0, and now I'm able to ping 192.168.123.5.
[core@master-0 ~]$ sudo ip route del 192.168.123.5/32
[core@master-0 ~]$ ping 192.168.123.5
PING 192.168.123.5 (192.168.123.5) 56(84) bytes of data.
64 bytes from 192.168.123.5: icmp_seq=1 ttl=64 time=0.170 ms
64 bytes from 192.168.123.5: icmp_seq=2 ttl=64 time=0.098 ms
64 bytes from 192.168.123.5: icmp_seq=3 ttl=64 time=0.252 ms
64 bytes from 192.168.123.5: icmp_seq=4 ttl=64 time=0.210 ms
^C
--- 192.168.123.5 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 92ms
rtt min/avg/max/mdev = 0.098/0.182/0.252/0.058 ms
[core@master-0 ~]$
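The condition found above (a host route whose `src` is the VIP itself, on a node that no longer holds the VIP) could be detected with something like the following. It is a minimal sketch, not the fix that was eventually merged: the function name `find_stale_vip_routes` is hypothetical, and it takes the text of the routing table and the address list as arguments so it can be tried without touching a live node.

```shell
# Hypothetical helper: print the destination of every host route of the form
#   <ip> dev <if> proto kernel scope link src <ip> ...
# whose <ip> does not appear in the given address listing.
find_stale_vip_routes() {
    routes=$1
    addrs=$2
    printf '%s\n' "$routes" | awk '
        # Host routes print without a prefix length; keep those whose
        # "src" value equals the destination itself.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            for (i = 2; i < NF; i++)
                if ($i == "src" && $(i + 1) == $1) print $1
        }' | while read -r dest; do
        # The route is stale if no interface still carries the address.
        if ! printf '%s\n' "$addrs" | grep -qF "inet $dest/"; then
            printf '%s\n' "$dest"
        fi
    done
}
```

On a node, it might be called as `find_stale_vip_routes "$(ip route show)" "$(ip -o -4 addr show)"`. Each address it prints is a candidate for `sudo ip route del <address>/32`, as done manually above.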
Yep, deleting the incorrect route fixed the NotReady state of the node, which was simply unable to reach the API to report its status.
I see @karmab has a WIP patch for this here: #369
This seems like a RHCOS/RHEL bug. I filed a BZ for it: https://bugzilla.redhat.com/show_bug.cgi?id=1700415
A different workaround is here: #377
We've got an open bug tracking the kernel issue. In the meantime, we've updated our config so that the undeleted route no longer causes a problem. See #377.