Comments (5)
It seems in baremetal deployments it is required to run:
Just to clarify: the workaround is not needed on all baremetal deployments. It is needed only on deployments where the masters' hostnames don't conform to the "master-" pattern. In most cases this happens when an environment doesn't use the dev-scripts DNS (MANAGE_BR_BRIDGE=n). We have baremetal deployments out there that use the libvirt DNS, and these should be fine.
from dev-scripts.
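A quick way to see whether an environment falls into the affected group is to check the master hostnames against the prefix mentioned above. This is only a minimal sketch: the function name `nonconforming_masters` is hypothetical, and treating "conforms" as "starts with `master-`" is an assumption, since the exact pattern dev-scripts expects is not spelled out in this thread.

```python
def nonconforming_masters(hostnames, prefix="master-"):
    """Return the hostnames that do not start with the expected prefix.

    Assumption: a conforming master name simply begins with "master-".
    The real check in dev-scripts may be stricter.
    """
    return [name for name in hostnames if not name.startswith(prefix)]


if __name__ == "__main__":
    # Node names taken from the `oc get nodes` output in this issue.
    nodes = [
        "kni1-c13u23.cloud.lab.eng.bos.redhat.com",
        "kni1-c13u25.cloud.lab.eng.bos.redhat.com",
        "kni1-c13u27.cloud.lab.eng.bos.redhat.com",
    ]
    for name in nonconforming_masters(nodes):
        print(f"{name}: does not match 'master-', workaround likely needed")
```

If the list comes back empty, the deployment is probably one of those that should be fine without the workaround.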
Current deployment logs:
level=info msg="API v1.13.4+af45cda up"
level=info msg="Waiting up to 1h0m0s for the bootstrap-complete event..."
level=debug msg="added kube-controller-manager.15945f50e4fbad81: comlog1.cloud.lab.eng.bos.redhat.com_b9976643-5c36-11e9-9d47-7efaccbf92ca became leader"
level=debug msg="added kube-scheduler.15945f511cc63729: comlog1.cloud.lab.eng.bos.redhat.com_b9960ca4-5c36-11e9-af59-7efaccbf92ca became leader"
It seems the kube-apiserver-* and kube-controller-manager-* pods fail:
$ oc get nodes
NAME                                       STATUS   ROLES    AGE     VERSION
kni1-c13u23.cloud.lab.eng.bos.redhat.com   Ready    master   2m25s   v1.13.4+1ad602308
kni1-c13u25.cloud.lab.eng.bos.redhat.com   Ready    master   2m32s   v1.13.4+1ad602308
kni1-c13u27.cloud.lab.eng.bos.redhat.com   Ready    master   2m18s   v1.13.4+1ad602308
$ oc get pods --all-namespaces | grep -v -E 'Running|Complete'
NAMESPACE                           NAME                                                              READY   STATUS             RESTARTS   AGE
openshift-kube-apiserver            kube-apiserver-kni1-c13u25.cloud.lab.eng.bos.redhat.com           1/2     CrashLoopBackOff   2          112s
openshift-kube-controller-manager   kube-controller-manager-kni1-c13u25.cloud.lab.eng.bos.redhat.com  0/1     CrashLoopBackOff   3          95s
The controller manager fails because it cannot reach the API server:
$ oc logs kube-controller-manager-kni1-c13u25.cloud.lab.eng.bos.redhat.com -n openshift-kube-controller-manager
...
I0411 09:18:02.320212 1 glog.go:58] FLAGSET: csrsigning controller
I0411 09:18:02.320216 1 flags.go:33] FLAG: --cluster-signing-cert-file="/etc/kubernetes/static-pod-resources/secrets/csr-signer/tls.crt"
I0411 09:18:02.320220 1 flags.go:33] FLAG: --cluster-signing-key-file="/etc/kubernetes/static-pod-resources/secrets/csr-signer/tls.key"
I0411 09:18:02.320224 1 flags.go:33] FLAG: --experimental-cluster-signing-duration="720h0m0s"
I0411 09:18:02.780578 1 serving.go:312] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp [::1]:6443: connect: connection refused
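The "connection refused" on localhost:6443 above can be confirmed independently of the controller manager with a plain TCP probe run on the affected master. The following is a rough sketch, not part of dev-scripts; the helper name `api_port_open` is hypothetical, and the default host and port are simply the ones from the log line.

```python
import socket


def api_port_open(host="localhost", port=6443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only checks that something is listening. It does not perform
    a TLS handshake or verify that the API server is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution errors.
        return False


if __name__ == "__main__":
    state = "open" if api_port_open() else "closed or unreachable"
    print(f"localhost:6443 is {state}")
```

A closed port here is consistent with the kube-apiserver container crash-looping, which points at the API server logs as the next thing to inspect.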
The API server fails because it cannot reach etcd (DNS lookups for the etcd hostnames return "no such host"):
$ oc logs kube-apiserver-kni1-c13u25.cloud.lab.eng.bos.redhat.com -n openshift-kube-apiserver -c kube-apiserver-1
...
I0411 08:56:09.357815 1 server.go:568] external host was not specified, using 10.19.138.12
I0411 08:56:09.358129 1 server.go:611] Initializing cache sizes based on 0MB limit
I0411 08:56:09.358226 1 server.go:146] Version: v1.13.4+af45cda
I0411 08:56:09.716079 1 clientca.go:92] [0] "/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt" client-ca certificate: "aggregator-signer" [] issuer="<self>" (2019-04-11 08:25:47 +0000 UTC to 2019-04-12 08:25:47 +0000 UTC (now=2019-04-11 08:56:09.716056853 +0000 UTC))
I0411 08:56:09.716568 1 clientca.go:92] [0] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "kube-ca" [] issuer="<self>" (2019-04-11 08:25:44 +0000 UTC to 2029-04-08 08:25:44 +0000 UTC (now=2019-04-11 08:56:09.716557964 +0000 UTC))
I0411 08:56:09.716596 1 clientca.go:92] [1] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "admin-kubeconfig-signer" [] issuer="<self>" (2019-04-11 08:25:44 +0000 UTC to 2029-04-08 08:25:44 +0000 UTC (now=2019-04-11 08:56:09.716578289 +0000 UTC))
I0411 08:56:09.716607 1 clientca.go:92] [2] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "kubelet-signer" [] issuer="<self>" (2019-04-11 08:25:49 +0000 UTC to 2019-04-12 08:25:49 +0000 UTC (now=2019-04-11 08:56:09.716601658 +0000 UTC))
...
I0411 08:56:09.728516 1 resolver_conn_wrapper.go:116] ccResolverWrapper: sending new addresses to cc: [{etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 0 <nil>}]
I0411 08:56:09.728672 1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>} {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>} {etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>}]
W0411 08:56:09.730413 1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-1.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
...
W0411 08:56:13.480785 1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-1.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
W0411 08:56:13.509333 1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-0.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
W0411 08:56:13.621924 1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-2.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
...
W0411 08:58:49.954815 1 clientconn.go:953] Failed to dial etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
F0411 08:58:49.954775 1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 openshift.io [https://etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 https://etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 https://etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379] /etc/kubernetes/static-pod-resources/secrets/etcd-client/tls.key /etc/kubernetes/static-pod-resources/secrets/etcd-client/tls.crt /etc/kubernetes/static-pod-resources/configmaps/etcd-serving-ca/ca-bundle.crt true 0xc420a0fdd0 <nil> 5m0s 1m0s}), err (dial tcp: lookup etcd-2.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host)
W0411 08:58:49.954833 1 clientconn.go:953] Failed to dial etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
W0411 08:58:49.954859 1 clientconn.go:953] Failed to dial etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
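To summarize which names fail to resolve, and against which resolver, the "no such host" lines in a log like the one above can be tallied with a small script. This is a sketch under the assumption that the failures always follow the `lookup <host> on <resolver>: no such host` wording seen here; `unresolved_hosts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the Go resolver error text seen in the kube-apiserver log.
LOOKUP_FAILURE = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>\S+): no such host"
)


def unresolved_hosts(log_text):
    """Count 'no such host' failures per (hostname, resolver) pair."""
    counts = Counter()
    for match in LOOKUP_FAILURE.finditer(log_text):
        counts[(match.group("host"), match.group("resolver"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: oc logs <pod> -n <namespace> -c <container> | python this_script.py
    for (host, resolver), n in sorted(unresolved_hosts(sys.stdin.read()).items()):
        print(f"{host} via {resolver}: {n} failed lookups")
```

In the log above, all three etcd-N names fail against the same resolver (10.19.143.247:53), which suggests the records are missing from that DNS server rather than one host being misconfigured.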
This is still required, right?
@celebdor Could you clarify for us (1) what's needed to fix this permanently, and (2) if we could remove this from the create cluster script now and require environments that need it to just run the fix_ep.sh script at the right time? Or is it needed on all deployments?
We'd like to remove the early exit from the installer, but this patch_ep_host_etcd code is one of the things blocking it (see openshift-metal3/kni-installer#60).
Removed in #620.