

derekhiggins commented on July 30, 2024

It seems in baremetal deployments it is required to run:

Just to clarify, the workaround is not needed on ALL baremetal deployments. It's needed on deployments where the master hostnames don't conform to the "master-" naming. In most cases this happens when an environment doesn't use the dev-scripts DNS (MANAGE_BR_BRIDGE=n). We have baremetal deployments out there that use the libvirt DNS, and these should be fine.
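As a rough illustration of the condition described above, here is a minimal POSIX shell sketch that flags master hostnames not following the "master-" prefix. The function name `needs_etcd_workaround` is hypothetical and not part of dev-scripts, and it assumes that the short hostname (everything before the first dot) is what must start with `master-`.

```bash
# Hypothetical helper (not part of dev-scripts): print each master hostname
# whose short name does not start with "master-", the naming the dev-scripts
# DNS setup assumes. Returns 0 if at least one hostname is non-conforming
# (i.e. the workaround is probably needed), 1 otherwise.
needs_etcd_workaround() {
  local host short rc=1
  for host in "$@"; do
    short=${host%%.*}              # drop the domain part, e.g. ".cloud.lab..."
    case $short in
      master-*) ;;                 # dev-scripts style name, nothing to report
      *) echo "$host"; rc=0 ;;     # non-conforming name
    esac
  done
  return "$rc"
}
```

On a running cluster it could be fed the node names, for example `needs_etcd_workaround $(oc get nodes -o name | sed 's|^node/||')`. With the nodes listed further down in this thread (kni1-c13u23 and so on), all three names would be reported.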


e-minguez commented on July 30, 2024

Current deployment logs:

level=info msg="API v1.13.4+af45cda up"
level=info msg="Waiting up to 1h0m0s for the bootstrap-complete event..."
level=debug msg="added kube-controller-manager.15945f50e4fbad81: comlog1.cloud.lab.eng.bos.redhat.com_b9976643-5c36-11e9-9d47-7efaccbf92ca became leader"
level=debug msg="added kube-scheduler.15945f511cc63729: comlog1.cloud.lab.eng.bos.redhat.com_b9960ca4-5c36-11e9-af59-7efaccbf92ca became leader"

It seems the kube-apiserver-* and kube-controller-manager-* pods are failing:

$ oc get nodes
NAME                                       STATUS   ROLES    AGE     VERSION
kni1-c13u23.cloud.lab.eng.bos.redhat.com   Ready    master   2m25s   v1.13.4+1ad602308
kni1-c13u25.cloud.lab.eng.bos.redhat.com   Ready    master   2m32s   v1.13.4+1ad602308
kni1-c13u27.cloud.lab.eng.bos.redhat.com   Ready    master   2m18s   v1.13.4+1ad602308
$ oc get pods --all-namespaces | grep -v -E 'Running|Complete'
NAMESPACE                                    NAME                                                                READY   STATUS             RESTARTS   AGE
openshift-kube-apiserver                     kube-apiserver-kni1-c13u25.cloud.lab.eng.bos.redhat.com             1/2     CrashLoopBackOff   2          112s
openshift-kube-controller-manager            kube-controller-manager-kni1-c13u25.cloud.lab.eng.bos.redhat.com    0/1     CrashLoopBackOff   3          95s

The controller manager fails because it cannot reach the API server:

$ oc logs kube-controller-manager-kni1-c13u25.cloud.lab.eng.bos.redhat.com -n openshift-kube-controller-manager
...
I0411 09:18:02.320212       1 glog.go:58] FLAGSET: csrsigning controller
I0411 09:18:02.320216       1 flags.go:33] FLAG: --cluster-signing-cert-file="/etc/kubernetes/static-pod-resources/secrets/csr-signer/tls.crt"
I0411 09:18:02.320220       1 flags.go:33] FLAG: --cluster-signing-key-file="/etc/kubernetes/static-pod-resources/secrets/csr-signer/tls.key"
I0411 09:18:02.320224       1 flags.go:33] FLAG: --experimental-cluster-signing-duration="720h0m0s"
I0411 09:18:02.780578       1 serving.go:312] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp [::1]:6443: connect: connection refused

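The API server fails because it cannot reach etcd: lookups of the etcd-0/1/2.kni1.cloud.lab.eng.bos.redhat.com names against the DNS server 10.19.143.247 return "no such host":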

$ oc logs kube-apiserver-kni1-c13u25.cloud.lab.eng.bos.redhat.com -n openshift-kube-apiserver -c kube-apiserver-1
...
I0411 08:56:09.357815       1 server.go:568] external host was not specified, using 10.19.138.12
I0411 08:56:09.358129       1 server.go:611] Initializing cache sizes based on 0MB limit
I0411 08:56:09.358226       1 server.go:146] Version: v1.13.4+af45cda
I0411 08:56:09.716079       1 clientca.go:92] [0] "/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt" client-ca certificate: "aggregator-signer" [] issuer="<self>" (2019-04-11 08:25:47 +0000 UTC to 2019-04-12 08:25:47 +0000 UTC (now=2019-04-11 08:56:09.716056853 +0000 UTC))
I0411 08:56:09.716568       1 clientca.go:92] [0] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "kube-ca" [] issuer="<self>" (2019-04-11 08:25:44 +0000 UTC to 2029-04-08 08:25:44 +0000 UTC (now=2019-04-11 08:56:09.716557964 +0000 UTC))
I0411 08:56:09.716596       1 clientca.go:92] [1] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "admin-kubeconfig-signer" [] issuer="<self>" (2019-04-11 08:25:44 +0000 UTC to 2029-04-08 08:25:44 +0000 UTC (now=2019-04-11 08:56:09.716578289 +0000 UTC))
I0411 08:56:09.716607       1 clientca.go:92] [2] "/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt" client-ca certificate: "kubelet-signer" [] issuer="<self>" (2019-04-11 08:25:49 +0000 UTC to 2019-04-12 08:25:49 +0000 UTC (now=2019-04-11 08:56:09.716601658 +0000 UTC))
...
I0411 08:56:09.728516       1 resolver_conn_wrapper.go:116] ccResolverWrapper: sending new addresses to cc: [{etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 0  <nil>}]
I0411 08:56:09.728672       1 balancer_v1_wrapper.go:125] balancerWrapper: got update addr from Notify: [{etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>} {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>} {etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379 <nil>}]
W0411 08:56:09.730413       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-1.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
...
W0411 08:56:13.480785       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-1.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
W0411 08:56:13.509333       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-0.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
W0411 08:56:13.621924       1 clientconn.go:1304] grpc: addrConn.createTransport failed to connect to {etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-2.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host". Reconnecting...
...
W0411 08:58:49.954815       1 clientconn.go:953] Failed to dial etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
F0411 08:58:49.954775       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 openshift.io [https://etcd-0.kni1.cloud.lab.eng.bos.redhat.com:2379 https://etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379 https://etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379] /etc/kubernetes/static-pod-resources/secrets/etcd-client/tls.key /etc/kubernetes/static-pod-resources/secrets/etcd-client/tls.crt /etc/kubernetes/static-pod-resources/configmaps/etcd-serving-ca/ca-bundle.crt true 0xc420a0fdd0 <nil> 5m0s 1m0s}), err (dial tcp: lookup etcd-2.kni1.cloud.lab.eng.bos.redhat.com on 10.19.143.247:53: no such host)
W0411 08:58:49.954833       1 clientconn.go:953] Failed to dial etcd-1.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
W0411 08:58:49.954859       1 clientconn.go:953] Failed to dial etcd-2.kni1.cloud.lab.eng.bos.redhat.com:2379: context canceled; please retry.
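To confirm the DNS side of this from a master, a minimal shell sketch follows. It assumes the etcd names follow the etcd-N.<cluster>.<base domain> pattern seen in the log, reading "kni1" as the cluster name and "cloud.lab.eng.bos.redhat.com" as the base domain, with three members. The function names are hypothetical. `getent hosts` goes through the host's NSS configuration (/etc/hosts, /etc/resolv.conf), which may not match exactly what the API server's resolver sees, so treat the result as an approximation.

```bash
# Hypothetical helper: print the etcd member names the API server dials,
# following the etcd-<N>.<cluster>.<base domain> pattern seen in the logs.
etcd_names() {
  local cluster=$1 domain=$2 count=$3 i=0
  while [ "$i" -lt "$count" ]; do
    echo "etcd-$i.$cluster.$domain"
    i=$((i + 1))
  done
}

# Hypothetical helper: report every name the host resolver cannot look up.
# Returns 0 if all names resolve, 1 if any of them fail.
check_resolves() {
  local name rc=0
  for name in "$@"; do
    if ! getent hosts "$name" >/dev/null; then
      echo "unresolved: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_resolves $(etcd_names kni1 cloud.lab.eng.bos.redhat.com 3)`. To query the specific server from the log directly, `dig @10.19.143.247 etcd-0.kni1.cloud.lab.eng.bos.redhat.com` is an alternative.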


e-minguez commented on July 30, 2024

This is still required, right?


stbenjam commented on July 30, 2024

@celebdor Could you clarify for us (1) what's needed to fix this permanently, and (2) whether we could remove this from the create-cluster script now and instead require environments that need it to run the fix_ep.sh script at the right time? Or is it needed on all deployments?

We'd like to remove the early exit from the installer, but this patch_ep_host_etcd code is one of the things blocking it (see openshift-metal3/kni-installer#60).


stbenjam commented on July 30, 2024

Removed in #620.

