CI test for local bare metal K8s cluster with Docker (in Docker) failing because of Travis CI AUFS problem (ansible-for-kubernetes, closed)

geerlingguy commented on May 21, 2024
CI test for local bare metal K8s cluster with Docker (in Docker) failing because of Travis CI AUFS problem

from ansible-for-kubernetes.

Comments (8)

geerlingguy commented on May 21, 2024

Fix for the Pi Dramble issue is in this comment: geerlingguy/raspberry-pi-dramble#166 (comment)

geerlingguy commented on May 21, 2024

That fix doesn't seem to have resolved the entire issue.

geerlingguy commented on May 21, 2024

The issue seems to be that when the Docker daemon is started or restarted (after being installed inside the container), DNS resolution stops working inside the container.

I checked the /etc/resolv.conf file and in both cases it's the same:

search c.travis-ci-prod-2.internal google.internal
nameserver 127.0.0.11
options attempts:3 ndots:0

But before running the docker role, I get a successful ping:

root@e749afc94541:/# ping www.google.com
PING www.google.com (172.217.0.4) 56(84) bytes of data.
64 bytes from ord38s04-in-f4.1e100.net (172.217.0.4): icmp_seq=1 ttl=54 time=11.8 ms

After running the role, I get:

root@e749afc94541:/# ping www.google.com
ping: www.google.com: Temporary failure in name resolution
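
The before/after check above can also be scripted instead of run by hand with ping. This is a minimal sketch in Python using only the standard library; the function name `can_resolve` is hypothetical and is not part of the playbook or the CI setup discussed here.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can map hostname to an address."""
    try:
        # getaddrinfo uses the same resolver configuration as ping,
        # including /etc/resolv.conf inside the container.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # "Temporary failure in name resolution" is the message glibc
        # reports for EAI_AGAIN, which Python raises as socket.gaierror.
        return False
```

Calling `can_resolve("www.google.com")` before and after the Docker role runs would give the same signal as the two ping attempts, assuming the container has outbound network access.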

I'm going to try to force-mount a read-only resolv.conf in the container with Google DNS to see if that'll fix things.
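
As a rough illustration of that idea, the sketch below renders a resolv.conf that points at Google DNS and builds a read-only bind mount specification for it. The helper names and the host path are hypothetical; the `host:container:ro` form is the short volume syntax accepted by both `docker run -v` and the `volumes:` key in docker-compose. The thread does not show the exact file or mount that was used, so treat this as one possible shape of the workaround.

```python
from pathlib import PurePosixPath
from typing import List


def render_resolv_conf(nameservers: List[str]) -> str:
    """Render resolv.conf content with one nameserver line per address."""
    return "".join(f"nameserver {ns}\n" for ns in nameservers)


def bind_mount_spec(host_path: PurePosixPath) -> str:
    """Build a read-only bind mount spec targeting /etc/resolv.conf.

    host_path is assumed to be an absolute path on the Docker host.
    """
    return f"{host_path}:/etc/resolv.conf:ro"


# Example values (assumptions): Google public DNS and a made-up host path.
content = render_resolv_conf(["8.8.8.8"])
spec = bind_mount_spec(PurePosixPath("/srv/ci/resolv.conf"))
```

Mounting the file read-only keeps anything inside the container from rewriting it when the inner Docker daemon restarts.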

geerlingguy commented on May 21, 2024

Well... that worked. We'll see if this test run succeeds!

geerlingguy commented on May 21, 2024

Still not completely resolved. Testing inside the Travis CI environment now.

geerlingguy commented on May 21, 2024

Can't find much there either. Just kubelet failing for unspecified reasons :/

geerlingguy commented on May 21, 2024

After burning a couple more hours on this, and spending a little time digging through kubeadm, kubelet, and docker inside a running Travis CI environment, I couldn't find anything pointing to the actual problem.

It seems like kubelet would start, it would connect to the Docker daemon... and then... nothing!

Very puzzling. I've seen kubelet fail in 500 different ways, but there was always some error message that would help. In this case, kubeadm says "control plane never started" and kubelet says "connected to docker" then... nothing. It just starts spewing out endless loops of "can't get v1.Nodes, v1.Pods, v1.Services", forever and ever.

Docker was running fine on all the nodes, and I compared this setup to the very similar one running for raspberry-pi-dramble, and couldn't find any other issue.

The only difference, really, is that this setup was using a three-container setup managed by docker-compose, whereas the Drupal VM one used a single container and ran the playbook inside the container. But even so, the master never initialized all the way, and I could never figure out why.

I solved the AUFS problem. I set the Docker daemon to use the systemd cgroup driver instead of cgroupfs. I forced DNS to use 8.8.8.8 by mounting a resolv.conf file in the containers, and that fixed the internal Docker restart DNS issues... but I couldn't figure out where to go next.
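
The thread does not show the exact Docker setting that was changed. One common way to switch the daemon's cgroup handling to systemd is the `exec-opts` key in Docker's `daemon.json` (by default `/etc/docker/daemon.json` on Linux). The sketch below only renders that file's content; the function names are hypothetical, and whether this matches the configuration used here is an assumption.

```python
import json


def daemon_config() -> dict:
    """Return a daemon.json mapping that selects the systemd cgroup driver."""
    # "native.cgroupdriver=systemd" is the documented exec-opt for dockerd;
    # the default driver on many installs is cgroupfs.
    return {"exec-opts": ["native.cgroupdriver=systemd"]}


def render_daemon_json(config: dict) -> str:
    """Serialize the config as indented JSON with a trailing newline."""
    return json.dumps(config, indent=2) + "\n"


rendered = render_daemon_json(daemon_config())
```

The Docker daemon has to be restarted after the file changes, and kubelet is generally expected to use the same cgroup driver as the container runtime.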

Therefore I give up, and I'll just lint this playbook for now. Testing will go against VirtualBox/Vagrant locally.

geerlingguy commented on May 21, 2024

As a final note, I was also running into conntrack problems (docker not allowing file writes to update the resource size) when I was trying the DinD K8s cluster approach with multiple containers locally: kubernetes-retired/kubeadm-dind-cluster#50

So it wasn't fun getting docker-in-docker with kubernetes-in-docker on a multi-container cluster setup working in Travis CI. And though it would get up and running locally, kube-proxy would never start up because it couldn't write to that file.
