
lyft / cni-ipvlan-vpc-k8s


AWS VPC Kubernetes CNI driver using IPvlan

License: Apache License 2.0

Makefile 1.30% Go 98.37% Dockerfile 0.33%

cni-ipvlan-vpc-k8s's People

Contributors

bpownow, chris-h-phillips, dbyron0, jonathanburns, keith, lbernail, mikecutalo, mjchoi, paulnivin, serialvelocity, theatrus, tomwans, ungureanuvladvictor, xjerod


cni-ipvlan-vpc-k8s's Issues

Incorrect handling of errors in IPAM

The error handling in the IPAM add command is such that it is very hard to understand why something is failing. I've run into two problems that are due to the same general issue:

  1. The AllocateClient doesn't differentiate between an interface already being maxed out on its allowed IPv4 or IPv6 addresses and all IP addresses in a subnet being taken. Only the latter is represented in the error here.
  2. The error message from AllocateClient.AllocateIPFirstAvailableAtIndex(...) is clobbered unless there is more than one subnet that is tagged. This happens here.

For [2], if you have a single subnet for allocating Pod ENIs into, and the ENI is already attached to the host but has reached its maximum number of IPs, the error that gets returned is "unable to create a new elastic network interface due to No subnets are available which haven't already been used", which is incorrect.
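
A minimal sketch of one way to separate the two failure modes (all identifiers here — allocateOnInterface, ErrInterfaceAtCapacity, ErrSubnetExhausted — are hypothetical, not names from this repo): return distinct sentinel errors and wrap the underlying cause instead of clobbering it, so the IPAM add command can surface the real reason.

package aws

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors so callers can tell the two failure modes apart.
var (
	ErrInterfaceAtCapacity = errors.New("interface has reached its private IP address limit")
	ErrSubnetExhausted     = errors.New("subnet has no free IP addresses")
)

// allocateOnInterface is illustrative only: it distinguishes "the ENI is full"
// from "the subnet is full" and preserves context when allocation fails.
func allocateOnInterface(eni string, assigned, eniLimit, subnetFree int) error {
	if assigned >= eniLimit {
		return fmt.Errorf("eni %s: %w", eni, ErrInterfaceAtCapacity)
	}
	if subnetFree == 0 {
		return fmt.Errorf("eni %s: %w", eni, ErrSubnetExhausted)
	}
	return nil
}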

L2 mode versus L3 mode, masquerading and security groups

Hi,

I've been looking in detail at your plugins and they look really great.

I have a few questions on the design:

  • the plugin relies on L2 mode (efficient and simple) but would probably work well with L3 mode too (even if it would require some ip rules to force traffic through the appropriate ENI). Is there a particular reason for this choice?
  • traffic not going to the VPC CIDR block is masqueraded on the main interface. Is that necessary? Having the pod IP address in VPC flow logs could be useful, and it could also be useful over VPC peerings (of course that would involve NAT to access the internet).

In addition, it would be great if different ENIs could have different security groups and pods could be assigned to ENIs based on security requirements (using annotations for instance)

Thanks again for this plugin, it looks really promising.

Laurent

Build binaries for 0.5.0 release

Hi all,
I see that we don't have binaries for the 0.5.0 release, just the source code. Do you think you would be able to build and publish them as part of the release process, as you did for the 0.4.2 release?

There are some tools, like kops, that expect to download the binaries from GitHub.

Network rules with this approach?

Thanks for sharing the great work. I wonder how network rules are applied in this mode. Are you still using security groups, or has a separate mechanism been developed?

Cache Describe* calls

Currently, the IPAM and address allocation logic will make a large number of requests to the ec2:DescribeSubnets AWS API. When running in a large account, it's likely to hit AWS throttles when spawning a lot of new pods.

We should cache the subnets locally across IPAM invocations for at least a few minutes.
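
A minimal sketch of what a file-backed cache could look like, assuming a short TTL and no long-lived daemon (subnetCache, loadSubnetCache and saveSubnetCache are hypothetical names, not existing code):

package aws

import (
	"encoding/json"
	"os"
	"time"
)

// subnetCache is a hypothetical on-disk cache of DescribeSubnets results,
// reused across IPAM invocations since the plugin has no long-lived daemon.
type subnetCache struct {
	FetchedAt time.Time         `json:"fetched_at"`
	Subnets   []json.RawMessage `json:"subnets"`
}

// loadSubnetCache returns the cached subnets if the file exists and is
// younger than maxAge; otherwise ok is false and the caller should hit the API.
func loadSubnetCache(path string, maxAge time.Duration) (c subnetCache, ok bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return c, false
	}
	if err := json.Unmarshal(data, &c); err != nil {
		return c, false
	}
	return c, time.Since(c.FetchedAt) < maxAge
}

// saveSubnetCache writes a fresh snapshot after a successful DescribeSubnets call.
func saveSubnetCache(path string, c subnetCache) error {
	c.FetchedAt = time.Now()
	data, err := json.Marshal(c)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0600)
}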

Clarify configuration and setup instructions

Some gaps are:

  • Where does the configuration file go? We don't currently mention that it's a .conflist file, for example.
  • Limitations and assumptions (if you're not using an AWS centric image and are using NetworkManager, for example)
  • Lack of a step-by-step guide

New release

Hi,

We're experimenting with this at the moment, and noticed that the latest released version is some way behind HEAD. Any chance you could create a new release?

Thanks.

Plugin does not work with containerd runtime

I have tried to use the CNI plugin with containerd 1.1 (which ships with built-in CRI support) and it does not work.

After digging into the issue, it seems that containerd cannot retrieve the IP address of eth0 from the plugin output and fails to create the sandbox.

Looking at the code for the unnumbered_ptp plugin, it seems that it overrides the interfaces created by the ipvlan plugin: https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/unnumbered-ptp/unnumbered-ptp.go#L244 so the final output does not contain information on eth0.

In addition, the plugin seems to "move" the IP address created for interface eth0 to interface veth0 (https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/unnumbered-ptp/unnumbered-ptp.go#L241), which does not work with containerd because it is explicitly looking for the IP address associated with eth0.

I tried to modify this behavior to append the veth interfaces to the interface list generated by the ipvlan plugin and to disable moving the IP address to veth0. After this change, containerd successfully created network sandboxes. I may have missed something in the logic of the unnumbered_ptp plugin. I will submit a tentative PR fixing this; happy to discuss it further.
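
For illustration, a simplified model of the proposed change (these are stand-in types, not the real CNI result types): the unnumbered-ptp plugin would append its veth pair to the result it received from the ipvlan plugin rather than replacing the interface list, and leave the IP bound to eth0.

// Simplified stand-ins for the CNI result types, for illustration only.
type iface struct{ Name, Sandbox string }

type ipConfig struct {
	Interface int // index into Interfaces; should keep pointing at eth0
	Address   string
}

type result struct {
	Interfaces []iface
	IPs        []ipConfig
}

// appendVeths keeps the eth0 entry (and its IP binding) produced by the ipvlan
// plugin untouched and simply adds the new veth pair to the interface list.
func appendVeths(prev result, hostVeth, contVeth iface) result {
	prev.Interfaces = append(prev.Interfaces, hostVeth, contVeth)
	return prev
}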

Additional interfaces could be in the same subnet

The IPAM plugin currently requires that all ENIs attached to the instance are in different subnets, but the plugin will work if interfaces are in the same subnet (in IPvlan L2 mode there is no route lookup on the host; packets just exit from the master interface).

We currently run a slightly modified version of the plugin that removes this requirement and it works fine.

I'm happy to propose a PR to allow this behavior. Here is a first idea:

  • add a new multipleENIPerSubnet configuration parameter to the IPAM plugin that defaults to false
  • when set to true, disable the uniqueness check in NewInterface (aws/interface.go)

To make it work in L3 or L3S mode we will need to create additional ip route tables (one per interface) forcing traffic through the ipvlan master interface and create IP rules for each pod to use the appropriate route table (I haven't tested L3 mode with the plugin yet but I'm not sure it can work without IP rules even with different subnets).

Creating / Deleting route tables for L3/L3S modes should probably be done in configureInterface and a new unConfigureInterface function called in RemoveInterface.

Creating host IP rules could be done in new functions addIPRule and delIPRule called by cmdAdd and cmdDel
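
As a rough sketch of what addIPRule could look like with the vendored netlink library (the function names come from the proposal above; the fixed priority and signatures are assumptions, not existing code):

package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

// addIPRule routes all traffic sourced from the pod address through the route
// table dedicated to its ipvlan master interface (hypothetical helper).
func addIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	rule.Priority = 1024
	return netlink.RuleAdd(rule)
}

// delIPRule removes the same rule; it would be called from cmdDel.
func delIPRule(podIP net.IP, tableID int) error {
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = tableID
	rule.Priority = 1024
	return netlink.RuleDel(rule)
}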

Also happy to integrate this in a PR

Primary ENI IP

It looks like the Primary ENI IP is allocated, but not used as a podIP.

This would not be an issue on bigger instances with more interfaces and IPs per interface, or with more servers, but when starting out small this wastes precious resources.

Is there a reason for this or would it be possible to change this behavior?

Would be willing to do a PR if this can be done and is a good idea.

CNI plugin: `portmap` when chained with cni-ipvlan-vpc-k8s does not work

This is my generated conflist that gets picked up by Kubernetes:

{
  "cniVersion": "0.3.1",
  "name": "cni-ipvlan-vpc-k8s",
  "plugins": [
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipvlan",
      "mode": "l2",
      "master": "ipam",
      "ipam": {
        "type": "cni-ipvlan-vpc-k8s-ipam",
        "interfaceIndex": 1,
        "subnetTags": {
          "cni-ipvlan-cluster-subnet": "$CLUSTER_NAME"
        },
        "secGroupIds": [
          "$SECURITY_GROUP_ID"
        ],
        "routeToVpcPeers": true,
        "skipDeallocation": $SKIP_DEALLOCATION
      }
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
      "hostInterface": "$DEFAULT_NIC",
      "containerInterface": "veth0",
      "ipMasq": true
    },
    {
      "cniVersion": "0.3.1",
      "type": "loopback"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}

Deploying an nginx ingress and using hostPort does not seem to work. A quick sudo iptables -L -n -t nat in a cluster using a different CNI plugin (e.g. flannel) shows the expected iptables rules, but my test cluster using the cni-ipvlan-vpc-k8s CNI plugin does not have them.

Pods losing connection to each other on a particular node from time to time

Hello,

We are seeing a quite strange and (at least for me) hard-to-debug issue where a Kubernetes node gets into a state in which any pod running on it cannot reach any other pods within the cluster, including pods running on the very same node.

We have a node in this situation right now, so I can provide any debug output if needed (the node is cordoned and all production pods have been drained from it).

Let me describe our setup first:

root@ip-10-110-174-111:~# /opt/cni/bin/cni-ipvlan-vpc-k8s-tool eniif 
iface   mac                 id                      subnet                     subnet_cidr       secgrps                  vpc                     ips                                                                                                                                                                                                                         
eth0    0a:2a:d3:92:7e:8c   eni-018ae0468fdf10781   subnet-0e238589d27216701   10.110.174.0/25   [sg-0ccda037d02911c59]   vpc-0f1dab3aaf561ddba   [10.110.174.111]                                                                                                                                                                                                            
eth1    0a:b5:1f:03:d1:d6   eni-0f5769f94c79afb21   subnet-0004ceb2afd7e3ba7   100.96.224.0/19   [sg-0ccda037d02911c59]   vpc-0f1dab3aaf561ddba   [100.96.236.178 100.96.235.253 100.96.255.154 100.96.248.69 100.96.254.43 100.96.253.184 100.96.234.151 100.96.254.206 100.96.244.19 100.96.230.47 100.96.251.97 100.96.239.70 100.96.228.204 100.96.236.91 100.96.225.8]   
eth2    0a:e2:0f:b4:49:4c   eni-0aba7b11eb2077b50   subnet-0004ceb2afd7e3ba7   100.96.224.0/19   [sg-0ccda037d02911c59]   vpc-0f1dab3aaf561ddba   [100.96.254.118 100.96.247.76 100.96.233.85 100.96.228.225 100.96.255.230 100.96.227.61 100.96.230.198 100.96.247.0 100.96.236.129 100.96.255.57 100.96.236.8]

We only have two pods running on the node now:

fluentd-loggly-hb9lf                                             1/1       Running     0          228d      100.96.236.178   ip-10-110-174-111.eu-central-1.compute.internal
jhorky-shell                                                     1/1       Running     0          21m       100.96.254.43    ip-10-110-174-111.eu-central-1.compute.internal

The pods can't see each other even though they are in the same (/19) subnet and running on the same node:

bash-5.0# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UNKNOWN group default 
    link/ether 0a:b5:1f:03:d1:d6 brd ff:ff:ff:ff:ff:ff
    inet 100.96.254.43/19 brd 100.96.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8b5:1fff:fe03:d1d6/64 scope link dadfailed tentative 
       valid_lft forever preferred_lft forever
4: veth0@if114: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 0a:70:58:d2:9d:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::870:58ff:fed2:9dd6/64 scope link 
       valid_lft forever preferred_lft forever
bash-5.0# ping 100.96.236.178
PING 100.96.236.178 (100.96.236.178) 56(84) bytes of data.
From 100.96.254.43 icmp_seq=1 Destination Host Unreachable
From 100.96.254.43 icmp_seq=2 Destination Host Unreachable
From 100.96.254.43 icmp_seq=3 Destination Host Unreachable

In a tcpdump (running with -i any) on the compute node, I see these ARP requests but no replies:

15:36:09.107512 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:10.129829 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:11.153879 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:12.177949 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:13.201804 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28

When trying to ping the other way around, the situation is very different:
The ping works:

PING 100.96.254.43 (100.96.254.43) 56(84) bytes of data.
64 bytes from 100.96.254.43: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from 100.96.254.43: icmp_seq=2 ttl=64 time=0.051 ms

The tcpdump running on the "debug jhorky" docker shows: just this (no ICMP messages?!?):

15:57:22.065934 ARP, Request who-has 100.96.254.43 tell 100.96.236.178, length 28
15:57:22.066041 ARP, Reply 100.96.254.43 is-at 0a:b5:1f:03:d1:d6, length 28

The tcpdump running on the compute node doesn't show any ICMP either:

15:57:16.241711 ARP, Request who-has 100.96.236.178 tell 10.110.174.111, length 28
15:57:16.241732 ARP, Reply 100.96.236.178 is-at d2:5d:eb:14:bf:f1, length 28

Anyway, right now, I have no idea what more to look at.

Once again, the node is in this state, so I can provide any output needed.

Any help much appreciated.

aws-vpc-cni-k8s vs lyft cni

We should add a comparison of aws-vpc-cni-k8s and the Lyft CNI, so users become familiar with the Lyft plugin's advantages over the AWS CNI.

Implementing deallocation garbage collection in the tool

The IPAM plugin has a configuration option to never deallocate addresses; however, there is no way to eventually clean up these addresses short of terminating the node.

There should be a mark and sweep system to locate persistently free IPs and mark them for removal at a later pass. In the spirit of running daemon-less, we should record marks in a state file.
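
A minimal sketch of the mark-and-sweep idea with a JSON state file (gcState and sweep are hypothetical names; the grace period and file layout are assumptions):

package main

import (
	"encoding/json"
	"os"
	"time"
)

// gcState is a hypothetical state file: the first time an address is seen
// free it is marked; on a later pass, addresses still free after gracePeriod
// are swept (returned for deallocation from the ENI).
type gcState struct {
	MarkedFree map[string]time.Time `json:"marked_free"` // IP -> first time seen free
}

func sweep(statePath string, currentlyFree []string, gracePeriod time.Duration) ([]string, error) {
	state := gcState{MarkedFree: map[string]time.Time{}}
	if data, err := os.ReadFile(statePath); err == nil {
		_ = json.Unmarshal(data, &state)
	}
	if state.MarkedFree == nil {
		state.MarkedFree = map[string]time.Time{}
	}

	now := time.Now()
	freeSet := map[string]bool{}
	var toRemove []string
	for _, ip := range currentlyFree {
		freeSet[ip] = true
		first, marked := state.MarkedFree[ip]
		if !marked {
			state.MarkedFree[ip] = now // mark pass
		} else if now.Sub(first) > gracePeriod {
			toRemove = append(toRemove, ip) // sweep pass
		}
	}
	// addresses that became used again are unmarked
	for ip := range state.MarkedFree {
		if !freeSet[ip] {
			delete(state.MarkedFree, ip)
		}
	}

	data, err := json.Marshal(state)
	if err != nil {
		return toRemove, err
	}
	return toRemove, os.WriteFile(statePath, data, 0600)
}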

Build issues with golang:1.9.2-alpine3.6 docker image

Trying to build/install this onto a CoreOS box for my Kubernetes cluster. I am pretty sure it's something I am doing, but if you have any insight that would help, it would be appreciated.

Repo: https://github.com/C45tr0/install-cni-ipvlan-vpc-k8s
Log:

docker run --net=host --rm -v /Path/install-cni-ipvlan-vpc-k8s/tmp:/shared install-cni-ip-vlan-vpc-k8s:latest
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.2-223-gf6e5807065 [http://dl-cdn.alpinelinux.org/alpine/v3.6/main]
v3.6.2-216-g48901173c2 [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
OK: 8437 distinct packages available
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/18) Installing binutils-libs (2.28-r3)
(2/18) Installing binutils (2.28-r3)
(3/18) Installing gmp (6.1.2-r0)
(4/18) Installing isl (0.17.1-r0)
(5/18) Installing libgomp (6.3.0-r4)
(6/18) Installing libatomic (6.3.0-r4)
(7/18) Installing pkgconf (1.3.7-r0)
(8/18) Installing libgcc (6.3.0-r4)
(9/18) Installing mpfr3 (3.1.5-r0)
(10/18) Installing mpc1 (1.0.3-r0)
(11/18) Installing libstdc++ (6.3.0-r4)
(12/18) Installing gcc (6.3.0-r4)
(13/18) Installing libssh2 (1.8.0-r1)
(14/18) Installing libcurl (7.57.0-r0)
(15/18) Installing expat (2.2.0-r1)
(16/18) Installing pcre (8.41-r0)
(17/18) Installing git (2.13.5-r0)
(18/18) Installing make (4.2.1-r0)
Executing busybox-1.26.2-r9.trigger
OK: 108 MiB in 30 packages
Grabbing golang/dep
Grabbing lyft/cni-ipvlan-vpc-k8s
# github.com/vishvananda/netlink
src/github.com/vishvananda/netlink/bpf_linux.go:4:23: fatal error: asm/types.h: No such file or directory
 #include <asm/types.h>
                       ^
compilation terminated.
Making cni-ipvlan-vpc-k8s
fatal: No names found, cannot describe anything.
/go/bin/dep ensure -v
Gopkg.lock was already in sync with imports and Gopkg.toml
(1/19) Wrote github.com/j-keck/arping@master
(2/19) Wrote github.com/coreos/[email protected]
(3/19) Wrote github.com/docker/[email protected]
(4/19) Wrote github.com/vishvananda/netns@master
(5/19) Wrote github.com/pkg/[email protected]
(6/19) Wrote github.com/Microsoft/[email protected]
(7/19) Wrote github.com/jmespath/go-jmespath@0b12d6b5
(8/19) Wrote github.com/go-ini/[email protected]
(9/19) Wrote github.com/vishvananda/netlink@master
(10/19) Wrote github.com/nightlyone/lockfile@master
(11/19) Wrote github.com/containernetworking/[email protected]
(12/19) Wrote github.com/urfave/[email protected]
(13/19) Wrote golang.org/x/sys@master
(14/19) Wrote github.com/containernetworking/[email protected]
(15/19) Wrote golang.org/x/net@master
(16/19) Wrote github.com/docker/[email protected]
(17/19) Wrote github.com/docker/[email protected]
(18/19) Wrote github.com/aws/[email protected]
(19/19) Wrote github.com/docker/[email protected]
go install ./
# github.com/lyft/cni-ipvlan-vpc-k8s/vendor/github.com/vishvananda/netlink
vendor/github.com/vishvananda/netlink/bpf_linux.go:4:23: fatal error: asm/types.h: No such file or directory
 #include <asm/types.h>
                       ^
compilation terminated.
make: *** [Makefile:19: cache] Error 2
cp: can't stat 'cni-ipvlan-vpc-k8s-*.tar.gz': No such file or directory

missing routes on first pod for an interface

I installed the plugin in our dev cluster and, after much whack-a-mole rescheduling of pods that couldn't talk to other pods, things settled down. The problem pods kept coming back when new nodes were added to the cluster. We use kops, and the rolling update causes a lot of pod rescheduling. Eventually I started to look closer and found that the trouble pods didn't have any routes for the VPC ranges at all. I also noticed that those pods were all the first pod assigned to a new ENI, whether it was a new node or just the next needed ENI on a busy node.

Looking at the code, it seems that this lag between a new ENI and the EC2 metadata service results being fully populated is a known thing. The IPAM plugin depends on the ranges from the vpc-ipv4-cidr-blocks section of the metadata to set up the correct routes for the veth interfaces. However, the parsing of the vpc-ipv4-cidr-blocks call doesn't return an error if no CIDRs were found. This results in an interface with no routes to the VPC.

So one approach to solving that could be simply checking the length of the vpc-ipv4-cidr-blocks slice and returning an error if it's 0. That would cause a retry on the metadata service until it gets a result. I'm somewhat concerned about that solution because I wonder if there is an intermediate state where some but not all of the VPC ranges get returned and so we're back where we started with required routes missing.
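
A hedged sketch of that first approach, assuming the metadata text is returned one CIDR per line (parseVPCCIDRBlocks is a hypothetical name, not the function in this repo):

package main

import (
	"fmt"
	"strings"
)

// parseVPCCIDRBlocks turns the raw vpc-ipv4-cidr-blocks metadata text into a
// slice of CIDRs and errors out when the list is still empty, so the caller
// can retry against the metadata service instead of configuring an interface
// with no VPC routes.
func parseVPCCIDRBlocks(raw string) ([]string, error) {
	var cidrs []string
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			cidrs = append(cidrs, line)
		}
	}
	if len(cidrs) == 0 {
		return nil, fmt.Errorf("metadata returned no vpc-ipv4-cidr-blocks yet")
	}
	return cidrs, nil
}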

Another way of solving it would be to just use the DescribeVpcs API to get the VPC ranges all the time instead of relying on the metadata service. It appears that there is some desire not to use the DescribeVpcs API calls to get the VPC CIDRs, and I'd like to get more context from those who know about that preference. We already have to grant quite a few IAM permissions to make the plugin work; DescribeVpcs doesn't seem that onerous as part of that list. Maybe it's just to have one less place where config data is extracted from?

Error adding network: no plugin name provided

Thanks for sharing the great work. I am trying to test it out but was not able to bring up the following pods:

kubectl create -f https://k8s.io/docs/tasks/access-application-cluster/hello.yaml

I am using version 1.7.10

kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

I am seeing the following error in the log file:

kubelet[3752]: E1212 21:12:29.218496    3752 cni.go:294] Error adding network: no plugin name provided

I am using the configuration file provided in the README and have updated it with my secGroupIds:

{
  "cniVersion": "0.3.1",
  "name": "cni-ipvlan-vpc-k8s",
  "plugins": [
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipvlan",
      "mode": "l2",
      "master": "ipam",
      "ipam": {
        "type": "cni-ipvlan-vpc-k8s-ipam",
        "interfaceIndex": 1,
        "subnetTags": {
          "kubernetes_kubelet": "true"
        },
        "secGroupIds": [
          "sg-34b79141"
        ]
      }
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
      "hostInterface": "eth0",
      "containerInterface": "veth0",
      "ipMasq": true
    }
  ]
}

Here are cni binaries in /opt/cni/bin:

root@ip-10-0-55-131:/opt/cni/bin# ls
bridge			 cni-ipvlan-vpc-k8s-ipvlan   cni-ipvlan-vpc-k8s-tool		flannel     loopback
cni-ipvlan-vpc-k8s-ipam  cni-ipvlan-vpc-k8s-.tar.gz  cni-ipvlan-vpc-k8s-unnumbered-ptp	host-local  ptp

Here is the code snippet

~/workspace/src/k8s-v1.7.10/kubernetes-1.7.10/vendor/github.com/containernetworking/cni/pkg/invoke/find.go
// FindInPath returns the full path of the plugin by searching in the provided path
func FindInPath(plugin string, paths []string) (string, error) {
        if plugin == "" {
                return "", fmt.Errorf("no plugin name provided")
        }
                
        if len(paths) == 0 {
                return "", fmt.Errorf("no paths provided")
        }       

        for _, path := range paths {
                for _, fe := range ExecutableFileExtensions {
                        fullpath := filepath.Join(path, plugin) + fe
                        if fi, err := os.Stat(fullpath); err == nil && fi.Mode().IsRegular() {
                                return fullpath, nil
                        }
                }
        }               
                
        return "", fmt.Errorf("failed to find plugin %q in path %s", plugin, paths)
}

Kubelet is running with following options:

root      3752     1  1 21:07 ?        00:00:30 /usr/local/bin/kubelet --node-ip --allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --hostname-override=ip-10-0-55-131.ec2.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=100.64.0.0/10 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --require-kubeconfig=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/

Can you tell me if I missed anything?

thank you very much

Pods stuck in ContainerCreating (possible duplicate IP assignment from IPAM)

I am seeing some pods stuck in ContainerCreating state when I run

kubectl create -f https://k8s.io/docs/tasks/access-application-cluster/hello.yaml

Here is the output

kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
hello-1243552595-1bm63   1/1       Running             0          3h
hello-1243552595-bvc3r   1/1       Running             0          3h
hello-1243552595-hrj3s   0/1       ContainerCreating   0          3h
hello-1243552595-mv49s   0/1       ContainerCreating   0          3h

I am using version 1.7.10

kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Here is the error message I am seeing in the log:

Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380612    3752 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380659    3752 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380678    3752 kuberuntime_manager.go:624] createPodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380711    3752 pod_workers.go:182] Error syncing pod 28da1900-df81-11e7-a409-0e64e3f014fa ("hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)"), skipping: failed to "CreatePodSandbox" for "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"hello-1243552595-mv49s_default\" network: failed to add host route dst 10.0.5.23: file exists"

Looks like the CNI plugin tries to use 10.0.5.23 for pod hello-1243552595-mv49s, but 10.0.5.23 is already assigned to another pod running on the same host.

kubectl describe pod hello-1243552595-bvc3r
Name:           hello-1243552595-bvc3r
Namespace:      default
Node:           ip-10-0-55-131.ec2.internal/10.0.55.131
Start Time:     Tue, 12 Dec 2017 21:12:28 +0000
Labels:         app=hello
                pod-template-hash=1243552595
                tier=backend
                track=stable
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"hello-1243552595","uid":"28d91673-df81-11e7-a409-0e64e3f014fa","...
                kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container hello
Status:         Running
IP:             10.0.5.23
Controllers:    ReplicaSet/hello-1243552595
Containers:
  hello:
    Container ID:       docker://34c740bf48525b72a7fedceb1dc64951f4ca14bd9fad6f3b2b8e485f09f8c152
    Image:              gcr.io/google-samples/hello-go-gke:1.0
    Image ID:           docker-pullable://gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e
    Port:               80/TCP
    State:              Running
      Started:          Tue, 12 Dec 2017 23:37:23 +0000
    Ready:              True
    Restart Count:      0
    Requests:
      cpu:              100m
    Environment:        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-j8llx (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-j8llx:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-j8llx
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                    -------------           --------        ------          -------
  2h            18m             657     kubelet, ip-10-0-55-131.ec2.internal                            Warning         FailedSync      Error syncing pod
  2h            18m             657     kubelet, ip-10-0-55-131.ec2.internal                            Normal          SandboxChanged  Pod sandbox changed, it will be killed and re-created.
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Pulling         pulling image "gcr.io/google-samples/hello-go-gke:1.0"
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Pulled          Successfully pulled image "gcr.io/google-samples/hello-go-gke:1.0"
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Created         Created container
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Started         Started container

Here is the ip route output

ip route
default via 10.0.32.1 dev eth0
10.0.5.0/24 dev eth1  proto kernel  scope link  src 10.0.5.154
10.0.5.23 dev veth350a598b  scope link
10.0.32.0/19 dev eth0  proto kernel  scope link  src 10.0.55.131
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1

Here is the docker ps output

docker ps
CONTAINER ID        IMAGE                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
efd4c399f948        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 6 seconds ago       Up 5 seconds                            k8s_POD_kube-dns-2712020956-90fd9_kube-system_ebba507c-df81-11e7-a409-0e64e3f014fa_111
23cac500ae36        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 6 seconds ago       Up 5 seconds                            k8s_POD_hello-1243552595-mv49s_default_28da1900-df81-11e7-a409-0e64e3f014fa_111
52532863c1d4        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 7 seconds ago       Up 6 seconds                            k8s_POD_hello-1243552595-hrj3s_default_28da0c5e-df81-11e7-a409-0e64e3f014fa_111
34c740bf4852        gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e   "/usr/bin/hello"         20 minutes ago      Up 20 minutes                           k8s_hello_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_0
1c17414c1e46        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 20 minutes ago      Up 20 minutes                           k8s_POD_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_1
3ad87d21eea0        protokube:1.6.0                                                                                              "/usr/bin/protokube -"   3 hours ago         Up 3 hours                              distracted_aryabhata

Cannot connect to pod outside VPC

The setup is as follows:

  • node -- 10.102.12.217.
  • vpn node -- 10.102.49.255
  • outside client -- 10.103.0.2
  • pod (( coredns/busybox )) -- 10.102.11.165.

The client and node aren't directly connected; there's a VPN node between them.

I can do the following pings/netcats (( i.e. testing connectivity )):

client <--> node (( and vice versa ))
vpn node <--> pod (( and vice versa ))
pod --> client (( but not vice versa ))

Upon closer inspection (( that is, running tcpdump on the node )), I see the following:

ubuntu@ip-10-102-12-217:~$ sudo tcpdump -i any icmp -nnn
13:26:13.820856 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 1, length 64
13:26:14.831852 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 2, length 64
13:26:15.856034 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 3, length 64

AWS properly routes the packet to the node, yet there's no reply if the source address is outside the VPC.

Version: v0.5.0

Configuration:

{
    "cniVersion": "0.3.1",
    "name": "cni-ipvlan-vpc-k8s",
    "plugins": [
	{
	    "cniVersion": "0.3.1",
	    "type": "cni-ipvlan-vpc-k8s-ipam",
	    "interfaceIndex": 1,
	    "subnetTags": {
		"kubernetes_kubelet": "{{ kubernetes_kubelet }}"
	    },
	    "secGroupIds":  {{ secGroupIds | to_nice_json(indent=12) }}
	},
	{
	    "cniVersion": "0.3.1",
	    "type": "cni-ipvlan-vpc-k8s-ipvlan",
	    "mode": "l2"
	},
	{
	    "cniVersion": "0.3.1",
	    "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
	    "hostInterface": "{{ ansible_default_ipv4.interface }}",
	    "containerInterface": "veth0",
	    "ipMasq": true
	}
    ]
}

(( the {{ }} placeholders are filled with proper values )).

I've marked a single private subnet with the needed tag, and it's the 10.102.0.0/20 subnet. The VPN node is in a different subnet (since it has a public IP).

Listing VPC CIDRs to create routes in pods does not filter on association state

DescribeVPCCIDRs adds all CIDRs associated with the VPC regardless of association state. If this state is different from associated, we should not add the CIDR range to the list.

When removing a CIDR range from a VPC, the range remains disassociated for a long time (1+ hour) before being removed. We should only add ranges with status associated.

Possible association states:

  • associating
  • associated
  • disassociating
  • disassociated
  • failing
  • failed

I'm not sure what we should do with the associating state because it may fail
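
For illustration, a filter along these lines using the aws-sdk-go EC2 types (associatedCIDRs is a hypothetical helper; whether to also accept the associating state is left open, as noted above):

package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// associatedCIDRs keeps only CIDR blocks whose association state is
// "associated", skipping associating/disassociating/disassociated/failing/failed ranges.
func associatedCIDRs(vpc *ec2.Vpc) []string {
	var cidrs []string
	for _, assoc := range vpc.CidrBlockAssociationSet {
		if assoc.CidrBlockState != nil &&
			aws.StringValue(assoc.CidrBlockState.State) == ec2.VpcCidrBlockStateCodeAssociated {
			cidrs = append(cidrs, aws.StringValue(assoc.CidrBlock))
		}
	}
	return cidrs
}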

Disabling ipMasq

I was looking into using the VPC CNI plugin with ipMasq disabled so our traffic goes out via each ethX rather than host eth0 (VPC traffic + 0.0.0.0/0).

From the looks of it, when ipMasq is disabled the pods lose egress to non-VPC CIDRs. I assume that is expected. From the README I gathered that this config flag was added to handle the kube2iam case? In our org we do not run kube2iam and we restrict access to the metadata endpoint by other means.

Any future plans to support this mode of operation?

IP Batching changes the default behavior

Hello,

I've been working on a PR (coming very soon) and I was surprised at the number of addresses allocated to the interfaces on my test instance. I discovered it was related to #65, which will now allocate all possible IPs by default when an interface is attached.

In our setup we have low pod density on nodes, and on large instances this will allocate far too many IPs. We can of course configure ipBatchSize, but I think changing the default batch size back to 1 would avoid impacting existing setups. More than happy to do this very simple PR if you think it makes sense.

cc @theatrus @jonathanburns

Dependency on some private lyft repos

Hi,

I am getting the following errors when trying to retrieve dependencies. Looks like it has some dependencies on a private repo:

➜ cni-ipvlan-vpc-k8s git:(master) dep ensure
ensure Solve(): no valid source could be created:
failed to set up sources from the following URLs:
https://github.com/lyft/cni-eni
: remote repository at https://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
ssh://[email protected]/lyft/cni-eni
: remote repository at ssh://[email protected]/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
git://github.com/lyft/cni-eni
: remote repository at git://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
http://github.com/lyft/cni-eni
: remote repository at http://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128

Inconsistency between ENI allocated IPs and OS configuration

We are seeing an issue that seems to happen regularly: some pods have no network connectivity.

After looking into the configuration it turns out that when this happens we are in the following situation:

  • pod sandbox configured properly (veth and ipvlan interfaces, as well as proper routing configurations)
  • IP of the pod not associated with the ENI so traffic is dropped by the VPC

After looking into logs we found the following:

  • CloudTrail shows a call to unassociate the IP address from the ENI (which seems to indicate that the CNI plugin was called with DELETE), but the routes and iptables rules are still there
  • the sandbox itself is not deleted. We found some errors in the kubelet logs; not sure if this is related:
failed to remove pod init container "consul-template": failed to get container status "371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70": rpc error: code = Unknown desc = Error: No such container: 371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70; Skipping pod "alerting-metric-evaluator-anomaly-0_datadog(4c15f7d2-5783-11e8-903a-02fc6d7aa9b8)"
  • kubelet tries to restart containers in the same sandbox (which fails because the pods have no network connectivity, which is required by the init container)

Any idea what could trigger this situation? Our current setup uses docker, kubelet 1.10 and the latest version of the CNI plugin.

I think SkipDeallocation could probably help but I'd like to understand exactly what is happening.

I wonder if allowing for more verbose logs could help in this kind of situation (for instance, logging ADD/DELETE calls with their parameters).
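
As a rough sketch of that kind of tracing, assuming logs go to a file because a CNI plugin's stdout is reserved for the JSON result (logInvocation and the log path are hypothetical, not existing behavior):

package main

import (
	"log"
	"os"
)

// logInvocation appends one line per CNI invocation so ADD/DEL calls and their
// parameters can be reconstructed after the fact. It never fails the CNI call.
func logInvocation(command, containerID, netns, args string) {
	f, err := os.OpenFile("/var/log/cni-ipvlan-vpc-k8s.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0640)
	if err != nil {
		return // logging must not break pod setup or teardown
	}
	defer f.Close()
	logger := log.New(f, "", log.LstdFlags)
	logger.Printf("cmd=%s container=%s netns=%s args=%s", command, containerID, netns, args)
}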

Nodeports not working properly

When setting up a NodePort, if we access a host where the target service is running and load balancing chooses the local pod, the traffic is dropped.

Everything seems to work OK because if the first SYN is dropped, the client will retry (however, queries load-balanced to the local pod take much longer) and will (probably) be sent to another host.

This can be seen by logging martian packets. When traffic is sent to a local pod it will be dropped with the following log:

[912228.409488] IPv4: martian source 172.30.182.212 from 172.21.51.75, on dev ens3
[912228.409534] ll header: 00000000: 0e d8 07 a0 c0 0c 0e f0 4b 50 fd 5c 08 00        ........KP.\..

To trigger the issue I simply did this until the answer took more than 1s:

$ curl http://172.30.183.34:30054

where 172.30.183.34 is the host IP and 30054 the NodePort. The kube-proxy NodePort iptables PREROUTING rules redirected traffic to 172.30.182.212 (the local pod for the service), which triggered the martian log.

Looking at routing explains the issue:

$ ip route get 172.30.182.212 from 172.21.51.75 iif ens3
RTNETLINK answers: Invalid cross-device link

$ ip route get 172.30.182.212
172.30.182.212 dev veth3b59a300  src 172.30.183.34

$ ip route get 172.21.51.75 from 172.30.182.212 iif veth3b59a300
172.21.51.75 from 172.30.182.212 via 172.30.182.212 dev veth3b59a300

This means that traffic arrives on ens3 but the reverse route is through the pod (the route getting back to the pod is necessary to access services).

To trigger the issue consistently (100% of the time) we just need to add externalTrafficPolicy: Local to the service definition (or scale the service down to 1 pod)
