lyft / cni-ipvlan-vpc-k8s
AWS VPC Kubernetes CNI driver using IPvlan
License: Apache License 2.0
The error handling in the IPAM add command makes it very hard to understand why something is failing. I've run into two problems that stem from the same general issue:
1. allocateClient doesn't differentiate between an interface already being maxed out on allowed IPv4 or IPv6 addresses and all IP addresses in a subnet being taken. Only the latter is represented in the error.
2. The error from AllocateClient.AllocateIPFirstAvailableAtIndex(...) is clobbered unless there is more than one subnet that is tagged.
For [2], if you have a single subnet for allocating Pod ENIs into, and the ENI is already attached to the host but has reached its maximum number of IPs, the error that will get returned is "unable to create a new elastic network interface due to No subnets are available which haven't already been used", which is incorrect.
Hi,
I've been looking in detail at your plugin and it looks really great.
I have a few questions on the design:
In addition, it would be great if different ENIs could have different security groups and pods could be assigned to ENIs based on security requirements (using annotations, for instance).
Thanks again for this plugin; it looks really promising.
Laurent
Hi all,
I see that we don't have the binaries for the 0.5.0 release, just the source code. Do you think you would be able to build and publish them as part of the release process, as you did for the 0.4.2 release?
There are some tools, like kops, that expect to download the binaries from GitHub.
Thanks for sharing the great work. I wonder how network rules are applied in this mode. Are you still using security groups, or has a separate mechanism been developed?
The plugin expects all interfaces to be named ethN, which is not the case on Ubuntu, for instance.
Currently, the IPAM and address allocation logic will make a large number of requests to the ec2:DescribeSubnets AWS API. When running in a large account, it's likely to hit throttles from AWS when spawning a lot of new pods.
We should cache the subnets locally across IPAM invocations for at least a few minutes.
Some gaps are: the .conflist file, for example.
Hi,
We're experimenting with this at the moment, and noticed that the latest released version is some way behind HEAD. Any chance you could create a new release?
Thanks.
I have tried to use the CNI plugin with containerd 1.1 (which ships with built-in CRI support) and it does not work.
After digging into the issue, it seems that containerd cannot retrieve the IP address of eth0 from the plugin output and fails to create the sandbox.
Looking at the code for unnumbered_ptp plugin, it seems that it is overriding the interface created by the ipvlan plugin: https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/unnumbered-ptp/unnumbered-ptp.go#L244 so the final output does not contain information on eth0.
In addition, the plugin seems to "move" the IP address created for interface eth0 to interface veth0 (https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/plugin/unnumbered-ptp/unnumbered-ptp.go#L241), which does not work with containerd because it is explicitly looking for the IP address associated with eth0.
I tried to modify this behavior to append the veth interfaces to the interface list generated by the ipvlan plugin and disable moving the IP address to veth0. After this change containerd successfully created network sandboxes. I may have missed something in the logic of the unnumbered_ptp plugin. I will submit a tentative PR fixing this. Happy to discuss this further.
The IPAM plugin currently requires that all ENIs attached to the instance are in different subnets, but the plugin will work if interfaces are in the same subnet (in IPvlan L2 mode there is no route lookup on the host; packets just exit from the master interface).
We currently run a slightly modified version of the plugin that removes this requirement and it works fine.
I'm happy to propose a PR to allow this behavior. Here is a first idea, starting from NewInterface (aws/interface.go).
To make it work in L3 or L3S mode we will need to create additional IP route tables (one per interface) forcing traffic through the ipvlan master interface, and create IP rules for each pod to use the appropriate route table (I haven't tested L3 mode with the plugin yet, but I'm not sure it can work without IP rules even with different subnets).
Creating/deleting route tables for L3/L3S modes should probably be done in configureInterface, with a new unConfigureInterface function called in RemoveInterface.
Creating host IP rules could be done in new functions addIPRule and delIPRule called by cmdAdd and cmdDel.
Also happy to integrate this in a PR
Due to kube-proxy running in the default namespace, outbound connections to service IPs end up using the node IP, similar to the original userspace kube-proxy behavior. Support retaining the source IP of the connecting Pod.
It looks like the Primary ENI IP is allocated, but not used as a podIP.
This would not be an issue on bigger servers with more interfaces and IPs per interface, or even more servers, but starting out small this removes precious resources.
Is there a reason for this or would it be possible to change this behavior?
I'd be willing to do a PR if this can be done and is a good idea.
This is my generated conflist that gets picked up by Kubernetes:
{
  "cniVersion": "0.3.1",
  "name": "cni-ipvlan-vpc-k8s",
  "plugins": [
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipvlan",
      "mode": "l2",
      "master": "ipam",
      "ipam": {
        "type": "cni-ipvlan-vpc-k8s-ipam",
        "interfaceIndex": 1,
        "subnetTags": {
          "cni-ipvlan-cluster-subnet": "$CLUSTER_NAME"
        },
        "secGroupIds": [
          "$SECURITY_GROUP_ID"
        ],
        "routeToVpcPeers": true,
        "skipDeallocation": $SKIP_DEALLOCATION
      }
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
      "hostInterface": "$DEFAULT_NIC",
      "containerInterface": "veth0",
      "ipMasq": true
    },
    {
      "cniVersion": "0.3.1",
      "type": "loopback"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
Deploying an nginx ingress and using hostPort does not seem to work. Doing a quick sudo iptables -L -n -t nat in a cluster using a different CNI plugin (e.g. flannel) gives me the expected iptables rules, but my test cluster using the cni-ipvlan-vpc-k8s CNI plugin does not.
Hello,
we are seeing a quite strange and (at least for me) hard-to-debug issue where a Kubernetes node gets into a state in which any pod running on it cannot reach any other pods within the cluster, including pods running on the very same node.
We have a node in this situation right now, so I can provide any debug output if needed (the node is cordoned from Kubernetes and all production pods are drained from it).
Let me describe our setup first:
root@ip-10-110-174-111:~# /opt/cni/bin/cni-ipvlan-vpc-k8s-tool eniif
iface mac id subnet subnet_cidr secgrps vpc ips
eth0 0a:2a:d3:92:7e:8c eni-018ae0468fdf10781 subnet-0e238589d27216701 10.110.174.0/25 [sg-0ccda037d02911c59] vpc-0f1dab3aaf561ddba [10.110.174.111]
eth1 0a:b5:1f:03:d1:d6 eni-0f5769f94c79afb21 subnet-0004ceb2afd7e3ba7 100.96.224.0/19 [sg-0ccda037d02911c59] vpc-0f1dab3aaf561ddba [100.96.236.178 100.96.235.253 100.96.255.154 100.96.248.69 100.96.254.43 100.96.253.184 100.96.234.151 100.96.254.206 100.96.244.19 100.96.230.47 100.96.251.97 100.96.239.70 100.96.228.204 100.96.236.91 100.96.225.8]
eth2 0a:e2:0f:b4:49:4c eni-0aba7b11eb2077b50 subnet-0004ceb2afd7e3ba7 100.96.224.0/19 [sg-0ccda037d02911c59] vpc-0f1dab3aaf561ddba [100.96.254.118 100.96.247.76 100.96.233.85 100.96.228.225 100.96.255.230 100.96.227.61 100.96.230.198 100.96.247.0 100.96.236.129 100.96.255.57 100.96.236.8]
We only have two pods running on the node now:
fluentd-loggly-hb9lf 1/1 Running 0 228d 100.96.236.178 ip-10-110-174-111.eu-central-1.compute.internal
jhorky-shell 1/1 Running 0 21m 100.96.254.43 ip-10-110-174-111.eu-central-1.compute.internal
The pods can't see each other even though they are in the same (/19) subnet and running on the same node:
bash-5.0# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UNKNOWN group default
link/ether 0a:b5:1f:03:d1:d6 brd ff:ff:ff:ff:ff:ff
inet 100.96.254.43/19 brd 100.96.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::8b5:1fff:fe03:d1d6/64 scope link dadfailed tentative
valid_lft forever preferred_lft forever
4: veth0@if114: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 0a:70:58:d2:9d:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::870:58ff:fed2:9dd6/64 scope link
valid_lft forever preferred_lft forever
bash-5.0# ping 100.96.236.178
PING 100.96.236.178 (100.96.236.178) 56(84) bytes of data.
From 100.96.254.43 icmp_seq=1 Destination Host Unreachable
From 100.96.254.43 icmp_seq=2 Destination Host Unreachable
From 100.96.254.43 icmp_seq=3 Destination Host Unreachable
In a tcpdump (running with -i any) on the compute node, I see these ARP requests but no replies:
15:36:09.107512 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:10.129829 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:11.153879 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:12.177949 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
15:36:13.201804 ARP, Request who-has 100.96.236.178 tell 100.96.254.43, length 28
When trying to ping the other way around, the situation is very different:
The ping works:
PING 100.96.254.43 (100.96.254.43) 56(84) bytes of data.
64 bytes from 100.96.254.43: icmp_seq=1 ttl=64 time=0.168 ms
64 bytes from 100.96.254.43: icmp_seq=2 ttl=64 time=0.051 ms
The tcpdump running in the "debug jhorky" container shows just this (no ICMP messages?!):
15:57:22.065934 ARP, Request who-has 100.96.254.43 tell 100.96.236.178, length 28
15:57:22.066041 ARP, Reply 100.96.254.43 is-at 0a:b5:1f:03:d1:d6, length 28
The tcpdump running on the compute node doesn't show any ICMP either:
15:57:16.241711 ARP, Request who-has 100.96.236.178 tell 10.110.174.111, length 28
15:57:16.241732 ARP, Reply 100.96.236.178 is-at d2:5d:eb:14:bf:f1, length 28
Anyway, right now, I have no idea what more to look at.
Once again, the node is in this state, so I can provide any output needed.
Any help much appreciated.
We should add a comparison between aws-vpc-cni-k8s and the Lyft CNI, so users become familiar with its advantages over the AWS CNI.
Per discussion in #21, comms with VPC peers should transit the IPvlan route, instead of going back out over the default namespace.
The IPAM plugin has a configuration option to never deallocate addresses; however, there is no way to eventually clean up these addresses short of terminating the node.
There should be a mark-and-sweep system to locate persistently free IPs and mark them for removal at a later pass. In the spirit of running daemon-less, we should record marks in a state file.
Trying to build/install this onto a CoreOS box for my Kubernetes cluster. I am pretty sure it's something I am doing, but if you have any insight that would help, it would be appreciated.
Repo: https://github.com/C45tr0/install-cni-ipvlan-vpc-k8s
Log:
docker run --net=host --rm -v /Path/install-cni-ipvlan-vpc-k8s/tmp:/shared install-cni-ip-vlan-vpc-k8s:latest
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
v3.6.2-223-gf6e5807065 [http://dl-cdn.alpinelinux.org/alpine/v3.6/main]
v3.6.2-216-g48901173c2 [http://dl-cdn.alpinelinux.org/alpine/v3.6/community]
OK: 8437 distinct packages available
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/18) Installing binutils-libs (2.28-r3)
(2/18) Installing binutils (2.28-r3)
(3/18) Installing gmp (6.1.2-r0)
(4/18) Installing isl (0.17.1-r0)
(5/18) Installing libgomp (6.3.0-r4)
(6/18) Installing libatomic (6.3.0-r4)
(7/18) Installing pkgconf (1.3.7-r0)
(8/18) Installing libgcc (6.3.0-r4)
(9/18) Installing mpfr3 (3.1.5-r0)
(10/18) Installing mpc1 (1.0.3-r0)
(11/18) Installing libstdc++ (6.3.0-r4)
(12/18) Installing gcc (6.3.0-r4)
(13/18) Installing libssh2 (1.8.0-r1)
(14/18) Installing libcurl (7.57.0-r0)
(15/18) Installing expat (2.2.0-r1)
(16/18) Installing pcre (8.41-r0)
(17/18) Installing git (2.13.5-r0)
(18/18) Installing make (4.2.1-r0)
Executing busybox-1.26.2-r9.trigger
OK: 108 MiB in 30 packages
Grabbing golang/dep
Grabbing lyft/cni-ipvlan-vpc-k8s
# github.com/vishvananda/netlink
src/github.com/vishvananda/netlink/bpf_linux.go:4:23: fatal error: asm/types.h: No such file or directory
#include <asm/types.h>
^
compilation terminated.
Making cni-ipvlan-vpc-k8s
fatal: No names found, cannot describe anything.
/go/bin/dep ensure -v
Gopkg.lock was already in sync with imports and Gopkg.toml
(1/19) Wrote github.com/j-keck/arping@master
(2/19) Wrote github.com/coreos/[email protected]
(3/19) Wrote github.com/docker/[email protected]
(4/19) Wrote github.com/vishvananda/netns@master
(5/19) Wrote github.com/pkg/[email protected]
(6/19) Wrote github.com/Microsoft/[email protected]
(7/19) Wrote github.com/jmespath/go-jmespath@0b12d6b5
(8/19) Wrote github.com/go-ini/[email protected]
(9/19) Wrote github.com/vishvananda/netlink@master
(10/19) Wrote github.com/nightlyone/lockfile@master
(11/19) Wrote github.com/containernetworking/[email protected]
(12/19) Wrote github.com/urfave/[email protected]
(13/19) Wrote golang.org/x/sys@master
(14/19) Wrote github.com/containernetworking/[email protected]
(15/19) Wrote golang.org/x/net@master
(16/19) Wrote github.com/docker/[email protected]
(17/19) Wrote github.com/docker/[email protected]
(18/19) Wrote github.com/aws/[email protected]
(19/19) Wrote github.com/docker/[email protected]
go install ./
# github.com/lyft/cni-ipvlan-vpc-k8s/vendor/github.com/vishvananda/netlink
vendor/github.com/vishvananda/netlink/bpf_linux.go:4:23: fatal error: asm/types.h: No such file or directory
#include <asm/types.h>
^
compilation terminated.
make: *** [Makefile:19: cache] Error 2
cp: can't stat 'cni-ipvlan-vpc-k8s-*.tar.gz': No such file or directory
I installed the plugin in our dev cluster and after much whack-a-mole rescheduling of pods that couldn't talk to other pods things settled down. The problem pods kept coming back when new nodes would be added to the cluster. We use kops and the rolling update causes a lot of pod rescheduling. Eventually I started to look closer and found the trouble pods didn't have any routes for the vpc ranges at all. I also noticed that those pods were all the first pod that got assigned to a new eni whether it was a new node or just the next needed eni on a busy node.
Looking at the code it seems that this lag between new eni and ec2 metadata service results being fully populated is a known thing. The ipam plugin depends on the ranges from the vpc-ipv4-cidr-blocks section of the metadata to set up the correct routes for the veth interfaces. However, the parsing of the vpc-ipv4-cidr-blocks call doesn't return an error if no cidrs were found. This results in an interface with no routes to the vpc.
So one approach to solving that could be simply checking the length of the vpc-ipv4-cidr-blocks slice and returning an error if it's 0. That would cause a retry on the metadata service until it gets a result. I'm somewhat concerned about that solution because I wonder if there is an intermediate state where some but not all of the VPC ranges get returned and so we're back where we started with required routes missing.
Another way of solving it would be to just use the DescribeVpcs API to get the VPC ranges all the time instead of relying on the metadata service. It appears that there is some desire to not use the DescribeVpcs API calls to get the VPC CIDRs; I'd like to get more context from those who know about that preference. We already have to have quite a few IAM permissions available to make the plugin work; DescribeVpcs doesn't seem that onerous as part of that list. Maybe it's just to have one less place where config data is extracted from?
Thanks for sharing the great work. I am trying to test it out but was not able to bring up the following pods:
kubectl create -f https://k8s.io/docs/tasks/access-application-cluster/hello.yaml
I am using version 1.7.10
kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
I am seeing the following error in the log file:
kubelet[3752]: E1212 21:12:29.218496 3752 cni.go:294] Error adding network: no plugin name provided
I am using the configuration file provided in the README and have updated it with my secGroupIds:
{
  "cniVersion": "0.3.1",
  "name": "cni-ipvlan-vpc-k8s",
  "plugins": [
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipvlan",
      "mode": "l2",
      "master": "ipam",
      "ipam": {
        "type": "cni-ipvlan-vpc-k8s-ipam",
        "interfaceIndex": 1,
        "subnetTags": {
          "kubernetes_kubelet": "true"
        },
        "secGroupIds": [
          "sg-34b79141"
        ]
      }
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
      "hostInterface": "eth0",
      "containerInterface": "veth0",
      "ipMasq": true
    }
  ]
}
Here are cni binaries in /opt/cni/bin:
root@ip-10-0-55-131:/opt/cni/bin# ls
bridge cni-ipvlan-vpc-k8s-ipvlan cni-ipvlan-vpc-k8s-tool flannel loopback
cni-ipvlan-vpc-k8s-ipam cni-ipvlan-vpc-k8s-.tar.gz cni-ipvlan-vpc-k8s-unnumbered-ptp host-local ptp
Here is the code snippet
~/workspace/src/k8s-v1.7.10/kubernetes-1.7.10/vendor/github.com/containernetworking/cni/pkg/invoke/find.go
// FindInPath returns the full path of the plugin by searching in the provided path
func FindInPath(plugin string, paths []string) (string, error) {
	if plugin == "" {
		return "", fmt.Errorf("no plugin name provided")
	}

	if len(paths) == 0 {
		return "", fmt.Errorf("no paths provided")
	}

	for _, path := range paths {
		for _, fe := range ExecutableFileExtensions {
			fullpath := filepath.Join(path, plugin) + fe
			if fi, err := os.Stat(fullpath); err == nil && fi.Mode().IsRegular() {
				return fullpath, nil
			}
		}
	}

	return "", fmt.Errorf("failed to find plugin %q in path %s", plugin, paths)
}
Kubelet is running with the following options:
root 3752 1 1 21:07 ? 00:00:30 /usr/local/bin/kubelet --node-ip --allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --hostname-override=ip-10-0-55-131.ec2.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=100.64.0.0/10 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --require-kubeconfig=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/
Can you tell me if I missed anything?
Thank you very much.
I am seeing some pods stuck in ContainerCreating state when I run
kubectl create -f https://k8s.io/docs/tasks/access-application-cluster/hello.yaml
Here is the output
kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-1243552595-1bm63 1/1 Running 0 3h
hello-1243552595-bvc3r 1/1 Running 0 3h
hello-1243552595-hrj3s 0/1 ContainerCreating 0 3h
hello-1243552595-mv49s 0/1 ContainerCreating 0 3h
I am using version 1.7.10
kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Here is the error message I am seeing in the log:
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380612 3752 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380659 3752 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380678 3752 kuberuntime_manager.go:624] createPodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380711 3752 pod_workers.go:182] Error syncing pod 28da1900-df81-11e7-a409-0e64e3f014fa ("hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)"), skipping: failed to "CreatePodSandbox" for "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"hello-1243552595-mv49s_default\" network: failed to add host route dst 10.0.5.23: file exists"
Looks like the CNI plugin tries to use 10.0.5.23 for Pod hello-1243552595-mv49s, whereas 10.0.5.23 is already assigned to another Pod running on the same host.
kubectl describe pod hello-1243552595-bvc3r
Name: hello-1243552595-bvc3r
Namespace: default
Node: ip-10-0-55-131.ec2.internal/10.0.55.131
Start Time: Tue, 12 Dec 2017 21:12:28 +0000
Labels: app=hello
pod-template-hash=1243552595
tier=backend
track=stable
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"hello-1243552595","uid":"28d91673-df81-11e7-a409-0e64e3f014fa","...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container hello
Status: Running
IP: 10.0.5.23
Controllers: ReplicaSet/hello-1243552595
Containers:
hello:
Container ID: docker://34c740bf48525b72a7fedceb1dc64951f4ca14bd9fad6f3b2b8e485f09f8c152
Image: gcr.io/google-samples/hello-go-gke:1.0
Image ID: docker-pullable://gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e
Port: 80/TCP
State: Running
Started: Tue, 12 Dec 2017 23:37:23 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j8llx (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-j8llx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j8llx
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2h 18m 657 kubelet, ip-10-0-55-131.ec2.internal Warning FailedSync Error syncing pod
2h 18m 657 kubelet, ip-10-0-55-131.ec2.internal Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
18m 18m 1 kubelet, ip-10-0-55-131.ec2.internal spec.containers{hello} Normal Pulling pulling image "gcr.io/google-samples/hello-go-gke:1.0"
18m 18m 1 kubelet, ip-10-0-55-131.ec2.internal spec.containers{hello} Normal Pulled Successfully pulled image "gcr.io/google-samples/hello-go-gke:1.0"
18m 18m 1 kubelet, ip-10-0-55-131.ec2.internal spec.containers{hello} Normal Created Created container
18m 18m 1 kubelet, ip-10-0-55-131.ec2.internal spec.containers{hello} Normal Started Started container
Here is the ip route output
ip route
default via 10.0.32.1 dev eth0
10.0.5.0/24 dev eth1 proto kernel scope link src 10.0.5.154
10.0.5.23 dev veth350a598b scope link
10.0.32.0/19 dev eth0 proto kernel scope link src 10.0.55.131
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Here is the docker ps output
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
efd4c399f948 gcr.io/google_containers/pause-amd64:3.0 "/pause" 6 seconds ago Up 5 seconds k8s_POD_kube-dns-2712020956-90fd9_kube-system_ebba507c-df81-11e7-a409-0e64e3f014fa_111
23cac500ae36 gcr.io/google_containers/pause-amd64:3.0 "/pause" 6 seconds ago Up 5 seconds k8s_POD_hello-1243552595-mv49s_default_28da1900-df81-11e7-a409-0e64e3f014fa_111
52532863c1d4 gcr.io/google_containers/pause-amd64:3.0 "/pause" 7 seconds ago Up 6 seconds k8s_POD_hello-1243552595-hrj3s_default_28da0c5e-df81-11e7-a409-0e64e3f014fa_111
34c740bf4852 gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e "/usr/bin/hello" 20 minutes ago Up 20 minutes k8s_hello_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_0
1c17414c1e46 gcr.io/google_containers/pause-amd64:3.0 "/pause" 20 minutes ago Up 20 minutes k8s_POD_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_1
3ad87d21eea0 protokube:1.6.0 "/usr/bin/protokube -" 3 hours ago Up 3 hours distracted_aryabhata
I've been trying to compile for a day without success. Could you please share a pre-built package?
Google Drive would work for sending it to me.
My steps:
go get -u github.com/golang/dep/cmd/dep
go get github.com/lyft/cni-ipvlan-vpc-k8s
cd $GOPATH/src/github.com/lyft/cni-ipvlan-vpc-k8s
make build
thanks
The setup is as follows:
The client and node aren't directly connected; there's a VPN node between them.
I can do the following pings/netcats (( i.e. testing connectivity )):
client <--> node (( and vice versa ))
vpn node <--> pod (( and vice versa ))
pod --> client (( but not vice versa ))
Upon closer inspection (( that is, running tcpdump on the node )), I see the following:
ubuntu@ip-10-102-12-217:~$ sudo tcpdump -i any icmp -nnn
13:26:13.820856 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 1, length 64
13:26:14.831852 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 2, length 64
13:26:15.856034 IP 10.103.0.2 > 10.102.11.165: ICMP echo request, id 16455, seq 3, length 64
AWS properly routes the packet to the node, yet there's no reply if the source address is outside the VPC.
Version: v0.5.0
Configuration:
{
  "cniVersion": "0.3.1",
  "name": "cni-ipvlan-vpc-k8s",
  "plugins": [
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipam",
      "interfaceIndex": 1,
      "subnetTags": {
        "kubernetes_kubelet": "{{ kubernetes_kubelet }}"
      },
      "secGroupIds": {{ secGroupIds | to_nice_json(indent=12) }}
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-ipvlan",
      "mode": "l2"
    },
    {
      "cniVersion": "0.3.1",
      "type": "cni-ipvlan-vpc-k8s-unnumbered-ptp",
      "hostInterface": "{{ ansible_default_ipv4.interface }}",
      "containerInterface": "veth0",
      "ipMasq": true
    }
  ]
}
(( the {{ }} placeholders are filled with proper values )).
I've marked a single private subnet with the needed tag; it's the 10.102.0.0/20 subnet. The VPN node is in a different subnet (since it has a public IP).
DescribeVPCCIDRs adds all CIDRs associated with the VPC regardless of association state. If this state is anything other than "associated", we should not add the CIDR range to the list.
When removing a CIDR range from a VPC, the range remains disassociated for a long time (1+ hour) before being removed. We should only add ranges with status "associated".
Possible association states:
I'm not sure what we should do with the "associating" state, because the association may still fail.
I was looking into using the VPC CNI plugin with ipMasq disabled so our traffic goes out via each ethX rather than host eth0 (VPC traffic + 0.0.0.0/0).
From the looks of it, when ipMasq is disabled the pods lose egress to non-VPC CIDRs. I assume that is expected. From the README, I gather this config flag was added to handle the kube2iam case? In our org we do not run kube2iam, and we restrict access to the metadata endpoint by other means.
Any future plans to support this mode of operation?
Hello,
I've been working on a PR (coming very soon) and I was surprised at the number of addresses allocated to the interfaces on my test instance. I discovered it was related to #65, which will now allocate all possible IPs by default when an interface is attached.
In our setup we have low pod density on nodes and on large instance this will allocate far too many IPs. We can of course configure ipBatchSize, but I think changing the default batch size to 1 would avoid impacting existing setups. More than happy to do this very simple PR if you think it makes sense.
Hi,
I am getting the following errors when trying to retrieve dependencies. Looks like it has some dependencies on a private repo:
➜ cni-ipvlan-vpc-k8s git:(master) dep ensure
ensure Solve(): no valid source could be created:
failed to set up sources from the following URLs:
https://github.com/lyft/cni-eni
: remote repository at https://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
ssh://[email protected]/lyft/cni-eni
: remote repository at ssh://[email protected]/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
git://github.com/lyft/cni-eni
: remote repository at git://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
failed to set up sources from the following URLs:
http://github.com/lyft/cni-eni
: remote repository at http://github.com/lyft/cni-eni does not exist, or is inaccessible: : exit status 128
The README says it's compatible with containerd, but there's a little bit of a mystery:
to communicate with the container runtime, only a Docker implementation exists, not a containerd one.
https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/master/nl/docker.go
I'm curious about the mechanism: how is compatibility with containerd possible?
We are seeing an issue that seems to happen regularly: some pods have no network connectivity.
After looking into the configuration it turns out that when this happens we are in the following situation:
After looking into logs we found the following:
failed to remove pod init container "consul-template": failed to get container status "371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70": rpc error: code = Unknown desc = Error: No such container: 371295090acf33795fe5badb07063021cace4fcff719cd13effc6ff2b5136f70; Skipping pod "alerting-metric-evaluator-anomaly-0_datadog(4c15f7d2-5783-11e8-903a-02fc6d7aa9b8)"
Any idea what could trigger this situation? Our current setup uses docker, kubelet 1.10 and the latest version of the CNI plugin.
I think SkipDeallocation could probably help but I'd like to understand exactly what is happening.
I wonder if allowing for more verbose logs could help in this kind of situation (for instance, logging ADD/DELETE calls with their parameters).
When setting up a NodePort, if we access a host where the target service is running and load-balancing chooses the local pod, the traffic is dropped.
Everything seems to work OK because, if the first SYN is dropped, the client will retry and will (probably) be sent to another host (however, queries load-balanced to the local pod take much longer).
This can be seen by logging martian packets. When traffic is sent to a local pod it will be dropped with the following log:
[912228.409488] IPv4: martian source 172.30.182.212 from 172.21.51.75, on dev ens3
[912228.409534] ll header: 00000000: 0e d8 07 a0 c0 0c 0e f0 4b 50 fd 5c 08 00 ........KP.\..
To trigger the issue I simply did this until the answer took more than 1s:
$ curl http://172.30.183.34:30054
where 172.30.183.34 is the host IP and 30054 the NodePort. The kube-proxy NodePort iptables PREROUTING rules redirected traffic to 172.30.182.212 (the local pod for the service), which triggered the martian log.
Looking at routing explains the issue:
$ ip route get 172.30.182.212 from 172.21.51.75 iif ens3
RTNETLINK answers: Invalid cross-device link
$ ip route get 172.30.182.212
172.30.182.212 dev veth3b59a300 src 172.30.183.34
$ ip route get 172.21.51.75 from 172.30.182.212 iif veth3b59a300
172.21.51.75 from 172.30.182.212 via 172.30.182.212 dev veth3b59a300
This means that traffic arrives on ens3 but the reverse route is through the pod (the route getting back to the pod is necessary to access services).
To trigger the issue consistently (100% of the time), we just need to add externalTrafficPolicy: Local to the service definition (or scale the service down to 1 pod).
Hey there,
great to see the nice fix in #80 making it into 0.6.1.
I was hoping to update kubernetes/kops to use the new version; however, that relies on downloading the release binary rather than the source code (see here: https://github.com/kubernetes/kops/blob/8df55b8571c1cd6dc5e6f7fba56175bf0945fb74/upup/pkg/fi/cloudup/apply_cluster.go#L1183 ).
As such, it would be great if you could publish the binary of the 0.6.1 release.
Thanks!