
cilium / cilium-cli


CLI to install, manage & troubleshoot Kubernetes clusters running Cilium

Home Page: https://cilium.io

License: Apache License 2.0

Go 99.21% Makefile 0.30% Dockerfile 0.16% HTML 0.32% Shell 0.01%
cilium ebpf kubernetes networking observability security

cilium-cli's Introduction

Cilium Logo


Cilium is a networking, observability, and security solution with an eBPF-based dataplane. It provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay mode. It is L7-protocol aware and can enforce network policies on L3-L7 using an identity based security model that is decoupled from network addressing.

Cilium implements distributed load balancing for traffic between pods and to external services, and is able to fully replace kube-proxy, using efficient hash tables in eBPF allowing for almost unlimited scale. It also supports advanced functionality like integrated ingress and egress gateway, bandwidth management and service mesh, and provides deep network and security visibility and monitoring.

A new Linux kernel technology called eBPF is at the foundation of Cilium. It supports dynamic insertion of eBPF bytecode into the Linux kernel at various integration points, such as network I/O, application sockets, and tracepoints, to implement security, networking, and visibility logic. eBPF is highly efficient and flexible. To learn more about eBPF, visit eBPF.io.

Overview of Cilium features for networking, observability, service mesh, and runtime security

Stable Releases

The Cilium community maintains stable releases for the last three minor Cilium versions. Minor releases older than that are considered EOL.

For upgrades to new minor releases please consult the Cilium Upgrade Guide.

Listed below are the actively maintained release branches along with their latest patch release, corresponding image pull tags and their release notes:

v1.15 2024-07-11 quay.io/cilium/cilium:v1.15.7 Release Notes
v1.14 2024-07-11 quay.io/cilium/cilium:v1.14.13 Release Notes
v1.13 2024-07-11 quay.io/cilium/cilium:v1.13.18 Release Notes

Architectures

Cilium images are distributed for AMD64 and AArch64 architectures.

Software Bill of Materials

Starting with Cilium version 1.13.0, all images include a Software Bill of Materials (SBOM), generated in SPDX format. More information is available in the Cilium SBOM documentation.

Development

For development and testing purposes, the Cilium community publishes snapshots, early release candidates (RC) and CI container images built from the main branch. These images are not for use in production.

For testing upgrades to new development releases please consult the latest development build of the Cilium Upgrade Guide.

Listed below are branches for testing along with their snapshots or RC releases, corresponding image pull tags and their release notes where applicable:

main daily quay.io/cilium/cilium-ci:latest N/A
v1.16.0-rc.2 2024-07-15 quay.io/cilium/cilium:v1.16.0-rc.2 Release Candidate Notes

Functionality Overview

Protect and secure APIs transparently

Ability to secure modern application protocols such as REST/HTTP, gRPC and Kafka. Traditional firewalls operate at Layer 3 and 4. A protocol running on a particular port is either completely trusted or blocked entirely. Cilium provides the ability to filter on individual application protocol requests such as:

  • Allow all HTTP requests with method GET and path /public/.*. Deny all other requests.
  • Allow service1 to produce on Kafka topic topic1 and service2 to consume on topic1. Reject all other Kafka messages.
  • Require the HTTP header X-Token: [0-9]+ to be present in all REST calls.

See the section Layer 7 Policy in our documentation for the latest list of supported protocols and examples on how to use it.

Secure service to service communication based on identities

Modern distributed applications rely on technologies such as application containers to facilitate agility in deployment and scale out on demand. This results in a large number of application containers being started in a short period of time. Typical container firewalls secure workloads by filtering on source IP addresses and destination ports. This concept requires the firewalls on all servers to be manipulated whenever a container is started anywhere in the cluster.

To avoid this situation, which limits scale, Cilium assigns a security identity to groups of application containers that share identical security policies. The identity is then associated with all network packets emitted by the application containers, allowing the identity to be validated at the receiving node. Security identity management is performed using a key-value store.

Secure access to and from external services

Label-based security is the tool of choice for cluster-internal access control. To secure access to and from external services, traditional CIDR-based security policies for both ingress and egress are supported. This allows limiting access to and from application containers to particular IP ranges.

Simple Networking

A simple flat Layer 3 network with the ability to span multiple clusters connects all application containers. IP allocation is kept simple by using host scope allocators. This means that each host can allocate IPs without any coordination between hosts.

The following multi node networking models are supported:

  • Overlay: Encapsulation-based virtual network spanning all hosts. Currently, VXLAN and Geneve are baked in but all encapsulation formats supported by Linux can be enabled.

    When to use this mode: This mode has minimal infrastructure and integration requirements. It works on almost any network infrastructure as the only requirement is IP connectivity between hosts which is typically already given.

  • Native Routing: Use of the regular routing table of the Linux host. The network must be capable of routing the IP addresses of the application containers.

    When to use this mode: This mode is for advanced users and requires some awareness of the underlying networking infrastructure. This mode works well with:

    • Native IPv6 networks
    • In conjunction with cloud network routers
    • If you are already running routing daemons

Load Balancing

Cilium implements distributed load balancing for traffic between application containers and to external services and is able to fully replace components such as kube-proxy. The load balancing is implemented in eBPF using efficient hashtables allowing for almost unlimited scale.

For north-south type load balancing, Cilium's eBPF implementation is optimized for maximum performance, can be attached to XDP (eXpress Data Path), and supports direct server return (DSR) as well as Maglev consistent hashing if the load balancing operation is not performed on the source host.

For east-west type load balancing, Cilium performs efficient service-to-backend translation right in the Linux kernel's socket layer (e.g. at TCP connect time) such that per-packet NAT operations overhead can be avoided in lower layers.

Bandwidth Management

Cilium implements bandwidth management through efficient EDT-based (Earliest Departure Time) rate-limiting with eBPF for container traffic egressing a node. Compared to traditional approaches such as HTB (Hierarchy Token Bucket) or TBF (Token Bucket Filter), as used in the bandwidth CNI plugin, this significantly reduces transmission tail latencies for applications and avoids locking under multi-queue NICs.

Monitoring and Troubleshooting

The ability to gain visibility and troubleshoot issues is fundamental to the operation of any distributed system. While we learned to love tools like tcpdump and ping and while they will always find a special place in our hearts, we strive to provide better tooling for troubleshooting. This includes tooling to provide:

  • Event monitoring with metadata: When a packet is dropped, the tool doesn't just report the source and destination IP of the packet; it also provides the full label information of both the sender and receiver, among a lot of other information.
  • Metrics export via Prometheus: Key metrics are exported via Prometheus for integration with your existing dashboards.
  • Hubble: An observability platform specifically written for Cilium. It provides service dependency maps, operational monitoring and alerting, and application and security visibility based on flow logs.

Getting Started

What is eBPF and XDP?

Berkeley Packet Filter (BPF) is a Linux kernel bytecode interpreter originally introduced to filter network packets, e.g. for tcpdump and socket filters. The BPF instruction set and surrounding architecture have recently been significantly reworked with additional data structures such as hash tables and arrays for keeping state as well as additional actions to support packet mangling, forwarding, encapsulation, etc. Furthermore, a compiler back end for LLVM allows for programs to be written in C and compiled into BPF instructions. An in-kernel verifier ensures that BPF programs are safe to run and a JIT compiler converts the BPF bytecode to CPU architecture-specific instructions for native execution efficiency. BPF programs can be run at various hooking points in the kernel such as for incoming packets, outgoing packets, system calls, kprobes, uprobes, tracepoints, etc.

BPF continues to evolve and gain additional capabilities with each new Linux release. Cilium leverages BPF to perform core data path filtering, mangling, monitoring and redirection, and requires BPF capabilities that are present in any Linux kernel version 4.8.0 or newer.

Many Linux distributions including CoreOS, Debian, Docker's LinuxKit, Fedora, openSUSE and Ubuntu already ship kernel versions >= 4.8.x. You can check your Linux kernel version by running uname -a. If you are not yet running a recent enough kernel, check the Documentation of your Linux distribution on how to run Linux kernel 4.9.x or later.
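If you want to script that check, here is a minimal Go sketch (not part of Cilium; the 4.8 threshold comes from the requirement above) that reads the kernel release via uname(2):

package main

import (
    "fmt"
    "strconv"
    "strings"

    "golang.org/x/sys/unix"
)

func main() {
    var uts unix.Utsname
    if err := unix.Uname(&uts); err != nil {
        panic(err)
    }
    release := unix.ByteSliceToString(uts.Release[:]) // e.g. "5.15.0-112-generic"

    // Parse "major.minor" from the release string; everything after is ignored.
    parts := strings.SplitN(release, ".", 3)
    major, _ := strconv.Atoi(parts[0])
    minor := 0
    if len(parts) > 1 {
        minor, _ = strconv.Atoi(parts[1])
    }

    if major > 4 || (major == 4 && minor >= 8) {
        fmt.Printf("kernel %s meets the 4.8.0+ requirement\n", release)
    } else {
        fmt.Printf("kernel %s is too old; see your distribution's docs for running 4.9.x or later\n", release)
    }
}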

To read up on the necessary kernel versions to run the BPF runtime, see the section Prerequisites.

https://cdn.jsdelivr.net/gh/cilium/cilium@main/Documentation/images/bpf-overview.png

XDP is a further step in this evolution and enables running a specific flavor of BPF programs from the network driver with direct access to the packet's DMA buffer. This is, by definition, the earliest possible point in the software stack where programs can be attached, allowing for a programmable, high-performance packet processor in the Linux kernel networking data path.

Further information about BPF and XDP targeted for developers can be found in the BPF and XDP Reference Guide.

To learn more about Cilium, its extensions, and use cases around Cilium and BPF, take a look at the Further Readings section.

Community

Slack

Join the Cilium Slack channel to chat with Cilium developers and other Cilium users. This is a good place to learn about Cilium, ask questions, and share your experiences.

Special Interest Groups (SIG)

See Special Interest groups for a list of all SIGs and their meeting times.

Developer meetings

The Cilium developer community hangs out on Zoom to chat. Everybody is welcome.

eBPF & Cilium Office Hours livestream

We host a weekly community YouTube livestream called eCHO which (very loosely!) stands for eBPF & Cilium Office Hours. Join us live, catch up with past episodes, or head over to the eCHO repo and let us know your ideas for topics we should cover.

Governance

The Cilium project is governed by a group of Maintainers and Committers. How they are selected and govern is outlined in our governance document.

Adopters

A list of adopters of the Cilium project who are deploying it in production, and of their use cases, can be found in file USERS.md.

Roadmap

Cilium maintains a public roadmap. It gives a high-level view of the main priorities for the project, the maturity of different features and projects, and how to influence the project direction.

License

The Cilium user space components are licensed under the Apache License, Version 2.0. The BPF code templates are dual-licensed under the General Public License, Version 2.0 (only) and the 2-Clause BSD License (you can use the terms of either license, at your option).

cilium-cli's People

Contributors

aanm, aditighag, asauber, bmcustodio, brb, christarazi, dependabot[bot], doniacld, gandro, giorio94, jibi, joestringer, jrajahalme, kaworu, learnitall, meyskens, mhofstetter, michi-covalent, nbusseneau, nebril, pchaigno, renovate[bot], rolinh, sayboras, squeed, tgraf, ti-mo, tklauser, tommyp1ckles, viktor-kurchenko


cilium-cli's Issues

Re-enable AKS

AKS is failing:

Run cilium install --cluster-name cilium-cli-ci-51 --azure-resource-group cilium-ci --azure-tenant-id *** --azure-client-id *** --azure-client-secret *** --config monitor-aggregation=none
🔮 Auto-detected Kubernetes kind: AKS
✨ Running "AKS" validation checks
✅ Detected az binary
ℹ️  Cilium version not set, using default version "v1.9.4"
🔮 Auto-detected IPAM mode: azure
🔮 Auto-detected datapath mode: azure
✅ Using manually configured principal for cilium operator with App ID *** and tenant ID ***
✅ Derived Azure subscription id fd182b79-36cb-4a4e-b567-e568d63f9f62
✅ Derived Azure node resource group MC_cilium-ci_cilium-cli-ci-51_westeurope
🔑 Generating CA...
2021/03/02 16:51:59 [INFO] generate received request
2021/03/02 16:51:59 [INFO] received CSR
2021/03/02 16:51:59 [INFO] generating key: ecdsa-256
2021/03/02 16:51:59 [INFO] encoded CSR
2021/03/02 16:51:59 [INFO] signed certificate with serial number 426037716530457065549867383132446480950320672028
2021/03/02 16:51:59 [INFO] generate received request
🔑 Generating certificates for Hubble...
2021/03/02 16:51:59 [INFO] received CSR
2021/03/02 16:51:59 [INFO] generating key: ecdsa-256
2021/03/02 16:51:59 [INFO] encoded CSR
2021/03/02 16:51:59 [INFO] signed certificate with serial number 315538192034672657700974214522730735367004936319
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
ℹ️  Manual overwrite in ConfigMap: monitor-aggregation=none
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...
Error: Unable to install Cilium:  timeout while waiting for status to become successful: context deadline exceeded
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         2 errors
 \__/¯¯\__/    Operator:       1 errors
 /¯¯\__/¯¯\    Hubble:         1 warnings
 \__/¯¯\__/    ClusterMesh:    1 warnings
    \__/

DaemonSet         cilium                   Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium                   Running: 2
Image versions    cilium                   quay.io/cilium/cilium:v1.9.4: 2
Errors:           cilium                   cilium-gsb9x             unable to retrieve cilium status: error in stream: error dialing backend: dial tcp 10.240.0.35:10250: i/o timeout
                  cilium                   cilium-tj4pl             unable to retrieve cilium status: error in stream: error dialing backend: dial tcp 10.240.0.4:10250: i/o timeout
                  cilium-operator          cilium-operator          context deadline exceeded
Warnings:         clustermesh-apiserver    clustermesh-apiserver    clustermesh is not deployed
                  hubble-relay             hubble-relay             hubble relay is not deployed

Error: Process completed with exit code 1.

Cilium CLI should validate the --version argument

If you pass --version 1.9.6 into the Cilium CLI, it will pass this directly into the image tag and fail to deploy Cilium (ImagePullBackoff) because the real version number for the image is v1.9.6. The CLI should validate the arg and either transparently add the v if not specified, or reject the user argument if it is specified in the wrong format.
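A minimal sketch of the suggested validation (the helper name is hypothetical, not the CLI's actual code): transparently add the missing v prefix and reject anything that does not look like a release tag.

package install

import (
    "fmt"
    "regexp"
    "strings"
)

// versionRe loosely matches tags such as v1.9.6 or v1.10.0-rc0.
var versionRe = regexp.MustCompile(`^v\d+\.\d+\.\d+(-[0-9A-Za-z.]+)?$`)

func normalizeVersion(v string) (string, error) {
    v = strings.TrimSpace(v)
    if !strings.HasPrefix(v, "v") {
        v = "v" + v // transparently add the missing prefix, e.g. 1.9.6 -> v1.9.6
    }
    if !versionRe.MatchString(v) {
        return "", fmt.Errorf("invalid --version %q: expected something like v1.9.6", v)
    }
    return v, nil
}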

Cilium CLI should respect KUBECONFIG environment variable

Context: my usual kubeconfig is configured to use microk8s. When I deploy an EKS cluster using eksctl, it overrides $KUBECONFIG to point to the EKS cluster. However, the CLI doesn't pick up on this, so I get the following error:

$ echo $KUBECONFIG
/home/joe/.kube/config-eks
$ cilium connectivity check
microk8s is not running, try microk8s start

EDIT: Actually, cilium install used the correct kubeconfig, so maybe this problem is specific to cilium connectivity check?
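Assuming the CLI builds its Kubernetes client with client-go, a minimal sketch of resolving the kubeconfig through the standard loading rules, which honor the KUBECONFIG environment variable the same way kubectl does:

package k8s

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

func newClientset(contextName string) (*kubernetes.Clientset, *rest.Config, error) {
    // NewDefaultClientConfigLoadingRules reads $KUBECONFIG (falling back to ~/.kube/config).
    rules := clientcmd.NewDefaultClientConfigLoadingRules()
    overrides := &clientcmd.ConfigOverrides{CurrentContext: contextName}

    cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, overrides).ClientConfig()
    if err != nil {
        return nil, nil, err
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        return nil, nil, err
    }
    return clientset, cfg, nil
}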

Deployment with Cilium v1.8.6 fails on GKE

I’m unable to successfully deploy Cilium on a GKE cluster at the moment. The cilium pods come up fine, but none of the managed pods do.

Warning FailedCreatePodSandBox 22s (x4 over 26s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container “f6ca5819eee5af8a48fd5f505182dd31a90e20737a60c16ca139a9ae4ce7e1b2” network for pod “kube-dns-6465f78586-llt4t”: networkPlugin cni failed to set up pod “kube-dns-6465f78586-llt4t_kube-system” network: unable to allocate IP via local cilium agent: [POST /ipam][502] postIpamFailure range is full

Steps to reproduce the issue -

$ gcloud container clusters create <name> --image-type COS --num-nodes 2 --machine-type n1-standard-4 --zone <zone>
$ ./cilium install --version v1.8.6

@cmluciano reported similar problem when he tried running connectivity tests.

cilium-cli: sporadic hangs on GKE rapid channel with k8s1.20, reg-channel is stable

Running cilium-cli install, connectivity test on GKE rapid channel with k8s1.20 appears to hang sometimes either in install step or at start of connectivity test.

$ ./cilium install
🔮 Auto-detected Kubernetes kind: GKE
ℹ️  Cilium version not set, using default version "v1.9.5"
🔮 Auto-detected cluster name: gke-ezpz-scale-test-us-west1-a-cluster-2
🔮 Auto-detected IPAM mode: kubernetes
🔮 Auto-detected datapath mode: gke
✅ Detected GKE native routing CIDR: 10.60.0.0/14
🚀 Creating Resource quotas...
🔑 Found existing CA in secret cilium-ca
🔑 Generating certificates for Hubble...
2021/03/18 12:44:20 [INFO] generate received request
2021/03/18 12:44:20 [INFO] received CSR
2021/03/18 12:44:20 [INFO] generating key: ecdsa-256
2021/03/18 12:44:20 [INFO] encoded CSR
2021/03/18 12:44:20 [INFO] signed certificate with serial number 147171107335332848158446142786587809339761735476
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating GKE Node Init DaemonSet...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...
$ ./cilium connectivity test
✨ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Creating namespace for connectivity check...
✨ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Deploying echo-same-node service...
✨ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Deploying client service...
✨ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Deploying echo-other-node service...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for deployments [client echo-same-node] to become ready...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for deployments [echo-other-node] to become ready...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for CiliumEndpoint for pod cilium-test/client-58dfdc5f6-mz77k to appear...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-588bf78fbb-mqqvm to appear...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-779c8c89d6-ml7h8 to appear...
⌛ [gke_ezpz-scale-test_us-west1-a_jf-encryption-rapid-20] Waiting for service echo-other-node to become ready...

Above it got stuck waiting for echo-other-node pod. I've not seen this using regular channel and k8s1.19.

cilium connectivity test may take a long time to fail out if it can't deploy pods

If you have a situation where pods cannot be deployed, cilium connectivity test appears to hang:

$ ./cilium connectivity test
ℹ️  Single node environment detected, enabling single node connectivity test
✨ [microk8s-cluster] Creating namespace for connectivity check...
✨ [microk8s-cluster] Deploying echo-same-node service...
✨ [microk8s-cluster] Deploying client service...
⌛ [microk8s-cluster] Waiting for deployments [client echo-same-node] to become ready...

Eventually it does time out:

Error: Connectivity test failed:  waiting for deployment client to become ready has been interrupted: context deadline exceeded

The above would be due to the long timeout here:

return 5 * time.Minute

In the mean time, I can see that there are reasons why the pods are not being deployed:

$ k -n  cilium-test describe pod echo-same-node-97cd54966-bqz2v | tail -n 4
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  13s (x7 over 4m51s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
  Warning  FailedScheduling  3s                   default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.

It'd be convenient if the CLI could detect this kind of case more quickly and give you a hint about why nothing is moving, particularly when the deployment is making no progress, as can be observed above. This could be something as simple as setting a 30s timer and printing a message like "Try running kubectl -n cilium-test get pods to see whether the test is making progress" (and cancelling that timer once the deployment wait succeeds).
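A minimal sketch of that idea (package and function names are illustrative, not the CLI's actual code): wrap the existing readiness wait and arm a 30-second timer that prints the hint unless the wait returns first.

package connectivity

import (
    "context"
    "fmt"
    "time"
)

// waitWithHint wraps an existing readiness wait and prints a troubleshooting hint if
// the wait has not returned after 30 seconds.
func waitWithHint(ctx context.Context, namespace, deployment string, wait func(context.Context) error) error {
    hint := time.AfterFunc(30*time.Second, func() {
        fmt.Printf("⌛ Deployment %s is still not ready. Try running \"kubectl -n %s get pods\" to see whether the test is making progress\n",
            deployment, namespace)
    })
    defer hint.Stop() // cancel the hint once the wait returns, successfully or not

    return wait(ctx)
}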

Idempotent Cilium Install ?

Just noticed that when issuing cilium install twice in a row, the second execution fails because resources (service accounts, config maps, daemonsets, etc.) have already been created. Is this intended?

How to reproduce:

Release v0.5

$> cilium install
...
...
✅ 
$> cilium install 
...
..
Error: Unable to install Cilium:  unable to create secret kube-system/hubble-server-certs: secrets "hubble-server-certs" already exists

From a security perspective, it could make sense to abort the installation: one could argue that the cluster has been compromised with a certificate an attacker has access to, and that all Cilium-related bytes can be read, so in that case aborting is the right call. If we do want to abort the installation, should we at least say why we are aborting? From a user's perspective, reading this message doesn't tell me much about what steps I should take to fix it.

P.S.: I opened a PR that goes for idempotency. No tests yet, since I'm not sure whether idempotency is intended here. Alternatively, we could add a friendly message. Let me know what you think.

Cheers !
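For reference, a minimal sketch of the skip-on-exists approach discussed above (helper name and logging are illustrative), using client-go's error helpers:

package install

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    k8serrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// createSecretIdempotent creates the secret, treating "already exists" as success so a
// second "cilium install" does not abort. Alternatively, the error could be kept but
// wrapped with a friendlier message telling the user to run "cilium uninstall" first.
func createSecretIdempotent(ctx context.Context, c kubernetes.Interface, ns string, s *corev1.Secret) error {
    _, err := c.CoreV1().Secrets(ns).Create(ctx, s, metav1.CreateOptions{})
    if k8serrors.IsAlreadyExists(err) {
        fmt.Printf("✅ Secret %s/%s already exists, reusing it\n", ns, s.Name)
        return nil
    }
    return err
}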

Uninstall command fails when there's no Hubble Relay installed

When I install without Hubble Relay enabled, then I later uninstall, I get the
following error:

$ cilium uninstall
🔥 Deleting Relay...
🔥 Deleting Relay certificates...
✨ Patching ConfigMap cilium-config to disable Hubble...
Error: Unable to disable Hubble:  unable to patch ConfigMap cilium-config with patch "[{\"op\": \"remove\", \"path\": \"/data/hubble-disable-tls\"},{\"op\": \"remove\", \"path\": \"/data/hubble-tls-cert-file\"},{\"op\": \"remove\", \"path\": \"/data/hubble-tls-key-file\"},{\"op\": \"remove\", \"path\": \"/data/hubble-tls-client-ca-files\"},{\"op\": \"remove\", \"path\": \"/data/enable-hubble\"},{\"op\": \"remove\", \"path\": \"/data/hubble-socket-path\"},{\"op\": \"remove\", \"path\": \"/data/hubble-listen-address\"}]": the server rejected our request due to an error in our request
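One possible fix, sketched below (helper name and key list are illustrative): build the JSON patch only from the Hubble keys that are actually present in cilium-config, so uninstalling after an install without Hubble Relay does not try to remove keys that were never set.

package hubble

import (
    "context"
    "encoding/json"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
)

var hubbleKeys = []string{
    "hubble-disable-tls", "hubble-tls-cert-file", "hubble-tls-key-file",
    "hubble-tls-client-ca-files", "enable-hubble", "hubble-socket-path", "hubble-listen-address",
}

func disableHubbleConfig(ctx context.Context, c kubernetes.Interface, ns string) error {
    cm, err := c.CoreV1().ConfigMaps(ns).Get(ctx, "cilium-config", metav1.GetOptions{})
    if err != nil {
        return err
    }
    var ops []map[string]string
    for _, key := range hubbleKeys {
        if _, ok := cm.Data[key]; ok { // only remove keys that actually exist
            ops = append(ops, map[string]string{"op": "remove", "path": "/data/" + key})
        }
    }
    if len(ops) == 0 {
        return nil // nothing to do
    }
    patch, err := json.Marshal(ops)
    if err != nil {
        return err
    }
    if _, err := c.CoreV1().ConfigMaps(ns).Patch(ctx, "cilium-config", types.JSONPatchType, patch, metav1.PatchOptions{}); err != nil {
        return fmt.Errorf("unable to patch ConfigMap cilium-config: %w", err)
    }
    return nil
}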

cilium-cli: the following sequences hangs, install, test, uninstall, install, test

Testing install/uninstall/install patterns I ran into a hang when doing the following,

$ cilium install
$ cilium connectivity test
$ cilium uninstall
$ kubectl delete -n cilium-test deployments client echo-other-node echo-same-node
$ cilium install
$ cilium connectivity test
⌛ [gke_ezpz-scale-test_us-west2-a_cilium-cli-jf] Waiting for deployments [client echo-same-node] to become ready...

omitted output from commands that worked only the last connectivity test hangs on waiting to become ready. Looking at pods,

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                                       READY   STATUS    RESTARTS   AGE
kube-system   cilium-gke-node-init-cjr9v                                 1/1     Running   0          6m58s
kube-system   cilium-gke-node-init-rwtbt                                 1/1     Running   0          6m58s
kube-system   cilium-k7xsf                                               1/1     Running   0          6m56s
kube-system   cilium-mq2qs                                               1/1     Running   0          6m56s
kube-system   cilium-operator-74c66fdc5-qp2cv                            1/1     Running   0          6m56s
kube-system   event-exporter-gke-564fb97f9-lnzl6                         2/2     Running   0          11m
kube-system   fluentbit-gke-hwvrc                                        2/2     Running   0          93m
kube-system   fluentbit-gke-z2vqm                                        2/2     Running   0          93m
kube-system   gke-metrics-agent-fxnqq                                    1/1     Running   0          93m
kube-system   gke-metrics-agent-fz5tg                                    1/1     Running   0          93m
kube-system   kube-dns-6465f78586-j4m48                                  4/4     Running   0          12m
kube-system   kube-dns-6465f78586-mq748                                  4/4     Running   0          12m
kube-system   kube-dns-autoscaler-7f89fb6b79-5k46v                       1/1     Running   0          11m
kube-system   kube-proxy-gke-cilium-cli-jf-default-pool-792dc158-6nsz    1/1     Running   0          28m
kube-system   kube-proxy-gke-cilium-cli-jf-default-pool-792dc158-w2ls    1/1     Running   0          92m
kube-system   l7-default-backend-7fd66b8b88-mj9hg                        1/1     Running   0          11m
kube-system   metrics-server-v0.3.6-7b5cdbcbb8-dzbs4                     2/2     Running   0          11m
kube-system   pdcsi-node-4fkd4                                           2/2     Running   0          93m
kube-system   pdcsi-node-5jkgp                                           2/2     Running   0          93m
kube-system   stackdriver-metadata-agent-cluster-level-d5c84778d-wx8zj   2/2     Running   0          11m

This is GKE environment created with,

gcloud container clusters create cilium-cli-jf  --preemptible --image-type COS --num-nodes 2 --machine-type n1-standard-4 --zone us-west2-a

The hang is repeatable and happens every time for me. The rationale for deleting the pods in the middle was to ensure new pods were created on the next cilium connectivity test; otherwise I'm not convinced they would be managed pods. The uninstall step does not remove the pods. Also, on GKE at least, the restart-pod option does not restart pods in the cilium-test namespace, only kube-system pods. [Not clear to me if this is intended or a bug on its own.]

This appears to be incorrect behavior, but let me know if its a user error or unsupported pattern. Thanks!

Expect specific command exit codes in connectivity tests

Some connectivity tests (e.g. #158) make curl fail with (expected) DNS resolution errors.

---------------------------------------------------------------------------------------------------------------------
🔌 [pod-to-world-toFQDNs] Testing cilium-test/client-68c6675687-xkv85 -> google.com:443...
---------------------------------------------------------------------------------------------------------------------
⌛ The following command is expected to fail...
✅ curl command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n --show-error --silent --fail --show-error --connect-timeout 5 --output /dev/null https://google.com" failed as expected: command terminated with exit code 28
✅ [pod-to-world-toFQDNs] cilium-test/client-68c6675687-xkv85 (10.0.0.22) -> google.com (google.com)

We should be able to narrow down the exit codes of the underlying test programs to ensure we don't get any false negatives with regards to test outcomes. For example, it's not acceptable for a test to fail on DNS resolution when a TCP connection failure was expected.
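A minimal sketch of such a check (the helper is illustrative; the real test would extract the code from the Kubernetes exec error). Curl's documented exit codes include 6 for a failed DNS resolution, 7 for a failed TCP connection, and 28 for a timeout.

package check

import "fmt"

// expectExitCode returns an error unless the observed exit code is one of the expected ones.
func expectExitCode(observed int, expected ...int) error {
    for _, want := range expected {
        if observed == want {
            return nil
        }
    }
    return fmt.Errorf("command failed with exit code %d, expected one of %v", observed, expected)
}

// Example: a toFQDNs test that must fail on DNS resolution, not on a TCP timeout:
//   if err := expectExitCode(code, 6); err != nil { /* mark the test as failed */ }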

connectivity test times out in kind cluster with 1 worker and 1 control-plane node

Running cilium connectivity test on a kind cluster with one control-plane and one worker node leads to a time out:

$ cat kind-config.yaml 
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  disableDefaultCNI: true
$ kind create cluster --config kind-config.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼
 ✓ Preparing nodes 📦 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing StorageClass 💾 
 ✓ Joining worker nodes 🚜 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
$ cilium install                               
🔮 Auto-detected Kubernetes kind: kind
✨ Running "kind" validation checks
✅ Detected kind version "0.10.0"
ℹ️  Cilium version not set, using default version "v1.9.5"
🔮 Auto-detected cluster name: kind-kind
🔮 Auto-detected IPAM mode: kubernetes
ℹ️  kube-proxy-replacement disabled
🔮 Auto-detected datapath mode: tunnel
🔑 Generating CA...
2021/04/16 15:08:20 [INFO] generate received request
2021/04/16 15:08:20 [INFO] received CSR
2021/04/16 15:08:20 [INFO] generating key: ecdsa-256
2021/04/16 15:08:20 [INFO] encoded CSR
2021/04/16 15:08:20 [INFO] signed certificate with serial number 632200254072856107178309406146856789099952594575
🔑 Generating certificates for Hubble...
2021/04/16 15:08:20 [INFO] generate received request
2021/04/16 15:08:20 [INFO] received CSR
2021/04/16 15:08:20 [INFO] generating key: ecdsa-256
2021/04/16 15:08:20 [INFO] encoded CSR
2021/04/16 15:08:20 [INFO] signed certificate with serial number 547407880615380917102801676482800853796701366900
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...
$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium-operator    Running: 1
                  cilium             Running: 2
Image versions    cilium             quay.io/cilium/cilium:v1.9.5: 2
                  cilium-operator    quay.io/cilium/operator-generic:v1.9.5: 1

$ cilium connectivity test 
✨ [kind-kind] Creating namespace for connectivity check...
✨ [kind-kind] Deploying echo-same-node service...
✨ [kind-kind] Deploying same-node deployment...
✨ [kind-kind] Deploying client deployment...
✨ [kind-kind] Deploying echo-other-node service...
✨ [kind-kind] Deploying other-node deployment...
⌛ [kind-kind] Waiting for deployments [client echo-same-node] to become ready...
⌛ [kind-kind] Waiting for deployments [echo-other-node] to become ready...
Error: Connectivity test failed:  waiting for deployment echo-other-node to become ready has been interrupted: context deadline exceeded

There is a check for the single node case here:

if k.params.MultiCluster == "" {
    daemonSet, err := k.client.GetDaemonSet(ctx, k.params.CiliumNamespace, defaults.AgentDaemonSetName, metav1.GetOptions{})
    if err != nil {
        k.Log("❌ Unable to determine status of Cilium DaemonSet. Run \"cilium status\" for more details")
        return nil, fmt.Errorf("unable to determine status of Cilium DaemonSet: %w", err)
    }
    if daemonSet.Status.DesiredNumberScheduled == 1 && !k.params.SingleNode {
        k.Log("ℹ️ Single node environment detected, enabling single node connectivity test")
        k.params.SingleNode = true
    }

But it seems in the above kind cluster that check wasn't hit.
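One possible alternative, sketched below (names are illustrative): count the nodes the test pods can actually be scheduled on instead of relying on the Cilium DaemonSet's desired count, which includes the tainted kind control-plane node.

package check

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func schedulableNodes(ctx context.Context, c kubernetes.Interface) (int, error) {
    nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
    if err != nil {
        return 0, err
    }
    count := 0
    for _, n := range nodes.Items {
        tainted := false
        for _, t := range n.Spec.Taints {
            if t.Effect == corev1.TaintEffectNoSchedule || t.Effect == corev1.TaintEffectNoExecute {
                tainted = true
                break
            }
        }
        if !tainted {
            count++
        }
    }
    return count, nil
}

// If schedulableNodes returns 1, the connectivity test could enable single-node mode even
// though the DaemonSet reports two desired pods.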

Connectivity test to world can end in RST/RST

---------------------------------------------------------------------------------------------------------------------
🔌 [pod-to-world] Testing cilium-test/client-98ff44fc7-h5cvn -> https://google.com...
---------------------------------------------------------------------------------------------------------------------
✅ Drop not found for pod cilium-test/client-98ff44fc7-h5cvn
✅ DNS request found for pod cilium-test/client-98ff44fc7-h5cvn
✅ DNS response found for pod cilium-test/client-98ff44fc7-h5cvn
✅ SYN found for pod cilium-test/client-98ff44fc7-h5cvn
✅ SYN-ACK found for pod cilium-test/client-98ff44fc7-h5cvn
✅ FIN or RST found for pod cilium-test/client-98ff44fc7-h5cvn
❌ FIN-ACK not found for pod cilium-test/client-98ff44fc7-h5cvn
📄 Flow logs of pod cilium-test/client-98ff44fc7-h5cvn:
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.382: 10.0.1.46:45297 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 from-network FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 from-stack FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 from-network FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 from-stack FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.0.224:53 -> 10.0.1.46:45297 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.1.46:36165 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.1.46:36165 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.1.46:36165 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.384: 10.0.1.46:36165 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:36165 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:36165 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:36165 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:36165 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 to-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 from-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 to-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 from-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.130:53 -> 10.0.1.46:36165 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.385: 10.0.1.46:51351 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:45297 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 from-network FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 from-network FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:51351 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.1.46:40952 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:45297 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:45297 to-stack FORWARDED (UDP)
Feb 19 09:24:01.386: 10.0.0.224:53 -> 10.0.1.46:45297 to-network FORWARDED (UDP)
Feb 19 09:24:01.387: 10.0.0.224:53 -> 10.0.1.46:45297 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.387: 10.0.0.224:53 -> 10.0.1.46:45297 to-stack FORWARDED (UDP)
Feb 19 09:24:01.387: 10.0.0.224:53 -> 10.0.1.46:45297 to-network FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.388: 10.0.1.46:51351 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 to-stack FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 to-network FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 to-stack FORWARDED (UDP)
Feb 19 09:24:01.389: 10.0.0.224:53 -> 10.0.1.46:51351 to-network FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 to-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 from-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 to-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 from-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.130:53 -> 10.0.1.46:40952 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.390: 10.0.1.46:43677 -> 10.0.0.224:53 to-network FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 from-network FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.393: 10.0.1.46:43677 -> 10.0.0.224:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 from-network FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 from-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 from-network FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 from-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.0.224:53 -> 10.0.1.46:43677 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 to-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 from-stack FORWARDED (UDP)
Feb 19 09:24:01.394: 10.0.1.46:46918 -> 10.0.1.130:53 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.396: 10.0.1.130:53 -> 10.0.1.46:46918 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.396: 10.0.1.130:53 -> 10.0.1.46:46918 to-stack FORWARDED (UDP)
Feb 19 09:24:01.396: 10.0.1.130:53 -> 10.0.1.46:46918 from-stack FORWARDED (UDP)
Feb 19 09:24:01.396: 10.0.1.130:53 -> 10.0.1.46:46918 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 to-stack FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 to-network FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 to-stack FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.0.224:53 -> 10.0.1.46:43677 to-network FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.1.130:53 -> 10.0.1.46:46918 from-endpoint FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.1.130:53 -> 10.0.1.46:46918 to-stack FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.1.130:53 -> 10.0.1.46:46918 from-stack FORWARDED (UDP)
Feb 19 09:24:01.397: 10.0.1.130:53 -> 10.0.1.46:46918 to-endpoint FORWARDED (UDP)
Feb 19 09:24:01.398: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: SYN)
Feb 19 09:24:01.398: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: SYN)
Feb 19 09:24:01.399: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: SYN, ACK)
Feb 19 09:24:01.399: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Feb 19 09:24:01.399: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.399: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.412: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.412: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.412: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.412: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.420: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.420: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.420: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.420: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.421: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.421: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.421: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.421: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.422: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.422: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.423: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.423: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.455: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.455: 172.217.4.142:443 -> 10.0.1.46:41144 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.459: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.459: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 09:24:01.459: 10.0.1.46:41144 -> 172.217.4.142:443 from-endpoint FORWARDED (TCP Flags: ACK, RST)
Feb 19 09:24:01.459: 10.0.1.46:41144 -> 172.217.4.142:443 to-stack FORWARDED (TCP Flags: ACK, RST)
❌ [pod-to-world] cilium-test/client-98ff44fc7-h5cvn (10.0.1.46) -> https://google.com (https://google.com)

CLI not compatible with cilium status 1.10

$ cilium status

 
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         2 errors
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         OK
 \__/¯¯\__/    ClusterMesh:    OK
    \__/
Deployment        clustermesh-apiserver    Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet         cilium                   Desired: 2, Ready: 2/2, Available: 2/2
Deployment        cilium-operator          Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-relay             Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium                   Running: 2
                  cilium-operator          Running: 1
                  hubble-relay             Running: 1
                  clustermesh-apiserver    Running: 1
Image versions    cilium                   quay.io/cilium/cilium-ci:8a2a4e32b24859dbf3c4d00809c2c35aa85d75cb: 2
                  cilium-operator          quay.io/cilium/operator-aws:v1.9.3: 1
                  hubble-relay             quay.io/cilium/hubble-relay:v1.9.3: 1
                  clustermesh-apiserver    quay.io/coreos/etcd:v3.4.13: 1
                  clustermesh-apiserver    quay.io/cilium/clustermesh-apiserver:v1.9.2: 1
Errors:           cilium                   cilium-tbtzm    unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field KubeProxyReplacement.kube-proxy-replacement.devices of type string
                  cilium                   cilium-kzgks    unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field KubeProxyReplacement.kube-proxy-replacement.devices of type string
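One way to make the CLI tolerant of this schema change, sketched below (struct and field names are illustrative, not the real Cilium API models): decode the changed field into json.RawMessage and try both representations.

package status

import "encoding/json"

type kubeProxyReplacement struct {
    Devices json.RawMessage `json:"devices"`
}

// deviceNames accepts either a plain string (older agents) or a list of objects with a
// "name" field (newer agents), and returns the device names in both cases.
func deviceNames(raw json.RawMessage) []string {
    var single string
    if err := json.Unmarshal(raw, &single); err == nil && single != "" {
        return []string{single}
    }
    var list []struct {
        Name string `json:"name"`
    }
    if err := json.Unmarshal(raw, &list); err == nil {
        names := make([]string, 0, len(list))
        for _, d := range list {
            names = append(names, d.Name)
        }
        return names
    }
    return nil
}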

ServiceAccount hubble-relay left in-place after `cilium uninstall`

Steps to reproduce:

$ cilium install --cluster-name $CLUSTER_NAME
$ cilium hubble enable
$ cilium uninstall

After not using the cluster for a while, I attempted to re-install cilium using helm:

$ helm repo add cilium https://helm.cilium.io
$ helm repo update
$ helm install -n kube-system --set hubble.relay.enabled=true cilium cilium/cilium

This lead to the following error:

Error: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "hubble-relay" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cilium-enterprise"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"

The ServiceAccount is indeed still there:

$ kubectl -n kube-system get serviceaccount hubble-relay
NAME                 SECRETS   AGE
hubble-relay         1         23m

Naturally, I would have expected that ServiceAccount to be uninstalled as well by cilium uninstall. Note that I can still get rid of it using cilium hubble disable before or after cilium uninstall. It might be more user-friendly, though, to remove it as part of cilium uninstall, given all other hubble-relay objects are uninstalled.

uninstall: Remove hubble ui without removing hubble

Problem
Running cilium hubble disable --ui ignores the --ui flag and jumps straight to disabling all of hubble

Expected outcome
cilium hubble disable --ui would only disable the UI. On a quick look at the hubble section, we appear to handle only disable as a whole and then revert everything at once.

Versions
hubble v0.8.0-dev@master-d419ce1 compiled with go1.16.3 on darwin/amd64

connectivity: improve handling existing cilium-test namespace

The original reason for opening #115 was that I noticed an issue with cilium connectivity check.

When cilium-test namespace exists, srcDeploymentNeeded remains false and the deploy function returns without doing anything else.

func (k *K8sConnectivityCheck) deploy(ctx context.Context) error {
    var srcDeploymentNeeded, dstDeploymentNeeded bool

    if k.params.ForceDeploy {
        if err := k.deleteDeployments(ctx, k.clients.src); err != nil {
            return err
        }
    }

    _, err := k.clients.src.GetNamespace(ctx, k.params.TestNamespace, metav1.GetOptions{})
    if err != nil {
        srcDeploymentNeeded = true
        // In a single cluster environment, the source client is also
        // responsible for destination deployments
        if k.params.MultiCluster == "" {
            dstDeploymentNeeded = true
        }
        k.Log("✨ [%s] Creating namespace for connectivity check...", k.clients.src.ClusterName())
        _, err = k.clients.src.CreateNamespace(ctx, k.params.TestNamespace, metav1.CreateOptions{})
        if err != nil {
            return fmt.Errorf("unable to create namespace %s: %s", k.params.TestNamespace, err)
        }
    }

    if k.params.MultiCluster != "" {
        if k.params.ForceDeploy {
            if err := k.deleteDeployments(ctx, k.clients.dst); err != nil {
                return err
            }
        }

        _, err = k.clients.dst.GetNamespace(ctx, k.params.TestNamespace, metav1.GetOptions{})
        if err != nil {
            dstDeploymentNeeded = true
            k.Log("✨ [%s] Creating namespace for connectivity check...", k.clients.dst.ClusterName())
            _, err = k.clients.dst.CreateNamespace(ctx, k.params.TestNamespace, metav1.CreateOptions{})
            if err != nil {
                return fmt.Errorf("unable to create namespace %s: %s", k.params.TestNamespace, err)
            }
        }
    }

    if srcDeploymentNeeded {
        k.Log("✨ [%s] Deploying echo-same-node service...", k.clients.src.ClusterName())
        svc := newService(echoSameNodeDeploymentName, map[string]string{"name": echoSameNodeDeploymentName}, serviceLabels, "http", 8080)
        _, err = k.clients.src.CreateService(ctx, k.params.TestNamespace, svc, metav1.CreateOptions{})
        if err != nil {
            return err
        }

        if k.params.MultiCluster != "" {
            k.Log("✨ [%s] Deploying echo-other-node service...", k.clients.src.ClusterName())
            svc := newService(echoOtherNodeDeploymentName, map[string]string{"name": echoOtherNodeDeploymentName}, serviceLabels, "http", 8080)
            svc.ObjectMeta.Annotations = map[string]string{}
            svc.ObjectMeta.Annotations["io.cilium/global-service"] = "true"

            _, err = k.clients.src.CreateService(ctx, k.params.TestNamespace, svc, metav1.CreateOptions{})
            if err != nil {
                return err
            }
        }

        echoDeployment := newDeployment(deploymentParameters{
            Name:  echoSameNodeDeploymentName,
            Kind:  kindEchoName,
            Port:  8080,
            Image: "quay.io/cilium/json-mock:1.2",
            Affinity: &corev1.Affinity{
                PodAffinity: &corev1.PodAffinity{
                    RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
                        {
                            LabelSelector: &metav1.LabelSelector{
                                MatchExpressions: []metav1.LabelSelectorRequirement{
                                    {Key: "name", Operator: metav1.LabelSelectorOpIn, Values: []string{clientDeploymentName}},
                                },
                            },
                            TopologyKey: "kubernetes.io/hostname",
                        },
                    },
                },
            },
            ReadinessProbe: newLocalReadinessProbe(8080, "/"),
        })

        _, err = k.clients.src.CreateDeployment(ctx, k.params.TestNamespace, echoDeployment, metav1.CreateOptions{})
        if err != nil {
            return fmt.Errorf("unable to create deployment %s: %s", echoSameNodeDeploymentName, err)
        }

        k.Log("✨ [%s] Deploying client service...", k.clients.src.ClusterName())
        clientDeployment := newDeployment(deploymentParameters{Name: clientDeploymentName, Kind: kindClientName, Port: 8080, Image: "quay.io/cilium/alpine-curl:1.0", Command: []string{"/bin/ash", "-c", "sleep 10000000"}})
        _, err = k.clients.src.CreateDeployment(ctx, k.params.TestNamespace, clientDeployment, metav1.CreateOptions{})
        if err != nil {
            return fmt.Errorf("unable to create deployment %s: %s", clientDeploymentName, err)
        }
    }

    if dstDeploymentNeeded {
        if !k.params.SingleNode || k.params.MultiCluster != "" {
            k.Log("✨ [%s] Deploying echo-other-node service...", k.clients.dst.ClusterName())
            svc := newService(echoOtherNodeDeploymentName, map[string]string{"name": echoOtherNodeDeploymentName}, serviceLabels, "http", 8080)

            if k.params.MultiCluster != "" {
                svc.ObjectMeta.Annotations = map[string]string{}
                svc.ObjectMeta.Annotations["io.cilium/global-service"] = "true"
            }

            _, err = k.clients.dst.CreateService(ctx, k.params.TestNamespace, svc, metav1.CreateOptions{})
            if err != nil {
                return err
            }

            echoOtherNodeDeployment := newDeployment(deploymentParameters{
                Name:  echoOtherNodeDeploymentName,
                Kind:  kindEchoName,
                Port:  8080,
                Image: "quay.io/cilium/json-mock:1.2",
                Affinity: &corev1.Affinity{
                    PodAntiAffinity: &corev1.PodAntiAffinity{
                        RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
                            {
                                LabelSelector: &metav1.LabelSelector{
                                    MatchExpressions: []metav1.LabelSelectorRequirement{
                                        {Key: "name", Operator: metav1.LabelSelectorOpIn, Values: []string{clientDeploymentName}},
                                    },
                                },
                                TopologyKey: "kubernetes.io/hostname",
                            },
                        },
                    },
                },
                ReadinessProbe: newLocalReadinessProbe(8080, "/"),
            })

            _, err = k.clients.dst.CreateDeployment(ctx, k.params.TestNamespace, echoOtherNodeDeployment, metav1.CreateOptions{})
            if err != nil {
                return fmt.Errorf("unable to create deployment %s: %s", echoOtherNodeDeploymentName, err)
            }
        }
    }

    return nil
}
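
If the intent is that a pre-existing namespace should not cause the workloads to be skipped, one option is to key the decision off the Deployments themselves rather than off the namespace. Below is a minimal sketch of that check using the upstream typed client-go clientset instead of the CLI's own client wrapper; deploymentMissing is a hypothetical helper, not part of the code quoted above.

package connectivity

import (
    "context"

    k8serrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// deploymentMissing reports whether a Deployment still needs to be created,
// so the deploy step can decide per workload instead of assuming that an
// existing namespace implies an existing deployment.
func deploymentMissing(ctx context.Context, client kubernetes.Interface, namespace, name string) (bool, error) {
    _, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
    if k8serrors.IsNotFound(err) {
        return true, nil
    }
    if err != nil {
        return false, err
    }
    return false, nil
}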

Getting #115 done may be a little more involved than I hoped, so a simpler fix will be needed.

How should we update the cilium-agent version used by the CLI?

Users are likely to pull the cilium-cli binary once and not update it particularly frequently.

Currently, we are embedding the "latest" version of the main Cilium components into the binary directly:

Version = "v1.9.4"

This means that the version will get out of date very easily. Even if we create a new cilium/cilium-cli release every time we do a stable branch release on cilium/cilium, users who do not update their CLI will by default get an older release, which is likely to include more bugs and potentially unresolved security issues.

How should we best handle pulling the latest stable release as part of cilium install?
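
One option is to resolve the latest stable version at install time and only fall back to the embedded default when offline. A minimal sketch, assuming a published "latest stable" endpoint such as a stable.txt file in the cilium/cilium repository; treat the exact URL as an assumption to confirm.

package install

import (
    "context"
    "io"
    "net/http"
    "strings"
    "time"
)

// Embedded fallback, as in the current CLI.
const defaultVersion = "v1.9.4"

// latestStableVersion tries to resolve the latest stable Cilium version at
// install time and falls back to the embedded default if the lookup fails.
func latestStableVersion(ctx context.Context) string {
    const stableURL = "https://raw.githubusercontent.com/cilium/cilium/main/stable.txt" // assumed endpoint

    ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, stableURL, nil)
    if err != nil {
        return defaultVersion
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return defaultVersion
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return defaultVersion
    }

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return defaultVersion
    }
    v := strings.TrimSpace(string(body))
    if v == "" {
        return defaultVersion
    }
    if !strings.HasPrefix(v, "v") {
        v = "v" + v
    }
    return v
}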

'cilium clustermesh enable' doesn't take Cilium version into account

$ cilium install --cluster-name cilium-1 --cluster-id 1 --version v1.10.0-rc0
$ cilium clustermesh enable --service-type NodePort

currently results in quay.io/cilium/clustermesh-apiserver:v1.9.x being used. I guess this is a result of

return defaults.ClusterMeshApiserverImage

ClusterMeshApiserverImage = "quay.io/cilium/clustermesh-apiserver:" + Version

Version = "v1.9.5"

Maybe we could somehow look at the version of Cilium currently deployed and use the corresponding tag before defaulting?
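
A sketch of that idea, assuming the agent DaemonSet is named cilium in kube-system with a container named cilium-agent; apiserverImage is a hypothetical helper, and digest-pinned images would need extra handling.

package clustermesh

import (
    "context"
    "fmt"
    "strings"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// apiserverImage derives the clustermesh-apiserver tag from the cilium agent
// image that is already running, falling back to the CLI's embedded default.
func apiserverImage(ctx context.Context, client kubernetes.Interface, fallback string) string {
    ds, err := client.AppsV1().DaemonSets("kube-system").Get(ctx, "cilium", metav1.GetOptions{})
    if err != nil {
        return fallback
    }
    for _, c := range ds.Spec.Template.Spec.Containers {
        if c.Name != "cilium-agent" {
            continue
        }
        // quay.io/cilium/cilium:v1.10.0-rc0 -> v1.10.0-rc0
        // (images pinned by @sha256 digest would need extra handling)
        if i := strings.LastIndex(c.Image, ":"); i >= 0 {
            return fmt.Sprintf("quay.io/cilium/clustermesh-apiserver:%s", c.Image[i+1:])
        }
    }
    return fallback
}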

clustermesh default cilium version is 1.9.2

We had a Cilium image-related regression in v1.9.2, which is why v1.9.3 was released, but cilium-cli is still stuck on v1.9.2.

 cilium install --cluster-id 1 --cluster-name cilium-1
🔮 Auto-detected Kubernetes kind: EKS
ℹ️  Cilium version not set, using default version "v1.9.2"
🔮 Auto-detected IPAM mode: eni
🔮 Auto-detected datapath mode: aws-eni
🔑 Generating CA...
2021/01/29 10:38:52 [INFO] generate received request
2021/01/29 10:38:52 [INFO] received CSR
2021/01/29 10:38:52 [INFO] generating key: ecdsa-256
2021/01/29 10:38:52 [INFO] encoded CSR
2021/01/29 10:38:52 [INFO] signed certificate with serial number 558327038851774533359718045903189424259695907456
🔑 Generating certificates for Hubble...
2021/01/29 10:38:53 [INFO] generate received request
2021/01/29 10:38:53 [INFO] received CSR
2021/01/29 10:38:53 [INFO] generating key: ecdsa-256
2021/01/29 10:38:53 [INFO] encoded CSR
2021/01/29 10:38:53 [INFO] signed certificate with serial number 551347253042918017024257987186811330133611658170
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed..

Re-enable AKS CI

  • The service principal created and shared is not working properly. After a couple of initial successful CI runs, the service principal seems to lose access to the required resources

Collect Kubernetes Endpoints output in sysdump

Endpoints are the backends for Services, which we already gather. It is useful to know the state of the backing Endpoints for a Service in Kubernetes, to make sure that the Cilium datapath is correctly realizing that state.
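
A minimal sketch of what collecting this could look like with a plain client-go clientset; collectEndpoints and the output path are illustrative, not the sysdump implementation.

package sysdump

import (
    "context"
    "encoding/json"
    "os"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// collectEndpoints dumps all Kubernetes Endpoints objects to a file so the
// Service backends known to Kubernetes can be compared against what the
// Cilium datapath has realized.
func collectEndpoints(ctx context.Context, client kubernetes.Interface, path string) error {
    eps, err := client.CoreV1().Endpoints(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
    if err != nil {
        return err
    }
    out, err := json.MarshalIndent(eps, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(path, out, 0o600)
}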

Feature request: add hubble-ui support?

cilium-cli currently supports enabling Hubble with cilium hubble enable. This deploys hubble-relay but not hubble-ui. What do you think about adding cilium hubble-ui enable (or cilium hubble enable --ui)?

enabling Hubble in GKE

GKE runs Cilium when Dataplane V2 is enabled. The installation is fully managed and, as of today, based on 1.9. The agent pod is named anet, but it still uses the cilium-config ConfigMap. I have tried updating the ConfigMap and found that any changes to flags already defined by GKE get overridden by the add-on manager, but new keys stay untouched. I think it should be possible to enable the Hubble TCP server and relay.
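
A sketch of what adding only new keys could look like with client-go. The key names (enable-hubble, hubble-listen-address) are the standard Cilium ConfigMap keys but should be verified against the Cilium version GKE ships, and the agent pods would still need a restart to pick them up.

package hubble

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// enableHubbleOnGKE adds Hubble-related keys to the managed cilium-config
// ConfigMap. Only keys that do not exist yet are added, because (per the
// report above) GKE's add-on manager reverts changes to keys it already owns
// but leaves unknown keys untouched.
func enableHubbleOnGKE(ctx context.Context, client kubernetes.Interface) error {
    cm, err := client.CoreV1().ConfigMaps("kube-system").Get(ctx, "cilium-config", metav1.GetOptions{})
    if err != nil {
        return err
    }
    if cm.Data == nil {
        cm.Data = map[string]string{}
    }
    for k, v := range map[string]string{
        "enable-hubble":         "true",
        "hubble-listen-address": ":4244",
    } {
        if _, exists := cm.Data[k]; !exists {
            cm.Data[k] = v
        }
    }
    _, err = client.CoreV1().ConfigMaps("kube-system").Update(ctx, cm, metav1.UpdateOptions{})
    return err
}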

Flow logs are sometimes not showing up

Example:
The connection was successful and the logs are present for one of the pods but not the other, even though this is the same-node test, so all flow logs are coming from the same agent:

🔌 [pod-to-pod] Testing cilium-test/client-98ff44fc7-lrn92 -> cilium-test/echo-same-node-645db85ff-vtl7n...
---------------------------------------------------------------------------------------------------------------------
✅ Drop not found for pod cilium-test/echo-same-node-645db85ff-vtl7n
✅ RST not found for pod cilium-test/echo-same-node-645db85ff-vtl7n
❌ SYN not found for pod cilium-test/echo-same-node-645db85ff-vtl7n
❌ FIN not found for pod cilium-test/echo-same-node-645db85ff-vtl7n
📄 Flow logs of pod cilium-test/echo-same-node-645db85ff-vtl7n:
📄 Flow logs of pod cilium-test/client-98ff44fc7-lrn92:
Feb 19 16:21:28.257: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: SYN)
Feb 19 16:21:28.257: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: SYN)
Feb 19 16:21:28.257: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: SYN)
Feb 19 16:21:28.257: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: SYN)
Feb 19 16:21:28.257: 10.124.0.154:8080 -> 10.124.0.137:55948 from-endpoint FORWARDED (TCP Flags: SYN, ACK)
Feb 19 16:21:28.257: 10.124.0.154:8080 -> 10.124.0.137:55948 to-stack FORWARDED (TCP Flags: SYN, ACK)
Feb 19 16:21:28.257: 10.124.0.154:8080 -> 10.124.0.137:55948 from-stack FORWARDED (TCP Flags: SYN, ACK)
Feb 19 16:21:28.257: 10.124.0.154:8080 -> 10.124.0.137:55948 to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.258: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.258: 10.124.0.154:8080 -> 10.124.0.137:55948 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.154:8080 -> 10.124.0.137:55948 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.154:8080 -> 10.124.0.137:55948 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.258: 10.124.0.154:8080 -> 10.124.0.137:55948 to-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.269: 10.124.0.154:8080 -> 10.124.0.137:55948 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.269: 10.124.0.154:8080 -> 10.124.0.137:55948 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.269: 10.124.0.154:8080 -> 10.124.0.137:55948 from-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.269: 10.124.0.154:8080 -> 10.124.0.137:55948 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 19 16:21:28.269: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.269: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.269: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.269: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.270: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.270: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.270: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.270: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.272: 10.124.0.154:8080 -> 10.124.0.137:55948 from-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.272: 10.124.0.154:8080 -> 10.124.0.137:55948 to-stack FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.272: 10.124.0.154:8080 -> 10.124.0.137:55948 from-stack FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.272: 10.124.0.154:8080 -> 10.124.0.137:55948 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb 19 16:21:28.272: 10.124.0.137:55948 -> 10.124.0.154:8080 from-endpoint FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.272: 10.124.0.137:55948 -> 10.124.0.154:8080 to-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.272: 10.124.0.137:55948 -> 10.124.0.154:8080 from-stack FORWARDED (TCP Flags: ACK)
Feb 19 16:21:28.272: 10.124.0.137:55948 -> 10.124.0.154:8080 to-endpoint FORWARDED (TCP Flags: ACK)
❌ [pod-to-pod] cilium-test/client-98ff44fc7-lrn92 (10.124.0.137) -> cilium-test/echo-same-node-645db85ff-vtl7n (10.124.0.154)

[RFC] declarative Kubernetes API usage pattern

The nature of some of the functionality that the cilium program implements is often very procedural; however, that shouldn't imply that all of the logic around API interactions has to be procedural as well.

Presently, cilium install and cilium connectivity check follow a procedural pattern where a set of objects is generated and written to the API directly.

cilium-cli/install/install.go (lines 1386 to 1512 in dff492a):

func (k *K8sInstaller) Install(ctx context.Context) error {
    if err := k.autodetectAndValidate(ctx); err != nil {
        return err
    }

    switch k.flavor.Kind {
    case k8s.KindEKS:
        if _, err := k.client.GetDaemonSet(ctx, "kube-system", "aws-node", metav1.GetOptions{}); err == nil {
            k.Log("🔥 Deleting aws-node DaemonSet...")
            if err := k.client.DeleteDaemonSet(ctx, "kube-system", "aws-node", metav1.DeleteOptions{}); err != nil {
                return err
            }
        }
    case k8s.KindGKE:
        if k.params.NativeRoutingCIDR == "" {
            cidr, err := k.gkeNativeRoutingCIDR(ctx, k.client.ContextName())
            if err != nil {
                k.Log("❌ Unable to auto-detect GKE native routing CIDR. Is \"gcloud\" installed?")
                k.Log("ℹ️ You can set the native routing CIDR manually with --native-routing-cidr")
                return err
            }
            k.params.NativeRoutingCIDR = cidr
        }

        if err := k.deployResourceQuotas(ctx); err != nil {
            return err
        }
    case k8s.KindAKS:
        if k.params.Azure.ResourceGroupName == "" {
            k.Log("❌ Azure resource group is required, please specify --azure-resource-group")
            return fmt.Errorf("missing Azure resource group name")
        }

        if err := k.createAzureServicePrincipal(ctx); err != nil {
            return err
        }
    }

    if err := k.installCerts(ctx); err != nil {
        return err
    }

    k.Log("🚀 Creating Service accounts...")
    if _, err := k.client.CreateServiceAccount(ctx, k.params.Namespace, k8s.NewServiceAccount(defaults.AgentServiceAccountName), metav1.CreateOptions{}); err != nil {
        return err
    }

    if _, err := k.client.CreateServiceAccount(ctx, k.params.Namespace, k8s.NewServiceAccount(defaults.OperatorServiceAccountName), metav1.CreateOptions{}); err != nil {
        return err
    }

    k.Log("🚀 Creating Cluster roles...")
    if _, err := k.client.CreateClusterRole(ctx, ciliumClusterRole, metav1.CreateOptions{}); err != nil {
        return err
    }

    if _, err := k.client.CreateClusterRoleBinding(ctx, k8s.NewClusterRoleBinding(defaults.AgentClusterRoleName, k.params.Namespace, defaults.AgentServiceAccountName), metav1.CreateOptions{}); err != nil {
        return err
    }

    if _, err := k.client.CreateClusterRole(ctx, operatorClusterRole, metav1.CreateOptions{}); err != nil {
        return err
    }

    if _, err := k.client.CreateClusterRoleBinding(ctx, k8s.NewClusterRoleBinding(defaults.OperatorClusterRoleName, k.params.Namespace, defaults.OperatorServiceAccountName), metav1.CreateOptions{}); err != nil {
        return err
    }

    if k.params.Encryption {
        if err := k.createEncryptionSecret(ctx); err != nil {
            return err
        }
    }

    k.Log("🚀 Creating ConfigMap...")
    if _, err := k.client.CreateConfigMap(ctx, k.params.Namespace, k.generateConfigMap(), metav1.CreateOptions{}); err != nil {
        return err
    }

    switch k.flavor.Kind {
    case k8s.KindGKE:
        k.Log("🚀 Creating GKE Node Init DaemonSet...")
        if _, err := k.client.CreateDaemonSet(ctx, k.params.Namespace, k.generateGKEInitDaemonSet(), metav1.CreateOptions{}); err != nil {
            return err
        }
    }

    k.Log("🚀 Creating Agent DaemonSet...")
    if _, err := k.client.CreateDaemonSet(ctx, k.params.Namespace, k.generateAgentDaemonSet(), metav1.CreateOptions{}); err != nil {
        return err
    }

    k.Log("🚀 Creating Operator Deployment...")
    if _, err := k.client.CreateDeployment(ctx, k.params.Namespace, k.generateOperatorDeployment(), metav1.CreateOptions{}); err != nil {
        return err
    }

    if k.params.Wait {
        k.Log("⌛ Waiting for Cilium to be installed...")
        collector, err := status.NewK8sStatusCollector(ctx, k.client, status.K8sStatusParameters{
            Namespace:       k.params.Namespace,
            Wait:            true,
            WaitDuration:    k.params.WaitDuration,
            WarningFreePods: []string{defaults.AgentDaemonSetName, defaults.OperatorDeploymentName},
        })
        if err != nil {
            return err
        }

        s, err := collector.Status(ctx)
        if err != nil {
            if s != nil {
                fmt.Println(s.Format())
            }
            return err
        }
    }

    if k.params.RestartUnmanagedPods {
        if err := k.restartUnmanagedPods(ctx); err != nil {
            return err
        }
    }

    return nil
}

The K8sConnectivityCheck.deploy function quoted in full earlier on this page follows the same pattern for the connectivity check: the echo and client Services and Deployments are built in code and created with one client call per object.

This is largely dictated by the Go client which requires calling a very specific function for each object type. However, it is not necessary to use this client for all API interactions.

  • it would be desirable to be able to serialise all objects to YAML or JSON for the user to apply separately, e.g. using GitOps, or just so they can preview what would happen (--dry-run)
  • it would be quite convenient if the implementation operated explicitly on a set of objects, rather than being a function with many side effects; this would also make the code more easily unit-testable
  • it would be desirable to have a single function that handles create-or-update logic and can leverage server-side apply in clusters where it is available, as opposed to having a function for each object type
  • it should be possible to interact with some APIs dynamically, i.e. without having to include a fully-typed client that has extra dependencies and is bound to a particular version of the given API; e.g. this could be useful to support multiple versions of the Cilium APIs, or to enable integrations with e.g. OpenShift APIs without needing a client with a full set of types

This can be implemented fairly easily using http://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/client, which can handle both typed and unstructured objects.

Here is one example (albeit it only does create-or-skip; it can easily be extended to support create-or-update/apply):

https://github.com/isovalent/gke-test-cluster-operator/blob/a6824ed4a47b5c0f78b1aefccfda39c7ef363b7e/controllers/common/common.go#L73-L114
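
For illustration, here is a minimal sketch of the create-or-update/apply idea on top of the controller-runtime client, using server-side apply so the same function works for typed and unstructured objects alike. This is a sketch of the proposed pattern, not the cilium-cli implementation.

package declarative

import (
    "context"

    ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// applyAll takes a pre-built set of objects (typed or unstructured) and
// applies each one with server-side apply, so the same object set can also be
// serialised to YAML for --dry-run or GitOps workflows.
// Each object must carry its apiVersion/kind (always the case for
// unstructured objects; typed objects need TypeMeta populated).
func applyAll(ctx context.Context, c ctrlclient.Client, objs []ctrlclient.Object) error {
    for _, obj := range objs {
        if err := c.Patch(ctx, obj, ctrlclient.Apply,
            ctrlclient.FieldOwner("cilium-cli"),
            ctrlclient.ForceOwnership,
        ); err != nil {
            return err
        }
    }
    return nil
}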

status: Cilium status is green when Cilium is not able to manage pods

I am not sure if this is in scope for cilium status, but it is possible for Cilium to not be able to manage pods at all while cilium status still reports green.

This issue came up when testing latest images on AKS. Details are in this issue: cilium/cilium#15496

Maybe cilium status should look at some Kubernetes metrics or signals that may indicate the cluster is not healthy with respect to managing pods?
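
As one example of such a signal, here is a sketch that flags pods stuck in Pending beyond a grace period. The heuristic is an assumption for discussion, not what cilium status implements today.

package status

import (
    "context"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// pendingPods returns pods that have been stuck in Pending for longer than
// the grace period. A non-empty result is one cheap Kubernetes-level hint
// that the cluster cannot schedule or network pods even though the Cilium
// pods themselves look healthy.
func pendingPods(ctx context.Context, client kubernetes.Interface, grace time.Duration) ([]corev1.Pod, error) {
    pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    var stuck []corev1.Pod
    for _, p := range pods.Items {
        if p.Status.Phase == corev1.PodPending &&
            time.Since(p.CreationTimestamp.Time) > grace {
            stuck = append(stuck, p)
        }
    }
    return stuck, nil
}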

Review/extend NodePort connectivity checks

This is a follow-up issue to cilium/cilium#14728. I wasn't able to reproduce the original issue by following the specified steps or by enabling kube-proxy replacement. I'm creating this issue to review the current connectivity checks and potentially see if they can be extended to catch the NodePort connectivity issue reported by a community user, where requests to a node's external IP (public IP) from within the cluster failed.

install: CLI hangs although the cluster appears to be installed fine

I created a 4-node kind cluster, then used the following command:

$ cilium install --context kind-cluster1 --cluster-name cluster1 --cluster-id 1 --kube-proxy-replacement disabled 
🔮 Auto-detected Kubernetes kind: kind
✨ Running "kind" validation checks
✅ Detected kind version "0.9.0"
ℹ️  Cilium version not set, using default version "v1.9.2"
🔮 Auto-detected IPAM mode: kubernetes
🔑 Generating CA...
2021/01/22 14:39:56 [INFO] generate received request
2021/01/22 14:39:56 [INFO] received CSR
2021/01/22 14:39:56 [INFO] generating key: ecdsa-256
2021/01/22 14:39:56 [INFO] encoded CSR
2021/01/22 14:39:56 [INFO] signed certificate with serial number 694213983562267657784946858107907613901506252597
🔑 Generating certificates for Hubble...
2021/01/22 14:39:56 [INFO] generate received request
2021/01/22 14:39:56 [INFO] received CSR
2021/01/22 14:39:56 [INFO] generating key: ecdsa-256
2021/01/22 14:39:56 [INFO] encoded CSR
2021/01/22 14:39:56 [INFO] signed certificate with serial number 680976501129005789786372125785426297889267556290
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...


    /¯¯\
 /¯¯\__/¯¯\    Cilium:         4 errors
 \__/¯¯\__/    Operator:       1 errors
 /¯¯\__/¯¯\    Hubble:         1 warnings
 \__/¯¯\__/    ClusterMesh:    1 warnings
    \__/

DaemonSet         cilium                   Desired: 4, Ready: 4/4, Available: 4/4
Containers:       cilium                   Running: 4
Image versions    cilium                   quay.io/cilium/cilium:v1.9.2: 4
Errors:           cilium                   cilium-wx6f6             unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field Masquerading.masquerading.enabled of type bool
                  cilium                   cilium-zkbhb             unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field Masquerading.masquerading.enabled of type bool
                  cilium                   cilium-67rp2             unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field Masquerading.masquerading.enabled of type bool
                  cilium                   cilium-ftvzf             unable to retrieve cilium status: unable to unmarshal response of cilium status: json: cannot unmarshal object into Go struct field Masquerading.masquerading.enabled of type bool
                  cilium-operator          cilium-operator          context deadline exceeded
Warnings:         hubble-relay             hubble-relay             Relay is not deployed
                  clustermesh-apiserver    clustermesh-apiserver    ClusterMesh is not deployed

Error: timeout while waiting for status to become successful: context deadline exceeded

It looks like the cilium installation went fine:

$ kubectl --context kind-cluster1 get pods -A
NAMESPACE            NAME                                             READY   STATUS    RESTARTS   AGE
kube-system          cilium-67rp2                                     1/1     Running   0          13m
kube-system          cilium-ftvzf                                     1/1     Running   0          13m
kube-system          cilium-operator-798674d575-hqcr2                 1/1     Running   0          13m
kube-system          cilium-wx6f6                                     1/1     Running   0          13m
kube-system          cilium-zkbhb                                     1/1     Running   0          13m
kube-system          coredns-f9fd979d6-4mkrv                          1/1     Running   0          15m
kube-system          coredns-f9fd979d6-5ttrn                          1/1     Running   0          15m
kube-system          etcd-cluster1-control-plane                      1/1     Running   0          15m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running   0          15m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running   0          15m
kube-system          kube-proxy-46qkc                                 1/1     Running   0          15m
kube-system          kube-proxy-9vsnp                                 1/1     Running   0          15m
kube-system          kube-proxy-gcgx9                                 1/1     Running   0          15m
kube-system          kube-proxy-rj4kc                                 1/1     Running   0          15m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running   0          15m
local-path-storage   local-path-provisioner-78776bfc44-4p2sh          1/1     Running   0          15m

Cilium CLI should always print "datapath mode" when installing, not only when autodetecting

Example output from a user:

# ./cilium-cli/cilium install
ℹ️  Cilium version not set, using default version "v1.9.4"
🔮 Auto-detected cluster name: kubernetes
🔮 Auto-detected IPAM mode: cluster-pool
🔑 Found existing CA in secret cilium-ca
🔑 Generating certificates for Hubble...
2021/04/20 11:22:16 [INFO] generate received request
2021/04/20 11:22:16 [INFO] received CSR
2021/04/20 11:22:16 [INFO] generating key: ecdsa-256
2021/04/20 11:22:16 [INFO] encoded CSR
2021/04/20 11:22:16 [INFO] signed certificate with serial number 314242258959939237570091445604397824011757516364
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...
♻️  Restarting unmanaged pods...
♻️  Restarted unmanaged pod

It's not clear what datapath mode was used in this case (closer inspection showed direct routing). It would be good to always print this information when installing, so that if there is a problem to debug we have an understanding of how the datapath is operating based on the install.

Pod-to-host ICMP tests fail if NET_RAW capability is not provided by runtime

On a 1.20.4 cluster deployed with CRI-O 1.20.0 and "v6-first" dual-stack, I observe the following failures in the pod-to-host tests (IPs have been altered):

---------------------------------------------------------------------------------------------------------------------
🔌 [pod-to-host] Testing cilium-test/client-58dfdc5f6-fkgxg -> 2001:db8:0:1::21...
---------------------------------------------------------------------------------------------------------------------
❌ ping command failed: error in stream: command terminated with exit code 1
❌ [pod-to-host] cilium-test/client-58dfdc5f6-fkgxg (2001:db8:0:0:3::1234) -> 2001:db8:0:1::21 (2001:db8:0:1::21)

Exec'ing into the pod shows the failure is caused by insufficient permissions:

/ # ping 2001:db8:0:1::21
PING 2001:db8:0:1::21 (2001:db8:0:1::21): 56 data bytes
ping: permission denied (are you root?)
/ # echo $?
1

This appears to stem from the pod lacking the NET_RAW capability. In my specific case, this happens because CRI-O above 1.18.0 no longer gives NET_RAW to containers by default, as implemented in cri-o/cri-o#3119.

However, I assume this would also bite folks who are deploying their clusters with non-CRI-O runtimes but with restrictive pod security contexts lacking NET_RAW.

CRI-O offers a few ways to allow ping (enabling NET_RAW, enabling the ip_forward sysctl, etc) but I'm unsure of the best way to fix this in the more generic context of the Cilium CLI in "all possible container runtimes" (or if it's something that even should be considered here).

Figured it was probably worth opening an issue to discuss before flinging code at the problem.
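
For reference, here is a minimal sketch of one possible mitigation: explicitly requesting NET_RAW in the client container's security context, so the ping-based tests keep working on runtimes that drop the capability by default. withNetRaw is a hypothetical helper, and whether the connectivity check should request the capability or instead skip ICMP tests is exactly the open question above.

package connectivity

import (
    corev1 "k8s.io/api/core/v1"
)

// withNetRaw adds the NET_RAW capability to a container so that ping-based
// tests work on runtimes (e.g. CRI-O >= 1.18) that no longer grant NET_RAW
// by default.
func withNetRaw(c corev1.Container) corev1.Container {
    if c.SecurityContext == nil {
        c.SecurityContext = &corev1.SecurityContext{}
    }
    if c.SecurityContext.Capabilities == nil {
        c.SecurityContext.Capabilities = &corev1.Capabilities{}
    }
    c.SecurityContext.Capabilities.Add = append(
        c.SecurityContext.Capabilities.Add, corev1.Capability("NET_RAW"),
    )
    return c
}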

Local NodePort connectivity test has faulty dst pod selection logic

I am observing a scenario where the local nodeport test selects the wrong dst pod to test connectivity against.

Src pod: client2
Src node: 10.0.194.76

❯ k -n cilium-test get pods -o custom-columns=NAME:.metadata.name,IP:.status.podIP,NODE:.status.hostIP
NAME                               IP              NODE
client-68c6675687-whsgk            10.128.9.1      10.0.179.183
client2-6c6cdcd976-t48ln           10.128.11.1     10.0.194.76
echo-other-node-588bf78fbb-4q8lz   10.128.11.203   10.0.194.76
echo-same-node-66589c9569-q4vcb    10.128.9.145    10.0.179.183

The CLI should select the dst pod as echo-other-node as it's on the same node as the src pod.

However, the CLI selects 31646 as the dst port (because it's a nodeport test) which means the dst pod is incorrectly selected as echo-same-node.

❯ k -n cilium-test get svc -o wide
NAME              TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)       
echo-other-node   NodePort   172.30.167.139   <none>        8080:32564/TCP
echo-same-node    NodePort   172.30.35.187    <none>        8080:31646/TCP

In summary, the traffic looks like this:

client2 -> echo-same-node
10.128.11.1:src-port -> 10.0.194.76:31646

where it should actually look like this:

client2 -> echo-other-node
10.128.11.1:src-port -> 10.0.194.76:32564

This causes the connectivity test to fail, a false positive.
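
One way to avoid this, sketched below with a plain client-go clientset, is to resolve the NodePort back to the Service that owns it before choosing the destination pod. serviceForNodePort is a hypothetical helper, not part of the CLI.

package connectivity

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// serviceForNodePort maps a node port back to the Service that owns it, so
// the test can verify it is targeting the intended backend (echo-other-node
// vs echo-same-node) instead of inferring the destination pod independently
// of the port being tested.
func serviceForNodePort(ctx context.Context, client kubernetes.Interface, namespace string, nodePort int32) (string, error) {
    svcs, err := client.CoreV1().Services(namespace).List(ctx, metav1.ListOptions{})
    if err != nil {
        return "", err
    }
    for _, svc := range svcs.Items {
        for _, p := range svc.Spec.Ports {
            if p.NodePort == nodePort {
                return svc.Name, nil
            }
        }
    }
    return "", fmt.Errorf("no service in %s exposes node port %d", namespace, nodePort)
}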

ClusterMesh status does not print good errors when it times out

Example:

  CONTEXT1=$(kubectl config view | grep cilium-cli-ci-multicluster-1-139 | head -1 | awk '{print $2}')
  cilium --context $CONTEXT1 clustermesh status --wait --wait-duration 5m
  shell: /bin/bash -e {0}
  env:
    clusterName1: cilium-cli-ci-multicluster-1-139
    clusterName2: cilium-cli-ci-multicluster-2-139
    zone: us-west2-a
    CLOUDSDK_METRICS_ENVIRONMENT: github-actions-setup-gcloud
    GCLOUD_PROJECT: ***
    GOOGLE_APPLICATION_CREDENTIALS: /home/runner/work/cilium-cli/cilium-cli/615244fc-eaaa-466e-8d27-775d74cd8ecd
✅ Cluster access information is available:
  - 10.168.15.253:2379
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
Error: Unable to determine status:  context deadline exceeded

install: invalid cluster-id should be blocked at creation time

I was following the instructions for setting up clustermesh on GKE and didn't notice the line that mentioned the cluster-id should be between 0..255.

Command I ran: cilium install --cluster-name cml-cluster-a --cluster-id 14480. This caused my cilium install to hang until it timed out. It was then quite clear from the Cilium DaemonSet's pod logs that this cluster-id number was outside the expected bounds.

I realize that this is my fault for just skipping over that message in the docs, but we might consider some form of validation to short-circuit this feedback loop for others who may have missed the parameter limits.
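
A minimal sketch of the kind of check that could run before anything is written to the cluster. validateClusterID is a hypothetical helper, and the 0..255 bound is taken from the docs quoted above; double-check it against the Cilium version being installed.

package install

import "fmt"

// validateClusterID fails fast instead of letting the agent pods crash later
// with an out-of-range cluster ID.
func validateClusterID(id int) error {
    if id < 0 || id > 255 {
        return fmt.Errorf("invalid --cluster-id %d: must be between 0 and 255", id)
    }
    return nil
}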

Environment info:
GKE - v1.18.15-gke.1501
Cilium - 1.9.5
Cilium-CLI - v0.5-dev@master-15910bd

install: revert changes if install fails

Problem
While testing the clustermesh setup I had some misconfigured flags that caused the install to exit. When I corrected the flags, the install immediately failed because several resource quotas already existed:

cilium-resource-quota
cilium-operator-resource-quota

Expected outcome
If cilium install fails early on for an initial installation, resources should be cleaned up.
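
A sketch of one pattern that would achieve this, assuming the install is decomposed into steps that each know how to undo themselves; installStep and installWithRollback are hypothetical names, not existing cilium-cli code.

package install

import "context"

// installStep pairs a create action with the cleanup that undoes it.
type installStep struct {
    create  func(context.Context) error
    cleanup func(context.Context) error
}

// installWithRollback runs the steps in order and, if any step fails, deletes
// the already-created resources in reverse order, so a botched first install
// does not leave resource quotas and other objects behind.
func installWithRollback(ctx context.Context, steps []installStep) (err error) {
    var done []func(context.Context) error
    defer func() {
        if err == nil {
            return
        }
        for i := len(done) - 1; i >= 0; i-- {
            _ = done[i](ctx) // best-effort cleanup on failure
        }
    }()
    for _, s := range steps {
        if err = s.create(ctx); err != nil {
            return err
        }
        done = append(done, s.cleanup)
    }
    return nil
}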

Versions
cilium-cli: v0.5-dev@master-15910bd compiled with go1.16.2 on darwin/amd64
k8s: gke-1.18
