
Comments (7)

prateeksahu avatar prateeksahu commented on May 28, 2024 1

Hey @pchaigno, thanks for all the tips and directions.
There was a rogue process on the master hogging the port that VXLAN was supposed to use. After rebooting and cleaning up the firewall rules, the experiments ran smoothly.
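
For anyone hitting the same symptom, one way to spot a process holding the VXLAN port is to list the UDP listeners together with their owners. The sketch below is not from the thread: `udp_port_owners` is a hypothetical helper that filters `ss -ulpn`-style output for a given port, and it assumes `ss` reports the owner in a `users:(...)` field, which normally requires running it as root.

```shell
# udp_port_owners PORT: read `ss -ulpn`-style lines on stdin and print the
# local address plus the owning process for every socket bound to PORT.
# Prints "-" as the owner when the line carries no users:(...) field.
udp_port_owners() {
    awk -v port="$1" '
        {
            addr = ""; owner = "-"
            for (i = 1; i <= NF; i++) {
                # The first field ending in ":PORT" is taken as the local address.
                if (addr == "" && $i ~ (":" port "$")) addr = $i
                # ss prints the owner as users:(("name",pid=N,fd=M)).
                if ($i ~ /^users:/) owner = $i
            }
            if (addr != "") print addr, owner
        }
    '
}
```

It can be run as `sudo ss -ulpn | udp_port_owners 8472`. A line ending in `-` means no owning process was listed for that socket; a line that names a process on the VXLAN port is worth a closer look.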

from cilium-perf-networking.

prateeksahu avatar prateeksahu commented on May 28, 2024

I tried out the direct-routing mode, and the kubenetbench benchmark ran successfully with it. It seems the tunnelling (VXLAN) setting is causing some issue with pod-to-pod communication.
Any insights in this regard will be helpful.


pchaigno avatar pchaigno commented on May 28, 2024

Could you share a Cilium sysdump? Did you check for packet drops reported by Cilium? Did you try to trace the failing connection with Cilium and/or tcpdump?


prateeksahu avatar prateeksahu commented on May 28, 2024

Hi @pchaigno, I have attached the sysdump here. Some tasks failed while generating the sysdump, mostly pertaining to Hubble, but that was expected since I am not running Hubble during the perf testing.
There is one failed task that I am not sure about: [10] Collecting Cilium egress NAT policies: failed to collect Cilium egress NAT policies: the server could not find the requested resource (get ciliumegressnatpolicies.cilium.io)

I will check tcpdump by running the server and client independently and report back. Unfortunately, I raised the issue at a bad time since I am travelling for about a week, but I will update ASAP.
cilium-sysdump-20220215-171559.zip


pchaigno avatar pchaigno commented on May 28, 2024

It looks like both nodes can't reach pods on the other node. That traffic should go through the tunnel, so the most likely explanation is that the encapsulated traffic is being dropped. Maybe you need to open a hole in some firewall to allow the VXLAN traffic through?


prateeksahu avatar prateeksahu commented on May 28, 2024

@pchaigno, apologies for the delay.
I see that VXLAN port 8472 is open on both nodes:

master
sudo netstat -a -n | grep 8472
udp        0      0 0.0.0.0:8472            0.0.0.0:*

node
sudo netstat -a -n | grep 8472
udp        0      0 0.0.0.0:8472            0.0.0.0:*
udp6       0      0 :::8472                 :::*

While trying to run tcpdump, I noticed that the 'cilium_vxlan' interface was not up on the master.
master:

sudo tcpdump -D
1.cilium_net [Up, Running]
2.cilium_host [Up, Running]
3.enp2s0 [Up, Running]
4.flannel.1 [Up, Running]
5.lo [Up, Running, Loopback]
6.any (Pseudo-device that captures on all interfaces) [Up, Running]
7.virbr0 [Up]
8.docker0 [Up]
9.virbr1 [Up]
10.virbr2 [Up]
11.bluetooth-monitor (Bluetooth Linux Monitor) [none]
12.nflog (Linux netfilter log (NFLOG) interface) [none]
13.nfqueue (Linux netfilter queue (NFQUEUE) interface) [none]
14.virbr1-nic [none]
15.virbr2-nic [none]
16.virbr0-nic [none]
17.cilium_vxlan [none]

The same interface is up and running on the worker node:

sudo tcpdump -D
1.cilium_net [Up, Running]
2.cilium_host [Up, Running]
3.cilium_vxlan [Up, Running]
4.enp5s0 [Up, Running]
5.lo [Up, Running, Loopback]
6.any (Pseudo-device that captures on all interfaces) [Up, Running]
7.enp3s0 [Up]
8.virbr0 [Up]
9.docker0 [Up]
10.bluetooth-monitor (Bluetooth Linux Monitor) [none]
11.nflog (Linux netfilter log (NFLOG) interface) [none]
12.nfqueue (Linux netfilter queue (NFQUEUE) interface) [none]
13.virbr0-nic [none]

Could you provide instructions on debugging why the interface is not showing up?
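
A quick way to compare the two nodes is to pull the flags for one interface out of the `tcpdump -D` listing. This is a minimal sketch, not something from the thread: `iface_flags` is a hypothetical helper, and it assumes the `N.name ... [flags]` line layout shown in the output above.

```shell
# iface_flags NAME: read `tcpdump -D` output on stdin and print the flags
# inside the [...] for interface NAME (e.g. "Up, Running" or "none").
# Exits non-zero if NAME does not appear in the listing at all.
iface_flags() {
    awk -v name="$1" '
        {
            # Lines look like "17.cilium_vxlan [none]" or
            # "6.any (Pseudo-device ...) [Up, Running]".
            entry = $1
            sub(/^[0-9]+\./, "", entry)   # strip the leading index
            if (entry != name) next
            if (match($0, /\[[^]]*\]/)) {
                print substr($0, RSTART + 1, RLENGTH - 2)
                found = 1
            }
        }
        END { exit !found }
    '
}
```

Running `sudo tcpdump -D | iface_flags cilium_vxlan` on each node would then print `none` on the master and `Up, Running` on the worker for the listings above.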


pchaigno avatar pchaigno commented on May 28, 2024

Did you check the Cilium agents and dmesg on the node for errors?

