Comments (12)
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
/kind support
/triage needs-information
> 1- dns should hit the service and not the pod ips so when they're replaced dns doesn't just timeout. (I haven't figured this one out yet.)
- What was the IP address of the coredns pods before and after they crashed/were replaced?
- What do you mean by DNS here? Do you mean DNS resolution queries for resolving the backend-app pod IP address?
- What do you mean by service? Is it the backend-app service configured in the ingress resource?
- What do you mean by pod IPs here? Do you mean the pod IPs of the backend app?
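A quick way to gather what these questions ask about (a sketch; the pod name is illustrative):

```bash
# CoreDNS pod IPs, before and after the crash/replacement.
kubectl get pod -n kube-system -o wide | grep coredns

# The nameserver the controller pod is configured with; this normally points
# at the coredns service IP rather than an individual coredns pod IP.
kubectl exec -n ingress-nginx <controller-pod-name> -- cat /etc/resolv.conf
```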
The pod doesn't have to crash; any dropped DNS queries to resolve the memcached host will cause the configured timeout not to be honored. (Assuming the memcached timeout is < the DNS retry interval.)
This is an example. First, the coredns pod IP:

```
~ % kubectl get pod -n kube-system -o wide | grep coredns
rke2-coredns-rke2-coredns-76d7556c69-lhr4h   1/1   Running   0   10m   10.64.2.210
```
Delete the pod, or rollout restart the deployment, and the new pod will get a new IP.
```
~ % kubectl delete pod -n kube-system rke2-coredns-rke2-coredns-76d7556c69-lhr4h
pod "rke2-coredns-rke2-coredns-76d7556c69-lhr4h" deleted
~ % kubectl get pod -n kube-system -o wide | grep coredns
rke2-coredns-rke2-coredns-76d7556c69-q7prt   1/1   Running   0   53s   10.64.2.51
```
By DNS, I mean the DNS query that ingress-nginx uses to resolve the global-rate-limit-memcached-host to an IP address.
By service I mean the coredns service, in this case 10.65.0.10. (10.65/16 is this example cluster's service range; 10.64/16 is the pod CIDR range, with /24s given to each node to be assigned to pods.)
```
~ % kubectl get svc -n kube-system rke2-coredns-rke2-coredns
NAME                        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
rke2-coredns-rke2-coredns   ClusterIP   10.65.0.10   <none>        53/UDP,53/TCP   61d
```
Now connections to memcached will be kept alive, but if the memcached deployment also gets an update or its pods get rescheduled, ingress-nginx will try to reconnect and query coredns for the IP.
When this happens (only with queries to ingress objects using global-rate-limit), this shows up in the hubble/cilium logs:
```
ingress-nginx/ingress-nginx-controller-74rb6:53239 (ID:28807) <> 10.64.2.210:53 (world) Stale or unroutable IP DROPPED (UDP)
```
(Notice how this is to a coredns pod IP and not to the coredns service address.)
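For reference, a sketch of how such drops can be surfaced with the Hubble CLI (assuming Hubble is enabled; flag spellings may vary across versions):

```bash
# Show dropped UDP flows on port 53, then check the destination IPs: the
# global-rate-limit queries go to the stale coredns pod IP, not the service IP.
hubble observe --verdict DROPPED --protocol UDP --port 53
```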
Then, after the DNS query to resolve the global-rate-limit-memcached-host times out, the request to the backend pod happens as normal, so every request is delayed by an extra 10s.
I haven't figured out why this happens; querying for the memcached hostname from within the nginx pod resolves correctly:
```
~ % kubectl exec --stdin --tty -n ingress-nginx ingress-nginx-controller-74rb6 -- bash
ingress-nginx-controller-74rb6:/etc/nginx$ nslookup memcached.ingress-nginx.svc.cluster.local.
Server:    10.65.0.10
Address:   10.65.0.10:53

Name:      memcached.ingress-nginx.svc.cluster.local
Address:   10.65.7.46
```
This suggests to me that something within ingress-nginx/openresty/Lua is doing something weird with DNS, but I haven't figured that part out yet. (It could also be some weird issue with cilium/eBPF.)
None of these specifics really matter, because regardless of the underlying issue, it shouldn't cause a 10s delay to requests when the timeout is configured to be 50ms.
Thanks for explaining.
But none of your data is actual live status from command output, so I personally don't see much relevance pointing to a problem in the controller, such as the suggestion that you configured a 50ms timeout somewhere in the controller and the controller did not honor it.
- There is DNS cache involved
- There is ARP involved
- I don't know where you configured a 50ms timeout for the ingress-nginx-controller pod to resolve a memcached pod's hostname to an IP address
- And I suspect the controller you have running is not even an image built/released by this project
But these are my opinions. Wait for comments from experts who agree that a user breaking coredns showcases a bug in the ingress-controller image you are running, which may not be an image released by this project.
Also, if the coredns pods crashed and were recreated, is there a hint here that the IP address of the memcached pod changed?
I hope you get to the bottom of this issue and prevent future occurrences of this break in rate limiting.
This is using the ingress-nginx helm chart. You can see the output of nginx-ingress-controller --version in the description:
```
NGINX Ingress controller
  Release:    v1.10.1
  Build:      https://github.com/kubernetes/ingress-nginx/commit/4fb5aac1dd3669daa3a14d9de3e3cdb371b4c518
  Repository: https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.3
```
You can see the helm chart values I used to reproduce this issue in the description. The only thing configured is the memcached host:
```yaml
controller:
  config:
    global-rate-limit-memcached-host: memcached.ingress-nginx.svc.cluster.local.
```
The 50ms timeout is the default, as shown here:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#global-rate-limit
> global-rate-limit-memcached-connect-timeout: configure timeout for connect, send and receive operations. Unit is millisecond. Defaults to 50ms.
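For completeness, the same timeout can be pinned explicitly through the chart's controller.config values; a sketch (50 is already the documented default, so this only makes the setting visible):

```bash
# ConfigMap values are strings, hence --set-string.
helm upgrade ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --reuse-values \
  --set-string controller.config.global-rate-limit-memcached-connect-timeout=50
```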
DNS cache is implementation specific. If you have more information on the Lua resolver's DNS cache implementation, that would be helpful. As indicated by my nslookup command from within the nginx controller pod, the pod can resolve the name correctly and uses the coredns service address to do so, but based on the traffic captured, the Lua code for the global-rate-limit does not.
This does not require blocking traffic to coredns, or coredns crashing. Coredns was updated, and updates work by creating a new pod, waiting for it to become ready, updating the service/endpoint slices, and then removing the old pods (see the sketch below).
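For instance, the same IP change can be triggered deliberately (deployment name taken from the rke2 example above):

```bash
# New pod comes up, endpoint slices update, old pod (and its IP) goes away.
kubectl rollout restart deployment -n kube-system rke2-coredns-rke2-coredns
kubectl rollout status deployment -n kube-system rke2-coredns-rke2-coredns
kubectl get pod -n kube-system -o wide | grep coredns   # note the new pod IP
```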
Thanks for your comments.
If I were to summarize the problem, it would be as below.
You are reporting that after a crash/recreate of the coredns pods in a cluster, it takes 10s for ingress-nginx-controller to resolve the memcached workload service to a memcached pod IP address, and this 10s is a blind spot for the rate-limit annotation because it can NOT get the counters during those 10 seconds.
You are expecting that it should take only 50ms, after the crash/recreate of coredns pods in a cluster, for ingress-nginx to resolve & query the memcached workload and get the rate-limit counters for the rate-limit annotation.
Is this correct?
I deleted the coredns pod in one terminal and ran nslookup for a service resource in a bash while loop with a 1-second sleep. I don't see even one nslookup failure.
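Roughly this kind of loop (a sketch; the name queried is illustrative):

```bash
# Run inside any pod: resolve a service name once per second and log failures,
# while the coredns pod is deleted in another terminal.
while true; do
  nslookup kubernetes.default.svc.cluster.local || echo "lookup failed at $(date)"
  sleep 1
done
```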
2 issues - issue 1 causes issue 2.
1. DNS resolution from within the global-rate-limit code path does not use the service IP, and does not reach coredns if the pod IPs change. This does not apply to nslookup from within the ingress-nginx pod, or to resolving the backend pods for an ingress. (I believe this uses the k8s API instead of DNS, but regardless.) See the sketch after this list.
2. Any DNS lookup failure (i.e. a timeout, not an immediate NXDOMAIN) will cause traffic through ingress-nginx to be delayed by 10s, even when the timeout configured for the global-rate-limit is set to 50ms.
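A sketch for verifying issue 1 from the node running the controller (assumes kubectl debug node access; the netshoot image is just one convenient choice):

```bash
# Capture DNS traffic on the node and check whether queries for the memcached
# host go to the coredns service IP (10.65.0.10 in the earlier example) or to
# a possibly stale coredns pod IP.
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
  tcpdump -ni any udp port 53
```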
Here's an easier way you can reproduce issue 2. This is showing both the input (commands I run) and the output. I'll put just the commands down below.
```
user@computer ~ % kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
user@computer ~ % helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.config.global-rate-limit-memcached-host=asdf.blackhole.alphabet5.dev
Release "ingress-nginx" does not exist. Installing it now.
NAME: ingress-nginx
LAST DEPLOYED: Fri May 3 12:21:37 2024
NAMESPACE: ingress-nginx
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the load balancer IP to be available.
You can watch the status by running 'kubectl get service --namespace ingress-nginx ingress-nginx-controller --output wide --watch'
An example Ingress that makes use of the controller:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  namespace: foo
spec:
  ingressClassName: nginx
  rules:
    - host: www.example.com
      http:
        paths:
          - pathType: Prefix
            backend:
              service:
                name: exampleService
                port:
                  number: 80
            path: /
  # This section is only required if TLS is to be enabled for the Ingress
  tls:
    - hosts:
        - www.example.com
      secretName: example-tls
If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:
apiVersion: v1
kind: Secret
metadata:
  name: example-tls
  namespace: foo
data:
  tls.crt: <base64 encoded cert>
  tls.key: <base64 encoded key>
type: kubernetes.io/tls
user@computer ~ % cat <<EOF > manifests.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
spec:
  selector:
    matchLabels:
      app: dummy
  replicas: 1
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
      - name: podinfo
        image: stefanprodan/podinfo
        ports:
        - containerPort: 9898
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dummy
  annotations:
    nginx.ingress.kubernetes.io/global-rate-limit: "1000000"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"
spec:
  ingressClassName: nginx
  rules:
  - host: "localhost"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dummy
            port:
              number: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: dummy
spec:
  selector:
    app: dummy
  ports:
  - protocol: TCP
    port: 9898
    targetPort: 9898
EOF
user@computer ~ % kubectl apply -f manifests.yaml
deployment.apps/dummy created
ingress.networking.k8s.io/dummy created
service/dummy created
user@computer ~ % sudo kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 80:80
user@computer ~ % Forwarding from 127.0.0.1:80 -> 80
Forwarding from [::1]:80 -> 80
user@computer ~ % time curl http://localhost
Handling connection for 80
user@computer ~ % time curl http://localhost
Handling connection for 80
{
"hostname": "dummy-749c8b54cd-js75j",
"version": "6.6.2",
"revision": "8d010c498e79f499d1b37480507ca1ffb81a3bf7",
"color": "#34577c",
"logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
"message": "greetings from podinfo v6.6.2",
"goos": "linux",
"goarch": "arm64",
"runtime": "go1.22.2",
"num_goroutine": "6",
"num_cpu": "4"
}curl http://localhost  0.00s user 0.00s system 0% cpu 6.028 total
```
Just the commands:

```
kind create cluster
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.config.global-rate-limit-memcached-host=asdf.blackhole.alphabet5.dev
cat <<EOF > manifests.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy
spec:
  selector:
    matchLabels:
      app: dummy
  replicas: 1
  template:
    metadata:
      labels:
        app: dummy
    spec:
      containers:
      - name: podinfo
        image: stefanprodan/podinfo
        ports:
        - containerPort: 9898
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dummy
  annotations:
    nginx.ingress.kubernetes.io/global-rate-limit: "1000000"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1m"
spec:
  ingressClassName: nginx
  rules:
  - host: "localhost"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dummy
            port:
              number: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: dummy
spec:
  selector:
    app: dummy
  ports:
  - protocol: TCP
    port: 9898
    targetPort: 9898
EOF
sudo kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 80:80
time curl http://localhost
```
If you modify the memcached host to something that doesn't exist, the timeout works as expected (i.e. when the failure is not DNS related but the socket connect).
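As a hypothetical variation of the repro above: point the memcached host at an unroutable TEST-NET address, so the failure is the TCP connect rather than DNS (the IP is illustrative):

```bash
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.config.global-rate-limit-memcached-host=203.0.113.1

# No DNS lookup is needed for a literal IP, so the 50ms memcached connect
# timeout is honored and each request is delayed by ~50ms instead of ~10s.
time curl http://localhost
```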
For the kind reproduction above, the time is not exactly 10s because the SERVFAIL time will vary based on your upstream DNS servers. If Lua can't connect to your DNS server at all, it will always be 10s, because the resolver is configured for 5 retries with a 2s timeout (5 × 2s = 10s).
```
time nslookup asdf.blackhole.alphabet5.dev 1.1.1.1
Server:    1.1.1.1
Address:   1.1.1.1#53

** server can't find asdf.blackhole.alphabet5.dev: SERVFAIL

real  0m4.079s
user  0m0.005s
sys   0m0.005s
```

```
time nslookup asdf.blackhole.alphabet5.dev 8.8.8.8
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; no servers could be reached

real  0m15.131s
user  0m0.004s
sys   0m0.008s
```
Hi @alphabet5, thank you very much for the updated info you provided. Appreciate the time taken to give details.
These 2 messages from you, highlighted below, seem to be the key to the issue you are reporting.
> DNS resolution from within the global-rate-limit code path does not use the service IP, and does not reach coredns if the pod IPs change.

and this message

> This does not apply to nslookup from within the ingress-nginx pod, or to resolving the backend pods for an ingress. (I believe this uses the k8s API instead of DNS, but regardless.)
I think it will help if you copy/paste the links to the code-snippets you are referring to.
Meanwhile I will try to look it up and see if I can understand the details of the 2 messages highlighted above.
This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.