Comments (14)

howardjohn commented on May 26, 2024

Oh so they are rejected:

The table size of maglev must be prime

I didn't know about that constraint, and we don't validate it or document it -- but clearly we should
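
A check like the one suggested here could be quite small. As a minimal sketch (written in Python for brevity; this is not Istio's actual validation code, and the function names are hypothetical), rejecting a non-prime tableSize and suggesting the next prime might look like:

```python
from typing import List


def is_prime(n: int) -> bool:
    """Return True if n is prime (trial division; fine for table sizes)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n, useful for suggesting a valid replacement."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def validate_maglev_table_size(table_size: int) -> List[str]:
    """Hypothetical validator: returns error strings, empty if the size is OK."""
    if not is_prime(table_size):
        return [
            f"maglev tableSize must be prime, got {table_size} "
            f"(next prime: {next_prime(table_size)})"
        ]
    return []
```

With the values from this thread, a tableSize of 500 is rejected (503 is suggested), while 547 passes.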

from istio.

howardjohn commented on May 26, 2024

Ah, got it. Is this happening repeatedly, or do you have a few stuck pods and haven't restarted all of them?

If it is happening repeatedly, can you send a new log now that the maglev issue is fixed, to make sure that wasn't somehow interfering?

PrabhdeepsGill commented on May 26, 2024

Steps to reproduce this issue.

  1. Start with all the proxies in the synced state (Istio version 1.19.7; also reproducible on 1.18.x).
istioctl proxy-status -n istio-system
NAME                                                                       CLUSTER        CDS        LDS        EDS        RDS        ECDS         ISTIOD                      VERSION
alertmanager-kube-prometheus-stack-alertmanager-0.istio-system             Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-55575fbb98-7rk66     1.19.7
grafana-6b7bd6c985-cgvxl.istio-system                                      Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-55575fbb98-7rk66     1.19.7
istio-ingressgateway-774c6b8695-2gz6t.istio-system                         Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-55575fbb98-7rk66     1.19.7
istio-ingressgateway-774c6b8695-lk8jt.istio-system                         Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-55575fbb98-7rk66     1.19.7
...

Relevant snapshot of /debug/syncz from istiod:

  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-2gz6t.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "f3a59072-f389-4f09-876a-c1281254474c",
    "cluster_acked": "f3a59072-f389-4f09-876a-c1281254474c",
    "listener_sent": "4aeb4141-bd97-49f7-a9f3-691fff13bb25",
    "listener_acked": "4aeb4141-bd97-49f7-a9f3-691fff13bb25",
    "route_sent": "838a8e16-8181-4d77-8a64-31f0f9e7eee6",
    "route_acked": "838a8e16-8181-4d77-8a64-31f0f9e7eee6",
    "endpoint_sent": "10d2a8bf-0c70-46a5-8ecb-3f9e8b244089",
    "endpoint_acked": "10d2a8bf-0c70-46a5-8ecb-3f9e8b244089"
  },
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-lk8jt.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "752ba83c-e00d-4261-ae0f-27f3b567e4d6",
    "cluster_acked": "752ba83c-e00d-4261-ae0f-27f3b567e4d6",
    "listener_sent": "d1a184af-89ca-4b05-95c5-267ae1118a5e",
    "listener_acked": "d1a184af-89ca-4b05-95c5-267ae1118a5e",
    "route_sent": "705e2514-e807-43e5-ba96-fa9345f5d750",
    "route_acked": "705e2514-e807-43e5-ba96-fa9345f5d750",
    "endpoint_sent": "28d32732-be43-44a0-8f41-f602975786cf",
    "endpoint_acked": "28d32732-be43-44a0-8f41-f602975786cf"
  },
  2. Create a DestinationRule with a Maglev load balancer, for example:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata: 
  name: fortio-test-routing
  namespace: fortio
spec: 
  host: fortio-client.fortio.svc.cluster.local
  trafficPolicy: 
    loadBalancer: 
      consistentHash: 
        httpQueryParameterName: url
        maglev: 
          tableSize: 500
  3. CDS on all proxies becomes stale. We see the error "The table size of maglev must be prime" on all proxies.
    /debug/syncz info:
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-2gz6t.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "d39434ea-00eb-4016-a490-873944dfc003",
    "cluster_acked": "f3a59072-f389-4f09-876a-c1281254474c",
    "listener_sent": "b7346660-ae73-4574-865c-e9556730ccb7",
    "listener_acked": "b7346660-ae73-4574-865c-e9556730ccb7",
    "route_sent": "ee5cb966-1e65-440b-af6d-5a8163d4e6bb",
    "route_acked": "ee5cb966-1e65-440b-af6d-5a8163d4e6bb",
    "endpoint_sent": "b68fd7c2-07da-4ee1-bce6-87a1e0efc2a7",
    "endpoint_acked": "b68fd7c2-07da-4ee1-bce6-87a1e0efc2a7"
  },
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-lk8jt.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "7954d038-fbb8-499c-8381-88a458efda70",
    "cluster_acked": "752ba83c-e00d-4261-ae0f-27f3b567e4d6",
    "listener_sent": "d00c1755-1edb-411f-98ab-86b85e7c871a",
    "listener_acked": "d00c1755-1edb-411f-98ab-86b85e7c871a",
    "route_sent": "7799a716-784e-46ce-b6bb-ab738ecf3d7d",
    "route_acked": "7799a716-784e-46ce-b6bb-ab738ecf3d7d",
    "endpoint_sent": "704fd12d-3941-4da3-ba78-24d5493dc025",
    "endpoint_acked": "704fd12d-3941-4da3-ba78-24d5493dc025"
  },
  4. Restart istiod. The proxies then go into the state STALE (Never Acknowledged).
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-2gz6t.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "e4266698-343c-41de-895c-ec390e2f9895",
    "listener_sent": "75f7814f-c4bb-4775-8bff-0affdc375d42",
    "listener_acked": "75f7814f-c4bb-4775-8bff-0affdc375d42",
    "route_sent": "62f4b423-b12c-4a98-83d3-8ad3ea2a2734",
    "route_acked": "62f4b423-b12c-4a98-83d3-8ad3ea2a2734",
    "endpoint_sent": "62ff557f-4bb9-423e-95db-1c0789d2d98c",
    "endpoint_acked": "62ff557f-4bb9-423e-95db-1c0789d2d98c"
  },
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-lk8jt.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "ea8157b8-6b65-40b2-a77d-47ddce075284",
    "listener_sent": "bfaf6443-3264-4c91-8da7-d64c09f60b84",
    "listener_acked": "bfaf6443-3264-4c91-8da7-d64c09f60b84",
    "route_sent": "ab1b8361-9f84-4083-ad96-7d45402da957",
    "route_acked": "ab1b8361-9f84-4083-ad96-7d45402da957",
    "endpoint_sent": "acc4e7c0-cd0b-44e4-98b4-2f28a4f66d43",
    "endpoint_acked": "acc4e7c0-cd0b-44e4-98b4-2f28a4f66d43"
  },
  5. After some time (~30 minutes), istiod stops sending CDS config to all the proxies.
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-2gz6t.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "listener_sent": "68816399-c1b8-490a-a655-703283417d18",
    "listener_acked": "68816399-c1b8-490a-a655-703283417d18",
    "route_sent": "d311ea57-276e-42e5-b93a-749ab8b46b4d",
    "route_acked": "d311ea57-276e-42e5-b93a-749ab8b46b4d",
    "endpoint_sent": "7b6b697f-b6bd-47cc-8207-e7186582dfb2",
    "endpoint_acked": "7b6b697f-b6bd-47cc-8207-e7186582dfb2"
  },
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-ingressgateway-774c6b8695-lk8jt.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "listener_sent": "ed77d779-6bed-4b4d-aa9e-6cb4325150e5",
    "listener_acked": "ed77d779-6bed-4b4d-aa9e-6cb4325150e5",
    "route_sent": "20f491fd-c194-4255-bbb4-4bafa0730d23",
    "route_acked": "20f491fd-c194-4255-bbb4-4bafa0730d23",
    "endpoint_sent": "44d5669b-6e82-4701-aa73-e1ad1ec1f82e",
    "endpoint_acked": "44d5669b-6e82-4701-aa73-e1ad1ec1f82e"
  }
  6. Now, even if you fix the DestinationRule or delete it, the proxies stay stuck in this state.
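
The snapshots above come down to comparing the *_sent and *_acked nonces for each xDS type. Here is a rough Python sketch of that comparison, assuming entries shaped like the ones shown above. The status labels follow the ones used in this thread; the function names are made up for illustration, and the mapping from fields to labels is my reading of the output, not code taken from istioctl.

```python
# Map the xDS type shown by `istioctl proxy-status` to the field prefix
# used in the debug output above.
XDS_TYPES = {"CDS": "cluster", "LDS": "listener", "RDS": "route", "EDS": "endpoint"}


def xds_status(entry: dict, prefix: str) -> str:
    """Classify one xDS type for one proxy from its sent/acked nonces."""
    sent = entry.get(f"{prefix}_sent", "")
    acked = entry.get(f"{prefix}_acked", "")
    if not sent:
        return "NOT SENT"  # step 5: no cluster_* fields at all
    if not acked:
        return "STALE (Never Acknowledged)"  # step 4: sent, never acked
    if sent == acked:
        return "SYNCED"  # step 1: nonces match
    return "STALE"  # step 3: proxy still acks an older nonce


def summarize(entries: list) -> dict:
    """Return {proxy name: {xDS type: status}} for a list of entries."""
    return {
        e["proxy"]: {name: xds_status(e, prefix) for name, prefix in XDS_TYPES.items()}
        for e in entries
    }
```

Running summarize over the parsed debug output and filtering for anything other than SYNCED in the CDS column should pick out the stuck proxies without reading the nonces by eye.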

ldemailly commented on May 26, 2024

Were istiod-5756dc65b7-pl9ns and istiod-5756dc65b7-sw8sc (not sent) different in any way from istiod-5756dc65b7-42jmx (synced)?
Any errors or issues in their logs?

Can you reproduce this if everyone is running 1.19.10? I see you have 1.17.3 (12 proxies); that's more than 2 minor versions behind.

bseenu commented on May 26, 2024

Were istiod-5756dc65b7-pl9ns and istiod-5756dc65b7-sw8sc (not sent) different in any way from istiod-5756dc65b7-42jmx (synced)? Any errors or issues in their logs?

Nothing stood out to me apart from a duplicate ServiceEntry, which was causing errors on all of the istiod pods:

"message": "Duplicate cluster outbound|15199||localhost.service.entry found while pushing CDS"

Can you reproduce this if everyone is running 1.19.10? I see you have 1.17.3 (12 proxies); that's more than 2 minor versions behind.

We do not have control over the proxy pods; they are brought to the new version when their deployment is updated. I'm not sure I can reproduce this anyway.

howardjohn commented on May 26, 2024

Do you have the full Envoy proxy logs for the proxies with the issue?

bseenu commented on May 26, 2024

Attached (I removed the actual traffic logs and the messages about connecting to the upstream XDS server istiod):
log.txt

bseenu commented on May 26, 2024

The above problem was fixed. It was caused by one of the destination rules, whose proxy was stuck (Envoy was not coming up).

howardjohn commented on May 26, 2024

Do you mean you fixed the maglev error and still see the original issue of clusters not being sent, or are both fixed?

bseenu commented on May 26, 2024

Yes, the maglev issue is fixed now, but the cluster updates still do not happen unless the proxies are restarted.

bseenu commented on May 26, 2024

I still have some pods that I have not restarted and that are still broken.

bseenu commented on May 26, 2024

@howardjohn anything else we can check here?

howardjohn commented on May 26, 2024

@bseenu it's a bit hard to tell, because it could easily just be caused by the maglev issue. It would help if you could reproduce it now that that error isn't present, and include the istiod logs.

PrabhdeepsGill commented on May 26, 2024

Adding a bit more information about this issue and steps to reproduce it:

I looked at two proxies from the same ingress, where one pod (replica jsnck) was not getting CDS updates sent to it.

Info from the /debug/syncz endpoint about the two proxies:

  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-internal-ingressgateway-ff7f456dd-252qg.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "cluster_sent": "224c32c9-6776-4997-8d72-59648893b2c2",
    "cluster_acked": "224c32c9-6776-4997-8d72-59648893b2c2",
    "listener_sent": "3f6b051c-5ca7-4356-92cc-82349d2a785e",
    "listener_acked": "3f6b051c-5ca7-4356-92cc-82349d2a785e",
    "route_sent": "f5ab51b2-25ce-4b43-8928-57c4885391db",
    "route_acked": "f5ab51b2-25ce-4b43-8928-57c4885391db",
    "endpoint_sent": "dc1aa3aa-8672-4eb6-b5fd-3e5172b18a33",
    "endpoint_acked": "dc1aa3aa-8672-4eb6-b5fd-3e5172b18a33"
  },
  {
    "cluster_id": "Kubernetes",
    "proxy": "istio-internal-ingressgateway-ff7f456dd-jsnck.istio-system",
    "proxy_type": "router",
    "istio_version": "1.19.7",
    "listener_sent": "97f4c48a-b8da-4a3c-8f3a-3161401022a5",
    "listener_acked": "97f4c48a-b8da-4a3c-8f3a-3161401022a5",
    "route_sent": "ef2a6589-3f4e-47a7-bffc-cfe2a88d3e92",
    "route_acked": "ef2a6589-3f4e-47a7-bffc-cfe2a88d3e92",
    "endpoint_sent": "b4e7e726-7a18-4191-b8bb-bf791abb8de9",
    "endpoint_acked": "b4e7e726-7a18-4191-b8bb-bf791abb8de9"
  },

Looking at the Envoy config dump from the "stuck" replica, we can see that it still has the incorrect maglev table size of 500, last updated 4/1, even though we had updated the DestinationRule to a maglev table size of 547 (a prime number, as required by the Envoy docs):

"dynamic_warming_clusters": [
    {
     "version_info": "2024-04-03T22:17:04Z/3068",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "outbound|8080||svc-a.namespace-a.svc.cluster.local",
      "type": "EDS",
      "eds_cluster_config": {
       "eds_config": {
        "ads": {},
        "initial_fetch_timeout": "0s",
        "resource_api_version": "V3"
       },
       "service_name": "outbound|8080||svc-a.namespace-a.svc.cluster.local"
      },
      "connect_timeout": "10s",
      "lb_policy": "MAGLEV",
      "metadata": {
       "filter_metadata": {
        "istio": {
         "services": [
          {
           "name": "svc-a",
           "host": "svc-a.namespace-a.svc.cluster.local",
           "namespace": "namespace-a"
          }
         ],
         "config": "/apis/networking.istio.io/v1alpha3/namespaces/namespace-a/destination-rule/svc-a-transcode-routing"
        }
       }
      },
      "common_lb_config": {
       "locality_weighted_lb_config": {}
      },
      "filters": [
       {
        "name": "istio.metadata_exchange",
        "typed_config": {
         "@type": "type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange",
         "protocol": "istio-peer-exchange"
        }
       }
      ],
      "transport_socket_matches": [],
      "maglev_lb_config": {
       "table_size": "500"
      }
     },
     "last_updated": "2024-04-03T22:17:05.266Z"
    }
]
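
To spot this on other replicas, one could scan the warming clusters of a config dump for Maglev clusters whose table size is not prime. A minimal Python sketch, assuming the input is the dynamic_warming_clusters list in the shape shown above (the helper names are hypothetical):

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime (trial division)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def find_bad_maglev_clusters(warming_clusters: list) -> list:
    """Return (cluster name, table size, last_updated) for each warming
    cluster that uses MAGLEV with an explicitly set, non-prime table size."""
    bad = []
    for item in warming_clusters:
        cluster = item.get("cluster", {})
        if cluster.get("lb_policy") != "MAGLEV":
            continue
        raw = cluster.get("maglev_lb_config", {}).get("table_size")
        if raw is None:
            continue  # no explicit table size in the dump, nothing to check
        size = int(raw)  # the dump above shows the value as a string ("500")
        if not is_prime(size):
            bad.append((cluster.get("name"), size, item.get("last_updated")))
    return bad
```

Applied to the entry above, this would flag outbound|8080||svc-a.namespace-a.svc.cluster.local with table size 500.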
