Comments (15)
Discussed offline very briefly with @howardjohn.
@hzxuzhonghu WDYT?
from istio.
How is "skewed" determined?
Say I have 3 zones. If I have 3/3/3 pods in each, is it "skewed"? I am only sending to 1/3 of the pods otherwise.
Is it 1/4/4? 1/1/7?
What if I am 3/3/3 but all the clients happen to be in one zone?
from istio.
How is "skewed" determined?
It is determined by skew_factor.
If it is set to 2 and any zone has more than 2x the pods of any other zone, the distribution is considered skewed.
What if I am 3/3/3 but all the clients happen to be in one zone?
If your 3 pods can handle all client requests happily, no change. If they cannot and HPA is triggered, and the HPA scale-up results in skew, we go into disable-locality mode.
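A minimal sketch of the check described above, assuming per-zone pod counts are available as a plain list. `is_skewed` and its parameters are hypothetical names used for illustration, not an Istio API.

```python
def is_skewed(pods_per_zone, skew_factor=2.0):
    """Return True if any zone has more than skew_factor times the pods of any other zone."""
    if len(pods_per_zone) < 2:
        return False
    # Comparing the largest zone against the smallest covers every pair of zones.
    return max(pods_per_zone) > skew_factor * min(pods_per_zone)


for counts in ([3, 3, 3], [1, 4, 4], [1, 1, 7], [3, 6, 0]):
    print(counts, is_skewed(counts))
```

With skew_factor set to 2, only 3/3/3 is reported as not skewed. Note that under this rule a zone with zero pods counts as skew whenever any other zone has pods.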
from istio.
The thing that is confusing to me: why is 3/3/3 ok, but at 3/6/0 we should start to change?
From the local zone it's the same: 3 pods (33%) get traffic.
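The arithmetic behind this objection can be sketched as follows, assuming traffic stays zone-local while the local zone has healthy pods. `local_share` is a hypothetical helper, not part of Istio or Envoy.

```python
from fractions import Fraction


def local_share(pods_per_zone, client_zone):
    """Fraction of all pods a client in client_zone sends to when traffic stays zone-local."""
    total = sum(pods_per_zone)
    if total == 0:
        raise ValueError("no pods in any zone")
    return Fraction(pods_per_zone[client_zone], total)


print(local_share([3, 3, 3], 0))  # 1/3
print(local_share([3, 6, 0], 0))  # 1/3
```

For a client in the first zone, both layouts use the same third of the pods, so the skew alone does not change what that client sees.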
from istio.
Why is 3/3/3 ok, but 3/6/0 we should start change?
That is a good point. The fact that it went from an even distribution (3/3/3), assuming the original distribution was good, to 3/6/0 is a "hint" that something changed via HPA. I do not know if we can compare the previous vs. current distribution to identify the skew. I know it is not perfect, but my idea in creating this issue is that we can discuss it and see if we can solve this.
from istio.
I feel like this is not the responsibility of a load balancer TBH, but of the scheduler to schedule pods where they are required
from istio.
I agree with John's point here. It is the scheduler, not the load balancer, that should be in charge of this. The LB is working as expected. Yes, sure it lacks the capability to be aware of the server load. There is an issue tracking this in Envoy, I think.
from istio.
I agree with the point that it is the scheduler's responsibility. But given how HPA works, I am trying to see if the load balancer can be intelligent enough to handle this case.
Yes, sure it lacks the capability to be aware of the server load.
It is not just server load but the combination of server load and the zone in which it is scheduled. So, similar to how we fall back to other regions when all endpoints are unhealthy, I think it would be good to have a mechanism in the load balancer that spills over to other zones if the current zone's pods are overloaded (not just unhealthy). Of course, the solution I proposed just assumes "skew" as an indicator of overload, which is not correct in all cases.
Can you point me to the Envoy issue if you have it handy?
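The spill-over idea in this comment could look roughly like the sketch below. It assumes a per-zone utilization signal is available, which is exactly the input the load balancer lacks today. `zone_weights`, `load_per_zone`, and `overload_threshold` are hypothetical names, not Istio or Envoy settings.

```python
def zone_weights(load_per_zone, local_zone, overload_threshold=0.8):
    """Return routing weights per zone for a client in local_zone.

    load_per_zone maps zone name -> utilization (1.0 means fully loaded).
    Traffic stays local until the local zone exceeds overload_threshold.
    """
    local_only = {z: (1.0 if z == local_zone else 0.0) for z in load_per_zone}
    local_load = load_per_zone[local_zone]
    others = [z for z in load_per_zone if z != local_zone]
    if local_load <= overload_threshold or not others:
        return local_only
    # Spare capacity of each remote zone below the same threshold.
    spare = {z: max(0.0, overload_threshold - load_per_zone[z]) for z in others}
    total_spare = sum(spare.values())
    if total_spare == 0:
        return local_only  # nowhere better to send traffic
    # Shed the share of local traffic that sits above the threshold.
    spill = (local_load - overload_threshold) / local_load
    weights = {z: spill * spare[z] / total_spare for z in others}
    weights[local_zone] = 1.0 - spill
    return weights


print(zone_weights({"a": 1.0, "b": 0.4, "c": 0.6}, "a"))
```

In this example zone a keeps about 80% of its traffic and the rest goes mostly to zone b, which has the most headroom. The decision is stateless given the load input, but obtaining that input is the open problem.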
from istio.
The concern I have is that I cannot come up with a reasonable algorithm for when we should start spilling over due to skew that solves your use case, isn't stateful, and isn't just "round robin".
from istio.
I do not think switching to round robin would help unless we also disable the locality load balancer, unless I am missing something in your proposal.
from istio.
@ramaraochavali envoyproxy/envoy#6614. If you want a more intelligent LB, this is the right requirement.
from istio.
I do not think it will work when the locality load balancer is enabled. When it is enabled, we pick nodes based on priority first, and once a priority is picked we apply the load balancer (least request, round robin, cost aggregated, etc.). Doesn't it pick nodes in the same zone and then apply this cost logic?
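The two-step selection described here can be sketched as below. This is a simplification under stated assumptions: Envoy's actual priority handling shifts traffic gradually as endpoints become unhealthy, which the sketch ignores. The endpoint fields and `pick_endpoint` are hypothetical.

```python
def pick_endpoint(endpoints):
    """Pick an endpoint: priority first, then least request within that priority.

    endpoints: list of dicts with 'name', 'priority' (0 = local zone),
    'healthy', and 'active_requests'.
    """
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        return None
    # Step 1: choose the best (lowest-numbered) priority that has a healthy endpoint.
    best_priority = min(e["priority"] for e in healthy)
    candidates = [e for e in healthy if e["priority"] == best_priority]
    # Step 2: apply the load-balancing policy (least request here) inside that priority only.
    return min(candidates, key=lambda e: e["active_requests"])


endpoints = [
    {"name": "local-1", "priority": 0, "healthy": True, "active_requests": 50},
    {"name": "local-2", "priority": 0, "healthy": True, "active_requests": 60},
    {"name": "remote-1", "priority": 1, "healthy": True, "active_requests": 0},
]
print(pick_endpoint(endpoints)["name"])  # local-1
```

The idle remote endpoint is never considered while the local priority has healthy endpoints, which is why a smarter per-host cost function alone does not produce cross-zone spill-over.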
from istio.
Sure, the locality load balancer collaborating with some other, not yet implemented algorithms could solve this case.
from istio.
I think the key is to either disable the locality load balancer or, if it is enabled, make it behave in a way that spills traffic over to other zones when we detect some abnormality in pod scheduling/load. How we can correctly detect that is the question.
from istio.
BTW, the skew above is similar to how k8s evaluates maxSkew in pod topology spread constraints. This is an interesting article on how even pod topology spread constraints can result in skew, especially during scale down: https://medium.com/wise-engineering/avoiding-kubernetes-pod-topology-spread-constraint-pitfalls-d369bb04689e
from istio.