Comments (15)

ramaraochavali commented on June 16, 2024

Discussed offline very briefly with @howardjohn.

@hzxuzhonghu WDYT?

from istio.

howardjohn commented on June 16, 2024

How is "skewed" determined?

Say I have 3 zones. If I have 3/3/3 pods in each, is it "skewed"? I am only sending to 1/3 of the pods otherwise.

Is it 1/4/4? 1/1/7?

What if I am 3/3/3 but all the clients happen to be in one zone?

from istio.

ramaraochavali commented on June 16, 2024

> How is "skewed" determined?

It is determined by skew_factor. If it is set to 2 and any zone has more than 2x the pods of any other zone, the distribution is considered skewed.

> What if I am 3/3/3 but all the clients happen to be in one zone?

If your 3 pods can handle all client requests, nothing changes. If they cannot and HPA is triggered, and the HPA scale-up results in skew, we enter the "disable locality" mode.
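
For illustration, the skew_factor rule described above could be sketched in Go roughly as follows. This is only a sketch under assumptions, not an existing Istio or Envoy API: the function name `isSkewed`, the integer `skewFactor` parameter, and the choice to count zones with zero pods are all hypothetical.

```go
package main

import "fmt"

// isSkewed reports whether any zone has more than skewFactor times the pods
// of any other zone. Comparing the largest count against the smallest is
// enough to cover every pair of zones.
// Assumption: zones with zero pods are included in podsPerZone.
func isSkewed(podsPerZone []int, skewFactor int) bool {
	if len(podsPerZone) < 2 {
		return false
	}
	lo, hi := podsPerZone[0], podsPerZone[0]
	for _, n := range podsPerZone[1:] {
		if n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
	}
	return hi > skewFactor*lo
}

func main() {
	// The distributions mentioned in this thread, checked with skew_factor = 2.
	for _, zones := range [][]int{{3, 3, 3}, {1, 4, 4}, {1, 1, 7}, {3, 6, 0}} {
		fmt.Println(zones, isSkewed(zones, 2))
	}
}
```

Under these assumptions, only 3/3/3 is reported as not skewed with a factor of 2, while 1/4/4, 1/1/7, and 3/6/0 are. The check sees pod counts only and knows nothing about where the clients are or how loaded each pod is.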

from istio.

howardjohn commented on June 16, 2024

Here is what confuses me.

Why is 3/3/3 OK, but at 3/6/0 we should start changing behavior?

From the local zone's perspective it's the same: 3 pods (33%) get traffic.

from istio.

ramaraochavali commented on June 16, 2024

> Why is 3/3/3 OK, but at 3/6/0 we should start changing behavior?

That is a good point. The fact that it went from an even distribution (3/3/3), assuming the original distribution was good, to 3/6/0 is a "hint" that something changed via HPA. I do not know whether we can compare the previous and current distributions to identify the skew. I know it is not perfect, but my idea in creating this issue is to discuss it and see whether we can solve this.

from istio.

howardjohn commented on June 16, 2024

TBH, I feel like this is not the responsibility of a load balancer; it is the scheduler's job to schedule pods where they are required.

from istio.

hzxuzhonghu commented on June 16, 2024

I agree with John's point here. It is the scheduler, not the load balancer, that should be in charge of this. The LB is working as expected. Yes, sure, it lacks the capability to be aware of the server load. There is an issue tracking this in Envoy, I think.

from istio.

ramaraochavali commented on June 16, 2024

I agree that it is the scheduler's responsibility. But given how HPA works, I am trying to see whether the load balancer can be intelligent enough to handle this case.

> Yes, sure it lacks the capability to be aware of the server load.

It is not just server load, but the combination of server load and the zone in which the pod is scheduled. Similar to how we fall back to other regions when all endpoints are unhealthy, I think it would be good to have a mechanism in the load balancer that spills over to other zones if the current zone's pods are overloaded (not just unhealthy). Of course, the solution I proposed simply assumes "skew" is an indicator of overload, which is not correct in all cases.

Can you point me to the Envoy issue if you have it handy?

from istio.

howardjohn commented on June 16, 2024

The concern I have is that I cannot come up with a reasonable algorithm for when we should start spilling over due to skew that solves your use case, isn't stateful, and isn't just "round robin".

from istio.

ramaraochavali commented on June 16, 2024

I do not think switching to round robin would help unless we also disable the locality load balancer, though I may be missing something in your proposal.

from istio.

hzxuzhonghu commented on June 16, 2024

@ramaraochavali envoyproxy/envoy#6614. If you want a more intelligent LB, this is the relevant feature request.

from istio.

ramaraochavali commented on June 16, 2024

I do not think it will work when the locality load balancer is enabled. When it is enabled, we pick nodes based on priority first, and once the priority is picked we apply the load balancer (least request, round robin, cost aggregated, etc.). Doesn't it pick nodes in the same zone and then apply this cost logic?
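
To make the two-stage selection described above concrete, here is a deliberately simplified Go sketch. It is not Envoy's actual implementation (which shifts traffic between priorities gradually); the `Endpoint` type, its fields, and `pickEndpoint` are hypothetical names used only for illustration. The first stage narrows the candidates by locality, and only then is least-request applied within that set.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of a backend pod.
type Endpoint struct {
	Name           string
	Zone           string
	Healthy        bool
	ActiveRequests int
}

// pickEndpoint narrows the candidates to healthy endpoints in the client's
// zone, falling back to all healthy endpoints only when the local zone has
// none. Least-request is applied second, within the chosen set only.
func pickEndpoint(clientZone string, eps []Endpoint) (Endpoint, bool) {
	var local, all []Endpoint
	for _, ep := range eps {
		if !ep.Healthy {
			continue
		}
		all = append(all, ep)
		if ep.Zone == clientZone {
			local = append(local, ep)
		}
	}
	candidates := local
	if len(candidates) == 0 {
		candidates = all
	}
	if len(candidates) == 0 {
		return Endpoint{}, false
	}
	best := candidates[0]
	for _, ep := range candidates[1:] {
		if ep.ActiveRequests < best.ActiveRequests {
			best = ep
		}
	}
	return best, true
}

func main() {
	eps := []Endpoint{
		{Name: "a1", Zone: "zone-a", Healthy: true, ActiveRequests: 50},
		{Name: "b1", Zone: "zone-b", Healthy: true, ActiveRequests: 0},
	}
	ep, ok := pickEndpoint("zone-a", eps)
	fmt.Println(ep.Name, ok)
}
```

In this sketch the busy local pod a1 is still chosen over the idle b1 in another zone, because load is compared only after the locality stage has already excluded zone-b. A load-aware policy applied at the second stage therefore cannot, on its own, cause spillover across zones.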

from istio.

hzxuzhonghu commented on June 16, 2024

Sure, the locality load balancer combined with some other, not yet implemented algorithm could solve this case.

from istio.

ramaraochavali commented on June 16, 2024

I think the key is either to disable the locality load balancer or, if it is enabled, to make it spill traffic over to other zones when we detect some abnormality in pod scheduling or load. The question is how we can detect that correctly.

from istio.

ramaraochavali commented on June 16, 2024

BTW, the skew above is similar to how k8s evaluates maxSkew in pod topology spread constraints. This is an interesting article on how even pod topology spread constraints can result in skew, especially during scale down: https://medium.com/wise-engineering/avoiding-kubernetes-pod-topology-spread-constraint-pitfalls-d369bb04689e
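
As a rough comparison, Kubernetes expresses skew as a difference in pod counts rather than a ratio: the count in a topology domain minus the global minimum. A minimal Go sketch of that difference-style measure follows; the helper name `spreadSkew` is hypothetical, and it assumes every eligible zone (including empty ones) appears in the input.

```go
package main

import "fmt"

// spreadSkew returns the largest per-zone pod count minus the smallest,
// which is the difference-style skew that a maxSkew limit is compared with.
func spreadSkew(podsPerZone []int) int {
	if len(podsPerZone) == 0 {
		return 0
	}
	lo, hi := podsPerZone[0], podsPerZone[0]
	for _, n := range podsPerZone[1:] {
		if n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
	}
	return hi - lo
}

func main() {
	fmt.Println(spreadSkew([]int{3, 3, 3}))
	fmt.Println(spreadSkew([]int{3, 6, 0}))
}
```

With this measure 3/3/3 has a skew of 0 and 3/6/0 has a skew of 6, so the two distributions are told apart by a difference just as they are by the ratio-based skew_factor.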

from istio.
