
kubernetes-sigs / cluster-proportional-vertical-autoscaler


License: Apache License 2.0

Makefile 28.23% Shell 5.08% Go 66.69%
k8s-sig-network

cluster-proportional-vertical-autoscaler's Issues

Add support for arm64 images

What would you like to be added:

Images built and published through this repository are only amd64 compatible. We should enable a multi-arch Docker build so that arm64 images are offered as well.

Support for Statefulset

We would like to have an autoscaler for StatefulSets that scales based on the number of nodes. I have a couple of scenarios in my architecture where I think it would be useful, and I am not aware of any such solution available in OSS. I think it would be a good idea to extend this project to support StatefulSets. I am happy to help out with the PR, but I would like to know about any concerns regarding this proposal that I should be aware of.

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Vertical autoscaler doesn't support apps/v1

I noticed this while upgrading Calico for the latest k8s: the vertical autoscaler is crash-looping. It seems that the current vertical autoscaler doesn't support apps/v1.

CPVPA fails if some API is not discoverable

If any APIService is not discoverable, the CPVPA crash-loops:

$ kubectl get apiservice  | grep metrics-ad
v1beta1.custom.metrics.k8s.io          kube-system/kube-metrics-adapter   False (MissingEndpoints)   26d
v1beta1.external.metrics.k8s.io        kube-system/kube-metrics-adapter   False (MissingEndpoints)   26d

$ kubectl -n kube-system logs calico-typha-vertical-autoscaler-5557c6d7d-2sd7c
I0420 04:39:57.408180       1 autoscaler.go:46] Scaling namespace: kube-system, target: deployment/calico-typha-deploy
E0420 04:39:59.408093       1 autoscaler.go:49] failed to discover apigroup for kind "Deployment": unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request, external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Is this desired behaviour? If yes, why does the CPVPA need to discover the full API?
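
One possible way to avoid the crash loop is to treat discovery as partially successful: keep the groups that did resolve, and fail only when the kind being looked up cannot be found. client-go's discovery package offers `IsGroupDiscoveryFailedError` for detecting this situation; the sketch below models the same idea with the standard library only. The `partialDiscoveryError` type and the `resolveGroup` function are hypothetical names for illustration, not the autoscaler's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// partialDiscoveryError is a hypothetical stand-in for an error reporting
// that some API groups could not be listed while others were discovered.
type partialDiscoveryError struct {
	Failed map[string]error // group/version -> cause
}

func (e *partialDiscoveryError) Error() string {
	return fmt.Sprintf("unable to retrieve %d API group(s)", len(e.Failed))
}

// resolveGroup returns the group/version that serves kind. discovered maps
// each group/version to the kinds it serves. A partial discovery failure is
// tolerated as long as the requested kind was found.
func resolveGroup(kind string, discovered map[string][]string, err error) (string, error) {
	var partial *partialDiscoveryError
	if err != nil && !errors.As(err, &partial) {
		return "", err // total failure: nothing usable was discovered
	}
	for gv, kinds := range discovered {
		for _, k := range kinds {
			if k == kind {
				return gv, nil
			}
		}
	}
	if partial != nil {
		return "", fmt.Errorf("kind %q not found, discovery was incomplete: %w", kind, partial)
	}
	return "", fmt.Errorf("kind %q not found", kind)
}

func main() {
	discovered := map[string][]string{"apps/v1": {"Deployment", "DaemonSet"}}
	discoveryErr := &partialDiscoveryError{Failed: map[string]error{
		"custom.metrics.k8s.io/v1beta1": errors.New("the server is currently unable to handle the request"),
	}}
	gv, err := resolveGroup("Deployment", discovered, discoveryErr)
	fmt.Println(gv, err)
}
```

With this approach an unavailable metrics adapter would only matter if the target kind itself lived in one of the failed groups.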

Cannot patch apps/v1 deployment

In the logs I see the following entry repeated:

E1018 09:08:03.582625       1 autoscaler_server.go:153] Update failure: patch failed: Deployment.apps "foo" is invalid: spec.template.spec.containers[0].image: Required value

Kubernetes version - v1.16.2, v1.15.6
Deployment apiVersion - apps/v1
kubernetes-incubato/cluster-proportional-vertical-autoscaler version - v0.8.1 (k8s.gcr.io/cpvpa-amd64:v0.8.1)

cpva fails with "unknown target kind: Tap"

What happened:
cpva fails to start when the deployments resource is served by multiple API groups.

How to reproduce it (as minimally and precisely as possible):

  1. Install linkerd. See https://linkerd.io/2/getting-started/

  2. Ensure that there are multiple API groups serving resource deployments:

$ k api-resources
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
# ...
daemonsets                        ds           apps                           true         DaemonSet
deployments                       deploy       apps                           true         Deployment
# ...
daemonsets                        ds           tap.linkerd.io                 true         Tap
deployments                       deploy       tap.linkerd.io                 true         Tap
  3. Ensure that cpva fails with:
$ k logs cpva -n kube-system
I0217 20:41:29.699612       1 autoscaler.go:46] Scaling namespace: kube-system, target: deployment/calico-typha-deploy
E0217 20:41:30.799782       1 autoscaler.go:49] unknown target kind: Tap

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T23:41:24Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Image: k8s.gcr.io/cpvpa-amd64:v0.8.1
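
A minimal sketch of one way to handle this, assuming the lookup can see every API resource row: instead of taking the first resource whose name matches, skip rows whose kind is not one the autoscaler can patch. The `apiResource` type, the `supportedKinds` set, and `findTarget` are hypothetical names, not taken from the cpva source.

```go
package main

import "fmt"

// apiResource mirrors one row of `kubectl api-resources` output.
type apiResource struct {
	Name, Group, Kind string
}

// supportedKinds is an assumed list of kinds the autoscaler can patch.
var supportedKinds = map[string]bool{
	"Deployment": true,
	"DaemonSet":  true,
	"ReplicaSet": true,
}

// findTarget returns the first resource called name whose kind is supported,
// skipping same-named resources served by aggregated APIs.
func findTarget(name string, resources []apiResource) (apiResource, error) {
	for _, r := range resources {
		if r.Name == name && supportedKinds[r.Kind] {
			return r, nil
		}
	}
	return apiResource{}, fmt.Errorf("no supported kind serves resource %q", name)
}

func main() {
	resources := []apiResource{
		{"deployments", "tap.linkerd.io", "Tap"},
		{"deployments", "apps", "Deployment"},
	}
	r, err := findTarget("deployments", resources)
	fmt.Println(r.Group, r.Kind, err)
}
```

Here the `tap.linkerd.io` row is ignored even though it is listed first, so the lookup lands on `apps` / `Deployment`.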

Multiple targets for scale up

As far as I understand, the component currently allows scaling only one target per deployment, i.e. one instance of CPVA can scale one deployment/replicaset. Can we allow specifying a list of targets in --targets along with a list of configmaps? This would save the resources spent on launching a separate container for each scaling requirement. I can help implement this, and I would like to know if there are any concerns or suggestions around this feature.
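
As a rough illustration of the proposal, the sketch below parses a comma-separated list of kind/name pairs. The list format, the `target` type, and `parseTargets` are assumptions made for this example; the issue does not specify how the flag value would be encoded.

```go
package main

import (
	"fmt"
	"strings"
)

// target is one kind/name pair, e.g. deployment/calico-typha-deploy.
type target struct {
	Kind, Name string
}

// parseTargets splits a comma-separated list such as
// "deployment/foo,daemonset/bar" into targets. The format is hypothetical.
func parseTargets(s string) ([]target, error) {
	var out []target
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		parts := strings.SplitN(item, "/", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid target %q, want kind/name", item)
		}
		out = append(out, target{Kind: strings.ToLower(parts[0]), Name: parts[1]})
	}
	return out, nil
}

func main() {
	targets, err := parseTargets("deployment/foo, daemonset/bar")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range targets {
		fmt.Printf("kind=%s name=%s\n", t.Kind, t.Name)
	}
}
```

Each parsed target would still need its own configmap entry, which is why the request pairs the target list with a list of configmaps.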

Changelog not updated

The cluster-proportional-vertical-autoscaler changelog has not been updated for the current release.
The last release it documents is 0.0.0, while the current release is 0.8.3.

Log necessary info on scale event and avoid logging when doing nothing

From what we have observed, the autoscaler's logging could be improved. Specifically, it would be helpful to log the current node/CPU count when a scale event happens, so that it is clear how the decision was made. It would also be great to log less frequently when nothing is being done, to reduce the noise.
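
A minimal sketch of the requested behaviour, assuming the autoscaler polls the node and core counts in a loop: remember the last observed values and log only when they change. The `scaleLogger` type and its `observe` method are hypothetical names, not part of the project.

```go
package main

import "log"

// scaleLogger remembers the last observed cluster size so that it only
// logs when something actually changed.
type scaleLogger struct {
	lastNodes, lastCores int
	seen                 bool
}

// observe logs the node and core counts if they differ from the previous
// call, and reports whether it logged anything.
func (l *scaleLogger) observe(nodes, cores int) bool {
	if l.seen && nodes == l.lastNodes && cores == l.lastCores {
		return false // nothing changed: stay quiet
	}
	log.Printf("scaling for nodes=%d cores=%d", nodes, cores)
	l.lastNodes, l.lastCores, l.seen = nodes, cores, true
	return true
}

func main() {
	l := &scaleLogger{}
	l.observe(3, 12) // logs: first observation
	l.observe(3, 12) // silent: unchanged
	l.observe(4, 16) // logs: cluster grew
}
```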

cc @lzang

Strategic Merge Patch changes the order of the containers list

Problem

The patch is generated from a map, and Go maps are unordered, so the order of the containers list can change:

func (k *k8sClient) UpdateResources(resources map[string]apiv1.ResourceRequirements) error {
	ctrs := []interface{}{}
	// Map iteration order is not stable, so ctrs can be built in a different order on each call.
	for ctrName, res := range resources {
		ctrs = append(ctrs, map[string]interface{}{
			"name":      ctrName,
			"resources": res,
		})
	}
	// ...
}

This creates a perpetual diff, which is especially noticeable in tools that continuously monitor drift, such as Argo CD.

Potential solutions

I have thought of 4 potential solutions:

  1. Load the configuration as an ordered map using a library that supports it. It may be challenging to find one that parses JSON directly into an ordered map
  2. Add an optional configuration field to specify the order, for example:
"containerA": {
  // ...
  "order": 1
}
"containerB": {
  // ...
  "order": 2
}
  3. Get the order from the deployment. It would not require any additional configuration from the user, but would require get permission on the deployment
  4. Change the configuration from a map to a list (breaking change):
[
  {
    "name": "containerA",
    "requests": {
      "cpu": {
        "base": "10m", "step": "1m", "coresPerStep": 1
      },
      "memory": {
        "base": "8Mi", "step": "1Mi", "coresPerStep": 1
      }
    }
  },
  {
    "name": "containerB",
    "requests": {
      "cpu": {
        "base": "250m", "step": "100m", "coresPerStep": 10
      }
    }
  }
]
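
The third option (taking the order from the deployment) could look roughly like the sketch below. It assumes the container names have already been read from the workload in their existing order; `orderedPatch` and the simplified `resources` type are hypothetical and only stand in for the real `apiv1.ResourceRequirements` handling.

```go
package main

import "fmt"

// resources is a simplified, hypothetical stand-in for
// apiv1.ResourceRequirements.
type resources struct {
	CPU, Memory string
}

// orderedPatch builds the containers list of a patch in the order the
// containers already have in the workload (existing), instead of in map
// iteration order. Containers without an entry in want are skipped.
func orderedPatch(existing []string, want map[string]resources) []map[string]interface{} {
	ctrs := make([]map[string]interface{}, 0, len(want))
	for _, name := range existing {
		res, ok := want[name]
		if !ok {
			continue
		}
		ctrs = append(ctrs, map[string]interface{}{
			"name":      name,
			"resources": res,
		})
	}
	return ctrs
}

func main() {
	existing := []string{"containerB", "containerA"}
	want := map[string]resources{
		"containerA": {CPU: "10m", Memory: "8Mi"},
		"containerB": {CPU: "250m"},
	}
	for _, c := range orderedPatch(existing, want) {
		fmt.Println(c["name"])
	}
}
```

Because the loop walks `existing` rather than the map, repeated calls produce the same list order, at the cost of one extra read of the workload.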

Additional information

This is similar to the issue faced here: kubernetes/kubernetes#62830
