hjacobs / kube-downscaler
Scale down Kubernetes deployments after work hours
Home Page: https://hub.docker.com/r/hjacobs/kube-downscaler
License: GNU General Public License v3.0
It would be nice if an event was generated each time a deployment was scaled up/down.
The downscaler needs read and write access to the Kubernetes API (deployment resources). Document the needed configuration.
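A minimal sketch of what that documentation could include; the role name and resource lists here are assumptions for illustration, not the project's published manifest:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-downscaler
rules:
  # write access to the scaled workloads
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "update", "patch"]
  # read access for force-uptime pod checks and namespace annotations
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list"]
```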
kube-downscaler/kube_downscaler/scaler.py
Line 53 in 1d5da01
In case uptime and downtime overlap, this line will evaluate to is_uptime = false, resulting in the replica sets being scaled down.
I wonder if this is expected/wanted behavior. In case of doubt (e.g. conflicting configuration) I'd rather keep a replica set up and running instead of shutting it down :)
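A sketch of that preference, with hypothetical names (not the project's actual code): when uptime and downtime overlap, resolve to uptime.

```python
def resolve_uptime(matches_uptime: bool, matches_downtime: bool) -> bool:
    # On conflicting configuration (uptime and downtime overlap),
    # err on the side of keeping the workload up.
    if matches_uptime and matches_downtime:
        return True
    return matches_uptime and not matches_downtime
```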
We see some "connection refused" errors after rolling master nodes. To be investigated.
2019-06-05 09:16:53,927 INFO: Downscaler v0.14 started with debug=False, default_downtime=never, default_uptime=always, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler,postgres-operator, exclude_namespaces=kube-system,visibility, exclude_statefulsets=, grace_period=900, interval=30, kind=['deployment', 'stack', 'deployment'], namespace=None, once=False, upscale_period=never
2019-06-05 09:18:55,888 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 41, in run_loop
dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
File "/kube_downscaler/scaler.py", line 159, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
return self.session.get(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
response = self._do_send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
2019-06-05 16:12:36,328 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 41, in run_loop
dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
File "/kube_downscaler/scaler.py", line 159, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
return self.session.get(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
response = self._do_send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
When downscaler/force-uptime is set to "false" on a namespace resource, uptime is still enforced.
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
  annotations:
    downscaler/uptime: "Mon-Fri 07:30-19:30 CET"
    downscaler/force-uptime: "false"
I would expect this configuration to downscale outside the given uptime interval
This is probably me not understanding how this works, but it doesn't seem to be downscaling per my schedule.
I have put the annotation on my namespace:
kubectl -n gar get ns gar -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    downscaler/downtime: Sat-Sun 03:00-03:00 UTC
  creationTimestamp: "2019-05-06T18:54:24Z"
  labels:
    name: gar
  name: gar
  resourceVersion: "87553798"
  selfLink: /api/v1/namespaces/gar
  uid: 5da3c6c7-7030-11e9-a1fa-0ec4fa901d62
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
The date on the downscaler:
kubectl -n kube-downscaler exec kube-downscaler-6969d86595-7g8dg date
Sat Nov 9 19:38:11 UTC 2019
I would have expected no pods to be running in this namespace:
kubectl -n gar get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
tornado-fb459c7f9-hb9v2 1/1 Running 0 38h 100.104.143.20 ip-172-17-50-252.ec2.internal <none>
webapp-nginx-85fcf96f7f-h9vl4 1/2 CrashLoopBackOff 461 38h 100.98.154.12 ip-172-17-50-178.ec2.internal <none>
webapp-nginx-85fcf96f7f-whfh9 1/2 CrashLoopBackOff 253 21h 100.104.146.22 ip-172-17-52-214.ec2.internal <none>
I don't see any errors in the downscaler logs:
2019-11-09 19:36:58,025 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:36:58,032 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:36:58,033 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
2019-11-09 19:38:04,135 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:38:04,135 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:38:04,140 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:38:04,140 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
2019-11-09 19:39:10,430 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:39:10,430 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:39:10,433 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:39:10,434 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
kube-downscaler pod:
Containers:
  kube-downscaler:
    Container ID:   docker://ceb7b61de9d170a0e39d04697b8fc2887902b8a6a0b444e65d39496e48982178
    Image:          hjacobs/kube-downscaler:19.10.1
    Image ID:       docker-pullable://hjacobs/kube-downscaler@sha256:5f7d1e7fa9b58ac8af5e6685d725a549bf75ec58091d064afcc98acf03bc2510
    Port:           <none>
    Host Port:      <none>
    Args:
      --interval=60
      --debug
      --debug
    State:          Running
      Started:      Thu, 07 Nov 2019 14:01:49 -0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 02 Nov 2019 11:15:41 -0700
      Finished:     Thu, 07 Nov 2019 14:00:59 -0800
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     50m
      memory:  200Mi
    Requests:
      cpu:     50m
      memory:  200Mi
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-downscaler-token-zg95h (ro)
Conditions:
Am I doing something wrong here? I would expect it to downscale all the deployments in the namespace.
The proprietary sumologic.com annotations should be removed from the Helm chart.
When scaling back up during uptime, does it remember each deployment's replica count from before the scaledown?
In addition to the supported weekday+time scaling schedules, kube-downscaler should support absolute time ranges for uptime and downtime periods. I imagine these would be specified as either a range of numeric unix timestamps or a string format like ISO-8601.
For example, downscaler/uptime: "2019-10-03T14:00:00+00:00-2019-10-04T02:00:00+00:00"
would specify scaling for precisely that period from 2pm UTC October 3 to 2am UTC October 4.
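One way the ISO-8601 variant could be parsed (a sketch with hypothetical function names; it assumes both timestamps use the fixed-width form YYYY-MM-DDTHH:MM:SS+HH:MM, since splitting on "-" alone is ambiguous):

```python
import datetime


def parse_absolute_range(spec: str):
    # The first timestamp is exactly 25 characters wide, so split there
    # instead of searching for "-", which also appears inside dates.
    start_str, sep, end_str = spec[:25], spec[25:26], spec[26:]
    if sep != "-":
        raise ValueError(f"Invalid absolute time range: {spec!r}")
    return (datetime.datetime.fromisoformat(start_str),
            datetime.datetime.fromisoformat(end_str))


def matches_absolute_range(now: datetime.datetime, spec: str) -> bool:
    start, end = parse_absolute_range(spec)
    return start <= now <= end
```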
If no period value is defined at the namespace level or earlier, the annotation on the deployment is not taken into consideration. The following statement is never satisfied:
elif upscale_period != 'never' or downscale_period != 'never':
quick fix:
def autoscale_resource(resource: pykube.objects.NamespacedAPIObject, upscale_period: str, downscale_period: str,
                       default_uptime: str, default_downtime: str, forced_uptime: bool, dry_run: bool,
                       now: datetime.datetime, grace_period: int, downtime_replicas: int, namespace_excluded=False):
    try:
        exclude = namespace_excluded or ignore_resource(resource)
        original_replicas = resource.annotations.get(ORIGINAL_REPLICAS_ANNOTATION)
        downtime_replicas = int(resource.annotations.get(DOWNTIME_REPLICAS_ANNOTATION, downtime_replicas))
        if exclude and not original_replicas:
            logger.debug('%s %s/%s was excluded', resource.kind, resource.namespace, resource.name)
        else:
            replicas = resource.replicas
            ignore = False
            upscale_period = resource.annotations.get(UPSCALE_PERIOD_ANNOTATION, upscale_period)
            downscale_period = resource.annotations.get(DOWNSCALE_PERIOD_ANNOTATION, downscale_period)
            if forced_uptime or (exclude and original_replicas):
                uptime = "forced"
                downtime = "ignored"
                is_uptime = True
            elif upscale_period != 'never' or downscale_period != 'never':
                uptime = upscale_period
                downtime = downscale_period
                if helper.matches_time_spec(now, upscale_period) and helper.matches_time_spec(now, downscale_period):
                    logger.debug('Upscale and downscale periods overlap, do nothing')
                    ignore = True
                elif helper.matches_time_spec(now, upscale_period):
                    is_uptime = True
                elif helper.matches_time_spec(now, downscale_period):
                    is_uptime = False
                else:
                    ignore = True
            else:
                uptime = resource.annotations.get(UPTIME_ANNOTATION, default_uptime)
                downtime = resource.annotations.get(DOWNTIME_ANNOTATION, default_downtime)
                is_uptime = helper.matches_time_spec(now, uptime) and not helper.matches_time_spec(now, downtime)

            update_needed = False
            if not ignore and is_uptime and replicas == downtime_replicas and original_replicas and int(original_replicas) > 0:
                logger.info('Scaling up %s %s/%s from %s to %s replicas (uptime: %s, downtime: %s)',
                            resource.kind, resource.namespace, resource.name, replicas, original_replicas,
                            uptime, downtime)
                resource.replicas = int(original_replicas)
                resource.annotations[ORIGINAL_REPLICAS_ANNOTATION] = None
                update_needed = True
            elif not ignore and not is_uptime and replicas > 0 and replicas > int(downtime_replicas):
                target_replicas = int(resource.annotations.get(DOWNTIME_REPLICAS_ANNOTATION, downtime_replicas))
                if within_grace_period(resource, grace_period, now):
                    logger.info('%s %s/%s within grace period (%ds), not scaling down (yet)',
                                resource.kind, resource.namespace, resource.name, grace_period)
                else:
                    logger.info('Scaling down %s %s/%s from %s to %s replicas (uptime: %s, downtime: %s)',
                                resource.kind, resource.namespace, resource.name, replicas, target_replicas,
                                uptime, downtime)
                    resource.annotations[ORIGINAL_REPLICAS_ANNOTATION] = str(replicas)
                    resource.replicas = target_replicas
                    update_needed = True
            if update_needed:
                if dry_run:
                    logger.info('**DRY-RUN**: would update %s %s/%s', resource.kind, resource.namespace, resource.name)
                else:
                    resource.update()
    except Exception as e:
        logger.exception('Failed to process %s %s/%s: %s', resource.kind, resource.namespace, resource.name, str(e))
The code sets startingDeadlineSeconds to zero:
kube-downscaler/kube_downscaler/scaler.py
Line 226 in 80aee07
This will break CronJobs. Kubernetes docs: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
See also #99
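One possible fix, sketched here on a plain CronJob manifest dict (the helper name is hypothetical): use the CronJob's native spec.suspend flag and leave startingDeadlineSeconds untouched.

```python
def suspend_cronjob(obj: dict, suspend: bool = True) -> dict:
    # Toggle the CronJob's own suspend flag instead of forcing
    # startingDeadlineSeconds to 0; per the Kubernetes docs, deadlines
    # under ~10 seconds may prevent jobs from ever being scheduled.
    obj.setdefault("spec", {})["suspend"] = suspend
    return obj
```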
Would you accept it if I started a PR on https://github.com/helm/charts to move the chart from your local repository to the official one?
This would be more convenient for our deploy flow and also give your project more visibility.
I'm trying to configure multiple kinds via the args but couldn't find a way. Is it feasible? Thanks!
Only interval, namespace and debug are configurable in the current Helm chart, while more command line options are available. It would be great if all command line options were supported in the Helm chart.
Hi,
We have created an ephemeral instance concept for our developers. If a Demand or Service Manager wants to test a new feature, we can deploy an ephemeral instance of our architecture, including some or all microservices. You can also specify whether this environment should be deleted after some time or remain available indefinitely.
For the unlimited instances I'm looking for a feature to scale this environment down.
The complete ephemeral instance is deployed in one namespace, so it would be great to have an annotation on our namespace defining that all deployments or statefulsets are downscaled during non-business hours.
Looking at the source code, it seems you only support annotating individual statefulsets and deployments, not all statefulsets and deployments in a namespace.
What do you think?
Best regards,
Björn
First, thanks for the project, it has helped us a lot.
It would be nice to be able to specify a "--default-original-replicas" flag. This way, if a new pod is added and someone forgets to add the "original-replicas" annotation, it will be recreated anyway.
The 0.8 release added namespace annotation support for uptime/downtime but didn't add namespaces to the ClusterRole :)
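For reference, the missing rule would look something like this (a sketch, not the chart's actual manifest):

```yaml
# additional ClusterRole rule: read access to namespaces
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
```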
Would be nice to have an option or an annotation to exclude certain deployments that were recently created.
This is useful for running end-to-end tests on weekends. (System under test is deployed right before the tests run, so setting this annotation to something like 1 hour will give some time and then stop it when the tests are likely to be over).
It would be great if it were possible to suspend cronjobs based on an annotation on the cronjob entity or inherited from the namespace entity.
(I'll try to contribute this feature as a PR if this sounds useful to others...)
I want to understand how this will work with the Horizontal Pod Autoscaler, as we already have it deployed in our environment. From what I understand, this changes the deployment's replica count, and the HPA also works with min and max replica counts, so won't the autoscaler override the changes made by the downscaler at any time? Is there a way to make it work with the autoscaler, or are these supposed to be mutually exclusive?
When using the namespace.active_in value with the Helm chart, the pod starts with the namespace wrapped in quotes ("namespace_name"), and these quotes are not handled in the code.
Install Helm chart
helm install kube-downscaler incubator/kube-downscaler --version 0.4.0 --namespace kube-system --values ../tmp/incubator_kube-downscaler/0.4.0/incubator-kube-downscaler-profile.yaml
Use this value file
../tmp/incubator_kube-downscaler/0.4.0/incubator-kube-downscaler-profile.yaml
namespace:
  active_in: dev
replicaCount: 1
image:
  tag: 20.2.0
debug:
  enable: True
Check deployment
k get deploy -n kube-system kube-downscaler -o yaml
...
template:
  metadata:
    creationTimestamp: null
    labels:
      app.kubernetes.io/instance: kube-downscaler
      app.kubernetes.io/name: kube-downscaler
  spec:
    containers:
    - args:
      - --interval=60
      - --namespace="dev"
      - --debug
      image: hjacobs/kube-downscaler:20.2.0
      imagePullPolicy: IfNotPresent
      name: kube-downscaler
      resources: {}
...
Error in logs: /%22dev%22/
k logs --follow -n kube-system kube-downscaler-68896fc54b-8f4vd
2020-03-20 14:11:40,145 DEBUG: Starting new HTTPS connection (1): 10.100.0.1
2020-03-20 14:11:40,153 DEBUG: https://10.100.0.1:443 "GET /api/v1/namespaces/%22dev%22/pods HTTP/1.1" 200 134
2020-03-20 14:11:40,159 DEBUG: https://10.100.0.1:443 "GET /apis/apps/v1/namespaces/%22dev%22/deployments HTTP/1.1" 200 159
If I edit the deployment
k edit deploy -n kube-system kube-downscaler
...
template:
  metadata:
    creationTimestamp: null
    labels:
      app.kubernetes.io/instance: kube-downscaler
      app.kubernetes.io/name: kube-downscaler
  spec:
    containers:
    - args:
      - --interval=60
      - --namespace=dev
      - --debug
      image: hjacobs/kube-downscaler:20.2.0
      imagePullPolicy: IfNotPresent
      name: kube-downscaler
      resources: {}
...
Now it works: /apis/apps/v1/namespaces/dev/deployments HTTP/1.1" 200 None
k logs --follow -n kube-system kube-downscaler-7c7bdd79bd-w7jfz
2020-03-20 14:17:38,029 DEBUG: Starting new HTTPS connection (1): 10.100.0.1
2020-03-20 14:17:38,044 DEBUG: https://10.100.0.1:443 "GET /api/v1/namespaces/dev/pods HTTP/1.1" 200 None
2020-03-20 14:17:38,052 DEBUG: https://10.100.0.1:443 "GET /apis/apps/v1/namespaces/dev/deployments HTTP/1.1" 200 None
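A small sketch of the quote handling the chart or the downscaler could apply (hypothetical helper, not existing code):

```python
def normalize_namespace(value: str) -> str:
    # Strip surrounding quotes that can leak through from Helm values,
    # e.g. --namespace="dev" arriving as the literal string '"dev"'.
    return value.strip().strip('"').strip("'")
```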
With GKE preemptible nodes every node gets restarted at least once in 24h.
If I set up downscaler like
tag: 0.15
args:
- --default-uptime=Mon-Fri 07:00-19:00 Europe/Berlin
- --downtime-replicas=1
it scales down at the end of the day and remembers the original replicas, but when the downscaler pod restarts (with its node) this is somehow forgotten, resulting in everything stuck at 1 replica even when uptime comes.
I could reproduce this by simply restarting the downscaler with a different uptime setting (forcing it to upscale) after a downscale happened, and experienced the same behaviour.
Do you have any idea how to get this to work?
In the README it says:
Time definitions (e.g. DEFAULT_UPTIME) accept a comma separated list of specifications
but when given a comma-separated list of specifications, I get the following error:
2020-04-10 08:40:36,720 ERROR: Failed to process StatefulSet test/foo : Time spec value "Mon-Thur 09:00-20:00 Europe/London,Fri-Fri 10:00-18:00 Europe/London" does not match format ("Mon-Fri 06:30-20:30 Europe/Berlin" or"2019-01-01T00:00:00+00:00-2019-01-02T12:34:56+00:00")
Traceback (most recent call last):
File "/kube_downscaler/scaler.py", line 129, in autoscale_resource
is_uptime = helper.matches_time_spec(
File "/kube_downscaler/helper.py", line 36, in matches_time_spec
raise ValueError(
ValueError: Time spec value "Mon-Thur 09:00-20:00 Europe/London,Fri-Fri 10:00-18:00 Europe/London" does not match format ("Mon-Fri 06:30-20:30 Europe/Berlin" or"2019-01-01T00:00:00+00:00-2019-01-02T12:34:56+00:00")
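If the single-spec matcher does not handle lists itself, the comma handling could be layered on top like this (a sketch; matches_single stands in for the project's helper.matches_time_spec):

```python
def matches_any_time_spec(now, spec: str, matches_single) -> bool:
    # Split the comma-separated list and accept if any part matches.
    return any(matches_single(now, part.strip()) for part in spec.split(","))
```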
Hi
I want to use this with an HPA (Horizontal Pod Autoscaler) deployment, but judging by this code, I don't think it will work.
To make this work, I think it should update spec.minReplicas and spec.maxReplicas to 0.
Can you implement that?
Thanks.
We run Kubernetes 1.9 with the v1beta2 API. We installed with RBAC, etc., but the system is trying to call v1beta1 and gets access denied.
args.exclude_deployments.split(','), dry_run=args.dry_run)
File "/kube_downscaler/main.py", line 63, in autoscale
for deploy in deployments:
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 133, in iter
return iter(self.query_cache["objects"])
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 123, in query_cache
cache["response"] = self.execute().json()
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 108, in execute
r.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://100.64.0.1:443/apis/extensions/v1beta1/deployments
Is there anything we can change?
Hi,
first of all, thank you for your project!
I've tried kube-downscaler in a nearly empty cluster. Here are the logs:
2019-03-12 12:52:11,615 INFO: Downscaler v0.9 started with debug=False, default_downtime=never, default_uptime=Mon-Fri 07:30-20:00 Europe/Berlin, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=kube-system, exclude_statefulsets=, grace_period=900, interval=60, kind=['deployment', 'deployment', 'statefulset'], namespace=None, once=False
2019-03-12 19:00:23,032 INFO: Scaling down Deployment authelia/authelia-app from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,045 INFO: Scaling down Deployment authelia/authelia-redis-slave from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,060 INFO: Scaling down Deployment cattle-system/cattle-cluster-agent from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,080 INFO: Scaling down Deployment logging/elasticsearch-client from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,103 INFO: Scaling down Deployment logging/elasticsearch-exporter from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,124 INFO: Scaling down Deployment logging/kibana from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,144 INFO: Scaling down Deployment logging/laas-metricbeat from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,161 INFO: Scaling down Deployment monitoring/kube-state-metrics from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,191 INFO: Scaling down StatefulSet authelia/authelia-redis-master from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,212 INFO: Scaling down StatefulSet authelia/mongo from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,231 INFO: Scaling down StatefulSet logging/elasticsearch-data from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,253 INFO: Scaling down StatefulSet logging/elasticsearch-master from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,271 INFO: Scaling down StatefulSet monitoring/alertmanager from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,290 INFO: Scaling down StatefulSet monitoring/grafana from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,310 INFO: Scaling down StatefulSet monitoring/prometheus from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,204 INFO: Scaling up Deployment authelia/authelia-app from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,216 INFO: Scaling up Deployment authelia/authelia-redis-slave from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,228 INFO: Scaling up Deployment cattle-system/cattle-cluster-agent from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,239 INFO: Scaling up Deployment logging/elasticsearch-client from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,252 INFO: Scaling up Deployment logging/elasticsearch-exporter from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,268 INFO: Scaling up Deployment logging/kibana from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,286 INFO: Scaling up Deployment logging/laas-metricbeat from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,299 INFO: Scaling up Deployment monitoring/kube-state-metrics from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,345 INFO: Scaling up StatefulSet monitoring/alertmanager from 0 to 3 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,358 INFO: Scaling up StatefulSet monitoring/grafana from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,372 INFO: Scaling up StatefulSet monitoring/prometheus from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
As you can see: 15 x "Scaling down", but only 11 x "Scaling up". I have tried it several times; on each try these statefulsets were not scaled up:
We use kube-downscaler version 0.9. (Kubernetes v1.12.6)
Any idea why?
My understanding of the documentation is that by annotating a deployment with downscaler/downscale-period="Mon-Sun 19:00-19:30 UTC", it should scale down to zero and never scale back up again. In practice, the downscaler simply ignores this deployment, as if it had no annotation at all. Am I misunderstanding the documentation?
Additionally, it would be extremely helpful if the Usage section had a couple of examples for each command line option and annotation, and a note on their limitations. For example, I assumed that the --namespace option could take more than one namespace as a value, since the default is all namespaces, but I can't find a way of formatting a list of namespaces that it understands, making me think it's either all or just one, like kubectl commands. I would be more than happy to write that extra documentation as soon as I fully understand it myself.
Hi,
is there a reason why there is only a --namespace argument that allows a single namespace to be specified, but no --namespaces argument that would allow multiple namespaces? Instead, one has to use the --exclude-namespaces argument and make sure that every new namespace that shouldn't be monitored is added to the exclusion list.
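A hypothetical --namespaces flag could be as simple as a comma-splitting argparse type (a sketch, not the actual CLI):

```python
import argparse

parser = argparse.ArgumentParser()
# hypothetical flag: a comma-separated list instead of a single --namespace
parser.add_argument("--namespaces",
                    type=lambda s: [ns for ns in s.split(",") if ns],
                    default=[])
args = parser.parse_args(["--namespaces", "dev,staging"])
```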
The existing annotation downscaler/exclude only accepts boolean true/false. It's sometimes useful to specify an absolute end time, e.g.:
The new annotation should accept a timestamp in one of the following formats:
- 2020-04-05T20:59:00Z (same format as creationTimestamp)
- 2020-04-05T20:59 (short version)
- 2020-04-05 20:59 (short version, space instead of "T")
- 2020-04-05 (only date, assumes 00:00 UTC)
(only date, assumes 00:00 UTC)Why support multiple date/time formats? The annotation will most probably be set by a human to keep a deployment scaled up, so it should accept common ISO formats without the user having to look up the exact format.
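A parser for the proposed annotation value could be sketched like this. This is a minimal sketch only; the function name parse_exclude_until and the format list are assumptions for illustration, not kube-downscaler's actual code:

```python
from datetime import datetime, timezone

# Formats from most to least specific, matching the list above.
TIMESTAMP_FORMATS = [
    "%Y-%m-%dT%H:%M:%SZ",  # same format as creationTimestamp
    "%Y-%m-%dT%H:%M",      # short version
    "%Y-%m-%d %H:%M",      # short version, space instead of "T"
    "%Y-%m-%d",            # only date, assumes 00:00 UTC
]

def parse_exclude_until(value: str) -> datetime:
    """Try each accepted format in turn; all values are interpreted as UTC."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"time data '{value}' does not match any supported format")
```

Trying the formats in order of decreasing specificity means a date-only value like 2020-04-05 falls through to the last format and gets 00:00 UTC.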
I've read the docs and still can't judge what the correct configuration for my use case is (if it's supported at all).
What I need is to simply downscale the cluster every day at 17:00 and that's it. I don't need to bring it back up automatically. Is this use case supported?
Zalando's StackSetController supports HPA (https://github.com/zalando-incubator/stackset-controller/blob/2baddca617e2b76e34976357765206280cfd382e/pkg/core/stack_resources.go#L190).
Instead of reading the replicas value, we need to support the horizontalPodAutoscaler property:
horizontalPodAutoscaler:
  maxReplicas: 4
  metadata:
    creationTimestamp: null
  metrics:
  - resource:
      name: cpu
      targetAverageUtilization: 80
    type: Resource
  - resource:
      name: memory
      targetAverageUtilization: 80
    type: Resource
  minReplicas: 2
It would be nice if, instead of always going to 0 replicas, the target could be parameterized and set to 1 or 2; for big clusters it makes sense not to kill all pods, so the service keeps running.
Not really a bug, but not nice: the resource's namespace is retrieved from the Kubernetes API again for every single resource. This is clearly visible with --debug logging:
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,760 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,760 DEBUG: Deployment default/xxpi has 0 replicas (original: 3, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,777 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,777 DEBUG: Deployment default/foo has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,792 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,792 DEBUG: Deployment default/bar has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,808 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,808 DEBUG: Deployment default/demo has 0 replicas (original: 2, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,823 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,823 DEBUG: Deployment default/example has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
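One way to avoid the repeated GETs above would be a per-run cache, so each namespace is fetched from the API only once per scan. This is a hypothetical sketch of the suggested change, not existing kube-downscaler code; fetch would wrap the actual pykube lookup:

```python
def make_namespace_getter(fetch):
    """Return a getter that memoizes namespace lookups for one scan.

    `fetch` is a callable taking a namespace name and returning the
    namespace object (e.g. a wrapper around the pykube API call).
    """
    cache = {}

    def get_namespace(name):
        if name not in cache:
            cache[name] = fetch(name)  # only the first lookup hits the API
        return cache[name]

    return get_namespace
```

Rebuilding the cache at the start of each scan loop keeps the data reasonably fresh while cutting the per-resource API calls shown in the debug log to one per namespace.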
I have a Deployment with a HorizontalPodAutoscaler. In this scenario kube-downscaler changes the replicas to 0, but the HPA changes them back to the original value; this generates a "loop" and the Deployment is never downscaled.
Do I need to add or change some configuration to make this work?
When I set downtime-replicas=2 on a StatefulSet:
Should I use another annotation? Original replicas?
As I mentioned earlier, I'd like to use a different scenario. I need to downscale the replicas but not upscale them later, so I'm suggesting a different logic, which could perhaps coexist with the existing one.
I suggest two additional parameters: upscale_period and downscale_period. You can set a period of time (for example, in my case) only for downtime.
upscale_period: never
downscale_period: "Mon-Fri 19:00-19:10 UTC"
This means that if a resource is up between 19:00 and 19:10, it will be downscaled.
(The same logic would apply to upscaling.)
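The period matching described above could be sketched as follows. This is an illustrative toy only, not kube-downscaler's actual time-spec parser; the hypothetical in_period helper supports a single spec in UTC:

```python
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def in_period(spec: str, now: datetime) -> bool:
    """Check whether `now` falls inside a spec like "Mon-Fri 19:00-19:10 UTC"."""
    days, times, tz = spec.split()
    assert tz == "UTC"  # sketch supports UTC only
    start_day, end_day = (WEEKDAYS.index(d) for d in days.split("-"))
    start, end = times.split("-")
    h1, m1 = map(int, start.split(":"))
    h2, m2 = map(int, end.split(":"))
    if not (start_day <= now.weekday() <= end_day):
        return False
    minutes = now.hour * 60 + now.minute
    return h1 * 60 + m1 <= minutes < h2 * 60 + m2
```

With downscale_period set to such a spec and upscale_period set to "never", the downscaler would only ever act inside the 19:00-19:10 window, which matches the use case above.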
Pros:
Could you implement this (or any other method) so that I can only downscale my deployments?
Thanks
I think the downscaler has this bug:
https://bugs.python.org/issue16399
If I set the kind as:
- --kind=statefulset
I get this in the logs:
2019-06-03 06:09:53,137 INFO: Downscaler v0.15 started with debug=True, default_downtime=never, default_uptime=<redacted>, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=<redacted>, exclude_statefulsets=, grace_period=900, interval=60, kind=['deployment', 'statefulset'], namespace=None, once=False, upscale_period=never
It appends statefulset to the default deployment instead of overriding it.
Maybe use a different strategy for this flag? Instead of multiple flags, accept a comma-separated string and split it into a list?
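The argparse behavior referenced above (https://bugs.python.org/issue16399) can be reproduced in isolation: with action="append" and a non-empty default, user-supplied values are appended to the default list instead of replacing it.

```python
import argparse

parser = argparse.ArgumentParser()
# A non-empty default combined with action="append" triggers the bug:
parser.add_argument("--kind", action="append", default=["deployment"])
args = parser.parse_args(["--kind=statefulset"])
print(args.kind)  # ['deployment', 'statefulset'] -- the default is not overridden
```

A common workaround is default=None with a fallback to ["deployment"] after parsing, so user-supplied values start from an empty list.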
Good afternoon, yesterday I created a PR to adjust the regex -> #16, and I am waiting for a build of the new image with this change applied. Could you run this build and push the image to Docker Hub?
The downscaler fails to add ORIGINAL_REPLICAS_ANNOTATION if there are no other annotations in the deployment/statefulset.
Right now the docs state that "i.e. updated deployments will immediately be scaled down regardless of the grace period."
It would be great if an updated deployment were scaled up for a grace period. This would allow us to use it in our CI/CD pipelines. Right now they fail when a developer commits something at night, because the health check does not pass after a deployment.
kube-downscaler/helm-chart/values.yaml
Line 26 in 78007f0
Right now, the only way to determine whether a Deployment or StatefulSet should be downscaled is to use an "exclude" annotation.
It would be great if there were also a downscale/include: true mechanism to allow opting in to downscaling.
Hi,
Is it possible to add a flag to ignore SSL validation? I'm getting a ton of these:
2019-02-27 21:43:03,579 ERROR: Certificate did not match expected hostname: 10.157.0.1. Certificate: {'subject': ((('countryName', 'US'),), (('stateOrProvinceName', 'Georgia'),), (('localityName', 'Alpharetta'),), (('organizationName', 'kubernetes'),), (('organizationalUnitName', 'at4d-c2'),), (('commonName', '*.acme.dev'),)), 'issuer': ((('countryName', 'US'),), (('stateOrProvinceName', 'Georgia'),), (('localityName', 'Alpharetta'),), (('organizationName', 'acme Technologies'),), (('organizationalUnitName', 'IT'),), (('commonName', '*.acme.com'),)), 'version': 3, 'serialNumber': '7BB0875690FE8FF14EA324BCC3D0C2353BBBE4F4', 'notBefore': 'Apr 3 20:29:00 2018 GMT', 'notAfter': 'Apr 2 20:29:00 2023 GMT', 'subjectAltName': (('DNS', 'kubernetes.default'), ('DNS', 'kubernetes.default.svc'), ('DNS', 'at4d-lvkc2m03'), ('DNS', 'at4d-lvkc2m03.acme.dev'), ('DNS', 'at4d-lvkc2m03.acme.com'), ('DNS', '*.acme.dev'), ('DNS', '*.acme.com'))}
2019-02-27 21:43:03,580 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 364, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 374, in _match_hostname
match_hostname(cert, asserted_hostname)
File "/usr/local/lib/python3.7/ssl.py", line 323, in match_hostname
% (hostname, ', '.join(map(repr, dnsnames))))
ssl.SSLCertVerificationError: ("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 40, in run_loop
dry_run=dry_run, grace_period=grace_period)
File "/kube_downscaler/scaler.py", line 119, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 28, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 133, in iter
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 123, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 107, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 127, in get
return self.session.get(args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
Sometimes, after our cluster has downscaled for the evening, a developer would like to scale something back up temporarily to test. At the moment, this involves adding the exclude annotation, then manually triggering a scale-up with kubectl scale, then removing the exclude annotation to scale back down.
It would be great if the downscaler automatically recognized when something has an original-replicas annotation but is also marked as exclude, or is in a namespace which is excluded, and scaled it back up to its original replicas.
(I may be able to attempt a PR for this sometime soon, but not right away)
Follow-up to #10: documentation and RBAC need to be adapted.
Hello,
First, thanks for this tool, very useful for saving costs for cloud users.
If I want to go further, is there a possibility to downscale on specific days, such as holidays?
Thanks a lot,
BR,
Jérémy
The Workloads API became stable; use the latest apps/v1 if available.
2020-01-27 14:33:05,882 INFO: Scaling down Deployment default/flask-v1-tutorial from 1 to 0 replicas (uptime: Sun-Fri 00:00-23:59 UTC, downtime: never)
This happens when uptime covers most of the day. I tried some other variations, but the deployments go down every time.
Apparently these values can't contain the colon character ":". I get an "Invalid value" error with the message below (note it refers to metadata.labels, suggesting the value was set as a label, whose values are restricted, rather than as an annotation):
metadata.labels: Invalid value: "Mon-Fri 07:00-19:00 US/Eastern": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
Kubernetes 1.11