hjacobs / kube-downscaler
Scale down Kubernetes deployments after work hours
Home Page: https://hub.docker.com/r/hjacobs/kube-downscaler
License: GNU General Public License v3.0
It would be nice if an event was generated each time a deployment was scaled up/down.
The downscaler needs read and write access to the Kubernetes API (deployment resources). Document the needed configuration.
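A minimal sketch of what that documentation could include; the role name and resource lists here are assumptions for illustration, not the project's published manifest:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-downscaler
rules:
  # write access to the scaled workloads
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "update", "patch"]
  # read access for force-uptime pod checks and namespace annotations
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list"]
```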
kube-downscaler/kube_downscaler/scaler.py
Line 53 in 1d5da01
In case uptime and downtime overlap, this line will evaluate to is_uptime = false, resulting in the replica sets being scaled down.
I wonder if this is expected/wanted behavior. In case of doubt (e.g. conflicting configuration) I'd rather keep a replica set up and running instead of shutting it down :)
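A sketch of that preference, with hypothetical names (not the project's actual code): when uptime and downtime overlap, resolve to uptime.

```python
def resolve_uptime(matches_uptime: bool, matches_downtime: bool) -> bool:
    # On conflicting configuration (uptime and downtime overlap),
    # err on the side of keeping the workload up.
    if matches_uptime and matches_downtime:
        return True
    return matches_uptime and not matches_downtime
```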
We see some "connection refused" errors after rolling master nodes. To be investigated.
2019-06-05 09:16:53,927 INFO: Downscaler v0.14 started with debug=False, default_downtime=never, default_uptime=always, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler,postgres-operator, exclude_namespaces=kube-system,visibility, exclude_statefulsets=, grace_period=900, interval=30, kind=['deployment', 'stack', 'deployment'], namespace=None, once=False, upscale_period=never
2019-06-05 09:18:55,888 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 41, in run_loop
dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
File "/kube_downscaler/scaler.py", line 159, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
return self.session.get(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
response = self._do_send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd860>: Failed to establish a new connection: [Errno 111] Connection refused'))
2019-06-05 16:12:36,328 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 41, in run_loop
dry_run=dry_run, grace_period=grace_period, downtime_replicas=downtime_replicas)
File "/kube_downscaler/scaler.py", line 159, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 29, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 148, in __iter__
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 138, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 122, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 267, in get
return self.session.get(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 133, in send
response = self._do_send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.3.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2a70ecd160>: Failed to establish a new connection: [Errno 111] Connection refused'))
When downscaler/force-uptime is set to "false" on a namespace resource, uptime is still enforced.
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
  annotations:
    downscaler/uptime: "Mon-Fri 07:30-19:30 CET"
    downscaler/force-uptime: "false"
I would expect this configuration to downscale outside the given uptime interval
This is probably me not understanding how this works, but it doesn't seem to be downscaling per my schedule.
I have put the annotation on my namespace:
kubectl -n gar get ns gar -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    downscaler/downtime: Sat-Sun 03:00-03:00 UTC
  creationTimestamp: "2019-05-06T18:54:24Z"
  labels:
    name: gar
  name: gar
  resourceVersion: "87553798"
  selfLink: /api/v1/namespaces/gar
  uid: 5da3c6c7-7030-11e9-a1fa-0ec4fa901d62
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
The date on the downscaler:
kubectl -n kube-downscaler exec kube-downscaler-6969d86595-7g8dg date
Sat Nov 9 19:38:11 UTC 2019
I would have expected no pods to be running in this namespace:
kubectl -n gar get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
tornado-fb459c7f9-hb9v2 1/1 Running 0 38h 100.104.143.20 ip-172-17-50-252.ec2.internal <none>
webapp-nginx-85fcf96f7f-h9vl4 1/2 CrashLoopBackOff 461 38h 100.98.154.12 ip-172-17-50-178.ec2.internal <none>
webapp-nginx-85fcf96f7f-whfh9 1/2 CrashLoopBackOff 253 21h 100.104.146.22 ip-172-17-52-214.ec2.internal <none>
I don't see any errors in the downscaler logs:
2019-11-09 19:36:58,025 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:36:58,032 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:36:58,033 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
2019-11-09 19:38:04,135 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:38:04,135 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:38:04,140 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:38:04,140 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
2019-11-09 19:39:10,430 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:39:10,430 DEBUG: Deployment gar/tornado has 1 replicas (original: None, uptime: always)
2019-11-09 19:39:10,433 DEBUG: https://100.64.0.1:443 "GET /api/v1/namespaces/gar HTTP/1.1" 200 371
2019-11-09 19:39:10,434 DEBUG: Deployment gar/webapp-nginx has 2 replicas (original: None, uptime: always)
kube-downscaler pod:
Containers:
  kube-downscaler:
    Container ID:   docker://ceb7b61de9d170a0e39d04697b8fc2887902b8a6a0b444e65d39496e48982178
    Image:          hjacobs/kube-downscaler:19.10.1
    Image ID:       docker-pullable://hjacobs/kube-downscaler@sha256:5f7d1e7fa9b58ac8af5e6685d725a549bf75ec58091d064afcc98acf03bc2510
    Port:           <none>
    Host Port:      <none>
    Args:
      --interval=60
      --debug
      --debug
    State:          Running
      Started:      Thu, 07 Nov 2019 14:01:49 -0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 02 Nov 2019 11:15:41 -0700
      Finished:     Thu, 07 Nov 2019 14:00:59 -0800
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     50m
      memory:  200Mi
    Requests:
      cpu:     50m
      memory:  200Mi
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-downscaler-token-zg95h (ro)
Conditions:
Am I doing something wrong here? I would expect it to downscale all the deployments in the namespace.
The proprietary sumologic.com annotations should be removed from the Helm chart.
When scaling back up during uptime, does it remember each deployment's replica count from before the scaledown?
In addition to the supported weekday+time scaling schedules, kube-downscaler should support absolute time ranges for uptime and downtime periods. I imagine these would be specified as either a range of numeric unix timestamps or a string format like ISO-8601.
For example, downscaler/uptime: "2019-10-03T14:00:00+00:00-2019-10-04T02:00:00+00:00"
would specify scaling for precisely that period from 2pm UTC October 3 to 2am UTC October 4.
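One way the ISO-8601 variant could be parsed (a sketch with hypothetical function names; it assumes both timestamps use the fixed-width form YYYY-MM-DDTHH:MM:SS+HH:MM, since splitting on "-" alone is ambiguous):

```python
import datetime


def parse_absolute_range(spec: str):
    # The first timestamp is exactly 25 characters wide, so split there
    # instead of searching for "-", which also appears inside dates.
    start_str, sep, end_str = spec[:25], spec[25:26], spec[26:]
    if sep != "-":
        raise ValueError(f"Invalid absolute time range: {spec!r}")
    return (datetime.datetime.fromisoformat(start_str),
            datetime.datetime.fromisoformat(end_str))


def matches_absolute_range(now: datetime.datetime, spec: str) -> bool:
    start, end = parse_absolute_range(spec)
    return start <= now <= end
```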
If no period value is defined at the namespace level or earlier, the annotation on the deployment is not taken into consideration. The following statement is never satisfied:
elif upscale_period != 'never' or downscale_period != 'never':
quick fix:
def autoscale_resource(resource: pykube.objects.NamespacedAPIObject, upscale_period: str, downscale_period: str,
                       default_uptime: str, default_downtime: str, forced_uptime: bool, dry_run: bool,
                       now: datetime.datetime, grace_period: int, downtime_replicas: int, namespace_excluded=False):
    try:
        exclude = namespace_excluded or ignore_resource(resource)
        original_replicas = resource.annotations.get(ORIGINAL_REPLICAS_ANNOTATION)
        downtime_replicas = int(resource.annotations.get(DOWNTIME_REPLICAS_ANNOTATION, downtime_replicas))
        if exclude and not original_replicas:
            logger.debug('%s %s/%s was excluded', resource.kind, resource.namespace, resource.name)
        else:
            replicas = resource.replicas
            ignore = False
            upscale_period = resource.annotations.get(UPSCALE_PERIOD_ANNOTATION, upscale_period)
            downscale_period = resource.annotations.get(DOWNSCALE_PERIOD_ANNOTATION, downscale_period)
            if forced_uptime or (exclude and original_replicas):
                uptime = "forced"
                downtime = "ignored"
                is_uptime = True
            elif upscale_period != 'never' or downscale_period != 'never':
                uptime = upscale_period
                downtime = downscale_period
                if helper.matches_time_spec(now, upscale_period) and helper.matches_time_spec(now, downscale_period):
                    logger.debug('Upscale and downscale periods overlap, do nothing')
                    ignore = True
                elif helper.matches_time_spec(now, upscale_period):
                    is_uptime = True
                elif helper.matches_time_spec(now, downscale_period):
                    is_uptime = False
                else:
                    ignore = True
            else:
                uptime = resource.annotations.get(UPTIME_ANNOTATION, default_uptime)
                downtime = resource.annotations.get(DOWNTIME_ANNOTATION, default_downtime)
                is_uptime = helper.matches_time_spec(now, uptime) and not helper.matches_time_spec(now, downtime)

            update_needed = False
            if not ignore and is_uptime and replicas == downtime_replicas and original_replicas and int(original_replicas) > 0:
                logger.info('Scaling up %s %s/%s from %s to %s replicas (uptime: %s, downtime: %s)',
                            resource.kind, resource.namespace, resource.name, replicas, original_replicas,
                            uptime, downtime)
                resource.replicas = int(original_replicas)
                resource.annotations[ORIGINAL_REPLICAS_ANNOTATION] = None
                update_needed = True
            elif not ignore and not is_uptime and replicas > 0 and replicas > int(downtime_replicas):
                target_replicas = int(resource.annotations.get(DOWNTIME_REPLICAS_ANNOTATION, downtime_replicas))
                if within_grace_period(resource, grace_period, now):
                    logger.info('%s %s/%s within grace period (%ds), not scaling down (yet)',
                                resource.kind, resource.namespace, resource.name, grace_period)
                else:
                    logger.info('Scaling down %s %s/%s from %s to %s replicas (uptime: %s, downtime: %s)',
                                resource.kind, resource.namespace, resource.name, replicas, target_replicas,
                                uptime, downtime)
                    resource.annotations[ORIGINAL_REPLICAS_ANNOTATION] = str(replicas)
                    resource.replicas = target_replicas
                    update_needed = True
            if update_needed:
                if dry_run:
                    logger.info('**DRY-RUN**: would update %s %s/%s', resource.kind, resource.namespace, resource.name)
                else:
                    resource.update()
    except Exception as e:
        logger.exception('Failed to process %s %s/%s: %s', resource.kind, resource.namespace, resource.name, str(e))
The code sets startingDeadlineSeconds to zero:
kube-downscaler/kube_downscaler/scaler.py
Line 226 in 80aee07
This will break CronJobs. Kubernetes docs: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
See also #99
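One possible fix, sketched here on a plain CronJob manifest dict (the helper name is hypothetical): use the CronJob's native spec.suspend flag and leave startingDeadlineSeconds untouched.

```python
def suspend_cronjob(obj: dict, suspend: bool = True) -> dict:
    # Toggle the CronJob's own suspend flag instead of forcing
    # startingDeadlineSeconds to 0; per the Kubernetes docs, deadlines
    # under ~10 seconds may prevent jobs from ever being scheduled.
    obj.setdefault("spec", {})["suspend"] = suspend
    return obj
```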
Would you accept it if I started a PR on https://github.com/helm/charts to move the chart from your local repository to the official one?
This would be more convenient for our deploy flow and also give your project more visibility.
I'm trying to configure multiple kinds via the args but couldn't find a way. Is it feasible? Thanks!
Only interval, namespace and debug are configurable in the current Helm chart, while more command line options are available. It would be great if all command line options were supported in the Helm chart.
Hi,
We have created an ephemeral instance concept for our developers. If a Demand or Service Manager wants to test a new feature, we can deploy an ephemeral instance of our architecture, including some or all microservices. You can also specify whether this environment should be deleted after some time or remain available indefinitely.
For the unlimited instances I'm looking for a feature to scale this environment down.
The complete ephemeral instance is deployed in one namespace, so it would be great to have an annotation on our namespace defining that all deployments or statefulsets are downscaled during non-business hours.
Looking at the source code, it seems you only support annotating individual statefulsets and deployments, not all statefulsets and deployments in a namespace.
What do you think?
Best regards,
Björn
First, thanks for the project, it has helped us a lot.
It would be nice to be able to specify a "--default-original-replicas" flag. This way, if a new pod is added and someone forgets to add the "original-replicas" annotation, it will be recreated anyway.
The 0.8 release added namespace annotation support for uptime/downtime but didn't add namespaces to the ClusterRole :)
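For reference, the missing rule would look something like this (a sketch, not the chart's actual manifest):

```yaml
# additional ClusterRole rule: read access to namespaces
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
```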
Would be nice to have an option or an annotation to exclude certain deployments that were recently created.
This is useful for running end-to-end tests on weekends. (System under test is deployed right before the tests run, so setting this annotation to something like 1 hour will give some time and then stop it when the tests are likely to be over).
It would be great if it were possible to suspend cronjobs based on an annotation on the cronjob entity or inherited from the namespace entity.
(I'll try to contribute this feature as a PR if this sounds useful to others...)
I want to understand how this will work with the Horizontal Pod Autoscaler, as we already have it deployed in our environment. From what I understand, this changes the deployment's replica count, and the HPA also works with min and max replica counts, so won't the autoscaler override the changes made by the downscaler at any time? Is there a way to make it work with the autoscaler, or are these supposed to be mutually exclusive?
When using the namespace.active_in value with the Helm chart, the pod starts with the namespace wrapped in quotes ("namespace_name"), and these quotes are not handled in the code.
Install Helm chart
helm install kube-downscaler incubator/kube-downscaler --version 0.4.0 --namespace kube-system --values ../tmp/incubator_kube-downscaler/0.4.0/incubator-kube-downscaler-profile.yaml
Use this value file
../tmp/incubator_kube-downscaler/0.4.0/incubator-kube-downscaler-profile.yaml
namespace:
  active_in: dev
replicaCount: 1
image:
  tag: 20.2.0
debug:
  enable: True
Check deployment
k get deploy -n kube-system kube-downscaler -o yaml
...
template:
  metadata:
    creationTimestamp: null
    labels:
      app.kubernetes.io/instance: kube-downscaler
      app.kubernetes.io/name: kube-downscaler
  spec:
    containers:
    - args:
      - --interval=60
      - --namespace="dev"
      - --debug
      image: hjacobs/kube-downscaler:20.2.0
      imagePullPolicy: IfNotPresent
      name: kube-downscaler
      resources: {}
...
Error in logs: /%22dev%22/
k logs --follow -n kube-system kube-downscaler-68896fc54b-8f4vd
2020-03-20 14:11:40,145 DEBUG: Starting new HTTPS connection (1): 10.100.0.1
2020-03-20 14:11:40,153 DEBUG: https://10.100.0.1:443 "GET /api/v1/namespaces/%22dev%22/pods HTTP/1.1" 200 134
2020-03-20 14:11:40,159 DEBUG: https://10.100.0.1:443 "GET /apis/apps/v1/namespaces/%22dev%22/deployments HTTP/1.1" 200 159
If I edit the deployment
k edit deploy -n kube-system kube-downscaler
...
template:
  metadata:
    creationTimestamp: null
    labels:
      app.kubernetes.io/instance: kube-downscaler
      app.kubernetes.io/name: kube-downscaler
  spec:
    containers:
    - args:
      - --interval=60
      - --namespace=dev
      - --debug
      image: hjacobs/kube-downscaler:20.2.0
      imagePullPolicy: IfNotPresent
      name: kube-downscaler
      resources: {}
...
Now it works: /apis/apps/v1/namespaces/dev/deployments HTTP/1.1" 200 None
k logs --follow -n kube-system kube-downscaler-7c7bdd79bd-w7jfz
2020-03-20 14:17:38,029 DEBUG: Starting new HTTPS connection (1): 10.100.0.1
2020-03-20 14:17:38,044 DEBUG: https://10.100.0.1:443 "GET /api/v1/namespaces/dev/pods HTTP/1.1" 200 None
2020-03-20 14:17:38,052 DEBUG: https://10.100.0.1:443 "GET /apis/apps/v1/namespaces/dev/deployments HTTP/1.1" 200 None
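A small sketch of the quote handling the chart or the downscaler could apply (hypothetical helper, not existing code):

```python
def normalize_namespace(value: str) -> str:
    # Strip surrounding quotes that can leak through from Helm values,
    # e.g. --namespace="dev" arriving as the literal string '"dev"'.
    return value.strip().strip('"').strip("'")
```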
With GKE preemptible nodes every node gets restarted at least once in 24h.
If I set up downscaler like
tag: 0.15
args:
- --default-uptime=Mon-Fri 07:00-19:00 Europe/Berlin
- --downtime-replicas=1
it scales down at the end of the day and remembers the original replicas, but when the downscaler pod restarts (with its node) this is somehow forgotten, resulting in everything stuck at 1 replica even when uptime comes.
I could reproduce this by simply restarting the downscaler with a different uptime setting (forcing it to upscale) after a downscale happened, and experienced the same behaviour.
Do you have any idea how to get this to work?
In the README it says:
Time definitions (e.g. DEFAULT_UPTIME) accept a comma separated list of specifications
but when given a comma-separated list of specifications, I get the following error:
2020-04-10 08:40:36,720 ERROR: Failed to process StatefulSet test/foo : Time spec value "Mon-Thur 09:00-20:00 Europe/London,Fri-Fri 10:00-18:00 Europe/London" does not match format ("Mon-Fri 06:30-20:30 Europe/Berlin" or"2019-01-01T00:00:00+00:00-2019-01-02T12:34:56+00:00")
Traceback (most recent call last):
File "/kube_downscaler/scaler.py", line 129, in autoscale_resource
is_uptime = helper.matches_time_spec(
File "/kube_downscaler/helper.py", line 36, in matches_time_spec
raise ValueError(
ValueError: Time spec value "Mon-Thur 09:00-20:00 Europe/London,Fri-Fri 10:00-18:00 Europe/London" does not match format ("Mon-Fri 06:30-20:30 Europe/Berlin" or"2019-01-01T00:00:00+00:00-2019-01-02T12:34:56+00:00")
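If the single-spec matcher does not handle lists itself, the comma handling could be layered on top like this (a sketch; matches_single stands in for the project's helper.matches_time_spec):

```python
def matches_any_time_spec(now, spec: str, matches_single) -> bool:
    # Split the comma-separated list and accept if any part matches.
    return any(matches_single(now, part.strip()) for part in spec.split(","))
```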
Hi
I want to use this with an HPA (Horizontal Pod Autoscaler) deployment, but judging by this code, I don't think it will work.
To make this work, I think it should update spec.minReplicas and spec.maxReplicas to 0.
Can you implement that?
Thanks.
We run Kubernetes 1.9 with the v1beta2 API. We installed with RBAC, etc., but the system is trying to call v1beta1 and gets access denied.
args.exclude_deployments.split(','), dry_run=args.dry_run)
File "/kube_downscaler/main.py", line 63, in autoscale
for deploy in deployments:
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 133, in iter
return iter(self.query_cache["objects"])
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 123, in query_cache
cache["response"] = self.execute().json()
File "/usr/lib/python3.6/site-packages/pykube/query.py", line 108, in execute
r.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://100.64.0.1:443/apis/extensions/v1beta1/deployments
Is there anything we can change?
Hi,
first of all, thank you for your project!
I've tried kube-downscaler in a nearly empty cluster. Here are the logs:
2019-03-12 12:52:11,615 INFO: Downscaler v0.9 started with debug=False, default_downtime=never, default_uptime=Mon-Fri 07:30-20:00 Europe/Berlin, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=kube-system, exclude_statefulsets=, grace_period=900, interval=60, kind=['deployment', 'deployment', 'statefulset'], namespace=None, once=False
2019-03-12 19:00:23,032 INFO: Scaling down Deployment authelia/authelia-app from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,045 INFO: Scaling down Deployment authelia/authelia-redis-slave from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,060 INFO: Scaling down Deployment cattle-system/cattle-cluster-agent from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,080 INFO: Scaling down Deployment logging/elasticsearch-client from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,103 INFO: Scaling down Deployment logging/elasticsearch-exporter from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,124 INFO: Scaling down Deployment logging/kibana from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,144 INFO: Scaling down Deployment logging/laas-metricbeat from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,161 INFO: Scaling down Deployment monitoring/kube-state-metrics from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,191 INFO: Scaling down StatefulSet authelia/authelia-redis-master from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,212 INFO: Scaling down StatefulSet authelia/mongo from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,231 INFO: Scaling down StatefulSet logging/elasticsearch-data from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,253 INFO: Scaling down StatefulSet logging/elasticsearch-master from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,271 INFO: Scaling down StatefulSet monitoring/alertmanager from 3 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,290 INFO: Scaling down StatefulSet monitoring/grafana from 1 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-12 19:00:23,310 INFO: Scaling down StatefulSet monitoring/prometheus from 2 to 0 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,204 INFO: Scaling up Deployment authelia/authelia-app from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,216 INFO: Scaling up Deployment authelia/authelia-redis-slave from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,228 INFO: Scaling up Deployment cattle-system/cattle-cluster-agent from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,239 INFO: Scaling up Deployment logging/elasticsearch-client from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,252 INFO: Scaling up Deployment logging/elasticsearch-exporter from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,268 INFO: Scaling up Deployment logging/kibana from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,286 INFO: Scaling up Deployment logging/laas-metricbeat from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,299 INFO: Scaling up Deployment monitoring/kube-state-metrics from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,345 INFO: Scaling up StatefulSet monitoring/alertmanager from 0 to 3 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,358 INFO: Scaling up StatefulSet monitoring/grafana from 0 to 1 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
2019-03-13 06:30:14,372 INFO: Scaling up StatefulSet monitoring/prometheus from 0 to 2 replicas (uptime: Mon-Fri 07:30-20:00 Europe/Berlin, downtime: never)
As you can see: 15 x "Scaling down", but only 11 x "Scaling up". I have tried it several times; on each try these statefulsets were not scaled up:
We use kube-downscaler version 0.9. (Kubernetes v1.12.6)
Any idea why?
My understanding of the documentation is that by annotating a deployment with downscaler/downscale-period="Mon-Sun 19:00-19:30 UTC", it should scale down to zero and never scale back up again. In practice, the downscaler simply ignores this deployment, as if it had no annotation at all. Am I misunderstanding the documentation?
Additionally, it would be extremely helpful if the Usage section had a couple of examples for each command line option and annotation, and a note on their limitations. For example, I assumed that the --namespace option could take more than one namespace as a value, since the default is all namespaces, but I can't find a way of formatting a list of namespaces that it understands, making me think it's either all or just one, like kubectl commands. I would be more than happy to write that extra documentation as soon as I fully understand it myself.
Hi,
is there a reason why there is only a --namespace argument that allows a single namespace to be specified, but no --namespaces argument that would allow multiple namespaces? Instead, one has to use the --exclude-namespaces argument and make sure that every new namespace that shouldn't be monitored is added to the exclusion list.
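A hypothetical --namespaces flag could be as simple as a comma-splitting argparse type (a sketch, not the actual CLI):

```python
import argparse

parser = argparse.ArgumentParser()
# hypothetical flag: a comma-separated list instead of a single --namespace
parser.add_argument("--namespaces",
                    type=lambda s: [ns for ns in s.split(",") if ns],
                    default=[])
args = parser.parse_args(["--namespaces", "dev,staging"])
```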
The existing annotation downscaler/exclude only accepts boolean true/false. It's sometimes useful to specify an absolute end time, e.g.:
The new annotation should accept a timestamp in one of the following formats:
- 2020-04-05T20:59:00Z (same format as creationTimestamp)
- 2020-04-05T20:59 (short version)
- 2020-04-05 20:59 (short version, space instead of "T")
- 2020-04-05 (only date, assumes 00:00 UTC)
(only date, assumes 00:00 UTC)Why support multiple date/time formats? The annotation will most probably be set by a human to keep a deployment scaled up, so it should accept common ISO formats without the user having to look up the exact format.
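A parser for the proposed annotation value could be sketched like this. This is a minimal sketch only; the function name parse_exclude_until and the format list are assumptions for illustration, not kube-downscaler's actual code:

```python
from datetime import datetime, timezone

# Formats from most to least specific, matching the list above.
TIMESTAMP_FORMATS = [
    "%Y-%m-%dT%H:%M:%SZ",  # same format as creationTimestamp
    "%Y-%m-%dT%H:%M",      # short version
    "%Y-%m-%d %H:%M",      # short version, space instead of "T"
    "%Y-%m-%d",            # only date, assumes 00:00 UTC
]

def parse_exclude_until(value: str) -> datetime:
    """Try each accepted format in turn; all values are interpreted as UTC."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"time data '{value}' does not match any supported format")
```

Trying the formats in order of decreasing specificity means a date-only value like 2020-04-05 falls through to the last format and gets 00:00 UTC.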
I've read the docs and still can't judge what the correct configuration for my use case is (if it's supported at all).
What I need is to simply downscale the cluster every day at 17:00 and that's it. I don't need to bring it back up automatically. Is this use case supported?
Zalando's StackSetController supports HPA (https://github.com/zalando-incubator/stackset-controller/blob/2baddca617e2b76e34976357765206280cfd382e/pkg/core/stack_resources.go#L190).
Instead of reading the replicas value, we need to support the horizontalPodAutoscaler property:
horizontalPodAutoscaler:
  maxReplicas: 4
  metadata:
    creationTimestamp: null
  metrics:
  - resource:
      name: cpu
      targetAverageUtilization: 80
    type: Resource
  - resource:
      name: memory
      targetAverageUtilization: 80
    type: Resource
  minReplicas: 2
It would be nice if, instead of always going to 0 replicas, the target could be parameterized and set to 1 or 2; for big clusters it makes sense not to kill all pods, so the service keeps running.
Not really a bug, but not nice: the resource's namespace is retrieved from the Kubernetes API again for every single resource. This is clearly visible with --debug logging:
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,760 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,760 DEBUG: Deployment default/xxpi has 0 replicas (original: 3, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,777 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,777 DEBUG: Deployment default/foo has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,792 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,792 DEBUG: Deployment default/bar has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,808 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,808 DEBUG: Deployment default/demo has 0 replicas (original: 2, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,823 DEBUG: https://10.3.0.1:443 "GET /api/v1/namespaces/default HTTP/1.1" 200 345
kube-downscaler-6d8c6fdc7d-9b2mb downscaler 2020-04-10 15:28:40,823 DEBUG: Deployment default/example has 0 replicas (original: 1, uptime: Mon-Fri 07:30-20:30 Europe/Berlin)
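One way to avoid the repeated GETs above would be a per-run cache, so each namespace is fetched from the API only once per scan. This is a hypothetical sketch of the suggested change, not existing kube-downscaler code; fetch would wrap the actual pykube lookup:

```python
def make_namespace_getter(fetch):
    """Return a getter that memoizes namespace lookups for one scan.

    `fetch` is a callable taking a namespace name and returning the
    namespace object (e.g. a wrapper around the pykube API call).
    """
    cache = {}

    def get_namespace(name):
        if name not in cache:
            cache[name] = fetch(name)  # only the first lookup hits the API
        return cache[name]

    return get_namespace
```

Rebuilding the cache at the start of each scan loop keeps the data reasonably fresh while cutting the per-resource API calls shown in the debug log to one per namespace.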
I have a Deployment with a HorizontalPodAutoscaler. In this scenario kube-downscaler changes the replicas to 0, but the HPA changes them back to the original value; this generates a "loop" and the Deployment is never downscaled.
Do I need to add or change some configuration to make this work?
When I set downtime-replicas=2 on a StatefulSet:
Should I use another annotation? Original replicas?
As I mentioned earlier, I'd like to use a different scenario. I need to downscale the replicas but not upscale them later, so I'm suggesting a different logic, which could perhaps coexist with the existing one.
I suggest two additional parameters: upscale_period and downscale_period. You can set a period of time (for example, in my case) only for downtime.
upscale_period: never
downscale_period: "Mon-Fri 19:00-19:10 UTC"
This means that if a resource is up between 19:00 and 19:10, it will be downscaled.
(The same logic would apply to upscaling.)
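The period matching described above could be sketched as follows. This is an illustrative toy only, not kube-downscaler's actual time-spec parser; the hypothetical in_period helper supports a single spec in UTC:

```python
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def in_period(spec: str, now: datetime) -> bool:
    """Check whether `now` falls inside a spec like "Mon-Fri 19:00-19:10 UTC"."""
    days, times, tz = spec.split()
    assert tz == "UTC"  # sketch supports UTC only
    start_day, end_day = (WEEKDAYS.index(d) for d in days.split("-"))
    start, end = times.split("-")
    h1, m1 = map(int, start.split(":"))
    h2, m2 = map(int, end.split(":"))
    if not (start_day <= now.weekday() <= end_day):
        return False
    minutes = now.hour * 60 + now.minute
    return h1 * 60 + m1 <= minutes < h2 * 60 + m2
```

With downscale_period set to such a spec and upscale_period set to "never", the downscaler would only ever act inside the 19:00-19:10 window, which matches the use case above.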
Pros:
Could you implement this (or any other method) so that I can only downscale my deployments?
Thanks
I think the downscaler has this bug:
https://bugs.python.org/issue16399
If I set the kind as:
- --kind=statefulset
I get this in the logs:
2019-06-03 06:09:53,137 INFO: Downscaler v0.15 started with debug=True, default_downtime=never, default_uptime=<redacted>, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=<redacted>, exclude_statefulsets=, grace_period=900, interval=60, kind=['deployment', 'statefulset'], namespace=None, once=False, upscale_period=never
It appends statefulset to the default deployment instead of overriding it.
Maybe use a different strategy for this flag? Instead of multiple flags, accept a comma-separated string and split it into a list?
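The argparse behavior referenced above (https://bugs.python.org/issue16399) can be reproduced in isolation: with action="append" and a non-empty default, user-supplied values are appended to the default list instead of replacing it.

```python
import argparse

parser = argparse.ArgumentParser()
# A non-empty default combined with action="append" triggers the bug:
parser.add_argument("--kind", action="append", default=["deployment"])
args = parser.parse_args(["--kind=statefulset"])
print(args.kind)  # ['deployment', 'statefulset'] -- the default is not overridden
```

A common workaround is default=None with a fallback to ["deployment"] after parsing, so user-supplied values start from an empty list.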
Good afternoon, yesterday I created a PR to adjust the regex -> #16, and I am waiting for a build of the new image with this change applied. Could you run this build and push the image to Docker Hub?
The downscaler fails to add ORIGINAL_REPLICAS_ANNOTATION if there are no other annotations in the deployment/statefulset.
Right now the docs state that "i.e. updated deployments will immediately be scaled down regardless of the grace period."
It would be great if an updated deployment were scaled up for a grace period. This would allow us to use it in our CI/CD pipelines. Right now they fail when a developer commits something at night, because the health check does not pass after a deployment.
kube-downscaler/helm-chart/values.yaml
Line 26 in 78007f0
Right now, the only way to determine whether a Deployment or StatefulSet should be downscaled is to use an "exclude" annotation.
It would be great if there were also a downscale/include: true mechanism to allow opting in to downscaling.
Hi,
Is it possible to add a flag to ignore SSL validation? I'm getting a ton of these:
2019-02-27 21:43:03,579 ERROR: Certificate did not match expected hostname: 10.157.0.1. Certificate: {'subject': ((('countryName', 'US'),), (('stateOrProvinceName', 'Georgia'),), (('localityName', 'Alpharetta'),), (('organizationName', 'kubernetes'),), (('organizationalUnitName', 'at4d-c2'),), (('commonName', '*.acme.dev'),)), 'issuer': ((('countryName', 'US'),), (('stateOrProvinceName', 'Georgia'),), (('localityName', 'Alpharetta'),), (('organizationName', 'acme Technologies'),), (('organizationalUnitName', 'IT'),), (('commonName', '*.acme.com'),)), 'version': 3, 'serialNumber': '7BB0875690FE8FF14EA324BCC3D0C2353BBBE4F4', 'notBefore': 'Apr 3 20:29:00 2018 GMT', 'notAfter': 'Apr 2 20:29:00 2023 GMT', 'subjectAltName': (('DNS', 'kubernetes.default'), ('DNS', 'kubernetes.default.svc'), ('DNS', 'at4d-lvkc2m03'), ('DNS', 'at4d-lvkc2m03.acme.dev'), ('DNS', 'at4d-lvkc2m03.acme.com'), ('DNS', '*.acme.dev'), ('DNS', '*.acme.com'))}
2019-02-27 21:43:03,580 ERROR: Failed to autoscale : HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 364, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 374, in _match_hostname
match_hostname(cert, asserted_hostname)
File "/usr/local/lib/python3.7/ssl.py", line 323, in match_hostname
% (hostname, ', '.join(map(repr, dnsnames))))
ssl.SSLCertVerificationError: ("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/main.py", line 40, in run_loop
dry_run=dry_run, grace_period=grace_period)
File "/kube_downscaler/scaler.py", line 119, in scale
forced_uptime = pods_force_uptime(api, namespace)
File "/kube_downscaler/scaler.py", line 28, in pods_force_uptime
for pod in pykube.Pod.objects(api).filter(namespace=(namespace or pykube.all)):
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 133, in iter
return iter(self.query_cache["objects"])
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 123, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 107, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.7/site-packages/pykube/http.py", line 127, in get
return self.session.get(args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='10.157.0.1', port=443): Max retries exceeded with url: /api/v1/pods (Caused by SSLError(SSLCertVerificationError("hostname '10.157.0.1' doesn't match either of 'kubernetes.default', 'kubernetes.default.svc', 'at4d-lvkc2m03', 'at4d-lvkc2m03.acme.dev', 'at4d-lvkc2m03.acme.com', '*.acme.dev', '*.acme.com'")))
Sometimes, after our cluster has downscaled for the evening, a developer would like to scale something back up temporarily to test. At the moment, this involves adding the exclude annotation, then manually triggering a scale-up with kubectl scale, then removing the exclude annotation to scale back down.
It would be great if the downscaler automatically recognized when something has an original-replicas annotation but is also marked as exclude, or is in a namespace which is excluded, and scaled it back up to its original replicas.
(I may be able to attempt a PR for this sometime soon, but not right away)
Follow-up to #10: documentation and RBAC need to be adapted.
Hello,
First, thanks for this tool, very useful for saving costs for cloud users.
If I want to go further, is there a possibility to downscale on specific days, such as holidays?
Thanks a lot,
BR,
Jérémy
The Workloads API became stable; use the latest apps/v1 if available.
2020-01-27 14:33:05,882 INFO: Scaling down Deployment default/flask-v1-tutorial from 1 to 0 replicas (uptime: Sun-Fri 00:00-23:59 UTC, downtime: never)
This happens when uptime covers most of the day. I tried some other variations, but the deployments go down every time.
Apparently these values can't contain the colon character ":". I get an "Invalid value" error with the message below (note it refers to metadata.labels, suggesting the value was set as a label, whose values are restricted, rather than as an annotation):
metadata.labels: Invalid value: "Mon-Fri 07:00-19:00 US/Eastern": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
Kubernetes 1.11