kubecost / cluster-turndown

Automated turndown of Kubernetes clusters on specific schedules.

License: Apache License 2.0

Dockerfile 0.25% Makefile 0.17% Go 97.14% Shell 2.44%

cluster-turndown's People

Contributors

ajaytripathy, alecrajeev, biancaburtoiu, dwbrown2, mbolt35, michaelmdresser, toversus


cluster-turndown's Issues

panic: runtime error: invalid memory address or nil pointer dereference

LOGS:
clusterprovider.go:84] Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
eksclusterprovider.go:91] [Error] Failed to load service account.
provider.go:48] Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
validator.go:39] Validating Provider
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x16df68a]
goroutine 39 [running]:
github.com/kubecost/cluster-turndown/pkg/turndown/provider.(*EKSProvider).GetNodePools(0xc0004f54a0, 0x0, 0x0, 0x0, 0x0, 0x0)
/app/pkg/turndown/provider/eksprovider.go:52 +0x2a
github.com/kubecost/cluster-turndown/pkg/turndown/provider.validateProvider(0x1fa4480, 0xc0004f54a0, 0x5, 0xc0004f6cc0)
/app/pkg/turndown/provider/validator.go:15 +0x72
created by github.com/kubecost/cluster-turndown/pkg/turndown/provider.Validate
/app/pkg/turndown/provider/validator.go:42 +0xbd

Using:
https://github.com/kubecost/cluster-turndown/releases/latest/download/cluster-turndown-full.yaml

cluster: EKS

Theory:
maybe this function is returning nil

func (p *AWSClusterProvider) GetNodePools() ([]NodePool, error) {
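Given the "Failed to load service account" error just before the panic, the crash may come from a nil internal client rather than a nil provider value. A minimal sketch of a guard that would turn the SIGSEGV into a readable error (the type and field names below are assumptions for illustration, not the real provider code):

```go
package main

import (
	"errors"
	"fmt"
)

// eksClient stands in for the AWS API client the provider wraps.
type eksClient struct{}

// eksProvider mimics the real EKSProvider; its client can be left nil when
// the service account fails to load (names here are assumptions).
type eksProvider struct {
	client *eksClient
}

// GetNodePools returns an error instead of panicking when the client was
// never initialized.
func (p *eksProvider) GetNodePools() ([]string, error) {
	if p == nil || p.client == nil {
		return nil, errors.New("EKS client not initialized: service account failed to load")
	}
	return []string{}, nil
}

func main() {
	p := &eksProvider{} // simulates a provider built after credential failure
	_, err := p.GetNodePools()
	fmt.Println(err)
}
```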

Turndown scale-down does not work (EKS)

Hello,

I wanted to use turndown to stop my dev cluster during off hours. It's an EKS cluster.
I was able to create the example schedule.
However, when the schedule is supposed to run, it fails with the following error:

E0525 11:47:00.907323       1 schedulecontroller.go:190] TurndownSchedule 'example-schedule' in work queue no longer exists
I0525 11:47:14.319434       1 namedlogger.go:24] [TurndownScheduler] Schedule Created: &{Current:scaledown ScaleDownID:1adae7c3-6f07-4837-baec-a5ed8a8efad0 ScaleDownTime:2022-05-25 11:55:00 +0000 UTC ScaleDownMetadata:map[repeat:daily type:scaledown] ScaleUpID:afcc2f9c-4324-4a29-880d-a532546857ef ScaleUpTime:2022-05-25 12:45:00 +0000 UTC ScaleUpMetadata:map[repeat:daily type:scaleup]}
I0525 11:47:14.339321       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"example-schedule", UID:"a6b0311e-8430-423a-ae1f-26935b7279ee", APIVersion:"kubecost.k8s.io/v1alpha1", ResourceVersion:"1476199", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0525 11:55:00.000175       1 turndownscheduler.go:404] -- Scale Down --
I0525 11:55:00.009267       1 namedlogger.go:24] [TurndownScheduler] Turndown Pod does not exist on expected host node. Preparing environment...
I0525 11:55:00.009306       1 namedlogger.go:24] [Turndown] Creating or Getting the Target Host Node...
I0525 11:55:00.419785       1 namedlogger.go:24] [MasterlessStrategy] Finite node backed cluster. Creating singleton nodepool for turndown.
I0525 11:55:00.665213       1 namedlogger.go:48] [Error] Failed to prepare current turndown environment. Cancelling. Err=AccessDeniedException: 
	status code: 403, request id: e5777622-11c8-4beb-aed1-1c1073c1925d
I0525 11:55:00.665544       1 scheduler.go:188] Job was cancelled: afcc2f9c-4324-4a29-880d-a532546857ef
I0525 11:55:00.685926       1 namedlogger.go:24] [TurndownScheduler] Turndown Schedule Successfully Cancelled

The error seems to be an "Access Denied"; however, I created the credentials with the suggested policy https://github.com/kubecost/cluster-turndown#eks--aws-kops-setup
Do you have any idea?

Regards

Controller does not pick up a re-applied schedule.

Applied the example schedule and got an error saying the date was in the past (IMO this should not matter, as the repeat was set to daily).

After updating it with a future date, I ran kubectl apply again, but the controller did not pick up the change. Removing and then re-adding the schedule made the controller pick up the change.

Updating a Turndown Resource Fails

Using kubectl edit tds or kubectl apply to update a turndown schedule resource does not function as expected. We'll need to work out some of the fences for "when" you're allowed to update, for example, preventing modification of a turndown schedule while turndown is executing.

Workaround: Delete the resource with kubectl delete tds, and then recreate the schedule.

Add option to disable/remove autoscaling on node pools during turndown

If a cloud provider (GKE, EKS) autoscaler is present on node pools, turndown currently evicts pods and then relies on the autoscaler to reduce the node count. However, sometimes autoscaler parameters may be set with a too-high minimum node count, resulting in smaller than expected savings during turndown. This is a request to add an option to turndownschedules.kubecost.com to force turndown to behave like it does if an autoscaler isn't present.

This could be achieved by (1) modifying the minimum node count or (2) removing the autoscaling node pool entirely during turndown and recreating it on turnup.
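One possible shape for the requested option (the `ignoreAutoscaler` field below is purely hypothetical and not part of the current CRD; timestamps are illustrative):

```yaml
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: example-schedule
spec:
  start: 2024-01-01T00:00:00Z
  end: 2024-01-01T12:00:00Z
  repeat: daily
  # Hypothetical: treat autoscaling node pools as if no autoscaler were
  # present, i.e. resize them to 0 directly (or set the min node count to 0).
  ignoreAutoscaler: true
```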

Turndown fails on GKE due to empty zone string

Observed problem

Turndown fails to run on a GKE cluster with the following config info:

  • cluster-turndown-2.0.1
  • GKE v1.22.8-gke.202

Logs from user environment:

I0706 21:00:43.033004       1 main.go:118] Running Kubecost Turndown on: REDACTED
I0706 21:00:43.059698       1 validator.go:41] Validating Provider...
I0706 21:00:43.061743       1 gkemetadata.go:92] [Error] metadata: GCE metadata "instance/attributes/kube-env" not defined
I0706 21:00:43.063220       1 gkemetadata.go:92] [Error] metadata: GCE metadata "instance/attributes/kube-env" not defined
I0706 21:00:43.063445       1 namedlogger.go:24] [GKEClusterProvider] Loading node pools for: [ProjectID: REDACTED, Zone: , ClusterID: REDACTED]
I0706 21:00:43.192046       1 validator.go:27] [Error]: Failed to load node groups: rpc error: code = InvalidArgument desc = Location "" does not exist.

Source of the error in code

This "Loading node pools" message, followed by the error, comes from this code in the GKE provider.

// GetNodePools loads all of the provider NodePools in a cluster and returns them.
func (p *GKEClusterProvider) GetNodePools() ([]NodePool, error) {
	ctx := context.TODO()
	projectID := p.metadata.GetProjectID()
	zone := p.metadata.GetMasterZone()
	cluster := p.metadata.GetClusterID()
	req := &container.ListNodePoolsRequest{Parent: p.getClusterResourcePath()}

	p.log.Log("Loading node pools for: [ProjectID: %s, Zone: %s, ClusterID: %s]", projectID, zone, cluster)
	resp, err := p.clusterManager.ListNodePools(ctx, req)
	if err != nil {
		return nil, err
	}

The request being executed uses a path generator which is filling in the empty zone string, causing the error.

// gets the fully qualified resource path for the cluster
func (p *GKEClusterProvider) getClusterResourcePath() string {
	return fmt.Sprintf("projects/%s/locations/%s/clusters/%s",
		p.metadata.GetProjectID(), p.metadata.GetMasterZone(), p.metadata.GetClusterID())
}

We're using md.client.InstanceAttributeValue("kube-env") to get the GCP zone/location:

func (md *GKEMetaData) GetMasterZone() string {
	z, ok := md.cache[GKEMetaDataMasterZoneKey]
	if ok {
		return z
	}

	results, err := md.client.InstanceAttributeValue("kube-env")
	if err != nil {
		klog.V(1).Infof("[Error] %s", err.Error())
		return ""
	}

Possible cause

This may not be caused by the absence of kube-env metadata, but rather a lack of access to it. GKE offers "metadata concealment" which specifically calls out kube-env as data to be hidden. kube-env is also mentioned in GKE's NodeMetadata config "SECURE" setting.

Possible solution

The reporting user has suggested a different attribute value to use: cluster-location

curl -L -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-location
europe-west2

If this is a stable attribute provided by GKE-provisioned VMs, this probably works. We could also investigate using v1/instance/zone as an alternative; it seems to be officially guaranteed on all GCP VMs. Other stable sources of node(pool) location information may be preferable; I just haven't dug deep enough to find them yet.

Other considerations

It is currently unclear if this is affecting all GKE environments or those of only a certain version, region, or configuration (e.g. metadata concealment). Any fixes here should be tested on earlier GKE versions to ensure compatibility.

EKS - withOIDC - instead of AWS access key - secret key

Hi team,
I'm interested in your project, but on our side we don't use hardcoded credentials or technical accounts with static creds.
We prefer to use a service account linked with a dedicated policy on AWS.

I have tried to use cluster-turndown, but without credentials it refuses to execute the scale-down.
After a quick review of the Go code, I see that AWS creds are required.
[screenshot omitted]

Do you have a solution for this use case with the AWS Provider?

Thanks a lot in advance,
Alexandre
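For context, the AWS SDK's default credential chain already supports IRSA (web-identity) credentials, so one path would be for turndown to fall back to the default chain when no secret is mounted. A sketch of the IRSA side (the service account name, namespace, and role placeholders are assumptions based on the default install; the IAM role would need the policy from the setup guide):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-turndown
  namespace: turndown
  annotations:
    # IRSA: EKS injects web-identity credentials for this role into the pod.
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<TURNDOWN_ROLE>
```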

pod cannot start on EKS

Hello,

I have a similar issue to previous users with EKS. I followed the documentation completely, with IAM policies granting autoscaling full control and EKS privileges, and the same secret as in the document. However, the pod cannot even start, and it seems the client cannot recognize the provider.

Thank you.

I0616 08:12:54.345492 1 main.go:118] Running Kubecost Turndown on: ip-xxxxxxxxxx.xxxxxxx.compute.internal
I0616 08:12:54.363880 1 clusterprovider.go:92] Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
I0616 08:12:54.399739 1 main.go:133] [Error]: Failed to create ClusterProvider: UnrecognizedClientException:
status code: 403, request id: 4ba56060-d2aa-4368-9e13-fd9d9cb24c81
I0616 08:13:08.966637 1 main.go:118] Running Kubecost Turndown on: ip-xxxxxxxxxx.xxxxxxx.compute.internal
I0616 08:13:08.984976 1 clusterprovider.go:92] Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
I0616 08:13:09.029962 1 main.go:133] [Error]: Failed to create ClusterProvider: UnrecognizedClientException:
status code: 403, request id: 788fd61f-597c-4bd5-9686-69c3565ed8d5

turndown sets own deployment to 0

I'm running this on a cluster spun up by our current GKE Terraform files. The only deployment running on the cluster is turndown, and the cluster currently has two node pools, both with autoscaling enabled.

The very first thing turndown does is scale its own deployment to 0. I'm assuming this is why the turnup I have configured for 20 minutes after turndown does not trigger. (PoC schedule for seeing how turndown works.)

This is the controller log:

cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:22:01.122236       1 validator.go:39] Validating Provider
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:22:01.127017       1 namedlogger.go:24] [GKEClusterProvider] Loading node pools for: [ProjectID: xxxxxxxx, Zone: xxxxxxx, ClusterID: xxxxxxx]
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:22:01.345562       1 reflector.go:122] Starting reflector *v1alpha1.TurndownSchedule (30s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:22:01.345602       1 reflector.go:160] Listing and watching *v1alpha1.TurndownSchedule from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:22:01.345705       1 schedulecontroller.go:109] Starting TurndownSchedule controller
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.345558       1 turndownscheduler.go:404] -- Scale Down --
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.356117       1 namedlogger.go:24] [TurndownScheduler] Already running on correct turndown host node. No need to setup environment.
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.356152       1 namedlogger.go:24] [Turndown] Scaling Down Cluster Now
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.364449       1 namedlogger.go:24] [GKEClusterProvider] Loading node pools for: [ProjectID: xxxxxxx, Zone: xxxxxx, ClusterID: xxxxxxx]
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.398052       1 namedlogger.go:24] [Turndown] Found Cluster-AutoScaler. Flattening Cluster...
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.398147       1 namedlogger.go:32]   [Flattener] Starting to Flatten All Deployments...
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:01.589561       1 namedlogger.go:32]   [Flattener] Starting to Flatten All DaemonSets...
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:02.010862       1 namedlogger.go:32]   [Flattener] Starting to Suspend All Jobs...
cluster-turndown-5c77649bff-67tng cluster-turndown I0607 18:23:02.021614       1 namedlogger.go:24] [Turndown] Resizing all non-autoscaling node groups to 0...

Any input on why this is happening?

JIT turndown node

It would be nice to be able to spin up / spin down the turndown node just before a schedule is set to execute.

Support multiple turndown schedules

It would be nice to have a daily turndown schedule for evenings in addition to a weekend turndown schedule. Seems perfectly fine for these to be set with two separate yaml files.
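For example, two separate schedules like the sketch below (times are illustrative; the CRD shape follows the example schedule used elsewhere in this repo, and the assumption is that a weekly repeat value is, or would be, supported):

```yaml
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: weeknight-schedule
  finalizers:
  - "finalizer.kubecost.k8s.io"
spec:
  start: 2024-01-01T22:00:00Z   # evening scale-down
  end: 2024-01-02T06:00:00Z     # morning scale-up
  repeat: daily
---
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: weekend-schedule
  finalizers:
  - "finalizer.kubecost.k8s.io"
spec:
  start: 2024-01-05T22:00:00Z   # Friday evening
  end: 2024-01-08T06:00:00Z     # Monday morning
  repeat: weekly
```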

AWS EKS - Self managed nodegroups / Unmanaged nodegroups

Hi there!
I'm just trying out this project, and I see the note that AWS EKS requires managed nodegroups for the turndown service to work.

I'm curious if there is a workaround for self-managed nodes, or if there isn't a way to run the service without that functionality.

I've installed everything per the instructions and this is the error message I get when it attempts to Turn Down:

main.go:118]                  Running Kubecost Turndown on: ip-X-X-X-X.us-west-2.compute.internal                                                                                                                                                            
clusterprovider.go:9          Found ProviderID starting with "aws", using AWS Provider                         
provider.go:63]               Found ProviderID starting with "aws", using AWS Provider                                                                                                                                                                         
validator.go:41]              Validating Provider...                                                                                                                                                                                                          
namedlogger.go:48]            [Error] Image Location: InvalidAMIID.NotFound: The image id '[ami-08e89be32f916757b]' does not exist status code: 400, request id: 90e7bf93-ee83-41c0-a5f5-7d89192aa4c7
reflector.go:122]             Starting reflector *v1alpha1.TurndownSchedule (30s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98                                                                               
reflector.go:160]             Listing and watching *v1alpha1.TurndownSchedule from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98                                                                                   
schedulecontroller.go:109]    Starting TurndownSchedule controller                                                                                                                                                                                  
namedlogger.go:24]            [TurndownScheduler] Schedule Created: &{Current:scaledown ScaleDownID:410612c9-98a3-4006-9c01-a1b1090db209 ScaleDownTime:2021-05-13 18:50:00 +0000 UTC ScaleDownMetadata:map[repeat:daily type:scaledown] ScaleUpID:6e995feb- 
event.go:258]                 Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"example-schedule", UID:"b1c9193d-8091-493d-9c00-c60c889ab4df", APIVersion:"kubecost.k8s.io/v1alpha1", ResourceVersion:"3231589", FieldPath:""}): type: 'Norm 
turndownscheduler.go:404]     -- Scale Down --                                                                                                                                                                                                       
namedlogger.go:24]            [TurndownScheduler] Turndown Pod does not exist on expected host node. Preparing environment...                                                                                                                               
namedlogger.go:24]            [Turndown] Creating or Getting the Target Host Node...                                                                                                                                                                        
namedlogger.go:48]            [Error] Failed to prepare current turndown environment. Cancelling. Err=Failed to locate master node in standard turndown strategy.                                                                                           
scheduler.go:188]             Job was cancelled: 6e995feb-69fc-4a0c-bef4-6137490156ff                                                                                                                                                                        
namedlogger.go:24]            [TurndownScheduler] Turndown Schedule Successfully Cancelled                                                                                                                                                                  
schedulecontroller.go:189]    TurndownSchedule 'example-schedule' in work queue no longer exists 

GKE Autoscaling, creation of new node pool not working

The documentation states:
Managed Cluster Strategy (e.g. GKE + EKS)
When the turndown schedule occurs, a new node pool with a single g1-small node is created. Taints are added to this node to only allow specific pods to be scheduled there. We update our cluster-turndown deployment such that the turndown pod is allowed to schedule on the singleton node. Once the pod is moved to the new node, it will start back up and resume scaledown. This is done by cordoning all nodes in the cluster (other than our new g1-small node), and then reducing the node pool sizes to 0.

However, when I add a new schedule (on the 1.2.1 version of cluster-turndown, but it also happens on the 1.3 snapshot), I see a label being added to one of my current nodes instead of a new micro instance being created like it used to. Is this some sort of bug, or an expected change?

Thanks

Turndown fails on non-autoscaling clusters when PodDisruptionBudget(s) exist and eviction is possible

Description

On non-autoscaling clusters where eviction is available, cluster-turndown attempts to evict Pods as part of the "Drain" process. After draining is finished, the node pool is supposed to be scaled down. If a PDB exists in the cluster with minAvailable > 0, there will be at least one un-evictable pod, meaning draining will never finish.

The eviction logic has an infinite loop which continuously retries any eviction that fails with an error that is non-nil and neither IsNotFound nor IsTooManyRequests. I added a log statement and got this error on my dev cluster from the PolicyV1beta1().Evictions().Evict() call:

I0216 16:12:35.055635       1 draininator.go:396] Evicting in namespace 'guestbook-with-pdb' pod 'frontend-6b6c9c585d-chsvc' failed: Cannot evict pod as it would violate the pod's disruption budget.

Cannot evict pod as it would violate the pod's disruption budget is the error being returned from .Evict().

Reproduce

Create GKE cluster

gcloud container clusters create \
    turndown-pdb-bug \
    --region "us-central1-b" \
    --project "---PROJECTIDHERE---" \
    --num-nodes 3

Create a deployment with a PDB that has a non-zero minAvailable

SETUPYAML=$(cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx
EOF
)

echo $SETUPYAML | kubectl apply -f -

kubectl get deployment

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3/3     3            3           42s

kubectl get pdb

NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nginx-pdb   2               N/A               1                     43s

Put turndown in the cluster

bash ./scripts/gke-create-service-key.sh <yourproject> <servicekeyname>

kubectl get secret -n turndown

NAME                           TYPE                                  DATA   AGE
cluster-turndown-service-key   Opaque                                1      27s
default-token-vspf9            kubernetes.io/service-account-token   3      28s

kubectl apply -f ./artifacts/cluster-turndown-full.yaml

kubectl get pod -n turndown

NAME                                READY   STATUS    RESTARTS   AGE
cluster-turndown-7c7c7bcc74-2k5g4   1/1     Running   0          20s

Create a turndown schedule that will trigger soon

SCHEDULE=$(cat <<EOF
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: turndown-pdb-bug-test-schedule
  finalizers:
  - "finalizer.kubecost.k8s.io"
spec:
  start: 2022-02-16T17:35:00Z
  end: 2022-02-16T18:00:00Z
  repeat: daily
EOF
)

echo $SCHEDULE | kubectl apply -f -

kubectl get tds

NAME                             STATE             NEXT TURNDOWN          NEXT TURN UP
turndown-pdb-bug-test-schedule   ScheduleSuccess   2022-02-16T17:35:00Z   2022-02-16T18:00:00Z

Wait for turndown to start and finish.
Note in the logs that at least one node never finishes draining.

date -u --rfc-3339=seconds
echo

kubectl logs -n turndown -l app=cluster-turndown --tail=-1 | \
    grep -i 'Draininator' | \
    grep 'Draining Node\|Cordoning Node\|Drained Successfully'

2022-02-16 18:04:53+00:00

I0216 17:37:30.466494       1 namedlogger.go:24] [Draininator] Draining Node: gke-turndown-pdb-bug-default-pool-75d619d4-2wt5
I0216 17:37:30.466827       1 namedlogger.go:32]   [Draininator] Cordoning Node: gke-turndown-pdb-bug-default-pool-75d619d4-2wt5
I0216 17:37:46.155634       1 namedlogger.go:24] [Draininator] Node: gke-turndown-pdb-bug-default-pool-75d619d4-2wt5 was Drained Successfully
I0216 17:37:46.155643       1 namedlogger.go:24] [Draininator] Draining Node: gke-turndown-pdb-bug-default-pool-75d619d4-4msm
I0216 17:37:46.155647       1 namedlogger.go:32]   [Draininator] Cordoning Node: gke-turndown-pdb-bug-default-pool-75d619d4-4msm
I0216 17:38:11.992618       1 namedlogger.go:24] [Draininator] Node: gke-turndown-pdb-bug-default-pool-75d619d4-4msm was Drained Successfully
I0216 17:38:11.992975       1 namedlogger.go:24] [Draininator] Draining Node: gke-turndown-pdb-bug-default-pool-75d619d4-s2h9
I0216 17:38:11.993227       1 namedlogger.go:32]   [Draininator] Cordoning Node: gke-turndown-pdb-bug-default-pool-75d619d4-s2h9

Note in “kubectl get pods” that 2 of the deployment pods are still running and one is unschedulable.

kubectl get pods

NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-66b6c48dd5-9ct6l   0/1     Pending   0          26m
nginx-deployment-66b6c48dd5-lzs74   1/1     Running   0          26m
nginx-deployment-66b6c48dd5-wnqpp   1/1     Running   0          27m

Note in “kubectl get nodes” that after scaleup should have happened, we still have
a turndown node and the 3 regular nodes are sitting around marked as noschedule.

date -u --rfc-3339=seconds
echo

kubectl get nodes

2022-02-16 18:05:16+00:00

NAME                                                  STATUS                     ROLES    AGE   VERSION
gke-turndown-pdb-bug-cluster-turndown-2ecd8eb5-pp8t   Ready                      <none>   29m   v1.21.6-gke.1500
gke-turndown-pdb-bug-default-pool-75d619d4-2wt5       Ready,SchedulingDisabled   <none>   58m   v1.21.6-gke.1500
gke-turndown-pdb-bug-default-pool-75d619d4-4msm       Ready,SchedulingDisabled   <none>   58m   v1.21.6-gke.1500
gke-turndown-pdb-bug-default-pool-75d619d4-s2h9       Ready,SchedulingDisabled   <none>   58m   v1.21.6-gke.1500

Possible solutions

At the very least, we should have a retry limit on evictions so there isn't an infinite loop that makes the turndown pod hang.

Real solutions could involve some sort of "force deletion" or notifying the user of the PDB's presence and asking them to make a modification.

Select workloads to keep alive during turndown

I'm relaying a user request.

They would like to be able to select specific workloads to keep alive (e.g. Kubecost, Prometheus, Grafana) during a turndown.

This behavior is a little complicated to implement, especially in a non-autoscaling environment. We could initially only support this feature in autoscaling environments but I'd need to do some research and testing.

Roadmap positioning of this feature isn't known yet, but I wanted to record it somewhere!

Support Non-UTC timestamps

It would be easier to understand and specify the turndown CRD if schedule times could be given relative to the user's timezone.
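Until the CRD supports something like a timezone field (a hypothetical idea, not an existing feature), a small helper can convert local wall-clock times to the UTC RFC3339 strings the schedule currently expects; a sketch:

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed IANA tz database so LoadLocation works everywhere
)

// toUTCTimestamp converts a wall-clock time in a named IANA timezone to the
// UTC RFC3339 string used in TurndownSchedule start/end fields.
func toUTCTimestamp(local, tz string) (string, error) {
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return "", err
	}
	t, err := time.ParseInLocation("2006-01-02T15:04:05", local, loc)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}

func main() {
	// 7pm in New York (EDT, UTC-4 on this date) becomes 11pm UTC.
	s, _ := toUTCTimestamp("2022-05-25T19:00:00", "America/New_York")
	fmt.Println(s) // 2022-05-25T23:00:00Z
}
```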

Turndown Scheduling Build Failure

Hi guys,

I am trying to use Turndown in a GKE cluster (v1.22.8-gke.200), but unfortunately on this version I am experiencing an issue when creating a schedule. I see this error:

kubectl create -f example-schedule.yaml -n turndown
error: unable to recognize "example-schedule.yaml": no matches for kind "TurndownSchedule" in version "kubecost.k8s.io/v1alpha1"

This has worked for me in lower versions previously!

Thanks for your help.

Cheers,
John
