Comments (13)
It seems that the API NLB (ports 443 and 3988, using dns=none) is now unhealthy, and the cluster contains 2 instances that are trying to come up, but the error is:
```
Jan 24 17:29:01 i-001318f220a7e654f nodeup[1816]: W0124 17:29:01.529726 1816 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: Post "https://10.124.0.222:3988/bootstrap": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post "https://10.124.6.165:3988/bootstrap": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Post "https://10.124.9.236:3988/bootstrap": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
Basically, this bug is quite critical: it will break cluster scalability if people execute `update --yes`.
The problem might be that we are assuming it is possible to modify old NLBs that do not have security groups? It is not possible to add security groups to an existing NLB that was created without any.
So what happened: when I executed `--yes` in one cluster, it deleted the existing security group rules. However, it could not add the new rules -> the API NLB became unhealthy (missing security group rules) -> scaling stopped working.
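To illustrate the AWS-side constraint behind this failure chain, here is a minimal sketch (illustrative only, not kops code; the function name and types are made up for the example): security groups can only be modified on an NLB that was created with at least one security group, so attaching rules to an old, group-less NLB must fail.

```go
package main

import "fmt"

// canModifySecurityGroups mirrors the AWS rule hit above (sketch only):
// modifying security groups succeeds on an NLB only if it was created
// with at least one security group; otherwise AWS returns
// InvalidConfigurationRequest.
func canModifySecurityGroups(createdWithSecurityGroups []string) bool {
	return len(createdWithSecurityGroups) > 0
}

func main() {
	// An NLB created by kops <= 1.28: no security groups at creation time.
	oldNLB := []string{}
	// An NLB created by kops 1.29: created with a security group.
	newNLB := []string{"sg-example"}

	fmt.Println(canModifySecurityGroups(oldNLB)) // false: the NLB must be recreated
	fmt.Println(canModifySecurityGroups(newNLB)) // true: rules can be updated in place
}
```

This is why the upgrade can only delete the old rules but never attach the new ones: the API rejects the attach call outright, leaving the NLB with no security group rules at all.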
from kops.
I tried to recreate the NLB and now the whole cluster does not work.
It looks like ALL nodes and control-plane instances need to be rotated after that to get anything working.
Sorry about this - this shouldn't be the case. The theory is that I introduced it with the ForAPIServer changes, so I'm taking a look.
I think the issue might be #15993
The error is `InvalidConfigurationRequest: You cannot set security groups on a Network Load Balancer which was created without any security groups.`, and indeed in 1.28 (and earlier) we did not create NLBs with security groups; this was introduced in the above PR, which is only in 1.29.
We do have upgrade tests that cover this upgrade e.g. https://testgrid.k8s.io/sig-cluster-lifecycle-kops#kops-aws-upgrade-k127-ko128-to-k128-kolatest-many-addons , but it isn't using load balancers; another good reason to switch to dns=none in our tests!
So the best workaround I've found so far is to delete the load balancer; the next kops update can then recreate it. The problem is that with dns=none (at least) the IP address changes, which then requires a forced rolling-update of the nodes. With dns=none, the kubeconfig also changes, because the load balancer name is randomized by AWS.
I haven't found a way to keep the load balancer IP / name yet.
Thinking on this, I think the upgrade experience is fundamentally going to be bad (thank you for flagging @zetaab, and sorry that you hit it!). Rather than talking many people through the process, I think we should do something to make it a smooth process.
Two ideas:
- We revert the security group change on the NLB entirely. I don't love this, it feels like a big change.
- We can look at creating a second NLB during the upgrade. We would likely have to move the security group rule cleanup to a new phase after the rolling update.
The second option is trickier, but I'm going to kick the tires on what it looks like.
the process could look like:
- create an entirely new API NLB with security group rules, and wait until all control planes and nodes are using it (how do we verify that, or do we just wait for the next kops version?)
- clean up the old security group rules & clean up the old API NLB
The thing is, this is going to change the Kubernetes API DNS address in the case of dns=none, and also if someone is still using kops with gossip. We updated something like 20 CI pipelines because of this. It was not huge, but still something to keep in mind.
This is not an easy problem to solve - or it is, if we are happy to cause downtime for everyone. AWS could have done this better, e.g. by making it possible to keep the DNS name the same (or by offering a "yes, I want to recreate my NLB, causing a short downtime, so that it supports security groups" button)...
That is the broad approach I ended up on also! I uploaded a WIP/hack that implements it: #16291 . It's big, but not that big, and I'm trying to extract out the refactorings that I think are good ideas regardless (and revert some things I did that are irrelevant!) so we can whittle it down to something easier to analyze.
We do still end up needing to distribute a new kubeconfig, but I don't think there is downtime, because the old NLB should still work until we run the (proposed) `kops update cluster --cleanup-after-upgrade --yes` command. The idea is that the new flow would be `kops update cluster` / `kops rolling-update cluster` / `kops update cluster --cleanup-after-upgrade`, and we now have somewhere to do those "cleanup" steps.
Here the cleanup involves deleting the old NLB (which is what breaks the kubeconfig with dns=none / gossip), deleting the old TargetGroup (we can't have two NLBs pointing at the same TargetGroup), cleaning up the SecurityGroupRules that were allowing access from the old NLB, and detaching the old TargetGroups from the AutoScalingGroup.
I think this is potentially a powerful technique, but does introduce another step for users (cc @hakman )
hmm, could kops rolling-update cluster trigger cleanup-after-upgrade automatically, without the user really knowing? I am thinking that rolling-update knows when the upgrade is finished - it could check whether all instances have migrated to the new format. If they have, it could automatically trigger cleanup-after-upgrade?
Actually, we do have pre- and post-migration possibilities when we manage kOps clusters. But something similar could exist in kOps itself?
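As a sketch of that idea (the types and names here are hypothetical; kops's real rolling-update internals differ), the "is everything migrated?" check that would gate an automatic cleanup could look like:

```go
package main

import "fmt"

// Instance is a hypothetical stand-in for the per-instance state that
// rolling-update already tracks for each instance group member.
type Instance struct {
	Name        string
	NeedsUpdate bool // true while the instance is still on the old format
}

// allMigrated reports whether every instance has been replaced, i.e.
// whether it would be safe to trigger cleanup-after-upgrade automatically.
func allMigrated(instances []Instance) bool {
	for _, inst := range instances {
		if inst.NeedsUpdate {
			return false
		}
	}
	return true
}

func main() {
	group := []Instance{
		{Name: "control-plane-us-east-2a", NeedsUpdate: false},
		{Name: "nodes-us-east-2a", NeedsUpdate: false},
	}
	fmt.Println(allMigrated(group)) // true: cleanup could be triggered
}
```

The open question from the discussion remains where this check runs: inside rolling-update itself, or in a separate wrapper, so that users who hook in their own pre/post-migration logic can still run the cleanup step on their own schedule.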
> hmm, could kops rolling-update cluster trigger cleanup-after-upgrade automatically, without the user really knowing? I am thinking that rolling-update knows when the upgrade is finished - it could check whether all instances have migrated to the new format. If they have, it could automatically trigger cleanup-after-upgrade?
It certainly could. The pattern we have today is that `kops update` tells you that you need to run `kops rolling-update`, and we could have that pattern here also.
> Actually, we do have pre- and post-migration possibilities when we manage kOps clusters. But something similar could exist in kOps itself?
Right, I think this is a good opportunity, in that a post-upgrade cleanup can allow for bigger / safer changes. And it's reasonable to want to delay cleanup until we've verified that the cluster (or even the workloads) are actually working. I think that to plug in your own logic, you would want this to be runnable separately. That said, I do agree that as we add steps it becomes less user-friendly, and we may want a user-friendly "easy mode" wrapper that runs update / rolling-update / post-update.
It came up in office hours that this might not hit the default configuration (cc @hakman). I think you're right, for gossip clusters on AWS created with 1.28 at least, although users do still have to delete the existing NLB manually. We did hit some other problems, so I'm going to verify with a few more cases...
I created a gossip cluster:
```shell
export CLUSTER_NAME=foo.k8s.local
unset KOPS_BASE_URL
kops-1.28.3 create cluster ${CLUSTER_NAME} --zones us-east-2a --ssh-public-key ~/.ssh/id_rsa.pub
kops-1.28.3 update cluster ${CLUSTER_NAME} --yes --admin
kops-1.28.3 validate cluster ${CLUSTER_NAME} --wait=10m
```
Then with latest I did see the NLB change in `kops update`:
```
NetworkLoadBalancer/api.foo.k8s.local
SecurityGroups <nil> -> [name:api-elb.foo.k8s.local id:sg-047dec0e24d2ed5be]
```
That change is the one that requires a new NLB.
SSHing to one of the nodes though, I can see they have the internal IP address of the control plane. They got this address over gossip, AFAICT.
`kops update cluster --yes --admin` gives the expected error:
```
W0203 10:38:07.147773 203956 executor.go:141] error running task "NetworkLoadBalancer/api.foo.k8s.local" (9m58s remaining to succeed): Error updating security groups on Load Balancer: InvalidConfigurationRequest: You cannot set security groups on a Network Load Balancer which was created without any security groups.
```
We can delete the NLB with:
```shell
ARN=$(go run ./cmd/kops toolbox dump -ojson | jq -r '.resources[] | select(.type=="load-balancer") | .raw.LoadBalancerArn')
echo "ARN=${ARN}"
export AWS_DEFAULT_REGION=us-east-2
aws elbv2 delete-load-balancer --load-balancer-arn ${ARN}
```
I can see that the cluster is trying to set the `ApiserverAdditionalIPs` in the nodeup config; it's getting a little confused because that address is not known yet, but I don't think that's a real problem.
If we run `go run ./cmd/kops validate cluster ${CLUSTER_NAME} --wait=10m`, we need to wait a few minutes for the NLB to actually start serving, but it does validate at this point. (I had to delete an `aws-node-termination-handler` replicaset, which will be an issue at some point, but I believe it was caused by switching from a release build to a dev build, and I guess we don't include the image sha256 for dev builds?)
The node still has the internal IP address of the control-plane VM, at this point.
So now I kick off the rolling-update:
```shell
go run ./cmd/kops rolling-update cluster ${CLUSTER_NAME}
go run ./cmd/kops rolling-update cluster ${CLUSTER_NAME} --yes
```
We're able to terminate and replace the control-plane VM, but the next problem is that cilium now fails to start on the node:
```
level=fatal msg="failed to start: daemon creation failed: unable to initialize kube-proxy replacement options: Invalid value for --kube-proxy-replacement: true" subsys=daemon
```
Using the ConfigMap with an OnDelete daemonset seems wrong, and I think the kube-proxy-replacement flag is indeed new in cilium 1.14. Not sure how I didn't hit this one before, but I deleted the crash-looping cilium pod to unblock the rolling-update.
And at this point the rolling-update did complete. So not terrible - particularly if we fix the cilium issue which I think we need to do anyway (hoping to figure out why that didn't hit me previously)!
BTW, the TL;DR for 1.28 foo.k8s.local (gossip) with the default public topology is that the normal process works if the user deletes the NLB, modulo two bugs(?) that have nothing to do with the NLB (aws-node-termination-handler and cilium).
I tried the same test with 1.28 foo.k8s.local (gossip) with private topology and got the same results.
I do think creating a second load balancer is a good direction for the project, but we may be able to get away without it for 1.29. That said, the behaviour isn't great (deleting the NLB), so we might still consider it. I'm going to continue testing the big scenarios while also trying to get some of the pre-work refactoring done (that will enable creating a second NLB), along with looking at the two bugs - I think at least the cilium one is real.