Comments (23)
Bumping this up as it hasn't seen any love in a while. This is super useful to my company, as we would like to be able to operate hybrid clusters (openstack and bare-metal in our case) while still being able to use cloud-controller-manager.
I'd be happy to contribute to this effort, just don't know where to start. A KEP, perhaps?
from cloud-provider.
@andrewsykim There are some scenarios in which the CCM should ignore nodes, e.g. virtual-kubelet, edge nodes, and datacenter nodes in a hybrid cluster.
We should come up with a more generic way to ignore those nodes.
The Alibaba cloud provider uses service.beta.kubernetes.io/exclude-node in node labels to exclude a node from the CCM.
Any thoughts?
from cloud-provider.
AFAIK there's still none!
from cloud-provider.
Nevermind the v2 code in AWS CCM. That one is on ice and probably should be removed. But I doubt v1 is any better. But I am happy to support changes in this direction.
However, the more generic support (the mentioned flag and logic for whether the CCM interface is being interacted with) should be added to this repo. If we are lucky, it might be that all CCMs using this lib don't need any changes then.
from cloud-provider.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
from cloud-provider.
/remove-lifecycle stale
from cloud-provider.
/lifecycle frozen
from cloud-provider.
@timoreimann I recall having a conversation about supporting multiple CCMs in a cluster, this is somewhat related. Are you interested in doing this work?
from cloud-provider.
@andrewsykim yes, I'm very interested as it'd help us at DigitalOcean to ease testing. Though my intent is to go beyond just nodes and include load balancers as well. kubernetes/kubernetes#88820 is the ticket I filed for the wider purpose, and kubernetes/kubernetes#88820 (comment) has the summary of our discussion in one of the SIG meetings.
Feel free to assign me to either / all tickets.
from cloud-provider.
Hi everybody. I'm also looking for a way to ignore some nodes on AWS. Does anyone know of a solution?
from cloud-provider.
It comes down to how to identify when a node is owned by which CCM. AWS has some notion that nodes should be prefixed either with ip- or i-, but that is a poor heuristic.
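To make the weakness of the prefix heuristic concrete, here is a minimal Go sketch. The function name `looksLikeAWSNode` and the example host names are made up for illustration and are not taken from the AWS CCM code; the sketch only assumes the two prefixes mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeAWSNode applies the name-prefix heuristic described above:
// a node is assumed to belong to AWS if its name starts with "ip-" or "i-".
// This is an illustrative stand-in, not the AWS CCM implementation.
func looksLikeAWSNode(nodeName string) bool {
	return strings.HasPrefix(nodeName, "ip-") || strings.HasPrefix(nodeName, "i-")
}

func main() {
	names := []string{
		"ip-10-0-1-23.ec2.internal", // AWS-style private DNS name
		"i-0123456789abcdef0",       // instance-ID-style name
		"ip-camera-gateway",         // hypothetical bare-metal host, wrongly matched
		"worker-1",                  // hypothetical on-prem host
	}
	for _, name := range names {
		fmt.Printf("%-28s aws=%v\n", name, looksLikeAWSNode(name))
	}
}
```

Any non-AWS machine whose hostname happens to start with one of the prefixes is claimed by the AWS CCM, and an AWS node with a custom hostname is missed, which is why an explicit ownership hint is being discussed.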
It may be that a KEP is needed to introduce a flag to kubelet that will add something to the created node object hinting at what CCM should own it, and that all CCMs then implement support for ignoring the hint if set to another value than itself.
Not too unrelated to this is the ability to run multiple AWS CCMs for having nodes in multiple regions or accounts.
from cloud-provider.
That's a good point, it does mesh really nicely with allowing multiple CCMs (AWS or otherwise) to manage a single cluster.
I was more approaching the idea of having an annotation on a node that indicates which CCM it should belong to, but we'd need a reproducible(?) way to identify CCMs... could be done as a simple argument to the CCM, or...?
from cloud-provider.
Sounds like something similar to LoadBalancerClass and IngressClass.
from cloud-provider.
Yeah, feels very similar. I like that parallel a lot.
from cloud-provider.
Hi all, any update on this issue?
from cloud-provider.
How are people doing multi-cloud kubernetes clusters without this solved?
It comes down to how to identify when a node is owned by which CCM. AWS has some notion that nodes should be prefixed either with ip- or i-, but that is a poor heuristic.
Why attempt to do it based on node name? Instead, do it based on a label or annotation:
- Nominate a new label to be used on nodes, e.g. node.kubernetes.io/cloud-provider: aws
- This label should be added by users via extra arguments to kubelet
- A CCM should never initialize or delete a node with a label that doesn't match its own --cloud-provider argument
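The rule in the list above could be sketched in Go roughly as follows. This is only an illustration of the proposal: the label key is the one suggested in this comment and is not an existing Kubernetes label, `shouldManage` is a hypothetical helper, and treating unlabeled nodes as manageable is an assumption (the proposal only rules out nodes whose label does not match).

```go
package main

import "fmt"

// providerLabel is the label key proposed in this thread; it is a suggestion,
// not a label that Kubernetes or any CCM defines today.
const providerLabel = "node.kubernetes.io/cloud-provider"

// shouldManage reports whether a CCM started with the given --cloud-provider
// value may initialize or delete a node carrying these labels.
// Assumption: a node without the label is still managed, which mirrors
// current behaviour; only a mismatching label makes the CCM back off.
func shouldManage(nodeLabels map[string]string, cloudProvider string) bool {
	owner, labeled := nodeLabels[providerLabel]
	if !labeled {
		return true
	}
	return owner == cloudProvider
}

func main() {
	awsNode := map[string]string{providerLabel: "aws"}
	onPremNode := map[string]string{providerLabel: "baremetal"}
	unlabeled := map[string]string{}

	fmt.Println(shouldManage(awsNode, "aws"))    // label matches
	fmt.Println(shouldManage(onPremNode, "aws")) // label names another owner
	fmt.Println(shouldManage(unlabeled, "aws"))  // no hint present
}
```

Whether unlabeled nodes should be managed or ignored is a design choice the thread leaves open; flipping the `!labeled` branch to `false` would give an opt-in model instead.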
from cloud-provider.
This label should be added by users via extra arguments to kubelet
I don't think the underlying machine should be trusted to set this correctly for the same reason other k8s-namespaced labels are not allowed. It should be done by the provisioning/installer mechanism that handles things like the role labels.
A CCM should never initialize or delete a node with a label that doesn't match its own --cloud-provider argument
If one would want multi-region AWS, one would need multiple AWS CCMs, so this doesn't quite work. But some similar flag certainly.
from cloud-provider.
How I solved this issue:
- Use Talos as the Kubernetes solution.
- Talos CCM only initializes the nodes and sets the ProviderID string.
- The native CCM (from the cloud provider) is launched only with --controllers=cloud-node-lifecycle
I did not try to use routing/load balancing through Kubernetes resources, and I think it would be very complicated.
from cloud-provider.
Interesting idea @sergelogvinov; I'm already using talos so trying to figure out how that would work.
Native CCM (from cloud provider) launch only as --controllers=cloud-node-lifecycle
Looking at the AWS v2 code at least, InstanceExists returns false for any non-AWS nodes: https://github.com/kubernetes/cloud-provider-aws/blob/10ec1f461d50e7413fa8c97baefd8db24c1f9d8a/pkg/providers/v2/instances.go#L77
But in the cloud-node-lifecycle controller, doesn't it proceed to delete the node as soon as InstanceExists returns false?
- called by
- which would proceed to delete the node at
Do you actually have this working?
from cloud-provider.
I have not tried AWS yet; it is on my near-term to-do list.
Unfortunately, sometimes you need to add a few if/else lines to the native CCM.
To save time, you can check out https://github.com/sergelogvinov/terraform-talos (this is my research).
from cloud-provider.
Nevermind the v2 code in AWS CCM. That one is on ice and probably should be removed.
oh? I didn't realise it wasn't ready for use. Could you share some info on that?
Are you speaking of v2 code in general? or AWS in particular?
But I doubt v1 is any better.
Indeed, if the instance is not found for the current cloud provider then ensureNodeExistsByProviderID likewise returns false and the same node deletion should happen (as far as I understand from reading the code).
cloud-provider/controllers/nodelifecycle/node_lifecycle_controller.go
Lines 235 to 236 in 97fdc45
However, the more generic support (the mentioned flag and logic for whether the CCM interface is being interacted with) should be added to this repo. If we are lucky, it might be that all CCMs using this lib don't need any changes then.
The logic of ensureNodeExistsByProviderID returning false leading to node deletion seems to be an issue that must be fixed in the cloud node lifecycle controller.
from cloud-provider.
Nevermind the v2 code in AWS CCM. That one is on ice and probably should be removed.
oh? I didn't realise it wasn't ready for use. Could you share some info on that? Are you speaking of v2 code in general? or AWS in particular?
v2 was an idea to make CCM more modern using CRDs for configuration and such. But as you can see from the git history, pretty much nothing has happened to it, while v1 is more actively maintained.
AWS CCM should absolutely be used in favour of the in-tree provider. kOps has been using it by default since 1.24.
But I doubt v1 is any better.
Indeed, if the instance is not found for the current cloud provider then ensureNodeExistsByProviderID likewise returns false and the same node deletion should happen (as far as I understand from reading the code).
cloud-provider/controllers/nodelifecycle/node_lifecycle_controller.go
Lines 235 to 236 in 97fdc45
However, the more generic support (the mentioned flag and logic for whether the CCM interface is being interacted with) should be added to this repo. If we are lucky, it might be that all CCMs using this lib don't need any changes then.
The logic of ensureNodeExistsByProviderID returning false leading to node deletion seems to be an issue that must be fixed in the cloud node lifecycle controller.
I am thinking this should not be called if the node has a different label/class than what's passed in the flag.
from cloud-provider.
I have been thinking about this issue, as I currently run an on-prem plus Vultr setup and every so often the Vultr CCM deletes all the on-prem nodes. I want to add additional providers/regions.
I think the core issue comes down to node provenance and attestation in the node lifecycle controller, i.e. can it trust node-supplied data saying that the node is in a cloud and managed by another CCM, or that it is manually configured and should be left alone?
This led me to think of using SPIRE/SPIFFE attestation. However, if pre-joined, spire-agent may not be deployed, and creating a hard dependency on SPIFFE might not fit all environments. Attaching information to the node object and validating it can, I think, be done with the existing machinery of admission/validation webhooks with object filtering (i.e. route delete requests via a finalizer), possibly hijacking the token review to validate that the node object was modified by a trusted CCM.
Thoughts?
from cloud-provider.
Related Issues (20)
- Pass cli flags to cloud provider registration HOT 5
- controllermanager.go version is not shown in the log correctly HOT 3
- Labeling LoadBalancer service doesn't invoke EnsureLoadBalancer logic HOT 5
- UpdateLoadBalancer target services are not deterministic when starting process HOT 5
- Kubelet no longer restricts InternalIP to --node-ip after upgrade to CCM HOT 16
- node lifecycle controller should delete node if failed to check if node is shutdown because of 404
- LoadBalancer controller: nodes listing with externalTrafficPolicy == "local" HOT 7
- Service Controller can call provider EnsureLoadBalancer with a dirty Service object HOT 11
- [Feature discussion] Loadbalancer support to route traffic directly to Pods instead of NodePort. HOT 5
- Usage of IPs returned by `InstancesV2().InstanceMetadata()`, and interaction with `--node-ip` HOT 14
- app.NewCloudControllerManagerCommand additionalFlags not working as expected HOT 6
- Many CCM in one cluster. HOT 7
- Meaning of HasClusterID() ? HOT 7
- Gateway API integration HOT 4
- Node deletion via CCM HOT 5
- RFE: Ability to return arbitrary node labels from cloud provider HOT 6
- finer grained logs in cloud-provider libs HOT 4
- Implementing a cloud-controller-manager without golang (i.e. as an REST API server or in another language HOT 5
- Prevent Empty ProviderID in CloudNodeLifecycleController HOT 2
- HasClusterID() and allow-untagged-cloud