Comments (11)
Related: kubernetes/kubernetes#107631 (review)
from cloud-provider.
Related (but separate), I think we should remove the serviceCache. The cache was added in kubernetes/kubernetes@fc08a0a in 2015, and I suspect that the reasons it was originally added no longer apply. It is used to update Service objects in response to changes to Nodes.
I suspect the reason the cache was added was to avoid fetching Service objects again from the API server. However, we already have these objects cached in the serviceInformer. I believe we can entirely replace this cache with calls to the serviceInformer. We can track failed reconciles by key and fetch them from the informer.
It adds unnecessary complexity to the code and could easily become another source of subtle bugs.
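A minimal sketch of the "track failed reconciles by key" idea, using only the standard library. Everything here is an assumption for illustration: `Service`, `serviceLister`, `mapLister`, and `retryTracker` are hypothetical stand-ins, not the real cloud-provider or client-go types. The point is that only keys are remembered, and the object is looked up again from the informer-backed store when a retry is due.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Service is a hypothetical stand-in for v1.Service.
type Service struct {
	Namespace string
	Name      string
}

// serviceLister is a hypothetical stand-in for an informer-backed lister.
type serviceLister interface {
	Get(key string) (*Service, bool)
}

// mapLister is a toy lister backed by a map, standing in for the informer store.
type mapLister map[string]*Service

func (m mapLister) Get(key string) (*Service, bool) {
	s, ok := m[key]
	return s, ok
}

// retryTracker remembers only the keys of services whose last sync failed.
type retryTracker struct {
	mu     sync.Mutex
	failed map[string]struct{}
}

func newRetryTracker() *retryTracker {
	return &retryTracker{failed: map[string]struct{}{}}
}

func (r *retryTracker) markFailed(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.failed[key] = struct{}{}
}

func (r *retryTracker) markSucceeded(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.failed, key)
}

// servicesToRetry resolves the failed keys against the lister, in sorted key
// order. Keys whose service no longer exists are forgotten.
func (r *retryTracker) servicesToRetry(l serviceLister) []*Service {
	r.mu.Lock()
	defer r.mu.Unlock()
	keys := make([]string, 0, len(r.failed))
	for k := range r.failed {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var out []*Service
	for _, k := range keys {
		svc, ok := l.Get(k)
		if !ok {
			delete(r.failed, k)
			continue
		}
		out = append(out, svc)
	}
	return out
}

func main() {
	lister := mapLister{"default/a": &Service{Namespace: "default", Name: "a"}}
	tracker := newRetryTracker()
	tracker.markFailed("default/a")
	tracker.markFailed("default/gone") // deleted since the failed attempt
	for _, svc := range tracker.servicesToRetry(lister) {
		fmt.Println("retry:", svc.Namespace+"/"+svc.Name)
	}
}
```

With this shape there is no second copy of the Service to keep in step with the informer; a retry always sees whatever the informer currently holds.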
It's possible that bug fix would address your problem though? But there's a follow-up to be had where we remove the cache as you suggested
That's the incidental problem, not the main one, though!
It's possible that bug fix would address your problem though? But there's a follow-up to be had where we remove the cache as you suggested
No, I've confused this issue by discussing 2 related but different things. The serviceCache is simply confusing now. It's not the source of the bug.
You can see it here: https://github.com/kubernetes/kubernetes/blob/349900472a38a29fd6d85f7e4880d4f3d72ad6ee/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L346-L347
Note that we updated the cache with the service object we were passed from the informer, before passing the object from the informer to syncLoadBalancerIfNeeded(). The dirty object came from the informer, not the serviceCache. We never actually read from the serviceCache in this code path, only write to it.
I've thrown up a placeholder PR which hopefully demonstrates what I think the problem is. It's completely untested! I'll try to work on it properly tomorrow.
I have manually verified that kubernetes/kubernetes#109601 fixes the issue: I'm now more confident that the issue is that we're modifying the object returned by the informer when we should not. I'll explain how I tested it in the PR, and look for a practical way to add an automated test.
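To illustrate the class of problem described here, the following is a small standard-library sketch, not the actual controller code. `Service`, `store`, `markSynced`, and the `example/synced` annotation are all hypothetical. It shows why writing to a pointer obtained from a shared cache changes what every other reader of that cache sees, and how copying first avoids that.

```go
package main

import "fmt"

// Service is a hypothetical stand-in for v1.Service.
type Service struct {
	Name        string
	Annotations map[string]string
}

// DeepCopy mirrors the shape of the generated DeepCopy methods on API types.
func (s *Service) DeepCopy() *Service {
	if s == nil {
		return nil
	}
	out := &Service{Name: s.Name}
	if s.Annotations != nil {
		out.Annotations = make(map[string]string, len(s.Annotations))
		for k, v := range s.Annotations {
			out.Annotations[k] = v
		}
	}
	return out
}

// store is a toy stand-in for the informer's shared cache.
type store map[string]*Service

// markSynced is a hypothetical step that writes to the object it is given.
func markSynced(svc *Service) {
	if svc.Annotations == nil {
		svc.Annotations = map[string]string{}
	}
	svc.Annotations["example/synced"] = "true"
}

// syncUnsafe mutates the cached object in place.
func syncUnsafe(c store, key string) *Service {
	svc := c[key]
	markSynced(svc)
	return svc
}

// syncSafe works on a private copy and leaves the cached object untouched.
func syncSafe(c store, key string) *Service {
	svc := c[key].DeepCopy()
	markSynced(svc)
	return svc
}

func main() {
	cache := store{"default/web": &Service{Name: "web"}}

	syncSafe(cache, "default/web")
	fmt.Println("after safe sync, cached annotations:", len(cache["default/web"].Annotations))

	syncUnsafe(cache, "default/web")
	fmt.Println("after unsafe sync, cached annotations:", len(cache["default/web"].Annotations))
}
```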
I decided to quickly audit the other cloud-provider controllers looking for similar problematic uses of objects returned from an informer:
Node controller looks ok. Looks like it always copies or re-fetches before modifying.
Node lifecycle controller passes shallow-copied Node from informer to:
Route controller looks ok. Looks like it always copies or re-fetches before modifying.
This is quite hard to audit manually. It feels ripe for a 'safe' wrapper used consistently across the codebase. Without anything like Rust's ability to separate mutable from immutable references automatically, it might be safer to simply always copy these objects before use.
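One possible shape for such a 'safe' wrapper, sketched with Go generics and the standard library only. This is an assumption about what a wrapper could look like, not an existing client-go or cloud-provider API: `deepCopier`, `getter`, `copyingLister`, `Node`, and `mapStore` are hypothetical names. The wrapper hands out a deep copy on every `Get`, so callers cannot reach the shared object at all.

```go
package main

import "fmt"

// deepCopier is satisfied by any type with a generated-style DeepCopy method.
type deepCopier[T any] interface {
	DeepCopy() T
}

// getter is a hypothetical stand-in for the Get method of a lister.
type getter[T any] interface {
	Get(key string) (T, bool)
}

// copyingLister wraps a getter and returns a private copy of every object.
type copyingLister[T deepCopier[T]] struct {
	inner getter[T]
}

func (c copyingLister[T]) Get(key string) (T, bool) {
	obj, ok := c.inner.Get(key)
	if !ok {
		var zero T
		return zero, false
	}
	return obj.DeepCopy(), true
}

// Node is a hypothetical stand-in for v1.Node.
type Node struct {
	Name   string
	Labels map[string]string
}

// DeepCopy returns an independent copy, including the Labels map.
func (n *Node) DeepCopy() *Node {
	if n == nil {
		return nil
	}
	out := &Node{Name: n.Name}
	if n.Labels != nil {
		out.Labels = make(map[string]string, len(n.Labels))
		for k, v := range n.Labels {
			out.Labels[k] = v
		}
	}
	return out
}

// mapStore is a toy stand-in for the informer's shared cache.
type mapStore map[string]*Node

func (m mapStore) Get(key string) (*Node, bool) {
	n, ok := m[key]
	return n, ok
}

func main() {
	shared := mapStore{"node-1": &Node{Name: "node-1", Labels: map[string]string{"zone": "a"}}}
	safe := copyingLister[*Node]{inner: shared}

	n, _ := safe.Get("node-1")
	n.Labels["zone"] = "b" // only the caller's copy changes

	fmt.Println("shared zone:", shared["node-1"].Labels["zone"])
	fmt.Println("caller zone:", n.Labels["zone"])
}
```

The trade-off is an allocation per read, which is the cost of not having the compiler enforce read-only access.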
I think up to this point it was implied that the Service object passed from the controller should be read-only, but agreed that passing a deep copy is probably the safer thing to do, since it's not always obvious.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale