Comments (9)
@Wenliang-CHEN Keeping fingers crossed for you -- enjoy the holiday! 🙂
from linkerd2.
@Wenliang-CHEN Happy new year!! Just wanted to make sure this was still on your radar. 🙂
from linkerd2.
This sounds a bit similar to an issue we had where the destination controller could become locked and stop processing service discovery updates. However, this bug was fixed in stable-2.14.2 and should not affect you in stable-2.14.3. In order to rule out that possibility, you could take a look at the endpoints_updates counter metric exposed by the destination controller:
linkerd diagnostics controller-metrics | grep endpoints_updates
You should see this counter incremented when the endpoints of a service change. If, instead, this counter remains at the same value, it means that the destination controller is not processing updates for some reason.
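As a rough sketch of that check, assuming you save the output of the command above twice (once before and once after a rollout of the target service), the two snapshots could be compared like this. The sample lines, the label names, and the helper names sum_counter and counter_advanced are made up for illustration and are not part of Linkerd:

```python
# Hypothetical snapshots of the metrics text; label names and values are
# invented for this sketch and may not match what your cluster reports.
BEFORE = """\
# TYPE endpoints_updates counter
endpoints_updates{namespace="default",service="service-a",port="8080"} 12
endpoints_updates{namespace="default",service="service-b",port="8080"} 3
"""

AFTER = """\
# TYPE endpoints_updates counter
endpoints_updates{namespace="default",service="service-a",port="8080"} 14
endpoints_updates{namespace="default",service="service-b",port="8080"} 3
"""


def sum_counter(metrics_text: str, name: str) -> float:
    """Sum every sample of metric `name` across all of its label sets."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            metric = line[: line.index("{")]
            fields = line[line.rindex("}") + 1 :].split()
            value = fields[0] if fields else None
        else:
            fields = line.split()
            metric = fields[0]
            value = fields[1] if len(fields) > 1 else None
        if metric == name and value is not None:
            total += float(value)
    return total


def counter_advanced(before: str, after: str, name: str) -> bool:
    """Return True if the summed counter grew between the two snapshots."""
    return sum_counter(after, name) > sum_counter(before, name)


if __name__ == "__main__":
    print(counter_advanced(BEFORE, AFTER, "endpoints_updates"))
```

If the result is False even though the endpoints of the service changed between the two snapshots, that would point to the destination controller not processing updates.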
In stable-2.14.4 we added *_informer_lag_secs histogram metrics to the destination controller for even more visibility. If you upgrade to stable-2.14.4 or later you can use these histograms to see if there is a substantial lag between when endpoints are updated in Kubernetes vs when the destination controller processes those updates.
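For reading those histograms, it may help to remember that Prometheus-style bucket series are cumulative: each bucket counts every observation at or below its upper bound. Below is a minimal sketch of turning cumulative buckets into per-bucket counts and a coarse quantile bound. The bucket bounds and counts are invented for illustration, and the function names are hypothetical:

```python
import math

# Hypothetical cumulative buckets as (upper bound in seconds, cumulative count).
# These numbers are invented; they are not taken from a real cluster.
LAG_BUCKETS = [(0.5, 90), (1.0, 97), (5.0, 99), (math.inf, 100)]


def per_bucket_counts(cumulative):
    """Convert cumulative bucket counts into counts per individual bucket."""
    counts = []
    previous = 0
    for bound, total in sorted(cumulative):
        counts.append((bound, total - previous))
        previous = total
    return counts


def quantile_upper_bound(cumulative, q):
    """Return the smallest bucket bound covering at least fraction q of the
    observations, or None if the histogram has no observations at all."""
    ordered = sorted(cumulative)
    total = ordered[-1][1]
    if total == 0:
        return None
    target = q * total
    for bound, count in ordered:
        if count >= target:
            return bound
    return ordered[-1][0]


if __name__ == "__main__":
    print(per_bucket_counts(LAG_BUCKETS))
    print(quantile_upper_bound(LAG_BUCKETS, 0.95))
```

With the made-up numbers above, 95% of the updates were observed with a lag of at most 1 second. A histogram whose buckets all stay at 0 has simply recorded no observations.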
from linkerd2.
Hey @adleong, thanks for the reply.
And yes, I do see the endpoints_updates counter increment after the deployment of the target service (service A). So I guess the destination controller was processing updates.
A couple of things worth mentioning:
- The issue happens about 20 minutes after the deployment of the target service.
- If we restart the deployment that owns the outbound pod, the issue is resolved.
Does that change anything?
As an action item, I think we will try to upgrade to stable-2.14.4 and take a look at *_informer_lag_secs as well.
Meanwhile, if we find anything new, we will report back in this thread.
Thanks!
from linkerd2.
@Wenliang-CHEN Any joy trying with stable-2.14.4? 🙂
from linkerd2.
Hey @kflynn, not yet... it is around the Christmas holiday. I will let you know 😄
There has not been another instance since I reported the issue, but to be safe, we are still observing...
from linkerd2.
Hey @kflynn happy new year!
And yes, we have not forgotten this. We just upgraded to v2.14.9, and so far we have not received any reports of the same issue.
Hopefully the upgrade somehow fixes it. We will monitor it throughout February. If there are no further reports, I think we can close it for now. Thanks!
from linkerd2.
Okay, the issue has happened again.
We were able to collect linkerd.endpoints_updates, linkerd.endpointslices_informer_lag_seconds.bucket and linkerd.endpoints_informer_lag_seconds.bucket.
They seem to follow a pattern: linkerd.endpointslices_informer_lag_seconds.bucket moves together with linkerd.endpoints_updates, and linkerd.endpoints_informer_lag_seconds.bucket is always 0.
We are not sure how to interpret this. Do these patterns mean anything in particular, or are they totally normal?
from linkerd2.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
from linkerd2.