
Comments (10)

kdlee avatar kdlee commented on May 20, 2024 1

@pragyamehta, thanks for looking into it and your thoughts.
Fortunately, the service I'm replacing is a tiny one for now, so a script that quickly destroys and recreates it (it takes just a few seconds, plus a few more to re-establish the bridge) about once every day or two, whenever the bridge connection goes bad, saves more time than killing the bridge connection after every debug session.
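For illustration only, a destroy-and-recreate helper of the kind described could be sketched like this; the manifest name is hypothetical, not taken from this thread.

```shell
# Hypothetical sketch of a destroy-and-recreate helper.
# "my-service.yaml" is an illustrative manifest name; substitute your own.
MANIFEST="my-service.yaml"

# Guard so the sketch is a no-op where kubectl is unavailable.
if command -v kubectl >/dev/null 2>&1; then
  # Tear down the service, then recreate it from the manifest.
  kubectl delete -f "$MANIFEST" --ignore-not-found
  kubectl apply -f "$MANIFEST"
fi
```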
But looking forward to bug fixes and added features for this tool. It's a great tool for the whole Kubernetes ecosystem!

from mindaro.

rakeshvanga avatar rakeshvanga commented on May 20, 2024 1

@irperez Thanks for the suggestion and for providing a detailed explanation.
I've created a User Story for us to track this feature and will post more questions if required.


greenie-msft avatar greenie-msft commented on May 20, 2024

Hi @kdlee

Thank you for reporting this issue. I'm logging a bug on our side to investigate further.

Just to confirm the reproduction steps:

  1. Connect to the cluster and replace a service
  2. After debugging, end the debug session but keep the connection to the cluster live.
  3. After the machine goes into a sleep state and the workstation is locked, unlocking and starting a new debug session fails.

It would be helpful if you could attach your Bridge logs:

  1. Delete your existing logs by removing the Bridge to Kubernetes directory from your TEMP directory:
  • For Windows: %TEMP%\Bridge to Kubernetes
  • For OSX/Linux: $TMPDIR/Bridge to Kubernetes
  2. Reproduce the issue
  3. Navigate to TEMP/Bridge to Kubernetes and attach the logs:

  • bridge-library
  • bridge-mindarocli
  • bridge-endpointmanager
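On macOS/Linux, the collection steps above amount to roughly the following sketch (on Windows, the directory is %TEMP%\Bridge to Kubernetes instead):

```shell
# Location of the Bridge to Kubernetes logs on macOS/Linux.
LOGDIR="${TMPDIR:-/tmp}/Bridge to Kubernetes"

# 1. Delete the old logs.
rm -rf "$LOGDIR"

# 2. Reproduce the issue in VS Code, then:

# 3. List the log files to attach (no-op if the directory is absent).
ls "$LOGDIR" 2>/dev/null | grep -E 'bridge-(library|mindarocli|endpointmanager)' || true
```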

Thank you,


kdlee avatar kdlee commented on May 20, 2024

Hello @greenie-msft
I was just able to get a repro and attached the log files.

For the repro, step 3 may or may not be related.
I've tried locking the machine, putting it to sleep, switching users, etc., and that didn't appear to repro.
But after some dev time it got into that state again, so it may be more related to some kind of timeout somewhere.

Thanks for looking into this.

bridge-library-2020-09-30-19-44-03-30368.txt
bridge-mindarocli-2020-09-30-19-44-02-30368.txt
bridge-endpointmanager-2020-09-30-19-44-15-38224.txt

Here's an extra set of logs that finished writing after force disconnecting:
bridge-library-2020-09-30-19-44-09-37732.txt
bridge-mindarocli-2020-09-30-19-44-08-37732.txt


greenie-msft avatar greenie-msft commented on May 20, 2024

Thanks for attaching your logs @kdlee. Our engineering team will investigate your issue and I will respond to the thread once I have an update to share.

Thanks,
Nick


pragyamehta avatar pragyamehta commented on May 20, 2024

Hi @kdlee Thanks for reaching out! While we are investigating this, I wanted to provide you with steps to unblock you.

  1. In your VS Code window, please navigate to View -> Command palette -> Preferences: Open Settings (UI)
  2. Under User -> Extensions -> Kubernetes Debugging tools -> please check the below checkbox
    [screenshot: the checkbox to enable in the settings UI]

Please try to reproduce this issue with these settings and let us know if you still face issues.


kdlee avatar kdlee commented on May 20, 2024

Thanks for the suggestion.
But wouldn't doing what you suggested defeat the purpose of using that option in the first place?

With shorter bridge sessions (disconnecting after every debug session so the bridge containers get destroyed and recreated), I would imagine the issue is harder to repro. This feels like a few missing error-handling bugs in the bridge container or on the VS Code client side that prevent graceful automatic reconnection; fixing them would make this a much better experience and time saver.

I do have the workaround I mentioned above, but I would prefer a more reliable sustained bridge rather than taking the iteration-time hit of destroying and re-establishing the bridge every time (especially with the UAC dialogs).


pragyamehta avatar pragyamehta commented on May 20, 2024

@kdlee, yes, I hear you. We are still treating this as a bug on our side and will be working on it. My suggestion to check the option was just to make sure you have a workaround with minimal pain. (I would imagine deleting and recreating the Kubernetes service each time is a painful process for you.)

I will update this thread once I have an update on the bug. Thanks for your patience!


pragyamehta avatar pragyamehta commented on May 20, 2024

Thank you for using the product and helping us improve it!


irperez avatar irperez commented on May 20, 2024

@pragyamehta @greenie-msft

We have been having this issue (or something similar) for almost a year now in Visual Studio, and I think we found the root cause. I'm wondering if there is a way for you to fix it by changing the deployment when Bridge starts up.

And to be clear, we not only have connectivity issues with 503s or "no healthy upstream" (with Istio); we also get into a state where the deployment of the service we're debugging gets stuck with the bridge image and never rolls back to the original.

The root cause for our team is that we have a liveness probe defined. We also have a readiness probe, but the probe causing the connectivity issues is the liveness probe. The problem occurs when we hit a breakpoint and hold it beyond the "periodSeconds" threshold defined in the probe. Here is how our liveness probe is defined.

[screenshot: the team's liveness probe definition]
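The screenshot itself is not preserved here. For illustration only, an HTTP liveness probe of the kind described (one with a "periodSeconds" threshold) typically looks like the fragment below; all paths, ports, and timings are assumptions, not values from this thread.

```yaml
# Illustrative liveness probe; every value here is an assumption.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10     # a breakpoint held longer than this causes probe failures
  failureThreshold: 3   # after 3 consecutive failures, kubelet restarts the container
```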

The reason the liveness probe causes connectivity issues is how K8s treats the pod after the probe fails: it restarts the container. The readiness probe does not do that, if I'm not mistaken. So when that happens, the pod stays in a bad state, stuck with the bridge image instead of going back to the original image.

Our developers also start to see 503 "no healthy upstream" errors after this point (because we use Istio proxy sidecars).

Now when we remove only the liveness probe (leaving the readiness probe in), we do not get these issues and debugging is fine. We never lose the connection, even after holding a breakpoint for a long time. K8s detects issues with the readiness probe as expected, but that doesn't break the connection.

I think the solution is: when you modify the deployment and inject the bridge image, it might be good to remove the livenessProbe section altogether and restore it when the debugging session is done.

For now we are going to handle it by removing the liveness probes in our dev environment.
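As a stopgap of the kind described, one way to strip a liveness probe without editing the manifest is a JSON patch against the deployment; the deployment name and container index below are hypothetical, not from this thread.

```shell
# JSON patch that removes the livenessProbe from the first container
# of a hypothetical deployment named "my-service".
PATCH='[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'

# Guard so the sketch is a no-op where kubectl is unavailable.
if command -v kubectl >/dev/null 2>&1; then
  kubectl patch deployment my-service --type=json -p "$PATCH"
fi
```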

Hope that helps.

