
Comments (8)

selzoc commented on June 15, 2024

I believe we have fixed this issue in d18bed8

from capi-release.

ishustava commented on June 15, 2024

@selzoc yep, it was most prominent on bosh-lite.


cf-gitbot commented on June 15, 2024

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/157404037

The labels on this github issue will be updated when the story is started.


tcdowney commented on June 15, 2024

Additional context

For future CAPI team members, this is how the Cloud Controller hooks up its health check script with the Route Registrar:

  1. Cloud Controller uses a health check script that is generated by this template
  2. The CF route-registrar uses this code to run the health check script above

Reproduction Steps

I was able to trigger this somewhat on demand by manually patching /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/controllers/runtime/info_controller.rb to sleep for 21 seconds (slightly longer than cf-deployment's timeout) 50% of the time.

diff --git a/app/controllers/runtime/info_controller.rb b/app/controllers/runtime/info_controller.rb
index 9da7821ac..7a489f570 100644
--- a/app/controllers/runtime/info_controller.rb
+++ b/app/controllers/runtime/info_controller.rb
@@ -4,6 +4,8 @@ module VCAP::CloudController

     get '/v2/info', :read
     def read
+      chance = [1, 2].sample
+      sleep(21) unless chance == 2
       info = {
         name: @config.get(:info, :name),
         build: @config.get(:info, :build),

Then while running tail -f /var/vcap/sys/log/route_registrar/route_registrar.log I see the Cloud Controller's route waffle between being registered and unregistered.

Adding some retries and explicit timeouts to our health check script makes this less likely (I didn't observe it get unregistered during my manual experimentation). This might look something like this:

curl --retry 3 --retry-delay 1 --max-time 5 http://localhost:9022/v2/info
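
For readers who want the retry behavior spelled out rather than delegated to curl's flags, here is a minimal sketch in bash. The function name `check_with_retries` and its argument order are assumptions made for illustration; this is not the health check script that capi-release ships.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of capi-release): run a command up to
# `attempts` times, sleeping `delay` seconds between failed tries.
# Usage: check_with_retries <attempts> <delay_seconds> <command> [args...]
check_with_retries() {
  local attempts="$1" delay="$2"
  shift 2

  local try
  for ((try = 1; try <= attempts; try++)); do
    # Succeed as soon as one attempt exits with status 0.
    if "$@"; then
      return 0
    fi
    # Only pause when another attempt is still to come.
    if ((try < attempts)); then
      sleep "$delay"
    fi
  done

  # Every attempt failed.
  return 1
}

# Example invocation (commented out so the sketch has no side effects).
# It uses the endpoint and 5-second per-attempt timeout from the curl
# command above, with at most 3 attempts in total:
# check_with_retries 3 1 curl --silent --fail --max-time 5 http://localhost:9022/v2/info
```

The worst-case run time grows with the number of attempts times the per-attempt timeout, plus the delays, so these numbers need to fit inside whatever timeout the caller of the health check enforces.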

Our monit health check, which functions similarly, will wait up to 60 seconds for 5 monit cycles:

and request '/v2/info'
with timeout 60 seconds for 5 cycles


tcdowney commented on June 15, 2024

@ishustava @njbennett

Occasionally the CAPI healthcheck fails, for reasons that seem to have nothing to do with the status of the API itself.

We'd like to understand a little better how and why the curl is failing in your environment since a healthy API shouldn't take over. Is your environment under a significant amount of load? Do other requests get fielded in a reasonable amount of time? My repro steps above are a bit contrived and it's possible that adding retries might just exacerbate the actual underlying issue.


selzoc commented on June 15, 2024

Bumping this for question above ^^
@ishustava @njbennett


ishustava commented on June 15, 2024

Hey @tcdowney @selzoc,

Sorry for the delay.

We don't have any more error output other than the route_registrar.log posted above. At the time we could not find errors in cloud controller logs, so I suspect that curl might have failed for other reasons.

Unfortunately, we could not get enough information from the logs to reproduce the problem, and we were hoping that we could add retries as a preventative measure.


selzoc commented on June 15, 2024

@ishustava @njbennett was this on a bosh-lite? It would make sense that @tcdowney's question about a healthy API node taking over wouldn't apply in that case, since the instance group is scaled to 1.

