Comments (8)
I believe we have fixed this issue in d18bed8
from capi-release.
@selzoc yep, it was most prominent on bosh-lite.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/157404037
The labels on this github issue will be updated when the story is started.
Additional context
For future CAPI team members, this is how the Cloud Controller hooks up its health check script with the Route Registrar:
- Cloud Controller uses a health check script that is generated by this template
- The CF route-registrar uses this code to run the health check script above
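To make the wiring concrete, here is an illustrative sketch of the route-registrar side. The property names follow the route_registrar job spec, but the values and the script path are hypothetical, not copied from cf-deployment or the templates linked above:

```yaml
# Hypothetical route_registrar job properties (values are illustrative).
route_registrar:
  routes:
  - name: api
    port: 9022
    registration_interval: 20s
    uris:
    - api.example.com
    health_check:
      # Script generated by the cloud_controller_ng job template; the route
      # is unregistered when this script fails or times out.
      name: cloud-controller-health-check
      script_path: /var/vcap/jobs/cloud_controller_ng/bin/health_check  # hypothetical path
      timeout: 20s
```

If the script exits non-zero (or exceeds the timeout), route-registrar unregisters the route, which is what produces the flapping described below.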
Reproduction Steps
I was able to trigger this somewhat on demand by manually patching /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/controllers/runtime/info_controller.rb
to sleep for 21 seconds (slightly longer than cf-deployment's timeout) 50% of the time.
```diff
diff --git a/app/controllers/runtime/info_controller.rb b/app/controllers/runtime/info_controller.rb
index 9da7821ac..7a489f570 100644
--- a/app/controllers/runtime/info_controller.rb
+++ b/app/controllers/runtime/info_controller.rb
@@ -4,6 +4,8 @@ module VCAP::CloudController
     get '/v2/info', :read
 
     def read
+      chance = [1, 2].sample
+      sleep(21) unless chance == 2
       info = {
         name: @config.get(:info, :name),
         build: @config.get(:info, :build),
```
Then, while running `tail -f /var/vcap/sys/log/route_registrar/route_registrar.log`,
I saw the Cloud Controller's route waffle between being registered and unregistered.
Adding some retries and explicit timeouts to our health check script makes this less likely (I didn't observe the route get unregistered during my manual experimentation). That might look something like this:

```shell
curl --retry 3 --retry-delay 1 --max-time 5 http://localhost:9022/v2/info
```
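The same retry-with-timeout idea can be sketched in Ruby (a hypothetical helper, not code from capi-release), mirroring the curl flags above:

```ruby
require 'net/http'

# Hypothetical health check helper mirroring
# `curl --retry 3 --retry-delay 1 --max-time 5`: one initial attempt plus
# up to `retries` extra attempts, `delay` seconds apart, each attempt
# capped at `per_attempt_timeout` seconds.
def healthy?(host: 'localhost', port: 9022, path: '/v2/info',
             retries: 3, delay: 1, per_attempt_timeout: 5)
  (retries + 1).times do |attempt|
    begin
      http = Net::HTTP.new(host, port)
      http.open_timeout = per_attempt_timeout
      http.read_timeout = per_attempt_timeout
      return true if http.get(path).is_a?(Net::HTTPSuccess)
    rescue StandardError
      # Connection refused, timeout, etc. -- fall through and retry.
    end
    sleep(delay) unless attempt == retries
  end
  false
end
```

With this shape, a single transient slow response costs a few extra seconds of retrying rather than immediately unregistering the route.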
Our monit health check, which functions similarly, will wait up to sixty seconds (5 monit cycles):

capi-release/jobs/cloud_controller_ng/monit, lines 36 to 37 in c729f53
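For comparison, a generic monit stanza of the kind those lines contain looks roughly like this. The port, endpoint, and pidfile path here are illustrative, not copied from the release:

```
check process cloud_controller_ng
  with pidfile /var/vcap/sys/run/cloud_controller_ng/cloud_controller_ng.pid
  # Illustrative check: tolerate up to 5 failed cycles (~60s) before restarting.
  if failed host 127.0.0.1 port 9022 protocol http
     request "/v2/info" with timeout 60 seconds for 5 cycles
  then restart
```

The "for 5 cycles" clause is what gives monit its tolerance for transient slowness, which the route-registrar health check currently lacks.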
Occasionally the CAPI healthcheck fails for reasons that seem to have nothing to do with the status of the API itself.
We'd like to understand a little better how and why the curl is failing in your environment, since a healthy API shouldn't take that long to respond. Is your environment under a significant amount of load? Do other requests get fielded in a reasonable amount of time? My repro steps above are a bit contrived, and it's possible that adding retries might just exacerbate the actual underlying issue.
Bumping this for the question above ^^
@ishustava @njbennett
Sorry for the delay.
We don't have any more error output other than the route_registrar.log
posted above. At the time we could not find errors in the Cloud Controller logs, so I suspect that curl might have failed for other reasons.
Unfortunately, we could not get enough information from the logs to reproduce the problem, and were hoping to add retries as a preventative measure.
@ishustava @njbennett was this on a bosh-lite? It could make sense that what @tcdowney was asking about (a healthy API node taking over) wouldn't apply in that case, since the instance group is scaled to 1.