Comments (24)

trastle avatar trastle commented on July 21, 2024

This guy seems to have had a similar error in the past:
https://gist.github.com/grenzr/6068943

from cf-services-contrib-release.

andypiper avatar andypiper commented on July 21, 2024

that guy is Ryan, and he's London-based too :-)

trastle avatar trastle commented on July 21, 2024

Thanks @andypiper, I'll see if Ryan has any details.

mhoran avatar mhoran commented on July 21, 2024

This is likely related to work the services team is doing which will
ultimately make provision and bind requests asynchronous. However, at the
moment these requests are synchronous and subject to the ELB timeout.

/cc @matt-royal

trastle avatar trastle commented on July 21, 2024

Thanks @mhoran, is there any way I can test whether the ELB timeout is causing the error?

I've upped the timeouts in my service broker. However, from the logs it looks as if something on the Node is failing quite quickly, but the error is only sent back to the client after the timeout is exceeded.

I cannot see any real activity on the RabbitMQ node during service creation. I am not certain of the expected activity on the Node and how best to monitor for it.
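
One rough way to approach the question above is to time the failing call and compare the result with the configured timeouts. If the failure always arrives right at the timeout, that may suggest the timeout is only reporting an earlier failure rather than causing it. A minimal shell sketch, where `elapsed_seconds` is a hypothetical helper (not part of any Cloud Foundry tooling) and `sleep 1` stands in for the real provisioning command:

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
# The command's own output is discarded; its exit status is passed through.
elapsed_seconds() {
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo $((end - start))
    return "$status"
}

# Stand-in for the real call; replace "sleep 1" with the command under test.
elapsed_seconds sleep 1
```

The resolution is whole seconds, which should be enough when the timeouts in question are tens of seconds long.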

mhoran avatar mhoran commented on July 21, 2024

It does indeed look like the failure is at the node level. I'm unfamiliar
with the Rabbit node, but if requests are getting routed through the load
balancer, that could be to blame.

Raising the ELB timeout requires an Amazon support ticket to be opened. I
assumed you're using Amazon. If that's not the case then it's likely
something else.

trastle avatar trastle commented on July 21, 2024

@mhoran I am using vSphere. Sorry, I should have said :-)

grenzr avatar grenzr commented on July 21, 2024

decloaks

Hello there.. Yeah, had that issue some months ago, although can't quite remember what I did to fix it.
Let me have a look at my current installation and see what differs between our configs.

Meanwhile, have you looked at the actual Rabbit service logs, not the warden container logs, but the actual service instance itself that runs inside that container?

Ryan

drnic avatar drnic commented on July 21, 2024

If you are running the syslog_aggregator, you can ssh into it and find all
the aggregated logs in /var/vcap/store/log

grenzr avatar grenzr commented on July 21, 2024

I don't have my service logs being syslog_aggregated yet.. :P I just bosh ssh'd into my node_rabbit and looked in /var/vcap/store/rabbit/instances

trastle avatar trastle commented on July 21, 2024

Thanks @drnic and @grenzr
I have nothing in my instances, so the instance logs might be a challenge. As best I can tell none are being started.

$ hammer@cloudhammer [rdg-2-bxb] cf-install $ bosh ssh rabbit_service_node 0
bosh_mvjls4kx2@e7f511a4-5e93-490e-8ec5-577fdcc930c2:~$ cd /var/vcap/store/rabbit/instances
bosh_mvjls4kx2@e7f511a4-5e93-490e-8ec5-577fdcc930c2:/var/vcap/store/rabbit/instances$ ls -la
total 8
drwxr-xr-x 2 root root 4096 2013-10-16 12:48 .
drwxr-xr-x 4 root root 4096 2013-10-16 12:39 ..

Additionally there is nothing in the instance logs.

$ hammer@cloudhammer [rdg-2-bxb] cf-install $ bosh ssh rabbit_service_node 0
bosh_mvjls4kx2@e7f511a4-5e93-490e-8ec5-577fdcc930c2:~$ cd /var/vcap/sys/service-log/rabbit
root@e7f511a4-5e93-490e-8ec5-577fdcc930c2:/var/vcap/sys/service-log/rabbit# ls -la
total 8
drwxr-xr-x 2 root root 4096 2013-10-16 16:29 .
drwxr-xr-x 3 root root 4096 2013-10-16 12:39 ..

trastle avatar trastle commented on July 21, 2024

QQ:

How do I manually trigger the action performed against the node when I call create-service? I'd like to try running it manually from the terminal.

drnic avatar drnic commented on July 21, 2024

Try "cf tunnel"

trastle avatar trastle commented on July 21, 2024

@drnic that is pretty cool, did not know cf tunnel existed.
Unfortunately as I cannot provision the rabbit service I cannot tunnel into it.

nmaurer23 avatar nmaurer23 commented on July 21, 2024

@trastle have you tried increasing both timeouts?
Node: properties.rabbit_node.service_start_timeout
Gateway: properties.rabbit_gateway.node_timeout
This should also allow you to debug the problem further, because the warden container isn't deleted immediately.
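
To confirm which values a deployment manifest currently carries for the two properties named above, something like the following could be used. It is only a sketch: `show_timeouts` is a hypothetical helper, the nested YAML layout is assumed from the dotted property paths, and the values 60 and 90 are made up for the demo.

```shell
# Hypothetical helper: print the timeout settings found in a manifest file.
# $1: path to the deployment manifest (YAML).
show_timeouts() {
    grep -nE '^[[:space:]]*(service_start_timeout|node_timeout):' "$1"
}

# Demo against a throwaway manifest; layout and values are assumptions.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
properties:
  rabbit_node:
    service_start_timeout: 60
  rabbit_gateway:
    node_timeout: 90
EOF
show_timeouts "$manifest"
rm -f "$manifest"
```

The helper prints each matching line with its line number, which makes it easy to spot a property that is missing from the manifest altogether.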

trastle avatar trastle commented on July 21, 2024

Thanks @nmaurer23. Once I increase the timeout on the gateway as well as the one on the node (I'd missed the one on the node), I end up with a slightly different error:

CFoundry::BadResponse: 502: 502 Bad Gateway: Registered endpoint failed to handle the request.

Progress!

trastle avatar trastle commented on July 21, 2024

Huzzah! With long timeouts comes a better ability to debug.
Once I am over on the Rabbit node I can see that erlang is unavailable to the instance of the rabbit service running inside warden.

bosh_3obcfjz1x@478328f9-3d8b-4712-8c28-cf1379e578b0:/var/vcap/sys/service-log/rabbit/2b373e6d-9090-44
cat rabbitmq_stderr.log
exec: 28: /var/vcap/data/packages/erlang/1.1/lib/erlang/erts-5.8.2/bin/erlexec: not found

However if I am on the Rabbit Node, the file being referenced clearly exists.

55-94d3-bce4e220a2ca$ ls -la /var/vcap/data/packages/erlang/1.1/lib/erlang/erts-5.8.2/bin/erlexec

-rwxr-xr-x 1 10003 10003 147137 2013-10-04 12:57 /var/vcap/data/packages/erlang/1.1/lib/erlang/erts-5.8.2/bin/erlexec

I am guessing that this file is not exposed to the warden container.

trastle avatar trastle commented on July 21, 2024

I do feel like this should have been resolved by #57

trastle avatar trastle commented on July 21, 2024

Sooooo.... I know this is probably the wrong thing to do but I have just copied the erlang package into the rootfs on the services node which lets Rabbit see it.

To my mind, this confirms that something is not being passed to Warden at startup.

root@x:~# mkdir -p /var/vcap/data/packages/rootfs_lucid64/1.1/var/vcap/data/packages
root@x:/var/vcap/data/packages/rootfs_lucid64/1.1/var/vcap/data/packages# cp -R /var/vcap/data/packages/erlang .

nmaurer23 avatar nmaurer23 commented on July 21, 2024

@trastle Could you try the following:
Remove the erlang version information in /var/vcap/data/packages/erlang/1.1/lib/erlang/bin/erl:28
The line should be:
ROOTDIR=/var/vcap/packages/erlang/lib/erlang

This solved the problem for us.
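
The edit described above could be scripted roughly as follows. This is a sketch only: `set_rootdir` is a hypothetical helper, and the "before" value in the demo is an assumption inferred from the path in the error message, not copied from a real `erl` script.

```shell
# Hypothetical helper: rewrite the ROOTDIR= line of an erl launcher script.
# $1: path to the script, $2: new ROOTDIR value (must not contain "|" or "&").
set_rootdir() {
    tmp=$(mktemp) || return 1
    status=0
    # Replace any line starting with ROOTDIR=; all other lines pass through.
    # Writing back with cat keeps the original file's permissions.
    sed "s|^ROOTDIR=.*|ROOTDIR=$2|" "$1" > "$tmp" && cat "$tmp" > "$1" || status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file; the versioned "before" path is an assumption.
demo=$(mktemp)
printf '#!/bin/sh\nROOTDIR=/var/vcap/data/packages/erlang/1.1/lib/erlang\nexport ROOTDIR\n' > "$demo"
set_rootdir "$demo" /var/vcap/packages/erlang/lib/erlang
grep '^ROOTDIR=' "$demo"
rm -f "$demo"
```

Taking a backup copy of the script before changing it on a live node would be a sensible precaution.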

drnic avatar drnic commented on July 21, 2024

"This solved the problem for us" - does something need to be fixed in this
bosh release so it's solved for everyone?

nmaurer23 avatar nmaurer23 commented on July 21, 2024

That's a good question :)
As we didn't have this problem in our OS installation, I don't think this is related to cf-services-contrib.
..
After some more debugging, it turned out to be a stemcell-related difference.
Older bosh agents use a different install target than newer bosh agents (newer stemcells).

The variable BOSH_INSTALL_TARGET used in the packaging script of erlang
differs between the stemcells. In newer stemcells it's the link target. In older
stemcells it's the real path.
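
The difference between the two kinds of path can be illustrated with a throwaway directory layout. This sketch assumes that "link target" above refers to an unversioned symlink pointing at a versioned package directory. `resolve_dir` is a hypothetical helper and the directory names only mimic the ones seen earlier in this thread.

```shell
# Hypothetical helper: print the physical (symlink-free) path of a directory.
resolve_dir() {
    (cd "$1" && pwd -P)
}

# Throwaway layout: a versioned directory plus an unversioned symlink to it.
root=$(resolve_dir "$(mktemp -d)")
mkdir -p "$root/data/packages/erlang/1.1" "$root/packages"
ln -s "$root/data/packages/erlang/1.1" "$root/packages/erlang"

link_path="$root/packages/erlang"      # path as seen through the symlink
real_path=$(resolve_dir "$link_path")  # the same directory, fully resolved

echo "link: ${link_path#"$root"}"
echo "real: ${real_path#"$root"}"
rm -rf "$root"
```

A script that bakes in the resolved path keeps working only where that versioned directory is visible, which would be consistent with the "not found" error reported from inside the container.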

trastle avatar trastle commented on July 21, 2024

@nmaurer23 and @drnic

I'll test out the rootdir change when I get a chance. I don't have access to my dev environment for a few days as I'm out of the office.

For the record I am using the crazy old 0.8.0 'public' stemcell for this build. Additionally I am using a pretty old build of BOSH.

I discovered the new BOSH release process on Monday this week; we had been tracking the old releases.
If this is tied to old stemcells, then it is VERY likely that is what I am hitting.
I'll be able to test this with a newer BOSH once I've upgraded my dev environment.

nmaurer23 avatar nmaurer23 commented on July 21, 2024

This is the related git commit
cloudfoundry/bosh@a5c0bdd
