Comments (11)
Thanks for reporting this. I'll work on it immediately.
Original comment by [email protected]
on 11 Jun 2012 at 6:24
- Changed state: Started
from protobuf-rpc-pro.
Fix released with 1.2.2.
Original comment by [email protected]
on 11 Jun 2012 at 8:10
- Changed state: Fixed
Thank you for fixing this so quickly!!!
Original comment by [email protected]
on 12 Jun 2012 at 6:55
Hi again,
I seem to be running into another problem concerning the wait-interrupt part of
RpcClient.callBlockingMethod. This seems to go into a busy loop of some sort
taking all CPU.
I'm prototyping a distributed system with many server nodes that are called
from one client and I'm testing how to recover from random server or client
crashes and restarts over a not necessarily reliable network. This latest
problem seems to occur when the client crashes and tries to re-open a
connection using a client port that is already open on the server side. Or at
least I get a java.io.IOException: DuplexTcpServer CONNECT_RESPONSE indicated
error ALREADY_CONNECTED. I've attached a screenshot of the callstack for two
threads, although this is probably not very helpful. I can try to debug this
further; can you tell me where the thread waiting in callBlockingMethod is
interrupted from?
Br,
-M-
Original comment by [email protected]
on 12 Jun 2012 at 12:56
Attachments:
The ALREADY_CONNECTED issue can arise under load, where the server has not yet
fully finished "discarding" a client which has crashed, for whatever reason. It
could be that the server's TCP stack has not noticed the disconnect yet, since
nothing has been sent to the client before the client reconnects. The idea is
that a client should retry the connect with a smallish sleep between retries,
and eventually it should reconnect cleanly. I preferred this approach to the
more dangerous alternative (a denial-of-service risk), which would be to kick
out the existing connected client when a new one presents the same identity. If
you want to sidestep the identity problem, you could include the <processId> as
part of the client's identity, which should still be unique after a crash
( http://stackoverflow.com/questions/35842/process-id-in-java ).
If the client still cannot reconnect after, say, 90s (the usual TCP stack
operating-system parameter for socket close-wait or similar), then it's
probably a bug. In that case a stack trace of the server side would be great.
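The reconnect strategy described above can be sketched in plain Java. Everything here is a hypothetical stand-in, not the protobuf-rpc-pro API: `Connector` stands in for the client's connect call, and the process-id trick follows the StackOverflow link.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

class ReconnectSketch {
    // Hypothetical stand-in for the client's connect call; throws on
    // CONNECT_RESPONSE errors such as ALREADY_CONNECTED.
    interface Connector {
        void connect() throws IOException;
    }

    // Make the client's identity unique across a crash/restart by appending
    // the JVM process id (the technique from the StackOverflow link above).
    static String uniqueClientName(String base) {
        // On HotSpot JVMs, getName() is typically "pid@hostname".
        String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
        return base + "-" + pid;
    }

    // Retry with a smallish sleep until the server has discarded the old
    // connection; returns the attempt number that succeeded.
    static int connectWithRetry(Connector c, int maxAttempts, long sleepMillis)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                c.connect();
                return attempt; // connected cleanly
            } catch (IOException e) {
                last = e;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw last; // still failing after ~90s worth of retries: likely a real bug
    }
}
```

With, say, a 1s sleep and 90 attempts, the loop covers the ~90s close-wait window mentioned above before giving up.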
About the wait-interrupt part of RpcClient.callBlockingMethod: I'm looking at
this code and think it's not great, but I haven't been able to reproduce your
problem. It's not clear to me whether it's the server or the client which ends
up in the tight loop. If you could send me the stack trace of the affected JVM,
I'd appreciate it; send it to [email protected] if you have privacy concerns
about attaching it to the bug comments.
Thanks, and I hope to resolve this soon. Peter.
Original comment by [email protected]
on 12 Jun 2012 at 6:30
- Changed state: Started
Looking at your VisualVM CPU chart, it does not say to me that
callBlockingMethod is caught in a tight CPU loop, but rather that the methods
are simply waiting about 1s per call for the remote answer to come back. Just
like the NIO "select" which waits all the time while indicating 100% CPU use,
it's deceiving; to be honest I don't think there's a problem here.
If you used the non-blocking call variant, you wouldn't see the "high" CPU as
in the picture, but your answers from "listNodeContent" wouldn't come back any
faster from the remote side.
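The blocking vs. non-blocking distinction above can be illustrated with plain Java futures (names are illustrative stand-ins; protobuf-rpc-pro's real non-blocking API is callback-based rather than CompletableFuture-based):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class CallStyles {
    // Stand-in for the remote "listNodeContent" call answering after a delay.
    static CompletableFuture<String> remoteListNodeContent() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // simulated network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "node-content";
        });
    }

    // Blocking variant: the calling thread parks until the answer arrives.
    // A sampling profiler attributes that whole wait to the calling method,
    // even though the thread is idle, much like NIO's select().
    static String callBlocking() throws Exception {
        return remoteListNodeContent().get();
    }

    // Non-blocking variant: return immediately, deliver the answer to a
    // callback. The caller shows no "busy" time, but the answer is no faster.
    static CompletableFuture<Void> callNonBlocking(Consumer<String> callback) {
        return remoteListNodeContent().thenAccept(callback);
    }
}
```

Either way the round trip takes the same wall-clock time; only where the waiting is accounted for in the profiler differs.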
Original comment by [email protected]
on 12 Jun 2012 at 7:32
- Changed state: Fixed
Hi,
Thank you for your answers. I'll explain my system a bit more: the system is
not really under heavy load, and when I crash the client there's more than 30
seconds before I restart it, and I still get the ALREADY_CONNECTED. But this
might be due to the TCP stack as you explained; I'll look into this in more
detail tomorrow.
As to your second comment, I get the busy loop on the client side and it never
gets out, i.e. nothing happens on the server side and it seems the client gets
interrupted every second and then goes back to sleep. Is there a way of
breaking out of this in my code? Some way to set a delay for missing answers?
Anyway, I'll try the non-blocking variant to see if it changes anything.
Br,
-M-
Original comment by [email protected]
on 12 Jun 2012 at 7:58
You're right; the more I look at the picture the more confused I get. If there
were only 2 calls but 169 calls of wait/interrupt, there's something very
wrong. I don't think there is much point in looping around when the thread has
been interrupted, so I intend to make a fix to at least log the
InterruptedException and exit the loop.
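The intended fix, ending the wait when interrupted instead of re-entering it, can be sketched like this. This is a simplified stand-in for a blocking-call wait, not the actual RpcClient code; all names are illustrative.

```java
class WaitForResponse {
    private final Object lock = new Object();
    private Object response; // set by the I/O thread when the answer arrives

    // Called by the I/O thread when the remote answer comes back.
    void deliver(Object r) {
        synchronized (lock) {
            response = r;
            lock.notifyAll();
        }
    }

    // If an InterruptedException is swallowed and the loop re-entered, the
    // result is the wait/interrupt churn seen in the profiler. Here the
    // interrupt ends the wait: returns the response, or null if interrupted.
    Object await() {
        synchronized (lock) {
            while (response == null) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return null; // log and exit the loop instead of spinning
                }
            }
            return response;
        }
    }
}
```

The `while` loop is still needed to guard against spurious wakeups; the fix only changes what happens on a genuine interrupt.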
Original comment by [email protected]
on 12 Jun 2012 at 8:05
- Changed state: Started
So I made a new release, 1.2.3, which should fix the wait/notify issue.
Br, Peter.
Original comment by [email protected]
on 12 Jun 2012 at 9:45
- Changed state: Fixed
Thanks! As far as I've been able to test this today, it seems to work fine.
I'll let you know if I run into new problems.
Br,
-M-
Original comment by [email protected]
on 13 Jun 2012 at 1:23
Original comment by [email protected]
on 18 Nov 2012 at 6:49
- Changed state: Done