Comments (11)
Thanks for reporting this. I'll work on it immediately.
Original comment by [email protected]
on 11 Jun 2012 at 6:24
- Changed state: Started
from protobuf-rpc-pro.
Fix released with 1.2.2.
Original comment by [email protected]
on 11 Jun 2012 at 8:10
- Changed state: Fixed
Thank you for fixing this so quickly!!!
Original comment by [email protected]
on 12 Jun 2012 at 6:55
Hi again,
I seem to be running into another problem concerning the wait-interrupt part of
RpcClient.callBlockingMethod. This seems to go into a busy loop of some sort
taking all CPU.
I'm prototyping a distributed system with many server nodes that are called
from one client and I'm testing how to recover from random server or client
crashes and restarts over a not necessarily reliable network. This latest
problem seems to occur when the client crashes and tries to re-open a
connection using a client port that is already open on the server side. Or at
least I get a java.io.IOException: DuplexTcpServer CONNECT_RESPONSE indicated
error ALREADY_CONNECTED. I've attached a screenshot of the callstack for two
threads, although this is probably not very helpful. I can try to debug this
further; can you tell me where the thread waiting in callBlockingMethod is
interrupted from?
Br,
-M-
Original comment by [email protected]
on 12 Jun 2012 at 12:56
Attachments:
The ALREADY_CONNECTED issue can arise under load, where the server has not yet
fully finished "discarding" a client which has crashed, for whatever reason. It
could be that the server's TCP stack has not noticed the disconnect yet, since
nothing has been sent to the client before the client reconnects. The idea is
that a client should retry the connect with a smallish sleep between retries,
and eventually it should reconnect cleanly. I preferred this approach to the
more dangerous alternative (a denial-of-service risk), which would be to kick
out the existing connected client when a new one presents the same identity. If
you want to sidestep the identity problem, you could include the <processId> as
part of the client's identity, which should still be unique after a crash
( http://stackoverflow.com/questions/35842/process-id-in-java ).
If the client still cannot reconnect after, say, 90s (the usual TCP stack
operating-system parameter for socket close-wait or similar), then it's
probably a bug. In that case a stack trace of the server side would be great.
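The reconnect strategy described above can be sketched in plain Java. Everything here is a hypothetical stand-in, not the protobuf-rpc-pro API: `Connector` stands in for the client's connect call, and the process-id trick follows the StackOverflow link.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

class ReconnectSketch {
    // Hypothetical stand-in for the client's connect call; throws on
    // CONNECT_RESPONSE errors such as ALREADY_CONNECTED.
    interface Connector {
        void connect() throws IOException;
    }

    // Make the client's identity unique across a crash/restart by appending
    // the JVM process id (the technique from the StackOverflow link above).
    static String uniqueClientName(String base) {
        // On HotSpot JVMs, getName() is typically "pid@hostname".
        String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
        return base + "-" + pid;
    }

    // Retry with a smallish sleep until the server has discarded the old
    // connection; returns the attempt number that succeeded.
    static int connectWithRetry(Connector c, int maxAttempts, long sleepMillis)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                c.connect();
                return attempt; // connected cleanly
            } catch (IOException e) {
                last = e;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw last; // still failing after ~90s worth of retries: likely a real bug
    }
}
```

With, say, a 1s sleep and 90 attempts, the loop covers the ~90s close-wait window mentioned above before giving up.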
About the wait-interrupt part of RpcClient.callBlockingMethod: I'm looking at
this code and think it's not great, but I haven't been able to reproduce your
problem. It's not clear to me whether it's the server or the client which ends
up in the tight loop. If you could send me the stack trace of the affected JVM,
I'd appreciate it; send it to [email protected] if you have privacy concerns
about attaching it to the bug comments.
Thanks, and I hope to resolve this soon. Peter.
Original comment by [email protected]
on 12 Jun 2012 at 6:30
- Changed state: Started
Looking at your VisualVM CPU chart, it does not say to me that
callBlockingMethod is caught in a tight CPU loop, but rather that the methods
are simply waiting about 1s per call for the remote answer to come back. Just
like the NIO "select" which waits all the time while indicating 100% CPU use,
it's deceiving; to be honest I don't think there's a problem here.
If you used the non-blocking call variant, you wouldn't see the "high" CPU as
in the picture, but your answers from "listNodeContent" wouldn't come back any
faster from the remote side.
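The blocking vs. non-blocking distinction above can be illustrated with plain Java futures (names are illustrative stand-ins; protobuf-rpc-pro's real non-blocking API is callback-based rather than CompletableFuture-based):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class CallStyles {
    // Stand-in for the remote "listNodeContent" call answering after a delay.
    static CompletableFuture<String> remoteListNodeContent() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // simulated network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "node-content";
        });
    }

    // Blocking variant: the calling thread parks until the answer arrives.
    // A sampling profiler attributes that whole wait to the calling method,
    // even though the thread is idle, much like NIO's select().
    static String callBlocking() throws Exception {
        return remoteListNodeContent().get();
    }

    // Non-blocking variant: return immediately, deliver the answer to a
    // callback. The caller shows no "busy" time, but the answer is no faster.
    static CompletableFuture<Void> callNonBlocking(Consumer<String> callback) {
        return remoteListNodeContent().thenAccept(callback);
    }
}
```

Either way the round trip takes the same wall-clock time; only where the waiting is accounted for in the profiler differs.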
Original comment by [email protected]
on 12 Jun 2012 at 7:32
- Changed state: Fixed
Hi,
Thank you for your answers. I'll explain my system a bit more: the system is
not really under heavy load, and when I crash the client there's more than 30
seconds before I restart it, and I still get the ALREADY_CONNECTED. But this
might be due to the TCP stack as you explained; I'll look into this in more
detail tomorrow.
As to your second comment, I get the busy loop on the client side and it never
gets out, i.e. nothing happens on the server side and it seems the client gets
interrupted every second and then goes back to sleep. Is there a way of
breaking out of this in my code? Some way to set a delay for missing answers?
Anyway, I'll try the non-blocking variant to see if it changes anything.
Br,
-M-
Original comment by [email protected]
on 12 Jun 2012 at 7:58
You're right; the more I look at the picture the more confused I get. If there
were only 2 calls but 169 calls of wait/interrupt, there's something very
wrong. I don't think there is much point in looping around when the thread has
been interrupted, so I intend to make a fix to at least log the
InterruptedException and exit the loop.
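The intended fix, ending the wait when interrupted instead of re-entering it, can be sketched like this. This is a simplified stand-in for a blocking-call wait, not the actual RpcClient code; all names are illustrative.

```java
class WaitForResponse {
    private final Object lock = new Object();
    private Object response; // set by the I/O thread when the answer arrives

    // Called by the I/O thread when the remote answer comes back.
    void deliver(Object r) {
        synchronized (lock) {
            response = r;
            lock.notifyAll();
        }
    }

    // If an InterruptedException is swallowed and the loop re-entered, the
    // result is the wait/interrupt churn seen in the profiler. Here the
    // interrupt ends the wait: returns the response, or null if interrupted.
    Object await() {
        synchronized (lock) {
            while (response == null) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return null; // log and exit the loop instead of spinning
                }
            }
            return response;
        }
    }
}
```

The `while` loop is still needed to guard against spurious wakeups; the fix only changes what happens on a genuine interrupt.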
Original comment by [email protected]
on 12 Jun 2012 at 8:05
- Changed state: Started
So I made a new release, 1.2.3, which should fix the wait/notify issue.
Br, Peter.
Original comment by [email protected]
on 12 Jun 2012 at 9:45
- Changed state: Fixed
Thanks! As far as I've been able to test this today, it seems to work fine.
I'll let you know if I run into new problems.
Br,
-M-
Original comment by [email protected]
on 13 Jun 2012 at 1:23
Original comment by [email protected]
on 18 Nov 2012 at 6:49
- Changed state: Done