zk-ruby / zk
A High-Level wrapper for Apache's ZooKeeper
License: MIT License
for the concurrency-enthusiasts
set('/foo/bar', 'the new data', :check => 'the current data')
# which would do
data, stat = zk.get('/foo/bar')
if data == opts[:check]
zk.set('/foo/bar', 'the new data', :version => stat.version)
else
raise CheckAndSetAssertionFailed, "You think it's '#{opts[:check]}' but it's '#{data}'"
# yes, that's a reference to The Wire
end
perhaps with some number of :retries
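The proposed semantics can be sketched against a toy in-memory store. Everything here is a hypothetical stand-in (FakeZK, Stat, the retry count), not the gem's actual API:

```ruby
# Sketch of the proposed :check semantics with retries, against a fake
# in-memory store. FakeZK, Stat, and the retry loop are illustrative only.
class CheckAndSetAssertionFailed < StandardError; end

Stat = Struct.new(:version)

class FakeZK
  def initialize
    @data = {} # path => [data, version]
  end

  def get(path)
    data, version = @data.fetch(path, ['', 0])
    [data, Stat.new(version)]
  end

  def set(path, data, version: nil)
    _, current = @data.fetch(path, ['', 0])
    raise 'bad version' if version && version != current
    @data[path] = [data, current + 1]
  end

  # The proposed convenience: set only if the current data matches :check.
  def check_and_set(path, new_data, check:, retries: 3)
    retries.times do
      data, stat = get(path)
      unless data == check
        raise CheckAndSetAssertionFailed,
              "You think it's '#{check}' but it's '#{data}'"
      end
      begin
        return set(path, new_data, version: stat.version)
      rescue RuntimeError
        next # someone beat us to the version; re-read and retry
      end
    end
    raise 'retries exhausted'
  end
end
```

The retry only re-runs on a version conflict; a failed :check assertion still raises immediately, since retrying would silently paper over the caller's stale view.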
A good amount of our functionality is broken out into modules that are mixed into ZK::Client::Base; the docs should provide pointers to those methods so people can find them.
In this comment from locker.rb:
# NOTE: These locks are _not_ safe for use across threads. If you want to use
# the same Locker class between threads, it is your responsibility to
# synchronize operations.
Should that say "Locker instance" instead?
I've tried finding thread-safety problems on the Locker class level but can't find any. I believe there are instance-level thread-safety problems, but nothing that I see would prevent me from using (e.g.) Locker.shared_locker in two threads simultaneously.
2012-07-25 13:09:36 _eric slyphon: so I can recreate the issue with the continuation error
2012-07-25 13:09:44 slyphon oh?
2012-07-25 13:10:21 _eric 20.times { Thread.new { sleep rand; zk.get('/', :watch => true) rescue pp [ $!, $!.backtrace ] } }
2012-07-25 13:10:26 _eric run that after you've done a
2012-07-25 13:10:34 _eric kill -STOP <zookeeper pid>
2012-07-25 13:16:18 _eric it only took a few minutes of doing -STOP -CONT
2012-07-25 13:16:20 _eric to get it to work
2012-07-25 13:16:30 _eric but the trick is you have to be calling get with :watch => true or it won't exhibit it
2012-07-25 13:16:35 _eric without that, it seems to act fine
2012-07-25 15:13:04 slyphon _eric: so, the -STOP -CONT needs to be timed properly?
2012-07-25 15:13:20 slyphon or it can just be -STOP then call with :watch a bunch of times
2012-07-25 15:15:04 _eric just do kill -STOP
2012-07-25 15:15:06 _eric do the watches
2012-07-25 15:15:08 _eric wait until they timeout
2012-07-25 15:15:10 _eric then do the -CONT
2012-07-25 15:15:22 slyphon kk
It would be nice to be able to check if someone has a given lock already or not.
I have a scheduler task that queues work if it is not already in the process of being worked on, so it would be helpful to know if someone has the lock or not.
ur abstractions, they're leaking...
Calling close! from the event thread is asking for a deadlock. Help warn people of the danger.
I'm testing a 3 node zookeeper setup.
I have a script with some watches set. I want to exit if any errors happen that require the connection to be re-initialized.
to do this I have the following:
@zk.on_expired_session {
exit! 1
}
If the client hangs (I can test with SIGSTOP) then this callback gets called and life is good.
If I stop 2 of the 3 servers, my connection goes into state "closed", then when I bring them back up it goes back into state "connected".
Unfortunately at this point it seems like all of my watches are gone. This means that a temporary failure of the zookeeper cluster can break anything that's watching for changes.
Is this expected? What other states should I be looking for to ensure that my watches never disappear? There doesn't seem to be a "closed" handler.
While installing the zk gem, the slyphon-zookeeper-0.2.9 zkc build fails due to https://issues.apache.org/jira/browse/ZOOKEEPER-1117
I managed to use zkc-3.3.4 (https://github.com/narkisr/zookeeper/blob/zkc-342/ext/zkc-3.3.4.tar.gz) and install it successfully.
While debugging the lockup issue on forked processes, I tried cycling the connection within the forked process to see if that fixed the issue. When calling:
client.close!
client.reopen
The reopen() call waits for a couple seconds and then returns :closed as the connection status. However, just instantiating a new client works immediately and without issue.
Hi there,
I just started using zk (jruby-1.6.7), the current version of your zookeeper gem, and the current version of the ZooKeeper server, on a MacBook, and am experiencing connection losses when sending commands like exists? or create directly after the connection is established.
This is my code:
@zk = ZK.new("localhost:2181", :watcher => :default)
@zk.create('/foo', '', :mode => :persistent)
This is the server log:
2012-03-28 12:48:55,528 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /0:0:0:0:0:0:0:1:60876
2012-03-28 12:48:55,551 - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@735] - Session establishment request from client /0:0:0:0:0:0:0:1:60876 client's lastZxid is 0x0
2012-03-28 12:48:55,551 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@777] - Client attempting to establish new session at /0:0:0:0:0:0:0:1:60876
2012-03-28 12:48:55,558 - INFO [SyncThread:0:FileTxnLog@199] - Creating new log file: log.1
2012-03-28 12:48:55,561 - DEBUG [SyncThread:0:FinalRequestProcessor@79] - Processing request:: sessionid:0x13658ebed630000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
2012-03-28 12:48:55,576 - DEBUG [SyncThread:0:FinalRequestProcessor@151] - sessionid:0x13658ebed630000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
2012-03-28 12:48:55,580 - INFO [SyncThread:0:NIOServerCnxn@1580] - Established session 0x13658ebed630000 with negotiated timeout 10000 for client /0:0:0:0:0:0:0:1:60876
2012-03-28 12:49:06,001 - INFO [SessionTracker:ZooKeeperServer@316] - Expiring session 0x13658ebed630000, timeout of 10000ms exceeded
2012-03-28 12:49:06,002 - INFO [ProcessThread:-1:PrepRequestProcessor@399] - Processed session termination for sessionid: 0x13658ebed630000
2012-03-28 12:49:06,003 - DEBUG [SyncThread:0:FinalRequestProcessor@79] - Processing request:: sessionid:0x13658ebed630000 type:closeSession cxid:0x0 zxid:0x2 txntype:-11 reqpath:n/a
2012-03-28 12:49:06,005 - INFO [SyncThread:0:NIOServerCnxn@1435] - Closed socket connection for client /0:0:0:0:0:0:0:1:60876 which had sessionid 0x13658ebed630000
the exception is being thrown before the server reports the timeout.
the actual "create" call does not seem to arrive at zookeeper.
leaving :watcher => :default out changes nothing.
any ideas?
for when you just want the node to exist with the data
create('/foo/bar', 'thedata', :or => :set)
# and for symmetry
set('/foo/bar', 'thedata', :or => :create)
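The symmetric fallback can be sketched against a toy store. FakeStore and the exception classes are stand-ins, not the gem's real client or ZK::Exceptions classes:

```ruby
# Sketch of the proposed :or => :set / :or => :create fallbacks against a
# fake in-memory store (all names here are illustrative stand-ins).
class NodeExists < StandardError; end
class NoNode < StandardError; end

class FakeStore
  def initialize
    @nodes = {}
  end

  # create normally fails if the node exists; :or => :set falls back to set.
  def create(path, data = '', opts = {})
    if @nodes.key?(path)
      raise NodeExists unless opts[:or] == :set
      return set(path, data)
    end
    @nodes[path] = data
  end

  # set normally fails if the node is missing; :or => :create falls back.
  def set(path, data, opts = {})
    unless @nodes.key?(path)
      raise NoNode unless opts[:or] == :create
      return create(path, data)
    end
    @nodes[path] = data
  end

  def get(path)
    @nodes.fetch(path)
  end
end
```

Either spelling ends with the node existing and holding the data, which is the whole point of the proposal.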
In trying to watch for changes to children of a znode, I found that it is difficult to keep track of which children have been register()'d for and which haven't.
It would be nice to be able to provide either a glob or a regex to register() that could be registered once and fire for all of the children.
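One possible shape for this, sketched against a stand-in dispatcher rather than ZK's real event handler: accept a Regexp (or a simple glob) at registration time and match it against event paths at dispatch time.

```ruby
# Hypothetical regex/glob registration layer; EventDispatcher is a
# stand-in, not part of the zk gem.
class EventDispatcher
  def initialize
    @subscriptions = [] # [pattern, block] pairs
  end

  # Register a block for every path matching a String, glob, or Regexp.
  def register(pattern, &block)
    if pattern.is_a?(String) && pattern.include?('*')
      pattern = glob_to_regexp(pattern)
    end
    @subscriptions << [pattern, block]
  end

  def dispatch(path, event)
    @subscriptions.each do |pattern, block|
      # Regexp#=== matches, String#=== is equality, so both cases work
      block.call(event) if pattern === path
    end
  end

  private

  # '*' matches a single path segment, like a shell glob
  def glob_to_regexp(glob)
    Regexp.new('\A' + Regexp.escape(glob).gsub('\*', '[^/]+') + '\z')
  end
end
```

A single glob or regex subscription then fires for all children without tracking which child paths have been register()'d individually.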
If the session is lost while waiting for a node to be deleted, the calling thread will never wake up
This may affect more than just chrooted clients, but anyway, here's the procedure and the error I got. Coming from an already-running ZooKeeper server and adding a chroot namespace (thanks for that last chroot fix, btw), I tried the basic locking test again:
client.with_lock("lockname") do
puts "Something..."
end
and received the following error:
NameError: uninitialized constant ZK::Client::Unixisms::KeeperException
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:28:in `rescue in mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:22:in `mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:31:in `rescue in mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:22:in `mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:111:in `create_root_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:121:in `rescue in create_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:117:in `create_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:210:in `lock!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:55:in `with_lock'
I then ran client.mkdir_p("/"), followed by the same lock test as above, and it completed successfully. In summary, I think the client in a chroot-namespaced connection should mkdir_p("/") automatically, immediately on connect, but you should also look into that uninitialized constant problem.
I just noticed this and thought it was an interesting idea: redis/redis-rb#226
implement support for
zk.create('/foo', :ephemeral => true)
# instead of requiring
zk.create('/foo', '', :ephemeral => true)
Hi,
This is in reference to issue #40, where you added the very useful functionality of being able to specify a timeout when waiting for locks.
I've just started testing it out, and there seems to be one problem with it: if you don't succeed in getting a lock (i.e. you receive a LockWaitTimeoutError), the lock can forevermore no longer be acquired by any other process or by this process, until this process is killed. So deadlock.
What seems to happen is that the little ephemeral lock attempt node seems to stick around in the lock's parent node, even though the process gave up on the attempt. Would it be possible to clean this node up when we time out? It would be similar to how it gets cleaned up if we don't block waiting, and the lock attempt is unsuccessful.
(The problem is actually quite bad, because if there's contention, everybody is timing out and leaving their little ephemeral lock attempts around, and even if you aggressively kill the processes if they fail to acquire the locks, there's a bit of delay there, so you pretty much get livelock rather than deadlock.)
Thanks!
John
this gizmo looks like it could be useful.
There is some interaction between calling #reopen() and having callbacks not fire. I've registered an on_connected callback to re-register watches, but after calling reopen(), they quit firing.
Right now it's possible for a locker to unlock a lock it didn't actually create:
The solution we've come up with is to track the ctime
of the parent and only delete the lock if the ctime
of the parent matches what we thought it was.
The reason why this works is if the parent hasn't been deleted, the sequence numbers of the lock files will always be increasing.
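The ctime guard can be sketched against a toy client (all names here are hypothetical): record the parent's ctime when the lock is taken, and refuse to delete the lock node if it has since changed.

```ruby
# Illustrative sketch of the ctime-guarded unlock; FakeZK and Stat are
# stand-ins for the real client and znode stat.
Stat = Struct.new(:ctime)

class FakeZK
  def initialize
    @nodes = {}
    @clock = 0
  end

  def create(path)
    @nodes[path] = Stat.new(@clock += 1)
  end

  def delete(path);  @nodes.delete(path);  end
  def stat(path);    @nodes.fetch(path);   end
  def exists?(path); @nodes.key?(path);    end
end

# Only delete the lock node if the parent's ctime still matches what it
# was at lock time; a changed ctime means the parent was deleted and
# recreated, so the sequence numbers no longer mean what we thought.
def safe_unlock(zk, parent, lock_path, ctime_at_lock)
  return false unless zk.stat(parent).ctime == ctime_at_lock
  zk.delete(lock_path)
  true
end
```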
From http://zookeeper.apache.org/doc/trunk/recipes.html
Distributed systems use barriers to block processing of a set of nodes until a condition is met at which time all the nodes are allowed to proceed. Barriers are implemented in ZooKeeper by designating a barrier node. The barrier is in place if the barrier node exists. Here's the pseudo code:
- Client calls the ZooKeeper API's exists() function on the barrier node, with watch set to true.
- If exists() returns false, the barrier is gone and the client proceeds
- Else, if exists() returns true, the clients wait for a watch event from ZooKeeper for the barrier node.
- When the watch event is triggered, the client reissues the exists( ) call, again waiting until the barrier node is removed.
I plan on implementing this at some point if no one else does first...
I haven't decided if it makes sense to be in this gem or in a recipes gem. It seems reasonable it should be in the same place the locker is...
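The wait loop from the recipe above can be sketched like this, assuming a hypothetical client (FakeZK) whose watch events arrive on a queue; the real gem would use zk.exists?(path, :watch => true) plus a registered callback.

```ruby
# Sketch of the single-barrier wait loop; FakeZK is a stand-in client.
require 'thread'

class FakeZK
  def initialize(barrier_up)
    @barrier_up = barrier_up
    @watch_events = Queue.new
  end

  def exists?(path, opts = {})
    @barrier_up
  end

  def delete_barrier!
    @barrier_up = false
    @watch_events << :deleted # fire the watch
  end

  def wait_for_event
    @watch_events.pop
  end
end

# The recipe's loop: re-check existence after every watch event, so a
# spurious or stale event just causes another exists? check.
def wait_for_barrier(zk, path)
  while zk.exists?(path, :watch => true)
    zk.wait_for_event
  end
end
```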
I was finally able to reproduce the issue that we were looking at in IRC on Friday. Here's an isolated test case:
zk = ZK.new('..')
l = zk.locker('foo')
l.lock!(true)
zk = ZK.new('..')
l = zk.locker('foo')
l.lock!(true)
Now press Ctrl-C to simulate an exception/error happening while in the blocking #lock! call. Note that in my scenario I was producing this error condition by killing a couple ZK servers.
Now, when you do:
zk.children(l.root_locker_path)
You'll notice that there are some entries there that never get cleaned up.
I would love for this to get cleaned up somehow if possible, because in the case of redis_failover, it ends up producing a situation where the non-master node managers never get a chance to become a real master since they always hang on their next #lock!(true) call. They hang because their previous #lock!(true) call failed and their old ephemeral sequential znode never got deleted.
As a workaround, I'm doing the following hack in redis_failover (master):
# we manually attempt to delete the lock path before
# acquiring the lock, since currently the lock doesn't
# get cleaned up if there is a connection error while
# the client was previously blocked in the #lock! call.
if path = @zk_lock.lock_path
@zk.delete(path, :ignore => :no_node)
end
@zk_lock.lock!(true)
mentioned in #8 by @ryanlecompte
Need to write specs to check the behavior.
From http://zookeeper.apache.org/doc/trunk/recipes.html
Double barriers enable clients to synchronize the beginning and the end of a computation. When enough processes have joined the barrier, processes start their computation and leave the barrier once they have finished. This recipe shows how to use a ZooKeeper node as a barrier.
The pseudo code in this recipe represents the barrier node as b. Every client process p registers with the barrier node on entry and unregisters when it is ready to leave. A node registers with the barrier node via the Enter procedure below, it waits until x client process register before proceeding with the computation. (The x here is up to you to determine for your system.)
Enter:
1. Create a name n = b + "/" + p
2. Set watch: exists(b + "/ready", true)
3. Create child: create(n, EPHEMERAL)
4. L = getChildren(b, false)
5. if there are fewer children in L than x, wait for watch event
6. else create(b + "/ready", REGULAR)
Leave:
1. L = getChildren(b, false)
2. if no children, exit
3. if p is the only process node in L, delete(n) and exit
4. if p is the lowest process node in L, wait on highest process node in P
5. else delete(n) if it still exists and wait on lowest process node in L
6. goto 1
Like #42, it seems like this should go in the same gem as the locker.
I built a Celluloid application running in JRuby, utilizing the ZK gem with 5 pooled connections using mperham's connection_pool gem.
I'm using these connections within 5 celluloid actor threads that act as workers which process data.
Each worker uses 2 nested with_lock blocks with the same ZooKeeper connection from the above-mentioned connection pool to keep its workload from being worked on simultaneously by another thread.
Everything seems to be working fine so far regarding my other zookeeper operations (task management) but I experienced one odd thing looking through the debug log that I wanted to share/discuss here:
Every now and then, I see output from ZK::EventHandlerSubscription::Base like the dump I attached below. I noticed that the @callbacks hash is growing with empty arrays over time after serving a few requests. This shouldn't be the case, right? Besides, it should be okay to always use different lock names?
If you need more information, let me know. I could also try to nail this in a small sample application if requested.
#<ZK::EventHandlerSubscription::Base:0x1c122c7a @interests=#<Set: {:created, :deleted, :changed, :child}>,
@path="/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003",
@mutex=#<ZK::Monitor:0x5cdcc3b8 @mon_count=0, @mon_mutex=#<Mutex:0x21b594a9>, @mon_owner=nil>,
@parent=#<ZK::EventHandler:0x329bbe66 @mutex=#<ZK::Monitor:0xef1347f @mon_count=0, @mon_mutex=#<Mutex:0x7dad8582>,
@mon_owner=nil>, @default_watcher_block=#<Proc:0xb185a44@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/event_handler.rb:250 (lambda)>,
@zk=#<ZK::Client::Threaded:5818 zk_session_id=0x13a4080816605ba ...>,
@callbacks={"state_3"=>[], :all_state_events=>[], "/_zklocking/task_lock:c6548120-121b-11e2-87cd-a5de314d1973:sum_result_0/ex0000000007"=>[],
"state_-112"=>[
#<ZK::EventHandlerSubscription::Base:0x1832f489 @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_-112",
@mutex=#<ZK::Monitor:0x51141ddf @mon_count=0, @mon_mutex=#<Mutex:0x7fb5450e>, @mon_owner=nil>,
@parent=#<ZK::EventHandler:0x329bbe66 ...>,
@callable=#<Proc:0x55c8dba2@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>],
"state_1"=>[
#<ZK::EventHandlerSubscription::Base:0x75cb94ad @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_1",
@mutex=#<ZK::Monitor:0x2c72c20d @mon_count=0, @mon_mutex=#<Mutex:0x2221fa47>, @mon_owner=nil>,
@parent=#<ZK::EventHandler:0x329bbe66 ...>,
@callable=#<Proc:0x64b65cd2@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>],
"state_0"=>[
#<ZK::EventHandlerSubscription::Base:0x8812a6 @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_0",
@mutex=#<ZK::Monitor:0x4b291058 @mon_count=0, @mon_mutex=#<Mutex:0x74f027f4>, @mon_owner=nil>, @parent=#<ZK::EventHandler:0x329bbe66 ...>,
@callable=#<Proc:0x42cf4026@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>
],
:all_node_events=>[],
"/_zklocking/task_lock:d225dd50-121b-11e2-9470-e74969dc6fb6:sum_result_0/ex0000000003"=>[],
"/_zklocking/task_calculation_lock:calculation_task_3:user_1/ex0000000000"=>[],
"/_zklocking/task_lock:dd5ca730-121b-11e2-9c91-1a29f145ecf1:mul_result_0/ex0000000005"=>[],
"/_zklocking/task_lock:df2495f0-121b-11e2-9b18-221ea8a36fe9:mul_result_0/ex0000000002"=>[],
"/_zklocking/task_lock:df643700-121b-11e2-92f2-37bbec17b37e:sum_result_0/ex0000000006"=>[],
"/_zklocking/task_calculation_lock:calculation_task_2:user_1/ex0000000001"=>[],
"/_zklocking/finish_lock:e148ae70-121b-11e2-999d-40dd7a593966/ex0000000000"=>[],
"/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003"=>[#<ZK::EventHandlerSubscription::Base:0x1c122c7a ...>]
},
@orig_pid=63595,
@state=:running,
@thread_opt=:single,
@outstanding_watches={
:data=>#<Set: {"/_zklocking/task_lock:dd5ca730-121b-11e2-9c91-1a29f145ecf1:mul_result_0/ex0000000005"}>,
:child=>#<Set: {}>}
>,
@callable=#<Proc:0x6231b90d@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:183 (lambda)>
> with [#<Zookeeper::Callbacks::WatcherCallback:0x5a4e3fa1 @context=nil, @path="/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003", @state=3, @completed=true, @proc=#<Proc:0x47339158@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zookeeper-1.3.0-java/lib/zookeeper/callbacks.rb:24>, @type=2, @zk=#<ZK::Client::Threaded:5818 zk_session_id=0x13a4080816605ba ...>>]
I was experimenting with the ZK gem and spent a fair amount of time trying to get the register callbacks to work.
In process 1
require 'zk'
zk = ZK.new
zk.create('/foo', 'bar')
sleep 30
zk.set('/foo', 'baz')
In process 2 during the 30 second sleep period
require 'zk'
zk = ZK.new
ns = zk.register('/foo') {|e| "Got event #{e.event_name} #{e.path}" }
I noticed nothing happens.
After looking more at the examples, it appears I have to do this
require 'zk'
zk = ZK.new
ns = zk.register('/foo') {|e| "Got event #{e.event_name} #{e.path}" }
zk.stat('/foo', :watch => true)
Is there a reason why register does not call stat with :watch => true automatically?
Sometimes when sessions expire, a lock's parent directory will never be cleaned up.
It would be great to have a call to cleanup all parent directories that are no longer in use.
The logic could be something like:
def cleanup_lockers(client, root_lock_node = Locker.default_root_lock_node)
  client.children(root_lock_node).each do |zname|
    locker = exclusive_locker(client, zname)
    if locker.lock
      locker.unlock
    end
  end
end
Rationale: all we need to do is see if we can get the lock; if we do, unlock, and the unlock will delete the parent if no one else is waiting.
I haven't been able to find a description of an implementation of a semaphore that I like, so I'm going to try to work this one out.
An obvious use of a semaphore would be to restrict concurrent access to a resource (for things like preventing too many workers from hitting the same database or website).
My thoughts on the semaphore would be that it would take a name and a size. Any clients that attempt to acquire the semaphore beyond its size limit will block until a slot is available.
I don't have a strong opinion between using acquire/release vs wait/signal for the method names.
There are two obvious ways to implement this:
1. Like a lock, have each client watch the sequential znode below them and wait for the watcher to fire. The problem with this is that if you have a size of 10 and 20 clients, none of the clients above 10 will progress until the 10th one completes.
2. Watch for any changes to the parent znode and recalculate the client's position each time. The downside is that all clients will be awoken whenever any client releases. In reality, if there is not a large number of clients waiting, it should provide a better experience.
I'm hoping there's another obvious solution I'm missing here.
After reading the code to the netflix curator recipe for a semaphore, it looks like they opted for approach 1, but didn't specifically call out the behavior: https://github.com/Netflix/curator/wiki/Shared-Semaphore
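The recalculation in the second approach reduces to a pure function: given the children of the semaphore's parent znode, a client holds a slot iff its znode is among the `size` lowest sequence numbers. The node-name scheme below is made up for illustration.

```ruby
# Sketch of the slot check for the watch-the-parent approach: sort the
# children by their trailing sequence number and take the first `size`.
def holds_slot?(my_node, children, size)
  seq = ->(name) { name[/\d+\z/].to_i } # trailing sequence number
  children.sort_by(&seq).first(size).include?(my_node)
end
```

Each client would re-run this whenever the parent's child list changes; clients that don't hold a slot keep waiting, and releases never need to wake a specific successor.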
Is there any way to set a timeout (per-lock or even globally) on how long a client should block waiting to acquire a lock?
From what I can tell I don't think there is, but I figured I'd ask.
Thanks!
Recently we've run into an issue where a process using zk and redis_failover sometimes fails to die on exit, and instead hangs in an infinite sched_yield() loop taking all CPU. A SIGKILL is required to get rid of it. I guess it has to do with the program exiting without properly closing the connection first, but it should still die cleanly.
gdb output gives this:
#0 0x00007fb7e6765a67 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fb7e6bd86f7 in gvl_yield (vm=0x1bfdf20, th=<optimized out>) at thread_pthread.c:125
#2 0x00007fb7e6bdac16 in rb_thread_schedule_limits (limits_us=0) at thread.c:1025
#3 rb_thread_schedule_limits (limits_us=0) at thread.c:1033
#4 rb_thread_schedule () at thread.c:1035
#5 0x00007fb7e6bdad5f in rb_thread_terminate_all () at thread.c:375
#6 0x00007fb7e6abf89e in ruby_cleanup (ex=0) at eval.c:140
#7 0x00007fb7e6abfa25 in ruby_run_node (n=0x24f0428) at eval.c:244
#8 0x00000000004007fb in main (argc=3, argv=0x7fff7725e948) at main.c:38
After adding some debug-code to ruby side to get a backtrace when the process is hung, I was able to get this:
Thread TID-t26i0
ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
shared/bundle/ruby/1.9.1/gems/zk-1.7.1/lib/zk/threadpool.rb:268:in `worker_thread_body'
When trying to reproduce it without redis_failover, I was able to get it to hang in a similar way, but in a different place:
Thread TID-ccaag
ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common/queue_with_pipe.rb:59:in `pop'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common.rb:56:in `get_next_event'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common.rb:94:in `dispatch_thread_body'
and
Thread TID-alg44
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:110:in `wait'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:122:in `wait_while'
shared/bundle/ruby/1.9.1/gems/zk-1.7.1/lib/zk/client/threaded.rb:533:in `block in reconnect_thread_body'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
Code in all is somewhat similar (@cond.wait).
Any ideas? Ruby is 1.9.3-p194, but it also happens at least on 1.9.3-p0. ZK 1.7.1, zookeeper 1.3.0. OS: linux ubuntu 12.04.
At least in ruby 1.9.3 you can give a timeout to ConditionVariable's wait(), maybe that would help.
With this I was able to get it to hang in a similar way quite often:
#!/usr/bin/env ruby
require 'rubygems'
require 'zk'
$stdout.sync = true
@zkservers = "localhost:2181"
trap 'TTIN' do
Thread.list.each do |thread|
puts "Thread TID-#{thread.object_id.to_s(36)}"
puts thread.backtrace.join("\n")
end
end
def do_something
zk = ZK.new(@zkservers)
puts zk.children('/').inspect
sleep 1
end
puts "Pid: #{$$}"
count = 50
stack = []
(0..count).each do |i|
stack << Thread.new { do_something }
end
sleep rand(0)
Running it in a while true; do ./test.rb; done loop until it gets stuck, then sending kill -TTIN, prints the backtraces of the still-alive threads.
It would be nice to have something equivalent to redis's set() command that just creates a node if it doesn't exist or sets it if it does. Also, it seems like optionally making it do an mkdir_p() if needed would be a nice addition as well.
When running zk.mkdir_p("/foo/bar") in an empty ZooKeeper tree, a NoNode exception is thrown but not caught in the handler, so no recursion is performed. It is possible to use it to create a single new level (mkdir_p("/foo")), mimicking the behavior of create.
This is on jruby 1.7.0-preview1 and 2.
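For reference, the recursion mkdir_p presumably intends looks like this; FakeZK and the exception classes here are stand-ins for the real client and ZK::Exceptions::NoNode.

```ruby
# Sketch of recursive mkdir_p: on NoNode, create the parent first, then
# retry the original create. All class names are illustrative stand-ins.
class NoNode < StandardError; end
class NodeExists < StandardError; end

class FakeZK
  def initialize
    @nodes = { '/' => true }
  end

  def create(path)
    parent = File.dirname(path)
    raise NoNode unless @nodes.key?(parent)
    raise NodeExists if @nodes.key?(path)
    @nodes[path] = true
  end

  def exists?(path)
    @nodes.key?(path)
  end
end

def mkdir_p(zk, path)
  zk.create(path)
rescue NoNode
  mkdir_p(zk, File.dirname(path)) # build the parent chain first
  retry                           # then retry this level's create
rescue NodeExists
  # already there -- fine, mkdir_p is idempotent
end
```

The key detail is the `retry` after building the parent: catching NoNode without retrying the original create is exactly the single-level behavior the report describes.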
mentioned by @ryanlecompte #8 (comment)
people are starting to use ZK, let's be professional
checked in stuff referring to this issue, but goofed, supposed to be #38
Recent zk/zookeeper seems to pull in a version of backports which causes a bunch of already-initialized-constant warnings. Any way to prevent this? I'm running on MRI 1.9.3.
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrDimensionMismatch
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrNotRegular
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrOperationNotDefined
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrOperationNotImplemented
/Users/mconway/.rvm/gems/ruby-1.9.3-p125-falcon/gems/backports-2.5.3/lib/backports/1.9.2/stdlib/matrix.rb:516: warning: already initialized constant SELECTORS
When using ZK.with_lock, it should yield the created lock to the block, so that any block code can query the state of the lock.
Example:
ZK.with_lock do |lock|
# etc etc
lock.assert!
puts "We lost the lock" unless lock.locked?
end
I have been seeing SessionExpired exceptions when performing get operations. What should I be doing in these cases (or to prevent them from happening in the first place)?
When constructing a client with a chroot namespace (specifically a ThreadedClient), calling:
client.with_lock("lock_name") do
... (anything) ...
end
will acquire the lock, execute the code block, and then fail to release it with the following error:
removing lock path /chroot_namespace/_zklocking/lock_name/ex0000000000
ZK::Exceptions::NoNode: inputs: {:path=>"/chroot_namespace/_zklocking/lock_name/ex0000000000", :version=>-1}
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/base.rb:669:in `check_rc'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/base.rb:552:in `delete'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:128:in `cleanup_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:78:in `unlock!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:59:in `with_lock'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/conveniences.rb:97:in `with_lock'
from (irb):1
@tobowers @eric @ryanlecompte if you guys have a moment, I've got this idea for increasing the parallelism of applications built on ZK (i.e. get rid of the "single event thread") while at the same time keeping execution somewhat sane (multiple threads not calling registered watchers simultaneously). I think this is a good compromise, and better than just "upping the number of threads in the threadpool."
From the code:
# Stealing some ideas from Celluloid, this event handler subscription
# (basically, the wrapper around the user block), will spin up its own
# thread for delivery, and use a queue. This gives us the basis for better
# concurrency (event handlers run in parallel), but preserves the
# underlying behavior that a single-event-thread ZK gives us, which is that
# a single callback block is inherently serial. Without this, you have to
# make sure your callbacks are either synchronized, or totally reentrant,
# so that multiple threads could be calling your block safely (which is
# really difficult, and annoying).
#
# Using this delivery mechanism means that the block still must not block
# forever, however each event will "wait its turn" and all callbacks will
# receive their events in the same order (which is what ZooKeeper
# guarantees), just perhaps at different times.
The downside is that for fast-lived subscriptions (like in the tests, or maybe for heavy use of locking), having a thread-per-subscription is heavyweight (the tests run about 5x slower).
I'm thinking that this would be a good optional feature that people could pick, perhaps at registration time (right now it's a per-connection option, but I'm going to change that).
If you get a chance, please have a look at 928e23e and let me know what you think, I'd really appreciate the feedback.
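A minimal sketch of that delivery mechanism (illustrative only, not the actual 928e23e code): each subscription owns a queue and a single worker thread, so its callback runs serially and in order while different subscriptions proceed in parallel.

```ruby
# Sketch of thread-per-subscription event delivery; ActorSubscription is
# a made-up name, not the gem's class.
require 'thread'

class ActorSubscription
  def initialize(&callback)
    @queue = Queue.new
    @thread = Thread.new do
      # One worker thread per subscription: events for this callback are
      # processed one at a time, in arrival order.
      while (event = @queue.pop) != :shutdown
        callback.call(event)
      end
    end
  end

  # Called from the (single) event thread: enqueue and return immediately,
  # so one slow callback can't stall deliveries to other subscriptions.
  def deliver(event)
    @queue << event
  end

  def shutdown
    @queue << :shutdown
    @thread.join
  end
end
```

This preserves the per-callback serialization guarantee while removing the global single-event-thread bottleneck; the cost, as noted above, is one thread per subscription.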
What is the correct thing to do when forking?
Call close! before forking and reopen after?
The event handler thread is owned by ZK (actually by zookeeper), users of ZK should be aware of the implications of running code on the event thread (and how to avoid this).
Have to report this issue here, since the slyphon-zookeeper gem doesn't have an issues section. Anything that uses slyphon-zookeeper v 0.8.3, and therefore recent installations of ZK throw this error as soon as you try to "require" them. eg:
baconator@dev:~$ irb
1.9.3-p125 :001 > require 'rubygems'
=> false
1.9.3-p125 :002 > require 'zookeeper'
NameError: uninitialized constant ZookeeperCommon::QueueWithPipe::Forwardable
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:4:in `<class:QueueWithPipe>'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:3:in `<module:ZookeeperCommon>'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:1:in `<top (required)>'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common.rb:168:in `<top (required)>'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper.rb:5:in `<top (required)>'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:59:in `require'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:59:in `rescue in require'
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:35:in `require'
from (irb):2
from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
1.9.3-p125 :003 >
there are situations when you want to know when the session has changed in any way, not just for specific events