zk's Issues

implement compare-and-swap (check-and-set, whatevs)

for the concurrency-enthusiasts

set('/foo/bar', 'the new data', :check => 'the current data')

# which would do

data, stat = zk.get('/foo/bar')

if data == opts[:check]
  zk.set('/foo/bar', 'the new data', :version => stat.version)  
else
  raise CheckAndSetAssertionFailed, "You think it's '#{opts[:check]}' but it's '#{data}'"
  # yes, that's a reference to The Wire
end

with some number of :retries perhaps
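
A hedged sketch of how this could look with retries; the method name check_and_set, the :retries option, and the helper itself are illustrative, not an existing API (ZK::Exceptions::BadVersion is the error the :version guard raises when someone else wrote in between):

class CheckAndSetAssertionFailed < StandardError; end

# illustrative helper, not part of the gem
def check_and_set(zk, path, new_data, opts = {})
  retries = opts.fetch(:retries, 0)
  begin
    data, stat = zk.get(path)

    unless data == opts[:check]
      raise CheckAndSetAssertionFailed, "You think it's '#{opts[:check]}' but it's '#{data}'"
    end

    # only succeeds if nobody else wrote since our read
    zk.set(path, new_data, :version => stat.version)
  rescue ZK::Exceptions::BadVersion
    retry if (retries -= 1) >= 0
    raise
  end
end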

Threadsafety of Locker class

In this comment from locker.rb:

  # NOTE: These locks are _not_ safe for use across threads. If you want to use
  # the same Locker class between threads, it is your responsibility to
  # synchronize operations.

Should that say "Locker instance" instead?

I've tried finding thread-safety problems on the Locker class level but can't find any. I believe there are instance-level thread-safety problems, but nothing that I see would prevent me from using (e.g.) Locker.shared_locker in two threads simultaneously.
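
For illustration, the kind of usage being discussed: two threads each building their own shared locker for the same name (do_read_work is a placeholder, and whether you use the client-level shared_locker or the class-level Locker.shared_locker depends on the zk version):

t1 = Thread.new { zk.shared_locker('resource').with_lock { do_read_work } }
t2 = Thread.new { zk.shared_locker('resource').with_lock { do_read_work } }
[t1, t2].each(&:join)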

Continuation errors when there are connectivity problems

2012-07-25 13:09:36     _eric   slyphon: so I can recreate the issue with the continuation error
2012-07-25 13:09:44     slyphon oh?
2012-07-25 13:10:21     _eric   20.times { Thread.new { sleep rand; zk.get('/', :watch => true) rescue pp [ $!, $!.backtrace ] } }
2012-07-25 13:10:26     _eric   run that after you've done a
2012-07-25 13:10:34     _eric   kill -STOP <zookeeper pid>

2012-07-25 13:16:18     _eric   it only took a few minutes of doing -STOP -CONT
2012-07-25 13:16:20     _eric   to get it to work
2012-07-25 13:16:30     _eric   but the trick is you have to be calling get with :watch => true or it won't exhibit it
2012-07-25 13:16:35     _eric   without that, it seems to act fine

2012-07-25 15:13:04     slyphon _eric: so, the -STOP -CONT needs to be timed properly?
2012-07-25 15:13:20     slyphon or it can just be -STOP then call with :watch a bunch of times
2012-07-25 15:15:04     _eric   just do kill -STOP
2012-07-25 15:15:06     _eric   do the watches
2012-07-25 15:15:08     _eric   wait until they timeout
2012-07-25 15:15:10     _eric   then do the -CONT
2012-07-25 15:15:22     slyphon kk
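
Putting the IRC repro together as a runnable sketch (the server address and thread count are assumptions; run kill -STOP <zookeeper pid> first, let the calls time out, then kill -CONT):

require 'zk'
require 'pp'

zk = ZK.new('localhost:2181')

20.times do
  Thread.new do
    sleep rand
    begin
      zk.get('/', :watch => true)
    rescue => e
      pp [e, e.backtrace]
    end
  end
end

sleep   # keep the main thread alive while the workers run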

Provide a way to see if it's likely a lock is locked

It would be nice to be able to check if someone has a given lock already or not.

I have a scheduler task that queues work if it is not already in the process of being worked on, so it would be helpful to know if someone has the lock or not.
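
In the meantime, one way to approximate this (a sketch, not an official API): look for lock nodes under the lock's parent path. The '/_zklocking' root and the 'ex' prefix mirror the conventions visible elsewhere in these issues, but treat them as assumptions:

def probably_locked?(zk, name, root = '/_zklocking')
  zk.children("#{root}/#{name}").any? { |child| child.start_with?('ex') }
rescue ZK::Exceptions::NoNode
  false   # the parent doesn't exist, so nobody holds or is waiting for the lock
end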

Entered closed and then connected state, watches lost, no on_expired_session callback

I'm testing a 3 node zookeeper setup.

I have a script with some watches set. I want to exit if any errors happen that require the connection to be re-initialized.

To do this I have the following:

@zk.on_expired_session {
  exit! 1
}

If the client hangs (I can test with SIGSTOP) then this callback gets called and life is good.

If I stop 2 of the 3 servers, my connection goes into state "closed", then when I bring them back up it goes back into state "connected".

Unfortunately at this point it seems like all of my watches are gone. This means that a temporary failure of the zookeeper cluster can break anything that's watching for changes.

Is this expected? What other states should I be looking for to ensure that my watches never disappear? There doesn't seem to be a "closed" handler.
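
A hedged sketch of one way to cope today: treat every (re)connection as a point where watches must be re-armed. on_connected is the client-level state callback; rewatch_everything is a placeholder for your own re-registration logic:

@zk.on_expired_session { exit! 1 }

@zk.on_connected do
  # re-issue the stat/get/children calls with :watch => true so the
  # new session has the watches the old one lost
  rewatch_everything
end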

client.reopen fails, client = ZK::Client.new( ) works without issue

While debugging the lockup issue on forked processes, I tried cycling the connection within the forked process to see if that fixed the issue. When calling:

client.close!
client.reopen

The reopen() call waits for a couple seconds and then returns :closed as the connection status. However, just instantiating a new client works immediately and without issue.

ZK reports lost connections

Hi there,

I just started using zk (jruby-1.6.7), with the current version of your zookeeper gem and the current ZooKeeper server release, on a MacBook, and am experiencing connection losses when sending commands like exists? or create directly after the connection is established.
This is my code:

@zk = ZK.new("localhost:2181", :watcher => :default)
@zk.create('/foo', '', :mode => :persistent)

This is the server log:

2012-03-28 12:48:55,528 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /0:0:0:0:0:0:0:1:60876
2012-03-28 12:48:55,551 - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@735] - Session establishment request from client /0:0:0:0:0:0:0:1:60876 client's lastZxid is 0x0
2012-03-28 12:48:55,551 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@777] - Client attempting to establish new session at /0:0:0:0:0:0:0:1:60876
2012-03-28 12:48:55,558 - INFO  [SyncThread:0:FileTxnLog@199] - Creating new log file: log.1
2012-03-28 12:48:55,561 - DEBUG [SyncThread:0:FinalRequestProcessor@79] - Processing request:: sessionid:0x13658ebed630000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
2012-03-28 12:48:55,576 - DEBUG [SyncThread:0:FinalRequestProcessor@151] - sessionid:0x13658ebed630000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
2012-03-28 12:48:55,580 - INFO  [SyncThread:0:NIOServerCnxn@1580] - Established session 0x13658ebed630000 with negotiated timeout 10000 for client /0:0:0:0:0:0:0:1:60876
2012-03-28 12:49:06,001 - INFO  [SessionTracker:ZooKeeperServer@316] - Expiring session 0x13658ebed630000, timeout of 10000ms exceeded
2012-03-28 12:49:06,002 - INFO  [ProcessThread:-1:PrepRequestProcessor@399] - Processed session termination for sessionid: 0x13658ebed630000
2012-03-28 12:49:06,003 - DEBUG [SyncThread:0:FinalRequestProcessor@79] - Processing request:: sessionid:0x13658ebed630000 type:closeSession cxid:0x0 zxid:0x2 txntype:-11 reqpath:n/a
2012-03-28 12:49:06,005 - INFO  [SyncThread:0:NIOServerCnxn@1435] - Closed socket connection for client /0:0:0:0:0:0:0:1:60876 which had sessionid 0x13658ebed630000

The exception is being thrown before the server reports the timeout.
The actual "create" call does not seem to arrive at ZooKeeper.

Leaving :watcher => :default out changes nothing.

Any ideas?

:or feature for updates

for when you just want the node to exist with the data

create('/foo/bar', 'thedata', :or => :set)

# and for symmetry

set('/foo/bar', 'thedata', :or => :create)

Provide a way to register() for globs or regexes

In trying to watch for changes to children of a znode, I found that it is difficult to keep track of which children have been register()'d for and which haven't.

It would be nice to be able to provide either a glob or a regex to register() that could be registered once and fire for all of the children.

Client has to create root node for chrooted namespace before any other operation

This may affect more than just chrooted clients, but anyway, here's the procedure and the error I got. Coming from an already running ZooKeeper server and adding a chroot namespace (thanks for that last chroot fix, btw), I tried the basic locking test again:

client.with_lock("lockname") do
  puts "Something..."
end

and received the following error:
NameError: uninitialized constant ZK::Client::Unixisms::KeeperException
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:28:in `rescue in mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:22:in `mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:31:in `rescue in mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/client/unixisms.rb:22:in `mkdir_p'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:111:in `create_root_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:121:in `rescue in create_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:117:in `create_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:210:in `lock!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.7/lib/z_k/locker.rb:55:in `with_lock'

I then ran client.mkdir_p("/"), followed by the same lock test as above, and it completed successfully. In summary, I think a client with a chroot-namespaced connection should mkdir_p("/") automatically, immediately on connect, but you should also look into that uninitialized constant problem.

create should be variadic

implement support for

zk.create('/foo', :ephemeral => true)

# instead of requiring

zk.create('/foo', '', :ephemeral => true)

LockWaitTimeout causes lock to be forever unusable

Hi,

This is in reference to issue #40, where you added the very useful functionality of being able to specify a timeout when waiting for locks.

I've just started testing it out, and there seems to be one problem with it: if you don't succeed in getting a lock (i.e. you receive a LockWaitTimeoutError), the lock can no longer be acquired by any other process, or by this process, until this process is killed. So: deadlock.

What seems to happen is that the little ephemeral lock-attempt node sticks around in the lock's parent node, even though the process gave up on the attempt. Would it be possible to clean this node up when we time out? It would be similar to how it gets cleaned up if we don't block waiting and the lock attempt is unsuccessful.

(The problem is actually quite bad, because if there's contention, everybody is timing out and leaving their little ephemeral lock attempts around, and even if you aggressively kill the processes if they fail to acquire the locks, there's a bit of delay there, so you pretty much get livelock rather than deadlock.)

Thanks!
John

Fix possible race condition in lockers with session expiration

Right now it's possible for a locker to unlock a lock it didn't actually create:

  • client1 grabs the lock
  • client2 tries to grab the lock and waits
  • client1's session expires (lock released)
  • client2 gets the lock
  • client2 releases the lock and deletes the parent
  • client3 grabs the lock and creates the parent
  • client1 has reconnected and releases a lock it didn't create

The solution we've come up with is to track the ctime of the parent and only delete the lock if the ctime of the parent matches what we thought it was.

The reason this works is that, as long as the parent hasn't been deleted, the sequence numbers of the lock nodes are always increasing.
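
A hedged sketch of the proposed fix; the helper names are illustrative and not the gem's implementation:

def create_lock_node(zk, parent, prefix = 'ex')
  zk.mkdir_p(parent)
  path         = zk.create("#{parent}/#{prefix}", '', :mode => :ephemeral_sequential)
  parent_ctime = zk.stat(parent).ctime
  [path, parent_ctime]   # remember the parent's ctime alongside our lock node
end

def release_lock_node(zk, parent, path, expected_parent_ctime)
  return unless zk.exists?(parent)                               # parent gone: nothing of ours left
  return unless zk.stat(parent).ctime == expected_parent_ctime   # parent recreated: not our lock
  zk.delete(path, :ignore => :no_node)
end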

Recipe: Barrier

From http://zookeeper.apache.org/doc/trunk/recipes.html

Distributed systems use barriers to block processing of a set of nodes until a condition is met at which time all the nodes are allowed to proceed. Barriers are implemented in ZooKeeper by designating a barrier node. The barrier is in place if the barrier node exists. Here's the pseudo code:

  1. Client calls the ZooKeeper API's exists() function on the barrier node, with watch set to true.
  2. If exists() returns false, the barrier is gone and the client proceeds
  3. Else, if exists() returns true, the clients wait for a watch event from ZooKeeper for the barrier node.
  4. When the watch event is triggered, the client reissues the exists( ) call, again waiting until the barrier node is removed.

I plan on implementing this at some point if no one else does first...

I haven't decided if it makes sense to be in this gem or in a recipes gem. It seems reasonable it should be in the same place the locker is...
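
A rough sketch of how the pseudocode above might map onto the zk API; the path, the register/Queue wait, and the method name are assumptions, not a shipped recipe:

def wait_for_barrier(zk, path = '/barrier')
  queue = Queue.new
  sub   = zk.register(path) { |event| queue.push(event) }

  # exists? with :watch => true both checks the node and arms the watch
  while zk.exists?(path, :watch => true)
    queue.pop   # block until the watch fires, then re-check
  end
ensure
  sub.unregister if sub
end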

Ephemeral node for exclusive lock not cleaned up when failure happens during lock acquisition

I was finally able to reproduce the issue that we were looking at in IRC on Friday. Here's an isolated test case:

In process 1:

zk = ZK.new('..')
l = zk.locker('foo')

l.lock!(true)

In process 2:

zk = ZK.new('..')
l = zk.locker('foo')

l.lock!(true)

Now press Ctrl-C to simulate an exception/error happening while in the blocking #lock! call. Note that in my scenario I was producing this error condition by killing a couple ZK servers.

Now, when you do:

zk.children(l.root_locker_path)

You'll notice that there are some entries there that never get cleaned up.

I would love for this to get cleaned up somehow if possible, because in the case of redis_failover, it ends up producing a situation where the non-master node managers never get a chance to become a real master since they always hang on their next #lock!(true) call. They hang because their previous #lock!(true) call failed and their old ephemeral sequential znode never got deleted.

As a workaround, I'm doing the following hack in redis_failover (master):

        # we manually attempt to delete the lock path before
        # acquiring the lock, since currently the lock doesn't
        # get cleaned up if there is a connection error while
        # the client was previously blocked in the #lock! call.
        if path = @zk_lock.lock_path
          @zk.delete(path, :ignore => :no_node)
        end
        @zk_lock.lock!(true)

Recipe: Double Barrier

From http://zookeeper.apache.org/doc/trunk/recipes.html

Double barriers enable clients to synchronize the beginning and the end of a computation. When enough processes have joined the barrier, processes start their computation and leave the barrier once they have finished. This recipe shows how to use a ZooKeeper node as a barrier.

The pseudo code in this recipe represents the barrier node as b. Every client process p registers with the barrier node on entry and unregisters when it is ready to leave. A node registers with the barrier node via the Enter procedure below; it waits until x client processes have registered before proceeding with the computation. (The x here is up to you to determine for your system.)

Enter

  1. Create a name n = b+"/"+p
  2. Set watch: exists(b + "/ready", true)
  3. Create child: create( n, EPHEMERAL)
  4. L = getChildren(b, false)
  5. if fewer children in L than x, wait for watch event
  6. else create(b + "/ready", REGULAR)

Leave

  1. L = getChildren(b, false)
  2. if no children, exit
  3. if p is only process node in L, delete(n) and exit
  4. if p is the lowest process node in L, wait on highest process node in P
  5. else delete(n) if still exists and wait on lowest process node in L
  6. goto 1

Like #42, it seems like this should go in the same gem as the locker.
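
A hedged sketch of the Enter procedure only, using the zk API; the barrier path, node names, and Queue-based wait are assumptions:

def enter_barrier(zk, b, p, x)
  ready = "#{b}/ready"
  queue = Queue.new
  sub   = zk.register(ready) { |event| queue.push(event) }

  zk.exists?(ready, :watch => true)                 # 2. set watch on b/ready
  zk.create("#{b}/#{p}", '', :mode => :ephemeral)   # 3. register this process
  if zk.children(b).size < x
    queue.pop                                       # 5. wait for the ready watch to fire
  else
    zk.create(ready, '')                            # 6. the last process in flips the switch
  end
ensure
  sub.unregister if sub
end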

Callbacks Hash in EventHandlerSubscription::Base gets longer randomly

I built a Celluloid application running in JRuby that uses the ZK gem with 5 pooled connections via mperham's connection_pool gem.
I'm using these connections within 5 Celluloid actor threads that act as workers which process data.
Each worker uses 2 nested with_lock blocks with the same ZooKeeper connection from the above-mentioned connection pool to keep its workload from being processed simultaneously by another thread.
Everything seems to be working fine so far regarding my other ZooKeeper operations (task management), but I noticed one odd thing looking through the debug log that I wanted to share/discuss here:

Every now and then, I see output from ZK::EventHandlerSubscription::Base like the dump I attached below. The @callbacks hash is growing with empty arrays over time after serving a few requests. This shouldn't be the case, right? Also, it should be okay to always use different lock names?

If you need more information, let me know. I could also try to nail this in a small sample application if requested.

#<ZK::EventHandlerSubscription::Base:0x1c122c7a @interests=#<Set: {:created, :deleted, :changed, :child}>,
@path="/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003",
@mutex=#<ZK::Monitor:0x5cdcc3b8 @mon_count=0, @mon_mutex=#<Mutex:0x21b594a9>, @mon_owner=nil>,
@parent=#<ZK::EventHandler:0x329bbe66 @mutex=#<ZK::Monitor:0xef1347f @mon_count=0, @mon_mutex=#<Mutex:0x7dad8582>,
@mon_owner=nil>, @default_watcher_block=#<Proc:0xb185a44@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/event_handler.rb:250 (lambda)>,
@zk=#<ZK::Client::Threaded:5818 zk_session_id=0x13a4080816605ba ...>,
@callbacks={"state_3"=>[], :all_state_events=>[], "/_zklocking/task_lock:c6548120-121b-11e2-87cd-a5de314d1973:sum_result_0/ex0000000007"=>[],
 "state_-112"=>[ 
   #<ZK::EventHandlerSubscription::Base:0x1832f489 @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_-112",
   @mutex=#<ZK::Monitor:0x51141ddf @mon_count=0, @mon_mutex=#<Mutex:0x7fb5450e>, @mon_owner=nil>,
   @parent=#<ZK::EventHandler:0x329bbe66 ...>,
   @callable=#<Proc:0x55c8dba2@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>],
   "state_1"=>[
     #<ZK::EventHandlerSubscription::Base:0x75cb94ad @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_1",
     @mutex=#<ZK::Monitor:0x2c72c20d @mon_count=0, @mon_mutex=#<Mutex:0x2221fa47>, @mon_owner=nil>,
     @parent=#<ZK::EventHandler:0x329bbe66 ...>,
     @callable=#<Proc:0x64b65cd2@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>],
     "state_0"=>[ 
       #<ZK::EventHandlerSubscription::Base:0x8812a6 @interests=#<Set: {:created, :deleted, :changed, :child}>, @path="state_0",
       @mutex=#<ZK::Monitor:0x4b291058 @mon_count=0, @mon_mutex=#<Mutex:0x74f027f4>, @mon_owner=nil>, @parent=#<ZK::EventHandler:0x329bbe66 ...>,
       @callable=#<Proc:0x42cf4026@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:199 (lambda)>>
     ], 
     :all_node_events=>[],
     "/_zklocking/task_lock:d225dd50-121b-11e2-9470-e74969dc6fb6:sum_result_0/ex0000000003"=>[],
     "/_zklocking/task_calculation_lock:calculation_task_3:user_1/ex0000000000"=>[],
     "/_zklocking/task_lock:dd5ca730-121b-11e2-9c91-1a29f145ecf1:mul_result_0/ex0000000005"=>[],
     "/_zklocking/task_lock:df2495f0-121b-11e2-9b18-221ea8a36fe9:mul_result_0/ex0000000002"=>[],
     "/_zklocking/task_lock:df643700-121b-11e2-92f2-37bbec17b37e:sum_result_0/ex0000000006"=>[],
     "/_zklocking/task_calculation_lock:calculation_task_2:user_1/ex0000000001"=>[],
     "/_zklocking/finish_lock:e148ae70-121b-11e2-999d-40dd7a593966/ex0000000000"=>[],
     "/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003"=>[#<ZK::EventHandlerSubscription::Base:0x1c122c7a ...>]
  },
  @orig_pid=63595,
  @state=:running,
  @thread_opt=:single,
  @outstanding_watches={
    :data=>#<Set: {"/_zklocking/task_lock:dd5ca730-121b-11e2-9c91-1a29f145ecf1:mul_result_0/ex0000000005"}>,
     :child=>#<Set: {}>}
   >,
   @callable=#<Proc:0x6231b90d@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zk-1.7.2/lib/zk/node_deletion_watcher.rb:183 (lambda)>
> with [#<Zookeeper::Callbacks::WatcherCallback:0x5a4e3fa1 @context=nil, @path="/_zklocking/task_lock:e7c74770-121b-11e2-8de4-82fc76d42e2c:mul_result_0/ex0000000003", @state=3, @completed=true, @proc=#<Proc:0x47339158@/Users/grinser/.rvm/gems/jruby-1.6.8/gems/zookeeper-1.3.0-java/lib/zookeeper/callbacks.rb:24>, @type=2, @zk=#<ZK::Client::Threaded:5818 zk_session_id=0x13a4080816605ba ...>>]

Should register automatically call stat with :watch => true?

I was experimenting with the ZK gem and spent a fair amount of time trying to get the register callbacks to work.

In process 1

require 'zk'
zk = ZK.new
zk.create('/foo', 'bar')
sleep 30
zk.set('/foo', 'baz')

In process 2 during the 30 second sleep period

require 'zk'
zk = ZK.new
ns = zk.register('/foo')  {|e|  "Got event #{e.event_name} #{e.path}" }

I noticed nothing happens.
After looking more at the examples, it appears I have to do this:

require 'zk'
zk = ZK.new
ns = zk.register('/foo')  {|e|  "Got event #{e.event_name} #{e.path}" }
zk.stat('/foo', :watch => true)

Is there a reason why register does not call stat with :watch => true automatically?

Feature request: lock cleanup

Sometimes when sessions expire, a lock's parent directory will never be cleaned up.

It would be great to have a call to cleanup all parent directories that are no longer in use.

The logic could be something like:

  def cleanup_lockers(client, root_lock_node = Locker.default_root_lock_node)
    client.children(root_lock_node).each do |zname|
      locker = exclusive_locker(client, zname)
      if locker.lock
        locker.unlock
      end
    end
  end

Rationale: All we need to do is see if we can get the lock; if we can, unlock it, and the unlock will delete the parent if no one else is waiting.

Recipe: Semaphore

I haven't been able to find a description of an implementation of a semaphore that I like, so I'm going to try to work this one out.

An obvious use of a semaphore would be to restrict concurrent access to a resource (for things like preventing too many workers from hitting the same database or website).

My thought is that the semaphore would take a name and a size. Any client that attempts to acquire the semaphore beyond the size limit will block until a slot is available.

I don't have a strong opinion between using acquire/release vs wait/signal for the method names.

There are two obvious ways to implement this:

1. The more efficient approach

Like a lock, have each client wait for the sequential znode below them and wait for the watcher to fire.

The problem with this is that if you have a size of 10 and 20 clients, none of the clients above 10 will progress until the 10th one completes.

2. The more immediate approach

Watch for any changes to the parent znode and recalculate the client's position each time.

The downside of this is that all clients will be woken whenever any client releases. In reality, if there is not a large number of clients waiting, it should provide a better experience.

I'm hoping there's another obvious solution I'm missing here.

After reading the code of the Netflix Curator recipe for a semaphore, it looks like they opted for approach 1, but didn't specifically call out the behavior: https://github.com/Netflix/curator/wiki/Shared-Semaphore
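
A hedged sketch of approach 2 (simple, but wakes every waiter on each change); the class name, the '/_zksemaphores' root, and the Queue-based wait are all assumptions:

class SemaphoreSketch
  def initialize(zk, name, size, root = '/_zksemaphores')
    @zk, @size = zk, size
    @parent    = "#{root}/#{name}"
  end

  def acquire
    @zk.mkdir_p(@parent)
    @path = @zk.create("#{@parent}/slot", '', :mode => :ephemeral_sequential)
    queue = Queue.new
    sub   = @zk.register(@parent) { |event| queue.push(event) }
    loop do
      kids = @zk.children(@parent, :watch => true)          # read children and arm the child watch
      break if kids.sort.index(File.basename(@path)) < @size
      queue.pop                                             # every waiter wakes on any change
    end
  ensure
    sub.unregister if sub
  end

  def release
    @zk.delete(@path, :ignore => :no_node)
  end
end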

Timeout on lock waiting

Is there any way to set a timeout (per-lock or even globally) on how long a client should block waiting to acquire a lock?

From what I can tell I don't think there is, but I figured I'd ask.

Thanks!
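
For reference, the timeout support discussed in the LockWaitTimeout issue above ended up looking roughly like this; the exact option name and signature depend on the zk version, so treat this as a hedged example:

lock = zk.locker('expensive-job')
begin
  lock.lock(:wait => 5.0)   # block for at most ~5 seconds
  # ... critical section ...
  lock.unlock
rescue ZK::Exceptions::LockWaitTimeoutError
  # we gave up waiting; somebody else still holds the lock
end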

Thread hangs on exit

Recently we've run into an issue where a process using zk and redis_failover sometimes fails to die on exit, and instead hangs in an infinite sched_yield() loop taking all CPU. A SIGKILL is required to get rid of it. I guess it has to do with the program exiting without properly closing the connection first, but it should still die cleanly even so.

gdb output gives this:

#0  0x00007fb7e6765a67 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb7e6bd86f7 in gvl_yield (vm=0x1bfdf20, th=<optimized out>) at thread_pthread.c:125
#2  0x00007fb7e6bdac16 in rb_thread_schedule_limits (limits_us=0) at thread.c:1025
#3  rb_thread_schedule_limits (limits_us=0) at thread.c:1033
#4  rb_thread_schedule () at thread.c:1035
#5  0x00007fb7e6bdad5f in rb_thread_terminate_all () at thread.c:375
#6  0x00007fb7e6abf89e in ruby_cleanup (ex=0) at eval.c:140
#7  0x00007fb7e6abfa25 in ruby_run_node (n=0x24f0428) at eval.c:244
#8  0x00000000004007fb in main (argc=3, argv=0x7fff7725e948) at main.c:38

After adding some debug-code to ruby side to get a backtrace when the process is hung, I was able to get this:

Thread TID-t26i0
ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
shared/bundle/ruby/1.9.1/gems/zk-1.7.1/lib/zk/threadpool.rb:268:in `worker_thread_body'

When trying to reproduce it without redis_failover I was able to get it to hang in a similar way, but in a different place:

Thread TID-ccaag
ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common/queue_with_pipe.rb:59:in `pop'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common.rb:56:in `get_next_event'
shared/bundle/ruby/1.9.1/gems/zookeeper-1.3.0/lib/zookeeper/common.rb:94:in `dispatch_thread_body'

and

Thread TID-alg44
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/thread.rb:71:in `wait'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:110:in `wait'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:122:in `wait_while'
shared/bundle/ruby/1.9.1/gems/zk-1.7.1/lib/zk/client/threaded.rb:533:in `block in reconnect_thread_body'
rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'

Code in all is somewhat similar (@cond.wait).

Any ideas? Ruby is 1.9.3-p194, but it also happens at least on 1.9.3-p0. ZK 1.7.1, zookeeper 1.3.0. OS: linux ubuntu 12.04.

At least in ruby 1.9.3 you can give a timeout to ConditionVariable's wait(), maybe that would help.

With this I was able to get it to hang in a similar way quite often:

#!/usr/bin/env ruby
require 'rubygems'
require 'zk'
$stdout.sync = true
@zkservers = "localhost:2181"

trap 'TTIN' do
  Thread.list.each do |thread|
    puts "Thread TID-#{thread.object_id.to_s(36)}"
    puts thread.backtrace.join("\n")
  end
end

def do_something
  zk = ZK.new(@zkservers)
  puts zk.children('/').inspect 
  sleep 1
end

puts "Pid: #{$$}"
count = 50
stack = []
(0..count).each do |i|
  stack << Thread.new { do_something }
end
sleep rand(0)

Running it in a while true; do ./test.rb; done loop until it gets stuck, then sending kill -TTIN, prints the backtraces of the still-alive threads.

Feature Request: create-or-set method

It would be nice to have something equivalent to redis's set() command that just creates a node if it doesn't exist or sets it if it does.

Also, it seems like optionally making it do an mkdir_p() if needed would be a nice addition.
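
A hedged sketch of what such a helper could look like; the method name and the :mkdir_p option are hypothetical, while the exception class is zk's:

def create_or_set(zk, path, data, opts = {})
  zk.mkdir_p(File.dirname(path)) if opts[:mkdir_p]   # optionally create missing parents
  zk.create(path, data)
rescue ZK::Exceptions::NodeExists
  zk.set(path, data)
end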

mkdir_p in jruby fails if more than one part of the path is nonexistent

When running zk.mkdir_p("/foo/bar") against an empty ZooKeeper tree, a NoNode exception is thrown but not caught in the handler, so no recursion is performed. It is possible to use it to create a single new level (mkdir_p("/foo")), mimicking the behavior of create.

This is on jruby 1.7.0-preview1 and 2.
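
For reference, a minimal sketch of the recursive behavior being described (not the gem's actual implementation), assuming the usual ZK::Exceptions classes:

def mkdir_p(zk, path)
  zk.create(path, '')
rescue ZK::Exceptions::NoNode
  mkdir_p(zk, File.dirname(path))   # create the missing parent first...
  retry                             # ...then try this level again
rescue ZK::Exceptions::NodeExists
  # already there, nothing to do
end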

already initialized constants

Recent zk/zookeeper seems to pull in a version of backports that causes a bunch of "already initialized constant" warnings. Any way to prevent this? I'm running on MRI 1.9.3.

/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrDimensionMismatch
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrNotRegular
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrOperationNotDefined
/Users/mconway/.rvm/rubies/ruby-1.9.3-p125-falcon/lib/ruby/1.9.1/e2mmap.rb:130: warning: already initialized constant ErrOperationNotImplemented
/Users/mconway/.rvm/gems/ruby-1.9.3-p125-falcon/gems/backports-2.5.3/lib/backports/1.9.2/stdlib/matrix.rb:516: warning: already initialized constant SELECTORS

ZK#with_lock should yield the lock back to the block

When using ZK.with_lock, it should yield the created lock to the block, so that any code in the block can query the state of the lock.

Example:

ZK.with_lock do |lock|
   # etc etc
   lock.assert!
   puts "We lost the lock" unless lock.locked?
end

ExclusiveLock with chroot namespace failure

When constructing a client with a chroot namespace (specifically a ThreadedClient), calling:

client.with_lock("lock_name") do
  ... (anything) ...
end

will acquire the lock, execute the code block, and then fail to release it with the following error:

removing lock path /chroot_namespace/_zklocking/lock_name/ex0000000000
ZK::Exceptions::NoNode: inputs: {:path=>"/chroot_namespace/_zklocking/lock_name/ex0000000000", :version=>-1}
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/base.rb:669:in `check_rc'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/base.rb:552:in `delete'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:128:in `cleanup_lock_path!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:78:in `unlock!'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/locker.rb:59:in `with_lock'
from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/zk-0.8.6/lib/z_k/client/conveniences.rb:97:in `with_lock'
from (irb):1

RFC: Actor-style event delivery - saner parallelism

@tobowers @eric @ryanlecompte if you guys have a moment, I've got this idea for increasing the parallelism of applications built on ZK (i.e. get rid of the "single event thread") while at the same time keeping execution somewhat sane (multiple threads not calling registered watchers simultaneously). I think this is a good compromise, and better than just "upping the number of threads in the threadpool."

From the code:

# Stealing some ideas from Celluloid, this event handler subscription
# (basically, the wrapper around the user block), will spin up its own
# thread for delivery, and use a queue. This gives us the basis for better
# concurrency (event handlers run in parallel), but preserves the
# underlying behavior that a single-event-thread ZK gives us, which is that
# a single callback block is inherently serial. Without this, you have to
# make sure your callbacks are either synchronized, or totally reentrant,
# so that multiple threads could be calling your block safely (which is
# really difficult, and annoying).
#
# Using this delivery mechanism means that the block still must not block
# forever, however each event will "wait its turn" and all callbacks will
# receive their events in the same order (which is what ZooKeeper
# guarantees), just perhaps at different times.

The downside is that for short-lived subscriptions (like in the tests, or maybe for heavy use of locking), having a thread per subscription is heavyweight (the tests run about 5x slower).

I'm thinking that this would be a good optional feature that people could pick, perhaps at registration time (right now it's a per-connection option, but I'm going to change that).

If you get a chance, please have a look at 928e23e and let me know what you think, I'd really appreciate the feedback.
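
A toy sketch of the delivery model described in that comment; the class and method names are illustrative, not the gem's:

require 'thread'

class ActorishSubscription
  def initialize(&callback)
    @queue    = Queue.new
    @callback = callback
    @thread   = Thread.new { dispatch_loop }
  end

  # called from the single event thread: cheap, never runs user code inline
  def deliver(event)
    @queue.push(event)
  end

  def shutdown
    @queue.push(:stop)
    @thread.join
  end

  private

  # events for THIS subscription stay serial and ordered, but callbacks for
  # different subscriptions now run in parallel with each other
  def dispatch_loop
    while (event = @queue.pop) != :stop
      @callback.call(event)
    end
  end
end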

NameError: uninitialized constant ZookeeperCommon::QueueWithPipe::Forwardable

Have to report this issue here, since the slyphon-zookeeper gem doesn't have an issues section. Anything that uses slyphon-zookeeper v0.8.3, and therefore any recent installation of ZK, throws this error as soon as you try to require it. e.g.:

baconator@dev:~$ irb
1.9.3-p125 :001 > require 'rubygems'
 => false 
1.9.3-p125 :002 > require 'zookeeper'
NameError: uninitialized constant ZookeeperCommon::QueueWithPipe::Forwardable
    from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:4:in `<class:QueueWithPipe>'
    from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:3:in `<module:ZookeeperCommon>'
    from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common/queue_with_pipe.rb:1:in `<top (required)>'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper/common.rb:168:in `<top (required)>'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /home/baconator/.rvm/gems/ruby-1.9.3-p125@rails320/gems/slyphon-zookeeper-0.8.3/lib/zookeeper.rb:5:in `<top (required)>'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:59:in `require'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:59:in `rescue in require'
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:35:in `require'
    from (irb):2
    from /home/baconator/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
1.9.3-p125 :003 > 
