Potential deadlock (cl-rdkafka issue, closed)

sahilkang commented on July 24, 2024
Potential deadlock

from cl-rdkafka.

Comments (1)

SahilKang commented on July 24, 2024

Thanks for digging in and sharing the details 💯; I'm able to reproduce this some of the time with:

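;; lparallel and bordeaux-threads are used directly below, so load them explicitly
(ql:quickload '(cl-rdkafka babel lparallel bordeaux-threads))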


(defparameter +topic+ "test-topic")

(defparameter +num-messages+ 100000)

(let ((producer (make-instance
                 'kf:producer
                 :conf '("bootstrap.servers" "127.0.0.1:9092")
                 :serde #'babel:string-to-octets)))
  (loop
     repeat +num-messages+
     do (kf:send producer +topic+ "test-message"))
  (kf:flush producer))


(defparameter +consumer+
  (make-instance
   'kf:consumer
   :conf '("bootstrap.servers" "127.0.0.1:9092"
           "group.id" "test-group-id"
           "enable.auto.commit" "false"
           "auto.offset.reset" "earliest"
           "offset.store.method" "broker"
           "enable.partition.eof" "false")
   :serde #'babel:octets-to-string))

(kf:subscribe +consumer+ +topic+)

(defparameter *count* 0)

(defparameter +lock+ (bt:make-lock "+lock+"))

(defparameter +channel+
  (let ((lparallel:*kernel*
         (lparallel:make-kernel 3 :name "test-kernel")))
    (lparallel:make-channel)))

(defun process ()
  (bt:with-lock-held (+lock+)
    (incf *count*)
    (kf:commit +consumer+ :asyncp t)))

(defun poll ()
  (let (message)
    (bt:with-lock-held (+lock+)
      (setf message (kf:poll +consumer+ 5)))
    (when message
      (lparallel:submit-task +channel+ #'process))))


(lparallel:submit-task
 +channel+
 (lambda ()
   (loop
      repeat +num-messages+
      do (poll))))

Running (bt:all-threads) will sometimes show that all the threads are running okay, but other times I'll see the deadlock you described:

#<SB-THREAD:THREAD "worker" waiting on:
      #<MUTEX "address->queue-lock"
          owner: #<SB-THREAD:THREAD "cl-rdkafka" RUNNING {100271A6E3}>>
    {1005DB8F93}>

poll-loop never releases the lock because the lparallel.queue:pop-queue call blocks indefinitely when the queue is empty. Reaching that call with an empty queue is not a valid state; it happens because the commit call grabs the mutex too late: the mutex should be acquired before the cl-rdkafka/ll:rd-kafka-commit-queue call a few lines above it.

I'll go into more details in the PR/commit that fixes this, but at a high level:

  • The cl-rdkafka/ll:rd-kafka-commit-queue call adds more events to the rd-kafka-queue without holding the mutex.
  • This keeps process-events looping (process-events is called by poll-loop).
  • enqueue-payload then tries to acquire the mutex but blocks, because process-events is still looping with the lock held.
  • Eventually the lparallel.queue:pop-queue call made by process-events blocks on an empty queue.
  • Since enqueue-payload is the function that adds elements to this queue, and it is blocked waiting for the mutex, we end up with a deadlock.
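
To make the ordering problem concrete, here is a rough, self-contained sketch of the pattern. It is not the cl-rdkafka source: *lock*, *events*, *payloads*, commit-racy, commit-fixed and run-demo are hypothetical stand-ins, with *events* playing the rd-kafka-queue and *payloads* playing the lparallel queue that enqueue-payload fills. It only assumes the lparallel.queue and bordeaux-threads APIs already used in this thread.

```lisp
(ql:quickload '(lparallel bordeaux-threads))

;; Hypothetical stand-ins for the real internals:
(defvar *lock* (bt:make-lock "address->queue-lock"))
(defvar *events* (lparallel.queue:make-queue))   ; plays the rd-kafka-queue
(defvar *payloads* (lparallel.queue:make-queue)) ; filled by "enqueue-payload"

(defun commit-racy (payload)
  "Buggy ordering (defined for contrast, never called here): the event
becomes visible before the lock is taken, so the poller can pop an event
whose payload is not enqueued yet and then block forever in pop-queue."
  (lparallel.queue:push-queue :commit-event *events*)
  (bt:with-lock-held (*lock*)
    (lparallel.queue:push-queue payload *payloads*)))

(defun commit-fixed (payload)
  "Take the lock BEFORE generating the event, so the event and its
payload become visible to the poller together."
  (bt:with-lock-held (*lock*)
    (lparallel.queue:push-queue :commit-event *events*)
    (lparallel.queue:push-queue payload *payloads*)))

(defun process-events ()
  "Drain pending events while holding the lock and return the payloads
matched to them. The blocking pop-queue is safe only if every visible
event already has a payload, which commit-fixed guarantees."
  (bt:with-lock-held (*lock*)
    (loop
      while (nth-value 1 (lparallel.queue:try-pop-queue *events*))
      collect (lparallel.queue:pop-queue *payloads*))))

(defun run-demo (n)
  "Commit N payloads from one thread while this thread polls; returns
the number of payloads processed."
  (let ((committer (bt:make-thread
                    (lambda () (dotimes (i n) (commit-fixed i)))))
        (seen 0))
    (loop while (< seen n)
          do (incf seen (length (process-events))))
    (bt:join-thread committer)
    seen))
```

Calling (run-demo 1000) should return once all 1000 payloads have been matched. Swapping commit-fixed for commit-racy inside run-demo can hang in the way described above, depending on thread timing.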

For analogous reasons, a similar oversight also causes the same deadlock with the producer send method, so you get a double whammy 🏆
