Comments (15)

josevalim commented on July 22, 2024

It seems your supervision tree is not terminating correctly. Do you see any errors or the offset persisted the first time you stop it? Can you try reproducing the issue with a regular topology?

You can also try to stop it by passing use Broadway, restart: :temporary and then calling GenServer.stop. Also, please let us know your broadway and broadway_kafka versions. Make sure you are on the latest!
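
For readers following along, the suggestion above (a temporary restart strategy followed by an explicit stop) might look roughly like the sketch below. The module name MyApp.Pipeline is hypothetical, start_link is left out because it depends on your producer configuration, and the sketch assumes the pipeline is registered under the module name.

```elixir
defmodule MyApp.Pipeline do
  # Options given to `use Broadway` are merged into the generated child spec,
  # so a supervisor will not restart this pipeline once it has been stopped.
  use Broadway, restart: :temporary

  @impl true
  def handle_message(_processor, message, _context) do
    # Real processing would go here; the message is returned unchanged.
    message
  end
end

# In IEx, once the pipeline is running (assumed to be registered as MyApp.Pipeline):
#   GenServer.stop(MyApp.Pipeline, :shutdown)
```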

from broadway_kafka.

amacciola commented on July 22, 2024

@josevalim thanks for the help.

my versions are

broadway 0.6.2
broadway_kafka 0.1.4
brod 3.14.0

It seems your supervision tree is not terminating correctly. Do you see any errors or the offset persisted the first time you stop it? Can you try reproducing the issue with a regular topology?

I do not see any errors when I am using DynamicSupervisor.terminate_child/2. It is returning an :ok response. And you mean, was the offset persisted in Kafka for that specific consumer group ID, yes? Also, what do you mean by reproducing it with a regular topology?

josevalim commented on July 22, 2024

Broadway should persist the offset when the topology terminates, but for some reason it isn't. By regular topology I meant one outside of a DynamicSupervisor, where you start it regularly and terminate it by calling System.stop. Basically, let's try to find a minimal way to reproduce the issue. :) What is your offset config?

amacciola commented on July 22, 2024

@josevalim Alright, I can set that up real quick and test it out, and I will post the results here. But when you say by calling System.stop, do you mean stopping the entire application's supervision tree?

Entire BroadwayKafka start_link config:

      name: String.to_atom(group_id <> "Pipeline"),
      producer: [
        module:
          {BroadwayKafka.Producer,
           [
             hosts: hosts,
             group_id: group_id,
             topics: topics,
             offset_commit_on_ack: true,
             offset_reset_policy: :earliest,
             group_config: [
               session_timeout_seconds: 15
             ],
             fetch_config: [
               # 3 MB
               max_bytes: 3_145_728
             ],
             client_config: [
               # 15 seconds
               connect_timeout: 15000
             ]
           ]},
        concurrency: 10,
        transformer:
          {__MODULE__, :transform, [group_id: group_id, event_definition_id: event_definition_id]}
      ],
      processors: [
        default: [
          concurrency: Config.event_processor_stages()
        ]
      ],
      context: [event_type: event_type]
    )
  end

amacciola commented on July 22, 2024

@josevalim so I took one of my less complicated pipelines that uses the same configs and added it directly to the application supervision tree.

In this screenshot, 1 is the pipeline started under the app supervision tree and 2 is the pipeline started under the DynamicSupervisor.
[Screenshot: Screen Shot 2020-09-15 at 1 30 29 PM]

When testing using GenServer.stop/2 I got the same result:

iex(5)> drilldown_pid = Process.whereis(:DrilldownPipeline)
#PID<0.890.0>
iex(6)> GenServer.stop(drilldown_pid, :shutdown)
:ok
iex(7)> 13:38:20.648 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.897.0>,cb=#PID<0.894.0>,generation=1):
re-joining group, reason::rebalance_in_progress
13:38:20.648 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.897.0>,cb=#PID<0.894.0>,generation=1):
Leaving group, reason: {:noproc, {GenServer, :call, [#PID<0.894.0>, :drain_after_revoke, :infinity]}}

eventually throwing this error:

13:38:20.665 [error] GenServer #PID<0.905.0> terminating
** (stop) exited in: GenServer.call(#PID<0.902.0>, :drain_after_revoke, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.10.3) lib/gen_server.ex:1023: GenServer.call/3
    (broadway_kafka 0.1.4) lib/producer.ex:415: BroadwayKafka.Producer.assignments_revoked/1
    (brod 3.14.0) /Users/amacciola/Desktop/CogilityDev/cogynt-workstation-ingest/deps/brod/src/brod_group_coordinator.erl:477: :brod_group_coordinator.stabilize/3
    (brod 3.14.0) /Users/amacciola/Desktop/CogilityDev/cogynt-workstation-ingest/deps/brod/src/brod_group_coordinator.erl:391: :brod_group_coordinator.handle_info/2
    (stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
    (stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

and starting a new pipeline with unknown offsets and re-ingesting all the data.

When I tested with System.stop/1, it killed the application the first time, and when I restarted the application it started up and logged:

13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
elected=true
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
elected=false
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
failed to join group
reason: :rebalance_in_progress
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
re-joining group, reason::rebalance_in_progress
13:43:55.839 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
failed to join group
reason: :rebalance_in_progress
13:43:55.839 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
re-joining group, reason::rebalance_in_progress
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.888.0>,cb=#PID<0.885.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.892.0>,cb=#PID<0.889.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.876.0>,cb=#PID<0.873.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.880.0>,cb=#PID<0.877.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.872.0>,cb=#PID<0.869.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=8):
elected=true
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.900.0>,cb=#PID<0.897.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.868.0>,cb=#PID<0.865.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.864.0>,cb=#PID<0.861.0>,generation=8):
elected=false
13:43:55.843 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.888.0>,cb=#PID<0.885.0>,generation=8):
assignments received:
  template_solution_events:
    partition=7 begin_offset=undefined
  template_solutions:
    partition=7 begin_offset=undefined
13:43:55.843 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.876.0>,cb=#PID<0.873.0>,generation=8):
assignments received:

and proceeded to create a new pipeline with the same consumer_group_id, but again with an undefined offset, so it re-ingested all the data.

josevalim commented on July 22, 2024

Btw, at least something was processed in both cases, right? Other things to try out: try using brod 3.10 and see if it changes anything. And try switching offset_commit_on_ack. Thanks!

amacciola commented on July 22, 2024

@josevalim yes, when the pipeline initially comes up it ingests the data as it should. The issues happen when the pipeline is restarted. I will try downgrading the version of brod. I will also test it out with offset_commit_on_ack: false; however, having that set to true is one of my major needs, so if that does not work, that would be a reason for me to look elsewhere.

Edit:
Also, just to note, I did override the kafka_protocol version to be

{:broadway_kafka, "~> 0.1.0", override: true},
{:kafka_protocol, "~> 2.4.1", override: true},

because if I use the version that brod 3.10 uses, my application will not compile.

amacciola commented on July 22, 2024

Much appreciated @josevalim

amacciola commented on July 22, 2024

Just to add some more information, here is a screenshot of describing the consumer group:

  1. when first starting the pipeline
  2. when stopping the pipeline
  3. when starting the pipeline again with the same consumer_group_id

[Screenshot: Screen Shot 2020-09-17 at 2 36 47 PM]

It does not look like it's removing the consumer group; it's just shutting down all of its members. So yeah, it just feels like the offsets are not being persisted.

amacciola commented on July 22, 2024

@josevalim So I cloned BroadwayKafka, added logs, and did some testing, and I think I found what the main issue was. In my pipelines I was defining an ack callback and doing some work once a message was ack'd. It seems that since I was defining my own, the BroadwayKafka acknowledgers were not being called. Therefore the offsets were not being committed.

I have got it working with my ack callback commented out. But now I am missing the logic that I had been running at the end of each acked message. There is no way to define my own as well with this library, is there?

josevalim commented on July 22, 2024

@amacciola When you set your own, you can store the lib one and call it. However, I would suggest to simply call ack_immediately and then execute your ack logic, without changing the message's ack fields.

In any case, since this is not a lib bug, I will close this. Thanks for the follow up!

amacciola commented on July 22, 2024

@josevalim

However, I would suggest to simply call ack_immediately and then execute your ack logic, without changing the message's ack fields.

I am not quite sure what you mean by this. Could you elaborate, please?

josevalim commented on July 22, 2024

Instead of changing the ack fields, you can use Broadway.Message.ack_immediately and then do whatever you want right after calling it. Basically, Broadway has ways for you to force an ack to happen at a certain moment, precisely so you don't have to mess with the ack fields.
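
As a rough illustration of that advice, a handle_message/3 callback could ack the message itself and then run the custom follow-up work, leaving the acknowledger that BroadwayKafka attached untouched. This is only a sketch: the module name MyApp.DrilldownPipeline and the after_ack/2 helper are hypothetical stand-ins for your own pipeline and post-ack logic, and start_link is omitted.

```elixir
defmodule MyApp.DrilldownPipeline do
  use Broadway

  alias Broadway.Message

  @impl true
  def handle_message(_processor, %Message{} = message, context) do
    # ... process message.data here ...

    # Ack right away through the acknowledger already attached to the message.
    # The returned message carries a no-op acknowledger, so Broadway will not
    # ack it a second time at the end of the pipeline.
    message = Message.ack_immediately(message)

    # Hypothetical hook: whatever used to live in the custom ack callback.
    after_ack(message, context)

    message
  end

  # Placeholder for the post-ack logic (an assumption, not part of Broadway).
  defp after_ack(_message, _context), do: :ok
end
```

Because the message's acknowledger field is never replaced, the library's own ack logic still runs, which is the part that was being skipped when a custom ack callback was swapped in.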
