Comments (15)
from broadway_kafka.
It seems your supervision tree is not terminating correctly. Do you see any errors or the offset persisted the first time you stop it? Can you try reproducing the issue with a regular topology?
You can also try stopping it by passing use Broadway, restart: :temporary and then calling GenServer.stop. Also, please let us know your broadway and broadway_kafka versions. Make sure you are on the latest!
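A minimal sketch of that experiment, assuming options given to use Broadway are applied to the child spec the module generates. The module name MyApp.Pipeline is hypothetical, and the start_link/1 that calls Broadway.start_link/2 is left out.

```elixir
defmodule MyApp.Pipeline do
  # Hypothetical module. Assumption: the :restart option is applied to the
  # generated child spec, so a supervisor does not restart the pipeline
  # once it has been stopped.
  use Broadway, restart: :temporary

  @impl true
  def handle_message(_processor, message, _context), do: message

  # start_link/1 calling Broadway.start_link/2 is omitted in this sketch.
end

# From IEx, once the pipeline is running under the name MyApp.Pipeline:
# GenServer.stop(MyApp.Pipeline, :shutdown)
```

With the pipeline marked temporary, whatever shows up in the logs after GenServer.stop comes from the shutdown itself rather than from a supervisor restarting it.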
from broadway_kafka.
@josevalim thanks for the help.
my versions are
broadway 0.6.2
broadway_kafka 0.1.4
brod 3.14.0
> It seems your supervision tree is not terminating correctly. Do you see any errors or the offset persisted the first time you stop it? Can you try reproducing the issue with a regular topology?
I do not see any errors when I am using DynamicSupervisor.terminate_child/2. It returns an :ok response. And you mean whether the offset was persisted in Kafka for that specific ConsumerGroupId, yes? Also, what do you mean by reproducing with a regular topology?
from broadway_kafka.
Broadway should persist the offset when the topology terminates, but for some reason it isn’t. By regular topology I meant one outside of a DynamicSupervisor, where you start it regularly and terminate it by calling System.stop. Basically, let’s try to find a minimal way to reproduce the issue. :) What is your offset config?
from broadway_kafka.
@josevalim Alright, I can set that up real quick, test it out, and post the results here. But when you say calling System.stop, do you mean stopping the entire application's supervision tree?
Entire BroadwayKafka start_link config:
    name: String.to_atom(group_id <> "Pipeline"),
    producer: [
      module:
        {BroadwayKafka.Producer,
         [
           hosts: hosts,
           group_id: group_id,
           topics: topics,
           offset_commit_on_ack: true,
           offset_reset_policy: :earliest,
           group_config: [
             session_timeout_seconds: 15
           ],
           fetch_config: [
             # 3 MB
             max_bytes: 3_145_728
           ],
           client_config: [
             # 15 seconds
             connect_timeout: 15000
           ]
         ]},
      concurrency: 10,
      transformer:
        {__MODULE__, :transform, [group_id: group_id, event_definition_id: event_definition_id]}
    ],
    processors: [
      default: [
        concurrency: Config.event_processor_stages()
      ]
    ],
    context: [event_type: event_type]
  )
end
from broadway_kafka.
@josevalim So I took one of my less complicated pipelines that uses the same configs and added it directly to the application supervision tree.
In this screenshot, 1 is the pipeline started under the app supervision tree and 2 is the pipeline started under the DynamicSupervisor.
When testing using GenServer.stop/2 I got the same result:
iex(5)> drilldown_pid = Process.whereis(:DrilldownPipeline)
#PID<0.890.0>
iex(6)> GenServer.stop(drilldown_pid, :shutdown)
:ok
iex(7)> 13:38:20.648 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.897.0>,cb=#PID<0.894.0>,generation=1):
re-joining group, reason::rebalance_in_progress
13:38:20.648 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.897.0>,cb=#PID<0.894.0>,generation=1):
Leaving group, reason: {:noproc, {GenServer, :call, [#PID<0.894.0>, :drain_after_revoke, :infinity]}}
eventually throwing this error:
13:38:20.665 [error] GenServer #PID<0.905.0> terminating
** (stop) exited in: GenServer.call(#PID<0.902.0>, :drain_after_revoke, :infinity)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(elixir 1.10.3) lib/gen_server.ex:1023: GenServer.call/3
(broadway_kafka 0.1.4) lib/producer.ex:415: BroadwayKafka.Producer.assignments_revoked/1
(brod 3.14.0) /Users/amacciola/Desktop/CogilityDev/cogynt-workstation-ingest/deps/brod/src/brod_group_coordinator.erl:477: :brod_group_coordinator.stabilize/3
(brod 3.14.0) /Users/amacciola/Desktop/CogilityDev/cogynt-workstation-ingest/deps/brod/src/brod_group_coordinator.erl:391: :brod_group_coordinator.handle_info/2
(stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
(stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
(stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
and starting a new pipeline with unknown offsets and re-ingesting all the data.
When I tested with System.stop/1, it killed the application the first time, and when I restarted the application it started up and logged:
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
elected=true
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
elected=false
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
failed to join group
reason: :rebalance_in_progress
13:43:55.838 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=7):
re-joining group, reason::rebalance_in_progress
13:43:55.839 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
failed to join group
reason: :rebalance_in_progress
13:43:55.839 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=7):
re-joining group, reason::rebalance_in_progress
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.896.0>,cb=#PID<0.893.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.888.0>,cb=#PID<0.885.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.892.0>,cb=#PID<0.889.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.876.0>,cb=#PID<0.873.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.880.0>,cb=#PID<0.877.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.872.0>,cb=#PID<0.869.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.884.0>,cb=#PID<0.881.0>,generation=8):
elected=true
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.900.0>,cb=#PID<0.897.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.868.0>,cb=#PID<0.865.0>,generation=8):
elected=false
13:43:55.841 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.864.0>,cb=#PID<0.861.0>,generation=8):
elected=false
13:43:55.843 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.888.0>,cb=#PID<0.885.0>,generation=8):
assignments received:
template_solution_events:
partition=7 begin_offset=undefined
template_solutions:
partition=7 begin_offset=undefined
13:43:55.843 [info] Group member (Drilldown-consumer-temp-id-1,coor=#PID<0.876.0>,cb=#PID<0.873.0>,generation=8):
assignments received:
and proceeded to create a new pipeline with the same consumer_group_id, but again with an undefined offset, so it re-ingested all the data.
from broadway_kafka.
Btw, at least something was processed in both cases, right? Other things to try out: try using brod 3.10 and see if it changes anything. And try switching offset_commit_on_ack. Thanks!
from broadway_kafka.
@josevalim Yes, when the pipeline initially comes up it ingests the data as it should. It is when the pipeline is restarted that the issues happen. I will try downgrading the version of brod. I will also test it out with offset_commit_on_ack: false; however, having that set to true is one of my major needs, so if that does not work it would be a reason for me to look elsewhere.
Edit:
Also, just to note, I did override the kafka_protocol version with:
{:broadway_kafka, "~> 0.1.0", override: true},
{:kafka_protocol, "~> 2.4.1", override: true},
because if I use the version that brod 3.10 uses, my application will not compile.
from broadway_kafka.
Much appreciated @josevalim
from broadway_kafka.
Just to add some more information, here is a screenshot of describing the consumer group:
- when first starting the pipeline
- when stopping the pipeline
- when starting the pipeline again with the same consumer_group_id
It does not look like it is removing the consumer group; it is just shutting down all of its members. So yeah, it just feels like the offsets are not being persisted.
from broadway_kafka.
@josevalim So I cloned BroadwayKafka, added logs, and did some testing, and I think I found the main issue. In my pipelines I was defining an ack callback and doing some work once a message was acked. It seems that since I was defining my own, the BroadwayKafka acknowledgers were not being called, and therefore the offsets were not being committed.
I have got it working with my ack callback commented out. But now I am missing the logic that I had been running at the end of each acked message. Is there no way to define my own as well with this library?
from broadway_kafka.
@amacciola When you set your own, you can store the lib one and call it. However, I would suggest to simply call ack_immediately and then execute your ack logic, without changing the message's ack fields.
In any case, since this is not a lib bug, I will close this. Thanks for the follow up!
from broadway_kafka.
> However, I would suggest to simply call ack_immediately and then execute your ack logic, without changing the message's ack fields.
I am not quite sure what you mean by this. Could you elaborate, please?
from broadway_kafka.
Instead of changing the ack fields, you can use Broadway.Message.ack_immediately and then do whatever you want right after calling ack_immediately. Basically, Broadway has ways for you to force an ack to happen at a certain moment, precisely so you don't have to mess with the ack fields.
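A minimal sketch of that suggestion, assuming a pipeline module that uses Broadway with BroadwayKafka.Producer. The module name MyApp.Pipeline and the helper after_ack/1 are hypothetical; after_ack/1 stands in for whatever logic previously ran in the custom ack callback, and start_link/1 is left out.

```elixir
defmodule MyApp.Pipeline do
  # Hypothetical module name; start_link/1 with the BroadwayKafka.Producer
  # configuration is omitted in this sketch.
  use Broadway

  alias Broadway.Message

  @impl true
  def handle_message(_processor, %Message{} = message, _context) do
    # ... normal processing of message.data would go here ...

    # Ack now, through the acknowledger the producer already set on the
    # message. The message's ack fields are never replaced by hand.
    message = Message.ack_immediately(message)

    # Hypothetical helper: the work that used to live in the custom ack callback.
    after_ack(message)

    message
  end

  # Placeholder for the post-ack logic; here it only prints the payload.
  defp after_ack(%Message{data: data}) do
    IO.inspect(data, label: "acked")
  end
end
```

Because the producer's own acknowledger is the one being invoked, offset commits should keep working as configured. The trade-off is that the message is acknowledged at that point in handle_message, so anything that fails afterwards happens on an already-acked message.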
from broadway_kafka.