Comments (22)
Keep in mind that Broadway draining on shutdown pretty much sets GenStage.demand/2 to halt it, so it should be enough to achieve the desired behaviour. The issue with moving this to Broadway is that the behaviour can be drastically different between producers. For example, in RabbitMQ we would need to ask the server to stop sending data and then ask it to resume. That's not necessary in any of the others.
from broadway_kafka.
Try using :sys.suspend and :sys.resume on the producer names.
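To illustrate what suspending a process with :sys does, here is a minimal sketch that uses a plain Agent as a stand-in for a producer process (an assumption for illustration; it is not a Broadway producer). It shows that a suspended process queues ordinary messages and stops answering synchronous calls until it is resumed, which is the concern raised about the rest of the flow depending on the producer responding.

```elixir
# A plain Agent stands in for a producer process (illustrative assumption).
{:ok, agent} = Agent.start_link(fn -> 0 end)

# Suspend the process: it keeps handling system messages only.
:ok = :sys.suspend(agent)

# Asynchronous messages are queued in the mailbox, not processed.
:ok = Agent.cast(agent, fn n -> n + 1 end)

# A synchronous call gets no reply while suspended, so it times out.
reply_while_suspended =
  try do
    Agent.get(agent, & &1, 200)
  catch
    :exit, _reason -> :timeout
  end

# After resuming, the queued cast is processed before the next call.
:ok = :sys.resume(agent)
value_after_resume = Agent.get(agent, & &1)

IO.inspect(reply_while_suspended, label: "while suspended")
IO.inspect(value_after_resume, label: "after resume")
```

Because the process stops responding entirely, anything that calls into it while it is suspended will block or time out.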
from broadway_kafka.
@josevalim if I understand correctly, this wouldn't drain current messages, would it? I'm not sure if the rest of the flow depends on the producer process responding.
from broadway_kafka.
You are correct. I also just realized that the better option is to call GenStage.demand/2 on the producers. That will at least drain any demand already requested.
from broadway_kafka.
Personally, I'm all in for this kind of feature to be available upstream on Broadway.
from broadway_kafka.
@josevalim so the best option to "halt + drain" the Broadway Kafka pipeline is to use the GenStage.demand/2 call? How would I resume after halting with this call?
from broadway_kafka.
I think so but please try it out.
from broadway_kafka.
@josevalim just to clarify, you mean using
demand(stage(), :forward | :accumulate) :: :ok
and setting the mode to :accumulate, so it builds up the demand but does not forward it to the producers. Then, when wanting to resume, just change the mode back to :forward?
from broadway_kafka.
Yes.
from broadway_kafka.
@josevalim it worked great! I just did something simple like this, for reference:

    # Suspend: producers accumulate new demand instead of handling it.
    Broadway.producer_names(:pipeline_name)
    |> Enum.each(fn producer ->
      GenStage.demand(producer, :accumulate)
    end)

    # Resume: producers forward demand (including what accumulated) again.
    Broadway.producer_names(:pipeline_name)
    |> Enum.each(fn producer ->
      GenStage.demand(producer, :forward)
    end)
from broadway_kafka.
Fantastic!
from broadway_kafka.
@josevalim I have run into an interesting scenario that stems from the changes I made in this thread.
The scenario: a pipeline is started and the Kafka topic exists but does not yet have any data. Shortly after the pipeline is started, the user suspends it. On the backend this just changes the demand mode to :accumulate.
After it has been suspended, data is pushed to the Kafka topic. The pipeline then ingests X amount of data from the Kafka topic even though the GenStage demand mode is set to :accumulate, because when it initially started it passed X demand to the consumers. Since there were no messages to process, that demand is just waiting to be fulfilled.
My question is: what is the best way to handle this scenario? I do not want data being ingested after a user suspends a pipeline; it will confuse them as to why this is happening.
Should the GenStage demand mode start as :accumulate by default and only switch to :forward once it detects it has data to ingest?
Or should I handle this in my solution: when I suspend a pipeline, detect when it is done processing and then try to flush any remaining demand from the consumers?
from broadway_kafka.
@amacciola I may be missing something here, but note that any demand requested before accumulating will still be served. See this example:
    # Usage: mix run examples/producer_consumer.exs
    #
    # Hit Ctrl+C twice to stop it.
    #
    # This is a base example where a producer A emits items,
    # which are amplified by a producer consumer B and printed
    # by consumer C.
    defmodule A do
      use GenStage

      def init(counter) do
        {:producer, counter}
      end

      def handle_demand(demand, counter) when demand > 0 do
        Process.send_after(self(), {:produce, demand}, 2000)
        {:noreply, [], counter}
      end

      def handle_info({:produce, demand}, counter) do
        # If the counter is 3 and we ask for 2 items, we will
        # emit the items 3 and 4, and set the state to 5.
        events = Enum.to_list(counter..counter+demand-1)
        {:noreply, events, counter + demand}
      end
    end

    defmodule C do
      use GenStage

      def init(:ok) do
        {:consumer, :the_state_does_not_matter}
      end

      def handle_events(events, _from, state) do
        # Inspect the events.
        IO.inspect(events)
        # We are a consumer, so we would never emit items.
        {:noreply, [], state}
      end
    end

    {:ok, a} = GenStage.start_link(A, 0) # starting from zero
    {:ok, c} = GenStage.start_link(C, :ok) # state does not matter
    GenStage.sync_subscribe(c, to: a)
    GenStage.demand(a, :accumulate)
    Process.sleep(:infinity)
from broadway_kafka.
@josevalim maybe I am also misunderstanding. But this is how I thought it worked:
- We have 1 Producer -> many Consumers.
- When the GenStage pipeline is started (in this case the Broadway Kafka pipeline), the demand is forwarded from the Producer to the many Consumers.
- Since the Consumers cannot fulfill the demand, because there is nothing in the Kafka topic, the demand for each Consumer reaches its max and it does not request any more from the Producer.
- Then I change the Producer demand mode to :accumulate, so it does not send any more demand to the Consumers if they ask for more.
- But the initial demand that was forwarded to each of the Consumers still exists. So when data is pushed to Kafka, each Consumer fetches enough to satisfy its demand but goes no further, since the Producer will not give it any more in its current state.
from broadway_kafka.
I understand your scenario better now. You are correct: if you suspend it and never resume it, then those messages will be there unless you also drain it.
If you want to suspend and never resume, why not terminate it?
In any case, we can allow setting the demand directly in this project. We only need to return it from the producer init callback. But if you know you will immediately start it as accumulated and never resume it, why start it in the first place?
from broadway_kafka.
@josevalim I do want to resume it. A user may resume it whenever they choose.
However, when they suspended it the ingested count was 0. So it's very confusing to have the pipeline in a suspended state and then see the count increase (by the initial demand).
"But if you know you will immediately start it as accumulated and never resume it, why start it in the first place?"
We would change the state as long as there is data to meet the demand.
I'm trying to figure out whether this is a problem that needs to be solved in the library or on my end, for example whether I need to flush the demand each time I suspend the pipeline and detect that no data is there to meet it.
However, I don't know if there is a way for me to flush the demand from the consumers.
This is very much an edge case for me, but one I need to cover.
from broadway_kafka.
Suspending a pipeline takes time and it is only concluded when the pipeline drains all of its contents. So UI wise you should show as suspending until everything you eventually requested is consumed.
from broadway_kafka.
@josevalim suspending a pipeline is pretty fast in my experience. By suspending I mean I am just changing the GenStage demand mode to :accumulate and waiting until all the current data in the consumers finishes processing.
We show the user a loading/progress bar in the UI while this is happening.
The only issue here is that no data was ever ingested from Kafka in this scenario, so nothing is actually processing in the Consumers and the user is not actually waiting for anything to finish.
from broadway_kafka.
You already asked for the data upstream though. It cannot be suspended until that is cancelled or consumed somehow.
from broadway_kafka.
@josevalim well, that is what I'm asking, I guess: whether I should be able to start the pipeline with the option of being in :accumulate or :forward demand mode. This way, if a Kafka topic has no data to ingest, it should not send any data to the consumers.
I do feel like this is a problem I need to solve rather than one for the library to implement. However, do you know how I would flush the demand of the consumers? Or is that just not possible?
from broadway_kafka.
Starting in accumulate should be doable. You need to return demand: :accumulate in the producer init tuple, and we can support it as an option in this library.
The one with the consumers is more complicated because it needs coordination throughout the pipeline.
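As a rough sketch of what starting in accumulate mode looks like at the GenStage level (not in broadway_kafka itself, where the option would still need to be added), the producer below returns demand: :accumulate from init/1. The module names PausedProducer and Collector are made up for illustration, and the script assumes it is run standalone with Mix.install available (Elixir 1.12+).

```elixir
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule PausedProducer do
  use GenStage

  def init(counter) do
    # With demand: :accumulate, incoming demand is stored and
    # handle_demand/2 is not invoked until the mode is :forward.
    {:producer, counter, demand: :accumulate}
  end

  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Collector do
  use GenStage

  # The state is the pid that should be notified about each batch.
  def init(notify_pid), do: {:consumer, notify_pid}

  def handle_events(events, _from, notify_pid) do
    send(notify_pid, {:events, events})
    {:noreply, [], notify_pid}
  end
end

{:ok, producer} = GenStage.start_link(PausedProducer, 0)
{:ok, consumer} = GenStage.start_link(Collector, self())
{:ok, _tag} = GenStage.sync_subscribe(consumer, to: producer, max_demand: 10, min_demand: 5)

# The consumer has already asked for events, but nothing is produced yet.
received_while_paused =
  receive do
    {:events, events} -> events
  after
    500 -> :nothing
  end

# Switching to :forward releases the accumulated demand.
GenStage.demand(producer, :forward)

received_after_resume =
  receive do
    {:events, events} -> events
  after
    1_000 -> :nothing
  end

IO.inspect(received_while_paused, label: "while accumulating")
IO.inspect(received_after_resume, label: "after forward")
```

Because the producer never sees the initial demand while accumulating, no request is outstanding upstream until the mode is switched, which is the property the edge case above needs.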
from broadway_kafka.
@josevalim I will test starting it in accumulate mode in the init and only changing it to forward if it meets my params.
Edit:
I just re-read this and now understand that support would have to be added for me to pass it in the init for the Broadway producer.
from broadway_kafka.