dashbitco / flow
Computational parallel flows on top of GenStage
Home Page: https://hexdocs.pm/flow
G’day!
I trust this’ll be a simple pointer to the documentation I somehow missed rather than needing to reverse-engineer the Flow.Window protocol:
How do I carve a stream of events into windows of maximum size x but with arrival time separated by no more than y? I’d like to send many at a time, but fewer than ten if the oldest has been waiting longer than I’d like.
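A rough sketch of the batching rule being asked about, written in plain Elixir rather than against Flow.Window (no Flow API is assumed here). The module name BatchSketch, the {timestamp_ms, payload} event shape, and the parameters max_size and max_wait_ms are all hypothetical. It only works offline on an already timestamped, time-ordered list; a live pipeline would also need a timer so that a stale batch is flushed even when no further event arrives.

```elixir
defmodule BatchSketch do
  # Hypothetical helper: splits time-ordered {timestamp_ms, payload} events
  # into batches. A batch closes when it holds max_size events, or when the
  # next event arrives more than max_wait_ms after the batch's oldest event.
  def batches(events, max_size, max_wait_ms) when max_size > 0 do
    {done, current} =
      Enum.reduce(events, {[], []}, fn {ts, _payload} = event, {done, current} ->
        # `current` is kept newest-first, so its last element is the oldest.
        {done, current} =
          case List.last(current) do
            {oldest_ts, _} when ts - oldest_ts > max_wait_ms ->
              {[Enum.reverse(current) | done], []}

            _ ->
              {done, current}
          end

        current = [event | current]

        # Close the batch as soon as it is full.
        if length(current) >= max_size do
          {[Enum.reverse(current) | done], []}
        else
          {done, current}
        end
      end)

    # Flush whatever is still open at the end of the input.
    done = if current == [], do: done, else: [Enum.reverse(current) | done]
    Enum.reverse(done)
  end
end

events = [{0, :a}, {10, :b}, {20, :c}, {500, :d}, {510, :e}]
IO.inspect(BatchSketch.batches(events, 2, 100))
```

With max_size 2 and max_wait_ms 100, the first two events fill a batch, the third is flushed alone because the fourth arrives too late, and the last two fill another batch.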
The following is a super simplified version of some code that deals with external APIs.
My problem is that, as you can see, the function inside the stream raises an error. But sometimes the inspects at the bottom are reached before the process is killed.
defmodule Test do
  def run do
    result =
      (fn (page) ->
        IO.inspect page, label: "page"
        raise "ups"
      end)
      |> create_stream
      |> Flow.from_enumerable()
      |> Flow.map(fn(_value) -> %{} end)
      |> Enum.to_list
    IO.inspect "It should not reach here, but sometimes does"
    IO.inspect result, label: "result"
  end
  def create_stream(api_func) do
    Stream.resource(fn -> 1 end, &api_func.(&1), fn _ -> :ok end)
  end
end
Test.run
When executing, sometimes I get:
$ mix run test.exs
page: 1
"It should not reach here, but sometimes does"
result: []
[error] GenServer #PID<0.502.0> terminating
** (RuntimeError) ups
test.exs:6: anonymous fn/1 in Test.run/0
(elixir) lib/stream.ex:1285: Stream.do_resource/5
(gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
(gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
(gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
(stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:686: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:"$demand", :forward}}
State: #Function<0.55142349/1 in GenStage.Streamer.init/1>
If any of the work done by Flow raised, I would expect the whole thing to fail in the Enum.to_list call, not afterwards.
Is there anything I can do to stop the processing when running Enum.to_list? Shouldn't the process die when calling Enum.to_list?
I originally posted this on ElixirForum, but I'm more and more convinced that it's a bug somewhere.
I'm not sure if this is a bug or a misunderstanding on my part. I've narrowed it down to this simple example which just prints out the demand received by the source of a Flow:
defmodule FlowBug do
  use GenStage
  def start_link(name) do
    GenStage.start_link(__MODULE__, name)
  end
  def init(name) do
    IO.puts("#{name} init")
    {:producer, name}
  end
  def handle_demand(demand, name) do
    IO.puts("#{name} DEMAND: #{demand}")
    {:noreply, [], name}
  end
end
If I use Flow.run(), the demand is 1, as I would expect:
iex(1)> {:ok, pid_flow} = FlowBug.start_link("Flow.run")
Flow.run init
{:ok, #PID<0.162.0>}
iex(2)> pid_flow |> Flow.from_stage(stages: 1, max_demand: 1) |> Flow.run
Flow.run DEMAND: 1
However, if I use an Enum function of any sort (I often do this from iex when I am trying to debug things), max_demand does not seem to be respected:
iex(1)> {:ok, pid_enum} = FlowBug.start_link("Enum.to_list")
Enum.to_list init
{:ok, #PID<0.150.0>}
iex(2)> pid_enum |> Flow.from_stage(max_demand: 1) |> Enum.to_list
Enum.to_list DEMAND: 1000
I am using Flow for IO-bound things, like calling out to other services during web scraping, etc. I need the back-pressure features more than the parallelism for this. In debugging, I often will do this:
flow |> Stream.take(n) |> Enum.to_list
Just to get things working... I think this used to work fine, but now it seems different (or never worked).
Looking through the code I found a theoretical issue or at least a potential source of confusion.
Let's look at an example:
enums_with_urls
|> Flow.from_enumerables(stages: 4, max_demand: 1)
|> Flow.map(&fetch_url/1)
|> Flow.run()
After looking at this code one would expect to have 4 processes that do the job of fetching urls from a particular API. That might be important because of throttling or just to prevent flooding a service.
However, if the input list contains more than 4 enumerables, Flow will hand the job of the mapper operations to the generated GenStage.Streamer processes. Let's imagine that the input is generated from a directory listing where we have 100 files, each mapped to a Stream with File.stream!/3. Now we have 100 processes each fetching pages concurrently from some service that is suddenly getting more heat than expected. :)
I think we should either highlight this case in the documentation of Flow.from_enumerables/2 or remove this clause from the case expression.
I have a data set like this: [["1", "2", "3"], ["2", "2", "2"], ["1", "1", "1"], ["3", "3", "3"]]. After partitioning with Flow.partition(key: fn e -> Enum.at(e, 1) end), I suppose ["1", "2", "3"] and ["2", "2", "2"] should be routed to the same stage, but they are not. Here is the example:
a = [["1", "2", "3"], ["2", "2", "2"], ["1", "1", "1"], ["3", "3", "3"]]
list = Flow.from_enumerable(a)
|> Flow.partition(key: fn e -> Enum.at(e, 1) end)
|> Flow.reduce(fn -> %{} end, fn(arr, map) ->
  new_map = case map do
    %{} ->
      Map.put(map, "earlier", arr)
    earlier ->
      old_number = Map.get(earlier, "earlier")
      |> Enum.at(1)
      new_number = Enum.at(arr, 1)
      IO.inspect(old_number == new_number)
      Map.put(earlier, "later", arr)
  end
  IO.inspect(new_map)
  new_map
end)
|> Enum.to_list()
In fact, the data is routed to four stages, each element lands in a distinct stage, and, strangely, the final list is [{"earlier", ["3", "3", "3"]}, {"earlier", ["1", "1", "1"]}].
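One thing that may contribute to the result above, independent of how Flow routes events: in an Elixir pattern, %{} matches any map, not only the empty one, so the first case clause in the reducer always wins and the "later" branch is never reached. A minimal sketch in plain Elixir (the module and function names are hypothetical and not part of Flow):

```elixir
defmodule EmptyMapSketch do
  # `%{}` as a pattern matches ANY map, so this always returns :empty for maps.
  def classify_naive(map) do
    case map do
      %{} -> :empty
      _other -> :non_empty
    end
  end

  # A guard on map_size/1 is one way to single out the empty map.
  def classify(map) when map_size(map) == 0, do: :empty
  def classify(map) when is_map(map), do: :non_empty
end

IO.inspect(EmptyMapSketch.classify_naive(%{"earlier" => ["1", "2", "3"]}))
IO.inspect(EmptyMapSketch.classify(%{"earlier" => ["1", "2", "3"]}))
```

The first call prints :empty even though the map has an entry; the guarded version prints :non_empty.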
Hey all,
On my team, we have been using Flow and GenStage for the past 9 months or more. We use it to process streams of data from a number of different sources, and to process a lot of data throughout the day. Ideally, we don't shut down. Currently the system has OOM crashes and slows down because of backed-up message queues.
We consistently have an issue where a GenStage process slows its processing and begins to accumulate memory. The GenStages that we have written ourselves are all named, but the GenStages generated by our use of Flow are nameless. We have to guess where the problems are, and I think that is a problem.
I would like to be able to provide a name prefix, have each "stage" of the Flow (map, flat_map, filter, etc.) append another segment, and then have each GenStage within the "stage" append a further suffix, just enough to avoid collisions, such as _#{i}, like the partitions in Registry.
An example:
input_stage
|> Flow.from_stage(window: initial_window)
|> Flow.flat_map(&processing_function/1, name: __MODULE__.Flow.ProcessingFunction)
|> Flow.partition(window: post_processing_window)
|> Flow.filter(&filtering_function/1, name: __MODULE__.Flow.FilteringFunction)
|> Flow.into_stages([output_stage], name: __MODULE__.Flow)
This will generate GenStage processes like:
__MODULE__.Flow.ProcessingFunction.FlatMap._0
__MODULE__.Flow.ProcessingFunction.FlatMap._1
__MODULE__.Flow.ProcessingFunction.FlatMap._2
__MODULE__.Flow.ProcessingFunction.FlatMap._3
__MODULE__.Flow.FilteringFunction.Filter._0
__MODULE__.Flow.FilteringFunction.Filter._1
__MODULE__.Flow.FilteringFunction.Filter._2
__MODULE__.Flow.FilteringFunction.Filter._3
A feature like this, one that allows us to reliably determine the line(s) of code from where a process originated, would take a lot of the guesswork out of debugging and performance tuning for my team, and would let us use tools like :observer and WombatOAM more effectively.
If this is of interest to the maintainers, I would be happy to implement this feature, with your supervision & advice.
Hey all!
I have been seeing an error that I think comes from Flow.from_specs. It assumes that the result of the start_link anonymous function is {:ok, pid()}, and when it is {:error, term()}, the output of the system is not easy to debug.
I can work on a PR for this, but I wanted to post it here first in case someone gets to it before me.
There is a discussion about how a failure should be handled, and I am personally of the opinion that it should fail on startup, but I understand that there might be different opinions on this.
There's some weird behaviour when dealing with a producer that has too few elements to deal with:
** (exit) exited in: GenStage.close_stream(%{#Reference<0.212111494.3424387074.139905> => {:subscribed, #PID<0.672.0>, :transient, 500, 1000, 1000}, #Reference<0.212111494.3424387074.139907> => {:subscribed, #PID<0.674.0>, :transient, 500, 1000, 1000}, #Reference<0.212111494.3424387074.139908> => {:subscribed, #PID<0.675.0>, :transient, 500, 1000, 1000}})
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
code: assert :ok = MyFlow.run()
stacktrace:
(gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
(elixir) lib/stream.ex:1370: Stream.do_resource/5
(elixir) lib/enum.ex:2979: Enum.reverse/1
(elixir) lib/enum.ex:2611: Enum.to_list/1
(flow) lib/flow.ex:1012: Flow.run/1
my_flow_test.exs:46: (test)
Here is a gist that has minimal code for this bug: https://gist.github.com/Fenntasy/1e930da0f7b6c2055a660831d4406a96
You can toggle between lines 30 and 31 to provoke the error or to not have it occur.
When the list has only two elements, the producer will send its :terminate event immediately and the previous error will show up. Displaying something on the console (via IO.inspect) would also be enough to not have the error (presumably because it adds a little bit of time before the producer closes).
Am I doing something wrong or is it really a race condition?
In https://github.com/elixir-lang/flow/blob/master/lib/flow.ex#L256 there is a broken link to GenStage.async_notify/2. Has this behavior changed?
The documentation in lib/flow/window.ex#L31 lists Session windows as a supported window type, but they were removed in 0.14:
This release also deprecates Flow.Window.session/3 as developers can trivially roll their own with more customization power and flexibility using emit_and_reduce/3 and on_trigger/2.
Should we remove the reference to session windows being supported, or add documentation on how to implement a session window using emit_and_reduce/3 and on_trigger/2?
I'll include an example here, in case it is helpful to anyone:
iex> data = [
...> {"elixir", 1_000},
...> {"erlang", 60_000},
...> {"elixir", 3_200_000},
...> {"erlang", 4_000_000},
...> {"elixir", 4_100_000},
...> {"erlang", 6_000_000}
...> ]
iex> flow = Flow.from_enumerable(data) |> Flow.partition(key: fn {k, _} -> k end, stages: 2)
iex> flow =
...> Flow.emit_and_reduce(flow, fn -> %{} end, fn {word, time}, acc ->
...> {count, prev_time} = Map.get(acc, word, {1, time})
...>
...> if time - prev_time > 1_000_000 do
...> {[{word, {count, prev_time}}], Map.put(acc, word, {1, time})}
...> else
...> {[], Map.update(acc, word, {1, time}, fn {count, _} -> {count + 1, time} end)}
...> end
...> end)
iex> flow = Flow.on_trigger(flow, fn acc -> {Enum.to_list(acc), :unused} end)
iex> Enum.to_list(flow)
[
{"erlang", {1, 60000}},
{"erlang", {2, 6000000}},
{"elixir", {1, 1000}},
{"elixir", {2, 4000000}}
]
Hi, I'm not sure if this is the right way to go about this, but I'd like to propose a scan operator similar to Stream.scan. Something like:
def scan(flow, initial, combine, opts \\ []) do
  scan_window = Flow.Window.global |> Flow.Window.trigger_every(1, :keep)
  flow
  |> Flow.partition(Keyword.put(opts, :window, scan_window))
  |> Flow.reduce(initial, combine)
  |> Flow.emit(:state)
end
Now things like realtime counts are viable:
Flow.from_stage(MyTextInputStage)
|> Flow.flat_map(&String.split(&1, " "))
|> Flow.scan(fn -> %{} end, fn word, acc ->
Map.update(acc, word, 1, & &1 + 1)
end)
|> Flow.each(fn {word, count} -> Dashboards.update_word_count(word, count) end)
|> Flow.run
If you guys think it would be useful, I'd be more than happy to PR it with tests etc.
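For comparison, here is what the Stream.scan/3 behaviour being borrowed looks like on a plain list: every input element emits the accumulator as it stands after that element. This is only a sketch of the semantics in standard Elixir, not of the proposed Flow.scan; the sample words are made up.

```elixir
words = ["a", "b", "a"]

# Stream.scan/3 emits the running accumulator after each element.
running =
  words
  |> Stream.scan(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
  |> Enum.to_list()

IO.inspect(running)
# [%{"a" => 1}, %{"a" => 1, "b" => 1}, %{"a" => 2, "b" => 1}]
```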
https://elixirforum.com/t/flow-into-from-genstage/14262
In the above forum thread we discuss the desire to bring externally written, demand aware components to Flows. We like the Flow orchestration, but the current from_stage, into_stages functionality doesn't quite work.
So, we wanted to have a way to run GenStage.sync_subscribe or GenStage.async_subscribe and subscribe to an into_stages producer_consumer pid that was already running, which should work, right, as per the docs?
Here's some sample code that returns an error:
defmodule C do
  use GenStage
  def start_link() do
    GenStage.start_link(C, :ok)
  end
  def init(:ok) do
    {:consumer, :the_state_does_not_matter}
  end
  def handle_events(events, _from, state) do
    # Wait for a second.
    :timer.sleep(1000)
    # Inspect the events.
    IO.inspect(events)
    # We are a consumer, so we would never emit items.
    {:noreply, [], state}
  end
end
{:ok, flowpid} = Flow.from_enumerable(1..20) |> Flow.filter(& rem(2, &1) == 0) |> Flow.into_stages([])
{:ok, c} = C.start_link()
GenStage.async_subscribe(c, to: flowpid)
** (EXIT from #PID<0.429.0>) an exception was raised:
** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2
(flow) lib/flow/coordinator.ex:69: Flow.Coordinator.handle_info({:"$gen_producer", {#PID<0.457.0>, #Reference<0.0.1.213>}, {:subscribe, nil, []}}, %{intermediary: [{#PID<0.462.0>, []}, {#PID<0.463.0>, []}, {#PID<0.464.0>, []}, {#PID<0.465.0>, []}, {#PID<0.466.0>, []}, {#PID<0.467.0>, []}, {#PID<0.468.0>, []}, {#PID<0.469.0>, []}], parent_ref: #Reference<0.0.1.200>, producers: [#PID<0.461.0>], refs: [#Reference<0.0.1.192>, #Reference<0.0.1.193>, #Reference<0.0.1.194>, #Reference<0.0.1.195>, #Reference<0.0.1.196>, #Reference<0.0.1.197>, #Reference<0.0.1.198>, #Reference<0.0.1.199>], supervisor: #PID<0.460.0>})
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:667: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
I came up with a workable fix, but I wanted to see a) if it's even worthwhile and b) how the coordinator matching/checking can/should be updated to allow for sync or async subscriptions, as for any initial consumers, everything is a sync_subscribe as per https://github.com/elixir-lang/flow/blob/master/lib/flow/coordinator.ex#L54. Maybe we always want it sync no matter what? As of now, it seems difficult to allow for async here, unless I'm missing something. Additionally, using a handle_info for this may just be a bad hack. So, I hope for a good solution :).
The hack:
diff --git a/lib/flow/coordinator.ex b/lib/flow/coordinator.ex
index ece4416..0da06bd 100644
--- a/lib/flow/coordinator.ex
+++ b/lib/flow/coordinator.ex
@@ -66,6 +66,13 @@ defmodule Flow.Coordinator do
{:noreply, state}
end
+ def handle_info({:"$gen_producer", {consumer, _}, _}, %{intermediary: intermediary} = state) do
+ for {pid, _} <- intermediary do
+ subscribe(consumer, pid)
+ end
+ {:noreply, state}
+ end
+
def handle_info({:DOWN, ref, _, _, reason}, %{parent_ref: ref} = state) do
{:stop, reason, state}
end
Thanks.
# test.exs
Flow.from_enumerable(1..1000)
|> Flow.map_batch(fn batch -> [Enum.sum(batch)] end)
|> Flow.map(&IO.puts("Sum: #{&1}"))
|> Flow.run()
** (exit) exited in: Enumerable.Flow.reduce(%Flow{operations: [{:on_trigger, #Function<20.17204979/3 in Flow.inject_on_trigger/4>}, {:mapper, :map, [#Function<1.27327935 in file:test.exs>]}, {:batch, #Function<0.27327935 in file:test.exs>}], options: [stages: 32], producers: {:enumerables, [1..1000]}, window: %Flow.Window.Global{periodically: [], trigger: nil}}, {:cont, []}, #Function<146.29191728/2 in Enum.reverse/1>)
** (EXIT) an exception was raised:
** (CaseClauseError) no case clause matching: {[], [{:batch, #Function<0.27327935 in file:test.exs>}, {:mapper, :map, [#Function<1.27327935 in file:test.exs>]}, {:on_trigger, #Function<20.17204979/3 in Flow.inject_on_trigger/4>}]}
(flow 1.0.0) lib/flow/materialize.ex:649: Flow.Materialize.build_trigger/1
(flow 1.0.0) lib/flow/materialize.ex:611: Flow.Materialize.reducer_ops/1
(flow 1.0.0) lib/flow/materialize.ex:45: Flow.Materialize.split_operations/1
(flow 1.0.0) lib/flow/materialize.ex:17: Flow.Materialize.materialize/5
(flow 1.0.0) lib/flow/coordinator.ex:34: Flow.Coordinator.init/1
(stdlib 3.12.1) gen_server.erl:374: :gen_server.init_it/2
(stdlib 3.12.1) gen_server.erl:342: :gen_server.init_it/6
(stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
(flow 1.0.0) lib/flow.ex:1995: Enumerable.Flow.reduce/3
(elixir 1.10.3) lib/enum.ex:3383: Enum.reverse/1
(elixir 1.10.3) lib/enum.ex:2982: Enum.to_list/1
(flow 1.0.0) lib/flow.ex:992: Flow.run/1
(elixir 1.10.3) lib/code.ex:926: Code.require_file/2
(mix 1.10.3) lib/mix/tasks/run.ex:145: Mix.Tasks.Run.run/5
(mix 1.10.3) lib/mix/tasks/run.ex:85: Mix.Tasks.Run.run/1
(mix 1.10.3) lib/mix/task.ex:330: Mix.Task.run_task/3
(mix 1.10.3) lib/mix/cli.ex:82: Mix.CLI.run_task/2
(elixir 1.10.3) lib/code.ex:926: Code.require_file/2
It is already possible to route a flow into a Collectable, e.g.
Flow.from_enumerable([1, 2, 3]) |> Enum.into([])
This works, but forces the collecting to happen in the context of the process creating the Flow, rather than as a separate GenStage consumer process, and therefore "hogs" the Flow-spawning process's inbox, keeping it from being used for other purposes.
This can be sensible if the Flow-spawning process is then going to use the collected data: it won't attempt to do anything else until the Enum.into/2 completes anyway, and once it proceeds, it will need everything that was delivered to it to reside in its own process's heap. But if the Collectable exists solely to cause side effects upon insertion rather than as a value object that will carry around its inserted values, this blocking behavior can be suboptimal, since the (potentially long-lived) parent process will end up full of garbage (and will block as it GC-sweeps) from the messages that were delivered from the GenStage.stream to the Collectable.
For example, Ecto's Ecto.Adapters.SQL.Stream struct supports the Collectable behavior, allowing code like this:
db_stream = Ecto.Adapters.SQL.stream(MyRepo, "COPY foo FROM STDIN WITH (FORMAT csv, HEADER false)")
MyRepo.transaction fn ->
Enum.into(csv_flow, db_stream)
end
Here, the process executing the Ecto transaction will receive (and linearize!) all the data produced from csv_flow, only to pass it off again to db_stream, where the data will turn around and travel back out to a DBConnection process.
Add a function, Flow.into_collectable(flow, collectable), which would be a terminal, demand-driving call for the Flow (like Enum.into/2 is).
into_collectable/2 would pass each GenStage process in the current partition a copy of the collectable. (For correct concurrency semantics, it may be advisable for collectable to actually be collectable_or_fn, where the user could supply a fun that is called by each GenStage process in the partition and that returns a concurrency-isolated instance of the collectable.)
Each GenStage process, upon receiving the collectable from into_collectable/2, would immediately call Collectable.into/1 on it to get a reducer, and then would hold onto said reducer in its state.
Each GenStage process would then, in its handle_events/3, apply the reducer to the received events.
Optionally, one could also add a function Flow.through_collectable(flow, collectable), which would work similarly, but would be non-terminal. The partition would simply be extended with a step that passes events into the reducer, but then, having done so and having acquired the modified reducer, would simply pass those same events unmodified to the next step in the partition (along with storing the modified reducer in its state).
Flow.through_collectable/2 would be perfect for use-cases like that of Ecto.Adapters.SQL.stream/2, where the goal is simply to cause the side-effect of storing the structs being processed into a database (i.e. "durable-izing" them) without necessarily wanting to end the processing of the structs there, and without necessarily having any need to linearize the durabilization process.
As well, both Flow.into_collectable/2 and Flow.through_collectable/2 would potentially get people to make a lot more of their libraries implement Collectable! The Collectable behavior is much simpler to implement than the GenStage consumer behavior; if implementing Collectable on a struct automatically gave a developer effectively all the advantages of a GenStage consumer, with only the time investment of writing the Collectable reducer, developers would likely be more interested in making their structs Collectable.
It seems like in the test there is use of a sliding window already.
Would be really cool to have an implementation of that in flow directly.
Flow.Window.sliding(count: 2, overlap: 1)
I'd be happy to implement it if there is need.
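To make the requested semantics concrete, here is a small sketch of a count-based sliding window on a plain enumerable using Enum.chunk_every/4. It is not a Flow.Window implementation; the module name SlidingSketch and the count/overlap parameter names (taken from the proposed call above) are assumptions, and the step between windows is taken to be count - overlap.

```elixir
defmodule SlidingSketch do
  # Hypothetical helper: windows of `count` elements, where consecutive
  # windows share `overlap` elements. Incomplete trailing windows are dropped.
  def sliding(enumerable, count, overlap) when overlap >= 0 and overlap < count do
    Enum.chunk_every(enumerable, count, count - overlap, :discard)
  end
end

IO.inspect(SlidingSketch.sliding(1..5, 2, 1))
# [[1, 2], [2, 3], [3, 4], [4, 5]]
```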
After upgrading to 0.14, a previously working setup of Flow map steps -> reduce, emit(:state) -> further map steps no longer seems to work.
1..50
|> Flow.from_enumerable()
|> Flow.map(& &1 * 2)
|> Flow.partition(stages: 1, window: Flow.Window.count(5))
|> Flow.reduce(fn -> 0 end, &+/2)
|> Flow.emit(:state)
|> Flow.filter(& &1 > 200)
|> Enum.to_list
[230, 280, 330, 380, 430, 480]
** (exit) exited in: Enumerable.Flow.reduce(%Flow{operations: [{:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}, {:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:reduce, #Function<20.127694169/0 in :erl_eval.expr/5>, &:erlang.+/2}], options: [stages: 1], producers: {:flows, [%Flow{operations: [{:mapper, :map, [#Function<6.127694169/1 in :erl_eval.expr/5>]}], options: [stages: 8], producers: {:enumerables, [1..50]}, window: %Flow.Window.Global{periodically: [], trigger: nil}}]}, window: %Flow.Window.Count{count: 5, periodically: [], trigger: nil}}, {:cont, []}, #Function<131.83463370/2 in Enum.reverse/1>)
** (EXIT) an exception was raised:
** (CaseClauseError) no case clause matching: {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}
(flow) lib/flow/materialize.ex:600: Flow.Materialize.build_trigger/1
(flow) lib/flow/materialize.ex:559: Flow.Materialize.reducer_ops/1
(flow) lib/flow/materialize.ex:52: Flow.Materialize.split_operations/3
(flow) lib/flow/materialize.ex:17: Flow.Materialize.materialize/4
(flow) lib/flow/coordinator.ex:25: Flow.Coordinator.init/1
(stdlib) gen_server.erl:374: :gen_server.init_it/2
(stdlib) gen_server.erl:342: :gen_server.init_it/6
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
(flow) lib/flow.ex:1653: Enumerable.Flow.reduce/3
(elixir) lib/enum.ex:1911: Enum.reverse/1
(elixir) lib/enum.ex:2588: Enum.to_list/1
iex(8)> 19:13:46.115 [error] GenServer #PID<0.585.0> terminating
** (CaseClauseError) no case clause matching: {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}
(flow) lib/flow/materialize.ex:600: Flow.Materialize.build_trigger/1
(flow) lib/flow/materialize.ex:559: Flow.Materialize.reducer_ops/1
(flow) lib/flow/materialize.ex:52: Flow.Materialize.split_operations/3
(flow) lib/flow/materialize.ex:17: Flow.Materialize.materialize/4
(flow) lib/flow/coordinator.ex:25: Flow.Coordinator.init/1
(stdlib) gen_server.erl:374: :gen_server.init_it/2
(stdlib) gen_server.erl:342: :gen_server.init_it/6
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.584.0>, {{:case_clause, {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}}, [{Flow.Materialize, :build_trigger, 1, [file: 'lib/flow/materialize.ex', line: 600]}, {Flow.Materialize, :reducer_ops, 1, [file: 'lib/flow/materialize.ex', line: 559]}, {Flow.Materialize, :split_operations, 3, [file: 'lib/flow/materialize.ex', line: 52]}, {Flow.Materialize, :materialize, 4, [file: 'lib/flow/materialize.ex', line: 17]}, {Flow.Coordinator, :init, 1, [file: 'lib/flow/coordinator.ex', line: 25]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 374]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 342]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}}
This example is limited for the purpose of demonstrating the issue. I understand that in this case a simple Enum.filter/2 would do the job. The idea is that after reduce it should be possible to carry on with the flow and run further map and reduce steps.
The following code works (test passes) but an error is emitted to log.
defmodule FlowtestTest do
  use ExUnit.Case
  defmodule TestProducer do
    use GenStage
    # stage emits the string "one" continuously.
    def start_link(), do: GenStage.start_link(__MODULE__, :ok)
    def init(:ok), do: {:producer, "one"}
    def handle_demand(demand, state) do
      supply = fn -> state end
      |> Stream.repeatedly
      |> Enum.take(demand)
      {:noreply, supply, state}
    end
  end
  defmodule TestProsumer do
    use GenStage
    # stage takes strings and turns them into atoms.
    def start_link(), do: GenStage.start_link(__MODULE__, :ok)
    def init(:ok), do: {:producer_consumer, :ok}
    def handle_events(elist, _from, state) do
      {:noreply, Enum.map(elist, &String.to_atom/1), state}
    end
  end
  test "test" do
    {:ok, producer} = TestProducer.start_link()
    {:ok, prosumer} = TestProsumer.start_link()
    GenStage.sync_subscribe(prosumer, to: producer, partition: 1)
    assert [:one] == Flow.from_stages([prosumer])
    |> Enum.take(1)
  end
  test "test2" do
    {:ok, producer} = TestProducer.start_link()
    {:ok, prosumer} = TestProsumer.start_link()
    GenStage.sync_subscribe(prosumer, to: producer, partition: 1)
    assert [:one] == Flow.from_stage(prosumer)
    |> Enum.take(1)
  end
end
These errors are emitted (omitting the pretty green success dots):
13:49:19.513 [error] Demand mode can only be set for producers, GenStage #PID<0.190.0> is a producer_consumer
13:49:19.516 [error] Demand mode can only be set for producers, GenStage #PID<0.196.0> is a producer_consumer
The documentation says:
"producers are already running stages that have type :producer or :producer_consumer"
so the error shouldn't be emitted, correct? I can submit a PR for this if my supposition is true.
12:19:12.137 [error] Error in process #PID<0.1821.0> on node :"[email protected]" with exit value:
{:undef,
[{Flow, :from_enumerable,
[%File.Stream{line_or_bytes: :line, modes: [:raw, :read_ahead, :binary],
path: "/root/subway/nginx_log/ccb-access.log-20170226", raw: true}], []},
{Honey.Nginx, :read, 3, [file: 'lib/honey/nginx.ex', line: 37]}]}
That's what I got from the log. It works perfectly when running with mix.
The part of code:
def read(:file, file_path \\ @file_path, options \\ @default_options) do
enum = File.stream!(file_path)
options = [file_name: file_path] ++ options
read(:enum, enum, options)
end
def read(:enum, enum, options) do
file_name = options[:file_name]
ets_table_name = file_name |> String.to_atom
case Honey.ETSServer.create(ets_table_name) do
{:new, ets_table} ->
Logger.debug "new ets table: #{inspect ets_table}"
enum
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1, "\n"))
|> Flow.partition()
|> Flow.partition(window: Flow.Window.count(1000), stages: 4)
|> Flow.reduce(fn -> [] end, fn line, acc ->
Are there any examples of a supervised Flow?
I have a producer stage and I want to set up a Flow with a Window to pass events to consumers after processing. I realize that I can probably create a child in the supervision tree, something like worker(Flow, [Producer], [function: Flow.from_stage]), but how (and where) do I add logic to that flow?
UPD: Hm... Just tried to launch empty flow and "something like" didn't work
It would be useful to run the Flow test suite against multiple versions of Elixir. A CI setup with Travis could help with that.
Right now it seems Flow and GenStage progress in lockstep, which is fine until they both reach 1.0. After that it would also be useful to test Flow against multiple versions of GenStage.
The following code:
[1, 2, 3]
|> Flow.from_enumerable(stages: 1)
|> Flow.flat_map(&[&1])
|> Flow.emit_and_reduce(fn -> %{} end, fn event, acc ->
IO.inspect acc, label: "ACC"
{[event], Map.update(acc, event, 1, & &1 + 1)}
end)
|> Enum.to_list()
gives the output:
ACC: %{}
ACC: %{1 => 1}
ACC: %{1 => 1, 2 => 1}
[1, 2, 3, {1, 1}, {2, 1}, {3, 1}]
Change the line with Flow.flat_map/2 to be:
|> Flow.flat_map(&[&1, &1])
and now the shape of the accumulator in Flow.emit_and_reduce/3 gets corrupted:
ACC: %{}
ACC: {[1], %{1 => 1}}
15:55:30.022 [error] GenServer #PID<0.170.0> terminating
** (BadMapError) expected a map, got: {[1], %{1 => 1}}
(elixir) lib/map.ex:595: Map.update({[1], %{1 => 1}}, 1, 1, #Function<12.115662033/1 in TestFlow.emit_and_reduce/0>)
reduce.exs:47: anonymous fn/2 in TestFlow.emit_and_reduce/0
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
(flow) lib/flow/materialize.ex:623: anonymous fn/3 in Flow.Materialize.build_emit_and_reducer/2
(flow) lib/flow/materialize.ex:630: Flow.Materialize."-build_emit_and_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:630: anonymous fn/5 in Flow.Materialize.build_emit_and_reducer/2
(flow) lib/flow/map_reducer.ex:54: Flow.MapReducer.handle_events/3
(gen_stage) lib/gen_stage.ex:2315: GenStage.consumer_dispatch/6
Last message: {:"$gen_consumer", {#PID<0.169.0>, #Reference<0.210650528.1048579.149223>}, [1, 2, 3]}
State: {%{#Reference<0.210650528.1048579.149223> => nil}, %{done?: false, producers: %{#Reference<0.210650528.1048579.149223> => #PID<0.169.0>}, trigger: #Function<2.13930487/3 in Flow.Window.Global.materialize/5>}, {0, 1}, %{}, #Function<2.60253262/4 in Flow.Materialize.build_emit_and_reducer/2>}
** (exit) exited in: GenStage.close_stream(%{})
** (EXIT) an exception was raised:
** (BadMapError) expected a map, got: {[1], %{1 => 1}}
(elixir) lib/map.ex:595: Map.update({[1], %{1 => 1}}, 1, 1, #Function<12.115662033/1 in TestFlow.emit_and_reduce/0>)
reduce.exs:47: anonymous fn/2 in TestFlow.emit_and_reduce/0
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
(flow) lib/flow/materialize.ex:623: anonymous fn/3 in Flow.Materialize.build_emit_and_reducer/2
(flow) lib/flow/materialize.ex:630: Flow.Materialize."-build_emit_and_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:630: anonymous fn/5 in Flow.Materialize.build_emit_and_reducer/2
(flow) lib/flow/map_reducer.ex:54: Flow.MapReducer.handle_events/3
(gen_stage) lib/gen_stage.ex:2315: GenStage.consumer_dispatch/6
(gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
(elixir) lib/stream.ex:1370: Stream.do_resource/5
(elixir) lib/enum.ex:2979: Enum.reverse/1
(elixir) lib/enum.ex:2611: Enum.to_list/1
reduce.exs:57: (file)
result = 1..10_000 |> Flow.from_enumerable |> Flow.map(fn x -> "abc" |> String.duplicate(x) end) |> Enum.map(fn x -> "xyz" <> x end) |> List.flatten; length result
Hi all,
I was following this section of the Flow documentation regarding partition: https://hexdocs.pm/flow/Flow.html#module-partitioning
If I run this code which doesn't have the partition step:
defmodule Test do
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()
    stream
    |> IO.binstream(:line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end
I should receive something like:
[{"roses", 1}, {"are", 1}, {"red", 1}, {"violets", 1}, {"are", 1}, {"blue", 1}]
But instead I see this:
[{"are", 2}, {"blue\n", 1}, {"red\n", 1}, {"roses", 1}, {"violets", 1}]
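The "red\n" and "blue\n" keys in the output above come from splitting each line on a single space, which leaves the trailing newline attached to the last word. A minimal sketch in plain Elixir (no Flow involved; the sample lines are copied from the snippet above) shows that String.split/1, which splits on any whitespace and drops empty strings, avoids that. Because the reducer accumulates into a map, repeated words within one accumulator are merged, so a count of 2 for "are" is what a single map should hold; my understanding is that without a partition step each Flow stage keeps its own map, so on larger inputs the same word may show up more than once in the final list.

```elixir
lines = ["roses are red\n", "violets are blue\n"]

counts =
  lines
  # String.split/1 splits on whitespace, so the trailing "\n" is dropped.
  |> Enum.flat_map(&String.split/1)
  |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)

IO.inspect(counts)
# %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
```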
Hello!
I have a simple flow set up here: https://github.com/ninjanicely/flow_fun which demonstrates my issue.
The "problem" I'm seeing is a lack of parallelism when using Flow.from_specs |> Flow.through_specs. Here's some example code:
defmodule MyFlow do
  def start_link(_opts) do
    Flow.from_specs([{A, 1000}], max_demand: 1)
    # |> Flow.map(&B.factorial/1)
    |> Flow.through_specs([{{B, []}, []}])
    |> Flow.into_specs([{C, []}], [])
  end
end
Brief background: B has a factorial function in it, which I use to tax the CPU. The handle_events/3 function in B calls the factorial function.
If I use Flow.through_specs/3, I see roughly 100% CPU usage on my machine (where I would expect to see 400%). When I comment out the Flow.through_specs/3 line and replace it with the Flow.map/2 call, I see all CPUs being used.
I was hoping that I could force the parallelization in the Flow.through_specs/3 call, but that doesn't seem to be the case. Am I missing something? Should I see parallelism when configuring a Flow like this?
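One thing worth checking, stated as an assumption rather than a confirmed answer: the snippet above passes a single {{B, []}, []} entry to Flow.through_specs/3, and if each entry in that list starts exactly one stage, then only one B process ever runs. The sketch below only builds a longer spec list in the same shape. SpecList and replicate/4 are hypothetical helper names, and the code does not call Flow itself.

```elixir
defmodule SpecList do
  # Builds `count` identical entries shaped like the one used above:
  # {{module, args}, subscription_opts}
  def replicate(module, args, count, subscription_opts \\ []) do
    for _ <- 1..count, do: {{module, args}, subscription_opts}
  end
end

# One entry per scheduler (usually one per CPU core).
specs = SpecList.replicate(B, [], System.schedulers_online())
IO.inspect(length(specs), label: "number of B specs")
```

The resulting list could then be passed in place of [{{B, []}, []}] to see whether CPU usage goes up.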
Thanks!
Related to: elixir-lang/gen_stage#132
I have a malformed CSV file:
this,is,malformed,"csv,data
and even if I do:
try do
  file_path
  |> File.stream!()
  |> NimbleCSV.RFC4180.parse_stream()
  |> Flow.from_enumerable()
  |> Flow.partition()
  |> Enum.to_list()
catch
  :exit, _reason -> nil
end
I'm still getting:
08:02:07.440 [error] GenServer #PID<0.184.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
(nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
(elixir) lib/stream.ex:800: Stream.do_transform/8
(gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
(gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
(gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
(stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:686: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:"$demand", :forward}}
State: #Function<0.55142349/1 in GenStage.Streamer.init/1>
08:02:07.447 [info] GenStage consumer #PID<0.185.0> is stopping after receiving cancel from producer #PID<0.184.0> with reason: {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"},
[{NimbleCSV.RFC4180, :finalize_parser, 1,
[file: 'lib/nimble_csv.ex', line: 207]},
{Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]},
{GenStage.Streamer, :handle_demand, 2,
[file: 'lib/gen_stage/streamer.ex', line: 18]},
{GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]},
{GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3,
[file: 'lib/gen_stage.ex', line: 2209]},
{:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]},
{:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]},
{:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}
08:02:07.449 [error] GenServer #PID<0.185.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
(nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
(elixir) lib/stream.ex:800: Stream.do_transform/8
(gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
(gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
(gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
(stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:686: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:DOWN, #Reference<0.3607181968.2794192898.201176>, :process, #PID<0.184.0>, {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"}, [{NimbleCSV.RFC4180, :finalize_parser, 1, [file: 'lib/nimble_csv.ex', line: 207]}, {Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]}, {GenStage.Streamer, :handle_demand, 2, [file: 'lib/gen_stage/streamer.ex', line: 18]}, {GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]}, {GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3, [file: 'lib/gen_stage.ex', line: 2209]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.79412627/4 in Flow.Window.Global.materialize/5>}, {0, 4}, [], #Function<33.66250525/4 in Flow.Materialize.mapper_ops/1>}
Is there a way to handle that?
I am trying to supervise my Flow, and restart it if something goes wrong. In my supervisor's init/1 I have something like this:
children = [
worker(MyStreamer, [], restart: :transient)
]
MyStreamer has a start_link function that just creates the Flow and then starts it:
def start_link(_) do
flow = make_flow()
Flow.start_link(flow)
end
The problem is that my supervisor is not restarting my flow when a runtime exception occurs, because the exit messages have :normal reasons, like this:
{:EXIT, #PID<0.195.0>, :normal}
I was able to reproduce it by running this example in iex which raises a runtime error but has a normal exit reason:
Process.flag(:trap_exit, true)
flow = [3,2,1,0,-1,-2,-3] |> Flow.from_enumerable() |> Flow.map(&(10/&1))
Flow.start_link(flow)
flush()
I am expecting to see a non-normal exit reason like when doing this in iex:
Process.flag(:trap_exit, true)
spawn_link(fn -> 10/0 end)
flush()
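For context on why the :normal reason matters: with restart: :transient, a Supervisor restarts a child only when it exits with a reason other than :normal, :shutdown, or {:shutdown, term}. The sketch below shows this with plain processes and no Flow. TransientDemo is a hypothetical stand-in for MyStreamer, and the 500 ms wait is an arbitrary choice.

```elixir
defmodule TransientDemo do
  # Hypothetical stand-in for MyStreamer: a bare process that reports its pid
  # and exits with `reason` when told to stop.
  def start_link(reason, report_to) do
    pid =
      spawn_link(fn ->
        send(report_to, {:started, self()})

        receive do
          :stop -> exit(reason)
        end
      end)

    {:ok, pid}
  end

  def child_spec({reason, report_to}) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [reason, report_to]},
      restart: :transient
    }
  end

  # Returns true if the supervisor started a replacement child after the
  # first one exited with `reason`.
  def restarted?(reason) do
    {:ok, sup} =
      Supervisor.start_link([{__MODULE__, {reason, self()}}], strategy: :one_for_one)

    first =
      receive do
        {:started, pid} -> pid
      end

    send(first, :stop)

    restarted =
      receive do
        {:started, _new_pid} -> true
      after
        500 -> false
      end

    Supervisor.stop(sup)
    restarted
  end
end

IO.inspect(TransientDemo.restarted?(:normal), label: "restarted after :normal")
IO.inspect(TransientDemo.restarted?(:boom), label: "restarted after :boom")
```

A child that exits with :normal is left alone, which matches what I see with the flow. Only the abnormal exit triggers a restart.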
This behaviour is probably intentional, but sometimes, when using Enumerable, the Flow can keep running after the process that consumes the stream it generates is gone. In the context where I initially encountered this, we're running HTTP requests inside a Flow and might want to abort the group mid-run (because of a failure or a need to stop hitting the server we're pointed at). Here's a reduced example that demonstrates a Flow outliving the process that starts it (you may need to run it two or three times to witness it):
def example() do
  p =
    spawn(fn ->
      1..100
      |> Flow.from_enumerable()
      |> Flow.map(fn n ->
        :timer.sleep(1000)
        IO.inspect(n)
        n
      end)
      |> Enum.map(fn n ->
        n
      end)
    end)

  :timer.sleep(5000)
  Process.exit(p, :kill)
end
To work around this, I've made a module that cribs from your implementation of Enumerable. It lets us reduce over the results while still being able to cancel the Flow's processing:
defmodule FlowUtils do
  # Like Enum.reduce/3 over the flow's output, but the flow is started linked
  # to the caller and is killed once the reduction finishes.
  @spec linked_reduce(Flow.t(), term, (term, term -> term)) :: term()
  def linked_reduce(flow, acc, fun) do
    {:ok, {pid, stream}} = flow_to_stream(flow)
    result = Enum.reduce(stream, acc, fun)
    :ok = ensure_shutdown(pid)
    result
  end

  # Like Enum.reduce_while/3, so `fun` must return {:cont, acc} or {:halt, acc}.
  @spec linked_reduce_while(Flow.t(), term, (term, term -> {:cont, term} | {:halt, term})) ::
          term()
  def linked_reduce_while(flow, acc, fun) do
    {:ok, {pid, stream}} = flow_to_stream(flow)
    result = Enum.reduce_while(stream, acc, fun)
    :ok = ensure_shutdown(pid)
    result
  end

  # Starts the flow's coordinator linked to the caller and returns its stream.
  defp flow_to_stream(flow) do
    opts = [demand: :accumulate]

    case Flow.Coordinator.start_link(flow, :producer_consumer, {:outer, fn _ -> [] end}, opts) do
      {:ok, pid} ->
        {:ok, {pid, Flow.Coordinator.stream(pid)}}

      {:error, reason} ->
        exit({reason, {__MODULE__, :flow_to_stream, [flow]}})
    end
  end

  defp ensure_shutdown(flow_pid) do
    Process.exit(flow_pid, :kill)
    :ok
  end
end
First, is there a more typical way to get this kind of assurance, that I can terminate a running Flow externally when enumerating over the emitted results? Second, is there a chance to add something like this to make cancellable Flows more user-friendly?
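The workaround above relies on start_link, so here is a standard-library-only sketch of the link semantics it depends on. A process spawned with spawn/1 survives when its owner is killed, while one spawned with spawn_link/1 receives the exit signal and dies with it (unless it traps exits). LinkDemo and child_survives?/1 are hypothetical names, the 200 ms wait is arbitrary, and nothing here makes claims about Flow internals.

```elixir
defmodule LinkDemo do
  # `spawn_fun` is either &spawn/1 or &spawn_link/1.
  # Returns true if the child is still alive shortly after its owner is killed.
  def child_survives?(spawn_fun) do
    parent = self()

    owner =
      spawn(fn ->
        child = spawn_fun.(fn -> Process.sleep(:infinity) end)
        send(parent, {:child, child})
        Process.sleep(:infinity)
      end)

    child =
      receive do
        {:child, pid} -> pid
      end

    ref = Process.monitor(child)
    Process.exit(owner, :kill)

    receive do
      {:DOWN, ^ref, :process, ^child, _reason} -> false
    after
      200 ->
        # Clean up the orphaned child so the demo leaves nothing behind.
        Process.demonitor(ref, [:flush])
        Process.exit(child, :kill)
        true
    end
  end
end

IO.inspect(LinkDemo.child_survives?(&spawn/1), label: "unlinked child survives")
IO.inspect(LinkDemo.child_survives?(&spawn_link/1), label: "linked child survives")
```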
When there's an exception in Flow, the parent process dies. This is not good in cases like Oban where it leads to zombie jobs. I note that this happens in IEx as well (IEx restarts). Minimal example:
defmodule WillDie do
def dies(x) do
raise "OH NO #{inspect(x)}"
end
end
1..10 |> Flow.from_enumerable() |> Flow.map(&WillDie.dies/1) |> Enum.to_list()
I think there must be an obvious answer here to trap the exit and not crash the parent, but I don't know what it is.
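One general OTP pattern that might apply here, offered as a sketch and not as Flow's recommended approach: run the enumeration in a task that is not linked to the caller, and turn its crash into a return value. Isolated and run/2 are hypothetical names. The example uses a plain function in place of the flow, and the crashed task may still log an error.

```elixir
defmodule Isolated do
  # Runs `fun` in an unlinked task and returns {:ok, value} or {:error, reason}
  # without taking the calling process down when `fun` raises or exits.
  def run(fun, timeout \\ 5_000) do
    {:ok, sup} = Task.Supervisor.start_link()
    task = Task.Supervisor.async_nolink(sup, fun)

    result =
      case Task.yield(task, timeout) || Task.shutdown(task) do
        {:ok, value} -> {:ok, value}
        {:exit, reason} -> {:error, reason}
        nil -> {:error, :timeout}
      end

    Supervisor.stop(sup)
    result
  end
end

IO.inspect(Isolated.run(fn -> Enum.map(1..3, &(&1 * 2)) end))
# prints {:ok, [2, 4, 6]}

{:error, {exception, _stacktrace}} = Isolated.run(fn -> raise "OH NO" end)
IO.inspect(Exception.message(exception))
# prints "OH NO"
```

In the example above, the function passed to Isolated.run/2 would be the one that builds the flow and calls Enum.to_list/1 on it.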
When using a Flow as the enumerable that supplies events for a GenStage producer, the producer process can receive consumer messages, which causes it to log errors. Interestingly, most of the messages do get correctly routed to the consumer, but there appears to be some issue with the producer also receiving them. It looks like some sort of race condition, as removing the Process.sleep makes it work fine on my machine. Additionally, the error only seems to happen at the beginning; the events run fine afterwards.
Here is some example code that can recreate the issue:
defmodule FlowTest do
@moduledoc """
Documentation for FlowTest.
"""
def run do
{:ok, producer} = Producer.start_link()
{:ok, consumer} = Consumer.start_link()
GenStage.sync_subscribe(consumer, to: producer, max_demand: 200, min_demand: 100)
end
end
defmodule Producer do
use GenStage
#########################
# Public API
#########################
def start_link() do
GenStage.start_link(__MODULE__, :ok)
end
#########################
# GenStage Callbacks
#########################
def init(_) do
{:producer, %{cont: build_flow()}}
end
def handle_demand(demand, %{cont: cont} = state) do
case cont.({:cont, {[], demand}}) do
{:suspended, {list, 0}, cont} ->
{:noreply, :lists.reverse(list), %{state | cont: cont}}
{_finished, {list, _}} ->
IO.puts "Done!"
{:noreply, :lists.reverse(list), %{}}
end
end
#########################
# Private Helper
#########################
defp build_flow() do
flow =
0..1_000_000
|> Flow.from_enumerable(consumers: :permanent)
|> Flow.map(fn val ->
# Simulate a computation
Process.sleep(10)
val + 1
end)
# Largely borrowed from the GenStage.Streamer implementation
&Enumerable.reduce(flow, &1, fn
x, {acc, 1} ->
{:suspend, {[x | acc], 0}}
x, {acc, demand} ->
{:cont, {[x | acc], demand - 1}}
end)
end
end
defmodule Consumer do
use GenStage
def start_link() do
GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
end
def init(_) do
{:consumer, :no_state}
end
def handle_events(events, _from, state) do
IO.puts "Got #{length events} events"
{:noreply, [], state}
end
end
Here are some of the error messages that come through:
13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.165.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.93>}},
[1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013,
1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026,
1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039,
1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, ...]}
13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.166.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.94>}},
[2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026,
2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039,
2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047, 2048, ...]}
13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.167.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.95>}},
[3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009, 3010, 3011, 3012, 3013,
3014, 3015, 3016, 3017, 3018, 3019, 3020, 3021, 3022, 3023, 3024, 3025, 3026,
3027, 3028, 3029, 3030, 3031, 3032, 3033, 3034, 3035, 3036, 3037, 3038, 3039,
3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, ...]}
13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.168.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.96>}},
[4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008, 4009, 4010, 4011, 4012, 4013,
4014, 4015, 4016, 4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026,
4027, 4028, 4029, 4030, 4031, 4032, 4033, 4034, 4035, 4036, 4037, 4038, 4039,
4040, 4041, 4042, 4043, 4044, 4045, 4046, 4047, 4048, ...]}
13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.169.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.97>}},
[5001, 5002, 5003, 5004, 5005, 5006, 5007, 5008, 5009, 5010, 5011, 5012, 5013,
5014, 5015, 5016, 5017, 5018, 5019, 5020, 5021, 5022, 5023, 5024, 5025, 5026,
5027, 5028, 5029, 5030, 5031, 5032, 5033, 5034, 5035, 5036, 5037, 5038, 5039,
5040, 5041, 5042, 5043, 5044, 5045, 5046, 5047, 5048, ...]}
13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.170.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.98>}},
[6001, 6002, 6003, 6004, 6005, 6006, 6007, 6008, 6009, 6010, 6011, 6012, 6013,
6014, 6015, 6016, 6017, 6018, 6019, 6020, 6021, 6022, 6023, 6024, 6025, 6026,
6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039,
6040, 6041, 6042, 6043, 6044, 6045, 6046, 6047, 6048, ...]}
13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
{#PID<0.171.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.99>}},
[7001, 7002, 7003, 7004, 7005, 7006, 7007, 7008, 7009, 7010, 7011, 7012, 7013,
7014, 7015, 7016, 7017, 7018, 7019, 7020, 7021, 7022, 7023, 7024, 7025, 7026,
7027, 7028, 7029, 7030, 7031, 7032, 7033, 7034, 7035, 7036, 7037, 7038, 7039,
7040, 7041, 7042, 7043, 7044, 7045, 7046, 7047, 7048, ...]}
I'm playing with Flow and I'm trying to make my code raise an exception to test the behaviour of my parent GenServer.
To do so, I call Enum.random on an empty list. I can see several exceptions of type Enum.EmptyError being raised.
But I can also see that a final exception is raised with this message:
** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2
Here is code to reproduce this bug:
defmodule FlowBug do
def run do
1..1_000
|> Flow.from_enumerable()
|> Flow.partition()
|> Flow.reduce(fn -> [] end, &([&1 | &2]))
|> Flow.emit(:state)
|> Flow.partition()
|> Flow.map(&pick_one/1)
|> Flow.run
end
def pick_one(_) do
Enum.random([]) # Force exception here
end
end
Here is a sample of the output:
** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.8.905> => {:subscribed, #PID<0.175.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.906> => {:subscribed, #PID<0.176.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.908> => {:subscribed, #PID<0.178.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.909> => {:subscribed, #PID<0.179.0>, :permanent, 500, 1000, 1000}})
** (EXIT) an exception was raised:
** (Enum.EmptyError) empty error
(elixir) lib/enum.ex:1684: Enum.random/1
(flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
(flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
(flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
lib/gen_stage.ex:1598: GenStage.close_stream/1
(elixir) lib/stream.ex:1248: Stream.do_resource/5
(elixir) lib/enum.ex:1767: Enum.reverse/2
(elixir) lib/enum.ex:2528: Enum.to_list/1
(flow) lib/flow.ex:693: Flow.run/1
12:34:01.672 [error] GenServer #PID<0.181.0> terminating
** (Enum.EmptyError) empty error
(elixir) lib/enum.ex:1684: Enum.random/1
(flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
(flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
(flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.171.0>, #Reference<0.0.7.574>}, [[991, 972, 937, 926, 916, 898, 896, 856, 852, 850, 838, 814, 807, 795, 789, 781, 780, 750, 718, 701, 699, 697, 694, 688, 682, 679, 658, 649, 645, 628, 612, 610, 603, 595, 590, 581, 563, 549, 548, 545, 544, 538, 532, 524, 511, 494, 493, 485, ...]]}
State: {%{#Reference<0.0.7.570> => nil, #Reference<0.0.7.571> => nil, #Reference<0.0.7.572> => nil, #Reference<0.0.7.573> => nil, #Reference<0.0.7.574> => nil, #Reference<0.0.7.575> => nil, #Reference<0.0.7.576> => nil, #Reference<0.0.7.577> => nil}, %{active: [#Reference<0.0.7.577>, #Reference<0.0.7.576>, #Reference<0.0.7.575>, #Reference<0.0.7.574>, #Reference<0.0.7.570>], consumers: [{#Reference<0.0.8.903>, #Reference<0.0.8.911>}], done?: false, producers: %{#Reference<0.0.7.570> => #PID<0.167.0>, #Reference<0.0.7.571> => #PID<0.168.0>, #Reference<0.0.7.572> => #PID<0.169.0>, #Reference<0.0.7.573> => #PID<0.170.0>, #Reference<0.0.7.574> => #PID<0.171.0>, #Reference<0.0.7.575> => #PID<0.172.0>, #Reference<0.0.7.576> => #PID<0.173.0>, #Reference<0.0.7.577> => #PID<0.174.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.109138938/4 in Flow.Materialize.build_reducer/2>}
# Several other errors like the last one ....
# And finally this error :
12:34:01.675 [error] GenServer #PID<0.164.0> terminating
** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2
(flow) lib/flow/coordinator.ex:69: Flow.Coordinator.handle_info({:EXIT, #PID<0.165.0>, :shutdown}, %{intermediary: [{#PID<0.175.0>, []}, {#PID<0.176.0>, []}, {#PID<0.177.0>, []}, {#PID<0.178.0>, []}, {#PID<0.179.0>, []}, {#PID<0.180.0>, []}, {#PID<0.181.0>, []}, {#PID<0.182.0>, []}], parent_ref: #Reference<0.0.7.596>, producers: [#PID<0.166.0>], refs: [#Reference<0.0.7.588>], supervisor: #PID<0.165.0>})
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:667: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.165.0>, :shutdown}
State: %{intermediary: [{#PID<0.175.0>, []}, {#PID<0.176.0>, []}, {#PID<0.177.0>, []}, {#PID<0.178.0>, []}, {#PID<0.179.0>, []}, {#PID<0.180.0>, []}, {#PID<0.181.0>, []}, {#PID<0.182.0>, []}], parent_ref: #Reference<0.0.7.596>, producers: [#PID<0.166.0>], refs: [#Reference<0.0.7.588>], supervisor: #PID<0.165.0>}
I also tested with simpler code, but it does not produce the FunctionClauseError exception.
Here is the simpler code:
defmodule FlowBug do
def run do
[1..1_000, []] # Enum.random will raise an exception on the second item of the list
|> Flow.from_enumerable()
|> Flow.partition()
|> Flow.map(&Enum.random/1)
|> Flow.run
end
end
Here is the log:
** (exit) exited in: GenStage.close_stream(%{})
** (EXIT) an exception was raised:
** (Enum.EmptyError) empty error
(elixir) lib/enum.ex:1684: Enum.random/1
(flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
(flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
(flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
lib/gen_stage.ex:1598: GenStage.close_stream/1
(elixir) lib/stream.ex:1248: Stream.do_resource/5
(elixir) lib/enum.ex:1767: Enum.reverse/2
(elixir) lib/enum.ex:2528: Enum.to_list/1
(flow) lib/flow.ex:693: Flow.run/1
12:39:32.397 [error] GenServer #PID<0.173.0> terminating
** (Enum.EmptyError) empty error
(elixir) lib/enum.ex:1684: Enum.random/1
(flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
(flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
(flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
(flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.166.0>, #Reference<0.0.3.452>}, [[]]}
State: {%{#Reference<0.0.3.452> => nil}, %{active: [#Reference<0.0.3.452>], consumers: [{#Reference<0.0.3.468>, #Reference<0.0.3.476>}], done?: false, producers: %{#Reference<0.0.3.452> => #PID<0.166.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.109138938/4 in Flow.Materialize.build_reducer/2>}
In investigating Dialyzer errors, I found that after #80 and #81 the last remaining Dialyzer error stems from flow.ex, in the call
opts = [demand: :accumulate]
case Flow.Coordinator.start(flow, :producer_consumer, {:outer, fn _ -> [] end}, opts) do
which calls GenServer.start(__MODULE__, {flow, type, consumers, options}, options). Because :demand is not a valid option for https://hexdocs.pm/elixir/GenServer.html#t:options/0, this makes Dialyzer upset. The call in Flow.Coordinator.start_link has the same problem. The easiest way to make Dialyzer happy is simply to do something like:
filtered_options = Keyword.take(options, [:debug, :name, :timeout, :spawn_opt, :hibernate_after])
GenServer.start(__MODULE__, {flow, type, consumers, options}, filtered_options)
Is this something you'd be open to?
Once this, #80, and #81 are all resolved, Dialyzer can be added to the CI process.
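To illustrate the filtering step proposed above, here is a small standard-library sketch of what Keyword.take/2 does to an options list that mixes the :demand key with GenServer options. OptionFilter and the sample option values are made up for illustration.

```elixir
defmodule OptionFilter do
  # The GenServer option keys listed in the proposal above.
  @genserver_keys [:debug, :name, :timeout, :spawn_opt, :hibernate_after]

  # Keeps only the keys GenServer.start/3 accepts and drops everything else.
  def genserver_options(options), do: Keyword.take(options, @genserver_keys)
end

options = [demand: :accumulate, name: MyFlow, timeout: 5_000]
IO.inspect(OptionFilter.genserver_options(options))
# prints [name: MyFlow, timeout: 5000]
```

The full options list would still be passed to init/1 inside the init argument tuple, so :demand is not lost.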
Hello, and sorry for the vague PR title.
I have the following flow, which results in the error below.
Flow:
window = Flow.Window.global() |> Flow.Window.trigger_every(5)
0..99
|> Flow.from_enumerable(window: window)
|> Flow.map(fn number -> Process.sleep(100); IO.inspect(number, label: :number) end)
|> Flow.group_by(fn i -> rem(i, 2) == 0 end)
|> Flow.map(fn batch -> Process.sleep(100); IO.inspect(batch, label: :batch) end)
|> Flow.run()
Output:
number: 0
number: 1
number: 2
number: 3
number: 4
batch: {false, [3, 1]}
batch: {true, [4, 2, 0]}
number: 5
10:22:25.684 [error] GenServer #PID<0.193.0> terminating
** (BadMapError) expected a map, got: [false: [3, 1], true: [4, 2, 0]]
(elixir 1.12.3) lib/map.ex:623: Map.update([false: [3, 1], true: [4, 2, 0]], false, [5], #Function<36.94148943/1 in Flow.group_by/3>)
(flow 1.1.0) lib/flow/materialize.ex:643: Flow.Materialize."-build_reducer/2-lists^foldl/2-0-"/3
(flow 1.1.0) lib/flow/materialize.ex:643: anonymous fn/5 in Flow.Materialize.build_reducer/2
(flow 1.1.0) lib/flow/materialize.ex:553: Flow.Materialize.maybe_punctuate/10
(flow 1.1.0) lib/flow/map_reducer.ex:59: Flow.MapReducer.handle_events/3
(gen_stage 1.1.2) lib/gen_stage.ex:2471: GenStage.consumer_dispatch/6
(gen_stage 1.1.2) lib/gen_stage.ex:2660: GenStage.take_pc_events/3
(stdlib 3.13.2) gen_server.erl:680: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.192.0>, #Reference<0.1202714464.28835843.142742>}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, ...]}
State: {%{#Reference<0.1202714464.28835843.142742> => nil}, %{done?: false, producers: %{#Reference<0.1202714464.28835843.142742> => #PID<0.192.0>}, trigger: #Function<2.2490666/3 in Flow.Window.Global.materialize/5>}, {0, 12}, {5, %{}}, #Function<1.2490666/4 in Flow.Window.Global.materialize/5>}
If I change the last line of my flow from Flow.run() to Enum.to_list(), then everything works fine, but this is not an option for me as I am dealing with an unbounded stream in my real code. I took a quick look at lib/flow/materialize.ex:643, but it wasn't clear to me why a Map was needed, or how changing Flow.run() to Enum.to_list() would prevent this from happening.
The version of flow is 1.1.0 and gen_stage is 1.1.2.
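To make the error message above easier to read, here is a standard-library sketch of the failing call's shape. The list [false: [3, 1], true: [4, 2, 0]] in the log is exactly what Map.to_list/1 returns for the grouped map, and Map.update/4 raises BadMapError when it is handed that list instead of the map. This only reproduces the symptom and does not show where Flow converts the accumulator. AccShape is a hypothetical name.

```elixir
defmodule AccShape do
  # Same update shape as the one in the stack trace: prepend `value` under `key`.
  def add(acc, key, value) do
    Map.update(acc, key, [value], &[value | &1])
  end

  # Wraps add/3 so the BadMapError becomes a return value.
  def try_add(acc, key, value) do
    {:ok, add(acc, key, value)}
  rescue
    e in BadMapError -> {:error, e.term}
  end
end

acc = %{false => [3, 1], true => [4, 2, 0]}

IO.inspect(Map.to_list(acc))
# prints [false: [3, 1], true: [4, 2, 0]]

# Works while the accumulator is still a map.
IO.inspect(AccShape.try_add(acc, false, 5))

# Fails once the accumulator has been turned into a list.
IO.inspect(AccShape.try_add(Map.to_list(acc), false, 5))
```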
In investigating the Dialyzer errors (see also #80), I found that flow.ex operates on a join of both :outer and :full_outer. The docs, the usage in Flow.Materialize, and the tests suggest it should be :full_outer, but there are uses of :outer in the @join module attribute and the @type join, as well as in several function calls. Which of these is actually intended?
Additionally, the typespec for @typep producers needs to be | {:join, join, t, t, fun(), fun(), fun()} with the join as the second element. Once the :full_outer vs. :outer question is resolved and this is added, that section of the errors will disappear.
When I write this code:
1..20
|> Flow.from_enumerable(min_demand: 0, max_demand: 5, stages: 1)
|> Flow.map(fn item ->
IO.puts "Flow.map #{item}"
item
end)
|> Enum.each(fn item ->
Process.sleep(10)
IO.puts "Enum.each #{item}"
end)
I expect it to print
Flow.map 1
Flow.map 2
Flow.map 3
Flow.map 4
Flow.map 5
Enum.each 1
Flow.map 6
Enum.each 2
and so on.
But when I run it, it prints this:
Flow.map 1
Flow.map 2
Flow.map 3
Flow.map 4
Flow.map 5
Flow.map 6
Flow.map 7
Flow.map 8
Flow.map 9
Flow.map 10
Flow.map 11
Flow.map 12
Flow.map 13
Flow.map 14
Flow.map 15
Flow.map 16
Flow.map 17
Flow.map 18
Flow.map 19
Flow.map 20
Enum.each 1
Enum.each 2
Enum.each 3
Enum.each 4
Enum.each 5
Enum.each 6
Enum.each 7
Enum.each 8
Enum.each 9
Enum.each 10
Enum.each 11
Enum.each 12
Enum.each 13
Enum.each 14
Enum.each 15
Enum.each 16
Enum.each 17
Enum.each 18
Enum.each 19
Enum.each 20
I think this means that the first call to Flow.map makes the consumer immediately output everything it has to offer, and doesn't properly back-pressure the demand that's slowed down by Enum.each.
In production, we are running code similar to this, except that 1..20 is a database query as a GenStage producer, and Enum.each is |> Stream.chunk_every(250) |> Enum.each(&Repo.insert_all(Schema, &1)).
What we're seeing is that the query stage immediately loads all entries for the query as fast as it can, and then the node crashes because memory is exhausted.
Hello,
I need to connect my flow to three different downstream flows. However, I'm unable to instantiate the downstream flows because all Flow constructors require knowledge of the upstream source (which must be either materialized, as with Flow.from_stages, or written as a GenStage module, as with Flow.from_specs).
However, if there were a way to construct a non-materialized flow (e.g. Flow.new()), then I could do this:
file_parser = File.stream!(file) |> Flow.from_enumerable() # this is my existing flow
database_writer = Flow.new() |> Flow.filter(..) |> Flow.map(..) |> Flow.partition() |> Flow.reduce(..)
data_summarizer = Flow.new() |> Flow.flat_map(..) |> Flow.partition() |> Flow.reduce(..)
stats_collector = Flow.new() |> Flow.emit_and_reduce(..) |> Flow.on_trigger(..)
overall = file_parser |> Flow.through_flows(database_writer, data_summarizer, stats_collector)
In the overall flow, events coming out of upstream flow are copied into each of the downstream flows. As a bonus, this lets us further connect the downstream flows together into a more complex graph like this:
file_parser = File.stream!(file) |> Flow.from_enumerable() # this is my existing flow
database_writer = Flow.new() |> Flow.filter(..) |> Flow.map(..) |> Flow.partition() |> Flow.reduce(..)
data_summarizer = Flow.new() |> Flow.flat_map(..) |> Flow.partition() |> Flow.reduce(..)
stats_collector = Flow.new() |> Flow.emit_and_reduce(..) |> Flow.on_trigger(..)
data_summarizer = data_summarizer |> Flow.into_flows(database_writer)
stats_collector = stats_collector |> Flow.into_flows(database_writer)
overall = file_parser |> Flow.through_flows(database_writer, data_summarizer, stats_collector)
Thanks for your consideration.
There's some kind of race condition within bounded_join. When the demand causes sufficiently large messages to be sent, the join direction being matched against is nil. I'm currently working around this by shrinking my demand.
a = Flow.from_enumerable(Stream.zip(1..100_000, 1..100_000))
b = Flow.from_enumerable(Stream.zip(1..100_000, -1..-100_000))
Flow.bounded_join(:left_outer, a, b,
&elem(&1,0), &elem(&1,0),
fn
{k,v1}, {k,v2} -> {k, {v1,v2}}
{k,v1}, nil -> {k, {v1,nil}}
end, min_demand: 10_000, max_demand: 20_000)
|> Stream.run()
Exception details:
[error] GenServer #PID<0.28590.0> terminating
** (FunctionClauseError) no function clause matching in Flow.Materialize.dispatch_join/6
(flow) lib/flow/materialize.ex:328: Flow.Materialize.dispatch_join(
[{98431, {98431, 98431}}, {98439, {98439, 98439}}, {98442, {98442, 98442}}, {98443, {98443, 98443}}, {98446, {98446, 98446}}, {98449, {98449, 98449}}, {98454, {98454, 98454}}, {98457, {98457, 98457}}, {98458, {98458, 98458}}, {98461, {98461, 98461}}, {98465, {98465, 98465}}, {98466, {98466, 98466}}, {98467, {98467, 98467}}, {98470, {98470, 98470}}, {98472, {98472, 98472}}, {98474, {98474, 98474}}, {98475, {98475, 98475}}, {98476, {98476, 98476}}, {98484, {98484, 98484}}, {98485, {98485, 98485}}, {98488, {98488, 98488}}, {98495, {98495, 98495}}, {98501, {98501, 98501}}, {98502, {98502, 98502}}, {98516, {98516, 98516}}, {98519, {98519, 98519}}, {98524, {98524, 98524}}, {98526, {98526, 98526}}, {98531, {98531, 98531}}, {98548, {98548, 98548}}, {98552, {98552, 98552}}, {98553, {98553, 98553}}, {98561, {98561, 98561}}, {98566, {98566, 98566}}, {98571, {98571, 98571}}, {98574, {98574, 98574}}, {98576, {98576, 98576}}, {98585, {98585, 98585}}, {98599, {98599, 98599}}, {98603, {98603, 98603}}, {98614, {98614, 98614}}, {98615, {98615, 98615}}, {98619, {98619, 98619}}, {98621, {98621, 98621}}, {98632, {98632, 98632}}, {98634, {98634, 98634}}, {98639, {98639, 98639}}, {98643, {98643, ...}}, {98645, ...}, {...}, ...],
nil,
%{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, 66854}], 21938 => [{21938, 21938}], 38839 => [{38839, 38839}], 76035 => [{76035, 76035}], 75873 => [{75873, 75873}], 73528 => [{73528, ...}], 4969 => [...], ...},
%{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, -75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, -73528}], 4969 => [{4969, -4969}], 16129 => [{16129, -16129}], 75116 => [{75116, -75116}], 45250 => [{45250, -45250}], 74081 => [{74081, -74081}], 15093 => [{15093, ...}], 45659 => [...], ...},
#Function<7.85826293/2 in Wasatch.Jobs.JoinCatsTask.test_run/0>,
[]
)
(flow) lib/flow/materialize.ex:286: anonymous fn/6 in Flow.Materialize.join_ops/5
(flow) lib/flow/map_reducer.ex:49: Flow.MapReducer.handle_events/3
(gen_stage) lib/gen_stage.ex:2502: GenStage.consumer_dispatch/7
(gen_stage) lib/gen_stage.ex:2618: GenStage.take_pc_events/3
Last message: {:"$gen_producer", {#PID<0.326.0>, {#Reference<0.0.2.6935>, #Reference<0.0.2.6939>}}, {:ask, 680}}
State: {%{#Reference<0.0.2.6920> => nil}, %{done?: false, producers: %{#Reference<0.0.2.6920> => #PID<0.28587.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {2, 4}, {%{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, ...}], 21938 => [...], ...}, %{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, 
-75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, ...}], 4969 => [...], ...}, []},
#Function<21.105143239/4 in Flow.Materialize.join_ops/5>}
** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.2.6937> => {:subscribed, #PID<0.28588.0>, :transient, 500, 1000, 593}, #Reference<0.0.2.6938> => {:subscribed, #PID<0.28589.0>, :transient, 500, 1000, 898}, #Reference<0.0.2.6940> => {:subscribed, #PID<0.28591.0>, :transient, 500, 1000, 572}})
** (EXIT) an exception was raised:
** (FunctionClauseError) no function clause matching in Flow.Materialize.dispatch_join/6
(flow) lib/flow/materialize.ex:328: Flow.Materialize.dispatch_join(
[{98431, {98431, 98431}}, {98439, {98439, 98439}}, {98442, {98442, 98442}}, {98443, {98443, 98443}}, {98446, {98446, 98446}}, {98449, {98449, 98449}}, {98454, {98454, 98454}}, {98457, {98457, 98457}}, {98458, {98458, 98458}}, {98461, {98461, 98461}}, {98465, {98465, 98465}}, {98466, {98466, 98466}}, {98467, {98467, 98467}}, {98470, {98470, 98470}}, {98472, {98472, 98472}}, {98474, {98474, 98474}}, {98475, {98475, 98475}}, {98476, {98476, 98476}}, {98484, {98484, 98484}}, {98485, {98485, 98485}}, {98488, {98488, 98488}}, {98495, {98495, 98495}}, {98501, {98501, 98501}}, {98502, {98502, 98502}}, {98516, {98516, 98516}}, {98519, {98519, 98519}}, {98524, {98524, 98524}}, {98526, {98526, 98526}}, {98531, {98531, 98531}}, {98548, {98548, 98548}}, {98552, {98552, 98552}}, {98553, {98553, 98553}}, {98561, {98561, 98561}}, {98566, {98566, 98566}}, {98571, {98571, 98571}}, {98574, {98574, 98574}}, {98576, {98576, 98576}}, {98585, {98585, 98585}}, {98599, {98599, 98599}}, {98603, {98603, 98603}}, {98614, {98614, 98614}}, {98615, {98615, 98615}}, {98619, {98619, 98619}}, {98621, {98621, 98621}}, {98632, {98632, 98632}}, {98634, {98634, 98634}}, {98639, {98639, 98639}}, {98643, {98643, ...}}, {98645, ...}, {...}, ...],
nil,
%{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, 66854}], 21938 => [{21938, 21938}], 38839 => [{38839, 38839}], 76035 => [{76035, 76035}], 75873 => [{75873, 75873}], 73528 => [{73528, ...}], 4969 => [...], ...},
%{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, -75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, -73528}], 4969 => [{4969, -4969}], 16129 => [{16129, -16129}], 75116 => [{75116, -75116}], 45250 => [{45250, -45250}], 74081 => [{74081, -74081}], 15093 => [{15093, ...}], 45659 => [...], ...},
#Function<7.85826293/2 in Wasatch.Jobs.JoinCatsTask.test_run/0>,
[]
)
(flow) lib/flow/materialize.ex:286: anonymous fn/6 in Flow.Materialize.join_ops/5
(flow) lib/flow/map_reducer.ex:49: Flow.MapReducer.handle_events/3
(gen_stage) lib/gen_stage.ex:2502: GenStage.consumer_dispatch/7
(gen_stage) lib/gen_stage.ex:2618: GenStage.take_pc_events/3
(gen_stage) lib/gen_stage.ex:1705: GenStage.close_stream/1
(elixir) lib/stream.ex:1250: Stream.do_resource/5
(elixir) lib/stream.ex:570: Stream.run/1
I have this code:
defmodule T do
def hello do
r = Flow.from_enumerable([1,2,3,4,5,6])
|> Flow.partition()
|> Flow.map(fn x -> { x, x } end)
|> Flow.partition()
|> Flow.reduce( fn -> [] end, fn a, acc -> [ a | acc ] end)
|> Flow.emit(:state)
|> Enum.to_list
end
end
Result of this:
iex(1)> T.hello();
[[], [], [], [{4, 4}, {1, 1}], [{2, 2}, {3, 3}], [], [{6, 6}], [{5, 5}]]
I would like to spread the items by key in the last partition.
So I wrote this code:
defmodule T do
def hello do
r = Flow.from_enumerable([1,2,3,4,5,6])
|> Flow.partition()
|> Flow.map(fn x -> { x, x } end)
|> Flow.partition(key: { :elem, 2 })
|> Flow.reduce( fn -> [] end, fn a, acc -> [ a | acc ] end)
|> Flow.emit(:state)
|> Enum.to_list
end
end
But Flow raises an error for this code:
iex(1)> T.hello()
** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.2.168> => {:subscribed, #PID<0.173.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.169> => {:subscribed, #PID<0.174.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.170> => {:subscribed, #PID<0.175.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.171> => {:subscribed, #PID<0.176.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.173> => {:subscribed, #PID<0.178.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.174> => {:subscribed, #PID<0.179.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.175> => {:subscribed, #PID<0.180.0>, :permanent, 500, 1000, 1000}})
** (EXIT) shutdown
(gen_stage) lib/gen_stage.ex:1598: GenStage.close_stream/1
(elixir) lib/stream.ex:1248: Stream.do_resource/5
(elixir) lib/enum.ex:1767: Enum.reverse/2
(elixir) lib/enum.ex:2528: Enum.to_list/1
iex(1)>
09:30:20.432 [error] GenServer #PID<0.167.0> terminating
** (ArgumentError) argument error
(flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
(gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
(elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
(gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
(gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
(gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
(gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.164.0>, #Reference<0.0.2.38>}, [1, 5]}
State: {%{#Reference<0.0.2.38> => nil}, %{active: [#Reference<0.0.2.38>], consumers: [#Reference<0.0.1.39>, #Reference<0.0.1.28>, #Reference<0.0.2.142>, #Reference<0.0.1.10>, #Reference<0.0.2.123>, #Reference<0.0.2.106>, #Reference<0.0.2.88>, #Reference<0.0.2.70>], done?: false, producers: %{#Reference<0.0.2.38> => #PID<0.164.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {2, 8}, [], #Function<33.45510982/4 in Flow.Materialize.mapper_ops/1>}
09:30:20.433 [error] GenServer #PID<0.168.0> terminating
** (ArgumentError) argument error
(flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
(gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
(elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
(gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
(gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
(gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
(gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.164.0>, #Reference<0.0.2.43>}, [3, 4]}
State: {%{#Reference<0.0.2.43> => nil}, %{active: [#Reference<0.0.2.43>], consumers: [#Reference<0.0.1.40>, #Reference<0.0.1.29>, #Reference<0.0.2.143>, #Reference<0.0.1.11>, #Reference<0.0.2.124>, #Reference<0.0.2.107>, #Reference<0.0.2.89>, #Reference<0.0.2.71>], done?: false, producers: %{#Reference<0.0.2.43> => #PID<0.164.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {3, 8}, [], #Function<33.45510982/4 in Flow.Materialize.mapper_ops/1>}
09:30:20.434 [error] GenServer #PID<0.179.0> terminating
** (ArgumentError) argument error
(flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
(gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
(elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
(gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
(gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
(gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
(gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
(stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:DOWN, #Reference<0.0.1.28>, :process, #PID<0.167.0>, {:badarg, [{Flow.Materialize, :"-hash_by_key/2-fun-1-", 3, [file: 'lib/flow/materialize.ex', line: 181]}, {GenStage.PartitionDispatcher, :"-dispatch/3-fun-0-", 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 196]}, {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: 'lib/enum.ex', line: 1755]}, {GenStage.PartitionDispatcher, :dispatch, 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 195]}, {GenStage, :dispatch_events, 3, [file: 'lib/gen_stage.ex', line: 2187]}, {GenStage, :consumer_dispatch, 7, [file: 'lib/gen_stage.ex', line: 2410]}, {GenStage, :take_pc_events, 3, [file: 'lib/gen_stage.ex', line: 2531]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 601]}]}}
State: {%{#Reference<0.0.1.26> => nil, #Reference<0.0.1.27> => nil, #Reference<0.0.1.29> => nil, #Reference<0.0.1.30> => nil, #Reference<0.0.1.31> => nil, #Reference<0.0.1.32> => nil, #Reference<0.0.1.33> => nil}, %{active: [#Reference<0.0.1.33>, #Reference<0.0.1.32>, #Reference<0.0.1.31>, #Reference<0.0.1.30>, #Reference<0.0.1.29>, #Reference<0.0.1.27>, #Reference<0.0.1.26>], consumers: [{#Reference<0.0.2.166>, #Reference<0.0.2.174>}], done?: false, producers: %{#Reference<0.0.1.26> => #PID<0.165.0>, #Reference<0.0.1.27> => #PID<0.166.0>, #Reference<0.0.1.29> => #PID<0.168.0>, #Reference<0.0.1.30> => #PID<0.169.0>, #Reference<0.0.1.31> => #PID<0.170.0>, #Reference<0.0.1.32> => #PID<0.171.0>, #Reference<0.0.1.33> => #PID<0.172.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.45510982/4 in Flow.Materialize.build_reducer/2>}
What is wrong with my code?
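A likely cause, going by the ArgumentError raised in Flow.Materialize.hash_by_key/2: the position in key: {:elem, pos} appears to be zero-based, like Kernel.elem/2, so for the two-element tuples {x, x} only positions 0 and 1 exist. The sketch below uses plain elem/2 (no Flow involved) to show the same kind of failure. The ElemDemo module and safe_elem/2 are hypothetical names used only for illustration.

```elixir
defmodule ElemDemo do
  # Hypothetical helper: wraps Kernel.elem/2 so an out-of-range
  # position is reported as a value instead of crashing the caller.
  def safe_elem(tuple, pos) do
    {:ok, elem(tuple, pos)}
  rescue
    ArgumentError -> :argument_error
  end
end

event = {4, 4}

# Positions 0 and 1 are valid for a two-element tuple.
IO.inspect(ElemDemo.safe_elem(event, 0))
IO.inspect(ElemDemo.safe_elem(event, 1))

# Position 2 would be a third element, which does not exist.
IO.inspect(ElemDemo.safe_elem(event, 2))
```

Under that assumption, Flow.partition(key: {:elem, 1}) would hash on the second element of each tuple.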
I've been playing with an extension for Flow that is either a good idea or I'm thinking about it wrong.
Say I have 2 or more tasks that must be called, and these tasks are not CPU-bound (they are IO-bound), so they are best run in parallel.
This could be accomplished by:
def fork_join(flow, fork1_fun, fork2_fun, join_fun) when is_function(fork1_fun, 1) and is_function(fork2_fun, 1) and is_function(join_fun, 2)
One problem I see is that there could be multiple subsequent signatures to accept [n] number of simultaneous joins.
Thoughts? Valuable? Other ways to think about it? I know I'm applying Rx-isms here which might not be right. I'm certainly happy to create a pull request if it helps.
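To make the idea concrete, here is a minimal per-item sketch built only on Task from the standard library. It is not an existing Flow API: the ForkJoinSketch module, fork_join/4 and fork_join_all/3 are hypothetical names. The second function takes a list of functions, which is one possible way around needing a separate signature for each number of forks.

```elixir
defmodule ForkJoinSketch do
  # Runs both fork functions concurrently on the same item,
  # then combines the two results with join_fun.
  def fork_join(item, fork1_fun, fork2_fun, join_fun)
      when is_function(fork1_fun, 1) and is_function(fork2_fun, 1) and
             is_function(join_fun, 2) do
    task1 = Task.async(fn -> fork1_fun.(item) end)
    task2 = Task.async(fn -> fork2_fun.(item) end)
    join_fun.(Task.await(task1), Task.await(task2))
  end

  # Generalization: any number of fork functions, joined from a list
  # of results that keeps the order of fork_funs.
  def fork_join_all(item, fork_funs, join_fun)
      when is_list(fork_funs) and is_function(join_fun, 1) do
    fork_funs
    |> Enum.map(fn fun -> Task.async(fn -> fun.(item) end) end)
    |> Enum.map(&Task.await/1)
    |> join_fun.()
  end
end

IO.inspect(ForkJoinSketch.fork_join(2, &(&1 + 1), &(&1 * 10), &{&1, &2}))
```

Inside a flow this could presumably be applied per event with Flow.map/2, for example Flow.map(flow, &ForkJoinSketch.fork_join(&1, f1, f2, join)), where f1, f2 and join stand for the caller's own functions.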
Hello,
I would like to minimize the amount of passthrough traffic that goes through my flows because the memory overhead of message passing (as each item passes from one partition to the next) is raising the peak memory usage (VmHWM) of my app too high.
For example, here is a common use case found in my flows:
A or B
A and B:
    A items to pass through to the next partition (don't do anything)
    B items into output items belonging to category C
A and C:
    A items to pass through to the next partition (don't do anything)
    C items into output items belonging to category A
A items!
I have many such flows (similar to the pattern described above) connected together.
Since the topology and interconnection are materialized by Flow, I'm wondering if there can be a way for me to give Flow a hint that certain items can bypass a given partition? For example, in the use case described above, I could provide an option to each Flow.partition() saying Flow.partition(bypass: &(&1.category == :A)) and that would effectively fast-track 🏃♂️💨 all category A traffic straight down to the bottom of the flow. 😇 Would this be possible?
Thanks for your consideration.
There is a race condition in the "enumerable-unpartioned-stream allows custom windowing" test that is a bit tricky to reproduce. It's possible that one of the 4 stages is unlucky: it is stuck receiving numbers in the first window and only gets scheduled by the time the other three have emptied the enumerable. In this case the unlucky stage won't emit the second window, since it never sees it.
Here's an example:
window =
Flow.Window.fixed(1, :second, fn
x when x <= 15 -> 0
x when x <= 30 -> 1_000
end)
loop = fn cb, n ->
windows =
Flow.from_enumerable(1..30, window: window, stages: 4, max_demand: 3)
|> Flow.reduce(fn -> 0 end, &(&1 + &2))
|> Flow.on_trigger(fn e, i, t -> IO.inspect({e, i, t}); {[e], e} end)
|> Enum.to_list()
if length(windows) == 8, do: cb.(cb, n + 1), else: raise inspect({n, windows})
end
loop.(loop, 1)
This runs for quite some time, then the last iteration outputs something like:
{30, {1, 4}, {:fixed, 0, :done}}
{33, {3, 4}, {:fixed, 0, :done}}
{33, {0, 4}, {:fixed, 0, :done}}
{115, {1, 4}, {:fixed, 1000, :done}}
{117, {0, 4}, {:fixed, 1000, :done}}
{113, {3, 4}, {:fixed, 1000, :done}}
{24, {2, 4}, {:fixed, 0, :done}}
** (RuntimeError) {139919, [30, 33, 33, 115, 117, 113, 24]}
It's possible that an artificial delay in the reduce step or a larger enumerable would make the test more deterministic.
I have this test code:
defmodule T do
def hello do
{ :ok, f } = :file.open('t.txt', [ :write ])
Flow.from_enumerable(1..5000000)
|> Flow.partition(window: Flow.Window.periodic(1, :second))
|> Flow.map(fn x -> {x, [ :erlang.integer_to_binary(x), 10 ]} end)
|> Flow.partition(stages: 20000,window: Flow.Window.periodic(1, :second), key: {:elem, 0})
|> Flow.reduce( fn -> [] end, fn {_,a}, acc -> [ a | acc ] end)
|> Flow.emit(:state)
|> Flow.partition()
|> Flow.reduce( fn -> [] end, fn x, acc -> [ x | acc ] end)
|> Flow.emit(:state)
|> Flow.each(fn a ->
:file.write(f, a)
end)
|> Flow.run()
:file.close(f)
end
end
This code generates the range 1..5000000 and writes the values to a file in parallel, aggregating the data in buffers. It is a simplified version of my real use case.
So I expect 5M unique values in the file, but I found more:
$ wc t.txt
5009198 5009198 38962480 t.txt
$ sort t.txt | uniq -d | wc
9198 9198 73584
The number of duplicates is different every time. This is the second run:
$ sort t.txt | uniq -d | wc
3949 3949 31592
When I switched to the default window type, I got the right result:
defmodule T do
def hello do
{ :ok, f } = :file.open('t.txt', [ :write ])
Flow.from_enumerable(1..5000000)
|> Flow.partition()
|> Flow.map(fn x -> {x, [ :erlang.integer_to_binary(x), 10 ]} end)
|> Flow.partition(stages: 20000, key: {:elem, 0})
|> Flow.reduce( fn -> [] end, fn {_,a}, acc -> [ a | acc ] end)
|> Flow.emit(:state)
|> Flow.partition()
|> Flow.reduce( fn -> [] end, fn x, acc -> [ x | acc ] end)
|> Flow.emit(:state)
|> Flow.each(fn a ->
:file.write(f, a)
end)
|> Flow.run()
:file.close(f)
end
end
$ sort t.txt | uniq -d | wc
0 0 0
So could you explain or fix this issue?
Thanks.
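For anyone reproducing this without the shell tools, the duplicate check (sort t.txt | uniq -d | wc) can be sketched in Elixir as below. DuplicateCheck and count_duplicated/1 are hypothetical names, and the sketch keeps one map entry per distinct line in memory, which is an assumption that holds for a file of this size but not for arbitrarily large input.

```elixir
defmodule DuplicateCheck do
  # Returns how many distinct lines occur more than once,
  # i.e. the line count that `sort | uniq -d | wc` reports.
  def count_duplicated(lines) do
    lines
    |> Enum.reduce(%{}, fn line, counts ->
      Map.update(counts, line, 1, &(&1 + 1))
    end)
    |> Enum.count(fn {_line, occurrences} -> occurrences > 1 end)
  end
end

# Small in-memory sample: "2" and "3" are repeated, "1" is not.
sample = ["1\n", "2\n", "2\n", "3\n", "3\n", "3\n"]
IO.inspect(DuplicateCheck.count_duplicated(sample))
```

Against the real output it would be called as File.stream!("t.txt") |> DuplicateCheck.count_duplicated().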
Consider the following stream processing:
File.stream!(path)
|> Enum.each(fn x -> rabbitmq_publish(x) end)
def rabbitmq_publish(x) do
# publish and return nothing
nil
end
It does not need to return the data for further usage, so it's safe to immediately remove processed data from the memory, which seems to be the case when using plain streams (memory usage is constant).
However, if I incorporate Flow into the mix:
File.stream!(path)
|> Flow.from_enumerable() |> Flow.each(fn x -> rabbitmq_publish(x) end) |> Flow.run
memory usage grows as if the whole file was kept in memory.
It seems the proper way to avoid the memory growth would be to use Flow.map:
File.stream!(path)
|> Flow.from_enumerable() |> Flow.map(fn x -> rabbitmq_publish(x) end) |> Flow.run
Is this expected behavior or something that could be improved? If it's expected, is it worth mentioning in the documentation?
Hello,
I'm getting :noproc errors when I connect a short-lived GenStage producer to a Flow using Flow.from_stages(). Below is a minimal example that reproduces the problem I'm seeing:
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:1] [hipe]
Interactive Elixir (1.9.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 2) |> Enum.to_list()
[1, 2, 3]
iex(2)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms);
ms end) |> Enum.to_list()
"#PID<0.194.0>: sleep 1"
"#PID<0.194.0>: sleep 2"
"#PID<0.194.0>: sleep 3"
[1000, 2000, 3000]
iex(3)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 2) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms);
ms end) |> Enum.to_list()
"#PID<0.200.0>: sleep 1"
12:14:48.166 [info] GenStage consumer #PID<0.201.0> is stopping after receiving cancel from producer #PID<0.197.0> with reason: :noproc
12:14:48.180 [error] GenServer #PID<0.201.0> terminating
** (stop) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Last message: {:DOWN, #Reference<0.3205571650.1190133762.109920>, :process, #PID<0.197.0>, :noproc}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.127884580/3 in Flow.Window.Global.materialize/5>}, {1, 2}, [], #Function<33.87744541/4 in Flow.Materialize.mapper_ops/1>}
"#PID<0.200.0>: sleep 2"
"#PID<0.200.0>: sleep 3"
** (exit) exited in: GenStage.close_stream(%{#Reference<0.3205571650.1190133762.109930> => {:subscribed, #PID<0.200.0>, :transient, 500, 1000, 1000}})
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
(elixir) lib/stream.ex:1400: Stream.do_resource/5
(elixir) lib/enum.ex:3023: Enum.reverse/1
(elixir) lib/enum.ex:2668: Enum.to_list/1
iex(3)>
How can I safely connect a short-lived GenStage producer to a Flow using multiple stages? 😕 See also GenStage: How to cancel a Flow from the producer? whose answer relies on an outdated GenStage API.
Thanks for your consideration.
Hello,
I'm using GenStage 0.14.2 and Flow master (at 1ffac6a) under Elixir 1.9.0 and Erlang/OTP 22, where I'm encountering :noproc errors when I connect a short-lived GenStage producer to a Flow and then immediately partition that flow. Below is a minimal example for reproduction (see related issue #88).
In my actual use case, I'm connecting a large (but finite) GenStage to a Flow partition with 32 stages.
Thanks for your consideration.
from_stages() has 1 stage, and partition() has 1 stage. ✔️
from_stages() has 1 stage, and partition() has 2 stages. ✔️
from_stages() has 1 stage, and partition() has 3 stages. 💥
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:1] [hipe]
Interactive Elixir (1.9.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Flow.from_enumerable(1..3) |> Flow.partition(stages: 3) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.213.0>: sleep 2"
"#PID<0.203.0>: sleep 1"
"#PID<0.196.0>: sleep 3"
[1000, 2000, 3000]
iex(2)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 1) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.231.0>: sleep 1"
"#PID<0.231.0>: sleep 2"
"#PID<0.231.0>: sleep 3"
[1000, 2000, 3000]
iex(3)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 2) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.238.0>: sleep 1"
"#PID<0.239.0>: sleep 3"
"#PID<0.238.0>: sleep 2"
[3000, 1000, 2000]
iex(4)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 3) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.246.0>: sleep 2"
"#PID<0.247.0>: sleep 1"
16:29:54.396 pid=<0.248.0> [info] GenStage consumer #PID<0.248.0> is stopping after receiving cancel from producer #PID<0.245.0> with reason: :noproc
16:29:54.444 pid=<0.248.0> [error] GenServer #PID<0.248.0> terminating
** (stop) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Last message: {:DOWN, #Reference<0.1394119603.2769551375.77567>, :process, #PID<0.245.0>, :noproc}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.127884580/3 in Flow.Window.Global.materialize/5>}, {2, 3}, [], #Function<32.81753312/4 in Flow.Materialize.mapper_ops/1>}
"#PID<0.247.0>: sleep 3"
** (exit) exited in: GenStage.close_stream(%{#Reference<0.1394119603.2769551372.77002> => {:subscribed, #PID<0.246.0>, :transient, 500, 1000, 1000}, #Reference<0.1394119603.2769551372.77003> => {:subscribed, #PID<0.247.0>, :transient, 500, 1000, 1000}})
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
(elixir) lib/stream.ex:1400: Stream.do_resource/5
(elixir) lib/enum.ex:3023: Enum.reverse/1
(elixir) lib/enum.ex:2668: Enum.to_list/1
$ uname -a
Linux myhost 4.1.15.pnotify #18 SMP Thu May 18 15:50:05 PDT 2017 x86_64 GNU/Linux
$ elixir -v
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]
Elixir 1.9.0 (compiled with Erlang/OTP 22)
$ cat mix.lock
%{
"file_system": {:hex, :file_system, "0.2.7", "e6f7f155970975789f26e77b8b8d8ab084c59844d8ecfaf58cbda31c494d14aa", [:mix], [], "hexpm"},
"flow": {:git, "https://github.com/plataformatec/flow.git", "1ffac6a801602bf8b02192488e58ce5728b581aa", []},
"gen_stage": {:hex, :gen_stage, "0.14.2", "6a2a578a510c5bfca8a45e6b27552f613b41cf584b58210f017088d3d17d0b14", [:mix], [], "hexpm"},
"jason": {:hex, :jason, "1.1.2", "b03dedea67a99223a2eaf9f1264ce37154564de899fd3d8b9a21b1a6fd64afe7", [:mix], [{:decimal, "~> 1.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm"},
"mix_test_watch": {:hex, :mix_test_watch, "0.9.0", "c72132a6071261893518fa08e121e911c9358713f62794a90c95db59042af375", [:mix], [{:file_system, "~> 0.2.1 or ~> 0.3", [hex: :file_system, repo: "hexpm", optional: false]}], "hexpm"},
}
I'd like to be able to specify a longer timeout than 5000, and I know that this is an option that GenStage accepts.
In addition to :max_demand and :min_demand, could we get a :timeout option that's passed through to GenStage?
I am trying to understand how Flow's join works, but instead I am crashing Elixir with a segfault:
flow_a = Flow.from_enumerable(1_000..1_100)
flow_b = Flow.from_enumerable( 500..1_100)
flow = Flow.window_join(:inner, flow_a, flow_b, Flow.Window.global, & &1, & &1, fn a, a -> a end)
assert flow |> Enum.sort |> Enum.take(3) == [1_000, 1_001, 1_002]
Pushed the complete (hopefully reproducible) source here: https://github.com/larskluge/crash
Please advise. Thank you!
Hi!
This might be me misunderstanding how this is supposed to work, but given the following quote from the docs:
The difference between max_demand and min_demand works as the batch size when the producer is full. If the producer has fewer events than requested by consumers, it usually sends the remaining events available.
My intuitive understanding was that the following example would start to process the events as soon as they're available. What I'm seeing is that Flow.from_enumerable/2 seems to buffer until it has max_demand events available to send downstream before processing starts. This would mean that if events come in slowly, it might take a very long time for the pipeline to start executing.
Setting max_demand: 1 starts the downstream phases immediately, but if I understand the docs correctly this change shouldn't be necessary, since demand as well as data is available for processing. Is this a bug or am I missing something important?
A full example can be found here: https://github.com/frekw/flow-example
defmodule Example.Queue do
def start_link(limit) do
BlockingQueue.start_link(limit, name: __MODULE__)
end
def push(x) do
BlockingQueue.push(__MODULE__, x)
end
def to_stream() do
BlockingQueue.pop_stream(__MODULE__)
end
end
defmodule Example.Producer do
require Logger
use GenServer
def start_link(opts \\ []) do
GenServer.start_link(__MODULE__, 0, opts)
end
def init(state) do
loop()
{:ok, state}
end
defp loop do
Process.send_after(self(), :loop, 10)
end
def handle_info(:loop, state) do
Logger.info("pushing: #{state}")
Example.Queue.push("message-#{state}")
loop()
{:noreply, state + 1}
end
end
defmodule Example.Pipeline do
require Logger
def start_link() do
Logger.info("Starting pipeline")
Example.Queue.to_stream()
# This seems to work as I would expect.
# |> Flow.from_enumerable(max_demand: 1)
# This waits for an initial 1000 messages before
# Flow.each starts running.
|> Flow.from_enumerable(min_demand: 1)
|> Flow.each(fn x -> Logger.info("handled: #{x}") end)
|> Flow.start_link()
end
end
defmodule Example do
use Application
def start(_type, _args) do
import Supervisor.Spec
children = [
worker(Example.Queue, [:infinity]),
worker(Example.Producer, []),
worker(Example.Pipeline, []),
]
Supervisor.start_link(children, strategy: :rest_for_one)
end
end