flow's Issues

Carve events into windows by count and timeout

G’day!

I trust this’ll be a simple pointer to the documentation I somehow missed rather than needing to reverse-engineer the Flow.Window protocol:

How do I carve a stream of events into windows of at most x events, where the oldest event is never held longer than y? I'd like to send many at a time, but flush a smaller batch if the oldest has been waiting longer than I'd like.
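
For reference, one approximation with the current API (a hedged sketch, assuming Flow >= 1.0; note the periodic trigger is processing-time based and its timer does not reset when the count trigger fires; events stands in for your source):

window =
  Flow.Window.global()
  |> Flow.Window.trigger_every(10)
  |> Flow.Window.trigger_periodically(5, :second)

events
|> Flow.from_enumerable()
|> Flow.partition(window: window, stages: 1)
# Accumulate events until either trigger fires ...
|> Flow.reduce(fn -> [] end, fn event, acc -> [event | acc] end)
# ... then emit the accumulated batch and reset the accumulator.
|> Flow.on_trigger(fn
  [] -> {[], []}
  acc -> {[Enum.reverse(acc)], []}
end)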

Error in flow source not stopping parent process in a timely manner

The following is a super simplified version of some code that deals with external APIs.
My problem is that, as you can see, the function inside the stream raises an error, but sometimes the inspects at the bottom are reached before the process is killed.

defmodule Test do
  def run do
    result =
      (fn (page) ->
        IO.inspect page, label: "page"
        raise "ups"
      end)
      |> create_stream
      |> Flow.from_enumerable()
      |> Flow.map(fn(_value) -> %{} end)
      |> Enum.to_list
    IO.inspect "It should not reach here, but sometimes does"
    IO.inspect result, label: "result"
  end

  def create_stream(api_func) do
    Stream.resource(fn -> 1 end, &api_func.(&1), fn _ -> :ok end)
  end
end

Test.run

When executing, sometimes I get:

$ mix run test.exs
page: 1
"It should not reach here, but sometimes does"
result: []
[error] GenServer #PID<0.502.0> terminating
** (RuntimeError) ups
    test.exs:6: anonymous fn/1 in Test.run/0
    (elixir) lib/stream.ex:1285: Stream.do_resource/5
    (gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
    (gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
    (gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:"$demand", :forward}}
State: #Function<0.55142349/1 in GenStage.Streamer.init/1>

If any of the work done by the flow raises, I would expect the whole thing to fail in the Enum.to_list call, not afterwards.
Is there anything I can do to stop the processing when running Enum.to_list? Shouldn't the process die when calling Enum.to_list?

I originally posted it on ElixirForum HERE, but I'm more and more convinced that it's a bug somewhere.

max_demand from Enum seems wrong

I'm not sure if this is a bug or a misunderstanding on my part. I've narrowed it down to this simple example which just prints out the demand received by the source of a Flow:

defmodule FlowBug do
  use GenStage

  def start_link(name) do
    GenStage.start_link(__MODULE__, name)
  end

  def init(name) do
    IO.puts("#{name} init")
    {:producer, name}
  end

  def handle_demand(demand, name) do
    IO.puts("#{name} DEMAND: #{demand}")
    {:noreply, [], name}
  end
end

If I use Flow.run(), the demand is 1, as I would expect:

iex(1)> {:ok, pid_flow} = FlowBug.start_link("Flow.run")
Flow.run init
{:ok, #PID<0.162.0>}
iex(2)> pid_flow |> Flow.from_stage(stages: 1, max_demand: 1) |> Flow.run
Flow.run DEMAND: 1

However, if I use an Enum of any sort (I often do this from iex when I am trying to debug things), max_demand does not seem to be respected:

iex(1)> {:ok, pid_enum} = FlowBug.start_link("Enum.to_list")
Enum.to_list init
{:ok, #PID<0.150.0>}
iex(2)> pid_enum |> Flow.from_stage(max_demand: 1) |> Enum.to_list
Enum.to_list DEMAND: 1000

I am using Flow for IO-bound things, like calling out to other services during web scraping, etc. I need the back-pressure features more than the parallelism for this. In debugging, I often will do this:

flow |> Stream.take(n) |> Enum.to_list

Just to get things working... I think this used to work fine, but now it seems different (or never worked).

Unexpected parallelism with Flow.from_enumerables

Looking through the code, I found a theoretical issue, or at least a potential source of confusion.

Let's look at an example:

enums_with_urls
|> Flow.from_enumerables(stages: 4, max_demand: 1)
|> Flow.map(&fetch_url/1)
|> Flow.run()

After looking at this code, one would expect to have 4 processes doing the job of fetching URLs from a particular API. That might be important because of throttling, or just to prevent flooding a service.

However, if the input list contains more than 4 enumerables, Flow will entrust the job of running the mapper operations to the generated GenStage.Streamer processes. Let's imagine that the input is generated from a directory listing where we have 100 files, each mapped to a Stream with File.stream!/3. Now we have 100 processes, each fetching pages concurrently from some service that is suddenly getting more heat than expected. :)

I think we should either highlight this case in the documentation of Flow.from_enumerables/2 or remove this clause from the case expression.

Flow.partition does not route data to the right stage

I have a data set like this: [["1", "2", "3"], ["2", "2", "2"], ["1", "1", "1"], ["3", "3", "3"]]. After partitioning with Flow.partition(key: fn e -> Enum.at(e, 1) end), I suppose ["1", "2", "3"] and ["2", "2", "2"] should be routed to the same stage, but they are not. Here is the example:

    a = [["1", "2", "3"], ["2", "2", "2"], ["1", "1", "1"], ["3", "3", "3"]]
    list = Flow.from_enumerable(a)
    |> Flow.partition(key: fn e -> Enum.at(e, 1) end)
    |> Flow.reduce(fn -> %{} end, fn(arr, map) ->
      new_map = case map do
        %{} ->
          Map.put(map, "earlier", arr)
        earlier ->
          old_number = Map.get(earlier, "earlier")
          |> Enum.at(1)
          new_number = Enum.at(arr, 1)
          IO.inspect(old_number == new_number)
          Map.put(earlier, "later", arr)
      end
      IO.inspect(new_map)
      new_map
    end)
    |> Enum.to_list()

In fact, the data is routed into four stages, each element ending up on a distinct stage, and the strange thing is that the final list is [{"earlier", ["3", "3", "3"]}, {"earlier", ["1", "1", "1"]}].
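
As a quick check, one can log which stage each element lands on (a small illustrative sketch, not a fix); elements whose keys hash alike should print from the same pid:

a = [["1", "2", "3"], ["2", "2", "2"], ["1", "1", "1"], ["3", "3", "3"]]

a
|> Flow.from_enumerable(stages: 1)
|> Flow.partition(key: fn e -> Enum.at(e, 1) end, stages: 4)
# Log the stage pid next to each element; equal keys share a pid.
|> Flow.map(fn e -> IO.inspect({self(), e}, label: "stage/element") end)
|> Flow.run()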

Naming Stages

Hey all,

On my team, we have been using Flow and GenStage for the past 9 months or more. We use it to process streams of data from a number of different sources throughout the day. Ideally, we don't shut down. Currently the system has OOM crashes and slows down because of backed-up message queues.

We consistently have an issue where a GenStage process will slow its processing and begin to accumulate memory. The GenStages that we have written ourselves are all named, but the GenStages generated by our use of Flow are nameless. We have to guess where the problems exist, and I think that is a problem.

I would like to be able to provide a name prefix, have each "stage" of the Flow (map, flat_map, filter, etc.) append another segment, and then have each GenStage within the "stage" append a further suffix, just enough to not collide, like _#{i}, like the partitions in Registry.

An example:

input_stage
|> Flow.from_stage(window: initial_window)
|> Flow.flat_map(&processing_function/1, name: __MODULE__.Flow.ProcessingFunction)
|> Flow.partition(window: post_processing_window)
|> Flow.filter(&filtering_function/1, name: __MODULE__.Flow.FilteringFunction)
|> Flow.into_stages([output_stage], name: __MODULE__.Flow)

This will generate GenStage processes like:

__MODULE__.Flow.ProcessingFunction.FlatMap._0
__MODULE__.Flow.ProcessingFunction.FlatMap._1
__MODULE__.Flow.ProcessingFunction.FlatMap._2
__MODULE__.Flow.ProcessingFunction.FlatMap._3
__MODULE__.Flow.FilteringFunction.Filter._0
__MODULE__.Flow.FilteringFunction.Filter._1
__MODULE__.Flow.FilteringFunction.Filter._2
__MODULE__.Flow.FilteringFunction.Filter._3

A feature like this, one that allows us to reliably determine the line(s) of code a process originated from, would take a lot of the guesswork out of debugging and performance tuning for my team, and would let us use tools like :observer and WombatOAM more effectively.

If this is of interest to the maintainers, I would be happy to implement this feature, with your supervision & advice.

Flow.from_specs and failed starts

Hey all!

I have been seeing an error that I think comes from Flow.from_specs. It assumes that the result of the start_link anonymous function is {:ok, pid()}, and when it is {:error, term()} instead, the output of the system is not easy to debug.

I can work on a PR for this, but I just wanted to put it here, if someone got to it first.

There is a discussion to be had about how a failure should be handled; I am personally of the opinion that it should fail on startup, but I understand that there might be different opinions on this.

Race condition when a producer is closing too soon

There's a weird behaviour when dealing with a producer that has too few elements to deal with:

** (exit) exited in: GenStage.close_stream(%{#Reference<0.212111494.3424387074.139905> => {:subscribed, #PID<0.672.0>, :transient, 500, 1000, 1000}, #Reference<0.212111494.3424387074.139907> => {:subscribed, #PID<0.674.0>, :transient, 500, 1000, 1000}, #Reference<0.212111494.3424387074.139908> => {:subscribed, #PID<0.675.0>, :transient, 500, 1000, 1000}})
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: assert :ok = MyFlow.run()
     stacktrace:
       (gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
       (elixir) lib/stream.ex:1370: Stream.do_resource/5
       (elixir) lib/enum.ex:2979: Enum.reverse/1
       (elixir) lib/enum.ex:2611: Enum.to_list/1
       (flow) lib/flow.ex:1012: Flow.run/1
        my_flow_test.exs:46: (test)

Here is a gist that has minimal code for this bug: https://gist.github.com/Fenntasy/1e930da0f7b6c2055a660831d4406a96

You can toggle between lines 30 and 31 to provoke the error or to avoid it.

When the list is only two elements long, the producer sends its :terminate event immediately and the previous error shows up. Displaying something on the console (via IO.inspect) is also enough to avoid the error (presumably because it adds a little bit of time before the producer closes).

Am I doing something wrong, or is it really a race condition?

Session windows are mentioned as supported but are not

The documentation in lib/flow/window.ex#L31 lists Session windows as a supported window type, but they were removed in 0.14:

This release also deprecates Flow.Window.session/3 as developers can trivially roll their own with more customization power and flexibility using emit_and_reduce/3 and on_trigger/2.

Should we remove the reference to session windows being supported, or add documentation on how to implement a session window using emit_and_reduce/3 and on_trigger/2?

I'll include an example here, in case it is helpful to anyone:

iex> data = [
...>   {"elixir", 1_000},
...>   {"erlang", 60_000},
...>   {"elixir", 3_200_000},
...>   {"erlang", 4_000_000},
...>   {"elixir", 4_100_000},
...>   {"erlang", 6_000_000}
...> ]
iex> flow = Flow.from_enumerable(data) |> Flow.partition(key: fn {k, _} -> k end, stages: 2)
iex> flow =
...>   Flow.emit_and_reduce(flow, fn -> %{} end, fn {word, time}, acc ->
...>     {count, prev_time} = Map.get(acc, word, {1, time})
...>
...>     if time - prev_time > 1_000_000 do
...>       {[{word, {count, prev_time}}], Map.put(acc, word, {1, time})}
...>     else
...>       {[], Map.update(acc, word, {1, time}, fn {count, _} -> {count + 1, time} end)}
...>     end
...>   end)
iex> flow = Flow.on_trigger(flow, fn acc -> {Enum.to_list(acc), :unused} end)
iex> Enum.to_list(flow)
[
  {"erlang", {1, 60000}},
  {"erlang", {2, 6000000}},
  {"elixir", {1, 1000}},
  {"elixir", {2, 4000000}}
]

Proposal: scan operator

Hi, I'm not sure if this is the right way to go about this, but I'd like to propose a scan operator similar to Stream.scan. Something like:

def scan(flow, initial, combine, opts \\ []) do
  scan_window = Flow.Window.global() |> Flow.Window.trigger_every(1, :keep)

  flow
  |> Flow.partition(Keyword.put(opts, :window, scan_window))
  |> Flow.reduce(initial, combine)
  |> Flow.emit(:state)
end

Now things like realtime counts are viable:

Flow.from_stage(MyTextInputStage)
|> Flow.flat_map(&String.split(&1, " "))
|> Flow.scan(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, & &1 + 1)
end)
|> Flow.each(fn {word, count} -> Dashboards.update_word_count(word, count) end)
|> Flow.run

If you guys think it would be useful, I'd be more than happy to PR it with tests etc.

Error on sync/async subscribe to pid(s)/atom(s) returned from into_stages

So, we wanted to have a way to run GenStage.sync/async_subscribe and subscribe_to an into_stages producer_consumer pid that was already running, which should work as per the docs, right?

Here's some sample code that returns an error:

  • First, let's just use a consumer from the general docs
defmodule C do
  use GenStage

  def start_link() do
    GenStage.start_link(C, :ok)
  end

  def init(:ok) do
    {:consumer, :the_state_does_not_matter}
  end

  def handle_events(events, _from, state) do
    # Wait for a second.
    :timer.sleep(1000)

    # Inspect the events.
    IO.inspect(events)

    # We are a consumer, so we would never emit items.
    {:noreply, [], state}
  end
end
  • now, here's the flow, which gives us {:ok, proc} when ending with into_stages
{:ok, flowpid} = Flow.from_enumerable(1..20) |> Flow.filter(& rem(2, &1) == 0) |> Flow.into_stages([])
  • now, here's starting the consumer and subscribing (after the fact, as the flow already exists)
{:ok, c} = C.start_link() 

GenStage.async_subscribe(c, to: flowpid)
  • then, here comes the error
** (EXIT from #PID<0.429.0>) an exception was raised:
    ** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2
        (flow) lib/flow/coordinator.ex:69: Flow.Coordinator.handle_info({:"$gen_producer", {#PID<0.457.0>, #Reference<0.0.1.213>}, {:subscribe, nil, []}}, %{intermediary: [{#PID<0.462.0>, []}, {#PID<0.463.0>, []}, {#PID<0.464.0>, []}, {#PID<0.465.0>, []}, {#PID<0.466.0>, []}, {#PID<0.467.0>, []}, {#PID<0.468.0>, []}, {#PID<0.469.0>, []}], parent_ref: #Reference<0.0.1.200>, producers: [#PID<0.461.0>], refs: [#Reference<0.0.1.192>, #Reference<0.0.1.193>, #Reference<0.0.1.194>, #Reference<0.0.1.195>, #Reference<0.0.1.196>, #Reference<0.0.1.197>, #Reference<0.0.1.198>, #Reference<0.0.1.199>], supervisor: #PID<0.460.0>})
        (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
        (stdlib) gen_server.erl:667: :gen_server.handle_msg/5
        (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

I came up with a workable fix, but I wanted to see a) if it's even worthwhile and b) how the coordinator matching/checking can/should be updated to allow for sync or async subscriptions, since for any initial consumers everything is a sync_subscribe as per https://github.com/elixir-lang/flow/blob/master/lib/flow/coordinator.ex#L54. Maybe we always want it sync no matter what? As of now, it seems difficult to allow for async here, unless I'm missing something. Additionally, using a handle_info for this may just be a bad hack. So, I hope for a good solution :).

The hack:

diff --git a/lib/flow/coordinator.ex b/lib/flow/coordinator.ex
index ece4416..0da06bd 100644
--- a/lib/flow/coordinator.ex
+++ b/lib/flow/coordinator.ex
@@ -66,6 +66,13 @@ defmodule Flow.Coordinator do
     {:noreply, state}
   end

+  def handle_info({:"$gen_producer", {consumer, _}, _}, %{intermediary: intermediary} = state) do
+    for {pid, _} <- intermediary do
+        subscribe(consumer, pid)
+    end
+    {:noreply, state}
+  end
+
   def handle_info({:DOWN, ref, _, _, reason}, %{parent_ref: ref} = state) do
     {:stop, reason, state}
   end

Thanks.

Flow.map_batch/2 fails with CaseClauseError

# test.exs
Flow.from_enumerable(1..1000)
|> Flow.map_batch(fn batch -> [Enum.sum(batch)] end)
|> Flow.map(&IO.puts("Sum: #{&1}"))
|> Flow.run()
Stacktrace:
** (exit) exited in: Enumerable.Flow.reduce(%Flow{operations: [{:on_trigger, #Function<20.17204979/3 in Flow.inject_on_trigger/4>}, {:mapper, :map, [#Function<1.27327935 in file:test.exs>]}, {:batch, #Function<0.27327935 in file:test.exs>}], options: [stages: 32], producers: {:enumerables, [1..1000]}, window: %Flow.Window.Global{periodically: [], trigger: nil}}, {:cont, []}, #Function<146.29191728/2 in Enum.reverse/1>)
    ** (EXIT) an exception was raised:
        ** (CaseClauseError) no case clause matching: {[], [{:batch, #Function<0.27327935 in file:test.exs>}, {:mapper, :map, [#Function<1.27327935 in file:test.exs>]}, {:on_trigger, #Function<20.17204979/3 in Flow.inject_on_trigger/4>}]}
            (flow 1.0.0) lib/flow/materialize.ex:649: Flow.Materialize.build_trigger/1
            (flow 1.0.0) lib/flow/materialize.ex:611: Flow.Materialize.reducer_ops/1
            (flow 1.0.0) lib/flow/materialize.ex:45: Flow.Materialize.split_operations/1
            (flow 1.0.0) lib/flow/materialize.ex:17: Flow.Materialize.materialize/5
            (flow 1.0.0) lib/flow/coordinator.ex:34: Flow.Coordinator.init/1
            (stdlib 3.12.1) gen_server.erl:374: :gen_server.init_it/2
            (stdlib 3.12.1) gen_server.erl:342: :gen_server.init_it/6
            (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    (flow 1.0.0) lib/flow.ex:1995: Enumerable.Flow.reduce/3
    (elixir 1.10.3) lib/enum.ex:3383: Enum.reverse/1
    (elixir 1.10.3) lib/enum.ex:2982: Enum.to_list/1
    (flow 1.0.0) lib/flow.ex:992: Flow.run/1
    (elixir 1.10.3) lib/code.ex:926: Code.require_file/2
    (mix 1.10.3) lib/mix/tasks/run.ex:145: Mix.Tasks.Run.run/5
    (mix 1.10.3) lib/mix/tasks/run.ex:85: Mix.Tasks.Run.run/1
    (mix 1.10.3) lib/mix/task.ex:330: Mix.Task.run_task/3
    (mix 1.10.3) lib/mix/cli.ex:82: Mix.CLI.run_task/2
    (elixir 1.10.3) lib/code.ex:926: Code.require_file/2

into_collectable/2 and through_collectable/2 to complement from_enumerable(s)/1

Problem description

It is already possible to route a flow into a Collectable, e.g.

Flow.from_enumerable([1, 2, 3]) |> Enum.into([])

This works, but it forces the collecting to happen in the context of the process creating the Flow, rather than in a separate GenStage consumer process, and it therefore "hogs" the Flow-spawning process's inbox from being used for other purposes, as discussed here.

This can be sensible, if the Flow-spawning process is then going to use the Collected data—it won't attempt to do anything else until the Enum.into/2 completes anyway, and once it proceeds, it will need everything that was delivered to it to reside in its own process's heap. But if the Collectable exists solely to cause side-effects upon insertion rather than as a value object that will carry around its inserted values, this blocking behavior can be suboptimal, since the (potentially long-lived) parent process will end up full of garbage—and blocking as it GC-sweeps—from the messages that were delivered from the GenStage.stream to the Collectable.

For example, Ecto's Ecto.Adapters.SQL.Stream struct supports the Collectable behavior, allowing code like this:

db_stream = Ecto.Adapters.SQL.stream(MyRepo, "COPY foo FROM STDIN WITH (FORMAT csv, HEADER false)")

MyRepo.transaction fn ->
  Enum.into(csv_flow, db_stream)
end

Here, the process executing the Ecto transaction will receive—and linearize!—all the data produced from csv_flow, only to pass it off again to db_stream, where the data will turn around and travel back out to a DBConnection process.

Proposed solution

Add a function, Flow.into_collectable(flow, collectable), which would be a terminal, demand-driving call for the Flow (as Enum.into/2 is).

  • into_collectable/2 would pass each GenStage process in the current partition a copy of the collectable. (For correct concurrency semantics, it may be advisable for collectable to actually be collectable_or_fn, where the user could supply a fun that is called by each GenStage process in the partition and returns a concurrency-isolated instance of the collectable.)

  • Each GenStage process, upon receiving the collectable from into_collectable/2, would immediately call Collectable.into/1 on it to get a reducer, and would then hold onto said reducer in its state.

  • Each GenStage process would then, in its handle_events/3, apply the reducer to the received events (see the sketch below).
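
For concreteness, the mechanics in these bullets are just the standard Collectable protocol; a minimal sketch of the per-stage bookkeeping (illustrative only, not an implementation):

# Once, when the stage receives the collectable:
{acc, collector} = Collectable.into(collectable)

# In each handle_events/3, feed the events through the reducer and
# keep the updated accumulator in the stage state:
acc =
  Enum.reduce(events, acc, fn event, acc ->
    collector.(acc, {:cont, event})
  end)

# When the stage terminates, finalize the collectable:
collector.(acc, :done)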

Optionally, one could also add a function Flow.through_collectable(flow, collectable), which would work similarly but would be non-terminal. The partition would simply be extended with a step that passes events into the reducer—but then, having done so and having acquired the modified reducer, it would simply pass those same events unmodified to the next step in the partition (along with storing the modified reducer in its state).

Flow.through_collectable/2 would be perfect for use-cases like that of Ecto.Adapters.SQL.stream/2, where the goal is simply to cause the side-effect of storing the structs being processed into a database (i.e. "durable-izing" them) without necessarily wanting to end the processing of the structs there, and without necessarily having any need to linearize the durabilization process.

As well, both Flow.into_collectable/2 and Flow.through_collectable/2 would potentially get people to make a lot more of their libraries implement Collectable! The Collectable behavior is much simpler to implement than the GenStage consumer behavior; if implementing Collectable on a struct automatically gave a developer effectively all the advantages of a GenStage consumer, with only the time investment of writing the Collectable reducer, developers would likely be more interested in making their structs Collectable.

Sliding Window implementation

It seems like there is already use of a sliding window in the tests.

It would be really cool to have an implementation of that in Flow directly.

Flow.Window.sliding(count: 2, overlap: 1)

I'd be happy to implement it if there is need.
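
In the meantime, a rough count-based approximation with emit_and_reduce/3 (a hedged sketch of my own, not the proposed API; window size 2, overlap 1, per partition stage):

flow
|> Flow.emit_and_reduce(fn -> [] end, fn event, acc ->
  # Keep the newest events first; a full window has 2 elements.
  window = Enum.take([event | acc], 2)

  if length(window) == 2 do
    # Emit the full window (oldest first) and carry over 1 element.
    {[Enum.reverse(window)], Enum.take(window, 1)}
  else
    {[], window}
  end
end)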

Not able to run map stages after reduce

After upgrading to 0.14, a previously working setup of Flow map steps -> reduce, emit(:state) -> further map steps no longer seems to work.

Example failing flow

1..50
|> Flow.from_enumerable()
|> Flow.map(& &1 * 2)
|> Flow.partition(stages: 1, window: Flow.Window.count(5))
|> Flow.reduce(fn -> 0 end, &+/2)
|> Flow.emit(:state)
|> Flow.filter(& &1 > 200)
|> Enum.to_list

Expected result

[230, 280, 330, 380, 430, 480]

Actual outcome

** (exit) exited in: Enumerable.Flow.reduce(%Flow{operations: [{:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}, {:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:reduce, #Function<20.127694169/0 in :erl_eval.expr/5>, &:erlang.+/2}], options: [stages: 1], producers: {:flows, [%Flow{operations: [{:mapper, :map, [#Function<6.127694169/1 in :erl_eval.expr/5>]}], options: [stages: 8], producers: {:enumerables, [1..50]}, window: %Flow.Window.Global{periodically: [], trigger: nil}}]}, window: %Flow.Window.Count{count: 5, periodically: [], trigger: nil}}, {:cont, []}, #Function<131.83463370/2 in Enum.reverse/1>)
    ** (EXIT) an exception was raised:
        ** (CaseClauseError) no case clause matching: {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}
            (flow) lib/flow/materialize.ex:600: Flow.Materialize.build_trigger/1
            (flow) lib/flow/materialize.ex:559: Flow.Materialize.reducer_ops/1
            (flow) lib/flow/materialize.ex:52: Flow.Materialize.split_operations/3
            (flow) lib/flow/materialize.ex:17: Flow.Materialize.materialize/4
            (flow) lib/flow/coordinator.ex:25: Flow.Coordinator.init/1
            (stdlib) gen_server.erl:374: :gen_server.init_it/2
            (stdlib) gen_server.erl:342: :gen_server.init_it/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    (flow) lib/flow.ex:1653: Enumerable.Flow.reduce/3
    (elixir) lib/enum.ex:1911: Enum.reverse/1
    (elixir) lib/enum.ex:2588: Enum.to_list/1
iex(8)> 19:13:46.115 [error] GenServer #PID<0.585.0> terminating
** (CaseClauseError) no case clause matching: {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}
    (flow) lib/flow/materialize.ex:600: Flow.Materialize.build_trigger/1
    (flow) lib/flow/materialize.ex:559: Flow.Materialize.reducer_ops/1
    (flow) lib/flow/materialize.ex:52: Flow.Materialize.split_operations/3
    (flow) lib/flow/materialize.ex:17: Flow.Materialize.materialize/4
    (flow) lib/flow/coordinator.ex:25: Flow.Coordinator.init/1
    (stdlib) gen_server.erl:374: :gen_server.init_it/2
    (stdlib) gen_server.erl:342: :gen_server.init_it/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.584.0>, {{:case_clause, {[], [{:on_trigger, #Function<5.47868220/3 in Flow.emit/2>}, {:mapper, :filter, [#Function<6.127694169/1 in :erl_eval.expr/5>]}]}}, [{Flow.Materialize, :build_trigger, 1, [file: 'lib/flow/materialize.ex', line: 600]}, {Flow.Materialize, :reducer_ops, 1, [file: 'lib/flow/materialize.ex', line: 559]}, {Flow.Materialize, :split_operations, 3, [file: 'lib/flow/materialize.ex', line: 52]}, {Flow.Materialize, :materialize, 4, [file: 'lib/flow/materialize.ex', line: 17]}, {Flow.Coordinator, :init, 1, [file: 'lib/flow/coordinator.ex', line: 25]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 374]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 342]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}}

This example is kept minimal for the purpose of demonstrating the issue. I understand that in this case a simple Enum.filter/2 would do the job; the idea is that after reduce it should be possible to carry on with the flow and run further map and reduce steps.

error logged, but works?

The following code works (the tests pass), but an error is emitted to the log.

defmodule FlowtestTest do
  use ExUnit.Case

  defmodule TestProducer do
    use GenStage

    # stage emits the string "one" continuously.

    def start_link(), do: GenStage.start_link(__MODULE__, :ok)
    def init(:ok), do: {:producer, "one"}
    def handle_demand(demand, state) do
      supply = fn -> state end
      |> Stream.repeatedly
      |> Enum.take(demand)

      {:noreply, supply, state}
    end
  end

  defmodule TestProsumer do
    use GenStage

    # stage takes strings and turns them into atoms.

    def start_link(), do: GenStage.start_link(__MODULE__, :ok)
    def init(:ok), do: {:producer_consumer, :ok}
    def handle_events(elist, _from, state) do
      {:noreply, Enum.map(elist, &String.to_atom/1), state}
    end
  end

  test "test" do
    {:ok, producer} = TestProducer.start_link()
    {:ok, prosumer} = TestProsumer.start_link()
    GenStage.sync_subscribe(prosumer, to: producer, partition: 1)

    assert [:one] == Flow.from_stages([prosumer])
    |> Enum.take(1)
  end

  test "test2" do
    {:ok, producer} = TestProducer.start_link()
    {:ok, prosumer} = TestProsumer.start_link()
    GenStage.sync_subscribe(prosumer, to: producer, partition: 1)

    assert [:one] == Flow.from_stage(prosumer)
    |> Enum.take(1)
  end
end

these errors are emitted (omitting the pretty green success dots):

13:49:19.513 [error] Demand mode can only be set for producers, GenStage #PID<0.190.0> is a producer_consumer
13:49:19.516 [error] Demand mode can only be set for producers, GenStage #PID<0.196.0> is a producer_consumer

documentation says:

"producers are already running stages that have type :producer or :producer_consumer"

so the error shouldn't be emitted, correct? I can submit a PR on this if my supposition is true.

Keep getting this error when running a release.

12:19:12.137 [error] Error in process #PID<0.1821.0> on node :"[email protected]" with exit value:
{:undef,
 [{Flow, :from_enumerable,
   [%File.Stream{line_or_bytes: :line, modes: [:raw, :read_ahead, :binary],
     path: "/root/subway/nginx_log/ccb-access.log-20170226", raw: true}], []},
  {Honey.Nginx, :read, 3, [file: 'lib/honey/nginx.ex', line: 37]}]}

That's what I got from the log. It works perfectly when running with mix.
The part of code:

def read(:file, file_path \\ @file_path, options \\ @default_options) do
  enum = File.stream!(file_path)
  options = [file_name: file_path] ++ options
  read(:enum, enum, options)
end

def read(:enum, enum, options) do
  file_name = options[:file_name]
  ets_table_name = file_name |> String.to_atom()

  case Honey.ETSServer.create(ets_table_name) do
    {:new, ets_table} ->
      Logger.debug("new ets table: #{inspect(ets_table)}")

      enum
      |> Flow.from_enumerable()
      |> Flow.flat_map(&String.split(&1, "\n"))
      |> Flow.partition()
      |> Flow.partition(window: Flow.Window.count(1000), stages: 4)
      |> Flow.reduce(fn -> [] end, fn line, acc ->

Examples of supervised Flow?

Are there any examples of supervised Flow?

I have a producer stage and I want to set up a Flow with a Window to pass events to consumers after processing. I realize that I can probably create a child in the supervision tree, something like worker(Flow, [Producer], [function: Flow.from_stage]), but how (where) do I add logic to that flow?

UPD: Hm... I just tried to launch an empty flow, and the "something like" above didn't work.
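
One possible shape (a minimal sketch under assumptions: MyApp.Producer is a hypothetical, already-running producer stage passed by pid or registered name, and the pipeline logic lives in start_link/1 so a supervisor can restart the whole flow):

defmodule MyApp.EventFlow do
  # Builds the flow and starts it linked to the caller (the supervisor).
  def start_link(_opts) do
    Flow.from_stages([MyApp.Producer])
    |> Flow.partition(window: Flow.Window.count(100))
    |> Flow.reduce(fn -> %{} end, fn event, acc ->
      Map.update(acc, event, 1, &(&1 + 1))
    end)
    |> Flow.start_link()
  end

  def child_spec(opts) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [opts]}}
  end
end

# In the supervision tree:
children = [MyApp.EventFlow]
Supervisor.start_link(children, strategy: :one_for_one)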

Setup CI

It would be useful to run the Flow test suite against multiple versions of Elixir. A CI setup with Travis could help with that.

Right now it seems Flow and GenStage progress in lockstep, which is fine until they both reach 1.0. After that it would also be useful to test Flow against multiple versions of GenStage.

Spec Incorrect in Flow.into_specs/3

According to the test here and the docs, the first argument of the tuple is a module, but the typespec here says that it should take a Supervisor.child_spec(). Should this have an | module() in the spec?

`emit_and_reduce/3` doesn't keep the shape of accumulator

The following code:

[1, 2, 3]
|> Flow.from_enumerable(stages: 1)
|> Flow.flat_map(&[&1])
|> Flow.emit_and_reduce(fn -> %{} end, fn event, acc ->
  IO.inspect acc, label: "ACC"
  {[event], Map.update(acc, event, 1, & &1 + 1)}
end)
|> Enum.to_list()

gives the output:

ACC: %{}
ACC: %{1 => 1}
ACC: %{1 => 1, 2 => 1}
[1, 2, 3, {1, 1}, {2, 1}, {3, 1}]

Change the line with Flow.flat_map/2 to be:

|> Flow.flat_map(&[&1, &1])

and now the shape of the accumulator in Flow.emit_and_reduce/3 gets corrupted:

ACC: %{}
ACC: {[1], %{1 => 1}}

15:55:30.022 [error] GenServer #PID<0.170.0> terminating
** (BadMapError) expected a map, got: {[1], %{1 => 1}}
    (elixir) lib/map.ex:595: Map.update({[1], %{1 => 1}}, 1, 1, #Function<12.115662033/1 in TestFlow.emit_and_reduce/0>)
    reduce.exs:47: anonymous fn/2 in TestFlow.emit_and_reduce/0
    (elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
    (flow) lib/flow/materialize.ex:623: anonymous fn/3 in Flow.Materialize.build_emit_and_reducer/2
    (flow) lib/flow/materialize.ex:630: Flow.Materialize."-build_emit_and_reducer/2-lists^foldl/2-1-"/3
    (flow) lib/flow/materialize.ex:630: anonymous fn/5 in Flow.Materialize.build_emit_and_reducer/2
    (flow) lib/flow/map_reducer.ex:54: Flow.MapReducer.handle_events/3
    (gen_stage) lib/gen_stage.ex:2315: GenStage.consumer_dispatch/6
Last message: {:"$gen_consumer", {#PID<0.169.0>, #Reference<0.210650528.1048579.149223>}, [1, 2, 3]}
State: {%{#Reference<0.210650528.1048579.149223> => nil}, %{done?: false, producers: %{#Reference<0.210650528.1048579.149223> => #PID<0.169.0>}, trigger: #Function<2.13930487/3 in Flow.Window.Global.materialize/5>}, {0, 1}, %{}, #Function<2.60253262/4 in Flow.Materialize.build_emit_and_reducer/2>}
** (exit) exited in: GenStage.close_stream(%{})
    ** (EXIT) an exception was raised:
        ** (BadMapError) expected a map, got: {[1], %{1 => 1}}
            (elixir) lib/map.ex:595: Map.update({[1], %{1 => 1}}, 1, 1, #Function<12.115662033/1 in TestFlow.emit_and_reduce/0>)
            reduce.exs:47: anonymous fn/2 in TestFlow.emit_and_reduce/0
            (elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
            (flow) lib/flow/materialize.ex:623: anonymous fn/3 in Flow.Materialize.build_emit_and_reducer/2
            (flow) lib/flow/materialize.ex:630: Flow.Materialize."-build_emit_and_reducer/2-lists^foldl/2-1-"/3
            (flow) lib/flow/materialize.ex:630: anonymous fn/5 in Flow.Materialize.build_emit_and_reducer/2
            (flow) lib/flow/map_reducer.ex:54: Flow.MapReducer.handle_events/3
            (gen_stage) lib/gen_stage.ex:2315: GenStage.consumer_dispatch/6
    (gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
    (elixir) lib/stream.ex:1370: Stream.do_resource/5
    (elixir) lib/enum.ex:2979: Enum.reverse/1
    (elixir) lib/enum.ex:2611: Enum.to_list/1
    reduce.exs:57: (file)

Possibly inaccurate doc about the use of partition

Hi all,

I was following this section of the Flow documentation regarding partition: https://hexdocs.pm/flow/Flow.html#module-partitioning

If I run this code, which doesn't have the partition step:

defmodule Test do
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()

    stream
    |> IO.binstream(:line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end

I should receive something like:

[{"roses", 1}, {"are", 1}, {"red", 1}, {"violets", 1}, {"are", 1}, {"blue", 1}]

But instead I see this:

[{"are", 2}, {"blue\n", 1}, {"red\n", 1}, {"roses", 1}, {"violets", 1}]

Possible Issue With Flow.through_specs/3

Hello!

I have a simple flow set up here: https://github.com/ninjanicely/flow_fun which demonstrates my issue.
The "problem" I'm seeing is lack of parallelism when using Flow.from_spec |> Flow.through_spec. Here's some example code below..

defmodule MyFlow do
  def start_link(_opts) do
    Flow.from_specs([{A, 1000}], max_demand: 1)
    # |> Flow.map(&B.factorial/1)
    |> Flow.through_specs([{{B, []}, []}])
    |> Flow.into_specs([{C, []}], [])
  end
end

So, brief background: B has a factorial function in it, which I use to tax the CPU.
The handle_events/3 function in B calls the factorial function.
If I use Flow.through_specs/3, I see roughly 100% CPU usage on my machine (where I would expect to see 400%). When I comment out the Flow.through_specs/3 line and replace it
with the Flow.map/2 call, I see all CPUs being used.

I was hoping that I could force the parallelization in the Flow.through_specs/3 call, but that doesn't seem to be the case. Am I missing something? Should I see parallelism when configuring a Flow like this?

Thanks!

How to catch exceptions?

Related to: elixir-lang/gen_stage#132

I have malformed CSV file:

this,is,malformed,"csv,data

and even if I do:

try do
  file_path
  |> File.stream!()
  |> NimbleCSV.RFC4180.parse_stream()
  |> Flow.from_enumerable()
  |> Flow.partition()
  |> Enum.to_list()
catch
  :exit, _reason -> nil
end

I'm still getting:

08:02:07.440 [error] GenServer #PID<0.184.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
    (nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
    (elixir) lib/stream.ex:800: Stream.do_transform/8
    (gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
    (gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
    (gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:"$demand", :forward}}
State: #Function<0.55142349/1 in GenStage.Streamer.init/1>

08:02:07.447 [info]  GenStage consumer #PID<0.185.0> is stopping after receiving cancel from producer #PID<0.184.0> with reason: {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"},
 [{NimbleCSV.RFC4180, :finalize_parser, 1,
   [file: 'lib/nimble_csv.ex', line: 207]},
  {Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]},
  {GenStage.Streamer, :handle_demand, 2,
   [file: 'lib/gen_stage/streamer.ex', line: 18]},
  {GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]},
  {GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3,
   [file: 'lib/gen_stage.ex', line: 2209]},
  {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]},
  {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]},
  {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}


08:02:07.449 [error] GenServer #PID<0.185.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
    (nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
    (elixir) lib/stream.ex:800: Stream.do_transform/8
    (gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
    (gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
    (gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:DOWN, #Reference<0.3607181968.2794192898.201176>, :process, #PID<0.184.0>, {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"}, [{NimbleCSV.RFC4180, :finalize_parser, 1, [file: 'lib/nimble_csv.ex', line: 207]}, {Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]}, {GenStage.Streamer, :handle_demand, 2, [file: 'lib/gen_stage/streamer.ex', line: 18]}, {GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]}, {GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3, [file: 'lib/gen_stage.ex', line: 2209]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.79412627/4 in Flow.Window.Global.materialize/5>}, {0, 4}, [], #Function<33.66250525/4 in Flow.Materialize.mapper_ops/1>}

Is there a way to handle that?
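
One workaround sketch (an assumption on my part, not an official answer; note that parsing line by line cannot recover quoted fields that span lines, which is exactly what this malformed input exercises): move the fallible parse inside the flow and rescue per element, so a bad row becomes a value instead of crashing the producer stage.

file_path
|> File.stream!()
|> Flow.from_enumerable()
|> Flow.flat_map(fn line ->
  try do
    # Parse each line independently; a bad line no longer kills the stage.
    NimbleCSV.RFC4180.parse_string(line, skip_headers: false)
  rescue
    e in NimbleCSV.ParseError -> [{:error, Exception.message(e)}]
  end
end)
|> Enum.to_list()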

Flow exiting with normal reason

I am trying to supervise my Flow and restart it if something goes wrong. In my supervisor's init/1 I have something like this:

children = [
  worker(MyStreamer, [], restart: :transient)
]

MyStreamer has a start_link function that just creates the Flow and then starts it:

def start_link(_) do
  flow = make_flow()
  Flow.start_link(flow)
end

The problem is that my supervisor is not restarting my flow when a runtime exception occurs, because the exit messages have :normal reasons like this:

{:EXIT, #PID<0.195.0>, :normal}

I was able to reproduce it by running this example in iex, which raises a runtime error but exits with a normal reason:

Process.flag(:trap_exit, true)
flow = [3,2,1,0,-1,-2,-3] |> Flow.from_enumerable() |> Flow.map(&(10/&1))
Flow.start_link(flow)
flush()

I am expecting to see a non-normal exit reason, like when doing this in iex:

Process.flag(:trap_exit, true)
spawn_link(fn -> 10/0 end)
flush()

Flow outlives a Task that starts it

This behaviour is probably intentional, but sometimes, when consumed via Enumerable, the Flow can keep running after the process that consumes the stream it generates is gone. In the context where I initially encountered this, we're running HTTP requests inside a Flow and might want to abort the group mid-run (because of a failure, or a need to stop hitting the server we're pointed at). Here's a reduced example that demonstrates a Flow outliving the task that starts it (you may need to run it two or three times to witness it):

  def example() do
    p =
      spawn(fn ->
        1..100
        |> Flow.from_enumerable()
        |> Flow.map(fn n ->
          :timer.sleep(1000)
          IO.inspect(n)
          n
        end)
        |> Enum.map(fn n ->
          n
        end)
      end)
    :timer.sleep(5000)
    Process.exit(p, :kill)
  end

In working around this, I've made a module that cribs off of your implementation of Enumerable that allows us to reduce over the results, but also be able to cancel the Flow's processing:

defmodule FlowUtils do

  @spec linked_reduce(Flow.t(), term, (term, term -> term)) :: term()
  def linked_reduce(flow, acc, fun) do
    {:ok, {pid, stream}} =
      flow
      |> flow_to_stream()

    result = Enum.reduce(stream, acc, fun)
    :ok = ensure_shutdown(pid)

    result
  end

  @spec linked_reduce_while(Flow.t(), term, (term, term -> term)) :: term()
  def linked_reduce_while(flow, acc, fun) do
    {:ok, {pid, stream}} =
      flow
      |> flow_to_stream()

    result = Enum.reduce_while(stream, acc, fun)
    :ok = ensure_shutdown(pid)

    result
  end

  defp flow_to_stream(flow) do
    opts = [demand: :accumulate]

    case Flow.Coordinator.start_link(flow, :producer_consumer, {:outer, fn _ -> [] end}, opts) do
      {:ok, pid} ->
        {:ok, {pid, Flow.Coordinator.stream(pid)}}

      {:error, reason} ->
        exit({reason, {__MODULE__, :flow_to_stream, [flow]}})
    end
  end

  defp ensure_shutdown(flow_pid) do
    Process.exit(flow_pid, :kill)

    :ok
  end
end

First, is there a more typical way to get this kind of assurance, i.e. that I can terminate a running Flow externally while enumerating over the emitted results? Second, is there a chance of adding something like this to make cancellable Flows more user-friendly?

Exceptions in Flow killing parent Oban process

When there's an exception in a Flow, the parent process dies. This is not good in cases like Oban, where it leads to zombie jobs. I note that this happens in IEx as well (IEx restarts). Minimal example:

defmodule WillDie do
  def dies(x) do
    raise "OH NO #{inspect(x)}"
  end
end

1..10 |> Flow.from_enumerable() |> Flow.map(&WillDie.dies/1) |> Enum.to_list()

I think there must be an obvious answer here to trap the exit and not crash the parent, but I don't know what it is.
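
One workaround sketch (an assumption, not a documented Oban or Flow pattern; MyApp.TaskSupervisor is a hypothetical Task.Supervisor name): run the flow under Task.Supervisor.async_nolink so the crash stays isolated from the job process, then turn the exit into a return value.

task =
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
    1..10 |> Flow.from_enumerable() |> Flow.map(&WillDie.dies/1) |> Enum.to_list()
  end)

case Task.yield(task, :timer.seconds(30)) || Task.shutdown(task) do
  {:ok, result} -> {:ok, result}
  # The flow crashed, but the calling (job) process survives.
  {:exit, reason} -> {:error, reason}
  nil -> {:error, :timeout}
end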

Using a Flow with a GenStage producer causes errors

When using a Flow as the enumerable for the events of a GenStage producer, the process can receive consumer messages, causing it to error. Interestingly enough, most of the messages do get correctly routed to the consumer, but there appears to be some issue with the producer also receiving them. It appears to be some sort of race condition, as removing the Process.sleep causes it to work fine for me on my machine. Additionally, the error seems to only happen at the beginning, and then the events run fine afterwards.

Here is some example code that can recreate the issue:

defmodule FlowTest do
  @moduledoc """
  Documentation for FlowTest.
  """

  def run do
    {:ok, producer} = Producer.start_link()
    {:ok, consumer} = Consumer.start_link()

    GenStage.sync_subscribe(consumer, to: producer, max_demand: 200, min_demand: 100)
  end
end

defmodule Producer do
  use GenStage

  #########################
  # Public API
  #########################
  
  def start_link() do
    GenStage.start_link(__MODULE__, :ok)
  end

  #########################
  # GenStage Callbacks
  #########################

  def init(_) do
    {:producer, %{cont: build_flow()}}
  end

  def handle_demand(demand, %{cont: cont} = state) do
    case cont.({:cont, {[], demand}}) do
      {:suspended, {list, 0}, cont} ->
        {:noreply, :lists.reverse(list), %{state | cont: cont}}

      {_finished, {list, _}} ->
        IO.puts "Done!"
        {:noreply, :lists.reverse(list), %{}}
    end
  end

  #########################
  # Private  Helper
  #########################
  defp build_flow() do
    flow =
      0..1_000_000
      |> Flow.from_enumerable(consumers: :permanent)
      |> Flow.map(fn val ->
        # Simulate a computation
        Process.sleep(10)
        val + 1
      end)

    # Largely borrowed from the GenStage.Streamer implementation
    &Enumerable.reduce(flow, &1, fn
      x, {acc, 1} ->
        {:suspend, {[x | acc], 0}}
      x, {acc, demand} ->
        {:cont, {[x | acc], demand - 1}}
    end)
  end

end

defmodule Consumer do
  use GenStage

  def start_link() do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(_) do
    {:consumer, :no_state}
  end

  def handle_events(events, _from, state) do
    IO.puts "Got #{length events} events"
    {:noreply, [], state}
  end
end

Here are some of the error messages that come through:

13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.165.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.93>}},
 [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013,
  1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026,
  1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039,
  1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, ...]}


13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.166.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.94>}},
 [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
  2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026,
  2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039,
  2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047, 2048, ...]}


13:16:59.215 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.167.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.95>}},
 [3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009, 3010, 3011, 3012, 3013,
  3014, 3015, 3016, 3017, 3018, 3019, 3020, 3021, 3022, 3023, 3024, 3025, 3026,
  3027, 3028, 3029, 3030, 3031, 3032, 3033, 3034, 3035, 3036, 3037, 3038, 3039,
  3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, ...]}


13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.168.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.96>}},
 [4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008, 4009, 4010, 4011, 4012, 4013,
  4014, 4015, 4016, 4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026,
  4027, 4028, 4029, 4030, 4031, 4032, 4033, 4034, 4035, 4036, 4037, 4038, 4039,
  4040, 4041, 4042, 4043, 4044, 4045, 4046, 4047, 4048, ...]}


13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.169.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.97>}},
 [5001, 5002, 5003, 5004, 5005, 5006, 5007, 5008, 5009, 5010, 5011, 5012, 5013,
  5014, 5015, 5016, 5017, 5018, 5019, 5020, 5021, 5022, 5023, 5024, 5025, 5026,
  5027, 5028, 5029, 5030, 5031, 5032, 5033, 5034, 5035, 5036, 5037, 5038, 5039,
  5040, 5041, 5042, 5043, 5044, 5045, 5046, 5047, 5048, ...]}


13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.170.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.98>}},
 [6001, 6002, 6003, 6004, 6005, 6006, 6007, 6008, 6009, 6010, 6011, 6012, 6013,
  6014, 6015, 6016, 6017, 6018, 6019, 6020, 6021, 6022, 6023, 6024, 6025, 6026,
  6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039,
  6040, 6041, 6042, 6043, 6044, 6045, 6046, 6047, 6048, ...]}


13:16:59.216 [error] GenStage producer Producer received $gen_consumer message: {:"$gen_consumer",
 {#PID<0.171.0>, {#Reference<0.0.1.90>, #Reference<0.0.1.99>}},
 [7001, 7002, 7003, 7004, 7005, 7006, 7007, 7008, 7009, 7010, 7011, 7012, 7013,
  7014, 7015, 7016, 7017, 7018, 7019, 7020, 7021, 7022, 7023, 7024, 7025, 7026,
  7027, 7028, 7029, 7030, 7031, 7032, 7033, 7034, 7035, 7036, 7037, 7038, 7039,
  7040, 7041, 7042, 7043, 7044, 7045, 7046, 7047, 7048, ...]}

Flow exited with FunctionClauseException

I'm playing with Flow and I'm trying to make my code raise an exception to test the behaviour of my parent GenServer.
Thus, I ran Enum.random on an empty list. I can see several exceptions raised of type Enum.EmptyError.
But I can also see one last exception raised with this message:

** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2

Here is a code to reproduce this bug:

defmodule FlowBug do
  def run do
    1..1_000
    |> Flow.from_enumerable()
    |> Flow.partition()
    |> Flow.reduce(fn -> [] end, &([&1 | &2]))
    |> Flow.emit(:state)
    |> Flow.partition()
    |> Flow.map(&pick_one/1)
    |> Flow.run
  end

  def pick_one(_) do
    Enum.random([])  # Force exception here
  end
end

Here is a sample of the output:

** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.8.905> => {:subscribed, #PID<0.175.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.906> => {:subscribed, #PID<0.176.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.908> => {:subscribed, #PID<0.178.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.8.909> => {:subscribed, #PID<0.179.0>, :permanent, 500, 1000, 1000}})
    ** (EXIT) an exception was raised:
        ** (Enum.EmptyError) empty error
            (elixir) lib/enum.ex:1684: Enum.random/1
            (flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
            (flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
            (flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
            (flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
            lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
            lib/gen_stage.ex:2531: GenStage.take_pc_events/3
            (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
             lib/gen_stage.ex:1598: GenStage.close_stream/1
    (elixir) lib/stream.ex:1248: Stream.do_resource/5
    (elixir) lib/enum.ex:1767: Enum.reverse/2
    (elixir) lib/enum.ex:2528: Enum.to_list/1
      (flow) lib/flow.ex:693: Flow.run/1
12:34:01.672 [error] GenServer #PID<0.181.0> terminating
** (Enum.EmptyError) empty error
    (elixir) lib/enum.ex:1684: Enum.random/1
    (flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
    (flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
    (flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
    (flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
    lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
    lib/gen_stage.ex:2531: GenStage.take_pc_events/3
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.171.0>, #Reference<0.0.7.574>}, [[991, 972, 937, 926, 916, 898, 896, 856, 852, 850, 838, 814, 807, 795, 789, 781, 780, 750, 718, 701, 699, 697, 694, 688, 682, 679, 658, 649, 645, 628, 612, 610, 603, 595, 590, 581, 563, 549, 548, 545, 544, 538, 532, 524, 511, 494, 493, 485, ...]]}
State: {%{#Reference<0.0.7.570> => nil, #Reference<0.0.7.571> => nil, #Reference<0.0.7.572> => nil, #Reference<0.0.7.573> => nil, #Reference<0.0.7.574> => nil, #Reference<0.0.7.575> => nil, #Reference<0.0.7.576> => nil, #Reference<0.0.7.577> => nil}, %{active: [#Reference<0.0.7.577>, #Reference<0.0.7.576>, #Reference<0.0.7.575>, #Reference<0.0.7.574>, #Reference<0.0.7.570>], consumers: [{#Reference<0.0.8.903>, #Reference<0.0.8.911>}], done?: false, producers: %{#Reference<0.0.7.570> => #PID<0.167.0>, #Reference<0.0.7.571> => #PID<0.168.0>, #Reference<0.0.7.572> => #PID<0.169.0>, #Reference<0.0.7.573> => #PID<0.170.0>, #Reference<0.0.7.574> => #PID<0.171.0>, #Reference<0.0.7.575> => #PID<0.172.0>, #Reference<0.0.7.576> => #PID<0.173.0>, #Reference<0.0.7.577> => #PID<0.174.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.109138938/4 in Flow.Materialize.build_reducer/2>}

# Several other errors like the last one ....
# And finally this error :

12:34:01.675 [error] GenServer #PID<0.164.0> terminating
** (FunctionClauseError) no function clause matching in Flow.Coordinator.handle_info/2
    (flow) lib/flow/coordinator.ex:69: Flow.Coordinator.handle_info({:EXIT, #PID<0.165.0>, :shutdown}, %{intermediary: [{#PID<0.175.0>, []}, {#PID<0.176.0>, []}, {#PID<0.177.0>, []}, {#PID<0.178.0>, []}, {#PID<0.179.0>, []}, {#PID<0.180.0>, []}, {#PID<0.181.0>, []}, {#PID<0.182.0>, []}], parent_ref: #Reference<0.0.7.596>, producers: [#PID<0.166.0>], refs: [#Reference<0.0.7.588>], supervisor: #PID<0.165.0>})
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:667: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.165.0>, :shutdown}
State: %{intermediary: [{#PID<0.175.0>, []}, {#PID<0.176.0>, []}, {#PID<0.177.0>, []}, {#PID<0.178.0>, []}, {#PID<0.179.0>, []}, {#PID<0.180.0>, []}, {#PID<0.181.0>, []}, {#PID<0.182.0>, []}], parent_ref: #Reference<0.0.7.596>, producers: [#PID<0.166.0>], refs: [#Reference<0.0.7.588>], supervisor: #PID<0.165.0>}

I also tested with simpler code, but I don't get this FunctionClauseError exception.
Here is the simpler code:

defmodule FlowBug do
  def run do
    [1..1_000, []]  # Enum.random will raise an exception on the second item of the list
    |> Flow.from_enumerable()
    |> Flow.partition()
    |> Flow.map(&Enum.random/1)
    |> Flow.run
  end
end

Here is the log:

** (exit) exited in: GenStage.close_stream(%{})
    ** (EXIT) an exception was raised:
        ** (Enum.EmptyError) empty error
            (elixir) lib/enum.ex:1684: Enum.random/1
            (flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
            (flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
            (flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
            (flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
            lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
            lib/gen_stage.ex:2531: GenStage.take_pc_events/3
            (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
             lib/gen_stage.ex:1598: GenStage.close_stream/1
    (elixir) lib/stream.ex:1248: Stream.do_resource/5
    (elixir) lib/enum.ex:1767: Enum.reverse/2
    (elixir) lib/enum.ex:2528: Enum.to_list/1
      (flow) lib/flow.ex:693: Flow.run/1

12:39:32.397 [error] GenServer #PID<0.173.0> terminating
** (Enum.EmptyError) empty error
    (elixir) lib/enum.ex:1684: Enum.random/1
    (flow) lib/flow/materialize.ex:539: anonymous fn/4 in Flow.Materialize.mapper/2
    (flow) lib/flow/materialize.ex:428: Flow.Materialize."-build_reducer/2-lists^foldl/2-1-"/3
    (flow) lib/flow/materialize.ex:428: anonymous fn/5 in Flow.Materialize.build_reducer/2
    (flow) lib/flow/map_reducer.ex:50: Flow.MapReducer.handle_events/3
    lib/gen_stage.ex:2408: GenStage.consumer_dispatch/7
    lib/gen_stage.ex:2531: GenStage.take_pc_events/3
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.166.0>, #Reference<0.0.3.452>}, [[]]}
State: {%{#Reference<0.0.3.452> => nil}, %{active: [#Reference<0.0.3.452>], consumers: [{#Reference<0.0.3.468>, #Reference<0.0.3.476>}], done?: false, producers: %{#Reference<0.0.3.452> => #PID<0.166.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.109138938/4 in Flow.Materialize.build_reducer/2>}

GenServer.start/3 Dialyzer Issues in Flow.Coordinator

In investigating Dialyzer errors, I found that after #80 and #81 the last remaining Dialyzer error stems from flow.ex in the call

opts = [demand: :accumulate]

case Flow.Coordinator.start(flow, :producer_consumer, {:outer, fn _ -> [] end}, opts) do

which calls GenServer.start(__MODULE__, {flow, type, consumers, options}, options). Because :demand is not a valid option for https://hexdocs.pm/elixir/GenServer.html#t:options/0, this makes Dialyzer upset. The call in Flow.Coordinator.start_link has the same problem. The easiest way to make Dialyzer happy is simply to do something like:

 filtered_options = Keyword.take(options, [:debug, :name, :timeout, :spawn_opt, :hibernate_after])
GenServer.start(__MODULE__, {flow, type, consumers, options}, filtered_options)
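
In context, the change might look like this inside Flow.Coordinator (a sketch; the function heads are assumed from the calls above, not copied from the actual source):

# valid GenServer.start/start_link options, per GenServer's typespec
@gen_server_opts [:debug, :name, :timeout, :spawn_opt, :hibernate_after]

def start(flow, type, consumers, options) do
  GenServer.start(__MODULE__, {flow, type, consumers, options}, Keyword.take(options, @gen_server_opts))
end

def start_link(flow, type, consumers, options) do
  GenServer.start_link(__MODULE__, {flow, type, consumers, options}, Keyword.take(options, @gen_server_opts))
end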

Is this something you'd be open to?

Once this, #80, and #81 are all resolved, Dialyzer can then be added to the CI process.

Unhelpful Error Message

Hello, and sorry for the vague PR title.

I have the following flow, which results in the error below.

Flow:

window = Flow.Window.global() |> Flow.Window.trigger_every(5)

0..99
|> Flow.from_enumerable(window: window)
|> Flow.map(fn number -> Process.sleep(100); IO.inspect(number, label: :number) end)
|> Flow.group_by(fn i -> rem(i, 2) == 0 end)
|> Flow.map(fn batch -> Process.sleep(100); IO.inspect(batch, label: :batch) end)
|> Flow.run()

Output:

number: 0
number: 1
number: 2
number: 3
number: 4
batch: {false, [3, 1]}
batch: {true, [4, 2, 0]}
number: 5

10:22:25.684 [error] GenServer #PID<0.193.0> terminating
** (BadMapError) expected a map, got: [false: [3, 1], true: [4, 2, 0]]
    (elixir 1.12.3) lib/map.ex:623: Map.update([false: [3, 1], true: [4, 2, 0]], false, [5], #Function<36.94148943/1 in Flow.group_by/3>)
    (flow 1.1.0) lib/flow/materialize.ex:643: Flow.Materialize."-build_reducer/2-lists^foldl/2-0-"/3
    (flow 1.1.0) lib/flow/materialize.ex:643: anonymous fn/5 in Flow.Materialize.build_reducer/2
    (flow 1.1.0) lib/flow/materialize.ex:553: Flow.Materialize.maybe_punctuate/10
    (flow 1.1.0) lib/flow/map_reducer.ex:59: Flow.MapReducer.handle_events/3
    (gen_stage 1.1.2) lib/gen_stage.ex:2471: GenStage.consumer_dispatch/6
    (gen_stage 1.1.2) lib/gen_stage.ex:2660: GenStage.take_pc_events/3
    (stdlib 3.13.2) gen_server.erl:680: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.192.0>, #Reference<0.1202714464.28835843.142742>}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, ...]}
State: {%{#Reference<0.1202714464.28835843.142742> => nil}, %{done?: false, producers: %{#Reference<0.1202714464.28835843.142742> => #PID<0.192.0>}, trigger: #Function<2.2490666/3 in Flow.Window.Global.materialize/5>}, {0, 12}, {5, %{}}, #Function<1.2490666/4 in Flow.Window.Global.materialize/5>}

If I change the last line of my flow from Flow.run() to Enum.to_list() then everything works fine, but this is not an option for me as I am dealing with an unbounded stream in my real code. I went and took a quick look at lib/flow/materialize.ex:643 but it wasn't clear to me why a Map was needed, or how changing Flow.run to Enum.to_list() would prevent this from happening.

Version of flow is 1.1.0 and gen_stage is 1.1.2.
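
For anyone hitting the same error, here is a workaround sketch that sidesteps Flow.group_by/2 by reducing into a map explicitly and resetting the accumulator on each trigger (same window as above; this avoids the crash but is not a fix for the underlying issue):

window = Flow.Window.global() |> Flow.Window.trigger_every(5)

0..99
|> Flow.from_enumerable(window: window)
|> Flow.reduce(fn -> %{} end, fn i, acc ->
  # accumulate into a map keyed by evenness, as group_by does internally
  Map.update(acc, rem(i, 2) == 0, [i], &[i | &1])
end)
|> Flow.on_trigger(fn acc ->
  # emit the batches and reset the accumulator to a fresh map, so the
  # next trigger never sees the emitted list as its state
  {Map.to_list(acc), %{}}
end)
|> Flow.run()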

Clarify :full_outer v :outer joins

In investigating the Dialyzer errors (see also #80), I found that flow.ex uses both :outer and :full_outer as join types. The docs, the usage in Flow.Materialize, and the tests suggest it should be :full_outer, but there are uses of :outer in the @join module attribute and the @type join, as well as in several function calls. Which of these is actually intended?

Additionally, the typespec for @typep producers needs to be | {:join, join, t, t, fun(), fun(), fun()} with the join as the second argument. Whenever the :full_outer v :outer is resolved, and this is added, that section of the errors will disappear.

Need help understanding back pressure

When I write this code:

1..20
|> Flow.from_enumerable(min_demand: 0, max_demand: 5, stages: 1)
|> Flow.map(fn item ->
  IO.puts "Flow.map #{item}"
  item
end)
|> Enum.each(fn item ->
  Process.sleep(10)
  IO.puts "Enum.each #{item}"
end)

I expect it to print

Flow.map 1
Flow.map 2
Flow.map 3
Flow.map 4
Flow.map 5
Enum.each 1
Flow.map 6
Enum.each 2

and so on.

But when I run it, it prints this:

Flow.map 1
Flow.map 2
Flow.map 3
Flow.map 4
Flow.map 5
Flow.map 6
Flow.map 7
Flow.map 8
Flow.map 9
Flow.map 10
Flow.map 11
Flow.map 12
Flow.map 13
Flow.map 14
Flow.map 15
Flow.map 16
Flow.map 17
Flow.map 18
Flow.map 19
Flow.map 20
Enum.each 1
Enum.each 2
Enum.each 3
Enum.each 4
Enum.each 5
Enum.each 6
Enum.each 7
Enum.each 8
Enum.each 9
Enum.each 10
Enum.each 11
Enum.each 12
Enum.each 13
Enum.each 14
Enum.each 15
Enum.each 16
Enum.each 17
Enum.each 18
Enum.each 19
Enum.each 20

I think this means that the Flow.map stage immediately consumes and emits everything the producer has to offer, and the demand isn't properly back-pressured by the slower Enum.each.

In production, we are running code similar to this, except that 1..20 is a database query as a GenStage producer, and Enum.each is |> Stream.chunk_every(250) |> Enum.each(&Repo.insert_all(Schema, &1)).

What we're seeing is that the query stage will immediately load all entries for the query as fast as it can and then the node crashes due to the memory being exhausted.
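
A sketch of one way to keep the slow work under back-pressure: do it inside a Flow stage instead of in the consuming process, so new demand is only sent upstream once the stage finishes its current batch (illustrative only, not a full answer to the demand semantics question):

1..20
|> Flow.from_enumerable(min_demand: 0, max_demand: 5, stages: 1)
|> Flow.map(fn item ->
  # the slow, per-item work now lives inside the flow stage, so the
  # stage only asks for more events when it has processed its batch
  Process.sleep(10)
  IO.puts("processed #{item}")
  item
end)
|> Flow.run()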

provide from/into/through_flows() to connect flows

Hello,

I need to connect my flow to three different downstream flows. However, I'm unable to instantiate the downstream flows because all Flow constructors require that you have knowledge of the upstream source (which must be either materialized Flow.from_stages or written as a GenStage module Flow.from_specs).

However, if there was a way to construct a non-materialized flow (e.g. Flow.new()), then I could do this:

file_parser = File.stream!(file) |> Flow.from_enumerable() # this is my existing flow

database_writer = Flow.new() |> Flow.filter(..) |> Flow.map(..) |> Flow.partition() |> Flow.reduce(..)
data_summarizer = Flow.new() |> Flow.flat_map(..) |> Flow.partition() |> Flow.reduce(..)
stats_collector = Flow.new() |> Flow.emit_and_reduce(..) |> Flow.on_trigger(..)

overall = file_parser |> Flow.through_flows(database_writer, data_summarizer, stats_collector)

In the overall flow, events coming out of upstream flow are copied into each of the downstream flows. As a bonus, this lets us further connect the downstream flows together into a more complex graph like this:

file_parser = File.stream!(file) |> Flow.from_enumerable() # this is my existing flow

database_writer = Flow.new() |> Flow.filter(..) |> Flow.map(..) |> Flow.partition() |> Flow.reduce(..)
data_summarizer = Flow.new() |> Flow.flat_map(..) |> Flow.partition() |> Flow.reduce(..)
stats_collector = Flow.new() |> Flow.emit_and_reduce(..) |> Flow.on_trigger(..)

data_summarizer = data_summarizer |> Flow.into_flows(database_writer)
stats_collector = stats_collector |> Flow.into_flows(database_writer)

overall = file_parser |> Flow.through_flows(database_writer, data_summarizer, stats_collector)

Thanks for your consideration.

Race condition in `bounded_join` for large subscribe messages

There's some kind of race condition within bounded_join. When the demand causes sufficiently large messages to be sent, the join direction being matched against is nil. I'm currently working around this by shrinking my demand.

a = Flow.from_enumerable(Stream.zip(1..100_000, 1..100_000))
b = Flow.from_enumerable(Stream.zip(1..100_000, -1..-100_000))
Flow.bounded_join(:left_outer, a, b,
                  &elem(&1,0), &elem(&1,0),
                  fn
                    {k,v1}, {k,v2} -> {k, {v1,v2}}
                    {k,v1}, nil -> {k, {v1,nil}}
                  end, min_demand: 10_000, max_demand: 20_000)
|> Stream.run()

Exception details:

[error] GenServer #PID<0.28590.0> terminating
** (FunctionClauseError) no function clause matching in Flow.Materialize.dispatch_join/6
    (flow) lib/flow/materialize.ex:328: Flow.Materialize.dispatch_join(
      [{98431, {98431, 98431}}, {98439, {98439, 98439}}, {98442, {98442, 98442}}, {98443, {98443, 98443}}, {98446, {98446, 98446}}, {98449, {98449, 98449}}, {98454, {98454, 98454}}, {98457, {98457, 98457}}, {98458, {98458, 98458}}, {98461, {98461, 98461}}, {98465, {98465, 98465}}, {98466, {98466, 98466}}, {98467, {98467, 98467}}, {98470, {98470, 98470}}, {98472, {98472, 98472}}, {98474, {98474, 98474}}, {98475, {98475, 98475}}, {98476, {98476, 98476}}, {98484, {98484, 98484}}, {98485, {98485, 98485}}, {98488, {98488, 98488}}, {98495, {98495, 98495}}, {98501, {98501, 98501}}, {98502, {98502, 98502}}, {98516, {98516, 98516}}, {98519, {98519, 98519}}, {98524, {98524, 98524}}, {98526, {98526, 98526}}, {98531, {98531, 98531}}, {98548, {98548, 98548}}, {98552, {98552, 98552}}, {98553, {98553, 98553}}, {98561, {98561, 98561}}, {98566, {98566, 98566}}, {98571, {98571, 98571}}, {98574, {98574, 98574}}, {98576, {98576, 98576}}, {98585, {98585, 98585}}, {98599, {98599, 98599}}, {98603, {98603, 98603}}, {98614, {98614, 98614}}, {98615, {98615, 98615}}, {98619, {98619, 98619}}, {98621, {98621, 98621}}, {98632, {98632, 98632}}, {98634, {98634, 98634}}, {98639, {98639, 98639}}, {98643, {98643, ...}}, {98645, ...}, {...}, ...],
      nil,
      %{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, 66854}], 21938 => [{21938, 21938}], 38839 => [{38839, 38839}], 76035 => [{76035, 76035}], 75873 => [{75873, 75873}], 73528 => [{73528, ...}], 4969 => [...], ...},
      %{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, -75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, -73528}], 4969 => [{4969, -4969}], 16129 => [{16129, -16129}], 75116 => [{75116, -75116}], 45250 => [{45250, -45250}], 74081 => [{74081, -74081}], 15093 => [{15093, ...}], 45659 => [...], ...},
      #Function<7.85826293/2 in Wasatch.Jobs.JoinCatsTask.test_run/0>,
      []
    )
    (flow) lib/flow/materialize.ex:286: anonymous fn/6 in Flow.Materialize.join_ops/5
    (flow) lib/flow/map_reducer.ex:49: Flow.MapReducer.handle_events/3
    (gen_stage) lib/gen_stage.ex:2502: GenStage.consumer_dispatch/7
    (gen_stage) lib/gen_stage.ex:2618: GenStage.take_pc_events/3
Last message: {:"$gen_producer", {#PID<0.326.0>, {#Reference<0.0.2.6935>, #Reference<0.0.2.6939>}}, {:ask, 680}}
State: {%{#Reference<0.0.2.6920> => nil}, %{done?: false, producers: %{#Reference<0.0.2.6920> => #PID<0.28587.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {2, 4}, {%{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, ...}], 21938 => [...], ...}, %{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, -75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, ...}], 4969 => [...], ...}, []},
#Function<21.105143239/4 in Flow.Materialize.join_ops/5>}
** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.2.6937> => {:subscribed, #PID<0.28588.0>, :transient, 500, 1000, 593}, #Reference<0.0.2.6938> => {:subscribed, #PID<0.28589.0>, :transient, 500, 1000, 898}, #Reference<0.0.2.6940> => {:subscribed, #PID<0.28591.0>, :transient, 500, 1000, 572}})
    ** (EXIT) an exception was raised:
        ** (FunctionClauseError) no function clause matching in Flow.Materialize.dispatch_join/6
            (flow) lib/flow/materialize.ex:328: Flow.Materialize.dispatch_join(
              [{98431, {98431, 98431}}, {98439, {98439, 98439}}, {98442, {98442, 98442}}, {98443, {98443, 98443}}, {98446, {98446, 98446}}, {98449, {98449, 98449}}, {98454, {98454, 98454}}, {98457, {98457, 98457}}, {98458, {98458, 98458}}, {98461, {98461, 98461}}, {98465, {98465, 98465}}, {98466, {98466, 98466}}, {98467, {98467, 98467}}, {98470, {98470, 98470}}, {98472, {98472, 98472}}, {98474, {98474, 98474}}, {98475, {98475, 98475}}, {98476, {98476, 98476}}, {98484, {98484, 98484}}, {98485, {98485, 98485}}, {98488, {98488, 98488}}, {98495, {98495, 98495}}, {98501, {98501, 98501}}, {98502, {98502, 98502}}, {98516, {98516, 98516}}, {98519, {98519, 98519}}, {98524, {98524, 98524}}, {98526, {98526, 98526}}, {98531, {98531, 98531}}, {98548, {98548, 98548}}, {98552, {98552, 98552}}, {98553, {98553, 98553}}, {98561, {98561, 98561}}, {98566, {98566, 98566}}, {98571, {98571, 98571}}, {98574, {98574, 98574}}, {98576, {98576, 98576}}, {98585, {98585, 98585}}, {98599, {98599, 98599}}, {98603, {98603, 98603}}, {98614, {98614, 98614}}, {98615, {98615, 98615}}, {98619, {98619, 98619}}, {98621, {98621, 98621}}, {98632, {98632, 98632}}, {98634, {98634, 98634}}, {98639, {98639, 98639}}, {98643, {98643, ...}}, {98645, ...}, {...}, ...],
              nil,
              %{60829 => [{60829, 60829}], 12995 => [{12995, 12995}], 89750 => [{89750, 89750}], 42408 => [{42408, 42408}], 79313 => [{79313, 79313}], 59922 => [{59922, 59922}], 51560 => [{51560, 51560}], 6653 => [{6653, 6653}], 82555 => [{82555, 82555}], 40746 => [{40746, 40746}], 33823 => [{33823, 33823}], 94037 => [{94037, 94037}], 46309 => [{46309, 46309}], 81817 => [{81817, 81817}], 15437 => [{15437, 15437}], 34319 => [{34319, 34319}], 6749 => [{6749, 6749}], 62448 => [{62448, 62448}], 79870 => [{79870, 79870}], 90980 => [{90980, 90980}], 52195 => [{52195, 52195}], 34512 => [{34512, 34512}], 19510 => [{19510, 19510}], 44436 => [{44436, 44436}], 75113 => [{75113, 75113}], 81119 => [{81119, 81119}], 8441 => [{8441, 8441}], 96376 => [{96376, 96376}], 76413 => [{76413, 76413}], 32156 => [{32156, 32156}], 43416 => [{43416, 43416}], 86876 => [{86876, 86876}], 61486 => [{61486, 61486}], 12163 => [{12163, 12163}], 40610 => [{40610, 40610}], 87629 => [{87629, 87629}], 33708 => [{33708, 33708}], 47217 => [{47217, 47217}], 77706 => [{77706, 77706}], 49309 => [{49309, 49309}], 5934 => [{5934, 5934}], 59657 => [{59657, 59657}], 61379 => [{61379, 61379}], 66854 => [{66854, 66854}], 21938 => [{21938, 21938}], 38839 => [{38839, 38839}], 76035 => [{76035, 76035}], 75873 => [{75873, 75873}], 73528 => [{73528, ...}], 4969 => [...], ...},
              %{60829 => [{60829, -60829}], 12995 => [{12995, -12995}], 42408 => [{42408, -42408}], 79313 => [{79313, -79313}], 59922 => [{59922, -59922}], 51560 => [{51560, -51560}], 6653 => [{6653, -6653}], 82555 => [{82555, -82555}], 40746 => [{40746, -40746}], 33823 => [{33823, -33823}], 46309 => [{46309, -46309}], 81817 => [{81817, -81817}], 15437 => [{15437, -15437}], 34319 => [{34319, -34319}], 6749 => [{6749, -6749}], 62448 => [{62448, -62448}], 79870 => [{79870, -79870}], 52195 => [{52195, -52195}], 34512 => [{34512, -34512}], 19510 => [{19510, -19510}], 44436 => [{44436, -44436}], 75113 => [{75113, -75113}], 81119 => [{81119, -81119}], 8441 => [{8441, -8441}], 76413 => [{76413, -76413}], 32156 => [{32156, -32156}], 43416 => [{43416, -43416}], 61486 => [{61486, -61486}], 12163 => [{12163, -12163}], 40610 => [{40610, -40610}], 33708 => [{33708, -33708}], 47217 => [{47217, -47217}], 77706 => [{77706, -77706}], 49309 => [{49309, -49309}], 5934 => [{5934, -5934}], 59657 => [{59657, -59657}], 61379 => [{61379, -61379}], 66854 => [{66854, -66854}], 21938 => [{21938, -21938}], 38839 => [{38839, -38839}], 76035 => [{76035, -76035}], 75873 => [{75873, -75873}], 73528 => [{73528, -73528}], 4969 => [{4969, -4969}], 16129 => [{16129, -16129}], 75116 => [{75116, -75116}], 45250 => [{45250, -45250}], 74081 => [{74081, -74081}], 15093 => [{15093, ...}], 45659 => [...], ...},
              #Function<7.85826293/2 in Wasatch.Jobs.JoinCatsTask.test_run/0>,
              []
            )
            (flow) lib/flow/materialize.ex:286: anonymous fn/6 in Flow.Materialize.join_ops/5
            (flow) lib/flow/map_reducer.ex:49: Flow.MapReducer.handle_events/3
            (gen_stage) lib/gen_stage.ex:2502: GenStage.consumer_dispatch/7
            (gen_stage) lib/gen_stage.ex:2618: GenStage.take_pc_events/3
    (gen_stage) lib/gen_stage.ex:1705: GenStage.close_stream/1
       (elixir) lib/stream.ex:1250: Stream.do_resource/5
       (elixir) lib/stream.ex:570: Stream.run/1

Error for Flow.partition(key: { :elem, pos })

I have this code:

defmodule T do
  def hello do
    r = Flow.from_enumerable([1,2,3,4,5,6])
    |> Flow.partition()
    |> Flow.map(fn x -> { x, x } end)
    |> Flow.partition()
    |> Flow.reduce( fn -> [] end, fn a, acc -> [ a | acc ] end)
    |> Flow.emit(:state)
    |> Enum.to_list
  end
end

Result of this:
iex(1)> T.hello();
[[], [], [], [{4, 4}, {1, 1}], [{2, 2}, {3, 3}], [], [{6, 6}], [{5, 5}]]

I want to spread the items by key in the last partition.
So I made this code:

defmodule T do
  def hello do
    r = Flow.from_enumerable([1,2,3,4,5,6])
    |> Flow.partition()
    |> Flow.map(fn x -> { x, x } end)
    |> Flow.partition(key: { :elem, 2 })
    |> Flow.reduce( fn -> [] end, fn a, acc -> [ a | acc ] end)
    |> Flow.emit(:state)
    |> Enum.to_list
  end
end

But Flow raises an error for this code:

iex(1)> T.hello()
** (exit) exited in: GenStage.close_stream(%{#Reference<0.0.2.168> => {:subscribed, #PID<0.173.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.169> => {:subscribed, #PID<0.174.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.170> => {:subscribed, #PID<0.175.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.171> => {:subscribed, #PID<0.176.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.173> => {:subscribed, #PID<0.178.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.174> => {:subscribed, #PID<0.179.0>, :permanent, 500, 1000, 1000}, #Reference<0.0.2.175> => {:subscribed, #PID<0.180.0>, :permanent, 500, 1000, 1000}})
    ** (EXIT) shutdown
    (gen_stage) lib/gen_stage.ex:1598: GenStage.close_stream/1
       (elixir) lib/stream.ex:1248: Stream.do_resource/5
       (elixir) lib/enum.ex:1767: Enum.reverse/2
       (elixir) lib/enum.ex:2528: Enum.to_list/1
iex(1)> 
09:30:20.432 [error] GenServer #PID<0.167.0> terminating
** (ArgumentError) argument error
    (flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
    (elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
    (gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
    (gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
    (gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.164.0>, #Reference<0.0.2.38>}, [1, 5]}
State: {%{#Reference<0.0.2.38> => nil}, %{active: [#Reference<0.0.2.38>], consumers: [#Reference<0.0.1.39>, #Reference<0.0.1.28>, #Reference<0.0.2.142>, #Reference<0.0.1.10>, #Reference<0.0.2.123>, #Reference<0.0.2.106>, #Reference<0.0.2.88>, #Reference<0.0.2.70>], done?: false, producers: %{#Reference<0.0.2.38> => #PID<0.164.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {2, 8}, [], #Function<33.45510982/4 in Flow.Materialize.mapper_ops/1>}

09:30:20.433 [error] GenServer #PID<0.168.0> terminating
** (ArgumentError) argument error
    (flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
    (elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
    (gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
    (gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
    (gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:"$gen_consumer", {#PID<0.164.0>, #Reference<0.0.2.43>}, [3, 4]}
State: {%{#Reference<0.0.2.43> => nil}, %{active: [#Reference<0.0.2.43>], consumers: [#Reference<0.0.1.40>, #Reference<0.0.1.29>, #Reference<0.0.2.143>, #Reference<0.0.1.11>, #Reference<0.0.2.124>, #Reference<0.0.2.107>, #Reference<0.0.2.89>, #Reference<0.0.2.71>], done?: false, producers: %{#Reference<0.0.2.43> => #PID<0.164.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {3, 8}, [], #Function<33.45510982/4 in Flow.Materialize.mapper_ops/1>}

09:30:20.434 [error] GenServer #PID<0.179.0> terminating
** (ArgumentError) argument error
    (flow) lib/flow/materialize.ex:181: anonymous fn/3 in Flow.Materialize.hash_by_key/2
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:196: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
    (elixir) lib/enum.ex:1755: Enum."-reduce/3-lists^foldl/2-0-"/3
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:195: GenStage.PartitionDispatcher.dispatch/3
    (gen_stage) lib/gen_stage.ex:2187: GenStage.dispatch_events/3
    (gen_stage) lib/gen_stage.ex:2410: GenStage.consumer_dispatch/7
    (gen_stage) lib/gen_stage.ex:2531: GenStage.take_pc_events/3
    (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
Last message: {:DOWN, #Reference<0.0.1.28>, :process, #PID<0.167.0>, {:badarg, [{Flow.Materialize, :"-hash_by_key/2-fun-1-", 3, [file: 'lib/flow/materialize.ex', line: 181]}, {GenStage.PartitionDispatcher, :"-dispatch/3-fun-0-", 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 196]}, {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: 'lib/enum.ex', line: 1755]}, {GenStage.PartitionDispatcher, :dispatch, 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 195]}, {GenStage, :dispatch_events, 3, [file: 'lib/gen_stage.ex', line: 2187]}, {GenStage, :consumer_dispatch, 7, [file: 'lib/gen_stage.ex', line: 2410]}, {GenStage, :take_pc_events, 3, [file: 'lib/gen_stage.ex', line: 2531]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 601]}]}}
State: {%{#Reference<0.0.1.26> => nil, #Reference<0.0.1.27> => nil, #Reference<0.0.1.29> => nil, #Reference<0.0.1.30> => nil, #Reference<0.0.1.31> => nil, #Reference<0.0.1.32> => nil, #Reference<0.0.1.33> => nil}, %{active: [#Reference<0.0.1.33>, #Reference<0.0.1.32>, #Reference<0.0.1.31>, #Reference<0.0.1.30>, #Reference<0.0.1.29>, #Reference<0.0.1.27>, #Reference<0.0.1.26>], consumers: [{#Reference<0.0.2.166>, #Reference<0.0.2.174>}], done?: false, producers: %{#Reference<0.0.1.26> => #PID<0.165.0>, #Reference<0.0.1.27> => #PID<0.166.0>, #Reference<0.0.1.29> => #PID<0.168.0>, #Reference<0.0.1.30> => #PID<0.169.0>, #Reference<0.0.1.31> => #PID<0.170.0>, #Reference<0.0.1.32> => #PID<0.171.0>, #Reference<0.0.1.33> => #PID<0.172.0>}, trigger: #Function<2.31322697/4 in Flow.Window.Global.materialize/5>}, {6, 8}, [], #Function<3.45510982/4 in Flow.Materialize.build_reducer/2>}

What is wrong in my code?
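
For reference, the {:elem, position} key appears to index the tuple from zero (consistent with the ArgumentError in hash_by_key here), so a 2-tuple like {x, x} only supports positions 0 and 1; {:elem, 2} is out of range. A corrected sketch:

defmodule T do
  def hello do
    Flow.from_enumerable([1, 2, 3, 4, 5, 6])
    |> Flow.partition()
    |> Flow.map(fn x -> {x, x} end)
    # {:elem, position} is zero-based, so the valid positions for a
    # 2-tuple are 0 and 1
    |> Flow.partition(key: {:elem, 0})
    |> Flow.reduce(fn -> [] end, fn a, acc -> [a | acc] end)
    |> Flow.emit(:state)
    |> Enum.to_list()
  end
end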

Flow.fork_join? or I may be thinking of this wrong

I've been playing with an extension for Flow that is either a good idea or I'm thinking about it wrong.

Say I have 2 or more tasks that must be called, and these tasks are not CPU-bound (instead, IO-bound), so they are best run in parallel.

This could be accomplished by:

  def fork_join(flow, fork1_fun, fork2_fun, join_fun) when is_function(fork1_fun, 1) and is_function(fork2_fun, 1) and is_function(join_fun, 2)

One problem I see is that multiple additional signatures would be needed to accept n simultaneous forks.

Thoughts? Valuable? Other ways to think about it? I know I'm applying Rx-isms here which might not be right. I'm certainly happy to create a pull request if it helps.
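
As a point of comparison, something close to this can already be sketched with tasks inside a single mapping stage (fork1_fun, fork2_fun, and join_fun are the hypothetical callbacks from the signature above):

def fork_join(flow, fork1_fun, fork2_fun, join_fun) do
  Flow.map(flow, fn event ->
    # run both IO-bound calls concurrently, then join their results
    t1 = Task.async(fn -> fork1_fun.(event) end)
    t2 = Task.async(fn -> fork2_fun.(event) end)
    join_fun.(Task.await(t1), Task.await(t2))
  end)
end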

bypassing to next partition (eliminating passthrough traffic)

Hello,

I would like to minimize the amount of passthrough traffic that goes through my flows because the memory overhead of message passing (as each item passes from one partition to the next) is raising the peak memory usage (VmHWM) of my app too high.

For example, here is a common use case found in my flows:

  • From a lazy stream of input items:
    • map each input item into many output items belonging to either category A or B
    • partition items by categories A and B:
      • allow category A items to pass through to the next partition (don't do anything)
      • emit_and_reduce category B items into output items belonging to category C
    • partition items by categories A and C:
      • allow category A items to pass through to the next partition (don't do anything)
      • emit_and_reduce category C items into output items belonging to category A
  • At this point, the flow only emits category A items!

I have many such flows (similar to the pattern described above) connected together.

Since the topology and interconnection are materialized by Flow, I'm wondering if there can be a way for me to give Flow a hint that certain items can bypass a given partition? For example, in the use case described above, I could provide an option to each Flow.partition() saying Flow.partition(bypass: &(&1.category == :A)) and that would effectively fast-track 🏃‍♂️💨 all category A traffic straight down to the bottom of the flow. 😇 Would this be possible?

Thanks for your consideration.

Rare race condition in tests

There is a race condition in the "enumerable-unpartioned-stream allows custom windowing" test that is a bit tricky to reproduce. It's possible that one of the 4 stages is unlucky: stuck receiving numbers in the first window, it only gets scheduled once the other three have emptied the enumerable. In this case the unlucky stage won't emit the second window, since it never sees it.

Here's an example:

window =
  Flow.Window.fixed(1, :second, fn
    x when x <= 15 -> 0
    x when x <= 30 -> 1_000
  end)

loop = fn cb, n ->
  windows =
    Flow.from_enumerable(1..30, window: window, stages: 4, max_demand: 3)
    |> Flow.reduce(fn -> 0 end, &(&1 + &2))
    |> Flow.on_trigger(fn e, i, t -> IO.inspect({e, i, t}); {[e], e} end)
    |> Enum.to_list()
  if length(windows) == 8, do: cb.(cb, n + 1), else: raise inspect({n, windows})
end
loop.(loop, 1)

This runs for quite some time, then the last iteration outputs something like:

{30, {1, 4}, {:fixed, 0, :done}}
{33, {3, 4}, {:fixed, 0, :done}}
{33, {0, 4}, {:fixed, 0, :done}}
{115, {1, 4}, {:fixed, 1000, :done}}
{117, {0, 4}, {:fixed, 1000, :done}}
{113, {3, 4}, {:fixed, 1000, :done}}
{24, {2, 4}, {:fixed, 0, :done}}
** (RuntimeError) {139919, [30, 33, 33, 115, 117, 113, 24]}

It's possible that adding an artificial delay in the reduce step or using a larger enumerable would make the test more deterministic.
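
For example, a sketch of the delayed reduce step (illustrative only; the delay value is arbitrary):

|> Flow.reduce(fn -> 0 end, fn x, acc ->
  # a tiny delay keeps any single stage from draining the enumerable
  # before the other stages get scheduled
  Process.sleep(1)
  x + acc
end)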

Duplication of elements for complex flow

I have this test code:

defmodule T do
  def hello do
    { :ok, f } = :file.open('t.txt', [ :write ])
    Flow.from_enumerable(1..5000000)
    |> Flow.partition(window: Flow.Window.periodic(1, :second))
    |> Flow.map(fn x -> {x, [ :erlang.integer_to_binary(x), 10 ]} end)
    |> Flow.partition(stages: 20000,window: Flow.Window.periodic(1, :second), key: {:elem, 0})
    |> Flow.reduce( fn -> [] end, fn {_,a}, acc -> [ a | acc ] end)
    |> Flow.emit(:state)
    |> Flow.partition()
    |> Flow.reduce( fn -> [] end, fn x, acc -> [ x | acc ] end)
    |> Flow.emit(:state)
    |> Flow.each(fn a ->
	:file.write(f, a)
    end)
    |> Flow.run()
    :file.close(f)
  end
end

This code generates the range 1..5000000 and writes the values to a file in parallel, aggregating data in buffers. It is a simplified version of my real use case.

So I expect 5M unique values in the file, but I found more:

$ wc t.txt
5009198 5009198 38962480 t.txt

$ sort t.txt | uniq -d | wc
9198 9198 73584

These values are different every time. Here is a second run:

$ sort t.txt | uniq -d | wc
3949 3949 31592

When I changed the window type to the default, I got the right result:

defmodule T do
  def hello do
    { :ok, f } = :file.open('t.txt', [ :write ])
    Flow.from_enumerable(1..5000000)
    |> Flow.partition()
    |> Flow.map(fn x -> {x, [ :erlang.integer_to_binary(x), 10 ]} end)
    |> Flow.partition(stages: 20000, key: {:elem, 0})
    |> Flow.reduce( fn -> [] end, fn {_,a}, acc -> [ a | acc ] end)
    |> Flow.emit(:state)
    |> Flow.partition()
    |> Flow.reduce( fn -> [] end, fn x, acc -> [ x | acc ] end)
    |> Flow.emit(:state)
    |> Flow.each(fn a ->
	:file.write(f, a)
    end)
    |> Flow.run()
    :file.close(f)
  end
end

$ sort t.txt | uniq -d | wc
0 0 0

So could you explain or fix this issue?
Thanks.

Memory usage for side-effect only flows is high using Flow.each instead of Flow.map

Consider the following stream processing:

File.stream!(path) 
  |> Enum.each(fn x -> rabbitmq_publish(x) end)

def rabbitmq_publish(x) do
  # publish and return nothing
  nil
end

It does not need to return the data for further usage, so it's safe to immediately remove processed data from the memory, which seems to be the case when using plain streams (memory usage is constant).

However if I incorporate Flow into the mix:

File.stream!(path)
|> Flow.from_enumerable()
|> Flow.each(fn x -> rabbitmq_publish(x) end)
|> Flow.run()

memory usage grows as if the whole file was kept in memory.

It seems the proper way to avoid the memory growth would be to use Flow.map:

File.stream!(path)
|> Flow.from_enumerable()
|> Flow.map(fn x -> rabbitmq_publish(x) end)
|> Flow.run()

Is this expected behavior or something that could be improved? If it's expected, is it worth mentioning in the documentation?
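
One plausible reading (an assumption, not confirmed in this thread): Flow.each/2 is for side effects and passes the original events through unchanged, while Flow.map/2 emits whatever the function returns, which is nil here. A sketch of the difference:

# Flow.each/2 re-emits each original line downstream, so buffers
# between stages can end up holding the file contents:
File.stream!(path)
|> Flow.from_enumerable()
|> Flow.each(&rabbitmq_publish/1)
|> Flow.run()

# Flow.map/2 emits the function's return value instead; since
# rabbitmq_publish/1 returns nil, only tiny values flow downstream:
File.stream!(path)
|> Flow.from_enumerable()
|> Flow.map(&rabbitmq_publish/1)
|> Flow.run()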

from_stages - noproc errors when GenStage producer finishes early

Hello,

I'm getting :noproc errors when I connect a short-lived GenStage producer to a Flow using Flow.from_stages(). Below is a minimal example that reproduces the problem I'm seeing:

  1. Passing through Flow using multiple stages works correctly. ✔️
  2. Mapping through Flow using a single stage works correctly. ✔️
  3. Mapping through Flow using multiple stages triggers the error. 💥
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 2) |> Enum.to_list()
[1, 2, 3]
iex(2)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms);
ms end) |> Enum.to_list()
"#PID<0.194.0>: sleep 1"
"#PID<0.194.0>: sleep 2"
"#PID<0.194.0>: sleep 3"
[1000, 2000, 3000]
iex(3)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 2) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms);
ms end) |> Enum.to_list()
"#PID<0.200.0>: sleep 1"

12:14:48.166 [info]  GenStage consumer #PID<0.201.0> is stopping after receiving cancel from producer #PID<0.197.0> with reason: :noproc


12:14:48.180 [error] GenServer #PID<0.201.0> terminating
** (stop) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Last message: {:DOWN, #Reference<0.3205571650.1190133762.109920>, :process, #PID<0.197.0>, :noproc}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.127884580/3 in Flow.Window.Global.materialize/5>}, {1, 2}, [], #Function<33.87744541/4 in Flow.Materialize.mapper_ops/1>}
"#PID<0.200.0>: sleep 2"
"#PID<0.200.0>: sleep 3"
** (exit) exited in: GenStage.close_stream(%{#Reference<0.3205571650.1190133762.109930> => {:subscribed, #PID<0.200.0>, :transient, 500, 1000, 1000}})
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
    (elixir) lib/stream.ex:1400: Stream.do_resource/5
    (elixir) lib/enum.ex:3023: Enum.reverse/1
    (elixir) lib/enum.ex:2668: Enum.to_list/1
iex(3)>

How can I safely connect a short-lived GenStage producer to a Flow using multiple stages? 😕 See also GenStage: How to cancel a Flow from the producer? whose answer relies on an outdated GenStage API.

Thanks for your consideration.
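
One workaround sketch (an assumption, not a confirmed fix): wrap the finite producer in GenStage.stream/1 and hand Flow a plain enumerable, so the Flow stages never subscribe directly to the short-lived producer:

{:ok, producer} = GenStage.from_enumerable(1..3, link: false)

# consume the producer as a stream with a :transient cancel, then feed
# the stream into Flow as an enumerable
[{producer, cancel: :transient}]
|> GenStage.stream()
|> Flow.from_enumerable(stages: 2)
|> Flow.map(fn i -> Process.sleep(:timer.seconds(i)); :timer.seconds(i) end)
|> Enum.to_list()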

noproc errors - connecting finite GenStage to Flow.partition

Hello,

I'm using GenStage 0.14.2 and Flow master (at 1ffac6a) under Elixir 1.9.0 and Erlang/OTP 22, where I'm encountering :noproc errors when I connect a short-lived GenStage producer to a Flow and then immediately partition that flow. Below is a minimal example for reproduction (see related issue #88).

In my actual use case, I'm connecting a large (but finite) GenStage to a Flow partition with 32 stages.

Thanks for your consideration.

Reproduction steps

  1. Sanity check: everything Just Works when it's Flow-only and GenStage isn't involved. ✔️
  2. GenStage has 3 items, from_stages() has 1 stage, and partition() has 1 stage. ✔️
  3. GenStage has 3 items, from_stages() has 1 stage, and partition() has 2 stages. ✔️
  4. GenStage has 3 items, from_stages() has 1 stage, and partition() has 3 stages. 💥
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Flow.from_enumerable(1..3) |> Flow.partition(stages: 3) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.213.0>: sleep 2"
"#PID<0.203.0>: sleep 1"
"#PID<0.196.0>: sleep 3"
[1000, 2000, 3000]
iex(2)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 1) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.231.0>: sleep 1"
"#PID<0.231.0>: sleep 2"
"#PID<0.231.0>: sleep 3"
[1000, 2000, 3000]
iex(3)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 2) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.238.0>: sleep 1"
"#PID<0.239.0>: sleep 3"
"#PID<0.238.0>: sleep 2"
[3000, 1000, 2000]
iex(4)> {:ok, producer} = GenStage.from_enumerable(1..3, link: false); Flow.from_stages([producer], stages: 1) |> Flow.partition(stages: 3) |> Flow.map(fn i -> ms = :timer.seconds(i); IO.inspect("#{inspect(self)}: sleep #{i}"); Process.sleep(ms); ms end) |> Enum.to_list()
"#PID<0.246.0>: sleep 2"
"#PID<0.247.0>: sleep 1"

16:29:54.396 pid=<0.248.0> [info]  GenStage consumer #PID<0.248.0> is stopping after receiving cancel from producer #PID<0.245.0> with reason: :noproc


16:29:54.444 pid=<0.248.0> [error] GenServer #PID<0.248.0> terminating
** (stop) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Last message: {:DOWN, #Reference<0.1394119603.2769551375.77567>, :process, #PID<0.245.0>, :noproc}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.127884580/3 in Flow.Window.Global.materialize/5>}, {2, 3}, [], #Function<32.81753312/4 in Flow.Materialize.mapper_ops/1>}
"#PID<0.247.0>: sleep 3"
** (exit) exited in: GenStage.close_stream(%{#Reference<0.1394119603.2769551372.77002> => {:subscribed, #PID<0.246.0>, :transient, 500, 1000, 1000}, #Reference<0.1394119603.2769551372.77003> => {:subscribed, #PID<0.247.0>, :transient, 500, 1000, 1000}})
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (gen_stage) lib/gen_stage/stream.ex:160: GenStage.Stream.close_stream/1
    (elixir) lib/stream.ex:1400: Stream.do_resource/5
    (elixir) lib/enum.ex:3023: Enum.reverse/1
    (elixir) lib/enum.ex:2668: Enum.to_list/1

Environment details

$ uname -a
Linux myhost 4.1.15.pnotify #18 SMP Thu May 18 15:50:05 PDT 2017 x86_64 GNU/Linux

$ elixir -v
Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]

Elixir 1.9.0 (compiled with Erlang/OTP 22)

$ cat mix.lock
%{
  "file_system": {:hex, :file_system, "0.2.7", "e6f7f155970975789f26e77b8b8d8ab084c59844d8ecfaf58cbda31c494d14aa", [:mix], [], "hexpm"},
  "flow": {:git, "https://github.com/plataformatec/flow.git", "1ffac6a801602bf8b02192488e58ce5728b581aa", []},
  "gen_stage": {:hex, :gen_stage, "0.14.2", "6a2a578a510c5bfca8a45e6b27552f613b41cf584b58210f017088d3d17d0b14", [:mix], [], "hexpm"},
  "jason": {:hex, :jason, "1.1.2", "b03dedea67a99223a2eaf9f1264ce37154564de899fd3d8b9a21b1a6fd64afe7", [:mix], [{:decimal, "~> 1.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm"},
  "mix_test_watch": {:hex, :mix_test_watch, "0.9.0", "c72132a6071261893518fa08e121e911c9358713f62794a90c95db59042af375", [:mix], [{:file_system, "~> 0.2.1 or ~> 0.3", [hex: :file_system, repo: "hexpm", optional: false]}], "hexpm"},
}

segmentation fault

I am trying to understand how Flow's joins work, but instead I am crashing Elixir with a segfault:

flow_a = Flow.from_enumerable(1_000..1_100)
flow_b = Flow.from_enumerable(  500..1_100)

flow = Flow.window_join(:inner, flow_a, flow_b, Flow.Window.global, & &1, & &1, fn a, a -> a end)

assert flow |> Enum.sort |> Enum.take(3) == [1_000, 1_001, 1_002]

Pushed the complete (hopefully reproducible) source here: https://github.com/larskluge/crash

Please advise. Thank you!
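
One side note on the join function, unrelated to whether the VM should segfault: in Elixir, repeating a variable in a pattern requires both arguments to be equal, so fn a, a -> a end only matches when both sides are the same value. A two-variable clause is the safer general form:

# explicit two-variable join clause (a sketch using the same flows)
join = fn left, _right -> left end

flow = Flow.window_join(:inner, flow_a, flow_b, Flow.Window.global(), & &1, & &1, join)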

Passing a stream to Flow.from_enumerable is not behaving as expected

Hi!

This might be me misunderstanding how this is supposed to work, but given the following quote from the docs:

The difference between max_demand and min_demand works as the batch size when the producer is full. If the producer has fewer events than requested by consumers, it usually sends the remaining events available.

My intuitive understanding was that the following example would start to process events as soon as they're available. What I'm seeing is that Flow.from_enumerable/2 seems to buffer until it has max_demand events available to send downstream before processing starts. This would mean that if events are coming in slowly, it might take a very long time for the pipeline to start executing.

Setting max_demand: 1 starts the downstream phases immediately, but if I understand the docs correctly this change shouldn't be necessary since demand as well as data is available for processing. Is this a bug or am I missing something important?

A full example can be found here: https://github.com/frekw/flow-example

defmodule Example.Queue do
  def start_link(limit) do
    BlockingQueue.start_link(limit, name: __MODULE__)
  end

  def push(x) do
    BlockingQueue.push(__MODULE__, x)
  end

  def to_stream() do
    BlockingQueue.pop_stream(__MODULE__)
  end
end

defmodule Example.Producer do
  require Logger
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, 0, opts)
  end

  def init(state) do
    loop()

    {:ok, state}
  end

  defp loop do
    Process.send_after(self(), :loop, 10)
  end

  def handle_info(:loop, state) do
    Logger.info("pushing: #{state}")
    Example.Queue.push("message-#{state}")

    loop()

    {:noreply, state + 1}
  end
end

defmodule Example.Pipeline do
  require Logger
  def start_link() do
    Logger.info("Starting pipeline")

    Example.Queue.to_stream()

    # This seems to work as I would expect.
    # |> Flow.from_enumerable(max_demand: 1)

    # This waits for an initial 1000 messages before
    # Flow.each starts running.
    |> Flow.from_enumerable(min_demand: 1)
    |> Flow.each(fn x -> Logger.info("handled: #{x}") end)
    |> Flow.start_link()
  end
end

defmodule Example do
  use Application
  def start(_type, _args) do
    import Supervisor.Spec
    children = [
      worker(Example.Queue, [:infinity]),
      worker(Example.Producer, []),
      worker(Example.Pipeline, []),
    ]

    Supervisor.start_link(children, strategy: :rest_for_one)
  end
end
