
Comments (12)

mtrudel commented on June 10, 2024

Remember that the 'time to x' charts are inverted; lower relative values are better.

I don't see a whole lot of benefit one way or the other here, but I think it's worth more investigation!

from thousand_island.

dsdshcym commented on June 10, 2024

Looking at https://github.com/mtrudel/bandit/actions/runs/3992173488, the results are mostly neutral.

I think we should increase the number of clients.
After all, the perf issue in main would only occur when there are lots of incoming messages 🤔

mtrudel commented on June 10, 2024

Your analysis is spot on! That exact thing is (mostly) implemented on https://github.com/mtrudel/thousand_island/tree/inline_accept, but it doesn't seem to yield the improvements you (and I!) might expect. Feel free to play around with it, though, and see if there's something I'd missed (truthfully, I didn't look too closely at it).

dsdshcym commented on June 10, 2024

The implementation in the inline_accept branch looks good to me.

I'm wondering how you compared the performance between main and inline_accept?
As I mentioned above, this potential performance issue may only occur when:

  1. many connections come in at the same time
  2. and they all start sending a lot of messages

mtrudel commented on June 10, 2024

I cooked up a branch of bandit that referenced the inline_accept branch, and then ran a manual benchmark against it.

An even easier way to get started locally (and a more apples-to-apples comparison) would be to run something like https://github.com/mtrudel/thousand_island/blob/main/examples/http_hello_world.ex on both versions of thousand_island and run h2load (part of nghttp2) against each of them directly. Its 'Time To Connect' and 'Time To First Byte' measurements would be the things to look at here. The HTTP overhead is about as minimal as you can get (essentially constant time), leaving really just the differences between the Thousand Island implementations to compare.
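
For reference, the linked example boils down to roughly the following (a sketch, not a verbatim copy of the linked file; the exact response body and module layout may differ):

```elixir
defmodule HTTPHelloWorld do
  use ThousandIsland.Handler

  # Reply to any request with a fixed HTTP/1.1 response and close the
  # connection, keeping per-request work (and thus measurement noise) minimal.
  @impl ThousandIsland.Handler
  def handle_data(_data, socket, state) do
    ThousandIsland.Socket.send(
      socket,
      "HTTP/1.1 200 OK\r\ncontent-length: 13\r\n\r\nHello, World!"
    )

    {:close, state}
  end
end

# Start one copy per checkout, e.g. on ports 6000 (main) and 6001 (inline_accept),
# then point h2load at each:
# {:ok, _pid} = ThousandIsland.start_link(port: 6001, handler_module: HTTPHelloWorld)
```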

dsdshcym commented on June 10, 2024

Hi @mtrudel
Thanks for the instructions!

I ran some benchmarks with h2load against the HTTPHelloWorld handler locally, with only one tweak: I set num_acceptors to 1, so that a single acceptor accepts all connections, simulating the most demanding scenario.
(I needed to use my fork of the inline_accept branch so that it respects the num_acceptors config: ced5f3e)
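
For anyone reproducing this: with Thousand Island's num_acceptors option, the single-acceptor setup is just a one-line tweak when starting the server (the port number here is arbitrary, and HTTPHelloWorld stands in for whatever handler module you benchmark):

```elixir
# Run the whole acceptor pool as a single process, so every incoming
# connection funnels through one acceptor: the worst case for accept latency.
{:ok, _pid} =
  ThousandIsland.start_link(
    port: 6001,
    handler_module: HTTPHelloWorld,
    num_acceptors: 1
  )
```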

And I did find that in some extreme scenarios inline_accept can yield better performance, with a huge improvement in time to 1st byte:

  • inline_accept
      h2load --h1 -n100000 -c1000 -m1 http://localhost:6001
      starting benchmark...
      spawning thread #0: 1000 total client(s). 100000 total requests
      Application protocol: http/1.1
      progress: 10% done
    
      finished in 34.29s, 466.24 req/s, 0B/s
      requests: 100000 total, 16634 started, 15988 done, 15988 succeeded, 84012 failed, 84012 errored, 0 timeout
      status codes: 15988 2xx, 0 3xx, 0 4xx, 0 5xx
      traffic: 0B (0) total, 484.01KB (495628) headers (space savings 0.00%), 202.97KB (207844) data
                              min         max         mean         sd        +/- sd
      time for request:       50us     49.95ms      7.61ms      6.17ms    91.63%
      time for connect:       51us    112.87ms     25.50ms     30.82ms    88.72%
      time to 1st byte:      101us       703us       366us       171us    63.16%
      req/s           :       0.00       69.46       10.63       12.41    90.20%
    
  • main (acceptor)
      h2load --h1 -n100000 -c1000 -m1 http://localhost:6000
      starting benchmark...
      spawning thread #0: 1000 total client(s). 100000 total requests
      Application protocol: http/1.1
      progress: 10% done
    
      finished in 28.37s, 506.86 req/s, 0B/s
      requests: 100000 total, 15179 started, 14380 done, 14380 succeeded, 85620 failed, 85620 errored, 0 timeout
      status codes: 14380 2xx, 0 3xx, 0 4xx, 0 5xx
      traffic: 0B (0) total, 435.33KB (445780) headers (space savings 0.00%), 182.56KB (186940) data
                              min         max         mean         sd        +/- sd
      time for request:       61us       1.25s     18.99ms     99.27ms    99.35%
      time for connect:       54us    199.61ms     24.63ms     28.52ms    90.65%
      time to 1st byte:      159us       1.24s     81.44ms    296.95ms    93.89%
      req/s           :       0.00       88.29       17.13       17.53    85.90%
    

That being said, in any scenario less demanding than -n100000 -c1000, the two implementations perform almost identically:

  • inline_accept
    h2load --h1 -n10000 -c100 -m1 http://localhost:6001
    starting benchmark...
    spawning thread #0: 100 total client(s). 10000 total requests
    Application protocol: http/1.1
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 786.63ms, 12712.46 req/s, 0B/s
    requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 0B (0) total, 302.73KB (310000) headers (space savings 0.00%), 126.95KB (130000) data
                            min         max         mean         sd        +/- sd
    time for request:      635us     10.20ms      6.82ms      1.34ms    74.87%
    time for connect:       99us      1.02ms       310us       248us    81.00%
    time to 1st byte:      738us      7.95ms      6.36ms      1.60ms    87.00%
    req/s           :     127.26      129.98      128.23        0.52    74.00%
    
  • main
    h2load --h1 -n10000 -c100 -m1 http://localhost:6000
    starting benchmark...
    spawning thread #0: 100 total client(s). 10000 total requests
    Application protocol: http/1.1
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 1.42s, 7064.71 req/s, 0B/s
    requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 0B (0) total, 302.73KB (310000) headers (space savings 0.00%), 126.95KB (130000) data
                            min         max         mean         sd        +/- sd
    time for request:     1.23ms    166.27ms     13.73ms     16.23ms    98.03%
    time for connect:       92us       243us       119us        29us    82.00%
    time to 1st byte:     2.50ms     12.51ms     10.14ms      3.58ms    78.00%
    req/s           :      70.77       71.73       71.22        0.24    64.00%
    

And if we have 10 acceptors for both implementations, they can handle -n100000 -c1000 just fine.

So I'm not sure it's worth switching to inline_accept.
What do you think?

P.S.
You may find my setup in this repo:
https://github.com/dsdshcym/thousand_island_benchmark/

mtrudel commented on June 10, 2024

Interesting! There was some conversation recently on mtrudel/bandit#72 regarding TTFB numbers, so this is something both timely and intriguing!

I'd suggest that we:

  • Review the changes on the inline_accept branch in the context of it being a 'real' PR and not just an experiment (i.e., with the assumption that if things look good, it gets merged).
  • Cut a test PR on bandit that points at this branch (this is just a test branch; if inline_accept gets merged, then bandit will get it as part of its regular thousand_island dependency).
  • Put the benchmarker to work on this branch in bandit and see how inline_accept performs in the real world.

WDYT?

mtrudel commented on June 10, 2024

Specifically, if we can see improvements to TTC/TTFB with reqs/sec numbers staying unchanged, that's a clear win in my book.

michealp-coder commented on June 10, 2024

Wowzers, that's a significant improvement to "time for request"!

dsdshcym commented on June 10, 2024

On second thought: the active option is false until we set it to :once in Handler.handle_continuation/2:

defp handle_continuation(continuation, socket) do
  case continuation do
    {:continue, state} ->
      ThousandIsland.Socket.setopts(socket, active: :once)
      {:noreply, {socket, state}, socket.read_timeout}
So there are no messages to move in the Acceptor's mailbox when calling controlling_process/2.

I did a little test with this diff in main:

modified   lib/thousand_island/acceptor.ex
@@ -12,10 +12,20 @@ def run({server_pid, parent_pid, %ThousandIsland.ServerConfig{} = server_config}
     accept(listener_socket, connection_sup_pid, server_config)
   end
 
+  require Logger
+
   defp accept(listener_socket, connection_sup_pid, server_config) do
     case server_config.transport_module.accept(listener_socket) do
       {:ok, socket} ->
+        loop(
+          fn ->
+            Logger.debug(inspect(Process.info(self(), :messages)))
+          end,
+          10
+        )
+
         ThousandIsland.Connection.start(connection_sup_pid, socket, server_config)
+
         accept(listener_socket, connection_sup_pid, server_config)
 
       {:error, :closed} ->
@@ -25,4 +35,14 @@ defp accept(listener_socket, connection_sup_pid, server_config) do
         raise "Unexpected error in accept: #{inspect(reason)}"
     end
   end
+
+  defp loop(_fun, 0) do
+    :ok
+  end
+
+  defp loop(fun, times) do
+    fun.()
+
+    loop(fun, times - 1)
+  end
 end

And the messages are always empty...

mtrudel commented on June 10, 2024

You're correct! There are still a number of possible approaches here that might help improve connection startup times (though this is the hardest part of the connection lifecycle to reason about, so it's important that we work through the subtlety carefully).

I'm working up a simple benchmark stack directly on Thousand Island so we can test this stuff more directly. I'll report back here as I make progress.

mtrudel commented on June 10, 2024

I've been up and down this and I can't seem to get reproducible numbers out of it. I suggest we shelve the PRs for the time being.

The problem originally raised in this issue is resolved as well, I think. Will close.
