
Comments (12)

mtrudel commented on June 10, 2024

Remember that the 'time to x' charts are inverted; lower relative values are better.

I don't see a whole lot of benefit one way or the other here, but I think it's worth more investigation!

from thousand_island.

dsdshcym commented on June 10, 2024

Looking at https://github.com/mtrudel/bandit/actions/runs/3992173488, the results are mostly neutral.

I think we should increase the number of clients.
After all, the perf issue in main would only occur when there are lots of incoming messages 🤔

mtrudel commented on June 10, 2024

Your analysis is spot on! That exact thing is (mostly) implemented on https://github.com/mtrudel/thousand_island/tree/inline_accept, but it doesn't seem to yield the improvements you (and I!) might expect. Feel free to play around with it, though, and see if there's something I'd missed (truthfully, I didn't look too closely at it).

dsdshcym commented on June 10, 2024

The implementation in the inline_accept branch looks good to me.

I'm wondering how you compared the performance between main and inline_accept?
As I mentioned above, this potential performance issue may only occur when:

  1. many connections come in at the same time
  2. and they all start sending a lot of messages

mtrudel commented on June 10, 2024

I cooked up a branch of bandit that referenced the inline_accept branch, and then ran a manual benchmark against it.

An even easier way to get started locally (and a more apples-to-apples comparison) would be to run something like https://github.com/mtrudel/thousand_island/blob/main/examples/http_hello_world.ex on both versions of thousand_island and run h2load (part of nghttp2) against each of them directly. Its 'Time To Connect' and 'Time To First Byte' measurements would be the things to look at here. The HTTP overhead is about as minimal as you can get (essentially constant time), leaving really just the differences between the Thousand Island implementations to compare.
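
For reference, the linked example boils down to roughly the following (a sketch, not a verbatim copy of the linked file; the exact response body and module layout may differ):

```elixir
defmodule HTTPHelloWorld do
  use ThousandIsland.Handler

  # Reply to any request with a fixed HTTP/1.1 response and close the
  # connection, keeping per-request work (and thus measurement noise) minimal.
  @impl ThousandIsland.Handler
  def handle_data(_data, socket, state) do
    ThousandIsland.Socket.send(
      socket,
      "HTTP/1.1 200 OK\r\ncontent-length: 13\r\n\r\nHello, World!"
    )

    {:close, state}
  end
end

# Start one copy per checkout, e.g. on ports 6000 (main) and 6001 (inline_accept),
# then point h2load at each:
# {:ok, _pid} = ThousandIsland.start_link(port: 6001, handler_module: HTTPHelloWorld)
```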

dsdshcym commented on June 10, 2024

Hi @mtrudel
Thanks for the instructions!

I ran some benchmarks with h2load against the HTTPHelloWorld handler locally, with only one tweak: I set num_acceptors to 1, so that a single acceptor accepts all connections, simulating the most demanding scenario.
(I needed to use my fork of the inline_accept branch so that it respects the num_acceptors config: ced5f3e)
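
For anyone reproducing this: with Thousand Island's num_acceptors option, the single-acceptor setup is just a one-line tweak when starting the server (the port number here is arbitrary, and HTTPHelloWorld stands in for whatever handler module you benchmark):

```elixir
# Run the whole acceptor pool as a single process, so every incoming
# connection funnels through one acceptor: the worst case for accept latency.
{:ok, _pid} =
  ThousandIsland.start_link(
    port: 6001,
    handler_module: HTTPHelloWorld,
    num_acceptors: 1
  )
```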

And I did find that in some extreme scenarios inline_accept can yield better performance, with a huge improvement in time to 1st byte:

  • inline_accept
      h2load --h1 -n100000 -c1000 -m1 http://localhost:6001
      starting benchmark...
      spawning thread #0: 1000 total client(s). 100000 total requests
      Application protocol: http/1.1
      progress: 10% done
    
      finished in 34.29s, 466.24 req/s, 0B/s
      requests: 100000 total, 16634 started, 15988 done, 15988 succeeded, 84012 failed, 84012 errored, 0 timeout
      status codes: 15988 2xx, 0 3xx, 0 4xx, 0 5xx
      traffic: 0B (0) total, 484.01KB (495628) headers (space savings 0.00%), 202.97KB (207844) data
                              min         max         mean         sd        +/- sd
      time for request:       50us     49.95ms      7.61ms      6.17ms    91.63%
      time for connect:       51us    112.87ms     25.50ms     30.82ms    88.72%
      time to 1st byte:      101us       703us       366us       171us    63.16%
      req/s           :       0.00       69.46       10.63       12.41    90.20%
    
  • main (acceptor)
      h2load --h1 -n100000 -c1000 -m1 http://localhost:6000
      starting benchmark...
      spawning thread #0: 1000 total client(s). 100000 total requests
      Application protocol: http/1.1
      progress: 10% done
    
      finished in 28.37s, 506.86 req/s, 0B/s
      requests: 100000 total, 15179 started, 14380 done, 14380 succeeded, 85620 failed, 85620 errored, 0 timeout
      status codes: 14380 2xx, 0 3xx, 0 4xx, 0 5xx
      traffic: 0B (0) total, 435.33KB (445780) headers (space savings 0.00%), 182.56KB (186940) data
                              min         max         mean         sd        +/- sd
      time for request:       61us       1.25s     18.99ms     99.27ms    99.35%
      time for connect:       54us    199.61ms     24.63ms     28.52ms    90.65%
      time to 1st byte:      159us       1.24s     81.44ms    296.95ms    93.89%
      req/s           :       0.00       88.29       17.13       17.53    85.90%
    

That being said, in any scenario less demanding than -n100000 -c1000, the two implementations perform almost identically:

  • inline_accept
    h2load --h1 -n10000 -c100 -m1 http://localhost:6001
    starting benchmark...
    spawning thread #0: 100 total client(s). 10000 total requests
    Application protocol: http/1.1
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 786.63ms, 12712.46 req/s, 0B/s
    requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 0B (0) total, 302.73KB (310000) headers (space savings 0.00%), 126.95KB (130000) data
                            min         max         mean         sd        +/- sd
    time for request:      635us     10.20ms      6.82ms      1.34ms    74.87%
    time for connect:       99us      1.02ms       310us       248us    81.00%
    time to 1st byte:      738us      7.95ms      6.36ms      1.60ms    87.00%
    req/s           :     127.26      129.98      128.23        0.52    74.00%
    
  • main
    h2load --h1 -n10000 -c100 -m1 http://localhost:6000
    starting benchmark...
    spawning thread #0: 100 total client(s). 10000 total requests
    Application protocol: http/1.1
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 1.42s, 7064.71 req/s, 0B/s
    requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 0B (0) total, 302.73KB (310000) headers (space savings 0.00%), 126.95KB (130000) data
                            min         max         mean         sd        +/- sd
    time for request:     1.23ms    166.27ms     13.73ms     16.23ms    98.03%
    time for connect:       92us       243us       119us        29us    82.00%
    time to 1st byte:     2.50ms     12.51ms     10.14ms      3.58ms    78.00%
    req/s           :      70.77       71.73       71.22        0.24    64.00%
    

And if we have 10 acceptors for both implementations, they can handle -n100000 -c1000 just fine.

So I'm not sure it's worth switching to inline_accept.
What do you think?

P.S.
You may find my setup in this repo:
https://github.com/dsdshcym/thousand_island_benchmark/

mtrudel commented on June 10, 2024

Interesting! There was some conversation recently on mtrudel/bandit#72 regarding TTFB numbers, so this is something both timely and intriguing!

I'd suggest that we:

  • Review the changes on the inline_accept branch in the context of it being a 'real' PR and not just an experiment (i.e., with the assumption that if things look good, it gets merged).
  • Cut a test PR on bandit that points at this branch (this is just a test branch; if inline_accept gets merged, then bandit will get it as part of its regular thousand_island dependency).
  • Put the benchmarker to work on this branch in bandit and see how inline_accept performs in the real world.

WDYT?

mtrudel commented on June 10, 2024

Specifically, if we can see improvements to TTC/TTFB with reqs/sec numbers staying unchanged, that's a clear win in my book.

michealp-coder commented on June 10, 2024

Wowzers, that's a significant improvement to "time for request"!

dsdshcym commented on June 10, 2024

On second thought: the active option is false until we set it to :once in Handler.handle_continuation/2:

defp handle_continuation(continuation, socket) do
  case continuation do
    {:continue, state} ->
      ThousandIsland.Socket.setopts(socket, active: :once)
      {:noreply, {socket, state}, socket.read_timeout}
So there are no messages to move in the Acceptor's mailbox when calling controlling_process/2.

I did a little test with this diff in main:

modified   lib/thousand_island/acceptor.ex
@@ -12,10 +12,20 @@ def run({server_pid, parent_pid, %ThousandIsland.ServerConfig{} = server_config}
     accept(listener_socket, connection_sup_pid, server_config)
   end
 
+  require Logger
+
   defp accept(listener_socket, connection_sup_pid, server_config) do
     case server_config.transport_module.accept(listener_socket) do
       {:ok, socket} ->
+        loop(
+          fn ->
+            Logger.debug(inspect(Process.info(self(), :messages)))
+          end,
+          10
+        )
+
         ThousandIsland.Connection.start(connection_sup_pid, socket, server_config)
+
         accept(listener_socket, connection_sup_pid, server_config)
 
       {:error, :closed} ->
@@ -25,4 +35,14 @@ defp accept(listener_socket, connection_sup_pid, server_config) do
         raise "Unexpected error in accept: #{inspect(reason)}"
     end
   end
+
+  defp loop(_fun, 0) do
+    :ok
+  end
+
+  defp loop(fun, times) do
+    fun.()
+
+    loop(fun, times - 1)
+  end
 end

And the messages are always empty...

mtrudel commented on June 10, 2024

You're correct! There are still a number of possible approaches here that might help improve connection startup times (though this is the hardest part of the connection lifecycle to reason about, so it's important that we work through the subtlety carefully).

I'm working up a simple benchmark stack directly on Thousand Island so we can test this stuff more directly. I'll report back here as I make progress.

mtrudel commented on June 10, 2024

I've been up and down this and I can't seem to get reproducible numbers out of it. I suggest we shelve the PRs for the time being.

The problem originally raised in this issue is resolved as well, I think. Will close.
