
Comments (11)

seanmonstar avatar seanmonstar commented on June 6, 2024

It's a heuristic to reduce the amount of copying when the data frame content is large. If the IO transport can support vectored writes, h2 will use writev to send both pieces without copying.
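The copy-vs-chain decision can be sketched as follows. This is a hypothetical illustration, not h2's actual code, and the threshold value is made up:

```rust
// Illustrative value only; h2's real constant may differ.
const CHUNK_THRESHOLD: usize = 256;

/// Small payloads get copied into the existing write buffer; large ones
/// are kept as a separate slice ("chained") so writev can send the header
/// and payload together without copying.
fn should_chain(payload_len: usize) -> bool {
    payload_len >= CHUNK_THRESHOLD
}

fn main() {
    assert!(!should_chain(9)); // a 9-byte frame header gets copied
    assert!(should_chain(16_384)); // a full-size data frame gets chained
}
```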

from h2.

xiaoyawei avatar xiaoyawei commented on June 6, 2024

@seanmonstar

If the IO transport can support vectored writes, h2 will use writev to send both pieces without copying.

I checked the Linux manual (https://linux.die.net/man/2/writev), which says "the data transfers performed by readv() and writev() are atomic: the data written by writev() is written as a single block that is not intermingled with output from writes in other processes". However, it looks like writev over a TCP socket on Linux (Ubuntu 20.04 in my case) does not actually aggregate the two pieces before sending.

Would you be open to improving the heuristic? My proposal is:

  1. Increase CHUNK_THRESHOLD a little, e.g. to 1024
  2. When chaining a payload, still copy the first X bytes into the buffer and chain the rest of the payload, where X is CHUNK_THRESHOLD - buf.remaining()

Either way, the current performance is not ideal for scenarios where large numbers of small payloads (around 1 KB) are being transmitted.
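The proposal above could be sketched roughly like this. The 1024 value comes from the proposal; the function itself is hypothetical and uses a plain `Vec<u8>` in place of h2's real buffer types:

```rust
// Proposed larger threshold (from item 1 of the proposal).
const CHUNK_THRESHOLD: usize = 1024;

/// Copy up to `CHUNK_THRESHOLD - buf.len()` bytes of the payload into the
/// write buffer, and return the remainder to be chained (and sent via
/// writev when the transport supports it).
fn split_payload(buf: &mut Vec<u8>, payload: &[u8]) -> Vec<u8> {
    let room = CHUNK_THRESHOLD.saturating_sub(buf.len());
    let copy_len = room.min(payload.len());
    buf.extend_from_slice(&payload[..copy_len]);
    payload[copy_len..].to_vec()
}

fn main() {
    let mut buf = vec![0u8; 1000]; // buffer already holds 1000 bytes
    let rest = split_payload(&mut buf, &[1u8; 100]);
    assert_eq!(buf.len(), 1024); // topped up to the threshold
    assert_eq!(rest.len(), 76); // remainder gets chained
}
```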


xiaoyawei avatar xiaoyawei commented on June 6, 2024

If you feel comfortable with my proposal or can suggest an alternative, I will be happy to implement it, submit a pull request, etc.


seanmonstar avatar seanmonstar commented on June 6, 2024

writev on a TCP socket will result in the data going together, unless the amount of data is too large for a single segment, of course. However, if using TLS on top, some of the transports don't implement the is_write_vectored() method.

I'm certainly open to improving the throughput! I think that if vectored writes are supported, the threshold should stay small; but if they're not, it could definitely make sense to buffer up to a certain size.
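That suggestion could look roughly like the sketch below. All values are made up for illustration, not h2's actual constants:

```rust
// Keep the chaining threshold small when the transport supports vectored
// writes, and buffer much more when it doesn't, so a non-vectored
// transport avoids a separate write per small chunk.
fn chunk_threshold(is_write_vectored: bool) -> usize {
    if is_write_vectored { 256 } else { 4096 }
}

/// Chain (avoid copying) only when the payload exceeds the transport's
/// threshold; otherwise copy it into the existing write buffer.
fn should_chain(payload_len: usize, is_write_vectored: bool) -> bool {
    payload_len >= chunk_threshold(is_write_vectored)
}

fn main() {
    // A ~1 KB payload is chained on a vectored transport...
    assert!(should_chain(1000, true));
    // ...but copied into the buffer on a non-vectored one.
    assert!(!should_chain(1000, false));
}
```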

It'd help if there were some benchmarks that could show the improvement. hyper has some end-to-end benchmarks for different sizes and numbers of requests; that could be a good place to measure (or to add a case that can then measure).


xiaoyawei avatar xiaoyawei commented on June 6, 2024

writev on a TCP socket will result in the data going together, unless the amount of data is too large for a single segment, of course. However, if using TLS on top, some of the transports don't implement the is_write_vectored() method.

I didn't find many documents about vectored writes. In my environment, the OS is Ubuntu 20.04.3 LTS; two packets are written at once, one of 9 bytes and the other of around 1000 bytes; TLS is not used.

It's possible that I have some improper configs, but after some research I didn't find which Linux settings would control the behavior of writev over a TCP socket. Also keep in mind that whether TCP_NODELAY is set affects the behavior; in my case TCP_NODELAY is on.

I think that if vectored writes are supported, the threshold should stay small. But if it's not supported, it could definitely make sense to have up to a certain size buffering.

Good suggestion; but it might not be easy to tell whether vectored I/O is supported or not. For example, in the implementation of TcpStream in tokio (https://github.com/tokio-rs/tokio/blob/37bb47c4a2aff8913e536767645772f15650e6cd/tokio/src/net/tcp/stream.rs#L1348-L1350), TcpStream::is_write_vectored() always returns true no matter what.

It'd help if there were some benchmarks that could show the improvement.

I will certainly use benchmarks to measure the performance changes.


xiaoyawei avatar xiaoyawei commented on June 6, 2024

For more information: I tested TcpStream::write_vectored() from tokio in my test environment, and the write here is indeed atomic (all payload is concatenated). So it looks like, when using tonic for gRPC, the observed extra TCP segment might be due to something else; I will keep this thread posted.


xiaoyawei avatar xiaoyawei commented on June 6, 2024

@seanmonstar

I think I found (or got closer to) the root cause of my issue: tonic does not actually implement the vectored I/O part for its BoxIo, which is a wrapper around the underlying TCP stream (https://github.com/hyperium/tonic/blob/2325e3293b8a54f3412a8c9a5fcac064fa82db56/tonic/src/transport/service/io.rs#L52-L52).

So it looks like I can help with the following:

  1. Fix the vectored I/O part in tonic
  2. Fix TcpStream::is_write_vectored() in tokio
  3. Improve the heuristics here in h2 when vectored I/O is not supported
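For item 1, the essence of the fix is making the I/O wrapper forward the vectored calls to the inner stream. The sketch below is hypothetical and uses a blocking std::io::Write wrapper for simplicity, in place of tonic's async wrapper:

```rust
use std::io::{self, IoSlice, Write};

// Hypothetical wrapper around an inner writer, analogous to an I/O wrapper
// that boxes the underlying TCP stream. Unless the wrapper explicitly
// forwards write_vectored(), callers fall back to the std default, which
// writes only one buffer per call.
struct Forwarding<W: Write>(W);

impl<W: Write> Write for Forwarding<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
    // The crucial part: forward the vectored call to the inner writer
    // instead of inheriting the one-buffer-at-a-time default.
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
        self.0.write_vectored(bufs)
    }
}

fn main() -> io::Result<()> {
    // Vec<u8>'s Write impl specializes write_vectored to append all
    // buffers, so the forwarding wrapper handles both in a single call.
    let mut w = Forwarding(Vec::new());
    let n = w.write_vectored(&[IoSlice::new(b"header"), IoSlice::new(b"payload")])?;
    assert_eq!(n, 13);
    assert_eq!(w.0, b"headerpayload");
    Ok(())
}
```

In the async case the same idea applies to `poll_write_vectored` and `is_write_vectored` on `AsyncWrite`.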

Let me know what you think. :)


seanmonstar avatar seanmonstar commented on June 6, 2024

I think 1 and 3 would be great. Didn't you say you've already tested that 2 works?


xiaoyawei avatar xiaoyawei commented on June 6, 2024

@seanmonstar

For 2, it looks like it always returns true (https://github.com/tokio-rs/tokio/blob/37bb47c4a2aff8913e536767645772f15650e6cd/tokio/src/net/tcp/stream.rs#L1348-L1350), but the correct implementation should depend on the underlying TcpStream.

Without 2, even if 1 and 3 are fixed, it won't help in the non-vectored case, since h2 will assume vectored I/O is available and use the default implementation of write_vectored(), which actually sends the buffers one at a time.
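The fallback behavior can be demonstrated with plain std::io::Write (this is an illustration with a made-up recording writer, not h2 or tokio code). The std default `write_vectored` writes only the first nonempty buffer per call, so each buffer ends up in its own write, and potentially its own TCP segment:

```rust
use std::io::{self, IoSlice, Write};

// A writer that records every underlying write() call.
struct Recorder {
    calls: Vec<Vec<u8>>,
}

impl Write for Recorder {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls.push(buf.to_vec());
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
    // Note: write_vectored() is deliberately NOT overridden here, so the
    // std default applies.
}

fn main() -> io::Result<()> {
    let mut w = Recorder { calls: Vec::new() };
    let bufs = [IoSlice::new(b"header"), IoSlice::new(b"payload")];
    let n = w.write_vectored(&bufs)?;
    assert_eq!(n, 6); // only "header" was written
    assert_eq!(w.calls.len(), 1); // "payload" needs a second call
    Ok(())
}
```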


seanmonstar avatar seanmonstar commented on June 6, 2024

I believe it always returns true because it has a specialized poll_write_vectored, which uses writev on the socket.

1 and 3 seem higher priority.


xiaoyawei avatar xiaoyawei commented on June 6, 2024

@seanmonstar

Cool, I will do 1 and 3 first.

However, for 2, I looked through the codebase again, and it looks like its poll_write_vectored is implemented as follows (https://github.com/tokio-rs/tokio/blob/37bb47c4a2aff8913e536767645772f15650e6cd/tokio/src/io/poll_evented.rs#L218-L230):

        #[cfg(any(feature = "net", feature = "process"))]
        pub(crate) fn poll_write_vectored<'a>(
            &'a self,
            cx: &mut Context<'_>,
            bufs: &[io::IoSlice<'_>],
        ) -> Poll<io::Result<usize>>
        where
            &'a E: io::Write + 'a,
        {
            use std::io::Write;
            self.registration.poll_write_io(cx, || self.io.as_ref().unwrap().write_vectored(bufs))
        }

Here self.io is actually of type mio::net::TcpStream, whose write_vectored implementation is still based on the TcpStream from the Rust standard library.

At the end of the day, if an OS or device does not support vectored I/O, I feel tokio's TcpStream cannot do much to enable vectored writes without copying a large chunk of memory. So I feel is_write_vectored still needs to be changed in tokio.

Let me know if I misunderstood anything, thanks! ;)

