Too late packets. about srt-rs HOT 28 CLOSED

Lighty0410 avatar Lighty0410 commented on June 12, 2024
Too late packets.

from srt-rs.

Comments (28)

robertream avatar robertream commented on June 12, 2024 1

@nipierre If you'd like to collaborate on this (e.g. video chat, pairing, etc.), I could find some time in my schedule. I just don't have the remaining cognitive capacity to see this through to resolution on my own. If the problem is reproducible locally, it should be feasible to characterize it in an automated test.

robertream avatar robertream commented on June 12, 2024 1

It's drop_too_late_packets that appears to be broken. This line of code is not the problem. Calling drop_too_late_packets should only drop packets if there are late packets. Could you show me how I can reproduce this behavior locally? I can also schedule some time to review the drop_too_late_packets tests and implementation together. It could be helpful to have an extra pair of eyes scrutinizing the test cases and implementation.

robertream avatar robertream commented on June 12, 2024 1

@nipierre could you run your tests against PR #209?

russelltg avatar russelltg commented on June 12, 2024 1

Done

robertream avatar robertream commented on June 12, 2024

I don't have much time to look at this right now, but here are a few thoughts:

  1. It looks like the statistics interval for the sender was set to 3ms?? 1s should really be sufficient; I can imagine that such a short interval could impact timing and performance.

  2. The reference implementation receiver probably turns off late packet drop if the TLPKTDROP flag is not set by the sender during handshake. I don't think srt-rs currently sets that by default, but I could be wrong.

  3. There could be a bug in the drop_too_late_packets function on ReceiveBuffer?

  4. The timestamp drift recovery has not been well tested, this would affect relative time of packets in the receive buffer.

  5. I'm curious if the srt-rs sender is dropping the packets first? We didn't test RTO extensively, retransmissions could be timing out? max_flow_size could be the setting to adjust. Hitting this limit on the sender will stall transmission and delay packets.

Lighty0410 avatar Lighty0410 commented on June 12, 2024

Thank you for the answer! I increased the statistics interval to 3 seconds, and now it seems like there are no issues with the packet drops. RTT is 95ms and the receive/send latency is 1 second. I'll keep testing with different settings and post feedback here.

robertream avatar robertream commented on June 12, 2024

@Lighty0410 could we possibly close this one out too?

Lighty0410 avatar Lighty0410 commented on June 12, 2024

I would love to close this issue. However, I couldn't find settings that work after a week of testing. I can write a unit test that covers the fixes required for this issue, but since I lack expertise in the SRT protocol itself, could you elaborate on what should be unit-tested?
Thanks in advance!

robertream avatar robertream commented on June 12, 2024

So you're still seeing packet loss issues, in spite of the change to the statistics interval?

Lighty0410 avatar Lighty0410 commented on June 12, 2024

Yeah. That's why I crossed out the text. I tried different settings according to the SRT RFCs/documentation, with no result whatsoever.

robertream avatar robertream commented on June 12, 2024

@Lighty0410 look at the rr/rdrop branch I pushed. I'm wondering if you adjust the latency tolerance, if this would help. The value is hard coded right now, not configurable, so you'll have to work with a local build. Maybe experiment with a large tolerance like 100ms?

Lighty0410 avatar Lighty0410 commented on June 12, 2024

Thanks a lot ! I'm gonna test it asap.

Lighty0410 avatar Lighty0410 commented on June 12, 2024

Pretty interesting. I've done some testing and here are the results.
The way I tested:

  1. Changed this function for easier debugging:
    pub fn next_data(&mut self, now: Instant) -> Option<(Instant, Bytes)> {
        match self.receiver.arq.pop_next_message(now) {
            // A complete message is ready: log it and hand it to the caller.
            Ok(Some(data)) => {
                self.debug(now, "output", &data);
                Some(data)
            }
            // Packets were dropped as too late: log the reported delay
            // and count the dropped sequence range in the statistics.
            Err(error) => {
                // self.warn(now, "output", &error);
                warn!(
                    "delay in millis: {:?}",
                    Duration::from_micros(error.delay.as_micros() as u64)
                );
                let dropped = error.too_late_packets.end - error.too_late_packets.start;
                self.stats.rx_dropped_data += dropped as u64;
                None
            }
            // Remaining case (Ok(None)): nothing to return.
            _ => None,
        }
    }
  2. Changed the latency window to different values (5/20/50/100/200). Here are the results for each value: https://gist.github.com/Lighty0410/925325a86b7f7a4b7cb5e1cf5a93c066

So, as you can see, it doesn't matter what the latency_window value is: there's always a delay. Can you confirm or refute my assumption that MessageError.delay = latency_window + real delay? If so, the real delay is MessageError.delay - latency_window. For example, real_delay (3.185ms) = MessageError.delay (103.185ms) - latency_window (100ms). In that case, on average I get packets that are 1-10ms later than they should be, and they should probably be tolerated. That leads me to another assumption: there's probably something wrong with the tolerance calculation/handling, because no matter which value I set, the messages always get dropped. Any tips on where to look next?
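
The arithmetic in that assumption can be written out as a small Rust sketch. The relation delay = latency_window + real delay is the unconfirmed hypothesis from this comment, and real_delay is a hypothetical helper for illustration, not part of srt-rs.

```rust
use std::time::Duration;

/// Hypothetical helper (not srt-rs API). Assumes, without confirmation,
/// that the reported delay equals latency_window + real delay.
/// Returns None when the reported delay is smaller than the window,
/// which would contradict that assumption.
fn real_delay(reported_delay: Duration, latency_window: Duration) -> Option<Duration> {
    reported_delay.checked_sub(latency_window)
}
```

With the numbers quoted above, a reported delay of 103.185ms and a latency_window of 100ms give a real delay of 3.185ms.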

nipierre avatar nipierre commented on June 12, 2024

Hi, bumping in here.
We are experiencing the same problem as @Lighty0410, whatever values we put as latency (from 120 ms to 2s) we end up having dropped packets even though we are on the same machine (and thus one should naïvely expect that everything should be ok).
We're probably gonna investigate on our side as it is a huge hurdle on our critical path...

nipierre avatar nipierre commented on June 12, 2024

I'll try to wrap my head around it first and then come back to you with, I hope, a clearer view of the problem :-)

nipierre avatar nipierre commented on June 12, 2024

Hi @robertream !
I think I have a much clearer view of the problem. IMO there are two issues, plus one question:

  • First, after ~30 min of streaming, too-late packets start appearing for no apparent reason. The setup is one sender and one receiver on the same machine, sending text data in binary format to each other. I cannot see why this happens.
  • Second, I saw that you implemented some options of the SRT socket, namely too_late_packet_drop for the Receiver:
    pub too_late_packet_drop: bool,
    However, this option has no effect yet on drop_too_late_packets in the buffer. That's something I'd like to implement to work around the first point: on my setup, I removed this drop
    return match self.drop_too_late_packets(now) {
    by commenting it out, and it worked fine. It might also indicate where things are going wrong?
  • Lastly, I read the SRT spec (https://haivision.github.io/srt-rfc/draft-sharabayko-srt.html#name-too-late-packet-drop) and saw that the threshold for too-late packet drop should be 1.25 times the latency (if I understand correctly). Does that correspond to this?
    let latency_window = self.tsbpd_latency + Duration::from_millis(5);

If I implement the second bullet, is that OK for you? It will unblock us for the time being :-)
See nipierre@999af6f.
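
To make the second and third bullets concrete, here is a minimal Rust sketch of a drop routine gated by the too_late_packet_drop option, alongside the two thresholds being compared. SketchBuffer, its packets field, and spec_threshold are hypothetical stand-ins for illustration only; they are not the srt-rs ReceiveBuffer or the change in the commit linked above, and the 1.25 factor is the reading of the spec given in this comment.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for a receive buffer (not the srt-rs type).
struct SketchBuffer {
    too_late_packet_drop: bool,
    tsbpd_latency: Duration,
    /// Origin time of each buffered packet on the local clock, oldest first.
    packets: VecDeque<Instant>,
}

impl SketchBuffer {
    /// The window quoted in the thread: latency plus a fixed 5ms tolerance.
    fn latency_window(&self) -> Duration {
        self.tsbpd_latency + Duration::from_millis(5)
    }

    /// Drops packets older than the latency window and returns how many were
    /// dropped. When the option is disabled, nothing is ever dropped.
    fn drop_too_late_packets(&mut self, now: Instant) -> usize {
        if !self.too_late_packet_drop {
            return 0;
        }
        let window = self.latency_window();
        let mut dropped = 0;
        while let Some(&origin) = self.packets.front() {
            if now.saturating_duration_since(origin) > window {
                self.packets.pop_front();
                dropped += 1;
            } else {
                break;
            }
        }
        dropped
    }
}

/// Threshold as read from the spec in the comment above: 1.25 x latency.
fn spec_threshold(latency: Duration) -> Duration {
    latency * 5 / 4
}
```

For a 120ms latency, the sketch's latency_window is 125ms while spec_threshold gives 150ms, so the two formulas are not equivalent.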

robertream avatar robertream commented on June 12, 2024

Yes, please do implement the second bullet. I added all the relevant options from the reference implementation, and I recall there may have been more than one that wasn't wired up to actual functionality.

To fix the "too late packet drop" behavior, we should start with an isolation test for the expected threshold and packet drop behavior. I'm pretty sure there are tests that are supposed to cover this scenario, but they are likely not written appropriately.
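
As a rough idea of what such an isolation test could pin down, the sketch below isolates the drop decision as a pure predicate so that its boundary cases can be asserted directly. is_too_late, release_at and tolerance are hypothetical names, and the rule itself (drop only when now is past the release time by more than the tolerance) is an assumption about the intended behavior, not the srt-rs implementation.

```rust
use std::time::{Duration, Instant};

/// Hypothetical predicate (not srt-rs API): a packet counts as too late only
/// when `now` is later than its release time by more than `tolerance`.
/// A packet whose release time is still in the future is never too late.
fn is_too_late(now: Instant, release_at: Instant, tolerance: Duration) -> bool {
    match now.checked_duration_since(release_at) {
        Some(lateness) => lateness > tolerance,
        // `release_at` is after `now`: the packet is early, not late.
        None => false,
    }
}
```

A unit test built on this would cover at least three cases: a packet ahead of its release time, a packet late by exactly the tolerance, and a packet late by slightly more than the tolerance. Only the last one should be dropped.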

nipierre avatar nipierre commented on June 12, 2024

I've done the implementation in #208.

I'll try to think of an isolation test.

nipierre avatar nipierre commented on June 12, 2024

Just so I understand: does one really want to drop packets in this case?

None => return self.drop_too_late_packets(now),

In my tests, this is where it fails after some time (30-ish minutes), but it seems to me that it's dropping parts of the message it's supposed to buffer.
To illustrate:

2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] FIRST (DATA PACKET): {DATA sn=1348615918 loc=PacketLocation(FIRST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"<?xml ve"]}
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] COUNT NOT DONE: 1
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615918 loc=PacketLocation(FIRST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"<?xml ve"]}
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] COUNT NOT DONE: 2
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.604621|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615918)..SeqNumber(1348615919), delay: -00:00.118540 }
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615919 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"egere as"]}
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.604153|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615919)..SeqNumber(1348615920), delay: -00:00.118072 }
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615920 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"lectat, "]}
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.603496|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615920)..SeqNumber(1348615921), delay: -00:00.117415 }
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615921 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"feriorem"]}
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,095 WARN [srt_protocol::connection] -35:45.603208|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615921)..SeqNumber(1348615922), delay: -00:00.117128 }
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615922 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"spexit i"]}
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,095 WARN [srt_protocol::connection] -35:45.602953|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615922)..SeqNumber(1348615923), delay: -00:00.116873 }
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615923 loc=PacketLocation(LAST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=768, start=b" vacuita"]}
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,113 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615923 loc=PacketLocation(LAST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=768, start=b" vacuita"]}

nipierre avatar nipierre commented on June 12, 2024

I created this repo (https://github.com/nipierre/srt-rs-testing) with a sender and a receiver.
Launch the receiver and the sender:
    cargo run --bin receiver -- -p 7777 -l -v
    cargo run --bin sender -- -p 7777 -v
Then wait for 30+ minutes to see packets drop.

nipierre avatar nipierre commented on June 12, 2024

@robertream Seems ok for me !
After merge, would it be possible to tag a version so that we can incorporate the fix ?

robertream avatar robertream commented on June 12, 2024

@Lighty0410 would you have some time for testing this too?

robertream avatar robertream commented on June 12, 2024

@russelltg what needs to be done in order to publish this release to crates.io?

russelltg avatar russelltg commented on June 12, 2024

Did you end up force pushing the tag? It seems I have an Actions job to publish the crates automatically, so if so, it'll be the first one that got published...

russelltg avatar russelltg commented on June 12, 2024

Anyways, 0.4.2 is def on crates.io. If something missed that original tag we should make a 0.4.3

robertream avatar robertream commented on June 12, 2024

Oh, whoops. Do you have time to make a v0.4.3? I'm busy now.

robertream avatar robertream commented on June 12, 2024

@nipierre I can close this issue if your testing demonstrates this issue has been fixed

nipierre avatar nipierre commented on June 12, 2024

@robertream Sorry, missed your message, ofc you can close it :)
