Brief Summary Ruby Server SDK calls Eventsource with large payload

Eventsource cannot handle large payloads about ruby-eventsource HOT 18 CLOSED

launchdarkly commented on June 2, 2024

Eventsource cannot handle large payloads

from ruby-eventsource.

Comments (18)

escardin commented on June 2, 2024 1

If you're going to filter watched flags, I would love if it could be filtered by tag. I feel like prefixes is more of a workaround for what tags can do more effectively.

from ruby-eventsource.

baweaver commented on June 2, 2024

https://github.com/launchdarkly/ruby-eventsource/compare/master...baweaver:baweaver/bugs/buffer_cursor_offset?expand=1

This is my current idea for a minimal fix, but need to test it against a few prod systems over the next few days to see if that's correct. Gist of the idea is to introduce a cursor to prevent re-reading chunks. String#index takes an offset to start scanning from, which is very useful for this.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

This was brought to our attention recently via a support request. I believe your description is accurate, and I understand it now better than I did the original report. I'm not sure I understand this part, though:

Current experimentation I've been working on has been to prevent full reads of a partial buffer, but I believe this reflects more on the entirety of flag data being sent in a single message. If it truly is all being sent in a single response this will be a severe issue for larger clients.

By "response" do you mean a single SSE event? The stream connection normally remains open, so all downloaded data from LaunchDarkly makes up a single HTTP response. But it is true that the first SSE event sent on the stream (regardless of whether it's the Ruby SDK or another SDK) contains the full flag data. You also mentioned "large clients" which was a bit confusing, but I'm guessing that what you meant there was "large LaunchDarkly data sets".

from ruby-eventsource.

baweaver commented on June 2, 2024

Yeah, that was poorly worded, let me try that one again.

From observation I believe that the initial flag data provided to init on data stores is given in a single SSE PUT message in the form of a JSON blob, so: data: { entire_flag_payload_here }. Because of our admittedly overkill usage of tokens our flags are approximately 150Mb of data going over the wire.

If that was a single message, non-streamed/chunked, it'd be a bit expensive to deserialize that much JSON. When the response is streamed and chunked there won't be a record break (/[\r\n]/) until the end of the 150Mb of data. That means that the @buffer.index(/[\r\n]/) check will be called for every chunk that's received until the end of that payload, starting from the beginning.

The experiment I'm working on right now introduces a cursor to not traverse already read chunks as a quicker fix.

In the initial message I'd used this example, where every - represents a 1Mb theoretical chunk of 150Mb of data:

buffer = ''
buffer.index(/[\r\n]/)

buffer = '-'
buffer.index(/[\r\n]/)

buffer = '--'
buffer.index(/[\r\n]/)

# ...

buffer = '-' * 150
buffer.index(/[\r\n]/)

The fix tries to do this instead:

buffer = ''
buffer.index(/[\r\n]/)

previous_buffer_size = buffer.size
buffer = '-'
buffer.index(/[\r\n]/, previous_buffer_size)

previous_buffer_size = buffer.size
buffer = '--'
buffer.index(/[\r\n]/, previous_buffer_size)

# ...

previous_buffer_size = buffer.size
buffer = '-' * 150
buffer.index(/[\r\n]/, previous_buffer_size)

...so it'll start scanning where the last @buffer.index left off instead of trying to scan the entire thing again.

To be fair this had me scratching my head for a good week or so trying to figure what was up, and it was hard to think of a good way to approach or fix it, so it took a bit to write this ticket. Sorry about the delay there.

from ruby-eventsource.

baweaver commented on June 2, 2024

I'd run some isolated tests to see if I could see just how much impact that index scan has versus using a positional cursor in conjunction with it:

chunk = '-' * 1000
chunks = [*(([chunk] * 999) * 150), chunk + "\n"]

chunks.join.bytesize.fdiv 10**6 # 1 mil
# => 149.851001 (Mb)

# Full scan
Benchmark.measure do
  start_time = Time.now
  previous_time = start_time

  buffer = ''
  chunks.each_with_index do |c, i|
    if i % 1000 == 0
      current_time = Time.now - start_time.to_f
      puts "i: #{i}, delta_time: #{(current_time - previous_time).round(4)}, time: #{current_time.to_f.round(4)}"
      previous_time = current_time
    end

    buffer << c
    break true if buffer.index("\n")
  end
end

# i: 1000, delta_time: 0.0144, time: 0.0144
# i: 2000, delta_time: 0.0388, time: 0.0533
# i: 3000, delta_time: 0.051, time: 0.1043
# ...
# i: 43000, delta_time: 2.4448, time: 46.5457
# i: 44000, delta_time: 2.4821, time: 49.0279
# i: 45000, delta_time: 2.624, time: 51.6519
# ...
# i: 147000, delta_time: 9.4535, time: 648.7023
# i: 148000, delta_time: 9.5025, time: 658.2048
# i: 149000, delta_time: 9.0586, time: 667.2634

# Positional scan
Benchmark.measure do
  start_time = Time.now
  previous_time = start_time

  buffer = ''
  chunks.each_with_index do |c, i|
    if i % 1000 == 0
      current_time = Time.now - start_time.to_f
      puts "i: #{i}, delta_time: #{(current_time - previous_time).round(4)}, time: #{current_time.to_f.round(4)}"
      previous_time = current_time
    end

    pos = buffer.size
    buffer << c
    break true if buffer.index("\n", pos)
  end
end

# i: 142000, delta_time: 0.0007, time: 0.1223
# i: 143000, delta_time: 0.0007, time: 0.123
# i: 144000, delta_time: 0.0008, time: 0.1238
# i: 145000, delta_time: 0.0007, time: 0.1245
# i: 146000, delta_time: 0.0007, time: 0.1252
# i: 147000, delta_time: 0.0007, time: 0.1259
# i: 148000, delta_time: 0.0007, time: 0.1266
# i: 149000, delta_time: 0.0007, time: 0.1274

667.2634 / 0.1274
# => 5218.621638954628 (times faster)

This looks to be a viable fix to try, and I'll test it with our production servers on Monday. Any thoughts?

from ruby-eventsource.

eli-darkly commented on June 2, 2024

It looks plausible at first glance, we'll just need to review it in more detail.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

Looking at the branch with your proposed fix, there's one thing I'm not sure is right. It's here, where you are moving @cursor_position to the current end-of-buffer immediately before appending each new chunk of data to the buffer.

It seems to me that a more logical place to do this would be within read_line, immediately after you have scanned to the end of the buffer and determined that there isn't a line break in it. Currently I think the result is the same, since read_chunk_into_buffer is only ever called from that same place, but logically it is more related to what read_line is doing.

Either way, I think there's an edge case where it could end up being wrong— but it's an edge case that I think the existing code is already vulnerable to. SSE allows the line break to be \r, \n, or \r\n. So, if we read some data and it ends in \r, we could consider that to be the end of a line but we do not want a subsequent \n at the beginning of the next chunk to be treated as another line break; it should be treated as just an optional suffix to the previous \r. Handling that properly requires keeping track of a little extra state. So I'm going to try to address that in the process of integrating this fix.

(Edge cases like that are a common pitfall in writing an SSE implementation; I'm pretty sure we ran into this one before on another platform, but missed it in Ruby. For that reason, we're looking into implementing a standardized set of acceptance/contract tests to be run against all of our SSE implementations, separate from the higher-level SDK testing.)

from ruby-eventsource.

eli-darkly commented on June 2, 2024

I'm also not sure that our use of @lock was actually necessary; if there is a way that read_line or read_chunk_into_buffer would ever get called from more than one thread, I'm not seeing it.

from ruby-eventsource.

baweaver commented on June 2, 2024

Yes, and to be clear these are mostly theoretical changes that look decent on paper. I would have to vet out edge cases as well and you know far more than me on how SSEs work. Still intending on giving this a test, but other projects have been demanding attention lately.

I believe the larger change, which has a much larger footprint, would be around sending the entire payload as one SSE versus potential sharding patterns for it, or filtering on the server side before they're even sent down. My fear here is that we won't be the only clients with that large of a payload in the future, so it may be a discussion worth having now.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

Sending the payload as multiple events would be a much, much larger change, incompatible with all our existing SDKs and affecting multiple layers of the service infrastructure. It's unlikely that that will happen in the foreseeable future. Adding some kind of filtering mechanism is more feasible and it's been requested as a feature before.

from ruby-eventsource.

baweaver commented on June 2, 2024

Agreed that sharding payloads is a major breaking change. Server-side filtering would be far more likely and immediately useful, and if a change like this works it'll buy some more time to address it as an option.

I know we've written filtering at a client-side level for data stores, but ideally the filtering is done on the server side before payloads are sent.

from ruby-eventsource.

zellyn commented on June 2, 2024

We've actually added filtering (at the flag data store interface level) in our Go and Java implementations, so that apps suffering from memory ballooning due to LD can explicitly specify the list of flags (or flag name prefixes) they care about. Everything is still downloaded initially, and streamed, but only the specified flags are kept.

Our ideal client SDK would probably look something like this:

allow clients to optionally specify a list of flags/prefixes on initial connection
send only those flags/prefixes in the initial dump
send only those flags/prefixes in subsequent updates
if the client requests a flag that is not currently present:
- log it (once)
- fetch it explicitly
- add it to the list of flags so that it is streamed for the rest of the connection's lifetime
- (optionally) add it to the list of flags for all connections to apps with the same name/identity

Segments would presumable be handled the same way as flags, with the possible optimization of including any segments mentioned in a wanted flag.

The big question would be whether keeping the flag list per connection in Relay would be expensive. If it's a problem, some sort of copy-on-write/coalesce-on-collision system could allow all 50 instances of the same app to (eventually) share the same list of flags.

from ruby-eventsource.

baweaver commented on June 2, 2024

We might spin those discussions off into a separate ticket, don't want to overload this too much. Also I added a Ruby implementation for local filtering, so that exists too @zellyn.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

Filtering would be a general feature request in the entire product, not specific to the Ruby SDK. And it's a request we've received before, so there's no need to file it again; I'll just make a note that there's more interest in it. Let's table that here, so that this GitHub issue on ruby-eventsource can remain about the ruby-eventsource problem.

(Although - @zellyn, I do want to address one thing you mentioned above, as something that would probably not be in any future filtering implementation: the idea that if the client needed a flag it had not previously received, it would fetch the flag from LD or from Relay at that time. We've deliberately never implemented anything in the SDKs that would involve making HTTP requests to the service endpoints on demand during a flag evaluation— that is just too much latency and unpredictability. The SDKs do all of their HTTP dialogue in the background, updating their internal state accordingly, so that flag evaluations can simply reference that internal state. The only exception is that in server-side SDKs which have database integrations, some of that state might need to be requeried from a database if it's not currently cached.)

from ruby-eventsource.

eli-darkly commented on June 2, 2024

@baweaver We're reviewing a PR now that has a number of improvements to stream reading, including the one you suggested. We should be able to release the fix this week.

from ruby-eventsource.

zellyn commented on June 2, 2024

We've deliberately never implemented anything in the SDKs that would involve making HTTP requests to the service endpoints on demand during a flag evaluation— that is just too much latency and unpredictability. The SDKs do all of their HTTP dialogue in the background, updating their internal state accordingly, so that flag evaluations can simply reference that internal state.

Absolutely. Our internal home-built feature flag system works exactly the same way: all evaluations are local, based on asynchronously updated rules that can be evaluated. I don't think one synchronous evaluation for a flag would hurt, although it would also be acceptable to return the default, log a warning (so that clients can add the flag to the list of initially-requested flags), and fetch the value asynchronously.

Filtering would be a general feature request in the entire product, not specific to the Ruby SDK. And it's a request we've received before, so there's no need to file it again; I'll just make a note that there's more interest in it.

Indeed, it is seeming more and more likely that it is going to become a requirement for Square to be able to successfully use LaunchDarkly.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

@baweaver The changes in that PR have been released and the improved SSE client is now in the 6.2.5 release of the Ruby SDK. Please let us know if you see improvements in performance after updating to this version.

from ruby-eventsource.

eli-darkly commented on June 2, 2024

Closing this because implementation changes in the last several releases have made it obsolete.

from ruby-eventsource.

Eventsource cannot handle large payloads about ruby-eventsource HOT 18 CLOSED

Comments (18)

Related Issues (6)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs