The view update backlog is propagated in two ways: With respon

View update backlog may not be gossiped if it's already been updated with a response about scylladb HOT 7 OPEN

wmitros commented on May 29, 2024

View update backlog may not be gossiped if it's already been updated with a response

from scylladb.

Comments (7)

wmitros commented on May 29, 2024

@piodul are you familiar with how expensive gossiping is? If we go with the simpler solution here, we'll call add_local_application_state each second. That may be very expensive if there are periods when we're not gossiping at all, or it may be free if we're gossiping each second anyway.

There's also a problem with the second solution: the backlog in a response is only propagated to one node. If we also update the last sent gossip backlog with backlogs sent in responses, we may think we already propagated the backlog when it has actually reached only one node.

Perhaps we should go with a third approach: when sending replies, only note that the backlog has changed since the last gossip round, and still keep the last sent gossip backlog in view_update_backlog_broker. This should avoid the second issue, and gossip will keep sending updates each second only as long as we're performing requests with view updates.
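
A minimal sketch of that third approach, to make the bookkeeping concrete. This is not the ScyllaDB code: backlog_gossip_state, on_backlog_sent_in_reply and on_gossip_tick are hypothetical names, and the backlog is reduced to a single integer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the state kept by view_update_backlog_broker.
struct backlog_gossip_state {
    std::uint64_t last_gossiped = 0;    // value most recently sent through gossip
    bool changed_since_gossip = false;  // set whenever a reply carried a backlog

    // A backlog attached to a response reaches only one node, so it must not
    // overwrite last_gossiped; it only marks that gossip has something to do.
    void on_backlog_sent_in_reply() { changed_since_gossip = true; }

    // Called once per gossip period. Returns the value to gossip, or nothing
    // if neither the backlog nor the flag changed since the last round.
    std::optional<std::uint64_t> on_gossip_tick(std::uint64_t current) {
        if (current == last_gossiped && !changed_since_gossip) {
            return std::nullopt;
        }
        last_gossiped = current;
        changed_since_gossip = false;
        return current;
    }
};
```

With this shape, an unchanged backlog is gossiped again only if a reply went out since the previous round, which is the "only as long as we're performing requests with view updates" behaviour described above.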


piodul commented on May 29, 2024

> @piodul are you familiar with how expensive gossiping is?

In general, AFAIK we try to avoid gossiping data unnecessarily. It probably depends mostly on the size of the data.

The most reasonable solution for me would be to have something like this:

- Have a flag: bool need_publishing;
- When the local view update backlog changes, set need_publishing = true.
- Run a fiber which, every second:
  - If need_publishing is set, updates the application state in gossip and sets need_publishing = false.
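
The single-shard model in this list could be sketched roughly as follows. It is an illustration only: backlog_publisher, set_backlog and tick are made-up names, the publish callback stands in for updating the gossip application state, and tick() stands in for one iteration of the once-per-second fiber (no Seastar is used here).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Single-shard model; all names are illustrative, not ScyllaDB's.
class backlog_publisher {
    std::uint64_t _backlog = 0;
    bool _need_publishing = false;
    // Stand-in for updating the application state in gossip.
    std::function<void(std::uint64_t)> _publish;
public:
    explicit backlog_publisher(std::function<void(std::uint64_t)> publish)
        : _publish(std::move(publish)) {
        assert(static_cast<bool>(_publish)); // a callback must be provided
    }

    // Record a new local backlog; mark it for publishing only if it changed.
    void set_backlog(std::uint64_t value) {
        if (value != _backlog) {
            _backlog = value;
            _need_publishing = true;
        }
    }

    // Body of the periodic fiber: meant to run once per second.
    void tick() {
        if (_need_publishing) {
            _publish(_backlog);
            _need_publishing = false;
        }
    }
};
```

Several changes within one period collapse into a single publish of the latest value, and an idle period publishes nothing.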

This is of course a naive model because it only assumes one shard. In reality, calculating the backlog is done with atomics (see node_update_backlog::add_fetch), so this becomes more complicated.

Perhaps we could have a per-shard, non-atomic need_publishing variable; the fiber would use invoke_on_all and do the check I mentioned on each local shard and would update the application state if the flag was true on any of the shards. This approach would avoid all the concerns related to the ordering of atomics (each shard's backlog is only written by that shard, so we properly serialize with any potential updates of the shard-local backlog).
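
A rough sketch of the per-shard variant, again without Seastar. The loop in collect_for_gossip stands in for the invoke_on_all pass; sharded_backlog and shard_state are hypothetical names, and combining the shard backlogs with max is an assumption of this sketch rather than something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Per-shard state; in the real design each entry would be touched only by
// its owning shard, so no atomics are needed for the flag.
struct shard_state {
    std::uint64_t backlog = 0;
    bool need_publishing = false;
};

class sharded_backlog {
    std::vector<shard_state> _shards;
public:
    explicit sharded_backlog(std::size_t shard_count) : _shards(shard_count) {}

    // Called "on" the given shard when its local backlog changes.
    void set_backlog(std::size_t shard, std::uint64_t value) {
        assert(shard < _shards.size());
        auto& s = _shards[shard];
        if (s.backlog != value) {
            s.backlog = value;
            s.need_publishing = true;
        }
    }

    // Stand-in for the fiber's invoke_on_all pass: visit every shard, clear
    // its flag, and return a combined backlog if any flag was set.
    std::optional<std::uint64_t> collect_for_gossip() {
        bool any = false;
        std::uint64_t combined = 0;
        for (auto& s : _shards) {
            any = any || s.need_publishing;
            s.need_publishing = false;
            combined = std::max(combined, s.backlog); // assumed combination rule
        }
        if (!any) {
            return std::nullopt;
        }
        return combined;
    }
};
```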


wmitros commented on May 29, 2024

After discussing this with @piodul @kostja and @gleb-cloudius, there are a few things worth noting. There are gossip services similar to the view_update_backlog_broker that we're using, particularly the load gossiper and the cache-hit-rate gossiper. The load gossiper broadcasts the load every 60s and the cache-hit-rate gossiper sends its updates every 2s. The gossiped values are used mainly when the node is starting; later they are always obsolete anyway.
The main difference in the view_update_backlog_broker is that it may be completely unused - in contrast to load and cache-hit-rate which are useful in practically every workload. It may also be used periodically, and between the periods, updates from gossip may be needed.
With that in mind, we found a few approaches we can try here:

1. The approach mentioned by @piodul in #18461 (comment), which has the benefits of relatively low complexity and low performance cost.
2. Simply send the view update backlog in each iteration of the gossiping loop. This would have the biggest performance cost but the lowest complexity.
3. Time out view update backlog values received in responses after some time. In this case we would assume that if we didn't get an update from gossip, the backlog dropped to 0 (or to the last gossiped value). This approach would be relatively simple and inexpensive, but would allow a higher temporary discrepancy.
4. Implement another way of propagating view update backlog sizes. Currently the propagation works well as long as there are frequent updates from the same coordinator to the same node; the values propagated using gossip are quite outdated in comparison.
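
To illustrate the time-out option, here is a small sketch of how a coordinator might pick the backlog it believes for a remote node. Everything in it is an assumption made for illustration: remote_backlog, on_response, on_gossip, effective and the ttl parameter are hypothetical, and the fallback chosen is "the last gossiped value" rather than 0.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using clock_type = std::chrono::steady_clock;

// Hypothetical per-remote-node record kept by a coordinator.
struct remote_backlog {
    std::uint64_t from_response = 0;
    clock_type::time_point response_time{};
    bool has_response = false;
    std::uint64_t last_gossiped = 0;

    // A backlog piggybacked on a write response.
    void on_response(std::uint64_t value, clock_type::time_point now) {
        from_response = value;
        response_time = now;
        has_response = true;
    }

    // A backlog received through gossip; assumed here to be newer than any
    // earlier response value, so it replaces it.
    void on_gossip(std::uint64_t value) {
        last_gossiped = value;
        has_response = false;
    }

    // The response value is trusted only while it is younger than ttl;
    // afterwards we fall back to the last gossiped value.
    std::uint64_t effective(clock_type::time_point now,
                            clock_type::duration ttl) const {
        if (has_response && now - response_time < ttl) {
            return from_response;
        }
        return last_gossiped;
    }
};
```

The temporary discrepancy mentioned for this option shows up as the window of up to ttl during which a stale response value is still used.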


avikivity commented on May 29, 2024

If it's gossiped every 60 seconds, is an optimization worthwhile?


kostja commented on May 29, 2024

@avikivity the view update backlog is gossiped every second.


nyh commented on May 29, 2024

@wmitros how does this issue relate to #18462? It seems your original problem statement refers to the case where a zero backlog estimate is not gossiped, so some non-zero estimate sent in some previous request gets kept forever. If this is the problem, then this is exactly issue #18462, and there is no need for both issues.


wmitros commented on May 29, 2024

> @wmitros how does this issue relate to #18462? It seems your original problem statement refers to the case where a zero backlog estimate is not gossiped, so some non-zero estimate sent in some previous request gets kept forever. If this is the problem then this is exactly issue #18462 - no need for both issues.

These issues have similar symptoms, but they are separate. #18462 refers only to receiving "empty" backlogs from gossip, while this issue is about sending repeated backlogs (which are most likely to be 0 as well, but don't have to be).
