
Comments (7)

wmitros commented on May 29, 2024

@piodul are you familiar with how expensive gossiping is? If we go with the simpler solution here, we'll have an add_local_application_state call each second, which may be very expensive if there are periods when we're not gossiping at all, or it may be free if we're gossiping each second anyway.

There's also a problem with the second solution - the backlog in a response is only propagated to one node, so if we also update the last sent backlog in gossip with backlogs sent in responses, we may think we already propagated the backlog when it has actually been propagated to only one node.

Perhaps we should go with a third approach - when sending replies, only note that the backlog has changed since the last gossip round and still keep the last sent gossip backlog in view_update_backlog_broker. This should avoid the second issue, and gossip will keep sending updates each second only as long as we're performing requests with view updates.

from scylladb.

piodul commented on May 29, 2024

> @piodul are you familiar with how expensive gossiping is?

In general, AFAIK we try to avoid gossiping data unnecessarily. It probably depends mostly on the size of the data.

The most reasonable solution for me would be to have something like this:

  • Have a flag bool need_publishing;
  • When the local view update backlog changes, set need_publishing = true
  • Run a fiber which, every second:
    • If need_publishing then update the application state in gossip and set need_publishing = false
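The steps above can be sketched roughly as follows. This is a minimal single-shard model, not the actual ScyllaDB code: the class name `backlog_publisher`, the `publish` callback (a stand-in for updating the application state in gossip) and the `tick()` method (a stand-in for one iteration of the once-per-second fiber) are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical single-shard model of the "need_publishing" idea.
class backlog_publisher {
    std::uint64_t _backlog = 0;
    bool _need_publishing = false;
    // Stand-in for updating the gossip application state (assumption).
    std::function<void(std::uint64_t)> _publish;
public:
    explicit backlog_publisher(std::function<void(std::uint64_t)> publish)
        : _publish(std::move(publish)) {
        assert(_publish != nullptr);
    }

    // Called whenever the local view update backlog is recomputed.
    // The flag is raised only if the value actually changed.
    void set_backlog(std::uint64_t value) {
        if (value != _backlog) {
            _backlog = value;
            _need_publishing = true;
        }
    }

    // One iteration of the periodic fiber (assumed to run every second).
    // Returns true if a value was published in this iteration.
    bool tick() {
        if (!_need_publishing) {
            return false;
        }
        _publish(_backlog);
        _need_publishing = false;
        return true;
    }
};
```

With this shape, several backlog changes within one interval collapse into a single publish of the latest value, and an idle node publishes nothing.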

This is of course a naive model because it only assumes one shard. In reality, calculating the backlog is done with atomics (see node_update_backlog::add_fetch), so this becomes more complicated.

Perhaps we could have a per-shard, non-atomic need_publishing variable; the fiber would use invoke_on_all and do the check I mentioned on each local shard and would update the application state if the flag was true on any of the shards. This approach would avoid all the concerns related to the ordering of atomics (each shard's backlog is only written by that shard, so we properly serialize with any potential updates of the shard-local backlog).
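A rough sketch of the per-shard variant, under stated assumptions: shards are simulated with a plain vector and a loop stands in for `invoke_on_all`, so this does not use Seastar at all. The names `shard_state`, `sharded_backlog_model` and `collect` are hypothetical, and publishing the maximum backlog across shards is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-shard state. The flag is a plain bool because, in the proposed
// design, each shard only reads and writes its own copy.
struct shard_state {
    std::uint64_t backlog = 0;
    bool need_publishing = false;
};

// Hypothetical model: a vector simulates the shards.
class sharded_backlog_model {
    std::vector<shard_state> _shards;
public:
    explicit sharded_backlog_model(std::size_t shard_count)
        : _shards(shard_count) {}

    // Runs "on" the given shard: update its backlog, raise its flag on change.
    void set_backlog(std::size_t shard, std::uint64_t value) {
        assert(shard < _shards.size());
        shard_state& s = _shards[shard];
        if (s.backlog != value) {
            s.backlog = value;
            s.need_publishing = true;
        }
    }

    // One iteration of the periodic fiber. Visits every shard (the loop is a
    // stand-in for invoke_on_all), clears each flag, and reports whether any
    // shard had changed. On true, `out` holds the value to publish
    // (assumed here to be the maximum across shards).
    bool collect(std::uint64_t& out) {
        bool any_changed = false;
        std::uint64_t max_backlog = 0;
        for (shard_state& s : _shards) {
            any_changed = any_changed || s.need_publishing;
            s.need_publishing = false;
            max_backlog = std::max(max_backlog, s.backlog);
        }
        if (any_changed) {
            out = max_backlog;
        }
        return any_changed;
    }
};
```

Because each flag is checked and cleared in the same step on its own shard, a change made after the check simply raises the flag again and is picked up in the next iteration.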


wmitros commented on May 29, 2024

After discussing this with @piodul @kostja and @gleb-cloudius, there are a few things worth noting. There are gossip services similar to the view_update_backlog_broker that we're using, particularly the load gossiper and the cache-hit-rate gossiper. The load gossiper broadcasts the load every 60s and the cache-hit-rate gossiper sends its updates every 2s. The gossiped values are used mainly when the node is starting; later they are always obsolete anyway.
The main difference in the view_update_backlog_broker is that it may be completely unused - in contrast to load and cache-hit-rate which are useful in practically every workload. It may also be used periodically, and between the periods, updates from gossip may be needed.
With that in mind, we found a few approaches we can try here:

  1. The approach mentioned by @piodul in #18461 (comment), which has the benefits of relatively low complexity and low performance cost
  2. Simply send the view update backlog in each iteration of the gossiping loop. This would have the biggest performance cost but the lowest complexity
  3. Time out view update backlog values received in responses after some time - in this case we would assume that if we didn't get an update from gossip, the backlog dropped to 0 (or to the last gossiped value). This approach would be relatively simple and inexpensive, but would allow a higher temporary discrepancy
  4. Implement another way of propagating view update backlog sizes. Currently the propagation works well as long as there are frequent updates from the same coordinator to the same node; the values propagated using gossip are quite outdated in comparison.
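To make option 3 more concrete, here is a minimal sketch of how a coordinator-side estimate could expire values learned from responses. Everything here is an assumption for illustration: the class name `remote_backlog_estimate`, the TTL parameter, and the rule that a fresh response value wins and otherwise the last gossiped value (0 if none) is used. The current time is passed in explicitly so the behavior is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using clock_type = std::chrono::steady_clock;

// Hypothetical per-replica estimate kept by a coordinator.
class remote_backlog_estimate {
    std::uint64_t _gossiped = 0;       // last value received via gossip
    std::uint64_t _from_response = 0;  // last value piggybacked on a response
    clock_type::time_point _response_time{};
    bool _has_response = false;
    clock_type::duration _ttl;         // how long a response value stays valid
public:
    explicit remote_backlog_estimate(clock_type::duration ttl) : _ttl(ttl) {
        assert(ttl > clock_type::duration::zero());
    }

    void on_gossip(std::uint64_t value) {
        _gossiped = value;
    }

    void on_response(std::uint64_t value, clock_type::time_point now) {
        _from_response = value;
        _response_time = now;
        _has_response = true;
    }

    // A response value is trusted only while it is younger than the TTL;
    // after that we fall back to the last gossiped value.
    std::uint64_t current(clock_type::time_point now) const {
        if (_has_response && now - _response_time < _ttl) {
            return _from_response;
        }
        return _gossiped;
    }
};
```

The TTL bounds how long a stale high backlog from a response can keep throttling a coordinator, at the cost of the temporary discrepancy mentioned in option 3.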


avikivity commented on May 29, 2024

If it's gossiped every 60 seconds, is an optimization worthwhile?


kostja commented on May 29, 2024

@avikivity the view update backlog is gossiped every second.


nyh commented on May 29, 2024

@wmitros how does this issue relate to #18462? It seems your original problem statement refers to the case where a zero backlog estimate is not gossiped, so some non-zero estimate sent in some previous request gets kept forever. If this is the problem, then this is exactly issue #18462 - no need for both issues.


wmitros commented on May 29, 2024

> @wmitros how does this issue relate to #18462? It seems your original problem statement refers to the case where a zero backlog estimate is not gossiped, so some non-zero estimate sent in some previous request gets kept forever. If this is the problem, then this is exactly issue #18462 - no need for both issues.

These issues have similar symptoms, but they are separate. #18462 refers only to receiving "empty" backlogs from gossip, while this issue is about sending repeated backlogs (which are most likely to be 0 as well, but don't have to be).

