Comments (11)

shlomi-noach commented on June 22, 2024

Observability: we should be able to track why a certain client was throttled, i.e. which specific metric it was throttled on.

from vitess.

shlomi-noach avatar shlomi-noach commented on June 22, 2024

Throttler check requests (mostly via throttler clients) should be able to specify the list of metrics on which they wish to throttle (e.g. "I care about replication lag, but I'm fine ignoring load average").

  • The set of metrics specified by the client is ANDed together, i.e. if the client chooses to throttle based on lag,loadavg then both lag and loadavg must individually pass for the overall check to pass.

    I don't think it makes sense to OR them, or to combine them in any other way.
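A minimal Go sketch of these AND semantics, assuming a metric passes when its value does not exceed its threshold. `MetricResult` and `CheckAll` are hypothetical names for illustration, not the Vitess throttler API. Returning the first failing metric also serves the observability point above: the caller learns which metric it was throttled on.

```go
package main

import "fmt"

// MetricResult is a hypothetical per-metric check outcome (not a Vitess type).
type MetricResult struct {
	Name      string
	Value     float64
	Threshold float64
}

// Passed reports whether the metric is within its threshold.
// Assumption: a value equal to the threshold still passes.
func (r MetricResult) Passed() bool {
	return r.Value <= r.Threshold
}

// CheckAll ANDs the requested metrics: the overall check passes only if every
// metric passes individually. On failure it returns the name of the first
// failing metric, so the client can tell why it was throttled.
func CheckAll(results []MetricResult) (ok bool, failedMetric string) {
	for _, r := range results {
		if !r.Passed() {
			return false, r.Name
		}
	}
	return true, ""
}

func main() {
	results := []MetricResult{
		{Name: "lag", Value: 0.8, Threshold: 5},
		{Name: "loadavg", Value: 1.7, Threshold: 1.0},
	}
	ok, failed := CheckAll(results)
	fmt.Println(ok, failed) // false loadavg
}
```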

shlomi-noach commented on June 22, 2024

As mentioned above, we want to be able to change the list of considered metrics while, for example, an Online DDL operation is running. We may want Online DDL to start throttling based on both lag and load average, and later on to stop throttling based on load average and remain with just lag.

IMO the way to do that is to associate metrics with an app name. All Online DDL operations use the app name "online-ddl", so we would associate "online-ddl": "lag,loadavg".

That association will then either

  • make its way to the throttler client, which then provides the throttler with the list of metrics it is interested in,
  • or be computed by the throttler on behalf of the client, keeping the throttler client ignorant of it.
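The second option (the throttler computing the list on the client's behalf) could look roughly like this Go sketch. `AppMetrics`, `Set` and `Get` are hypothetical names; the point is that the mapping is mutable at runtime, so a running Online DDL operation picks up a changed list on its next check.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// AppMetrics is a hypothetical registry mapping an app name (e.g. "online-ddl")
// to the metrics it is throttled on. Clients only send their app name; the
// throttler looks the metrics up on their behalf.
type AppMetrics struct {
	mu      sync.RWMutex
	metrics map[string][]string
}

func NewAppMetrics() *AppMetrics {
	return &AppMetrics{metrics: make(map[string][]string)}
}

// Set replaces an app's metric list from a comma-separated value such as
// "lag,loadavg". It may be called while the app is running; the next check
// sees the new list.
func (a *AppMetrics) Set(app, csv string) {
	var names []string
	for _, m := range strings.Split(csv, ",") {
		if m = strings.TrimSpace(m); m != "" {
			names = append(names, m)
		}
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	a.metrics[app] = names
}

// Get returns a copy of the app's metric list, or nil if none is assigned.
func (a *AppMetrics) Get(app string) []string {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return append([]string(nil), a.metrics[app]...)
}

func main() {
	reg := NewAppMetrics()
	reg.Set("online-ddl", "lag,loadavg")
	fmt.Println(reg.Get("online-ddl")) // [lag loadavg]
	// Later on: stop throttling Online DDL on load average.
	reg.Set("online-ddl", "lag")
	fmt.Println(reg.Get("online-ddl")) // [lag]
}
```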

shlomi-noach commented on June 22, 2024

Metrics can be collected from the single tablet being probed, or from the shard as a whole.

  • Replication lag is normally something you wish to collect from the entire shard (including the primary), because you want to know about the replicas' lag. There is a strong reason to check all shard servers.
  • What about load average? Are you concerned with the load average on the PRIMARY, or with the metric on the replicas? There is no clear answer, though you probably want to check the PRIMARY only.

To that effect:

  • Each metric is associated with a scope (self/shard) and has a default scope: lag uses shard, others use self.
  • A normal check uses the default scope of each metric.
  • But the user may also indicate "I wish to check the entire shard for all metrics" or "I wish to check self scope for all metrics", in which case we override the metrics' default scopes.
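A minimal Go sketch of scope resolution under the rules above: a check-wide override, when given, wins over each metric's default scope. `Scope`, `DefaultScope` and `EffectiveScope` are hypothetical names, and only `lag` is assumed to default to `shard`.

```go
package main

import "fmt"

// Scope says where a metric is collected: the probed tablet itself or the whole shard.
type Scope string

const (
	NoOverride Scope = ""
	SelfScope  Scope = "self"
	ShardScope Scope = "shard"
)

// DefaultScope returns a metric's default scope: lag defaults to shard,
// all other metrics to self.
func DefaultScope(metric string) Scope {
	if metric == "lag" {
		return ShardScope
	}
	return SelfScope
}

// EffectiveScope applies a check-wide override ("check the entire shard for all
// metrics" or "check self scope for all metrics") when one is given, and
// otherwise falls back to the metric's default scope.
func EffectiveScope(metric string, override Scope) Scope {
	if override != NoOverride {
		return override
	}
	return DefaultScope(metric)
}

func main() {
	for _, m := range []string{"lag", "loadavg"} {
		fmt.Println(m, EffectiveScope(m, NoOverride), EffectiveScope(m, SelfScope))
	}
	// lag shard self
	// loadavg self self
}
```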

Moreover, consider the discussion in the previous comment about associating metrics with apps. It then becomes possible to fine-tune the checks even further by associating "online-ddl": "lag,shard/loadavg". Note:

  • the scope is optional (nothing is declared for lag, so lag gets this metric's default scope, which happens to be shard).
  • per-metric scopes are ignored by the self-checks, which are the mechanism by which the tablets collect their own metrics and by which the PRIMARY tablet collects metrics from the replicas.
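Parsing an app's metric spec such as `lag,shard/loadavg` could be sketched as follows in Go. `ScopedMetric` and `ParseAppMetrics` are hypothetical; the sketch assumes the only valid scopes are `self` and `shard`, and that an unprefixed metric takes its default scope (`shard` for `lag`, `self` otherwise).

```go
package main

import (
	"fmt"
	"strings"
)

// ScopedMetric is a hypothetical parsed entry of an app's metric list.
type ScopedMetric struct {
	Scope string // "self" or "shard"
	Name  string
}

// defaultScope: lag defaults to shard, everything else to self.
func defaultScope(name string) string {
	if name == "lag" {
		return "shard"
	}
	return "self"
}

// ParseAppMetrics parses a spec such as "lag,shard/loadavg". An optional
// "<scope>/" prefix pins the scope; without it the metric's default applies.
func ParseAppMetrics(spec string) ([]ScopedMetric, error) {
	var out []ScopedMetric
	for _, item := range strings.Split(spec, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		scope, name, found := strings.Cut(item, "/")
		if !found {
			scope, name = defaultScope(item), item
		}
		if scope != "self" && scope != "shard" {
			return nil, fmt.Errorf("unknown scope %q in %q", scope, item)
		}
		if name == "" {
			return nil, fmt.Errorf("missing metric name in %q", item)
		}
		out = append(out, ScopedMetric{Scope: scope, Name: name})
	}
	return out, nil
}

func main() {
	metrics, err := ParseAppMetrics("lag,shard/loadavg")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range metrics {
		fmt.Printf("%s/%s\n", m.Scope, m.Name)
	}
	// shard/lag
	// shard/loadavg
}
```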

shlomi-noach commented on June 22, 2024
  • Adding support for an all app, which is a catch-all for anything that doesn't have any specific rules. With all, it is possible to express inverted rules, such as "everything is rejected, except this app, which is allowed". Or, "everything throttles at a 0.7 ratio for the next 2 hours, except these two apps: one is exempted for the next 5 hours, the other is throttled at a 0.2 ratio for the next 30 minutes". Or also "everything is exempted, but this app needs to go through normal throttling".

shlomi-noach commented on June 22, 2024
  • Adding a vtctldclient CheckThrottler command, which returns a detailed CheckThrottlerResponse. The command takes a tablet name as argument (it could potentially also take a shard name, much like Backup and BackupShard). It takes --app-name and --scope optional arguments, as well as some extra flags.

shlomi-noach commented on June 22, 2024

Required additions to vtctldclient UpdateThrottlerConfig:

  • Updating the threshold for a given metric name. Setting the threshold to 0 removes the entry.
    We can use the existing --threshold flag and add a --metric-name=... flag. If the latter is given, then --threshold must be specified. If it is not given, we assume the "default" metric.
  • Setting the per-app metrics, e.g. --app-name=online-ddl --app-metrics=lag,shard/loadavg. The two flags must come together: either both are given, or neither is. It's OK to provide an empty --app-metrics, in which case the throttler uses the default metrics for the given app. --app-name must not be empty. It can be "all".
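The flag rules above can be sketched in Go as a validate-then-apply step. `UpdateFlags`, `ThrottlerConfig`, `Apply` and `ptr` are hypothetical stand-ins, not the actual vtctldclient code. Pointer fields distinguish "flag not given" from "flag given with an empty or zero value", which the empty `--app-metrics` and zero `--threshold` cases need.

```go
package main

import (
	"errors"
	"fmt"
)

// UpdateFlags is a hypothetical view of the proposed UpdateThrottlerConfig flags.
// A nil pointer means "flag not given".
type UpdateFlags struct {
	Threshold  *float64
	MetricName *string
	AppName    *string
	AppMetrics *string
}

// ThrottlerConfig is a hypothetical stand-in for the stored throttler config.
type ThrottlerConfig struct {
	Thresholds map[string]float64 // metric name -> threshold
	AppMetrics map[string]string  // app name -> metric spec, e.g. "lag,shard/loadavg"
}

// Apply validates all flag combinations first, then updates cfg.
func Apply(cfg *ThrottlerConfig, f UpdateFlags) error {
	if f.MetricName != nil && f.Threshold == nil {
		return errors.New("--metric-name requires --threshold")
	}
	if (f.AppName == nil) != (f.AppMetrics == nil) {
		return errors.New("--app-name and --app-metrics must be given together")
	}
	if f.AppName != nil && *f.AppName == "" {
		return errors.New("--app-name must not be empty")
	}
	if f.Threshold != nil {
		metric := "default" // no --metric-name: assume the "default" metric
		if f.MetricName != nil {
			metric = *f.MetricName
		}
		if *f.Threshold == 0 {
			delete(cfg.Thresholds, metric) // a zero threshold removes the entry
		} else {
			cfg.Thresholds[metric] = *f.Threshold
		}
	}
	if f.AppName != nil {
		if *f.AppMetrics == "" {
			delete(cfg.AppMetrics, *f.AppName) // empty list: app uses default metrics
		} else {
			cfg.AppMetrics[*f.AppName] = *f.AppMetrics
		}
	}
	return nil
}

// ptr is a small helper for building optional flag values.
func ptr[T any](v T) *T { return &v }

func main() {
	cfg := &ThrottlerConfig{Thresholds: map[string]float64{}, AppMetrics: map[string]string{}}
	fmt.Println(Apply(cfg, UpdateFlags{MetricName: ptr("loadavg"), Threshold: ptr(2.5)}))
	fmt.Println(Apply(cfg, UpdateFlags{AppName: ptr("online-ddl"), AppMetrics: ptr("lag,shard/loadavg")}))
	fmt.Println(Apply(cfg, UpdateFlags{AppName: ptr("online-ddl")})) // error: flags must come together
	fmt.Println(cfg.Thresholds, cfg.AppMetrics)
}
```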

shlomi-noach commented on June 22, 2024

Eventually (v21/v22/v23, depending), we will deprecate these flags in vtctldclient UpdateThrottlerConfig:

  • --check-as-check-self
  • --check-as-check-shard

We will also clean up these fields from UpdateThrottlerConfigRequest:

  • CheckAsCheckSelf
  • CheckAsCheckShard

shlomi-noach commented on June 22, 2024
  • Assigning metrics to the "all" app should apply to all apps that do not already have any explicit metrics assigned:
$ vtctldclient UpdateThrottlerConfig --app-name "all" --app-metrics "lag,loadavg" commerce
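Resolving an app's metrics with the "all" fallback might look like this Go sketch. `MetricsForApp` is hypothetical, and the final fallback to a caller-supplied default list is an assumption about what happens when neither the app nor "all" has an assignment.

```go
package main

import "fmt"

// MetricsForApp resolves which metrics an app is checked on: its explicit
// assignment if any, else the assignment for "all", else the given defaults.
// The map layout and default list are hypothetical illustrations.
func MetricsForApp(assigned map[string][]string, app string, defaults []string) []string {
	if m := assigned[app]; len(m) > 0 {
		return m
	}
	if m := assigned["all"]; len(m) > 0 {
		return m
	}
	return defaults
}

func main() {
	assigned := map[string][]string{
		"all":        {"lag", "loadavg"},
		"online-ddl": {"lag"},
	}
	defaults := []string{"default"}
	fmt.Println(MetricsForApp(assigned, "online-ddl", defaults))   // [lag]
	fmt.Println(MetricsForApp(assigned, "vreplication", defaults)) // [lag loadavg]
}
```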

shlomi-noach commented on June 22, 2024

Addressed by #15988

shlomi-noach commented on June 22, 2024

Base branch PR for changes: planetscale:throttler-multi-metrics-incremental #16012, onto which we will merge multiple incremental PRs.
