Allow specify number of possible unavailable nodes when rerouting about victoriametrics HOT 8 OPEN

Sinketsu commented on June 22, 2024

Allow specify number of possible unavailable nodes when rerouting


Comments (8)

Haleygo commented on June 22, 2024

Hello!

Disabling rerouting on unavailable nodes is an option, but then we will suffer when updating servers sequentially (during a rolling upgrade).

Why would you suffer here? During a rolling upgrade, each updated vmstorage node should take only a few minutes to become ready, and remote-write clients such as vmagent or Prometheus should be able to buffer unsuccessful write requests and resend them once vmstorage is back.

For example, if 4 of 5 vmstorage nodes become unavailable, all traffic will be rerouted to a single server. That server will not withstand such a heavy load, and both writes and reads will slow down.

You mean an option like rerouteMaxUnavailableNodeTolerance=3, so that re-routing is disabled when 4 out of 5 vmstorage nodes are down? I don't see how such an option helps cluster availability, though:

if re-routing is enabled, the only available vmstorage node likely can't handle the load and crashes;
if re-routing is disabled, query results are very likely partial (a single node contains only 20% of the series with -replicationFactor=1) and unreliable.

So in both cases read and write requests fail, and the vmstorage nodes must be fixed before the cluster can serve them again.
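A rough Python model of the two cases above, for illustration only: `rerouteMaxUnavailableNodeTolerance` is just the name proposed in this thread, not an existing vminsert flag, and the model assumes write load is spread evenly across 5 nodes with -replicationFactor=1, so each node holds about 20% of the series.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    total_nodes: int
    unavailable_nodes: int

    @property
    def available_nodes(self) -> int:
        return self.total_nodes - self.unavailable_nodes


def should_reroute(state: ClusterState, max_unavailable_tolerance: int) -> bool:
    """Hypothetical rule: keep re-routing only while few enough nodes are down."""
    return state.unavailable_nodes <= max_unavailable_tolerance


def write_load_multiplier(state: ClusterState, reroute: bool) -> float:
    """Write load on each surviving node relative to normal (even spread assumed)."""
    if not reroute:
        # Data meant for down nodes is held back upstream, not redirected.
        return 1.0
    return state.total_nodes / state.available_nodes


def readable_series_fraction(state: ClusterState) -> float:
    """Share of stored series still readable with -replicationFactor=1."""
    return state.available_nodes / state.total_nodes


if __name__ == "__main__":
    tolerance = 3  # the example value from the thread
    for down in range(5):
        s = ClusterState(total_nodes=5, unavailable_nodes=down)
        reroute = should_reroute(s, tolerance)
        print(f"down={down} reroute={reroute} "
              f"load=x{write_load_multiplier(s, reroute):.2f} "
              f"readable={readable_series_fraction(s):.0%}")
```

With a tolerance of 3, three nodes down means each survivor takes 2.5× its normal write load, while four nodes down stops re-routing but leaves only 20% of the stored series readable, which is the dilemma described above.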


Sinketsu commented on June 22, 2024

Hello!

Disabling rerouting on unavailable nodes is an option, but then we will suffer when updating servers sequentially (during a rolling upgrade).

Why would you suffer here? During a rolling upgrade, each updated vmstorage node should take only a few minutes to become ready, and remote-write clients such as vmagent or Prometheus should be able to buffer unsuccessful write requests and resend them once vmstorage is back.

If the update goes through without problems, then yes, the buffer on the agents will save us. But this requires a fairly large buffer on the agents, which is difficult for us to provide. There may also be situations when a server is out for maintenance for a longer time, for example several hours. In that case we would like not to lose data, since the remaining servers are able to take over the load.
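The buffer size the agents would need can be estimated as undeliverable ingestion rate × outage duration × bytes per buffered sample. All numbers below are illustrative assumptions, in particular the bytes-per-sample figure, which depends on the data and compression and should be measured; vmagent's on-disk buffer per remote-write destination can be capped with `-remoteWrite.maxDiskUsagePerURL`.

```python
def required_buffer_bytes(samples_per_sec: float, outage_seconds: float,
                          bytes_per_sample: float) -> float:
    """Disk space remote-write clients need to hold everything they cannot deliver."""
    return samples_per_sec * outage_seconds * bytes_per_sample


def to_gib(n_bytes: float) -> float:
    return n_bytes / 2**30


if __name__ == "__main__":
    # Illustrative assumptions, not measurements.
    rate = 1_000_000        # samples/s that cannot be delivered during the outage
    outage = 4 * 3600       # a maintenance window of several hours
    bytes_per_sample = 2.0  # assumed size of a buffered sample after compression
    need = required_buffer_bytes(rate, outage, bytes_per_sample)
    print(f"~{to_gib(need):.1f} GiB of buffer in total across the agents")
```

Under these assumptions a four-hour outage already needs roughly 27 GiB of buffer across the agents, which illustrates why long maintenance windows are hard to cover with agent-side buffering alone.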

For example, if 4 of 5 vmstorage nodes become unavailable, all traffic will be rerouted to a single server. That server will not withstand such a heavy load, and both writes and reads will slow down.

You mean an option like rerouteMaxUnavailableNodeTolerance=3, so that re-routing is disabled when 4 out of 5 vmstorage nodes are down? I don't see how such an option helps cluster availability, though:

if re-routing is enabled, the only available vmstorage node likely can't handle the load and crashes;

if re-routing is disabled, query results are very likely partial (a single node contains only 20% of the series with -replicationFactor=1) and unreliable.

So in both cases read and write requests fail, and the vmstorage nodes must be fixed before the cluster can serve them again.

In the second case the data will be marked as partial, but it's better than no response at all due to congestion. In that case we can retry the request against another AZ, or merge data from another AZ (depending on the chosen vmselect setup).
More importantly, the servers themselves would not suffer from a large write flow. Currently, a server in this situation may become unavailable due to high CPU utilization.
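The retry-to-another-AZ approach can be sketched on the client side. VictoriaMetrics query responses include an `isPartial` field in the JSON body; the sketch assumes that, and uses hypothetical `query_az1`/`query_az2` callables instead of real HTTP requests so that it stays self-contained.

```python
from typing import Callable, Dict, List, Optional

QueryFn = Callable[[str], Dict]


def query_with_fallback(query: str, backends: List[QueryFn]) -> Dict:
    """Return the first complete (non-partial) response; else the last partial one."""
    last_partial: Optional[Dict] = None
    for backend in backends:
        try:
            resp = backend(query)
        except ConnectionError:
            continue  # AZ unreachable: try the next one
        if not resp.get("isPartial", False):
            return resp
        last_partial = resp
    if last_partial is not None:
        return last_partial
    raise RuntimeError("no AZ returned a response")


if __name__ == "__main__":
    # Hypothetical stand-ins for vmselect endpoints in two AZs.
    def query_az1(q: str) -> Dict:
        return {"status": "success", "isPartial": True, "data": {"result": []}}

    def query_az2(q: str) -> Dict:
        return {"status": "success", "isPartial": False, "data": {"result": ["..."]}}

    resp = query_with_fallback("up", [query_az1, query_az2])
    print(resp["isPartial"])  # False: the AZ2 answer was used
```

Falling back to the last partial answer mirrors the preference expressed above (a partial answer over none); removing that fallback gives the stricter behavior Haleygo argues for in the next reply.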


Haleygo commented on June 22, 2024

but it's better than no response at all due to congestion.

I don't think a wrong result is better than no response, and an anomaly is noticed faster when there is no response at all.

In that case we can retry the request against another AZ, or merge data from another AZ (depending on the chosen vmselect setup).

If there is another AZ, you should switch to it directly, no matter whether the vmstorage nodes in the first AZ are partially down (partial response) or completely down (no response); otherwise users and rule evaluation get wrong results.

It's also unclear how to set rerouteMaxUnavailableNodeTolerance for a big cluster: how would you determine that N nodes down is OK, but N+1 nodes down is unacceptable?


Sinketsu commented on June 22, 2024

If there is another AZ, you should switch to it directly, no matter whether the vmstorage nodes in the first AZ are partially down (partial response) or completely down (no response); otherwise users and rule evaluation get wrong results.

Yes, we can switch to another AZ, but we would like this to happen automatically. We currently use a single vmselect cluster across multiple AZs, since each AZ may be unavailable for some time (longer than the buffer can hold). In such a setup, we wait a very long time for a response from the problematic AZ due to server overload.

It's also unclear how to set rerouteMaxUnavailableNodeTolerance for a big cluster: how would you determine that N nodes down is OK, but N+1 nodes down is unacceptable?

It seems that this can be determined empirically by the system administrators who maintain this cluster.
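One rough way to turn "empirically" into a number, assuming write load is spread evenly and CPU usage scales linearly with it (both simplifications): if each node normally runs at utilization u and must stay below a ceiling u_max, the survivors can absorb rerouted traffic as long as total/available × u ≤ u_max, so tolerance = total − ⌈total × u / u_max⌉. The function name and the 30%/80% figures are illustrative assumptions.

```python
import math


def max_unavailable_tolerance(total_nodes: int, utilization: float,
                              max_utilization: float) -> int:
    """Largest number of down nodes the rest can absorb under a linear-load model."""
    if not 0 < utilization <= max_utilization:
        raise ValueError("need 0 < utilization <= max_utilization")
    # Fewest nodes that can carry the whole load without exceeding the ceiling.
    min_available = math.ceil(total_nodes * utilization / max_utilization)
    return total_nodes - min_available


if __name__ == "__main__":
    # 5 nodes at ~30% CPU, each must stay below 80%.
    print(max_unavailable_tolerance(5, 0.30, 0.80))  # prints 3
```

For 5 nodes at 30% CPU with an 80% ceiling this gives 3, the example value used in this thread. Real clusters deviate from the linear model (uneven series distribution, background merges, caches), so the result is a starting point for load testing rather than a guarantee.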


Haleygo commented on June 22, 2024

Yes, we can switch to another AZ, but we would like this to happen automatically. We currently use a single vmselect cluster across multiple AZs, since each AZ may be unavailable for some time (longer than the buffer can hold).

I would recommend using a separate vmselect for each AZ, with vmauth as a proxy in front of them.

vmselect should be configured with -search.denyPartialResponse=true, and vmauth should use the first_available policy, so that it automatically switches to the second AZ when AZ1 returns partial responses.
Some pros of this topology:

less pressure on each vmselect, since it queries only 50% of the data compared to connecting to both vmclusters;
less cross-AZ network traffic, since you can always set the "local" vmcluster as the first available server.
See similar usage in https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-distributed.
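The first_available policy can be pictured as "send each request to the first backend in the list that is not currently marked broken". The sketch below is a simplified illustration, not vmauth's actual implementation; the class name, the fixed back-off period and the vmselect URLs are assumptions. In vmauth itself the policy is selected via the `load_balancing_policy` setting.

```python
import time
from typing import Callable, Dict, List, Optional


class FirstAvailableBalancer:
    """Simplified model: always prefer the earliest healthy backend in the list."""

    def __init__(self, backends: List[str], broken_for_seconds: float = 3.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backends = backends          # e.g. local AZ first, remote AZ second
        self.broken_for = broken_for_seconds
        self.clock = clock
        self.broken_until: Dict[str, float] = {}

    def pick(self) -> Optional[str]:
        now = self.clock()
        for backend in self.backends:
            if self.broken_until.get(backend, 0.0) <= now:
                return backend
        return None  # every backend is currently marked broken

    def mark_broken(self, backend: str) -> None:
        # Skip this backend until the back-off period has passed.
        self.broken_until[backend] = self.clock() + self.broken_for


if __name__ == "__main__":
    az1, az2 = "http://vmselect-az1:8481", "http://vmselect-az2:8481"
    lb = FirstAvailableBalancer([az1, az2])
    print(lb.pick())     # az1 while it is healthy
    lb.mark_broken(az1)  # e.g. it refused to return a partial result
    print(lb.pick())     # now az2
```

Listing the local AZ first is what gives the "less cross-AZ traffic" benefit: the remote AZ only receives queries while the local one is marked broken.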

It seems that this can be determined empirically by the system administrators who maintain this cluster.

I don't think that's easy to do, and it's hard to give users an actionable recommendation.


Sinketsu commented on June 22, 2024

I would recommend using a separate vmselect for each AZ, with vmauth as a proxy in front of them.

We can't.
We may have one AZ unavailable for a long time. During this period there will be no metrics at all in this AZ. vmauth will not be able to detect such a problem, so it will keep sending requests to this zone, which leads to incorrect dashboards (with data gaps) and incorrect alerts.


Haleygo commented on June 22, 2024

We may have one AZ unavailable for a long time. During this period there will be no metrics at all in this AZ.

vmselect with -search.denyPartialResponse=true fails query requests if more than replicationFactor-1 vmstorage nodes are unavailable; vmauth then marks this AZ as broken and uses the other AZ.
If the storage nodes in AZ1 are all fixed but the old data hasn't been backfilled yet, remove the AZ1 vmselect address from the vmauth config until the data is restored. This is quite handy, since the vmauth config can be hot-reloaded.
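The rule described above can be written as a small predicate: with -replicationFactor=N each sample is stored on N distinct vmstorage nodes, so a full answer is guaranteed only while at most N-1 nodes are down; beyond that, a vmselect running with -search.denyPartialResponse=true returns an error instead of a partial result. This is a simplified model that ignores per-series placement, and the function names are hypothetical.

```python
def response_is_complete(unavailable_nodes: int, replication_factor: int) -> bool:
    """True when every series still has at least one reachable replica."""
    return unavailable_nodes <= replication_factor - 1


def vmselect_outcome(unavailable_nodes: int, replication_factor: int,
                     deny_partial_response: bool) -> str:
    """Outcome of a query: a full answer, an error, or a partial answer."""
    if response_is_complete(unavailable_nodes, replication_factor):
        return "full"
    return "error" if deny_partial_response else "partial"


if __name__ == "__main__":
    for down in range(4):
        print(down, vmselect_outcome(down, replication_factor=2,
                                     deny_partial_response=True))
```

The "error" outcome is what lets a first_available proxy in front of vmselect treat the AZ as broken and move on, whereas a "partial" answer would be passed through to users.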


Sinketsu commented on June 22, 2024

We may have one AZ unavailable for a long time. During this period there will be no metrics at all in this AZ.

If the storage nodes in AZ1 are all fixed but the old data hasn't been backfilled yet, remove the AZ1 vmselect address from the vmauth config until the data is restored. This is quite handy, since the vmauth config can be hot-reloaded.

This requires constant manual manipulation of data and configs, and I would like the system to respond to this automatically.
