
Comments (8)

Haleygo commented on June 22, 2024

Hello!

> Disable rerouting on unavailable, but in this case, we will suffer when updating servers sequentially (during rolling upgrade)

Why would you suffer here? A new vmstorage node should take only a few minutes to become ready during a rolling upgrade, and a remote-write client such as vmagent or Prometheus should be able to buffer unsuccessful write requests and resend them when vmstorage is back.

> For example, if 4 of 5 vmstorage will be unavailable, all traffic will be rerouted to 1 server. This server will not withstand such a heavy load, it will slow down both writing and reading.

You mean having an option like rerouteMaxUnavailableNodeTolerance=3, so that rerouting is disabled when 4 out of 5 vmstorage nodes are down? I don't see how this option helps with cluster availability, though:

  1. if rerouting is enabled, the only available vmstorage node likely can't handle the load and crashes;
  2. if rerouting is disabled, the query results are very likely partial (a single node contains only 20% of series if -replicationFactor=1) and unreliable.

So in both cases read and write requests fail, and the vmstorage nodes must be fixed before the cluster can serve again.
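
The arithmetic behind both cases can be sketched in a few lines. This is only an illustration: it assumes traffic and series are spread evenly across nodes, and `should_reroute` models one possible reading of the proposed rerouteMaxUnavailableNodeTolerance option, which does not exist in VictoriaMetrics. All function names are hypothetical.

```python
def reroute_load_factor(total_nodes: int, down_nodes: int) -> float:
    """Write load per surviving node, relative to normal, if all traffic is rerouted evenly."""
    alive = total_nodes - down_nodes
    if alive <= 0:
        raise ValueError("no vmstorage nodes left to reroute to")
    return total_nodes / alive


def visible_series_fraction(total_nodes: int, down_nodes: int) -> float:
    """Approximate share of series still readable with replicationFactor=1 and even sharding."""
    return (total_nodes - down_nodes) / total_nodes


def should_reroute(down_nodes: int, max_unavailable_tolerance: int) -> bool:
    """Hypothetical semantics of the proposed option: reroute only while down_nodes <= tolerance."""
    return down_nodes <= max_unavailable_tolerance


if __name__ == "__main__":
    # Walk through a 5-node cluster with the proposed tolerance of 3.
    for down in range(5):
        print(
            f"down={down}",
            f"load_factor={reroute_load_factor(5, down):.2f}",
            f"visible={visible_series_fraction(5, down):.0%}",
            f"reroute={should_reroute(down, 3)}",
        )
```

With 4 of 5 nodes down, the surviving node would take 5 times its normal write load if rerouting stays on, and would hold roughly 20% of the series if it is off.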

from victoriametrics.

Sinketsu commented on June 22, 2024

Hello!

> > Disable rerouting on unavailable, but in this case, we will suffer when updating servers sequentially (during rolling upgrade)
>
> Why would you suffer here? A new vmstorage node should take only a few minutes to become ready during a rolling upgrade, and a remote-write client such as vmagent or Prometheus should be able to buffer unsuccessful write requests and resend them when vmstorage is back.

If the update goes through without problems, then yes, the buffer on the agents will save us. But this requires a fairly large buffer on the agents, which is difficult for us to provide. There may also be situations when a server goes out for maintenance for longer, for example several hours. In that case we would like not to lose data, since the remaining servers can handle the load.
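
To make the buffer concern concrete, here is a rough sizing sketch. The ingestion rate and bytes-per-sample figures are hypothetical placeholders, not measurements from this cluster, and the function name is made up for illustration.

```python
def buffer_bytes(samples_per_second: float, outage_seconds: float, bytes_per_sample: float) -> float:
    """Bytes an agent must buffer to survive an outage without dropping samples."""
    return samples_per_second * outage_seconds * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical inputs: 1M samples/s, a 3-hour maintenance window,
    # and 1 byte per sample after compression (an assumption).
    needed = buffer_bytes(1_000_000, 3 * 3600, 1.0)
    print(f"{needed / 1e9:.1f} GB of buffer needed")
```

The required buffer grows linearly with the outage duration, which is why a few minutes of rolling upgrade and several hours of maintenance are very different cases.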

> > For example, if 4 of 5 vmstorage will be unavailable, all traffic will be rerouted to 1 server. This server will not withstand such a heavy load, it will slow down both writing and reading.
>
> You mean having an option like rerouteMaxUnavailableNodeTolerance=3, so that rerouting is disabled when 4 out of 5 vmstorage nodes are down? I don't see how this option helps with cluster availability, though:
>
>   1. if rerouting is enabled, the only available vmstorage node likely can't handle the load and crashes;
>   2. if rerouting is disabled, the query results are very likely partial (a single node contains only 20% of series if -replicationFactor=1) and unreliable.
>
> So in both cases read and write requests fail, and the vmstorage nodes must be fixed before the cluster can serve again.

In the second case the data will be marked as partial, but that is better than no response at all due to congestion. In this case we can retry the request against another AZ, or merge data from another AZ (depending on the selected vmselect operation scheme).
More importantly, the servers themselves will not suffer from a large write flow. Currently, a server in such a situation may become unavailable because of high CPU utilization.


Haleygo commented on June 22, 2024

> but it's better than no response at all due to congestion.

I don't think a wrong result is better than no response, and an anomaly is noticed more quickly when there is no response.

> In this case we can retry request to another AZ, or merge data from another AZ (depends on the selected vmselect operation scheme).

If there is another AZ, you should switch to it directly in this case, no matter whether the vmstorage nodes in the first AZ are partially down (partial response) or totally down (no response). Otherwise you get wrong results for users and for rule evaluation.

It's also unclear how to set rerouteMaxUnavailableNodeTolerance for a big cluster: how do you estimate that N nodes down is OK but N+1 nodes down is unacceptable?


Sinketsu commented on June 22, 2024

> If there is another AZ, you should switch to it directly in this case, no matter whether the vmstorage nodes in the first AZ are partially down (partial response) or totally down (no response). Otherwise you get wrong results for users and for rule evaluation.

Yes, we can switch to another AZ, but we would like this to happen automatically. We currently use a single vmselect cluster over multiple AZs, as each AZ may be unavailable for some time (longer than the buffer can hold). In such a scheme, we will wait a very long time for a response from the problem AZ because of server overload.

> It's also unclear how to set rerouteMaxUnavailableNodeTolerance for a big cluster: how do you estimate that N nodes down is OK but N+1 nodes down is unacceptable?

It seems that this can be determined empirically by the system administrators who maintain this cluster.


Haleygo commented on June 22, 2024

> Yes, we can switch to another AZ, but we would like this to happen automatically. We currently use a single vmselect cluster over multiple AZs, as each AZ may be unavailable for some time (longer than the buffer can hold).

I would recommend using a separate vmselect for each AZ, with vmauth as a proxy in front of them. The topology looks like this:
[image: topology diagram]
vmselect should be configured with -search.denyPartialResponse=true; vmauth uses the first_available policy and will automatically switch to the second AZ when AZ1 returns partial responses.
Some pros of this topology:

  1. less pressure on vmselect, since each one queries only 50% of the data compared with connecting to both vmclusters;
  2. less cross-AZ network traffic, since you can always set the "local" vmcluster as your first available server.
    See similar usage in https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-distributed.
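
The failover idea behind a first_available policy can be modelled as follows. This is a minimal sketch of the concept only, not vmauth's implementation: `BackendError`, `first_available`, `az1` and `az2` are hypothetical names, and when the real vmauth treats a backend as failed depends on its configuration, which the sketch does not model.

```python
from typing import Callable, Optional, Sequence


class BackendError(Exception):
    """A backend could not return a complete response."""


def first_available(backends: Sequence[Callable[[str], str]], query: str) -> str:
    """Try backends in order and return the first successful response."""
    last_error: Optional[BackendError] = None
    for backend in backends:
        try:
            return backend(query)
        except BackendError as err:
            last_error = err  # remember the failure and try the next AZ
    raise BackendError(f"all backends failed, last error: {last_error}")


def az1(query: str) -> str:
    # Simulates a vmselect that refuses to serve a partial response.
    raise BackendError("AZ1: partial response denied")


def az2(query: str) -> str:
    # Simulates a healthy vmselect in the second AZ.
    return f"AZ2 result for {query}"


if __name__ == "__main__":
    print(first_available([az1, az2], "up"))
```

Listing the local AZ first keeps traffic local while it is healthy, and the second AZ is only queried after the first one fails.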

> It seems that this can be determined empirically by the system administrators who maintain this cluster.

I don't think that's easy to do, and it's hard to give users an actionable recommendation.


Sinketsu commented on June 22, 2024

> I would recommend using a separate vmselect for each AZ, with vmauth as a proxy in front of them. The topology looks like this:

We can't.
We may have one AZ unavailable for a long time. During this period there will be no metrics at all in this AZ. vmauth will not be able to detect such a problem, so it will keep sending requests to this zone, which will lead to incorrect dashboards (with data gaps) and incorrect alerts.


Haleygo commented on June 22, 2024

> We may have one AZ unavailable for a long time. During this period of time, there will be no metrics at all in this AZ.

vmselect with -search.denyPartialResponse=true will fail query requests if more than replicationFactor-1 vmstorage nodes are unavailable; vmauth will then mark this AZ as broken and use another AZ.
If the storage nodes in AZ1 are all fixed but the old data hasn't been backfilled yet, remove the AZ1 vmselect address from the vmauth config until the data is fixed. This is pretty handy, since the vmauth config can be hot-reloaded.
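
The threshold described above can be written as a one-line predicate. This is a sketch of the rule as stated in this comment, with a hypothetical function name; it is not code from vmselect.

```python
def query_fails(unavailable_nodes: int, replication_factor: int) -> bool:
    """True when, per the rule above, vmselect with -search.denyPartialResponse=true fails the query."""
    return unavailable_nodes > replication_factor - 1


if __name__ == "__main__":
    # With replicationFactor=1, a single unavailable node already fails queries.
    print(query_fails(unavailable_nodes=1, replication_factor=1))
    # With replicationFactor=2, one unavailable node is tolerated.
    print(query_fails(unavailable_nodes=1, replication_factor=2))
```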


Sinketsu commented on June 22, 2024

> > We may have one AZ unavailable for a long time. During this period of time, there will be no metrics at all in this AZ.
>
> If the storage nodes in AZ1 are all fixed but the old data hasn't been backfilled yet, remove the AZ1 vmselect address from the vmauth config until the data is fixed. This is pretty handy, since the vmauth config can be hot-reloaded.

This requires constant manual manipulation of data and configs. And I would like the system to respond to this automatically.

