GithubHelp home page GithubHelp logo

Comments (6)

FZambia avatar FZambia commented on June 15, 2024

Hello @Aohzan

What is your Centrifugo configuration for Redis Sentinel? Do you use redis_sentinel_address? Which version of Centrifugo you have? Which Redis version? Anything specific in your setup? Asking to also understand why Centrifugo could miss master change.

from centrifugo.

Aohzan avatar Aohzan commented on June 15, 2024

centrifugo 4.1.2-0
redis-sentinel 6:7.2.4-1rl1~bullseye1 (upgraded from 5:6.0.16-1+deb11u2 before the sentinel failover)

redis configuration part in centrifugo

  "engine": "redis",                     
  "redis_sentinel_address": "localhost:26379",
  "redis_sentinel_master_name": "mymaster",

there is:

  • 4 centrifugo servers with centrifugo + sentinel
  • 3 redis servers with redis + sentinel

so 7 sentinel, the failover of the master has been handled correctly on all sentinel, but just 1 of 4 centrifugo didn't change its redis server

from centrifugo.

FZambia avatar FZambia commented on June 15, 2024

Thanks for the details. In your case, I'd concentrate on trying to fix the root cause - I suppose first step here is upgrading to the latest Centrifugo version and trying to reproduce with it – there were many improvements in the underlying Redis library since v4.1.2.

I think adding Redis connection check in health endpoint may not fully solve the problem. It may hide the problem, until no Centrifugo nodes left. I suppose you have just a Haproxy without Kubernetes? In this case Haproxy will remove failed node from the balancing, but the issue on Node will persist until someone restarts Centrifugo node. Is it right? Or it works differently? Probably there is some automation to restart failing node?

In Kubernetes liveness probe failure results into app restart – in that case Redis connection check could make more sense.

from centrifugo.

Aohzan avatar Aohzan commented on June 15, 2024

Yes, we will see to update to the latest version.

Yes, no k8s, what we want it's that the HAProxy will exclude the problematic Centrifugo from the pool as soon as it can't handle request properly due to Redis issue. So it let me the time to debug and restart Centrifugo.

from centrifugo.

Aohzan avatar Aohzan commented on June 15, 2024

Hello
We planned the upgrade, as we had the same issue on all nodes after a redis failover.
For the original request, do you have an opinion?

from centrifugo.

FZambia avatar FZambia commented on June 15, 2024

We planned the upgrade, as we had the same issue on all nodes after a redis failover.

Probably you can experiment with new version first regarding Sentinel failover to make sure it solves the problem. I mean without upgrading rest of the app. While upgrade is recommended anyway, this may help to iterate on issue faster.

For the original request, do you have an opinion?

I think optional Redis check could make sense in general, though prefer it being disabled by default – when dealing with many connections removing node from balancing may be not better that waiting for connection issues to go away. Looks like Centrifugo needs to issue some write request to Redis to make sure connection is working and writable.

UPD. Though not sure what to do with Redis Cluster case, where many Redis shards must be checked. In this perspective I'd invest to proper failover as I said before

from centrifugo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.