I have 3 replicas of pods. Pod A pubs a message and pod B subs, but B cannot accept m


multiple replicas problem about golongpoll HOT 4 CLOSED

jcuga commented on September 25, 2024

multiple replicas problem

from golongpoll.

Comments (4)

jcuga commented on September 25, 2024

To clarify the issue, it sounds like:

You have 3 replicas of pods that contain/use golongpoll
You have a request to subscribe to category someCategory, which gets load balanced/routed to pod B
You have another request to publish on someCategory, which gets load balanced/routed to pod A
Because the pub/sub calls are hitting different pods/replicas, they're not aware of each other and your subscription request will never see the published data.

Is this what's going on?

That would be expected, as the longpoll manager is stateful, and does not auto-magically sync with other instances.

I'd have to think long about ways to make cross-replica state syncing work, as this was never part of the design. However, there is a relatively simple way to scale longpolling pub/sub if you can load balance/route requests based on the category. This would involve sharding the categories. For example:

Requests to pub/sub on given Category C deterministically get routed/load-balanced to a longpoll instance/container based on the hash/modulo of the category.
Because a given category's data always winds up on the same instance, you can scale as large as you like with the caveat that a single category must be contained on a single instance, but each instance would have only 1/N categories where N is the number of replicas.
Ex: take the hash of the category modulo N, where N is the number of replicas, and route traffic accordingly. This would be easier if the category in the pub/sub request is baked into the URL.

One final thing to be aware of if going the sharding route: if you increase or decrease the number of replicas, you would stop seeing some category data, as traffic would start getting sharded differently.
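As a rough illustration of the sharding idea and the resizing caveat, here is a minimal Go sketch. It is not part of golongpoll: `shardFor` and `moved` are hypothetical helper names, and FNV-1a is just one arbitrary choice of hash. The same computation would have to live in whatever does the routing (load balancer, ingress, or a proxy in front of the pods).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a category to one of n replicas via hash(category) mod n.
// Hypothetical helper; any stable hash would do.
func shardFor(category string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(category)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

// moved counts the categories that land on a different replica
// when the replica count changes from oldN to newN.
func moved(categories []string, oldN, newN int) int {
	count := 0
	for _, c := range categories {
		if shardFor(c, oldN) != shardFor(c, newN) {
			count++
		}
	}
	return count
}

func main() {
	// The same category always maps to the same replica for a fixed N.
	fmt.Printf("someCategory -> replica %d of 3\n", shardFor("someCategory", 3))

	// Changing N re-routes many categories away from the replica holding their data.
	cats := make([]string, 1000)
	for i := range cats {
		cats[i] = fmt.Sprintf("category-%d", i)
	}
	fmt.Printf("%d of %d categories change replica going from 3 to 4\n",
		moved(cats, 3, 4), len(cats))
}
```

With a reasonably uniform hash, going from 3 to 4 replicas should move about three quarters of the categories, since `h % 3 == h % 4` only holds when `h % 12` is 0, 1, or 2.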


nisainan commented on September 25, 2024

I have read your response, and thanks for clearing up my confusion. But it seems difficult to fix this for a service in k8s, because the service may have HPA or something similar. Is there any possibility of using a distributed database like Redis to resolve this gracefully?


jcuga commented on September 25, 2024

You are right, having horizontal auto scaling would break any attempt at sharding traffic based on category because: 1) we'd have to update the shard size/update how load balancing works and 2) any state before the scale change would be now sharded incorrectly.

Could something like redis pub/sub be used to solve this? Maybe.

I've been thinking of making a more k8s-ready longpoll library, and something like Redis for the data may be a solution. But this would be a new library, as this would not easily fit within the existing design.

As a future note to myself in the event I start working on a new library:

https://thenewstack.io/redis-pub-sub-vs-apache-kafka

The article says redis pub sub will only send data to connected/subscribed clients--if they're not connected, they miss out. So for implementing reliable pub sub via longpoll + redis backend, the longpoll plumbing would have to keep the go-->redis subscriptions alive even after a client's longpoll subscription returns data. Otherwise, if we only have go-->redis subscriptions up while a longpoll request is waiting for data, when golongpoll returns data to the client and the http request disconnects, we'd miss out on any redis data between the time we return and when the http client re-requests the next longpoll.
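The gap described above can be reproduced without Redis at all. The following is a toy, in-memory sketch in Go: `broker` is a hypothetical stand-in for any fire-and-forget pub/sub (it is not the Redis client API), and it stores nothing for absent subscribers.

```go
package main

import (
	"fmt"
	"sync"
)

// broker is a toy fire-and-forget pub/sub: Publish only reaches
// channels that are subscribed at that moment; nothing is stored.
type broker struct {
	mu   sync.Mutex
	subs map[string]map[chan string]struct{}
}

func newBroker() *broker {
	return &broker{subs: make(map[string]map[chan string]struct{})}
}

func (b *broker) Subscribe(cat string) chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, 16)
	if b.subs[cat] == nil {
		b.subs[cat] = make(map[chan string]struct{})
	}
	b.subs[cat][ch] = struct{}{}
	return ch
}

func (b *broker) Unsubscribe(cat string, ch chan string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.subs[cat], ch)
}

// Publish returns how many subscribers received the message.
func (b *broker) Publish(cat, msg string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for ch := range b.subs[cat] {
		select {
		case ch <- msg:
			delivered++
		default: // buffer full: drop rather than block the publisher
		}
	}
	return delivered
}

func main() {
	b := newBroker()

	// Longpoll request 1: subscribe, receive one event, then the request ends.
	ch := b.Subscribe("X")
	b.Publish("X", "event-1")
	fmt.Println("client got:", <-ch)
	b.Unsubscribe("X", ch)

	// Published between longpoll requests: nobody is subscribed, so it is lost.
	fmt.Println("subscribers reached by event-2:", b.Publish("X", "event-2"))

	// Longpoll request 2: only events published from now on are seen.
	ch = b.Subscribe("X")
	b.Publish("X", "event-3")
	fmt.Println("client got:", <-ch)
}
```

The client in this sketch sees `event-1` and `event-3` but never `event-2`, which is the window between one longpoll response and the next request.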

So it sounds initially like we'd have to keep redis subscriptions alive for a configured amount of extra time and reuse the connections for multiple longpoll requests on the same category.

Ah ha! We're back to the same problem--if we have to keep redis subscriptions alive between longpoll requests, otherwise we miss out on redis data, then we're back to the original problem of having to always get load balanced to the same longpoll node, as getting routed to a different one that wasn't already listening on a given redis channel would mean missing out on recent data. This solution does get around having to "rebalance" sharded data, as we're relying on redis for data, but we would still need "sticky" connections, in that requests would always have to go to the same longpoll node for a given category.

So now I wonder if some other messaging layer like kafka is how one would address this.

Thinking about this more... maybe there's still a way with redis. My uncertainty comes from not having actually used redis before...

If we can use redis pub sub to get new events, but rely on a more traditional query to get recent past events since a given time, then this might be able to work. Something like:

Longpoll request received for category X:

see if any redis data for X since time T, if so return longpoll data.
else, subscribe on X for new data, return data or timeout within time window.

the above may work, as long as we handle the corner case of data coming in after the query and before the redis subscription. One may have to always subscribe first, then do the redis query--which sounds more costly, but gets around the timing issue.

The above idea sounds like it would get around HPA issues since: 1) redis has the state, so the longpoll nodes don't need to maintain or shard any. 2) because the nodes are stateless, it doesn't matter which longpoll node an http client winds up on.

I'll think about the above some more, and if it sounds good I may try a golongpoll-redis type library that plays nice with k8s + HPA.

Thanks for the issue feedback.


nisainan commented on September 25, 2024

So the idea is to make the longpoll server stateless and put a stateful store like Redis (or another one) behind it.
Thanks a lot
