The current replicon code does not clean up despawns/component removals on a client af

Client reconnect cleanup (projectharmonia/bevy_replicon)

Comments (19)

UkoeHB commented on May 27, 2024

We could do this:

The client always does Option 4.
On connect/reconnect, the client sends a reliable message 'I connected/reconnected' to the server.
The server caches ClientInfo on a timer.
On client connect, the server checks whether an info is cached. If it is, the server waits up to a configurable timeout for the client's connection message. If the timeout expires, 'I connected' is received, or no info was cached, the server makes a new info and replicates from scratch. If 'I reconnected' is received, the server reuses the existing info.
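
The steps above can be sketched roughly as follows, in plain Rust without bevy or renet. `CachedInfo`, `Hello`, `Decision` and `decide` are hypothetical names used only for illustration; they are not bevy_replicon APIs, and the expiry check assumes the cache timer is measured from the moment the info was cached.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a cached `ClientInfo` plus the moment it was cached.
pub struct CachedInfo {
    pub cached_at: Instant,
}

/// Hypothetical first reliable message sent by the client.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Hello {
    Connected,
    Reconnected,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decision {
    /// Keep waiting for the client's hello message.
    Wait,
    /// Reuse the cached info and only send what the client is missing.
    ReuseInfo,
    /// Make a new info and replicate from scratch.
    FreshInfo,
}

pub fn decide(
    cached: Option<&CachedInfo>,
    hello: Option<Hello>,
    connected_at: Instant,
    now: Instant,
    cache_ttl: Duration,
    hello_timeout: Duration,
) -> Decision {
    // No cached info, or the cache timer already ran out: nothing to reuse.
    let Some(info) = cached else {
        return Decision::FreshInfo;
    };
    if now.duration_since(info.cached_at) > cache_ttl {
        return Decision::FreshInfo;
    }
    match hello {
        Some(Hello::Reconnected) => Decision::ReuseInfo,
        Some(Hello::Connected) => Decision::FreshInfo,
        // The hello message never arrived in time: fall back to a fresh info.
        None if now.duration_since(connected_at) >= hello_timeout => Decision::FreshInfo,
        None => Decision::Wait,
    }
}
```

The server would call something like `decide` each tick for a newly connected client until it returns something other than `Wait`.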

from bevy_replicon.

Shatur commented on May 27, 2024

Oh, I get it now. I completely agree.

> The user data is fixed-length, it would not work.

Your suggestion of using a reliable message also solves this problem: we can include any data in the "I reconnected" message.

UkoeHB commented on May 27, 2024

More thoughts on cleanup:

Looking at ClientPlugin::reset_system, what is the best way for replicon to handle reconnects while the client app is still running?

A) Completely reset the client and re-replicate everything? That's what this line implies. However, I question this implementation: you have to remove the RenetClient resource before this system and re-insert it afterwards, which is not intuitive and leaves a window where Res<RenetClient> will panic. Another problem is that clearing the ServerEntityMap means entities replicated post-reconnect will be re-spawned, so all existing entities need to be cleaned up manually, along with any child entities or resources entangled with replicated entities. For example, in my girk demo I spawn UI elements to represent players and then carefully clean them up when the player entities are despawned. We can just 'require' that replicon users add their own cleanup protocols in case of reconnects, but it would be nice if things just tended to work and cleanup were only necessary for entities that are explicitly despawned by the server.
B) On connect, clients send their replicon state to the server before the server tries to do any replication (replicon tick, entity acks, and pre-mapped entities). This way the client can 'rejoin' with its existing state and the server only needs to send un-acked data. The problem here is the client will miss out on a window of reliable server events. If the client uses those events to update its state (i.e. client state is entangled with the order of server events), then it's better for the client to reinitialize from scratch (including all replicated entities, since we'd have to assume replicon state is also entangled with the event sequence) and use a server-sent initializer to repair missing information from lost reliable events.

Reconnects are generally quite rare, so it is probably fine to get something that just works robustly without trying to over-optimize it. What are your thoughts?

Shatur commented on May 27, 2024

I would be fine with any of the proposed solutions. On paper, A sounds easier for us but harder for users, while B sounds harder for us but better for users. So I would probably try to see how hard implementing B turns out to be.

UkoeHB commented on May 27, 2024

For a future PR:
ClientPlugin::reset_system should be in PreUpdate in ClientSet::Receive and use .run_if(bevy_renet::client_just_disconnected()). Then users can put their reconnect logic after that.

EDIT: Looks like the server events reset_system also needs to be updated.

UkoeHB commented on May 27, 2024

Further thoughts on reconnects:

There are a number of ways we can handle a disconnect - reconnect cycle. Each one has ergonomics pros and cons.

Options

Option 1: Reset client on disconnect.

In this case we despawn all replicated entities on the client on disconnect.

On disconnect

bevy_replicon server: Remove client entry from ClientsInfo.
bevy_replicon client: Despawn everything with a Replication component.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Detect despawned entities and clean up local entity trackers. This includes resources with entity maps and secondary entities that are tied to replicated entities (e.g. UI elements that listen to replicated entity ids). Also run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon server: Send one big init message with the entire replicated server state.
bevy_replicon client: Do nothing.
Game server: Run custom client-connect logic. Send client-startup message to client assuming they restarted their app.
Game client: Run custom initialization logic. Special 'reconnect' logic may be needed. When the server's client-startup message arrives, handle it (the client's loading screen should probably wait for this).

Advantages

This is a 'clean' solution since clients always start fresh after a reconnect.

Drawbacks

Game clients must include comprehensive cleanup logic for disconnects, and architect their apps with the assumption that all entities can be despawned and respawned due to a disconnect. There is no such thing as a 'persistent entity'.
Entities are despawned immediately after a disconnect, which means there will be a noticeable span of time between disconnect and reconnect where a bunch of important entities disappear. This can be quite jarring for players.
Servers must send a large replication init message when a client connects, potentially duplicating data the client already has.

Option 2: Reset client on reconnect.

In this case we despawn all replicated entities on the client right before handling the first init message received post-reconnect.

On disconnect

bevy_replicon server: Remove client entry from ClientsInfo.
bevy_replicon client: Do nothing.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon server: Send one big init message with the entire replicated server state.
bevy_replicon client: Despawn everything with a Replication component immediately before the server's first init message post-reconnect is applied.
Game server: Run custom client-connect logic. Send client-startup message to client assuming they restarted their app.
Game client: Detect despawned entities and clean up local entity trackers. This includes resources with entity maps and secondary entities that are tied to replicated entities (e.g. UI elements that listen to replicated entity ids). Also run custom initialization logic. Special 'reconnect' logic may be needed. When the server's client-startup message arrives, handle it (the client's loading screen should probably wait for this).

Advantages

This is a 'clean' solution since clients always start fresh after a reconnect.
There is no span of time where entities completely disappear.

Drawbacks

User clients must include comprehensive cleanup logic for disconnects, and architect their apps with the assumption that all entities can be despawned and respawned due to a disconnect. There is no such thing as a 'persistent entity'.
Servers must send a large replication init message when a client connects, potentially duplicating data the client already has.

Option 3: Respawn client entities on reconnect.

In this case we despawn all replicated entities on the client on reconnect, then reuse the same entity ids and entity map when spawning entities for the first init message received post-reconnect.

On disconnect

bevy_replicon server: Remove client entry from ClientsInfo.
bevy_replicon client: Do nothing.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon server: Send one big init message with the entire replicated server state.
bevy_replicon client: Despawn everything with a Replication component immediately before the server's first init message post-reconnect is applied. Reuse the previous client-server entity mapping when applying the first init message so that old entity ids are respawned.
Game server: Run custom client-connect logic. Send client-startup message to client assuming they restarted their app.
Game client: Run custom initialization logic. Special 'reconnect' logic may be needed. When the server's client-startup message arrives, handle it (the client's loading screen should probably wait for this).

Advantages

This is a somewhat 'clean' solution since client entities are recreated after a reconnect so old component data will be discarded.
There is no span of time where entities completely disappear.
Game clients don't need a lot of comprehensive logic for cleaning up after a disconnect, since entities are 'semi-persistent'. Entities that would normally be despawned by the server will have game client systems for handling them, and entities that are normally not despawned will still persist after the reconnect.

Drawbacks

Respawning entities means on-spawn/component-added systems will rerun after a reconnect. This means there are no truly persistent entities. Clients must make sure no bugs arise due to duplicated initialization systems that run for entities that are expected to be persistent (e.g. if you push persistent entity ids into a Vec on spawn, then on respawn you need to make sure not to have duplicates in your Vec).
Servers must send a large replication init message when a client connects, potentially duplicating data the client already has.

Option 4: Repair client on reconnect.

In this case we only despawn entities not replicated in the first init message received post-reconnect. Existing entities that are replicated again are preserved (other than component removals for components not found in the first init message).

On disconnect

bevy_replicon server: Remove client entry from ClientsInfo.
bevy_replicon client: Do nothing.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon server: Send one big init message with the entire replicated server state.
bevy_replicon client: Apply the first server init message to the existing client state. Detect client entities with Replication that are not present in the init message and despawn them. Also detect, for still-alive client entities, components not present in the init message and remove them.
Game server: Run custom client-connect logic. Send client-startup message to client assuming they restarted their app.
Game client: Run custom initialization logic. Special 'reconnect' logic may be needed. When the server's client-startup message arrives, handle it (the client's loading screen should probably wait for this).
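
As a rough illustration of the Option 4 client step, the sketch below compares the client's current replicated state with the contents of the first post-reconnect init message and computes what to despawn and which components to remove. It is plain Rust with hypothetical types (`EntityId`, `ComponentId`, `Snapshot`, `Repair`, `plan_repair`); it assumes both snapshots are already keyed by the same entity ids (i.e. after entity mapping) and it is not how bevy_replicon stores this data.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical ids; a real implementation would use `Entity` and component ids.
pub type EntityId = u64;
pub type ComponentId = u32;

/// Replicated entities and the replicated components each one carries.
pub type Snapshot = HashMap<EntityId, HashSet<ComponentId>>;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Repair {
    /// Entities the client still has but the init message does not contain.
    pub despawn: Vec<EntityId>,
    /// Components to strip from entities that survive the reconnect.
    pub remove: Vec<(EntityId, ComponentId)>,
}

pub fn plan_repair(local: &Snapshot, init: &Snapshot) -> Repair {
    let mut repair = Repair::default();
    for (&entity, local_components) in local {
        match init.get(&entity) {
            // Not replicated anymore: despawn the whole entity.
            None => repair.despawn.push(entity),
            // Still replicated: remove only components missing from the init message.
            Some(init_components) => {
                for &component in local_components.difference(init_components) {
                    repair.remove.push((entity, component));
                }
            }
        }
    }
    // HashMap iteration order is unspecified, so sort for stable output.
    repair.despawn.sort_unstable();
    repair.remove.sort_unstable();
    repair
}
```

Entities and components that appear only in the init message need no special handling here, since applying the message spawns or inserts them as usual.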

Advantages

This is a 'seamless' solution since the client treats the disconnect period like a large span of severe packet loss (at least as far as replicated state is concerned).
There is no span of time where entities completely disappear.
Entities may be fully persistent, which means once spawned they will never be despawned. This allows players to inject custom/complex components on persistent entities without needing to worry about their lifetimes.
Game clients don't need cleanup logic after a disconnect. Entities that would normally be despawned by the server will have game client systems for handling them, and entities that are normally not despawned will still persist after the reconnect.

Drawbacks

Since client entities are not despawned, there is no 'reset' within the replication system. This is a subtle point - during normal replication we get a stream of init messages, but on reconnect we get one big init message that compresses all other init messages. This means if our client logic is somehow tied up in the order of server init messages (e.g. entangled with component removals/insertions or visibility-related spawns/despawns), this won't have a direct analog in the reconnect logic (in the form of entity resets/respawns).
Servers must send a large replication init message when a client connects, potentially duplicating data the client already has.
This is relatively complex to implement and get right in the client.

Option 5: Realign client on reconnect.

In this case the client sends its entire replication state to the server on reconnect (entity ack map). The server uses that data to remake the client's ClientInfo, and then its first replication message will pick up where it left off before the disconnect.

On disconnect

bevy_replicon server: Remove client entry from ClientsInfo.
bevy_replicon client: Do nothing.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon server: Wait for the client message with the entity ack map. Create a new ClientInfo from the received entity ack map. When producing the first init message post-reconnect, detect no-longer-spawned entities and send those as despawns. We must also send an empty slot for all existing components.
bevy_replicon client: Immediately send the pre-existing entity ack map. When handling the first init message received, detect component removals by comparing the 'component insertion list' (including empty slots) with the local entity's component set.
Game server: Run custom client-connect logic. Send client-startup message to client assuming they restarted their app.
Game client: Run custom initialization logic. Special 'reconnect' logic may be needed. When the server's client-startup message arrives, handle it (the client's loading screen should probably wait for this).

Advantages

Same as Option 4.
The server's initial replication init message is size-optimized and avoids needlessly duplicating data the client has.

Drawbacks

Same 'reset' issue as Option 4.
This is complicated and requires a round trip between client and server before replication can start on the server. It may or may not increase time-to-connect, depending on space savings in the init message (if the full size is much bigger than the reconnect-optimized size then packet loss will cause a relative delay).

Further comments/questions

In all cases, what is the best way to clean up pre-generated entities on the client if a disconnect happens before the server mapping is sent?

Shatur commented on May 27, 2024

Quite a nice writeup!
This inspired another idea.

Option 5: Treat reconnect as just a huge packet loss.

On disconnect

bevy_replicon server: move ClientInfo into a separate storage for some configurable time + additionally store the last sent tick from init message.
bevy_replicon client: Do nothing.
Game server: Clean up local entity trackers and do other custom logic to handle disconnects.
Game client: Run optional 'reinitializing' logic (e.g. display 'Reconnecting' message and block player inputs).

On reconnect

bevy_replicon client: include a special message saying that the client is trying to reconnect (if I recall correctly, Renet provides the ability to include user data in the connection message).
bevy_replicon server: read the user message from the connected client and restore ClientInfo. Init messages will be sent starting from the last received tick that we remembered. Update messages will be sent as usual.

What do you think?

UkoeHB commented on May 27, 2024

> move ClientInfo into a separate storage for some configurable time + additionally store the last sent tick from init message.

We don't have access to renet acks, so how do we know which replicon init message was last successfully sent?

Advantages

Unsent client mappings can be preserved/cached (assuming we get access to renet acks).
Server init messages post-reconnect are optimized without an excessively complex/expensive client/server round trip.

Disadvantages

The 'reset' issue of Options 4/5 still exists (unavoidable).
What happens when ClientInfo times out? Then we would probably fall back to option 2 or 3 (or 4?) with all the cleanup-related drawbacks. However it would give people the option to choose based on their use-case (e.g. use an infinite timeout for fixed-member-list games, use a low timeout for an MMO).
It's challenging to embed user data in renet connect tokens for this use-case, since tokens need to be generated by your backend and then piped all the way into the game client. You'd need extra logic in that pipeline to differentiate between 'restart' (fresh app) and 'reconnect' (same app) and include handling of the edge race condition where you start reconnecting then restart the game app. All of this isn't a big problem, but does increase the complexity of using renet/replicon in a production-grade product.

Shatur commented on May 27, 2024

> We don't have access to renet acks, so how do we know which replicon init message was last successfully sent?

Right, but what if we include it in the user data when the client tries to reconnect? The client knows its tick.

> What happens when ClientInfo times out?

I would expect the server to just drop it. The configurable timeout is the point after which games usually return to the main menu with an error. So I would let users handle it.

> However it would give people the option to choose based on their use-case

Sure! And maybe an API for manual cleanup.

> We could do this:

Yes, this is how I would imagine it.

Except

> If the timer expires, the 'I connected' message is received, or the info didn't exist, then make a new info and replicate from scratch. If 'I reconnected' is received then reuse the existing info.

I would expect the server to disconnect the client on reconnect attempt in case of the timeout. I would prefer reconnect API to be explicit.

UkoeHB commented on May 27, 2024

> I would expect the server to disconnect the client on reconnect attempt in case of the timeout. I would prefer reconnect API to be explicit.

The client app should never completely break just because the server dropped ClientInfo. That is why I said "The client always uses Option 4.". Whether the server replicates everything or just 'catches up', the client state will be the same. The only real question is how the client should handle pre-mapped entities in the case where the server replicates everything (so mappings are lost).

EDIT: I may have misunderstood you. Are you suggesting to disconnect the client just from the race condition between timing out the client info, and timing out the 'I connected' message? I.e. you won't disconnect a client that reconnects after ClientInfo is discarded, but you will discard them if they connect with a ClientInfo still present and fail to send 'I connected' in a timely manner? I don't see the value of this.

Shatur commented on May 27, 2024

> I.e. you won't disconnect a client that reconnects after ClientInfo is discarded, but you will discard them if they connect with a ClientInfo still present and fail to send 'I connected' in a timely manner?

No, no, the opposite:

Client reconnected before the timeout - continue playing.
Client reconnected after the timeout - disconnect, require users to write their app logic to explicitly connect to the server from scratch.

> The client app should never completely break just because the server dropped ClientInfo. That is why I said "The client always uses Option 4.".

I think we are on the same page then.
What I am suggesting is quite similar to Option 4, but solves the disadvantage of sending a big message. The only difference is the suggested API of not removing ClientInfo right away, in order to send less data on reconnect (and maybe simplify the logic a little, since after a reconnect it will work like a huge packet loss).

> The only real question is how the client should handle pre-mapped entities in the case where the server replicates everything (so mappings are lost).

Maybe similar to the tick suggestion: if the client didn't receive them, include them in the user data on reconnect?

UkoeHB commented on May 27, 2024

> Client reconnected after the timeout - disconnect, require users to write their logic to explicitly connect to the server from scratch.

I don't follow. I am suggesting two timeouts, one timeout for ClientInfo (for server cleanup post-disconnect) and another timeout for waiting for the 'I connected/reconnected' client message if the ClientInfo has been cached (to avoid getting stuck waiting for the message to arrive). The server will not know if a client is connected/reconnected if the first timeout has expired, since the server doesn't have any reason to wait for the message in that case (e.g. it would make initial game startup slower), so it will always replicate from scratch. It would be inconsistent to only disconnect the client when the second timeout expires.

To be clear, I am advocating for a startup message over the reliable channel, instead of using renet connect token user data which is a mess.

> Maybe similar to the tick suggestion: if the client didn't receive them, include them in the user data on reconnect?

The user data is fixed-length, it would not work.

UkoeHB commented on May 27, 2024

> Your suggestion of using a reliable message also solves this problem: we can include any data in the "I reconnected" message.

Even this doesn't work, because the replicon client doesn't track its own pre-generated entities. We currently assume they are sent to the server by custom client events, and then the server sends them back in entity mappings. If ClientInfo is preserved then we can resend old mappings (that haven't been despawned), but if not then the mappings are completely lost.

What if we rework the mapping system: instead of registering mappings with replicon, insert mappings as components onto entities (with client id target). Then replicon detects those components during server replication (only send the mapping to the targeted client), and they will always be resent on reconnect (and if the mapping doesn't exist on the client due to a restart or other issue, the client can just spawn a new entity).

This is easy with the new archetype caching. We just need to flag archetypes that contain the ClientMapped component.

There is one edge condition this doesn't solve: if a mapping fails to send from client to server due to a disconnect, then the entity may 'leak' on the client if the client assumes the mapping succeeded (e.g. for long-lived pre-generated entities; short-lived ones should be expected to have auto-cleanup systems for if they live too long). If we try to preserve mappings across a reconnect, then users can't both use that mapping-preservation and clean up entities that are missing, because users won't actually know which entities are missing (it's all internal to replicon).

They could simply despawn all pre-generated entities on disconnect and let them be respawned normally, but then preserving mappings would be pointless (and there is a disconnect-reconnect gap, and what about pre-generated entities that are spawned during the disconnect-reconnect period?).

To fully solve it, users should add a Premapped component to their client entities. Then in the tick where the renet client reconnects, collect all the premapped entity ids. The first init message received from the server post-reconnect should contain all mappings. Then the client can compare the init mappings with the collected premapped entities. Excess premapped entities can be despawned (we assume the mappings failed to send).

There is one race condition where this can fail: if a premapped entity is generated between renet Receive and replicon Receive and a message is sent directly to renet, then it is possible for that premapped entity to be despawned by the replicon cleanup logic even though the mapping successfully sent to the server (normally renet will drop messages sent to disconnected clients, so pre-connect mappings can be safely cleaned up using the synchronization mechanism I am proposing, but there is a window where renet is connected before replicon Receive runs).
To enforce this we'd want to emit an error (or panic) on the client if a mapping is received from the server and the mapped client entity exists but it doesn't have the Premapped component.
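
The Premapped comparison described above could look roughly like this. It is a plain-Rust sketch with hypothetical names (`ClientEntity`, `unmapped_premapped`); it assumes the premapped entity ids were collected in the tick where the renet client reconnected and that the first init message carries every mapping as a (server entity, client entity) pair.

```rust
use std::collections::HashSet;

/// Hypothetical client entity id.
pub type ClientEntity = u64;

/// Returns the premapped client entities that have no mapping in the first
/// post-reconnect init message. These are assumed to be entities whose
/// mappings never reached the server, so the client can despawn them.
pub fn unmapped_premapped(
    premapped_at_reconnect: &HashSet<ClientEntity>,
    init_mappings: &[(u64, ClientEntity)], // (server entity, client entity)
) -> Vec<ClientEntity> {
    // Client entities the server knows about.
    let mapped: HashSet<ClientEntity> = init_mappings.iter().map(|&(_, client)| client).collect();
    let mut leaked: Vec<ClientEntity> = premapped_at_reconnect
        .difference(&mapped)
        .copied()
        .collect();
    // HashSet iteration order is unspecified, so sort for stable output.
    leaked.sort_unstable();
    leaked
}
```

Entities premapped after the collection point are not in `premapped_at_reconnect`, so this sketch never despawns them; it does not address the race condition described above.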

UkoeHB commented on May 27, 2024

Ok here is a roadmap:

Clients send ConnectType::{Connect, Reconnect} reliable message when connecting. The Reconnect variant reports the last-acked server init message (and maybe a Vec of the last-acked update_indexs? e.g. the last 20 or so).
Server caches ClientInfo on a timer. If an info is cached when a client reconnects, wait for the ConnectType message from the client (don't wait on a timer, that just makes things more complicated); when Reconnect is received, handle the init tick + update_index array. Otherwise make a new ClientInfo. Client does Option 4 world repair on the first tick post-reconnect. The client should no longer reset their replicon state on disconnect.
Refactor pre-mapped entities: use ClientMapped and Premapped components, client despawns Premapped entities missing on connect. If server receives Reconnect then send all server mappings, otherwise send none, in the first init message post-connect.
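
A possible shape for the message in this roadmap is sketched below. Only the `ConnectType::{Connect, Reconnect}` names come from the roadmap; the field names and integer widths, the `FirstInit` struct and the `first_init` function are assumptions made for illustration.

```rust
/// Hypothetical payload for the reliable message sent when connecting.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ConnectType {
    Connect,
    Reconnect {
        /// Tick of the last server init message the client acked.
        last_init_tick: u32,
        /// Recently acked update indices (the roadmap suggests the last 20 or so).
        acked_update_indices: Vec<u16>,
    },
}

/// What the server puts into the first init message after the connection.
#[derive(Debug, PartialEq, Eq)]
pub struct FirstInit {
    /// `Some(tick)` resumes from that tick; `None` replicates from scratch.
    pub since_tick: Option<u32>,
    /// Whether all server mappings are included in the message.
    pub send_mappings: bool,
}

pub fn first_init(has_cached_info: bool, message: &ConnectType) -> FirstInit {
    match message {
        ConnectType::Reconnect { last_init_tick, .. } => FirstInit {
            // Resuming is only possible while the ClientInfo is still cached.
            since_tick: has_cached_info.then_some(*last_init_tick),
            send_mappings: true,
        },
        ConnectType::Connect => FirstInit {
            since_tick: None,
            send_mappings: false,
        },
    }
}
```

In this sketch a `Reconnect` without cached info still asks for mappings, so a client running Option 4 repair can match its premapped entities even when everything else is replicated from scratch.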

Shatur commented on May 27, 2024

> Refactor pre-mapped entities: use ClientMapped and Premapped components,

But mappings should be processed first, before other data.

UkoeHB commented on May 27, 2024

> But mappings should be processed first, before other data.

Right, ClientMapped would not be a replicated component. We would just refactor collect_mappings() to use entity data instead of ClientEntityMap.

Shatur commented on May 27, 2024

After some thinking, I prefer not to provide any automation and let users handle this themselves by providing a label to register systems before retrieving the world.
Yes, the user will need to write restart logic, but that's what most games do, even single-player ones. It's not that hard, my game has it. Implementing a feature that lets the user have persistent entities is complex (Options 4-5), and automatic despawn is too implicit (Options 1-3).

So I would suggest that you first try to implement a reset in your game. If it turns out to be too difficult, discuss it again. It's easier to undo adding a label than a feature that complicates the library's internal logic. I would also prefer to have rooms if we decide to reiterate on it to see the whole picture.

Changing from "bug" to "enhancement".

UkoeHB commented on May 27, 2024

I have a better solution that is less intrusive. We can define a client plugin RepliconClientRepairPlugin with the following behavior:

Disable the default reset() system.
Run a cleanup system after replication_receiving_system() (only run once after the first init message post-reconnect):

Detect first init message by looking for when RepliconTick resource is mutated.
Iterate client entity map, despawn entities with old acked replicon tick + remove from map.
Iterate Premapped entities, despawn if not in client entity map and older than when we connected to the renet server (this requires additional tracking, e.g. a system that runs right before renet's update_system()).
Iterate components of still-alive entities, use custom fn to detect removed components and remove them from the entities. This is the tricky part, because user-defined deserializers can do anything, not just remove components. Users will still need their own systems to handle complicated cleanup scenarios (I can't imagine any use-cases for this, but you never know).
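
The entity-map step of this cleanup might be sketched as follows. This is plain Rust with a hypothetical `TickedEntityMap` that stores, per server entity, the client entity and the tick at which it was last updated; it ignores tick wraparound and is not the actual bevy_replicon entity map.

```rust
use std::collections::HashMap;

/// Hypothetical map: server entity id -> (client entity id, tick of last update).
pub type TickedEntityMap = HashMap<u64, (u64, u32)>;

/// Removes entries that the first post-reconnect init message did not touch
/// and returns the client entities that should be despawned.
pub fn drain_stale(map: &mut TickedEntityMap, first_init_tick: u32) -> Vec<u64> {
    let mut stale = Vec::new();
    map.retain(|_, &mut (client_entity, tick)| {
        // Entities refreshed by the init message carry its tick (or a newer one).
        let keep = tick >= first_init_tick;
        if !keep {
            stale.push(client_entity);
        }
        keep
    });
    // HashMap iteration order is unspecified, so sort for stable output.
    stale.sort_unstable();
    stale
}
```

Running it once, right after the first init message post-reconnect has been applied, matches the "only run once" condition above.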

The only thing absolutely needed from bevy_replicon is refactoring entity pre-mapping to use ClientMapped components instead of ClientEntityMap, so mappings will be re-replicated after a reconnect. We don't actually need Premapped components in bevy_replicon, they can be mandated as part of the RepliconClientRepairPlugin API.

Shatur commented on May 27, 2024

I like the proposed solution!
But let me think a bit more about pre-mapping, maybe we can somehow keep the current system. Not entirely sold on component-based API. If I don't come up with anything, then we will do as you suggested.
