cfc-servers / gm_express

An unlimited, out-of-band, bi-directional networking library for Garry's Mod
Home Page: https://gmod.express
License: GNU General Public License v3.0
The web is the web. Stuff happens.
Sometimes numbskulls like me accidentally push breaking changes, sometimes Cloudflare has outages, sometimes players have firewalls... etc.
So what is Express supposed to do when it can't reach the API? Should it fall back to regular net messages, effectively acting like (or maybe directly using) Netstream? This would complicate the code a bit, and would probably create some annoying regression issues, but it would make messages sent with Express much more reliable.
And what if only one party has issues connecting to the API? Would the Netstream solution work in this case too?
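A rough sketch of what the fallback path could look like, chunking over plain net messages when the API is down. Everything here is illustrative, not the real API: the `express_fallback` message name is made up, and the chunk size just respects GMod's ~64KB net message cap.

```lua
-- Hypothetical serverside fallback: chunk the payload over plain net messages
if SERVER then util.AddNetworkString( "express_fallback" ) end

local CHUNK_SIZE = 60000 -- stay under GMod's ~64KB net message cap

local function netFallbackSend( name, data, ply )
    local payload = util.Compress( util.TableToJSON( data ) )
    local total = math.ceil( #payload / CHUNK_SIZE )

    for i = 1, total do
        local chunk = string.sub( payload, ( i - 1 ) * CHUNK_SIZE + 1, i * CHUNK_SIZE )

        net.Start( "express_fallback" )
        net.WriteString( name )
        net.WriteUInt( i, 16 )
        net.WriteUInt( total, 16 )
        net.WriteUInt( #chunk, 32 )
        net.WriteData( chunk, #chunk )
        net.Send( ply )
    end
end
```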
As reported by another developer who implemented a similar system on their server, clients with very slow upload speeds (~1 Mbps) could experience timeouts when uploading data to Express.
They suggested that splitting the data into 500KB chunks worked for them. I want to experiment to see if we can find another solution that won't require too much data manipulation.
First, we could try setting the default timeout value on all of our HTTP requests to 2 or 3 minutes (it's 60s by default). We'd have to see what the ramifications of that would be. For example, how many pending HTTP requests can be active at the same time?
Default GMod uses HTTP/1.1, so something fancy like HTTP/2 streaming is out of the question.
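For reference, both ideas together might look roughly like this. The endpoint shape and query parameters are made up; `timeout` is the field GMod's HTTPRequest structure exposes (60s by default).

```lua
-- Illustrative only: raise the request timeout and upload in ~500KB
-- chunks, one at a time, so slow uplinks don't trip the limit
local CHUNK_SIZE = 500 * 1024

local function uploadInChunks( url, payload, onDone, onFail )
    local total = math.ceil( #payload / CHUNK_SIZE )
    local current = 0

    local function uploadNext()
        current = current + 1
        local chunk = string.sub( payload, ( current - 1 ) * CHUNK_SIZE + 1, current * CHUNK_SIZE )

        HTTP( {
            url = string.format( "%s?chunk=%d&of=%d", url, current, total ),
            method = "POST",
            body = chunk,
            type = "application/octet-stream",
            timeout = 180, -- up from the 60s default
            success = function()
                if current < total then uploadNext() else onDone() end
            end,
            failed = onFail
        } )
    end

    uploadNext()
end
```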
Right now, if the recipient sends back an incorrect proof, it would be a no-op because the callback wouldn't be found in the `awaitingProof` table (ref: https://github.com/CFC-Servers/gm_express/blob/main/lua/gm_express/sh_init.lua#L131-L141).
We should have some way of erroring or alerting of this case at minimum, but letting the sender define an error handling function would also be nice.
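Something along these lines, maybe. The entry layout and `onError` field are hypothetical; today the table just maps hashes to callbacks.

```lua
-- Hypothetical shape: entries carry an optional onError handler so bad
-- proofs are surfaced instead of silently dropped
local function onProofReceived( hash, proof )
    local entry = express._awaitingProof[hash]

    if not entry then
        -- Previously a silent no-op
        ErrorNoHaltWithStack( "Express: proof received for unknown hash: " .. tostring( hash ) )
        return
    end

    if proof ~= entry.expectedProof then
        if entry.onError then entry.onError( "proof mismatch", hash ) end
        return
    end

    express._awaitingProof[hash] = nil
    entry.callback()
end
```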
Reproduce:
1. Set the `express_domain` convar to something that resolves but doesn't respond, like `loopback.cfcservers.org:12345`
2. Set the `express_domain` convar back to `gmod.express`
After changing the convar, Express should basically re-initialize, attempting to check the version and re-register again
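GMod's `cvars.AddChangeCallback` should be enough to hook that up. `CheckRevision` and `Register` below are stand-ins for whatever the real version-check and registration functions are called.

```lua
-- Re-initialize Express whenever express_domain changes
cvars.AddChangeCallback( "express_domain", function( _, old, new )
    if old == new then return end

    express.CheckRevision() -- stand-in name
    express.Register()      -- stand-in name
end, "express_domain_reinit" )
```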
Right now, the server retrieves two tokens on startup: one for itself, and one for all clients.
This client token can be used to write or retrieve data for as long as it's in the KV store.
This creates an avenue for attack wherein a malicious client could use their key to upload any amount of data at any rate.
One low-hanging improvement: Have the server issue clients short-term (or even one-time) tokens. The client code would passively manage the tokens, asking for a new one when it needs one.
Another improvement: force the clients to inform the server of any would-be uploads before they happen. Clients could send the server the hash and size of their data, the server would respond with a specially crafted token (maybe a jwt?) to use for that specific upload. The API could then validate that the upload was entirely correct (size and hash match), responding with an error and maybe short-term API timeout.
Doing transactions in this way would require more overhead, but the server could, for any reason, refuse to grant a token for the proposed upload. This gives a lot of flexibility to server owners for, realistically, a minor performance hit.
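A sketch of what that handshake could look like from the Lua side. The message name, `maxUploadSize`, and `GrantUploadToken` are all hypothetical.

```lua
-- Client announces the hash + size of its would-be upload; the server
-- may grant a single-use token, or refuse for any reason
if SERVER then
    util.AddNetworkString( "express_request_token" )

    net.Receive( "express_request_token", function( _, ply )
        local size = net.ReadUInt( 32 )
        local hash = net.ReadString()

        if size > express.maxUploadSize then return end -- hypothetical cap

        -- e.g. ask the API for a JWT scoped to this exact hash + size
        express.GrantUploadToken( ply, hash, size, function( token )
            net.Start( "express_request_token" )
            net.WriteString( token )
            net.Send( ply )
        end )
    end )
end
```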
When running our GLuaTest suite in GitHub Actions, Express still reaches out to the default domain and does a revision check and registration.
This is unnecessary for our use case, so we should tell Express not to bother when it's in such an environment.
There may be a convar we can check (or create) that would indicate it's in a test environment. A little bit of discovery is necessary for this.
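Worst case, we could just add our own escape hatch (convar name made up; `CheckRevision`/`Register` are stand-in names again):

```lua
-- Skip the version check and registration when running under CI
local testMode = CreateConVar( "express_test_mode", "0", FCVAR_ARCHIVE,
    "Don't contact the Express API on startup (for test environments)" )

if not testMode:GetBool() then
    express.CheckRevision()
    express.Register()
end
```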
Perhaps the server wants to communicate via `http` but make the clients use `https`, or whatever else.
We should just make a couple of new convars and use them as-needed.
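Something like this, maybe (names are just suggestions):

```lua
-- Separate protocol convars for each realm
local protocol = CreateConVar(
    SERVER and "express_protocol" or "express_protocol_cl",
    "https", FCVAR_ARCHIVE, "Protocol used to reach the Express API"
)

-- Illustrative URL builder using the existing express_domain convar
local function makeBaseURL()
    return protocol:GetString() .. "://" .. GetConVar( "express_domain" ):GetString()
end
```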
Cloudflare's K/V has a major issue for Express: it doesn't guarantee that the value you store will be accessible in every region immediately.
For Express, this means the server could upload a payload that some clients simply can't see. Of course, the issue happens in reverse too: a client could upload a payload that the server can't see if they're in different regions.
K/V is super fast, faster than the alternatives, so I'd like to keep using it. I think the best plan is to also use Cloudflare's D1 as a backup once it reaches general availability.
So for every write, we'd store the data in both KV and D1, and for every retrieval, we'd check KV first, then fall back to D1 if the ID wasn't found.
Or, if D1 latency gets low enough, we could just use D1 entirely 🤷
Lots of measurements and testing to do.
A few open questions:
Here's the bit of code in question:
https://github.com/CFC-Servers/gm_express/blob/main/lua/gm_express/sh_init.lua#L131-L141
When a message is sent, it creates a new entry in the `express._awaitingProof` table, using the hash of the data (prefixed with the recipient's Steam ID, if called serverside), and then removes the entry from the table when proof is received.
But what should Express do if the same message with the same data is sent multiple times in a short timespan?
I suppose the expected behavior would be to get a callback for each message sent, but right now it'd only run the callback once (the first run would remove it from the callbacks table).
Perhaps we could make an incrementing `transactionID` that would get automatically sent and incremented with each message, and then use that number in the key for `express._awaitingProof`. Then, the recipient would reply with the same `transactionID` we sent them, and we'd use that to run the correct callback.
We could implement this transparently so the user doesn't have to worry about it, but I worry this could create a maybe-exploit where a malicious actor could reply with a different `transactionID`, potentially running the wrong callback. Granted, it would still be prefixed with their SteamID, so they'd only be running a callback we already expected them to run... I dunno.
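In code, the sender side might look roughly like this (key layout is illustrative):

```lua
-- Track each send under a transaction-scoped key so repeated sends of
-- identical data each keep their own callback
express._transactionID = 0

function express.trackProof( recipient, hash, callback )
    express._transactionID = express._transactionID + 1

    -- Same SteamID prefix as today, plus the transaction ID
    local prefix = SERVER and ( recipient:SteamID64() .. "-" ) or ""
    local key = prefix .. hash .. "-" .. express._transactionID

    express._awaitingProof[key] = callback

    -- Send this ID with the message; the recipient echoes it back and
    -- we rebuild the key to find the right callback
    return express._transactionID
end
```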
Just a braindump for now, will revisit when some of the more pressing tasks have been completed.
Hello. Do you have plans to support the JS backend somewhere other than Cloudflare? For me personally, and I think for many users of your library, it would be much more convenient to host these files on the same VDS.
With especially large data structures, just the process of preparing the data to send or read can be massively taxing on the server.
Because we already operate on callbacks, we could run these processes asynchronously to spread the work out over multiple ticks, easing any massive spikes that might occur.
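As a sketch, a coroutine resumed from `Think` with a small per-tick time budget would spread the work out. `processItem` here is a stand-in for whatever the actual per-item serialization step is.

```lua
-- Process a big table across multiple ticks instead of all at once
local function processAsync( items, processItem, onDone, budget )
    budget = budget or 0.002 -- seconds of work allowed per tick

    local co = coroutine.create( function()
        local deadline = SysTime() + budget

        for _, item in ipairs( items ) do
            processItem( item )

            if SysTime() >= deadline then
                coroutine.yield()
                deadline = SysTime() + budget
            end
        end
    end )

    local hookName = "ExpressAsync" .. tostring( co )

    hook.Add( "Think", hookName, function()
        coroutine.resume( co )

        if coroutine.status( co ) == "dead" then
            hook.Remove( "Think", hookName )
            onDone()
        end
    end )
end
```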
This probably belongs in the https://github.com/CFC-Servers/gm_express_service repo, but seeing as this is the most visible repo I'll put things here.
I'll need to make a new Cloudflare account and go through the process of setting this up so I can optimize and document the process.
A simple test suite to verify that functions run properly would be a good place to start.
To do this, we have to make proof sending mandatory, with optional callbacks for proof receiving.
Then, we can send a `success` bool on the proof message to indicate errors or timeouts.
This will let the sender handle cases where the recipient never received their message.
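The proof message itself only needs one extra bit. Message and parameter names here are illustrative.

```lua
-- Mandatory proof reply carrying a success flag
if SERVER then util.AddNetworkString( "express_proof" ) end

local function sendProof( hash, success, recipient )
    net.Start( "express_proof" )
    net.WriteString( hash )
    net.WriteBool( success ) -- false on decode failures, timeouts, etc.

    if SERVER then
        net.Send( recipient )
    else
        net.SendToServer()
    end
end
```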
This is a really annoying issue that, according to Cloudflare's docs, shouldn't be happening.
Cloudflare KV docs state:
> When you write to KV, your data is written to central data stores. It is not sent automatically to every location's cache.
>
> Initial reads from a location do not have a cached value. The data must be read from the nearest central data store, resulting in a slower response.
So what they're saying here is that KV achieves low latency by caching KV lookups. On the first read from a given location, it'll get a cache MISS and have to traverse the Cloudflare nodes to find the actual value, and then it'll cache that value to the location where the lookup occurred.
This is fine. Good, even - we can deal with that added latency for a single client.
A quick refresher on how Express works. Our current flow is something like this:
1. The Sender uploads its data to the Express Service and gets back a UUID
2. The Sender sends that UUID to the Recipient in a tiny net message
3. The Recipient asks the Express Service for the data belonging to that UUID
4. The Recipient downloads the data
However, what we're actually seeing is the Recipient asking the Express Service for the UUID and getting a 404!
How does that work? It's not possible for it to actually be a 404 at this point because the data definitely exists in KV. We wouldn't have the UUID unless it were already stored!
The worst part about this bug is that this 404 is actually cached for that location! Meaning anyone else trying to read the Data for that UUID just gets a cached 404 🤦‍♂️
Assuming this isn't some obscure bug with the Express Service (I suppose it's possible), there are a few different ways to tackle this.
1. Have clients retry for the data
(Proposed in #34)
We could just make the client ask for the data over and over until the cache busts? 😅
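Roughly, with `express.Get` standing in for the real fetch function:

```lua
-- Retry the fetch a few times with a growing delay, in case the first
-- read cached a 404 at our location
local function fetchWithRetry( id, callback, attempts )
    attempts = attempts or 5

    express.Get( id, function( success, data )
        if success then return callback( data ) end
        if attempts <= 1 then return callback( nil ) end

        timer.Simple( 0.5 * ( 6 - attempts ), function()
            fetchWithRetry( id, callback, attempts - 1 )
        end )
    end )
end
```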
2. Have the sender wait a brief moment before sending the net message with the data's UUID
(Proposed in #34)
If it really was a timing issue, perhaps delaying the net message by even a few fractions of a second could increase the chance of a successful first GET. This would be a very minor delay, but it would increase the minimum time of every message by that much.
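That part is trivial on the sender's side (the delay value and message name below are made up):

```lua
-- Give KV a moment to propagate before telling the recipient about the ID
local SEND_DELAY = 0.25 -- would need real-world tuning

local function announceID( id, recipient )
    timer.Simple( SEND_DELAY, function()
        net.Start( "express_announce" ) -- hypothetical message name
        net.WriteString( id )
        net.Send( recipient )
    end )
end
```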
3. Create a comprehensive cross-region, bare-bones example demonstrating this as a Cloudflare bug and ask Cloudflare to fix it
This should probably be done, if at least just to confirm that the bug is Cloudflare's like I'm describing it... But man, that's a lot of work.