This is more of a discussion topic than an issue, but discussions are disabled, so here w

[RFD] Query-based routing. (pgcat issue, closed)

mskarbek commented on July 28, 2024

[RFD] Query-based routing.

from pgcat.

Comments (9)

ben-pr-p commented on July 28, 2024

I think the way AWS's pgbouncer-rr approaches it is pretty ideal: https://github.com/awslabs/pgbouncer-rr-patch

The only issue with their approach is that you can't access the parameter values: awslabs/pgbouncer-fast-switchover#47

Given that this is a Rust project and there is really good support for Rust / JavaScript / Wasm interfaces from Deno, Wasmer, etc., I think it'd be most useful to be able to write your route_query(username, query, parameters) function in JavaScript or Wasm.
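
To make the proposal concrete, here is a rough sketch of what such a hook could look like on the Rust side. Everything in it is hypothetical: `QueryRouter`, `Route`, and `TenantRouter` are names made up for illustration and are not part of pgcat or pgbouncer-rr, and the assumption that the first bound parameter carries a tenant id is only an example policy.

```rust
/// Where a query should be sent. Pool names are whatever the operator configured.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Route {
    /// Send the query to the named pool (for example, one pool per tenant).
    Pool(String),
    /// No opinion: let the pooler fall back to its default behaviour.
    Default,
}

/// Hypothetical plugin interface mirroring route_query(username, query, parameters).
/// A `None` parameter stands for SQL NULL.
pub trait QueryRouter {
    fn route_query(&self, username: &str, query: &str, parameters: &[Option<&str>]) -> Route;
}

/// Example policy (an assumption): the first bound parameter is a tenant id.
pub struct TenantRouter;

impl QueryRouter for TenantRouter {
    fn route_query(&self, _username: &str, _query: &str, parameters: &[Option<&str>]) -> Route {
        match parameters.first() {
            Some(Some(tenant)) if !tenant.is_empty() => Route::Pool(format!("tenant_{}", tenant)),
            _ => Route::Default,
        }
    }
}
```

With this shape, the routing decision can see the parameter values, which is the piece reported missing from pgbouncer-rr above. A JavaScript or Wasm binding would need to expose the same three inputs.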

I just discovered this project today from Postgres Weekly, but we've been considering patching pgbouncer-rr to fix the parameter issue because it solves such an important business use case for us, and so if there was anything we could do to help this along we'd love to!

Faridalim commented on July 28, 2024

Hi, awesome project. I think having the user choose the database is the easiest and clearest implementation. Alternatively, the user could mark queries as "Read" or "Write" using SET.

levkk commented on July 28, 2024

Hey! Yes absolutely, that's a great suggestion, I've been thinking about it as well.

There are caveats and the implementation has to be pretty careful. I've seen other poolers, e.g. Makara, attempt to parse the SQL and figure out if it's a SELECT (read-only) or something else, e.g. UPDATE, INSERT, etc. That approach in practice is okay, up until you hit something like this:

SELECT some_mutating_function();

This will get routed to a replica and we'll get an error back about the database being read-only.
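
A minimal sketch of that failure mode, assuming a router that looks only at the first keyword (this is an illustration, not how Makara or pgcat is implemented; `Target` and `route_by_first_keyword` are made-up names):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Primary,
    Replica,
}

/// Naive routing: inspect only the first keyword of the statement.
pub fn route_by_first_keyword(sql: &str) -> Target {
    // Take the leading run of ASCII letters, e.g. "SELECT" from "SELECT 1".
    let first = sql
        .trim_start()
        .split(|c: char| !c.is_ascii_alphabetic())
        .next()
        .unwrap_or("");
    if first.eq_ignore_ascii_case("SELECT") {
        Target::Replica
    } else {
        Target::Primary
    }
}
```

Here `route_by_first_keyword("SELECT some_mutating_function();")` returns `Target::Replica`, which is exactly the misrouting described above: nothing in the statement text says that the function writes.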

An example where they don't attempt this at all is Rails 6, where they just make it explicit: the user has to pick a database for each query, since they are the best person to know if they are read-only or not.

Another use case to consider is replica lag. Sometimes the client really wants the freshest information, so they talk to the primary explicitly, even with a SELECT query. We should allow them this functionality, perhaps through some extended SQL syntax, e.g. SET CONNECTION TO 'primary'; or something like that.

These are corner cases though. I think in practice, we can come up with an approach that works 99% of the time and it can be customizable too.

mskarbek commented on July 28, 2024

I'm perfectly aware that this is not trivial functionality. The ideal situation would be to require developers to separate reads from writes at the application level, but that is not always possible. One way to approach this currently would be to monitor all application queries, verify edge cases, and hand-pick a list for pgBouncer query routing. This is not ideal, but it serves its purpose in some cases and helps to deal with legacy applications. The question is how to improve upon that in a safe, reliable way, and whether it is worth the effort.

levkk commented on July 28, 2024

@mskarbek I've added a basic query parser using sqlparser (crate). It's more advanced than a simple regex, because it'll be able to detect contexts like WITH ..., but it still may give false negatives for mutative SELECTs. That being said, this should cover 99% of use cases. It's disabled by default, just to make sure the caveats of using it are considered.

You could use it like so:

SET SERVER ROLE TO 'auto';

-- All queries that follow will be automatically routed to the primary or to replicas.

or enable it in the config:

query_parser_enabled = true
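
For illustration, a pooler could recognize the command with something like the sketch below. This is not pgcat's actual parser. Only the 'auto' value appears in this thread; the 'primary' and 'replica' values, and the names `ServerRole` and `parse_set_server_role`, are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerRole {
    Primary, // assumed value, not shown in the thread
    Replica, // assumed value, not shown in the thread
    Auto,
}

/// Returns Some(role) when `sql` is `SET SERVER ROLE TO '<role>'`, otherwise None.
pub fn parse_set_server_role(sql: &str) -> Option<ServerRole> {
    // Drop surrounding whitespace and any trailing semicolons.
    let trimmed = sql.trim().trim_end_matches(';').trim_end();
    let tokens: Vec<&str> = trimmed.split_whitespace().collect();
    if tokens.len() != 5 {
        return None;
    }
    // The first four tokens must be the fixed keywords, case-insensitively.
    let keywords = ["SET", "SERVER", "ROLE", "TO"];
    let is_command = tokens
        .iter()
        .zip(keywords.iter())
        .all(|(token, keyword)| token.eq_ignore_ascii_case(keyword));
    if !is_command {
        return None;
    }
    // The fifth token must be a single-quoted role name.
    let value = tokens[4].strip_prefix('\'')?.strip_suffix('\'')?;
    match value.to_ascii_lowercase().as_str() {
        "primary" => Some(ServerRole::Primary),
        "replica" => Some(ServerRole::Replica),
        "auto" => Some(ServerRole::Auto),
        _ => None,
    }
}
```

Anything that is not this exact command returns `None`, so ordinary statements would pass through to the normal routing path.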

rjuju commented on July 28, 2024

There are a lot of other caveats, for instance writable CTEs, e.g.:

WITH del AS (DELETE FROM ...) SELECT 1;

You can even have the writable part in the middle of other read only ones.

Or more common: SELECT ... FOR UPDATE.
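
As a rough sketch of what catching these two cases from the text alone would take, one could scan every word of the statement for a write keyword instead of checking only the first one. The keyword list and the name `may_write` are assumptions for illustration, and the scan deliberately errs toward the primary (it does not skip string literals or comments).

```rust
/// Words that, if present anywhere in the statement, force it to the primary.
/// "UPDATE" also catches SELECT ... FOR UPDATE. The list is illustrative, not complete.
const WRITE_KEYWORDS: &[&str] = &[
    "INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE", "CREATE", "ALTER", "DROP",
];

/// Conservative text scan: true if any whole word is a write keyword.
pub fn may_write(sql: &str) -> bool {
    // Split on anything that cannot be part of an identifier, so that
    // `updated_at` stays one word and does not match UPDATE.
    sql.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .any(|word| WRITE_KEYWORDS.iter().any(|kw| word.eq_ignore_ascii_case(kw)))
}
```

This flags both `WITH del AS (DELETE FROM ...) SELECT 1;` and `SELECT ... FOR UPDATE`, but it still returns `false` for a SELECT that calls a mutating function, and every statement now pays for a full scan.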

In practice, discovering whether a query can be sent to a read-only replica (if indeed you can send any read-only query to a replica) requires parsing the query, and on most OLTP workloads that adds enough cost to make it slower than simply sending everything to the primary node.

ben-pr-p commented on July 28, 2024

@levkk unfortunately the solution you're implementing in #5 does not meet our needs, as we'd be using pgcat to help us manage different tenants on different databases.

Would you be interested / accept a PR along the lines I described above that would enable user supplied logic for routing?

levkk commented on July 28, 2024

Hey @ben-pr-p , thanks for taking a look! #5 isn't quite done yet, but I see what you're saying. My plan was to start with explicit query routing and then slowly graduate to automatic query routing based on a regex or maybe something more complex.

That being said, if you'd like to give this a shot, I'd always welcome a PR :)

Thanks!

p.s. I think the closest analog to what you're looking for in the current implementation is automatic shard selection. Since you are multi-tenant, each tenant would be equivalent to a shard in the current config. Currently, we use Postgres' hash function to select a shard, but there is no reason not to add another function, e.g. a query parser. I think furthermore, assuming you have replicas for each of your tenants, we still would want primary/replica selection and load balancing there.
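
A minimal sketch of the tenant-as-shard idea, assuming a string sharding key. Note that it swaps in Rust's standard `DefaultHasher` as a stand-in: it is not the Postgres hash function mentioned above, so it would not agree with Postgres' own partitioning, and `select_shard` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a sharding key (for example a tenant id) to a shard index in 0..shard_count.
/// Stand-in hash only: this is NOT the Postgres hash function.
pub fn select_shard(sharding_key: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "at least one shard is required");
    let mut hasher = DefaultHasher::new();
    sharding_key.hash(&mut hasher);
    hasher.finish() % shard_count
}
```

The same key always lands on the same shard within a process, so a tenant's queries stay together. Primary/replica selection and load balancing would then happen inside the chosen shard.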

davidfetter commented on July 28, 2024

The only way you can actually know whether a query is actually going to cause a write is to plan it and look for nodes that write. There isn't some kind of clever inspection of the query text that can actually provide this information because the information may be in, for example, a rewrite rule which exists only on the DB side.

This is totally worth doing, but the cost is a round trip that grabs the plan tree and maybe some caching of query write-ness results. Of course, caching brings in one of the Two Hard Problems of Computer Science: Cache Invalidation. The other two are Naming Things and Off-By-One.

Anyhow, the point here is that if you attempt to save labor by doing anything but ask the DB in question whether a write will occur, you will quickly find yourself in a tar pit.
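
The caching half of this could be sketched roughly as follows. The `ask_db` closure is a stand-in for whatever round trip asks the database whether the query writes, and the "schema epoch" used for invalidation is an assumption of this example (some counter the pooler bumps when rules, functions, or tables change). Neither is an existing pgcat feature.

```rust
use std::collections::HashMap;

/// Caches the "does this query write?" answer per query text.
pub struct WritenessCache {
    schema_epoch: u64,
    entries: HashMap<String, bool>,
}

impl WritenessCache {
    pub fn new() -> Self {
        Self { schema_epoch: 0, entries: HashMap::new() }
    }

    /// Returns whether `sql` writes, calling `ask_db` only on a cache miss.
    /// A change in `current_epoch` discards every cached answer.
    pub fn writes<F>(&mut self, sql: &str, current_epoch: u64, ask_db: F) -> bool
    where
        F: FnOnce(&str) -> bool,
    {
        if current_epoch != self.schema_epoch {
            self.entries.clear();
            self.schema_epoch = current_epoch;
        }
        if let Some(&cached) = self.entries.get(sql) {
            return cached;
        }
        let answer = ask_db(sql);
        self.entries.insert(sql.to_string(), answer);
        answer
    }
}
```

Knowing when to bump the epoch is the hard part: a stale entry that says "read-only" after a rewrite rule is added sends a write to a replica.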
