Right now PL/Rust keeps produced shared objects on the local filesystem in the directo

RFC: Artifact storage options about plrust HOT 10 CLOSED

tcdi commented on August 22, 2024

RFC: Artifact storage options

from plrust.

Comments (10)

JohnHVancouver commented on August 22, 2024 1

I think another benefit of having objects in the database is that you have nicer transaction semantics, which I believe is currently an issue?

Transaction 0 w/ xid 99
CREATE OR REPLACE FUNC foo()
...



Transaction 1 w/ xid 100                         Transaction 2 w/ xid 150
BEGIN;                                                   BEGIN;
...                                                      CREATE OR REPLACE FUNC FOO() (pl/rust function)
....                                                        commit;
.....
..
CALL foo()
COMMIT;

If I understand correctly, with the current filesystem approach, the foo call that Transaction 1 w/ xid 100 will get transaction 2 with xid 150's foo() definition instead of the one at 99

from plrust.

JohnHVancouver commented on August 22, 2024

I'm curious how the workflow of this would work if we stored it in the plrust.artifacts table.
When the user calls create function with PL/Rust, then:

We setup the cargo create the same way and get a .so
Store the .so as a bytea into plrust.artifacts

Then when they execute the function, would we need to:

Query plrust.artifacts to get the data
Write the data as .so onto disk and then call it?

@eeeebbbbrrrr mentioned one of the benefits would be to avoid recompiling the code on replicas of the same architecture, which makes sense.

Is there some mechanism to optimize the look-up/write-to-disk away? I guess you could do the look-up for each PL/Rust function on a per-transaction basis...

Even with wasm, assuming that's the direction we're moving towards and we store the wasm in pg_proc.prosrc as suggested by Eric, I think you still need to write the .wasm out to disk to be added as a module in order to load/execute it?
We could store it as a .wat instead and avoid the write-to-disk and having to handle race conditions/formatting there but that doesn't seem to perform as nicely.

I did a quick benchmark where I had a plrust.function executing a .wasm, vs a .wat on the filesystem that I converted it to with wasm2wat (https://github.com/WebAssembly/wabt). My PL/Rust function just returned a number and I got around 0.194ms with .wasm compared to 0.265ms with .wat

cargo pgx stop
cargo pgx start
cargo pgx connect
cat /dev/null > 14.log

set log_duration = 'on'
select return_int(); \watch 0.0001

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | paste -s -d+ - | bc
2376.364ms

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | wc -l              
 12231

---> 0.194 ms on average with .wasm

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | paste -s -d+ - | bc
4450.141

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | wc -l  
   16796

--> 0.265ms with .wat

from plrust.

Hoverbear commented on August 22, 2024

Noting that I'm not sure if writing to disk is required for wasmtime. I think I saw API to use some existing memory.

from plrust.

JohnHVancouver commented on August 22, 2024

Do you mean running via .wat files directly in mem like here?
https://docs.wasmtime.dev/wasm-wat.html

from plrust.

Hoverbear commented on August 22, 2024

This one: https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.from_binary

from plrust.

JohnHVancouver commented on August 22, 2024

Bringing this back, I'm curious where folks are thinking about this. Are we still planning on storing it as a bytea?
I'm not sure if you can get the same safety semantics that we're aiming for if you store it in a regular table since ISTM an arbitrary user could insert their own bytea into it.

If we're still planning on storing artifacts on disk and noting the transaction semantics as a gap (and CREATE DATABASE w/ template1) for PL/Rust, we should include the databaseOid in the function name at the very least since Oids aren't unique and we can have collisions today.

When we say system_catalog are we referencing an actual table in pg_catalog? Or just a table that plrust attempts to manage

from plrust.

workingjubilee commented on August 22, 2024

Bringing this back, I'm curious where folks are thinking about this. Are we still planning on storing it as a bytea?
I'm not sure if you can get the same safety semantics that we're aiming for if you store it in a regular table since ISTM an arbitrary user could insert their own bytea into it.

Isn't it already the case that a user can overwrite the function using CREATE OR REPLACE if they have USAGE privileges for the language, the types, and ownership of that function? Surely a much lower bar, also. It seems unlikely to me that Postgres does not offer the ability to correctly specify the ACLs required to allow the language handler to manage its own semi-secure tables which exactly match the same policy, or somewhat stricter, but if I am wrong then I suppose I am wrong.

from plrust.

JohnHVancouver commented on August 22, 2024

You can replace the function, but if you do it through CREATE/REPLACE it would still go through the PL/Rust handler and be compiled as safe via postgrestd.

imo from a "trusted" language perspective, the PL/Rust handler shouldn't execute unsafe code. As a user I could update an existing binary blob in the table with something else that is unsafe and presumably it would still execute? (ofc someone could always change what's on disk and you have to draw the line somewhere)

from plrust.

workingjubilee commented on August 22, 2024

Hmm. I suppose that's true.

It's somewhat annoying: as I learn more about the function interface, I become more confident that Postgres could much more easily add the functionality necessary to understand "this function must be compiled" directly, but it's harder to back-hack it in.

from plrust.

eeeebbbbrrrr commented on August 22, 2024

It's been awhile but we've decided to store artifacts for each target architecture in the pg_proc.prosrc column in a custom json format.

from plrust.

RFC: Artifact storage options about plrust HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs