RFC: Artifact storage options (plrust, closed)

tcdi commented on August 22, 2024
RFC: Artifact storage options

from plrust.

Comments (10)

JohnHVancouver commented on August 22, 2024

I think another benefit of having objects in the database is that you have nicer transaction semantics, which I believe is currently an issue?

Transaction 0 w/ xid 99
CREATE OR REPLACE FUNCTION foo()
...

Transaction 1 w/ xid 100          Transaction 2 w/ xid 150
BEGIN;                            BEGIN;
...                               CREATE OR REPLACE FUNCTION foo() (PL/Rust function)
...                               COMMIT;
...
SELECT foo();
COMMIT;

If I understand correctly, with the current filesystem approach, the foo() call in Transaction 1 (xid 100) will get the foo() definition from Transaction 2 (xid 150) instead of the one created at xid 99.
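
To make the scenario above concrete, here is a toy model in Python. It is only a sketch of the concern as described: the class names are hypothetical, "visible if created at or before my xid" is a deliberate simplification, and nothing here reproduces PostgreSQL's real visibility rules or PL/Rust's actual loader.

```python
# Toy model only: it does not reproduce PostgreSQL's real visibility rules.

class FilesystemStore:
    """Artifacts keyed by function name only; a replace overwrites in place."""

    def __init__(self):
        self._files = {}

    def create_or_replace(self, name, xid, artifact):
        self._files[name] = artifact  # the previous artifact is lost

    def lookup(self, name, snapshot_xid):
        return self._files[name]  # snapshot_xid is ignored: no versioning


class VersionedStore:
    """Artifacts kept as row versions tagged with the creating xid."""

    def __init__(self):
        self._rows = {}

    def create_or_replace(self, name, xid, artifact):
        self._rows.setdefault(name, []).append((xid, artifact))

    def lookup(self, name, snapshot_xid):
        # Pick the newest version created at or before the reader's xid.
        visible = [row for row in self._rows[name] if row[0] <= snapshot_xid]
        return max(visible, key=lambda row: row[0])[1]


def replay(store):
    store.create_or_replace("foo", 99, "foo@99")     # Transaction 0
    store.create_or_replace("foo", 150, "foo@150")   # Transaction 2 commits
    return store.lookup("foo", snapshot_xid=100)     # Transaction 1 calls foo()


print(replay(FilesystemStore()))  # foo@150
print(replay(VersionedStore()))   # foo@99
```

The only point of the sketch is that a store which overwrites in place cannot hand an older reader the older artifact, whatever the reader's snapshot says.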

JohnHVancouver commented on August 22, 2024

I'm curious how the workflow of this would work if we stored it in the plrust.artifacts table.
When the user calls create function with PL/Rust, then:

  1. We set up the cargo crate the same way and get a .so
  2. Store the .so as a bytea into plrust.artifacts

Then when they execute the function, would we need to:

  1. Query plrust.artifacts to get the data
  2. Write the data as .so onto disk and then call it?
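
The two halves of that workflow can be sketched as follows. This is a rough illustration, not PL/Rust code: an in-memory SQLite table stands in for plrust.artifacts, and the column names, the `store_artifact` / `materialize_artifact` helpers, and the file naming are all assumptions.

```python
import os
import sqlite3
import tempfile


def store_artifact(conn, fn_oid, so_bytes):
    """CREATE FUNCTION side: persist the compiled bytes keyed by function OID."""
    conn.execute(
        "INSERT OR REPLACE INTO artifacts (fn_oid, so) VALUES (?, ?)",
        (fn_oid, so_bytes),
    )


def materialize_artifact(conn, fn_oid, directory):
    """Execution side: fetch the bytes and write them out as a .so file."""
    row = conn.execute(
        "SELECT so FROM artifacts WHERE fn_oid = ?", (fn_oid,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no artifact for function {fn_oid}")
    path = os.path.join(directory, f"{fn_oid}.so")
    with open(path, "wb") as f:
        f.write(row[0])
    return path  # a real handler would load the shared object from this path


# Stand-in for the plrust.artifacts table (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (fn_oid INTEGER PRIMARY KEY, so BLOB NOT NULL)")

store_artifact(conn, 16384, b"\x7fELF-placeholder")
with tempfile.TemporaryDirectory() as tmp:
    print(materialize_artifact(conn, 16384, tmp))
```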

@eeeebbbbrrrr mentioned one of the benefits would be to avoid recompiling the code on replicas of the same architecture, which makes sense.

Is there some mechanism to optimize the look-up/write-to-disk away? I guess you could do the look-up for each PL/Rust function on a per-transaction basis...
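
One possible shape for that optimization is a per-backend cache that only repeats the fetch-and-write step when the stored version changes. The sketch below assumes some cheap "version" value is available per function (for example a row's xmin or a content hash); that value, the `ArtifactCache` class, and the `fake_fetch_and_write` helper are all hypothetical.

```python
class ArtifactCache:
    """Per-backend cache mapping fn_oid -> (version, path of the written file)."""

    def __init__(self, fetch_and_write):
        # fetch_and_write(fn_oid, version) does the expensive look-up and
        # write-to-disk, and returns the resulting path.
        self._fetch_and_write = fetch_and_write
        self._entries = {}
        self.misses = 0

    def path_for(self, fn_oid, version):
        cached = self._entries.get(fn_oid)
        if cached is not None and cached[0] == version:
            return cached[1]  # same version as last time: skip the write
        self.misses += 1
        path = self._fetch_and_write(fn_oid, version)
        self._entries[fn_oid] = (version, path)
        return path


def fake_fetch_and_write(fn_oid, version):
    # Placeholder for "query the table, write the .so, return its path".
    return f"/tmp/plrust_{fn_oid}_{version}.so"


cache = ArtifactCache(fake_fetch_and_write)
first = cache.path_for(16384, version=99)
second = cache.path_for(16384, version=99)   # cache hit
third = cache.path_for(16384, version=150)   # function was replaced: miss
print(cache.misses)  # 2
```

The cheap version check still has to run on each call or each transaction; only the look-up of the bytes and the write are avoided.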

Even with wasm, assuming that's the direction we're moving towards and we store the wasm in pg_proc.prosrc as suggested by Eric, I think you still need to write the .wasm out to disk to be added as a module in order to load/execute it?
We could store it as a .wat instead and avoid the write-to-disk and having to handle race conditions/formatting there but that doesn't seem to perform as nicely.

I did a quick benchmark where I had a PL/Rust function executing a .wasm, vs. a .wat on the filesystem that I converted it to with wasm2wat (https://github.com/WebAssembly/wabt). My PL/Rust function just returned a number, and I got around 0.194 ms with .wasm compared to 0.265 ms with .wat.

cargo pgx stop
cargo pgx start
cargo pgx connect
cat /dev/null > 14.log

set log_duration = 'on';
select return_int(); \watch 0.0001

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | paste -s -d+ - | bc
2376.364

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | wc -l              
 12231

---> 0.194 ms on average with .wasm

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | paste -s -d+ - | bc
4450.141

hsuchen@88665a3712cb ~/.pgx % grep -oE "0\.[0-9]{3}" 14.log | wc -l  
   16796

---> 0.265 ms on average with .wat
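
For reference, the averages follow directly from the totals and sample counts printed above. The snippet below only redoes that arithmetic; it adds no new measurements.

```python
def mean_ms(total_ms, samples):
    """Average duration given the summed durations and the number of samples."""
    return total_ms / samples


wasm_avg = mean_ms(2376.364, 12231)
wat_avg = mean_ms(4450.141, 16796)

print(f".wasm: {wasm_avg:.3f} ms")          # 0.194 ms
print(f".wat:  {wat_avg:.3f} ms")           # 0.265 ms
print(f"ratio: {wat_avg / wasm_avg:.2f}x")  # 1.36x
```

One caveat about the extraction step: the grep pattern `0\.[0-9]{3}` only matches text that starts with "0.", so a duration such as 1.234 is not counted at all and one such as 10.123 contributes only its "0.123" tail.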

Hoverbear commented on August 22, 2024

Noting that I'm not sure writing to disk is required for wasmtime. I think I saw an API that works from a buffer already in memory.

JohnHVancouver commented on August 22, 2024

Do you mean running .wat files directly from memory, like here?
https://docs.wasmtime.dev/wasm-wat.html

Hoverbear commented on August 22, 2024

This one: https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.from_binary

JohnHVancouver commented on August 22, 2024

Bringing this back, I'm curious where folks are thinking about this. Are we still planning on storing it as a bytea?
I'm not sure if you can get the same safety semantics that we're aiming for if you store it in a regular table since ISTM an arbitrary user could insert their own bytea into it.

If we're still planning on storing artifacts on disk and noting the transaction semantics (and CREATE DATABASE w/ template1) as a gap for PL/Rust, we should at the very least include the database OID in the function name, since function OIDs aren't unique across databases and we can have collisions today.
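
A minimal sketch of the collision and the proposed fix, with made-up OIDs. The `plrust_fn_` prefix and both helper names are illustrative assumptions, not PL/Rust's actual naming scheme.

```python
def name_by_fn_oid(fn_oid):
    """Artifact name derived from the function OID alone."""
    return f"plrust_fn_{fn_oid}"


def name_by_db_and_fn_oid(db_oid, fn_oid):
    """Artifact name that also includes the database OID."""
    return f"plrust_fn_{db_oid}_{fn_oid}"


# Two databases on the same cluster that each contain a function with
# OID 16400, e.g. because one database was created from the other as a template.
db_a, db_b, fn_oid = 16384, 16390, 16400

print(name_by_fn_oid(fn_oid) == name_by_fn_oid(fn_oid))  # True: collision
print(name_by_db_and_fn_oid(db_a, fn_oid)
      == name_by_db_and_fn_oid(db_b, fn_oid))            # False: distinct
```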

When we say system_catalog, are we referencing an actual table in pg_catalog? Or just a table that plrust attempts to manage?

workingjubilee commented on August 22, 2024

> Bringing this back, I'm curious where folks are thinking about this. Are we still planning on storing it as a bytea?
> I'm not sure if you can get the same safety semantics that we're aiming for if you store it in a regular table since ISTM an arbitrary user could insert their own bytea into it.

Isn't it already the case that a user can overwrite the function using CREATE OR REPLACE if they have USAGE privileges for the language, the types, and ownership of that function? Surely a much lower bar, also. It seems unlikely to me that Postgres does not offer the ability to correctly specify the ACLs required to allow the language handler to manage its own semi-secure tables which exactly match the same policy, or somewhat stricter, but if I am wrong then I suppose I am wrong.

JohnHVancouver commented on August 22, 2024

You can replace the function, but if you do it through CREATE OR REPLACE it would still go through the PL/Rust handler and be compiled as safe via postgrestd.

imo from a "trusted" language perspective, the PL/Rust handler shouldn't execute unsafe code. As a user I could update an existing binary blob in the table with something else that is unsafe and presumably it would still execute? (ofc someone could always change what's on disk and you have to draw the line somewhere)

workingjubilee commented on August 22, 2024

Hmm. I suppose that's true.

It's somewhat annoying: as I learn more about the function interface, I become more confident that Postgres could much more easily add the functionality necessary to understand "this function must be compiled" directly, but it's harder to back-hack it in.

eeeebbbbrrrr commented on August 22, 2024

It's been a while, but we've decided to store artifacts for each target architecture in the pg_proc.prosrc column in a custom JSON format.
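
The thread does not spell out that JSON format, so the following is only a guess at the general idea: one text value holding a compiled artifact per target, from which each server picks the entry for its own architecture. The key names, the base64 encoding, and both helper functions are assumptions made for illustration.

```python
import base64
import json


def pack_artifacts(artifacts):
    """Encode {target: compiled bytes} as a JSON string suitable for a text column."""
    return json.dumps(
        {
            target: base64.b64encode(blob).decode("ascii")
            for target, blob in artifacts.items()
        },
        sort_keys=True,
    )


def unpack_artifact(prosrc, target):
    """Return the compiled bytes for one target, or raise if none was stored."""
    encoded = json.loads(prosrc).get(target)
    if encoded is None:
        raise LookupError(f"no artifact compiled for {target}")
    return base64.b64decode(encoded)


prosrc = pack_artifacts({
    "x86_64": b"placeholder-x86_64-artifact",
    "aarch64": b"placeholder-aarch64-artifact",
})
print(unpack_artifact(prosrc, "aarch64"))
```

Because the artifacts travel with the function's own catalog row in this scheme, a replica of a matching architecture can reuse them without recompiling, which was one of the benefits raised earlier in the thread.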
