Comments (6)
FFI vs. lj_ircall.h: They're mostly the same thing: a higher level language to generate an IR call to something else. After that, the optimizers don't care where that came from. There are three parts to the problem:
- arguments. The FFI uses the runtime ctype (as observed on the trace) to emit transformations and CARG instructions; lj_ircall.h declares just the number of arguments, and the user (a recording function in C) has to specify the TRefs already emitted in the appropriate types.
- call variety (normal, load/store/both barrier). The FFI assumes XS, lj_ircall.h specifies the exact variety. This allows the optimizer greater code mobility (or even to eliminate N calls), but I think it's a heavy responsibility for most non-hardcore developers. The vast majority of external functions would always be XS anyway. There's also a flag to use the FASTCALL convention instead of 'normal' C.
- return argument. Again, the FFI emits extra transformations as specified by the ctype, and immediately by any assignment (hopefully those transformations could be fused/folded in many cases). lj_ircall.h limits to the IRType declaration of the call itself.
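To make the contrast concrete, here is a minimal toy sketch in C of the two styles of declaration. It is not LuaJIT's actual lj_ircall.h: the macro TOY_CALLDEF, the CallKind enum, and every toy_ function name are hypothetical, and the meaning attached to each kind simply follows the description above. The table declares arity and call variety per function, while the FFI-style path takes the arity from a declaration and always assumes XS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Call varieties, following the N/L/S/XS split described above.
   These identifiers are illustrative, not LuaJIT's real ones. */
typedef enum {
  CALL_N,   /* "normal" call: the optimizer may move or eliminate it */
  CALL_L,   /* load-like call */
  CALL_S,   /* store-like call */
  CALL_XS   /* what an FFI call is assumed to be: a full barrier */
} CallKind;

typedef struct {
  const char *name;
  int nargs;      /* only the argument count is declared, not the types */
  CallKind kind;
} CallInfo;

/* Hypothetical table of internal functions: name, arity, variety. */
#define TOY_CALLDEF(_) \
  _(toy_str_cmp, 2, CALL_N) \
  _(toy_tab_len, 1, CALL_L) \
  _(toy_tab_set, 3, CALL_S)

#define TOY_ENTRY(name, nargs, kind) { #name, nargs, kind },
static const CallInfo toy_calls[] = { TOY_CALLDEF(TOY_ENTRY) };
#undef TOY_ENTRY

/* Look up a predeclared internal function; returns NULL if unknown. */
static const CallInfo *toy_find(const char *name)
{
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(toy_calls) / sizeof(toy_calls[0]); i++)
    if (strcmp(toy_calls[i].name, name) == 0)
      return &toy_calls[i];
  return NULL;
}

/* FFI-style call: arity comes from the declaration, variety is always XS. */
static CallInfo toy_ffi_call(const char *name, int nargs)
{
  CallInfo ci = { name, nargs, CALL_XS };
  return ci;
}

/* In this toy model only N calls are candidates for elimination. */
static int toy_can_eliminate(const CallInfo *ci)
{
  return ci->kind == CALL_N;
}
```

With this model, toy_can_eliminate(toy_find("toy_str_cmp")) is true, while any CallInfo produced by toy_ffi_call() is never eligible, which is the cost of the FFI always assuming XS.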
FFI vs Lua C API: these are far more different (after all they're two different designs by different people with different goals). The reason L isn't exposed to the FFI is that there's no guarantee that it's in a consistent state. I guess at the very least there would have to be a snapshot+replay just before FFI-calling any lua_xxx() function. It would probably result in something very similar to a thread stitch; at best it would avoid starting the interpreter, instead falling into an execution environment similar to external C, but maybe running some specific mcode that just does the FFI call to the exposed symbol.
from raptorjit.
Relevant example is VMProfile on LuaJIT/LuaJIT#290. The code is currently an awkward combination of LJLIB_CF for accessing the lua_State and FFI for cutting through boilerplate and passing an FFI pointer from Lua. How should that code be written in LuaJIT? How should it be written in RaptorJIT?
from raptorjit.
AIUI:
- LJLIB_xxx (_CF, _ASM, _LUA) implement the standard library.
  - _CF are implemented in C and are callable from Lua via the traditional API.
  - _LUA are compiled into bytecode stored as C chunks of bytes in the executable. The interpreter can execute them directly, and the tracer can trace them normally and produce IR and mcode as expected. Some limitations (I think they can't use most other built-in functions).
  - _ASM are just stubs for the build scripts, and implemented in assembler. Not sure about the calling conventions.
- LUA_API implements the classic Lua C API. Called from C code.
- LJ_FASTCALL doesn't mark a "type" of function, it's a hint to the compiler to use more registers and less stack, so it has less overhead. Only useful for x86 and I don't know what the tradeoffs are. It's used on some functions called "directly" from compiled code, and also for some "normal" C code in time-sensitive places.
- IR_CALL_{N,L,S} are 'internal functions' called from the IR, compiled into a very tight mcode call. The call format must be predefined in lj_ircall.h for each internal function.
- The FFI generates CALL_XS IR opcodes using the definition from the cdef instead of lj_ircall.h. There is no way to declare the equivalent of the N/L/S optimization options, so they're always full barriers.
from raptorjit.
@javierguerragiraldez Thank you very much for the detailed explanation! I really appreciate being able to discuss these things in a social context rather than only contemplating the code in silence :). Meditating over source code until reaching enlightenment is great for personal development, but for community development not so much :-)
Anyway!
I have removed the LJ_FASTCALL macro with #19. Thanks for pointing out that this was an i386 feature!
Question: Could the FFI replace some or all of these other mechanisms?
For example, if the FFI were extended with load/store/xstore barrier declarations could we rewrite lj_ircall.h as an ffi.cdef()? (Would this be a win?)
For another example, if the FFI were extended to allow access to lua_State then could it replace the classic Lua C API for some use cases e.g. VMProfile? Maybe we could just store the current lua_State in a thread-local variable and provide a way to access that from FFI routines? (Would there be other implications e.g. semantic differences between FFI and C API optimization barriers?)
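The thread-local idea could be sketched roughly as follows. This is only an illustration under stated assumptions: struct toy_State is a placeholder for the real lua_State, and the toy_ accessor names are hypothetical, not part of LuaJIT or RaptorJIT.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder standing in for the VM state; the real lua_State is
   defined by the VM and looks nothing like this. */
struct toy_State { int id; };
typedef struct toy_State toy_State;

/* One slot per OS thread (C11 thread-local storage). */
static _Thread_local toy_State *toy_current_state = NULL;

/* VM side: publish the state before running code that may call out
   through the FFI. Returns the previous value so callers can restore it. */
toy_State *toy_set_current_state(toy_State *L)
{
  toy_State *prev = toy_current_state;
  toy_current_state = L;
  return prev;
}

/* FFI side: a plain C function that could be declared in a cdef. */
toy_State *toy_get_current_state(void)
{
  return toy_current_state;
}

/* Variant for callers that cannot cope with a missing state. */
toy_State *toy_require_state(void)
{
  assert(toy_current_state != NULL);
  return toy_current_state;
}
```

This only solves the plumbing. It does nothing about the concern raised earlier in the thread that the state may not be consistent while compiled code is running, so a real design would still need some form of snapshot or barrier before the pointer is used.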
Also - really another topic - but I wonder whether simple extensions to the FFI could provide most of the benefits of intrinsics as in LuaJIT/LuaJIT#116. I would quite like intrinsics for executing simple instructions like RDTSCP and PREFETCHNTA, and the question on my mind is whether the FFI is good enough for this use case in practice and, if not, what the problem is (the branch to a subroutine? the stack/register operations for the calling convention? the IR optimization barrier?). The appropriate solution will depend on which one(s) are actually a practical problem. But I digress...
I am looking at the load/store barrier code in lj_opt_mem.c for the first time now. This code looks really nice actually :-) I like the way the IR instruction chains make it so easy to reference backwards in the code to find barriers and related loads/stores. Mike really is a smart fella. Anyway...
Looks to me like:
- Optimizer is treating table vs closure vs FFI loads and stores as independent. That is, it assumes that one kind of operation will never access the same memory as another kind of operation.
- Barriers are also being treated as independent: loads and stores to tables can't be forwarded across a TBAR, to closures can't be forwarded across an OBAR, and to FFI pointers can't be forwarded across an XBAR.
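As a toy model of the two observations above (not the actual lj_opt_mem.c logic), the following C sketch scans backwards from a load, skips every instruction of a different memory kind, and gives up when it meets a barrier of the same kind. The Ins struct, the enums, and toy_forward_load are all hypothetical, and the model assumes that two different addr values never alias.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MEM_TABLE, MEM_CLOSURE, MEM_FFI } MemKind;
typedef enum { OP_LOAD, OP_STORE, OP_BARRIER } Op;

typedef struct {
  Op op;
  MemKind kind;  /* which family of memory the instruction touches */
  int addr;      /* abstract location id; unused for barriers */
  int value;     /* stored value; unused for loads and barriers */
} Ins;

/* Try to forward an earlier store to the load at index `at`.
   Returns 1 and writes the stored value to *out on success, 0 otherwise. */
static int toy_forward_load(const Ins *ir, size_t at, int *out)
{
  assert(ir[at].op == OP_LOAD);
  for (size_t i = at; i-- > 0; ) {
    const Ins *p = &ir[i];
    if (p->kind != ir[at].kind)
      continue;            /* other kinds are assumed never to alias */
    if (p->op == OP_BARRIER)
      return 0;            /* same-kind barrier blocks forwarding */
    if (p->op == OP_STORE && p->addr == ir[at].addr) {
      *out = p->value;     /* nearest matching store wins */
      return 1;
    }
  }
  return 0;                /* no earlier store to this location */
}
```

In this model a table load is forwarded across an FFI barrier but not across a table barrier, which is exactly the independence that an extended FFI would have to be careful not to break.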
So if we would extend the FFI and apply it to more use cases the simplest approach might be to do that on the frontend and keep the existing IR instructions as they are and be careful to respect their invariants.
This would seem to imply that an extended FFI with access to lua_State would need to:
- Declare which barriers are needed for each function (is it allowed to update tables? closures? FFI memory?)
- Be careful about returning pointers to Lua VM memory via the FFI. This could lead to loads and stores to tables or closures being aliased with FFI memory access that the optimizer won't account for.
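The first requirement might look something like the sketch below: each declared function carries a mask of the memory kinds it may update, and anything undeclared is treated as touching everything. The ToyFuncDecl struct, the BAR_ flags, and the function names are hypothetical illustrations, not an existing FFI feature.

```c
#include <assert.h>

/* Which kinds of memory a function is allowed to update. */
enum {
  BAR_NONE    = 0,
  BAR_TABLE   = 1 << 0,
  BAR_CLOSURE = 1 << 1,
  BAR_FFI     = 1 << 2,
  BAR_ALL     = BAR_TABLE | BAR_CLOSURE | BAR_FFI
};

typedef struct {
  const char *name;
  unsigned writes;   /* bitwise OR of BAR_* flags */
} ToyFuncDecl;

/* Does a call to `fn` prevent forwarding loads/stores of the given kind? */
static int toy_call_blocks(const ToyFuncDecl *fn, unsigned kind)
{
  assert(kind == BAR_TABLE || kind == BAR_CLOSURE || kind == BAR_FFI);
  return (fn->writes & kind) != 0;
}

/* Without a declaration the only safe assumption is a full barrier. */
static ToyFuncDecl toy_undeclared(const char *name)
{
  ToyFuncDecl d = { name, BAR_ALL };
  return d;
}
```

A function declared with BAR_TABLE alone would block table forwarding but leave FFI memory forwarding intact. The second requirement, pointers into VM memory escaping through the FFI, is not addressed by a mask like this.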
End braindump!
from raptorjit.
FFI for intrinsics: I think it could be done similar to the gcc (and others?) syntax: it looks like a function but is managed by a completely different code emitter. Probably the easiest way would be to add an IR instruction parameterized like FPMATH to just hold the mcode snippet as arguments (maybe in CARG instructions). I have no idea how best to specify barrier types and dependencies to allow for best optimizability.
from raptorjit.
@javierguerragiraldez Thank you for the insights!
What do you think about the lj_vmprofile.c implementation style?
This has felt wrong to me: On the one hand I am using the Lua C API to start and stop the profiler but on the other hand I am using the FFI to set up the memory where profiler data will be stored. Shouldn't I be doing this all the same way - either with the Lua C API or with the FFI?
I start to wonder if the answer is no, and if the code is actually okay. The start/stop use the Lua C API because they need to access the Lua state (actually the global_State) from the VM. The memory access uses FFI because it can operate directly on a cached value of the global_State. Could be that this is okay?
However it's possible that there is a neat solution doing everything with the C API. The immediate challenge is that I am not sure how to pass an FFI pointer via the Lua C API...?
from raptorjit.