
Comments (6)

javierguerragiraldez commented on May 26, 2024

FFI vs. lj_ircall.h: They're mostly the same thing: a higher-level language for generating an IR call to something else. After that, the optimizers don't care where the call came from. There are three parts to the problem:

  • Arguments. The FFI uses the runtime ctype (as observed on the trace) to emit transformations and CARG instructions; lj_ircall.h declares just the number of arguments, and the user (a recording function in C) has to supply TRefs already emitted in the appropriate types.
  • Call variety (normal, or load/store/both barrier). The FFI assumes XS; lj_ircall.h specifies the exact variety. This gives the optimizer greater code mobility (it can even eliminate N calls), but I think it's a heavy responsibility for most non-hardcore developers. The vast majority of external functions would always be XS anyway. There's also a flag to use the FASTCALL convention instead of 'normal' C.
  • Return value. Again, the FFI emits extra transformations as specified by the ctype, immediately followed by those required by any assignment (hopefully those transformations can be fused/folded in many cases). lj_ircall.h is limited to the IRType declared for the call itself.
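To make the three parts concrete, here is a minimal sketch of a table-driven call declaration in the general style described above, written as a C X-macro. It is a toy model only: the function names, the `CallKind`/`RetType` enums, and the row layout are all assumptions made for illustration, not the contents or exact format of lj_ircall.h.

```c
#include <assert.h>

/* Toy model only: every name below is hypothetical, not LuaJIT's real table. */

/* Call variety: what the optimizer may assume about the call. */
typedef enum {
    CALL_N,   /* normal: no memory effects to order against */
    CALL_L,   /* behaves like a load */
    CALL_S    /* behaves like a store, i.e. a barrier */
} CallKind;

typedef enum { RET_INT, RET_NUM, RET_PTR } RetType;

/* One row per callable function: name, argument count, variety, return type. */
#define CALLDEF(_) \
    _(toy_str_cmp, 2, CALL_N, RET_INT) \
    _(toy_tab_len, 1, CALL_L, RET_INT) \
    _(toy_tab_set, 3, CALL_S, RET_PTR)

/* First expansion: an enum of call ids. */
#define CALLENUM(name, nargs, kind, ret) CALLID_##name,
typedef enum { CALLDEF(CALLENUM) CALLID_COUNT } CallId;

/* Second expansion: a descriptor array indexed by those ids. */
typedef struct {
    const char *name;
    int nargs;
    CallKind kind;
    RetType ret;
} CallInfo;

#define CALLINFO(name, nargs, kind, ret) { #name, nargs, kind, ret },
static const CallInfo call_info[] = { CALLDEF(CALLINFO) };

/* In this toy, only a CALL_N call is a candidate for elimination. */
static int call_may_be_eliminated(CallId id)
{
    assert((int)id < (int)CALLID_COUNT);
    return call_info[id].kind == CALL_N;
}
```

The point of the sketch is that the declaration carries only a count, a variety, and a return type; argument types are not in the table at all, which is why the caller has to hand over operands that are already correctly typed. An FFI-style declaration would instead carry full argument types and, as noted above, always use the most conservative variety.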

FFI vs Lua C API: these are far more different (after all, they're two different designs by different people with different goals). L isn't exposed to the FFI because there's no guarantee that it's in a consistent state. I guess at the very least there would have to be a snapshot+replay just before FFI-calling any lua_xxx() function. That would probably result in something very similar to a thread stitch; at best it would avoid starting the interpreter, instead falling into an execution environment similar to external C, but maybe running some specific mcode that just does the FFI call to the exposed symbol.

from raptorjit.

lukego commented on May 26, 2024

A relevant example is VMProfile in LuaJIT/LuaJIT#290. The code is currently an awkward combination of LJLIB_CF for accessing the lua_State and FFI for cutting through boilerplate and passing an FFI pointer from Lua. How should that code be written in LuaJIT? How should it be written in RaptorJIT?


javierguerragiraldez commented on May 26, 2024

AIUI:

  • LJLIB_xxx (_CF, _ASM, _LUA) implement the standard library.
    • _CF are implemented in C and are callable from Lua via the traditional API.
    • _LUA are compiled into bytecode stored as C chunks of bytes in the executable. The interpreter can execute them directly, and the tracer can trace them normally and produce IR and mcode as expected. There are some limitations (I think they can't use most other built-in functions).
    • _ASM are just stubs for the build scripts; the functions themselves are implemented in assembler. Not sure about the calling conventions.
  • LUA_API functions implement the classic Lua C API. They are called from C code.
  • LJ_FASTCALL doesn't mark a "type" of function; it's a hint to the compiler to use more registers and less stack, so the call has less overhead. It's only useful on x86, and I don't know what the tradeoffs are. It's used on some functions called "directly" from compiled code, and also for some "normal" C code in time-sensitive places.
  • IR_CALL_{N,L,S} are 'internal functions' called from the IR and compiled into a very tight mcode call. The call format must be predefined in lj_ircall.h for each internal function.
  • The FFI generates CALL_XS IR opcodes using the definition from the cdef instead of lj_ircall.h. There is no way to declare the equivalent of the N/L/S optimization options, so these calls are always full barriers.


lukego commented on May 26, 2024

@javierguerragiraldez Thank you very much for the detailed explanation! I really appreciate being able to discuss these things in a social context rather than only contemplating the code in silence :). Meditating over source code until reaching enlightenment is great for personal development, but for community development not so much :-)

Anyway!

I have removed the LJ_FASTCALL macro with #19. Thanks for pointing out that this was an i386 feature!

Question: Could the FFI replace some or all of these other mechanisms?

For example, if the FFI were extended with load/store/xstore barrier declarations could we rewrite lj_ircall.h as an ffi.cdef()? (Would this be a win?)

For another example, if the FFI were extended to allow access to lua_State then could it replace the classic Lua C API for some use cases e.g. VMProfile? Maybe we could just store the current lua_State in a thread-local variable and provide a way to access that from FFI routines? (Would there be other implications e.g. semantic differences between FFI and C API optimization barriers?)
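The thread-local idea can be sketched in plain C as follows. This is a minimal illustration under stated assumptions: `vm_state` is a hypothetical stand-in for lua_State, and `vm_set_current_state`, `vm_clear_current_state`, `vm_get_current_state`, and `profiler_owner_id` are invented names, not functions that exist in LuaJIT or RaptorJIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lua_State; the real structure is not modelled. */
typedef struct vm_state { int id; } vm_state;

/* One slot per OS thread; NULL when no VM is active on this thread. */
static _Thread_local vm_state *current_state = NULL;

/* Hypothetical hook: the VM would call this before running code that may
 * call out through the FFI. */
void vm_set_current_state(vm_state *L)
{
    assert(L != NULL);
    current_state = L;
}

/* Hypothetical hook: the VM would call this when it stops running. */
void vm_clear_current_state(void)
{
    current_state = NULL;
}

/* Plain C symbol that an FFI declaration could bind to. */
vm_state *vm_get_current_state(void)
{
    return current_state;
}

/* Example FFI-callable routine: it needs the state but takes no state
 * argument, so it fetches it from the thread-local slot. */
int profiler_owner_id(void)
{
    vm_state *L = vm_get_current_state();
    return L ? L->id : -1;
}
```

Note that this only makes the state reachable. It does nothing about the consistency concern raised earlier in the thread: a routine that fetches the pointer while compiled code is running still has no guarantee that the state it points to is up to date.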

Also - really another topic - but I wonder whether simple extensions to the FFI could provide most of the benefits of intrinsics as in LuaJIT/LuaJIT#116. I would quite like intrinsics for executing simple instructions like RDTSCP and PREFETCHNTA, and the question on my mind is whether the FFI is good enough for this use case in practice and, if not, what the problem is: the branch to a subroutine? The stack/register operations for the calling convention? The IR optimization barrier? The appropriate solution will depend on which of these are actually a practical problem. But I digress...

I am looking at the load/store barrier code in lj_opt_mem.c for the first time now. This code looks really nice actually :-) I like the way the IR instruction chains make it so easy to reference backwards in the code to find barriers and related loads/stores. Mike really is a smart fella. Anyway...

Looks to me like:

  • The optimizer treats table vs closure vs FFI loads and stores as independent. That is, it assumes that one kind of operation will never access the same memory as another kind of operation.
  • Barriers are also treated as independent: loads and stores to tables can't be forwarded across a TBAR, those to closures can't be forwarded across an OBAR, and those to FFI pointers can't be forwarded across an XBAR.
    So if we extend the FFI and apply it to more use cases, the simplest approach might be to do that in the frontend, keep the existing IR instructions as they are, and be careful to respect their invariants.
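The two observations above can be illustrated with a toy model of store-to-load forwarding over a linear instruction array. This is a sketch of the idea only, not the code in lj_opt_mem.c: the `Ins` layout, the `MemClass` enum, and `forward_load` are hypothetical, and addresses are exact integers, so two different addresses never alias.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: three independent memory classes, one barrier per class. */
typedef enum { MEM_TABLE, MEM_CLOSURE, MEM_FFI } MemClass;
typedef enum { OP_LOAD, OP_STORE, OP_BARRIER } OpKind;

typedef struct {
    OpKind op;
    MemClass cls;   /* class the load, store, or barrier belongs to */
    int addr;       /* abstract address, unused for barriers */
    int value;      /* stored value, meaningful for OP_STORE only */
} Ins;

/*
 * Scan backwards from the load at index `at` for a store it can be
 * forwarded from. Returns the index of that store, or -1 if there is none.
 */
int forward_load(const Ins *ir, int at)
{
    assert(ir[at].op == OP_LOAD);
    for (int i = at - 1; i >= 0; i--) {
        if (ir[i].cls != ir[at].cls)
            continue;            /* other classes are assumed not to alias */
        if (ir[i].op == OP_BARRIER)
            return -1;           /* a same-class barrier blocks forwarding */
        if (ir[i].op == OP_STORE && ir[i].addr == ir[at].addr)
            return i;            /* matching store: forward its value */
    }
    return -1;
}
```

With a table store, then an FFI-class barrier, then a table load of the same address, the load is forwarded from the store because the barrier belongs to a different class. Insert a table-class barrier between them and forwarding stops. That independence is exactly the invariant an extended FFI would have to respect.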

This would seem to imply that an extended FFI with access to lua_State would need to:

  • Declare which barriers are needed for each function (is it allowed to update tables? closures? FFI memory?)
  • Be careful about returning pointers to Lua VM memory via the FFI. This could lead to loads and stores to tables or closures being aliased with FFI memory access that the optimizer won't account for.
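The first requirement could look something like the sketch below, where each declared function carries a set of effect flags and the frontend derives the barriers from them. The flag names, the `FuncDecl` struct, and `barriers_for_call` are hypothetical; no such declaration syntax exists in the FFI today, and an undeclared function is treated as touching everything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical effect flags a function declaration could carry. */
enum {
    FX_TABLE   = 1 << 0,   /* may update tables */
    FX_CLOSURE = 1 << 1,   /* may update closures */
    FX_FFI     = 1 << 2,   /* may update FFI memory */
    FX_ALL     = FX_TABLE | FX_CLOSURE | FX_FFI
};

typedef struct {
    const char *name;
    unsigned effects;      /* bitwise OR of FX_* flags */
} FuncDecl;

/* Barriers needed after calling `fn`; NULL means "nothing declared",
 * so every barrier is required. */
unsigned barriers_for_call(const FuncDecl *fn)
{
    return fn ? fn->effects : (unsigned)FX_ALL;
}

/* Does a call to `fn` block forwarding for the given memory class? */
int call_blocks_class(const FuncDecl *fn, unsigned class_flag)
{
    assert(class_flag == FX_TABLE || class_flag == FX_CLOSURE ||
           class_flag == FX_FFI);
    return (barriers_for_call(fn) & class_flag) != 0;
}
```

The second requirement is not captured here: flags describe what a function writes, but they cannot express that a returned pointer aliases table or closure memory.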

End braindump!


javierguerragiraldez commented on May 26, 2024

FFI for intrinsics: I think it could be done similarly to the gcc (and others?) syntax: it looks like a function but is handled by a completely different code emitter. Probably the easiest way would be to add an IR instruction parameterized like FPMATH that just holds the mcode snippet as arguments (maybe in CARG instructions). I have no idea how best to specify barrier types and dependencies to allow for the best optimizability.


lukego commented on May 26, 2024

@javierguerragiraldez Thank you for the insights!

What do you think about the lj_vmprofile.c implementation style?

This has felt wrong to me: on the one hand I am using the Lua C API to start and stop the profiler, but on the other hand I am using the FFI to set up the memory where profiler data will be stored. Shouldn't I be doing this all the same way - either with the Lua C API or with the FFI?

I start to wonder if the answer is no, and if the code is actually okay. The start/stop use the Lua C API because they need to access the Lua state (actually the global_State) from the VM. The memory access uses FFI because it can operate directly on a cached value of the global_State. Could be that this is okay?

However it's possible that there is a neat solution doing everything with the C API. The immediate challenge is that I am not sure how to pass an FFI pointer via the Lua C API...?

