Comments (4)
First idea:
Suppose that we extended lua_State with a parallel "immediate cdata stack" that contains unboxed immutable cdata values, i.e. FFI pointers or 64-bit integers. This stack would have exactly the same size and structure as the Lua stack except that most of the values would be empty.
Then when a trace exits and has a sunk CNEWI instruction referenced by a snapshot, we don't have to transfer that value onto the heap. Instead we:
- Write the actual value onto the immediate cdata stack for unboxed values.
- Write a special TValue onto the main Lua stack slot to indicate that the value needs to be loaded from the immediate cdata stack.
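To make the two steps concrete, here is a toy model in Python. It is only a sketch: `ToyState`, `UNBOXED` and `restore_sunk_cdata` are made-up names for illustration, not anything that exists in LuaJIT or RaptorJIT, and the real stacks are of course arrays of TValues in C.

```python
# Hypothetical marker standing in for the "special TValue" on the main stack.
UNBOXED = object()


class ToyState:
    """Toy stand-in for lua_State with a parallel immediate cdata stack."""

    def __init__(self, size):
        self.stack = [None] * size      # main Lua stack (boxed values or markers)
        self.imm_stack = [None] * size  # parallel stack, mostly empty


def restore_sunk_cdata(state, slot, raw_value):
    """On trace exit, park a sunk 64-bit value without allocating a box."""
    # Step 1: the raw value goes onto the immediate cdata stack.
    state.imm_stack[slot] = raw_value & 0xFFFFFFFFFFFFFFFF
    # Step 2: the main stack slot only records that the value lives elsewhere.
    state.stack[slot] = UNBOXED


state = ToyState(size=8)
restore_sunk_cdata(state, 2, 0x7F0000001000)
```

After the call, slot 2 of the main stack holds only the marker and the payload sits in the same slot of the parallel stack; no other slot is touched.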
This becomes complex if we have to always maintain both of these stacks in parallel and make sure that every stack access is prepared for a potential indirection onto the immediate cdata stack. However, suppose that we could restrict the circumstances when the immediate cdata stack is actually maintained and used.
Suppose that the immediate cdata stack is only valid when branching to a root trace from another trace. That is, if your trace is going to exit and link with a root trace, then you transfer any sunk CNEWI values onto the immediate cdata stack, and then when a trace loads a cdata value from the stack using an SLOAD IR instruction it always checks for the special value saying that the value is unboxed on the immediate cdata stack.
Then we have solved the problem quite neatly?
- Sunk allocations will always stay sunk across trace boundaries.
- Only two pieces of code need to be modified: snapshot restore, to separately handle transferring sunk values to other traces versus to the interpreter, and the SLOAD IR instruction, to accept values either as boxed TValues or as unboxed on the immediate cdata stack.
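The SLOAD side of this could look roughly like the following Python toy. Everything here is hypothetical (`BoxedCdata`, `UNBOXED`, `sload_cdata`); it also glosses over how the ctype id of an unboxed value would be known, which the real check would need.

```python
# Hypothetical marker standing in for the "special TValue".
UNBOXED = object()


class BoxedCdata:
    """Toy heap-allocated cdata box: a ctype id plus a 64-bit payload."""

    def __init__(self, ctypeid, value):
        self.ctypeid = ctypeid
        self.value = value


def sload_cdata(stack, imm_stack, slot):
    """Load a cdata payload from a slot that may be boxed or unboxed."""
    tv = stack[slot]
    if tv is UNBOXED:
        # Indirection: the payload is parked on the immediate cdata stack.
        return imm_stack[slot]
    if isinstance(tv, BoxedCdata):
        # Ordinary case: unbox the heap object.
        return tv.value
    # Anything else would be a failed type guard in a real trace.
    raise TypeError("guard failed: slot does not hold cdata")


stack = [BoxedCdata(96, 0x1000), UNBOXED, 3.5]
imm_stack = [None, 0x2000, None]
boxed_payload = sload_cdata(stack, imm_stack, 0)
unboxed_payload = sload_cdata(stack, imm_stack, 1)
```

Both loads hand the trace the same kind of raw payload, so the code after the load would not need to care which representation the slot used.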
What do I miss?
from raptorjit.
> What do I miss?
You enter a root trace with a "dirty" immediate cdata stack, and the trace doesn't touch the "special TValue" (hence it is not added to a snapshot). Now you are in a situation when a snapshot alone isn't enough to properly restore Lua stack.
I am also concerned about the overhead (essentially doubling the Lua stack footprint of a function) and limited applicability (only handles 64bit things).
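The first concern (a stale marker surviving an exit) can be shown with a small Python toy. This is only an illustration under made-up names (`UNBOXED`, `exit_to_interpreter`, `slots_interpreter_cannot_read`); a snapshot is modelled as a plain slot-to-value mapping.

```python
# Hypothetical marker standing in for the "special TValue".
UNBOXED = object()


def exit_to_interpreter(stack, snapshot):
    """Restore only the slots that the snapshot names."""
    for slot, value in snapshot.items():
        stack[slot] = value


def slots_interpreter_cannot_read(stack):
    """Slots that still hold the marker after the exit."""
    return [i for i, tv in enumerate(stack) if tv is UNBOXED]


# Slot 2 was left unboxed by the previous trace when it linked to this root trace.
stack = [None, None, UNBOXED, 1, None]
imm_stack = [None, None, 0x1000, None, None]

# The root trace only modified slot 3, so its snapshot only mentions slot 3.
snapshot = {3: 2}

exit_to_interpreter(stack, snapshot)
stale = slots_interpreter_cannot_read(stack)
```

Here `stale` ends up containing slot 2: the snapshot knows nothing about it, so something beyond the snapshot would have to box that value before the interpreter sees the stack.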
from raptorjit.
> Now you are in a situation when a snapshot alone isn't enough to properly restore Lua stack.
True. This approach would require extra snapshot-like bookkeeping.
> I am also concerned about the overhead (essentially doubling the Lua stack footprint of a function) and limited applicability (only handles 64bit things).
On the one hand, it is important to keep overhead low. On the other hand, optimizing Lua code often involves using FFI data structures, and today this can unpredictably cause a ~50x slowdown (#252). So I do want to find an efficient solution, but almost anything would be better than the status quo.
from raptorjit.
@mejedi Thanks for shooting that naïve idea down.
Could we attack this problem during compilation instead of at runtime?
Suppose that we have two linked traces T1->T2 with a CNEWI in T1 that cannot sink because its value escapes into T2 via the last snapshot. This causes an unwanted heap allocation in trace T1.
Further suppose that the value does not escape from T2 into the interpreter or onto the heap, i.e. that it would have been sunk had it been allocated in T2 instead of in T1.
Is there a way that we could "transfer the sinkage" from trace T1 to trace T2?
Then the allocation would sink in T1 and the sunk value would escape into T2, which could then sink it too, and then we wouldn't have an allocation. This seems similar to the way that sunk values are already allowed to escape into side-traces today, with those side-traces being responsible for deciding whether to keep the value sunk or not.
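As a very rough sketch of the decision this implies, here is a Python toy. It is not how the sink pass is implemented; `ValueUse`, `can_sink` and `can_sink_across_link` are hypothetical names, and each trace's treatment of the value is reduced to three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueUse:
    """How one trace treats a particular cdata value (hypothetical summary)."""
    stored_to_heap: bool    # value is written into a table, upvalue, etc.
    in_last_snapshot: bool  # value is live in the trace's final snapshot
    link_keeps_sunk: bool   # the trace it links to can take the value unboxed


def can_sink(use):
    """A value sinks unless it escapes to the heap or through the final link."""
    if use.stored_to_heap:
        return False
    if use.in_last_snapshot and not use.link_keeps_sunk:
        return False
    return True


def can_sink_across_link(t1_use, t2_use):
    """T1's allocation may stay sunk only if T2 would have sunk it itself."""
    t2_ok = can_sink(t2_use)
    t1_linked = ValueUse(t1_use.stored_to_heap, t1_use.in_last_snapshot,
                         link_keeps_sunk=t2_ok)
    return can_sink(t1_linked)


# T1 passes the value to T2 through its last snapshot; T2 only uses it locally.
t1 = ValueUse(stored_to_heap=False, in_last_snapshot=True, link_keeps_sunk=False)
t2_local = ValueUse(stored_to_heap=False, in_last_snapshot=False, link_keeps_sunk=False)
t2_escaping = ValueUse(stored_to_heap=True, in_last_snapshot=False, link_keeps_sunk=False)

sinks_when_t2_local = can_sink_across_link(t1, t2_local)
sinks_when_t2_escapes = can_sink_across_link(t1, t2_escaping)
```

The point of the toy is only that T1's answer depends on T2's: on its own T1 has to allocate, but with T2's summary available at link time the allocation can stay sunk whenever T2 keeps the value local.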
Example
Here is the abbreviated IR code for the hot path in example #252:
---- TRACE 1 start xx.lua:20
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbx int SLOAD #3 CI
0002 rax > cdt SLOAD #2 T
0003 u16 FLOAD 0002 cdata.ctypeid
0004 > int EQ 0003 +96
0005 rbp p64 FLOAD 0002 cdata.ptr
0006 rbp + p64 ADD 0005 +1
0007 {sink}+ cdt CNEWI +96 0006
.... SNAP #1 [ ---- ---- 0007 0001 ---- ---- ---- ]
[[[ Exit to side trace 2 happens here ]]]
---- TRACE 2 start 1/1 xx.lua:20
---- TRACE 2 IR
0001 rbx int SLOAD #3 PI
0002 rbp p64 PVAL #6
0003 [8] cdt CNEWI +96 0002
.... SNAP #0 [ ---- ---- 0003 0001 ---- ---- ---- ]
0004 > nil GCSTEP
0005 rbx int ADD 0001 +1
.... SNAP #1 [ ---- ---- 0003 ]
0006 > int LE 0005 +1000000
0007 xmm7 num CONV 0005 num.int
.... SNAP #2 [ ---- ---- 0003 0007 ---- ---- 0007 ]
---- TRACE 2 stop -> 1
So here we see that:
- Trace 1 has a CNEWI that sinks.
- Trace 2 loads the raw sunk pointer value using PVAL.
- Trace 2 has a duplicate CNEWI to make the pointer value respectable/usable.
- Trace 2 is not able to sink the CNEWI because it is referenced in the last snapshot.
The nice aspect of this is that sunk values can be passed between traces. The restriction is that those values can't just be loaded from the stack using SLOAD and must instead be wrapped in a CNEWI so that the compiler can unsink them if necessary.
So maybe a solution is that root traces would not load immediate cdata values using SLOAD but instead with a special CNEWI that loads its value from the stack and can accept either sunk or unsunk values?
So at the start of trace 1 we would replace
-0002 rax > cdt SLOAD #2 T
+0002 {sink}+ cdt CNEWI +96 #2
... which is a new form of CNEWI in the spirit of PVAL that will receive a sunk value (if available) instead of loading from the stack. (Handwave on the details of this for now.)
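To pin down the intended behaviour a little, here is a Python toy of what such a CNEWI might do at trace entry. All names are hypothetical (`BoxedCdata`, `cnewi_from_slot`, `handed_over`), and the "sunk value handed over by the linking trace" is modelled as a plain mapping from slot to raw value, which is one of the handwaved details.

```python
class BoxedCdata:
    """Toy heap-allocated cdata box: a ctype id plus a 64-bit payload."""

    def __init__(self, ctypeid, value):
        self.ctypeid = ctypeid
        self.value = value


def cnewi_from_slot(stack, slot, ctypeid, handed_over=None):
    """Toy 'CNEWI +ctypeid #slot': yield the raw payload for a cdata slot."""
    if handed_over is not None and slot in handed_over:
        # The linking trace kept the value sunk and passes it in directly.
        return handed_over[slot]
    # Otherwise behave like the SLOAD + ctypeid guard + FLOAD sequence.
    tv = stack[slot]
    if not isinstance(tv, BoxedCdata) or tv.ctypeid != ctypeid:
        raise TypeError("guard failed: unexpected value in slot")
    return tv.value


stack = [None, None, BoxedCdata(96, 0x2000)]

# Entered from the interpreter: the value is boxed on the stack.
from_stack = cnewi_from_slot(stack, 2, 96)

# Entered from a linked trace that kept its CNEWI sunk.
from_link = cnewi_from_slot(stack, 2, 96, handed_over={2: 0x2001})
```

In both cases the rest of the trace sees a raw payload, and because it came through a CNEWI rather than an SLOAD, the compiler would still be free to unsink it at an exit.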
This all sounds tricky, but I am not so sure that it is. The promising aspect is that when we compile the link T1->T2 we already have full snapshot information, etc., about both of the traces.
from raptorjit.