The feature that I am most excited about supporting in raptorjit is unboxed FFI pointe

Idea: CNEWI sinking across trace boundaries about raptorjit HOT 4 OPEN

lukego commented on June 16, 2024

Comments (4)

lukego commented on June 16, 2024

First idea:

Suppose that we extended lua_State with a parallel "immediate cdata stack" that contains unboxed immutable cdata values, i.e. FFI pointers or 64-bit integers. This stack would have exactly the same size and structure as the Lua stack, except that most of its slots would be empty.

Then, when a trace exits and a snapshot references a sunk CNEWI instruction, we don't have to transfer that value onto the heap. Instead we:

- Write the actual value onto the immediate cdata stack for unboxed values.
- Write a special TValue onto the main Lua stack slot to indicate that the value needs to be loaded from the immediate cdata stack.

This becomes complex if we have to always maintain both of these stacks in parallel and make sure that every stack access is prepared for a potential indirection onto the immediate cdata stack. However, suppose that we could restrict the circumstances when the immediate cdata stack is actually maintained and used.

Suppose that the immediate cdata stack is only valid when branching to a root trace from another trace. That is, if your trace is going to exit and link with a root trace, then you transfer any sunk CNEWI values onto the immediate cdata stack. Then, when a trace loads a cdata value from the stack using an SLOAD IR instruction, it always checks for the special value saying that the value is unboxed on the immediate cdata stack.

Then we have solved the problem quite neatly?

- Sunk allocations will always stay sunk across trace boundaries.
- Only two pieces of code need to be modified: snapshot restore, to handle transferring sunk values to other traces separately from transferring them to the interpreter, and the SLOAD IR instruction, to accept values either as boxed TValues or as unboxed values on the immediate cdata stack.
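
To make the two modified code paths concrete, here is a minimal C sketch of the idea. Everything in it is a simplified, hypothetical stand-in: `State`, `Slot`, `BoxedCData`, the `TAG_*` values, `exit_store_unboxed` and `load_cdata` are illustration names and do not match RaptorJIT's real `lua_State` or `TValue` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags; TAG_CDATA_UNBOXED plays the role of the "special TValue". */
enum { TAG_NIL, TAG_INT, TAG_CDATA_BOXED, TAG_CDATA_UNBOXED };

/* Stand-in for a heap-allocated cdata object. */
typedef struct { uint64_t payload; } BoxedCData;

typedef struct {
  int tag;
  union { int64_t i; BoxedCData *box; } u;
} Slot;

#define STACK_SLOTS 8

typedef struct {
  Slot     stack[STACK_SLOTS]; /* main Lua stack */
  uint64_t imm[STACK_SLOTS];   /* parallel immediate cdata stack, same indexing */
} State;

/* Trace exit that links to a root trace: keep the sunk value unboxed by
 * writing the raw value to the parallel stack and a marker to the Lua stack. */
static void exit_store_unboxed(State *L, size_t slot, uint64_t value) {
  assert(slot < STACK_SLOTS);
  L->imm[slot] = value;
  L->stack[slot].tag = TAG_CDATA_UNBOXED;
}

/* SLOAD-like load of a cdata slot that accepts either representation.
 * Returns 1 and stores the raw value in *out, or 0 if the slot holds no cdata
 * (which corresponds to a failed type guard). */
static int load_cdata(const State *L, size_t slot, uint64_t *out) {
  assert(slot < STACK_SLOTS);
  const Slot *s = &L->stack[slot];
  if (s->tag == TAG_CDATA_UNBOXED) { *out = L->imm[slot]; return 1; }
  if (s->tag == TAG_CDATA_BOXED)   { *out = s->u.box->payload; return 1; }
  return 0;
}
```

In this model no heap allocation happens on the unboxed path; the cost is the extra tag check on every cdata load and the second array.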

What do I miss?

mejedi commented on June 16, 2024

> What do I miss?

You enter a root trace with a "dirty" "immediate cdata stack", and the trace doesn't touch the "special TValue" (hence it is not added to a snapshot). Now you are in a situation where a snapshot alone isn't enough to properly restore the Lua stack.
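
A small C model of this failure case, as a sketch only: it assumes a snapshot records just the slots the trace wrote, and all names (`Slot`, `SnapEntry`, `restore_snapshot`, `count_stale_markers`, the `TAG_*` values) are hypothetical rather than taken from the RaptorJIT sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TAG_NIL, TAG_INT, TAG_CDATA_UNBOXED };
#define NSLOTS 4

typedef struct { int tag; int64_t i; } Slot;

/* One snapshot entry: which slot to restore and the value to put there.
 * Assumption: only slots modified by the trace appear in the snapshot. */
typedef struct { size_t slot; Slot value; } SnapEntry;

/* Restore the Lua stack from a snapshot on exit to the interpreter. */
static void restore_snapshot(Slot *stack, const SnapEntry *snap, size_t n) {
  for (size_t k = 0; k < n; k++) {
    assert(snap[k].slot < NSLOTS);
    stack[snap[k].slot] = snap[k].value;
  }
}

/* Count slots that still hold the unboxed marker after the restore.
 * The interpreter cannot read these, so any non-zero count is a bug. */
static size_t count_stale_markers(const Slot *stack) {
  size_t n = 0;
  for (size_t s = 0; s < NSLOTS; s++)
    if (stack[s].tag == TAG_CDATA_UNBOXED) n++;
  return n;
}
```

If slot 1 carries the marker on entry and the snapshot only covers slot 2, the restore leaves one stale marker behind, which is the extra state a snapshot alone does not describe.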

I am also concerned about the overhead (essentially doubling the Lua stack footprint of a function) and limited applicability (only handles 64-bit things).

lukego commented on June 16, 2024

> Now you are in a situation where a snapshot alone isn't enough to properly restore the Lua stack.

True. This approach would require extra snapshot-like bookkeeping.

> I am also concerned about the overhead (essentially doubling the Lua stack footprint of a function) and limited applicability (only handles 64-bit things).

On the one hand, it is important to keep overhead low. On the other hand, optimizing Lua code often involves using FFI data structures, and today this can unpredictably cause a ~50x slowdown (#252). So I do want to find an efficient solution, but almost anything would be better than the status quo.

lukego commented on June 16, 2024

@mejedi Thanks for shooting that naïve idea down.

Could we attack this problem during compilation instead of at runtime?

Suppose that we have two linked traces T1->T2 with a CNEWI in T1 that cannot sink because its value escapes into T2 via the last snapshot. This causes an unwanted heap allocation in trace T1.

Further suppose that the value does not escape from T2 into the interpreter or onto the heap, i.e. that it would have been sunk had it been allocated in T2 instead of in T1.

Is there a way that we could "transfer the sinkage" from trace T1 to trace T2?

Then the allocation would sink in T1 and the sunk value would escape into T2, which could then sink it too, so there would be no allocation at all. This seems similar to the way that sunk values are already allowed to escape into side-traces today, with those side-traces being responsible for deciding whether to keep the value sunk or not.
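
The decision rule could be sketched roughly as below. This is a deliberately simplified model, not the real sink pass: the `ESC_*` flags and both functions are hypothetical names, and escape information is reduced to a bitmask per trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical escape flags for one allocation within one trace. */
enum {
  ESC_NONE   = 0,
  ESC_LINK   = 1, /* referenced by the last snapshot, i.e. flows into the linked trace */
  ESC_HEAP   = 2, /* stored into a heap object */
  ESC_INTERP = 4  /* handed back to the interpreter */
};

/* Simplified model of the current rule: any escape, including through the
 * last snapshot, prevents sinking. */
static bool sinks_today(unsigned esc) {
  assert(esc <= 7u);
  return esc == ESC_NONE;
}

/* Proposed rule for a link T1->T2: the allocation in T1 may stay sunk if its
 * only escape is through the link and T2 does not let the value reach the
 * heap or the interpreter. An ESC_LINK flag in esc_t2 is not treated as
 * blocking; the sketch assumes the same transfer could be applied again at
 * T2's own link. */
static bool sinks_with_transfer(unsigned esc_t1, unsigned esc_t2) {
  assert((esc_t1 | esc_t2) <= 7u);
  if (esc_t1 & (ESC_HEAP | ESC_INTERP)) return false; /* real escape in T1 */
  if (!(esc_t1 & ESC_LINK)) return true;              /* never leaves T1 */
  return (esc_t2 & (ESC_HEAP | ESC_INTERP)) == 0;     /* T2 keeps it sunk */
}
```

Under these assumptions `sinks_today(ESC_LINK)` is false while `sinks_with_transfer(ESC_LINK, ESC_NONE)` is true, which is exactly the case of an allocation that escapes only into a trace that could have sunk it.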

Example

Here is the abbreviated IR code for the hot path in example #252:

---- TRACE 1 start xx.lua:20
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbx      int SLOAD  #3    CI
0002 rax   >  cdt SLOAD  #2    T
0003          u16 FLOAD  0002  cdata.ctypeid
0004       >  int EQ     0003  +96 
0005 rbp      p64 FLOAD  0002  cdata.ptr
0006 rbp    + p64 ADD    0005  +1  
0007  {sink}+ cdt CNEWI  +96   0006
....              SNAP   #1   [ ---- ---- 0007 0001 ---- ---- ---- ]
[[[ Exit to side trace 2 happens here ]]]

---- TRACE 2 start 1/1 xx.lua:20
---- TRACE 2 IR
0001 rbx      int SLOAD  #3    PI
0002 rbp      p64 PVAL   #6  
0003 [8]      cdt CNEWI  +96   0002
....              SNAP   #0   [ ---- ---- 0003 0001 ---- ---- ---- ]
0004       >  nil GCSTEP 
0005 rbx      int ADD    0001  +1  
....              SNAP   #1   [ ---- ---- 0003 ]
0006       >  int LE     0005  +1000000
0007 xmm7     num CONV   0005  num.int
....              SNAP   #2   [ ---- ---- 0003 0007 ---- ---- 0007 ]
---- TRACE 2 stop -> 1

So here we see that:

- Trace 1 has a CNEWI that sinks.
- Trace 2 loads the raw sunk pointer value using PVAL.
- Trace 2 has a duplicate CNEWI to make the pointer value respectable/usable.
- Trace 2 is not able to sink the CNEWI because it is referenced in the last snapshot.

The nice aspect of this is that sunk values can be passed between traces. The restriction is that those values can't just be loaded from the stack using SLOAD and must instead be wrapped in a CNEWI so that the compiler can unsink them if necessary.

So maybe a solution is that root traces would not load immediate cdata values using SLOAD but instead with a special CNEWI that loads its value from the stack and can accept either sunk or unsunk values?

So at the start of trace 1 we would replace

-0002 rax   >  cdt SLOAD  #2    T
+0002  {sink}+ cdt CNEWI  +96   #2

... which is a new form of CNEWI in the spirit of PVAL that will receive a sunk value (if available) instead of loading from the stack. (Handwave on the details of this for now.)

This all sounds tricky, but I am not so sure. The promising aspect is that when we compile the link T1->T2 we already have full snapshot information, etc., about both of the traces.
