Comments (3)
Just one interesting correlation that I found while poking around with ocperf:
The performance counters above show that enabling fuse optimization increased the execution time by 538M cycles. I see that the counter ild_stall.iq_full also increased by 413M. This is apparently the number of "stall cycles" where the CPU front-end instruction length decoder (ILD) was blocked by the instruction queue (IQ) being full.
Performance counter stats for './luajit ../../luajit-test-cleanup/bench/md5.lua 20000':

    8,166,773,255   instructions          # 2.02 insn per cycle
    4,048,864,389   cycles
    1,425,454,950   ild_stall_iq_full

    1.687461787 seconds time elapsed

Performance counter stats for './luajit -O-fuse ../../luajit-test-cleanup/bench/md5.lua 20000':

    8,646,969,220   instructions          # 2.47 insn per cycle
    3,504,694,078   cycles
    1,012,083,906   ild_stall_iq_full

    1.460671377 seconds time elapsed
This simple observation definitely does not imply causation, but it seemed worth remarking upon. It could be consistent with a lack of instruction decoding throughput caused by complex fused instructions choking the frontend. It could also come from something else entirely, e.g. backpressure from a bottleneck further down the pipeline, maybe even in the backend. I will have to dig a little deeper.
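To make the arithmetic behind the correlation explicit, here is a small sketch that recomputes the differences from the two perf runs quoted in this comment. The dictionary names and the `delta` helper are made up for illustration. For these two particular runs the cycle difference comes out at roughly 544M rather than the 538M cited from the earlier counters, presumably because those came from a different run.

```python
# Counter values copied from the two perf runs quoted in this comment.
# "fuse_on" is the default build, "fuse_off" is the run with -O-fuse.
fuse_on = {
    "instructions": 8_166_773_255,
    "cycles": 4_048_864_389,
    "ild_stall_iq_full": 1_425_454_950,
}
fuse_off = {
    "instructions": 8_646_969_220,
    "cycles": 3_504_694_078,
    "ild_stall_iq_full": 1_012_083_906,
}


def delta(counter):
    """Extra events counted with fuse enabled compared to -O-fuse."""
    return fuse_on[counter] - fuse_off[counter]


cycle_delta = delta("cycles")
stall_delta = delta("ild_stall_iq_full")

# How large the extra ILD stall count is relative to the extra cycles.
# This is only a ratio of two differences, not evidence of causation.
stall_share = stall_delta / cycle_delta

print(f"extra cycles with fuse:     {cycle_delta:,}")
print(f"extra ild_stall_iq_full:    {stall_delta:,}")
print(f"stall delta / cycle delta:  {stall_share:.2f}")

# ILD stall count as a fraction of all cycles in each run.
for name, run in (("fuse", fuse_on), ("-O-fuse", fuse_off)):
    print(f"{name:8s} stall/cycles = {run['ild_stall_iq_full'] / run['cycles']:.2f}")
```

The extra stall count is about three quarters the size of the extra cycle count, which is why the correlation stands out. Since stall counters can overlap with other stall reasons, the ratio should be read as a hint about where to look, not as an attribution.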
Performance counters are notoriously hard to interpret. There are long procedures/flowcharts for deciding what aspect of the processor is limiting performance to what extent, and these are somewhat automated in tools like toplev, but I find that I get the most value from having a small set of counters that I am familiar with. I have mostly used the counters for debugging backend problems (memory access, etc.) and so I am not terribly familiar with the frontend counters.
from raptorjit.
The first place to look for clarity on this issue is the Intel 64 and IA-32 Architectures Optimization Reference Manual (also cited in the top-level README of this project). Intel write beautiful documentation and it is targeted at exactly people like us.
Here is a screen shot from the first subsection of the frontend optimization chapter. Just to share the flavor with people who have never checked out the Intel documentation before but might be tempted to open it :-)
I examined some more CPU performance counters and uops_issued_slow_lea led me to #55. This seems to resolve the problem, but I have only tested it on the md5 benchmark so far, so I have to run the full suite to see the wider implications.