Comments (3)

lukego commented on June 6, 2024

Here is one interesting correlation I found while poking around with ocperf:

The performance counters show that enabling the fuse optimization increases execution time by roughly 540M cycles (544M in the runs below). Over the same runs, the counter ild_stall.iq_full increased by 413M. This is apparently the number of "stall cycles" in which the CPU frontend's instruction length decoder (ILD) was blocked because the instruction queue (IQ) was full.

 Performance counter stats for './luajit ../../luajit-test-cleanup/bench/md5.lua 20000':

     8,166,773,255      instructions              #    2.02  insn per cycle         
     4,048,864,389      cycles                                                      
     1,425,454,950      ild_stall_iq_full                                           

       1.687461787 seconds time elapsed

 Performance counter stats for './luajit -O-fuse ../../luajit-test-cleanup/bench/md5.lua 20000':

     8,646,969,220      instructions              #    2.47  insn per cycle         
     3,504,694,078      cycles                                                      
     1,012,083,906      ild_stall_iq_full                                           

       1.460671377 seconds time elapsed
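As a quick sanity check on the arithmetic, here is a small Python sketch that parses the two perf stat outputs above and derives IPC, the share of cycles spent in ild_stall_iq_full stalls, and the differences between the two runs. The parser is an illustration only: it assumes perf's human-readable layout exactly as pasted here (a comma-grouped count followed by the event name), and the helper names are hypothetical, not part of perf, ocperf or RaptorJIT.

```python
import re

# perf stat output for md5.lua, copied from the two runs above.
WITH_FUSE = """
     8,166,773,255      instructions              #    2.02  insn per cycle
     4,048,864,389      cycles
     1,425,454,950      ild_stall_iq_full
"""

WITHOUT_FUSE = """
     8,646,969,220      instructions              #    2.47  insn per cycle
     3,504,694,078      cycles
     1,012,083,906      ild_stall_iq_full
"""

# Assumed layout: optional indent, comma-grouped count, whitespace, event name.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)", re.MULTILINE)


def parse_counters(text: str) -> dict[str, int]:
    """Return {event_name: count} for each counter line in perf stat text."""
    return {name: int(count.replace(",", ""))
            for count, name in COUNTER_LINE.findall(text)}


def summarize(c: dict[str, int]) -> dict[str, float]:
    """Derived metrics: instructions per cycle and IQ-full stall share."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "iq_full_stall_share": c["ild_stall_iq_full"] / c["cycles"],
    }


fuse = parse_counters(WITH_FUSE)
nofuse = parse_counters(WITHOUT_FUSE)

# How much the fused build costs, in cycles and in IQ-full stall cycles.
extra_cycles = fuse["cycles"] - nofuse["cycles"]
extra_stalls = fuse["ild_stall_iq_full"] - nofuse["ild_stall_iq_full"]

for label, counters in (("fuse", fuse), ("-O-fuse", nofuse)):
    s = summarize(counters)
    print(f"{label:8} IPC={s['ipc']:.2f}  "
          f"IQ-full stall share={s['iq_full_stall_share']:.1%}")

print(f"extra cycles with fuse:          {extra_cycles:,}")
print(f"extra IQ-full stalls with fuse:  {extra_stalls:,}")
print(f"stalls as share of extra cycles: {extra_stalls / extra_cycles:.0%}")
```

With these numbers the fused build spends about 35% of its cycles in IQ-full stalls versus about 29% without fusion, and the extra stall cycles amount to roughly three quarters of the extra total cycles. That is suggestive, but it is still only a correlation.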

This simple observation certainly does not imply causation, but it seemed worth remarking on. It could be consistent with a lack of instruction decoding throughput, caused by complex fused instructions choking the frontend. It could also have another cause entirely, e.g. backpressure from a bottleneck further down the pipeline, perhaps even in the backend. I have to dig a little deeper.

Performance counters are notoriously hard to interpret. There are long procedures and flowcharts for deciding which aspect of the processor is limiting performance, and to what extent, and these are partly automated in tools like toplev. Still, I find that I get the most value from a small set of counters that I am familiar with. I have mostly used counters for debugging backend problems (memory access, etc.), so I am not terribly familiar with the frontend counters.

from raptorjit.

lukego commented on June 6, 2024

The first place to look for clarity on this issue is the Intel Architectures Optimization Manual (also cited from the top-level README of this project). Intel write beautiful documentation, and it is aimed at exactly people like us.

Here is a screenshot of the first subsection of the frontend optimization chapter, just to share the flavor with people who have never looked at the Intel documentation before but might be tempted to open it :-)

[Screenshot (2017-03-22): first subsection of the frontend optimization chapter of the Intel optimization manual]

lukego commented on June 6, 2024

I examined some more CPU performance counters, and uops_issued_slow_lea led me to #55. This seems to resolve the problem, but so far I have only tested it on the md5 benchmark, so I still have to run the full suite to see the wider implications.
