Data type narrowing and LSR (llvm-mos issue, closed)

llvm-mos commented on July 20, 2024
Data type narrowing and LSR

from llvm-mos.

Comments (8)

mysterymath commented on July 20, 2024

We do actually do a bit of this already; the "MOSIndexIV" pass looks for 16-bit index variables in loops and lowers them to 8 bits.

If you look really carefully at the assembly in the second version of the code, you can see that that pass is actually working. The X register is carried around the loop completely separately from whatever else is going on there, and the array is actually accessed using the 8-bit ,X addressing mode. It appears (I'm guessing) that the loop's test doesn't receive the same treatment, however, which indicates a gap in the cases handled by that pass.


beholdnec commented on July 20, 2024

If the MOSIndexIV pass is designed to narrow index variables in loops, then it takes advantage of some narrowing opportunities but probably misses others. Perhaps it would be good to have a more general data-narrowing pass that takes place earlier in compilation.


mysterymath commented on July 20, 2024

We'll probably need both. Loops are a bit special in that there's a circular dependency: both bytes of the 16-bit index are used by the 16-bit increment, which is in turn used by the next 16-bit increment, and so on. You have to look at the loop as a whole to see that it only requires 8 bits. It's a bit like cycle detection in a garbage collector.

In LLVM, this is handled by SCEV (scalar evolution), which generates for, say, "int x" in the above loop (and don't quote me on this): i16:<+, 0, 1>. This encodes that x begins at zero and increases by 1 each time through the loop. MOSIndexIV notes that since the loop edge count is 64, the max value of x is 64 (0 + 1 + 1 + 1... 64 times). We then turn the index from (gStuff + i16:<+, 0, 1>) into (gStuff + zero_ext(i8:<+, 0, 1>)). It seems like when loop strength reduction later goes looking for an induction variable to use to test whether we've hit 64 iterations, it should prefer the native i8:<+, 0, 1> to the original i16:<+, 0, 1>, but it doesn't (or it constructs a completely different IV or something). We'll have to trace it and see what it's doing there.
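
As a rough illustration of the narrowing check described above (a sketch only, not the MOSIndexIV implementation): model the recurrence as a start and a step, bound its value over the trip count, and ask whether that bound fits in 8 bits. The AddRec struct and all function names are made up for this example, and the recurrence is assumed to be unsigned and non-wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an add recurrence <+, start, step>:
   the value on iteration n is start + n * step. */
typedef struct {
    uint16_t start;
    uint16_t step;
} AddRec;

/* Conservative upper bound: the value after trip_count increments,
   i.e. including the exit value that the loop test sees. */
uint64_t addrec_bound(AddRec r, uint32_t trip_count) {
    return (uint64_t)r.start + (uint64_t)r.step * trip_count;
}

/* True if every value the recurrence takes fits in an unsigned 8-bit IV. */
bool fits_in_u8(AddRec r, uint32_t trip_count) {
    return addrec_bound(r, trip_count) <= UINT8_MAX;
}

/* Usage: the loop from this thread (i = 0, step 1, 64 iterations). */
void addrec_example(void) {
    AddRec i = {0, 1};
    assert(fits_in_u8(i, 64));   /* bound is 64, so 8 bits suffice */
    assert(!fits_in_u8(i, 300)); /* bound is 300, so 16 bits are needed */
}
```

Including the exit value in the bound is deliberately conservative: it means the loop test can use the same narrow variable as the array index.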

There should already be some stuff in LLVM to lower straight-line code, since it'd be important to, say, lower 64-bit operations to 32 bits on a 32-bit system. I haven't really gone looking for it yet, though; it may just be a matter of turning it on and/or telling it that we want it to take things down to 8 bits. A lot of this is triggered on the "native" int type, which we did already set to 8 bits, but there may be some optimizations that hardcode i32 (there's a SURPRISING amount of that).


mysterymath commented on July 20, 2024

While optimizing Dhrystone, I enabled Loop Strength Reduction (LSR), even for non-native induction variables (IVs). Loop strength reduction rewrites all uses of variables that vary with the loop (induction variables) so that as few of them as possible need to be maintained, and so that they are updated as cheaply as possible. This will, for example, convert a multiplication in a loop to an addition of the stride on each iteration, along with a host of other optimizations. LSR uses information about target addressing modes to try to produce expressions that will later reduce to addressing modes, rather than computing those values into registers.
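
To make the multiplication-to-addition rewrite concrete, here is a hand-written before/after pair in C. It only illustrates the kind of transformation LSR performs on IR; the function names are invented for this example, and the caller is assumed to supply a buffer with at least (n - 1) * stride + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: the offset is recomputed with a multiply on every iteration. */
void fill_mul(uint8_t *buf, size_t n, size_t stride) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i * stride] = (uint8_t)i;
}

/* After: a second induction variable carries the offset and is
   bumped by the stride, so the multiply disappears. */
void fill_add(uint8_t *buf, size_t n, size_t stride) {
    assert(buf != NULL);
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        buf[off] = (uint8_t)i;
        off += stride;
    }
}
```

Both functions write the same bytes to the same offsets; the second trades the per-iteration multiply for one extra addition.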

A ton of induction variables will be 16-bit or larger on MOS, but that doesn't eliminate the need to rewrite them to e.g. count down to zero. LLVM didn't do this, since LSR wasn't smart enough to avoid producing a lot of non-native IVs when smaller IVs would suffice.
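
A small C sketch of the count-down-to-zero rewrite mentioned above, written by hand rather than taken from compiler output. The function names are hypothetical. Element order is kept the same in both versions; only the exit test changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counting up: each iteration needs an explicit compare against n. */
uint16_t sum_up(const uint8_t *p, uint8_t n) {
    assert(p != NULL);
    uint16_t sum = 0;
    for (uint8_t i = 0; i != n; i++)
        sum = (uint16_t)(sum + p[i]);
    return sum;
}

/* Counting down: the loop ends when the counter reaches zero, which a
   6502 decrement reports in the Z flag without a separate compare. */
uint16_t sum_down(const uint8_t *p, uint8_t n) {
    assert(p != NULL);
    uint16_t sum = 0;
    for (uint8_t left = n; left != 0; left--)
        sum = (uint16_t)(sum + *p++);
    return sum;
}
```

The down-counting form is attractive on the 6502 because the branch can test the flags left by the decrement directly.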

Turning this on broke the old IndexIV optimization, since LSR completely rewrites away the narrow IV to a wider one. We need to actually teach LSR how the 6502's addressing modes work. This will require extending its internal interfaces quite a bit; it makes assumptions that are not valid on MOS, so there's no way at present to tell it about MOS addressing modes.


mysterymath commented on July 20, 2024

The latest spate of changes should have repaired the IndexIV optimization, assuming they pass testing.


beholdnec commented on July 20, 2024

Here are the latest results as of today.

This program:

char gStuff[64];

void foo(char x) {
    for (int i = 0; i < 64; i++) {
        gStuff[i] = i;
    }
}

Gets compiled to:

foo:                                    ; @foo
; %bb.0:                                ; %entry
	ldx	#64
	ldy	#0
.LBB0_1:                                ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
	tya
	sta	gStuff,y
	clc
	adc	#1
	tay
	clc
	txa
	adc	#-1
	tax
	lda	#0
	cmp	#0
	bne	.LBB0_1
; %bb.2:                                ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	txa
	cpx	#0
	bne	.LBB0_1
; %bb.3:                                ; %for.cond.cleanup
	rts

Note the useless pattern lda #0, cmp #0.

For reference, here are the results when int i is changed to char i:

foo:                                    ; @foo
; %bb.0:                                ; %entry
	lda	#0
.LBB0_1:                                ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
	tax
	sta	gStuff,x
	clc
	adc	#1
	cmp	#64
	bne	.LBB0_1
; %bb.2:                                ; %for.cond.cleanup
	rts


mysterymath commented on July 20, 2024

I'm still working on this pretty actively; it's been an absolutely wild ride through LLVM's codebase so far.

I've had to rather substantially alter the data model used by Loop Strength Reduction, which is the primary pass that deals with hardware addressing modes in LLVM. The 6502's addressing modes were completely unrepresentable in that pass, due to the hard assumption that all addressing modes are of the form "base + reg + scale * scalereg", where all of base, reg, and scalereg have exactly the same size.

After attempting several hacks and alternatives, I've broken that assumption in LoopStrengthReduction; the registers can now be narrower than the full addition, and they're implicitly zero-extended before adding. (This was a fairly hefty change, so it's taken up most of my time on this project over the last few weeks.)
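
The difference between the two address shapes can be sketched in a few lines of C. This is a model of the arithmetic only, not of LSR's internal data structures; the function names are invented here, and the scaled term is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* The shape LSR assumed: every operand has the full address width. */
uint16_t address_same_width(uint16_t base, uint16_t reg) {
    return (uint16_t)(base + reg);
}

/* The shape described above: an 8-bit register is implicitly
   zero-extended to 16 bits before the add. */
uint16_t address_zext_index(uint16_t base, uint8_t index) {
    return (uint16_t)(base + (uint16_t)index);
}

/* Usage: an 8-bit index of 0xFF adds 255 to the base, never -1. */
void addressing_example(void) {
    assert(address_zext_index(0x1000, 0xFF) == 0x10FF);
    assert(address_same_width(0x1000, 0x0100) == 0x1100);
}
```

With the second form, an access like gStuff,x needs only an 8-bit induction variable in the index register, which is what the narrowed loop relies on.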

I've had to disable a number of sections of LSR in the scenario we're dealing with, since I haven't repaired the logic yet, and they were spitting out garbage. I'm still hopeful that once these sections are repaired, LSR should pick really good induction variables to travel around the loop with. At least, it looks fairly promising so far.

Thanks for keeping an eye on this; I'll post an update once I've finished tweaking LSR.

Oh, and one side effect of this is that LSR can really change the output to the rest of the code generator, which isn't very well optimized either. Often a really good decision by LSR has produced a really poor decision elsewhere, just by coincidence. My goal for this pass is to get LSR putting out consistently good and clean loop code; this may actually make some of our benchmarks slower until we clean up all the little problems elsewhere in the codebase.


mysterymath commented on July 20, 2024

After quite a lot more futzing with LSR, the given example is now:

char gStuff[64];

void foo(char x) {
    for (int i = 0; i < 64; i++) {
        gStuff[i] = i;
    }
}

foo:                                    ; @foo
; %bb.0:                                ; %entry
        lda     #0
.LBB0_1:                                ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
        tax
        sta     gStuff,x
        clc
        adc     #1
        cmp     #64
        bne     .LBB0_1
; %bb.2:                                ; %for.cond.cleanup
        rts
.Lfunc_end0:

LSR seems to be doing a mostly okay job now from a cursory look at the benchmarks; at the very least, returns are starting to diminish. The changes to it seem fairly brittle; it's clear that we'll need to hammer on LSR a lot more throughout the life of the project, but que sera sera.

Closing this one for now, since there's at least a reasonable pass at handling this kind of optimization, even if it ends up not always "sticking" for one reason or another.

