Comments (6)
If you replace `rand16()` with e.g. a constant `1`, does the inefficiency go away?
Maybe it might be the case that the compiler can't prove that `rand16()` doesn't touch memory and potentially `i`, so it thinks it needs to store X and reload it later.
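The aliasing worry in that guess can be sketched in plain C. This is only an illustration under stated assumptions: `counter`, `opaque_call`, and both loop functions are hypothetical names, and `opaque_call` is a stand-in for an opaque routine, not neslib's actual `rand16()`. When the counter is a global, a callee really can change it, so the caller has to re-read it after every call. When the counter is a local whose address is never taken, no callee can reach it.

```c
#include <stdint.h>

/* Hypothetical global loop counter: any function in the program can see it. */
uint8_t counter;

/* Hypothetical stand-in for an opaque routine such as rand16().
 * It clobbers the global counter behind the caller's back. */
uint16_t opaque_call(void) {
    if (counter > 1) {
        counter = 1;
    }
    return 42;
}

/* The counter lives in memory visible to the callee, so the compiler
 * must reload it after each call: the callee may have changed it. */
unsigned iterations_with_global(uint8_t n) {
    unsigned iters = 0;
    for (counter = n; counter != 0; --counter) {
        (void)opaque_call();
        ++iters;
    }
    return iters;
}

/* The counter is a local whose address never escapes, so no callee
 * can modify it and the loop always runs n times. */
unsigned iterations_with_local(uint8_t n) {
    unsigned iters = 0;
    for (uint8_t i = n; i != 0; --i) {
        (void)opaque_call();
        ++iters;
    }
    return iters;
}
```

With these definitions, `iterations_with_global(5)` runs the body once while `iterations_with_local(5)` runs it five times. Even in the local case the counter still has to survive the call somewhere if the callee clobbers X, but it should not need to be re-read more than once.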
Any chance you'd be able to reduce the test case to a minimal example, so that it could be verified on https://godbolt.org/? That way it will be easier to see whether the issue still persists when newer versions of llvm-mos come up on godbolt.
from llvm-mos.
> Maybe it might be the case that the compiler can't prove that `rand16()` doesn't touch memory and potentially `i`, so it thinks it needs to store X and reload it later.

Even if it needs to use the value in `$31` elsewhere, shouldn't it at least be able to eliminate the 2nd `ldx`? (unless `i` was declared `volatile`, maybe?)
from llvm-mos.
Yeah, probably.. that was just a guess.
from llvm-mos.
> If you replace `rand16()` with e.g. a constant `1`, does the inefficiency go away?

Good question, yeah it becomes a simple DEX / BNE in that case.
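For readers without the source at hand, the experiment presumably has roughly the following shape. This is a hypothetical reconstruction, not the reporter's code: the loop body, `sum_with_call`, `sum_with_constant`, and `rand16_stub` are all assumptions, and the stub just counts upward rather than producing random numbers.

```c
#include <stdint.h>

/* State for the hypothetical stub below. */
uint16_t fake_state;

/* Hypothetical stand-in for neslib's rand16(); it simply counts up
 * so that results are predictable. */
uint16_t rand16_stub(void) {
    return ++fake_state;
}

/* Countdown loop whose body makes a call. The counter must survive
 * the call, so it cannot simply stay in a register the callee clobbers. */
uint16_t sum_with_call(uint8_t n) {
    uint16_t sum = 0;
    for (uint8_t i = n; i != 0; --i) {
        sum += rand16_stub();
    }
    return sum;
}

/* Same loop with the call replaced by a constant. Nothing clobbers the
 * counter here, so it can stay in a register for the whole loop. */
uint16_t sum_with_constant(uint8_t n) {
    uint16_t sum = 0;
    for (uint8_t i = n; i != 0; --i) {
        sum += 1;
    }
    return sum;
}
```

The second variant is the one that would be expected to reduce to a register decrement and a branch, which matches the DEX / BNE result reported above.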
> Maybe it might be the case that the compiler can't prove that `rand16()` doesn't touch memory and potentially `i`, so it thinks it needs to store X and reload it later.

neslib's rand.s is in assembly. rand16 uses X, and JSRs to rand8 twice. Both of those subroutines are defined like:
.section .text.rand8,"ax",@progbits
> Any chance you'd be able to reduce the test case to a minimal example, so that it could be verified on https://godbolt.org/? That way it will be easier to see whether the issue still persists when newer versions of llvm-mos come up on godbolt.

Sure, I put a reduced version here. Well, rand16 may be a red herring, because it's still in there and yet the iterator test suddenly looks more reasonable (it does DEC ZP and then loads the value into Y, but this time the Y value is actually used at the top of the loop).
https://godbolt.org/z/PTWxKvdbr
While editing it down, I noticed something strange: code before the loop is affecting the loop, which doesn't seem like it should happen. This copy has one of the neslib function calls uncommented, and uncommenting any one of those function calls seems to have the same effect. With this, it's back to doing the LDX / DEX / STX thing (the actual X value gets trashed by rand16).
https://godbolt.org/z/rdoaaqheG
from llvm-mos.
I'm also now noticing there's more inefficiency in the end of this same loop. Same code as linked above: https://godbolt.org/z/rdoaaqheG
ldx mos8(.Lmain_zp_stk+2) ; 1-byte Folded Reload
dex
stx mos8(.Lmain_zp_stk+2) ; 1-byte Folded Spill
ldx mos8(.Lmain_zp_stk+2) ; 1-byte Folded Reload
bne .LBB0_1
ldx #0
rts
X was just tested by BNE, so X must be zero when execution falls through, yet the next instruction is a redundant LDX #0. This seems related to this issue, but I could open it as a separate issue if that would help.
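The redundancy can be checked with a tiny C model of just the pieces involved. It is a sketch under assumptions, not a 6502 emulator: `Cpu`, `tail_as_emitted`, `tail_trimmed`, and `tails_agree` are hypothetical names, the zero-page spill slot is modeled as a single byte, and nothing else (no interrupt handler, no `volatile` access) is assumed to write that slot between the STX and the following LDX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model: X register, Z and N flags, and one zero-page spill slot. */
typedef struct {
    uint8_t x;
    bool z;
    bool n;
    uint8_t zp;
} Cpu;

/* LDX and DEX update Z and N from the result. */
void set_nz(Cpu *c, uint8_t v) {
    c->z = (v == 0);
    c->n = (v & 0x80) != 0;
}

void ldx_zp(Cpu *c) { c->x = c->zp; set_nz(c, c->x); }
void ldx_imm(Cpu *c, uint8_t v) { c->x = v; set_nz(c, v); }
void dex(Cpu *c) { c->x = (uint8_t)(c->x - 1); set_nz(c, c->x); }
void stx_zp(Cpu *c) { c->zp = c->x; } /* STX leaves the flags untouched */

/* Loop tail as emitted. Returns true if BNE would branch back. */
bool tail_as_emitted(Cpu *c) {
    ldx_zp(c);
    dex(c);
    stx_zp(c);
    ldx_zp(c);          /* reload of the value just stored */
    if (!c->z) {
        return true;    /* bne .LBB0_1 taken */
    }
    ldx_imm(c, 0);      /* X is already zero on this path */
    return false;
}

/* Same tail without the second LDX and without LDX #0. */
bool tail_trimmed(Cpu *c) {
    ldx_zp(c);
    dex(c);
    stx_zp(c);
    return !c->z;
}

/* Compare both tails for every possible value of the spill slot.
 * Returns 1 if the final state and branch decision always match. */
int tails_agree(void) {
    for (int v = 0; v < 256; ++v) {
        Cpu a = {0, false, false, (uint8_t)v};
        Cpu b = a;
        bool taken_a = tail_as_emitted(&a);
        bool taken_b = tail_trimmed(&b);
        if (taken_a != taken_b || a.x != b.x || a.zp != b.zp ||
            a.z != b.z || a.n != b.n) {
            return 0;
        }
    }
    return 1;
}
```

In this model `tails_agree()` returns 1: the reload sets the same flags DEX already set, and LDX #0 only runs when X is already zero.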
from llvm-mos.
I tried copying and pasting the loop in question, so that a second copy of the loop runs after the first. The first copy does the unoptimized X register stuff, and the second copy uses DEC ZP. This happens after a neslib call; any single one of them will do it. If there are no neslib calls beforehand, both for loops use DEC ZP.
If I put another neslib call before the second copy of the loop, the second copy also does the redundant X register stuff.
https://godbolt.org/z/d3n3Ezfn3
from llvm-mos.