Comments (7)
For reference, the inimitable 64doc:
http://www.zimmers.net/anonftp/pub/cbm/documents/chipdata/64doc
Necessary info is under the excellently named "6510 features"
from llvm-mos.
I'd like to confirm that I understand this issue.
When implementing volatile support, it's important to make sure that 1) the underlying memory is actually read and written whenever the corresponding volatile is; and, optionally, 2) the underlying memory is read and written no more frequently than the corresponding volatile is.
Is this a fair problem statement?
from llvm-mos.
Yes, that's a fair assessment. To clarify, the overall ordering of volatile accesses needs to agree as well.
There are exactly two things that the C standard defines as something the implementation is actually required to do. Were it not for these two clauses, the compiler could just emit RTS for every program.
They are:
- At each sequence point, all previous volatile reads and writes are complete, and no later volatile reads or writes have begun.
- I/O operations are produced in agreement with the abstract machine semantics.
The first item roughly corresponds to (1); the standard says nothing about extra accesses, only those given in the abstract machine model. But, implementations can and do define tighter interpretations. We can also be fairly choosy with what we want to support.
For example, say you have a volatile IO register that triggers a side effect on read, but you define it via a struct with a bitfield:
struct S {
unsigned dummy : 7;
unsigned flag : 1;
};
volatile struct S *IO = (volatile struct S *)0x1234;
If you change one of the fields, it'll be nearly impossible for the compiler to emit something that doesn't involve a read-modify-write, since there's no direct "set one bit" operation in memory on the 6502. You have to read a full byte, modify it, then write it back.
IO->flag = 1;
LDA IO
ORA #$80 ; flag is bit 7 when bit-fields are allocated from the least significant bit
STA IO
So there's not really a way to agree 100% with the abstract C semantics here, even without any CPU bugs. The abstract machine says one access, but we can't do it in fewer than two. This specific issue caused contention among Linux kernel developers, since GCC was happily doing 64-bit read-modify-writes for 32-bit accesses, and Linus thought it shouldn't. (Those accesses were actually partially outside the struct, causing memory access violations, IIRC; not all that dissimilar from our case here. I can't remember what the GCC team actually decided.)
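To make the access count concrete, here is a minimal host-side sketch. `IoReg`, `io_read`, `io_write`, and `io_set_bits` are hypothetical names invented for illustration; they model a one-byte register that counts bus accesses, and have nothing to do with llvm-mos itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a one-byte I/O register that counts bus accesses. */
typedef struct {
    uint8_t value;
    unsigned reads;
    unsigned writes;
} IoReg;

uint8_t io_read(IoReg *r) { r->reads++; return r->value; }
void io_write(IoReg *r, uint8_t v) { r->writes++; r->value = v; }

/* Set bits the only way a byte-addressed CPU without bit-set
   instructions can: load the byte, OR in the mask, store it back. */
void io_set_bits(IoReg *r, uint8_t mask) {
    assert(r != 0);
    uint8_t tmp = io_read(r);   /* LDA IO    */
    tmp |= mask;                /* ORA #mask */
    io_write(r, tmp);           /* STA IO    */
}
```

Calling `io_set_bits` once on a fresh register leaves `reads == 1` and `writes == 1`, whereas the abstract machine only asked for a single store to the bit-field.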
Ideally, we'd have volatile accesses agree "as much as possible," but there's obviously a sliding scale here. There's also an easy out for this work; we can take the C standard hard-line and require accesses to these sorts of IO registers to be done with inline assembly.
from llvm-mos.
First, let's go to the standards documents:
"A static volatile object is an appropriate model for a memory-mapped I/O register. Implementors of C translators should take into account relevant hardware details on the target systems when implementing accesses to volatile objects. For instance, the hardware logic of a system may require that a two-byte memory-mapped register not be accessed with byte operations; and a compiler for such a system would have to assure that no such instructions were generated, even if the source code only accesses one byte of the register. Whether read-modify-write instructions can be used on such device registers must also be considered.
Whatever decisions are adopted on such issues must be documented, as volatile access is implementation-defined. A volatile object is also an appropriate model for a variable shared among multiple processes. A static const volatile object appropriately models a memory-mapped input port, such as a real-time clock. Similarly, a const volatile object models a variable which can be altered by another process but not by this one." Reference
There are two kinds of spurious accesses that we need to be aware of in llvm-mos:
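A: When indexed addressing carries into the MSB of an address, the CPU first fetches from the address with the un-carried high byte, i.e. $100 below the effective address.
B: The instructions INC, DEC, ASL, LSR, ROL, and ROR write a throwaway value (on the NMOS 6502, the unmodified operand) to the target address during the modify cycle, before writing the final result.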
Now A will only be a problem for volatiles, if the address of said volatile is calculated as a <256 byte offset from another address, and that offset happens to cross a page boundary, and that addressing is calculated at run-time as an x or y offset. I can't think of a situation where this would occur, if the effective address is a constant. In other words, I think that problem A could only occur in a volatile whose effective address is not const and not static. Is this correct?
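The address arithmetic behind problem A can be sketched as follows. The function names are made up for illustration, and the model assumes the NMOS 6502 behavior described in 64doc for absolute indexed addressing: the low byte is summed first, and the first bus read uses the high byte before the carry is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if base + index carries out of the low address byte. */
bool crosses_page(uint16_t base, uint8_t index) {
    return ((base & 0x00FFu) + index) > 0x00FFu;
}

/* Address of the first bus read for abs,X / abs,Y: the low byte is
   already summed, but the high byte has not yet received the carry. */
uint16_t first_read_addr(uint16_t base, uint8_t index) {
    return (uint16_t)((base & 0xFF00u) | ((base + index) & 0x00FFu));
}

/* The address the instruction actually means to access. */
uint16_t effective_addr(uint16_t base, uint8_t index) {
    return (uint16_t)(base + index);
}
```

When `crosses_page` is false the two addresses coincide; when it is true, `first_read_addr` lands exactly one page below `effective_addr`.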
Now B will only be a problem if you do the shift or increment on the volatile memory itself, as opposed to reading the value into an imaginary register, doing the operation on that imaginary register, and then writing it. But I am not aware of anything in your codegen that performs increments or shifts on anything except imaginary registers, which can't be mapped onto any hardware device. Anything I'm missing?
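For problem B, a rough sketch of what the bus sees during a memory `INC` may help. `BusCycle` and `inc_memory_cycles` are hypothetical names, and the three-cycle data sequence assumes NMOS 6502 behavior as 64doc describes it (read, write back the old value, write the new value); other variants of the CPU may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One data-bus cycle as seen by the device at the target address. */
typedef struct {
    bool is_write;
    uint8_t data;
} BusCycle;

/* Records the data cycles of an NMOS-style `INC addr` on one memory
   cell and returns how many cycles were recorded (always 3). */
int inc_memory_cycles(uint8_t *cell, BusCycle trace[3]) {
    assert(cell != 0);
    uint8_t old = *cell;
    trace[0].is_write = false; trace[0].data = old;   /* read operand          */
    trace[1].is_write = true;  trace[1].data = old;   /* extra write, old value */
    *cell = (uint8_t)(old + 1);
    trace[2].is_write = true;  trace[2].data = *cell; /* write final result    */
    return 3;
}
```

A write-triggered device at that address would therefore see two writes for one C-level increment, which is why RMW instructions need care around volatiles.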
If the above assumptions are true, then we may be able to squeak by simply by telling the user that reads on indexed volatiles that are neither static nor const are not guaranteed to read only the effective address, i.e. they may generate spurious reads because of hardware side effects.
If my assumptions are wrong, then I can think of the following ways of mitigating.
- Mark certain instructions or addressing modes, including those that add a carry to the MSB of an address, and the RMW instructions INC, DEC, ASL, LSR, ROL, and ROR, as being incompatible with volatiles within codegen. You'll have a better opinion how to do this than me.
- Implement an emulator that marks certain instructions as possibly spurious during volatile access, and write test cases in lit to verify that codegen never generates them.
- Implement an emulator that models all instruction side effects, and verify that spurious reads never occur on any instruction during volatile access. This is not easy, even assuming that a cycle accurate emulator exists, because spurious reads are usually ok; it's only during volatile access that we get worried about them. We'd need some kind of thunk to tell the emulator to start or stop trapping on spurious reads.
So I propose the following approach.
- Push the information above into llvm-mos documentation;
- Get an instruction-level emulator base class working well enough to hello world;
- Subclass the emulator to abort on the shift-increment instructions on anything that is not an imaginary register, or when crossing a page boundary during reads. Not easy even with cycle accuracy, as per 3 above.
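The trapping idea in the last step could look roughly like this. It is only a sketch: `BusWatch`, `bus_cycle`, and `load_abs_y` are hypothetical names, not part of any existing emulator, and only an absolute,Y load is modeled (a dummy read at the un-carried address occurs only when a page is crossed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watch on an inclusive range of I/O addresses. */
typedef struct {
    uint16_t lo, hi;
    unsigned violations;   /* dummy cycles that hit the watched range */
} BusWatch;

/* Record one bus cycle; `intended` is false for dummy cycles. */
void bus_cycle(BusWatch *w, uint16_t addr, bool intended) {
    assert(w != 0);
    if (!intended && addr >= w->lo && addr <= w->hi)
        w->violations++;
}

/* Replay the data cycles of an `LDA base,Y` style load. */
void load_abs_y(BusWatch *w, uint16_t base, uint8_t y) {
    uint16_t effective = (uint16_t)(base + y);
    uint16_t partial = (uint16_t)((base & 0xFF00u) | (effective & 0x00FFu));
    if (partial != effective)
        bus_cycle(w, partial, false);  /* dummy read, high byte not fixed yet */
    bus_cycle(w, effective, true);     /* the read the program asked for */
}
```

A test harness could then assert that `violations` stays at zero after running compiled code, with the watched range set to the target's read-sensitive registers.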
References:
http://www.textfiles.com/apple/6502.bugs.txt
http://visual6502.org/wiki/index.php?title=6502_Timing_States
http://www.visual6502.org/wiki/index.php?title=6502_State_Machine
https://docs.mamedev.org/techspecs/m6502.html
https://github.com/mamedev/mame/tree/master/src/devices/cpu/m6502
Appendix A of http://archive.6502.org/books/mcs6500_family_hardware_manual.pdf , which documents these reads as "discarded"
from llvm-mos.
> Now B will only be a problem if you do the shift or increment on the volatile memory itself, as opposed to reading the value into an imaginary register, doing the operation on that imaginary register, and then writing it. But I am not aware of anything in your codegen that performs increments or shifts on anything except imaginary registers, which can't be mapped onto any hardware device. Anything I'm missing?
This is a TODO item for the code generator; the instruction selector should eventually be able to detect an entire G_STORE G_SHL G_LOAD sequence and convert it into a single memory ASL. But, it's pretty trivial to make sure that the volatile bit isn't set on the G_LOAD and G_STORE (and it's really just one bit there, passed all the way down from Clang). Not too worried about this one; just something to keep in mind when we get around to it.
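The guard described here amounts to a small predicate. The sketch below is plain C with made-up names (`MemOp`, `can_fold_to_memory_shift`); it is not LLVM's API and only illustrates the condition under which a load/shift/store triple could be folded into one memory shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a memory operand on a load or store. */
typedef struct {
    uint16_t addr;
    bool is_volatile;   /* the single bit passed down from the front end */
} MemOp;

/* Folding load + shift + store into one RMW instruction is only
   acceptable if both accesses hit the same address and neither one
   is volatile. */
bool can_fold_to_memory_shift(MemOp load, MemOp store) {
    return load.addr == store.addr
        && !load.is_volatile
        && !store.is_volatile;
}
```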
> Now A will only be a problem for volatiles, if the address of said volatile is calculated as a <256 byte offset from another address, and that offset happens to cross a page boundary, and that addressing is calculated at run-time as an x or y offset. I can't think of a situation where this would occur, if the effective address is a constant. In other words, I think that problem A could only occur in a volatile whose effective address is not const and not static. Is this correct?
We could end up emitting spurious accesses to volatile objects when accessing non-volatile objects as well.
For example, say we have a volatile object assigned by linker script to 0x2001, and a regular object, which the linker ends up assigning to 0x20ff:
volatile const char vol; // placed at 0x2001 by the linker script
char nonvol[3]; // placed at 0x20ff by the linker
...
assert((uintptr_t)&vol == 0x2001);
assert((uintptr_t)nonvol == 0x20ff);
If we were to assign to the last byte of nonvol, the effective address would be $2101, causing a spurious read of $2001, the volatile object:
extern char y; // 2 at runtime
nonvol[y] = 1;
LDY y
LDA #1
STA nonvol,Y ; Effective address $2101, issues spurious read of $2001
I'm not entirely sure what the best mitigation is for this. For the compiler to be usable for I/O, we'd need to allow users to ensure that it didn't emit page-crossing indexed operations exactly one page above an IO register. The easiest way I can see to allow users to annotate that is to require all such accesses to be to volatile objects, and to completely disallow indexed addressing for volatiles. This can be done by generating the final address into an imaginary pointer, then using the indirect-indexed mode with Y=0, which is guaranteed not to cross a page.
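A small model shows why folding the index into the pointer helps. The names below are hypothetical, and the zero-page fetches of the pointer itself are ignored; only the first data-bus address of the indexed access is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* First data-bus address touched and the final effective address. */
typedef struct {
    uint16_t first_touched;
    uint16_t effective;
} Access;

/* Indexed access: the first address uses the un-carried high byte. */
Access indexed_access(uint16_t base, uint8_t index) {
    Access a;
    a.effective = (uint16_t)(base + index);
    a.first_touched = (uint16_t)((base & 0xFF00u) | (a.effective & 0x00FFu));
    return a;
}

/* Volatile-safe plan: do the 16-bit add in registers, store the result
   in an imaginary pointer, then access through it with Y = 0. */
Access pointer_access(uint16_t base, uint8_t index) {
    uint16_t ptr = (uint16_t)(base + index);
    return indexed_access(ptr, 0);
}
```

With an index of zero there is no carry to propagate, so `first_touched` always equals `effective` for `pointer_access`, at the cost of the extra pointer arithmetic.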
We'd have to ensure that our linker scripts don't place any of the default sections immediately one page above IO registers triggered on read/write, but that shouldn't be too onerous. We also may be able to relax the restrictions somewhat on volatiles that are placed in those sections, since they're effectively guaranteed never to contain IO ports. It's somewhat doubtful that avoiding indexing and rmw on volatiles will ever cause that big of a performance hit, though, given the expected rarity of doing either on IO registers.
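The linker-script constraint can be phrased as a conservative check. This is a sketch with an invented function name, assuming a section is described by its first and last byte addresses; it flags any section that contains the byte exactly one page above a read-sensitive register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conservative check: a page-crossing indexed access whose effective
   address is io_addr + $100 first touches io_addr. If the section
   [first, last] contains that byte, it may alias the register. */
bool section_may_alias_io(uint16_t first, uint16_t last, uint16_t io_addr) {
    uint16_t victim = (uint16_t)(io_addr + 0x0100u);  /* wraps at 64K */
    return first <= victim && victim <= last;
}
```

For a register at $2001, a section covering $2100-$21FF is flagged, while one starting at $2200 is not.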
from llvm-mos.
I think your solution of disallowing indexed addressing for volatiles is more than fair.
To show off a bit, you could even provide a pragma or target-specific compile flag to permit the compiler to use indexed addressing on volatiles, with the presumption that the user will page-align base addresses for indexed volatiles. That way, you could still generate optimal I/O code, if you know how to turn off the safety feature that saves you from spurious reads. -mmos-unsafe-volatile-reads, -mno-mos-unsafe-volatile-reads
Another possibility would be to require all volatile data structures of n bytes, to be 2^ceil(log2(n)) byte aligned by the linker. The linker can set aside some sections for this, but it would be up to the compiler to annotate that variable as being aligned, and this seems like overly strong medicine generally, especially on platforms with limited memory.
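As a sketch of that alignment rule (function names are made up for illustration), rounding the size up to a power of two and aligning to it keeps any object of at most 256 bytes inside a single page:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest power of two >= n, i.e. 2^ceil(log2(n)). */
uint16_t volatile_alignment(uint16_t n) {
    assert(n >= 1 && n <= 0x8000u);
    uint16_t a = 1;
    while (a < n)
        a = (uint16_t)(a << 1);
    return a;
}

/* True if an n-byte object starting at addr spans two pages. */
bool object_crosses_page(uint16_t addr, uint16_t n) {
    assert(n >= 1);
    return (addr >> 8) != ((uint16_t)(addr + n - 1) >> 8);
}
```

A 200-byte object would be aligned to 256 bytes under this rule, which is the memory cost the comment above worries about.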
Don't worry too much about the linker scripts. I can only think of a few special cases where you put RAM and hardware devices in the same 6502 page. Generally, most 6502 devices tend to put I/O hardware into high, page-aligned memory, probably because it's easier from a hardware perspective to patch said hardware onto the address bus there. This is also probably why this problem tends to be relatively rare in practice... your index base will usually start on a page boundary, and will rarely exceed a page in size.
As an example of a practical memory address on a 6502 target that auto-increments upon read, consider the PPUDATA register from https://wiki.nesdev.com/w/index.php/PPU_registers
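To see why a stray read of such a register matters, here is a deliberately simplified model. `PpuModel` and `ppudata_read` are hypothetical; the real register also buffers reads and can step by 32, which this sketch ignores. Each read simply reports which VRAM address it served and advances by one.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an auto-incrementing data port. */
typedef struct {
    uint16_t vram_addr;
} PpuModel;

/* Returns the VRAM address served by this read, then advances the
   internal address by one (wrapping within a 14-bit space). */
uint16_t ppudata_read(PpuModel *p) {
    assert(p != 0);
    uint16_t served = p->vram_addr;
    p->vram_addr = (uint16_t)((served + 1) & 0x3FFFu);
    return served;
}
```

If a spurious read slips in between two intended reads, the second intended read is served from an address one further along than the program expects, and the skipped byte is silently lost.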
from llvm-mos.
I've forced the index to zero for volatile loads and stores; this should prevent page crossing from occurring for any such accesses. We don't have any RMW operations at the moment, so this issue is resolved, at least for now. I'll just have to make sure to consider this when I finally get around to adding RMW logic; this shouldn't be too difficult, because this bug has indelibly stained RMW operations in my mind.
from llvm-mos.