Comments (26)

jackoalan commented on July 2, 2024

I would prefer to have platform selection made part of the clang driver as much as possible. Not everybody wants to be locked into a particular build system (cmake toolchain modules, argument generation utilities for makefiles, etc).

The OS component of the target triple could be expanded to the various supported platforms. I think the MSP430 toolchain is a good model to follow for deploying linker scripts as ${CLANG_SYSROOT}/include/<platform-name>.ld. Likewise, ${CLANG_SYSROOT}/lib would be the place for deploying default platform libraries.

Having the platform as part of the target triple also means that implications regarding zero page usage would be known to codegen; imaginary register conflicts could be avoided.

from llvm-mos.

johnwbyrd commented on July 2, 2024

I've deleted the llvm-mos-sdk repository in the project. Knock yourselves out figuring out a replacement.

mysterymath commented on July 2, 2024

Ah, I hadn't seen that MSP430 was a microcontroller target; that's very relevant, since it's the closest thing there'd be to a baremetal-ish-but-with-some-support microcomputer environment like most real 6502 targets.

One thing that John and I were discussing is that there are really a lot of variants of each target: Atari XEX is slightly different than Atari Cartridge, which is slightly different than Atari Boot Disks, etc. In particular, the number of zero page registers available will differ wildly, as you've mentioned, and that has code generation implications.

I'm hoping to avoid building and shipping a huge number of library variants through a slightly dirty trick: shipping the core C libraries (the non-platform-specific stuff) as -flto .ll files bundled into an .a archive. This would allow the target description to specify the number of imaginary registers available, and the C library would be generated to only use that number. This will also allow extremely aggressive interprocedural optimization of the C library, which is highly desirable.

So there's probably going to be a lot of overlap between the actual contents of each target. But Clang's driver is flexible enough that I can probably just have it defer to a common directory, or establish a sort of hierarchy of overrides. I'm putting together a rough draft of a candidate SDK that can work with Clang's driver; hopefully I'll have end-to-end C->loadable binary hello world working on Atari XEX and C64 PRG in O(week).

jackoalan commented on July 2, 2024

If the goal is to let the user use a wide variety of build systems with pre-established filesystem hierarchy conventions, I think MinGW's deployment structure is a good model.

This is essentially like any other gcc-style cross-compiler. The native parts of the cross compiler simply become part of the user's install prefix; tools are prefixed by triple to disambiguate them from existing clang tools. Library code for foreign architectures is isolated in sysroots named by target triple.

There will inevitably be file duplication among the sysroots in this deployment strategy, but it is safest to assume every artifact will differ between target platforms (even headers can theoretically be generated in a target-dependent manner).

- <prefix>
    - bin
        - mos-clang-<version> (this is the actual binary, clang can parse the target out of arg0 with this convention)
        - mos-clang -> mos-clang-<version>
        - mos-clang++ -> mos-clang
        - mos-6502-appleii-clang -> mos-clang
        - mos-6502-appleii-clang++ -> mos-clang++
        - mos-6502-c64-clang -> mos-clang
        - mos-6502-c64-clang++ -> mos-clang++
        - mos-lld (lld uses the first matching "flavor" from all hyphenated tokens)
        - mos-ld.lld -> mos-lld
    - lib
        - mos-clang (this is clang's "resource" directory and is fairly easy to change)
            - <version>
                - include
                    - <compiler-maintained headers>
                - lib
                    - appleii
                        - <compiler-maintained libraries (compiler_rt if we need it, etc...)>
                    - c64
                    - vic20
    - mos-6502-appleii
        - bin
            - <user-built mos programs could install here>
        - include
            - <C/C++ library headers>
            - <user-built library headers could install here>
            - <distribution could also include platform-specific sdk headers>
        - lib
            - <C/C++ library with `_start` imps>
            - <linker script>
            - <user-built static libraries could install here>
            - <distribution could also include platform-specific sdk libraries>
    - mos-6502-c64
    - mos-6502-vic20
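
The arg0 convention noted in the tree above (a target prefix followed by the driver name) can be illustrated with a small sketch. This is only a model of the naming scheme, not Clang's actual parsing code; the function name and the list of recognized driver suffixes are assumptions made for the example, and versioned names such as mos-clang-<version> are deliberately not handled.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Driver names this sketch recognizes as the trailing component. This list is
# an assumption for illustration; a real driver knows more modes than these.
DRIVER_SUFFIXES = ("clang++", "clang")


def split_program_name(arg0: str) -> Tuple[Optional[str], str]:
    """Split e.g. 'mos-6502-c64-clang++' into ('mos-6502-c64', 'clang++').

    Returns (None, driver) when no prefix precedes the driver name.
    """
    name = PurePath(arg0).name  # drop any leading directories
    for suffix in DRIVER_SUFFIXES:
        if name == suffix:
            return None, suffix
        if name.endswith("-" + suffix):
            # Strip the suffix and the hyphen that joins it to the prefix.
            return name[: -len(suffix) - 1], suffix
    raise ValueError(f"not a recognized driver name: {name!r}")


if __name__ == "__main__":
    for example in (
        "/opt/mos/bin/mos-6502-c64-clang++",
        "mos-6502-appleii-clang",
        "clang",
    ):
        print(example, "->", split_program_name(example))
```

Whether the extracted prefix names a valid target is left to the caller; with this layout the symlink name alone is enough to select a platform without extra flags.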

I would be inclined to make a new top-level project in the LLVM monorepo dedicated to the runtime aspects of all mos platforms (mos-rt?). Presumably this will contain init assembly sources for platforms that need special provisions to get the zero page and other aspects of the platform usable before hitting main(). This is also where linker scripts could be maintained.

- clang
    - cmake
        - caches
            - MOS.cmake (build clang targeting host)
            - MOS-stage2.cmake (build clang targeting mos)
            - MOS-stage3.cmake (build platform matrix of mos-rt libraries)
- mos-rt
    - include
        - mos-rt
            - <hypothetical platform-independent public API headers>
    - lib
        - <hypothetical platform-independent library sources>
    - Platforms
        - appleii
            - appleii.ld
            - appleii_init.s
        - c64
            - c64.ld
            - c64_init.s
        - vic20
            - vic20.ld
            - vic20_init.s

A lot of this also hinges on what we want to do for libc. If using an off-the-shelf libc proves to be too cumbersome, an acceptable libc subset could be maintained here. stdio worries me a great deal. I almost wonder if streaming file conventions aren't worth pursuing at all on these platforms. malloc is less worrisome, but the scalability of any malloc implementation should be carefully evaluated for these constrained systems.

johnwbyrd commented on July 2, 2024

The ultimate intention with this project is to upstream it into llvm proper. Therefore, although it might be convenient to put some MOS-specific things into the monorepo, we're trying to localize everything that is not LLVM proper into a related and dependent project. As for libc, picolibc is the only current libc that has put some thought into the questions you bring up.

mysterymath commented on July 2, 2024

I'll add that we'd probably want to break from the pure-GCC model somewhat, since LLVM has already done so to a degree. Its clang driver binary is a sort of uber-cross-compiler, and it includes the full set of paths and includes for every target that LLVM supports. So we'd probably want to follow suit and have a clang-mos binary that was able to run --target=mos-c64, --target=mos-atari8, etc.

Accordingly, there doesn't seem much risk in having a very stripped-down SDK organization until the compiler is further along. We can teach clang's driver whatever directory structure or conventions we like, and it's easy to change later. Once we get closer to the first major release (a C99-compatible freestanding compiler), we'd want to start locking this down so any code that gets developed against the SDK doesn't break. If we need another major release to move to a hosted implementation, so be it, but we'd want a branch with a somewhat-working libc so we can try to get there without breaking changes.

jackoalan commented on July 2, 2024

Has llvm-libc been investigated as a possible libc? The cmake arrangement seems to be very flexible for retargeting purposes; trivial to omit entry points that would not function well. But I am not certain how the pre-main initialization is supposed to work.

EDIT: I see now, there are platform dependent loaders
https://github.com/llvm-mos/llvm-mos/blob/main/libc/loader/linux/x86_64/start.cpp

It sounds like what we really need here is the equivalent of libgloss. Something that would be closely associated with a selected libc that contains platform-specific linker-scripts, init, I/O support, etc...

johnwbyrd commented on July 2, 2024

Although I would not dismiss llvm-libc out of hand, a superficial scan of it suggests that it is being designed for a much larger target machine, in which memory is cheap. llvm-libc is also new, and that's not necessarily a good thing, as libc's go.

However, I would not dismiss an experiment to see whether llvm-libc might work with this compiler. Personally however, I would not count on llvm-libc until I had seen some positive results. In particular, I would expect that llvm-libc probably assumes that sizeof(void *) > 2 and sizeof(int) > 2, which are kind of non-starters here.

I tend toward picolibc because of its newlib heritage, which has a couple decades of compatibility testing behind it.

Recall however that having multiple compatible libc's is a healthy thing, not a bad thing.

johnwbyrd commented on July 2, 2024

So we'd probably want to follow suit and have a clang-mos binary that was able to run --target=mos-c64, --target=mos-atari8, etc.

Actually that would be --target=mos-c64-prg, --target=mos-c64-dsk, --target=mos-c64-cart-pal, and another couple dozen permutations. You could put support for all of those into the driver.

mysterymath commented on July 2, 2024

jackoalan commented on July 2, 2024

Oh yeah, bitcode libraries are a good idea. The biggest compatibility issue is probably reserved zero page regions. Bitcode libs would completely address that problem once the codegen is able to block out imaginary registers.

mysterymath commented on July 2, 2024

johnwbyrd commented on July 2, 2024

Your current design places all includes and linker scripts in a sysroot per target. However, the majority of these files will be shared among multiple platforms. For example, all Commodore platforms will share _chrout and linker scripts for prg files, whether on C64, C128, PET, or VIC-20. However, they will not share memory layouts. Commodore shares an extremely similar BASIC header with Apple, but it doesn't share that similarity with NES.

As more targets are added, it will become more and more difficult to maintain that hierarchy of sysroots, especially since the changes between platforms and formats are often trivial. This breaks the DRY principle.

If you are still married to the notion of sysroot after going through this exercise, you should consider an SDK build process where individual sysroots are generated per platform, from a set of common files.

This would permit CMake to be in charge of getting all the details correct about each sysroot, while still having a single source of truth for each file in said sysroot.

Also, the baked sysroots could be distributed, independently of the environment that builds them, either singly or as a whole.

It also provides a straightforward method for continuously building and distributing bitcode libraries per sysroot, which the current design does not anticipate.

This would be backwards compatible with your design -- you'd still have your same sysroot layout and your clang command line would be the same -- but it would be significantly easier to scale to many logical variations of targets.
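
A minimal sketch of that baking step, assuming the shared files are organized as ordered "layers" (common, then vendor family, then platform) that are copied on top of one another. The layer names, file names, and the bake_sysroot helper are hypothetical and exist only for illustration; a real SDK build would more likely do this in CMake.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def bake_sysroot(layers: Iterable[Path], dest: Path) -> None:
    """Copy each layer directory into dest in order.

    Later layers override files from earlier ones, so each file keeps a
    single source of truth while every sysroot is still self-contained.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for layer in layers:
        shutil.copytree(layer, dest, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical source layout; every file name here is invented.
        sources = {
            "common/include/stdlib.h": "/* shared by every platform */\n",
            "commodore/lib/prg.ld": "/* shared by Commodore PRG targets */\n",
            "c64/lib/memory.ld": "/* C64 memory layout */\n",
            "vic20/lib/memory.ld": "/* VIC-20 memory layout */\n",
        }
        for rel, text in sources.items():
            path = root / "src" / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)

        for platform in ("c64", "vic20"):
            layers = [root / "src" / name for name in ("common", "commodore", platform)]
            bake_sysroot(layers, root / "sysroots" / f"mos-6502-{platform}")

        # List what ended up in each baked sysroot.
        for path in sorted((root / "sysroots").rglob("*")):
            if path.is_file():
                print(path.relative_to(root).as_posix())
```

Each baked sysroot can then be archived and shipped on its own, while edits are only ever made in the layered source tree.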

johnwbyrd commented on July 2, 2024

ZP.ini? Really? This was one of the reasons why the SDK should probably be driving clang, not the other way around. Each platform will have its own opinion on the correct number of zp registers, which will vary when you're building to multiple targets.

I suggest that --num-imag-regs should default to 32. Obviously some platforms can support more than that, but that can't be decided by the SDK in the current architecture.

mysterymath commented on July 2, 2024

johnwbyrd commented on July 2, 2024

If you're willing to accept --config as a parameter in place of --sysroot, then as a consequence of implementing --config support:

  1. The layout of the SDK's directories may then be arbitrary, e.g., more sensibly laid out in terms of common vs platform specific features;
  2. --num-imag-regs may be chosen per target;
  3. The clang configuration files might be partially or completely generated by CMake, as CMake is good at collecting command line parameters;
  4. clang's MOS driver may remain relatively simple and easier to maintain;
  5. and, most importantly, use of clang becomes "simple," per your original design criteria -- you only need to add --config platform-clang-config-file, to get clang to target your platform of choice. You don't even need to add --target anymore.

I'm willing to prototype this as a proof of concept, as you have done here, but I do not require that I should do it.

mysterymath commented on July 2, 2024

johnwbyrd commented on July 2, 2024

Your absolute vs. relative path problem may be solved in just a few lines of CMake, by realizing that CMake itself needs to resolve all relative paths to absolute paths in order to create a command line.

Instead of running clang with a laundry list of paths, CMake would essentially dump that list out to a clang config file.

johnwbyrd commented on July 2, 2024

Further, the user would not necessarily have to have CMake, if we distributed those config files as part of the binary distribution of an sdk. (I think?) In any case, it is reasonable to permit end users to avoid CMake altogether.

mysterymath commented on July 2, 2024

Ah, I'm not too worried about the paths in the config files themselves. As you say, we can fill them in with CMake's configure_file, no problem. It's the specification of the config file on the command line itself that's weird.

If you were to say something like, "clang --config=commodore/64.cfg", it interprets "commodore/64.cfg" as a literal path relative to wherever you ran clang from, e.g., some random directory on the user's machine. If the cfg file wasn't at that path, it would bail.

Clang does do something different if you say "clang --config=commodore_64.cfg" though; it'll try to find that file in the user and system directories. However, those directories are actually specified in CMake when the compiler is built; they're hardcoded in from that point on. So I guess we could maybe get away with that: having the user required to specify the SDK install directory as a -D flag whenever they're building LLVM-MOS, at least if they wanted the compiler driver to work.

I guess if we're shipping binaries to most users (almost certainly), then this is no problem at all really. We'd just build LLVM-MOS that way and call it a day.

mysterymath commented on July 2, 2024

I've removed the path-generation logic from the LLVM-MOS compiler driver and added configuration files to the SDK. It works quite a bit better than I'd expected; I was already using CMake configure_file to copy files over to the output directory, so it was just a matter of adding ${CMAKE_CURRENT_BINARY_DIR} to the include paths to get CMake to generate them.

This also allows, as John mentioned, --target=mos to be folded into the SDK, along with other good defaults that there's no way to get the Clang toolchain code to emit: -Os, -flto, and whatever else we'd want by default. What's more, it's fairly obvious what those defaults are (just by looking at the file you provide on the command line to Clang), how to change them, and how to make your own presets that "inherit" from one of the existing configurations (since configuration files can include one another). There's docs for all this inside the Clang project as well; we can just point folks to that for the finer points of the semantics, vs trying to explain the exact custom logic that would've ended up in the compiler driver.
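
To make the "inherit" idea concrete, here is a rough sketch of how a preset that includes a base configuration flattens into one argument list. It relies on Clang's documented @file include directive, which is resolved relative to the including file, but the parser itself is a simplified model (whitespace-separated tokens, comment lines starting with #, no quoting, no cycle detection), and the file names and the -DDEBUG_BUILD=1 flag are invented for the example.

```python
import tempfile
from pathlib import Path
from typing import List


def expand_config(path: Path) -> List[str]:
    """Flatten a config file into a list of arguments, following @file includes.

    Simplified model only: this is not Clang's parser.
    """
    args: List[str] = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for token in line.split():
            if token.startswith("@"):
                # Includes are resolved relative to the including file.
                args.extend(expand_config(path.parent / token[1:]))
            else:
                args.append(token)
    return args


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sdk = Path(tmp)
        # Base preset carrying the defaults mentioned above.
        (sdk / "mos-common.cfg").write_text("--target=mos\n-Os\n-flto\n")
        # A user preset that "inherits" from the base and adds one flag.
        (sdk / "my-preset.cfg").write_text(
            "# hypothetical user preset\n@mos-common.cfg\n-DDEBUG_BUILD=1\n"
        )
        print(expand_config(sdk / "my-preset.cfg"))
```

The point of the sketch is that the defaults stay visible and editable in plain files rather than being buried in driver logic.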

I'm quite satisfied with this, so I'll release my interest in this issue. Are there any other outstanding concerns, or can we close this one out?

jackoalan commented on July 2, 2024

Config files look good to me. Even better, it looks like clang can be built with -DCLANG_CONFIG_FILE_SYSTEM_DIR=... for distro packages to reference the configs with an easy path like --config=commodore/64.cfg.

mysterymath commented on July 2, 2024

Well no, the implementation of the config feature is really weird, if I'm interpreting the docs/implementation-code correctly. If there's a directory separator anywhere in the string, it treats the whole thing as a path relative to the current dir and ignores CLANG_CONFIG_FILE_SYSTEM_DIR. So you'd have to say either --config commodore_64.cfg or --config $LLVM_MOS_SDK/commodore/64.cfg where the installer makes an env var for you or something.

This seems really weird to me, and we may want to change that code to establish the = convention used elsewhere in gcc: have --config "=commodore/64.cfg" be CLANG_CONFIG_FILE_SYSTEM_DIR relative (instead of sysroot relative, as it's used elsewhere). Or something, iunno.
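
For reference, the lookup rule as I've described it can be modelled in a few lines. This is a sketch of the behavior described in this thread, not Clang's implementation; resolve_config and its parameters are hypothetical names, and search_dirs stands in for the user and system config directories baked in at build time.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def resolve_config(arg: str, cwd: Path, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Model of the --config lookup rule as described in this thread."""
    has_separator = "/" in arg or os.sep in arg
    if has_separator:
        # Treated as a literal path relative to the current directory;
        # the search directories are never consulted.
        candidate = cwd / arg
        return candidate if candidate.is_file() else None
    # A bare file name is looked up in each search directory in turn.
    for directory in search_dirs:
        candidate = directory / arg
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sdk, work = root / "sdk", root / "work"
        (sdk / "commodore").mkdir(parents=True)
        work.mkdir()
        (sdk / "commodore_64.cfg").write_text("--target=mos\n")
        (sdk / "commodore" / "64.cfg").write_text("--target=mos\n")

        for arg in ("commodore_64.cfg", "commodore/64.cfg"):
            found = resolve_config(arg, cwd=work, search_dirs=[sdk])
            print(arg, "->", "found" if found else "not found")
```

Under this model the flat name is found through the search directory, while the nested name fails unless the user happens to run from the SDK directory, which is the asymmetry I find odd.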

jackoalan commented on July 2, 2024

That's strange, you'd think it would work like any other search path lookup. Ideally a generic mechanism.

johnwbyrd commented on July 2, 2024

If I understand this new design, it seems that you have come full circle to making the SDK dependent on CMake again. This was, I believe, the source of your objection to the previous sdk layout. It's unfortunate that you had to build the sdk yourself to understand this, but I'll take the final result as progress.

I have no objection to packaging command line parameters in configuration files -- it does make the command line cleaner and it removes a lot of pain from the user -- but that implies that some process must run on the user's machine before those config files are available.

I am not recommending the following, but it is possible: CMake is only needed as a final step on the user's machine, to set LLVM_MOS_SDK_ROOT to some meaningful value across the configuration files. (By the way, we should standardize on that environment variable.) This could be done in a few lines in a batch script or a shell script, depending on the user's preference. Personally, I think CMake is a better way to go, but I acknowledge that some people are allergic to it, and it is possible to design around it, if it is a design requirement to avoid it.
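
As a rough idea of how small that final step could be, here is a sketch that fills an SDK root placeholder into shipped config templates. The @LLVM_MOS_SDK_ROOT@ placeholder spelling, the .cfg.in naming, and the directory names inside the template are all assumptions made for the example; the same thing could be written as a batch or shell script.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed placeholder, written in the style of CMake's configure_file.
PLACEHOLDER = "@LLVM_MOS_SDK_ROOT@"


def instantiate_configs(template_dir: Path, sdk_root: Path) -> List[Path]:
    """Write a .cfg next to every .cfg.in template with the placeholder filled in."""
    root_text = sdk_root.resolve().as_posix()  # absolute path, forward slashes
    written: List[Path] = []
    for template in sorted(template_dir.rglob("*.cfg.in")):
        text = template.read_text().replace(PLACEHOLDER, root_text)
        output = template.with_suffix("")  # "c64.cfg.in" -> "c64.cfg"
        output.write_text(text)
        written.append(output)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sdk = Path(tmp)
        # Hypothetical template; the include and library directories are invented.
        (sdk / "c64.cfg.in").write_text(
            "--target=mos\n"
            "-Os\n"
            "-flto\n"
            f"-isystem {PLACEHOLDER}/common/include\n"
            f"-L{PLACEHOLDER}/c64/lib\n"
        )
        for cfg in instantiate_configs(sdk, sdk):
            print(cfg.name)
            print(cfg.read_text())
```

An installer could run such a step once after unpacking, so end users never invoke CMake themselves.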

The proof of concept is complete, and although I'm closing this bug, I'd like to reopen these topics as part of a high-level discussion of SDK features, in #21.

mysterymath commented on July 2, 2024

I'm still holding out hope that there's a way for someone to use the compiler and SDK without installing CMake. Definitely not to compile them without CMake, but for the actual platform distributables, this may be part of the reason software tends to get installed into very fixed paths. If we had full installers, then we could hard code the SDK directory right into the LLVM-MOS compiler binaries.

It doesn't look like making DEB, RPM, and Chocolatey packages would actually be that bad; CMake has package generators for all three. To put on my cynical hat, providing a rock-solid download-to-run user journey is one of the things that makes a project look mature and reliable (even if it's not), and I think we'll need all of that we can get if we're going against a tool that already has it (cc65).
