the next steps would be:
- create an emitter (in Common), which is basically like an assembler
- create a basic jit with ONLY branching (see *CompBranch.cpp)
- handle anything needed for CPUDetect
- check if anything needs changing in Reporting
that much is enough for the "jit" to be usable, and getting that much working is a good first step to confirm the emitter works, etc. (it also helps determine whether you need WX exclusive, etc.)
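to make "basically like an assembler" concrete, here's a rough sketch of what a tiny emitter could look like. it only encodes three RV instructions (ADD, ADDI, BEQ) into a buffer. the class name, the `Reg` enum, and the method names are all made up for illustration and are not the actual Common API; a real one would write into executable memory instead of a `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register names for illustration (x-register indices).
enum Reg : uint32_t { X0 = 0, X5 = 5, X6 = 6, X10 = 10, X11 = 11 };

// Hypothetical minimal emitter: each call appends one encoded 32-bit
// RISC-V instruction. Not PPSSPP's real emitter interface.
class RiscVEmitterSketch {
public:
	// R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode
	void ADD(Reg rd, Reg rs1, Reg rs2) {
		Write32((0u << 25) | (rs2 << 20) | (rs1 << 15) | (0u << 12) | (rd << 7) | 0x33u);
	}

	// I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode
	void ADDI(Reg rd, Reg rs1, int32_t imm) {
		assert(imm >= -2048 && imm <= 2047);  // 12-bit signed immediate
		Write32((((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15) | (0u << 12) | (rd << 7) | 0x13u);
	}

	// B-type layout: imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode
	void BEQ(Reg rs1, Reg rs2, int32_t offset) {
		assert((offset & 1) == 0 && offset >= -4096 && offset <= 4094);
		uint32_t imm = (uint32_t)offset;
		Write32((((imm >> 12) & 1u) << 31) | (((imm >> 5) & 0x3Fu) << 25) |
		        (rs2 << 20) | (rs1 << 15) | (0u << 12) |
		        (((imm >> 1) & 0xFu) << 8) | (((imm >> 11) & 1u) << 7) | 0x63u);
	}

	const std::vector<uint32_t> &Code() const { return code_; }

private:
	void Write32(uint32_t v) { code_.push_back(v); }
	std::vector<uint32_t> code_;
};
```

with something like this, `ADDI(X10, X10, 1)` should produce the same word an assembler gives for `addi a0, a0, 1`, which makes it easy to check the emitter against a disassembler before any jit code depends on it.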
after that, you'd want to firm up the register cache (I suggest basing off ARM64, since you'll have enough regs to fix-allocate some), and make the branching use that
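as a sketch of the fix-allocate idea: a few hot MIPS regs get a permanent home in RISC-V callee-saved regs (s2-s11 are x18-x27), and everything else goes through the normal cache. the function name and the specific choice of which MIPS regs to pin are assumptions for illustration, not what the ARM64 jit actually does.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical static mapping from a MIPS GPR index to a RISC-V x-register
// index. Callee-saved registers are used so the values survive calls into
// C++ helper functions. Returns nullopt when the MIPS register is not
// pinned and must be handled by the dynamic part of the register cache.
inline std::optional<int> StaticRiscVRegFor(int mipsReg) {
	switch (mipsReg) {
	case 0:  return 0;   // MIPS $zero -> x0 (hardwired zero on both sides)
	case 2:  return 18;  // $v0 -> s2
	case 3:  return 19;  // $v1 -> s3
	case 4:  return 20;  // $a0 -> s4
	case 5:  return 21;  // $a1 -> s5
	case 29: return 22;  // $sp -> s6
	case 31: return 23;  // $ra -> s7
	default: return std::nullopt;  // not pinned: spill/load as needed
	}
}
```

the upside of pinning is that those regs never need flushing around branches, which is why it helps to make the branching code use the cache early.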
then implement the alu ops in the jit (especially the basic ones: add/sub, etc.), test, then implement basic load/store ops (lw/sw to start)
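for picking off those first ops, a sketch like the following might help. it decodes the standard MIPS fields and classifies just addu/addiu/lw/sw, with everything else marked unsupported (so it can fall back to the interpreter). the struct, enum, and function names are hypothetical. one thing it shows: MIPS immediates are 16-bit signed but RISC-V I-type/S-type immediates are only 12-bit signed, so some addiu/lw/sw offsets need an extra instruction.

```cpp
#include <cstdint>

// Standard MIPS instruction fields; struct and names are illustrative only.
struct MipsFields {
	uint32_t op;     // bits 31..26
	uint32_t rs;     // bits 25..21
	uint32_t rt;     // bits 20..16
	uint32_t rd;     // bits 15..11
	uint32_t funct;  // bits 5..0
	int32_t imm;     // bits 15..0, sign-extended
};

inline MipsFields DecodeMips(uint32_t word) {
	MipsFields f;
	f.op = word >> 26;
	f.rs = (word >> 21) & 0x1F;
	f.rt = (word >> 16) & 0x1F;
	f.rd = (word >> 11) & 0x1F;
	f.funct = word & 0x3F;
	f.imm = (int16_t)(word & 0xFFFF);
	return f;
}

// The handful of ops worth compiling first; the rest stay unsupported.
enum class FirstOp { ADDU, ADDIU, LW, SW, Unsupported };

inline FirstOp Classify(const MipsFields &f) {
	if (f.op == 0x00 && f.funct == 0x21) return FirstOp::ADDU;  // SPECIAL / addu
	if (f.op == 0x09) return FirstOp::ADDIU;
	if (f.op == 0x23) return FirstOp::LW;
	if (f.op == 0x2B) return FirstOp::SW;
	return FirstOp::Unsupported;
}

// True when a MIPS 16-bit immediate can be used directly as a RISC-V
// 12-bit signed immediate; otherwise it has to be built in a temp register.
inline bool FitsRiscVImm12(int32_t imm) {
	return imm >= -2048 && imm <= 2047;
}
```

on a 64-bit target you'd probably emit `addw` rather than `add` for addu, so results stay sign-extended 32-bit values, but that depends on how the register cache represents guest regs.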
that much should already give you a huge gain in performance with the jit compared to the plain interpreter you had before, and it's probably already better than the ir interpreter
I will say, the plan is sorta to migrate to using IR and a backend (i.e. risc-v) based on that, but it's a lot of work that no one has completed yet. If you want to do something even more exciting, that'd be the way to go - but there's no (working) reference
anyway, after that you'd probably want to do FPU, then some key VFPU ops (including calling out to sin/cos through the native ABI)
I will warn you: some people have gone to some of this trouble for PPC or MIPS, and then been disappointed because the device's GPU can't handle the heat
this will help CPU emulation, but I have no idea what GPUs RISC-V processors are typically paired with