fex-emu / fex
A fast usermode x86 and x86-64 emulator for Arm64 Linux
Home Page: https://fex-emu.com
License: MIT License
Nearly all of the ELF processing uses section headers, which are convenient to work with because they carry much more information and split the file up more finely.
The problem is that section headers can be stripped. While this is rare, it is perfectly valid, and only program headers are needed to build the process image.
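A minimal sketch of walking only the program header table, assuming a 64-bit ELF already read into memory. `LoadSegment` and `CollectLoadSegments` are hypothetical names for illustration, not FEX's loader code; the structures come from the Linux `<elf.h>` header.

```cpp
#include <elf.h>     // Elf64_Ehdr, Elf64_Phdr, PT_LOAD (Linux system header)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record of one loadable segment.
struct LoadSegment {
  uint64_t VAddr;
  uint64_t FileOffset;
  uint64_t FileSize;
  uint64_t MemSize;
};

// Collects PT_LOAD segments using program headers only, so it keeps working
// when the section header table has been stripped (e_shnum == 0).
std::vector<LoadSegment> CollectLoadSegments(const uint8_t *Data, size_t Size) {
  std::vector<LoadSegment> Segments;
  Elf64_Ehdr Header;
  if (Size < sizeof(Header)) return Segments;
  std::memcpy(&Header, Data, sizeof(Header));

  // Reject anything that is not a 64-bit ELF with the expected entry size.
  if (std::memcmp(Header.e_ident, ELFMAG, SELFMAG) != 0) return Segments;
  if (Header.e_ident[EI_CLASS] != ELFCLASS64) return Segments;
  if (Header.e_phentsize != sizeof(Elf64_Phdr)) return Segments;

  for (size_t i = 0; i < Header.e_phnum; ++i) {
    const uint64_t Offset = Header.e_phoff + i * sizeof(Elf64_Phdr);
    // Stop if the table runs past the end of the buffer.
    if (Offset > Size || Size - Offset < sizeof(Elf64_Phdr)) break;

    Elf64_Phdr Phdr;
    std::memcpy(&Phdr, Data + Offset, sizeof(Phdr));
    if (Phdr.p_type == PT_LOAD) {
      Segments.push_back({Phdr.p_vaddr, Phdr.p_offset, Phdr.p_filesz, Phdr.p_memsz});
    }
  }
  return Segments;
}
```

Nothing here reads `e_shoff` or `e_shnum`, which is the property a stripped binary needs.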
Currently most vector ops declare additional RegisterSize and ElementSize elements in their ops.
These are already declared in the header of each op. Pass the arguments through to the header rather than having them in each IR op specifically.
https://github.com/FEX-Emu/FEX/blob/master/External/FEXCore/Source/Interface/Core/Core.cpp#L475
This function is called from multiple guest threads, which is fine except that the FrontendDecoder object is shared and not thread safe.
The best way to work around this issue would be to give each thread object its own FrontendDecoder object.
This way multiple threads can compile code as they please.
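A rough sketch of the per-thread ownership idea. `Decoder`, `ThreadState` and `CompileOnThreads` are hypothetical stand-ins rather than the real FEXCore types; the only point is that each thread mutates a decoder it owns, so no lock is needed around decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the frontend decoder. It keeps mutable scratch
// state while decoding, which is why sharing one instance would be a data race.
struct Decoder {
  uint64_t LastPC = 0;
  uint64_t BlocksDecoded = 0;
  void DecodeBlock(uint64_t PC) {
    LastPC = PC;
    ++BlocksDecoded;
  }
};

// Hypothetical per-thread state: the decoder is owned by exactly one thread.
struct ThreadState {
  Decoder FrontendDecoder;
};

// Runs ThreadCount workers that each decode BlocksPerThread blocks using only
// their own decoder, then returns the per-thread block counts.
std::vector<uint64_t> CompileOnThreads(size_t ThreadCount, uint64_t BlocksPerThread) {
  std::vector<ThreadState> States(ThreadCount);
  std::vector<std::thread> Workers;
  for (size_t i = 0; i < ThreadCount; ++i) {
    Workers.emplace_back([&State = States[i], i, BlocksPerThread] {
      for (uint64_t Block = 0; Block < BlocksPerThread; ++Block) {
        State.FrontendDecoder.DecodeBlock(i * 0x1000 + Block);
      }
    });
  }
  for (auto &Worker : Workers) Worker.join();

  std::vector<uint64_t> Counts;
  for (const auto &State : States) Counts.push_back(State.FrontendDecoder.BlocksDecoded);
  return Counts;
}
```

With a single shared `Decoder` the increments above would race; with one per thread every count comes out equal to `BlocksPerThread`.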
We can do things like <GPRPair> = CASP <GPRPair>, <GPRPair>, <GPR>
directly in the IR definition file rather than having that information encoded with a bunch of arguments.
This would take a decent amount of work to get correct, so it is a longer term goal.
The file is ~5100 LoC and should be able to be split up pretty cleanly around tables.
It is a bit unwieldy to navigate at the current size.
Initial Metadata:
Architecture specific select of metadata
There is a scoped mutex at the top of the syscall handling function.
This is incorrect and causes the application to stall once threading kicks in.
It should be removed, with a mutex only on the things that actually need one.
This can be done alongside the syscall cleanup.
I think skmp wanted to have a go at this?
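One way the narrower locking could look, as a sketch only. `FDTable`, `Syscall` and `HandleSyscall` are made-up names for illustration; the assumption is that the only shared state in this toy handler is a file descriptor table, so the mutex lives inside it instead of at the top of the handler.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared structure: the lock is scoped to the table itself.
class FDTable {
public:
  int Insert(std::string Path) {
    std::lock_guard<std::mutex> Lock(Mutex);
    const int FD = NextFD++;
    Paths.emplace(FD, std::move(Path));
    return FD;
  }

  bool Contains(int FD) {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Paths.count(FD) != 0;
  }

private:
  std::mutex Mutex;
  std::map<int, std::string> Paths;
  int NextFD = 3; // 0-2 are assumed to be stdin/stdout/stderr
};

// Hypothetical syscall numbers for the sketch.
enum class Syscall { Open, Nop };

// Deliberately no scoped lock at the top: a thread blocked in one syscall
// does not stall every other guest thread entering the handler.
int64_t HandleSyscall(FDTable &Table, Syscall Number, const std::string &Arg) {
  switch (Number) {
  case Syscall::Open:
    return Table.Insert(Arg); // takes the lock only while touching the table
  case Syscall::Nop:
    return 0;                 // touches no shared state, so takes no lock
  }
  return -1;
}
```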
We need unit tests for these syscalls to make sure we don't break them.
Both the IR ops and the x86 functions for CVT are confusing and prone to bugs.
Clean them up and make it more apparent what they are supposed to do.
We need better filesystem emulation for x86-specific data that will appear in:
/proc/cpuinfo
- Support generating this on the fly based on emulated CPUID
/proc/self
- Fairly large folder of various information
/sys/devices/system/cpu/online
- Currently we state this as only 1 core. We should allow this to be configurable and default to host core count
We need RA constraints in order to remove extraneous moves inside of IR Ops.
A good example is the AArch64 casp* instruction, which needs the constraint that dest = expected to remove four moves per op.
Many other ops would also benefit from RA constraints; they can be found by looking for moves in the AArch64 JIT.
Currently FEX doesn't maintain that gettid == getpid.
This is noticed by some applications and confuses them.
We should ensure that the guest application's primary thread is FEX's primary thread.
The frontend should spin off its own worker threads instead.
Alternatively we can return the primary guest thread's tid for the getpid syscall, but I think that may cause issues in some cases?
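#define _GNU_SOURCE /* gettid() is a GNU extension (glibc 2.30+) */
#include <stdio.h>
#include <unistd.h>
int main() {
  /* On the primary thread of a process the tid and pid are expected to match. */
  pid_t pid = getpid();
  pid_t tid = gettid();
  if (pid != tid) {
    printf("FAIL! Something is mucking with us! pid != tid ; %d != %d\n", pid, tid);
  }
  else {
    printf("SUCCESS! TID == PID\n");
  }
  return 0;
}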
Real environment
ryanh@Ryan-TR2:~$ ./a.out
SUCCESS! TID == PID
FEX
ryanh@Ryan-TR2:~/work/FEXNew/Build$ ./Bin/ELFLoader -U -c irjit -n 500 -- ~/a.out
[DEBUG] We installed 2314 instructions to the tables
[DEBUG] Precompiling: 0 blocks...
[DEBUG] Done
FAIL! Something is mucking with us! pid != tid ; 30155 != 30157
[DEBUG] Reason we left VM: 3
[DEBUG] Used 1455012 bytes for compiling
[DEBUG] Managed to load? No
Right now ELFLoader is not pulling in the dynamic linker from the ELF.
This requires us to launch apps through the linker directly, which clang absolutely hates.
Pass it through ELFLoader so we no longer need to do this.
This saves us headaches and will be needed when we hook through binutils.
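For reference, the dynamic linker path lives in the ELF's PT_INTERP program header. A small sketch of pulling it out, assuming a 64-bit ELF held in a byte vector; `FindInterpreter` is a hypothetical helper, not an existing ELFLoader function.

```cpp
#include <elf.h>      // Elf64_Ehdr, Elf64_Phdr, PT_INTERP (Linux system header)
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Returns the interpreter path requested by the ELF, or nullopt for static
// binaries and malformed input.
std::optional<std::string> FindInterpreter(const std::vector<uint8_t> &File) {
  Elf64_Ehdr Header;
  if (File.size() < sizeof(Header)) return std::nullopt;
  std::memcpy(&Header, File.data(), sizeof(Header));
  if (std::memcmp(Header.e_ident, ELFMAG, SELFMAG) != 0) return std::nullopt;
  if (Header.e_ident[EI_CLASS] != ELFCLASS64) return std::nullopt;
  if (Header.e_phentsize != sizeof(Elf64_Phdr)) return std::nullopt;

  for (size_t i = 0; i < Header.e_phnum; ++i) {
    const uint64_t Offset = Header.e_phoff + i * sizeof(Elf64_Phdr);
    if (Offset > File.size() || File.size() - Offset < sizeof(Elf64_Phdr)) break;

    Elf64_Phdr Phdr;
    std::memcpy(&Phdr, File.data() + Offset, sizeof(Phdr));
    if (Phdr.p_type != PT_INTERP) continue;

    // The segment must lie fully inside the file.
    if (Phdr.p_offset > File.size() || File.size() - Phdr.p_offset < Phdr.p_filesz) {
      return std::nullopt;
    }
    // The path is NUL terminated inside the segment; stop at the first NUL.
    const char *Begin = reinterpret_cast<const char *>(File.data() + Phdr.p_offset);
    const char *End = Begin + Phdr.p_filesz;
    return std::string(Begin, std::find(Begin, End, '\0'));
  }
  return std::nullopt;
}
```

The returned string is the guest path, so it would still need to be resolved against wherever the x86-64 libraries actually live on the host.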
Specint, specperf, POVRAY....?
We also need ARMv8.2 hardware that isn't quirky to ensure this.
Need to ensure that TestHarnessRunner still doesn't enable it. Those tests rely on some explicit address locations.
These failures happen on ARM only. Will disable them for merge, but opening a ticket so we follow up on them.
Looks like an actual JIT bug
2020-05-23T19:29:12.3407909Z 4976: timeout: the monitored command dumped core
2020-05-23T19:29:12.3419427Z 4976: test failed, expected is 0 but got -11
The test passes, while it fails for x86 (native, x86 emulator). Not sure this is a bug yet.
2020-05-23T19:29:30.5500586Z 5043: Test PASS: mmap/21-1.c Error at mmap: Invalid argument
2020-05-23T19:29:30.5515182Z 5043: [DEBUG] Reason we left VM: 3
2020-05-23T19:29:30.5546797Z 5043: [DEBUG] Managed to load? No
2020-05-23T19:29:30.5619388Z 5043: test failed, expected is 1 but got 0
The test passes, while it fails for x86 (native, x86 emulator). Not sure this is a bug yet.
2020-05-23T19:29:30.5762072Z 5049: off: fffffffffffff000, len: fffffffffffff000
2020-05-23T19:29:30.5765496Z 5049: Test Pass: mmap/31-1.c Error at mmap: Value too large for defined data type
2020-05-23T19:29:30.5778912Z 5049: [DEBUG] Reason we left VM: 3
This is a long term goal.
Currently we don't offer bit accurate transcendental instructions.
Reciprocal and reciprocal square root instructions have a fairly large range for their precision support.
These are currently implemented with float divisions to ensure all of the CPU backends match results and have same unit test results.
These precision differences have the fun quirk that something like the reciprocal of 1.0f usually produces a result that isn't even 1.0f.
We should have support for a few modes once this gets worked on.
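For context, the division-based approach described above amounts to something like the following per-lane reference. This is a sketch with hypothetical function names, using scalar floats to stand in for one vector lane; it is not the actual backend code.

```cpp
#include <cmath>

// Division-based reciprocal: correctly rounded IEEE result, so every backend
// that implements it this way produces the same bits.
float ReferenceRcp(float X) {
  return 1.0f / X;
}

// Division-based reciprocal square root, built from sqrt plus a divide.
float ReferenceRsqrt(float X) {
  return 1.0f / std::sqrt(X);
}
```

Unlike a hardware estimate, `ReferenceRcp(1.0f)` is exactly `1.0f`, which is the difference a bit-accurate mode would have to reproduce.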
High priority, red alert
Requires adding another GPR class to the RA that supports paired registers.
Then adding class interference support to the RA.
This will probably lead into a bit of IR Op and RA cleanup in the process.
It's a useful instruction for lock free linked list implementations that people will definitely be using.
Also ensure the CPUID bit says it is supported
This is gonna be an aspect that we need to support.
It's easy to mess up and accidentally include a FEX header into FEXCore.
This should never occur. Fix the cmake setup so FEXCore can never include FEX headers.
Somewhere along the line these were broken. Maybe when we switched to unified memory?
I wasn't really testing them to ensure they kept working, but we definitely need to keep them working.
I'll fix this.
Let each argument specify its incoming register class to enforce validation in the validation pass.
Currently we have no way to detect bad IR in this way
javascript/wasm side
.net side
Others?
Rather than printf and cout everywhere, {fmt} is the future.
It's an implementation of std::format that works in versions of C++ before C++20. By standardising on {fmt} now, we can easily switch to std::format later.
Support guest backtrace to know where the guest ends up at when we crash
Currently, a lot of code implicitly depends on CPUState layout. Assumptions are smeared throughout the codebase.
It would be nice to allow the layout of CPUState to be changed from a single place.
Currently const-prop doesn't hit every op so it could be doing better
When launching from binfmt_misc we won't have the luxury of arguments.
We need to support some way to pass options via the environment.
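A minimal sketch of one possible lookup order: command line value first, then environment, then a default. `GetOption` and any variable name passed to it (for example `FEX_EXAMPLE_CORE`) are hypothetical and only illustrate the fallback; they are not existing FEX options.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Resolves one option. When launched through binfmt_misc there is no command
// line value, so the environment variable is the only way to override the
// default.
std::string GetOption(const std::optional<std::string> &CommandLineValue,
                      const char *EnvName,
                      const std::string &Default) {
  if (CommandLineValue) return *CommandLineValue;
  if (const char *Env = std::getenv(EnvName)) return Env;
  return Default;
}
```

Keeping the command line ahead of the environment means existing invocations with explicit arguments behave the same as before.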
For the DCE pass
Including caching for dynamic objects loaded
We have
We need
In the future
Version syscalls based on what is implemented in the host kernel.
Change the uname result to a guest version that matches the host kernel version and the syscalls supported.
The register allocator becomes a fairly large time sink in large functions.
This needs to be improved quite substantially.
Additionally the cmpxchg instruction is going to add a paired register class which will add register class interference testing, which could drive up CPU usage more.
Related to #85.
Allow dumping of code from an arbitrary PC, storing relevant data to allow running standalone from the application executing.
It needs to store state such as RIP, allocated memory regions, incoming state, outgoing state (might not be worth the hassle), and potentially the data of accessed memory regions.
This will be useful for microbenchmarking the RA on large functions very specifically.
Ensuring runtime correctness of the ripped out code is a secondary concern.
Date/Hash/Runner folder structure sort of thing.
Being able to trace the application when interesting events occur is a very good thing to support.
Even if it is a "tracing" specific version to not inflict a bunch of overhead.
I learned that https://perfetto.dev/#/trace-processor.md exists.
So we can do trace profiling for getting interesting information
Find all VFS HLE routines, such as EmulatedFDManager::OpenAt(), and resolve path arguments to canonical form before using them in map<path,fd>.
Currently only absolute paths that match perfectly will work.
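A sketch of normalising paths before they are used as map keys. `CanonicalKey` and `EmulatedFiles` are hypothetical names, not the real EmulatedFDManager; note that `lexically_normal()` is purely textual and does not follow symlinks, so a real fix might prefer `std::filesystem::weakly_canonical` where touching the filesystem is acceptable.

```cpp
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Turns a guest path into a stable key: relative paths are anchored at Cwd,
// then "." and ".." components are folded away.
std::string CanonicalKey(const fs::path &Path, const fs::path &Cwd) {
  const fs::path Absolute = Path.is_absolute() ? Path : Cwd / Path;
  return Absolute.lexically_normal().string();
}

// Hypothetical table of emulated files keyed by normalised path.
class EmulatedFiles {
public:
  void Register(const fs::path &Path, int FD) {
    Files[CanonicalKey(Path, "/")] = FD;
  }

  // Returns the registered FD, or -1 when the path is not emulated.
  int Lookup(const fs::path &Path, const fs::path &Cwd) const {
    const auto It = Files.find(CanonicalKey(Path, Cwd));
    return It == Files.end() ? -1 : It->second;
  }

private:
  std::map<std::string, int> Files;
};
```

With this, `cpuinfo` opened from a working directory of `/proc` and `/proc/self/../cpuinfo` both land on the same key as `/proc/cpuinfo`.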
We need to support register class conflicts in the RA.
This is because we need the regular GPR class, and an additional GPR pair class.
AArch64 CASP instructions mandate that the Expected and Desired arguments are two pairs of registers that are consecutive and start at an even offset.
So we need register pairs in one class, {x0, x1}, {x2, x3}, ...,
that conflict with the regular GPR class when those registers are in use: x0, x1, x2, x3, ...
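The conflict rule can be sketched with bitmasks, assuming pair N covers registers x(2N) and x(2N+1). The function names and the mask representation are illustrative assumptions, not how FEX's RA stores its state.

```cpp
#include <cstdint>

// Bits of the GPR mask covered by pair PairIndex: {x(2N), x(2N+1)}.
// Starting at 2N keeps every pair on an even register, as CASP requires.
constexpr uint32_t PairMask(unsigned PairIndex) {
  return 0b11u << (PairIndex * 2);
}

// A pair conflicts if either of its two GPRs is already allocated.
constexpr bool PairConflicts(uint32_t GPRsInUse, unsigned PairIndex) {
  return (GPRsInUse & PairMask(PairIndex)) != 0;
}

// A single GPR conflicts if the pair containing it is already allocated.
constexpr bool GPRConflicts(uint32_t PairsInUse, unsigned GPR) {
  return ((PairsInUse >> (GPR / 2)) & 1u) != 0;
}

// Returns the first pair that is free in both classes, or -1 if none is.
// NumPairs is assumed to be at most 16 (32 GPR bits).
int AllocatePair(uint32_t GPRsInUse, uint32_t PairsInUse, unsigned NumPairs) {
  for (unsigned Pair = 0; Pair < NumPairs; ++Pair) {
    const bool PairTaken = ((PairsInUse >> Pair) & 1u) != 0;
    if (!PairTaken && !PairConflicts(GPRsInUse, Pair)) return static_cast<int>(Pair);
  }
  return -1;
}
```

For example, with x1 and x2 in use, both {x0, x1} and {x2, x3} are blocked and the first usable pair is {x4, x5}.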
Golang has a bit of assembly for using vdso or vsyscall for gettimeofday.
We don't currently support either of these.
This is necessary if we want to run any golang application.
Currently the destination register class is declared in the RA pass.
Change this so it is declared in the JSON file and the RA uses a generated helper to pull the register class.
Someone just needs to spend a day going through all the syscalls that can end up just being a passthrough
Currently if you're on a host that doesn't have a /lib64/ld-linux-x86-64.so.2 then you need to explicitly point to the dynamic linker location.
Let it search in the rootfs first.
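A sketch of the rootfs-first search order. `ResolveInterpreter` is a hypothetical helper, and treating the rootfs as a plain directory prefix is an assumption made for the example.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Looks for the requested dynamic linker inside the rootfs first and only
// then falls back to the same path on the host.
std::optional<fs::path> ResolveInterpreter(const fs::path &RootFS, const fs::path &Interpreter) {
  std::error_code EC; // non-throwing overloads: a missing path is not an error here

  if (!RootFS.empty()) {
    // relative_path() drops the leading '/', so the join stays inside RootFS.
    const fs::path InRootFS = RootFS / Interpreter.relative_path();
    if (fs::exists(InRootFS, EC)) return InRootFS;
  }
  if (fs::exists(Interpreter, EC)) return Interpreter;
  return std::nullopt;
}
```

A caller would pass the interpreter string taken from the ELF and only require an explicit linker location when this returns nothing.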