fex-emu / fex

A fast usermode x86 and x86-64 emulator for Arm64 Linux

Home Page: https://fex-emu.com

License: MIT License

CMake 1.02% Python 1.51% C++ 63.35% C 6.94% Assembly 27.14% Dockerfile 0.01% Shell 0.03% Roff 0.01%

emulator x86 x86-64 arm64 linux emulation cpp

fex's Issues

ELF Handling code relies on section headers

Nearly all of the ELF processing uses section headers, which are very convenient to work with because they carry much more information and split the file up more finely.

The problem is that section headers can be stripped. While this is rare, it is perfectly valid, and only program headers are needed to build the process image.

cmake Check if boost is installed for test generation

Clean up vector ops in json

Currently most vector ops declare an additional RegisterSize and ElementSize element in their ops.
This is already declared in the header of each op. Pass the arguments through to the header rather than declaring them in each IR op specifically.

cmake check for GLFW is installed for graphical debugger

CompileBlocks isn't thread safe

https://github.com/FEX-Emu/FEX/blob/master/External/FEXCore/Source/Interface/Core/Core.cpp#L475
This function is called from multiple guest threads, which is fine except that the FrontendDecoder object is shared and not thread safe.
The best way to work around this issue would be to give each thread object its own FrontendDecoder object.
This way multiple threads can compile code as they please.
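
A minimal sketch of the per-thread decoder idea, not FEX's actual code. The `FrontendDecoder` and `ThreadState` types below are hypothetical stand-ins, and `CompileBlock` and `CompileOnThreads` are names made up for illustration. The point is only that each thread touches a decoder it owns, so no lock is needed on the decode path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real decoder: it keeps mutable scratch state,
// which is what makes sharing one instance across threads unsafe.
struct FrontendDecoder {
  std::vector<uint8_t> Scratch;

  // Pretend to decode a block: copy the bytes into scratch and return the count.
  size_t DecodeBlock(const uint8_t *Code, size_t Size) {
    Scratch.assign(Code, Code + Size);
    return Scratch.size();
  }
};

// Hypothetical per-thread state: each guest thread owns its own decoder.
struct ThreadState {
  FrontendDecoder Decoder;
  size_t BlocksCompiled = 0;
};

// Compile path that only touches state owned by the calling thread.
size_t CompileBlock(ThreadState &Thread, const uint8_t *Code, size_t Size) {
  assert(Code != nullptr);
  size_t Decoded = Thread.Decoder.DecodeBlock(Code, Size);
  ++Thread.BlocksCompiled;
  return Decoded;
}

// Runs NumThreads threads that each compile Iterations blocks using their own
// ThreadState, then returns the total number of blocks compiled.
size_t CompileOnThreads(size_t NumThreads, size_t Iterations) {
  static const uint8_t Code[] = {0x90, 0x90, 0xC3}; // nop; nop; ret
  std::vector<ThreadState> States(NumThreads);
  std::vector<std::thread> Workers;
  for (auto &State : States) {
    Workers.emplace_back([&State, Iterations] {
      for (size_t i = 0; i < Iterations; ++i) {
        CompileBlock(State, Code, sizeof(Code));
      }
    });
  }
  for (auto &Worker : Workers) {
    Worker.join();
  }
  size_t Total = 0;
  for (const auto &State : States) {
    Total += State.BlocksCompiled;
  }
  return Total;
}
```

Anything that really is shared between threads, such as a global block cache, would still need its own synchronisation. This sketch only covers the decoder.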

Have IR pull relevant bits of data from a textual representation

We can write things like <GPRPair> = CASP <GPRPair>, <GPRPair>, <GPR> directly in the IR definition file rather than having that information encoded with a bunch of arguments.
It would take a decent amount of work to ensure it is correct, so this is a longer term goal.

Split OpDispatcher.cpp

The file is ~5100 LoC and should be able to be split up pretty cleanly around tables.
Bit unwieldy to navigate at the current size.

Add IR op metadata tags in json

Initial Metadata:

StoreMemory
- Stores to memory. Ensures that even if the op doesn't generate output, that it can't be DCE'd
Volatile
- Similar to StoreMemory. Even if it doesn't write memory, this is a volatile op that shouldn't be DCE'd

Architecture-specific metadata

ClobbersFlags
- Clobbers host side flags, so if the FLAG register class is used, it must be spilled around this op
NecessaryTemps
- Number of temp registers this IR op needs to work. Allocates registers whose live range starts and ends at this op
PhysicalRegisters - (We need both dest physical colouring and argument physical colouring)
- If this op needs specific physical registers then this is an RA constraint that enforces those registers to be allocated

Syscall emulation has incorrect global mutex

There is a scoped mutex at the top of the syscall handling function.
This is incorrect and causes the application to stall once threading kicks in.
Should be removed and only have a mutex on things that actually need it.
Can be done alongside syscall cleanup.

I think skmp wanted to have a go at this?

Syscall Unit tests

We need unit tests for these syscalls to make sure we don't break them.

Clean up CVT IR Ops and OpDispatcher function names.

Both the IR ops and x86 functions for CVT can end up being confusing and prone to bugs.
Clean it up and make it more apparent what they are supposed to do.

OverlayFS filesystem emulation

We need better filesystem emulation for files that expose x86-specific data:

/proc/cpuinfo - Support generating this on the fly based on emulated CPUID
/proc/self - Fairly large folder of various information
/sys/devices/system/cpu/online - Currently we state this as only 1 core. We should allow this to be configurable and default to host core count
Find more files that expose x86 data (libnuma has some things)
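
As a rough illustration of the /sys/devices/system/cpu/online item, the sketch below builds the file contents from a core count. The function names are hypothetical, and it assumes all emulated cores are online and contiguous, so the kernel's range format collapses to a single "0-N" range.

```cpp
#include <cassert>
#include <string>

// Builds the contents of /sys/devices/system/cpu/online for CoreCount cores,
// using the kernel's range format: "0" for one core, "0-7" for eight.
std::string GenerateOnlineCPUs(unsigned CoreCount) {
  assert(CoreCount > 0);
  if (CoreCount == 1) {
    return "0\n";
  }
  return "0-" + std::to_string(CoreCount - 1) + "\n";
}

// Picks the emulated core count. A configured value wins; 0 means "not
// configured" and falls back to the host core count. If the host count is
// also unknown (0), report a single core.
unsigned EmulatedCoreCount(unsigned Configured, unsigned HostCores) {
  if (Configured != 0) {
    return Configured;
  }
  return HostCores != 0 ? HostCores : 1;
}
```

The host core count could come from something like std::thread::hardware_concurrency(), which is allowed to return 0, hence the fallback to one core.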

Implement RA constraints

We need RA constraints in order to remove extraneous moves inside of IR Ops.
A good example is the AArch64 casp* instructions, which need the constraint dest = expected; enforcing it removes four moves per op.
Many other ops would also benefit from RA constraints. They can be found by looking for moves in the AArch64 JIT.

Ensure gettid == getpid

Currently FEX doesn't maintain that gettid == getpid.
This is noticed by some applications and confuses them.
We should ensure that the guest application's primary thread is FEX's primary thread.
The frontend should spin off its own worker threads instead.
Alternatively we can return the primary guest thread's tid for the getpid syscall, but I think that may cause issues in some cases?

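/* gettid() is only declared when _GNU_SOURCE is defined (glibc 2.30 or newer). */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void) {
      /* On the process's main thread the thread id must equal the process id. */
      pid_t pid = getpid();
      pid_t tid = gettid();
      if (pid != tid) {
            printf("FAIL! Something is mucking with us! pid != tid ; %d != %d\n", pid, tid);
      }
      else {
            printf("SUCCESS! TID == PID\n");
      }
      return 0;
}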

Real environment

ryanh@Ryan-TR2:~$ ./a.out
SUCCESS! TID == PID

FEX

ryanh@Ryan-TR2:~/work/FEXNew/Build$ ./Bin/ELFLoader  -U -c irjit -n 500 -- ~/a.out
[DEBUG] We installed 2314 instructions to the tables
[DEBUG] Precompiling: 0 blocks...
[DEBUG] Done
FAIL! Something is mucking with us! pid != tid ; 30155 != 30157
[DEBUG] Reason we left VM: 3
[DEBUG] Used 1455012 bytes for compiling
[DEBUG] Managed to load? No

Parse ELF interpreter correctly so we don't have to pass in dynamic linker

Right now ELFLoader is not pulling in the dynamic linker from the ELF.
This requires us to launch apps through the linker directly, which clang absolutely hates.
Pass it through ELFLoader so we no longer need to do this.
This saves us headaches and will be needed when we hook through binutils.
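
A minimal sketch of reading the interpreter path from the program headers, which also works when section headers have been stripped. It assumes a 64-bit ELF image already loaded into memory and uses the structures from the Linux `<elf.h>` header. `FindInterpreter` is a hypothetical name, not an existing ELFLoader function, and byte order is assumed to match the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <optional>
#include <string>
#include <vector>

// Returns the interpreter path from a 64-bit ELF image held in memory, or
// std::nullopt if the image is malformed or has no PT_INTERP segment.
std::optional<std::string> FindInterpreter(const std::vector<uint8_t> &Image) {
  Elf64_Ehdr Header;
  if (Image.size() < sizeof(Header)) return std::nullopt;
  std::memcpy(&Header, Image.data(), sizeof(Header));

  if (std::memcmp(Header.e_ident, ELFMAG, SELFMAG) != 0) return std::nullopt;
  if (Header.e_ident[EI_CLASS] != ELFCLASS64) return std::nullopt;
  if (Header.e_phentsize != sizeof(Elf64_Phdr)) return std::nullopt;
  if (Header.e_phoff > Image.size()) return std::nullopt;

  for (size_t i = 0; i < Header.e_phnum; ++i) {
    // Bounds-check every program header before copying it out.
    uint64_t Offset = Header.e_phoff + i * sizeof(Elf64_Phdr);
    if (Offset > Image.size() || Image.size() - Offset < sizeof(Elf64_Phdr)) {
      return std::nullopt;
    }
    Elf64_Phdr Phdr;
    std::memcpy(&Phdr, Image.data() + Offset, sizeof(Phdr));
    if (Phdr.p_type != PT_INTERP) continue;

    if (Phdr.p_offset > Image.size() ||
        Image.size() - Phdr.p_offset < Phdr.p_filesz) {
      return std::nullopt;
    }
    // The segment holds a NUL-terminated path; stop at the terminator, or at
    // the end of the segment if the terminator is missing.
    const char *Path = reinterpret_cast<const char *>(Image.data() + Phdr.p_offset);
    const void *End = std::memchr(Path, '\0', Phdr.p_filesz);
    size_t Length = End ? static_cast<size_t>(static_cast<const char *>(End) - Path)
                        : static_cast<size_t>(Phdr.p_filesz);
    return std::string(Path, Length);
  }
  return std::nullopt;
}
```

A loader could call this on the guest binary and, when a path comes back, load that file as the dynamic linker instead of requiring it on the command line. A 32-bit variant would follow the same shape with the Elf32 structures.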

Investigate small programs that could be benchmarks

Specint, specperf, POVRAY....?
We also need ARMv8.2 hardware that isn't quirky to ensure this.

Enable -U option by default

Need to ensure that TestHarnessRunner still doesn't enable it. Those tests rely on some explicit address locations.

Failing ARM tests

These failures happen on ARM only. Will disable them for merge, but opening a ticket so we follow up on them.

conformance-interfaces-fsync-4-1.test.jit.posix (jit only)

Looks like an actual JIT bug

2020-05-23T19:29:12.3407909Z 4976: timeout: the monitored command dumped core
2020-05-23T19:29:12.3419427Z 4976: test failed, expected is 0 but got -11

conformance-interfaces-mmap-21-1.test.int.posix (int and jit fail the same way)

The test passes, while it fails for x86 (native, x86 emulator). Not sure this is a bug yet.

2020-05-23T19:29:30.5500586Z 5043: Test PASS: mmap/21-1.c Error at mmap: Invalid argument
2020-05-23T19:29:30.5515182Z 5043: [DEBUG] Reason we left VM: 3
2020-05-23T19:29:30.5546797Z 5043: [DEBUG] Managed to load? No
2020-05-23T19:29:30.5619388Z 5043: test failed, expected is 1 but got 0

conformance-interfaces-mmap-31-1.test.int.posix (jit and int fail the same way)

The test passes, while it fails for x86 (native, x86 emulator). Not sure this is a bug yet.

2020-05-23T19:29:30.5762072Z 5049: off: fffffffffffff000, len: fffffffffffff000
2020-05-23T19:29:30.5765496Z 5049: Test Pass: mmap/31-1.c Error at mmap: Value too large for defined data type
2020-05-23T19:29:30.5778912Z 5049: [DEBUG] Reason we left VM: 3

cmake Check if nasm is installed for test generation

Bit accurate Transcendental support

This is a long term goal.
Currently we don't offer bit accurate transcendental instructions.
Reciprocal and reciprocal square root instructions are allowed a fairly wide range of precision.
These are currently implemented with float divisions so that all of the CPU backends produce matching results and share the same unit test results.

These precision differences have the fun quirk that something like the reciprocal of 1.0f usually yields a result that isn't even 1.0f.

We should have support for a few modes once this gets worked on.

Bit accurate representation that matches SOME known hardware
Most accurate representation (What we have now)
Accuracy that falls within x86 precision requirements, but doesn't match any known hardware (In case any device supports less accurate results that are still within x86 spec)

Remove SARX usage

High priority, red alert

Support CMPXCHG16B

Requires adding another GPR class to the RA that supports paired registers.
Then adding class interference support to the RA.
This will probably lead into a bit of IR op and RA cleanup in the process.
It's a useful instruction for lock free linked list implementations that people will definitely be using.
Also ensure the CPUID bit says it is supported.

Look in to guest faulting and guest signal support

This is gonna be an aspect that we need to support.

skmp's TODO

Implement block lookups to not be hacky / aliasy
Syscall unit tests #116
Look into signals / marshaling -> at least for segfaults with host context #90
-- Write some test cases, exit on loops, etc
Cleanup the op dispatcher #26
Build chroot image without x87 love

Make sure External projects can't include FEX headers

It's easy to mess up and accidentally include a FEX header in FEXCore.
This should never occur. Fix the cmake setup so FEXCore can never include FEX headers.

Fix static and EXEC executables

Somewhere along the line these were broken. Maybe when we switched to unified memory?
I wasn't really testing them to ensure they kept working, but we definitely need to keep them working.
I'll fix this.

cmake check for epoxy is installed for graphical debugger

Add IR Op argument register class constraints to enforce validation

Let each argument specify its incoming register class to enforce validation in the validation pass.
Currently we have no way to detect bad IR in this way

Investigate available jits

javascript/wasm side

v8
Mozilla

.net side

dotnetcore
mono

Others?

Convert all string formatting to {fmt}

Rather than printf and cout everywhere, {fmt} is the future

It's an implementation of std::format that works in versions of C++ before C++20. By standardising on {fmt} now, we can easily switch to std::format later.

ninja clean fails with `Cleaning... ninja: error: remove(unittests/ASM): Directory not empty`

Guest backtrace support

Support guest backtrace to know where the guest ends up at when we crash

Refactor everything to be agnostic of CPUState structure layout

Currently, a lot of code implicitly depends on CPUState layout. Assumptions are smeared throughout the codebase.

It would be nice to allow the layout of CPUState to be changed from a single place.

Flesh out const-prop pass

Currently const-prop doesn't hit every op so it could be doing better

Support config from environment variables

When launching from binfmt_misc we won't have the luxury of arguments.
Need to support some way to pass via environment
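
One possible shape for this, sketched under assumptions: the variable names `FEX_CORE` and `FEX_VERBOSE` and the `Options` struct are hypothetical and only illustrate reading settings from the environment with a default when a variable is unset.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of environment variable Name, or Default if it is unset.
std::string GetEnvOr(const char *Name, const std::string &Default) {
  assert(Name != nullptr);
  const char *Value = std::getenv(Name);
  return Value ? std::string(Value) : Default;
}

// Hypothetical option set; the real configuration has more fields than this.
struct Options {
  std::string Core; // which CPU core to use, e.g. "irjit"
  bool Verbose;     // purely illustrative flag
};

// Fills Options from the environment. Both variable names are assumptions
// made for this sketch, not names FEX is known to use.
Options LoadOptionsFromEnv() {
  Options Result;
  Result.Core = GetEnvOr("FEX_CORE", "irjit");
  Result.Verbose = GetEnvOr("FEX_VERBOSE", "0") == "1";
  return Result;
}
```

Command-line arguments, when present, could then be applied on top of these values so that explicit flags still win over the environment.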

Add `HasSideEffect` to IR JSON

For the DCE pass
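
To illustrate why the DCE pass wants such a flag, here is a small sketch over a simplified, hypothetical IR (a linear op list where arguments refer to earlier ops by index). It is not FEX's IR layout. Ops flagged `HasSideEffect` act as roots that are always kept, and everything they transitively read stays live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified IR op: a linear list where Args index earlier ops.
struct IROp {
  bool HasSideEffect;       // e.g. a store or syscall; must never be removed
  std::vector<size_t> Args; // indices of ops whose results this op reads
};

// Returns one flag per op: true if the op must be kept, false if it is dead.
std::vector<bool> FindLiveOps(const std::vector<IROp> &Ops) {
  std::vector<bool> Live(Ops.size(), false);
  // Walk backwards so every user is visited before the ops it depends on.
  for (size_t i = Ops.size(); i-- > 0;) {
    if (Ops[i].HasSideEffect) {
      Live[i] = true;
    }
    if (!Live[i]) {
      continue;
    }
    for (size_t Arg : Ops[i].Args) {
      assert(Arg < i); // arguments are defined before their use
      Live[Arg] = true;
    }
  }
  return Live;
}
```

Without the flag, an op that produces no value, such as a store, would have no users and be removed as dead.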

Implement IR Caching and Code Caching

Including caching for dynamic objects loaded

FEX-EMU infrastructure

We have

fex-emu.org
Google Calendar
Google Drive Share (RO)
email forwards for skmp, phire, hdkr @fex-emu.org
chat.fex-emu.org redirect

We need

Bot that auto merges PRs approved / made by @FEX-Emu/maintainers
A non hello world website (See FEX-Emu/fex-emu.org#1)

In the future

Register our trademark/namemark (@skmp can look into pricing in greece for worldwide, so we have an idea of the costs involved)
a formal entity to represent us (@phire suggested looking into the NZ options to do this, so we have an idea of the costs and complexity involved)

Syscall host->guest versioning

Version syscalls based on what is implemented in the host kernel.
Change the uname result to report a guest version that matches the host kernel version and the syscalls it supports.

Improve performance of the register allocator

The register allocator becomes a fairly large time sink in large functions.
This needs to be improved quite substantially.
Additionally, CMPXCHG16B support is going to add a paired register class, which will add register class interference testing and could drive up CPU usage further.

Implement code segment dumping

Related to #85.
Allow dumping code from an arbitrary PC, storing enough data to run it standalone, outside the application it came from.
Needs to store state like RIP, allocated memory regions, incoming state, outgoing state (might not be worth the hassle), and potentially the data of accessed memory regions.
This will be useful for microbenchmarking RA on large functions specifically.
Ensuring runtime correctness of the ripped-out code is a secondary concern.

Make CI runner actually push artifacts to the shared folder

Date/Hash/Runner folder structure sort of thing.

Annotate interesting trace events

Being able to trace the application when interesting events occur is a very good thing to support.
Even if it is a "tracing" specific version to not inflict a bunch of overhead.
I learned that https://perfetto.dev/#/trace-processor.md exists.
So we can do trace profiling for getting interesting information

Resolve paths passed to VFS/HLE to canonical

Find all VFS HLE routines, such as EmulatedFDManager::OpenAt(), and resolve path arguments to canonical form before using them as keys in map<path,fd>.

Currently only absolute paths that match perfectly will work.
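
A sketch of one way to canonicalize lookup keys, assuming C++17 `std::filesystem` is available. `CanonicalKey` and `EmulatedFiles` are hypothetical names, not the real EmulatedFDManager interface. The normalization is purely lexical, so "." and ".." are collapsed and relative paths are anchored at a working directory, but symlinks are not resolved.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Normalizes Path lexically: collapses "." and ".." and anchors relative
// paths at Cwd. It does not touch the disk, so symlinks are not resolved.
std::string CanonicalKey(const fs::path &Path, const fs::path &Cwd) {
  assert(Cwd.is_absolute());
  // operator/ keeps Path unchanged when it is already absolute.
  return (Cwd / Path).lexically_normal().string();
}

// Hypothetical emulated-file table keyed by canonical path.
struct EmulatedFiles {
  std::map<std::string, int> PathToFD;

  void Register(const std::string &Path, int FD) {
    PathToFD[CanonicalKey(Path, "/")] = FD;
  }

  // Returns the emulated fd, or -1 if the path is not emulated.
  int Lookup(const std::string &Path, const std::string &Cwd) const {
    auto It = PathToFD.find(CanonicalKey(Path, Cwd));
    return It == PathToFD.end() ? -1 : It->second;
  }
};
```

For openat-style calls the `Cwd` argument would be the directory the dirfd refers to rather than the process working directory. If symlinks matter, `std::filesystem::weakly_canonical` is an alternative, at the cost of touching the host filesystem.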

Implement RA reg class conflict support

We need to support register class conflicts in the RA.
This is because we need the regular GPR class and an additional GPR pair class.
AArch64 CASP instructions mandate that the Expected and Desired arguments are two pairs of consecutive registers, each starting at an even register number.
So we need a class of register pairs {x0, x1}, {x2, x3}, ... that conflicts with the regular GPR class when any of x0, x1, x2, x3, ... are in use.
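
The conflict rule can be sketched as a bitmask check. The types below are hypothetical and not the RA's real data structures. Pair i is assumed to cover x(2i) and x(2i+1), so each pair starts on an even register as CASP requires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register classes: single GPRs and pairs {x0,x1}, {x2,x3}, ...
enum class RegClass { GPR, GPRPair };

struct PhysReg {
  RegClass Class;
  uint32_t Index; // register number for GPR, pair number for GPRPair
};

// Returns a bitmask of the single GPRs that a physical register occupies.
// Pair i covers x(2i) and x(2i+1), so every pair starts at an even register.
uint64_t OccupiedGPRs(PhysReg Reg) {
  if (Reg.Class == RegClass::GPR) {
    assert(Reg.Index < 64);
    return uint64_t{1} << Reg.Index;
  }
  assert(Reg.Index < 32);
  return uint64_t{3} << (2 * Reg.Index);
}

// Two physical registers conflict when they share at least one single GPR.
bool Conflicts(PhysReg A, PhysReg B) {
  return (OccupiedGPRs(A) & OccupiedGPRs(B)) != 0;
}
```

With this representation, the allocator can test a candidate in either class against everything currently live with a single AND, regardless of which class the live values belong to.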