open-goal / jak-project

Reviving the language that brought us the Jak & Daxter Series

Home Page: https://opengoal.dev

License: ISC License

CMake 0.04% C++ 15.10% Assembly 0.02% C 0.02% Shell 0.01% Scheme 0.12% Python 0.18% Common Lisp 84.40% Batchfile 0.01% Emacs Lisp 0.01% GLSL 0.07% POV-Ray SDL 0.03% PowerShell 0.01% Dockerfile 0.01% JavaScript 0.01%
lisp scheme jak jak-and-daxter ps2 reverse-engineering

jak-project's Introduction

Please read first

Important

Our repositories on GitHub are primarily for development of the project and tracking active issues. Most of the information you will find here pertains to setting up the project for development purposes and is not relevant to the end-user.

For a setup guide on how to install and play the game there is the following video that you can check out: https://youtu.be/K84UUMnkJc4

For questions or additional information pertaining to the project, we have a Discord for discussion here: https://discord.gg/VZbXMHXzWv

Additionally, you can find further documentation and answers to frequently asked questions on the project's main website: https://opengoal.dev

Warning

Do not use this decompilation project without providing your own legally purchased copy of the game. We do not distribute any assets from the game - you must use your own legitimately obtained PS2 copy of the game. We support every retail PAL, NTSC, and NTSC-J build, including Greatest Hits copies.

Project Description

This project aims to port the original Jak and Daxter and Jak II to PC. Over 98% of the games are written in GOAL, a custom Lisp language developed by Naughty Dog. Our strategy is to:

  • decompile the original game code into human-readable GOAL code
  • develop our own compiler for GOAL and recompile game code for x86-64
  • create a tool to extract game assets into formats that can be easily viewed or modified
  • create tools to repack game assets into a format that our port uses.

Our objectives are:

  • make the port a "native application" on x86-64, with high performance. It shouldn't be emulated, interpreted, or transpiled.
  • make our GOAL compiler's performance roughly match that of unoptimized C.
  • match the original game and its development workflow as closely as possible. For example, the original GOAL compiler supported live modification of code while the game is running, so we do the same, even though it's not required for just porting the game.
  • support modifications. It should be possible to make edits to the code without everything else breaking.

We support both Linux and Windows on x86-64.

We do not support, or plan to support, the ARM architecture. This means that the port will not run on devices such as an M1 Mac or a mobile device.

Current Status

Jak 1 is largely playable from start to finish with a handful of bugs that are continually being ironed out. Jak 2 is in development.

YouTube playlist: https://www.youtube.com/playlist?list=PLWx9T30aAT50cLnCTY1SAbt2TtWQzKfXX

Methodology

To help with decompiling, we've built a decompiler that can process GOAL code and unpack game assets. We manually specify function types and the locations where we believe the original code had type casts (or where a cast seems appropriate) until the decompiler succeeds. We then clean up the decompiled output by adding comments and adjusting formatting, and save it in goal_src.

Our decompiler is designed specifically for processing the output of the original GOAL compiler. As a result, when given correct casts, it often produces code that can be directly fed into a compiler and works perfectly. This is continually tested as part of our unit tests.

Setting up a Development Environment

The remainder of this README is aimed at people interested in building the project from source, typically with the intention of contributing as a developer.

If this does not sound like you and you just want to play the game, refer to the Quick Start section above.

Docker

All three Linux systems are supported using Docker.

Pick your preferred supported flavour of Linux and build the corresponding image:

docker build -f docker/(Arch|Fedora|Ubuntu)/Dockerfile -t jak .

This will create an image with all required dependencies installed and the project already built.

docker run -v "$(pwd)"/build:/home/jak/jak-project/build -it jak bash

Note: If you change the content of the build/ directory you'll need to rerun the build command. Alternatively you can get the build via docker cp.

This will link your build/ folder to the image so you can validate your build or test it on an external device.

Docker images can be linked into your IDE (e.g. CLion) to help with code sniffing, static analysis, running tests, and continuous builds.

Unfortunately, you'll still need the task runner on your local machine to run the game; alternatively, run the game manually via the commands found in Taskfile.yml.

Linux

Ubuntu (20.04)

Install packages and init repository:

sudo apt install gcc make cmake build-essential g++ nasm clang-format libxrandr-dev libxinerama-dev libxcursor-dev libpulse-dev libxi-dev python libgl1-mesa-dev libssl-dev
sudo sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b /usr/local/bin

Compile:

cmake -B build && cmake --build build -j 8

Run tests:

./test.sh

Note: we have found that clang and lld are significantly faster to compile and link than gcc, generate faster code, and have better warning messages. To install these:

sudo apt install lld clang

and run cmake (in a fresh build directory) with:

cmake -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld" -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..

Arch

Install packages and init repository:

sudo pacman -S cmake libpulse base-devel nasm python libx11 libxrandr libxinerama libxcursor libxi
yay -S go-task

For Arch only, replace task with go-task in the rest of the instructions.

Compile:

cmake -B build && cmake --build build -j 8

Run tests:

./test.sh

Fedora

Install packages and init repository:

sudo dnf install cmake python lld clang nasm libX11-devel libXrandr-devel libXinerama-devel libXcursor-devel libXi-devel pulseaudio-libs-devel mesa-libGL-devel
sudo sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b /usr/local/bin

Compile with clang:

cmake -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld" -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -B build
cmake --build build -j$(nproc)

Run tests:

./test.sh

Windows

Required Software

We primarily use Visual Studio on Windows for C++ development. Download the latest community edition from here. At the time of writing this is Visual Studio 2022.

You will require the Desktop development with C++ workload. This can be selected during the installation, or after via the Visual Studio Installer, modifying the Visual Studio Installation.

On Windows, it's recommended to use a package manager; we use Scoop. Follow the steps on the bottom of the homepage here to get it.

Once Scoop is installed, run the following commands:

scoop install git llvm nasm python task

Using Visual Studio

Clone the repository by running the following command in your folder of choice.

git clone https://github.com/open-goal/jak-project.git

This will create a jak-project folder. Open the project as a CMake project via Visual Studio.

Then build the entire project as Windows Release (clang). You can also press Ctrl+Shift+B as a hotkey for Build All. We currently prefer clang on Windows as opposed to msvc, though msvc should work as well!

MacOS

NOTE: At this time you can only run the game on macOS if you have an Intel processor.

Ensure that you have Xcode command line tools installed (this installs things like Apple Clang). If you don't, you can run the following command:

xcode-select --install

Intel Based

brew install go-task/tap/go-task
brew install cmake nasm ninja go-task clang-format
cmake -B build --preset=Release-macos-clang
cmake --build build --parallel $((`sysctl -n hw.logicalcpu`))

Apple Silicon

Not Supported at This Time

brew install go-task/tap/go-task
brew install cmake ninja go-task clang-format
cmake -B build --preset=Release-macos-clang
cmake --build build --parallel $((`sysctl -n hw.logicalcpu`))

You may have to add the MacOS SDK to your LIBRARY_PATH:

  • export LIBRARY_PATH="$LIBRARY_PATH:/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib"

VSCode

If you either don't want to or cannot use Visual Studio for working with the C++ project, VSCode is a good alternative.

The clangd extension is recommended and requires clangd to be on your $PATH. If you can run clangd in a terminal successfully then you should be good to go.

Once you generate your CMake for the first time the clangd LSP should be able to index the project and give you intellisense.

Building and Debugging

TODO - Consider Contributing Documentation :)

Building and Running the Game

Getting a running game involves 4 steps:

  1. Build C++ tools (follow Getting Started steps above for your platform)
  2. Extract assets from the game
  3. Build the game
  4. Run the game

Extract Assets

First, configure your settings so the following scripts know which game and which version you are using. For the black label version of the game, run the following in a terminal:

task set-game-jak1
task set-decomp-ntscv1

For other versions of the game, you will need to use a different set-decomp-<VERSION> command. An example for the PAL version:

task set-game-jak1
task set-decomp-pal

Run task --list to see the other available options.

At the time of writing, only Jak 1 is expected to work end-to-end!

The first step is to extract your ISO file contents into the iso_data/<game-name> folder. In the case of Jak 1 this is iso_data/jak1.

Once this is done, open a terminal in the jak-project folder and run the following:

task extract

Build the Game

The next step is to build the game itself. To do so, in the same terminal run the following:

task repl

You will be greeted with a prompt like so:

 _____             _____ _____ _____ __
|     |___ ___ ___|   __|     |  _  |  |
|  |  | . | -_|   |  |  |  |  |     |  |__
|_____|  _|___|_|_|_____|_____|__|__|_____|
      |_|
Welcome to OpenGOAL 0.8!
Run (repl-help) for help with common commands and REPL usage.
Run (lt) to connect to the local target.

g >

Run the following to build the game:

g > (mi)

IMPORTANT NOTE! If you're using a non-default version of the game, you may hit issues trying to run (mi) in this step. An example error might include something like:

Input file iso_data/jak1/MUS/TWEAKVAL.MUS does not exist.

This is because the decompiler reads from and writes to folders named after the gameName JSON field in the decompiler config. For example, if you are using Jak 1 PAL, it will assume iso_data/jak1_pal and decompiler_out/jak1_pal. You can inform the REPL/compiler of this via the gameVersionFolder config field described here

Run the Game

Finally, the game can be run. Open a second terminal from the jak-project directory and run the following:

task boot-game

The game should boot automatically if everything was done correctly.

Connecting the REPL to the Game

Connecting the REPL to the game allows you to inspect and modify code or data while the game is running.

To do so, in the REPL after a successful (mi), run the following:

g > (lt)

If successful, your prompt should change to:

gc>

For example, running the following will print out some basic information about Jak:

gc> *target*

Running the Game Without Auto-Booting

You can also start up the game without booting. To do so, run the following in one terminal:

task run-game

And then in your REPL run the following (after a successful (mi)):

g > (lt)
[Listener] Socket connected established! (took 0 tries). Waiting for version...
Got version 0.8 OK!
[Debugger] Context: valid = true, s7 = 0x147d24, base = 0x2123000000, tid = 2438049

gc> (lg)
10836466        #xa559f2              0.0000        ("game" "kernel")

gc> (test-play)
(play :use-vis #t :init-game #f) has been called!
0        #x0              0.0000        0

gc>

Interacting with the Game

In the graphics window, you can use the period key to bring up the debug menu. Controllers also work, using the same mapping as the original game.

Check out the pc_debug, examples and pc folders under goal_src for some examples of GOAL code we wrote. The debug files that are not loaded automatically by the engine have instructions for how to run them.

Technical Project Overview

There are four main components to the project.

  1. goalc - the GOAL compiler for x86-64
  2. decompiler - our decompiler
  3. goal_src/ - the folder containing all OpenGOAL / GOOS code
  4. game - aka the runtime written in C++

Let's break down each component.

goalc

Our implementation of GOAL is called OpenGOAL.

All of the compiler source code is in goalc/. The compiler is controlled through a prompt which can be used to enter commands to compile, connect to a running GOAL program for interaction, run the OpenGOAL debugger, or, if you are connected to a running GOAL program, can be used as a REPL to run code interactively. In addition to compiling code files, the compiler has features to pack and build data files.

Running the compiler

Environment Agnostic

If you have installed task as recommended above, you can run the compiler with task repl.

Linux

To run the compiler on Linux, there is a script scripts/shell/gc.sh.

Windows

On Windows, there is a scripts/batch/gc.bat script and a scripts/batch/gc-no-lt.bat script, the latter of which will not attempt to automatically attach to a running target.

decompiler

The second component to the project is the decompiler.

The decompiler will output code and other data intended to be inspected by humans in the decompiler_out folder. Files in this folder will not be used by the compiler.

Running the decompiler

You must have a copy of the PS2 game and place all files from the DVD inside a folder corresponding to the game within the iso_data folder (jak1 for Jak 1 Black Label, etc.).

The decompiler will extract assets to the assets folder. These assets will be used by the compiler when building the port, and you may want to turn asset extraction off after running it once.

Environment Agnostic

If you have installed task as recommended above, you can run the decompiler with task decomp.

Linux

To run the decompiler on Linux, use scripts/shell/decomp.sh.

Windows

To run the decompiler on Windows, use scripts/shell/decomp-jak1.bat.

goal_src/

The game source code, written in OpenGOAL, is located in goal_src. All GOAL and GOOS code should be in this folder.

game runtime

The final component is the "runtime", located in game. This is the part of the game that's written in C++.

In the port, that includes:

  • The "C Kernel", which contains the GOAL linker and some low-level GOAL language features. GOAL has a completely custom dynamically linked object file format so in order to load the first GOAL code, you need a linker written in C++. Some low-level functions for memory allocation, communicating with the I/O Processor, symbol table, strings, and the type system are also implemented in C, as these are required for the linker. It also listens for incoming messages from the compiler and passes them to the running game. This also initializes the game, by initializing the PS2 hardware, allocating the GOAL heaps, loading the GOAL kernel off of the DVD, and executing the kernel dispatcher function. This is in the game/kernel folder. This should be as close as possible to the game, and all differences should be noted with a comment.
  • Implementation of Sony's standard library. GOAL code can call C library functions, and Naughty Dog used some Sony library functions to access files, memory cards, controllers, and communicate with the separate I/O Processor. The library functions are in game/sce. Implementations of library features specific to the PC port are located in game/system.
  • The I/O Processor driver, OVERLORD. The PS2 had a separate CPU called the I/O Processor (IOP) that was directly connected to the DVD drive hardware and the sound hardware. Naughty Dog created a custom driver for the IOP that handled streaming data off of the DVD. It is much more complicated than I first expected. It's located in game/overlord. Like the C kernel, we try to keep this as close as possible to the actual game.
  • Sound code. Naughty Dog used a third party library for sound called 989SND. Code for the library and an interface for it is located in game/sound.
  • PC specific graphics code. We have a functional OpenGL renderer and context that can create a game window and display graphics on it. The specific renderers used by the game however are mostly implemented. Aside from post-processing effects, everything in the game is rendered. This is located in game/graphics. While many liberties will be taken to make this work, the end result should very closely match the actual game.
  • Extra assets used by the port in some fashion, located in game/assets. These include extra text files, icons, etc.

jak-project's People

Contributors

alexislefebvre, animalstyletaco, bb010g, blahpy, breakpoints, brent-hickey, chillypepper, dallmeyer, dependabot[bot], doctashay, evelyntsmg, fabjan, francessco121, github-actions[bot], hat-kid, himham-jak, jabermony, luminarlight, mandude, opengoalbot, possum93, rafalekkb, towai, trippjoe, vodbox, water111, xsm2, xtvaser, zedb0t, ziemas

jak-project's Issues

Finish Texture Extraction Tool

  • Export other mip levels
  • Export info so we eventually repack textures
  • Check that all data is read
  • Investigate the null textures (likely CLUTs, still an entry in their layout tool but not something accessed as a GOAL texture object)
  • Check for duplicate names
  • Some better way to organize the output (folder per tpage)
  • Add human-friendly name to tpages (just use the name from the game?)

Implement logging for the decompiler

Currently the decompiler has a bunch of print statements. We could implement logging in the decompiler like we did for the runtime, and have it log to a file.

To run the decompiler from the build folder:

decompiler/decompiler ../config/jak1_ntsc_black_label.jsonc ../iso_data ../decompiler_out

Note that you must copy the DGO/CGO folders to iso_data first.

The current output of the decompiler:

Jak Disassembler
- Loading Types...
- Initializing ObjectFileDB...
ObjectFileDB Initialized:
 total dgos: 2
 total data: 6685824 bytes
 total objs: 336
 unique objs: 336
 unique data: 6664192 bytes
 total 100.9 ms (63.199 MB/sec, 3330.389 obj/sec)

- Processing Link Data...
Processed Link Data:
 code 3076368 bytes
 v2 code 3076368 bytes
 v2 link data 53632 bytes
 v2 pointers 29533
 v2 pointer seeks 15404
 v2 symbols 251
 v2 symbol links 18801
 v3 code 3008875 bytes
 v3 link data 520096 bytes
 v3 pointers 18123
   split 6305
   word  11818
 v3 pointer seeks 12014
 v3 symbols 17069
 v3 offset symbol links 60823
 v3 word symbol links 25718
 total 351.928 ms

- Finding code in object files...
Found code:
 code 1.945 MB
 data 6.631 MB
 functions: 5939
 fp uses resolved: 19136 / 19136 (100.000 %)
 decoded 509915 / 509915 (100.000 %)
 total 391.614 ms

- Processing Labels...
Processed Labels:
 total 66104 labels
 total 48.370 ms

- Finding scripts in object files...
Found scripts:
 total 258.481 ms

- Analyzing Functions...
Found 5939 functions (3050 with no control flow)
Named 4751/5939 functions (80.00%)
Excluding 246 asm functions
Found 43432 basic blocks in 1729.795 ms
 5693/5693 functions passed cfg analysis stage (100.00%)
 328874/328874 basic ops converted successfully (100.00%)
 5693/5693 cfgs converted to ir (100.00%)
- Writing functions...
Wrote functions dumps:
 total 316 files
 total 37.050 MB
 total 6835.690 ms (5.420 MB/sec)

Add scripts to batch-format the decompiler's output.

While it would be useful to eventually have a built-in pretty-printer in the REPL, making one that covers all of the edge cases we'd probably like to support would be a challenge and is probably not where we want to be spending our time (formatting the AST is a breeze with Lisp; handling all the edge cases to make it look nice code-wise is the hard part).

Common Lisp is pretty old at this point, so finding tooling around it is a bit of a challenge. Emacs is not only the de facto Lisp editor, but is built in Lisp itself. As a result, there are decent code-formatting tools for it.

We could write an emacs script to run the resulting files through this formatter https://lunaryorn.com/blog/emacs-script-pitfalls/

This formatter achieves pretty decent results, and claims to offer a bit of customization in the event that we need it.

The drawback, of course, is that we are now dependent on Emacs and an Emacs plugin just to format our code. Perhaps we could wrap these dependencies up into a Docker container? However, as at some point we are probably going to be writing / editing Lisp code, it might be a good idea to get used to Emacs anyway.

GOAL v2 object files don't work in decompiler

Jak 1 supports v2, v3, and v4 object files. I thought there were no v2 object files, but it turns out the game text files are v2 GOAL object files. Putting these through the LinkedObjectFileCreation "linker" fails. The LinkedObjectFileCreation linker should support this version of file and the decompiler should also process these text files.

Create pipeline and tests for extracting and packing text

  • Extract game text from game text files and store in the assets folder (we don't want to check these in) with the decompiler.
    • Support V2 GOAL format
    • Parse text files
      • Support non-printable chars
      • Japanese characters?
      • Combine languages, figure out text ID stuff, generate one giant text definition file
  • Create the OpenGOAL data object format
    • Support in ObjectFileGenerator (compiler)
    • Support in klink (runtime)
  • Create a text "compiler" which produces OpenGOAL compatible versions of game text per language
  • Create a simple test for OpenGOAL data objects to run as part of the compiler test suite (not using game data)
  • Create "offline tests" for extracting / repacking / loading / printing game text.

OpenGOAL linker doesn't handle references to segments that aren't loaded

This is a pretty low priority, but I suspect that the behavior of the OpenGOAL linker is wrong. In the original GOAL, if you defmethod with a method that's in the debug segment, but don't load the debug segment, you end up with the default method. In OpenGOAL you probably end up with junk or maybe the linker crashes?

If we ever try to run in non-debug mode this will be an issue.

At some point there should be a test for booting the kernel in non-debug mode and confirming this behavior.

Move file I/O utilities to a single library

Because this repo started as a few separate projects, there is some duplicated code for reading/writing files in decompiler/util/FileIO.h and goal/util/file_io.h. These should be moved to a single common library in common/util that should probably get its own CMakeLists.txt file.

The code for Timer is also duplicated a bunch and could go in this library as well.
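
As a rough idea of what the shared library could contain, here is a minimal sketch using only the standard library. The names file_util::read_text_file, file_util::write_text_file and the Timer interface are placeholders I made up for illustration, not the existing functions in either header.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helpers for a shared common/util library; names are illustrative.
namespace file_util {

// Read a whole file into a string, throwing if it cannot be opened.
inline std::string read_text_file(const std::string& path) {
  assert(!path.empty());
  std::ifstream file(path, std::ios::binary);
  if (!file) {
    throw std::runtime_error("could not open " + path + " for reading");
  }
  std::stringstream contents;
  contents << file.rdbuf();
  return contents.str();
}

// Replace the contents of a file with `text`, throwing on failure.
inline void write_text_file(const std::string& path, const std::string& text) {
  assert(!path.empty());
  std::ofstream file(path, std::ios::binary | std::ios::trunc);
  if (!file) {
    throw std::runtime_error("could not open " + path + " for writing");
  }
  file << text;
  if (!file) {
    throw std::runtime_error("failed while writing " + path);
  }
}

}  // namespace file_util

// Simple wall-clock timer based on a monotonic clock.
class Timer {
 public:
  Timer() { start(); }
  void start() { m_start = std::chrono::steady_clock::now(); }
  // Milliseconds elapsed since construction or the last start().
  double get_ms() const {
    auto elapsed = std::chrono::steady_clock::now() - m_start;
    return std::chrono::duration<double, std::milli>(elapsed).count();
  }

 private:
  std::chrono::steady_clock::time_point m_start;
};
```

Opening files in binary mode keeps line endings identical on Windows and Linux, which matters if the same helper is later used for data files.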

Create Compiler "Torture Tests"

To help find bugs in the compiler, I'm imagining we could create a test which has a lot of operations. This would cause more cases in the register allocation code to be tested and increase our confidence that the compiler does the right thing. I'm imagining that we could start with these operations:

  • let to create local variables
  • set! to change the value of local variables
  • Integer math (+, -). Let's avoid /, *, and mod until we're sure the current behavior is right.
  • Bitwise operators (logand, logior, logxor, lognot)
  • Defining and calling functions which use these things
  • Doing these things in strange and interesting orders

One possible approach would be to write a simple script to generate some random large programs. Figuring out the correct answer might be a little tricky though. If there's no overflows, I think Common Lisp has an almost identical syntax for these operations.
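
A generator along those lines could compute the expected answer while it builds the program, which sidesteps the "figuring out the correct answer" problem. The sketch below is only an illustration: GeneratedExpr, apply_binary and random_expr are made-up names, it assumes the compiler accepts negative integer literals, and it keeps leaves small and depth bounded so that + and - cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// A generated GOAL expression together with the value it should evaluate to.
struct GeneratedExpr {
  std::string text;
  std::int64_t expected;
};

// Evaluate one of the binary operators from the list above on the host.
inline std::int64_t apply_binary(const std::string& op, std::int64_t a, std::int64_t b) {
  if (op == "+") return a + b;
  if (op == "-") return a - b;
  if (op == "logand") return a & b;
  if (op == "logior") return a | b;
  assert(op == "logxor");
  return a ^ b;
}

// Recursively build a random expression of at most `depth` nested forms.
// Leaves are in [-100, 100] and depth is capped, so results stay far below
// the 64-bit limit and no overflow can occur.
inline GeneratedExpr random_expr(std::mt19937& rng, int depth) {
  assert(depth >= 0 && depth <= 16);
  std::uniform_int_distribution<int> leaf(-100, 100);
  std::uniform_int_distribution<int> pick(0, 6);
  const int choice = pick(rng);
  if (depth == 0 || choice == 6) {
    const int value = leaf(rng);
    return {std::to_string(value), value};
  }
  if (choice == 5) {
    GeneratedExpr arg = random_expr(rng, depth - 1);
    return {"(lognot " + arg.text + ")", ~arg.expected};
  }
  static const char* kOps[] = {"+", "-", "logand", "logior", "logxor"};
  const std::string op = kOps[choice];
  GeneratedExpr lhs = random_expr(rng, depth - 1);
  GeneratedExpr rhs = random_expr(rng, depth - 1);
  return {"(" + op + " " + lhs.text + " " + rhs.text + ")",
          apply_binary(op, lhs.expected, rhs.expected)};
}
```

Seeding std::mt19937 with a fixed value makes a failing program reproducible. Wrapping sub-expressions in let and set! forms would be the next step toward stressing register allocation.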

Reference:

All existing tests are here:
https://github.com/water111/jak-project/tree/master/goal_src/test
A few good example tests that are simple
https://github.com/water111/jak-project/blob/master/goal_src/test/test-div-2.gc
https://github.com/water111/jak-project/blob/master/goal_src/test/test-three-reg-add.gc
https://github.com/water111/jak-project/blob/master/goal_src/test/test-add-function-returns.gc

The code that runs the tests and checks the result is here:
https://github.com/water111/jak-project/blob/master/test/test_compiler_and_runtime.cpp#L178

The list of things the compiler can do is at the top of this file, as the not-commented-out lines of goal_forms:
https://github.com/water111/jak-project/blob/master/goalc/compiler/compilation/Atoms.cpp

Some features are implemented as macros here:
https://github.com/water111/jak-project/blob/master/goal_src/goal-lib.gc

Get rid of `Form`

S-expressions can be represented as either a Form, as part of the pretty printer, or as an Object, as part of the GOOS interpreter. The GOOS Object can be read from a text file/string, can use a TextDB to find which file/line a part of a form came from, has more features, has equality checking, and is easier to use, but doesn't have pretty printing. These should merge into a single Object which can pretty print itself.

This will make it easier for the decompiler and its tests to read/write text files.

  • Add pretty printing to GOOS Object
  • Add option for integers to print as hex/binary
  • Fix pretty printing bug where there are more consecutive parentheses than the maximum line length
  • Add tests for pretty printer (print, read, verify it's the same)
  • Replace use of Form with Object.
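
To make the first two items concrete, here is a small sketch of an object type that prints itself, with an option for hex integers. This Object is a simplified stand-in I wrote for illustration, not the real GOOS Object, and it only does single-line printing (no line breaking, which is where the real pretty printer gets hard). The #x prefix follows the hex notation the REPL output uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

enum class IntFormat { Decimal, Hex };

// Hypothetical, simplified S-expression object: an integer, a symbol, or a list.
struct Object {
  enum class Kind { Integer, Symbol, List };
  Kind kind = Kind::List;
  std::int64_t integer = 0;
  std::string symbol;
  std::vector<Object> items;

  static Object make_int(std::int64_t value) {
    Object obj;
    obj.kind = Kind::Integer;
    obj.integer = value;
    return obj;
  }
  static Object make_symbol(const std::string& name) {
    Object obj;
    obj.kind = Kind::Symbol;
    obj.symbol = name;
    return obj;
  }
  static Object make_list(const std::vector<Object>& children) {
    Object obj;
    obj.kind = Kind::List;
    obj.items = children;
    return obj;
  }
};

// Print an object on one line; integers are decimal or #x-prefixed hex.
inline std::string print(const Object& obj, IntFormat format = IntFormat::Decimal) {
  switch (obj.kind) {
    case Object::Kind::Integer: {
      if (format == IntFormat::Decimal) {
        return std::to_string(obj.integer);
      }
      std::ostringstream out;
      out << "#x" << std::hex << static_cast<std::uint64_t>(obj.integer);
      return out.str();
    }
    case Object::Kind::Symbol:
      return obj.symbol;
    case Object::Kind::List: {
      std::string out = "(";
      for (std::size_t i = 0; i < obj.items.size(); ++i) {
        if (i > 0) out += " ";
        out += print(obj.items[i], format);
      }
      return out + ")";
    }
  }
  assert(false);  // unreachable: every Kind is handled above
  return "";
}
```

A print, read, compare test would then only need the existing reader to produce the same kind of object.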

Reorganize the data, assets, and out folders

It's too confusing.
Proposed new layout:

  • decompiler_out : output of the decompiler that is intended for humans to read.
  • build : cmake C++ build output.
  • assets : output of the decompiler that will be used by the compiler.
  • out: output of the compiler. Will have subfolders. Will have a check.sh script for checking hashes of files against the game for files we expect to match exactly.

Compiler Tests on Windows

Linux compiler tests now work! See example here:
https://github.com/water111/jak-project/blob/w/ir/test/test_compiler_and_runtime.cpp#L92
The test works by running the runtime in a separate thread, and having the Listener connect. The compiler compiles the file to test, sends it to the runtime, and the runtime's KERNEL.CGO executes the test code, and then prints the result of the last thing in the test file. The Listener will record all the PRINT messages sent by the runtime so they can be checked in the test.

Tracking what has to get done for compiler tests to work on windows:

  • Listener working on Windows
  • Deci2Server working on Windows
  • Common sleep function to replace some usleeps
  • Upgraded fake_iso system
  • (Possibly) common function to get paths (GOAL compiler test load files the same way as GOOS tests, and GOOS tests work so I suspect this might work already)
  • Figure out how to use checkboxes on github issues
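
For the common sleep function, something like the following would probably be enough, since the standard library already hides the platform difference. The name sleep_us is a placeholder.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical portable replacement for usleep(): block the calling thread
// for at least `microseconds`, on both Windows and Linux.
inline void sleep_us(std::int64_t microseconds) {
  assert(microseconds >= 0);
  std::this_thread::sleep_for(std::chrono::microseconds(microseconds));
}
```

Callers that currently write usleep(1000) would write sleep_us(1000) instead, with no platform #ifdef at the call site.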

[Decompiler] Revisit object file naming

The convention for naming object files in the decompiler is challenging for a few reasons.

  • Some object files appear multiple times and are identical. There are 2832 object files, but only 2135 unique object files.
  • There are commonly code and art-group files with the same name in the same DGO
  • There are object files located outside of DGOs (STR, TXT)
  • There are object files that are included multiple times in different DGOs. Sometimes they are the same and sometimes they are slightly different.
  • We want a stable naming scheme which gives all object files the same names on different platforms, even when doing a partial decompilation.
  • We need to generate build files that can be given to the OpenGOAL compiler for the compilation order of files, and which files go in which DGOs.

The current implementation handles this and generates a "map file" that can be used when doing a partial decompilation to keep the naming the same. But the implementation is really confusing and it's hard to convince myself that it's correct.

We can get away with ignoring this issue for a while since all the engine code is located in a single DGO and has no weird duplicates, but I want to clean this up when we start going through level data.

Replace printing with a more robust logging library

Excessive printing makes output verbose, especially test output. We should trim out the print statements that don't feel necessary anymore.

For the ones that we want to keep because they are helpful for debugging, we'd ideally like to hide them from normal release builds / etc. Therefore, it would likely be a good idea to adopt a logging library so we can turn off debug-level logs when we want to: https://github.com/gabime/spdlog
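
The core idea is just a minimum level that filters messages before they are written. The sketch below is a standard-library stand-in to show the behavior we want, not spdlog's API; Logger and LogLevel are made-up names, and a real logging library would add formatting, file sinks and thread safety on top.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream is a convenient sink in tests
#include <string>

enum class LogLevel { Debug = 0, Info = 1, Warn = 2, Error = 3 };

// Hypothetical minimal logger: messages below the minimum level are dropped.
class Logger {
 public:
  explicit Logger(std::ostream& sink, LogLevel min_level = LogLevel::Info)
      : m_sink(sink), m_min_level(min_level) {}

  void set_level(LogLevel level) { m_min_level = level; }

  // Returns true if the message was written, false if it was filtered out.
  bool log(LogLevel level, const std::string& message) {
    if (static_cast<int>(level) < static_cast<int>(m_min_level)) {
      return false;
    }
    m_sink << "[" << level_name(level) << "] " << message << "\n";
    return true;
  }

 private:
  static const char* level_name(LogLevel level) {
    switch (level) {
      case LogLevel::Debug: return "debug";
      case LogLevel::Info: return "info";
      case LogLevel::Warn: return "warn";
      case LogLevel::Error: return "error";
    }
    assert(false);  // unreachable: every level is handled above
    return "unknown";
  }

  std::ostream& m_sink;
  LogLevel m_min_level;
};
```

Tests would construct the logger with a higher minimum level (or a discarded sink) so the debugging prints stay in the code without flooding the output.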

Investigate Windows Compiler Warnings

We have no warnings on Linux, but tons of warnings on Windows. See https://github.com/water111/jak-project/runs/1106293451?check_suite_focus=true

There are a few that are suspicious to me:

  • warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc. I think we need the /EHsc flag in more places or we might have more issues similar to the Reader exception thing.
  • not all control paths return a value - In Linux, the compiler figures out that a function ending in assert(false) doesn't need to return anything, but in Windows it doesn't. We should probably return some value here or throw an exception.
  • warning C4477: 'printf' : format string '%ld' requires an argument of type and similar printf warnings. Printing integers seems to require slightly different flags. When possible we should use libfmt or spdlog instead.
  • function assumed not to throw an exception but does in format_impl. It's really important that format_impl not throw exceptions because it's called from GOAL and C++ code can't figure out how to unwind the GOAL stack.
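
Two of these have fairly mechanical fixes, sketched below. SegmentKind, segment_name and format_byte_count are invented for the example. Note that throwing is only appropriate in code that is never called from GOAL; for something like format_impl, returning a fallback value would be the safer choice.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical enum used only for this example.
enum class SegmentKind { Main, Debug, TopLevel };

// Every control path either returns or throws, so MSVC no longer reports
// "not all control paths return a value" the way it does after assert(false).
inline const char* segment_name(SegmentKind kind) {
  switch (kind) {
    case SegmentKind::Main: return "main";
    case SegmentKind::Debug: return "debug";
    case SegmentKind::TopLevel: return "top-level";
  }
  throw std::runtime_error("unknown SegmentKind");
}

// PRId64 expands to the correct printf conversion for std::int64_t on each
// platform, avoiding the '%ld' mismatch where long is only 32 bits on Windows.
inline std::string format_byte_count(std::int64_t bytes) {
  char buffer[64];
  const int written = std::snprintf(buffer, sizeof(buffer), "%" PRId64 " bytes", bytes);
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  (void)written;  // silence the unused-variable warning when asserts are disabled
  return std::string(buffer);
}
```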

Pretty printer test is very slow in windows only

The test PrettyPrinter.ReadAgainVeryShortLines took 8 seconds on Windows, but passed. This test reads and pretty prints a single file, and 8 seconds seems pretty long. In comparison, this test takes 0.285 seconds on my desktop.

This could be a sign that something is wrong on windows or maybe unoptimized windows builds are just insanely slow.

[Listener] Use message IDs

I believe the GOAL listener message format supports message IDs, to associate ack messages with their request. I didn't understand this when reverse engineering the listener so OpenGOAL's listener uses 0 for all message IDs.

This is probably fine, but it prints a harmless warning if you time out due to hitting a breakpoint (the listener gives up waiting for the ack), then resume, and then receive the ack. If we tracked message numbers, we wouldn't warn for timed-out acks arriving late.
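
The bookkeeping for this could look roughly like the sketch below. AckTracker and AckResult are hypothetical names, and the 64-bit ID width is an assumption; the real width depends on the listener message header.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical helper, not part of the existing listener code.
enum class AckResult { Expected, LateAfterTimeout, Unknown };

class AckTracker {
 public:
  // Allocate a fresh ID for an outgoing message and remember we are waiting on it.
  uint64_t send() {
    uint64_t id = m_next_id++;
    m_waiting.insert(id);
    return id;
  }

  // Called when we give up waiting, for example because the target hit a breakpoint.
  void timed_out(uint64_t id) {
    if (m_waiting.erase(id) > 0) {
      m_timed_out.insert(id);
    }
  }

  // Classify an incoming ack by the ID it carries.
  AckResult on_ack(uint64_t id) {
    if (m_waiting.erase(id) > 0) {
      return AckResult::Expected;
    }
    if (m_timed_out.erase(id) > 0) {
      return AckResult::LateAfterTimeout;  // known late ack, no warning needed
    }
    return AckResult::Unknown;  // genuinely unexpected, still worth a warning
  }

 private:
  uint64_t m_next_id = 1;  // start at 1 because 0 is what every message uses today
  std::unordered_set<uint64_t> m_waiting;
  std::unordered_set<uint64_t> m_timed_out;
};
```

With this, the listener would only print the warning for AckResult::Unknown.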

Tests didn't run on a PR

Tests didn't run on #21
There's a merge conflict, but that didn't seem to prevent tests from running on other PRs.

@xTVaser - any ideas? I'm not sure what's different about this PR.

Make multiplication/division/mod match the PS2 Hardware

GOAL seems to have 4 operations:

  • Signed Multiply (mult, 3 operand form)
  • Unsigned Multiply (multu, 3 operand form)
  • Divide (div and mflo)
  • Mod (div and mfhi)

We should try to match the behavior of the PS2 exactly with our implementation. It would be great to find a test case for PS2 emulators or look at PCSX2's implementation.

Also we should check float to int and int to float.
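
As a starting point for the div/mod case, here is a minimal C++ sketch. In C++, dividing by zero and dividing INT32_MIN by -1 are undefined behavior, so both must be handled explicitly. The values returned for those two cases are assumptions based on emulator behavior as I understand it and are not confirmed anywhere in this issue; they need to be verified against real hardware or PCSX2. DivResult and mips_div_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct DivResult {
  int32_t lo;  // quotient, the value mflo would read
  int32_t hi;  // remainder, the value mfhi would read
};

DivResult mips_div_sketch(int32_t rs, int32_t rt) {
  if (rt == 0) {
    // ASSUMPTION (unverified): quotient is 1 or -1 depending on the sign of
    // the dividend, and the remainder is the dividend.
    return {rs < 0 ? 1 : -1, rs};
  }
  if (rs == std::numeric_limits<int32_t>::min() && rt == -1) {
    // ASSUMPTION (unverified): the overflow case wraps to INT32_MIN with remainder 0.
    return {std::numeric_limits<int32_t>::min(), 0};
  }
  // Normal case: C++ integer division truncates toward zero and the
  // remainder takes the sign of the dividend.
  return {rs / rt, rs % rt};
}
```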

Add a function to get file paths

It would be nice to have an easy way to get a path to a file inside of jak-project that works in any of the C++ tools and takes care of any windows/linux differences. I think this should be written in common/util and there should be a util library so all the programs can use it.

One possible version could work like this:

std::string path = get_file_path("goal_src", "kernel", "gkernel.gc");

and it would return an absolute file path to goal_src/kernel/gkernel.gc that's formatted correctly on Windows and Linux. It could use the NEXT_DIR environment variable that will be set in scripts that launch tools, or maybe there is a better way to do this.
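
A minimal sketch of this idea, assuming C++17 and std::filesystem are available. It takes the components as a braced list rather than as separate arguments, which is a simplification. The helper join_under_root and the choice to throw when NEXT_DIR is unset are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <initializer_list>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Join path components under an explicit root, using the platform's
// preferred separator. Hypothetical helper.
std::string join_under_root(const fs::path& root, std::initializer_list<std::string> parts) {
  fs::path result = root;
  for (const auto& part : parts) {
    result /= part;
  }
  return result.make_preferred().string();
}

// Sketch of get_file_path. The project root comes from the NEXT_DIR
// environment variable; throwing when it is unset is an assumption.
std::string get_file_path(std::initializer_list<std::string> parts) {
  const char* root = std::getenv("NEXT_DIR");
  if (root == nullptr) {
    throw std::runtime_error("NEXT_DIR is not set");
  }
  return join_under_root(fs::absolute(root), parts);
}
```

Usage would then be get_file_path({"goal_src", "kernel", "gkernel.gc"}).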

Once this is in place, we should remove the FAKE_ISO_PATH variable that's set/read in various scripts, and use this new function in fake_iso.cpp.

[Debugger] Fancier disassembly

The debugger needs a fancier disassembly view. Ideally it would determine what function we're in and use this to correctly find the instruction boundaries. If not possible, it should do a better job and take into account the current rip if we're halted.

The disassembler should also "unpatch" breakpoints so you don't see int3's, and subtract 1 from rip when stopped at a breakpoint.
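
The unpatching step could be sketched as follows. It assumes the debugger keeps a map from breakpoint address to the original byte it overwrote; that bookkeeping and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr uint8_t kInt3 = 0xCC;  // x86 int3 opcode byte

// Return a copy of `code` (read from the target starting at `base_addr`)
// with every breakpoint byte replaced by the original byte that was saved
// when the breakpoint was inserted.
std::vector<uint8_t> unpatch_breakpoints(const std::vector<uint8_t>& code,
                                         uint64_t base_addr,
                                         const std::unordered_map<uint64_t, uint8_t>& saved_bytes) {
  std::vector<uint8_t> result = code;
  for (const auto& [addr, original] : saved_bytes) {
    if (addr >= base_addr && addr - base_addr < result.size()) {
      assert(result[addr - base_addr] == kInt3);  // we expect our own patch here
      result[addr - base_addr] = original;
    }
  }
  return result;
}

// After int3 executes, rip points one byte past the breakpoint, so the
// breakpoint's own address is rip - 1.
uint64_t breakpoint_address_from_rip(uint64_t rip) {
  return rip - 1;
}
```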

The disassembler should print out some info for references to GOAL symbols.

[Compiler] Doing math on floats loaded from memory is broken

(defun vector-dot ((a vector) (b vector))
   (let ((result 0.))
     (+! result (* (-> a x) (-> b x)))
     (+! result (* (-> a y) (-> b y)))
     (+! result (* (-> a z) (-> b z)))
     )
   result
   )

This should load x, y, z into XMM registers, but it doesn't. There is also a compiler error. It's kinda disappointing that something so simple is broken 😞

[Debugger] Crash from threads other than the GOAL thread cause chaos

If a thread in the runtime other than the GOAL thread crashes (like the IOP C++ code), the debugger goes crazy and spams error messages. I have no idea what the behavior of ptrace is in this case, so it's likely not being handled correctly when the process being debugged disappears.

[Compiler] Add utility functions for building GOAL functions

There's a lot of weird things you have to do to make a function that actually works, like IR_FunctionBegin. All this logic is duplicated in compile_defmethod and compile_lambda and when auto-generating inspect methods.

It would be better if there were some helper functions to set up a new function env and "finish" a function env that could be shared in all three.

Decompiler tests

The decompiler currently has no tests, which will soon start to be a problem. We could have tests that run as part of github actions and don't require access to the CGO/DGO files, and we could have tests which don't run on github actions because we cannot access CGO/DGO files from github actions.

The most useful test would be to check a few of the outputs of the decompiler. Most of the decompiler output is a Form, which is an s-expression that can be pretty printed. The Form representation sucks, and I plan to replace it with GOOS Object, so the test should convert the actual and expected values to GOOS Objects, which already have a working operator== to check for equality.

The tests which run on github without access to the game files could do basic things, like checking the result of BasicOpBuilder or checking the result of InstructionDecode. They could be part of the existing goalc-test.

I plan to set up the test framework after #60 is merged so we have actual IR to test against.

General Compiler Robustness to Bad Input

Mega-Issue to track all cases where the compiler is not strict enough at generating errors:

  • Parameter name duplicated
  • Use of int vs sized integers in function types
  • Label name spaces when inline with (inline x)
  • Redefining GOOS macros
  • Redefining constants (GOAL and GOOS?)
  • Putting things that aren't numbers in comparisons
  • Object size (32 vs 64 bits)
  • Adding fields after a dynamic field, either in the same type or a child type.
  • GOOS macros with the wrong number of arguments in rest args
  • Forward declare a type and lie about what kind of type it is

Missing Features for `gkernel`

Tracking things we need before we can compile gkernel in the new compiler.

  • Defmethod
  • Method support
    • Get type of boxed object
    • Get method of basic
    • Get method of unboxed
    • Call method
    • Define custom new methods
    • (new 'global ...) support
  • Binteger support
  • Number conversions
  • Type coerce
  • the and the-as
    • Alias
    • Number conversion
  • -> operator
    • Access fields
      • Deref fields
      • Offset fields
    • Set Places
      • Constant Offset Places
      • Other Places
    • Deref pointers
    • Array access
  • & operator
  • Pair Support
    • cons
    • static pairs (maybe not needed?)
    • car and cdr and Vals for pairs
    • '()
  • Inline asm
    • rlet
  • Calling GOAL from C runs GOAL on a stack in GOAL's memory space.
  • (new 'static ...) for basics
  • (new 'global 'inline-array ...)
  • Support compound typespecs in new
  • and and or
  • Lots of stuff from gcommon
    • Check the order of evaluating things for everything in gcommon
  • Automatic Inspect Methods

Nice to have, but not essential

  • defenum and enums
  • break
  • GOAL Crash Handler
  • addr2line
  • Null pointer deref causes a crash
  • Safety Features: #50
  • Use a single RegKind type which also includes 32/128 bits and float/int for xmms.

GOAL Debugging Mega-Issue

Debugging GOAL code sucks. If you have a bug and it crashes, there's no way to figure out where the crash is coming from.

  • Implement addr2line - take an address and figure out where the code/data is from. Because of GOAL's fancy loading stuff, this is tricky. But the original GOAL did this and has the framework to notify the compiler of where things are loaded.
  • Make GOAL crash when dereferencing a null pointer. Currently you can read/write null GOAL pointers.
  • Implement a GOAL crash handler that prints out something useful.
  • Investigate writing a GOAL debugger (maybe using ptrace on linux) or somehow making GDB work for GOAL.

The pretty printer doesn't work very well

Eventually we will look at the output of the decompiler a lot, which will go through the pretty printer. The pretty printer output is better than nothing but still has some issues. Examples of running the pretty printer on existing code:

  (defun type-type? ((a type) (b type))
   "is a a type (or child type) of type b?"
   (until
    (eq? a object)
    (if (or (eq? a b) (zero? a)) (return-from #f #t))
    (set! a (-> a parent))
    )
   #f
   )

  (defmethod
   new
   inline-array-class
   ((allocation symbol) (type-to-make type) (cnt int))
   "Create a new inline-array.  Sets the length, allocated-length to cnt.  Uses the mysterious heap-base field
  of the type-to-make to determine the element size"
   (let*
    ((sz (+ (-> type-to-make size) (* (-> type-to-make heap-base) cnt)))
     (new-object (object-new (the int sz)))
     )
    (unless
     (zero? new-object)
     (set! (-> new-object length) cnt)
     (set! (-> new-object allocated-length) cnt)
     )
    new-object
    )
   )

  (defun mem-set32! ((dst pointer) (value int) (n int))
   "Memset a 32-bit value n times.  Total memory filled is 4 * n bytes."
   (let
    ((p (the pointer dst)) (i 0))
    (while
     (< i n)
     (set! (-> (the (pointer int32) p) 0) value)
     (&+! p 4)
     (+1! i)
     )
    )
   dst
   )
  )

Some of these could be solved by adding special cases for certain key words/structures, but the fundamental problem may be that my pretty printer algorithm isn't very good. If this starts to become a more serious issue, we should investigate other options. I think even a "perfect" printing algorithm that found the optimal pattern of inserting line breaks would not do the right thing - there are lots of cases where line breaks are inserted to make the code more readable even if they aren't needed. Maybe the decompiler needs to give the printer hints on where lines should be broken. For instance, the body of a let should always be on the line after a let.

Calling convention issues

The make_function_from_c_win32 function inserts a trampoline function on the GOAL heap which scrambles the arguments from GOAL order to windows order, then calls the C function. It's designed for calling C functions directly from GOAL, and supports only 4 arguments.

This goes wrong when GOAL calls _format, with the following sequence:

  • GOAL _format is created from make_function_symbol_from_c("_format", (void*)_format_win32);
  • This starts with the argument-scrambling trampoline function (args now in windows order)
  • This calls _format_win32, which expects the arguments in GOAL order.
  • This calls format_impl and passes it an array of arguments on the stack, in the wrong order.

I think the solution is to make a separate make_format_function_from_c which doesn't do the argument scrambling.
We should also double-check the GOAL/C calling conventions on both Windows and Linux, document the saved/temp registers in each language, and make sure no registers could accidentally be overwritten.

We'll know this is fixed when we can enable the format test on both linux/windows and have it pass.

[Compiler] Inspect in the REPL calls the function, not the method.

If you do inspect in the REPL it calls the inspect function in gcommon. This doesn't know how to inspect structures. In this case it would be better to call the inspect method of the structure.

Current behavior:

;; calls the inspect function, which interprets the aligned address as a boxed integer.
;; the inspect function only works on boxed objects, and vector is not boxed.
gc> (inspect (new 'global 'vector))
[          17f200] boxed-fixnum 196160
1569280

;; you can manually grab the inspect method and it works like you would expect
;; note that vector has a custom inspect method
gc> ((method vector inspect) (new 'global 'vector))
[0017f210] vector
	[      0.0000] [      0.0000] [      0.0000] [      0.0000]
1569296

A few possible solutions:

  • remove the inspect function, I don't think it's used and I don't know why it exists.
  • make method calls happen above function calls (this is tough, you have to figure out the type of target without compiling, which we currently don't support).
  • Special case inspect/print to always be methods in the compiler.

I want to think about this a little longer before making a change - there might be a good way to reorganize the function/method call code to make this easier to implement.

Add a system for comparing file timestamps

Eventually the project will get large enough that we don't want to rebuild the entire game each time we change one thing. We could have a utility for skipping part of the build if the source is unchanged since the last build. We could detect this by comparing the timestamp of the source and output files. If the source is newer than the output, we will need to rebuild.

We'll need a function to compare the timestamps on two files (possibly with the output file not existing yet) and determine which is newer. It may need to be different on windows and linux.
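
If the tools can use C++17, std::filesystem may remove the need for separate Windows and Linux code. A minimal sketch, where needs_rebuild is a hypothetical name:

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Returns true if `output` does not exist yet, or if `source` was modified
// more recently than `output`.
bool needs_rebuild(const fs::path& source, const fs::path& output) {
  if (!fs::exists(output)) {
    return true;  // nothing built yet
  }
  return fs::last_write_time(source) > fs::last_write_time(output);
}
```

This only compares one source against one output. A file that depends on other files (for example through macros defined elsewhere) would need its dependencies checked too, which this sketch does not attempt.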

OpenGOAL Debugger - Tracking

The compiler/debugger needs to "track" a few events in the runtime in order for debugging to be possible:

  • initialization, including the "real address" to "GOAL address" offset and the location of the GOAL symbol table
  • Object file segment load
  • Object file unload.

The `all-types.gc` file is annoyingly large

The decompiler uses the types defined in all-types.gc to figure out what fields are being accessed and function types. There's also a compiler test that reads this file and makes sure that these types are compatible with whatever we have in our source code.

But there are 1372 types, which makes the file quite large. It's currently 39,000 lines long and may get longer in the future. This is large enough that poorly written text editors get laggy and github won't display the diff or even the file.

The easy solution is just to split up the file into a bunch of smaller parts. Then all-types.gc would just include each of the smaller files. We'd need to add some sort of #include like functionality to the decompiler type file parser but this should be easy. I don't think this is ideal - there's no super clean way to break up the types into separate groups, and it looks kinda weird to have types1.gc, types2.gc, ...

Another solution might be to turn this into some JSON thing and write a GUI for editing it. This makes adding comments and "ctrl-f searching" harder and might be more frustrating in the long term.

[Runtime Bug] Bad function argument in GOAL -> C call

(defmethod call rpc-buffer-pair ((obj rpc-buffer-pair) (fno uint) (recv-buff pointer) (recv-size uint))
  "Call an RPC. This is an async RPC. Use check-busy or sync to see if it's done."
  (when (!= 0 (-> obj current elt-used))
    ;; when we have used elements
    (format 0 "call rpc-buffer-pair with ~D elts~%" (-> obj current elt-used))
    
    ;; make sure the previous buffer is done
    (let ((active-buffer (if (= (-> obj buffer 0) (-> obj current))
                             (-> obj buffer 1)
                             (-> obj buffer 0))))
      (when (-> active-buffer busy)
        ;; we think the active buffer may be busy.
        ;; first lets just do a simple check
        (cond 
          ((!= 0 (rpc-busy? (-> obj rpc-port)))
           ;; busy! print an error and stall!
           (format 0 "STALL: waiting for IOP on RPC port #~D~%" (-> obj rpc-port))
           (while (!= 0 (rpc-busy? (-> obj rpc-port)))
             (+ 1 2 3)
             )
           )
          (else
            ;; not busy.
            (set! (-> active-buffer busy) '#f)
            (set! (-> active-buffer elt-used) 0)
            )
          )
        )
      ;; now we've cleared the last RPC call, we can do another
      (let ((current-buffer (-> obj current)))
        ;; rpc_channel, fno, async, send_buff, send_size, recv_buff, recv_size
        (format 0 "recv-size is ~D~%" recv-size)
        (rpc-call (-> obj rpc-port)
                  fno
                  (the uint 1)
                  (the uint (-> current-buffer base))
                  (the int (* (-> current-buffer elt-used) (-> current-buffer elt-size)))
                  (the uint recv-buff)
                  (the int recv-size)
                  )
        (set! (-> current-buffer busy) '#t)
        (set! (-> obj last-recv-buffer) recv-buff)
        (set! (-> obj current) active-buffer)
        )
      )
    )
  0
  )

The final argument to the call to rpc-call ends up being junk.

General Compiler Missing Features

Add auto-generated inspect methods
When you declare a type in GOAL, it generates an inspect method that prints out the name and value of each field.
You can run the decompiler to get an idea of what these normally look like, but they usually print out the type name, location of the object, then a line for each field containing the name and value.

Types are declared with deftype, implemented in compile_deftype. Currently this just generates code to call the new method of type to inform the runtime of the type, which will generate the type's method table. We should have the compiler generate an inspect method, then call method-set! to add the method to the type.

Implementation:

  • Run the decompiler and take a look at some inspect methods to learn what they look like. (make sure you have #81 so the strings show up)
  • Add a generate_inspector_for_field() which takes a Field and a Type, and generates code to call format with an appropriate format string to print the name and value of the field.
  • Add a generate_inspector_for_type() that creates a function which inspects all fields of the type.
  • Add code to compile_deftype to generate the inspector method.
  • Add code to compile_deftype to insert a call to method-set! to add the inspector method to the runtime method table.

Add decompiler features to the type system
The type system is shared between the compiler and the decompiler, but there are some features missing for the decompiler. The decompiler needs to be able to ask "what is stored at an offset of 24 bytes from string"? The opposite of this is already implemented for the compiler. This can be a little tricky due to the many confusing types of arrays/pointers/inline arrays in GOAL.
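
The simplest form of this reverse lookup could be sketched as below. FieldSketch is a simplified, hypothetical stand-in for the real type-system classes, and the sketch ignores the hard parts mentioned above (arrays, pointers, inline arrays, and nested structures).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified field description.
struct FieldSketch {
  std::string name;
  std::string type;
  int offset;  // bytes from the start of the object
  int size;    // size of the field in bytes
};

// Find the field whose byte range [offset, offset + size) contains the
// requested offset, if there is one.
std::optional<FieldSketch> field_at_offset(const std::vector<FieldSketch>& fields, int offset) {
  for (const auto& field : fields) {
    if (offset >= field.offset && offset < field.offset + field.size) {
      return field;
    }
  }
  return std::nullopt;
}
```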

Add a better warning/error system
It would be nice if the compiler errors/warnings had a consistent style and gave useful errors.

Add bitfield types
Types which are children of integers are bitfield types. They work similarly to normal GOAL types, but all the field sizes/offsets are in terms of bits instead of bytes.
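
Reading and writing a field given a bit offset and bit size comes down to shifting and masking. A minimal sketch on a 64-bit value, assuming bit 0 is the least significant bit and fields are unsigned; the function names are hypothetical and say nothing about the code the compiler should actually emit.

```cpp
#include <cassert>
#include <cstdint>

// Build a mask with the low `size` bits set. Shifting by 64 would be
// undefined behavior, so a full-width field is handled separately.
uint64_t low_bits_mask(int size) {
  assert(size > 0 && size <= 64);
  return (size == 64) ? ~uint64_t{0} : ((uint64_t{1} << size) - 1);
}

// Read an unsigned field of `size` bits starting at bit `offset`.
uint64_t extract_bitfield(uint64_t value, int offset, int size) {
  assert(offset >= 0 && offset + size <= 64);
  return (value >> offset) & low_bits_mask(size);
}

// Return `value` with the field at [offset, offset + size) replaced by `field`.
uint64_t insert_bitfield(uint64_t value, int offset, int size, uint64_t field) {
  assert(offset >= 0 && offset + size <= 64);
  uint64_t mask = low_bits_mask(size);
  return (value & ~(mask << offset)) | ((field & mask) << offset);
}
```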

Add enums
GOAL clearly supported enums, but the exact syntax is unknown. Make up a syntax for enums and add it.

Some notes on the compiler:

  • All of the compiler state is stored in a Compiler, from goalc/compiler/Compiler.h

    • m_ts is the type system
  • All of the high level implementation of keywords is in the goalc/compiler/compilation folder.

    • These are functions like compile_goto or compile_set, which are methods of Compiler.
    • The main map of keywords to compiler functions is in Atoms.cpp
  • A Type represents a single, non-compound GOAL type and stores information about the type.

  • A TypeSpec is a reference to a Type or a Compound Type (like (pointer integer)).

    • You can get a typespec with m_ts.make_typespec("name-of-type")
    • You can create compound typespecs for pointer with things like m_ts.make_pointer_typespec

Implement Safety Features on `Val`

Val should have a flag for settable that defaults to false, to avoid the compiler accidentally setting the value of temporary or intermediate values when we really meant to be setting some other more important value, like something in memory. Compilation functions which propagate enough information to correctly set the source (like MemoryDerefVal or register aliases) should set settable to true.

Val should also have a one_time_use flag that defaults to true. Some Vals represent "a way to get a value", and know how to emit code to get this value, but they should only be able to do this once. GOAL seems to only ever generate code where this happens 0 or 1 times. For example:

(set! (-> my-object my-field) 2) will read (-> my-object my-field) 0 times.

and

(let ((x (-> my-object my-field)))
  (print x)
  (print x)
  )

will read (-> my-object my-field) exactly 1 time, in the let.
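
The two flags could be enforced roughly like this. ValSketch is a hypothetical, heavily simplified stand-in for Val, and reporting violations with exceptions is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

class ValSketch {
 public:
  explicit ValSketch(std::string description) : m_description(std::move(description)) {}

  // Opt in to being a valid target of set!, e.g. for memory derefs or register aliases.
  void mark_settable() { m_settable = true; }

  // Opt out of the one-time-use restriction.
  void allow_multiple_uses() { m_one_time_use = false; }

  // Call whenever code is emitted to read this value.
  void use() {
    if (m_one_time_use && m_used) {
      throw std::runtime_error(m_description + " was read more than once");
    }
    m_used = true;
  }

  // Call before emitting code that sets this value.
  void check_settable() const {
    if (!m_settable) {
      throw std::runtime_error(m_description + " is not settable");
    }
  }

 private:
  std::string m_description;
  bool m_settable = false;     // defaults to false
  bool m_one_time_use = true;  // defaults to true
  bool m_used = false;
};
```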

Function and Object Name issues in the decompiler

There are two issues with the decompiler naming conventions that came up when playing with the GUI json stuff:

  • Some function names are duplicated. This happens when the game legitimately defines the same function in two places.
  • The naming of object files isn't consistent; it depends on what input the decompiler gets. For example, if you process GAME + the level DGOs, there are object files with the same name but different contents, so it will append version names to each. But if you process just GAME, you don't get these version names. The solution is to have a master name list (based on the full game's set of objects) and pull names from it, so we get consistent names even when we don't process the whole input.

OpenGOAL Debugger - Address to Line Mapping

OpenGOAL will probably need a debugger. One of the first steps to implementing a debugger is to figure out how to map from code to an offset into an object file. The reverse mapping is also useful - go from an offset in an object file to a line of code.
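
The offset-to-line direction could be sketched as a sorted table searched with std::upper_bound. LineEntrySketch and line_for_offset are hypothetical names, and the assumption is that the compiler records one entry for each point where the source line changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// One entry per point in the object file where a new source line begins.
struct LineEntrySketch {
  uint32_t offset;  // byte offset into the object file
  int line;         // source line that produced the code starting here
};

// `entries` must be sorted by offset. Returns the line of the last entry
// whose offset is <= the query, or nothing if the query precedes all entries.
std::optional<int> line_for_offset(const std::vector<LineEntrySketch>& entries, uint32_t offset) {
  auto it = std::upper_bound(
      entries.begin(), entries.end(), offset,
      [](uint32_t value, const LineEntrySketch& entry) { return value < entry.offset; });
  if (it == entries.begin()) {
    return std::nullopt;
  }
  return std::prev(it)->line;
}
```

The reverse mapping (line to offset) could use the same table keyed by line instead.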
