
gdl-perf

This is a framework for testing the performance of Game Description Language (GDL) interpreters and reasoners used in General Game Playing. It allows for automatically running tests on a wide variety of reasoners across a wide variety of games, with minimal human intervention. It also supplies tools for analyzing the outputs of these tests.

This was inspired by an article by Yngvi Björnsson and Stephan Schiffel in the GIGA '13 proceedings that tested the performance of various such interpreters. It is designed so that tools from different programming languages can be tested. As a benefit of being open-source, programmers can add hooks for their own creations, or at least check that their code is being tested correctly. Having the results in a standardized format also means that analyses and visualizations can be created once, committed to the code base, and regenerated for future sets of results.

All games available on ggp.org are used in testing. Game IDs include the originating repository, the key within the repository, and a hash of the game contents, so no two versions of a game will be conflated.

There is also some preliminary work on a tool for testing the correctness of reasoners, which the performance-testing framework does not check.

How to use

Currently, before using the framework, you must provide a ggp-base.jar file. Check out a recent version of the ggp-base repository, run "./gradlew assemble" in the ggp-base folder, and copy ggp-base/build/libs/ggp-base.jar into gdl-perf/lib. (In the future this can be converted into a different kind of dependency.)
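
Concretely, assuming the standard location of the ggp-base repository on GitHub, the steps look roughly like this:

    git clone https://github.com/ggp-org/ggp-base.git
    cd ggp-base
    ./gradlew assemble
    cp build/libs/ggp-base.jar ../gdl-perf/lib/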

Before running tests, you must create a file named "computerName.txt" in the root of the repository. This should contain a name to indicate your computer; a recommended format is your GitHub username followed by a numeral identifying the specific computer. This makes it possible to commit results from your hardware without conflicting with other users' results. (This in turn makes it possible to commit raw results of testing without making the engine public, which some GGP competitors would like to avoid.)
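
For example, the hypothetical GitHub user jdoe might put the following single line in computerName.txt for their second test machine:

    jdoe-2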

Some engines require additional setup or configuration that is specific to your system (e.g. installing libraries or specifying program locations). These are explained in the SETUP.md file.

Gradle commands

The following commands can be run via the Gradle wrapper script (replace 'gradlew' with 'gradlew.bat' if running on Windows):

./gradlew eclipse          | Generate files needed to import this project in the Eclipse IDE.
./gradlew idea             | Generate files needed to import this project in the IntelliJ IDE.
./gradlew runPerfSample    | Run a sample selection of perf tests. (Takes under an hour.)
./gradlew runPerfAll       | Run perf tests for all untested game/engine pairs. (Takes multiple days, but progress is recorded as it goes.)
./gradlew perfAnalysis     | Generate a set of web pages containing an analysis of all perf tests that have been run. This is placed in the "analyses" folder.

Adding an engine

Users are encouraged to add their own engines. There are two approaches to adding an engine, one specific for Java-based engines and one that will work with any language.

Java engines

Java engines can be added as an entry in the JavaEngineType class, which can then be used to construct the corresponding EngineType entry. Once an EngineType entry is available, the engine will be automatically tested by the MissingEntriesPerfTestRunner (a.k.a. runPerfAll).

The first step to accomplishing this is to add your code as a dependency of the gdl-perf project, so that gdl-perf code can reference your engine's classes. This can be done in a few ways. From easiest to understand to most complete:

  • You can copy your code into the project directly. This is not suitable for submitting to the repository, and it will require you to re-copy your code whenever you change it, but this requires relatively little understanding of the build system. (In some cases, however, you may need to add dependencies that the original project used.)
  • You can tell your IDE to have the gdl-perf project depend on your engine's project. If you do this, you'll have to run the performance testing from your IDE instead of from the command line.
    • In Eclipse, this involves right-clicking on the gdl-perf project -> Build Path -> Configure Build Path... -> Projects tab and clicking "Add..." to add the dependency on the other project.
  • You can add a jar created from the engine's codebase to a subdirectory of lib/ and add an entry to the build.gradle's dependencies section to include the library, next to the other jars in there. This will allow you to use Gradle to run the code, and it's the bare minimum for contributing back to the repository. Note, however, that you may need to add additional dependencies to support the code.
    • If you are developing an engine within GGP-Base, you can just use that version of GGP-Base when creating the ggp-base.jar to use locally. Otherwise, you may have to rename the packages in the project (there are tools like jarjar that can do this automatically) to avoid conflicts with the existing GGP-Base project.
  • You can publish the jar with your code on Maven Central and add it as a non-file-based Gradle dependency. (Please use a fixed version, not a '+' version.) This is the best option for adding an engine to the repository, if possible.

Now that you can reference your engine in the gdl-perf code, you'll need to create three things: a JavaSimulatorWrapper implementation for your engine, a JavaEngineType entry, and an EngineType entry.

Make a Runnables class that contains a method returning a JavaSimulatorWrapper backed by your engine. Select types for the Engine, State, Role, and Move generics that suit your engine. This should expose all the functionality of your engine in a uniform way. See StateMachineRunnables, GameSimulatorRunnables, JavaSimulatorRunnables, and similarly-named classes for examples.
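
For illustration, a skeleton might look like the following. This is a non-compiling sketch: MyEngine, MyState, MyRole, and MyMove are hypothetical stand-ins for your engine's own classes, and the exact set of methods JavaSimulatorWrapper requires is not shown here, so consult the existing Runnables classes for the real signatures.

    public class MyEngineRunnables {
        // MyEngine, MyState, MyRole, and MyMove are hypothetical stand-ins
        // for your engine's own classes.
        public static JavaSimulatorWrapper<MyEngine, MyState, MyRole, MyMove> getWrapper() {
            return new JavaSimulatorWrapper<MyEngine, MyState, MyRole, MyMove>() {
                // Implement each JavaSimulatorWrapper method here by
                // delegating to MyEngine: loading a game from GDL, producing
                // the initial state, checking terminality, listing legal
                // moves, computing successor states, and reading goal values.
            };
        }
    }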

Then, add an entry to the JavaEngineType enum that has as its arguments a version string identifying the version of the engine and an instance of the wrapper.

Finally, add an EngineType entry with the JavaEngineType as an argument.

At this point, you can start testing your engine. Try adding it to the list of engine types in the SamplePerfTestRunner as a starting point.

Adding an engine in a language-agnostic way

To test an engine in a non-Java language, you'll need to write an executable or script that adheres to the rules described in the "Performance testing" section below. Generally, it's best to put that script in the code repository for the engine being tested, not gdl-perf itself.

Within the repository, you'll need to create an EngineEnvironment and an EngineType entry.

  • The EngineEnvironment specifies the working directory that the test should be run from (if relevant) and what values should be picked up from the localConfig.prefs files and set as environment variables. For example, the FLUXPLAYER_PROLOG engine uses one configuration setting to set the working directory (the location of the Fluxplayer codebase), and another to set an environment variable used by the perf-test script (the location of the ECLiPSe executable on the user's file system); a hypothetical example of such configuration follows this list.
  • The EngineType specifies what EngineEnvironment to use, what command to run to start a performance test process for the engine, and what to use as the working directory for that process.
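
As a purely hypothetical illustration of that Fluxplayer configuration (the actual key names are defined by the EngineEnvironment in the code, not here):

    # localConfig.prefs -- key names below are illustrative only
    FLUXPLAYER_DIR=/home/me/fluxplayer
    ECLIPSE_BIN=/usr/local/bin/eclipse-prolog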

Once the EngineType entry has been added, you can start testing your engine. Try adding it to the list of engine types in the SamplePerfTestRunner as a starting point.

Performance testing

Performance tests are run in their own process, one per game. Interactions with the framework use the command line and files written in a standard format. This has two advantages:

  1. The test framework can be used to run and test engines in different languages, including those which are not JVM-based.
  2. Instability in the engine, whether in the form of crashes, infinite loops, or otherwise, will affect neither the test framework nor subsequent tests. The test process will be killed if it takes an excessive amount of time, and then the framework can move on.

Performance test process specification

The following command line arguments are given to the test process:

  1. The location of the GDL file containing the game to be played, in the form of an absolute local-system file path.
  2. The location of the file to which results are to be recorded, in the form of an absolute local-system file path.
  3. The target number of seconds to run the test.

The command line to run is specified in the EngineType enum entry for the engine. Note that additional arguments may be used in the command line; these will precede the other arguments. For example, the implementation for Java-based engines (PerfTestProcess) takes an initial argument specifying the ID of the engine to be tested.
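
For example, a hypothetical non-Java engine whose perf-test script is named runPerfTest.sh might be invoked as:

    ./runPerfTest.sh /home/me/gdl-perf/games/someGame.kif /tmp/gdl-perf-results.txt 30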

The test process should run the performance test as follows: repeatedly get the initial state; check whether the state is terminal; if not, get the available legal moves, pick a random move for each player, and use those moves to compute the next state; repeat until a terminal state is reached; then compute the goal values for each player in that state. (The initial state may optionally be generated only once.) The Java implementation of this is in JavaPerfTestRunnable.
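
A rough sketch of that loop in Java follows; the engine type and methods named here are assumptions standing in for whatever your wrapper exposes, so see JavaPerfTestRunnable for the real version:

    // A minimal sketch of the playout loop. The MyEngine type and its
    // methods here are assumptions, not the actual wrapper API.
    static void runPerfTest(MyEngine engine, long targetSeconds) {
        Random random = new Random();
        long startTime = System.currentTimeMillis();
        long numStateChanges = 0;
        long numRollouts = 0;
        while (System.currentTimeMillis() - startTime < targetSeconds * 1000L) {
            MyState state = engine.getInitialState(); // may instead be computed just once
            while (!engine.isTerminal(state)) {
                // Pick one random legal move per role, then advance the state.
                List<MyMove> jointMove = new ArrayList<>();
                for (MyRole role : engine.getRoles()) {
                    List<MyMove> legal = engine.getLegalMoves(state, role);
                    jointMove.add(legal.get(random.nextInt(legal.size())));
                }
                state = engine.getNextState(state, jointMove);
                numStateChanges++;
            }
            engine.getGoals(state); // compute goal values in the terminal state
            numRollouts++;
        }
        long millisecondsTaken = System.currentTimeMillis() - startTime;
        // millisecondsTaken, numStateChanges, and numRollouts are then
        // written to the results file in the format described below.
    }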

Results are expected to be written by the test process in the following format:

  1. Each line consists of a name specifying the variable type, followed by an equals sign, followed by the value of the variable. (Whitespace at the beginning or end of the variable name or value is optional and ignored.)
  2. Newlines may be either \n or \r\n; either can be parsed correctly.

The variable types used in the results for a successful performance test are as follows (an example file is shown after the list):

  • version: The version of the engine being tested. This may be an arbitrary string.
  • millisecondsTaken: The actual length, in milliseconds, of the testing that was performed.
  • numStateChanges: The total number of transitions from one state to a following state that were computed.
  • numRollouts: The total number of times that the test process reached a terminal state.
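
A successful run might therefore produce a results file like the following (all values are illustrative):

    version = 0.1.0
    millisecondsTaken = 30012
    numStateChanges = 182344
    numRollouts = 2551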

An unsuccessful performance test may instead write the following variables to indicate the nature of the error (an example follows the list):

  • version: The version of the engine being tested. This may be an arbitrary string.
  • errorMessage: A message indicating the nature of the test error.
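
For example (again with illustrative values):

    version = 0.1.0
    errorMessage = Error while computing legal moves: unrecognized sentence form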

Results format

Each completed perf test appends a line to a file within the results directory. Each line is a series of entries delimited by semicolons.

The format of results is not yet documented.

Correctness testing

This is not yet documented.


Issues

Factor out more utility code

In particular, code for dealing with processes; perhaps also code for figuring out which games to deal with based on the contents of a CSV file, code for reading computerName.txt, etc.

More environmentally robust JavaEclipse implementation

The current one is too finicky about where ECLiPSe Prolog is installed. This is perhaps less important now that the full Fluxplayer Prolog implementation is available, but it would be nice to have if we can track down the sources and make those modifications.

Correctness testing: Identify game errors separately from engine errors

The tested engine should be able to report a game error (invalid goals in a terminal state, no legal moves, etc.) and have the reference engine confirm it. If the reference engine confirms the error, it should be recorded and interpreted differently from an engine error.

Test/add hooks for more rule engines

Java:
Sancho
Palamedes BasicPlayer: http://palamedes-ide.sourceforge.net/start.html
https://github.com/parthy/ggp (? Might use Fluxplayer's Prolog as well?)
Rekkura: https://github.com/ptpham/rekkura
JavaEclipse (from GIGA-13 paper; either GameController or Moller et al, 2011/Centurio)
JavaProver (from GIGA-13 paper; Halderman et al 2006 and Gunther et al 2013) -- is this just an earlier version of GGP-Base prover? http://games.stanford.edu/resources/reference/java/java.html

.NET:
NGGP Base: https://github.com/druzil/nggp-base

C/C++:
Informafiosi: http://www.general-game-playing.de/downloads.html (C++Reasoner from GIGA-13 paper? points to that link)
https://github.com/thomasmarsh/ggp (?)
https://github.com/sumedhghaisas/libgdl
GDLCC (mentioned in GIGA-13 paper; Waugh 2009)
https://github.com/ptpham/piecemeal

Prolog-based:
FluxPlayer: http://www.general-game-playing.de/downloads.html
CadiaPlayer: ??? (B/F 2008, 2009) - based on YAP Prolog (Costa et al 2012)
https://github.com/muupan/ggpe: uses YAP Prolog (check if same as CadiaPlayer)
https://github.com/tflaherty/GeneralGamePlaying (?)
https://github.com/jvandenbroeck/The-Turk (Java and/or Prolog?)
Ary http://www.ai.univ-paris8.fr/~jm/ggp/ (C/Prolog)

Haskell:
https://github.com/ian-ross/ggp

Rust:
https://github.com/gsingh93/ggp-rs
https://github.com/strout/worthy-opponent

Clojure:
https://github.com/ramirez7/ggp-mcts (?)

OCaml:
GaDeLaC (mentioned in GIGA-13 paper; Saffidine/Cazenave 2011)

???:
Toss (mentioned in GIGA-13 paper; Kaiser/Stafiniak, 2011/2013) toss.sourceforge.net
Gamer (mentioned in GIGA-13 paper; Kissmann/Edelkamp, 2011)

Add installer scripts for Prolog

Assuming most people will be running these on 64-bit Linux machines, we can probably script an installation procedure for the versions of Prolog we rely on, or at least provide reliable installation instructions.

Version should be specified in perf test report file

Currently we specify it in the Java codebase. This approach isn't going to work well for non-Java executables, since the version would have to be manually updated whenever something independently versioned changes; that is more error-prone than having the executable report its own version as the engine changes.

Analysis: Add pairwise comparisons of engines

It would be nice to have pairwise comparison pages where we can see the list of games that both engines support, sorted by the performance ratio between them. Currently it's hard to find the games at the extremes of that list, which can be interesting case studies.

We could also show the errors for those engines in a table, with error messages from engine 1 alongside error messages from engine 2, sorted by which engine(s) have errors at all.

Kill process subtrees, not individual processes, after timeouts

After my last run of cadiaplayer, I had a leftover process that was consuming most of the memory on the box, causing significant swapping. I didn't see which process it was, so it pretty much invalidated the collected statistics.

This suggests that kill -KILL -1234 (where 1234 is the script's process group ID) might work for POSIX-reliant scripts like cadiaplayer's, if we can somehow figure out the script PID from Java: http://askubuntu.com/questions/520107/how-to-kill-a-script-running-in-terminal-without-closing-terminal-ctrl-c-doe

EDIT: From further research and testing, it seems like the issue is not the reliability of killing processes, but just the ability to kill process subtrees instead of an individual process. That will be available in Java 9, which sounds like a good long-term solution.
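
For reference, a minimal sketch of subtree killing with the Java 9 ProcessHandle API (Process.toHandle(), ProcessHandle.descendants(), and ProcessHandle.destroyForcibly() are the real Java 9 methods):

    // Forcibly kill a process along with all of its descendants (Java 9+).
    static void killProcessTree(Process process) {
        process.toHandle().descendants().forEach(ProcessHandle::destroyForcibly);
        process.toHandle().destroyForcibly();
    }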
