ML_Arena

A fully custom simulation environment for comparing reinforcement learning models interchangeably. It includes a full API for modular algorithms, a built-in general-purpose genetic neural network library, and other utilities for training and testing.

NOTE: To skip the write-up portion of this README, jump to General Info.

Training Process

Fitness Evaluation:

After experimenting with fixed-topology neuroevolution in this environment, I found that unless detrimental fitness factors are discounted, the population fails to develop any effective policy. After hours of trial and error, passing the detriment values through a logarithm seems to suffice: the detriments no longer have such a severe impact on the overall score, yet minimizing them is still heavily rewarded. You can view the actual method in the FitnessPawn class under "calculate_fitness".
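As a rough illustration of the log-discounting idea, the sketch below subtracts the logarithm of each detriment from the summed rewards. The function name `log_discounted_fitness`, its parameters, and the choice of `log1p` are assumptions made for this example; the repository's actual `calculate_fitness` may combine its terms differently.

```python
import math


def log_discounted_fitness(rewards, detriments):
    """Hypothetical fitness: summed rewards minus log-damped detriments.

    Illustrative sketch only, not the repository's calculate_fitness.
    """
    reward_total = sum(rewards)
    # log1p(x) = log(1 + x): no penalty at x == 0, and growth slows as x rises,
    # so a single large detriment cannot swamp the score.
    penalty_total = sum(math.log1p(max(d, 0.0)) for d in detriments)
    return reward_total - penalty_total


if __name__ == "__main__":
    # Doubling a detriment from 50 to 100 costs less than one fitness point.
    print(log_discounted_fitness([10.0], [50.0]))
    print(log_discounted_fitness([10.0], [100.0]))
```

Because the logarithm is steepest near zero, driving a small detriment toward zero changes the score more than shaving the same amount off a large one, which fits the behavior described above.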

Tendencies:

With the default hyperparameters, after about 1200 generations on average, the mean hovers around a fitness score of ~7 with a standard deviation of roughly 0.5-1. The next 250 generations, however, show major improvements in max generational scores, which jump to a mean of ~13, albeit with a larger standard deviation of around 2.5-3.

Figure 1: Generations 1100 - 1350

After another 100 generations, there is yet another jump in performance, from the previous mean of ~13 to a peak of ~22, where it hovered for roughly another 1000 generations with no further policy improvements.

Figure 2: Generations 1350 - 1450

Figure 3: Generations 1350 - 2900

The model seems to cap out at around 23 in this simulation. That is not the highest fitness seen so far: an unrecorded run reached 28, and the highest recorded score is the large spike at 25 in Figure 1. Further modifications and research into policy improvement are ongoing.
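The means and standard deviations quoted above describe per-generation max scores over a window of generations. A minimal NumPy sketch of that calculation follows; the function name `window_stats` and the assumed array layout (one row per generation, one column per population member) are hypothetical and not taken from the repository, and the data in the demo is synthetic.

```python
import numpy as np


def window_stats(fitness, start, stop):
    """Mean and standard deviation of per-generation max fitness.

    fitness: array of shape (generations, population_size); assumed layout.
    start, stop: generation indices, half-open like a Python slice.
    """
    fitness = np.asarray(fitness, dtype=float)
    # Best score in each generation of the window.
    best_per_generation = fitness[start:stop].max(axis=1)
    return best_per_generation.mean(), best_per_generation.std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholder data, not results from the simulation.
    fake_history = rng.normal(loc=5.0, scale=1.0, size=(300, 20))
    mean, std = window_stats(fake_history, 100, 200)
    print(f"mean={mean:.2f} std={std:.2f}")
```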

General Info

Versions:

Built with Python 3.

Libraries Required:

Arcade
NumPy

Usage:

Clone this repository: `git clone https://github.com/McCrearyD/ML_Arena.git`
To run any simulation, run `python3 -u main.py` inside the main directory.
Follow the terminal instructions to run any type of simulation.
To run all test assertions and the test environment, run `python3 -u test.py` inside the main directory.
To view a graph for any saved population, run `python3 -u visualize.py` inside the main directory.

Simulation Types:

Freeplay: Freeplay allows you to create a single matchup using any type of pawn controller you'd like.
Evolution:
- Adversarial: Train a random (or previously saved) population against another population.
- Other: Train a random (or previously saved) population against another pawn type (e.g., dynamic or brainless).
Balance: Run a balancing simulation for pawn statistical biases. Runs x match iterations concurrently and reports win/loss results for each bias.

Indicators:

Blue Pawn: A blue pawn has its shield enabled.
Red Pawn: A red pawn has 0 health.
Number Below Pawn: A pawn with a number below it is a FitnessPawn. A FitnessPawn is typically assigned to a neural network to judge how well that network performs; higher is better.
Bar Above Pawn: Every pawn has a bar above its head that displays its health percentage.
- Green = above 75%
- Yellow = above 50%
- Red = below 50%
Red Laser: Long distance laser.
- Lighter Red: The laser has not yet reached its minimum distance; if it collides with an enemy, no damage is dealt.
- Darker Red: Fully charged laser; it deals damage on impact.
Blue Laser: Short distance laser.
Green Laser (Debug): A laser is highlighted green if it is the 'imminent laser' of the current player pawn.
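The health bar thresholds listed above can be read as a simple mapping from health percentage to color. The sketch below is one way to express it; the function name `health_bar_color` and the handling of the exact 75% and 50% boundaries are assumptions, since the list does not say which color applies at those values.

```python
def health_bar_color(health, max_health):
    """Return the bar color name for a pawn's remaining health.

    Boundary handling (>= rather than >) is an assumption of this sketch.
    """
    if max_health <= 0:
        raise ValueError("max_health must be positive")
    fraction = health / max_health
    if fraction >= 0.75:
        return "green"
    if fraction >= 0.50:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for hp in (100, 60, 10):
        print(hp, health_bar_color(hp, 100))
```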

Main Simulation Keys

Key(s)	Description	Context
W, A, S, D	Movement	Player
LEFT, UP, RIGHT, DOWN	Directional movement	Player
Q	Shield	Player
SHIFT	Long-range laser	Player
SPACE	Short-range laser	Player
OPEN-BRACKET	Draw all pawns in every match	Global
CLOSE-BRACKET	Show connections between all on-screen pawns	Global
BACK-SLASH	Show pawn directional tracers	Global
ESCAPE	End simulation. If Evolutionary: Save populations under their names	Global
BACKSPACE	Force reset the environment. If Evolutionary: End the current generation	Global
N	(Toggle) Visually display the neural networks of the currently focused creature(s)	Evolution
P	(Toggle) Speed up all gameplay. Updates per frame can be changed in `environment.py > class Environment > var('speed_up_cycles')`	Global
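To make the "updates per frame" idea behind the P key concrete, here is a minimal sketch of a speed-up toggle. Only the name `speed_up_cycles` comes from this README; the class `SteppedEnvironment`, its methods, and the default value of 10 are hypothetical stand-ins and do not reflect the actual `Environment` class.

```python
class SteppedEnvironment:
    """Hypothetical stand-in for an environment with a speed-up toggle."""

    def __init__(self, speed_up_cycles=10):
        # Updates per frame while sped up (default value is an assumption).
        self.speed_up_cycles = speed_up_cycles
        self.sped_up = False
        self.updates = 0  # total simulation updates performed

    def toggle_speed_up(self):
        # Mirrors the toggle behavior described for the P key.
        self.sped_up = not self.sped_up

    def step(self):
        # Placeholder for one simulation update.
        self.updates += 1

    def on_frame(self):
        # Run one update normally, or speed_up_cycles updates when sped up.
        cycles = self.speed_up_cycles if self.sped_up else 1
        for _ in range(cycles):
            self.step()


if __name__ == "__main__":
    env = SteppedEnvironment(speed_up_cycles=5)
    env.on_frame()
    env.toggle_speed_up()
    env.on_frame()
    print(env.updates)
```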

Test Environment Keys

Key(s)	Description
BACKSPACE	Next generation
ESCAPE	Randomize goal location

Showcase


Custom Network Visualization	Freeplay

Evolutionary Training	Statistical Bias Balancing

Test Environment	Non-Graphical Simulation w/ Generational Reports

Saved Population Visualization
