RL and Deep-RL implementations

This implementation is modular: you can plug and play almost any environment (defined in the corresponding file, within the same base folder) with any algorithm.

Tabular RL Algorithms

Implemented Algorithms:

.	Prediction	Control
Monte Carlo (MC)	MC policy evaluation	MC non-exploring-starts control, off-policy MC control
Temporal Difference 0 (TD0)	TD0 policy evaluation	SARSA, Expected SARSA, Q Learning, Double Q Learning
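
As a rough illustration of how the TD0 control algorithms in the table differ, the sketch below computes the one-step target for SARSA, Expected SARSA and Q Learning on a NumPy Q-table. It is not taken from this repo: the function names, the `kind` strings and the epsilon-greedy policy used for Expected SARSA are all assumptions made for the example. Double Q Learning is left out because it needs two Q-tables.

```python
import numpy as np

def td_target(kind, Q, reward, next_state, next_action=None,
              gamma=0.99, epsilon=0.1, done=False):
    """One-step TD(0) target for a single transition (hypothetical helper)."""
    if done:
        return reward  # terminal state: nothing to bootstrap from
    q_next = Q[next_state]
    if kind == "sarsa":
        # On-policy: bootstrap from the action actually taken next.
        bootstrap = q_next[next_action]
    elif kind == "q_learning":
        # Off-policy: bootstrap from the greedy action.
        bootstrap = q_next.max()
    elif kind == "expected_sarsa":
        # Expectation over an epsilon-greedy policy (assumed here).
        n_actions = q_next.shape[0]
        probs = np.full(n_actions, epsilon / n_actions)
        probs[q_next.argmax()] += 1.0 - epsilon
        bootstrap = float(probs @ q_next)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return reward + gamma * bootstrap

def td_update(Q, state, action, target, alpha=0.1):
    """Move Q[state, action] a step of size alpha toward the target."""
    Q[state, action] += alpha * (target - Q[state, action])

Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
target = td_target("q_learning", Q, reward=1.0, next_state=1, gamma=0.5)
td_update(Q, state=0, action=0, target=target)
```

The three rules share the same update step and differ only in how the value of the next state is estimated.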

Implemented Environments:

Discrete State Space	Discretized State Space
Toy Text - FrozenLake, Taxi, Blackjack	Classic Control - MountainCar, CartPole, Acrobot
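
To use a tabular algorithm on a Classic Control environment, the continuous observation has to be mapped to a finite set of states. The sketch below shows one common way to do that with uniform bins. It is only an illustration: the helper names, the bin count and the MountainCar bounds used in the example are assumptions, not necessarily what this repo does.

```python
import numpy as np

def make_bin_edges(low, high, bins_per_dim):
    """Interior bin edges for each observation dimension (hypothetical helper)."""
    return [np.linspace(l, h, n + 1)[1:-1]
            for l, h, n in zip(low, high, bins_per_dim)]

def discretize(observation, bin_edges):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, edges))
                 for x, edges in zip(observation, bin_edges))

# Assumed MountainCar bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
edges = make_bin_edges(low=[-1.2, -0.07], high=[0.6, 0.07], bins_per_dim=[10, 10])
state = discretize((-0.5, 0.01), edges)
```

The resulting tuple can be used directly as a dictionary key or as an index into a Q-table of shape `(10, 10, n_actions)`. Values outside the assumed bounds fall into the first or last bin.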

Deep RL Algorithms

In most cases, you can select the desired library implementation (lib_type): LIBRARY_TF, LIBRARY_TORCH, or LIBRARY_KERAS.

Implemented Control Algorithms:

Deep Q Learning (DQL)
Policy Gradient (PG)
- set ep_batch_num = 1 for the Monte-Carlo PG (REINFORCE) algorithm
Actor-Critic (AC)
Deep Deterministic Policy Gradient (DDPG)
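
For the Policy Gradient note above, here is a minimal sketch of the return computation behind REINFORCE. It assumes that ep_batch_num is the number of episodes collected before each policy update, so that a value of 1 means one update per episode. The function names are hypothetical and the code is not taken from this repo.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def batch_returns(episodes_rewards, gamma=0.99):
    """Concatenate the returns of all episodes gathered before one update."""
    return np.concatenate([discounted_returns(r, gamma) for r in episodes_rewards])

# One episode per update (the REINFORCE case, assuming ep_batch_num = 1).
single = batch_returns([[1.0, 1.0, 1.0]], gamma=0.5)
# Two episodes per update (assuming ep_batch_num = 2).
pair = batch_returns([[1.0, 1.0], [2.0]], gamma=0.5)
```

Returns are computed per episode and never leak across episode boundaries, which is why the batch is built by concatenation.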

Implemented Environments:

(environments with Continuous State Space)

.	Discrete Action Space	Continuous Action Space
Observation Vector Input Type	CartPole, LunarLander	Pendulum, MountainCarContinuous, LunarLanderContinuous, BipedalWalker
Stacked Frames Input Type	Breakout, SpaceInvaders

Algorithm restrictions

Note that some algorithms have restrictions.

Innate restrictions:

Discrete Action Space	Continuous Action Space
Deep Q Learning	Deep Deterministic Policy Gradient
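
The innate restrictions in the table can be expressed as a small lookup. The sketch below is only an illustration with made-up names (`INNATE_ACTION_SPACE`, `is_compatible`); it covers the two innate restrictions listed above and ignores restrictions that exist only because code is still missing.

```python
DISCRETE, CONTINUOUS = "discrete", "continuous"

# Innate restrictions only, as listed in the table above.
INNATE_ACTION_SPACE = {
    "DQL": DISCRETE,     # picks the action with the highest Q-value
    "DDPG": CONTINUOUS,  # the actor outputs a continuous action directly
}

def is_compatible(algorithm, action_space_type):
    """Return False only when an innate restriction rules the pairing out."""
    required = INNATE_ACTION_SPACE.get(algorithm)
    return required is None or required == action_space_type

ok = is_compatible("DQL", DISCRETE)
not_ok = is_compatible("DQL", CONTINUOUS)
```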

Other current restrictions exist only because more code remains to be written. Full coverage requires an implementation for every combination of:
- library (TensorFlow, PyTorch, Keras).
- input (state) type (observation vector, stacked frames).
- action space type (discrete, continuous).

cmdline_play.py

Enables playing from the command-line.

Running this file runs the chosen algorithm on a single environment from the command line, using the argparse module to parse the command-line options. The main benefit is that multiple independent runs can be chained with && (each run starts only if the previous one exited successfully), so you can run multiple tests in one go.
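
To give an idea of the argparse pattern described above, here is a minimal sketch. The flag names (`--algorithm`, `--environment`, `--lib_type`, `--episodes`) and their defaults are hypothetical and may not match the actual options of cmdline_play.py; check the file for the real ones.

```python
import argparse

def build_parser():
    """Build a parser with hypothetical options (not the repo's actual flags)."""
    parser = argparse.ArgumentParser(
        description="Run one algorithm on one environment.")
    parser.add_argument("--algorithm", required=True,
                        choices=["DQL", "PG", "AC", "DDPG"])
    parser.add_argument("--environment", required=True)
    parser.add_argument("--lib_type", default="LIBRARY_TF",
                        choices=["LIBRARY_TF", "LIBRARY_TORCH", "LIBRARY_KERAS"])
    parser.add_argument("--episodes", type=int, default=1000)
    return parser

# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--algorithm", "DQL", "--environment", "CartPole"])
```

With flags like these, two chained runs would look like `python cmdline_play.py --algorithm DQL --environment CartPole && python cmdline_play.py --algorithm PG --environment CartPole`.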

grid_search.py

Enables performing grid search.

Running this file performs a comparative grid search on a single environment and plots the results. This is mostly done for hyperparameter tuning. Note that the plot currently defines 16 colors, so more than 16 combinations will raise an error; add more colors if you need more than 16 combinations.

Grid search is currently set up for DQL, but it applies to every algorithm with only minor changes (the relevant imports are already at the top of the file, just commented out).
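
The sketch below shows one way a grid of hyperparameter combinations can be enumerated while respecting the 16-color limit mentioned above. The names `MAX_COLORS` and `build_grid`, and the example hyperparameters, are assumptions for illustration and are not taken from grid_search.py.

```python
from itertools import product

MAX_COLORS = 16  # one plot color per combination, per the limit noted above

def build_grid(param_grid):
    """Expand {name: [values]} into a list of {name: value} combinations."""
    names = list(param_grid)
    combos = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]
    if len(combos) > MAX_COLORS:
        raise ValueError(
            f"{len(combos)} combinations but only {MAX_COLORS} colors available")
    return combos

# Hypothetical hyperparameters: 2 x 2 = 4 combinations.
grid = build_grid({"alpha": [0.001, 0.01], "gamma": [0.9, 0.99]})
```

Checking the size of the grid up front fails fast, before any training run has started.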

Results

Algorithm performance examples. Training and test results are presented as graphs and statistics (for some of the environments) of both the running average of episode scores and the accumulated scores.
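
The two curves mentioned above can be computed from a list of per-episode scores roughly as follows. This is a sketch under assumptions: the window size of 100 episodes and the function names are mine, and the repo may compute these differently.

```python
import numpy as np

def running_average(scores, window=100):
    """Mean of the last `window` scores at each episode (shorter at the start)."""
    scores = np.asarray(scores, dtype=np.float64)
    averages = np.empty_like(scores)
    for i in range(len(scores)):
        start = max(0, i - window + 1)
        averages[i] = scores[start:i + 1].mean()
    return averages

def accumulated_scores(scores):
    """Cumulative sum of episode scores."""
    return np.cumsum(np.asarray(scores, dtype=np.float64))

episode_scores = [1.0, 2.0, 3.0, 4.0]
avg = running_average(episode_scores, window=2)
acc = accumulated_scores(episode_scores)
```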