This repository contains code for several Reinforcement Learning projects. Topics include Dynamic Programming, Monte Carlo methods, Temporal Difference (TD) learning, and Deep Q-Networks (DQN).
The first exercise uses Dynamic Programming. We first evaluate a given policy to find its value function (policy evaluation). We then build policy iteration on top of the policy evaluation algorithm to find a good policy for the problem. Finally, we implement the value iteration algorithm with Q-values.
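As a rough illustration of value iteration with Q-values, here is a minimal sketch on a made-up three-state chain. The function name `value_iteration_q`, the toy transition table `P`, and its `(prob, next_state, reward, done)` layout are assumptions for this example and are not taken from the repository.

```python
def value_iteration_q(P, n_states, n_actions, gamma=1.0, theta=1e-8):
    """Value iteration on Q-values.

    P[s][a] is a list of (prob, next_state, reward, done) tuples
    (an assumed layout, chosen for this sketch).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    while True:
        delta = 0.0
        for s in range(n_states):
            for a in range(n_actions):
                # Bellman optimality backup for the pair (s, a).
                new_q = 0.0
                for prob, s_next, reward, done in P[s][a]:
                    future = 0.0 if done else max(Q[s_next])
                    new_q += prob * (reward + gamma * future)
                delta = max(delta, abs(new_q - Q[s][a]))
                Q[s][a] = new_q
        if delta < theta:
            break
    # Greedy policy with respect to the converged Q-values.
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return Q, policy


# Hypothetical toy MDP: states 0 and 1 are non-terminal, state 2 is terminal.
# Action 0 moves left, action 1 moves right; every move costs -1.
P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

Q, policy = value_iteration_q(P, n_states=3, n_actions=2)
print(Q[0], Q[1])   # [-3.0, -2.0] [-3.0, -1.0]
print(policy[:2])   # [1, 1]: always move right
```

Working on Q-values rather than state values means the greedy policy can be read off directly with an argmax, without another one-step lookahead through the model.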
For Monte Carlo prediction we look at the Blackjack game, Example 5.1 from the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto. The BlackjackEnv is implemented in blackjack.py. We implement the MC prediction algorithm and make plots following the book.
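The idea behind MC prediction can be sketched without the Blackjack environment. The example below is a first-visit variant and makes several assumptions: episodes are given as lists of `(state, reward)` pairs, and `mc_prediction` and `toy_episode` are hypothetical names. Whether the repository uses first-visit or every-visit averaging is not stated here.

```python
from collections import defaultdict


def mc_prediction(sample_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction.

    sample_episode() must return one episode as a list of
    (state, reward) pairs, where reward is the reward received
    after leaving that state (an assumed interface).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode()
        # Record the first time step at which each state occurs.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


def toy_episode():
    # Hypothetical deterministic episode: state "a" is visited twice.
    return [("a", 1.0), ("a", 1.0), ("b", 0.0)]


V = mc_prediction(toy_episode, num_episodes=10)
print(V)  # {'b': 0.0, 'a': 2.0}
```

With the Blackjack environment, `sample_episode` would roll out the policy being evaluated and collect the visited states and rewards.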
In the second part of this exercise we use off-policy Monte Carlo to estimate the value function of our target policy using experience generated by a different behavior policy.
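A minimal sketch of off-policy MC prediction is shown below. It uses weighted importance sampling with every-visit updates, which is one common choice; the text does not say which estimator the exercise uses. The names `off_policy_mc_prediction`, `target_prob`, and `behavior_prob`, and the one-state toy data, are assumptions for illustration.

```python
from collections import defaultdict


def off_policy_mc_prediction(episodes, target_prob, behavior_prob, gamma=1.0):
    """Estimate V for the target policy from behavior-policy episodes.

    episodes: list of episodes, each a list of (state, action, reward).
    target_prob(s, a), behavior_prob(s, a): action probabilities under
    the target and behavior policy (assumed callables).
    """
    V = defaultdict(float)
    C = defaultdict(float)  # cumulative importance weights per state
    for episode in episodes:
        G = 0.0
        W = 1.0
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            # For state values the ratio includes the action taken in `state`.
            W *= target_prob(state, action) / behavior_prob(state, action)
            if W == 0.0:
                break  # earlier steps would also get zero weight
            C[state] += W
            V[state] += (W / C[state]) * (G - V[state])
    return dict(V)


# Hypothetical one-state problem: action 1 pays +1, action 0 pays -1.
# The behavior policy is uniform; the target policy always picks action 1.
episodes = [[("s", 0, -1.0)], [("s", 1, 1.0)], [("s", 1, 1.0)]]
V = off_policy_mc_prediction(
    episodes,
    target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
    behavior_prob=lambda s, a: 0.5,
)
print(V)  # {'s': 1.0}
```

The episode in which the behavior policy took action 0 gets weight zero, so the estimate reflects only trajectories the target policy could have produced.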
Monte Carlo Prediction with the Blackjack game.
This part of the project is about Temporal Difference (TD) learning. In this lab, we use the windy gridworld environment, Example 6.5 from the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto.
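For orientation, here is a small sketch of the windy gridworld with a Sarsa learner. The grid size, wind strengths, start, and goal follow Example 6.5 in the book. Using Sarsa is an assumption: it is the method the book applies to this example, but the text above does not say which TD algorithm the lab implements. All function names are hypothetical.

```python
import random

# Layout from Example 6.5: 7 rows x 10 columns, upward wind per column.
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]
N_ROWS, N_COLS = 7, 10
START, GOAL = (3, 0), (3, 7)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply the action plus the wind of the current column; reward is -1 per step."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row - WIND[col], 0), N_ROWS - 1)
    new_col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([a for a, v in enumerate(Q[state]) if v == best])


def sarsa(num_episodes, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(N_ROWS) for c in range(N_COLS)}
    episode_lengths = []
    for _ in range(num_episodes):
        state = START
        action = epsilon_greedy(Q, state, epsilon, rng)
        steps, done = 0, False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # On-policy TD target: uses the action that will actually be taken next.
            target = reward + (0.0 if done else gamma * Q[next_state][next_action])
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = next_state, next_action
            steps += 1
        episode_lengths.append(steps)
    return Q, episode_lengths


Q, lengths = sarsa(num_episodes=200)
print(lengths[:3], lengths[-3:])  # episodes typically get shorter as learning progresses
```

Replacing `Q[next_state][next_action]` in the target with `max(Q[next_state])` would turn this sketch into Q-learning, the off-policy counterpart.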
In this lab, we use PyTorch to implement a Deep Q-Network. We implement the QNetwork.forward function, which uses one hidden layer with ReLU activation (no output activation).
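A network of the shape described above could look roughly like the sketch below. The constructor arguments, the layer names `l1` and `l2`, and the sizes (4 inputs, 128 hidden units, 2 actions) are assumptions for illustration; the repository's actual class may differ.

```python
import torch
from torch import nn


class QNetwork(nn.Module):
    # Sizes are placeholder assumptions, not values taken from the repository.
    def __init__(self, num_inputs=4, num_hidden=128, num_actions=2):
        super().__init__()
        self.l1 = nn.Linear(num_inputs, num_hidden)
        self.l2 = nn.Linear(num_hidden, num_actions)

    def forward(self, x):
        # One hidden layer with ReLU; the output is left linear because
        # Q-values are unbounded real numbers, not probabilities.
        return self.l2(torch.relu(self.l1(x)))


q_net = QNetwork()
states = torch.zeros(3, 4)       # batch of 3 dummy states
q_values = q_net(states)         # shape: (3, 2), one Q-value per action
greedy_actions = q_values.argmax(dim=1)
```

Taking the argmax over the action dimension gives the greedy action for each state in the batch.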
These projects are from the Reinforcement Learning course of the Master Artificial Intelligence at the University of Amsterdam, taught in the fall of 2020.