This repository contains Jupyter notebooks implementing various Reinforcement Learning (RL) algorithms. Each notebook implements a single RL algorithm in a self-contained, complete manner.
To use Docker, first build the image by running the following shell command in the repository directory:

```shell
docker build -t rl_alg:latest .
```
Then, run the following command to create a container from that image, publish port 8888 (the Jupyter port), and bind the `notebooks` directory in the container to the `notebooks` directory on the host (your machine):

```shell
docker run -d -p 8888:8888 -v "$(pwd)/notebooks:/app/notebooks" rl_alg:latest
```

Note that `docker run -v` expects an absolute host path and an absolute container path, hence `$(pwd)` and the leading `/` on `/app/notebooks`.
Finally, open a browser on your local machine, go to `localhost:8888`, and enter `admin` as the Jupyter password. A JupyterLab interface should open with the locally mounted `notebooks` directory in it.
- REINFORCE (discrete)
- REINFORCE (continuous)
- Actor-Critic TD(0)
- Q-Learning
- Double Q-Learning
- SARSA
- SARSA n-step (tabular)
- SARSA n-step
- SARSA(λ)
- Expected-SARSA
- Off-policy Expected-SARSA n-step
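To clarify how the tabular value-based methods in this list differ, the sketch below contrasts the TD targets of SARSA, Q-Learning, and Expected-SARSA on a single hypothetical transition. All names, sizes, and the example transition are illustrative and are not taken from the notebooks themselves.

```python
import numpy as np

# Illustrative toy setup (not from the repository): 4 states, 2 actions.
n_states, n_actions = 4, 2
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def sarsa_target(r, s_next, a_next):
    # On-policy: bootstrap on the action actually taken next.
    return r + gamma * Q[s_next, a_next]

def q_learning_target(r, s_next):
    # Off-policy: bootstrap on the greedy action in the next state.
    return r + gamma * Q[s_next].max()

def expected_sarsa_target(r, s_next):
    # Bootstrap on the expected value under an epsilon-greedy policy.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[Q[s_next].argmax()] += 1.0 - epsilon
    return r + gamma * (probs * Q[s_next]).sum()

# One made-up transition: state 0, action 1, reward 1.0, next state 2.
Q[2] = [0.2, 0.6]
td_target = q_learning_target(1.0, 2)      # 1.0 + 0.9 * 0.6 = 1.54
Q[0, 1] += alpha * (td_target - Q[0, 1])   # 0.0 + 0.5 * 1.54 = 0.77
```

The only difference between the three methods is the bootstrap term in the target; the update rule `Q[s, a] += alpha * (target - Q[s, a])` is the same for all of them.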
Name | Description | Link | Colab | NBViewer |
---|---|---|---|---|
REINFORCE.ipynb | Implementation of the REINFORCE and REINFORCE-with-baseline algorithms using PyTorch and Gymnasium on the LunarLander-v2 environment | notebook | | |
REINFORCE_continuous.ipynb | Implementation of the REINFORCE and REINFORCE-with-baseline algorithms for environments with continuous action spaces, using PyTorch and Gymnasium on the continuous LunarLander-v2 environment | notebook | | |
Actor_Critic_TD_0.ipynb | Implementation of the Actor-Critic TD(0) algorithm for environments with discrete and continuous action spaces, using PyTorch and Gymnasium on the LunarLander-v2 environment | notebook | | |
SARSA.ipynb | Implementation of the SARSA algorithm, with and without an experience replay buffer, using PyTorch and Gymnasium on the CartPole-v1 environment | notebook | | |
Q-Learning.ipynb | Implementation of the Q-Learning algorithm, with and without an experience replay buffer, using PyTorch and Gymnasium on the Acrobot-v1 environment | notebook | | |
Expected_SARSA.ipynb | Implementation of the Expected-SARSA algorithm using PyTorch and Gymnasium on the Acrobot-v1 environment | notebook | | |
Double_Q-Learning.ipynb | Implementation and comparison of the Double Q-Learning and Q-Learning algorithms using PyTorch and Gymnasium on the Acrobot-v1 environment | notebook | | |
SARSA_n_step_tabular.ipynb | Implementation of the n-step SARSA algorithm in its tabular version using Gymnasium on the CliffWalking-v0 environment | notebook | | |
SARSA_n_step.ipynb | Implementation of the n-step SARSA algorithm in its non-tabular version using PyTorch and Gymnasium on the CartPole-v1 environment | notebook | | |
Off_policy_Expected_SARSA_n_step.ipynb | Implementation of the off-policy n-step Expected-SARSA algorithm in its non-tabular version using PyTorch and Gymnasium on the CartPole-v1 environment | notebook | | |
SARSA_lambda.ipynb | Implementations of the SARSA(λ) algorithm in its tabular and deep versions, with and without an experience replay buffer, using PyTorch and Gymnasium on the CliffWalking-v0 and CartPole-v1 environments | notebook | | |