This repository holds deep RL solutions for the bit flipping environment using Hindsight Experience Replay (HER).
View the original paper here.
In this environment, we are given a starting state, which is a binary vector of size n, and a goal state of the same size.
At each step, the agent flips one of the bits in the current state. Every step yields a reward of -1,
except a step that makes the current state equal to the goal, which yields a reward of 0.
The environment is implemented in the bit_flip_env.py file.
To test HER's ability to deal with dynamic environments, we added a dynamic option to the bit flipping domain.
This means that with every step the agent takes, with probability 0.3 one of the goal's bits flips,
making the goal harder to predict. The flipped bit is chosen uniformly at random.
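The environment logic described above, including the dynamic option, can be sketched as follows. This is an illustrative version only; the names and the actual implementation in bit_flip_env.py may differ.

```python
import numpy as np

# Illustrative sketch of the bit flipping environment; the real
# bit_flip_env.py implementation and interface may differ.
class BitFlipEnv:
    def __init__(self, n=10, dynamic=False, flip_prob=0.3, seed=0):
        self.n = n
        self.dynamic = dynamic
        self.flip_prob = flip_prob  # chance the goal changes each step
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.state = self.rng.integers(0, 2, self.n)
        self.goal = self.rng.integers(0, 2, self.n)
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # The agent flips one bit of the current state.
        self.state[action] ^= 1
        # Dynamic mode: with probability 0.3, one uniformly chosen
        # goal bit flips as well.
        if self.dynamic and self.rng.random() < self.flip_prob:
            self.goal[self.rng.integers(self.n)] ^= 1
        done = bool(np.array_equal(self.state, self.goal))
        reward = 0.0 if done else -1.0
        return self.state.copy(), self.goal.copy(), reward, done
```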
The algorithm, described in detail here by Andrychowicz et al., can deal with sparse binary rewards (as we get in the bit flipping domain).
The problem with sparse rewards is that for very large state spaces we might never see a successful episode, making it very hard to learn.
In this algorithm, we create new "fake" episodes from unsuccessful ones by changing their original goal to one of the states they actually reached.
This way, we add successes to the experience replay buffer and can learn from them; it is basically the same as learning from mistakes.
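The relabeling step can be sketched as below, using the "final" strategy (the last achieved state becomes the fake goal). Function and tuple layout are illustrative assumptions, not the repo's actual replay code.

```python
import numpy as np

# Minimal sketch of HER relabeling with the "final" strategy; the
# repo's actual experience-replay code may differ.
def her_relabel(episode):
    """episode: list of (state, action, next_state, goal) tuples from a
    failed rollout. Returns extra transitions whose goal is replaced by
    the state the agent actually reached at the end of the episode."""
    new_goal = episode[-1][2]  # final achieved state becomes the fake goal
    relabeled = []
    for state, action, next_state, _ in episode:
        done = bool(np.array_equal(next_state, new_goal))
        reward = 0.0 if done else -1.0
        relabeled.append((state, action, next_state, new_goal, reward, done))
    return relabeled
```

These relabeled transitions are stored in the replay buffer alongside the original ones, so at least one "success" is seen per episode.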
The concept here is very similar to HER, and is described here by Fang et al.
This algorithm also takes into account that the goal made some transitions over time, and uses the goal's trajectory to learn how to reach it.
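A rough sketch of the core DHER matching idea follows: given two failed episodes, if a state achieved in one coincides with a goal desired in the other, the two can be spliced into an "imagined" successful episode with a moving goal. This is a simplified illustration with assumed names, not Fang et al.'s full algorithm or the repo's implementation.

```python
import numpy as np

# Simplified sketch of DHER experience assembly; the full algorithm
# searches over many episode pairs and is more involved.
def dher_match(ep_states, ep_goals):
    """ep_states: list of (state, action, next_state) from one failed episode.
    ep_goals: list of the moving desired goals from another failed episode.
    If the state achieved at step p equals the goal desired at step q (q >= p),
    replay steps 0..p against the goal trajectory at steps q-p..q."""
    for p in range(len(ep_states)):
        achieved = ep_states[p][2]
        for q in range(p, len(ep_goals)):
            if np.array_equal(achieved, ep_goals[q]):
                transitions = []
                for t in range(p + 1):
                    state, action, next_state = ep_states[t]
                    goal_t = ep_goals[q - p + t]  # goal moves along its trajectory
                    done = bool(np.array_equal(next_state, goal_t))
                    reward = 0.0 if done else -1.0
                    transitions.append((state, action, goal_t, next_state, reward, done))
                return transitions
    return []  # no match between the two episodes
```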
All the scripts below have arguments which can be changed (all set by default to our choice of parameters).
To see all arguments for each script, run: python <SCRIPT NAME>.py --help
Example for running a script: python main.py
To train the model that solves the bit flipping environment, run the following script: main.py
Note that the argument --state-size <NUMBER>
is necessary, in order to see the effect of the different sizes on the model.
Adding the argument --HER
or --DHER
would use the respective algorithm.
Adding the argument --dynamic
would use the dynamic mode of the environment.
The model's architecture is specified in: dqn.py
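A goal-conditioned DQN for this domain can be sketched as below in PyTorch. The layer sizes and depth here are assumptions for illustration; the actual architecture lives in dqn.py.

```python
import torch
import torch.nn as nn

# Illustrative goal-conditioned DQN; hidden sizes and depth are
# placeholder choices, not necessarily those used in dqn.py.
class DQN(nn.Module):
    def __init__(self, state_size, hidden=256):
        super().__init__()
        # Input: current state concatenated with the goal (2 * state_size).
        # Output: one Q-value per bit-flip action (state_size actions).
        self.net = nn.Sequential(
            nn.Linear(2 * state_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_size),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))
```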
To test the models, run the following script: evaluate_model.py
with the relevant --state-size
argument.
We included a trained model in the bit_flip_model.pkl
file, with state size n=10.
In the above figure, we show how the state size affects the success rate of the different algorithms.
As can be seen, using HER allows us to overcome the sparse binary reward problem and maintain a high success rate even for very large state spaces.
This holds even when compared to a standard DQN with added reward shaping.
In the following example, we can observe how the domain is solved step by step using the HER algorithm.