In Lecture 6, the definition of the TD($\lambda$) update on Slide 30 needs to be adjusted: on the one hand it should read "SARSA($\lambda$)", on the other hand the eligibility trace must additionally be indexed by the action, i.e. $z_k(x_k, a_k)$ (see the sketch below the sources).
Sources:
Reinforcement Learning: An Introduction (Second Edition), Chapter 7.5, p. 183 ff.
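For illustration, a minimal sketch of the corrected update in tabular form, assuming arrays `Q` and `z` indexed by state and action (the variable names and the accumulating-trace variant are my assumptions, not code from the lecture):

```python
import numpy as np

def sarsa_lambda_step(Q, z, x, a, r, x_next, a_next, alpha, gamma, lam):
    """One tabular SARSA(lambda) step with an accumulating trace z(x, a)."""
    delta = r + gamma * Q[x_next, a_next] - Q[x, a]  # TD error
    z[x, a] += 1.0            # the trace is indexed by state AND action
    Q += alpha * delta * z    # credit all recently visited (x, a) pairs
    z *= gamma * lam          # decay every trace entry
    return Q, z
```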
Some of the solution notebooks are hard to read. They seem to partly contain debug output, such as long columns of figures or bulky number arrays floating between the cells. There is still room for improvement here to promote readability and a quick introduction to the topic.
It would be nice to have a small Markdown information sheet per tutorial summarizing the most important information (a very short description of the addressed problems, the algorithms used, and maybe an overview of the sub-tasks within the notebook). The format of the Markdown sheet should be standardized across all tutorials.
And finally, please provide the task templates with gaps for the student inputs. Hence, for every notebook there should be two files, e.g. ex00_task_template.ipynb and ex00_solution.ipynb.
Question/remark: Is there maybe a straightforward way to automate the generation of the task template notebooks from the solution notebooks (e.g. a simplified/lightweight nbgrader with a Travis backend that erases parts of the solution code based on built-in keywords)? Investing the effort once to automate this pipeline would be very convenient for future updates of the tasks.
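For illustration, a minimal sketch of such a converter using nbformat, assuming solution cells mark the regions to erase with `### BEGIN SOLUTION` / `### END SOLUTION` comments (the marker keywords, placeholder text, and file names are assumptions, not an existing convention of this repo):

```python
import nbformat

BEGIN, END = "### BEGIN SOLUTION", "### END SOLUTION"  # assumed marker keywords
PLACEHOLDER = "# YOUR CODE HERE"

def strip_solutions(src_path, dst_path):
    """Copy a solution notebook, replacing marked code regions with a placeholder."""
    nb = nbformat.read(src_path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        keep, inside = [], False
        for line in cell.source.splitlines():
            if BEGIN in line:
                inside = True
                keep.append(PLACEHOLDER)
            elif END in line:
                inside = False
            elif not inside:
                keep.append(line)
        cell.source = "\n".join(keep)
    nbformat.write(nb, dst_path)

strip_solutions("ex00_solution.ipynb", "ex00_task_template.ipynb")
```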
Double Q-learning was introduced as a way to remove maximization bias, especially in stochastic environments. Now in exercise 5 we are given a stochastic environment on which double Q-learning behaves worse than normal Q-learning. While this teaches us that in practice it is not always clear which method will be best, it does not help to strengthen the concepts learned in the lecture. A new learner will question their solution of this task and, in general, the benefit of double Q-learning. So my suggestion is to find a better-fitting environment in which the advantages of double Q-learning over single Q-learning become visible.
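For reference, a minimal sketch of the tabular double Q-learning update under discussion (the standard textbook form with two Q-tables and a coin flip; variable names and hyperparameters are my assumptions, not code from the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)

def double_q_update(Q1, Q2, x, a, r, x_next, alpha=0.1, gamma=0.99):
    """One tabular double Q-learning step: one table selects the greedy
    action, the other evaluates it, which removes the maximization bias."""
    if rng.random() < 0.5:
        a_star = np.argmax(Q1[x_next])             # Q1 selects ...
        target = r + gamma * Q2[x_next, a_star]    # ... Q2 evaluates
        Q1[x, a] += alpha * (target - Q1[x, a])
    else:
        a_star = np.argmax(Q2[x_next])             # roles swapped
        target = r + gamma * Q1[x_next, a_star]
        Q2[x, a] += alpha * (target - Q2[x, a])
```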
Describe the bug
In Exercise 04, Monte Carlo methods should be implemented for the racetrack environment; however, I cannot find where racetrack_environment.py is located. Could you provide installation instructions or a description of the environment to be used (e.g. an env.yml for conda)?
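To make the request concrete, a minimal sketch of what such an env.yml could look like (all package names and version pins below are hypothetical placeholders, not the repo's actual dependencies):

```yaml
name: rl-course          # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - python=3.8           # hypothetical pin
  - numpy
  - matplotlib
  - jupyter
  - pip
  - pip:
      - gym==0.21.0      # hypothetical pin; must match the exercises
```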
Finding and installing the correct package versions needed to run the later exercises is too cumbersome. It seems that some packages have to be downgraded compared to the versions in the provided requirements.txt.
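As a possible mitigation, a small helper sketch that reports mismatches between the installed packages and a pin list (the pins shown are hypothetical placeholders and would have to be taken from the actual requirements.txt):

```python
from importlib.metadata import version, PackageNotFoundError

# Hypothetical pins; replace with the entries from the repo's requirements.txt.
required = {"numpy": "1.21.6", "matplotlib": "3.5.3", "gym": "0.21.0"}

for pkg, want in required.items():
    try:
        have = version(pkg)
        status = "OK" if have == want else f"mismatch: installed {have}"
    except PackageNotFoundError:
        status = "missing"
    print(f"{pkg}=={want} -> {status}")
```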
In Lecture 2, slide 8, it is not explained how $P_{xx'}^m$ in eq. (2.3) can be replaced by a constant transition matrix as $m$ goes to infinity. It is clear that as $m \to \infty$, $p_{k+m}$ and $p_k$ in eq. (2.3) can be replaced by a constant matrix $p$, but it is not clear and not explained how $P_{xx'}^m$ becomes $P_{xx'}$.
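As a numerical illustration of the step in question (using a made-up 2-state ergodic chain, not the slide's example): for a regular Markov chain, the powers $P^m$ converge to a constant matrix whose rows all equal the stationary distribution, which is presumably the limit matrix meant on the slide:

```python
import numpy as np

# Hypothetical 2-state ergodic Markov chain (not the lecture's example).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

for m in (1, 5, 20, 100):
    print(f"P^{m} =\n{np.linalg.matrix_power(P, m)}")
# For large m, every row approaches the stationary distribution (0.8, 0.2),
# so P^m itself becomes a constant matrix with identical rows.
```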