upb-lea / reinforcement_learning_course_materials


Lecture notes, tutorial tasks including solutions as well as online videos for the reinforcement learning course hosted by Paderborn University

License: MIT License

Jupyter Notebook 96.98% Python 0.41% MATLAB 0.46% TeX 2.15%
control course course-materials jupyter jupyter-notebooks latex lecture lecture-notes machine-learning online-learning online-videos open-education open-education-resources open-educational-resources prediction python reinforcement-learning teaching teaching-materials tutorial

reinforcement_learning_course_materials's People

Contributors

bhk11, hvater, marvinmeyer, max-schenke, wallscheid, webbah, wkirgsn, xydrkrulof


reinforcement_learning_course_materials's Issues

Lecture 2, slide 40: rewording is needed

"An optimal policy must equal the expected return for the best action
of a given state:"

A policy cannot equal the expected return, since the two are different kinds of objects. This sentence needs rewording.
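One possible rewording separates the optimal value from the optimal policy. The sketch below assumes the course's state/action symbols $x$ and $u$; the names $v^*$, $q^*$ and $\pi^*$ are assumptions and may not match the slide exactly.

```latex
% Optimal state value: expected return of the best action in state x
v^*(x) = \max_{u} q^*(x, u)

% Optimal policy: selects an action that attains this maximum
\pi^*(x) \in \arg\max_{u} q^*(x, u)
```

Read this way, it is the optimal state value (not the policy) that equals the expected return of the best action, while the policy is the rule that picks that action.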

Eligibility traces for SARSA(lambda)

In Lecture 6, Slide 30, the definition of the TD($\lambda$) update needs to be adjusted. First, it should be labeled "SARSA($\lambda$)"; second, the eligibility trace must also depend on the action: $z_k(x_k, a_k)$.

Source:
Reinforcement Learning: An Introduction (Second Edition), Chapter 7.5 p. 183ff
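To illustrate the point about the trace argument, here is a minimal sketch of one tabular SARSA($\lambda$) step with accumulating traces. It is not the slide's definition: the function name, the dictionary-based tables and the default step sizes are all assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_lambda_update(q, z, x, a, r, x_next, a_next,
                        alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """One tabular SARSA(lambda) step (hypothetical helper, accumulating traces)."""
    # The TD error is built from action values q(x, a), not state values v(x).
    target = r if terminal else r + gamma * q[(x_next, a_next)]
    delta = target - q[(x, a)]

    # The eligibility trace is indexed by the state-action pair: z(x, a).
    z[(x, a)] += 1.0

    # Every visited pair is updated in proportion to its trace, then decayed.
    for key in list(z):
        q[key] += alpha * delta * z[key]
        z[key] *= gamma * lam
    return delta


q = defaultdict(float)  # action-value table, initialised to zero
z = defaultdict(float)  # eligibility traces, initialised to zero
delta = sarsa_lambda_update(q, z, x=0, a=1, r=1.0, x_next=1, a_next=0)
```

The only structural difference from a state-value TD($\lambda$) update is that both tables are keyed by `(x, a)` instead of `x` alone.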

A mistake in lecture 1, slide 46

Lecture 1, slide 46: in the summation in eq. (1.16), r_{k+i} should be a function of u_{k+i-1}, because, for example, we obtain r_{k+1} when u_{k} is applied.
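The index shift can be written out as follows. This is only a sketch of the convention described in the issue, not a reproduction of eq. (1.16); the state argument is left open because the issue does not specify it.

```latex
% Applying u_k produces the reward received at the next step:
r_{k+1} = r(\,\cdot\,, u_k)

% Hence the i-th summand depends on the action applied one step earlier:
r_{k+i} = r(\,\cdot\,, u_{k+i-1}), \qquad i = 1, 2, \dots
```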

Tidy up tutorial solution notebooks and provide problem templates

  • Some of the solution notebooks are hard to read. They seem to contain leftover debug output, such as columns of figures or large number arrays. Cleaning these up would improve readability and give a quicker introduction to the topic.
  • It would be nice to have a short Markdown information sheet per tutorial that summarizes the most important information (a very brief description of the addressed problems, the algorithms used, and maybe an overview of the sub-tasks within the notebook). The format of this sheet should be standardized across all tutorials.
  • Finally, please provide task templates with gaps for the students' input. For every notebook there should then be two files, such as ex00_task_template.ipynb and ex00_solution.ipynb.
    • Question/remark: Is there a straightforward way to automate the generation of the task template notebooks from the solution notebooks (e.g. a simplified/light nbgrader with a Travis backend that erases parts of the solution code based on built-in keywords)? Investing the effort once to automate this pipeline would be very convenient for future updates of the tasks.
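The automation asked about in the last point could be sketched with the standard library alone, since `.ipynb` files are JSON. The marker strings below are modeled on nbgrader's convention but should be treated as an assumption, as should the helper names and the placeholder text; this is not an existing tool in the repository.

```python
import json

# Assumed marker comments delimiting the solution code inside a cell.
BEGIN, END = "### BEGIN SOLUTION", "### END SOLUTION"
PLACEHOLDER = "# YOUR CODE HERE"


def strip_solution(source):
    """Replace each marked solution region with a single placeholder line."""
    out, skipping = [], False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            skipping = True
            # Keep the marker's indentation so the template stays valid Python.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + PLACEHOLDER)
        elif stripped == END:
            skipping = False
        elif not skipping:
            out.append(line)
    return "\n".join(out)


def make_template(solution_path, template_path):
    """Write a task template notebook derived from a solution notebook."""
    with open(solution_path, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell["source"]
            text = "".join(src) if isinstance(src, list) else src
            cell["source"] = strip_solution(text)
            cell["outputs"] = []            # drop stored results
            cell["execution_count"] = None  # reset execution counters
    with open(template_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```

A call such as `make_template("ex00_solution.ipynb", "ex00_task_template.ipynb")` would then produce the second file from the first; running it in CI on every change to a solution notebook would keep the two in sync.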

Exercise 5: Change the example environment for tasks 3 & 4

Double Q-learning was introduced as a way to remove maximization bias, especially in stochastic environments. In exercise 5, however, we are given a stochastic environment in which double Q-learning performs worse than standard Q-learning. While this teaches us that in practice it is not always clear which method will be best, it does not help to reinforce the concepts learned in the lecture. A new learner will question their solution to this task and, more generally, the benefit of double Q-learning. My suggestion is therefore to find an environment that better demonstrates the advantages of double Q-learning over single Q-learning.
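For context, the maximization bias itself can be demonstrated without any environment. The sketch below is an illustration, not part of the exercise: every action has a true value of zero, the estimates are noisy sample means, and the function name and parameters are assumptions.

```python
import random


def estimate_bias(n_actions=10, n_samples=5, n_trials=5000, seed=0):
    """Compare the single (max) and double estimator when all true values are 0."""
    rng = random.Random(seed)

    def noisy_estimates():
        # Sample-mean estimate per action from zero-mean, unit-variance rewards.
        return [sum(rng.gauss(0, 1) for _ in range(n_samples)) / n_samples
                for _ in range(n_actions)]

    single_total, double_total = 0.0, 0.0
    for _ in range(n_trials):
        q_a, q_b = noisy_estimates(), noisy_estimates()
        # Single estimator: select and evaluate with the same noisy table.
        single_total += max(q_a)
        # Double estimator: select with table A, evaluate with independent table B.
        best = max(range(n_actions), key=q_a.__getitem__)
        double_total += q_b[best]
    return single_total / n_trials, double_total / n_trials


single_bias, double_bias = estimate_bias()
```

In this setting the single estimator is biased upward while the double estimator averages close to zero. An environment in which such overestimation actually misleads action selection would be a natural candidate for the exercise.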

Exercise 4: cannot find Racetrack Environment

Describe the bug
In Exercise 04, Monte Carlo methods should be implemented for the racetrack environment. However, I cannot find where racetrack_environment.py is located. Could you provide installation instructions or a description of the environment to be used (e.g. an env.yml for conda)?

Dependency Issues for Running the Exercises

Finding and installing the correct package versions to run the later exercises is too cumbersome.
It seems that some packages have to be downgraded relative to the versions in the provided requirements.txt.

This should be streamlined if possible.

For reference, this is the environment I ended up with in order to run ex12 alone:
ex12_schenke_requirements.txt

Grammatical Errors

Lecture 08
Slide 10, last point: it should be "... an ML model ...".

Lecture 11
Slide 2: ..."Goal of today's lecture"... [missing apostrophe]

Insufficient explanation in Lecture 2, slide 8

In Lecture 2, slide 8, it is not explained how P_{xx'}^m in eq. (2.3) can be replaced by a constant transition matrix as m goes to infinity. It is clear that, as m goes to infinity, p_{k+m} and p_{k} in eq. (2.3) can be replaced by a constant matrix p, but it is neither clear nor explained how P_{xx'}^m becomes P_{xx'}.
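A small numerical experiment may help locate the missing step. The sketch below is not taken from the slide: it uses an arbitrary 2x2 transition matrix, assumes the row-vector convention p_{k+1} = p_k P, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(p, m):
    """Compute P^m by repeated multiplication, starting from the identity."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = matmul(result, p)
    return result


# Arbitrary example chain (assumption); each row sums to one.
P = [[0.9, 0.1],
     [0.5, 0.5]]

P_50 = matrix_power(P, 50)       # P^m for large m
p_inf = P_50[0]                  # any row approximates the limit distribution
p_next = matmul([p_inf], P)[0]   # one further step with the original P
```

For this example the rows of P^m become identical as m grows, so P^m itself does not turn back into P. Instead, the common row p satisfies p = p P with the original one-step matrix, which is the relation obtained by setting m = 1 and replacing p_{k+1} and p_k by p.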
