bandits: discussion from jasonkrone

introduction-suggestions

Reinforcement learning (RL) is the area of machine learning concerned with an agent that interacts with an environment in order to learn a behavioral policy that maximizes cumulative reward. --- [Can you use notation in the intro? There is a good example in the preliminaries of https://arxiv.org/abs/1704.03012]
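
As a rough sketch of what such notation might look like (the symbols below are the conventional ones and are assumptions, not taken from the draft or from the linked paper), the intro could state the environment as a discounted MDP and the objective as an expected return:

```latex
\[
M = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
\pi^{*} \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, R(s_t, a_t) \right],
\]
where $s_{t+1} \sim P(\cdot \mid s_t, a_t)$, $a_t \sim \pi(\cdot \mid s_t)$, and $\gamma \in [0, 1)$.
```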

When [the reward and transition functions] are known, dynamic programming approaches like value iteration can be used to determine an optimal policy. Otherwise, the agent must learn about rewards by interacting with the environment, for example by using the RMAX algorithm. In both cases, however, these algorithms do not scale well when the state space is large. [Talk more about this; give an example.]
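
To make the scaling point concrete, here is a minimal sketch of tabular value iteration on a hypothetical two-state MDP (the transition table `P`, reward table `R`, and the discount of 0.9 are made up for illustration). Every sweep loops over all states and actions, which is why the cost grows with the size of the state space:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a):
        # One-step lookahead under the current value estimate.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):  # every sweep touches every state
            best = max(q_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(P[s])), key=lambda a, s=s: q_value(s, a))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical MDP: action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # from state 1
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

V, policy = value_iteration(P, R)
print(V, policy)
```

With these made-up numbers the values should converge to roughly 9 for state 0 and 10 for state 1, with the greedy policy moving from state 0 to state 1 and then staying.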

In particular, long-horizon problems are difficult to solve using standard RL approaches.
[Talk more about this; give an example; talk about sparse rewards and things we can't do right now, e.g. Montezuma's Revenge.]

Hierarchical approaches to this problem aim to take advantage of trajectories that share common sub-components [give an example; possibly from one of the papers]. Hierarchical RL aims to learn these sub-tasks and then combine them to solve the original problem. This might be done through temporally extended actions: rather than making a decision at every time step, the agent chooses activities that follow their own policies over several time steps until they terminate [make this sentence more precise; it's a little unclear].
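
One way to pin down "temporally extended action" is a small code sketch in the style of the options framework. Everything below is an assumption for illustration: the `Option` class, the 1-D corridor environment in `step`, and the `walk_to_door` option are hypothetical names, not taken from the draft or any of the reviewed papers.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Option:
    initiation_set: Set[int]            # states where the option may be started
    policy: Callable[[int], int]        # state -> primitive action
    terminates: Callable[[int], bool]   # state -> True when the option ends


def step(state, action):
    """Hypothetical corridor with states 0..10; reward 1 on first reaching state 10."""
    next_state = min(10, max(0, state + action))
    reward = 1.0 if next_state == 10 and state != 10 else 0.0
    return next_state, reward


def run_option(state, option, gamma=0.9):
    """Follow the option's own policy until it terminates.

    Returns the final state, the discounted reward accumulated along the way,
    and the number of primitive steps taken. The higher-level agent makes one
    decision (which option to run) instead of one decision per time step.
    """
    assert state in option.initiation_set
    total, discount, k = 0.0, 1.0, 0
    while True:
        state, r = step(state, option.policy(state))
        total += discount * r
        discount *= gamma
        k += 1
        if option.terminates(state):
            return state, total, k


# Sub-task: walk right until the "door" at state 5.
walk_to_door = Option(
    initiation_set=set(range(0, 5)),
    policy=lambda s: +1,
    terminates=lambda s: s == 5,
)

result = run_option(0, walk_to_door)
print(result)
```

Starting from state 0, the single high-level choice of `walk_to_door` expands into five primitive steps before control returns to the agent.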

This paper is a review of several hierarchical reinforcement learning methods. We first define Markov Decision Processes (MDPs) and Semi-Markov Decision Processes (SMDPs), which are the motivating problems of RL. We then describe several of the classical strategies: options, hierarchies of abstract machines (HAM), feudal RL, and MAXQ. We explain how each of these algorithms works, how much they learn [huh?], how they use temporal or state abstraction to provide more scalable approaches to solving MDPs, and what advantages each has over the other approaches.

We then describe more modern approaches. [Talk about commonalities among modern approaches.]
