The rl-policy-gradient's intro from robin-ml

robin-ml / rl-policy-gradient Goto Github PK

View Code? Open in Web Editor NEW

Policy gradient algorithms in reinforcement learning is an approach to solve reinforcement learning problems by finding an optimal policy. A policy tells us how to act from a particular state

Python 100.00%

rl-policy-gradient's Introduction

Policy Gradient Methods

We focuses on a particular family of reinforcement learning algorithms that use policy gradient methods. They are designed to be easily adaptable for reinforcement learning environments (like gym).

The goal of reinforcement learning is to find an optimal behavior strategy for the agent to obtain optimal rewards. The policy gradient methods target at modeling and optimizing the policy directly. The policy is usually modeled with a parameterized function (θ), i.e π _θ (a|s). The value of the reward (objective) function depends on this policy and then various algorithms can be applied to optimize θ for the best reward.

Finding the θ that maximises the reward is an optimisation problem . Some approaches include:

Gradient-based:
- Gradient descent
- Conjugate gradient
- Quasi-newton
Genetic algorithms
Hill climbing
Simplex / amoeba / Nelder Mead

Tensorflow versions

The master branch supports Tensorflow 2 versions of the baseline algorithm A2C/A3C

PG Algorithms

Recommend Projects

robin-ml / rl-policy-gradient Goto Github PK

rl-policy-gradient's Introduction

Policy Gradient Methods

Tensorflow versions

PG Algorithms

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs