
rl-book's Introduction

Foundations of Reinforcement Learning

A textbook teaching foundational ideas in reinforcement learning with examples in finance. Written by Ashwin Rao and Tikhon Jelvis.

Reinforcement Learning (RL) is emerging as a viable and powerful technique for solving a variety of complex business problems across industries that involve Sequential Optimal Decisioning under Uncertainty. Although RL is classified as a branch of Machine Learning (ML), it tends to be viewed and treated quite differently from other branches of ML (Supervised and Unsupervised Learning). Indeed, RL seems to hold the key to unlocking the promise of AI – machines that adapt their decisions to vagaries in observed information, while continuously steering towards the optimal outcome. Its penetration in high-profile problems like self-driving cars, robotics and strategy games points to a future where RL algorithms will have decisioning abilities far superior to humans.

Getting the Book

The first edition of the book is now available from Routledge.

You can also download errata for the print version.

Getting the Code

The rl directory in this repo contains the code used in this book: a simple framework for reinforcement learning as well as fleshed-out examples for each chapter.

Working with Python and venv

The Python code for the book requires a few additional libraries. We can manage our Python dependencies with a venv.

First, create a venv:

> nix-shell
[nix-shell:~/Documents/RL-book]$ python -m venv .venv

Then, each time you're working on this project, make sure to activate the venv:

> source .venv/bin/activate

Once the venv is activated, you should see a (.venv) in your shell prompt:

(.venv) RL-book:RL-book>

Now you can use pip to install the needed dependencies inside the venv:

(.venv) RL-book:RL-book> pip install -r requirements.txt

If you want additional libraries, you can install them explicitly:

(.venv) RL-book:RL-book> pip install matplotlib

rl-book's People

Contributors

amil5, coverdrive, dependabot[bot], slyderek, sven-lerner, tikhonjelvis


rl-book's Issues

Code example not properly indented p.55

The converge function on page 55 is not properly indented.
Instead of

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
    yield a
    if abs(a - b) < threshold:
        break

(I assume) it should be:

import itertools  # itertools.pairwise requires Python 3.10+
from typing import Iterator

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break

Code example missing statement p 50

The two code examples for the iterative square root algorithm raise an UnboundLocalError because an assignment is missing. Even with the local variables initialized, the loop never terminates, because x is never updated.

I agree that using math.inf is messy, but it avoids an extra check in the loop condition, such as while x is None or abs(x_n - x) > 0.01.

The first example is at approximately line 644 of chapter1.md (p. 50 of the un-numbered chapter of the PDF book), and the second is three paragraphs later.

import math

def sqrt(a: float) -> float:
    x = math.inf  # previous estimate; initialized to avoid UnboundLocalError
    x_n = a / 2   # current estimate (initial guess)
    while abs(x_n - x) > 0.01:
        x = x_n   # <== missing in the book: remember the previous estimate
        x_n = (x + (a / x)) / 2
    return x_n
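For comparison, here is a minimal sketch of the alternative mentioned above, which replaces math.inf with None at the cost of an extra check in the loop condition. The name sqrt_with_none is hypothetical and does not appear in the book.

```python
from typing import Optional


def sqrt_with_none(a: float) -> float:
    """Newton's iteration for sqrt(a), assuming a > 0 (hypothetical variant)."""
    x: Optional[float] = None  # previous estimate; None until the first pass
    x_n = a / 2                # current estimate (initial guess)
    # The `x is None` test forces at least one iteration.
    while x is None or abs(x_n - x) > 0.01:
        x = x_n
        x_n = (x + a / x) / 2
    return x_n


root = sqrt_with_none(16.0)
```

Both variants perform the same updates; they differ only in how the first comparison is made to succeed.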

requirements.txt issues on Windows

I had a few issues with requirements.txt dependencies. I ran into installation errors with:

pandas == 1.0.3 --> switched it to 1.3.0 and it worked fine.
scipy == 1.4.1 --> switched to 1.7.0 and it worked.

requests 2.24.0 then conflicted with urllib3 == 1.26.5, because it depends on earlier versions. I switched to urllib3 1.21.1 to resolve this, but ran into more build errors down the line.

For context, I'm running in an Anaconda environment on Windows, which could be the problem. The requirements may install correctly on a Linux VM.

Issue with feature functions in chapter8/optimal_exercise_bi.py

Dear professors,

I believe there is an issue with the feature functions used in chapter8/optimal_exercise_bi.py, in lines 200-202:

ffs += [(lambda s: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[i]))
            for i in range(num_laguerre)]

It should create different functions, each one being a different Laguerre polynomial multiplied by a softplus-style term, log(1 + exp(-x)).

However, it seems that it is creating the same function multiple times.

Searching for this issue on the internet, I found a helpful Stack Overflow post (https://stackoverflow.com/questions/6076270/lambda-function-in-list-comprehensions) that explains the problem: the loop variable (the i in the code above) is looked up when the lambda is called, not when it is created, so every lambda ends up seeing the final value of i.

One possible solution is to capture the loop variable explicitly as a default argument of the lambda, as below:

ffs += [(lambda s, coeff=i: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[coeff]))
            for i in range(num_laguerre)]

It might be that this same issue happens in other places in the codebase. The solution would be similar.
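The behaviour can be reproduced without NumPy or the book's code. The sketch below uses hypothetical names (late, bound) and a toy multiplication in place of the Laguerre features; it only illustrates the closure semantics.

```python
from typing import Callable, List

# Late binding: each lambda looks up `i` when it is called, after the
# comprehension has finished, so all three see the final value i == 2.
late: List[Callable[[float], float]] = [lambda x: x * i for i in range(3)]

# Default-argument capture: `k=i` is evaluated when each lambda is created,
# so every lambda keeps its own value of the loop variable.
bound: List[Callable[[float], float]] = [lambda x, k=i: x * k for i in range(3)]

late_results = [f(10.0) for f in late]    # [20.0, 20.0, 20.0]
bound_results = [f(10.0) for f in bound]  # [0.0, 10.0, 20.0]
```

The default-argument form is the same technique as the coeff=i fix proposed above.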

Would you please let me know what you think about this?

Thanks for the code examples and the book.

typo pg. 125

First sentence of the Policy Improvement section should read "Terms such as 'better' ..."

typo pg. 200

In the paragraph before A Simple Financial Example, tradeoff is spelled as treadeoff.

typing or rl.distribution?

Thanks for the book. I am enjoying reading it and its "modular" Python code.

I assume that in the chapter "Markov Processes" (sec:mrp-chapter), in the Python code above the "Simple Inventory Example" (page 72 of the book PDF), the

from typing import FiniteDistribution, Categorical
needs to be changed to
from rl.distribution import FiniteDistribution, Categorical

rl.distribution, etc.

I keep seeing the command:

import rl.distribution

But I can't find which package this command is from. Is it from Keras-rl or gym?

Inconsistent code snippet imports

pg. 119 has the following code snippet

from typing import Iterator
X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    ...

It seems odd to explicitly import Iterator from typing, but not Callable or TypeVar.
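A self-contained version of the snippet with consistent imports might look like the sketch below. The function body is an assumption, since the snippet above elides it with "..."; here it simply yields start, step(start), step(step(start)), and so on.

```python
import itertools
from typing import Callable, Iterator, TypeVar

X = TypeVar('X')


def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    """Yield start, step(start), step(step(start)), ... without end.

    Assumed body: the original snippet only shows the signature.
    """
    state = start
    while True:
        yield state
        state = step(state)


# The iterator is infinite, so take a finite prefix with islice.
first_four = list(itertools.islice(iterate(lambda x: 2 * x, 1), 4))
```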

State-Reward Sequence

At the end of the first paragraph, page 9 of chapter2.pdf reads

The sequence S0, R1, S1, R1, S2, . . . terminates at...

I'm guessing it's supposed to read

The sequence S0, *R0*, S1, R1, S2, . . . terminates at...

Dynamic Programming convergence control

It would be good to have the dynamic programming algorithms accept a tolerance parameter (e.g., value_iteration_result would take an extra argument tolerance: float).
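As a rough illustration of the request, the sketch below runs value iteration on a tiny hand-written MDP and stops once the largest change between sweeps falls below tolerance. It does not use the rl library; the function value_iteration_with_tolerance, the Transitions type and the two-state MDP are all hypothetical.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = [(probability, next_state, reward), ...]
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration_with_tolerance(
    transitions: Transitions,
    gamma: float,
    tolerance: float = 1e-6,
) -> Dict[str, float]:
    """Repeat Bellman optimality sweeps until the max update is below tolerance."""
    values = {s: 0.0 for s in transitions}
    while True:
        updated = {
            s: max(
                sum(p * (r + gamma * values[s1]) for p, s1, r in outcomes)
                for outcomes in actions.values()
            )
            for s, actions in transitions.items()
        }
        if max(abs(updated[s] - values[s]) for s in transitions) < tolerance:
            return updated
        values = updated


# Hypothetical MDP: staying in B pays 2 per step; A can stay (pays 1) or move to B.
toy_mdp: Transitions = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)]},
}
optimal = value_iteration_with_tolerance(toy_mdp, gamma=0.5, tolerance=1e-6)
```

With gamma = 0.5 the exact optimal values are 2 for A and 4 for B, so a smaller tolerance brings the result closer to those values at the cost of more sweeps.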

typo

"its" is written as "it's" multiple times throughout the book, e.g. "distribution of it’s Markov Process" on pg. 194

typo LSPI

In the first paragraph describing LSPI, it should say \bm{\phi}(s,a)^T \cdot \bm{w} instead of \bm{\phi}(s)^T \cdot \bm{w}.

chapter numbering in pdf and rl folder

Hi,

The chapter numbers in the book PDF do not match the folder names in rl. For example, the Python code for Chapter 3, "Dynamic Programming Algorithms" (clearance_pricing_mdp.py), is located in the chapter4 folder. This is not a big issue, but a quick fix would make the PDF more consistent with the code.

One suggestion is to number the chapter folders in rl from 0 instead of 1 (as in the book folder), so that the PDF matches the code in the rl folder. I am afraid, though, that this may break some internal structure you already have :)

requests package conflict

ERROR: Cannot install -r requirements.txt (line 66) and urllib3==1.26.5 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested urllib3==1.26.5
requests 2.24.0 depends on urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict
