rl-book's Introduction

Foundations of Reinforcement Learning

A textbook teaching foundational ideas in reinforcement learning with examples in finance. Written by Ashwin Rao and Tikhon Jelvis.

Reinforcement Learning (RL) is emerging as a viable and powerful technique for solving a variety of complex business problems across industries that involve Sequential Optimal Decisioning under Uncertainty. Although RL is classified as a branch of Machine Learning (ML), it tends to be viewed and treated quite differently from other branches of ML (Supervised and Unsupervised Learning). Indeed, RL seems to hold the key to unlocking the promise of AI – machines that adapt their decisions to vagaries in observed information, while continuously steering towards the optimal outcome. Its penetration in high-profile problems like self-driving cars, robotics and strategy games points to a future where RL algorithms will have decisioning abilities far superior to humans.

Getting the Book

The first edition of the book is now available from Routledge. You can:

Buy a copy from Routledge or Amazon
Download a PDF of the manuscript
Compile the latest version of the manuscript from this repo

You can also download errata for the print version.

Getting the Code

The rl directory in this repo contains the code used in this book, with a simple framework for reinforcement learning as well as fleshed out examples for each chapter.

Working with Python and venv

The Python code for the book requires a few additional libraries. We can manage our Python dependencies with a venv.

First, create a venv:

> nix-shell
[nix-shell:~/Documents/RL-book]$ python -m venv .venv

Then, each time you're working on this project, make sure to activate the venv:

> source .venv/bin/activate

Once the venv is activated, you should see a (.venv) in your shell prompt:

(.venv) RL-book:RL-book>

Now you can use pip to install the needed dependencies inside the venv:

(.venv) RL-book:RL-book> pip install -r requirements.txt

If you want additional libraries, you can install them explicitly:

(.venv) RL-book:RL-book> pip install matplotlib

rl-book's Issues

Book feature Request -- keep indentation when copying code from pdf

It would be really helpful if indents were kept when copying code snippets from the pdf version of the book. (I don't know how to do it though).
This could enable readers to experiment with code snippets faster and encourage more such behavior.

Code example not properly indented p.55

The converge function on page 55 is not properly indented.
Instead of

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
    yield a
    if abs(a - b) < threshold:
        break

(I assume) it should be:

import itertools  # itertools.pairwise requires Python 3.10+
from typing import Iterator

def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break

Typo in code: RL-book/rl/chapter2/simple_inventory_mrp.py

Line 48 should be state.state.on_hand instead of state.on_hand

Code example missing statement p 50

The two code examples for the iterative square root algorithm raise an UnboundLocalError because an assignment is missing. Even with the local variables initialized, the loop never terminates, because x is never updated.

I agree that using math.inf is messy, but it avoids an extra check in the loop condition, such as while x is None or abs(x_n - x) > 0.01.

The examples are at approximately line 644 of chapter1.md (p. 50 of the un-numbered chapter of the PDF book), and then again three paragraphs later.

import math

def sqrt(a: float) -> float:
    x = math.inf   # previous estimate; initializing it avoids the UnboundLocalError
    x_n = a / 2    # current estimate (initial guess)
    while abs(x_n - x) > 0.01:
        x = x_n    # <== the assignment missing from the book's version
        x_n = (x + (a / x)) / 2
    return x_n

requirements.txt issues on Windows

I had a few issues with requirements.txt dependencies. I ran into installation errors with:

pandas == 1.0.3 --> switched it to 1.3.0 and it worked fine.
scipy == 1.4.1 --> switched to 1.7.0 and it worked.

requests 2.24.0 then conflicted with urllib3 == 1.26.5, because it depends on earlier versions of urllib3. I switched to urllib3 1.21.1 to resolve this, but ran into some more build errors down the line.

For context, I'm running in an Anaconda environment on Windows, which could be the problem. The requirements may install correctly on a Linux VM.

Coin() returns str, but mean() expects a list of numbers

RL-book/rl/chapter1/probability.py

Line 55 in 56ca64d

expected_value(Coin(), 100)
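
A minimal sketch of the mismatch described in this issue. The names sample_coin and expected_value below are hypothetical stand-ins, not the repo's actual Coin class or expected_value function; the assumption is only that the coin sampler returns strings such as "H" and "T" while the averaging step needs numbers. Mapping each outcome to a number before averaging is one possible workaround:

```python
import random
from typing import Callable, List

def sample_coin(rng: random.Random) -> str:
    """Hypothetical coin sampler that returns a string outcome."""
    return rng.choice(["H", "T"])

def expected_value(sample: Callable[[], float], n: int) -> float:
    """Hypothetical helper: average n numeric samples."""
    samples: List[float] = [sample() for _ in range(n)]
    return sum(samples) / len(samples)

rng = random.Random(0)

# Averaging the raw string outcomes fails, because strings cannot be summed
# with the numeric start value used by sum().
try:
    expected_value(lambda: sample_coin(rng), 100)  # type: ignore[arg-type, return-value]
    string_average_failed = False
except TypeError:
    string_average_failed = True

# Mapping outcomes to numbers first gives a well-defined average
# (the observed fraction of heads).
heads_fraction = expected_value(
    lambda: 1.0 if sample_coin(rng) == "H" else 0.0, 100
)
```

Whether the fix belongs in the sampler or at the call site is a design choice for the maintainers; the sketch only shows why the string-valued call cannot work as written.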

Nix installation failed on an old (2018) Mac running macOS Sonoma 14.5

On my older 2018 Intel Mac, installation failed when using the --darwin-use-unencrypted-nix-store-volume flag. Omitting this flag, as in the Nix docs, works fine.

I will make a docs PR.

Issue with feature functions in chapter8/optimal_exercise_bi.py

Dear professors,

I believe there is an issue with the feature functions used in chapter8/optimal_exercise_bi.py, on lines 200-202:

ffs += [(lambda s: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[i]))
            for i in range(num_laguerre)]

It should create different functions, each one being a different Laguerre polynomial multiplied by a softplus-style factor.

However, it seems to create the same function multiple times.

Searching for this issue, I found a helpful Stack Overflow post (https://stackoverflow.com/questions/6076270/lambda-function-in-list-comprehensions) that explains the problem: Python closures bind late, so the loop variable (the i in the code above) is looked up when the lambda is called rather than when it is defined.

One possible solution is to bind the loop variable explicitly as a default argument of the lambda, as below:

ffs += [(lambda s, coeff=i: np.log(1 + np.exp(-s.state / (2 * strike))) *
            lagval(s.state / strike, ident[coeff]))
            for i in range(num_laguerre)]

It might be that this same issue happens in other places in the codebase. The solution would be similar.
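
The behaviour can be reproduced without any of the repo's code. The sketch below uses a toy multiplication in place of the Laguerre feature functions (an assumption made purely for illustration) and compares lambdas that close over the loop variable with lambdas that bind it as a default argument:

```python
# Each lambda here closes over the comprehension variable i. The lookup
# happens at call time, after the loop has finished, so every lambda sees
# the final value i == 2.
late_bound = [lambda x: x * i for i in range(3)]

# Binding i as a default argument evaluates it at definition time, so each
# lambda keeps its own value of i.
early_bound = [lambda x, i=i: x * i for i in range(3)]

late_results = [f(10) for f in late_bound]
early_results = [f(10) for f in early_bound]

print(late_results)   # [20, 20, 20]
print(early_results)  # [0, 10, 20]
```

functools.partial is another common way to bind the value at definition time; the default-argument form is shown because it matches the fix proposed above.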

Would you please let me know what you think about this?

Thanks for the code examples and the book.

typo pg. 125

First sentence of the Policy Improvement section should read "Terms such as 'better' ..."

typo pg. 200

In the paragraph before A Simple Financial Example, tradeoff is spelled as treadeoff.

typing or rl.distribution?

Thanks for the book - I am enjoying reading it and its "modular" Python code.

I assume that in the chapter "Markov Processes" (sec:mrp-chapter), in the Python code above the "Simple Inventory Example" (page 72 of the book PDF), the line

from typing import FiniteDistribution, Categorical

needs to be changed to

from rl.distribution import FiniteDistribution, Categorical

rl.distribution, etc.

I keep seeing the command:

import rl.distribution

But I can't find which package this command is from. Is it from Keras-rl or gym?

About the code in simple_inventory_mdp_nocap.py in chapter3

I am confused about line 40: why is it state.state.inventory_position() instead of state.inventory, and why does line 45 use state.state.on_hand instead of state.on_hand?

Inconsistent code snippet imports

pg. 119 has the following code snippet

from typing import Iterator
X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    ...

It seems odd to explicitly import Iterator from typing, but not Callable or TypeVar.
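
For comparison, a self-contained version of the snippet with all three names imported might look like the sketch below. The body of iterate is an assumption (the snippet quoted above elides it with ...); it simply applies step repeatedly and yields each intermediate value.

```python
from itertools import islice
from typing import Callable, Iterator, TypeVar

X = TypeVar('X')

def iterate(step: Callable[[X], X], start: X) -> Iterator[X]:
    """Yield start, step(start), step(step(start)), ... indefinitely."""
    state = start
    while True:
        yield state
        state = step(state)

# The iterator is infinite, so take a finite prefix with islice.
first_five = list(islice(iterate(lambda x: 2 * x, 1), 5))
print(first_five)  # [1, 2, 4, 8, 16]
```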

Mistake in summation for Stationary Distribution

The summation variable for a Stationary Distribution (p74 of pdf) in Chapter2.md should be s not s', i.e. sum over s in N:
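
Written out, the corrected condition described above would read as follows. This is a sketch that assumes \pi denotes the stationary distribution, \mathcal{P}(s, s') the probability of transitioning from s to s', and \mathcal{N} the set of non-terminal states; the exact symbols used in Chapter2.md may differ.

```latex
\pi(s') = \sum_{s \in \mathcal{N}} \pi(s) \cdot \mathcal{P}(s, s')
\quad \text{for all } s' \in \mathcal{N}
```

The free variable s' appears only on the left-hand side and as the destination state, while the bound variable s ranges over the source states.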

Wrong:

State-Reward Sequence

At the end of the first paragraph, page 9 of chapter2.pdf reads

The sequence S0, R1, S1, R1, S2, . . . terminates at...

I'm guessing it's supposed to read

The sequence S0, *R0*, S1, R1, S2, . . . terminates at...

Dynamic Programming convergence control

It would be good to have the dynamic programming algorithms take a tolerance as input (e.g., value_iteration_result takes an extra argument tolerance: float).
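
One way such a parameter could work is sketched below. The function converge_within and its signature are hypothetical, not part of the rl package; it assumes the algorithm exposes its successive value functions as an iterator of dictionaries keyed by state, and stops once the largest per-state change falls below the tolerance.

```python
from typing import Dict, Iterator, TypeVar

S = TypeVar('S')

def converge_within(
    value_functions: Iterator[Dict[S, float]],
    tolerance: float
) -> Dict[S, float]:
    """Return the first value function whose largest per-state change
    from its predecessor is below tolerance (hypothetical helper)."""
    previous = next(value_functions)
    for current in value_functions:
        max_change = max(
            (abs(current[s] - previous[s]) for s in current),
            default=0.0
        )
        if max_change < tolerance:
            return current
        previous = current
    return previous  # the iterator ended before the tolerance was reached

def halving_values() -> Iterator[Dict[str, float]]:
    """Toy sequence of value functions: 1.0, 0.5, 0.25, ... for one state."""
    v = 1.0
    while True:
        yield {"a": v}
        v /= 2

result = converge_within(halving_values(), tolerance=0.2)
print(result)  # {'a': 0.125}
```

With tolerance=0.2 the successive changes are 0.5, 0.25 and 0.125, so the third update is the first to qualify.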

typo

"its" is written as "it's" multiple times throughout the book, e.g. "distribution of it’s Markov Process" on pg. 194

typo LSPI

In the first paragraph describing LSPI, it should say \bm{\phi}(s,a)^T \cdot \bm{w} instead of \bm{\phi}(s)^T \cdot \bm{w}.

chapter numbering in pdf and rl folder

Hi,

While reading the book PDF, say Chapter 3 ("3. Dynamic Programming Algorithms"), I found that the Python code for the chapter ("clearance_pricing_mdp.py") is located in chapter4 of the rl folder. I do not think this is a big issue (you read Chapter 3 and the code is in the chapter4 folder), but a quick fix would make the book PDF more consistent with the code.

One suggestion is to number the chapters in the rl folder from 0 instead of 1 (like the book folder); I think the PDF would then match the code in the rl folder. But I am afraid that may break some internal structure you already have :)

Check formula for calculating the new state in Process1

The code has: return Process1.State(price=state.price + up_move * 2 - 1)
whereas it is a logistic function of (L - X_t).
Why is up_move multiplied by 2 and then reduced by 1?
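
One possible reading, sketched below under stated assumptions: if up_move is a 0/1 draw whose success probability is the logistic function of (L - X_t), then up_move * 2 - 1 maps 0 to -1 and 1 to +1, so the price moves down or up by one unit. The function names, the alpha parameter and its default value are hypothetical, and the sketch uses Python's random module rather than whatever sampler the repo uses.

```python
import math
import random

def up_probability(level: float, price: float, alpha: float = 0.25) -> float:
    """Logistic function of (level - price): the chance of an up-move.
    alpha is a hypothetical steepness parameter."""
    return 1.0 / (1.0 + math.exp(-alpha * (level - price)))

def to_step(up_move: int) -> int:
    """Map a 0/1 indicator to a -1/+1 price step."""
    return up_move * 2 - 1

def next_price(price: int, level: float, rng: random.Random) -> int:
    """Draw a 0/1 up-move with the logistic probability, then step by +/-1."""
    up_move = 1 if rng.random() < up_probability(level, price) else 0
    return price + to_step(up_move)

steps = [to_step(m) for m in (0, 1)]
print(steps)  # [-1, 1]

rng = random.Random(0)
new_price = next_price(100, level=100.0, rng=rng)  # either 99 or 101
```

The logistic function therefore only decides the direction of the move; the multiplication and subtraction convert that binary outcome into a signed unit step.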

requests package conflict

ERROR: Cannot install -r requirements.txt (line 66) and urllib3==1.26.5 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested urllib3==1.26.5
requests 2.24.0 depends on urllib3!=1.25.0, !=1.25.1, <1.26 and >=1.21.1

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict