thetawom / mabby

A multi-armed bandit (MAB) simulation library in Python

Home Page: https://thetawom.github.io/mabby/

License: Apache License 2.0

Makefile 4.03% Python 95.97%

multi-armed-bandits probability python reinforcement-learning simulation agent-based-simulation artificial-intelligence epsilon-greedy thompson-sampling

mabby's Introduction

mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:

choose from a wide range of classic bandit algorithms
configure environments with custom arm spaces and reward distributions
collect and visualize simulation metrics like regret and optimality

Installation

Prerequisites: Python 3.9+ and pip

Install mabby with pip:

pip install mabby

Basic Usage

The code example below demonstrates the basic steps of running a simulation with mabby. For more in-depth examples, please see the Usage Examples section of the mabby documentation.

import mabby as mb

# configure bandit arms
bandit = mb.BernoulliArm.bandit(p=[0.3, 0.6])

# configure bandit strategy
strategy = mb.strategies.EpsilonGreedyStrategy(eps=0.2)

# setup simulation
simulation = mb.Simulation(bandit=bandit, strategies=[strategy])

# run simulation
stats = simulation.run(trials=100, steps=300)

# plot regret statistics
stats.plot_regret()

Contributing

Please see CONTRIBUTING for more information.

License

This software is licensed under the Apache 2.0 license. Please see LICENSE for more information.


mabby's Issues

setup static type-checking

add GitHub issues badge to README.md

deploy docs with GitHub pages

do first deploy

[FEATURE] support non-stationary rewards

Is your feature request related to a problem? Please describe.
Currently, the rewards distribution of each arm is fixed, so the library can't simulate non-stationary (restless) bandit problems.

Describe the solution you'd like
When implementing a custom Arm, there should be an update() method that can be overridden to update the parameters of the reward distribution. Then, at each step, the ArmSet should call update() on each arm to advance its reward distribution.
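A minimal sketch of the proposed hook. The `Arm`, `RandomWalkBernoulliArm`, and `ArmSet` classes below are hypothetical stand-ins, not mabby's actual API. The drifting arm performs a Gaussian random walk on its success probability (clipped to [0, 1]), and the armset advances every arm once per step.

```python
import numpy as np


class Arm:
    """Hypothetical base arm; update() is a no-op, so arms are stationary by default."""

    def play(self, rng: np.random.Generator) -> float:
        raise NotImplementedError

    def update(self, rng: np.random.Generator) -> None:
        pass


class RandomWalkBernoulliArm(Arm):
    """Bernoulli arm whose success probability p drifts between steps."""

    def __init__(self, p: float, drift: float = 0.01) -> None:
        self.p = p
        self.drift = drift

    def play(self, rng: np.random.Generator) -> float:
        # Reward is 1.0 with probability p, else 0.0.
        return float(rng.random() < self.p)

    def update(self, rng: np.random.Generator) -> None:
        # Gaussian random-walk step, clipped so p stays a valid probability.
        self.p = float(np.clip(self.p + rng.normal(0.0, self.drift), 0.0, 1.0))


class ArmSet:
    """Hypothetical container that advances every arm once per simulation step."""

    def __init__(self, arms: list) -> None:
        self.arms = arms

    def step(self, rng: np.random.Generator) -> None:
        for arm in self.arms:
            arm.update(rng)


rng = np.random.default_rng(0)
armset = ArmSet([RandomWalkBernoulliArm(0.3), RandomWalkBernoulliArm(0.6)])
for _ in range(1000):
    armset.step(rng)
```

Because the base update() does nothing, existing stationary arms would keep working unchanged; only arms that opt in would drift.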

setup pyproject-fmt

See pyproject-fmt

do first release

setup testing

mabby/__init__.py uses relative imports

From PEP8:

Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 [7] is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable.

track rewards and regret statistics in simulation

Is your feature request related to a problem? Please describe.
To be able to evaluate or compare bandit algorithms, it's important to have statistics on the rewards and regret of the bandit.

Describe the solution you'd like
Simulation.run() should produce or return a collection of statistics such as:

mean reward for the bandit on round i across trials
mean cumulative regret for the bandit on round i across trials
mean percentage of best arm played for the bandit on round i across trials
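One way to compute these three statistics from raw per-trial logs, sketched with NumPy. The array layout (`rewards` and `choices` shaped `(trials, steps)`, with the true arm means known to the simulator) and the `summarize` function are assumptions for illustration. Regret here is the expected (pseudo-)regret: the gap between the best arm's mean and the chosen arm's mean, accumulated over steps.

```python
import numpy as np


def summarize(rewards: np.ndarray, choices: np.ndarray, means: np.ndarray) -> dict:
    """Per-step statistics averaged across trials.

    rewards: (trials, steps) rewards received
    choices: (trials, steps) index of the arm played
    means:   (n_arms,) true expected reward of each arm
    """
    best = int(np.argmax(means))
    # Per-step regret: best arm's mean minus the chosen arm's mean.
    regret = means[best] - means[choices]
    return {
        "mean_reward": rewards.mean(axis=0),
        "mean_cum_regret": np.cumsum(regret, axis=1).mean(axis=0),
        "pct_optimal": (choices == best).mean(axis=0) * 100,
    }
```

Each entry is a length-`steps` array, so entry i is the statistic on step i averaged over all trials, matching the list above.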

[FEATURE] implement Thompson sampling strategy

[BUG] arms allow invalid parameters

BernoulliArm and GaussianArm both accept invalid parameters, e.g., a p outside [0, 1] or a negative scale. Initializers should check their parameters and raise a ValueError if any are invalid.
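A sketch of the proposed checks. These classes are simplified hypothetical versions, and the `loc`/`scale` parameter names are assumed to follow NumPy's `normal()`; mabby's real signatures may differ. The comparisons are written so that NaN is rejected as well.

```python
class BernoulliArm:
    """Hypothetical sketch of parameter validation; mabby's real class may differ."""

    def __init__(self, p: float) -> None:
        # The chained comparison is also False for NaN, so NaN is rejected too.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p must be a probability in [0, 1], got {p!r}")
        self.p = p


class GaussianArm:
    """Hypothetical sketch; loc/scale naming is an assumption."""

    def __init__(self, loc: float, scale: float) -> None:
        # Written as `not scale >= 0` so that NaN is rejected as well.
        if not scale >= 0.0:
            raise ValueError(f"scale must be non-negative, got {scale!r}")
        self.loc = loc
        self.scale = scale
```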

add integration tests for Thompson sampling strategy

bandit repr or str should use its name

For clarity and consistency, a bandit's repr or str should return its custom name if one was given, and its default name otherwise.

move away from global numpy RNG

See https://albertcthomas.github.io/good-practices-random-number-generators/
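NumPy recommends creating `numpy.random.Generator` objects with `np.random.default_rng` and passing them explicitly, rather than seeding the legacy global state with `np.random.seed`. A small sketch of that pattern; `draw` is a hypothetical helper.

```python
import numpy as np


def draw(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n uniform samples from an explicitly passed Generator."""
    return rng.random(n)


# Same seed -> same stream, no matter what happens to the legacy global RNG.
a = draw(5, np.random.default_rng(42))
np.random.seed(0)
np.random.random(100)  # unrelated draws from the global state
b = draw(5, np.random.default_rng(42))
assert np.array_equal(a, b)

# Independent child streams, e.g. one per simulation trial.
child_rngs = [np.random.default_rng(s) for s in np.random.SeedSequence(42).spawn(3)]
```

Spawning children from one `SeedSequence` keeps a whole simulation reproducible from a single seed while giving each trial its own stream.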

add implementation for random bandit

Is your feature request related to a problem? Please describe.
Sometimes it would also be helpful to have an implementation for a random bandit that can serve as a baseline for comparison. The random bandit would just choose arms at random.

Describe the solution you'd like
Have a RandomBandit class, which can just be a special case of EpsilonGreedyBandit with eps fixed at 1.

Describe alternatives you've considered
We could just use an epsilon-greedy bandit with epsilon set to 1. However, (1) it would be nice if the random bandit could have a special default name, and (2) as other semi-uniform bandit algorithms are implemented, there may be multiple parameter settings that yield a "random bandit", so it would be nice to have a single, consistent one to use.
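A minimal sketch of the proposed subclass. `EpsilonGreedyBandit` here is a simplified hypothetical class (not mabby's implementation), and the default names are assumptions. Because `rng.random()` is always below 1.0, fixing eps at 1 means every choice is uniform exploration.

```python
from typing import Optional

import numpy as np


class EpsilonGreedyBandit:
    """Hypothetical minimal epsilon-greedy bandit."""

    def __init__(self, eps: float, name: Optional[str] = None) -> None:
        self.eps = eps
        self.name = name if name is not None else f"eps-greedy (eps={eps})"

    def choose(self, Qs: np.ndarray, rng: np.random.Generator) -> int:
        # Explore with probability eps, otherwise exploit the best estimate.
        if rng.random() < self.eps:
            return int(rng.integers(len(Qs)))
        return int(np.argmax(Qs))


class RandomBandit(EpsilonGreedyBandit):
    """Uniform-random baseline: epsilon-greedy with eps fixed at 1."""

    def __init__(self, name: Optional[str] = None) -> None:
        super().__init__(eps=1.0, name=name if name is not None else "random")
```

Subclassing keeps a single code path for choosing arms while still giving the baseline its own default name.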

add integration tests for random bandit

add integration tests for UCB1 bandit

[BUG] np.argmax returns first argmax

When bandit algorithms take the argmax over various arrays to choose an arm and several entries tie for the maximum, one of the tied indices should be chosen at random instead of always returning the first one.
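`np.argmax` always returns the first maximal index, so ties systematically favor lower-numbered arms. A minimal tie-breaking helper, assuming the caller passes a `numpy.random.Generator` (`random_argmax` is a hypothetical name):

```python
import numpy as np


def random_argmax(values, rng: np.random.Generator) -> int:
    """Return an index of the maximum, breaking ties uniformly at random."""
    values = np.asarray(values)
    # Indices of every entry equal to the maximum.
    ties = np.flatnonzero(values == values.max())
    return int(rng.choice(ties))
```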

add type hinting and annotation

[BUG] Bandit `play()` uses the wrong RNG

Describe the bug
Bandit is created with an rng passed into the initializer, but the play() method ignores the stored rng and takes another one as an argument. This produces inconsistent behavior.
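A sketch of the consistent alternative: the generator is injected once in the initializer and play() reuses it. The `Bandit` class below is hypothetical and reduced to random choices for brevity.

```python
import numpy as np


class Bandit:
    """Hypothetical sketch: the Generator is injected once and reused everywhere."""

    def __init__(self, n_arms: int, rng: np.random.Generator) -> None:
        self._rng = rng
        self._Qs = np.zeros(n_arms)

    def play(self) -> int:
        # Uses the stored generator instead of accepting a second one as an argument.
        return int(self._rng.integers(len(self._Qs)))
```

Two bandits built with identically seeded generators then produce identical choice sequences.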

[BUG] empty armsets

Describe the bug
There are currently many ways to create an empty ArmSet: passing in an empty list, or calling one of the Arm.armset methods with no parameters (or with a single empty-list parameter). This causes an error when the bandit tries to choose from the empty ArmSet during a simulation run.

To Reproduce
Running the following code

import mabby as mb
bandit = mb.EpsilonGreedyBandit(eps=0.1)
armset = mb.BernoulliArm.armset()
sim = mb.Simulation(bandits=[bandit], armset=armset)
sim.run(trials=1, steps=5)

produces the following error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../mabby/mabby/simulation.py", line 27, in run
    self._run_trial(stats, bandit, steps)
  File "/.../mabby/mabby/simulation.py", line 33, in _run_trial
    choice = bandit.choose(self._rng)
  File "/.../mabby/mabby/bandits.py", line 36, in choose
    self._choice = self._choose(rng)
  File "/.../mabby/mabby/bandits.py", line 85, in _choose
    return int(np.argmax(self._Qs))
  File "<__array_function__ internals>", line 200, in argmax
  File "/.../python3.9/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
  File "/.../python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence

Expected behavior
Errors should be checked on both ends:

the bandit should not throw an error when being simulated with an empty ArmSet
an empty ArmSet should not be able to be created (for now, unless some other use case requires it)
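A sketch of the second point: reject an empty armset at construction time, so the failure is a clear `ValueError` rather than NumPy's argmax error deep inside a run. This `ArmSet` is a hypothetical simplification; factory methods such as the Arm.armset methods could delegate to it.

```python
from typing import Iterable


class ArmSet:
    """Hypothetical sketch: refuse to construct an empty armset."""

    def __init__(self, arms: Iterable) -> None:
        arms = list(arms)  # materialize so generators can be length-checked
        if not arms:
            raise ValueError("an ArmSet needs at least one arm")
        self._arms = arms

    def __len__(self) -> int:
        return len(self._arms)

    def __getitem__(self, i: int):
        return self._arms[i]
```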

custom naming for bandits

Is your feature request related to a problem? Please describe.
Currently, when simulation stats from bandits are plotted, they are labeled numerically (e.g., "bandit 0", "bandit 1"). This makes the plot difficult to read because it is unclear which bandit corresponds to which numerical label.

Describe the solution you'd like
There should be some way of naming or tagging a bandit, which will then override the default label for the bandit when its stats are plotted. For specific bandit implementations, default names can also be based on parameter values.
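A sketch of one possible naming scheme, assuming a hypothetical optional `name` argument: a custom name overrides a default derived from the parameters, and `__repr__` returns whichever applies, so plot labels can come straight from the bandits.

```python
from typing import Optional


class EpsilonGreedyBandit:
    """Hypothetical sketch of bandit naming; mabby's actual API may differ."""

    def __init__(self, eps: float, name: Optional[str] = None) -> None:
        self.eps = eps
        self._name = name

    @property
    def name(self) -> str:
        # A custom name wins; otherwise derive a default name from the parameters.
        return self._name if self._name is not None else f"eps-greedy (eps={self.eps})"

    def __repr__(self) -> str:
        return self.name


bandits = [EpsilonGreedyBandit(0.1), EpsilonGreedyBandit(0.2, name="explorer")]
# Labels to pass to a plot legend instead of "bandit 0", "bandit 1", ...
labels = [b.name for b in bandits]
```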

setup pytest-randomly

There are a few unit and integration tests that have randomness. It would be nice to set up a plugin like pytest-randomly to control random.seed and also shuffle the order of tests.

[FEATURE] add eps-first and eps-decreasing strategy

Is your feature request related to a problem? Please describe.
It would be great if the library could support other semi-uniform bandit strategies in addition to epsilon-greedy, such as epsilon-first and epsilon-decreasing.

Describe the solution you'd like
Because semi-uniform bandits largely share the same strategy (the main difference is how the effective value of epsilon is determined at each step), a SemiUniformBandit abstract class implementing the bulk of this strategy can be created. Each variant (epsilon-greedy, epsilon-first, epsilon-decreasing) can then subclass SemiUniformBandit.
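A sketch of the proposed hierarchy. All class names, the `effective_eps(t)` hook, and the parameters are assumptions for illustration: epsilon-first is parameterized by a fixed number of exploration steps, and epsilon-decreasing uses the schedule eps0 / (1 + t), which is only one of several common decay schedules.

```python
from abc import ABC, abstractmethod

import numpy as np


class SemiUniformBandit(ABC):
    """Hypothetical base: shared explore/exploit logic, variant-specific epsilon."""

    def __init__(self, n_arms: int) -> None:
        self._Qs = np.zeros(n_arms)
        self._Ns = np.zeros(n_arms, dtype=int)
        self._t = 0

    @abstractmethod
    def effective_eps(self, t: int) -> float:
        """Exploration probability at step t (0-indexed)."""

    def choose(self, rng: np.random.Generator) -> int:
        if rng.random() < self.effective_eps(self._t):
            choice = int(rng.integers(len(self._Qs)))  # explore uniformly
        else:
            choice = int(np.argmax(self._Qs))  # exploit current estimates
        self._t += 1
        return choice

    def update(self, choice: int, reward: float) -> None:
        # Incremental sample-average estimate of the chosen arm's value.
        self._Ns[choice] += 1
        self._Qs[choice] += (reward - self._Qs[choice]) / self._Ns[choice]


class EpsilonGreedyBandit(SemiUniformBandit):
    def __init__(self, n_arms: int, eps: float) -> None:
        super().__init__(n_arms)
        self.eps = eps

    def effective_eps(self, t: int) -> float:
        return self.eps


class EpsilonFirstBandit(SemiUniformBandit):
    """Explore uniformly for the first explore_steps steps, then exploit."""

    def __init__(self, n_arms: int, explore_steps: int) -> None:
        super().__init__(n_arms)
        self.explore_steps = explore_steps

    def effective_eps(self, t: int) -> float:
        return 1.0 if t < self.explore_steps else 0.0


class EpsilonDecreasingBandit(SemiUniformBandit):
    """Epsilon decays as eps0 / (1 + t)."""

    def __init__(self, n_arms: int, eps0: float = 1.0) -> None:
        super().__init__(n_arms)
        self.eps0 = eps0

    def effective_eps(self, t: int) -> float:
        return min(1.0, self.eps0 / (1 + t))
```

Each variant only overrides `effective_eps`; choosing and updating stay in the base class.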

add CI

rename "rounds" to "steps"

Currently, the word "rounds" is used to denote steps or iterations in each simulation trial. However, the singular "round" shadows Python's built-in round() and triggers linter warnings. A word like "steps" or "turns" would be better.
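A small illustration of why the shadowing matters beyond linter noise. Both functions are hypothetical: once `round` is used as a loop variable, it is a local int for the rest of the function, so a later call to the built-in fails.

```python
def mean_reward_bad(rewards: list) -> float:
    total = 0.0
    for round in range(len(rewards)):  # shadows the built-in round()
        total += rewards[round]
    return round(total / len(rewards), 2)  # TypeError: 'int' object is not callable


def mean_reward_good(rewards: list) -> float:
    total = 0.0
    for step in range(len(rewards)):  # "step" leaves round() untouched
        total += rewards[step]
    return round(total / len(rewards), 2)
```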

[FEATURE] better customizability in stats tracking

Is your feature request related to a problem? Please describe.
When running a simulation, not all of the statistics being tracked may be needed, especially if there are (eventually) a large number of statistics available for tracking.

Describe the solution you'd like
There should be an interface (perhaps in Simulation.run()) for specifying which statistics to track. If none are specified, all statistics are tracked by default.

This setup should also integrate nicely with the visualization interface to designate, for example, which statistics should be plotted. Eventually, there might also be strategy-specific statistics or user-defined statistics that can be tracked.
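A sketch of the selection logic under stated assumptions: a hypothetical `Metric` enum lists the trackable statistics, and a hypothetical `init_stats` helper (which something like Simulation.run() could call) allocates buffers only for the requested ones, falling back to all of them when none are given. A plotting layer could then iterate over the returned keys.

```python
from enum import Enum
from typing import Dict, Iterable, Optional

import numpy as np


class Metric(Enum):
    """Hypothetical catalogue of trackable statistics."""

    REWARD = "reward"
    REGRET = "regret"
    OPTIMALITY = "optimality"


def init_stats(
    trials: int, steps: int, metrics: Optional[Iterable[Metric]] = None
) -> Dict[Metric, np.ndarray]:
    """Allocate per-trial, per-step buffers only for the requested metrics."""
    selected = set(metrics or ())
    if not selected:
        # Nothing requested: track every available metric by default.
        selected = set(Metric)
    return {m: np.zeros((trials, steps)) for m in selected}
```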

[FEATURE] tracking action value estimates

It would be nice to be able to track how each strategy's action-value estimates evolve across the steps of a trial.
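A self-contained sketch of what such tracking could record, assuming an epsilon-greedy strategy on Bernoulli arms with sample-average estimates (`run_trial_with_estimates` is a hypothetical function, not mabby's simulation loop). It snapshots the estimate vector after every step into a `(steps, n_arms)` array; averaging these arrays across trials would give mean estimate curves.

```python
import numpy as np


def run_trial_with_estimates(means, steps: int, eps: float, rng: np.random.Generator) -> np.ndarray:
    """Epsilon-greedy trial on Bernoulli arms that records Q after every step."""
    means = np.asarray(means, dtype=float)
    n = len(means)
    Qs = np.zeros(n)
    Ns = np.zeros(n, dtype=int)
    history = np.empty((steps, n))
    for t in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(n))
        else:
            a = int(np.argmax(Qs))
        reward = float(rng.random() < means[a])  # Bernoulli reward
        Ns[a] += 1
        Qs[a] += (reward - Qs[a]) / Ns[a]  # incremental sample average
        history[t] = Qs  # row assignment copies the current estimates
    return history
```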

[FEATURE] support saving simulation stats/results

Is your feature request related to a problem? Please describe.
When conducting experiments with long simulation runs, it's helpful to be able to save run results to file, then pull them back up to re-plot alongside other simulation results.

Describe the solution you'd like
Possible approach: pickle the SimStats object, or convert each bandit's run stats to a pandas DataFrame and store it in Feather format.
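A sketch of the pickle route using only the standard library. The nested-dict layout `{bandit name: {statistic name: per-step array}}` is an assumption standing in for SimStats, and `save_stats`/`load_stats` are hypothetical helpers.

```python
import pickle
from pathlib import Path
from typing import Dict, Union

import numpy as np

# Assumed layout: {bandit name: {statistic name: per-step array}}.
Stats = Dict[str, Dict[str, np.ndarray]]


def save_stats(stats: Stats, path: Union[str, Path]) -> None:
    with open(path, "wb") as f:
        pickle.dump(stats, f)


def load_stats(path: Union[str, Path]) -> Stats:
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

For the Feather route, pandas' `DataFrame.to_feather` and `pandas.read_feather` would play the same role; they require the `pyarrow` package and produce files that other tools can read, unlike pickle.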