thetawom / mabby

A multi-armed bandit (MAB) simulation library in Python

Home Page: https://thetawom.github.io/mabby/

License: Apache License 2.0

Makefile 4.03% Python 95.97%
multi-armed-bandits probability python reinforcement-learning simulation agent-based-simulation artificial-intelligence epsilon-greedy thompson-sampling

mabby's Introduction

mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:

  • choose from a wide range of classic bandit algorithms to use
  • configure environments with custom arm spaces and reward distributions
  • collect and visualize simulation metrics like regret and optimality

Installation

Prerequisites: Python 3.9+ and pip

Install mabby with pip:

pip install mabby

Basic Usage

The code example below demonstrates the basic steps of running a simulation with mabby. For more in-depth examples, please see the Usage Examples section of the mabby documentation.

import mabby as mb

# configure bandit arms
bandit = mb.BernoulliArm.bandit(p=[0.3, 0.6])

# configure bandit strategy
strategy = mb.strategies.EpsilonGreedyStrategy(eps=0.2)

# set up simulation
simulation = mb.Simulation(bandit=bandit, strategies=[strategy])

# run simulation
stats = simulation.run(trials=100, steps=300)

# plot regret statistics
stats.plot_regret()

Contributing

Please see CONTRIBUTING for more information.

License

This software is licensed under the Apache 2.0 license. Please see LICENSE for more information.

mabby's People

Contributors

dag2226, dependabot[bot], thetawom

Forkers

dag2226

mabby's Issues

[FEATURE] support non-stationary rewards

Is your feature request related to a problem? Please describe.
Currently, the reward distribution of each arm is fixed, so the library can't simulate non-stationary (restless) bandit problems.

Describe the solution you'd like
A custom Arm should be able to override an update() method that updates the parameters of its reward distribution. At each step, the ArmSet should then call update() on every arm to advance its reward distribution.
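
As a rough illustration of the proposal, the sketch below shows one way an update() hook could work. The class names, the random-walk drift, and the method signatures are assumptions made for this example and are not mabby's actual API.

```python
import random


class DriftingBernoulliArm:
    """Hypothetical arm whose success probability follows a bounded random walk."""

    def __init__(self, p: float, drift: float = 0.01) -> None:
        self.p = p
        self.drift = drift

    def play(self, rng: random.Random) -> int:
        # Draw a 0/1 reward from the current distribution.
        return int(rng.random() < self.p)

    def update(self, rng: random.Random) -> None:
        # Advance the distribution: nudge p, then clip it to [0, 1].
        step = rng.uniform(-self.drift, self.drift)
        self.p = min(1.0, max(0.0, self.p + step))


class ArmSet:
    """Hypothetical container that advances every arm once per step."""

    def __init__(self, arms) -> None:
        self.arms = list(arms)

    def play(self, i: int, rng: random.Random) -> int:
        return self.arms[i].play(rng)

    def update(self, rng: random.Random) -> None:
        for arm in self.arms:
            arm.update(rng)


rng = random.Random(0)
armset = ArmSet([DriftingBernoulliArm(0.3), DriftingBernoulliArm(0.6)])
for _ in range(100):
    reward = armset.play(0, rng)  # play one arm ...
    armset.update(rng)            # ... then let every arm drift
```

A stationary arm would simply inherit a no-op update(), so existing arms would not need to change under this design.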

mabby/__init__.py uses relative imports

From PEP 8:

Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 [7] is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable.

track rewards and regret statistics in simulation

Is your feature request related to a problem? Please describe.
To be able to evaluate or compare bandit algorithms, it's important to have statistics on the rewards and regret of the bandit.

Describe the solution you'd like
Simulation.run() should produce or return a collection of statistics such as:

  • mean reward for the bandit on round i across trials
  • mean cumulative regret for the bandit on round i across trials
  • mean percentage of best arm played for the bandit on round i across trials
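
The three statistics above could be computed along these lines. This is a minimal sketch, assuming per-trial reward and choice histories are stored as nested lists and that regret is measured against the true arm means; the function name and data layout are hypothetical.

```python
def summarize(rewards, choices, arm_means):
    """Aggregate per-round statistics across trials.

    rewards[t][i] and choices[t][i] hold the reward and the chosen arm
    for trial t on round i. arm_means[a] is the true mean reward of arm a.
    """
    n_trials = len(rewards)
    n_rounds = len(rewards[0])
    best_mean = max(arm_means)
    best_arm = arm_means.index(best_mean)

    mean_reward, mean_cum_regret, pct_best = [], [], []
    cum_regret = [0.0] * n_trials  # running regret of each trial
    for i in range(n_rounds):
        mean_reward.append(sum(rewards[t][i] for t in range(n_trials)) / n_trials)
        for t in range(n_trials):
            # Regret of a step: best arm's mean minus the chosen arm's mean.
            cum_regret[t] += best_mean - arm_means[choices[t][i]]
        mean_cum_regret.append(sum(cum_regret) / n_trials)
        hits = sum(choices[t][i] == best_arm for t in range(n_trials))
        pct_best.append(100.0 * hits / n_trials)
    return mean_reward, mean_cum_regret, pct_best


# Two trials of two rounds each on arms with means 0.3 and 0.6.
mean_reward, mean_cum_regret, pct_best = summarize(
    rewards=[[0, 1], [1, 0]],
    choices=[[0, 1], [1, 1]],
    arm_means=[0.3, 0.6],
)
```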

[BUG] arms allow invalid parameters

BernoulliArm and GaussianArm both allow invalid parameters, e.g., non-probability p or negative scale. Initializers should check parameters and throw a ValueError if invalid.
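
The check could look roughly like the sketch below. These classes are simplified stand-ins written for this example, not mabby's implementation, and the parameter names loc and scale for the Gaussian arm are assumptions.

```python
class BernoulliArm:
    """Stand-in arm that rejects a non-probability p."""

    def __init__(self, p: float) -> None:
        # The chained comparison is False for NaN, so NaN is rejected too.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p must be in [0, 1], got {p}")
        self.p = p


class GaussianArm:
    """Stand-in arm that rejects a negative scale."""

    def __init__(self, loc: float, scale: float) -> None:
        # "not scale >= 0" also catches NaN, unlike "scale < 0".
        if not scale >= 0:
            raise ValueError(f"scale must be non-negative, got {scale}")
        self.loc = loc
        self.scale = scale
```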

add implementation for random bandit

Is your feature request related to a problem? Please describe.
Sometimes it would also be helpful to have an implementation for a random bandit that can serve as a baseline for comparison. The random bandit would just choose arms at random.

Describe the solution you'd like
Have a RandomBandit class, which can just be a special case of EpsilonGreedyBandit with eps fixed at 1.

Describe alternatives you've considered
We could just use an epsilon greedy bandit with epsilon of 1. However, (1) it would be nice if the random bandit could have a special default name, and (2) as other semi-uniform bandit algorithms are implemented there might be multiple ways of getting a "random bandit" through parameter settings so it would be nice to have a consistent one to use.
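
A minimal sketch of the idea, assuming a simplified EpsilonGreedyBandit that receives its value estimates as a list; the constructor signatures and the default names are invented for this example.

```python
import random


class EpsilonGreedyBandit:
    """Simplified stand-in: explore with probability eps, otherwise exploit."""

    def __init__(self, eps: float, name=None) -> None:
        self.eps = eps
        self.name = name or f"epsilon-greedy (eps={eps})"

    def choose(self, q_values, rng: random.Random) -> int:
        if rng.random() < self.eps:
            return rng.randrange(len(q_values))  # explore uniformly
        # Exploit: index of the largest estimate (first one on ties).
        return max(range(len(q_values)), key=q_values.__getitem__)


class RandomBandit(EpsilonGreedyBandit):
    """Baseline that always explores: epsilon-greedy with eps fixed at 1."""

    def __init__(self, name=None) -> None:
        super().__init__(eps=1.0, name=name or "random")


rng = random.Random(0)
baseline = RandomBandit()
choices = [baseline.choose([0.0, 1.0], rng) for _ in range(50)]
```

Because rng.random() returns values in [0, 1), the exploration branch is always taken when eps is 1, so the value estimates are never consulted.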

[BUG] np.argmax returns first argmax

When bandit algorithms take the argmax over an array to choose an arm and several entries tie for the maximum, np.argmax always returns the first of them. One of the tied indices should be returned at random instead.
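
One possible fix is sketched below with NumPy. The helper name random_argmax is hypothetical; the calls it relies on (np.flatnonzero and Generator.choice) are standard NumPy.

```python
import numpy as np


def random_argmax(values, rng: np.random.Generator) -> int:
    """Return the index of a maximum, breaking ties uniformly at random."""
    values = np.asarray(values)
    # Indices of every entry equal to the maximum.
    candidates = np.flatnonzero(values == values.max())
    return int(rng.choice(candidates))


rng = np.random.default_rng(0)
picks = [random_argmax([5.0, 1.0, 5.0], rng) for _ in range(200)]
```

Like np.argmax, this sketch raises a ValueError on an empty array, because values.max() is undefined there.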

[BUG] Bandit `play()` uses the wrong RNG

Describe the bug
Bandit is created with an rng passed into the initializer, but the play() method ignores the set rng and takes another one as an argument. This produces inconsistent behavior.
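
The consistent behavior would be for play() to draw from the generator stored at construction. A minimal sketch follows, assuming a stand-in Bandit that picks arms uniformly; it is not mabby's class.

```python
import random


class Bandit:
    """Stand-in bandit that keeps and reuses the RNG it was created with."""

    def __init__(self, n_arms: int, rng: random.Random) -> None:
        self._n_arms = n_arms
        self._rng = rng

    def play(self) -> int:
        # No rng argument: every draw comes from the stored generator,
        # so a seeded bandit is fully reproducible.
        return self._rng.randrange(self._n_arms)


a = Bandit(3, random.Random(42))
b = Bandit(3, random.Random(42))
plays_a = [a.play() for _ in range(20)]
plays_b = [b.play() for _ in range(20)]
```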

[BUG] empty armsets

Describe the bug
There are currently many ways to create an empty ArmSet: passing in an empty list, or calling one of the Arm.armset methods with no parameters (or with one parameter that is an empty list). This causes an error when the bandit tries to choose from the empty ArmSet during a simulation run.

To Reproduce
Running the following code

import mabby as mb
bandit = mb.EpsilonGreedyBandit(eps=0.1)
armset = mb.BernoulliArm.armset()
sim = mb.Simulation(bandits=[bandit], armset=armset)
sim.run(trials=1, steps=5)

produces the following error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../mabby/mabby/simulation.py", line 27, in run
    self._run_trial(stats, bandit, steps)
  File "/.../mabby/mabby/simulation.py", line 33, in _run_trial
    choice = bandit.choose(self._rng)
  File "/.../mabby/mabby/bandits.py", line 36, in choose
    self._choice = self._choose(rng)
  File "/.../mabby/mabby/bandits.py", line 85, in _choose
    return int(np.argmax(self._Qs))
  File "<__array_function__ internals>", line 200, in argmax
  File "/.../python3.9/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
  File "/.../python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence

Expected behavior
There should be checking for errors on both ends:

  • the bandit should not throw an error when being simulated with an empty ArmSet
  • an empty ArmSet should not be able to be created (for now, unless some other use case requires it)
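
The second point could be enforced in the constructor, as in the sketch below. This ArmSet is a simplified stand-in written for illustration only.

```python
class ArmSet:
    """Stand-in container that refuses to be created without arms."""

    def __init__(self, arms) -> None:
        arms = list(arms)
        if not arms:
            raise ValueError("ArmSet must contain at least one arm")
        self._arms = arms

    def __len__(self) -> int:
        return len(self._arms)
```

Failing at construction time surfaces the mistake where it is made, instead of later inside np.argmax as in the traceback above.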

custom naming for bandits

Is your feature request related to a problem? Please describe.
Currently, when simulation stats from bandits are plotted, they are labeled numerically (e.g., "bandit 0", "bandit 1"). This makes the plot difficult to read because it is unclear which bandit corresponds to which numerical label.

Describe the solution you'd like
There should be some way of naming or tagging a bandit so that the name overrides the default label when the bandit's stats are plotted. For specific bandit implementations, default names can also be based on parameter values.

setup pytest-randomly

There are a few unit and integration tests that have randomness. It would be nice to set up a plugin like pytest-randomly to control random.seed and also shuffle the order of tests.

[FEATURE] add eps-first and eps-decreasing strategy

Is your feature request related to a problem? Please describe.
It would be great if the library could support other semi-uniform bandit strategies in addition to epsilon-greedy, such as epsilon-first and epsilon-decreasing.

Describe the solution you'd like
Because semi-uniform bandits largely share the same strategy (the main difference is how the effective value of epsilon is determined at each step), a SemiUniformBandit abstract class can implement the bulk of this strategy. Each variant (epsilon-greedy, epsilon-first, epsilon-decreasing) can then subclass SemiUniformBandit.
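
The proposed hierarchy might be sketched as follows. All class names, constructor parameters, and the geometric decay schedule are assumptions made for this example; other decay schedules are equally valid.

```python
import random
from abc import ABC, abstractmethod


class SemiUniformBandit(ABC):
    """Hypothetical base: explore uniformly with probability eps, else exploit."""

    def __init__(self, n_arms: int) -> None:
        self.q = [0.0] * n_arms     # running mean reward per arm
        self.counts = [0] * n_arms  # number of plays per arm
        self.t = 0                  # steps completed so far

    @abstractmethod
    def effective_eps(self) -> float:
        """Exploration probability for the current step."""

    def choose(self, rng: random.Random) -> int:
        if rng.random() < self.effective_eps():
            return rng.randrange(len(self.q))
        # Exploit: largest estimate (ties go to the lowest index here).
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.counts[arm]


class EpsilonGreedy(SemiUniformBandit):
    def __init__(self, n_arms: int, eps: float) -> None:
        super().__init__(n_arms)
        self.eps = eps

    def effective_eps(self) -> float:
        return self.eps  # constant exploration rate


class EpsilonFirst(SemiUniformBandit):
    def __init__(self, n_arms: int, eps: float, steps: int) -> None:
        super().__init__(n_arms)
        self.explore_steps = int(eps * steps)

    def effective_eps(self) -> float:
        # Pure exploration for the first eps * steps steps, then pure exploitation.
        return 1.0 if self.t < self.explore_steps else 0.0


class EpsilonDecreasing(SemiUniformBandit):
    def __init__(self, n_arms: int, eps: float, decay: float) -> None:
        super().__init__(n_arms)
        self.eps = eps
        self.decay = decay

    def effective_eps(self) -> float:
        # Assumed schedule: geometric decay of the initial eps.
        return self.eps * self.decay**self.t
```

Only effective_eps() differs between the variants, which is the point of the shared base class.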

rename "rounds" to "steps"

Currently, the word "rounds" is used to denote steps or iterations in each simulation trial. However, the singular "round" shadows Python's built-in round() and introduces warnings. Using a word like "steps" or "turns" would be better.

[FEATURE] better customizability in stats tracking

Is your feature request related to a problem? Please describe.
When running a simulation, not all of the statistics being tracked may be needed, especially if there are (eventually) a large number of statistics available for tracking.

Describe the solution you'd like
There should be an interface (maybe in Simulation.run()) where the statistics to be tracked are specified. If left empty, then all statistics are tracked by default.
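
One way to express the "empty means everything" default is sketched below. The Metric enum and resolve_metrics helper are hypothetical names chosen for this example.

```python
from enum import Enum


class Metric(Enum):
    """Hypothetical set of trackable statistics."""

    REWARDS = "rewards"
    REGRET = "regret"
    OPTIMALITY = "optimality"


def resolve_metrics(metrics=None):
    """Return the metrics to track; None or an empty list means all of them."""
    if not metrics:
        return list(Metric)
    # Drop duplicates while keeping the caller's order.
    return list(dict.fromkeys(metrics))


tracked = resolve_metrics([Metric.REGRET, Metric.REGRET])
```

A run() method could pass its metrics argument through such a helper, and the plotting code could reject requests for metrics that were not tracked.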

This setup should also integrate nicely with the visualization interface to designate, for example, which statistics should be plotted. Eventually, there might also be strategy-specific statistics or user-defined statistics that can be tracked.

[FEATURE] support saving simulation stats/results

Is your feature request related to a problem? Please describe.
When conducting experiments with long simulation runs, it's helpful to be able to save run results to file, then pull them back up to re-plot alongside other simulation results.

Describe the solution you'd like
Possible approach: pickle SimStats, or convert each bandit's run stats to a pandas DataFrame and store it in Feather format.
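
The pickle route can be sketched with the standard library alone. In this example a plain dictionary stands in for the stats object, and the save_stats and load_stats helpers are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_stats(stats, path) -> None:
    """Serialize run statistics to a file with pickle."""
    with open(path, "wb") as f:
        pickle.dump(stats, f)


def load_stats(path):
    """Load statistics previously written by save_stats."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict of per-bandit results stands in for the real stats object.
stats = {"eps-greedy": {"regret": [0.3, 0.3, 0.6]}}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stats.pkl"
    save_stats(stats, path)
    restored = load_stats(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. A tabular format such as Feather avoids that risk at the cost of extra dependencies.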
