GithubHelp home page GithubHelp logo

flynnowen / multi-armed-bandits Goto Github PK

View Code? Open in Web Editor NEW
6.0 2.0 0.0 1.52 MB

Multi-Armed Bandit method of accurately estimating the largest parameter out of a set of candidates.

Python 98.18% Just 1.82%
machine multi-armed-bandit multi-armed-bandits python reinforcement-learning

multi-armed-bandits's Introduction

Multi Armed Bandits (MAB).

A study consists of a collection of bandits, $B = \{b_1, ..., b_n\}$, where a bandit, $b_i$ has an associated parameter $\theta_i$ (or two parameters) that are within the exponential family of distributions. Thus, we have a collection $\theta = \{\theta_1, ..., \theta_n \}$ of parameters (for the sake of two parameter distribution (e.g Gaussian), we refer to the primary parameter ($\mu$) as this single parameter).

When 'pulling' a bandit, we are generating a random variable from this bandit's distribution. For example, if generating a random variable from bandit $i$ that follows a Poisson distribution, we denote this $X_i \thicksim Pois(\theta_i)$. 'Payout' or 'reward' is what we call this generated random variable, $x_i$, and is what we wish to maximize. In the case of gambling machines, it could be a 'win' (Bernoulli), or some form of 'monetary payout' (Gaussian).

We therefore wish to find the optimal bandit to exploit; or in other words the bandit with the highest parameter: $\tilde{\theta} = max(\{\theta_1, ..., \theta_n\})$.

There is a second constraint however, that is doing this using the fewest generations (bandit pulls) as possible. Reasons for this include being able to maximize cumulative reward, given a fixed number of generations (e.g if there is a time constraint), or generating may have another associated cost (e.g paying for each armed bandit pull).

The problem can therefore be encapsulated as: Given a fixed number of generations, $g \in \{1, ..., G\}$, maximizing cumulative reward: $\sum{x_g}$. This can be done by a number of different strategies - and all include some relation to balancing exploration (trying to accurately estimate each parameter $\{\hat{\theta}_1, ..., \hat{\theta}_n \}$), with exploration (the cumulative reward gained by just generating from the optimal bandit $\tilde{\theta}$).

Usage

Pre-requisites

To run the below commands, it's suggested that Just is installed.

Running Simulations

Options exist for simulating both directly using the command line, or via passing a configuration file in the form of JSON. The second is recommended for reproducable simulation studies.

To see all possible simulation commands run

just help

or help for a specific command:

just {{COMMAND}} help

e.g:

just list-distributions help

To view possible distributions, and information about their associated parameters, run:

just list-distributions

Simulating directly

To run a simulation by passing arguments directly, run:

just {{COMMAND}} {{*ARGS}}

Where COMMAND is a command returned from just help, and args are those required by a specific command upon running just {{COMMAND}} help.

Simulating from config (json) file

There is the option to perform a MAB simulation, reading arguments and configuration from a config file. To perform this simulation, run:

just simulate-from-json {{COMMAND}} {{CONFIG}}

Testing

To run all unit and integration tests in the repository, execute:

just local-test

Sample Outputs

Metrics from the simulations are output to stdout.

Plots are also optionally produced, detailing the the performance of the simulation.

multi-armed-bandits's People

Contributors

flynnowen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

multi-armed-bandits's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.