Bandit Learning Algorithms

Overview

This repository contains the code and report for an assignment on bandit learning algorithms. The goal of this assignment is to experiment with various bandit learning algorithms, implement them from scratch, and compare their performance. The comparison is based on two metrics: (1) the average accumulated reward and (2) the proportion of time the optimal action is taken.

The assignment consists of two parts:

Part 1: Stationary Reward Distributions
Part 2: Non-Stationary Reward Distributions

Repository Structure

.ipynb_checkpoints/: Contains Jupyter notebook checkpoints.
rl-a1.pdf: Project description.
Part_1_RL_Project1.ipynb: Jupyter notebook for Part 1: Stationary Reward Distributions.
Part2-Drift_change.ipynb: Jupyter notebook for the drift change experiment in Part 2.
Part2_AbruptChange.ipynb: Jupyter notebook for the abrupt change experiment in Part 2.
Part2_mean_reverting_change.ipynb: Jupyter notebook for the mean-reverting change experiment in Part 2.

Requirements

To replicate the results in this repository, you will need the following:

Python 3.x
NumPy
Matplotlib
Seaborn
Jupyter Notebook

Instructions

Part 1: Stationary Reward Distributions

For each method, we repeated the experiment on 1000 different bandit problems (1000 sets of ten mean parameters), each run for 10,000 steps.

  1. Greedy with Non-Optimistic Initial Values: Initialized the action-value estimates to 0 and used the incremental implementation of the simple average method (see the first sketch after this list).

  2. Epsilon-Greedy with Different Choices of Epsilon: Experimented with different values of epsilon, using pilot runs to choose the best one (the sketch after this list takes epsilon as a parameter).

  3. Optimistic Starting Values with a Greedy Approach: Assumed knowledge of the means of each reward distribution to set optimistic initial values.

  4. Gradient Bandit Algorithm: Experimented with different learning rates (α) to determine the best one and tracked the average reward acquired by the algorithm at each time step (see the second sketch after this list).
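
A minimal sketch of how items 1 and 2 might be implemented (function and variable names are our illustration, not necessarily the notebooks' code), assuming, as in the standard 10-armed testbed, rewards drawn from N(mean, 1):

```python
import numpy as np

def run_epsilon_greedy(means, steps=10_000, epsilon=0.1, q_init=0.0, rng=None):
    """One run of (epsilon-)greedy with incremental sample averages.

    means  : true mean reward of each arm (rewards ~ N(mean, 1)).
    epsilon: exploration probability; 0 gives the plain greedy method.
    q_init : initial action-value estimate (0 here; optimistic in item 3).
    Returns the reward received and an optimal-action flag per step.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(means)
    q = np.full(k, q_init, dtype=float)  # action-value estimates
    n = np.zeros(k)                      # pull counts per arm
    optimal = np.argmax(means)
    rewards = np.empty(steps)
    took_optimal = np.empty(steps, dtype=bool)
    for t in range(steps):
        if rng.random() < epsilon:
            a = rng.integers(k)          # explore uniformly at random
        else:
            a = np.argmax(q)             # exploit current estimates
        r = rng.normal(means[a], 1.0)    # sample a reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]        # incremental simple-average update
        rewards[t], took_optimal[t] = r, (a == optimal)
    return rewards, took_optimal
```

Averaging `rewards` and `took_optimal` over 1000 independently drawn sets of means gives the two comparison metrics from the Overview; epsilon = 0 with q_init = 0 recovers method 1, and a large q_init gives the optimistic variant of method 3.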

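For the gradient bandit algorithm (item 4), a sketch of the softmax-preference update with the running average reward as baseline; again the names are illustrative assumptions, not the notebooks' code:

```python
import numpy as np

def run_gradient_bandit(means, steps=10_000, alpha=0.1, rng=None):
    """One run of the gradient bandit algorithm with a baseline.

    Preferences h are turned into action probabilities by a softmax;
    after each reward, the chosen action's preference moves by
    alpha * (r - baseline), and the others move the opposite way.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(means)
    h = np.zeros(k)                      # action preferences
    baseline = 0.0                       # running average reward
    rewards = np.empty(steps)
    for t in range(steps):
        pi = np.exp(h - h.max())
        pi /= pi.sum()                   # softmax action probabilities
        a = rng.choice(k, p=pi)
        r = rng.normal(means[a], 1.0)
        baseline += (r - baseline) / (t + 1)
        # preference update: raise the chosen action, lower the others
        h += alpha * (r - baseline) * (np.eye(k)[a] - pi)
        rewards[t] = r
    return rewards
```
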
Part 2: Non-Stationary Reward Distributions

  1. Gradual Changes: Applied a drift change, µ_t = µ_{t−1} + ϵ_t, where ϵ_t ~ N(0, 0.001²).
  2. Mean-Reverting Changes: Applied µ_t = κµ_{t−1} + ϵ_t, where κ = 0.5 and ϵ_t ~ N(0, 0.01²).
  3. Abrupt Changes: At each time step, with probability 0.005, permuted the means of the reward distributions. (A sketch of all three update rules appears after this list.)
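
A minimal sketch of the three mean-update rules, each applied once per time step to the vector of arm means (illustrative only; function names are our own):

```python
import numpy as np

rng = np.random.default_rng()

def drift_change(mu):
    # gradual drift: mu_t = mu_{t-1} + eps_t, eps_t ~ N(0, 0.001^2)
    return mu + rng.normal(0.0, 0.001, size=mu.shape)

def mean_reverting_change(mu, kappa=0.5):
    # mean-reverting: mu_t = kappa * mu_{t-1} + eps_t, eps_t ~ N(0, 0.01^2)
    return kappa * mu + rng.normal(0.0, 0.01, size=mu.shape)

def abrupt_change(mu, p=0.005):
    # with probability p, permute the arms' means; otherwise leave them alone
    return rng.permutation(mu) if rng.random() < p else mu
```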

Evaluation

We compared the optimistic greedy method, ϵ-greedy with a fixed step size, and ϵ-greedy with a decreasing step size. Each algorithm was run on 1000 repetitions of the non-stationary problem, and the distribution of the average reward attained at the terminal step is reported with box plots.
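
A minimal sketch of the reporting step, assuming the per-repetition terminal average rewards have already been collected into a dict of arrays (`terminal_rewards` is a hypothetical name; the fixed-step ϵ-greedy variant simply replaces the 1/N(a) step size in the incremental update with a constant α):

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_terminal_rewards(terminal_rewards):
    # terminal_rewards: dict mapping method name -> array of 1000 values,
    # each the average reward attained at the terminal step of one repetition
    sns.boxplot(data=list(terminal_rewards.values()))
    plt.xticks(range(len(terminal_rewards)), list(terminal_rewards.keys()))
    plt.ylabel("Average reward at terminal step")
    plt.title("1000 repetitions of the non-stationary problem")
    plt.show()
```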

Authors

This project was completed by Himani Thakkar and Parinaz Shiri. If you have any questions or need further assistance, please feel free to contact us.
