Orchestrated Value Mapping

This repository hosts the code release for the paper "Orchestrated Value Mapping for Reinforcement Learning", published at ICLR 2022. This work was done by Mehdi Fatemi (Microsoft Research) and Arash Tavakoli (Max Planck Institute for Intelligent Systems).

We release a flexible framework, built upon Dopamine (Castro et al., 2018), for building and orchestrating various mappings over different reward decomposition schemes. This enables the research community to easily explore the design space that our theory opens up and investigate new convergent families of algorithms.

The code has been developed by Arash Tavakoli.

Citing

If you make use of our work, please cite our paper:

@inproceedings{Fatemi2022Orchestrated,
  title={Orchestrated Value Mapping for Reinforcement Learning},
  author={Mehdi Fatemi and Arash Tavakoli},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=c87d0TS4yX}
}

Getting started

We install the required packages within a virtual environment.

Virtual environment

Create a virtual environment using conda via:

conda create --name maprl-env python=3.8
conda activate maprl-env

Prerequisites

Atari benchmark. To set up the Atari suite, please follow the steps outlined here.
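As a quick sanity check that the Atari environments are available (this assumes the classic gym Atari setup that Dopamine 3.x builds on), you can try:

python -c "import gym; gym.make('PongNoFrameskip-v4')"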

Install Dopamine. Install a compatible version of Dopamine with pip:

pip install dopamine-rl==3.1.10
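You can verify the installation with a quick import check (dopamine.discrete_domains is where Dopamine's training machinery lives):

python -c "import dopamine.discrete_domains.run_experiment"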

Installing from source

To experiment within our framework, install it from source so that you can modify the code directly:

git clone https://github.com/microsoft/orchestrated-value-mapping.git
cd orchestrated-value-mapping
pip install -e .
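If the editable install succeeded, the package should now be importable from anywhere (assuming the map_rl layout implied by the training commands below):

python -c "import map_rl"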

Training an agent

Change directory to the workspace directory:

cd map_rl

To train a LogDQN agent, similar to that introduced by van Seijen, Fatemi & Tavakoli (2019), run the following command:

python -um map_rl.train \
  --base_dir=/tmp/log_dqn \
  --gin_files='configs/map_dqn.gin' \
  --gin_bindings='MapDQNAgent.map_func_id="[log,log]"' \
  --gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
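The trailing & runs training in the background. Following Dopamine's conventions, logs and checkpoints are written under the directory passed via --base_dir (here, /tmp/log_dqn).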

Here, polar refers to the two-channel reward decomposition scheme described in Equation 13 of Fatemi & Tavakoli (2022), and [log,log] applies a logarithmic mapping to each of the two reward channels.
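For intuition, here is a minimal Python sketch of these two ingredients; the logarithmic mapping follows the general form used by van Seijen et al. (2019), but the exact parameterization in map_dqn_agent.py may differ, and the hyperparameters c and d below are purely illustrative:

import numpy as np

# Polar reward decomposition: split a scalar reward r into two
# non-negative channels such that r = r_pos - r_neg.
def polar_decompose(r):
    return max(r, 0.0), max(-r, 0.0)

# Illustrative logarithmic mapping f(x) = c * (ln(x + d) - ln(d));
# it is strictly increasing with f(0) = 0.
def log_map(x, c=0.5, d=0.02):
    return c * (np.log(x + d) - np.log(d))

# Its inverse, f^{-1}(y) = exp(y / c + ln(d)) - d, maps values back
# to the original reward space.
def log_map_inv(y, c=0.5, d=0.02):
    return np.exp(y / c + np.log(d)) - d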

Train a LogLinDQN agent, similar to that described by Fatemi & Tavakoli (2022), using:

python -um map_rl.train \
  --base_dir=/tmp/loglin_dqn \
  --gin_files='configs/map_dqn.gin' \
  --gin_bindings='MapDQNAgent.map_func_id="[loglin,loglin]"' \
  --gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
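Here, loglin denotes the log-linear mapping of Fatemi & Tavakoli (2022), which combines a logarithmic and a linear mapping within each of the two reward channels.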

Creating custom agents

To instantiate a custom agent, simply set a mapping function for each channel and choose a reward decomposition scheme. For instance, the following setting

MapDQNAgent.map_func_id="[log,identity]"
MapDQNAgent.rew_decomp_id="polar"

results in a logarithmic mapping for the positive-reward channel and the identity mapping (same as in DQN) for the negative-reward channel.
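For example, this can be passed on the command line in the same way as before (the base_dir path is only illustrative):

python -um map_rl.train \
  --base_dir=/tmp/log_identity_dqn \
  --gin_files='configs/map_dqn.gin' \
  --gin_bindings='MapDQNAgent.map_func_id="[log,identity]"' \
  --gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &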

To use more complex reward decomposition schemes, such as Configurations 1 and 2 from Fatemi & Tavakoli (2022), specify them as follows:

MapDQNAgent.map_func_id="[identity,identity,log,log,loglin,loglin]"
MapDQNAgent.rew_decomp_id="config_1"
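The number of entries in map_func_id should match the number of reward channels induced by the chosen decomposition scheme (six in this example, versus two for polar).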

To instantiate an ensemble of two learners, each using a polar reward decomposition, use the following syntax:

MapDQNAgent.map_func_id="[loglin,loglin,log,log]"
MapDQNAgent.rew_decomp_id="two_ensemble_polar"
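Reading the mapping list in order, this example effectively ensembles a LogLinDQN-style learner (the first pair of channels) with a LogDQN-style learner (the second pair).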

Custom mappings and reward decomposition schemes

To implement custom mapping functions and reward decomposition schemes, we suggest drawing on insights from Fatemi & Tavakoli (2022) and following the format of the existing methods in map_dqn_agent.py.
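As a purely hypothetical illustration (not part of the released code), a custom mapping would typically come as a forward function together with its inverse, with the forward map strictly increasing so that the inverse is well defined:

import numpy as np

# Hypothetical custom mapping: a signed square-root and its inverse.
# The actual interface and registration mechanism are defined in
# map_dqn_agent.py and may differ from this sketch.
def sqrt_map(x):
    return np.sign(x) * np.sqrt(np.abs(x))

def sqrt_map_inv(y):
    return np.sign(y) * np.square(y)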
