Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

This repository contains the implementation for our paper Distributional Reinforcement Learning for Multi-Dimensional Reward Functions, published at NeurIPS 2021.

Code

The implementation of three maze environments is in dopamine/environment/maze.py, and the implementation of MD3QN is in dopamine/agents/mmdqn_nd/mmdqn_nd_agent.py.

Dependencies

Run

conda env create -f environment.yml

in this repository to install all dependencies for this project.

Run experiments

Run python ./script/batch_general.py to launch experiments for both the policy evaluation setting and the control setting.

batch_general.py interactively prompts for the experiment configurations and then acts on them, depending on how it is invoked:

  • python ./script/batch_general.py runs all experiments for the entered configurations.
  • python ./script/batch_general.py export writes the experiment commands to ./scripts-export without running them.
  • python ./script/batch_general.py test prints the experiment commands without running them.
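The three invocation modes can also be wrapped in a small helper. The following is only a sketch: build_command and launch are hypothetical names that are not part of this repository, and the code assumes it is run from the repository root. Because batch_general.py is interactive, the child process inherits the terminal so that its prompts can still be answered.

```python
import subprocess
import sys

SCRIPT = "./script/batch_general.py"
MODES = (None, "export", "test")  # None = run the experiments directly


def build_command(mode=None):
    """Return the argument list for batch_general.py (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [sys.executable, SCRIPT]
    if mode is not None:
        cmd.append(mode)
    return cmd


def launch(mode=None):
    """Start batch_general.py; stdin/stdout are inherited for the prompts."""
    return subprocess.run(build_command(mode), check=True)


# Example (not executed here):
#   launch("test")    # print the experiment commands without running them
#   launch("export")  # write the commands to ./scripts-export
```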

Modeled Joint distribution in policy evaluation on Maze environments

To reproduce Figure 2 in our paper (the modeled joint distribution in the policy evaluation setting on Maze), run batch_general.py with the following configuration:

  • clip_reward: False
  • icml_setting: False
  • game_name_file: maze.txt
  • alg_name: MMDQNND
  • seed: 0
  • network_type: v21
  • bandwidth_type: v3
  • kscale_type: v11
  • use_priority: False
  • evaluation_setting: True
  • eval_policy_path: use default value (press Enter)
  • iterations: 20
  • exp_name: use default value (press Enter)
  • resume: False
  • gpu_func: use default value (press Enter)

The results will be saved in dopamine_runs/MultiRewardMaze-v0_MMDQNND_seed-0_network-v21_bw-v3_kscale-v11_eval/evaluation_plots/iter-{iteration}/episode-{episode}.pickle

You can use the script plot-maze.py to plot the results in Figure 2.
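To inspect a single saved result outside of plot-maze.py, a sketch like the following may help. It only rebuilds the path pattern given above and unpickles the file; result_path and load_result are hypothetical helpers, and it is an assumption that {iteration} and {episode} are plain integers. The structure of the pickled object is not documented here, so the sketch returns it unchanged.

```python
import pickle
from pathlib import Path

# Run directory named in this README for the Figure 2 configuration.
RUN_DIR = Path("dopamine_runs") / (
    "MultiRewardMaze-v0_MMDQNND_seed-0_network-v21_bw-v3_kscale-v11_eval"
)


def result_path(iteration, episode, run_dir=RUN_DIR):
    """Build the path of one saved evaluation result (hypothetical helper)."""
    return (
        Path(run_dir)
        / "evaluation_plots"
        / f"iter-{iteration}"
        / f"episode-{episode}.pickle"
    )


def load_result(iteration, episode, run_dir=RUN_DIR):
    """Unpickle one result; only load files produced by your own runs."""
    with open(result_path(iteration, episode, run_dir), "rb") as f:
        return pickle.load(f)


# Example (requires a finished run):
#   data = load_result(iteration=19, episode=0)
#   print(type(data))
```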

Control setting on Atari games

To generate the data needed for Figure 3, run python ./script/batch_general.py with the following configurations, which launch the policy optimization experiments on Atari games:

  • reward_clipping: True
  • icml_setting: True
  • game_name_file: default value (press Enter)
  • alg_name: DQN or HRA or MMDQN or MMDQNND
  • seed: 0 or 1 or 2
  • if running MD3QN (alg_name MMDQNND):
    • network_type: v21
    • bandwidth_type: v3
    • kscale_type: v11
    • use_priority: False
    • evaluation_setting: False
  • iterations: 200
  • exp_name: use default value (press Enter)
  • resume: False
  • gpu_func: use default value (press Enter)

The results will be saved under dopamine_runs/, for example in {game_name}_MMDQNND_icml_seed-{seed}_network-v21_bw-v3_kscale-v11 for MD3QN and in {game_name}_HRA_icml_seed-2 for HRA with seed 2.

We provide plot-atari.py to generate Figure 3 in our paper. Copy all the results for HRA, MMDQN and MD3QN from dopamine_runs/ into the data/ folder, then run python plot-atari.py.
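The copy step can be scripted. The sketch below rests on two assumptions that should be checked against plot-atari.py: that every Atari run directory contains "_{alg_name}_icml_" in its name (as in the two examples above), and that plot-atari.py expects the run directories directly under data/. copy_runs is a hypothetical helper, and copytree with dirs_exist_ok requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_runs(src="dopamine_runs", dst="data",
              algorithms=("HRA", "MMDQN", "MMDQNND")):
    """Copy Atari run directories of the given algorithms from src to dst.

    Returns the names of the copied directories in sorted order.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for run in sorted(src.iterdir()):
        # Assumed naming scheme: {game_name}_{alg_name}_icml_seed-{seed}...
        matches = any(f"_{alg}_icml_" in run.name for alg in algorithms)
        if run.is_dir() and matches:
            shutil.copytree(run, dst / run.name, dirs_exist_ok=True)
            copied.append(run.name)
    return copied


# Example (run from the repository root, then run plot-atari.py):
#   print(copy_runs())
```

Matching on "_{alg_name}_icml_" keeps MMDQN and MMDQNND runs distinct and skips the Maze runs, whose directory names carry no icml tag.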

RL with multiple constraints by MD3QN

To reproduce the experiments for Appendix A.3.3 (RL with multiple constraints), run python ./script/batch_general.py with the following configurations:

  • clip_reward: False
  • icml_setting: False
  • game_name_file: maze-v3.txt
  • alg_name: MMDQNND
  • seed: 0
  • network_type: v21
  • bandwidth_type: v3
  • kscale_type: v11
  • use_priority: False
  • evaluation_setting: False
  • eval_policy_path: use default value (press Enter)
  • iterations: 20
  • exp_name: use default value (press Enter)
  • resume: False
  • gpu_func: use default value (press Enter)
  • First use_marginal: False (using MD3QN's joint distribution)
  • Second use_marginal: True (using MD3QN's marginal distribution information)
