Learning intrinsic rewards with bilevel reinforcement learning

This repository holds code for a bilevel meta-gradient reinforcement learning variant of DQN: the Intrinsic Reward Deep Q-Network (IRDQN).

Abstract

Reinforcement learning teaches intelligent agents how to act in dynamic environments through scalar rewards. There are many feasible ways to define a reward for solving a given task. However, designing rewards requires extensive domain knowledge and engineering effort.

A popular application of reinforcement learning is traffic signal control, where agents control the traffic lights at intersections. Work in this field predominantly defines rewards in terms of vehicle travel times. Because travel time is an inefficient reward signal (it is delayed, sparse, and influenced by factors outside the agent's control), researchers instead use combinations of other traffic metrics as the reward.

This approach can be problematic because specific weight choices might lead to considerable differences in agent performance and learning efficiency. This work aims to mitigate the difficulty of choosing weightings. We investigate this problem with the sustainability-oriented goal of reducing CO2 emissions in traffic.
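For concreteness, a combined reward of this kind is just a weighted sum of traffic metrics. Below is a minimal Python sketch, assuming the brake and queue metrics that appear in the Examples section; the exact metrics, signs, and weights used in this repository may differ:

    # Hypothetical combined traffic reward: a weighted sum of per-step metrics.
    # The metric names follow the brake/queue rewards in the Examples section;
    # they are illustrative assumptions, not this repository's exact definitions.
    def combined_reward(num_braking_vehicles: float, queue_length: float,
                        w_brake: float = 0.5, w_queue: float = 0.5) -> float:
        # Negate the weighted sum so that less braking and shorter queues
        # yield a higher reward.
        return -(w_brake * num_braking_vehicles + w_queue * queue_length)

Hand-picking w_brake and w_queue is exactly the weighting choice this work tries to avoid.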

We studied how sensitive agents are to pre-defined weightings of combined rewards. Our insight for mitigating this sensitivity was to relax the assumption that the weightings are pre-defined by an agent designer. We propose a method that treats the weightings as intrinsic to the agent. Our implementation builds on the well-known Deep Q-Network; hence, we call our algorithm the Intrinsic Reward Deep Q-Network (IRDQN). Following the meta-learning view that an agent's experience contains knowledge about learning itself, the IRDQN agent meta-learns its reward weights online through gradient descent.
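The following is a minimal sketch of one such bilevel meta-gradient step, assuming PyTorch. It illustrates the general mechanism (an inner Q-update on the intrinsic reward, differentiated through by an outer update of the reward weights on the extrinsic objective); all names, shapes, and hyperparameters are invented and need not match the repository's IRDQN implementation:

    import torch

    torch.manual_seed(0)
    obs_dim, n_actions, n_metrics, batch_size = 4, 2, 3, 8
    alpha, gamma = 1e-2, 0.99  # inner-loop learning rate, discount factor

    # Toy Q-network parameters kept as raw tensors so the inner-loop
    # update can remain part of the autograd graph.
    w_q = torch.randn(n_actions, obs_dim, requires_grad=True)
    b_q = torch.zeros(n_actions, requires_grad=True)

    # Meta-learned intrinsic reward weights (the outer-level variables).
    reward_w = torch.full((n_metrics,), 0.5, requires_grad=True)
    meta_opt = torch.optim.SGD([reward_w], lr=1e-3)

    def td_loss(weight, bias, reward, obs, act, next_obs):
        q = (obs @ weight.t() + bias).gather(1, act)
        with torch.no_grad():  # semi-gradient (stop-gradient) target
            target = (next_obs @ weight.t() + bias).max(1, keepdim=True).values
        return ((reward.unsqueeze(1) + gamma * target - q) ** 2).mean()

    # A fake transition batch; in practice this comes from a replay buffer.
    obs = torch.randn(batch_size, obs_dim)
    act = torch.randint(n_actions, (batch_size, 1))
    metrics = torch.rand(batch_size, n_metrics)   # per-step traffic metrics
    next_obs = torch.randn(batch_size, obs_dim)
    r_extrinsic = -torch.rand(batch_size)         # e.g., negative CO2 emissions

    # Inner step: update the Q-network on the intrinsic (weighted) reward,
    # keeping the graph so gradients can flow back into reward_w.
    r_intrinsic = metrics @ reward_w
    inner_loss = td_loss(w_q, b_q, r_intrinsic, obs, act, next_obs)
    g_w, g_b = torch.autograd.grad(inner_loss, (w_q, b_q), create_graph=True)
    w_new, b_new = w_q - alpha * g_w, b_q - alpha * g_b

    # Outer step: evaluate the updated Q-network on the extrinsic objective
    # and descend the resulting meta-gradient w.r.t. the reward weights.
    outer_loss = td_loss(w_new, b_new, r_extrinsic, obs, act, next_obs)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()

The key ingredient is create_graph=True in the inner step, which lets the outer loss differentiate through the Q-network update and reach the reward weights.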

Our results indicate that some task objectives (e.g., a CO2 emission reward) are inefficient for learning, that agents are sensitive to combined reward weightings, and that meta-learning these weightings can benefit agent performance and learning efficiency. The proposed IRDQN agent learned reward weights that led to the desired behavior of reducing CO2 emissions.

Our study revealed some interesting limitations of the IRDQN algorithm, such as a lack of exploration and its sensitivity to imbalanced weights. In the future, we plan to investigate these issues further to see if we can improve the algorithm’s performance.

Setup

  1. Clone this repository (including submodules): git clone --recurse-submodules https://github.com/EricSchuMa/bilevel-rl.git.
  2. Follow the instructions in sumo_rl/README.md to install the SUMO traffic simulator.
  3. Create a conda environment with Python 3.8: conda create -n bilevel-rl python=3.8.
  4. Activate the conda environment: conda activate bilevel-rl.
  5. Add your local repository path to the PYTHONPATH environment variable: export PYTHONPATH="${PYTHONPATH}:/path/to/bilevel-rl".
  6. Install the requirements with pip: pip install -r requirements.txt.

Running

From the project root, run the following command to train a DQN or IRDQN agent:

python experiments/train.py --config-path experiments/configs/{config}

where {config} should be replaced by a config file. The available config files are experiments/configs/DQN.ini and experiments/configs/IRDQN.ini.
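For example, to train the IRDQN agent:

python experiments/train.py --config-path experiments/configs/IRDQN.ini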

The training logs are saved to the mlruns folder. You can access them by running an MLflow server:

mlflow ui
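By default, the MLflow UI is then served at http://127.0.0.1:5000.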

Examples

1: DQN trained with brake and queue reward

[Video: DQN with brake and queue weights of 0.5 controlling an intersection in SUMO]

2: IRDQN trained with brake and queue reward

[Video: IRDQN with initial brake and queue weights of 0.5 controlling an intersection in SUMO]
