This repository contains code for the paper "Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning" (https://arxiv.org/abs/2210.03022).
Integrate the MARLGrid environment into the codebase, including its coordination and heterogeneity levels.
Put all relevant files under src/envs/marlgrid/. Once the environment code is ready, make sure it is callable from src/envs/__init__.py via the get_env function, as in the sketch below.
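A minimal sketch of how the registration could look; the module path src.envs.marlgrid.env, the class name MarlGridEnv, and the registry dict are assumptions about code that does not exist yet:

```python
# Hypothetical sketch of src/envs/__init__.py; names are assumptions.
from src.envs.marlgrid.env import MarlGridEnv  # assumed module/class name

_ENV_REGISTRY = {
    "marlgrid": MarlGridEnv,
}

def get_env(name, **kwargs):
    """Look up an environment class by name and instantiate it.

    kwargs (e.g. coordination/heterogeneity levels) are forwarded
    to the environment constructor.
    """
    if name not in _ENV_REGISTRY:
        raise ValueError(f"Unknown environment: {name!r}")
    return _ENV_REGISTRY[name](**kwargs)
```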
We still need to implement additional PPO training details to get the full performance out of IPPO and MAPPO [1,2]. The following should be implemented:
Feature Pruning: Form a state by concatenating the environment-provided global state with the agent's local observation, then prune redundant information. This is highly environment-specific, so we might need to change the obs_to_state_wrapper to account for it; no change is needed elsewhere. See the first sketch after this list.
Value Normalization: Regress the value network output to a normalized value target. This was found to significantly help training for MAPPO; see the normalizer sketch after this list.
Recurrent-MAPPO: MAPPO that operates with RNNs (e.g., a GRU) instead of plain MLPs; see the GRU actor sketch after this list.
Frame Stacking: Provide a stack of recent observations instead of only the latest one; see the frame-stack sketch after this list.
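A minimal sketch of the feature-pruning step, assuming a function-style wrapper; the name obs_to_state and the keep_idx argument are illustrative, and which dimensions count as redundant has to be decided per environment:

```python
import numpy as np

def obs_to_state(global_state, local_obs, keep_idx=None):
    """Concatenate the global state with an agent's local observation,
    then drop redundant dimensions.

    keep_idx is an environment-specific index list selecting the
    features to keep; if None, nothing is pruned.
    """
    state = np.concatenate([global_state, local_obs], axis=-1)
    if keep_idx is not None:
        state = state[..., keep_idx]
    return state
```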
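A minimal value-normalizer sketch in PyTorch; the EMA update of the first two moments is an assumed implementation choice, not necessarily the one used in the MAPPO codebase:

```python
import torch

class ValueNorm:
    """Running mean/std normalizer for value targets, in the spirit of
    MAPPO's value normalization; the simple EMA update here is an
    assumption, not the paper's exact code."""

    def __init__(self, beta=0.995, eps=1e-5):
        self.beta, self.eps = beta, eps
        self.mean = torch.zeros(1)
        self.mean_sq = torch.ones(1)

    @torch.no_grad()
    def update(self, targets):
        # Track the first two moments of the value targets.
        self.mean = self.beta * self.mean + (1 - self.beta) * targets.mean()
        self.mean_sq = self.beta * self.mean_sq + (1 - self.beta) * (targets ** 2).mean()

    def _std(self):
        return (self.mean_sq - self.mean ** 2).clamp(min=self.eps).sqrt()

    def normalize(self, x):
        return (x - self.mean) / self._std()

    def denormalize(self, x):
        return x * self._std() + self.mean
```

The value loss would then be computed against value_norm.normalize(returns), while value_norm.denormalize(values) is used wherever unnormalized values are needed (e.g., when computing advantages).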
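A minimal GRU-based actor sketch for Recurrent-MAPPO; module names and sizes are placeholders rather than the repo's actual classes:

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Illustrative GRU-based actor: an MLP encoder followed by a GRU
    whose hidden state is carried across timesteps."""

    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.pi = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: (1, batch, hidden_dim)
        x = self.encoder(obs_seq)
        x, hn = self.gru(x, h0)
        return self.pi(x), hn  # per-step action logits, final hidden state
```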
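A minimal frame-stacking helper, assuming a Gym-style reset/step interface (which may differ from the codebase's actual environment API):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keeps the k most recent observations and returns them
    concatenated along the last (channel) axis."""

    def __init__(self, env, k=4):
        self.env, self.k = env, k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with copies of the first observation.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```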
Add the option to use a CNN architecture for all implemented algorithms in the codebase. Appropriate reshaping may be needed; in that case, make sure runner.py is compatible with it too. A possible encoder is sketched below.
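A minimal sketch of such an encoder; the layer sizes are placeholders, and the assumption is that flat observations are reshaped inside the module so that callers such as runner.py can keep passing flat tensors:

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Illustrative CNN observation encoder; flat (batch, C*H*W)
    inputs are reshaped to image form inside forward()."""

    def __init__(self, obs_shape, out_dim=64):
        super().__init__()
        c, h, w = obs_shape
        self.obs_shape = obs_shape
        self.conv = nn.Sequential(
            nn.Conv2d(c, 16, kernel_size=3, stride=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened conv output size with a dummy pass.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, c, h, w)).shape[1]
        self.fc = nn.Linear(n_flat, out_dim)

    def forward(self, obs):
        # Accept flat observations and reshape them to (B, C, H, W).
        obs = obs.reshape(-1, *self.obs_shape)
        return self.fc(self.conv(obs))
```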