horizonrobotics / alf

Agent Learning Framework https://alf.readthedocs.io

License: Apache License 2.0


alf's Introduction

ALF


Agent Learning Framework (ALF) is a reinforcement learning framework that emphasizes the flexibility and ease of implementing complex algorithms involving many different components. ALF is built on PyTorch. Development of the previous version, based on TensorFlow 2.1, stopped as of Feb 2020.

Tutorial

A draft tutorial can be accessed on RTD. The tutorial is still under construction and some chapters are unfinished.

Documentation

Read the ALF documentation here.

Algorithms

Algorithm Type Reference
A2C On-policy RL OpenAI Baselines: ACKTR & A2C
PPO On-policy RL Schulman et al. "Proximal Policy Optimization Algorithms" arXiv:1707.06347
PPG On-policy RL Cobbe et al. "Phasic Policy Gradient" arXiv:2009.04416
DDQN Off-policy RL Hasselt et al. "Deep Reinforcement Learning with Double Q-learning" arXiv:1509.06461
DDPG Off-policy RL Lillicrap et al. "Continuous control with deep reinforcement learning" arXiv:1509.02971
QRSAC Off-policy RL Dabney et al. "Distributional Reinforcement Learning with Quantile Regression" arXiv:1710.10044
SAC Off-policy RL Haarnoja et al. "Soft Actor-Critic Algorithms and Applications" arXiv:1812.05905
OAC Off-policy RL Ciosek et al. "Better Exploration with Optimistic Actor-Critic" arXiv:1910.12807
HER Off-policy RL Andrychowicz et al. "Hindsight Experience Replay" arXiv:1707.01495
TAAC Off-policy RL Yu et al. "TAAC: Temporally Abstract Actor-Critic for Continuous Control" arXiv:2104.06521
SEditor Off-policy/Safe RL Yu et al. "Towards Safe Reinforcement Learning with a Safety Editor Policy" NeurIPS 2022
DIAYN Intrinsic motivation/Exploration Eysenbach et al. "Diversity is All You Need: Learning Diverse Skills without a Reward Function" arXiv:1802.06070
ICM Intrinsic motivation/Exploration Pathak et al. "Curiosity-driven Exploration by Self-supervised Prediction" arXiv:1705.05363
RND Intrinsic motivation/Exploration Burda et al. "Exploration by Random Network Distillation" arXiv:1810.12894
MuZero Model-based RL Schrittwieser et al. "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" arXiv:1911.08265
BC Offline RL Pomerleau "ALVINN: An Autonomous Land Vehicle in a Neural Network" NeurIPS 1988
Bain et al. "A framework for behavioural cloning" Machine Intelligence 1999
Causal BC Offline RL Swamy et al. "Causal Imitation Learning under Temporally Correlated Noise" ICML2022
IQL Offline RL Kostrikov, et al. "Offline Reinforcement Learning with Implicit Q-Learning" arXiv:2110.06169
MERLIN Unsupervised learning Wayne et al. "Unsupervised Predictive Memory in a Goal-Directed Agent"arXiv:1803.10760
MoNet Unsupervised learning Burgess et al. "MONet: Unsupervised Scene Decomposition and Representation" arXiv:1901.11390
Amortized SVGD General Feng et al. "Learning to Draw Samples with Amortized Stein Variational Gradient Descent" arXiv:1707.06626
HyperNetwork General Ratzlaff and Fuxin. "HyperGAN: A Generative Model for Diverse, Performant Neural Networks" arXiv:1901.11058
MCTS General Grill et al. "Monte-Carlo tree search as regularized policy optimization" arXiv:2007.12509
MINE General Belghazi et al. "Mutual Information Neural Estimation" arXiv:1801.04062
ParticleVI General Liu and Wang. "Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm" arXiv:1608.04471
Liu et al. "Understanding and accelerating particle-based variational inference" arXiv:1807.01750
GPVI General Ratzlaff, Bai et al. "Generative Particle Variational Inference via Estimation of Functional Gradients" arXiv:2103.01291
SVGD optimizer General Liu et al. "Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm" arXiv:1608.04471
VAE General Higgins et al. "beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework" ICLR2017
RealNVP General Dinh et al. "Density estimation using Real NVP" arXiv:1605.08803
SpatialBroadcastDecoder General Watters et al. "Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs" arXiv:1901.07017
VQ-VAE General A van den Oord et al. "Neural Discrete Representation Learning" NeurIPS2017

Installation

OS software

The following installation was tested on Ubuntu 22.04 with CUDA 11.8.

Python 3.11 is currently supported by ALF. Note that some pip packages (e.g., pybullet) need Python dev files, so make sure python3.11-dev is installed:

sudo apt install -y python3.11 python3.11-dev

Boost is also required by ALF for fast parallel environments.

sudo apt install libboost-all-dev

Python environment

Virtualenv is recommended for the installation. After creating and activating a virtual env, you can run the following commands to install ALF:

git clone https://github.com/HorizonRobotics/alf
cd alf
pip install pybind11
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118

For Nix Users

There is a built-in Nix-based development environment defined in flake.nix. To activate it, run

$ nix develop

in the root of your local repository.

Docker

We also provide a docker image of ALF for convenience. To use this image, you need to have docker and nvidia-docker (for GPU usage with ALF) installed first.

docker run --gpus all -it horizonrobotics/cuda:11.8.0-py3.11-torch2.2-ubuntu22.04 /bin/bash

This will give you a shell with ALF and all of its dependencies pre-installed.

The current docker image contains the ALF version as of Feb 21, 2024. Regular image updates are expected in the future.

Examples

You can train any _conf.py file under alf/examples as follows:

python -m alf.bin.train --conf=CONF_FILE --root_dir=LOG_DIR
  • CONF_FILE is the path to your conf file, which follows the ALF configuration file format (basically Python; see the sketch below).
  • LOG_DIR is the directory where you want to store the training results. Note that if you want to train from scratch, LOG_DIR must point to a location that doesn't exist. Otherwise, training is assumed to resume from a previous checkpoint (if any).
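For reference, a conf file is ordinary Python that binds parameters through alf.config. Below is a minimal, hypothetical sketch; the configurable names such as create_environment and TrainerConfig follow the examples under alf/examples, but the exact values are placeholders rather than a tested configuration:

import alf
from alf.algorithms.actor_critic_algorithm import ActorCriticAlgorithm

# Because a conf file is plain Python, values can be computed freely,
# which is one advantage over the older gin format.
num_envs = 8

alf.config(
    "create_environment",
    env_name="CartPole-v0",
    num_parallel_environments=num_envs)

alf.config(
    "TrainerConfig",
    algorithm_ctor=ActorCriticAlgorithm,
    unroll_length=8,
    num_iterations=1000)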

During training, you can use TensorBoard to monitor progress:

tensorboard --logdir=LOG_DIR

After training, you can evaluate the trained model and visualize environment frames using the following command:

python -m alf.bin.play --root_dir=LOG_DIR

Deprecated

An older version of ALF used gin for job configuration. Its syntax is not as flexible as ALF conf (e.g., you can't easily do math computation in a gin file). There are still some examples with .gin under alf/examples. We are in the process of converting all .gin examples to _conf.py examples.

You can train any .gin file under alf/examples using the following command:

cd alf/examples; python -m alf.bin.train --gin_file=GIN_FILE --root_dir=LOG_DIR
  • GIN_FILE is the path to the gin conf (some .gin files under alf/examples might be invalid; they have not been converted to use the latest pytorch version of ALF).
  • LOG_DIR has the same meaning as in the ALF conf example above.

Warning: When using gin, ALF has to be launched from the same directory as the gin file(s). If an error says that no configuration file is found, you've probably launched ALF from the wrong place.

All the examples below were trained on a single machine with an Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz (32 logical CPUs) and one RTX 2080Ti GPU.

A2C

  • Cart pole. The training score took only 30 seconds to reach 200, using 8 environments.

    breakout-training-curve cartpole-video

  • Atari games. The Python package atari-py needs to be installed for Atari game environments. The evaluation score (by taking argmax of the policy) took 1.5 hours to reach 800 on Breakout, using 64 environments.

    breakout-training-curve breakout-playing-screen

  • Simple navigation with visual input. Follow the instructions at SocialRobot to install the environment.

    simple-navigation-curve simple-navigation-video

PPO

  • PR2 grasping (state input only). Follow the instructions at SocialRobot to install the environment.

    ppo-pr2-curve pr2-video

  • Humanoid. Learning to walk using the pybullet Humanoid environment. The Python package pybullet>=2.5.0 needs to be installed for the environment. The evaluation score reaches 3k in 50M steps, using 96 parallel environments.

    Humanoid-training-curve Humanoid-video

PPG

DDQN

DDPG

  • FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This example reproduces the performance of vanilla DDPG reported in the OpenAI's Robotics environment paper. Our implementation doesn't use MPI, but obtains (evaluation) performance on par with the original implementation. (The original MPI implementation has 19 workers, each worker containing 2 environments for rollout and sampling a minibatch of size 256 from its replay buffer for computing gradients. All the workers' gradients will be summed together for a centralized optimizer step. Our implementation simply samples a minibatch of size 5000 from a common replay buffer per optimizer step.) The training took about 1 hour with 38 (19*2) parallel environments on a single GPU.

    ddpg-fetchslide-training-curve

SAC

  • Bipedal Walker.

    bipedal-walker-training-curve bipedal-walker-video

  • FetchReach (sparse rewards). Need to install the MuJoCo simulator first. The training took about 20 minutes with 20 parallel environments on a single GPU.

    sac-fetchreach-training-curve

  • FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This is the same task as the DDPG example above, but with SAC as the learning algorithm. It also uses only 20 (instead of 38) parallel environments to improve sample efficiency. The training took about 2 hours on a single GPU.

    sac-fetchslide-training-curve

  • Fetch Environments (sparse rewards) w/ Action Repeat. We are able to achieve even better performance than reported by DDPG + Hindsight Experience Replay in some cases, simply by using SAC + Action Repeat with a length of 3 timesteps (see the wrapper sketch below). See this note to view learning curves, videos, and more details.
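For readers unfamiliar with action repeat, here is a minimal wrapper sketch in the style of a classic gym wrapper. It is illustrative only; ALF ships its own environment wrappers, so this is not the code used in these experiments.

import gym

class ActionRepeat(gym.Wrapper):
    # Repeat each action for a fixed number of environment steps,
    # accumulating the reward (classic 4-tuple gym API assumed).
    def __init__(self, env, repeat=3):
        super().__init__(env)
        self._repeat = repeat

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self._repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info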

ICM

  • Super Mario. Playing Super Mario using only intrinsic reward. The Python package gym-retro>=0.7.0 is required for this experiment, and a suitable SuperMarioBros-Nes ROM needs to be obtained and imported (ROMs are not included in gym-retro). See this doc on how to import ROMs.

    super-mario-training-curve super-mario-video

RND

  • Montezuma's Revenge. Training the hard exploration game Montezuma's Revenge with intrinsic rewards generated by RND. A lucky agent can get an episodic score of 6600 in 160M frames (40M steps with frame_skip=4). A normal agent would get an episodic score of 4000~6000 in the same number of frames. The training took about 6.5 hours with 128 parallel environments on a single GPU.

mrevenge-training-curve mrevenge-video

DIAYN

  • Pendulum. Learning diverse skills without external rewards.

    Discriminator loss Skills learned with DIAYN

BC

Merlin

  • Collect Good Objects. Learn to collect good objects and avoid bad objects. DeepmindLab is required; follow the instructions at DeepmindLab to install the environment.

    room-collect-good-objects-training-curve room-collect-good-objects

MuZero

  • 6x6 Go. It took about a day to train a reasonable agent to play 6x6 go using one GPU.

    6x6-go

Citation

If you use ALF for research and find it useful, please consider citing:

@software{Xu2021ALF,
  title={{{ALF}: Agent Learning Framework}},
  author={Xu, Wei and Yu, Haonan and Zhang, Haichao and Hong, Yingxiang and Yang, Break and Zhao, Le and Bai, Jerry and ALF contributors},
  url={https://github.com/HorizonRobotics/alf},
  year={2021}
}

Contribute to ALF

You are welcome to contribute to ALF. Please follow the guideline here.

alf's People

Contributors

7gao, bayesian, breakds, emailweixu, haichao-zhang, hnyu, jesbu1, jialn, le-horizon, neale, pd-perry, pinzhang, quantumope, resuscitated, ruizhaogit, runjerry, witwolf, www2171668


alf's Issues

_cached_opt_and_var_sets inconsistent between initial and sequential calls

I tried printing out how many variables each optimizer is responsible for optimizing. I printed in two places. The first place is inside algorithm._get_opt_and_var_sets(), which runs before self._cached_opt_and_var_sets is set, and I got something like the following:

{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 3e-05,
 'name': 'Adam'} 24
{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 1e-05,
 'name': 'Adam'} 26
{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 0.001,
 'name': 'Adam'} 1

For each entry, the first part is the optimizer config and the second is len(vars), which makes sense given my job. However, if I print inside train_complete, after calling self._get_cached_opt_and_var_sets(), the output is:

{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 3e-05,
 'name': 'Adam'} 75
{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 1e-05,
 'name': 'Adam'} 26
{'amsgrad': False,
 'beta_1': 0.9,
 'beta_2': 0.999,
 'decay': 0.0,
 'epsilon': 1e-07,
 'learning_rate': 0.001,
 'name': 'Adam'} 1

A simple calculation suggests that the parent algorithm (which holds the first optimizer above) somehow also takes into account all leaf trainable variables somewhere in the code (24 + 24 + 26 + 1 = 75). So some variables will also be optimized by the parent optimizer even though I have specified child optimizers for them (our tape is "persistent" and grads can be computed multiple times).

I verified that this also happens in eager mode.

Add evaluation to on_policy_trainer

Similar to tf_agents' train_eval, we need to periodically evaluate the policy during training. The evaluation usually uses greedy_predict, whose results may differ significantly from non-greedy predict.

Figures showing metric against time

Right now, we have figures showing metrics against global count, environment steps, etc. In order to compare wall-clock speed between algorithms/settings, we also need figures with time on the x-axis and the metric on the y-axis.

can not work with tf.functions

1) Code patch (examples/actor_critic.py):

    # driver = PyDriver(
    #     tf_env,
    #     policy,
    #     observers=train_metrics,
    #     max_steps=num_steps_per_iter)

    driver = DynamicStepDriver(
        tf_env,
        policy,
        observers=train_metrics,
        num_steps=num_steps_per_iter)
    # _ = algorithm.variables  # build networks
    driver.run = tfa_common.function(driver.run)

It runs successfully with use_icm=0: python actor_critic.py --root_dir=~/tmp/icm/ --num_parallel_environments=1 --gin_param=train_eval.use_icm=0

but fails with use_icm=1: python actor_critic.py --root_dir=~/tmp/icm/ --num_parallel_environments=1 --gin_param=train_eval.use_icm=1

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
@tf.function
def has_init_scope():
my_constant = tf.constant(1.)
with tf.init_scope():
added = my_constant * 2
The graph tensor has name: ActorDistributionNetwork/CategoricalProjectionNetwork/Categorical/sample/Reshape_1:0
In call to configurable 'train_eval' (<function train_eval at 0x134018950>)

The behavior is strange since there's nothing special about the ICM module (ICMAlgorithm).

The experiments above run successfully when we build the networks before running the graph:
_ = algorithm.variables  # build networks

2) The training procedure is incorrect in the successful cases with tf.functions.

extend grid search

Towards two directions:

  1. Early stopping. For some obviously inferior hyperparameter combinations, it's possible to stop training at an early stage. For that, we could periodically compare a run's performance with the top-k runs at the same training iteration, via some communication channel like a Queue. Basically, this eventually becomes an evolution-like search over hyperparameters. Of course, more advanced methods can be employed:
    https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f

  2. If each training run needs a GPU, then it's impractical to launch all runs on a single machine. So a distributed search is needed (after our cluster is ready?).

I noticed that there are existing Python-based libraries for searching hyperparameters. One example is Hyperopt:
https://github.com/hyperopt/hyperopt

We should investigate these libraries further. If they are ready for use in our case, we might not want to reinvent the wheel.

Refactoring: Move OffPolicyDriver.train to RLAlgorithm

And also move the replay buffer to RLAlgorithm.

The exact training procedure should be decided by the algorithm. Moving it into the algorithm will make algorithms more flexible, so that we can minimize the need to change or write policy drivers for future algorithms.

Unittests for running all examples

Even though it's not reasonable to train all the examples until they reach the desired performance, we should at least make sure they can run a few iterations and play correctly.

Make sure environments in SocialRobot work with tf_agents

We want to be able to run SocialRobot environment with tf_agents. There are two issues:

  1. Since tf_agents runs step() of an environment in a thread different from the one it was created in, SocialRobot somehow crashes when used this way.

  2. We want to run multiple SocialRobot environments, and each environment needs to be in a separate process because of Gazebo. Hence we need an environment wrapper that runs an environment in a separate process while letting us use it as if it were in the same process. This is similar to https://github.com/deepmind/scalable_agent/blob/master/py_process.py, which is used here https://github.com/deepmind/scalable_agent/blob/6c0c8a701990fab9053fb338ede9c915c18fa2b1/experiment.py#L437

images are not stored as tf.uint8 in replay buffers

Due to the tf_agents implementation of _spec_from_gym_space() in gym_wrapper.py, all inputs with spec gym.spaces.Box are mapped to the same dtype specified in dtype_map. This causes issues when the inputs are a mixture of uint8 and float32, or when the action is float32. So the default type for now is always float32, and we are spending 4x the memory storing image inputs (see the quick calculation below).
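A quick back-of-envelope check of the 4x factor, assuming a standard 84x84x4 stacked Atari observation (the exact shape doesn't matter, only the dtype sizes do):

import numpy as np

frame_shape = (84, 84, 4)
uint8_bytes = np.prod(frame_shape) * np.dtype(np.uint8).itemsize      # 28,224 bytes
float32_bytes = np.prod(frame_shape) * np.dtype(np.float32).itemsize  # 112,896 bytes
print(float32_bytes / uint8_bytes)  # 4.0 -> the 4x storage overhead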

async train sometime fails

When testing with ppo_async_icm_super_mario_intrinsic_only:

rm -rf tmp && python3 -m alf.bin.train \
 --root_dir=tmp \
 --gin_file=ppo_async_icm_super_mario_intrinsic_only.gin \
 --gin_param=TrainerConfig.random_seed=0 \
 --gin_param=create_environment.num_parallel_environments=1 \
 --gin_param=TrainerConfig.num_iterations=2 \
 --gin_param=TrainerConfig.num_steps_per_iter=1 \
 --gin_param=TrainerConfig.num_updates_per_train_step=1 \
 --gin_param=TrainerConfig.mini_batch_length=2 \
 --gin_param=TrainerConfig.mini_batch_size=4 \
 --gin_param=TrainerConfig.num_envs=2 \
 --gin_param=ReplayBuffer.max_length=64 \
 --gin_param=TrainerConfig.unroll_length=2 \
 --gin_param=TrainerConfig.num_updates_per_train_step=2 \
 --gin_param=TrainerConfig.use_tf_functions=False

I get the error message:

  ...
  File "/home/hongyingxiang/FLA/alf/drivers/threads.py", line 410, in _step
    self._env.step(action), action, first_env_id=self._first_env_id)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/tf_environment.py", line 232, in step
    return self._step(action)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 292, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/tf_py_environment.py", line 319, in _step
    name='step_py_func')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 591, in numpy_function
    return py_func_common(func, inp, Tout, stateful=True, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 488, in py_func_common
    result = func(*[x.numpy() for x in inp])
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/tf_py_environment.py", line 302, in _isolated_step_py
    return self._execute(_step_py, *flattened_actions)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/tf_py_environment.py", line 195, in _execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/tf_py_environment.py", line 298, in _step_py
    self._time_step = self._env.step(packed)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/parallel_py_environment.py", line 136, in _step
    time_steps = [promise() for promise in time_steps]
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/parallel_py_environment.py", line 136, in <listcomp>
    time_steps = [promise() for promise in time_steps]
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/parallel_py_environment.py", line 338, in _receive
    raise Exception(stacktrace)
Exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/parallel_py_environment.py", line 377, in _worker
    result = getattr(env, name)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/wrappers.py", line 105, in _step
    time_step = self._env.step(action)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/gym_wrapper.py", line 197, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/usr/local/lib/python3.6/dist-packages/gym/core.py", line 282, in step
    return self.env.step(self.action(action))
  File "/home/hongyingxiang/FLA/alf/environments/mario_wrappers.py", line 121, in action
    for i in self._actions[a]:
IndexError: list index out of range

and with the following diff

diff --git a/alf/algorithms/actor_critic_algorithm.py b/alf/algorithms/actor_critic_algorithm.py
index 20216fa..836e9dc 100644
--- a/alf/algorithms/actor_critic_algorithm.py
+++ b/alf/algorithms/actor_critic_algorithm.py
@@ -110,6 +110,8 @@ class ActorCriticAlgorithm(OnPolicyAlgorithm):
             step_type=time_step.step_type,
             network_state=state.actor)

+        import threading
+        print(action_distribution.logits[0][:4], threading.current_thread().ident)
         action = common.sample_action_distribution(action_distribution)
         return PolicyStep(
             action=action,

I get:

  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/py_environment.py", line 174, in step
    self._current_time_step = self._step(action)
  File "/usr/local/lib/python3.6/dist-packages/tf_agents/environments/gym_wrapper.py", line 197, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/usr/local/lib/python3.6/dist-packages/gym/core.py", line 282, in step
    return self.env.step(self.action(action))
  File "/home/hongyingxiang/FLA/alf/environments/mario_wrappers.py", line 121, in action
    for i in self._actions[a]:
IndexError: list index out of range

tf.Tensor([nan nan nan nan], shape=(4,), dtype=float32) 140210483619648
tf.Tensor([nan nan nan nan], shape=(4,), dtype=float32) 140210483619648
tf.Tensor([nan nan nan nan], shape=(4,), dtype=float32) 140210483619648

The logits for distributions.Categorical might be NaN (it's very easy to reproduce this issue).

Can you help take a look at this issue @emailweixu @hnyu?

Speeding up the loading of SocialRobot environment

When using 30 or 60 parallel environments, loading the environments takes quite a long time. It seems that the environments are loaded sequentially. We might be able to speed this up by loading the environments in parallel.

alf build failing

This is the current status on the GitHub page:

[screenshot]

Looking into the Travis logs, the following one fails:

[screenshot]

Weird Tensorboard behaviors

Sometimes when I train intrinsic-reward-only agents, I observe non-zero "AverageReturn" points but at the same time all-zero points in "extrinsic/mean". See the two figures below.

[figure: Metrics/AverageReturn]
[figure: reward/extrinsic/mean]

Without any extrinsic rewards, it's basically impossible for "AverageReturn" to be nonzero.
Note that this has nothing to do with averaging, because the second curve is exactly 0, and I didn't downsample any curve.

Does anyone know why this happens?

Actor critic RNN policy unit test fails

There is a shape error when running test alf/algorithms/test_actor_critic_rnn_policy or alf/drivers/on_policy_driver_test.py

File "alf/algorithms/actor_critic_algorithm_test.py", line 165, in test_actor_critic_rnn_policy policy_step = policy.action(time_step, policy_state) ValueError: Incompatible shape for value ((100, 100)), expected ((100, 1))

Refactoring: remove create_???_algorithm

We should make it possible to directly configure Algorithm from gin instead of relying on create_???_algorithm.
Since we need to pass observation_spec, action_spec, time_step_spec, etc. to many of the constructors, we can implement several functions (observation_spec(), action_spec(), time_step_spec()) to get those specs from gin.

Support rendering on multiple GPUs

Currently, for suite_socialbot, when using ParallelPyEnvironment, all the environments perform rendering on the same GPU (GPU 0). We should check whether allowing rendering on multiple GPUs can speed up training.

Better histogram for the summary of discrete action

In alf.utils.common.add_action_summaries, tf.summary.histogram does not generate a good histogram for discrete actions. Since we know the min and max of a discrete action, we should generate a histogram with the known number of buckets and the right min and max (see the sketch below).
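One possible fix, sketched here with plain TensorFlow calls rather than ALF's actual summary utilities, so treat it as an assumption about the intended behavior, not the implementation:

import tensorflow as tf

def discrete_action_counts(actions, num_actions):
    # actions: integer tensor of sampled actions in [0, num_actions)
    # Returns one count per action value, i.e. a histogram whose buckets
    # exactly match the known min (0) and max (num_actions - 1).
    return tf.math.bincount(actions, minlength=num_actions, maxlength=num_actions)

actions = tf.constant([0, 2, 2, 1, 3, 2])
print(discrete_action_counts(actions, num_actions=4))  # [1 1 3 1]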

missing curves and gifs in README

We are missing training curves and GIFs showing the trained agent play in the following two environments in the README file:

"Simple navigation with visual input. Follow the instruction at SocialRobot to install the environment."
and
"PR2 grasping state only. Follow the instruction at SocialRobot to install the environment."

@witwolf and @Jialn Maybe these are easy for you guys to add?

TrainTest fails sometimes

======================================================================
FAIL: test_ppo_cart_pole (bin.train_test.TrainTest)
test_ppo_cart_pole (bin.train_test.TrainTest)

Traceback (most recent call last):
File "/ALF/alf/bin/train_test.py", line 100, in test_ppo_cart_pole
self._test_train('ppo_cart_pole.gin', _test_func)
File "/ALF/alf/bin/train_test.py", line 126, in _test_train
assert_func(episode_returns, episode_lengths)
File "/ALF/alf/bin/train_test.py", line 97, in _test_func
self.assertGreater(np.mean(returns[-2:]), 198)
AssertionError: 197.8499984741211 not greater than 198

@witwolf It seems that the determinism isn't working as expected? There are some other cases that fail sometimes, and I have to manually restart the testing job each time.

can't init submodule with current master

The current master (e5e903b) reports an error on git submodule update:

03:10:03 (py3env) immars@immars-brick alf ±|master ✗|→ git submodule update
error: Server does not allow request for unadvertised object ba9cf75787469554483874efcf07a4f00306443c
Fetched in submodule path 'tf_agents', but it did not contain ba9cf75787469554483874efcf07a4f00306443c. Direct fetching of that commit failed.

It seems that #22 updated tf_agents to version ba9cf75787, which no longer exists?

remove all specs

I believe that specs make our framework strongly typed, which is good for avoiding mistakes, but they also make the code much less flexible and more verbose (not Pythonic). There are two main reasons for having specs:

  1. tf_agents networks require specs to initialize
  2. a replay buffer requires specs to allocate memories in __init__

For 1), we can probably rewrite some networks (Keras doesn't require specs); for 2), when we create a replay buffer, we could pass in an example experience and let the replay buffer class extract the specs internally (see the sketch below).

Am I missing other reasons for specs?
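A sketch of idea 2), extracting specs from an example experience with tf.nest; the helper name is hypothetical and only illustrates that the replay buffer could derive specs internally:

import tensorflow as tf

def specs_from_example(example):
    # example: an arbitrary nest of tensors, e.g. one Experience sample
    return tf.nest.map_structure(
        lambda t: tf.TensorSpec(shape=t.shape, dtype=t.dtype), example)

example = {"observation": tf.zeros([84, 84, 4], tf.uint8),
           "reward": tf.zeros([], tf.float32)}
print(specs_from_example(example))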

trac_ddpg_pendulum failed

When testing with trac_ddpg_pendulum:

python -m alf.bin.train --root_dir=tdp --gin_file=trac_ddpg_pendulum

I get the error message below; still investigating it.

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 88, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 79, in main
    train_eval(FLAGS.root_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 73, in train_eval
    trainer.train()
  File "/home/hongyingxiang/FLA/alf/trainers/policy_trainer.py", line 315, in train
    summary_max_queue=self._summary_max_queue)
  File "/home/hongyingxiang/FLA/alf/utils/common.py", line 265, in run_under_record_context
    func()
  File "/home/hongyingxiang/FLA/alf/trainers/policy_trainer.py", line 345, in _train
    time_step=time_step)
  File "/home/hongyingxiang/FLA/alf/trainers/off_policy_trainer.py", line 74, in _train_iter
    update_counter_every_mini_batch=self._config.
  File "/home/hongyingxiang/FLA/alf/algorithms/off_policy_algorithm.py", line 105, in train
    mini_batch_length, update_counter_every_mini_batch)
  File "/home/hongyingxiang/FLA/alf/utils/common.py", line 985, in __call__
    return tf_func_instance(get_current_scope(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 696, in _call
    return function_lib.defun(fn_with_cond)(*canon_args, **canon_kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  assertion failed: [100]
   [[{{node cond/else/_1/StatefulPartitionedCall/while/body/_694/while/body/_1961/StatefulPartitionedCall/Assert/AssertGuard/else/_7599/Assert}}]] [Op:__inference_fn_with_cond_12063]

Function call stack:
fn_with_cond

  In call to configurable 'train_eval' (<function train_eval at 0x7f35b018a840>)

inoperative config info seems incorrect

I included "atari.gin" in my gin file to run a job. In TensorBoard "text", I see the following inoperative info:

[screenshot]

However, as far as I can check, these configs are indeed used by my job. Is there a bug here?

grocery_ground_goal_task training taking 3x more memory than expected?

4 bytes (float) * 80 * 80 (image size) * 3 (channels) * 100 (unroll length) * 12 (input + two conv layers + backprop + framestack) * 30 (parallel envs) / 1,000,000,000 ≈ 2.7 GB

Currently CUDA seems to be taking ~9 GB of GPU memory (rendering is taking another ~4 GB):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:17:00.0 On | N/A |
| 18% 59C P5 46W / 250W | 5274MiB / 10988MiB | 31% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:65:00.0 Off | N/A |
| 30% 66C P2 81W / 250W | 8856MiB / 10989MiB | 3% Default |
+-------------------------------+----------------------+----------------------+

This seems to suggest it is taking 3x the memory vs what we expect?

Did we miss anything in the calculation?

Thanks,
Le
-----some details-----
conv_layer_params = ((16, 3, 2), (32, 3, 2))
1st conv layer 40*40*16, 2nd conv layer 20*20*32, which roughly adds up to 2x the input layer;
then *2 for the actor and critic networks, and *2 again for forward and backprop;

  • plus 4x the input layer for FrameStack
    = 8 + 4 = 12x the size of the input layer
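The same estimate written out explicitly (all numbers are the assumptions above, not measured values):

bytes_per_float = 4
pixels = 80 * 80 * 3        # one observation
unroll_length = 100
activation_factor = 12      # input + conv layers + backprop + FrameStack, per the breakdown above
num_envs = 30

estimate_gb = bytes_per_float * pixels * unroll_length * activation_factor * num_envs / 1e9
print(round(estimate_gb, 2))  # ~2.76 GB expected, versus ~9 GB observed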

Incorporating long-term entropy bonus reward in on-policy AC

For on-policy AC, we have the entropy regularizer E_{\pi}[-\log\pi(a_t|s_t)] at every step. As an unbiased estimation, we can simply use the negative log-likelihood -\log\pi(a_t|s_t) as an additional reward added to the computed advantage towards the policy gradient loss computation. This is essentially a one-step entropy bonus reward where the policy only cares about the single-step bonus.

We can extend this to long-term entropy maximization by adding -\log\pi(a_t|s_t) as an intrinsic reward (just like the ICM rewards), which will be absorbed in the advantage and return computations. This could potentially further improve our on-policy AC performance on top of the current single-step entropy bonus.

Note that this simple treatment only applies to on-policy algorithms. For off-policy algorithms, we then need SAC or soft-Q's formulation.
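In symbols, the proposal amounts to maximizing a discounted entropy-augmented return rather than a per-step bonus. A sketch consistent with the description above, with \alpha as the entropy weight:

% One-step bonus (current on-policy AC): the advantage is augmented as
%   \hat{A}_t \leftarrow \hat{A}_t + \alpha\,\bigl(-\log\pi(a_t \mid s_t)\bigr).
% Long-term version: treat the entropy term as an intrinsic reward, so the
% objective becomes
%   J(\pi) = \mathbb{E}_{\pi}\Bigl[\sum_{t \ge 0} \gamma^{t}\bigl(r_t - \alpha\,\log\pi(a_t \mid s_t)\bigr)\Bigr].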

Inoperative gin config

Sometimes the configuration provided in a gin config file may be overridden by Python code. One example is alf/examples/ppo_pr2.sh in PR #20, where the command line specifies --gin_param='train_eval.num_epochs=10', but this is unintentionally overridden by a FLAG '--num_epochs' with default value 10.

It would be nice if we could find a way to write all the unused configs from gin_file or gin_params to TensorBoard.

ParallelPyEnvironment will cause sys.exit() to hang after main() finishes

I've spent quite some time pinning down an issue (reason unknown at this moment): once we create tf_agents' ParallelPyEnvironment in main(), after the code finishes, sys.exit() hangs forever without releasing the GPU memory (running in CPU mode also has this hanging issue). This makes grid_search.py fail. The minimal reproducible example is to replace the train_eval(root_dir) function with the following:

@gin.configurable
def train_eval(root_dir):
    env = create_environment()
    env.pyenv.close()

and run any training job. The script will never exit.

I have located the issue exactly at sys.exit(main(argv)) in absl's app.py. It seems to be an issue inside sys.exit() itself rather than in absl's code.

@witwolf When you ran grid_search.py, did you ever have such a problem?
