zuoxingdong / lagom Goto Github PK

lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.

License: MIT License

Python 15.07% Shell 0.17% Jupyter Notebook 84.76%

reinforcement-learning pytorch machine-learning python research deep-learning artificial-intelligence policy-gradient evolution-strategies deep-reinforcement-learning

lagom's People

Contributors

Stargazers

Watchers

lagom's Issues

Update logger: put '-'*50 into arg, maybe called header ?

Update Configurator: remove dataframe_groupview & dataframe_subset

Refactor Network & Module: maybe remove make_params/init_params/reset (?) maybe remove BasePolicy ?

HistoryMetric

For both trajectory and segment, computing TD, GAE or something, extendable

Merge Experiment and Algorithm classes, remove Algo class,

Put this API to Experiment

    def __call__(self, config, seed, device):
        r"""Run the algorithm with a configuration, a random seed and a PyTorch device.
        
        Args:
            config (dict): a dictionary of configuration items
            seed (int): a random seed to run the algorithm
            device (torch.device): a PyTorch device. 
            
        Returns
        -------
        out : object
            output of the algorithm execution. If no need to return anything, then an ``None`` should be returned. 
        """
        pass

Exhaustive unit test with pytest.mark.parametrize

Use Miniconda, change scripts adapted from CI

[Runner]: separate running and converting

Decoupled the sequential running maybe to collect list of data, and then another function to convert everything to batch of Trajectory or Segment. This might make API more generic and flexible.

Add tqdm bar to parallelized experiment

Experiment master creates a tqdm bar with total size of configurations, call update for each process_algo_result after it finishes its job.

Remove A2C and replace with IMPALA

Polish code in lagom.vis and write tests

Add tests for VecStandardizeObservation and VecStandardizeReward

Roadmap 0.0.2

Here we list some todos for next release of 0.0.2

ortho_init doesn't work with PyTorch 0.4.1

When trying to run VAE example got this:

Traceback (most recent call last):
  File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/multiprocessing/base_worker.py", line 47, in __call__
    task_id, result = self.work(master_cmd)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/experiment/base_experiment_worker.py", line 55, in work
    result = algo(config, seed, device_str=device_str)
  File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/algo.py", line 58, in __call__
    model = ConvVAE(config=config)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/base_network.py", line 66, in __init__
    self.init_params(self.config)
  File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/network.py", line 107, in init_params
    ortho_init(layer, nonlinearity='relu', constant_bias=0.0)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/init.py", line 34, in ortho_init
    if isinstance(module, (nn.RNNBase, nn.RNNCellBase)):  # RNN
AttributeError: module 'torch.nn' has no attribute 'RNNCellBase'

Dmytros-MacBook-Pro:vae dkorduban$ pip freeze | grep torch
torch==0.4.1
torchvision==0.2.1

Add MPI master/worker: consistent API with BaseMaster and BaseWorker

put value_functions and policies to networks folder

Add TensorEnvWrapper

Output Observation: convert to tensor and device
Input Action: convert to numpy array from tensor

Separate calculations e.g. TD, returns from Trajectory/Segment

Refactor codes to minimize obvious comments (distraction) complying with PEP8

Use ABC for all base classes

This enforces strict API inheritance for subclasses, better OOP management.

Re-implement with new TimeLimit concerns ?

According to the paper, Time Limit in RL, maybe it's better to adapt implementation of metrics and PG algorithms ?

Add OpenAI-ES

import numpy as np

import torch
import torch.optim as optim

from lagom.es import BaseES

from lagom.transform import RankTransform


class OpenAIES(BaseES):
    r"""Implements OpenAI evolution strategies.
    
    .. note::
    
        In practice, the learning rate is better to be proportional to the batch size.
        i.e. for larger batch size, use larger learning rate and vise versa. 
        
    """
    def __init__(self, 
                 mu0, 
                 std0, 
                 popsize,
                 std_decay=0.999, 
                 min_std=0.01,
                 lr=1e-3, 
                 lr_decay=0.9999, 
                 min_lr=1e-2, 
                 antithetic=False,
                 rank_transform=True):
        r"""Initialize OpenAI-ES. 
        
        Args:
            mu0 (ndarray): initial mean
            std0 (float): initial standard deviation
            popsize (int): population size
            std_decay (float): standard deviation decay
            min_std (float): minimum of standard deviation
            lr (float): learning rate
            lr_decay (float): learning rate decay
            min_lr (float): minumum of learning rate
            antithetic (bool): If True, then use antithetic sampling to generate population.
            rank_transform (bool): If True, then use rank transformation of fitness (combat with outliers). 
        """
        self.mu0 = np.array(mu0)
        self.std0 = std0
        self.popsize = popsize
        self.std_decay = std_decay
        self.min_std = min_std
        self.lr = lr
        self.lr_decay = lr_decay
        self.min_lr = min_lr
        self.antithetic = antithetic
        if self.antithetic:
            assert self.popsize % 2 == 0, 'popsize must be even for antithetic sampling. '
        self.rank_transform = rank_transform
        if self.rank_transform:
            self.rank_transformer = RankTransform()
        
        self.num_params = self.mu0.size
        self.mu = torch.from_numpy(self.mu0).float()
        self.mu.requires_grad = True  # requires gradient for optimizer to update
        self.std = self.std0
        self.optimizer = optim.Adam([self.mu], lr=self.lr)
        self.lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=self.optimizer, 
                                                             gamma=self.lr_decay)
        
        self.solutions = None
        self.best_param = None
        self.best_f_val = None
        self.hist_best_param = None
        self.hist_best_f_val = None
    
    def ask(self):
        # Generate standard Gaussian noise for perturbating model parameters. 
        if self.antithetic:  # antithetic sampling
            eps = np.random.randn(self.popsize//2, self.num_params)
            eps = np.concatenate([eps, -eps], axis=0)
        else:
            eps = np.random.randn(self.popsize, self.num_params)
        # Record the noise for gradient computation in tell()
        self.eps = eps
        
        # Perturbate the parameters
        self.solutions = self.mu.detach().numpy() + self.eps*self.std
        
        return list(self.solutions)
        
    def tell(self, solutions, function_values):
        # Enforce ndarray of function values
        function_values = np.array(function_values)
        if self.rank_transform:
            # Make a copy of original function values, for recording true values
            original_function_values = np.copy(function_values)
            # Use centered ranks instead of raw values, combat with outliers. 
            function_values = self.rank_transformer(function_values, centered=True)
            
        # Sort function values and select the minimum, since we are minimizing the objective. 
        idx = np.argsort(function_values)[0]  # argsort is in ascending order
        self.best_param = solutions[idx]
        if self.rank_transform:  # use rank transform, we should record the original function values
            self.best_f_val = original_function_values[idx]
        else:
            self.best_f_val = function_values[idx]
        # Update the historical best result
        first_iteration = self.hist_best_param is None or self.hist_best_f_val is None
        if first_iteration or self.best_f_val < self.hist_best_f_val:
            self.hist_best_f_val = self.best_f_val
            self.hist_best_param = self.best_param
            
        # Compute gradient from original paper
        # Enforce fitness as Gaussian distributed, here we use centered ranks
        F = (function_values - function_values.mean(-1))/(function_values.std(-1) + 1e-8)
        # Compute gradient, F:[popsize], eps: [popsize, num_params]
        grad = (1/self.std)*np.mean(np.expand_dims(F, 1)*self.eps, axis=0)
        grad = torch.from_numpy(grad).float()
        self.mu.grad = grad
        self.lr_scheduler.step()
        self.optimizer.step()
        
        if self.std > self.min_std:
            self.std = self.std_decay*self.std
        
    @property
    def result(self):
        results = {'best_param': self.best_param, 
                   'best_f_val': self.best_f_val, 
                   'hist_best_param': self.hist_best_param, 
                   'hist_best_f_val': self.hist_best_f_val,
                   'stds': self.std}
        
        return results

Policy class inherit from nn.Module

Take out agent & env out of Runner, put agent and env into call argument

[Examples/MDN]: refactor code to standard pipeline with .py files

Refactor MDN code to use lagom pipelines with .py files with Experiment, Engine etc.

Change last_dim to last_feature_dim to be consistent with other networks and policies.

Agent handles optimizer and lr_scheduler inside class

Simplify PG code

experiment.py: Only support train.timestep, remove count(), use while loop

Put `device` as argument to many classes

Requires device to the classes that might use GPU/CPU, by doing this, it is safer and easier to make sure all things are in CUDA or CPU together without having to worry about writing .to(device) in many places.

e.g. Experiment worker

Update GaussianPolicy: support min/max bound of std, use trick from

Use tricks from https://arxiv.org/pdf/1805.12114.pdf

Add CEM

Same API style to CMAES, e.g. {'popsize': 32, 'seed': 1} and also cem.result as a namedtuple

Remove BaseLogger, just Logger

Add getitem to Trajectory

TODO files:

Update LinearSchedule

Support get_current

Add master/worker for PyTorch multiprocessing

TODO: PyTorch variant, maybe unnecessary for RL ?

from torch.multiprocessing import Process
from torch.multiprocessing import Queue
# SimpleQueue sometimes better, it does not use additional threads
from torch.multiprocessing import SimpleQueue

# .readthedocs.yml

version: 2

python:
  version: 3.7
  install:
     - method: pip
       path: .

zuoxingdong / lagom Goto Github PK

lagom's People

Contributors

Stargazers

Watchers

Forkers

lagom's Issues

TODO: PyTorch variant, maybe unnecessary for RL ?

Recommend Projects

Recommend Topics

Recommend Org

Jobs