
mathisfederico / learnrl


A library to use and learn reinforcement learning algorithms

Home Page: https://learnrl.readthedocs.io/en/latest/

License: Other

Python 100.00%

learnrl's People

Contributors

mathisfederico


learnrl's Issues

agents_order in Playground

We need an agents_order attribute and methods to set and shuffle the order of agents.
This will come in handy if we want to change agent roles in an environment during training or testing.
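A minimal sketch of what this could look like, assuming a Playground holding an agents list; the names set_agents_order and shuffle_agents_order are hypothetical:

import random

class Playground:
    def __init__(self, env, agents):
        self.env = env
        self.agents = agents
        # agents_order[i] is the index of the agent playing in position i
        self.agents_order = list(range(len(agents)))

    def set_agents_order(self, new_order):
        # Hypothetical setter: new_order must be a permutation of the agents
        assert sorted(new_order) == list(range(len(self.agents)))
        self.agents_order = list(new_order)

    def shuffle_agents_order(self):
        # Hypothetical shuffle, e.g. to swap agent roles between episodes
        random.shuffle(self.agents_order)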

Multi-Agent Env

An environment template using Gym environments in a more general way, to manage turns between agents.

Agents

Agents must have:

  • A policy
  • A generalised value function (may be split into state (V) and action (Q) values)
  • play(observation) -> action; (how can we have an agent that adapts to multiple, unknown envs?)
  • render() -> None; a method specified for every type of agent
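A minimal sketch of such an interface, using the names from the list above (everything beyond those names is an assumption):

from abc import ABC, abstractmethod

class Agent(ABC):
    """An agent owns a policy and a generalised value function."""

    @abstractmethod
    def play(self, observation):
        """Return an action for the given observation."""

    def value(self, observation, action=None):
        # Generalised value function: V(s) when action is None, else Q(s, a)
        raise NotImplementedError

    def render(self):
        """Display agent internals; to be specified for every type of agent."""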

Playground init function predefined type

When passing a list of learnrl.Agent instances to the Playground init function, the assertion isinstance(agent, Agent) becomes false. It is fixed by deleting the predefined type of the "agents" input.

Str Arguments for BasicAgent

We need to add str arguments for evaluation and control (typically 'mc' for Monte Carlo and 'onp-onl-td' for on-policy online TD) instead of importing from the evaluation and control modules.
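A sketch of how those shortcuts could be resolved; the registry and class names below are illustrative stand-ins, not the library's actual API:

class MonteCarlo: ...          # stand-in for the real evaluation class
class OnPolicyOnlineTD: ...    # stand-in

EVALUATIONS = {'mc': MonteCarlo, 'onp-onl-td': OnPolicyOnlineTD}

def get_evaluation(evaluation):
    """Accept either a string shortcut or an evaluation instance."""
    if isinstance(evaluation, str):
        if evaluation not in EVALUATIONS:
            raise ValueError(f"Unknown evaluation '{evaluation}', "
                             f"expected one of {list(EVALUATIONS)}")
        return EVALUATIONS[evaluation]()
    return evaluation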

Multi-Agent Gym Environments

Environments

Include Gym environments composed of:

  • action_space, state_space
  • step(action) -> observation, reward, done, info
  • reset() -> first_observation
  • render() -> None;

Build a standard for multi-agent environments.
Build an "evaluation" method to evaluate multiple agents over a number of games (see the sketch below).

Evaluation

We need a broader framework for evaluations; for now we only have Monte Carlo.

Basic evaluation methods are:

  • TD
  • Online-TD
  • Online-TD on-policy (Sarsa)
  • QLearning
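For reference, the one-step updates behind Sarsa and Q-learning over a tabular Q (a sketch; Q is a (n_states, n_actions) array, lr and gamma the usual learning rate and discount):

import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, done, lr=0.1, gamma=0.99):
    # On-policy online TD: bootstrap on the action actually taken next
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += lr * (target - Q[s, a])

def qlearning_update(Q, s, a, r, s_next, done, lr=0.1, gamma=0.99):
    # Off-policy TD: bootstrap on the best next action
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += lr * (target - Q[s, a])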

Tests for dt_step and dt_episodes

We need tests for dt_step and dt_episodes:

  • Use dt_step only on steps
  • Use dt_episodes for episodes and episodes_cycles
  • Assert that dt_episode averages over episodes_cycles (see the sketch below)
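A sketch of the averaging assertion from the last bullet, written against a hypothetical average_over_cycle helper (the real logger hook may differ):

import pytest

def average_over_cycle(dt_episodes):
    # Hypothetical helper mirroring what the logger should do:
    # average per-episode durations over one episodes_cycle
    return sum(dt_episodes) / len(dt_episodes)

def test_dt_episode_averages_over_episodes_cycle():
    dt_episodes = [0.10, 0.20, 0.30, 0.40, 0.50]  # one cycle of 5 episodes
    assert average_over_cycle(dt_episodes) == pytest.approx(0.30)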

Tests for BasicAgent

We need unit tests for the core functions of BasicAgent:

  • act
  • hash (MultiDiscrete)
  • invert_hash (MultiDiscrete)
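For hash/invert_hash, a round-trip property is the natural test; a self-contained sketch with stand-in encode/decode functions (BasicAgent's actual implementation may differ):

import numpy as np

def hash_multidiscrete(observation, nvec):
    # Stand-in for BasicAgent.hash: mixed-radix encoding of a MultiDiscrete obs
    code = 0
    for value, n in zip(observation, nvec):
        code = code * n + int(value)
    return code

def invert_hash_multidiscrete(code, nvec):
    # Stand-in for BasicAgent.invert_hash: decode back to the observation
    values = []
    for n in reversed(nvec):
        values.append(code % n)
        code //= n
    return np.array(values[::-1])

def test_hash_roundtrip():
    nvec = [3, 2, 4]  # sizes of a MultiDiscrete space
    observation = np.array([2, 0, 1])
    code = hash_multidiscrete(observation, nvec)
    assert np.array_equal(invert_hash_multidiscrete(code, nvec), observation)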

Tests for agents

We need several tests:

  • MemoryTest
  • BasicAgentTest
    • ControlTest
    • EvaluationTest

QLearningAgent

We need to implement the QLearningAgent directly, for user-friendliness!

Adding time in Logger

It would be nice to add an ETA to the Logger.
It has to handle the different verbose levels and use the metrics at hand to give the best and most useful time estimate.
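The estimate itself is simple arithmetic; a sketch, assuming the Logger already tracks a start time and episode counts:

import time

def eta(start_time, episodes_done, total_episodes):
    """Estimate remaining seconds from the average episode duration so far."""
    if episodes_done == 0:
        return float('inf')  # no data yet
    elapsed = time.time() - start_time
    return elapsed * (total_episodes - episodes_done) / episodes_done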

Saving system for Memory

We need to add methods to save and load a memory from a path:
save(path)
load(path)

load should check the memory format (at least the base MEMORY_KEYS).
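A sketch of what this could look like if the memory stores numpy arrays keyed by MEMORY_KEYS in a datas dict (both the attribute name and the npz format are assumptions):

import numpy as np

MEMORY_KEYS = ('observation', 'action', 'reward', 'done', 'next_observation')

def save(memory, path):
    np.savez(path, **{key: memory.datas[key] for key in MEMORY_KEYS})

def load(memory, path):
    datas = np.load(path, allow_pickle=True)
    # Check the memory format: at least the base MEMORY_KEYS must be present
    missing = [key for key in MEMORY_KEYS if key not in datas.files]
    if missing:
        raise ValueError(f"Invalid memory file, missing keys: {missing}")
    memory.datas = {key: datas[key] for key in datas.files}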

Standard-DeepRL agents

In need of issue #1: Agents

Standard-DeepRL agents

Deep Q-learning

  • Needs DataStoring for experience replay (see the replay buffer sketch below)

Actor-Critic

  • Needs TensorFlow/PyTorch for custom gradient descent
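The data-storing piece for Deep Q-learning is essentially a replay buffer; a minimal sketch:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience store with uniform sampling (a sketch)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall out

    def remember(self, observation, action, reward, done, next_observation):
        self.buffer.append((observation, action, reward, done, next_observation))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))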

Display cannot support Random Agent

Issue:

I am using an agent that takes random actions. The environment is CartPole.
When I use pg.fit() with episodes >= 40, the display gives episode rewards with decimals, which should be impossible.

Code used:

import learnrl as rl
import gym

env = gym.make("CartPole-v0")

class RandomAgent(rl.Agent):
    def __init__(self, env):
        super().__init__()
        self.env = env  # store the env instead of relying on the global

    def act(self, observation, greedy=None):
        # Take a uniformly random action, ignoring the observation
        return self.env.action_space.sample()

    def learn(self):
        metrics = {}  # nothing to learn
        return metrics

    def remember(self,
            observation,
            action,
            reward,
            done,
            next_observation=None,
            info=None,
            **param):
        pass  # a random agent keeps no experience

agent = RandomAgent(env)
pg = rl.Playground(env, [agent])
metrics = [('reward~env-rwd', {'steps': 'sum', 'episode': 'sum'}), 'dt_step~']
pg.fit(episodes=100, verbose=1, metrics=metrics)

What I get:

                |  Env-rwd  |  Step time   |
Episode   5/100 | 22        |  26.8us/step | 
Episode  10/100 | 28.6      |  23.9us/step | 
Episode  15/100 | 16        |  24.0us/step | 
Episode  20/100 | 19        |  25.0us/step | 
Episode  25/100 | 22.8      |  23.6us/step | 
Episode  30/100 | 32.8      |  24.2us/step | 
Episode  35/100 | 33.2      |  23.6us/step | 
Episode  40/100 | 25.2      |  25.3us/step | 
Episode  45/100 | 17.4      |  23.6us/step | 
Episode  50/100 | 20.8      |  27.2us/step | 
Episode  55/100 | 16.8      |  25.7us/step | 
Episode  60/100 | 17.8      |  23.3us/step | 
Episode  65/100 | 33.8      |  28.6us/step | 
Episode  70/100 | 21.4      |  25.7us/step | 
Episode  75/100 | 23.8      |  24.0us/step | 
Episode  80/100 | 23.8      |  24.9us/step | 
Episode  85/100 | 25        |  23.3us/step | 
Episode  90/100 | 13.6      |  26.1us/step | 
Episode  95/100 | 23        |  22.8us/step | 
Episode 100/100 | 16.4      |  25.2us/step |

This bug may depend on the device since CartPole is known to be an extremely fast environment.

Standard RL agents

In need of issue #1: Agents

Standard RL agents (for discrete envs)

Evaluation

  • Monte-Carlo
  • TD(λ)

Control

  • greedy
  • ε-greedy
  • UCB
  • Puck
  • Puck/UCB

Standard combinations

  • SARSA
  • Q-learning
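A sketch of the most common controls from the list above, over a tabular Q stored as a (n_states, n_actions) numpy array (the UCB constant c is the usual exploration weight):

import numpy as np

def greedy(Q, s):
    return int(np.argmax(Q[s]))

def epsilon_greedy(Q, s, epsilon=0.1):
    # With probability epsilon explore uniformly, otherwise act greedily
    if np.random.rand() < epsilon:
        return int(np.random.randint(Q.shape[1]))
    return greedy(Q, s)

def ucb(Q, s, counts, t, c=2.0):
    # Upper Confidence Bound: optimism proportional to action uncertainty;
    # counts[s] holds how often each action was taken in state s
    bonus = c * np.sqrt(np.log(t + 1) / (counts[s] + 1e-8))
    return int(np.argmax(Q[s] + bonus))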

MCTS-based Agents

In need of issue #1: Agents

MCTS-based Agents

Needs sampling or a true/learned model.

MCTS

AlphaZero

  • Needs the model!

MuZero
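The common core of all three is the UCT selection rule; a minimal sketch of a tree node (the sampling / true / learned model part is left abstract):

import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

    def select_child(self, c=1.41):
        # UCT: exploit high-value children, explore rarely visited ones
        def uct(child):
            explore = c * math.sqrt(math.log(self.visits + 1) / (1 + child.visits))
            return child.value() + explore
        return max(self.children.values(), key=uct)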

Display error

metrics = [ "exploration~exp.last", ("reward~rwd", {"episode": "sum"}), "dt_step~", "loss" ]

Display: (screenshot omitted)

Tensorflow memory

We need a TFMemory object that does the same as Memory but is optimized for TensorFlow (see the sketch below).
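A sketch of the idea, keeping the datas as stacked tf.Tensors so sampling feeds straight into a training step (the attribute layout mirrors the assumed Memory, not a confirmed API):

import tensorflow as tf

class TFMemory:
    """Same role as Memory, but experience is stored as tf.Tensors."""

    def __init__(self):
        self.datas = {}  # key -> tf.Tensor stacked along axis 0

    def remember(self, **experience):
        for key, value in experience.items():
            value = tf.expand_dims(tf.convert_to_tensor(value), axis=0)
            if key not in self.datas:
                self.datas[key] = value
            else:
                self.datas[key] = tf.concat([self.datas[key], value], axis=0)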

Logger with multi-agent

The logger might not work well in a multi-agent setup.
We need a test function to ensure everything is right.


Add episode to wandb

When using wandb, it shows steps on the x-axis rather than episodes.

Hence, longer runs have more steps, which makes the comparison between runs difficult.

(screenshot omitted)
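One way to get episodes on the x-axis is wandb's define_metric (a sketch; the project name and the loop are placeholders):

import wandb

wandb.init(project="learnrl")  # hypothetical project name
# Log an explicit episode counter and plot every metric against it
wandb.define_metric("episode")
wandb.define_metric("*", step_metric="episode")

for episode in range(100):
    reward = float(episode)  # placeholder for the real episode reward
    wandb.log({"episode": episode, "reward": reward})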
