
mathisfederico / learnrl


A library to use and learn reinforcement learning algorithms

Home Page: https://learnrl.readthedocs.io/en/latest/

License: Other

Python 100.00%

learnrl's People

Contributors

mathisfederico


learnrl's Issues

agents_order in Playground

We need an agents_order attribute and methods to set and shuffle the order of agents.
This will come in handy if we want to change agent roles in an environment during training or testing.
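A minimal sketch of what this could look like, assuming a Playground holding an agents list; the names set_agents_order and shuffle_agents_order are hypothetical:

import random

class Playground:
    def __init__(self, env, agents):
        self.env = env
        self.agents = agents
        # agents_order[i] is the index of the agent playing in position i
        self.agents_order = list(range(len(agents)))

    def set_agents_order(self, new_order):
        # Hypothetical setter: new_order must be a permutation of the agents
        assert sorted(new_order) == list(range(len(self.agents)))
        self.agents_order = list(new_order)

    def shuffle_agents_order(self):
        # Hypothetical shuffle, e.g. to swap agent roles between episodes
        random.shuffle(self.agents_order)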

Multi-Agent Env

An environment template using Gym environments in a more general way, to manage turns between agents.

Agents

Agents must have:

  • A policy
  • A generalised value function (may be split into state (V) and action (Q) values)
  • play(observation) -> action; (how can we have an agent that adapts to multiple, unknown envs?)
  • render() -> None; a method specified for every type of agent
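A minimal sketch of such an interface, using the names from the list above (everything beyond those names is an assumption):

from abc import ABC, abstractmethod

class Agent(ABC):
    """An agent owns a policy and a generalised value function."""

    @abstractmethod
    def play(self, observation):
        """Return an action for the given observation."""

    def value(self, observation, action=None):
        # Generalised value function: V(s) when action is None, else Q(s, a)
        raise NotImplementedError

    def render(self):
        """Display agent internals; to be specified for every type of agent."""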

Playground init function predefined type

When passing a list of learnrl.Agent instances to the Playground init function, the assertion isinstance(agent, Agent) becomes false. It is fixed by deleting the predefined type of the "agents" input.

Str Arguments for BasicAgent

We need to add str arguments for evaluation and control (typically 'mc' for Monte Carlo and 'onp-onl-td' for on-policy online TD) instead of importing from the evaluation and control modules.
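A sketch of how those shortcuts could be resolved; the registry and class names below are illustrative stand-ins, not the library's actual API:

class MonteCarlo: ...          # stand-in for the real evaluation class
class OnPolicyOnlineTD: ...    # stand-in

EVALUATIONS = {'mc': MonteCarlo, 'onp-onl-td': OnPolicyOnlineTD}

def get_evaluation(evaluation):
    """Accept either a string shortcut or an evaluation instance."""
    if isinstance(evaluation, str):
        if evaluation not in EVALUATIONS:
            raise ValueError(f"Unknown evaluation '{evaluation}', "
                             f"expected one of {list(EVALUATIONS)}")
        return EVALUATIONS[evaluation]()
    return evaluation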

Multi-Agent Gym Environments

Environments

Include Gym environments composed of:

  • action_space, state_space
  • step(action) -> observation, reward, done, info
  • reset() -> first_observation
  • render() -> None;

Build a standard for multi-agent environments.
Build an "evaluation" method to evaluate multiple agents over a number of games (see the sketch below).

Evaluation

We need a broader framework for evaluations; for now we only have Monte Carlo.

Basic evaluation methods are:

  • TD
  • Online-TD
  • Online-TD on-policy (Sarsa)
  • QLearning
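For reference, the one-step updates behind Sarsa and Q-learning over a tabular Q (a sketch; Q is a (n_states, n_actions) array, lr and gamma the usual learning rate and discount):

import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, done, lr=0.1, gamma=0.99):
    # On-policy online TD: bootstrap on the action actually taken next
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += lr * (target - Q[s, a])

def qlearning_update(Q, s, a, r, s_next, done, lr=0.1, gamma=0.99):
    # Off-policy TD: bootstrap on the best next action
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += lr * (target - Q[s, a])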

Tests for dt_step and dt_episodes

We need tests for dt_step and dt_episodes:

  • Use dt_step only on steps
  • Use dt_episodes for episodes and episodes_cycles
  • Assert that dt_episode averages over episodes_cycles (see the sketch below)
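A sketch of the averaging assertion from the last bullet, written against a hypothetical average_over_cycle helper (the real logger hook may differ):

import pytest

def average_over_cycle(dt_episodes):
    # Hypothetical helper mirroring what the logger should do:
    # average per-episode durations over one episodes_cycle
    return sum(dt_episodes) / len(dt_episodes)

def test_dt_episode_averages_over_episodes_cycle():
    dt_episodes = [0.10, 0.20, 0.30, 0.40, 0.50]  # one cycle of 5 episodes
    assert average_over_cycle(dt_episodes) == pytest.approx(0.30)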

Tests for BasicAgent

We need unit tests for the core functions of BasicAgent:

  • act
  • hash (MultiDiscrete)
  • invert_hash (MultiDiscrete)
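For hash/invert_hash, a round-trip property is the natural test; a self-contained sketch with stand-in encode/decode functions (BasicAgent's actual implementation may differ):

import numpy as np

def hash_multidiscrete(observation, nvec):
    # Stand-in for BasicAgent.hash: mixed-radix encoding of a MultiDiscrete obs
    code = 0
    for value, n in zip(observation, nvec):
        code = code * n + int(value)
    return code

def invert_hash_multidiscrete(code, nvec):
    # Stand-in for BasicAgent.invert_hash: decode back to the observation
    values = []
    for n in reversed(nvec):
        values.append(code % n)
        code //= n
    return np.array(values[::-1])

def test_hash_roundtrip():
    nvec = [3, 2, 4]  # sizes of a MultiDiscrete space
    observation = np.array([2, 0, 1])
    code = hash_multidiscrete(observation, nvec)
    assert np.array_equal(invert_hash_multidiscrete(code, nvec), observation)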

Tests for agents

We need several tests:

  • MemoryTest
  • BasicAgentTest
    • ControlTest
    • EvaluationTest

QLearningAgent

We need to implement the QLearningAgent directly, for user-friendliness!

Adding time in Logger

It would be nice to add an ETA to the Logger.
It has to handle the different verbose levels and use the metrics at hand to give the best and most useful time estimate.
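The estimate itself is simple arithmetic; a sketch, assuming the Logger already tracks a start time and episode counts:

import time

def eta(start_time, episodes_done, total_episodes):
    """Estimate remaining seconds from the average episode duration so far."""
    if episodes_done == 0:
        return float('inf')  # no data yet
    elapsed = time.time() - start_time
    return elapsed * (total_episodes - episodes_done) / episodes_done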

Saving system for Memory

We need to add methods to save and load a memory from a path:
save(path)
load(path)

load should check the memory format (at least the base MEMORY_KEYS).
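A sketch of what this could look like if the memory stores numpy arrays keyed by MEMORY_KEYS in a datas dict (both the attribute name and the npz format are assumptions):

import numpy as np

MEMORY_KEYS = ('observation', 'action', 'reward', 'done', 'next_observation')

def save(memory, path):
    np.savez(path, **{key: memory.datas[key] for key in MEMORY_KEYS})

def load(memory, path):
    datas = np.load(path, allow_pickle=True)
    # Check the memory format: at least the base MEMORY_KEYS must be present
    missing = [key for key in MEMORY_KEYS if key not in datas.files]
    if missing:
        raise ValueError(f"Invalid memory file, missing keys: {missing}")
    memory.datas = {key: datas[key] for key in datas.files}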

Standard-DeepRL agents

In need of issue #1: Agents

Standard-DeepRL agents

Deep Q-learning

  • Needs DataStoring for experience replay (see the replay buffer sketch below)

Actor-Critic

  • Needs TensorFlow/PyTorch for custom gradient descent
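The data-storing piece for Deep Q-learning is essentially a replay buffer; a minimal sketch:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience store with uniform sampling (a sketch)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall out

    def remember(self, observation, action, reward, done, next_observation):
        self.buffer.append((observation, action, reward, done, next_observation))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))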

Display cannot support Random Agent

Issue:

I am using an agent that takes random actions. The environment is CartPole.
When I use pg.fit() with episodes >= 40, the display gives episode rewards with decimals, which should be impossible.

Code used:

import learnrl as rl
import gym

env = gym.make("CartPole-v0")

class RandomAgent(rl.Agent):
    def __init__(self, env):
        super().__init__()
        self.env = env  # store the env instead of relying on the global

    def act(self, observation, greedy=None):
        # Take a uniformly random action, ignoring the observation
        return self.env.action_space.sample()

    def learn(self):
        metrics = {}  # nothing to learn
        return metrics

    def remember(self,
            observation,
            action,
            reward,
            done,
            next_observation=None,
            info=None,
            **param):
        pass  # a random agent keeps no experience

agent = RandomAgent(env)
pg = rl.Playground(env, [agent])
metrics = [('reward~env-rwd', {'steps': 'sum', 'episode': 'sum'}), 'dt_step~']
pg.fit(episodes=100, verbose=1, metrics=metrics)

What I get:

                |  Env-rwd  |  Step time   |
Episode   5/100 | 22        |  26.8us/step | 
Episode  10/100 | 28.6      |  23.9us/step | 
Episode  15/100 | 16        |  24.0us/step | 
Episode  20/100 | 19        |  25.0us/step | 
Episode  25/100 | 22.8      |  23.6us/step | 
Episode  30/100 | 32.8      |  24.2us/step | 
Episode  35/100 | 33.2      |  23.6us/step | 
Episode  40/100 | 25.2      |  25.3us/step | 
Episode  45/100 | 17.4      |  23.6us/step | 
Episode  50/100 | 20.8      |  27.2us/step | 
Episode  55/100 | 16.8      |  25.7us/step | 
Episode  60/100 | 17.8      |  23.3us/step | 
Episode  65/100 | 33.8      |  28.6us/step | 
Episode  70/100 | 21.4      |  25.7us/step | 
Episode  75/100 | 23.8      |  24.0us/step | 
Episode  80/100 | 23.8      |  24.9us/step | 
Episode  85/100 | 25        |  23.3us/step | 
Episode  90/100 | 13.6      |  26.1us/step | 
Episode  95/100 | 23        |  22.8us/step | 
Episode 100/100 | 16.4      |  25.2us/step |

This bug may depend on the device since CartPole is known to be an extremely fast environment.

Standard RL agents

In need of issue #1: Agents

Standard RL agents (for discrete envs)

Evaluation

  • Monte-Carlo
  • TD(λ)

Control

  • greedy
  • ε-greedy
  • UCB
  • Puck
  • Puck/UCB

Standard combinations

  • SARSA
  • Q-learning
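A sketch of the most common controls from the list above, over a tabular Q stored as a (n_states, n_actions) numpy array (the UCB constant c is the usual exploration weight):

import numpy as np

def greedy(Q, s):
    return int(np.argmax(Q[s]))

def epsilon_greedy(Q, s, epsilon=0.1):
    # With probability epsilon explore uniformly, otherwise act greedily
    if np.random.rand() < epsilon:
        return int(np.random.randint(Q.shape[1]))
    return greedy(Q, s)

def ucb(Q, s, counts, t, c=2.0):
    # Upper Confidence Bound: optimism proportional to action uncertainty;
    # counts[s] holds how often each action was taken in state s
    bonus = c * np.sqrt(np.log(t + 1) / (counts[s] + 1e-8))
    return int(np.argmax(Q[s] + bonus))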

MCTS-based Agents

In need of issue #1: Agents

MCTS-based Agents

Needs sampling or a true/learned model.

MCTS

AlphaZero

  • Needs the model!

MuZero
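The common core of all three is the UCT selection rule; a minimal sketch of a tree node (the sampling / true / learned model part is left abstract):

import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

    def select_child(self, c=1.41):
        # UCT: exploit high-value children, explore rarely visited ones
        def uct(child):
            explore = c * math.sqrt(math.log(self.visits + 1) / (1 + child.visits))
            return child.value() + explore
        return max(self.children.values(), key=uct)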

Display error

metrics = [ "exploration~exp.last", ("reward~rwd", {"episode": "sum"}), "dt_step~", "loss" ]

Display: (screenshot omitted)

Tensorflow memory

We need a TFMemory object that does the same as Memory but is optimized for TensorFlow (see the sketch below).
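A sketch of the idea, keeping the datas as stacked tf.Tensors so sampling feeds straight into a training step (the attribute layout mirrors the assumed Memory, not a confirmed API):

import tensorflow as tf

class TFMemory:
    """Same role as Memory, but experience is stored as tf.Tensors."""

    def __init__(self):
        self.datas = {}  # key -> tf.Tensor stacked along axis 0

    def remember(self, **experience):
        for key, value in experience.items():
            value = tf.expand_dims(tf.convert_to_tensor(value), axis=0)
            if key not in self.datas:
                self.datas[key] = value
            else:
                self.datas[key] = tf.concat([self.datas[key], value], axis=0)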

Logger with multi-agent

The logger might not work well in a multi-agent setup.
We need a test function to ensure everything is right.


Add episode to wandb

When using wandb, it shows steps on the x-axis rather than episodes.

Hence, longer runs have more steps, which makes the comparison between runs difficult.

(screenshot omitted)
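One way to get episodes on the x-axis is wandb's define_metric (a sketch; the project name and the loop are placeholders):

import wandb

wandb.init(project="learnrl")  # hypothetical project name
# Log an explicit episode counter and plot every metric against it
wandb.define_metric("episode")
wandb.define_metric("*", step_metric="episode")

for episode in range(100):
    reward = float(episode)  # placeholder for the real episode reward
    wandb.log({"episode": episode, "reward": reward})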
