
david-abel / simple_rl

A simple framework for experimenting with Reinforcement Learning in Python.

License: Apache License 2.0

Python 68.81% Jupyter Notebook 31.19%
Topics: reinforcement-learning, planning-algorithms, mdp, agent, reinforcement-learning-algorithms, python

simple_rl's Introduction

simple_rl

A simple framework for experimenting with Reinforcement Learning in Python.

There are loads of other great libraries out there for RL. The aim of this one is twofold:

  1. Simplicity.
  2. Reproducibility of results.

A brief tutorial for a slightly earlier version is available here. As of version 0.77, the library should work with both Python 2 and Python 3. Please let me know if you find that is not the case!

simple_rl requires numpy and matplotlib. Some MDPs have visuals, too, which require pygame. The library also includes support for hooking into any of the OpenAI Gym environments. A basic test script is included in the tests directory; I suggest running it and making sure all tests pass when you install the library.
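For instance, here is a minimal sketch of running an agent on a Gym environment. This is not taken from the README: the GymMDP constructor arguments (env_name, render) are assumptions based on recent versions of the library, and the gym package must be installed.

# Minimal sketch (not from the README): wrap an OpenAI Gym environment as an MDP.
# The GymMDP arguments (env_name, render) are assumptions; check your installed version.
# Tabular Q-learning over raw Gym observations is only meant as an illustration here.
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GymMDP
from simple_rl.agents import QLearningAgent

gym_mdp = GymMDP(env_name="CartPole-v0", render=False)
agent = QLearningAgent(actions=gym_mdp.get_actions())
run_agents_on_mdp([agent], gym_mdp, instances=3, episodes=100, steps=200)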

Documentation available here

Installation

The easiest way to install is with pip. Just run:

pip install simple_rl

Alternatively, you can download simple_rl here.
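If you would rather work from source (for example, to pick up fixes that have not been released to pip yet), the usual editable-install workflow should also do the trick:

git clone https://github.com/david-abel/simple_rl.git
cd simple_rl
pip install -e .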

Citation

If you use simple_rl in your research, please cite the workshop paper as follows:

@inproceedings{abel2019simple_rl,
  title={simple_rl: Reproducible Reinforcement Learning in Python},
  author={David Abel},
  booktitle={ICLR Workshop on Reproducibility in Machine Learning},
  year={2019}
}

New Feature: Easy Reproduction of Results

I just added a new feature I'm quite excited about: easy reproduction of results. Every experiment run now outputs a file "full_experiment.txt" in the results/exp_name/ directory. The new function reproduce_from_exp_file(file_name), when pointed at an experiment directory, will reassemble and rerun an entire experiment based on this file. The goal is to encourage simple tracking of experiments and enable quick reproduction of results. For now it only works with MDPs -- not yet with OOMDPs, POMDPs, or MarkovGames (I'd be delighted if someone wants to make it work, though!).

See the second example below for a quick sense of how to use this feature.

Example

Some examples showcasing basic functionality are included in the examples directory.

To run a simple experiment, import the run_agents_on_mdp(agent_list, mdp) function from simple_rl.run_experiments and call it with some agents for a given MDP. For example:

# Imports
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent

# Run Experiment
mdp = GridWorldMDP()
agent = QLearningAgent(mdp.get_actions())
run_agents_on_mdp([agent], mdp)

Running the above code will run Q-learning on a simple GridWorld. When it finishes, it stores the results in cur_dir/results/* and generates and opens the following plot:

For a slightly more complicated example, take a look at the code of simple_example.py. Here we run two agents on the grid world from the Russell-Norvig AI textbook:

from simple_rl.agents import QLearningAgent, RandomAgent, RMaxAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp

# Setup MDP.
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)], lava_locs=[(4, 2)], gamma=0.95, walls=[(2, 2)], slip_prob=0.05)

# Setup Agents.
ql_agent = QLearningAgent(actions=mdp.get_actions())
rmax_agent = RMaxAgent(actions=mdp.get_actions())
rand_agent = RandomAgent(actions=mdp.get_actions())

# Run experiment and make plot.
run_agents_on_mdp([ql_agent, rmax_agent, rand_agent], mdp, instances=5, episodes=50, steps=10)

The above code will generate the following plot:

To showcase the new reproducibility feature, suppose we now wanted to reproduce the above experiment. We just do the following:

from simple_rl.run_experiments import reproduce_from_exp_file

reproduce_from_exp_file("gridworld_h-3_w-4")

This will rerun the entire experiment based on a file created and populated behind the scenes, and should produce the following plot:

Easy! This is a new feature, so there may be bugs -- just let me know as things come up. It's only supposed to work for MDPs, not POMDPs/OOMDPs/MarkovGameMDPs (so far). Take a look at reproduce_example.py for a bit more detail.

Overview

  • (agents): Code for some basic agents (a random actor, Q-learning, R-Max, Q-learning with a Linear Approximator, and so on).

  • (experiments): Code for an Experiment class to track parameters and reproduce results.

  • (mdp): Code for a basic MDP and MDPState class, and an MDPDistribution class (for lifelong learning). Also contains OO-MDP implementation [Diuk et al. 2008].

  • (planning): Implementations of planning algorithms, including ValueIteration and MCTS [Coulom 2006], the latter still in development. (A short planning sketch follows this list.)

  • (tasks): Implementations for a few standard MDPs (grid world, N-chain, Taxi [Dietterich 2000], and the OpenAI Gym).

  • (utils): Code for charting and other utilities.
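As a quick illustration of the planning module, here is a rough sketch of running value iteration on a grid world. The run_vi(), plan(), and get_init_state() calls are assumptions based on recent versions of the library, so adjust if your installed version differs.

# Sketch: plan in a small grid world with value iteration.
from simple_rl.tasks import GridWorldMDP
from simple_rl.planning import ValueIteration

mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
vi = ValueIteration(mdp)
vi.run_vi()  # compute the value function

# Recover a plan (action sequence and visited states) from the initial state.
action_seq, state_seq = vi.plan(mdp.get_init_state())
print(action_seq)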

Contributing

If you'd like to contribute: that's great! Take a look at some of the needed improvements below: I'd love for folks to work on those items. Please see the contribution guidelines. Email me with any questions.

Making a New MDP

Make an MDP subclass (a rough sketch follows this list), which needs:

  • A static variable, ACTIONS, which is a list of strings denoting each action.

  • Implement a reward function and a transition function and pass them to the MDP constructor (along with ACTIONS).

  • I also suggest overwriting the "__str__" method of the class, and adding a "__init__.py" file to the directory.

  • Create a State subclass for your MDP (if necessary). I suggest overwriting "__hash__", "__eq__", and "__str__" so the state class plays well with the agents.
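Here is a rough sketch of those steps for a toy two-state MDP. The class names are made up for illustration, and the MDP/State constructor signatures (including whether the reward function receives a next_state argument) are assumptions based on recent versions of the library, so adjust them to match the version you have installed.

# Illustrative sketch only: a toy two-state MDP following the checklist above.
# TwoStateMDP/TwoStateState are hypothetical names; constructor signatures are assumed.
from simple_rl.mdp.MDPClass import MDP
from simple_rl.mdp.StateClass import State

class TwoStateState(State):
    def __init__(self, name):
        State.__init__(self, data=[name])
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, TwoStateState) and self.name == other.name

    def __str__(self):
        return "state-" + self.name

class TwoStateMDP(MDP):
    ACTIONS = ["stay", "go"]

    def __init__(self, gamma=0.95):
        MDP.__init__(self, TwoStateMDP.ACTIONS, self._transition_func,
                     self._reward_func, init_state=TwoStateState("left"), gamma=gamma)

    def _reward_func(self, state, action, next_state=None):
        # next_state is optional so this works whether or not your version passes it.
        return 1.0 if (state.name == "left" and action == "go") else 0.0

    def _transition_func(self, state, action):
        return TwoStateState("right") if action == "go" else state

    def __str__(self):
        return "two_state_mdp"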

Making a New Agent

Make an Agent subclass (see the sketch after this list), which requires:

  • A method, act(self, state, reward), that returns an action.

  • A method, reset(), that puts the agent back to its tabula rasa state.
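And a bare-bones sketch of an agent satisfying those two requirements. The class itself is purely illustrative, and the Agent base-class constructor is assumed to take name and actions as in recent versions of the library.

# Illustrative sketch: an agent that mostly repeats its previous action,
# occasionally picking a new one at random. It does not learn anything.
import random
from simple_rl.agents import Agent

class StickyRandomAgent(Agent):
    def __init__(self, actions, switch_prob=0.2):
        Agent.__init__(self, name="sticky-random", actions=actions)
        self.switch_prob = switch_prob
        self.prev_action = None

    def act(self, state, reward):
        # Return an action; state and reward are ignored since this agent does not learn.
        if self.prev_action is None or random.random() < self.switch_prob:
            self.prev_action = random.choice(self.actions)
        return self.prev_action

    def reset(self):
        # Back to the agent's tabula rasa state.
        self.prev_action = None
        Agent.reset(self)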

In Development

I'm hoping to add the following features:

  • Planning: Finish MCTS [Coulom 2006], implement RTDP [Barto et al. 1995]
  • Deep RL: Write a DQN [Mnih et al. 2015] in PyTorch, possibly others (some kind of policy gradient).
  • Efficiency: Convert most defaultdict/dict uses to numpy.
  • Reproducibility: The new reproduce feature is limited in scope -- I'd love for someone to extend it to work with OO-MDPs, Planning, MarkovGames, POMDPs, and beyond.
  • Docs: Tutorial and documentation.
  • Visuals: Unify MDP visualization.
  • Misc: Additional testing.

Cheers,

-Dave

simple_rl's People

Contributors

aaell, borea17, camall3n, david-abel, dwhit, jinnaiyuu, jsalvatier, korlamarch, lucasgelfond, maldil, melroderick, nishanthjkumar, paultouma, seand88, thisiscam, umbanhowar, zhouzypaul


simple_rl's Issues

Forcing the use of the TkAgg Matplotlib backend prevents usage in notebooks

Hi there. In a few places you force the use of TkAgg. In the past this was necessary to work on certain systems, but I believe this is no longer the case with newer versions of Matplotlib.

However, forcing TkAgg breaks the code on, for example, Google Colab.

I see it was introduced in this commit: e089dd9

I think the easiest solution is to revert that commit and test again on the original system that was having issues (probably OSX).

Here is the simplest reproduction: https://colab.research.google.com/gist/philwinder/148a4a54d0a7fdc1a10677f30da2da57/untitled6.ipynb

This is enough to trigger the error:

!pip install simple_rl
from simple_rl.tasks import GridWorldMDP

Thanks!

Visualizing GridWorld example

Hi,
Great looking project.

I'd like to get visualization (in pygame) of an agent moving through the gridworld example working, if possible. It seems like there's infrastructure in place for it: the visualize_() methods under simple_rl/tasks/grid_world/GridWorldMDPClass.py. Looks like it might be deprecated though? And it's not clear to me where the pygame viz is supposed to happen within the call stack.

For context, I'm running the example on this page https://david-abel.github.io/blog/posts/simple_rl.html, successfully.

Thanks!

Issues Running Example, open is not recognized as an internal or external command

import simple_rl
from simple_rl.agents import QLearningAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp

mdp = GridWorldMDP(10, 10, goal_locs=[(10,10)])
ql_agent = QLearningAgent(mdp.get_actions())
run_agents_on_mdp([ql_agent], mdp, steps=100)

I ran this code on Windows and received: 'open' is not recognized as an internal or external command,
operable program or batch file.

Python version 3.7

Invalid policy visualization due to use of e-greedy action

For the visualize_policy function you have to pass a policy for it to plot. But most of the time we just pass the policy used for training (see the viz_example for an example).

This means that the policy will be affected by the exploration function (e.g. e-greedy), and therefore roughly one time in ten you get an incorrect action in the plotted policy.

I'm not sure whether it is a good idea to add a greedy-option flag to prevent the use of any e-greedy action, but obviously that would have to be changed for all agents.
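One possible workaround, sketched below: wrap the trained agent in an explicitly greedy policy before plotting, instead of passing the training policy itself. This assumes ql_agent is the trained QLearningAgent and that it exposes get_max_q_action(state); if your version names that method differently, substitute accordingly.

# Sketch of a workaround: plot a greedy policy derived from the learned Q-values,
# rather than the agent's epsilon-greedy training policy.
# Assumes ql_agent is the trained QLearningAgent with a get_max_q_action() method.
def greedy_policy(state):
    return ql_agent.get_max_q_action(state)

mdp.visualize_policy(greedy_policy)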

Gridworld Not Resetting

I believe there's a bug with the reset method in the GridWorldMDPClass. I don't think it's nested under the class where I believe it should be.

AttributeError: type object 'RandomAgent' has no attribute 'q_func'

simple_rl is v0.811


from simple_rl.tasks import GridWorldMDP

from options.option_generation.vi_distance import get_distance
if __name__ == "__main__":
    mdp = GridWorldMDP(width=5, height=5, init_loc=(1, 1), goal_locs=[(5, 5)])
    #mdp.visualize(filename="vi_distance")
    mdp.visualize_interaction()

pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "yyf/vi_distance_test.py", line 8, in <module>
    mdp.visualize_interaction()
  File "/home/yyf/WorkSpace/coveroption/covering-options/simple_rl/tasks/grid_world/GridWorldMDPClass.py", line 278, in visualize_interaction
    mdpv.visualize_interaction(self, _draw_state)
  File "/home/yyf/WorkSpace/coveroption/covering-options/simple_rl/utils/mdp_visualizer.py", line 344, in visualize_interaction
    agent_shape = _vis_init(screen, mdp, draw_state, cur_state, agent)
  File "/home/yyf/WorkSpace/coveroption/covering-options/simple_rl/utils/mdp_visualizer.py", line 387, in _vis_init
    agent_shape = draw_state(screen, mdp, cur_state, agent=agent, show_value=True, draw_statics=True)
  File "/home/yyf/WorkSpace/coveroption/covering-options/simple_rl/tasks/grid_world/grid_visualizer.py", line 44, in _draw_state
    for s in agent.q_func.keys():
AttributeError: type object 'RandomAgent' has no attribute 'q_func'

help

Hello David,

I noticed you are working on Deep RL algorithms for this repo. If you need a helping hand, I can help.

Thanks,
Aadesh

RMAX is slow and not implemented correctly

Hi Dave, I was going through the RMAX code and noticed that things may not be implemented correctly.

Specifically:

  • the initial Q values are initialized to R_max, but should really be R_max * 1/(1-gamma).
  • for each time step, the agent is doing an update step first, and then choosing the action. It should choose the action before doing the update.
  • for each update step, the agent only updates the Q values of (s, a) pairs that it has seen before. It should update Q values of all the (s, a) pairs (that have been seen enough times).

Also, the code ran really slow for me even on small 2D grid worlds (5x5, for example), mostly because of the triple-nested for loop used to do the Q value update.

I have submitted a PR that fixes all the issues mentioned above, and uses numpy to avoid the for loop to speed up the update.

Import error when DISPLAY is not set

fig = matplotlib.pyplot.gcf()

I'm getting an exception while running code that imports a module from simple_rl. The same code works on my local machine, but fails with the following error on an Ubuntu docker container defined at jupyter/base-notebook.

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/__init__.py in <module>()
     58 
     59 # Imports.
---> 60 import simple_rl.abstraction, simple_rl.agents, simple_rl.experiments, simple_rl.mdp, simple_rl.planning, simple_rl.tasks, simple_rl.utils
     61 import simple_rl.run_experiments
     62 

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/abstraction/__init__.py in <module>()
      1 # Classes.
      2 from simple_rl.abstraction.AbstractionWrapperClass import AbstractionWrapper
----> 3 from simple_rl.abstraction.AbstractValueIterationClass import AbstractValueIteration
      4 from simple_rl.abstraction.state_abs.StateAbstractionClass import StateAbstraction
      5 from simple_rl.abstraction.state_abs.ProbStateAbstractionClass import ProbStateAbstraction

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/abstraction/AbstractValueIterationClass.py in <module>()
      4 
      5 # Other imports.
----> 6 from simple_rl.utils import make_mdp
      7 from simple_rl.abstraction.action_abs.ActionAbstractionClass import ActionAbstraction
      8 from simple_rl.abstraction.state_abs.StateAbstractionClass import StateAbstraction

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/utils/make_mdp.py in <module>()
     11 
     12 # Other imports.
---> 13 from simple_rl.tasks import ChainMDP, GridWorldMDP, TaxiOOMDP, RandomMDP, FourRoomMDP, HanoiMDP
     14 from simple_rl.tasks.grid_world.GridWorldMDPClass import make_grid_world_from_file
     15 from simple_rl.mdp import MDPDistribution

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/tasks/__init__.py in <module>()
     21 from simple_rl.tasks.taxi.TaxiOOMDPClass import TaxiOOMDP
     22 from simple_rl.tasks.taxi.TaxiStateClass import TaxiState
---> 23 from simple_rl.tasks.trench.TrenchOOMDPClass import TrenchOOMDP
     24 from simple_rl.tasks.rock_paper_scissors.RockPaperScissorsMDPClass import RockPaperScissorsMDP
     25 try:

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/tasks/trench/TrenchOOMDPClass.py in <module>()
      7 from simple_rl.mdp.oomdp.OOMDPObjectClass import OOMDPObject
      8 from simple_rl.planning import ValueIteration
----> 9 from simple_rl.run_experiments import run_agents_on_mdp
     10 from simple_rl.tasks.trench.TrenchOOMDPState import TrenchOOMDPState
     11 

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/run_experiments.py in <module>()
     27 # Non-standard imports.
     28 from simple_rl.planning import ValueIteration
---> 29 from simple_rl.experiments import Experiment
     30 from simple_rl.mdp import MarkovGameMDP
     31 from simple_rl.utils import chart_utils

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/experiments/__init__.py in <module>()
----> 1 from simple_rl.experiments.ExperimentClass import Experiment

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/experiments/ExperimentClass.py in <module>()
     14 
     15 # Other imports.
---> 16 from simple_rl.utils import chart_utils
     17 from simple_rl.experiments.ExperimentParametersClass import ExperimentParameters
     18 

/opt/conda/lib/python3.6/site-packages/simple_rl-0.8-py3.6.egg/simple_rl/utils/chart_utils.py in <module>()
     43 matplotlib.rcParams['pdf.fonttype'] = 42
     44 # matplotlib.rcParams['text.usetex'] = True
---> 45 fig = matplotlib.pyplot.gcf()
     46 
     47 

/opt/conda/lib/python3.6/site-packages/matplotlib/pyplot.py in gcf()
    584         return figManager.canvas.figure
    585     else:
--> 586         return figure()
    587 
    588 

/opt/conda/lib/python3.6/site-packages/matplotlib/pyplot.py in figure(num, figsize, dpi, facecolor, edgecolor, frameon, FigureClass, clear, **kwargs)
    531                                         frameon=frameon,
    532                                         FigureClass=FigureClass,
--> 533                                         **kwargs)
    534 
    535         if figLabel:

/opt/conda/lib/python3.6/site-packages/matplotlib/backend_bases.py in new_figure_manager(cls, num, *args, **kwargs)
    159         fig_cls = kwargs.pop('FigureClass', Figure)
    160         fig = fig_cls(*args, **kwargs)
--> 161         return cls.new_figure_manager_given_figure(num, fig)
    162 
    163     @classmethod

/opt/conda/lib/python3.6/site-packages/matplotlib/backends/_backend_tk.py in new_figure_manager_given_figure(cls, num, figure)
   1044         """
   1045         _focus = windowing.FocusManager()
-> 1046         window = Tk.Tk(className="matplotlib")
   1047         window.withdraw()
   1048 

/opt/conda/lib/python3.6/tkinter/__init__.py in __init__(self, screenName, baseName, className, useTk, sync, use)
   2018                 baseName = baseName + ext
   2019         interactive = 0
-> 2020         self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
   2021         if useTk:
   2022             self._loadtk()

TclError: no display name and no $DISPLAY environment variable

Disabling the following global initialization works, but I'm not sure that's the right fix, because it was added in the commit "Fixing backend to work with non Mac OS distributions of python."
fig = matplotlib.pyplot.gcf()

Error. How to solve this

[Test 1] success_example.py:
Traceback (most recent call last):
  File "/opt/simple_rl/tests/../examples/success_example.py", line 8, in <module>
    from simple_rl.planning import ValueIteration
  File "/opt/simple_rl/simple_rl/__init__.py", line 60, in <module>
    import simple_rl.abstraction, simple_rl.agents, simple_rl.experiments, simple_rl.mdp, simple_rl.planning, simple_rl.tasks, simple_rl.utils
  File "/opt/simple_rl/simple_rl/abstraction/__init__.py", line 2, in <module>
    from simple_rl.abstraction.AbstractionWrapperClass import AbstractionWrapper
  File "/opt/simple_rl/simple_rl/abstraction/AbstractionWrapperClass.py", line 5, in <module>
    from simple_rl.agents import Agent, RMaxAgent, FixedPolicyAgent
  File "/opt/simple_rl/simple_rl/agents/__init__.py", line 18, in <module>
    from simple_rl.agents.PolicyGradientAgentClass import PolicyGradientAgent
  File "/opt/simple_rl/simple_rl/agents/PolicyGradientAgentClass.py", line 15
    raise NotImplementedError("Policy Gradient has not yet been implemented.)
                                                                             ^
SyntaxError: EOL while scanning string literal
FAIL.

Python3: import errors

Hi there!

I used setup.py to install the current version of the library, as apparently the APIs have changed since the latest release. Trying to run the example from the README, I get the following error:

from simple_rl.run_experiments import run_agents_on_mdp

----> 4 from simple_rl.run_experiments import run_agents_on_mdp
      5 from simple_rl.tasks import GridWorldMDP
      6 from simple_rl.agents import QLearnerAgent
.../simple_rl/__init__.py in <module>()
     28 License: MIT
     29 '''
---> 30 import agents, experiments, mdp, planning, tasks, utils
     31 import run_experiments

ModuleNotFoundError: No module named 'agents'

As a workaround, I extended sys.path with the directory containing "simple_rl". But then I got more errors like
ModuleNotFoundError: No module named 'AgentClass'

I'm no Python 3 expert; is there any quick fix for these issues? Which release do you recommend for conducting more experiments? I'm planning to use MDP worlds and several RL algorithms.

OOMDP representation missing Relations?

From Diuk et al, the OOMDP state should also include relations that are calculated as functions of object attributes. However, this seems to be missing in the current implementation of the OOMDP. Are there any plans of adding that to the representation? Am I missing something? Or should I just go ahead and augment the OOMDP state with my own implementation of relations?

AttributeError: module 'time' has no attribute 'clock'

From Python docs:

The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() or time.process_time() instead, depending on your requirements, to have well-defined behavior.

This throws an error:

line 277, in run_agents_on_mdp
    start = time.clock()
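A minimal sketch of the suggested replacement (time.perf_counter() is the standard drop-in for measuring elapsed time here):

import time

# time.clock() was removed in Python 3.8; use perf_counter() instead.
start = time.perf_counter()
# ... run the experiment ...
elapsed = time.perf_counter() - start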

brtdp_example.py

def _qvalue(self, state, action, values):
    return self.mdp.reward_func(state, action) + sum([self.trans_dict[state][action][next_state] * values[next_state]  for next_state in self.states])

Exception occurred: TypeError
_reward_func() missing 1 required positional argument: 'next_state'

"a dead loop " def run_sample_trial(self, verbose=True): while not state.is_terminal():

brtdp_example.py calls run_sample_trial(self, verbose=True) in BoundedRTDPClass.py, and the loop "while not state.is_terminal():" appears to never terminate:

s: (1,1) Action: left Gap: 99.07540345871926 MaxDiff: 9.908648275251627
s: (1,1) Action: left Gap: 99.07540345871926 MaxDiff: 9.908648275251627
s: (1,1) Action: left Gap: 99.07540345871926 MaxDiff: 9.908648275251627
... (the same line repeats indefinitely)

About drawing plots.

Really helpful and neat project! I should definitely express my appreciation to you for releasing such a project.

I am trying to produce the plot figure with the shallow background: [image omitted]
However, my reward curve looks like this: [image omitted]
Thus the computed conf_interv is zero.

Is there a parameter I can set to make the shallow background appear in my case?

'can't pickle dict_values objects' when running "belief_mdp_example.py" with Python 3

I get the following error traceback when running python belief_mdp_example.py on the master branch. I am using Python 3.5.2.

Traceback (most recent call last):
  File "belief_mdp_example.py", line 14, in <module>
    main()
  File "belief_mdp_example.py", line 8, in main
    belief_mdp = BeliefMDP(pomdp)
  File "/home/kaiyuzh/repo/simple_rl/simple_rl/pomdp/BeliefMDPClass.py", line 20, in __init__
    BeliefState(pomdp.init_belief), pomdp.gamma, pomdp.step_cost)
  File "/home/kaiyuzh/repo/simple_rl/simple_rl/mdp/MDPClass.py", line 14, in __init__
    self.init_state = copy.deepcopy(init_state)
  File "/home/kaiyuzh/pyenv/py3/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/kaiyuzh/pyenv/py3/lib/python3.5/copy.py", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/kaiyuzh/pyenv/py3/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/home/kaiyuzh/pyenv/py3/lib/python3.5/copy.py", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/kaiyuzh/pyenv/py3/lib/python3.5/copy.py", line 174, in deepcopy
    rv = reductor(4)
TypeError: can't pickle dict_values objects

It looks like an issue with compatibility with python 3.

Are policy gradient-based agents implemented?

I'm looking at the code for your PolicyGradientAgent and your ReinforceAgent, and neither seems to have an act or update function implemented. Am I misunderstanding how your code functions, or is neither implemented?

Invalid action values when using `visualize_learning`

Hi there. There is a bit of a tricky bug here: https://github.com/david-abel/simple_rl/blob/master/simple_rl/utils/mdp_visualizer.py#L264 when visualising learning.

To replicate, this is a simple test:

from simple_rl.agents import QLearningAgent
from simple_rl.tasks import GridWorldMDP

def main():
    mdp = GridWorldMDP(init_loc=(1, 1), goal_locs=[(5, 3)], gamma=1.0)
    ql_agent = QLearningAgent(actions=mdp.get_actions())
    mdp.visualize_learning(ql_agent)

if __name__ == "__main__":
    main()

This will only run locally and you will need pygame installed. (python3 -m pip install pygame==2.0.0.dev3 --pre was the only version that worked for me on OSX).

I have set the discount factor to 1 (no discounting). Look at the q-values. All boxes should have a value of 1 (the reward of the goal state), since there is no discounting. But note how the expected return continues to increase forever.

The reason is the code in the visualize learning function. First, it checks for a terminal state in the wrong place, so the q-value at the initial location ends up getting added to the pre-goal state. Second, it does not reset the agent.

In short, can you have a look and make it behave like the run_single_agent_on_mdp method?

Basically this line (https://github.com/david-abel/simple_rl/blob/master/simple_rl/utils/mdp_visualizer.py#L228) needs to be moved up to just after the action, and we need an extra agent.end_of_episode() call in there too, to reset the agent state.

Thanks!

Switching goals

What's the easiest way to switch an MDP, while still using the same agent? Can I just call run_experiments twice, with different MDPs but the same agent?

Context: I'm looking to see how fast an agent can adapt, when I switch the goal of the GridworldMDP.

Using Cleanup World

Hello. I'm looking to implement Cleanup World, but I don't see any examples. Could you point me in a direction for its use?

I tried implementing an example of CleanupWorld, and I got the following error: TypeError: (simple_rl): Reproduction of results not implemented for CleanUpMDP

Here is the code that I used:
#!/usr/bin/env python

# Python imports.
import sys

# Other imports.
import srl_example_setup
from simple_rl.agents import QLearningAgent, RandomAgent
from simple_rl.tasks import CleanUpMDP, CleanUpRoom, CleanUpTask, CleanUpDoor, CleanUpBlock
from simple_rl.run_experiments import run_agents_on_mdp

def main(open_plot=True):
    task = CleanUpTask("green", "red")
    room1 = CleanUpRoom("room1", [(x, y) for x in range(5) for y in range(3)], "blue")
    block1 = CleanUpBlock("block1", 1, 1, color="green")
    block2 = CleanUpBlock("block2", 2, 4, color="purple")
    block3 = CleanUpBlock("block3", 8, 1, color="orange")
    room2 = CleanUpRoom("room2", [(x, y) for x in range(5, 10) for y in range(3)], color="red")
    room3 = CleanUpRoom("room3", [(x, y) for x in range(0, 10) for y in range(3, 6)], color="yellow")
    rooms = [room1, room2, room3]
    blocks = [block1, block2, block3]
    doors = [CleanUpDoor(4, 0), CleanUpDoor(3, 2)]
    mdp = CleanUpMDP(task, rooms=rooms, doors=doors, blocks=blocks)
    # mdp.visualize_interaction()

    # Make agents.
    ql_agent = QLearningAgent(actions=mdp.get_actions())
    rand_agent = RandomAgent(actions=mdp.get_actions())

    # Run experiment and make plot.
    run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=10, episodes=50, steps=10, open_plot=open_plot)

if __name__ == "__main__":
    main(open_plot=not sys.argv[-1] == "no_plot")
