openai / baselines

15.6K 15.6K 4.9K 6.46 MB

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

License: MIT License

Python 51.51% Dockerfile 0.04% HTML 48.45%

baselines's People

yenchenlin nagyist kracwarlock omoindrot albertbuchard phvu stites ml-lab jdc08161063 tigerneil oppa3109 little1tow adolfoeliazat xaveng shmuma ajay-wong bityangke chingyaoc nkcr7 filwaline karansaxena rockt collawolley 0xsuu permanz pushpen skogs riashat soroushmehr nerdoid yigitunallar ameyc adit-chandra vangao yifenzhong1920 wangboyunze x-hexy xhuvom matthewwilfred alokranjan1234 mediaeater prakritidev kastureranjit blackcat30stm chagge pcmoritz ligua arnocandel jiecui lulzzz amano-ginji hhy5277 shuidong jayjinseokkim mylearning2017 19ai deepx-top empia artmario youngdev tjacobs jmassapina taurus3g smasoudn charlontank tdavchev zgsxwsdxg oztc dwqy11 tiagosgc codeaudit zzz622848 snowfeet alexxnica kryndex labbros syzer ngc92 orirmi wmlabs niumeng07 linkpassion walkerrsmith renfeier larenzhang aaronzhangl alakia zach-nervana nottombrown zhudejunai vbmgk gwding 0ad4ai puneethapai linzichuan kongmo b0xtch cpehle hedgefair adrianp-

baselines's Issues

Gym and ALE

Hi, the Atari environments here come from Gym. Are they exactly the same as the Atari games in ALE?

recording activations

Could I kindly ask for some help or a hint regarding the following question:
https://stackoverflow.com/questions/44813861/record-activations-of-openai-baselines-implementation

Pretrained Breakout model error

I'm able to run most of the other pretrained models, but not Breakout; Pong and BeamRider work fine. Breakout fails with a TensorFlow shape-mismatch error when loading the model parameters. The error occurs with all the Breakout models: vanilla, prior, duel, and prior-duel.

My command for the vanilla breakout-1 model:
python -m baselines.deepq.experiments.atari.enjoy --model-dir ~/Temp/models/model-atari-breakout-1 --env Breakout

Error message:

Caused by op 'save/Assign_3', defined at:
  File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/miria/baselines/baselines/deepq/experiments/atari/enjoy.py", line 69, in <module>
    U.load_state(os.path.join(args.model_dir, "saved"))
  File "/Users/miria/baselines/baselines/common/tf_util.py", line 272, in load_state
    saver = tf.train.Saver()
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
    validate_shape=validate_shape)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
	 [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3)]]

sum-tree unit test

According to the docstring, the find_prefixsum_idx method should return the highest index i in the array such that sum(arr[0] ... arr[i-1]) <= prefixsum.

If this is true, shouldn't the test expect 4 instead of 3?
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L44

Likewise, the test here should also expect 4 instead of 3:
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L60

When i == 4, sum(arr[0] + arr[1] + arr[2] + arr[3]) <= 4.0 holds.
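To see why 3 is returned, it may help to look at how a sum tree is typically searched. The sketch below is a from-scratch illustration, not the baselines SumSegmentTree itself; the class name SumTree is hypothetical, and it assumes the common layout in which every internal node stores the sum of its subtree. The descent goes right whenever the left subtree's sum is <= the remaining prefixsum, so it returns the highest index i in 0..capacity-1 with arr[0] + ... + arr[i-1] <= prefixsum. In other words, the docstring holds once i is restricted to valid indices; i = 4 would point past the end of a 4-element array.

```python
class SumTree:
    """Array-backed binary sum tree (illustrative); capacity must be a power of two."""

    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves live at indices capacity .. 2*capacity - 1.
        self.value = [0.0] * (2 * capacity)

    def __setitem__(self, idx, val):
        node = idx + self.capacity
        self.value[node] = val
        node //= 2
        while node >= 1:  # refresh the sums on the path up to the root
            self.value[node] = self.value[2 * node] + self.value[2 * node + 1]
            node //= 2

    def total(self):
        return self.value[1]

    def find_prefixsum_idx(self, prefixsum):
        assert 0 <= prefixsum <= self.total() + 1e-5
        node = 1
        while node < self.capacity:  # descend until we reach a leaf
            left = self.value[2 * node]
            if left > prefixsum:
                node = 2 * node  # the answer lies in the left subtree
            else:
                prefixsum -= left  # skip the whole left subtree
                node = 2 * node + 1
        return node - self.capacity  # leaf position -> array index
```

For example, with arr = [0, 0, 1, 3], queries of 0.0 and 0.99 land on index 2, while 1.01 and 4.0 (the full total) land on index 3, never on 4.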

Crash on import

I'm trying to run some pybullet example code that uses baselines (see bulletphysics/bullet3#1234 (comment)), but I'm getting an error on import, and it was mentioned there that this is most likely an upstream issue.

I'm on Python 2.7.13 on OS X. Could this be a problem with baselines?

athundt at Andrews-2013-MacBook-Pro-2 in ~/src/bullet3/examples/pybullet/gym on master!
± python train_pybullet_racecar.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
  File "train_pybullet_racecar.py", line 4, in <module>
    from baselines import deepq
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
    from baselines.deepq.build_graph import build_act, build_train  # noqa
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
    import baselines.common.tf_util as U
  File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
    import builtins
ImportError: No module named builtins

train_kuka_grasping.py

± python train_kuka_grasping.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
  File "train_kuka_grasping.py", line 4, in <module>
    from baselines import deepq
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
    from baselines.deepq.build_graph import build_act, build_train  # noqa
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
    import baselines.common.tf_util as U
  File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
    import builtins
ImportError: No module named builtins

tf version:

± python -c 'import tensorflow as tf; print(tf.__version__)'
1.2.0

I installed by running pip install baselines, and a full list of installed packages is at bulletphysics/bullet3#1234 (comment)

Load act and continue learning

I am trying to save the Q network, reload it, and continue training it.

This is how I save act every few episodes:

ActWrapper(act, act_params).save("myfile.pkl")

However, when I load it, I get an error saying that some variables already exist. This is how I load a saved act:

act, train, updated_target, debug = deepq.build_train(...)
act = ActWrapper.load("myfile.pkl")

Any idea would be appreciated.

Low GPU usage

When I ran train.py in the atari folder, the ETA climbed to 16 days after a few minutes, and GPU utilization was quite low.

Running into issues on example execution

I get this error when I run the first example, python3 -m baselines.deepq.experiments.train_cartpole:

/usr/bin/python3: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')

I have both Python 2 and 3 installed, so I installed baselines with pip3.
Any suggestions?

Requirements not clearly stated

The documentation doesn't state up front that the code requires Python 3; I only realized this when I got an error about no module named builtins.

In addition, the requires.txt file doesn't list gym as a requirement. I realize most people who install this will probably have gym already, but for those who don't, the first sign that something is wrong is when they run the python -m baselines.deepq.experiments.train_cartpole example and it fails.
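Until the requirements are stated up front, a training script could fail fast with a readable message instead of an ImportError deep inside baselines. This is a minimal sketch: the helper name preflight and the default module list ("gym", "tensorflow") are assumptions, not part of baselines. The version check runs before importlib.util is imported, so the file still parses and runs under Python 2.

```python
import sys


def preflight(required=("gym", "tensorflow"), version_info=None):
    """Return a list of human-readable problems; an empty list means OK."""
    version_info = sys.version_info if version_info is None else version_info
    if version_info[0] < 3:
        return ["Python 3 is required (found %d.%d)" % (version_info[0], version_info[1])]
    import importlib.util  # Python 3 only, so import it after the version check

    # find_spec returns None when a top-level module cannot be located.
    return ["missing module: %s" % name
            for name in required
            if importlib.util.find_spec(name) is None]
```

A script would call problems = preflight() at the very top and, if the list is non-empty, exit with sys.exit("\n".join(problems)) before importing baselines.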

PPO OOM

When I ran on the CPU it worked fine, but after installing tensorflow-gpu I got the error below. Perhaps sessions need to be shared across MPI processes? When I set num_cpu to 1, it worked fine.

2017-07-25 21:11:16.630413: E tensorflow/core/common_runtime/direct_session.cc:138] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 11711807488
Traceback (most recent call last):
  File "run_atari.py", line 54, in <module>
    main()
  File "run_atari.py", line 51, in main
    train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
  File "run_atari.py", line 23, in train
    sess = U.single_threaded_session()
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 233, in single_threaded_session
    return make_session(1)
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 228, in make_session
    return tf.Session(config=tf_config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1292, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 562, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
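By default, a TensorFlow 1.x session tries to reserve most of the GPU's memory, so with num_cpu=8 every MPI worker competes for the same device, which would fit the CUDA_ERROR_OUT_OF_MEMORY above. One possible workaround, sketched here under the assumption that a single GPU is shared, is to let only some ranks see the GPU through the CUDA_VISIBLE_DEVICES environment variable; the helper name restrict_gpu_visibility is hypothetical. Another option is to set gpu_options.allow_growth = True on the tf.ConfigProto that make_session builds.

```python
import os


def restrict_gpu_visibility(rank, gpu_ranks=(0,), env=None):
    """Let only the MPI ranks in gpu_ranks see GPU 0; hide GPUs from the rest.

    Must run before TensorFlow initializes CUDA (safest: before importing
    tensorflow), because CUDA reads CUDA_VISIBLE_DEVICES when it starts up.
    """
    env = os.environ if env is None else env
    # An empty CUDA_VISIBLE_DEVICES hides every GPU, so TF falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = "0" if rank in gpu_ranks else ""
    return env["CUDA_VISIBLE_DEVICES"]
```

In run_atari.py this would be called as restrict_gpu_visibility(MPI.COMM_WORLD.Get_rank()) (via mpi4py) before tensorflow is imported.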

Support for continuous action spaces

Would love to see DQN extended to continuous action spaces, i.e. DDPG (https://arxiv.org/pdf/1509.02971.pdf)

Multi thread to run on Mujoco?

Hi, I noticed that the current implementation of pposgd/run_mujoco.py uses only a single thread. Is it possible to modify it to run multi-threaded like this? I'm not sure whether that would introduce bugs 😢

Pop-Art implementation ?

It looks like rewards are clipped by a wrapper, which might not be ideal:

import gym
import numpy as np

class ClippedRewardsWrapper(gym.RewardWrapper):
    def _reward(self, reward):
        """Change all the positive rewards to 1, negative to -1 and keep zero."""
        return np.sign(reward)

I found this paper, which trains Double DQN without reward clipping:
https://arxiv.org/pdf/1602.07714.pdf
Are there any plans to implement DDQN based on this paper?
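For reference, the paper's idea (Pop-Art) can be sketched for a single scalar output. ART tracks running estimates of the mean and second moment of the targets with step size beta; POP rescales the last linear layer whenever those statistics change, so that the unnormalized prediction sigma * (w·h + b) + mu stays exactly the same. This is a plain-Python illustration with a hypothetical PopArtHead class, not code from baselines, and the SGD step on the normalized targets is omitted.

```python
import math


class PopArtHead:
    """Linear output layer with Pop-Art target normalization (illustrative)."""

    def __init__(self, num_features, beta=3e-4):
        self.w = [0.0] * num_features
        self.b = 0.0
        self.mu = 0.0  # running mean of targets
        self.nu = 1.0  # running second moment, so sigma starts at 1
        self.beta = beta

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def unnormalized(self, h):
        return self.sigma * self.normalized(h) + self.mu

    def normalize_target(self, target):
        # The network is trained to regress onto this normalized target.
        return (target - self.mu) / self.sigma

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        # ART: exponential moving averages of the first and second moments.
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # POP: rescale w and b so unnormalized outputs are preserved exactly.
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma
```

Because the outputs are preserved, targets can change scale by orders of magnitude without clipping and without silently shifting the current value estimates.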

Can not execute PPO and TRPO

Can PPO or TRPO actually be run?
I tried my best but failed.
Thank you~

Unable to download all pretrained models

When I try downloading any model with the dueling architecture, it downloads fine.
However, when I try downloading a model that does not use dueling, the download does not start and gets stuck at N/A.
The command I use is:
python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-breakout-1 --model-dir /tmp/models
I have tried it on a couple of computers, and I get the same issue every time.

Baseline code for policy gradient methods?

This repo is awesome! It saves me a lot of time implementing DQN myself. It's a real lifesaver. Many thanks to OpenAI! 👍

When do you plan to release Baseline code for policy gradient methods, like TRPO, A3C, and ACER? It's been almost 2 months since the DQN release. I look forward to the next announcement!

Blas GEMM launch failed

After upgrading to TensorFlow 1.1, the example python -m baselines.deepq.experiments.train_cartpole stopped working for me. How can it be fixed?

2017-06-01 17:37:06.830729: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,224] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,262] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
2017-06-01 17:37:08.309557: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-06-01 17:37:08.309714: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1550] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
    return fn(*args)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
    status, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
    main()
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
    callback=callback
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 216, in learn
    action = act(np.array(obs)[None], update_eps=exploration.value(t))[0]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 402, in <lambda>
    return lambda *args, **kwargs: f(*args, **kwargs)[0]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 445, in __call__
    results = get_session().run(self.outputs_update, feed_dict=feed_dict)[:-1]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'deepq/q_func/fully_connected/MatMul', defined at:
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
    main()
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
    callback=callback
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 178, in learn
    grad_norm_clipping=10
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 178, in build_train
    act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 111, in build_act
    q_values = q_func(observations_ph.get(), num_actions, scope="q_func")
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 27, in <lambda>
    return lambda *args, **kwargs: _mlp(hiddens, *args, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 9, in _mlp
    out = layers.fully_connected(out, num_outputs=hidden, activation_fn=tf.nn.relu)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1433, in fully_connected
    outputs = layer.apply(inputs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 320, in apply
    return self.__call__(inputs, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 290, in __call__
    outputs = self.call(inputs, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\core.py", line 144, in call
    outputs = standard_ops.matmul(inputs, self.kernel)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1801, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1263, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

warnings in cartpole example

The train_cartpole example generates the following warnings:
VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02
~/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
~/.local/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
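The two RuntimeWarnings are what NumPy emits when np.mean is given an empty sequence (it returns nan). In the cartpole example this plausibly happens when a mean over recent episode rewards is logged before any episode has finished; that reading is an assumption. A guarded average avoids the warnings; the helper name mean_last_n is hypothetical.

```python
def mean_last_n(values, n=100):
    """Mean of the last n completed-episode rewards, or NaN if there are none yet."""
    window = values[-n:]
    if not window:
        return float("nan")  # explicit NaN instead of np.mean([]) plus warnings
    return sum(window) / len(window)
```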

LSTM Model

Are there any thoughts on how an LSTM model could be used with Baselines? I have some time-series data and would love to use some kind of RNN. I might be able to work on this, but would appreciate a pointer in the right direction on how to properly integrate the LSTM state with the data series.

It seems like there are 2 complexities:

The LSTM state needs to be stored in the replay buffer for random replay (replay might not be needed with an LSTM).
Time-horizon data for backpropagation through time needs to be handled as well.
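One way to handle both points is to store fixed-length sequences rather than single transitions, together with the recurrent state at the start of each sequence, so that an LSTM can be re-initialized and unrolled (and backpropagated through time) over each sampled sequence. This is a minimal sketch under those assumptions; SequenceReplayBuffer and its methods are hypothetical, not part of baselines, and partial sequences at the end of an episode are simply dropped.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Replay buffer of fixed-length sequences plus their initial LSTM state."""

    def __init__(self, capacity, seq_len):
        self.seq_len = seq_len
        self.sequences = deque(maxlen=capacity)  # oldest sequences are evicted
        self._current = []
        self._start_state = None

    def add(self, transition, hidden_state):
        # hidden_state is the recurrent state *before* this transition was taken.
        if not self._current:
            self._start_state = hidden_state
        self._current.append(transition)
        if len(self._current) == self.seq_len:
            self.sequences.append((self._start_state, tuple(self._current)))
            self._current = []

    def end_episode(self):
        self._current = []  # simplest choice: discard the partial sequence

    def sample(self, batch_size):
        # Each item is (initial_state, transitions) for one BPTT unroll.
        return random.sample(list(self.sequences), batch_size)
```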

Integration with Google Cloud ML Engine

We are using the deepq.mlp class to implement reinforcement learning and would like to host it on Google Cloud ML Engine, which requires the model to be exported in the SavedModel format. My understanding is at a beginner level, but I believe this requires passing the TF session and the input and output tensors to the SavedModel builder.

I am not sure exactly how to get those from the deepq.mlp class, or whether there is a much better way to do all this. Any help would be appreciated!

Example fails

The following example fails with an error:

$ python -m baselines.deepq.experiments.train_cartpole

Error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/usr/lib/python2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "build/bdist.linux-x86_64/egg/baselines/deepq/__init__.py", line 4, in <module>
  File "build/bdist.linux-x86_64/egg/baselines/deepq/simple.py", line 10, in <module>
  File "/usr/local/lib/python2.7/dist-packages/baselines-0.1.0-py2.7.egg/baselines/logger.py", line 139
    def log(*args, level=INFO):
                       ^
SyntaxError: invalid syntax
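The SyntaxError comes from a keyword-only argument after *args (PEP 3102), which is Python 3 syntax; running the example with python3 is presumably the intended fix, since baselines targets Python 3. For comparison, the sketch below shows that signature next to a Python 2 compatible equivalent that pulls the keyword out of **kwargs. Both functions are illustrative, and INFO = 20 is an assumed level constant.

```python
INFO = 20  # assumed log-level constant for this illustration


def log_py3(*args, level=INFO):
    # Keyword-only argument after *args: valid only in Python 3.
    return level, args


def log_py2_compatible(*args, **kwargs):
    # Same call signature, written in syntax that Python 2 can also parse.
    level = kwargs.pop("level", INFO)
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    return level, args
```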

Last command line fails

Running the visualization command line fails with a version incompatibility:
raise error.DeprecatedEnv('Env {} not found (valid versions include {})'.format(id, matching_envs))
gym.error.DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v4', 'BreakoutNoFrameskip-v0'])

Command line used:
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

Baseline network does not learn Pong

If I run train_pong.py, I get a final score of -20.1, which is close to a random policy and far from the result in the original publication (21.0). Do I have to tweak the parameters?

Benchmarking for PPO and TRPO

Thanks to the OpenAI team for the latest release!

Are there any benchmark results (like Atari score) on PPO and TRPO? DQN has a report here: https://github.com/openai/baselines-results. It's super useful. Thanks again!

Poor PPO 1 CPU performance on the Pong task

Hi,

I started the default Pong training with run_atari.py on my laptop. The only change to the start parameters was num_cpu=1. After more than 2 days of training, the reward was still around -20.4. It started at -20.6, temporarily improved to -20.2 after a day of training, and then dropped back to -20.4, where it stayed for quite a long time. On the same laptop, training the baselines vanilla DQN agent to the maximum reward (20+) took about the same time.

Is this an expected result for single-CPU PPO training?

a

sorry, accidentally opened an issue

Confusion between `done` and `info` in `env.step`, and the correct way we need to detect for when episodes complete.

I ran into an interesting problem today and, while I understand the solution, I'd like to explain it here and ask how OpenAI gym and OpenAI baselines are going to handle this going forward. I'm running gym 0.9.2 and AtariPy 0.0.20, which are outdated, but those are the versions the baselines models here were pre-trained with.

According to the docs, env.step returns a done flag, which is described as follows:

done (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)

Note the phrase "you lost your last life". This holds, for instance, when I run Breakout:

import gym 
import numpy as np

env = gym.make('Breakout-v0')
obs = env.reset()
done = False
steps = 0 

while not done:
    obs, rew, done, info = env.step(np.random.randint(env.action_space.n))
    steps += 1
    if done:
        print("done == True")
        print("info: {}".format(info))
print("steps: {}".format(steps))

The outcome is:

[2017-06-29 10:26:00,612] Making new env: Breakout-v0
done == True
info: {'ale.lives': 0}
steps: 271

However, the baselines code wraps the environment in several wrappers and monitors, which changes the semantics of env.step. To test this, I downloaded the pre-trained Breakout-1 model for Prioritized, Dueling DQN. Then I ran the following command:

python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1/ --env Breakout --dueling

This runs the enjoy script. The only changes I made to the current master branch (version 0778e9f) are some print statements and commenting out the render call, since I was running over SSH. You can see the git diff here:

git diff
diff --git a/baselines/deepq/experiments/atari/enjoy.py b/baselines/deepq/experiments/atari/enjoy.py
index fe482ca..ec5e78e 100644
--- a/baselines/deepq/experiments/atari/enjoy.py
+++ b/baselines/deepq/experiments/atari/enjoy.py
@@ -42,11 +42,13 @@ def play(env, act, stochastic, video_path):
         env, video_path, enabled=video_path is not None)
     obs = env.reset()
     while True:
-        env.unwrapped.render()
+        #env.unwrapped.render()
         video_recorder.capture_frame()
         action = act(np.array(obs)[None], stochastic=stochastic)[0]
         obs, rew, done, info = env.step(action)
         if done:
+            print("done == True")
+            print("info: {}".format(info))
             obs = env.reset()
         if len(info["rewards"]) > num_episodes:
             if len(info["rewards"]) == 1 and video_recorder.enabled:
@@ -56,6 +58,7 @@ def play(env, act, stochastic, video_path):
                 video_recorder.enabled = False
             print(info["rewards"][-1])
             num_episodes = len(info["rewards"])
+            print("we must have finished an episode here now\n")

Running this, I saw the following output:

[2017-06-29 10:22:35,014] Making new env: BreakoutNoFrameskip-v4
done == True
info: {'rewards': [], 'steps': 6845, 'ale.lives': 4}
done == True
info: {'rewards': [], 'steps': 9703, 'ale.lives': 3}
done == True
info: {'rewards': [], 'steps': 10228, 'ale.lives': 2}
done == True
info: {'rewards': [], 'steps': 16350, 'ale.lives': 1}
done == True
info: {'rewards': [], 'steps': 22194, 'ale.lives': 0}
846.0
we must have finished an episode here now

done == True
info: {'rewards': [846.0], 'steps': 24488, 'ale.lives': 4}
done == True
info: {'rewards': [846.0], 'steps': 33442, 'ale.lives': 3}
done == True
info: {'rewards': [846.0], 'steps': 35160, 'ale.lives': 2}
done == True
info: {'rewards': [846.0], 'steps': 37665, 'ale.lives': 1}
done == True
info: {'rewards': [846.0], 'steps': 38732, 'ale.lives': 0}
438.0
we must have finished an episode here now

I terminated the run after this. What this shows is that the done semantics have changed and diverge from the docs: done is now True whenever a life is lost. To detect when an episode actually finishes, I have to check whether the "rewards" list has grown, or whether ale.lives is zero. This is less elegant than simply checking a single done == True condition.

In conclusion:

Detect when an episode finishes using the "rewards" list from info, NOT the done flag, despite what the documentation says.
More generally, if baselines is going to diverge from default gym behavior, I think that should be clarified somewhere.
In addition, is there any formal documentation other than the website I linked to earlier, which (as far as I can tell) hasn't changed in about a year and a half?
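Based on the log above, where done fires on every lost life and ale.lives reaches 0 only at the real game over, a lives-based check could look like the sketch below. It assumes the "ale.lives" key is present in info and treats a missing key as a real episode end; the function names are hypothetical.

```python
def is_real_episode_end(done, info):
    """True only when the game is over, not merely when a life was lost.

    Assumes done fires on every lost life (as in the log above) and that
    info reports the remaining lives under "ale.lives". A missing key is
    treated as 0, so any done counts as a real end in that case.
    """
    return bool(done) and info.get("ale.lives", 0) == 0


def count_episodes(trace):
    """Count real episode ends in an iterable of (done, info) pairs."""
    return sum(1 for done, info in trace if is_real_episode_end(done, info))
```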

Pip install fails on OS-X

On macOS Sierra (10.12.5) attempting to run pip install baselines results in the following error message:

  Using cached baselines-0.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/setup.py", line 8, in <module>
        with open(os.path.join(repo_dir, "README.md")) as f:
    IOError: [Errno 2] No such file or directory: '/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/README.md'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/
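The traceback shows setup.py opening README.md unconditionally, while the source archive pip downloaded apparently did not contain that file. The usual fix is to ship the file, e.g. with an `include README.md` line in MANIFEST.in. As a belt-and-braces measure, setup.py could also read the README defensively; the sketch below assumes a hypothetical helper read_long_description and is not the actual baselines setup.py.

```python
import os


def read_long_description(repo_dir, filename="README.md"):
    """Return the README text, or "" if the file was not shipped in the sdist."""
    path = os.path.join(repo_dir, filename)
    if not os.path.exists(path):
        return ""  # let the install proceed without a long description
    with open(path) as f:
        return f.read()
```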

Fails to import a module, from itself

Traceback (most recent call last):
  File "pole_train.py", line 3, in <module>
    from baselines import deepq
  File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/__init__.py", line 4, in <module>
    from baselines.deepq.simple import learn, load  # noqa
  File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/simple.py", line 12, in <module>
    from baselines import deepq
ImportError: cannot import name 'deepq'

Not able to run pre-trained model

Neither python2 nor python3 works:
yhu@yhu-Aspire-M3920:~$ python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/usr/lib/python2.7/runpy.py", line 102, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 4, in <module>
    from baselines.deepq.simple import learn, load  # noqa
  File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/simple.py", line 10, in <module>
    from baselines import logger
  File "/home/yhu/.local/lib/python2.7/site-packages/baselines/logger.py", line 139
    def log(*args, level=INFO):
                       ^
SyntaxError: invalid syntax
yhu@yhu-Aspire-M3920:~$ python3 -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yhu/.local/lib/python3.5/site-packages/baselines/deepq/experiments/atari/enjoy.py", line 15, in <module>
    from baselines.common.atari_wrappers_deprecated import wrap_dqn
  File "/home/yhu/.local/lib/python3.5/site-packages/baselines/common/atari_wrappers_deprecated.py", line 1, in <module>
    import cv2
ImportError: No module named 'cv2'
yhu@yhu-Aspire-M3920:~$

Small possible divergence with original DQN paper

Hi,

I noticed that there might be a slight difference between this implementation of the network and DeepMind's original. Maybe this is already known, but I didn't see it mentioned anywhere, and since this implementation seems to aim to be as close as possible to the original, I thought it worth pointing out.

It boils down to the fact that this implementation relies on the default padding of the TensorFlow convolution layers it calls, which is 'SAME' (that is what produces the 11x11 maps), whereas DeepMind didn't document any padding for their convolutional layers.
If we refer to the Torch implementation they released, we can conclude that they used Torch's default padding (0 in the SpatialConvolution module), except for the first layer, where they used padding=1.

After the convolutions, the feature maps differ considerably in size: 7x7 for (Py)Torch versus 11x11 for TensorFlow. As a result, the input sizes of the first linear layer diverge (3136 vs. 7744, i.e. 7x7x64 vs. 11x11x64).
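The sizes quoted above follow from the standard convolution output-size arithmetic. A minimal sketch, assuming 84x84 inputs and the Nature-DQN layer shapes (8x8 stride 4, 4x4 stride 2, 3x3 stride 1, 64 filters in the last layer); `conv_out` and `final_side` are illustrative helpers, not baselines code. It also shows that 'VALID' padding would give 7x7, the same as Torch, so the 11x11 figure is what 'SAME' padding produces:

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of one conv layer.

    padding: "SAME" (TensorFlow-style, output = ceil(size / stride)),
    "VALID" (no padding), or an int giving explicit zero-padding per side
    (Torch-style).
    """
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        padding = 0
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride) of the three DQN conv layers; the last one has 64 filters.
DQN_CONVS = [(8, 4), (4, 2), (3, 1)]

def final_side(size, paddings):
    """Apply the three conv layers with the given per-layer paddings."""
    for (kernel, stride), pad in zip(DQN_CONVS, paddings):
        size = conv_out(size, kernel, stride, pad)
    return size

torch_side = final_side(84, [1, 0, 0])          # padding=1 on the first layer only
tf_same_side = final_side(84, ["SAME"] * 3)
tf_valid_side = final_side(84, ["VALID"] * 3)
print(torch_side, torch_side ** 2 * 64)         # 7 3136
print(tf_same_side, tf_same_side ** 2 * 64)     # 11 7744
print(tf_valid_side, tf_valid_side ** 2 * 64)   # 7 3136
```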

I'm not sure this makes a big difference (positive or negative) to the outcome, but experience has shown that the devil is in the details when it comes to deep architectures.

What do you guys think?

Roadmap of DDPG, TRPO and Q-prop updates

Hi,

The quality of your DQN implementations is impressive, and I'm looking forward to the continuous control algorithms. Do you have even a very approximate schedule for when implementations of DDPG, TRPO, and Q-Prop will be added?

Best regards,
Viktor

Program Fails in def log.

Hi, I installed baselines with pip and ran python -m baselines.deepq.experiments.train_cartpole, but I encountered the following error:

Traceback (most recent call last):
  File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 102, in _get_module_details
    loader = get_loader(mod_name)
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "baselines/deepq/__init__.py", line 4, in <module>
    from baselines.deepq.simple import learn, load  # noqa
  File "baselines/deepq/simple.py", line 10, in <module>
    from baselines import logger
  File "baselines/logger.py", line 139
    def log(*args, level=INFO):
                       ^
SyntaxError: invalid syntax

How can I solve that?
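The failing line uses a keyword-only argument after `*args`, which is Python 3 syntax; Python 2.7 rejects it with exactly this SyntaxError, and baselines targets Python 3, so the usual remedy is to run it under Python 3. For illustration only, a Python 2/3-compatible equivalent of such a signature pops the keyword from `**kwargs`; `log` and `INFO` here are hypothetical stand-ins, not the real logger module:

```python
INFO = 20  # hypothetical stand-in for the logger's level constant

def log(*args, **kwargs):
    """Python 2/3-compatible equivalent of `def log(*args, level=INFO)`."""
    level = kwargs.pop("level", INFO)
    if kwargs:
        # Mirror Python 3's behaviour of rejecting unexpected keywords.
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    message = " ".join(str(a) for a in args)
    return level, message

print(log("episode", 3))         # (20, 'episode 3')
print(log("warn me", level=30))  # (30, 'warn me')
```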

Is there a plan to replicate other RL algorithms such as A3C, DPG, etc.?

Error when restoring model to run enjoy.py

Hi,

I ran these two commands from the bottom of the README:

python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-duel-breakout-1 --model-dir /tmp/models
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

However, I got the following error:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
         [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3/_1)]]

Unable to download all pretrained models

Are all models listed by "python -m baselines.deepq.experiments.atari.download_model" available for download? I am unable to download any of the models without dueling nets.

Integration with Rllab

One thing that seems a bit redundant is that both openai/rllab and now openai/baselines implement RL algorithms. It may be worthwhile to merge the two in some way rather than maintain two parallel repositories that are both supposed to provide baseline RL implementations. Are there any plans to do so, or any thoughts on this from the OpenAI team?

Thanks.

no BreakoutNoFrameskip-v3 env

I ran python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

and got the following error:

[2017-05-25 09:54:25,435] Making new env: BreakoutNoFrameskip-v3

.....
DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v0', 'BreakoutNoFrameskip-v4'])

This appears to be a dependency-version problem. It is easy to make it runnable by pinning gym<=0.8.2 and atari-py<=0.0.21.

In the end, I downgraded gym and atari-py and successfully ran enjoy with the v3 environment.

Really enjoy~

How to set the randomness of the act during enjoy?

In the file enjoy_cartpole.py, the action is provided by
obs, rew, done, _ = env.step(act(obs[None])[0])

When I printed the output of act(obs[None])[0], the action was not always the same; it changed somewhat randomly even though the input sequence was identical.

How can I make it act purely greedily?
How can the rate of randomness be controlled?

cheers,
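A common way to get both behaviours is epsilon-greedy action selection over Q-values: with probability `eps` pick a uniformly random action, otherwise the argmax, so `eps=0` is purely greedy and `eps` sets the rate of randomness. The sketch below is generic, assuming you have the Q-values for one observation as a list; `epsilon_greedy` is a hypothetical helper, not the signature of baselines' `act`:

```python
import random

def epsilon_greedy(q_values, eps, rng=random):
    """Pick an action index from a list of Q-values.

    With probability `eps` a uniformly random action is returned;
    otherwise the action with the highest Q-value (ties -> lowest index).
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.7, 0.2]
print(epsilon_greedy(q, eps=0.0))  # 1 (always greedy)

rng = random.Random(0)
actions = [epsilon_greedy(q, eps=0.5, rng=rng) for _ in range(1000)]
# Fraction of greedy picks; expected to be about 0.5 + 0.5 / 3.
print(actions.count(1) / len(actions))
```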

Could you explain how to execute PPO and TRPO?

There is a README explaining the whole process of running the deepq algorithm.

However, there is nothing similar for PPO and TRPO.

Could you please explain how to execute PPO and TRPO?

What do the names of the available models mean?

Greetings all! I ran "python -m baselines.deepq.experiments.atari.download_model", which listed the names of the available models, but I'm puzzled about what the names mean. For example, what are the differences between "model-atari-alien-1", "model-atari-alien-2", and "model-atari-alien-3"? Were they trained with DQN or double DQN? Was "model-atari-duel-alien-1" trained with dueling double DQN or dueling DQN? What is "model-atari-rb100000-test-seaquest-1", and what does rb100000 mean? Finally, how can I find out the exact parameters used to train these models? Thanks!

Atari wrappers deprecated?

Quick question:

Why are the Atari wrappers deprecated? Do you plan to add a non-deprecated version of the wrappers soon?

Save and load of the trained TRPO and PPO agents

Hi,

How can agents trained with TRPO or PPO be saved, loaded, and visualized?

ImportError: cannot import name 'weakref'

Any comments?

➜  baselines git:(master) python baselines/pposgd/run_atari.py
Traceback (most recent call last):
  File "baselines/pposgd/run_atari.py", line 54, in <module>
    main()
  File "baselines/pposgd/run_atari.py", line 51, in main
    train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
  File "baselines/pposgd/run_atari.py", line 18, in train
    from baselines.pposgd import pposgd_simple, cnn_policy
  File "/Users/Tiger/projects/baselines/baselines/pposgd/pposgd_simple.py", line 3, in <module>
    import baselines.common.tf_util as U
  File "/Users/Tiger/projects/baselines/baselines/common/tf_util.py", line 2, in <module>
    import tensorflow as tf  # pylint: ignore-module
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 63, in <module>
    from tensorflow.python.framework.framework_lib import *
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/framework_lib.py", line 100, in <module>
    from tensorflow.python.framework.subscribe import subscribe
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/subscribe.py", line 26, in <module>
    from tensorflow.python.ops import variables
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 26, in <module>
    from tensorflow.python.ops import control_flow_ops
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 70, in <module>
    from tensorflow.python.ops import tensor_array_ops
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 33, in <module>
    from tensorflow.python.util import tf_should_use
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py", line 28, in <module>
    from backports import weakref  # pylint: disable=g-bad-import-order
ImportError: cannot import name 'weakref'

MaxAndSkipEnv is calculating the max over the last two time steps only

As currently written, MaxAndSkipEnv skips over `skip` observations and correctly sums the reward over all of those time steps, but the max over observations is computed only over the last 2 observations, regardless of the skip size. Is this intentional?

One could move the max_frame line into the loop and use the deque of size 2 to keep track of the max over all skipped time steps.
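For reference, the Nature DQN preprocessing takes the pixel-wise maximum over only the frame being encoded and the previous frame, to remove Atari sprite flicker, which is consistent with the current two-frame max. The sketch below contrasts that with the proposed max over all skipped frames, representing frames as flat lists of pixel values for simplicity; `step_max_last_two` and `step_max_all` are hypothetical helpers, not the wrapper's code:

```python
from collections import deque

def pixel_max(a, b):
    """Element-wise max of two equally sized frames."""
    return [max(x, y) for x, y in zip(a, b)]

def step_max_last_two(frames, rewards):
    """Sum all rewards; max-pool only the last two observations."""
    buffer = deque(maxlen=2)
    total = 0.0
    for obs, r in zip(frames, rewards):
        buffer.append(obs)
        total += r
    return pixel_max(buffer[0], buffer[-1]), total

def step_max_all(frames, rewards):
    """Sum all rewards; keep a running max over every skipped observation."""
    running, total = None, 0.0
    for obs, r in zip(frames, rewards):
        running = obs if running is None else pixel_max(running, obs)
        total += r
    return running, total

frames = [[5, 0], [0, 1], [0, 0], [2, 0]]  # skip = 4
rewards = [1.0, 0.0, 0.0, 1.0]
print(step_max_last_two(frames, rewards))  # ([2, 0], 2.0)
print(step_max_all(frames, rewards))       # ([5, 1], 2.0)
```

Both variants sum the reward identically; they differ only in which observations feed the max.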

Please expose size, number and activation function parameters for DDPG actor and critic

Great work adding DDPG with parameter space noise, thanks!

Could you also expose additional command-line parameters in main.py, such as the layer sizes, the number of layers, and the activation functions for the DDPG actor and critic? Currently they can only be set in models.py.
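One way such options could look, sketched with the standard-library argparse: a comma-separated list gives both the sizes and (by its length) the number of layers. The flag names, defaults, and choices are hypothetical and not existing options of the DDPG main.py:

```python
import argparse

def int_list(text):
    """Parse a comma-separated list such as '64,64' into [64, 64]."""
    return [int(part) for part in text.split(",") if part]

def build_parser():
    parser = argparse.ArgumentParser(description="DDPG network options (sketch)")
    parser.add_argument("--actor-layers", type=int_list, default=[64, 64],
                        help="hidden layer sizes of the actor, e.g. 64,64")
    parser.add_argument("--critic-layers", type=int_list, default=[64, 64],
                        help="hidden layer sizes of the critic")
    parser.add_argument("--activation", choices=["relu", "tanh", "elu"],
                        default="relu", help="activation of the hidden layers")
    return parser

args = build_parser().parse_args(["--actor-layers", "400,300", "--activation", "tanh"])
print(args.actor_layers, args.critic_layers, args.activation)
# [400, 300] [64, 64] tanh
```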

cannot import name 'deepq'

On a fresh Debian GNU/Linux 3.16.0-4-amd64 install I tried:

wget https://bootstrap.pypa.io/get-pip.py
sudo python3.4 get-pip.py
sudo pip install baselines
python3.4 -m baselines.deepq.experiments.train_cartpole

Error: "/usr/bin/python3.4: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')"

I couldn't find any mention of deepq in the installation output.
baselines_install_log.txt

A small bug on line 111 of replay_buffer.py?

mass = random.random() * self._it_sum.sum(0, len(self._storage) - 1)

It seems it should be:

mass = random.random() * self._it_sum.sum(0, len(self._storage))

Is that right?
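Whether `len(self._storage) - 1` drops an element depends on whether the sum tree's `end` index is inclusive or exclusive. The sketch assumes exclusive-end semantics (like Python slicing) and uses a plain list in place of the segment tree; `range_sum` and `sample_index` are hypothetical helpers. Under that assumption, the `- 1` version never samples the last stored transition:

```python
import random

def range_sum(priorities, start, end):
    """Sum of priorities[start:end]; `end` is exclusive, like a slice."""
    return sum(priorities[start:end])

def sample_index(priorities, mass):
    """Return the first index whose prefix sum exceeds `mass`."""
    running = 0.0
    for idx, p in enumerate(priorities):
        running += p
        if running > mass:
            return idx
    return len(priorities) - 1  # only reached if mass >= total sum

priorities = [1.0, 1.0, 1.0, 1.0]
n = len(priorities)
rng = random.Random(0)
buggy = {sample_index(priorities, rng.random() * range_sum(priorities, 0, n - 1))
         for _ in range(1000)}
fixed = {sample_index(priorities, rng.random() * range_sum(priorities, 0, n))
         for _ in range(1000)}
print(sorted(buggy))  # [0, 1, 2]
print(sorted(fixed))  # [0, 1, 2, 3]
```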

failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

I got the following error while training the cart pole example:

failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
screen.txt

There is no issue with my CUDA driver: I verified the installation by running some of the CUDA sample programs.

I suspect the issue can be resolved by making the code changes suggested here:
https://stackoverflow.com/questions/41117740/tensorflow-crashes-with-cublas-status-alloc-failed

Fail to load pretrained checkpoints

Hi,
I am trying to load a subset of the variables from the downloaded models, but I get a Key Not Found error even though the variable names are the same. The only difference is the ':0' suffix, but I don't think it matters, since TensorFlow adds it automatically to op outputs.
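In TensorFlow, a variable's `name` (for example `deepq/q_func/action_value/fully_connected_1/biases:0`) includes the `:0` output index, while the keys stored in a checkpoint do not, so the suffix can matter if tensor names are used directly as checkpoint keys. One hedged fix is to strip it before matching (in TensorFlow 1.x, `var.op.name` already gives the name without it). The helpers below are pure Python and hypothetical; in real code you would map each key to the variable object and pass that dict as `var_list` to `tf.train.Saver`:

```python
def checkpoint_key(tensor_name):
    """Drop the ':<output index>' suffix TensorFlow appends to tensor names."""
    base, sep, index = tensor_name.rpartition(":")
    return base if sep and index.isdigit() else tensor_name

def match_checkpoint(var_names, checkpoint_keys):
    """Match variable names to checkpoint keys, ignoring ':0' suffixes.

    Returns (matched, missing): matched maps key -> original variable name,
    missing lists variable names with no corresponding checkpoint key.
    """
    keys = set(checkpoint_keys)
    matched, missing = {}, []
    for name in var_names:
        key = checkpoint_key(name)
        if key in keys:
            matched[key] = name
        else:
            missing.append(name)
    return matched, missing

names = ["deepq/q_func/action_value/fully_connected_1/biases:0", "extra/weights:0"]
ckpt = ["deepq/q_func/action_value/fully_connected_1/biases"]
matched, missing = match_checkpoint(names, ckpt)
print(matched)
print(missing)
```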

Here is what I read from checkpoints and the Key not found error log.
