openai / baselines
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
License: MIT License
Are all models listed in "python -m baselines.deepq.experiments.atari.download_model" available for download? I am unable to download anything without dueling nets.
In the file enjoy_cartpole.py, the action is provided by
obs, rew, done, _ = env.step(act(obs[None])[0])
When the outputs of act(obs[None])[0] were printed out, it did not produce the same action each time; the action changed somewhat randomly even though the input sequence was the same.
How can it be set to act as a simple greedy policy?
How can the rate of randomness be controlled?
cheers,
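For what it's worth, the randomness likely comes from the epsilon-greedy exploration built into the act function. A minimal sketch, assuming act is the callable returned by baselines.deepq.build_graph.build_act, which accepts the stochastic and update_eps keywords used elsewhere in these issues:
action = act(obs[None], stochastic=False)[0]  # greedy (argmax-Q) action
action = act(obs[None], update_eps=0.05)[0]   # epsilon-greedy with eps = 0.05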
One thing that seems a bit redundant is the fact that there is openai/rllab and now openai/baselines implementing RL algorithms. It seems like it may be a worthwhile endeavor to merge the two in some way rather than have two parallel repositories that are supposed to have baseline RL implementations. Are there any plans to do so or any thoughts on this from the openai team?
Thanks.
Thanks to the OpenAI team for the latest release!
Are there any benchmark results (like Atari score) on PPO and TRPO? DQN has a report here: https://github.com/openai/baselines-results. It's super useful. Thanks again!
sorry, accidentally opened an issue
I got the following error while training the cart pole example:
failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
screen.txt
There is no issue with my CUDA driver, as I have verified the installation by running some of the CUDA sample codes.
I guess the issue can be resolved by making the following changes in the code:
https://stackoverflow.com/questions/41117740/tensorflow-crashes-with-cublas-status-alloc-failed
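For reference, the usual fix from that thread is to stop TensorFlow from pre-allocating the whole GPU; a minimal sketch, assuming you can patch the session creation that baselines uses:
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)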
Traceback (most recent call last):
File "pole_train.py", line 3, in <module>
from baselines import deepq
File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/__init__.py", line 4, in <module>
from baselines.deepq.simple import learn, load # noqa
File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/simple.py", line 12, in <module>
from baselines import deepq
ImportError: cannot import name 'deepq'
On a fresh Debian GNU/Linux 3.16.0-4-amd64 install I tried:
wget https://bootstrap.pypa.io/get-pip.py
sudo python3.4 get-pip.py
sudo pip install baselines
python3.4 -m baselines.deepq.experiments.train_cartpole
Error: "/usr/bin/python3.4: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')"
I couldn't find any mention of deepq in the feedback from the installation.
baselines_install_log.txt
I'm trying to run some example code in pybullet, see bulletphysics/bullet3#1234 (comment), that is using baselines but I'm getting an error on import, and they mentioned this is most likely an upstream issue.
I'm on python 2.7.13 on OS X. Perhaps this is a problem with baselines?
athundt at Andrews-2013-MacBook-Pro-2 in ~/src/bullet3/examples/pybullet/gym on master!
± python train_pybullet_racecar.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
File "train_pybullet_racecar.py", line 4, in <module>
from baselines import deepq
File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
from baselines.deepq.build_graph import build_act, build_train # noqa
File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
import baselines.common.tf_util as U
File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
import builtins
ImportError: No module named builtins
train_kuka_grasping.py
± python train_kuka_grasping.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
File "train_kuka_grasping.py", line 4, in <module>
from baselines import deepq
File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
from baselines.deepq.build_graph import build_act, build_train # noqa
File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
import baselines.common.tf_util as U
File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
import builtins
ImportError: No module named builtins
tf version:
± python -c 'import tensorflow as tf; print(tf.__version__)'
1.2.0
I installed by running pip install baselines
, and a full list of installed packages is at bulletphysics/bullet3#1234 (comment)
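If it helps anyone else: builtins is a Python 3 standard module, and on Python 2 it only exists via the future compatibility package (pip install future may paper over this one import). Given the Python-3-only syntax reported elsewhere on this page, running baselines under Python 3 seems to be the real requirement.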
Looks like we've got a clipping wrapper for rewards, which might not be very good:
class ClippedRewardsWrapper(gym.RewardWrapper):
    def _reward(self, reward):
        """Change all the positive rewards to 1, negative to -1 and keep zero."""
        return np.sign(reward)
I found this paper, which trains a DDQN without the clipping operation:
https://arxiv.org/pdf/1602.07714.pdf
Do we have any plans to implement a DDQN based on this paper?
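That paper is the PopArt approach (adaptive target normalization). A rough numpy sketch of its core trick as I understand it, not anything in this repo, where W and b are the weight matrix and bias vector of the network's final linear layer:
import numpy as np

class PopArt:
    # Sketch of PopArt (van Hasselt et al., 2016): regress on normalized
    # targets, and rescale the last linear layer whenever the target
    # statistics move so the unnormalized outputs are preserved.
    def __init__(self, beta=1e-4):
        self.mu, self.nu, self.beta = 0.0, 1.0, beta  # nu tracks E[y^2]

    @property
    def sigma(self):
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def update(self, targets, W, b):
        # targets: np.ndarray of returns; W, b: float ndarrays, updated in place.
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * np.mean(targets)
        self.nu = (1 - self.beta) * self.nu + self.beta * np.mean(targets ** 2)
        # Rescale so sigma * (W h + b) + mu is unchanged by the new statistics:
        W *= old_sigma / self.sigma
        b[:] = (old_sigma * b + old_mu - self.mu) / self.sigma
        return (targets - self.mu) / self.sigma  # normalized regression targets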
Hi,
I was running these two commands:
python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-duel-breakout-1 --model-dir /tmp/models
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling
at the bottom of the README.
However, I got the following error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
[[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3/_1)]]
mass = random.random() * self._it_sum.sum(0, len(self._storage) - 1)
it seems it should be:
mass = random.random() * self._it_sum.sum(0, len(self._storage)) ??
I tested python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling
and came across the following error:
[2017-05-25 09:54:25,435] Making new env: BreakoutNoFrameskip-v3
.....
DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v0', 'BreakoutNoFrameskip-v4'])
It seems to be a version error in our dependencies. It's easy to make it runnable: gym<=0.8.2 and atari-py<=0.0.21.
Finally I downgraded gym and atari-py and successfully ran the enjoy script for v3.
Really enjoy~
There is a README explaining the whole process for running the deepq algorithm.
However, there is no such thing for PPO and TRPO...
Could you please explain how to run PPO and TRPO?
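A guess at the invocations, based on the pposgd/run_atari.py path that appears in a traceback further down this page (the trpo_mpi path is my assumption):
python baselines/pposgd/run_atari.py
python baselines/trpo_mpi/run_atari.py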
Hi,
I've started the default Pong training with run_atari.py on my laptop. The only change to the start parameters was num_cpu=1. After more than two days of training, the reward was still around -20.4. It started from -20.6, after a day of training temporarily improved to -20.2, and then dropped back to -20.4 without any change for quite a long time. On the same laptop it took about the same time to train the baselines vanilla DQN agent to the maximal reward of 20+.
Is this an expected result for single-CPU PPO training?
Hi,
How can saving, loading, and visualization of agents trained with the TRPO or PPO algorithms be done?
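Not an official answer, but a minimal sketch of saving/restoring the TensorFlow variables with the helpers in baselines.common.tf_util (load_state is used this way by deepq/experiments/atari/enjoy.py; save_state is assumed to be its counterpart):
import baselines.common.tf_util as U

# After training, inside the same session/graph:
U.save_state("/tmp/models/my_agent/saved")   # writes a TF checkpoint

# Later: rebuild the identical policy graph first, then restore:
U.load_state("/tmp/models/my_agent/saved")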
Great work adding DDPG with parameter space noise, thanks!
Can you also expose additional command line parameters in main.py, like the sizes of the layers, their number, and the activation functions for the DDPG actor and critic? Currently they can only be set in models.py.
This repo is awesome! It saves me a lot of time implementing DQN myself. It's a real lifesaver. Many thanks to OpenAI! 👍
When do you plan to release Baseline code for policy gradient methods, like TRPO, A3C, and ACER? It's been almost 2 months since the DQN release. I look forward to the next announcement!
Are there any thoughts on how an LSTM model could be used with Baselines? I have some time series data and would love to use an RNN of sorts. I might be able to work on this project but would appreciate a pointer in the right direction to properly integrate the "LSTM STATE" and the data series.
It seems like there are 2 complexities:
Hi, I am using baselines installed with pip, and I run python -m baselines.deepq.experiments.train_cartpole, but I encountered:
Traceback (most recent call last):
File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 102, in _get_module_details
loader = get_loader(mod_name)
File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 430, in iter_importers
__import__(pkg)
File "baselines/deepq/__init__.py", line 4, in <module>
from baselines.deepq.simple import learn, load # noqa
File "baselines/deepq/simple.py", line 10, in <module>
from baselines import logger
File "baselines/logger.py", line 139
def log(*args, level=INFO):
^
SyntaxError: invalid syntax
How can I solve that?
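For what it's worth, def log(*args, level=INFO) uses keyword-argument-after-*args syntax that only exists in Python 3, and the traceback shows anaconda2/.../python2.7, so running the module under a Python 3 interpreter should get past this SyntaxError.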
According to the doc, the find_prefixsum_idx method should return the highest index i in the array such that sum(arr[0] ... arr[i-1]) <= prefixsum.
If this is true, shouldn't the test return 4 instead of 3?
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L44
and here, the test should also return 4 instead of 3
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L60
When i == 4, arr[0] + arr[1] + arr[2] + arr[3] <= 4.0 holds.
When I try downloading any model with the dueling architecture, it downloads fine.
However, when I try downloading a model that does not use dueling, the download does not start and gets stuck as N/A.
The command I use is:
python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-breakout-1 --model-dir /tmp/models
I have tried it on a couple of computers, and I get the same issue every time.
When I tried to run train.py in the atari folder, I found the ETA reached 16 days after a few minutes, and the GPU usage was quite low.
Greetings all! I have run "python -m baselines.deepq.experiments.atari.download_model". It listed the names of some available models, but I'm puzzled about what the model names mean in detail. For example, what are the differences between "model-atari-alien-1", "model-atari-alien-2", and "model-atari-alien-3"? Are they trained with DQN or double DQN? Was "model-atari-duel-alien-1" trained with dueling double DQN or dueling DQN? What are the details of "model-atari-rb100000-test-seaquest-1", and what does rb100000 mean? What's more, how can I find out the detailed parameters used to train these models? Thanks!
Hi,
I noticed that there might be a slight difference between this implementation of the network and the original one by DeepMind. Maybe this is a known fact, but I didn't see it mentioned anywhere, and as this implementation seems to try to be as close as possible to the original one, I thought it'd be worth pointing out.
It boils down to the fact that this implementation uses the default padding of TensorFlow's convolutional layers, which is 'SAME', whereas DeepMind didn't document any padding on their convolutional layers.
If we refer to the Torch implementation they released, we can conclude that they used the default padding of Torch (which is 0 in the SpatialConvolution module), except for the first layer, where they used padding=1.
After the convolutions, the image sizes are quite different: 7x7 for (py)Torch and 11x11 for TensorFlow (40% difference). As such, the input size of the first linear layer diverges (3136 vs 7744).
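A quick sanity check of those numbers (my own sketch; it assumes the standard DQN conv stack of 8x8 stride 4, 4x4 stride 2, 3x3 stride 1 on an 84x84 input):
import math

def out_same(n, stride):
    # TensorFlow 'SAME' padding: output side = ceil(n / stride)
    return math.ceil(n / stride)

def out_padded(n, k, stride, pad):
    # Explicit (Torch-style) padding: floor((n + 2*pad - k) / stride) + 1
    return (n + 2 * pad - k) // stride + 1

n = 84
for k, s, p in [(8, 4, 1), (4, 2, 0), (3, 1, 0)]:  # DeepMind's Torch padding
    n = out_padded(n, k, s, p)
print(n, n * n * 64)  # 7 3136

n = 84
for s in [4, 2, 1]:  # TensorFlow default 'SAME'
    n = out_same(n, s)
print(n, n * n * 64)  # 11 7744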
I'm not sure that makes a huge difference (be it positive or negative) in the outcome, but experience has shown that the devil's in the details when it comes to deep architectures.
What do you guys think?
Quick question:
Why are the atari wrappers deprecated? Do you plan to add the non-deprecated version of wrappers soon?
We are using the deepq.mlp class to implement reinforcement learning and would like to host it on Google Cloud ML Engine, which requires the model to be exported in the SavedModel format. My understanding is at a beginner level, but I believe that requires us to pass the tf Session and the input and output tensors to the SavedModel builder.
I am not sure exactly how to get those from the deepq.mlp class, or if there is maybe a much better way to do all this. Any help would be appreciated!
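Not a definitive recipe, but a sketch of the TF 1.x SavedModel export, assuming the deepq graph is already built and that you can fetch its session via baselines.common.tf_util (the export path is a placeholder):
import tensorflow as tf
import baselines.common.tf_util as U

sess = U.get_session()  # the session deepq built its graph in
builder = tf.saved_model.builder.SavedModelBuilder("/tmp/exported_model")
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()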
Get this error when I run the first example python3 -m baselines.deepq.experiments.train_cartpole:
/usr/bin/python3: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')
I have both Python 2 and 3 installed. Thus I installed baselines with pip3.
Any suggestions?
When I ran with the CPU it worked fine, but after installing tensorflow-gpu, I got the error below. Perhaps sessions need to be shared across MPI processes? When I set num_cpu to 1, it worked fine.
2017-07-25 21:11:16.630413: E tensorflow/core/common_runtime/direct_session.cc:138] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 11711807488
Traceback (most recent call last):
File "run_atari.py", line 54, in <module>
main()
File "run_atari.py", line 51, in main
train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
File "run_atari.py", line 23, in train
sess = U.single_threaded_session()
File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 233, in single_threaded_session
return make_session(1)
File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 228, in make_session
return tf.Session(config=tf_config)
File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session
.py", line 1292, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 562, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/home/ben/miniconda3/envs/gym/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/framework/erro
rs_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
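One workaround often suggested when several processes share one GPU (an assumption on my part, not a repo-endorsed fix) is to cap each worker's memory instead of letting every MPI process claim the whole device:
import tensorflow as tf

config = tf.ConfigProto(inter_op_parallelism_threads=1,
                        intra_op_parallelism_threads=1)
config.gpu_options.per_process_gpu_memory_fraction = 1.0 / 8  # 8 workers
sess = tf.Session(config=config)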
Hi,
The quality of your DQN implementations is impressive. Looking forward to the continuous control algorithms. Do you have at least a very approximate schedule for when implementations of the DDPG, TRPO, and Q-Prop algorithms will be added?
Best regards,
Viktor
Would love to see DQN for continuous action spaces implemented (https://arxiv.org/pdf/1509.02971.pdf)
Running the visualise command line fails with version incompatibility:
raise error.DeprecatedEnv('Env {} not found (valid versions include {})'.format(id, matching_envs)) gym.error.DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v4', 'BreakoutNoFrameskip-v0'])
Commandline used:
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling
If I run train_pong.py, I get a final score of -20.1, which is close to a random agent and far from the results of the original publication (21.0). Do I have to tweak the parameters?
Neither python2 nor python3 was working:
yhu@yhu-Aspire-M3920:~$ python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/usr/lib/python2.7/runpy.py", line 102, in _get_module_details
loader = get_loader(mod_name)
File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
__import__(pkg)
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 4, in <module>
from baselines.deepq.simple import learn, load # noqa
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/simple.py", line 10, in <module>
from baselines import logger
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/logger.py", line 139
def log(*args, level=INFO):
^
SyntaxError: invalid syntax
yhu@yhu-Aspire-M3920:~$ python3 -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yhu/.local/lib/python3.5/site-packages/baselines/deepq/experiments/atari/enjoy.py", line 15, in
from baselines.common.atari_wrappers_deprecated import wrap_dqn
File "/home/yhu/.local/lib/python3.5/site-packages/baselines/common/atari_wrappers_deprecated.py", line 1, in
import cv2
ImportError: No module named 'cv2'
yhu@yhu-Aspire-M3920:~$
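Two separate failures here, as far as I can tell: the python2 run dies on def log(*args, level=INFO), which is Python-3-only syntax, while the python3 run just lacks OpenCV (pip install opencv-python should provide cv2).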
I'm able to run most of the other pretrained models, but not Breakout. Pong and BeamRider have no problem. Breakout gives a tensorflow shape mismatch error when loading the model parameters. The error happens with all the Breakout models: vanilla, prior, duel, and prior-duel.
My command for the vanilla breakout-1 model:
python -m baselines.deepq.experiments.atari.enjoy --model-dir ~/Temp/models/model-atari-breakout-1 --env Breakout
Error message:
Caused by op 'save/Assign_3', defined at:
File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/miria/baselines/baselines/deepq/experiments/atari/enjoy.py", line 69, in <module>
U.load_state(os.path.join(args.model_dir, "saved"))
File "/Users/miria/baselines/baselines/common/tf_util.py", line 272, in load_state
saver = tf.train.Saver()
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1170, in build
restore_sequentially=self._restore_sequentially)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 155, in restore
self.op.get_shape().is_fully_defined())
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
validate_shape=validate_shape)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
use_locking=use_locking, name=name)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
[[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3)]]
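For what it's worth, the shapes suggest an action-space mismatch: the checkpoint's output layer was trained with 6 actions while the locally created Breakout env exposes only 4. Given the gym/atari-py version issue reported above, pinning older versions (gym<=0.8.2, atari-py<=0.0.21) may restore the 6-action Breakout the models were trained on; that is a guess, not a verified fix.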
The train_cartpole example generates the following warnings:
VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02
~/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
~/.local/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
On macOS Sierra (10.12.5) attempting to run pip install baselines
results in the following error message:
Using cached baselines-0.1.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/setup.py", line 8, in <module>
with open(os.path.join(repo_dir, "README.md")) as f:
IOError: [Errno 2] No such file or directory: '/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/README.md'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/```
Following example fails with error:
$ python -m baselines.deepq.experiments.train_cartpole
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 151, in _run_module_as_main
mod_name, loader, code, fname = _get_module_details(mod_name)
File "/usr/lib/python2.7/runpy.py", line 101, in _get_module_details
loader = get_loader(mod_name)
File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
__import__(pkg)
File "build/bdist.linux-x86_64/egg/baselines/deepq/__init__.py", line 4, in <module>
File "build/bdist.linux-x86_64/egg/baselines/deepq/simple.py", line 10, in <module>
File "/usr/local/lib/python2.7/dist-packages/baselines-0.1.0-py2.7.egg/baselines/logger.py", line 139
def log(*args, level=INFO):
^
SyntaxError: invalid syntax
Hi, the environments here use the Atari games from Gym. Are they exactly the same as the Atari games in ALE?
I ran into an interesting problem today and, while I understand the solution, I'd like to explain it here and inquire about how OpenAI gym and OpenAI baselines are going to handle this going forward. I'm running gym version 0.9.2 and atari-py 0.0.20, which is outdated, but that's the version the pre-trained baselines models here were built against.
According to the docs, env.step returns a done parameter which tells us that:
done (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
Note the emphasis on "you lost your last life". This is true, for instance when I run Breakout:
import gym
import numpy as np

env = gym.make('Breakout-v0')
obs = env.reset()
done = False
steps = 0
while not done:
    obs, rew, done, info = env.step(np.random.randint(env.action_space.n))
    steps += 1
    if done:
        print("done == True")
        print("info: {}".format(info))
        print("steps: {}".format(steps))
The outcome is:
[2017-06-29 10:26:00,612] Making new env: Breakout-v0
done == True
info: {'ale.lives': 0}
steps: 271
However, the baselines code wraps several monitors around the environment, which results in different semantics of the method. To test, I downloaded the pre-trained Breakout-1 model for Prioritized, Dueling DQN. Then I ran the following command:
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1/ --env Breakout --dueling
This runs the enjoy script. The only things I changed from the current master branch (commit 0778e9f) are some print statements and removing the render call, since I was running over ssh. You can see the git diff here:
git diff
diff --git a/baselines/deepq/experiments/atari/enjoy.py b/baselines/deepq/experiments/atari/enjoy.py
index fe482ca..ec5e78e 100644
--- a/baselines/deepq/experiments/atari/enjoy.py
+++ b/baselines/deepq/experiments/atari/enjoy.py
@@ -42,11 +42,13 @@ def play(env, act, stochastic, video_path):
env, video_path, enabled=video_path is not None)
obs = env.reset()
while True:
- env.unwrapped.render()
+ #env.unwrapped.render()
video_recorder.capture_frame()
action = act(np.array(obs)[None], stochastic=stochastic)[0]
obs, rew, done, info = env.step(action)
if done:
+ print("done == True")
+ print("info: {}".format(info))
obs = env.reset()
if len(info["rewards"]) > num_episodes:
if len(info["rewards"]) == 1 and video_recorder.enabled:
@@ -56,6 +58,7 @@ def play(env, act, stochastic, video_path):
video_recorder.enabled = False
print(info["rewards"][-1])
num_episodes = len(info["rewards"])
+ print("we must have finished an episode here now\n")
I ran this, but then I saw this output:
[2017-06-29 10:22:35,014] Making new env: BreakoutNoFrameskip-v4
done == True
info: {'rewards': [], 'steps': 6845, 'ale.lives': 4}
done == True
info: {'rewards': [], 'steps': 9703, 'ale.lives': 3}
done == True
info: {'rewards': [], 'steps': 10228, 'ale.lives': 2}
done == True
info: {'rewards': [], 'steps': 16350, 'ale.lives': 1}
done == True
info: {'rewards': [], 'steps': 22194, 'ale.lives': 0}
846.0
we must have finished an episode here now
done == True
info: {'rewards': [846.0], 'steps': 24488, 'ale.lives': 4}
done == True
info: {'rewards': [846.0], 'steps': 33442, 'ale.lives': 3}
done == True
info: {'rewards': [846.0], 'steps': 35160, 'ale.lives': 2}
done == True
info: {'rewards': [846.0], 'steps': 37665, 'ale.lives': 1}
done == True
info: {'rewards': [846.0], 'steps': 38732, 'ale.lives': 0}
438.0
we must have finished an episode here now
I terminated the run after this, but what happens now is that the done semantics have changed and break from the docs. Instead, to detect when an episode finishes, I have to detect when the "rewards" list has increased in size, or when ale.lives is zero. This doesn't seem as elegant as the previous way of just detecting a single done == True condition.
In conclusion: episode boundaries must now be detected through info, NOT the done condition, despite what the documentation says.
Any comment?
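My reading (an inference, not something from the docs): the deprecated Atari wrappers include an episodic-life wrapper that reports done=True on every life loss, mirroring DeepMind's training setup, so under wrap_dqn a true game-over only shows up as ale.lives == 0 plus a new entry in info["rewards"].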
➜ baselines git:(master) python baselines/pposgd/run_atari.py
Traceback (most recent call last):
File "baselines/pposgd/run_atari.py", line 54, in <module>
main()
File "baselines/pposgd/run_atari.py", line 51, in main
train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
File "baselines/pposgd/run_atari.py", line 18, in train
from baselines.pposgd import pposgd_simple, cnn_policy
File "/Users/Tiger/projects/baselines/baselines/pposgd/pposgd_simple.py", line 3, in <module>
import baselines.common.tf_util as U
File "/Users/Tiger/projects/baselines/baselines/common/tf_util.py", line 2, in <module>
import tensorflow as tf # pylint: ignore-module
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 63, in <module>
from tensorflow.python.framework.framework_lib import *
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/framework_lib.py", line 100, in <module>
from tensorflow.python.framework.subscribe import subscribe
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/subscribe.py", line 26, in <module>
from tensorflow.python.ops import variables
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 26, in <module>
from tensorflow.python.ops import control_flow_ops
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 70, in <module>
from tensorflow.python.ops import tensor_array_ops
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 33, in <module>
from tensorflow.python.util import tf_should_use
File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py", line 28, in <module>
from backports import weakref # pylint: disable=g-bad-import-order
ImportError: cannot import name 'weakref'
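In case it helps: this import failure (from backports import weakref inside TF 1.2's tf_should_use.py) was commonly worked around at the time with pip install backports.weakref, though whether that applies to this Anaconda setup is an assumption.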
Hi,
I am trying to load a subset of the variables from the downloaded models, but it fails with a "Key not found" error, even though the variable names are the same. The only difference is the ':0' at the tails, but I think that does not matter since it is added automatically by the tensorflow op.
Here is what I read from the checkpoints and the "Key not found" error log.
May I kindly ask for some help / hint regarding the following question / problem:
https://stackoverflow.com/questions/44813861/record-activations-of-openai-baselines-implementation
Can PPO or TRPO be executed?
I tried my best but failed.
Thank you~
I am trying to save the Q network and reload it and continue improving it.
This is how I save act
every few episodes:
ActWrapper(act, act_params).save("myfile.pkl")
However, when I load it, I get an error saying that some variables already exist. This is how I load a saved act:
act, train, update_target, debug = deepq.build_train(....)
act = ActWrapper.load("myfile.pkl")
Any idea would be appreciated.
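A guess at the cause: ActWrapper.load rebuilds the graph itself, so calling deepq.build_train first leaves a first copy of the variables behind. A minimal sketch of loading into a clean graph (assuming ActWrapper is importable from baselines.deepq.simple, as in the current source):
import tensorflow as tf
from baselines.deepq.simple import ActWrapper

tf.reset_default_graph()  # drop any previously built deepq variables
act = ActWrapper.load("myfile.pkl")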
The documentation doesn't state upfront that the code requires Python 3 to run; I only realized this when I got an error about no module named builtins.
In addition, the requires.txt file doesn't state gym as a requirement. I realise most people who install this will probably have that module anyway, but in cases where they don't, the earliest they'll realize something is wrong is when they try to execute the python -m baselines.deepq.experiments.train_cartpole example and fail.
The MaxAndSkipEnv as it is right now skips over skip observations and calculates the total reward properly over all the time steps, but the max over the observations is only computed over the last 2 observations, regardless of the skip size. Is this intentional?
One could move the max_frame line into the loop, and then the deque of size 2 could keep track of the max over all the skipped time steps; see the sketch below.
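A sketch of that suggestion as a standalone wrapper (my own variant for illustration, not the repo's code): keep a running max inside the loop so it covers every skipped frame.
import gym
import numpy as np

class MaxOverSkipEnv(gym.Wrapper):
    """Hypothetical MaxAndSkipEnv variant: max over *all* skipped frames."""
    def __init__(self, env, skip=4):
        super(MaxOverSkipEnv, self).__init__(env)
        self._skip = skip

    def _step(self, action):
        total_reward, done, info = 0.0, False, {}
        max_frame = None
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            # Running elementwise max over every frame seen this skip window.
            max_frame = obs if max_frame is None else np.maximum(max_frame, obs)
            total_reward += reward
            if done:
                break
        return max_frame, total_reward, done, info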
Hi, I found that the current implementation of pposgd/run_mujoco.py uses only a single thread. Is it possible to modify it to be multi-threaded like this? Not sure if it would introduce bugs 😢
After upgrading to TensorFlow 1.1, the example python -m baselines.deepq.experiments.train_cartpole stopped working for me. How can it be fixed?
2017-06-01 17:37:06.830729: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,224] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,262] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
2017-06-01 17:37:08.309557: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-06-01 17:37:08.309714: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1550] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
return fn(*args)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
status, run_metadata)
File "C:\Users\Viktor\Anaconda3\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
[[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
[[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
main()
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
callback=callback
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 216, in learn
action = act(np.array(obs)[None], update_eps=exploration.value(t))[0]
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 402, in <lambda>
return lambda *args, **kwargs: f(*args, **kwargs)[0]
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 445, in __call__
results = get_session().run(self.outputs_update, feed_dict=feed_dict)[:-1]
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
run_metadata_ptr)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
[[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
[[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'deepq/q_func/fully_connected/MatMul', defined at:
File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
main()
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
callback=callback
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 178, in learn
grad_norm_clipping=10
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 178, in build_train
act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 111, in build_act
q_values = q_func(observations_ph.get(), num_actions, scope="q_func")
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 27, in <lambda>
return lambda *args, **kwargs: _mlp(hiddens, *args, **kwargs)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 9, in _mlp
out = layers.fully_connected(out, num_outputs=hidden, activation_fn=tf.nn.relu)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1433, in fully_connected
outputs = layer.apply(inputs)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 320, in apply
return self.__call__(inputs, **kwargs)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 290, in __call__
outputs = self.call(inputs, **kwargs)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\core.py", line 144, in call
outputs = standard_ops.matmul(inputs, self.kernel)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1801, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1263, in _mat_mul
transpose_b=transpose_b, name=name)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
op_def=op_def)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
[[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
[[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
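This looks like the same cuBLAS allocation failure reported in the CartPole issue above; the allow_growth workaround sketched there may apply here too.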