miyosuda / unreal

Reinforcement learning with unsupervised auxiliary tasks

License: Other

Shell 0.15% Python 99.85%
tensorflow deepmind-lab reinforcement-learning unreal

unreal's Introduction

UNREAL

CircleCI

About

Replication of the UNREAL algorithm described in DeepMind's paper "Reinforcement learning with unsupervised auxiliary tasks."

https://arxiv.org/pdf/1611.05397.pdf

Implemented with TensorFlow and the DeepMind Lab environment.

Preview

seekavoid_arena_01

stairway_to_melon

nav_maze_static_01

Network

All weights of the convolution layers and the LSTM layer are shared between the base network and the auxiliary task networks.
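
For reference, this kind of sharing is typically done with TensorFlow variable reuse; below is a minimal sketch (TF 1.x; layer names and shapes are illustrative, not copied from model.py):

import tensorflow as tf

def conv_trunk(x, reuse=False):
  # Shared convolutional trunk: the second call with reuse=True picks up
  # the variables created by the first call instead of creating new ones.
  with tf.variable_scope("base_conv", reuse=reuse):
    h = tf.layers.conv2d(x, 16, 8, strides=4, activation=tf.nn.relu, name="conv1")
    h = tf.layers.conv2d(h, 32, 4, strides=2, activation=tf.nn.relu, name="conv2")
  return h

base_input = tf.placeholder(tf.float32, [None, 84, 84, 3])
aux_input  = tf.placeholder(tf.float32, [None, 84, 84, 3])
base_features = conv_trunk(base_input)             # creates the weights
aux_features  = conv_trunk(aux_input, reuse=True)  # reuses the same weights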

Requirements

  • TensorFlow (Tested with r1.0)
  • DeepMind Lab
  • numpy
  • cv2
  • pygame
  • matplotlib

Result

"seekavoid_arena_01" Level

seekavoid_01_score

"nav_maze_static_01" Level

nav_maze_static_01_score

How to train

First, download and install DeepMind Lab

$ git clone https://github.com/deepmind/lab.git

Then build it by following the build instructions: https://github.com/deepmind/lab/blob/master/docs/build.md

Clone this repo inside the lab directory.

$ cd lab
$ git clone https://github.com/miyosuda/unreal.git

Add this Bazel declaration at the end of the lab/BUILD file:

package(default_visibility = ["//visibility:public"])

Then run the following bazel command to start training:

bazel run //unreal:train --define headless=glx

--define headless=glx uses GPU rendering, which requires the display to stay awake, so display sleep needs to be disabled.

If you have any trouble with GPU rendering, use software rendering with the --define headless=osmesa option.
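
For example, to train with software rendering:

bazel run //unreal:train --define headless=osmesa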

How to show result

To show the result after training, run this command:

bazel run //unreal:display --define headless=glx

unreal's People

Contributors

arpit15, miyosuda


unreal's Issues

Error when running `unreal:display`

Hello,
when I was running your code, I got an error for the display target:

Fatal Python error: (pygame parachute) Segmentation Fault
Assertion 'pa_atomic_load(&(b)->_ref) > 0' failed at pulsecore/memblock.c:597, function pa_memblock_unref(). Aborting.

It seems to happen when resetting the environment, specifically when checking the argument of the if statement in Com_init (gc->command_line appears to be empty -- do you know if it should be?). It looks like a problem within the lab itself, but on the other hand I have run the lab's own tests and they passed (including showing some video of the random agent).

Do you have any idea what could be wrong and how to debug it?
Thanks for the help and for providing the library :)

A question about visualize.py

I tested the script on an Atari game, but during visualization I got the graph below instead of a curve of reward versus the number of training steps, and I don't know what caused it. I hope someone can clarify. Thanks.

image

About reward prediction task

Hello, I found that the authors of UNREAL sample such that zero rewards and non-zero rewards are equally represented in the reward prediction task, as described in section 3.2, but it seems that the code doesn't do this. Is there something wrong?

Thanks.
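
For reference, the skewed sampling described in section 3.2 can be sketched as below (an illustrative sketch, not this repository's code; zero_reward_indices and nonzero_reward_indices are assumed to be index lists into the replay buffer):

import random

def sample_rp_start(zero_reward_indices, nonzero_reward_indices):
  # With probability 0.5 start the reward-prediction sequence just before a
  # non-zero reward, otherwise before a zero reward, so both cases are
  # equally represented in the training batches.
  if nonzero_reward_indices and random.random() < 0.5:
    return random.choice(nonzero_reward_indices)
  return random.choice(zero_reward_indices)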

Log uniform learning rates

Great implementation, thanks for sharing!

I'm trying to understand the log uniform learning rates (p12 of the UNREAL paper, line 27 of main.py). My understanding of the implementation would be to randomly select a value in the log uniform distribution between the alpha_low and alpha_high values, something like

def loguniform(low=0, high=1, size=None):
    return np.exp(np.random.uniform(low, high, size))

(from https://stackoverflow.com/a/43977980). Can you comment on the "initial_alpha_log_rate" option, and on the static learning rate used for each training thread?
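
For comparison, a log-uniform sample between alpha_low and alpha_high is usually drawn by interpolating in log space and exponentiating; a minimal sketch (variable names assumed, not necessarily matching main.py):

import math
import random

def log_uniform(lo, hi):
  # Uniform in log space between lo and hi, then mapped back with exp.
  r = random.random()
  return math.exp(math.log(lo) * (1.0 - r) + math.log(hi) * r)

# e.g. one fixed learning rate drawn per training thread at startup:
learning_rate = log_uniform(1e-4, 5e-3)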

Another note I have is around maze_environment.py. I tried training it with the default values (just to make sure everything was running smoothly), and it looked like it was diverging. Changing the hit reward to -0.001 (instead of -1) allows it to train with the default hyperparameters. If that's unexpected let me know, otherwise I can submit a PR.

Parameter configuration nav_maze_static_01

I tried to reproduce your learning curve for nav_maze_static_01, but after 10 million steps it hasn't learnt much, whereas your curve looks far better. Did you use a parameter configuration different from the one in your commits to generate this curve, and if so, would you mind sharing it?

The implementation?

Hi, thanks for posting the code, really nice work!

I remember when I checked this repository a few weeks ago the readme.md said it did not work well - even worse than base A3C over the Atari game "breakout".

Now the code seems to work well. Could you comment on what improvements you made to achieve this? Any technical details/tricks to share? Any pitfalls to avoid? Thanks so much!

use deepmind lab as a simple python module?

Sorry, I do not know much about Bazel.
If I want to use DeepMind Lab, do I have to run bazel run //unreal:train --define headless=osmesa?
Is it possible to simply run python train.py?
How do you debug with deepmind_lab (with bazel run xxx)?
I ran bazel run //unreal:train --define headless=osmesa, but perhaps due to the TensorFlow 1.0 upgrade some errors appear, even after I adjusted the code for tf1.0.
Also, thank you for sharing the code!

Replicating inputs for VR and PC

Hello Miyosuda, I have observed that you replicate the input of the base network for PC and VR, duplicating the conv layers and reusing their kernels/parameters. Are you sure this is required? To my understanding, TF should be able to compute PC/VR by just reusing self.base_rnn_outputs, since it only processes the operations required to produce the requested data.

Regards and thanks for sharing your code.

Original RGB image from the lab simulator

Hi @miyosuda ,
Could you please suggest the best place to obtain the raw RGB image/frame before it is converted to grayscale etc.? I wish to preprocess the raw frames before they are used in the algorithm.

In trainer.py, self.environment.last_state seems to be the processed grayscale image frame, and new_state in new_state, reward, terminal, pixel_change = self.environment.process(action) also seems to return the current image frame via image = self._get_current_image().

The following function is the one I am not sure about: what is _put_pixel's purpose here?

  def _get_current_image(self):
    image = np.array(self._maze_image)
    self._put_pixel(image, self.x, self.y, 1)
    return image

lab_environment.py, on the other hand, uses the following preprocessing:

  def _preprocess_frame(self, image):
    image = image.astype(np.float32)
    image = image / 255.0
    return image

Could you please confirm whether my understanding is correct, and also where exactly the raw RGB image (before resizing) is obtained in the maze environment? I really appreciate your input. Thanks a lot.

How to change the env_name

Sorry, I would like to know where to change the env_name; I can't find it. I changed it in options but it has no effect.

Issues in _process_base and _process_pc

This is really nice work! I have some questions about the implementation details. Each "frame" stores prev_state, action, reward, terminal, last_action and last_reward, so the observation from "new_state" is not stored and is only available through the "self.environment" object. In trainer.py line 212, since new_state has been reached, shouldn't the action_reward come from environment.last_action and environment.last_reward instead of from frame?
Also, in trainer.py line 255, the observation from the new state is not stored in frame. Specifically, pc_experience_frames[0].terminal (from the 21st frame) indicates whether the 22nd state is terminal, but the bootstrap R = 0 or R = max_a Q(s, a) is computed for the 21st state.
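
For context, the return in question is the usual n-step bootstrap; below is a schematic sketch (field and variable names are hypothetical, and the return is treated as a scalar for clarity, whereas pixel control actually uses a per-cell map):

from collections import namedtuple

# Hypothetical stand-in for the repo's ExperienceFrame (only the fields used here).
Frame = namedtuple("Frame", ["pixel_change", "terminal"])

GAMMA_PC = 0.9  # assumed pixel-control discount factor
frames = [Frame(0.2, False), Frame(0.1, False), Frame(0.3, False)]  # newest first

# Bootstrap from the newest state: 0 if it is terminal, else max_a Q(s, a).
R = 0.0 if frames[0].terminal else 1.5  # 1.5 stands in for max_a Q(s, a)

returns = []
for frame in frames[1:]:
  # Accumulate discounted pixel-change rewards backwards through the sequence.
  R = frame.pixel_change + GAMMA_PC * R
  returns.append(R)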

bazel failed to build, not visible from target

I am new to Bazel and am unable to get it to build.

I added package(default_visibility = ["//visibility:public"]) to the bottom of the build file in the lab folder

I have tried building with the osmesa option as well as the glx option.
Here is the error output when I run the command from the unreal folder:

teves@teves:~/RL_Codes/lab/unreal$ bazel run //unreal:train --define headless=glx
ERROR: /home/teves/RL_Codes/lab/unreal/BUILD:11:1: Target '//:deepmind_lab.so' is not visible from target '//unreal:train'. Check the visibility declaration of the former target if you think the dependency is legitimate
ERROR: Analysis of target '//unreal:train' failed; build aborted: Analysis of target '//unreal:train' failed; build aborted
INFO: Elapsed time: 0.593s
FAILED: Build did NOT complete successfully (0 packages loaded)
ERROR: Build failed. Not running target

I am in a conda environment.

What am I doing wrong?

Base Process Questions

Hi @miyosuda

thanks a lot for your code.
I was wondering: how would decreasing local_tmax impact the base process and learning?

Currently, the code uses 20 time steps inside the base process, and hence the for loop runs for 20 steps. I want to speed up the code - would reducing local_tmax help?

In your view, how can we speed up the learning? Thanks a lot for your input.

Training time

Hi,
The figures in the results section plot rewards vs time-steps during training (for seekavoid_arena and static_maze). How much host time (average) does each time-step take?

How to plot the reward diagram?

When I type:

tensorboard --logdir=./tmp/unreal_log

nothing shows up in the "Scalars" section. Can you describe in the README.md how to do it?

Edit: never mind, there is a "score" section.

Shouldn't A3C updates already happen before filling the replay buffer?

First of all, great work @miyosuda !

I studied your project and the DeepMind paper, but as far as I understood the paper, I think the base A3C algorithm should already start updating the base A3C part of the network from the start. After a while, when the replay buffer is filled, the auxiliary tasks also begin to be trained.

Is my assumption correct?

Questions about Experience Replay Buffer

Hi @miyosuda

Thanks again for the open-source code implementation. It is of great help.
I have a question about the way the experience replay buffer is filled.

In main.py, when the top-level process is called for each environment,

diff_global_t = trainer.process(self.sess,
                                self.global_t,
                                self.summary_writer,
                                self.summary_op,
                                self.score_input)

the replay buffer is filled in the lines below, since the experience buffer will not be full at the start - am I correct?

    # Fill experience replay buffer
    if not self.experience.is_full():
      self._fill_experience(sess)
      return 0

Then, inside the base A3C process, we keep adding new frames in the lines below:

      frame = ExperienceFrame(prev_state, reward, action, terminal, pixel_change,
                              last_action, last_reward)

      # Store to experience
      self.experience.add_frame(frame)

So just to confirm: the _process_base function always controls what goes into the experience replay - is this understanding of the implementation correct? And at first, the auxiliary tasks (VR, RP, PC) use the experience frames from the initial filling that happened outside the base process - is that right? Am I missing something?

Thank you for your time in clarification on these doubts.

the score is not reset to 0 when episode terminates

In the 'nav_maze_static_01' environment, each apple counts as 1 point and the final goal as 10 points, so a score of 80+ does not seem reasonable. By running the display process, I observe that the score is not reset to 0 when some episodes terminate.

Is this normal? Or designed for some reason?

failed to open library - ./libdmlab.so

I tried running unreal:train --define headless=osmesa, but got the following error.

Failed to open library! - ./libdmlab.so
dlopen: cannot load any more object with static TLS
Process Process-1:
Traceback (most recent call last):
  File "/part/01/Tmp/lisa/os_v5/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/part/01/Tmp/lisa/os_v5/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/data/lisa/exp/sygnowsj/lab/unreal/environment/lab_environment.py", line 26, in worker
    'height': str(84)
RuntimeError: Failed to connect RL API

Do you know why it happens? (the libdmlab.so library is there in bazel-bin)

Value function replay and pixel control not uses LSTM context - why?

@miyosuda , thank you for your code!
One point is unclear to me:
when calculating the value function replay you always reset the LSTM state, thus assuming it is always the 'start of an episode'.
In model.py, lines 229-249:

  def _create_vr_network(self):
    # State (Image input)
    self.vr_input = tf.placeholder("float", [None, 84, 84, 3])

    # Last action and reward
    self.vr_last_action_reward_input = tf.placeholder("float", [None, self._action_size+1])

    # VR conv layers
    vr_conv_output = self._base_conv_layers(self.vr_input, reuse=True)

    # pc lastm layers
    vr_initial_lstm_state = self.lstm_cell.zero_state(1, tf.float32)
    # (Initial state is always resetted.)
    
    vr_lstm_outputs, _ = self._base_lstm_layer(vr_conv_output,
                                               self.vr_last_action_reward_input,
                                               vr_initial_lstm_state,
                                               reuse=True)
    # value output
    self.vr_v  = self._base_value_layer(vr_lstm_outputs, reuse=True)

Wouldn't it be correct to store the LSTM context state in the replay buffer frame and pass it along with the state and action-reward inputs?
Added: the same question applies to the 'pixel control' estimation.

Clarification on flag grad_norm_clip

When I try to run this codebase, the flag --grad_norm_clip =40.0 throws the error 'Non-boolean argument to boolean flag'. Could you clarify whether this should be 40.0 or some boolean value depending on whether we want clipped gradients or not?

Thanks
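
For what it's worth, that error message comes from gflags-style parsing when a non-boolean value is passed to a flag registered as boolean; the two definition styles with tf.app.flags look like this (a sketch, not necessarily how options.py defines the flag):

import tensorflow as tf

# A float-valued flag accepts --grad_norm_clip=40.0 on the command line.
tf.app.flags.DEFINE_float("grad_norm_clip", 40.0, "gradient norm clipping")

# A boolean flag only accepts --flag, --noflag or --flag=true/false; passing
# a number to it produces "Non-boolean argument to boolean flag".
tf.app.flags.DEFINE_boolean("use_pixel_change", True, "whether to use the pixel change task")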

Feature Control

The paper lists feature control as an auxiliary task. Quoting the paper,

We train agents that learn a separate policy for maximally activating each of the units in a specific hidden layer. We refer to these tasks as feature control.

The paper also does not provide any supplementary details in the appendix showing how these can be implemented.

Has this auxiliary task been implemented? I cannot find it in the code.

Also Mnih mentions on OpenReview that

We have updated the paper with improved results for feature control. We used a target network to make the features being controlled change less frequently during training. Feature control now works roughly as well as pixel control.

always display "Map loaded: 'nav_maze_static_01"

Hello, it keeps printing "Map loaded: 'nav_maze_static_01'", like this:

Map loaded: 'nav_maze_static_01'
Map loaded: 'nav_maze_static_01'
Map loaded: 'nav_maze_static_01'
(...the same line repeated many more times...)

So I would like to know: is it working correctly?

Performance across multiple runs

@miyosuda Have you tried multiple runs of an experiment?
With the code as-is, when I try multiple runs the variance across runs is huge. I was wondering if you have any insights about this? Thanks a lot for your input.

Builds complete successfully, but no results are shown.

I have followed the instructions and the build completes successfully. However, when I try to display the results, a black empty window opens for a couple of seconds and then closes, and the terminal then sits idle at the end of the following output:

INFO: Analysed target //unreal:display (0 packages loaded).
INFO: Found 1 target...
Target //unreal:display up-to-date:
bazel-bin/unreal/display
INFO: Elapsed time: 0.237s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/unreal/display --env_type lab --env_name nav_maze_static_01 --use_pixel_change True --use_value_replay True --use_reward_pINFO: Build completed successfully, 1 total action
/usr/local/lib/python2.7/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.1) or chardet (2.3.0) doesn't match a supported version!
RequestsDependencyWarning)
/usr/local/lib/python2.7/dist-packages/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
warnings.warn(warning, RequestsDependencyWarning)
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
2019-01-02 13:05:30.170104: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Failed to get file attribute.
WARNING:tensorflow:From /home/behroozmb/lab/unreal/model/model.py:55: __init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
WARNING:tensorflow:From /home/behroozmb/lab/unreal/model/model.py:216: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/behroozmb/lab/unreal/model/model.py:223: calling reduce_max (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Using deprecated observation format: 'RGB_INTERLACED'
Could not find old checkpoint
Traceback (most recent call last):
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 333, in
tf.app.run()
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 314, in main
display.update(sess)
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 108, in update
self.process(sess)
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 250, in process
last_action_reward)
File "/home/behroozmb/lab/unreal/model/model.py", line 374, in run_base_policy_value_pc_q
self.base_initial_lstm_state1 : self.base_lstm_state_out[1]} )
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in do_run
run_metadata)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value net_-1/base_policy/W_base_fc_p
[[node net_-1/base_policy/W_base_fc_p/read (defined at /home/behroozmb/lab/unreal/model/model.py:441) = IdentityT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'net_-1/base_policy/W_base_fc_p/read', defined at:
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 333, in
tf.app.run()
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 287, in main
display = Display(display_size)
File "/home/behroozmb/.cache/bazel/_bazel_behroozmb/bb2529cc1fa3397d81b46cf47dc2b994/execroot/org_deepmind_lab/bazel-out/k8-fastbuild/bin/unreal/display.runfiles/org_deepmind_lab/unreal/display.py", line 99, in init
for_display=True)
File "/home/behroozmb/lab/unreal/model/model.py", line 49, in init
self._create_network(for_display)
File "/home/behroozmb/lab/unreal/model/model.py", line 58, in _create_network
self._create_base_network()
File "/home/behroozmb/lab/unreal/model/model.py", line 101, in _create_base_network
self.base_pi = self._base_policy_layer(self.base_lstm_outputs) # policy output
File "/home/behroozmb/lab/unreal/model/model.py", line 152, in _base_policy_layer
W_fc_p, b_fc_p = self._fc_variable([256, self._action_size], "base_fc_p")
File "/home/behroozmb/lab/unreal/model/model.py", line 441, in _fc_variable
weight = tf.get_variable(name_w, weight_shape, initializer=fc_initializer(input_channels))
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1487, in get_variable
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1237, in get_variable
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 540, in get_variable
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 492, in _true_getter
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 922, in _get_single_variable
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 183, in call
return cls._variable_v1_call(*args, **kwargs)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 146, in _variable_v1_call
aggregation=aggregation)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 125, in
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 2444, in default_variable_creator
expected_shape=expected_shape, import_scope=import_scope)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 187, in call
return super(VariableMetaclass, cls).call(*args, **kwargs)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 1329, in init
constraint=constraint)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 1491, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 81, in identity
return gen_array_ops.identity(input, name=name)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3454, in identity
"Identity", input=input, name=name)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/behroozmb/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value net_-1/base_policy/W_base_fc_p
[[node net_-1/base_policy/W_base_fc_p/read (defined at /home/behroozmb/lab/unreal/model/model.py:441) = IdentityT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

^CProcess Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/behroozmb/lab/unreal/environment/lab_environment.py", line 29, in worker
command, arg = conn.recv()
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 325, in _exit_function
p.join()
File "/usr/lib/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 325, in _exit_function
p.join()
File "/usr/lib/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)

Any idea how I can solve this problem? Is the root of the problem in the build process or the display process?

Thanks

lstm only have one cell?

The paper says "all agents used an LSTM with 256 cells"; however, this code seems to use just one cell with 256 units.
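
In TensorFlow terms, the paper's "256 cells" means one LSTM layer whose hidden state has 256 units, not 256 stacked layers; for example (TF 1.x, illustrative):

import tensorflow as tf

# A single LSTM layer with a 256-dimensional hidden/cell state -- what the
# paper refers to as "an LSTM with 256 cells".
lstm_cell = tf.contrib.rnn.BasicLSTMCell(256, state_is_tuple=True)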
