hardmaru / worldmodelsexperiments

World Models Experiments

Step-by-step instructions for reproducing World Models (pdf).

Update (May 26, 2020): If you are looking for a recent implementation that uses a more updated framework, refer to this implementation of World Models using TensorFlow 2.2 by @zacwellmer that reproduces all the experiments in our paper in a Docker container.

Please see blog post for step-by-step instructions.

Note regarding OpenAI Gym Version

Please note the library versions in the blog post. In particular, the experiments work on gym 0.9.x and do NOT work on gym 0.10.x. You can install the older versions with pip, e.g. pip install gym==0.9.4 and pip install numpy==1.13.3.
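
A quick way to guard against the wrong gym series before running anything is a small version check. The sketch below is illustrative only: the helper names are made up for this example and are not part of the repository.

```python
def is_supported_gym(version):
    """Return True for gym 0.9.x, the series the note above says works."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "9"


def installed_gym_version():
    """Return the installed gym version string, or None if gym cannot be imported."""
    try:
        import gym
    except Exception:
        # Covers a missing package as well as import-time failures.
        return None
    return getattr(gym, "__version__", None)
```

Calling is_supported_gym(installed_gym_version() or "") at the top of a script gives an early, readable failure instead of an obscure error deep inside the environment code.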

Citation

If you find this project useful in an academic setting, please cite:

@incollection{ha2018worldmodels,
  title = {Recurrent World Models Facilitate Policy Evolution},
  author = {Ha, David and Schmidhuber, J{\"u}rgen},
  booktitle = {Advances in Neural Information Processing Systems 31},
  pages = {2451--2463},
  year = {2018},
  publisher = {Curran Associates, Inc.},
  url = {https://papers.nips.cc/paper/7512-recurrent-world-models-facilitate-policy-evolution},
  note = "\url{https://worldmodels.github.io}",
}

Issues

For general discussion about the World Model article, there are already some good discussion threads here in the GitHub issues page of the interactive article. Please raise issues about this specific implementation in the issues page of this repo.

Licence

MIT

worldmodelsexperiments's Issues

Gray screen on running dream_model.py

python3 model.py render log/carracing.cma.16.64.best.json executes without errors, but for some reason the car wanders slowly and aimlessly. I've tracked it down to python3 dream_model.py log/carracing.cma.16.64.best.json generating a grey screen. My theory is that the two errors are correlated, but I'm a little lost as to what would cause the grey screen.

The default box2d install gives an error related to a swig dependency mismatch at runtime, so I cloned the box2d repo and ran setup.py clean/build/install.

Code run: python3 dream_model.py log/carracing.cma.16.64.best.json

Expected output: Virtual data generated by model rendered to screen in popup

Actual output: Popup with 'dream_model.py' title but gray image

Stack:

$lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.10
Release:	18.10
Codename:	cosmic
$ python3 --version
Python 3.6.7
$ pip3 show gym
Name: gym
Version: 0.9.2
Summary: The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
Home-page: https://github.com/openai/gym
Author: OpenAI
Author-email: [email protected]
License: UNKNOWN
Location: /home/vincent/.local/lib/python3.6/site-packages
Requires: requests, pyglet, six, numpy
$ pip3 show numpy
Name: numpy
Version: 1.16.0
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: None
License: BSD
Location: /home/vincent/.local/lib/python3.6/site-packages
Requires: 
$ pip3 show tensorflow
$ pip3 show tensorflow-gpu
Name: tensorflow-gpu
Version: 1.12.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /home/vincent/.local/lib/python3.6/site-packages
Requires: astor, tensorboard, grpcio, wheel, keras-applications, absl-py, keras-preprocessing, termcolor, six, protobuf, gast, numpy
$ pip show box2d
$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Box2D
>>> Box2D.__version__
'2.3.2'

Sketchy logic in vae_train.py:create_dataset

I'd submit a pull request, but the logic is likely going to get heavily overwritten by a memory-efficient loader, so I'll post this here and we'll see what happens.

As written (hardmaru): load N episodes and add M*N total images to dataset.
As written (Chazzz): load N episodes and add a maximum of M images each to dataset.

--- a/vae_train.py
+++ b/vae_train.py
@@ -47,19 +47,50 @@ def create_dataset(filelist, N=10000, M=1000): # N is 10000 episodes, M is 1000 num
   for i in range(N):
     filename = filelist[i]
     raw_data = np.load(os.path.join("record", filename))['obs']
-    l = len(raw_data)
+    l = min(len(raw_data), M)
-    if (idx+l) > (M*N):
-      data = data[0:idx]
-      print('premature break')
-      break
-    data[idx:idx+l] = raw_data
+    data[idx:idx+l] = raw_data[0:l]
     idx += l
     if ((i+1) % 100 == 0):
       print("loading file", i+1)
+
+  if len(data) == M*N and idx < M*N:
+    data = data[:idx]
   return data
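
The per-episode cap proposed in the diff can be sketched without touching the file system. This is a simplified illustration, not the patch itself: it works on in-memory arrays, sizes the buffer to fit instead of preallocating M*N frames and trimming, assumes a non-empty episode list, and uses a hypothetical function name.

```python
import numpy as np


def stack_episodes(episodes, max_frames_per_episode):
    """Concatenate episodes, keeping at most max_frames_per_episode frames from each."""
    capped = [ep[:max_frames_per_episode] for ep in episodes]
    total = sum(len(ep) for ep in capped)
    frame_shape = capped[0].shape[1:]
    data = np.zeros((total,) + frame_shape, dtype=np.uint8)
    idx = 0
    for ep in capped:
        data[idx:idx + len(ep)] = ep
        idx += len(ep)
    return data


# Three fake episodes of 5, 12 and 8 frames; episode i is filled with the value i.
episodes = [np.full((n, 64, 64, 3), i, dtype=np.uint8) for i, n in enumerate([5, 12, 8])]
data = stack_episodes(episodes, max_frames_per_episode=10)
print(data.shape)  # 5 + 10 + 8 frames
```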

MDN RNN Loss

Hi,

I wonder what the loss curve looked like for you when training the Doom MDN-RNN.

For me, the negative log-likelihood of the next latent state prediction saturates at around 1.0, which corresponds to a likelihood of about exp(-1) ≈ 0.37, which seems low. I am reimplementing this paper, and the training of the VAE and the ES controller has been confirmed to work. However, the gap between the real and dream worlds is too large for the controller, as seen in the figure below. I suspect the MDN-RNN was not trained well.
May I have your opinion?
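
One caveat when reading the number above: if the latent variables are continuous, the loss is a negative log density, so exp(-loss) is a density value and not a probability, and it can exceed 1. The sketch below uses a single Gaussian rather than the mixture an MDN-RNN predicts, and the function name is illustrative.

```python
import math


def gaussian_nll(x, mu, sigma):
    """Negative log of the Gaussian density N(x; mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2


# A perfect prediction (x == mu) under two different predicted widths.
wide = gaussian_nll(0.0, 0.0, 1.0)
narrow = gaussian_nll(0.0, 0.0, 0.1)
print(round(wide, 3), round(narrow, 3))  # 0.919 -1.384
```

A unit-variance Gaussian evaluated at its own mean already gives an NLL of about 0.92, so a plateau near 1.0 is easier to judge against a baseline (for example, the NLL of a model that ignores its input) than as a probability.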

Ask for advice

Hi, thanks for your work. I have a problem I would like your advice on.
In these two experiments, the goal is visible in the original image. But what if it is not? I am trying to use World Models to solve a robot navigation task. The state is {image, relative_position}: the image lets the robot avoid obstacles, and relative_position tells it where the goal is. I use the VAE to compress the captured image and the MDN-RNN to remember the environment. The outputs of the VAE and MDN-RNN are then concatenated with relative_position as the controller's input. But it does not work. I then tried different controller structures (I use a DNN as the controller), and it still does not work. Any advice?
Details: the image is 64x48x3 and the latent dimension of the VAE is 32. The training result is good.

The RNN has 256 units, and its training result is also good.

About the new version

Hi, thanks for your work. I see that some files were updated, such as 03_generate_rnn_data.py and 04_train_rnn.py. Is that because the previous versions had an error? (I used the earlier versions to train my agent.) I read the updated code and found it almost the same, except that the RNN can now predict reward. Is that right?

Memory leak and potential fix

Hi @hardmaru!
Thanks for a great repo and a really cool project!!
I was running the 'carracing' experiment but experienced memory problems. They were caused by new TensorFlow assign operations being created every time set_model_params (in rnn.py and vae.py) is called.
I implemented a fix in which the assign operations are created once, when the graph is built, and reused when setting the parameters.
If you are interested in the fix, I can make a PR, or you can simply copy and paste it from here (look at the diffs in rnn.py and vae.py).

tf_vae.json empty after running vae_train.py

Greetings,

I am trying to reproduce the experiment on a DGX Station I currently have access to, and the first two steps look all right, but the result of the command:

$ python vae_train.py
...
step 298000 35.82913 3.7688284 32.0603
step 298500 34.947067 2.9355032 32.011562
step 299000 35.83263 3.8249977 32.007633
step 299500 36.45114 4.418231 32.03291
step 300000 35.098816 3.0974069 32.001408
step 300500 35.483387 3.4664068 32.01698
step 301000 35.43274 3.4285662 32.004173

is an empty array:

$ cat tf_vae/vae.json 
[]

According to the documentation, the model should be saved to that file, so any hint about where to look for the problem is appreciated.

Thanks,

Alfredo

PS: I am using the following Dockerfile to recreate the environment in the paper, in case it might be relevant:

FROM tensorflow/tensorflow:1.8.0-gpu-py3

# gym-doom requirements
RUN apt-get update && apt-get install -y --no-install-recommends \
        cmake \
        zlib1g-dev \
        libjpeg-dev \
        libboost-all-dev \
        gcc \
        libsdl2-dev \
        wget \
        unzip \
        python3-tk \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# make python3 the default
RUN update-alternatives --remove python /usr/bin/python2 && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# NOTE overriding numpy version to match the paper's
# NOTE numpy==1.13.3 gives an error importing vizdoom
RUN pip install --upgrade pip && \
    pip install --no-cache-dir --user --upgrade \
        gym==0.9.4 \
        ppaquette-gym-doom==0.0.6 \
        cma==2.2.0  \
        mpi4py==2.0.0

ENTRYPOINT ["/bin/bash"]

CMA-ES training shows a high average reward, but evaluating the saved model gives rewards of practically zero

Hi @hardmaru

Thanks for posting this repo. I have a strange issue: I see a very promising curve for the training of my CMA-ES model, but I cannot replicate the results when I execute the following command.

python3.5 model.py log/filewiththe best stats.json

I am using a custom environment.

I would also like to ask about the number of processors used for training the CMA-ES model. I used 16 processors and also 48 processors (I couldn't use 64 because I ran out of memory). Do you think reducing the number of processors for training the CMA-ES model will have an adverse effect?

Kindly advise.
Rohit

ImportError: No module named 'doom_py'

This happens when running "python model.py doomreal render log/doomrnn.cma.16.64.best.json".

Questions about the doom's action space

We know that Doom's action space is discrete(3).

Have you tried using discrete actions instead of converting them to a continuous action?

Would a single continuous action be more beneficial or efficient for the evolution strategy?

cannot reshape array of size 51529 into shape (227,227,3)

import matplotlib.pyplot as plt
import numpy as np

# process_test_data(), model and IMG_SIZE are defined elsewhere in the reporter's script.
test_data = process_test_data()
test_data = np.load('test_data.npy')

fig = plt.figure()

for num, data in enumerate(test_data[:12]):
    img_num = data[1]
    # Presumably the call that raises the error in the issue title.
    img_data = data[0].reshape(-1, IMG_SIZE, IMG_SIZE, 3)

    y = fig.add_subplot(3, 4, num + 1)
    orig = img_data
    data = img_data
    model_out = model.predict([data])[0]

    if np.argmax(model_out) == 1:
        str_label = 'Dog'
    else:
        str_label = 'Cat'

    y.imshow(orig, cmap='gray')
    plt.title(str_label)
    y.axes.get_xaxis().set_visible(False)
    y.axes.get_yaxis().set_visible(False)

plt.show()

Cannot install doom-py; is there anything to be done beyond the install instructions?

I tried several times to install doom-py, including 'pip3 install doom-py', installing from the source code, and 'pip3 install ppaquette-gym-doom', but it always failed. Is there something else I can do?

Train RNN loss goes to NaN

Hi, I am trying to replicate your results in PyTorch.
My problem is that the loss goes to NaN when training the RNN: the loss first drops from 2.x to 1.0x and then suddenly becomes NaN.
I wonder whether you have faced and solved this problem as well, since I noticed an unused epsilon in your doomrnn.py file (although in the VAE part). I tried increasing the Adam optimizer's epsilon, which solves the problem, but the loss then drops much more slowly.
Do you have any suggestions?
Bin.
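
One common source of inf or NaN in mixture-density losses is evaluating log(sum(pi * N(...))) directly: when a target lies far from every component, the exponentials underflow to zero and the log blows up. It is only a guess that this applies to the report above. The NumPy sketch below contrasts the direct form with a log-sum-exp form for a 1-D Gaussian mixture; all function names are illustrative and none of this is taken from the repository.

```python
import numpy as np


def logsumexp(a, axis=-1):
    """log(sum(exp(a))) along axis, shifted by the max for numerical stability."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def component_log_probs(y, logit_pi, mu, log_sigma):
    """log(pi_k) + log N(y; mu_k, sigma_k^2) for targets y of shape (batch,)."""
    log_pi = logit_pi - logsumexp(logit_pi)[:, None]
    z = (y[:, None] - mu) / np.exp(log_sigma)
    log_normal = -0.5 * z ** 2 - log_sigma - 0.5 * np.log(2.0 * np.pi)
    return log_pi + log_normal


def mdn_nll_stable(y, logit_pi, mu, log_sigma):
    return -np.mean(logsumexp(component_log_probs(y, logit_pi, mu, log_sigma)))


def mdn_nll_naive(y, logit_pi, mu, log_sigma):
    density = np.sum(np.exp(component_log_probs(y, logit_pi, mu, log_sigma)), axis=-1)
    with np.errstate(divide="ignore"):
        return -np.mean(np.log(density))


# A target 50 standard deviations away from both mixture components.
y = np.array([50.0])
logit_pi = np.zeros((1, 2))
mu = np.zeros((1, 2))
log_sigma = np.zeros((1, 2))
naive = mdn_nll_naive(y, logit_pi, mu, log_sigma)    # exp() underflows, so this is inf
stable = mdn_nll_stable(y, logit_pi, mu, log_sigma)  # finite, about 1250.9
print(naive, stable)
```

In an autodiff framework the same underflow tends to surface as NaN gradients. Bounding the predicted log-sigma is another commonly used safeguard.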

question about temperature adjustment in MDN-RNN

Porting to other gym environments

I wanted to ask where exactly I should make changes in order to port this to other gym environments. For example, what should I modify in the doomreal.py file? (I believe most of the changes are required there; the other files should need only a bit of refactoring.)

Full episodes when extracting data

What's the reasoning behind using full 1000-frame episodes even when the game fails before 1000 steps? It seems to result in a lot of useless images (the ones after failure) in training.

render_mode=False is useless. Fake option not to render

render_mode = False means that 'rgb_array' is used instead of 'human'. Somehow it still renders.

Takes more than 12 hours for training VAE Model

I used a machine with 24 vCPUs, 220 GB RAM, a 200 GB hard drive and 4 P100 GPUs to run gpu_bash.bash. Even after training the VAE model for more than 12 hours, training has not finished: 37500 steps are completed, with loss at 37.4, recon_loss at 5.4 and kl_loss at 32. I removed the line os.environ["CUDA_VISIBLE_DEVICES"]="0" in vae_train.py so that multiple GPUs could be used.

python train.py gives a CalledProcessError

When I run python train.py on the specified CPU system I get a very long error message ending with,
Traceback (most recent call last):
  File "train.py", line 450, in <module>
    if "parent" == mpi_fork(args.num_worker+1): os.exit()
  File "train.py", line 424, in mpi_fork
    subprocess.check_call(["mpirun", "-np", str(n), sys.executable] +['-u']+ sys.argv, env=env)
  File "/home/neptune/anaconda3/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '65', '/home/neptune/anaconda3/bin/python', '-u', 'train.py']' returned non-zero exit status 134
I searched for the exit status for mpirun but wasn't able to debug the issue.

ValueError: cannot reshape array of size 27648 into shape (1,64,64,3)

When I run 'python model.py render /log/', I get a ValueError that I cannot fix. I am using Python 3.6.5.
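
The numbers in the error message carry a hint: 27648 = 96 x 96 x 3, while the target shape holds 64 x 64 x 3 = 12288 values. That would be consistent with an unresized 96x96 RGB frame (the native observation size of gym's CarRacing environment) reaching the reshape, although the report does not confirm which environment or code path is involved. A small helper for this kind of arithmetic, with a made-up name:

```python
def square_rgb_side(size, channels=3):
    """Return s if size == s * s * channels for an integer s, else None."""
    pixels, remainder = divmod(size, channels)
    if remainder:
        return None
    side = int(round(pixels ** 0.5))
    return side if side * side == pixels else None


print(square_rgb_side(27648))  # 96
print(square_rgb_side(12288))  # 64
```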

Why I can't use only 1 worker while training the controller

While training the controller with num_worker = 1, I get the following error: ValueError: number of weights must be >=2, was 1
Can you tell me why the number of workers must be at least 2?

Regards,
Antonio

Is antithetic sampling doing anything for CMA-ES?

To me it looks like we are just reusing the same seed twice, which would waste compute and effectively cut our population size in half (seen here in train.py)?
It seems like we are only making use of antithetic sampling in OpenES and PEPG.

Am I missing something here?
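
For reference, antithetic (mirrored) sampling in a Gaussian evolution strategy means drawing half a population of noise vectors and evaluating each one with both signs. The sketch below is a generic illustration of that idea, not the repository's implementation, and the function name is hypothetical.

```python
import numpy as np


def antithetic_population(mean, sigma, half_popsize, rng):
    """Sample half_popsize noise vectors and return candidates at mean +/- sigma * eps."""
    eps_half = rng.standard_normal((half_popsize, mean.size))
    eps = np.concatenate([eps_half, -eps_half])
    return mean + sigma * eps, eps


rng = np.random.default_rng(0)
mean = np.zeros(4)
solutions, eps = antithetic_population(mean, sigma=0.1, half_popsize=3, rng=rng)
print(solutions.shape)  # (6, 4)
```

Each mirrored pair shares one random draw but yields two distinct candidates. Giving two workers the same seed only helps in the same way if the sampler actually negates the noise for the second one.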

MemoryError on doomrnn

I am encountering what appears to be an error where the program is using too much memory:

When I load 500 episodes, the program runs fine and VAE gets trained and loss decreases.
When I load 2000 episodes, I get the following:

Traceback (most recent call last):
  File "vae_train.py", line 77, in <module>
    dataset = create_dataset(dataset)
  File "vae_train.py", line 60, in create_dataset
    data = np.zeros((M, 64, 64, 3), dtype=np.uint8)
MemoryError

The repo uses 10k episodes, but I cannot load even 2k on my 16 GB machine. Am I missing something? If memory really is the issue here, how much memory is needed to replicate the paper with the code here?
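
A back-of-the-envelope estimate answers the memory question, assuming every episode contributes 1000 frames of 64x64x3 uint8 pixels (the dtype shown in the traceback) and that the whole dataset is held in one array. The function name is illustrative.

```python
def dataset_bytes(num_episodes, frames_per_episode=1000,
                  height=64, width=64, channels=3, bytes_per_value=1):
    """Size in bytes of a dense uint8 array holding every frame of every episode."""
    return num_episodes * frames_per_episode * height * width * channels * bytes_per_value


for n in (500, 2000, 10000):
    size = dataset_bytes(n)
    print(n, "episodes:", round(size / 1e9, 2), "GB,", round(size / 2 ** 30, 2), "GiB")
```

Under those assumptions, 500 episodes need about 6 GB, 2000 episodes about 25 GB, and 10000 episodes about 123 GB, so 2000 episodes already exceed a 16 GB machine before any other allocation.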

Question about the VAE's KL-loss

Hi!
I'm trying to reproduce the doom example in Keras, and was curious about the KL-loss calculation of the VAE, specifically the parameter kl_tolerance. As far as I understand it limits the KL-loss from ever going under 32. What is the purpose of this? What effect would it have to remove this tolerance?
Thanks, and thanks for a very well written paper!
-Kai
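
To make the question concrete, here is a NumPy sketch of a KL term with a floor. It assumes the floor is kl_tolerance * z_size with kl_tolerance = 0.5 and z_size = 64, values chosen here so that the floor equals the 32 mentioned above; the function name is hypothetical and this is not the repository's TensorFlow code.

```python
import numpy as np


def kl_loss_with_floor(mu, logvar, kl_tolerance=0.5):
    """Per-sample KL(q(z|x) || N(0, I)), floored at kl_tolerance * z_size, then averaged."""
    z_size = mu.shape[1]
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    kl = np.maximum(kl, kl_tolerance * z_size)
    return float(np.mean(kl))


z_size = 64
at_prior = kl_loss_with_floor(np.zeros((2, z_size)), np.zeros((2, z_size)))
far_from_prior = kl_loss_with_floor(2.0 * np.ones((2, z_size)), np.zeros((2, z_size)))
print(at_prior, far_from_prior)  # 32.0 128.0
```

Once a sample's KL falls below the floor, the maximum passes no gradient to the KL term, so the optimizer stops pushing the posterior toward the prior and only the reconstruction loss drives that sample. Removing the floor would restore the full KL pressure, which in general trades reconstruction quality for a posterior closer to the prior.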

MemoryError in vae_train.py

Running python vae_train.py raises a MemoryError on my system. I felt bad about this, but after running the numbers, vae_train.py needs to allocate about 123 GB (roughly 114 GiB) of memory for this array!

>>> import numpy as np
>>> M = 1000
>>> N = 10000
>>> data = np.zeros((M*N, 64, 64, 3), dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Out of memory when running extract.bash due to multiple extract.py using DoomTakeCoverWrapper

I am able to run a single instance of extract.py. But when I run extract.bash, it causes an out-of-memory error.

2019-01-17 03:20:10.196729: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 9.98G (10713064960 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
...
(this goes on for a while)
...
2019-01-17 03:20:10.208612: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 534.69M (560665856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

My best guess, after having a look at the code, is that extract.bash runs multiple workers to make data generation faster, and each worker creates its own TensorFlow instance because extract.py uses DoomTakeCoverWrapper rather than DoomTakeCoverEnv.

From the perspective of replicating the paper itself, this does not seem to make much sense: a model is not needed for data generation, since it uses the pretraining scheme.

That said, I can see how this would be useful, since you are planning to switch to the iterative scheme, which would use a trained model to get better samples for the next iteration.

Adding export CUDA_VISIBLE_DEVICES="" would prevent this from happening without removing the potential to upgrade to an iterative scheme later. Should I make a pull request?
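
The same effect can be had from Python when the workers are launched with subprocess: build an environment in which CUDA_VISIBLE_DEVICES is empty, so CUDA sees no GPUs in the child process. This is a sketch with a hypothetical helper name, not code from the repository.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


example = {"PATH": "/usr/bin", "CUDA_VISIBLE_DEVICES": "0"}
env = cpu_only_env(example)
print(repr(env["CUDA_VISIBLE_DEVICES"]))
```

A worker could then be started with subprocess.Popen([sys.executable, "extract.py"], env=cpu_only_env()). The variable has to be set before TensorFlow initializes CUDA in that process.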

'CarRacingWrapper' object has no attribute '_render'

Hello! I'm trying to run the trained model:

python model.py render log/carracing.cma.16.64.best.json

The error is:

  File "model.py", line 290, in <module>
    main()
  File "model.py", line 280, in main
    train_mode=False, render_mode=render_mode, num_episode=1)
  File "model.py", line 175, in simulate
    obs = model.env.reset()
  File "/Users/keithgould/miniconda3/envs/py34/lib/python3.4/site-packages/gym/envs/box2d/car_racing.py", line 292, in reset
    return self.step(None)[0]
  File "/Users/keithgould/miniconda3/envs/py34/lib/python3.4/site-packages/gym/envs/box2d/car_racing.py", line 304, in step
    self.state = self._render("state_pixels")
AttributeError: 'CarRacingWrapper' object has no attribute '_render'

I suspect the car_racing.py module is just fine, and there is something in my environment that is not right (details below). I'm just not sure.

As an added piece of info, I was able to run the car racing environment for manual control via:

python env.py

And the above worked just fine.

Environment info:

I'm on a MacBook Pro, 10.13.6.
I'm running an environment via miniconda.

# Name                    Version                   Build  Channel
absl-py                   0.5.0                     <pip>
astor                     0.7.1                     <pip>
ca-certificates           2018.03.07                    0    anaconda
certifi                   2018.8.24                 <pip>
chardet                   3.0.4                     <pip>
future                    0.16.0                    <pip>
gast                      0.2.0                     <pip>
grpcio                    1.15.0                    <pip>
gym                       0.9.7                     <pip>
idna                      2.7                       <pip>
intel-openmp              2019.0                      118    anaconda
Markdown                  2.6.11                    <pip>
mkl                       2017.0.4             h1fae6ae_0    anaconda
numpy                     1.11.3                   py34_0    anaconda
numpy                     1.14.5                    <pip>
openssl                   1.0.2p               h1de35cc_0    anaconda
Pillow                    5.2.0                     <pip>
pip                       9.0.1                    py34_1
protobuf                  3.6.1                     <pip>
pybox2d                   2.3.1post2               py34_0    kne
pyglet                    1.3.2                     <pip>
python                    3.4.5                         0
readline                  6.2                           2
requests                  2.19.1                    <pip>
scipy                     0.18.1              np111py34_1    anaconda
setuptools                27.2.0                   py34_0
six                       1.11.0                    <pip>
sqlite                    3.13.0                        0
tensorboard               1.10.0                    <pip>
tensorflow                1.10.1                    <pip>
termcolor                 1.1.0                     <pip>
tk                        8.5.18                        0
urllib3                   1.23                      <pip>
Werkzeug                  0.14.1                    <pip>
wheel                     0.29.0                   py34_0
xz                        5.2.4                h1de35cc_4
zlib                      1.2.11               hf3cbc9b_2

Thank you for any thoughts!
