Thanks for your work and contribution. However, we ran into reproducibility problems when

Reproducibility issue: low rewards in Atari examples (ncps, CLOSED)

Annihillusion commented on June 9, 2024

Reproducibility issue: low rewards in Atari examples

from ncps.

Comments (4)

lungd commented on June 9, 2024

I had the same problem and solved it by removing the division by 255 (https://github.com/mlech26l/ncps/blob/master/examples/atari_torch.py#L122).
It seems the observations already get scaled inside a function of one of the dependencies.
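
To illustrate the fix described above, here is a minimal sketch of a guard that divides by 255 only when the observation has not been scaled yet. The helper name `to_unit_range` and the dtype/max heuristic are assumptions for illustration; this is not code from the ncps examples.

```python
import numpy as np


def to_unit_range(obs: np.ndarray) -> np.ndarray:
    """Return obs as float32 in [0, 1], dividing by 255 only when needed.

    Hypothetical helper: the heuristic (integer dtype or max > 1) is an
    assumption, not something taken from the ncps example script.
    """
    obs = np.asarray(obs)
    if np.issubdtype(obs.dtype, np.integer):
        # Raw pixel frames (e.g. uint8 in [0, 255]) still need scaling.
        return obs.astype(np.float32) / 255.0
    obs = obs.astype(np.float32)
    if obs.size > 0 and obs.max() > 1.0:
        # Float frames that still span [0, 255].
        return obs / 255.0
    # Already in [0, 1]: dividing again would squash values to about [0, 0.004].
    return obs


raw = np.array([[0, 128, 255]], dtype=np.uint8)           # unscaled frame
scaled = np.array([[0.0, 0.5, 0.99]], dtype=np.float32)   # pre-scaled frame

print(to_unit_range(raw))     # scaled down to [0, 1]
print(to_unit_range(scaled))  # returned unchanged
```

Dividing an already scaled frame by 255 a second time leaves the network with almost constant inputs, which would be consistent with the low rewards reported in this issue.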

mlech26l commented on June 9, 2024

I have updated the conv layers in the examples by adding batch norm between the conv layers, as suggested here. I now get a reward > 40 after a couple of minutes of training (with PyTorch behavior cloning).
The RL training probably benefits from these changes as well (though RL needs to train for 24 hours or so to collect enough experience).

loss=0.3713: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:43<00:00,  9.04it/s]
Epoch 1, val_loss=0.2822, val_acc=89.86%
Mean return 5.7 (n=10)
loss=0.2131: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 2, val_loss=0.215, val_acc=92.41%
Mean return 20.1 (n=10)
loss=0.1761: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.18it/s]
Epoch 3, val_loss=0.1937, val_acc=93.19%
Mean return 71.8 (n=10)
loss=0.1563: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.17it/s]
Epoch 4, val_loss=0.1684, val_acc=94.15%
Mean return 60.2 (n=10)
loss=0.1426: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.17it/s]
Epoch 5, val_loss=0.154, val_acc=94.53%
Mean return 31.6 (n=10)
loss=0.1311: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 6, val_loss=0.1506, val_acc=94.63%
Mean return 74.9 (n=10)
loss=0.1213: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 7, val_loss=0.1624, val_acc=94.10%
Mean return 65.2 (n=10)
loss=0.1135: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 8, val_loss=0.1443, val_acc=94.89%
Mean return 64.8 (n=10)
loss=0.1055: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.14it/s]
Epoch 9, val_loss=0.1376, val_acc=95.14%
Mean return 64.1 (n=10)
loss=0.09882: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.13it/s]
Epoch 10, val_loss=0.1359, val_acc=95.26%
Mean return 69.6 (n=10)
loss=0.09216: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.16it/s]
Epoch 11, val_loss=0.1377, val_acc=95.25%
Mean return 77.5 (n=10)
loss=0.08617: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 938/938 [01:42<00:00,  9.12it/s]
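
For readers who want to see what "batch norm between the conv layers" can look like, here is a minimal PyTorch sketch. The class name `ConvBlock`, the channel counts, kernel sizes, and the 4x84x84 input are assumptions (a common Atari layout), not the actual layers in the ncps examples.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Hypothetical Atari feature extractor with BatchNorm after each conv."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.BatchNorm2d(32),  # normalizes activations per channel
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Two stacked-frame observations, assumed to be already scaled to [0, 1].
frames = torch.rand(2, 4, 84, 84)
features = ConvBlock()(frames)
print(features.shape)  # torch.Size([2, 64, 7, 7])
```

Batch norm rescales the activations per channel, so it can make the conv stack less sensitive to the input scale, although it does not replace checking the observation range.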

Annihillusion commented on June 9, 2024

Many thanks for your reply! However, my results remain the same with the new code with BN. I suspect there is a version issue in my Python environment. Could you kindly provide your conda environment info and the model's parameters (PyTorch behavior cloning; a few epochs will suffice)? It would be really helpful for my reproduction work.

Annihillusion commented on June 9, 2024

Indeed, I checked the observations returned by the env and found that they are already scaled to [0, 1), which does not match the range given in Gym's documentation ([0, 255]). However, this only seems to happen with Python 3.10. Removing the division of the observations by 255 solves the problem completely (for Python 3.10). Thanks for your solution!
