takuseno / d4rl-atari



Datasets for data-driven deep reinforcement learning with Atari (wrapper for datasets released by Google)

License: MIT License

Python 98.57% Shell 1.43%

data-driven-reinforcement-learning dataset deep-reinforcement-learning

d4rl-atari's Introduction

📖 About me

💻 Research Scientist @ Sony Research (2020/10/1 - Present)
🎓 Ph.D @ Keio University (2023)
🔥 IPA MITOU super creator (2020)
⌨️ Vimmer (a whole time)
👀 Visit here for more information


d4rl-atari's People

Contributors

Stargazers

Watchers

Forkers

gitter-badger aubret dz9 staminatang ikeyasu takuyamagata ianwangg vermouth1992 msc5 daniellawson9999 saminyeasar standardgalactic 5l1v3r1 shenjiede

d4rl-atari's Issues

Seed not working for AtariEnv

First of all: Thank you for writing this nice wrapper! :)

I wanted to use AtariEnv for evaluation, and I realized that the seed method does not work.
I am not sure whether this is a bug or intended behavior.

from d4rl_atari.envs import AtariEnv
import numpy as np

env0 = AtariEnv("Pong", stack=True)
env0.seed(0)

env1 = AtariEnv("Pong", stack=True)
env1.seed(0)

env0.reset()
env1.reset()

for i in range(10):
    a = env0.action_space.sample()
    s0, *_ = env0.step(a)
    s1, *_ = env1.step(a)

    # True only if both envs produced identical observations at this step
    print(np.equal(np.array(s0), np.array(s1)).all())

The following code works (it seeds the wrapped env, `_env`, directly):

from d4rl_atari.envs import AtariEnv
import numpy as np

env0 = AtariEnv("Pong", stack=True)
env0._env.seed(0)  # seed the wrapped gym env instead of the AtariEnv wrapper

env1 = AtariEnv("Pong", stack=True)
env1._env.seed(0)

env0.reset()
env1.reset()

for i in range(10):
    a = env0.action_space.sample()
    s0, *_ = env0.step(a)
    s1, *_ = env1.step(a)

    # True only if both envs produced identical observations at this step
    print(np.equal(np.array(s0), np.array(s1)).all())
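The working snippet suggests that AtariEnv keeps the real environment in `_env` and that its own `seed` never reaches it. A minimal sketch of the forwarding pattern that would fix this, using hypothetical stand-ins (`InnerEnv` and `SeedForwardingEnv` are not d4rl-atari classes, and a Python `random.Random` plays the role of the emulator's RNG):

```python
import random


class InnerEnv:
    """Hypothetical stand-in for the wrapped gym env (illustration only)."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed=None):
        self.rng.seed(seed)
        return [seed]

    def step(self, action):
        # Output depends on the env's internal RNG state.
        return self.rng.random()


class SeedForwardingEnv:
    """Hypothetical wrapper that keeps the real env in `_env`, as AtariEnv appears to."""

    def __init__(self):
        self._env = InnerEnv()

    def seed(self, seed=None):
        # Forward the seed instead of leaving the inner env's RNG untouched.
        return self._env.seed(seed)

    def step(self, action):
        return self._env.step(action)


env0, env1 = SeedForwardingEnv(), SeedForwardingEnv()
env0.seed(0)
env1.seed(0)
same = all(env0.step(a) == env1.step(a) for a in range(10))
print(same)  # True
```

Applied to the issue, the same idea amounts to having `AtariEnv.seed` call `self._env.seed(seed)`, which is what the working snippet does by hand. Whether `_env.seed` exists at all depends on the installed gym version; newer gym releases seed through `reset(seed=...)` instead.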

different observation with/without `stack=True`

When I load the same dataset with `stack=True` and with `stack=False` and compare the first observation, reward, and action:
in the stacked case, the first observation is dataset_s['observations'][0][0,:];
in the unstacked case, the first observation is dataset['observations'][0,:].

In both cases the reward and action sequences are identical, but the observation sequences differ. I attached the first observation from each case. What is the reason for this? Could you please explain? Thanks in advance.

[Image: first observation, stacked case]

[Image: first observation, unstacked case]

Here's the code:

import gym
import d4rl_atari
import numpy as np
import matplotlib.pyplot as plt

def test_stack():
    env_s = gym.make('ms-pacman-expert-v0', stack=True) # -v{0, 1, 2, 3, 4} for datasets with the other random seeds
    env_s.reset()
    dataset_s = env_s.get_dataset()
    ob_s = dataset_s['observations'][0]
    # print(len(ob_s)) 1m
    # print(ob_s[0].shape) (4,84,84)
    re_s = dataset_s['rewards']
    # print(re_s.shape) (1m,)

    env = gym.make('ms-pacman-expert-v0', stack=False)
    env.reset()
    dataset = env.get_dataset()
    ob = dataset['observations'][0,:]
    re = dataset['rewards']
    print(np.sum(re != re_s))  # 0, so reward sequence is same
    a_s = dataset_s['actions']
    a = dataset['actions']
    print(np.sum(a_s != a)) #0, so action sequence is same
    o_s = ob_s[0,:]
    plt.imshow(o_s)
    plt.show()
    o = ob[0,:]
    plt.imshow(o)
    plt.show()
    # print(np.sum(o_s != o))

if __name__ == '__main__':
    test_stack()
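One common reason a stacked and an unstacked observation differ at the same index is layout. A stacked observation holds the last k frames, and if the oldest frame comes first, then `ob_s[0,:]` is the oldest slot in the stack rather than the current frame. A minimal sketch, assuming an oldest-first layout and zero padding at the start of an episode (neither is confirmed for d4rl-atari here; `stack_frames` is a hypothetical helper):

```python
import numpy as np


def stack_frames(frames, k=4):
    """Build stacked observations from single frames (oldest frame first).

    Assumption for illustration: missing history at the start of the
    episode is filled with zero frames.
    """
    frames = np.asarray(frames)
    pad = np.zeros((k - 1,) + frames.shape[1:], dtype=frames.dtype)
    padded = np.concatenate([pad, frames], axis=0)
    # Observation t holds frames t-k+1 .. t, with frame t in the last slot.
    return np.stack([padded[t:t + k] for t in range(len(frames))])


# Toy episode: five 2x2 "frames" whose pixels all equal frame index + 1.
frames = np.stack([np.full((2, 2), i + 1, dtype=np.uint8) for i in range(5)])
stacked = stack_frames(frames)

print(stacked.shape)                              # (5, 4, 2, 2)
print(np.array_equal(stacked[0][-1], frames[0]))  # newest slot matches -> True
print(np.array_equal(stacked[0][0], frames[0]))   # oldest slot is padding -> False
```

Under that assumption, the like-for-like comparison would be between the last slot of the stacked observation (`ob_s[-1]`) and the unstacked frame.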

some games not available?

How can we find the names of all available datasets?

It looks like some games are not available:

d = gym.make('beamrider-mixed-v0')

The output is:

Traceback (most recent call last):
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 132, in spec
    return self.env_specs[id]
KeyError: 'beamrider-mixed-v0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 156, in make
    return registry.make(id, **kwargs)
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 100, in make
    spec = self.spec(path)
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 142, in spec
    raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: beamrider-mixed-v0
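For the question of which dataset names exist, one version-tolerant approach is to list every registered id after `import d4rl_atari` and filter for dataset-shaped names. A minimal sketch, assuming ids follow the `<game>-<dataset>-v<seed>` pattern seen in this thread (`breakout-mixed-v0`, `ms-pacman-expert-v0`) and that the dataset kinds are `mixed`, `medium`, and `expert` (`medium` is an assumption; only the other two appear here). `dataset_ids` and `registered_ids` are hypothetical helpers, and registry access differs between gym versions as noted in the comments:

```python
import re

# Assumed id pattern: "<game>-<dataset>-v<seed>", e.g. "breakout-mixed-v0".
DATASET_ID = re.compile(r"^(?P<game>[a-z0-9-]+?)-(?P<kind>mixed|medium|expert)-v(?P<seed>\d+)$")


def dataset_ids(all_ids):
    """Keep only ids that look like d4rl-atari dataset ids, grouped by game."""
    games = {}
    for env_id in all_ids:
        m = DATASET_ID.match(env_id)
        if m:
            games.setdefault(m.group("game"), []).append(env_id)
    return {game: sorted(ids) for game, ids in sorted(games.items())}


def registered_ids():
    """Read every registered id from gym (call after `import d4rl_atari`)."""
    import gym

    registry = gym.envs.registry
    if isinstance(registry, dict):  # newer gym: dict of id -> EnvSpec
        return list(registry.keys())
    return [spec.id for spec in registry.all()]  # older gym: EnvRegistry.all()
```

Calling `dataset_ids(registered_ids())` would show, per game, which ids were actually registered; a game missing from the result (such as `beamrider` in the traceback) is simply not registered in the installed version.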

seed

Is the expert data the real expert?

I found that the expert dataset seems to have a problem. For example, for the game 'asterix', when I use the terminal flags to split the data into trajectories, the maximum return is only around 260. Could you please check this?

import gym
import d4rl_atari  # registers the dataset environments
import numpy as np

env = gym.make('asterix-expert-v0', stack=True)
dataset = env.get_dataset()

# Split trajectories: each episode ends at (and includes) a step with terminal == 1
traj_ends = np.where(dataset['terminals'] == 1)[0]
traj_starts = np.concatenate([[0], traj_ends[:-1] + 1])

returns = [np.sum(dataset['rewards'][start:end + 1])
           for start, end in zip(traj_starts, traj_ends)]

print(np.mean(returns), np.std(returns), np.max(returns))

Asking for packages from atari-py

gym.error.NameNotFound: Environment PongNoFrameskip doesn't exist.

I tried to use this repo and got the following error:

Traceback (most recent call last):
  File "D:\pythoncode\test01\test.py", line 4, in <module>
    env = gym.make('pong-mixed-v4') # -v{0, 1, 2, 3, 4} for datasets with the other random seeds
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 640, in make
    env = env_creator(**_kwargs)
  File "D:\ide\conda\envs\testenv\lib\site-packages\d4rl_atari\envs.py", line 59, in __init__
    AtariEnv.__init__(self, game=game, **kwargs)
  File "D:\ide\conda\envs\testenv\lib\site-packages\d4rl_atari\envs.py", line 27, in __init__
    env = AtariPreprocessing(gym.make(env_id),
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 198, in _check_name_exists
    f"Environment {name} doesn't exist{namespace_msg}. {suggestion_msg}"
gym.error.NameNotFound: Environment BreakoutNoFrameskip doesn't exist.

Process finished with exit code 1

My Python version is 3.7, and here are my installed packages:

Package Version

aiohttp 3.8.4
aiosignal 1.3.1
argcomplete 3.0.5
async-timeout 4.0.2
asynctest 0.13.0
atari-py 0.2.6
attrs 22.2.0
boto 2.49.0
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
cloudpickle 2.2.1
crcmod 1.7
cryptography 40.0.2
d4rl-atari 0.1
fasteners 0.18
frozenlist 1.3.3
gcs-oauth2-boto-plugin 3.0
google-apitools 0.5.32
google-auth 2.17.3
google-reauth 0.1.1
gsutil 5.23
gym 0.26.2
gym-notices 0.0.8
httplib2 0.20.4
idna 3.4
importlib-metadata 5.2.0
monotonic 1.6
multidict 6.0.4
numpy 1.21.6
oauth2client 4.1.3
opencv-python 4.7.0.72
pip 22.3.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pyOpenSSL 23.1.1
pyparsing 3.0.9
pyu2f 0.1.5
requests 2.28.2
retry-decorator 1.1.1
rsa 4.7.2
setuptools 65.6.3
six 1.16.0
typing_extensions 4.5.0
urllib3 1.26.15
wheel 0.38.4
wincertstore 0.2
yarl 1.8.2
zipp 3.15.0
Could someone help me with this?
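The package list shows gym 0.26.2 next to atari-py 0.2.6, with no ale-py. To our understanding, gym 0.20 and later register the `*NoFrameskip-v4` Atari ids through `ale-py` rather than `atari-py`, which would explain the NameNotFound error; treat that as an assumption to verify. A quick standard-library way to see which Atari backend is installed next to gym (`installed_versions` is a hypothetical helper):

```python
from importlib import metadata


def installed_versions(packages=("gym", "atari-py", "ale-py")):
    """Return {package: version or None} for the given distributions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


print(installed_versions())
```

`importlib.metadata` requires Python 3.8+. On Python 3.7, the `importlib_metadata` backport (importlib-metadata 5.2.0 in the list above) provides the same `version` function and `PackageNotFoundError`.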

Why use a NoFrameskip environment?

The DQN Replay Dataset documentation says:

"Note that the dataset consists of approximately 50 million tuples due to frame skipping (i.e., repeating a selected action for k consecutive frames) of 4"

However, I noticed that the online environment is built on a NoFrameskip variant. Is there a reason for this?
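The traceback in the previous issue shows d4rl-atari wrapping the base env in `AtariPreprocessing(gym.make(env_id), ...)`. gym's `AtariPreprocessing` performs frame skipping itself (`frame_skip=4` by default) and requires the underlying env not to skip frames, so a NoFrameskip id is the natural base. A simplified sketch of what such a skip step does, with a toy env standing in for the emulator (`ToyEnv` and `skip_step` are hypothetical):

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for an emulator that advances one frame per step."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        frame = np.full((2, 2), self.t, dtype=np.uint8)  # pixel value = frame index
        reward = 1.0
        done = self.t >= 6
        return frame, reward, done


def skip_step(env, action, frame_skip=4):
    """Repeat `action` for `frame_skip` frames, summing the rewards.

    The returned observation is the pixel-wise max over the last two frames,
    as AtariPreprocessing does to reduce Atari sprite flicker.
    """
    total_reward, frames, done = 0.0, [], False
    for _ in range(frame_skip):
        frame, reward, done = env.step(action)
        total_reward += reward
        frames.append(frame)
        if done:
            break
    obs = np.maximum(frames[-2], frames[-1]) if len(frames) > 1 else frames[-1]
    return obs, total_reward, done


env = ToyEnv()
obs, r, done = skip_step(env, action=0)
print(env.t, r, done, obs[0, 0])  # 4 4.0 False 4
obs, r, done = skip_step(env, action=0)
print(env.t, r, done, obs[0, 0])  # 6 2.0 True 6
```

With `frame_skip=4`, each agent step covers four emulator frames, the same 4x ratio the DQN Replay Dataset note describes.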

Loading datasets

Hi, I found that even though I have downloaded the datasets for all 50 epochs, d3rlpy still only loads the data for index 1, epoch 1. How can I make it load the data from all 50 epochs, or specify which data it loads?
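Before loading every epoch at once, it is worth sizing the result. A back-of-the-envelope estimate of observation storage alone, assuming 1M transitions per epoch (about 50M tuples over 50 epochs, per the DQN Replay Dataset note quoted above) and unstacked 84x84 uint8 frames (`observation_gigabytes` is a hypothetical helper):

```python
def observation_gigabytes(transitions, height=84, width=84, channels=1, bytes_per_pixel=1):
    """Observation storage in GB (10**9 bytes) for uint8 frames, ignoring overhead."""
    return transitions * height * width * channels * bytes_per_pixel / 1e9


per_epoch = observation_gigabytes(1_000_000)
all_epochs = 50 * per_epoch
print(f"{per_epoch:.2f} GB per epoch, {all_epochs:.1f} GB for 50 epochs")
# 7.06 GB per epoch, 352.8 GB for 50 epochs
```

If stacked observations were materialized as four channels rather than built lazily, the figures would be four times larger.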

Subsampling used? 200M to 1M frames

According to Google's post, the dataset uses 200M frames. What subsampling method is used to keep only 1M frames? Thank you.
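The counts quoted in this thread fit together as follows, assuming a frame skip of 4 (from the DQN Replay Dataset note above) and 50 epochs per run. This only checks the arithmetic; it says nothing about how d4rl-atari picks which 1M transitions to load:

```python
FRAMES_PER_RUN = 200_000_000  # frames per training run, per Google's post
FRAME_SKIP = 4                # each selected action is repeated for 4 frames
EPOCHS = 50                   # epochs of replay data per run

transitions_per_run = FRAMES_PER_RUN // FRAME_SKIP
transitions_per_epoch = transitions_per_run // EPOCHS
print(transitions_per_run, transitions_per_epoch)  # 50000000 1000000
```

So 1M transitions corresponds to one epoch's worth of data, or 1/50 of a run.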

How to get observation_next

How do I get the next observations? Can I assume that the dataset is ordered in time? If so, the following should work, right?

observation = dataset['observations'][i]
observation_next = dataset['observations'][i + 1]
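If the dataset is stored in time order (the assumption the question makes), the shift-by-one approach works, with two caveats. The last index has no successor, and at a terminal step `i + 1` is the first frame of the next episode, so its value must be masked out of any bootstrapped target. A minimal sketch over plain NumPy arrays, assuming the `observations`, `actions`, `rewards`, and `terminals` keys used elsewhere in this thread (`to_transitions` is a hypothetical helper):

```python
import numpy as np


def to_transitions(dataset):
    """Pair each step with the following observation (assumes time-ordered data)."""
    obs = np.asarray(dataset["observations"])
    return {
        "observations": obs[:-1],
        "actions": np.asarray(dataset["actions"])[:-1],
        "rewards": np.asarray(dataset["rewards"])[:-1],
        # At a terminal step this is the next episode's first frame; multiply the
        # bootstrap term by (1 - terminals) so those steps ignore it.
        "next_observations": obs[1:],
        "terminals": np.asarray(dataset["terminals"])[:-1],
    }


toy = {
    "observations": np.arange(5),
    "actions": np.zeros(5, dtype=int),
    "rewards": np.ones(5),
    "terminals": np.array([0, 0, 1, 0, 0]),
}
t = to_transitions(toy)
print(t["observations"], t["next_observations"], t["terminals"])
# [0 1 2 3] [1 2 3 4] [0 0 1 0]
```

A one-step TD target would then take the form `rewards + gamma * (1 - terminals) * value(next_observations)`, which zeroes out the cross-episode successor.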

Can the data be split into trajectories?

I am currently trying to divide the data into trajectories for my replication of Decision Transformer. Is there any way I can do this?

Thanks a lot
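Splitting on the `terminals` flags, as in the asterix issue above, gives one slice per episode. Decision Transformer additionally conditions on returns-to-go. A minimal sketch, assuming the data are time-ordered and each episode ends at (and includes) a step with `terminals == 1`; trailing steps after the last terminal are treated as an unfinished episode and dropped (`split_episodes` and `returns_to_go` are hypothetical helpers):

```python
import numpy as np


def split_episodes(terminals):
    """Return (start, end) index pairs, end exclusive, for each finished episode."""
    ends = np.flatnonzero(np.asarray(terminals) == 1) + 1  # include the terminal step
    starts = np.concatenate([[0], ends[:-1]])
    return list(zip(starts.tolist(), ends.tolist()))


def returns_to_go(rewards):
    """R_t = sum of rewards from step t to the end of the episode."""
    return np.cumsum(np.asarray(rewards)[::-1])[::-1]


rewards = np.array([1.0, 0.0, 2.0, 1.0, 1.0, 5.0])
terminals = np.array([0, 0, 1, 0, 1, 0])

episodes = split_episodes(terminals)
print(episodes)  # [(0, 3), (3, 5)]
for start, end in episodes:
    print(returns_to_go(rewards[start:end]))
# [3. 2. 2.]
# [2. 1.]
```

The same `(start, end)` pairs can slice `observations` and `actions` to build per-episode sequences.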

UnregisteredEnv: No registered env with id: BreakoutNoFrameskip-v4

I get an "environment not registered" error for this block of code. I installed d3rlpy first and then installed d4rl-atari. Is there something I missed during installation? Thanks in advance.

import gym
import d4rl_atari

env = gym.make('breakout-mixed-v0') # -v{0, 1, 2, 3, 4} for datasets with the other random seeds