takuseno / d4rl-atari

Datasets for data-driven deep reinforcement learning with Atari (wrapper for datasets released by Google)

License: MIT License

Python 98.57% Shell 1.43%
data-driven-reinforcement-learning dataset deep-reinforcement-learning

d4rl-atari's Introduction


📖 About me

  • 💻 Research Scientist @ Sony Research (2020/10/1 - Present)
  • 🎓 Ph.D. @ Keio University (2023)
  • 🔥 IPA MITOU Super Creator (2020)
  • ⌨️ Vimmer (the whole time)
  • 👀 Visit here for more information

🚀 GitHub Projects

As an owner

As a contributor

d4rl-atari's People

Contributors

takuseno, takuyamagata


d4rl-atari's Issues

Seed not working for AtariEnv

First of all: Thank you for writing this nice wrapper! :)

I wanted to use AtariEnv for evaluation and realized that the seed method is not working.
I am not sure whether this is a bug or intended behavior.

from d4rl_atari.envs import AtariEnv
import numpy as np 

env0 = AtariEnv("Pong", stack=True)
env0.seed(0)

env1 = AtariEnv("Pong", stack=True)
env1.seed(0)

env0.reset()
env1.reset()


for i in range(10):
    a = env0.action_space.sample()
    s0, *_ = env0.step(a)
    s1, *_ = env1.step(a)

    # Prints False once the rollouts diverge, even though both envs were "seeded".
    print(np.equal(np.array(s0), np.array(s1)).all())

The following code works:

from d4rl_atari.envs import AtariEnv
import numpy as np 

env0 = AtariEnv("Pong", stack=True)
env0._env.seed(0)

env1 = AtariEnv("Pong", stack=True)
env1._env.seed(0)

env0.reset()
env1.reset()


for i in range(10):
    a = env0.action_space.sample()
    s0, *_ = env0.step(a)
    s1, *_ = env1.step(a)

    # Prints True on every step: seeding the underlying env keeps the rollouts in sync.
    print(np.equal(np.array(s0), np.array(s1)).all())
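If forwarding the seed manually every time is inconvenient, here is a minimal workaround sketch: a small subclass that delegates seed() to the wrapped gym environment. It assumes the wrapped env is stored as `_env`, exactly as in the `env._env.seed(0)` workaround above; the class name is just for illustration.

from d4rl_atari.envs import AtariEnv

class SeededAtariEnv(AtariEnv):
    # Illustrative subclass: forward seed() to the wrapped gym environment (_env).
    def seed(self, seed=None):
        return self._env.seed(seed)

env = SeededAtariEnv("Pong", stack=True)
env.seed(0)
env.reset()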

different observation with/without `stack=True`

When I set stack=True/False for the same environment and look at the first observation, reward, and action:
for the stacked case, the first observation is dataset_s['observations'][0][0,:];
for the unstacked case, the first observation is dataset['observations'][0,:].

In both cases the reward list and the action list are the same, but the observation lists differ between the stacked and unstacked cases. I attached the first observation for each case. What is the reason for this? Could you please explain it? Thanks in advance.

stacked case

unstacked case

Here's the code:

import gym
import d4rl_atari
import pickle
import numpy as np
import matplotlib.pyplot as plt

def test_stack():
    env_s = gym.make('ms-pacman-expert-v0', stack=True) # -v{0, 1, 2, 3, 4} for datasets with the other random seeds
    env_s.reset()
    dataset_s = env_s.get_dataset()
    ob_s = dataset_s['observations'][0]
    # print(len(ob_s)) 1m
    # print(ob_s[0].shape) (4,84,84)
    re_s = dataset_s['rewards']
    # print(re_s.shape) (1m,)

    env = gym.make('ms-pacman-expert-v0', stack=False)
    env.reset()
    dataset = env.get_dataset()
    ob = dataset['observations'][0,:]
    re = dataset['rewards']
    print(np.sum(re != re_s))  # 0, so reward sequence is same
    a_s = dataset_s['actions']
    a = dataset['actions']
    print(np.sum(a_s != a)) #0, so action sequence is same
    o_s = ob_s[0,:]
    plt.imshow(o_s)
    plt.show()
    o = ob[0,:]
    plt.imshow(o)
    plt.show()
    # print(np.sum(o_s != o))

if __name__ == '__main__':
    test_stack()
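One hedged guess at the cause (not verified against the d4rl-atari source): with stack=True each stored observation is a stack of the last few frames, so indexing [0,:] selects the oldest frame in the stack rather than the current one, while the unstacked dataset stores only the current frame. If that is right, the newest channel of the stacked observation should match the unstacked one. A sketch of that check, reusing dataset_s and dataset from test_stack() above (the shapes noted in the comments are assumptions):

# Hypothetical check: compare the newest channel of the stacked observation
# with the unstacked first observation instead of the oldest channel.
stacked_first = np.asarray(dataset_s['observations'][0])  # assumed shape (4, 84, 84)
flat_first = np.asarray(dataset['observations'][0])       # assumed shape (84, 84) or (1, 84, 84)
print(np.sum(stacked_first[-1] != flat_first.squeeze()))  # expected 0 if the guess holds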

some games not available?

How can we find the names of all available datasets?

It looks like some games are not available:

import gym
import d4rl_atari

d = gym.make('beamrider-mixed-v0')

The output is

Traceback (most recent call last):
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 132, in spec
    return self.env_specs[id]
KeyError: 'beamrider-mixed-v0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 156, in make
    return registry.make(id, **kwargs)
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 100, in make
    spec = self.spec(path)
  File "/home/xx/anaconda3/envs/pomdp/lib/python3.6/site-packages/gym/envs/registration.py", line 142, in spec
    raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: beamrider-mixed-v0
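A quick way to list what is actually registered (a sketch; the registry API differs across gym versions):

import gym
import d4rl_atari  # importing this package registers the dataset environments

try:
    env_ids = [spec.id for spec in gym.envs.registry.all()]  # older gym
except AttributeError:
    env_ids = list(gym.envs.registry.keys())                 # newer gym: registry acts like a dict

# The dataset ids use hyphenated game names (e.g. 'ms-pacman-expert-v0' above),
# so BeamRider is presumably spelled 'beam-rider-mixed-v0' rather than 'beamrider-mixed-v0'.
print(sorted(i for i in env_ids if 'mixed' in i))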

Is the expert data the real expert?

I found that the expert dataset seems to have some problems. For example, for the game 'asterix', I used the terminal flags to split trajectories, and the maximum return is only around 260. Could you please check this?

import gym
import d4rl_atari
import numpy as np

env = gym.make('asterix-expert-v0', stack=True)
dataset = env.get_dataset()

# Split into trajectories at the terminal flags (each trajectory includes its terminal step).
traj_ends = np.where(dataset['terminals'] == 1)[0]
traj_start_ends = [(0, traj_ends[0] + 1)]
for i in range(len(traj_ends) - 1):
    traj_start_ends.append((traj_ends[i] + 1, traj_ends[i + 1] + 1))

# Per-trajectory returns.
returns = [np.sum(dataset['rewards'][start:end]) for start, end in traj_start_ends]
print(np.mean(returns), np.std(returns), np.max(returns))

gym.error.NameNotFound: Environment PongNoFrameskip doesn't exist.

I tried to use this repo and got the following error:

Traceback (most recent call last):
  File "D:\pythoncode\test01\test.py", line 4, in <module>
    env = gym.make('pong-mixed-v4') # -v{0, 1, 2, 3, 4} for datasets with the other random seeds
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 640, in make
    env = env_creator(**_kwargs)
  File "D:\ide\conda\envs\testenv\lib\site-packages\d4rl_atari\envs.py", line 59, in __init__
    AtariEnv.__init__(self, game=game, **kwargs)
  File "D:\ide\conda\envs\testenv\lib\site-packages\d4rl_atari\envs.py", line 27, in __init__
    env = AtariPreprocessing(gym.make(env_id),
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "D:\ide\conda\envs\testenv\lib\site-packages\gym\envs\registration.py", line 198, in _check_name_exists
    f"Environment {name} doesn't exist{namespace_msg}. {suggestion_msg}"
gym.error.NameNotFound: Environment BreakoutNoFrameskip doesn't exist.

Process finished with exit code 1

My Python version is 3.7 and here are my installed packages:

Package Version


aiohttp 3.8.4
aiosignal 1.3.1
argcomplete 3.0.5
async-timeout 4.0.2
asynctest 0.13.0
atari-py 0.2.6
attrs 22.2.0
boto 2.49.0
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
cloudpickle 2.2.1
crcmod 1.7
cryptography 40.0.2
d4rl-atari 0.1
fasteners 0.18
frozenlist 1.3.3
gcs-oauth2-boto-plugin 3.0
google-apitools 0.5.32
google-auth 2.17.3
google-reauth 0.1.1
gsutil 5.23
gym 0.26.2
gym-notices 0.0.8
httplib2 0.20.4
idna 3.4
importlib-metadata 5.2.0
monotonic 1.6
multidict 6.0.4
numpy 1.21.6
oauth2client 4.1.3
opencv-python 4.7.0.72
pip 22.3.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pyOpenSSL 23.1.1
pyparsing 3.0.9
pyu2f 0.1.5
requests 2.28.2
retry-decorator 1.1.1
rsa 4.7.2
setuptools 65.6.3
six 1.16.0
typing_extensions 4.5.0
urllib3 1.26.15
wheel 0.38.4
wincertstore 0.2
yarl 1.8.2
zipp 3.15.0
So I am seeking someone's help.
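A hedged observation on the environment above (based only on the package list; not verified in this exact setup): gym 0.26.2 is installed together with atari-py 0.2.6, but recent gym releases register the Atari ROM environments through ale-py rather than atari-py, so the NoFrameskip ids may simply not be registered here. A small check, with the usual things to try noted as comments:

import gym

# gym 0.26: the registry behaves like a dict keyed by env id.
print([env_id for env_id in gym.envs.registry if 'NoFrameskip' in env_id])

# If the list is empty, two hedged suggestions (not verified here):
#   1) install gym's Atari extras, e.g. pip install "gym[atari,accept-rom-license]"
#   2) or pin an older gym version that still works with atari-py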

why use the NoFrameskip environment?

In the documentation of the DQN Replay Dataset, it says:

"Note that the dataset consists of approximately 50 million tuples due to frame skipping (i.e., repeating a selected action for k consecutive frames) of 4"

but I noticed that the online environment is built with NoFrameskip. Is there a reason for this?
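The traceback in the issue above shows the online env being built as AtariPreprocessing(gym.make(env_id), ...). A hedged explanation of the usual pattern: the frame skipping is performed by the AtariPreprocessing wrapper itself (repeating each action for frame_skip frames), so the base ROM env must be the NoFrameskip variant; otherwise frames would be skipped twice, and if I recall correctly the wrapper refuses a base env that already skips frames. A sketch of that pattern (parameter values are illustrative):

import gym
from gym.wrappers import AtariPreprocessing

# NoFrameskip base env: the ROM emits every frame, and the wrapper handles skipping,
# matching the "repeating a selected action for k consecutive frames" of the dataset.
base = gym.make('PongNoFrameskip-v4')
env = AtariPreprocessing(base, frame_skip=4, screen_size=84, grayscale_obs=True)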

Loading datasets

Hi, I found that even though I have downloaded all 50 epochs of the dataset, d3rlpy still only loads the data of index 1 and epoch 1. How can I make it load all 50 epochs? Or how can I specify which data it loads?

How to get observation_next

How do I get the next observations? Can I assume that the dataset is ordered? If so, the following should work, right?

observation = dataset['observations'][i]
observation_next = dataset['observations'][i + 1]
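A sketch under the assumption (not verified) that the arrays are stored in time order, which the indexing above already relies on; the only extra care needed is at episode boundaries, where the following row belongs to a different episode:

import numpy as np

terminals = np.array(dataset['terminals'])
observations = dataset['observations']

def next_observation(i):
    # At a terminal step the next row starts a new episode, so there is no valid successor.
    if terminals[i]:
        return None
    return observations[i + 1]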

UnregisteredEnv: No registered env with id: BreakoutNoFrameskip-v4

I get an 'environment not registered' error for this block of code. I installed d3rlpy first and then d4rl-atari. Is there something I missed during installation? Thanks in advance.

import gym
import d4rl_atari

env = gym.make('breakout-mixed-v0') # -v{0, 1, 2, 3, 4} for datasets with the other random seeds
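One hedged thing to check (the env_specs attribute used below is visible in the traceback of the "some games not available?" issue above, i.e. an older gym): whether the underlying Atari ROM environment is registered at all, since the dataset env wraps it. If it is not, installing gym's Atari extras (e.g. pip install gym[atari]) alongside d4rl-atari is the usual fix; this is a suggestion, not verified against this exact setup.

import gym
import d4rl_atari  # registers the dataset envs, which wrap the NoFrameskip envs

# Older gym keeps the registry in env_specs, a dict keyed by env id.
print('BreakoutNoFrameskip-v4' in gym.envs.registry.env_specs)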
