
kautenja / playing-mario-with-deep-reinforcement-learning

An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.

License: MIT License

Python 49.70% Jupyter Notebook 50.30%
atari2600 deep-reinforcement-learning double-dqn dqn dueling-dqn super-mario-bros tetris

playing-mario-with-deep-reinforcement-learning's People

Contributors

kautenja


playing-mario-with-deep-reinforcement-learning's Issues

Downsampler Class

A custom class for downsampling would be nice to support game-specific downsampling (e.g., color overrides).
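
A minimal sketch of what such a class could look like follows; the class name, the cv2-based resize, and the color-override mapping are illustrative assumptions rather than the repo's API.

    import cv2

    class Downsampler:
        """Grayscale, apply game-specific color overrides, and resize a frame."""

        def __init__(self, size=(84, 84), color_overrides=None):
            self.size = size
            # map of exact RGB triples to replacement RGB values, e.g. to erase a
            # distracting background color for one specific game
            self.color_overrides = color_overrides or {}

        def __call__(self, frame):
            frame = frame.copy()
            for rgb, value in self.color_overrides.items():
                frame[(frame == rgb).all(axis=-1)] = value
            gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
            return cv2.resize(gray, self.size, interpolation=cv2.INTER_AREA)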

ImportError: cannot import name 'wrap'

When the episode ends, the following appears:
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "./__main__.py", line 6, in <module>
    from src.cli import main
  File "./src/cli.py", line 3, in <module>
    from .train import train
  File "./src/train.py", line 7, in <module>
    from .setup_env import setup_env
  File "./src/setup_env.py", line 3, in <module>
    from nes_py.wrappers import JoypadSpace, wrap as nes_py_wrap
ImportError: cannot import name 'wrap'

DQN: discount growth?

No papers implement this, but would starting with a low discount factor and gradually growing it help?
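
As a hedged illustration of what that schedule could look like (the helper below is hypothetical and not part of the repo), the discount factor could be annealed linearly over training steps:

    def annealed_gamma(step, gamma_start=0.90, gamma_final=0.99, anneal_steps=1_000_000):
        """Linearly grow the discount factor from gamma_start to gamma_final."""
        frac = min(step / anneal_steps, 1.0)
        return gamma_start + frac * (gamma_final - gamma_start)

    # the Q-learning target would then use the scheduled value, e.g.:
    # target = reward + annealed_gamma(step) * (1.0 - done) * max_next_q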

Reward Schemes

Games with a weaker reward scheme seem to be performing somewhat poorly. Perhaps investigate the info return from step to evaluate whether we can reward agents based on the number of lives left. Games such as Breakout and SpaceInvaders provide no clear incentive to prevent death unless the agent learns to correlate each death with the eventual loss of the game (and thus future rewards). In certain states, the agent may even predict a higher future reward from dying (there is currently no penalty for death), since it would be able to collect more rewards in the next life.
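
A rough sketch of a wrapper that penalizes life loss via the info dict is below; the 'ale.lives' key and the penalty value are assumptions that would need checking against the actual environments.

    import gym

    class LifePenaltyEnv(gym.Wrapper):
        """Subtract a fixed penalty from the reward whenever the agent loses a life."""

        def __init__(self, env, penalty=1.0):
            super().__init__(env)
            self.penalty = penalty
            self.lives = 0

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            self.lives = self.env.unwrapped.ale.lives()
            return obs

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            lives = info.get('ale.lives', self.lives)
            if lives < self.lives:
                reward -= self.penalty
            self.lives = lives
            return obs, reward, done, info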

Broken Prioritized Experience Replay

Results on Pong seem to indicate that the prioritized experience replay functionality is not working correctly. Performance is terrible, and the agent is far worse than its vanilla alternative. The paper introducing this technique will need to be reviewed to locate the source of the learning error. A code review is also necessary to understand performance limitations and improve the runtime.
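
For reference during that review, here is a minimal sketch of proportional prioritized sampling with importance-sampling weights (Schaul et al.); names and defaults are illustrative, not the repo's interface.

    import numpy as np

    class PrioritizedReplay:
        def __init__(self, capacity, alpha=0.6):
            self.capacity = capacity
            self.alpha = alpha
            self.buffer = []
            self.priorities = np.zeros(capacity, dtype=np.float32)
            self.pos = 0

        def add(self, transition):
            # new transitions get the current max priority so they are sampled at least once
            max_prio = self.priorities.max() if self.buffer else 1.0
            if len(self.buffer) < self.capacity:
                self.buffer.append(transition)
            else:
                self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, beta=0.4):
            prios = self.priorities[:len(self.buffer)]
            probs = prios ** self.alpha
            probs /= probs.sum()
            idx = np.random.choice(len(self.buffer), batch_size, p=probs)
            # importance-sampling weights correct the bias from non-uniform sampling
            weights = (len(self.buffer) * probs[idx]) ** (-beta)
            weights /= weights.max()
            return [self.buffer[i] for i in idx], idx, weights

        def update_priorities(self, idx, td_errors, eps=1e-6):
            self.priorities[idx] = np.abs(td_errors) + eps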

render_mode in Agent

Headless servers can't render to a display; add an option to override the render mode in the Agents so they can be easily run on clusters / servers.

DQN: epoch

Define an "epoch" for training this beast and implement it in the JupyterCallback.
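
One hedged way to define it (the constant below is an assumption, not something agreed on in the project) is a fixed number of agent frames per epoch, which the callback could then use to decide when to log or plot:

    FRAMES_PER_EPOCH = 250_000  # assumption: a common choice in the DQN literature

    def epoch_of(frame_count):
        """Return the 0-based epoch index for a given number of frames seen."""
        return frame_count // FRAMES_PER_EPOCH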

Cleanup requirements.txt

There are placeholder packages that will likely be used but aren't yet. Ensure that these are removed or cleaned up when the project reaches a more terminal state.

Inefficient render_mode usage in Agent

The rgb_array render mode adds overhead and effectively does nothing. This option should either be removed or integrated in a meaningful fashion. If the latter, a None option should be included to improve performance by not rendering at all; environments don't need to have render called in order to function in the background.
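
A hedged sketch of how the agent loop could handle this, covering both the headless override and a None option that skips rendering entirely (the helper name is illustrative):

    def maybe_render(env, render_mode):
        """Render the environment only when a mode is requested."""
        if render_mode is None:
            return None  # headless / cluster runs: skip rendering entirely
        return env.render(mode=render_mode)  # e.g. 'human' or 'rgb_array'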

Games to Test

Which games should be included?

So far:

  • breakout
  • space invaders
  • pong

Potential:

  • seaquest
  • ms pacman
  • video pinball
  • asteroids
  • doom
  • mario

Training error

train.py works for Tetris-v0, but for Mario it throws:

    self._did_step(done)
TypeError: _did_step() takes 1 positional argument but 2 were given

I think there is an issue with how the environment is wrapped for Mario. I will try to look deeper and fix it, but help is appreciated.

Negative Reward for terminal flag

The current behavior is to penalize the end of an episode to encourage the agent to prolong episodes (games). In cases where a game is never "solved" -- i.e., it can be played indefinitely -- this surely makes sense. However, in the case of Pong, the game is solved when either adversary reaches 20 points, so if the agent wins, it is currently penalized. Should situations like this be addressed? Or does this not matter much in the long run?
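
If it should be addressed, one hedged option is to make the terminal penalty conditional on a win/loss signal; the is_win flag below is an assumption, since Pong only exposes the final score through the environment state:

    def shaped_terminal_reward(reward, done, is_win, penalty=1.0):
        """Apply the end-of-episode penalty only when the episode was lost."""
        if done and not is_win:
            return reward - penalty
        return reward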

Optimize Replay Queue

The current implementation is not guaranteed to be optimal; perhaps refactor.
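
One common optimization, sketched here under the assumption that transitions are stacked 84x84 uint8 frames, is a preallocated NumPy ring buffer instead of a Python queue of tuples:

    import numpy as np

    class RingReplay:
        def __init__(self, capacity, frame_shape=(84, 84, 4)):
            self.capacity = capacity
            self.states = np.zeros((capacity, *frame_shape), dtype=np.uint8)
            self.actions = np.zeros(capacity, dtype=np.uint8)
            self.rewards = np.zeros(capacity, dtype=np.float32)
            self.dones = np.zeros(capacity, dtype=np.bool_)
            self.next_states = np.zeros((capacity, *frame_shape), dtype=np.uint8)
            self.pos = 0
            self.full = False

        def push(self, s, a, r, d, s2):
            i = self.pos
            self.states[i], self.actions[i], self.rewards[i] = s, a, r
            self.dones[i], self.next_states[i] = d, s2
            self.pos = (self.pos + 1) % self.capacity
            self.full = self.full or self.pos == 0

        def sample(self, batch_size):
            hi = self.capacity if self.full else self.pos
            idx = np.random.randint(0, hi, size=batch_size)
            return (self.states[idx], self.actions[idx], self.rewards[idx],
                    self.dones[idx], self.next_states[idx])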

No registered env with id: DeepQAgentNoFrameskip-v10

@Kautenja - Firstly, great to see your repositories; they reflect the amount of time and effort invested.

I have followed your documentation and trained a deep_q_agent on the 'SuperMarioBros-v0' env ID. Accordingly, there's a saved weights.h5 file in the results directory. But there seems to be some issue in the play.py file (src).

Whenever I try to load weights.h5 using the path, as you mention, through the following command:

python . -m play -o results/SuperMarioBros-v0/DeepQAgent/2018-08-11_12-54/

the following error pops up:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
/home/ameya/.env/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 143, in spec
    return self.env_specs[id]
KeyError: 'DeepQAgentNoFrameskip-v10'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "./__main__.py", line 8, in <module>
    main()
  File "./src/cli.py", line 52, in main
    results_dir=args.output
  File "./src/play.py", line 58, in play
    env = build_atari_environment(env_id, is_validation=True)
  File "./src/environment/atari.py", line 43, in build_atari_environment
    env = gym.make('{}NoFrameskip-v10'.format(game_name))
  File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 167, in make
    return registry.make(id)
  File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 118, in make
    spec = self.spec(id)
  File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in spec
    raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: DeepQAgentNoFrameskip-v10

Has this error been encountered previously?

nocite in paper

There is a \nocite{*} at the end of the official paper to suppress an error from BibTeX about no citations in the boilerplate template. This should be removed before the final copy is printed, to ensure that only referenced articles appear in the References section of the paper.

EDIT: fix spelling

Environment

The environment used in the DeepMind and Uber papers is actually the same. Our environment is different at the moment; we need to update the wrappers to the full DeepMind specification and re-run the baselines.
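
For reference, a hedged sketch of that specification using the conventional baselines-style wrapper classes (assuming the OpenAI baselines package, or an equivalent local copy of these wrappers, is available):

    from baselines.common.atari_wrappers import (
        NoopResetEnv, MaxAndSkipEnv, EpisodicLifeEnv, FireResetEnv,
        WarpFrame, ClipRewardEnv, FrameStack,
    )

    def wrap_deepmind_style(env):
        env = NoopResetEnv(env, noop_max=30)   # random no-ops at reset
        env = MaxAndSkipEnv(env, skip=4)       # frame skip with max-pooling over the last 2 frames
        env = EpisodicLifeEnv(env)             # treat life loss as episode end during training
        if 'FIRE' in env.unwrapped.get_action_meanings():
            env = FireResetEnv(env)            # press FIRE on reset where the game requires it
        env = WarpFrame(env)                   # grayscale + resize to 84x84
        env = ClipRewardEnv(env)               # clip rewards to {-1, 0, +1}
        env = FrameStack(env, 4)               # stack the last 4 frames
        return env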

Out of memory error during training

I started training the algorithm on SuperMarioBros. I tried it on a desktop and a laptop, both with an Nvidia GPU and 16 GB of RAM. On both machines, the training script exits once RAM is full.
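
A rough back-of-the-envelope check (assuming a 1M-transition replay buffer of stacked 84x84 frames) suggests the replay memory alone can exceed 16 GB unless frames are stored as uint8 and de-duplicated:

    pixels = 1_000_000 * 84 * 84 * 4          # transitions * height * width * stacked frames
    print(pixels * 4 / 1e9, "GB as float32")  # ~112.9 GB
    print(pixels * 1 / 1e9, "GB as uint8")    # ~28.2 GB
    # storing each frame only once (lazy frame stacking) cuts the uint8 figure by roughly 4x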

cannot run "python . -m play -o results/SuperMarioBros-1-4-v0/DeepQAgent/2018-08-18_18-33"

I want to watch the behavior of a trained agent.
I tried to run "python . -m play -o results/SuperMarioBros-1-4-v0/DeepQAgent/2018-08-18_18-33",
but the following error was output.

Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "./__main__.py", line 8, in <module>
    main()
  File "./src/cli.py", line 66, in main
    monitor=args.monitor,
  File "./src/play.py", line 68, in play
    env = setup_env(env_id, monitor_dir)
  File "./src/setup_env.py", line 27, in setup_env
    env = gym_super_mario_bros.make(env_id)
  File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 167, in make
    return registry.make(id)
  File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 119, in make
    env = spec.make()
  File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 86, in make
    env = cls(**self._kwargs)
  File "/home/miyajima/.env/lib/python3.5/site-packages/gym_super_mario_bros/smb_env.py", line 41, in __init__
    max_episode_steps=max_episode_steps,
TypeError: __init__() got an unexpected keyword argument 'frames_per_step'

How can I solve it?

Versions:
gym==0.10.9
gym-super-mario-bros==6.0.1
nes-py==8.0.2

Training Killed by OS

Sometimes the script dddqn_train.py is killed by Ubuntu. Not sure if this is an issue caused by memory limitations? There should be plenty of memory for this setup, but perhaps Ubuntu kills the process for some reason. The other alternative is some rare edge case between the Python and Lua scripts that is hard to reproduce.

Lua thread bombed out: ...ckages/gym_super_mario_bros/lua/super-mario-bros.lua:12: bad argument #1 to 'find' (string expected, got nil)
Emulation speed 100.0%
[1]    1819 killed    python3 dddqn_train.py SuperMarioBrosNaFrameskip results

Oddly, the command doesn't match what was actually issued. This is a peculiar bug.

Do you have results?

Do you have results on the performance of the RL algorithms for this environment that I can look at without having to train an agent on my own?

cannot import name 'wrap' from 'nes_py.wrappers'

(venv) pratikku@pratikku-mac playing-mario-with-deep-reinforcement-learning master*$ python . -h
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "./__main__.py", line 6, in <module>
    from src.cli import main
  File "./src/cli.py", line 3, in <module>
    from .train import train
  File "./src/train.py", line 7, in <module>
    from .setup_env import setup_env
  File "./src/setup_env.py", line 3, in <module>
    from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv, wrap as nes_py_wrap
ImportError: cannot import name 'wrap' from 'nes_py.wrappers' (/Users/pratikku/Documents/cs221-project/venv/lib/python3.7/site-packages/nes_py/wrappers/__init__.py)

Versions:

gym-pull             0.1.7
gym-retro            0.7.0
gym-super-mario-bros 7.2.1
gym-tetris           2.2.2
nes-py               7.0.1
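
One hedged way to diagnose the mismatch is to inspect what the installed nes_py.wrappers module actually exports, from the same virtualenv that raised the error:

    import nes_py.wrappers as wrappers
    print(wrappers.__file__)  # confirm which installation is being imported
    print([name for name in dir(wrappers) if not name.startswith('_')])  # names actually exported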
