kautenja / playing-mario-with-deep-reinforcement-learning
An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.
License: MIT License
A custom class for downsampling would be nice to support game-specific downsampling (e.g. color overrides)
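A sketch of what such a class might look like (all names here are hypothetical, and the grayscale-and-shrink step is only illustrative, not the project's actual preprocessing):

```python
import numpy as np

# Hypothetical sketch: a downsampler with per-game color overrides, so
# game-specific quirks (e.g. status-bar colors) can be remapped before
# the frame is grayscaled and shrunk.
class Downsampler(object):
    def __init__(self, overrides=None):
        # overrides maps an (R, G, B) tuple to a replacement (R, G, B)
        self.overrides = overrides or {}

    def __call__(self, frame):
        frame = np.array(frame, copy=True)
        # apply the game-specific color overrides first
        for src, dst in self.overrides.items():
            mask = np.all(frame == np.array(src), axis=-1)
            frame[mask] = dst
        # naive grayscale + 2x shrink, purely for illustration
        gray = frame.mean(axis=-1)
        return gray[::2, ::2]
```

A per-game subclass would then only need to supply its own overrides dictionary.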
when the episode ends, the following appears:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "./__main__.py", line 6, in <module>
from src.cli import main
File "./src/cli.py", line 3, in <module>
from .train import train
File "./src/train.py", line 7, in <module>
from .setup_env import setup_env
File "./src/setup_env.py", line 3, in <module>
from nes_py.wrappers import JoypadSpace, wrap as nes_py_wrap
ImportError: cannot import name 'wrap'
a way to save and restore the replay queue could be helpful
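One simple way to do this, sketched here with hypothetical helper names, is to serialize the queue with pickle (note this assumes the transitions themselves are picklable):

```python
import pickle
from collections import deque

# Hypothetical sketch: persist the replay queue between training runs.
def save_replay_queue(queue, path):
    """Serialize the replay queue to disk."""
    with open(path, 'wb') as f:
        pickle.dump(list(queue), f)

def load_replay_queue(path, maxlen):
    """Restore a replay queue saved by save_replay_queue."""
    with open(path, 'rb') as f:
        return deque(pickle.load(f), maxlen=maxlen)
```

For large buffers a binary format like compressed NumPy arrays would be faster and smaller, but pickle keeps the sketch simple.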
no papers implement this, but would starting with a low discount and gradually growing it help?
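One way to try this, purely speculative since (as noted) no papers implement it, is a linear schedule for gamma; the start/end values and step count below are placeholders:

```python
# Hypothetical sketch: linearly grow the discount factor gamma over training.
def annealed_gamma(step, start=0.90, end=0.99, steps=1_000_000):
    """Return gamma linearly interpolated from start to end over `steps`."""
    frac = min(step / steps, 1.0)
    return start + frac * (end - start)
```

The agent would call this each training step and use the result when computing its Q-learning targets, so early training focuses on short-horizon rewards.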
Games with a weaker reward scheme seem to be performing somewhat poorly. Perhaps investigate the info
return from step to evaluate if we can reward agents based on number of lives left. Games such as Breakout and SpaceInvaders provide no clear incentive to prevent death unless the agent learns to correlate each death with the eventual loss of the game (and thus future rewards). A possible case is that the agent predicts a higher future reward from dying (no penalty for death currently) as it will be able to collect more rewards in the next life in certain states.
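A sketch of the idea, using a hypothetical wrapper that watches the lives count reported in `info` (the 'ale.lives' key in Atari environments) and docks the reward when it drops:

```python
# Hypothetical sketch: penalize the agent when the lives count in `info` drops.
class LifeLossPenalty(object):
    def __init__(self, env, penalty=1.0, lives_key='ale.lives'):
        self.env = env
        self.penalty = penalty
        self.lives_key = lives_key
        self._lives = None

    def reset(self):
        self._lives = None
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        lives = info.get(self.lives_key)
        if self._lives is not None and lives is not None and lives < self._lives:
            reward -= self.penalty  # punish the death itself
        self._lives = lives
        return state, reward, done, info
```

This gives games like Breakout and SpaceInvaders an immediate negative signal for dying, instead of relying on the agent to correlate each death with the eventual loss of future rewards.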
results on Pong seem to indicate that the experience replay functionality is not working correctly. Performance is terrible, and the agent is far worse than its vanilla alternative. The paper introducing this technique will need to be reviewed to locate the source of the learning error. A code review is also necessary to understand performance limitations and improve the runtime.
headless servers can't render to a standard display; add an option to override the render mode on the Agents so they can easily be run on clusters / servers.
define an "epoch" for training this beast and implement it into the JupyterCallback
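One common convention (assumed here, not something the repo defines) is to fix an epoch at a set number of frames, so the JupyterCallback can report metrics on a consistent schedule; FRAMES_PER_EPOCH below is a placeholder value:

```python
# Hypothetical sketch: define one training "epoch" as a fixed frame budget.
FRAMES_PER_EPOCH = 250_000  # assumed value, following the DQN convention

def epoch_of(frame):
    """Return the (zero-based) epoch a given frame number falls in."""
    return frame // FRAMES_PER_EPOCH
```

The callback would then refresh its plots whenever `epoch_of` ticks over, rather than on every episode.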
there are placeholder packages that will likely be used, but aren't yet. Ensure that these are removed and cleaned up when the project reaches a more terminal state.
rgb_array as a render mode adds overhead and effectively does nothing at all. This option should either be removed or integrated in a meaningful fashion. If the latter, a None option should be included to improve performance by not rendering at all. Environments don't need to have render called in order to function in the background.
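A minimal sketch of the proposed None option, with a hypothetical helper that skips the render call entirely when no mode is requested:

```python
# Hypothetical sketch: only call render when a mode is actually requested.
def maybe_render(env, mode):
    """Render the env unless mode is None; headless runs skip the call."""
    if mode is None:
        return None  # skip rendering entirely for maximum throughput
    return env.render(mode=mode)
```

The training loop would call `maybe_render(env, agent.render_mode)` each step, so cluster runs simply set the mode to None.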
which games should be included?
so far:
potential
train.py works for Tetris-v0 but throws:
self._did_step(done)
TypeError: _did_step() takes 1 positional argument but 2 were given.
I think there is issue with how the environment is wrapped for Mario. I will try to look deeper and fix the issue but help is appreciated.
replace learning_rate with optimizer
the current behavior is to penalize the end of an episode to encourage the agent to prolong episodes (games). In cases where a game is never "solved" -- i.e. it can be played indefinitely -- this surely makes sense. However, in the case of Pong, the game is solved when either player reaches 21 points. If the agent wins, it is currently penalized. Should situations like this be addressed? Or does this not matter much in the long run?
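If it should be addressed, one hypothetical fix is to make the terminal penalty conditional on the episode's outcome, e.g. treating a non-positive final score as a loss:

```python
# Hypothetical sketch: only penalize episode ends that were actually losses.
def terminal_penalty(episode_reward, done, penalty=1.0):
    """Return the reward adjustment at episode end; wins are not penalized."""
    if not done:
        return 0.0
    # a non-positive total score is treated as a loss and penalized
    return -penalty if episode_reward <= 0 else 0.0
```

For games that are never solved the episode total is typically non-positive at death anyway, so this reduces to the current behavior there.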
the current implementation is not guaranteed to be optimal; perhaps refactor.
gym has a time limit built in https://github.com/openai/gym/blob/master/gym/wrappers/time_limit.py
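For reference, a minimal sketch of what that wrapper does (this is a simplified reimplementation for illustration, not gym's actual code):

```python
# Simplified sketch of gym's TimeLimit wrapper: end the episode after a
# fixed number of steps, independent of the game's own terminal condition.
class SimpleTimeLimit(object):
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True  # cut the episode off at the limit
        return state, reward, done, info
```

Using gym's built-in wrapper instead of a custom limit would remove one piece of code to maintain here.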
@Kautenja - Firstly, great to see your repositories. They reflect the amount of time and effort invested.
I have followed your documentation and have trained a deep_q_agent in the 'SuperMarioBros-v0' env ID. Accordingly, there's a saved weights.h5 file in the results directory. But there seems to be some issue in the play.py file (src).
Whenever I try to load weights.h5 using the path as you mention, through the following command:
python . -m play -o results/SuperMarioBros-v0/DeepQAgent/2018-08-11_12-54/
the following error pops up:
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
/home/ameya/.env/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 143, in spec
return self.env_specs[id]
KeyError: 'DeepQAgentNoFrameskip-v10'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "./__main__.py", line 8, in <module>
main()
File "./src/cli.py", line 52, in main
results_dir=args.output
File "./src/play.py", line 58, in play
env = build_atari_environment(env_id, is_validation=True)
File "./src/environment/atari.py", line 43, in build_atari_environment
env = gym.make('{}NoFrameskip-v10'.format(game_name))
File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 167, in make
return registry.make(id)
File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 118, in make
spec = self.spec(id)
File "/home/ameya/.env/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in spec
raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: DeepQAgentNoFrameskip-v10
Has this error been encountered previously?
metadata for downsampling is needed for:
there is a \nocite{*} at the end of the official paper to suppress an error from bibtex about no citations in the boilerplate template. This should be removed before the final copy is printed to ensure that only referenced articles appear in the References section of the paper.
EDIT: fix spelling
the environment in use in the DeepMind and Uber papers is actually the same. Our environment is different at the moment; we need to update the wrappers to the full DeepMind specification and re-run the baselines.
I started training the algorithm for SuperMarioBros. I tried it on a desktop and a laptop, both with an Nvidia GPU and 16 GB of RAM. On both machines, the training script exits after the RAM is full.
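One likely culprit (an assumption, not a confirmed diagnosis) is the replay buffer storing frames as float32; keeping them as uint8 and converting only when a batch is sampled cuts the buffer's memory footprint by 4x. A sketch with hypothetical helper names:

```python
import numpy as np

# Hypothetical sketch: store replay frames as uint8 instead of float32.
# An 84x84 grayscale frame costs ~7 KB as uint8 but ~28 KB as float32,
# so a million-frame buffer drops from ~28 GB to ~7 GB.
def to_replay_dtype(frame):
    """Compress a float frame in [0, 1] to uint8 for storage."""
    return (np.asarray(frame) * 255).astype(np.uint8)

def from_replay_dtype(frame):
    """Decompress a stored uint8 frame back to float32 in [0, 1]."""
    return frame.astype(np.float32) / 255.0
```

The quantization loses at most 1/255 of precision per pixel, which is negligible for a convolutional Q-network's inputs.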
I want to see the movement of a learned agent.
I tried to run "python . -m play -o results/SuperMarioBros-1-4-v0/DeepQAgent/2018-08-18_18-33".
But the following error was output:
Using TensorFlow backend.
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "./__main__.py", line 8, in <module>
main()
File "./src/cli.py", line 66, in main
monitor=args.monitor,
File "./src/play.py", line 68, in play
env = setup_env(env_id, monitor_dir)
File "./src/setup_env.py", line 27, in setup_env
env = gym_super_mario_bros.make(env_id)
File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 167, in make
return registry.make(id)
File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 119, in make
env = spec.make()
File "/home/miyajima/.env/lib/python3.5/site-packages/gym/envs/registration.py", line 86, in make
env = cls(**self._kwargs)
File "/home/miyajima/.env/lib/python3.5/site-packages/gym_super_mario_bros/smb_env.py", line 41, in __init__
max_episode_steps=max_episode_steps,
TypeError: __init__() got an unexpected keyword argument 'frames_per_step'
How can I solve it?
Versions:
gym==0.10.9
gym-super-mario-bros==6.0.1
nes-py==8.0.2
Sometimes the script dddqn_train.py is killed by Ubuntu. Not sure if this is an issue caused by memory limitations? There should be plenty of memory for this setup, but perhaps Ubuntu kills the process for some reason. The other alternative is some sparse edge case between the Python and Lua scripts that is hard to reproduce.
Lua thread bombed out: ...ckages/gym_super_mario_bros/lua/super-mario-bros.lua:12: bad argument #1 to 'find' (string expected, got nil)
Emulation speed 100.0%
[1] 1819 killed python3 dddqn_train.py SuperMarioBrosNaFrameskip results
oddly, the command doesn't match what was actually issued. This is a peculiar bug.
Do you have results on the performance of the RL algorithms for this environment that I can look at without having to train an agent on my own?
(venv) pratikku@pratikku-mac playing-mario-with-deep-reinforcement-learning master*$ python . -h
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "./__main__.py", line 6, in <module>
from src.cli import main
File "./src/cli.py", line 3, in <module>
from .train import train
File "./src/train.py", line 7, in <module>
from .setup_env import setup_env
File "./src/setup_env.py", line 3, in <module>
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv, wrap as nes_py_wrap
ImportError: cannot import name 'wrap' from 'nes_py.wrappers' (/Users/pratikku/Documents/cs221-project/venv/lib/python3.7/site-packages/nes_py/wrappers/__init__.py)
Versions:
gym-pull 0.1.7
gym-retro 0.7.0
gym-super-mario-bros 7.2.1
gym-tetris 2.2.2
nes-py 7.0.1