duel_ddqn's Introduction

# Duel_DDQN V0.3

based on Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
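
For readers new to the paper, here is a minimal sketch of its mean-subtracted aggregation step, which combines a state-value estimate V(s) with per-action advantages A(s, a) into Q-values. The function name `dueling_q_values` and the plain-list inputs are assumptions made for illustration; this is not the code in duel.py, which may aggregate differently.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).

    state_value: float, output of the value stream (hypothetical input).
    advantages:  list of floats, one per action, output of the advantage stream.
    """
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean makes the V/A split identifiable: the
    # advantages are centered around zero for every state.
    return [state_value + (a - mean_advantage) for a in advantages]


# Two actions, as in CartPole.
print(dueling_q_values(1.0, [0.5, -0.5]))  # [1.5, 0.5]
```

In a real network the two streams share the lower layers and this combination is applied as the final layer; the sketch only shows the arithmetic.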

run

python duel.py CartPole-v0 --no_display --episodes 200 --replay_start_size 0 --gym_record ../th

results / Params

Note: work in progress. I branched @tambetm's gist, fixed some issues, and am trying to get it closer to the paper. Pull requests and help are welcome!

**Original gist:** https://gist.github.com/tambetm/0bd29b14d76b85946422b79f3a87df70 - https://gym.openai.com/evaluations/eval_sOUmkzSy26GIWJ5IIQeA#reproducibility

duel_ddqn's People

Contributors

rafaelcp, sohojoe, tambetm


duel_ddqn's Issues

Training is not stable when the network is deep

If I set --layers 4 or larger (in my opinion, 4 layers is not very deep), the performance is not stable:
Episode 259 finished after 192 timesteps, episode reward 192.0
Episode 260 finished after 76 timesteps, episode reward 76.0
Episode 261 finished after 127 timesteps, episode reward 127.0
Episode 262 finished after 26 timesteps, episode reward 26.0
Episode 263 finished after 200 timesteps, episode reward 200.0
Episode 264 finished after 200 timesteps, episode reward 200.0
Episode 265 finished after 10 timesteps, episode reward 10.0
Episode 266 finished after 200 timesteps, episode reward 200.0
Episode 267 finished after 200 timesteps, episode reward 200.0
Episode 268 finished after 34 timesteps, episode reward 34.0
Episode 269 finished after 62 timesteps, episode reward 62.0
Episode 270 finished after 113 timesteps, episode reward 113.0
Episode 271 finished after 107 timesteps, episode reward 107.0
Episode 272 finished after 119 timesteps, episode reward 119.0
Episode 273 finished after 115 timesteps, episode reward 115.0
Episode 274 finished after 54 timesteps, episode reward 54.0
Episode 275 finished after 200 timesteps, episode reward 200.0
Episode 276 finished after 170 timesteps, episode reward 170.0
Episode 277 finished after 200 timesteps, episode reward 200.0
Episode 278 finished after 150 timesteps, episode reward 150.0
Episode 279 finished after 13 timesteps, episode reward 13.0
Episode 280 finished after 153 timesteps, episode reward 153.0
Episode 281 finished after 21 timesteps, episode reward 21.0
Episode 282 finished after 94 timesteps, episode reward 94.0

Why?

With --layers 1, training is stable:

Episode 218 finished after 200 timesteps, episode reward 200.0
Episode 219 finished after 200 timesteps, episode reward 200.0
Episode 220 finished after 187 timesteps, episode reward 187.0
Episode 221 finished after 200 timesteps, episode reward 200.0
Episode 222 finished after 200 timesteps, episode reward 200.0
Episode 223 finished after 200 timesteps, episode reward 200.0
Episode 224 finished after 200 timesteps, episode reward 200.0
Episode 225 finished after 200 timesteps, episode reward 200.0
Episode 226 finished after 200 timesteps, episode reward 200.0
Episode 227 finished after 200 timesteps, episode reward 200.0
Episode 228 finished after 200 timesteps, episode reward 200.0
Episode 229 finished after 200 timesteps, episode reward 200.0
Episode 230 finished after 200 timesteps, episode reward 200.0
Episode 231 finished after 200 timesteps, episode reward 200.0
Episode 232 finished after 200 timesteps, episode reward 200.0
Episode 233 finished after 200 timesteps, episode reward 200.0
Episode 234 finished after 200 timesteps, episode reward 200.0
Episode 235 finished after 200 timesteps, episode reward 200.0
Episode 236 finished after 200 timesteps, episode reward 200.0
Episode 237 finished after 200 timesteps, episode reward 200.0
Episode 238 finished after 200 timesteps, episode reward 200.0
Episode 239 finished after 200 timesteps, episode reward 200.0
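
To compare the two runs by numbers rather than by eye, the log lines can be parsed and summarized. The following is only a sketch: the helper names `episode_rewards` and `summarize` are hypothetical, and the regular expression assumes the exact line format shown in the logs above.

```python
import re
import statistics

# Matches lines such as:
# "Episode 259 finished after 192 timesteps, episode reward 192.0"
LINE_RE = re.compile(
    r"Episode (\d+) finished after (\d+) timesteps, episode reward (-?[\d.]+)"
)


def episode_rewards(log_text):
    """Return the per-episode rewards found in the log text, in order."""
    return [float(m.group(3)) for m in LINE_RE.finditer(log_text)]


def summarize(rewards):
    """Return mean, population standard deviation, min and max of the rewards."""
    return {
        "mean": statistics.mean(rewards),
        "stdev": statistics.pstdev(rewards),
        "min": min(rewards),
        "max": max(rewards),
    }


sample = """Episode 259 finished after 192 timesteps, episode reward 192.0
Episode 260 finished after 76 timesteps, episode reward 76.0
Episode 261 finished after 127 timesteps, episode reward 127.0"""

print(summarize(episode_rewards(sample)))
```

A large standard deviation over a window of recent episodes, as in the --layers 4 run, indicates the kind of instability described here; a value near zero matches the --layers 1 run.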
