Comments (6)
Hello @mrgloom,
I set OBSERVE that high only for demo purposes 😄
If you are trying to reproduce the model, I've added a section about that.
from deeplearningflappybird.
Hi @mrgloom,
If the comments above have answered your question, would you please close this issue?
Thanks!
from deeplearningflappybird.
I'm still not sure how the number of OBSERVE timesteps is estimated. Is it just an arbitrary number satisfying BATCH < OBSERVE < REPLAY_MEMORY?
Also, what if I can't run all 3000000 timesteps in one go? How can training be continued? Do I just keep OBSERVE at the same value, load the CNN weights, and set EXPLORE = (3000000 - steps_already_trained)?
from deeplearningflappybird.
Hello @mrgloom,
- Yes, it is an arbitrary number with BATCH < OBSERVE <= REPLAY_MEMORY.
  However, I set it according to the reference paper and empirical results.
- Yes
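For anyone wanting to see the resume recipe above in code, here is a minimal sketch. It assumes epsilon is annealed linearly from INITIAL_EPSILON to FINAL_EPSILON over EXPLORE steps once OBSERVE is over. The helper names `epsilon_at`, `resume_settings` and `check_sizes`, and all numeric values, are hypothetical and not taken from the repository.

```python
def epsilon_at(step, observe, explore, initial_epsilon, final_epsilon):
    """Epsilon after `step` timesteps, assuming linear annealing after OBSERVE."""
    if step <= observe:
        return initial_epsilon
    fraction = min(1.0, (step - observe) / explore)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


def resume_settings(steps_already_trained, observe, explore,
                    initial_epsilon, final_epsilon):
    """Settings for a resumed run: (starting epsilon, remaining EXPLORE steps)."""
    start_epsilon = epsilon_at(steps_already_trained, observe, explore,
                               initial_epsilon, final_epsilon)
    annealed = max(0, steps_already_trained - observe)
    remaining_explore = max(0, explore - annealed)
    return start_epsilon, remaining_explore


def check_sizes(batch, observe, replay_memory):
    """Enforce the ordering from this thread: BATCH < OBSERVE <= REPLAY_MEMORY."""
    if not (batch < observe <= replay_memory):
        raise ValueError("expected BATCH < OBSERVE <= REPLAY_MEMORY")


# Hypothetical values, chosen only for illustration.
check_sizes(batch=32, observe=10_000, replay_memory=50_000)
eps, remaining = resume_settings(
    steps_already_trained=1_510_000, observe=10_000, explore=3_000_000,
    initial_epsilon=0.1, final_epsilon=0.0001)
print(eps, remaining)
```

Note that this sketch does not count the steps spent in OBSERVE as annealing, so the remaining EXPLORE differs slightly from `3000000 - steps_already_trained`. Whether that matters depends on how the training loop counts timesteps. A resumed run would also start from `eps` rather than from the original INITIAL_EPSILON.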
from deeplearningflappybird.
Also, is there anything special about the OBSERVE state? For example, should the bird pass through a pipe at least once during this state, or is that not necessary?
Or is the OBSERVE state just used to initialize the replay memory?
I also ran 2 training cases (about 150000 timesteps): one with the recommended parameters and another with no EXPLORE state at all (I set both FINAL_EPSILON and INITIAL_EPSILON to 0).
I found that without the EXPLORE state it also learns to play fine. My intuition is that without exploration it will choose longer routes while trying to maximize the score, which leads to riskier play, whereas with random actions taken at each timestep with a small probability the model learns to play more safely (so is it some kind of regularization?).
What I can't understand intuitively is how the model learns to play the game if the bird does not pass any pipes during the OBSERVE state.
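To make the two training cases above concrete, here is a minimal sketch of epsilon-greedy action selection. With epsilon set to 0 the agent always takes the action with the highest Q-value. With a small epsilon it occasionally takes a random one. The function name `select_action` and the Q-values are hypothetical and only illustrate the idea.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.2, 0.7]  # hypothetical Q-values for two actions

# epsilon = 0: no exploration, the greedy action is always chosen.
greedy = [select_action(q_values, 0.0, rng) for _ in range(1000)]
# epsilon = 0.1: a random action is taken in roughly 10% of the steps.
explore = [select_action(q_values, 0.1, rng) for _ in range(1000)]
print(sum(a == 1 for a in greedy), sum(a == 1 for a in explore))
```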
from deeplearningflappybird.
OBSERVE is only used to fill the replay memory.
As for why it still works without the EXPLORE state, I think it's because this network is overkill for this game.
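The role of OBSERVE described above can be sketched roughly as follows: transitions are stored from the first timestep, and minibatch sampling only starts once OBSERVE steps have passed. This is a toy sketch, not the repository's training loop. `fake_transition`, `run` and the tiny constants are hypothetical stand-ins so that it runs instantly.

```python
import random
from collections import deque

# Hypothetical, deliberately tiny values so the sketch runs instantly.
BATCH = 4
OBSERVE = 20
REPLAY_MEMORY = 50
TOTAL_STEPS = 100


def fake_transition(t, rng):
    """Stand-in for one game step: (state, action, reward, next_state, terminal)."""
    return (t, rng.randrange(2), 0.0, t + 1, False)


def run(total_steps, rng):
    # A bounded deque drops the oldest transitions once it is full.
    memory = deque(maxlen=REPLAY_MEMORY)
    updates = 0
    for t in range(total_steps):
        memory.append(fake_transition(t, rng))
        # During the first OBSERVE steps the memory is only filled, nothing is trained.
        if t >= OBSERVE:
            minibatch = rng.sample(list(memory), BATCH)
            assert len(minibatch) == BATCH
            updates += 1  # a real loop would run one gradient step here
    return memory, updates


memory, updates = run(TOTAL_STEPS, random.Random(0))
print(len(memory), updates)
```

This also shows why BATCH < OBSERVE <= REPLAY_MEMORY is a sensible ordering: the memory must hold at least BATCH transitions before sampling starts, and observing for longer than the memory can hold would only overwrite old transitions.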
from deeplearningflappybird.