Comments (4)
@gaurav2695 Because in raw Q-learning you have to define the state representation manually, and it is difficult to come up with a good one. In the example you referred to, the author redefined the state to obtain a better value function. In approximate Q-learning, these hand-crafted rules that identify whether a state is good or bad are called features, and you have to design them yourself.
However, the story is different with a deep Q-network. All you have to do is feed the raw pixels of a frame into the neural network; the features that determine whether a state is good or bad are learned automatically.
from deeplearningflappybird.
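To make the contrast concrete, here is a minimal sketch of the feature-based approach being described: tabular Q-learning over a hand-designed state. The feature choice (discretized distances to the next pipe gap, similar in spirit to chncyhn's bot), the bin sizes, and the learning constants are my own assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical hand-designed state: the bird's horizontal and vertical
# distance to the next pipe gap, coarsely binned. Designing this mapping
# well is exactly the "feature engineering" the comment above refers to.
def make_state(dist_x, dist_y):
    return (dist_x // 10, dist_y // 10)

ACTIONS = [0, 1]          # 0 = do nothing, 1 = flap
ALPHA, GAMMA = 0.7, 0.95  # learning rate and discount factor (assumed values)

Q = defaultdict(float)    # tabular Q-values, keyed by (state, action)

def q_update(state, action, reward, next_state):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Example transition: bird 50px before the gap and 12px below its center
# flaps, survives (+1 reward), and ends up 40px before and 4px below.
s, s2 = make_state(50, -12), make_state(40, -4)
q_update(s, 1, 1.0, s2)
```

A DQN replaces both `make_state` and the table: the network consumes raw pixels and learns its own internal features.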
@ColdCodeCool Well, actually, with an NN you should also define what is good and what is bad.
I'd instead say that both approaches are valid. There's also nothing about NNs preventing you from using them to learn to play the game not from pixels but from quantified features, as in https://github.com/chncyhn/flappybird-qlearning-bot. And of course, the quantified approaches are going to work better; it's just that you also have to parse the scene. A lot of the time that's going to be okay for your application and not too hard to build, and it may work much better depending on your data/environment. Sometimes, as with robot visual navigation and the like, the best way to solve a problem really does involve convolutional thinking. But in other cases, like if you're actually building the best possible Flappy bot, I think you're much better off first getting a good understanding of the game, then considering features, parsing scenes, and defining proper update rules. That way you can end up with a simple, tiny, super-fast, and super-accurate bot. And you don't have to use Q-learning for that; you can do it with the same backpropagating neural networks.
One case where you probably don't need much hand-crafted guidance for the network is if you're using Evolution Strategies, but well..
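The point that NNs can consume quantified features instead of pixels can be sketched with a tiny Q-network. The feature choice, layer sizes, and normalization below are my own assumptions, not taken from any of the linked repos:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quantified features instead of raw pixels:
# [horizontal distance to next pipe, vertical offset to gap center, bird velocity]
N_FEATURES, N_HIDDEN, N_ACTIONS = 3, 16, 2

# A tiny two-layer Q-network; with 3 scalar inputs it is orders of magnitude
# smaller and faster than a CNN over 80x80 pixel frames.
W1 = rng.normal(0, 0.1, (N_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0, 0.1, (N_HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(features):
    """Forward pass: features -> one Q-value per action (0 = idle, 1 = flap)."""
    h = np.maximum(0, features @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

features = np.array([0.5, -0.1, 0.02])       # a normalized example state
action = int(np.argmax(q_values(features)))  # greedy action selection
```

The trade-off is exactly the one described above: you must parse the scene to extract those three numbers, but the resulting network is trivially small.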
Having a good understanding of the game is always critical if you want to make a performant bot, regardless of your choice of approximation algorithm. For example, @yenchenlin seems to have missed that with this kind of CNN-based solving you can't use sticks and carrots alike; only carrots work. His approach can easily collapse into the bird flying up all the time, so he obviously needed to tune against that.

I've made a convolutional-NN implementation for Flappy similar to yenchenlin's, but I thought a bit more about what the data and the game actually are, and I've got a top score of >1000 so far, trainable in a couple of hours, while Kevin Chen seems to have topped out a bit over 200 (https://pdfs.semanticscholar.org/b56c/7703337cb9db008422b9b3410c97fff8bb54.pdf). I'm also guessing this repo's network is many times slower and larger than mine, which is <1.5 MB in size without using such huge kernels. And the Q-learning bot you linked (https://github.com/chncyhn/flappybird-qlearning-bot) got a much better score than I have so far. Though we may be running slightly different versions of the game: I forked https://github.com/shalabhsingh/A3C_Keras_FlappyBird, but I'm guessing mine is just a cut-down version with less graphics; the pipe sizes and difficulty seem the same.
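The point about reward design can be made concrete. Below is a hedged sketch of a typical per-frame reward function for DQN Flappy; the specific values are assumptions for illustration, not taken from any linked repo. Without the small survival reward, training can degenerate into the always-flapping policy described above:

```python
def frame_reward(alive, passed_pipe, crashed):
    """Per-frame reward: small bonus for surviving, larger bonus for
    clearing a pipe, penalty on crashing. The magnitudes are assumed
    example values and would need tuning in a real implementation."""
    if crashed:
        return -1.0
    if passed_pipe:
        return 1.0
    return 0.1 if alive else 0.0
```

The balance between the survival term and the pipe bonus is exactly the kind of tuning the comment above says is needed to keep the policy from collapsing.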
Posted my immortal bot https://github.com/ibmua/flappy/
The whole point is that you don't want to design a specific algorithm or hack for a single game.