
iishreya / learning-mario-agent


Learning Mario Agent with the Double Deep Q-Learning Algorithm in the Gym Super Mario Environment.

License: MIT License

Jupyter Notebook 100.00%
convolutional-neural-networks deep-learning densenet mario-game pytorch q-learning reinforcement-learning relu rl super-mario


Learning-Mario-Agent


Preprocess the data:

  1. Applying wrappers to environment:

What are Wrappers?

Wrappers around functions are also known as decorators. They are a powerful and useful tool in Python because they allow programmers to modify the behavior of a function or class. A decorator wraps another function in order to extend its behavior without permanently modifying it. Gym environment wrappers apply the same idea to an environment object.

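def func2(func):
    # Return a new function that runs extra code before and after func.
    def ref():
        print("sentence 1")
        func()
        print("sentence 3")

    return ref


def func():
    print("sentence 2")


# Wrap func; this is equivalent to writing @func2 above its definition.
func = func2(func)
func()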

Output: 
sentence 1
sentence 2
sentence 3

The following wrappers are applied to the environment:

1. GrayScaleObservation: Transforms an RGB image to grayscale. This reduces the size of the state representation without losing useful information.
2. ResizeObservation: Downsamples each observation into a square image.
3. SkipFrame: Consecutive frames don't vary much, so we can skip n intermediate frames without losing much information. The n-th frame aggregates the rewards accumulated over the skipped frames.
4. FrameStack: Squashes consecutive frames of the environment into a single observation to feed to our learning model. This way, we can identify whether Mario was landing or jumping based on the direction of his movement over the previous several frames.

After applying the wrappers, the final wrapped state consists of 4 consecutive grayscale frames stacked together. Each time Mario takes an action, the environment responds with a state of this structure, represented by a 3-D array of size [4, 84, 84].
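
The SkipFrame and FrameStack steps can be sketched as follows. This is a minimal illustration that uses a hypothetical toy environment (`CountingEnv`) instead of the real Gym Super Mario environment, so it runs without extra dependencies. The class names mirror the wrappers above, but the simplified `reset()`/`step()` interfaces are assumptions and not the Gym API.

```python
from collections import deque


class CountingEnv:
    """Hypothetical stand-in environment: the observation is the step count."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0  # constant reward per frame, for illustration only
        done = self.t >= self.episode_length
        return self.t, reward, done


class SkipFrame:
    """Repeat the action for `skip` frames and sum the rewards."""

    def __init__(self, env, skip):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop early when the episode ends
        return obs, total_reward, done


class FrameStack:
    """Return the last `num_stack` observations as one stacked observation."""

    def __init__(self, env, num_stack):
        self.env = env
        self.frames = deque(maxlen=num_stack)

    def reset(self):
        obs = self.env.reset()
        # Fill the whole stack with the first observation.
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.frames.append(obs)  # the oldest frame is dropped automatically
        return list(self.frames), reward, done


env = FrameStack(SkipFrame(CountingEnv(), skip=4), num_stack=4)
first_state = env.reset()
state, reward, done = env.step(action=0)
```

In the real pipeline each observation would be an 84 x 84 grayscale frame rather than an integer, so four stacked frames give the [4, 84, 84] state described above.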

Agent:

Mario is our agent, who makes decisions in the Super Mario environment based on the rewards and penalties received after every action.

The agent does three things:

  • Act according to the optimal action policy based on the current state (of the environment).
  • Remember experiences. Experience = (current state, current action, reward, next state). Mario caches and later recalls his experiences to update his action policy.
  • Learn a better action policy over time.

Act

For a given state, the agent can choose to explore or exploit.

  • Explore: take a random action.
  • Exploit: choose the optimal action according to the current policy.

We start with a high exploration rate and decrease it as the number of time steps increases.

The action space is limited to:

  • 0: walk right
  • 1: jump right
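
A minimal sketch of this explore/exploit choice, often called epsilon-greedy. The class name `EpsilonGreedy` and the default values for the starting rate, decay factor, and minimum rate are assumptions chosen for illustration; the source does not state the values it uses.

```python
import random


class EpsilonGreedy:
    def __init__(self, num_actions, rate=1.0, decay=0.999, rate_min=0.1, seed=None):
        self.num_actions = num_actions
        self.rate = rate          # current exploration rate (assumed start)
        self.decay = decay        # multiplicative decay per step (assumed)
        self.rate_min = rate_min  # lower bound on exploration (assumed)
        self.rng = random.Random(seed)

    def act(self, q_values):
        """Pick an action index given one Q-value per action."""
        if self.rng.random() < self.rate:
            # Explore: take a random action.
            action = self.rng.randrange(self.num_actions)
        else:
            # Exploit: take the action with the highest Q-value.
            action = max(range(self.num_actions), key=lambda a: q_values[a])
        # Decrease the exploration rate after every step.
        self.rate = max(self.rate_min, self.rate * self.decay)
        return action


# Two actions, as above: 0 = walk right, 1 = jump right.
policy = EpsilonGreedy(num_actions=2)
action = policy.act(q_values=[0.2, 0.7])
```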

Remember/Memory

For memory, we create two functions:

  • cache(): Each time Mario performs an action, he stores the experience to his memory. His experience includes the current state, action performed, reward from the action, the next state, and whether the game is done.

  • recall(): Mario randomly samples a batch of experiences from his memory, and uses that to learn the game.
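
The two functions could look roughly like this. It is a sketch only: the `Memory` class name, the capacity, and the batch size are assumptions, and the states here can be any Python objects rather than stacked frames.

```python
import random
from collections import deque


class Memory:
    def __init__(self, capacity=100000, batch_size=32, seed=None):
        # A bounded deque drops the oldest experience once it is full.
        self.buffer = deque(maxlen=capacity)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def cache(self, state, action, reward, next_state, done):
        """Store one experience after an action is performed."""
        self.buffer.append((state, action, reward, next_state, done))

    def recall(self):
        """Randomly sample a batch of experiences, grouped by field."""
        batch = self.rng.sample(list(self.buffer), self.batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```

Sampling at random, instead of replaying the most recent steps in order, breaks up the strong correlation between consecutive experiences.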

Learn

The reinforcement learning algorithm that our Mario agent uses is the Double Deep Q-Network (DDQN) algorithm. DDQN uses two ConvNets, Q_online and Q_target, that independently approximate the optimal action-value function.

Mini CNN structure
  input -> (conv2d + relu) x 3 -> flatten -> (dense + relu) x 2 -> output
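
To get a feel for how the 84 x 84 input shrinks through the three convolutional layers before it is flattened, the shape arithmetic can be sketched as below. The kernel sizes, strides, and channel counts are assumptions borrowed from a commonly used DQN layout; the source does not list its layer hyperparameters.

```python
def conv_output_size(size, kernel, stride):
    """Output width/height of a convolution without padding."""
    return (size - kernel) // stride + 1


# Assumed (kernel, stride, out_channels) for each conv layer; not from the source.
LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size = 84  # each input frame is 84 x 84
sizes = []
for kernel, stride, channels in LAYERS:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)

# Number of features handed to the first dense layer after flattening.
flattened = LAYERS[-1][2] * size * size
```

Under these assumptions the spatial size goes 84 -> 20 -> 9 -> 7, so the flatten step produces 64 * 7 * 7 = 3136 features.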

Calculating the TD Target and TD Estimate
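
In the standard Double DQN formulation, the two quantities can be written as follows. The notation is an assumption: $s$ and $a$ are the current state and action, $r$ is the reward, $s'$ is the next state, and $\gamma$ is the discount factor.

$$
\begin{aligned}
TD_e &= Q_{online}(s, a) \\
a' &= \arg\max_{a} Q_{online}(s', a) \\
TD_t &= r + \gamma \, Q_{target}(s', a')
\end{aligned}
$$

The online network selects the next action $a'$ and the target network evaluates it. When $s'$ ends the episode, the target reduces to $TD_t = r$.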


Updating the Model
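
A minimal sketch of one update step, using small lookup tables as a stand-in for the two ConvNets so the arithmetic is easy to follow. The discount factor, step size, and sync interval are assumed values, and a real agent would minimize a loss with a gradient-based optimizer rather than edit a table directly.

```python
GAMMA = 0.9          # assumed discount factor
LEARNING_RATE = 0.5  # assumed step size
SYNC_EVERY = 2       # assumed number of updates between target syncs


def td_estimate(q_online, state, action):
    return q_online[state][action]


def td_target(q_online, q_target, reward, next_state, done):
    if done:
        return reward
    # The online table selects the action, the target table evaluates it.
    values = q_online[next_state]
    best = max(range(len(values)), key=lambda a: values[a])
    return reward + GAMMA * q_target[next_state][best]


def update(q_online, q_target, experience, step):
    state, action, reward, next_state, done = experience
    estimate = td_estimate(q_online, state, action)
    target = td_target(q_online, q_target, reward, next_state, done)
    # Move the online estimate toward the target.
    q_online[state][action] += LEARNING_RATE * (target - estimate)
    # Periodically copy the online values into the target table.
    if step % SYNC_EVERY == 0:
        for s in q_online:
            q_target[s] = list(q_online[s])
    return target - estimate


q_online = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
q_target = {"s0": [0.0, 0.0], "s1": [4.0, 3.0]}
td_error = update(q_online, q_target, ("s0", 1, 1.0, "s1", False), step=1)
```

In this example the online table prefers action 1 in `s1`, so the target uses `q_target["s1"][1]` (3.0) rather than the largest target value (4.0). This decoupling of action selection from action evaluation is what distinguishes Double DQN from plain DQN.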


Metrics

  • episode rewards
  • episode lengths
  • episode average loss
  • episode average Q values

Print

  • Episode
  • Step
  • Epsilon
  • Mean Reward
  • Mean Length
  • Mean Loss
  • Mean Q Value
  • Time Delta
  • Time
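
One way such metrics could be collected is sketched below. The `MetricLogger` name, its methods, and the averaging window of 100 episodes are assumptions for illustration; the source only lists which values are tracked and printed.

```python
class MetricLogger:
    def __init__(self, window=100):
        self.window = window  # assumed number of recent episodes to average
        self.ep_rewards = []
        self.ep_lengths = []
        self.ep_avg_losses = []
        self.ep_avg_qs = []
        self._reset_episode()

    def _reset_episode(self):
        self.curr_reward = 0.0
        self.curr_length = 0
        self.curr_loss = 0.0
        self.curr_q = 0.0
        self.curr_updates = 0

    def log_step(self, reward, loss=None, q=None):
        """Record one environment step; loss and q are set only when learning ran."""
        self.curr_reward += reward
        self.curr_length += 1
        if loss is not None and q is not None:
            self.curr_loss += loss
            self.curr_q += q
            self.curr_updates += 1

    def log_episode(self):
        """Close the current episode and store its totals and averages."""
        n = self.curr_updates
        self.ep_rewards.append(self.curr_reward)
        self.ep_lengths.append(self.curr_length)
        self.ep_avg_losses.append(self.curr_loss / n if n else 0.0)
        self.ep_avg_qs.append(self.curr_q / n if n else 0.0)
        self._reset_episode()

    def means(self):
        """Average each metric over the most recent `window` episodes."""
        def mean(values):
            recent = values[-self.window:]
            return sum(recent) / len(recent) if recent else 0.0

        return {
            "mean_reward": mean(self.ep_rewards),
            "mean_length": mean(self.ep_lengths),
            "mean_loss": mean(self.ep_avg_losses),
            "mean_q": mean(self.ep_avg_qs),
        }
```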

