CSCN8020-23F-Sec1-Reinforcement Learning Programming Project Report

Reinforcement Learning Approach for Super Mario Bros - Group 1

  • Members:
    • Lifei Wang
    • Jia Zeng
    • Sudhan Shrestha
  • Instructor: Mahmoud Mohamed
  • Date: December 13, 2023

Table of Contents

  1. Introduction
  2. Data Processing
  3. Algorithms
  4. Experiments
  5. Comparison
  6. Challenges
  7. Future Work
  8. References

Introduction

This project aims to train a reinforcement learning agent on the Nintendo Entertainment System (NES) version of Super Mario Bros. We employ and compare two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), in the gym-super-mario-bros environment.

Environment

We utilize the OpenAI Gym environment gym-super-mario-bros for this project. The agent is tasked with completing all 32 stages of Super Mario Bros with just three lives. The environment renders the game frame by frame, so the agent observes the same screen a human player would.
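
A minimal sketch of creating and stepping the environment, assuming the standard gym-super-mario-bros API (the exact reset/step signatures depend on the installed gym version):

```python
import gym_super_mario_bros

env = gym_super_mario_bros.make('SuperMarioBros-v0')  # full 32-stage game
state = env.reset()                                   # RGB frame of the game screen
done = False
while not done:
    action = env.action_space.sample()                # random button combination
    state, reward, done, info = env.step(action)
    # info carries auxiliary data such as Mario's x position, remaining time, and coins
env.close()
```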

States

The state includes the pixel values of the current game screen as well as information such as the agent's position, level details, remaining time, and collected items.

Actions

The raw environment exposes 256 discrete actions (every NES button combination). Three predefined action lists, RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT, restrict this space to varying degrees of movement and interaction within the game; a usage sketch follows.
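
For illustration, a sketch of restricting the raw action space with nes_py's JoypadSpace wrapper and the SIMPLE_MOVEMENT list:

```python
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import RIGHT_ONLY, SIMPLE_MOVEMENT, COMPLEX_MOVEMENT

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)   # 7 discrete actions instead of 256
print(env.action_space.n)                 # -> 7
```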

Rewards

The default reward function encourages moving right as quickly as possible without dying: it combines the agent's horizontal progress with a penalty for elapsed game-clock time and a penalty for dying.
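
As an illustrative sketch (not the project's code), the default reward described in the environment's documentation sums a horizontal-displacement term, a clock-tick penalty, and a death penalty, clipped to [-15, 15]:

```python
def default_reward(x_new, x_old, clock_new, clock_old, died):
    """Illustrative re-implementation of the documented default reward."""
    v = x_new - x_old            # horizontal displacement (positive when moving right)
    c = clock_new - clock_old    # the in-game clock counts down, so this is non-positive
    d = -15 if died else 0       # death penalty
    return max(-15, min(15, v + c + d))
```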


Data Processing

We preprocess the game frames by converting them to grayscale, resizing them, and normalizing the pixel values before feeding them to the model; a sketch of this pipeline follows.
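
A minimal sketch of such a preprocessing step, assuming OpenCV for the image operations and an 84x84 target size (the project's actual pipeline may differ):

```python
import cv2
import numpy as np

def preprocess(frame, size=84):
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)                     # drop color channels
    small = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)  # shrink the frame
    return small.astype(np.float32) / 255.0                            # scale pixels to [0, 1]
```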

Algorithms

DDQN

We implement Double Deep Q-Network (DDQN) with separate online and target networks to reduce the overestimation bias of standard deep Q-learning, adjusting the learning rate during optimization. The target computation is sketched below.
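
A minimal PyTorch sketch of the DDQN target computation (illustrative only; the network definitions and replay buffer are omitted):

```python
import torch

def ddqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """rewards/dones are float tensors for a sampled minibatch; dones is 1.0 at episode end."""
    with torch.no_grad():
        # the online network selects the greedy next action ...
        next_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it, reducing overestimation bias
        next_q = q_target(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```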

PPO

Proximal Policy Optimization (PPO) alternates between sampling data from the environment and optimizing a surrogate objective function, using clipped probability ratios to keep each policy update moderate. The clipped loss is sketched below.
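
A minimal PyTorch sketch of the clipped surrogate loss from Schulman et al. (2017) (illustrative; a full implementation also adds value and entropy terms):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)                    # probability ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                        # negate to maximize the surrogate
```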


Experiments

DDQN Experiments

We conducted five experiments with DDQN, tweaking settings such as batch size, learning rate, and epsilon. We observed how changes in exploration rate and update interval impact the learning performance.

PPO Experiments

Three experiments with PPO explored different learning rates, batch sizes, and epsilon values. Adjustments were made to optimize the performance and learning efficiency of the agent.


Comparison

We compared DDQN and PPO based on environment setup, reward schemes, and configuration parameters. Our findings suggest differences in performance stability and efficiency, with PPO showing potential in discovering and exploiting strategies despite its variance.


Challenges

The project faced challenges such as GPU setup, code conversion from Torch to TensorFlow, parameter tuning, and computational resource limitations.

Future Work

Future projects can build on our findings, focusing on extended training durations and further optimization of the chosen algorithms.

References

  • gym-super-mario-bros. PyPI.
  • Amber (2019). Deep Q-Learning, Part 2: Double Deep Q-Network (Double DQN).
  • Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347.

For more details, visit our GitHub repository.
