Connect4

Purpose of this Tutorial

The purpose of this tutorial is to teach you how to create a maximally simple game-playing RL agent (for connect4, in this case). I've found that other RL tutorials often either abstract away the environment or are hard to read.

Organization

In Stage1 we will create a simple connect4 environment, a random agent, and a human player agent that lets you play against the random agent.

In Stage2 we will create a minimax agent.

In Stage3 we will create and train a simple actor and critic.

In Stage4 we will train the agent using PPO (Proximal Policy Optimization).

Stage1

Connect4 Environment

There's only one code block for you to complete. It's in the step function of the Env class. The Env class is the connect4 environment. Read through the rest of the code, as it's fairly straightforward.

To check the win condition efficiently, we recommend using SciPy's convolve2d function (scipy.signal.convolve2d). It is described in the SciPy documentation.
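As a rough illustration of the convolution idea, here is a minimal sketch. It assumes a board encoding that may differ from the one in the Env class: a 6x7 NumPy array where 0 is an empty cell and 1 and 2 are the two players. The has_won name and the KERNELS list are hypothetical and are not taken from the tutorial code.

```python
import numpy as np
from scipy.signal import convolve2d

# Four-in-a-row kernels: horizontal, vertical, and the two diagonals.
KERNELS = [
    np.ones((1, 4), dtype=int),
    np.ones((4, 1), dtype=int),
    np.eye(4, dtype=int),
    np.fliplr(np.eye(4, dtype=int)),
]


def has_won(board: np.ndarray, player: int) -> bool:
    """Return True if `player` has four connected pieces on `board`.

    Assumed encoding (hypothetical): 0 = empty, 1 and 2 = the two players.
    """
    # 1 where the player has a piece, 0 everywhere else.
    mask = (board == player).astype(int)
    for kernel in KERNELS:
        # mode="valid" keeps only positions where the kernel fits on the board.
        # A value of 4 means all four cells under the kernel belong to the player.
        if (convolve2d(mask, kernel, mode="valid") == 4).any():
            return True
    return False
```

Each kernel counts the player's pieces along one direction, so a single pass per direction replaces the nested loops a hand-written check would need.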

Random Agent

The code is provided as an example.

Human Agent

The code is provided as an example.

Once you're done, you should be able to play the game in the main notebook.ipynb.

Stage2

Minmax Agent

There's only one code block for you to complete. It's in the minimax function, used in the MinmaxAgent class. The MinmaxAgent class is the minimax agent. Read through the rest of the code, as it's fairly straightforward.
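If the recursion is unfamiliar, the following sketch shows depth-limited minimax on a deliberately tiny stand-in game rather than on connect4. Everything here is an assumption made for illustration: the function signature, the legal_moves, apply_move and score callbacks, and the toy game itself (take 1 or 2 stones from a pile, and whoever takes the last stone wins). The minimax function in the tutorial code may be structured differently.

```python
from typing import Callable, List, Optional


def minimax(
    state: int,
    depth: int,
    maximizing: bool,
    legal_moves: Callable[[int], List[int]],
    apply_move: Callable[[int, int], int],
    score: Callable[[int, bool], Optional[float]],
) -> float:
    """Return the best score the maximizing player can force from `state`.

    `score(state, maximizing)` returns a number for terminal states (from the
    maximizer's point of view) and None for non-terminal states.
    """
    terminal = score(state, maximizing)
    if terminal is not None:
        return terminal
    if depth == 0:
        return 0.0  # depth cutoff: treat unexplored positions as neutral
    values = [
        minimax(apply_move(state, m), depth - 1, not maximizing,
                legal_moves, apply_move, score)
        for m in legal_moves(state)
    ]
    # The maximizer picks the best child, the minimizer picks the worst.
    return max(values) if maximizing else min(values)


# Toy game (hypothetical): the state is the number of stones left.
def toy_moves(stones: int) -> List[int]:
    return [k for k in (1, 2) if k <= stones]


def toy_apply(stones: int, take: int) -> int:
    return stones - take


def toy_score(stones: int, maximizing: bool) -> Optional[float]:
    if stones > 0:
        return None
    # No stones left: the player to move lost, because the opponent took the last one.
    return -1.0 if maximizing else 1.0
```

For connect4 the same shape applies, with the state being the board, the moves being the non-full columns, and the terminal score coming from the win check.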

Stage3

In stage 3, we create an MVP (minimum viable product) of the RL connect4 agent. This is the most challenging stage, as there are a lot of moving parts that we need to add.

Here's the list of the parts we'll add in this stage:

  1. Actor Network
  2. Critic Network
  3. Computing Value
  4. Computing Advantage
  5. Define Policy Gradient Loss
  6. Training Actor and Critic

Note: We're purposely not implementing PPO yet. That comes in stage 4. Additionally, we're leaving off some optimizations in favor of simplicity.

1. Actor Definition

Go into network.py and fill out the missing sections in the Actor class.

Instructions are provided in the file.

2. Critic Definition

Go into network.py and fill out the missing sections in the Critic class.

Instructions are provided in the file.

3. Computing Value

Go into network.py and fill out the rest of the compute_value function.
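One common reading of "value" in this setting is the discounted return: the reward at each step plus the discounted sum of all later rewards. The sketch below assumes that reading. The discounted_returns name, the plain-list inputs, and the default gamma are hypothetical, and the real compute_value may take different arguments.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with rewards [0, 0, 1] and gamma = 0.5, the final win is worth 1.0 at the last step, 0.5 one step earlier, and 0.25 at the first step.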

You might want to consult the following resources:

4. Computing Advantage

Go into network.py and fill out the rest of the compute_advantage function.
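A simple form of the advantage is the observed return minus the critic's estimate for the same state, so it is positive when the episode went better than the critic predicted. The sketch below assumes that definition. The function and argument names are hypothetical, and compute_advantage in the tutorial code may use a different estimator.

```python
from typing import List


def simple_advantages(returns: List[float], value_estimates: List[float]) -> List[float]:
    """Return A_t = G_t - V(s_t) for every step of one episode."""
    if len(returns) != len(value_estimates):
        raise ValueError("returns and value_estimates must have the same length")
    return [g - v for g, v in zip(returns, value_estimates)]
```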

You might want to consult the following resources:

5. Defining Policy Gradient Loss

Go into network.py and fill out the rest of the compute_policy_gradient_loss function.
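The basic policy gradient loss is the negative mean of log pi(a_t | s_t) times the advantage A_t. The sketch below computes that number in plain Python so the arithmetic is visible. The function name and the list-based inputs are hypothetical. In network.py you would presumably express the same formula with your tensor library so that gradients can flow through the actor, treating the advantages as constants.

```python
import math
from typing import List


def policy_gradient_loss(
    action_probs: List[List[float]],
    actions: List[int],
    advantages: List[float],
) -> float:
    """Return -mean(log pi(a_t | s_t) * A_t) over the sampled steps.

    action_probs[t] is the actor's probability distribution over actions at
    step t, and actions[t] is the index of the action actually taken.
    """
    terms = [
        math.log(probs[a]) * adv
        for probs, a, adv in zip(action_probs, actions, advantages)
    ]
    return -sum(terms) / len(terms)
```

Minimizing this loss raises the probability of actions with positive advantage and lowers the probability of actions with negative advantage.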

You might want to consult the following resources:

6. Training Actor and Critic

Go into network.py and fill out the train_policygradient function.

Instructions are provided in the file.

Stage4

In stage 4, we implement PPO (Proximal Policy Optimization) to train our agent. Compared to stage 3, this stage is fairly simple.

Here's the list of the parts we'll add in this stage:

  1. PPO Loss
  2. Training PPO

1. PPO Loss

Go into network.py and fill out the compute_ppo_loss function.
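For orientation, the clipped surrogate objective from PPO can be written as the negative mean of min(r_t * A_t, clip(r_t, 1 - epsilon, 1 + epsilon) * A_t), where r_t is the ratio of the new policy's probability for the taken action to the old policy's probability. The sketch below evaluates it in plain Python. The function name, the list inputs, and the default epsilon of 0.2 are assumptions made for illustration, and compute_ppo_loss in the tutorial code may include extra terms.

```python
from typing import List


def ppo_clip_loss(
    new_probs: List[float],
    old_probs: List[float],
    advantages: List[float],
    epsilon: float = 0.2,
) -> float:
    """Return the negated clipped surrogate objective, averaged over steps.

    new_probs[t] and old_probs[t] are the probabilities that the current and
    the data-collecting policy assign to the action taken at step t.
    """
    terms = []
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Taking the minimum removes any incentive to move the ratio outside the clip range.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

With a positive advantage the objective stops improving once the ratio exceeds 1 + epsilon, which is what keeps each update close to the policy that collected the data.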

Instructions are provided in the file.

2. Training PPO

Go into network.py and fill out the train_ppo function.

Instructions are provided in the file.

Contributors

pimpale
