This project forked from scrntnstrnglr/aiproject

Comparative study of state-of-the-art algorithms to solve a real-world AI problem. Submitted as a final project for fulfillment of the requirements of the Artificial Intelligence module for the academic year 2019-2020 at Trinity College Dublin

AI Project - Reinforcement Learning with OpenAI Gym Taxi-v3

This project is a comparative study of several state-of-the-art Reinforcement Learning algorithms on the Taxi-v3 environment provided by OpenAI Gym. OpenAI Gym offers a range of game and simulation environments with pre-defined state and action spaces, which makes it a good platform for developers to test their own AI algorithms or to implement established Reinforcement Learning algorithms. We selected the Taxi-v3 environment because its state space is finite and its action space is small. The environment also involves sub-tasks, namely pickups and drop-offs, which give the problem enough complexity for a meaningful comparison between algorithms. The algorithms selected are as follows:

  • Random Search - to establish a baseline
  • Q-Learning
  • SARSA (State Action Reward State Action)
  • Expected SARSA
  • Deep Q Networks
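
The three tabular methods in this list differ mainly in the target they bootstrap from. The sketch below shows the textbook form of each target for a single transition; it is not taken from the project's notebooks, and the function names and the numbers in the example are illustrative assumptions.

```python
def q_learning_target(reward, next_q, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * max(next_q)


def sarsa_target(reward, next_q, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q[next_action]


def expected_sarsa_target(reward, next_q, gamma, epsilon):
    """Bootstrap from the expected next value under an epsilon-greedy policy."""
    n = len(next_q)
    greedy = max(range(n), key=lambda a: next_q[a])
    probs = [epsilon / n] * n          # exploration mass spread over all actions
    probs[greedy] += 1.0 - epsilon     # remaining mass on the greedy action
    expected = sum(p * q for p, q in zip(probs, next_q))
    return reward + gamma * expected


def td_update(q_sa, target, alpha):
    """Move the current estimate a fraction alpha towards the target."""
    return q_sa + alpha * (target - q_sa)


# Hypothetical Q-values for the six actions available in the next state.
next_q = [0.0, 2.0, 1.0, 0.0, -1.0, -1.0]
print(q_learning_target(-1, next_q, gamma=0.9))
print(sarsa_target(-1, next_q, next_action=2, gamma=0.9))
print(expected_sarsa_target(-1, next_q, gamma=0.9, epsilon=0.6))
```

All three targets feed the same update, `td_update`, so the comparison between the algorithms comes down to how the next state's value is estimated.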

The Taxi-v3 Environment

The Taxi-v3 environment models a basic real-world transportation problem: a single taxi acts as the agent, and a passenger needs to be picked up at a pick-up location and dropped at a drop-off location. The taxi may or may not start at the pick-up location. The taxi therefore needs to move towards the passenger, pick them up, travel to the drop-off location and drop them off. This set of actions constitutes the total action space available to the agent. There are also obstructions in the environment that the agent needs to avoid. An image of the environment frame is shown below:

[Image: a frame of the Taxi-v3 environment]

There are four possible locations for pick up and drop off, marked R, G, B and Y. In a given episode, the pickup spot is highlighted in blue and the drop-off spot in purple. From the initial state shown in the image, the taxi therefore needs to pick the passenger up at B and drop them off at Y. The taxi is drawn in yellow when the passenger is not in it and changes to green once the passenger has been picked up. Once the taxi drops the passenger off, its colour changes back to yellow. The straight lines | denote obstructions through which the agent cannot pass. The possible actions the agent can take are:

  • North - move up from current location
  • South - move down from current location
  • West - move left from current location
  • East - move right from current location
  • Pickup - pick the passenger up.
  • Drop-off - drop the passenger off.
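
For a tabular method, each situation has to map to a single integer. The sketch below mirrors how Gym's Taxi implementation is understood to number actions and pack the taxi position, passenger location and destination into one state index. The 5x5 grid size, the index order (R, G, Y, B, in taxi) and the action numbering are assumptions taken from the Gym implementation, not from this README.

```python
# Action indices as used by Gym's Taxi environment (assumed, see lead-in).
ACTIONS = {0: "south", 1: "north", 2: "east", 3: "west", 4: "pickup", 5: "dropoff"}

N_ROWS, N_COLS = 5, 5
N_PASSENGER_LOCS = 5   # R, G, Y, B, or inside the taxi
N_DESTINATIONS = 4     # R, G, Y, B
N_STATES = N_ROWS * N_COLS * N_PASSENGER_LOCS * N_DESTINATIONS


def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four state components into one integer index."""
    index = taxi_row
    index = index * N_COLS + taxi_col
    index = index * N_PASSENGER_LOCS + passenger_loc
    index = index * N_DESTINATIONS + destination
    return index


def decode(state):
    """Invert encode(): recover the four components from the index."""
    state, destination = divmod(state, N_DESTINATIONS)
    state, passenger_loc = divmod(state, N_PASSENGER_LOCS)
    taxi_row, taxi_col = divmod(state, N_COLS)
    return taxi_row, taxi_col, passenger_loc, destination


# Taxi at row 3, column 1; passenger waiting at Y (index 2); destination R (index 0).
state = encode(3, 1, 2, 0)
print(state, decode(state), N_STATES)
```

Under these assumptions a Q-table needs one row per state index and one column per entry in `ACTIONS`.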

The agent receives a reward of -1 for every action it takes. If it picks the passenger up or drops them off at an incorrect location, it receives a reward of -10, and when it drops the passenger off at the correct location it receives a reward of +20. This lets the agent differentiate between right and wrong steps. A sample run-through of the environment can be seen below:

[Animation: sample run-through of the environment]
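
The reward scheme can be written out as a small function and applied to a hand-made episode. This is a sketch only: it assumes the three rewards are mutually exclusive (an incorrect pickup or drop-off yields -10 rather than -1 plus -10), and the function names and the sample episode are hypothetical.

```python
STEP_REWARD = -1        # any ordinary action
ILLEGAL_REWARD = -10    # pickup or drop-off at an incorrect location
DELIVERY_REWARD = 20    # drop-off at the correct location


def step_reward(is_pickup_or_dropoff, at_valid_location, completes_delivery):
    """Return the reward for one action under the scheme described above."""
    if is_pickup_or_dropoff and not at_valid_location:
        return ILLEGAL_REWARD
    if completes_delivery:
        return DELIVERY_REWARD
    return STEP_REWARD


def summarise(rewards):
    """Return (number of steps, number of -10 penalties, total return)."""
    penalties = sum(1 for r in rewards if r == ILLEGAL_REWARD)
    return len(rewards), penalties, sum(rewards)


move = step_reward(False, True, False)
good_pickup = step_reward(True, True, False)
bad_dropoff = step_reward(True, False, False)
delivery = step_reward(True, True, True)

# Six moves, a correct pickup, four moves, one wrong drop-off, one move, delivery.
episode = [move] * 6 + [good_pickup] + [move] * 4 + [bad_dropoff] + [move] + [delivery]
summary = summarise(episode)
print(summary)
```

A single wrong drop-off costs as much as ten ordinary steps, which is why the number of -10 rewards is a useful signal of how well an agent has learned the task.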

Project Setup

To analyze our algorithms effectively, this project has been set up with the following parameter restrictions:

  • Number of iterations for each algorithm : 5000
  • Measurable quantities:
    • Epochs : number of time steps the agent takes to reach the final state from the initial state.
    • Penalties : number of times the agent takes an incorrect step, i.e. if reward == -10 then penalty++.
  • Baseline algorithm : Random Search
  • Hyperparameters:
    • alpha : learning rate.
    • gamma : discount factor.
    • epsilon : exploration rate, which balances exploration against exploitation.
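
The setup above can be sketched as one episode function that chooses actions epsilon-greedily, applies a Q-learning update, and counts epochs and penalties. This is a minimal sketch rather than the code in the project's notebooks. It assumes the classic Gym interface, where `env.reset()` returns the state and `env.step(action)` returns a 4-tuple; newer Gymnasium releases return `(observation, info)` from `reset()` and a 5-tuple from `step()`, so the two calls would need adapting. The names `make_q_table`, `epsilon_greedy` and `run_episode` are hypothetical.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    """Q-table that creates a zero row the first time a state is seen."""
    return defaultdict(lambda: [0.0] * n_actions)


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def run_episode(env, q, alpha, gamma, epsilon, rng, learn=True, max_steps=10_000):
    """Run one episode and return (epochs, penalties)."""
    state = env.reset()
    epochs = penalties = 0
    done = False
    while not done and epochs < max_steps:
        action = epsilon_greedy(q[state], epsilon, rng)
        next_state, reward, done, _info = env.step(action)
        if learn:
            # Q-learning target; no bootstrapping from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
        if reward == -10:
            penalties += 1
        epochs += 1
        state = next_state
    return epochs, penalties
```

With Gym installed, `env` would be created with `gym.make("Taxi-v3")`, `q` with `make_q_table(6)`, and `run_episode` called 5000 times. Setting `epsilon=1.0` and `learn=False` gives a purely random agent, which is one way to obtain the Random Search baseline. The README does not state the hyperparameter values used, so any values passed for alpha, gamma and epsilon are the caller's choice.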

Usage

The following steps should be followed to re-use this project:

  • Clone or download this project into your own repo or local machine
git clone https://github.com/scrntnstrnglr/AIProject
  • The repo contains a virtual environment setup that should appear in your code editor under the name 'AIDemo'
  • Install dependencies (requires pipenv)
pipenv install
  • The algorithms are implemented in the following Jupyter notebooks and script:
    • gymdemo.ipynb : Q-Learning, Random Search.
    • SARSA_V2.ipynb : SARSA.
    • expected_SARSA.py : Expected SARSA.
    • DQN.ipynb : Deep Q-Networks.

Contributors

baglat, bileir, scrntnstrnglr, srijang97, tapariaankit