
ei339_easy21's Introduction

EI339 Easy 21 Game

EI339 Artificial Intelligence Team Project

Q-Learning

The Q-learning code is in the Q_learning/ directory.

To reproduce the results in the report, run python main.py.

Code structure

  • Main function: main.py

  • The Easy21 environment is implemented in env_easy21.py.

  • Helper functions are in policy_eval.py and tools.py.

  • Algorithm implementation:

    • Value iteration code is in value_iter.py. This code calculates the theoretical optimal value function. Run it first if you'd like to run each algorithm separately.
    • Q-learning code is in Q_learning.py.
    • Policy iteration code is in policy_iter.py.
    • MCMC (Markov Chain Monte Carlo) code is in monte_carlo.py. This code is used for result comparison.
  • The plotting code for comparing results is in compare.py. Run it after all of the algorithm scripts have been run.
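
As an illustration of what value iteration computes and how a stopping threshold such as theta is typically used, here is a minimal sketch. It is not the code in value_iter.py: the function name, the transition-table layout, and the two-state toy MDP are all assumptions made for this example, and the toy MDP is not Easy21.

```python
def value_iteration(transitions, reward_decay=1.0, theta=1e-6):
    """Return the optimal state values of a small episodic MDP.

    transitions[state][action] is a list of (prob, next_state, reward);
    a next_state of None marks the end of the episode (hypothetical layout).
    """
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0  # largest value change seen in this sweep
        for state, actions in transitions.items():
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(
                    prob * (reward + reward_decay
                            * (values[nxt] if nxt is not None else 0.0))
                    for prob, nxt, reward in outcomes
                )
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        # Stop once a full sweep changes no value by theta or more.
        if delta < theta:
            return values


# Toy two-state MDP, invented for illustration only (not Easy21).
toy_mdp = {
    "low": {
        "stick": [(1.0, None, 0.0)],
        "hit": [(0.75, "high", 0.0), (0.25, None, -1.0)],
    },
    "high": {
        "stick": [(1.0, None, 1.0)],
        "hit": [(1.0, None, -1.0)],
    },
}

optimal_values = value_iteration(toy_mdp)
print(optimal_values)
```

With reward_decay left at 1.0 the values are undiscounted expected returns, which is appropriate only because every episode in the toy MDP terminates.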

Logs

Running the code generates log files and figures in the Q_learning/log and Q_learning/fig directories. You can clear the logs once all of the code has been run.

Parameters

  • Main:
    • You can set the number of episodes to run for each algorithm with the parameter episode_num.
    • For other parameters, please refer to each algorithm's parameter introduction.
  • Value iteration
    • reward_decay: reward decay factor $\gamma=1.0$. This experiment does not require changing it.
    • theta: stopping threshold; value iteration stops once the change between two successive value updates falls below it.
  • Q-learning
    • learning_rate: learning rate $\alpha$.
    • reward_decay: reward decay factor $\gamma=1.0$.
    • episode_num: default number of episodes to run.
    • epsilon: the exploration factor $\epsilon$.
    • dynamic_epsilon: whether to turn on Dynamic Epsilon mode or not.
    • damping_factor: damping factor $r$ for exponential dynamic epsilon; suggested value: None.
    • final_epsilon: minimal epsilon for exponential dynamic epsilon; suggested value: None.
  • Policy iteration
    • Same as Value iteration.
  • MCMC
    • learning_rate: learning rate $\alpha$.
    • reward_decay: reward decay factor $\gamma=1.0$.
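
To show how the Q-learning parameters listed above usually interact, here is a minimal sketch of an exponentially decaying epsilon, epsilon-greedy action selection, and a single Q-learning update. It is not the code in Q_learning.py: the function names, the decay formula $\epsilon \cdot r^{k}$ for episode $k$, and the (dealer card, player sum) state tuple are assumptions made for this example. When damping_factor is None, the sketch simply keeps epsilon fixed, which may differ from what the repository's Dynamic Epsilon mode does.

```python
import random
from collections import defaultdict


def epsilon_for_episode(episode, epsilon=0.1, dynamic_epsilon=False,
                        damping_factor=None, final_epsilon=None):
    """Exploration rate for a 0-based episode index (assumed schedule)."""
    if not dynamic_epsilon or damping_factor is None:
        return epsilon  # fixed epsilon; an assumption when damping_factor is None
    decayed = epsilon * damping_factor ** episode
    return decayed if final_epsilon is None else max(final_epsilon, decayed)


def choose_action(q_table, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, actions,
             learning_rate=0.1, reward_decay=1.0):
    """One Q-learning step; next_state=None means the episode has ended."""
    if next_state is None:
        target = reward
    else:
        best_next = max(q_table[(next_state, a)] for a in actions)
        target = reward + reward_decay * best_next
    q_table[(state, action)] += learning_rate * (target - q_table[(state, action)])


q_table = defaultdict(float)   # unseen (state, action) pairs start at 0.0
actions = ["hit", "stick"]
state = (10, 15)               # hypothetical (dealer card, player sum) encoding

# A terminal transition with reward +1 moves Q halfway toward 1.0.
q_update(q_table, state, "stick", reward=1.0, next_state=None,
         actions=actions, learning_rate=0.5)
print(q_table[(state, "stick")])  # 0.5
```

With epsilon set to 0, choose_action always returns the greedy action, which is one way to evaluate a learned policy without exploration.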

ei339_easy21's People

Contributors

fallcicada
