
reinforcement-learning

Implementations of reinforcement learning algorithms.

Tips

New Gym API: Gym v0.21 -> Gymnasium

In this repo, I use the new API for creating the env, because the old API is only supported through a temporary compatibility wrapper and may cease to be backward compatible some day. Using the new API has some minor ramifications for env code (in one line: don't simply do done = truncated).

Gym v21 to v26 Migration Guide: https://gymnasium.farama.org/content/migration-guide

Since Gym will not be receiving any future updates, this repo switches over to Gymnasium (import gymnasium as gym). If you'd like to read more about the story behind this switch, please check out this blog post.
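With Gymnasium, the new API is the default; a minimal sketch of the switch (using the MountainCar-v0 env referenced below):

import gymnasium as gym  # drop-in replacement for "import gym"

env = gym.make('MountainCar-v0')
observation, info = env.reset()  # reset() now returns (observation, info)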

Let us quickly understand the change.

To use the new API, add the new_step_api=True option, e.g.

env = gym.make('MountainCar-v0', new_step_api=True)

This causes the env.step() method to return five items instead of four. What is the extra item?

  • In the old API, done was returned as True if the episode ended in any way.
  • In the new API, done is split into two parts:
  • terminated=True if the environment terminates (e.g. due to task completion, failure, etc.)
  • truncated=True if the episode is truncated due to a time limit or a reason that is not defined as part of the task MDP.

This is done to remove the ambiguity in the done signal. done=True in the old API did not distinguish between the environment terminating and the episode being truncated. Previously, this was worked around by the TimeLimit wrapper setting info['TimeLimit.truncated'] when a time limit was hit. None of that is needed any more, and the env.step() function returns:

next_state, reward, terminated, truncated, info = env.step(action)
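Continuing from the step call above, a small illustrative sketch (not from the repo) of how the two flags distinguish why an episode ended:

if terminated:
    pass  # the task itself ended (goal reached, failure state, etc.)
elif truncated:
    pass  # the episode was cut off (e.g. by a time limit) without reaching a terminal state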

How could this impact your code: if your game has some kind of max_steps or timeout, you should read the truncated variable in addition to the terminated variable to see whether your game ended. Depending on the kind of rewards you have, you may want to tweak things slightly. The simplest option is just to do

done = truncated or terminated

and then proceed to reuse your old code.
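For example, a complete episode loop under the new API could look like the following sketch (random actions stand in for a real policy):

import gymnasium as gym

env = gym.make('MountainCar-v0')
observation, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder policy
    observation, reward, terminated, truncated, info = env.step(action)
    done = truncated or terminated      # old-style single done flag
env.close()

Note that the split can still matter for the learning code itself: when an episode is only truncated, the final state is not a true terminal state, so algorithms that bootstrap value estimates may want to treat terminated and truncated differently rather than collapsing them into one done flag.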

