
Inverse Reinforcement Learning

An implementation of a linear Inverse Reinforcement Learning (IRL) algorithm on the Mountain Car environment, mainly for experimental and educational purposes.

Overview

  • Reinforcement learning rests on the presupposition that a reward function is the most succinct, robust, and transferable definition of a task. But in domains like robotic manipulation and self-driving cars, specifying or hand-crafting a reward function is difficult and, in some cases, nearly impossible.
  • The first linear IRL algorithm was published in Algorithms for Inverse Reinforcement Learning (Ng & Russell, 2000), which proposed an iterative algorithm to extract the reward function given an optimal/expert policy for the environment's goal.
  • This code tests the IRL algorithm on the Mountain Car environment, given m expert trajectories.
  • The implementation follows Section 5 of the IRL paper (Ng & Russell); a sketch of the resulting linear program appears below the flowchart.

LIRL Algorithm Flowchart
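
Below is a minimal sketch of the Section 5 linear program, assuming the per-state value-difference estimates (obtained beforehand, e.g. from Monte Carlo rollouts of the basis value functions) are given as input; the function and variable names are illustrative, not the notebook's actual code.

```python
# Hedged sketch of Ng & Russell (2000), Section 5. The reward is linear in
# fixed basis functions, R(s) = sum_i alpha_i * phi_i(s), so the value
# function V_i under each basis reward is linear in alpha as well. Given
# value_diffs[s, a, i] = E[V_i(s') | s, expert action] - E[V_i(s') | s, a]
# (an assumed, precomputed input), maximize sum_s min_a p(d_sa . alpha)
# with p(x) = min(x, 2x) and |alpha_i| <= 1, via auxiliary variables t_s.
import numpy as np
from scipy.optimize import linprog

def learn_reward_weights(value_diffs):
    S, A, F = value_diffs.shape  # sampled states, non-expert actions, features
    # Decision vector z = [alpha (F entries), t (S entries)]; maximize sum(t),
    # i.e. minimize -sum(t) in linprog's convention.
    c = np.concatenate([np.zeros(F), -np.ones(S)])
    rows, rhs = [], []
    for s in range(S):
        for a in range(A):
            d = value_diffs[s, a]
            # p(x) = min(x, 2x) is concave, so t_s <= p(d . alpha) splits
            # into two linear constraints: t_s <= d.alpha and t_s <= 2*d.alpha.
            for scale in (1.0, 2.0):
                row = np.zeros(F + S)
                row[:F] = -scale * d
                row[F + s] = 1.0
                rows.append(row)
                rhs.append(0.0)
    bounds = [(-1, 1)] * F + [(None, None)] * S  # |alpha_i| <= 1, t_s free
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return res.x[:F]  # learnt reward weights alpha
```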

Implementation

  • Expert Policy Generation
    • A Q-learning agent with linear function approximation is trained to generate the expert policy. (This is done because it is difficult for humans to produce expert demonstrations in Mountain Car.) A sketch of this step appears after this list.
  • IRL implementation
    • Given the expert policy, the IRL algorithm learns the best reward function for the agent and returns it to the agent.
  • How good is the learnt reward function?
    1. Timesteps required to converge to the optimal solution.
      • With the learnt reward function, the agent needed significantly fewer timesteps to learn than with the default reward function built into the Mountain Car environment.

    2. Average reward after completion of the task.
      • A code block in the notebook generates results by averaging the reward over 100 trajectories with random start positions, using the policy learnt from 1) the learnt reward function and 2) the default reward function; a sketch of this evaluation loop also follows below.
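
A minimal sketch of the expert-policy step, assuming the common RBF-featurized linear Q-learning setup for MountainCar-v0 and the pre-0.26 Gym reset/step API; the featurizer, hyperparameters, and episode count are illustrative assumptions, not the notebook's exact code.

```python
import gym
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

env = gym.make("MountainCar-v0")

# Fit the scaler and RBF featurizer on states sampled from the state space.
samples = np.array([env.observation_space.sample() for _ in range(10000)])
scaler = StandardScaler().fit(samples)
featurizer = FeatureUnion([
    ("rbf_wide", RBFSampler(gamma=1.0, n_components=100)),
    ("rbf_narrow", RBFSampler(gamma=5.0, n_components=100)),
]).fit(scaler.transform(samples))

def featurize(state):
    """Map a raw (position, velocity) state to its RBF feature vector."""
    return featurizer.transform(scaler.transform([state]))[0]

# One linear model per action, i.e. Q(s, a) = w_a . phi(s).
models = []
for _ in range(env.action_space.n):
    model = SGDRegressor(learning_rate="constant", eta0=0.01)
    model.partial_fit([featurize(env.reset())], [0.0])  # initialize weights
    models.append(model)

def q_values(state):
    phi = featurize(state)
    return np.array([m.predict([phi])[0] for m in models])

gamma, epsilon = 0.99, 0.1
for episode in range(300):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy exploration.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_values(state)))
        next_state, reward, done, _ = env.step(action)
        # Semi-gradient Q-learning (TD(0)) update for the chosen action.
        target = reward if done else reward + gamma * np.max(q_values(next_state))
        models[action].partial_fit([featurize(state)], [target])
        state = next_state

# The greedy policy w.r.t. the learnt Q is used as the expert policy.
expert_policy = lambda s: int(np.argmax(q_values(s)))
```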
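And an illustrative sketch (not the notebook's exact cell) of the evaluation in point 2, which averages the return of a trained policy over 100 random-start episodes; Mountain Car's reset already randomizes the start position.

```python
import numpy as np

def average_return(env, policy, episodes=100):
    """Average the undiscounted return of `policy` over random-start episodes."""
    returns = []
    for _ in range(episodes):
        state = env.reset()
        total, done = 0.0, False
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward  # accumulate per-episode return
        returns.append(total)
    # Run once with the policy trained on the learnt reward function and once
    # with the policy trained on the default reward, then compare the two.
    return np.mean(returns)
```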

How to use

  1. Clone the repository: git clone https://github.com/vjg28/linear-inverse-rl-algorithms.git
  2. Execute the Jupyter notebook code blocks in order.
  3. The notebook is documented in detail to give a deeper understanding of the code and results. Improvements and future work are also described in the notebook.

Dependencies

  • scikit-learn
  • Gym
  • SciPy
  • NumPy
  • Matplotlib

Install them using pip.
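
For example (assuming the package names match the notebook's imports): `pip install scikit-learn gym scipy numpy matplotlib`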

Contributing

Please feel free to create a pull request to add implementations of other IRL algorithms or improvements. If you are a beginner, you can refer to this for getting started.

Support

If you found this useful, please consider starring (★) the repo so that it can reach a broader audience.

License

This project is licensed under the MIT License - see the LICENSE file for details.

References

  • Ng, A. Y., & Russell, S. J. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML).

Issues

featurizer_function not using its arg

Hi Gandhi,

I am a student trying to learn IRL, and I am using your program as an example. I found that the notebook has a function featurizer_function( ) that does not use its argument featureVecDim. Could you please clarify whether the argument is still needed and where it should be used?

Thank you
