karpathy / reinforcejs

Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)


reinforcejs's Introduction

REINFORCEjs

REINFORCEjs is a Reinforcement Learning library that implements several common RL algorithms, all with web demos. In particular, the library currently includes:

  • Dynamic Programming methods
  • (Tabular) Temporal Difference Learning (SARSA/Q-Learning)
  • Deep Q-Learning, i.e. Q-Learning with function approximation using neural networks
  • Stochastic/Deterministic Policy Gradients and Actor Critic architectures for dealing with continuous action spaces. (very alpha, likely buggy or at the very least finicky and inconsistent)

See the main webpage for many more details, documentation and demos.

Code Sketch

The library exports two global variables: R, and RL. The former contains various kinds of utilities for building expression graphs (e.g. LSTMs) and performing automatic backpropagation, and is a fork of my other project recurrentjs. The RL object contains the current implementations:

  • RL.DPAgent for finite state/action spaces with environment dynamics
  • RL.TDAgent for finite state/action spaces
  • RL.DQNAgent for continuous state features but discrete actions

A typical usage might look something like:

// create an environment object
var env = {};
env.getNumStates = function() { return 8; };     // dimensionality of the state feature vector
env.getMaxNumActions = function() { return 4; }; // number of discrete actions

// create the DQN agent
var spec = { alpha: 0.01 }; // see full options on the DQN page
var agent = new RL.DQNAgent(env, spec);

setInterval(function(){ // start the learning loop
  var action = agent.act(s); // s is an array of length 8
  // ... execute action in the environment and observe the reward
  agent.learn(reward); // the agent improves its Q function, policy, model, etc. reward is a float
}, 0);

The full documentation and demos are on the main webpage.

License

MIT.

reinforcejs's People

Contributors

edersantana, gaapt, karpathy


reinforcejs's Issues

How to export and re-use agent brain information?

I made a program which trains the agent. But if I now want to export the trained data and re-import it later, how can I do that?

I want something like this:

var data = agent.exportData(); // this might give me an object of the data for its NN or whatever

Then I just save it to a txt file or database.

Then later, I can import it like this

agent.SetData(data); // imports the data

This way I don't have to re-train it.

Does anyone know how to do something like this?

Thanks
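For reference, the demos' "Load a Pretrained Agent" buttons appear to rely on toJSON()/fromJSON() methods on the agent (at least for DQNAgent); a minimal sketch along those lines, assuming your agent type exposes them:

var json = agent.toJSON();              // plain object holding the learned network weights
var serialized = JSON.stringify(json);  // save this string to a txt file or database

// ... later, after recreating the agent with the same env and spec:
agent.fromJSON(JSON.parse(serialized)); // restore the weights, so no re-training is needed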

act and learn input ranges

  1. Should all state inputs to act be 0<=stateX<1?
  2. Should all reward inputs be 0<=reward<1?
  3. Is there any way to get out "nope, that wasn't a good reply. I want a second opinion!" (second place answer, etc)

Globals and exports

Any reason for deciding to go with globals over checking for the browser and delivering window.RL or module.exports.RL?

Something you'd accept as a PR, if it was done in a way that fits?
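For reference, a common pattern for this kind of change is a small shim that prefers module.exports when a module system is present and falls back to the global otherwise; a hypothetical sketch, not how the library is currently packaged:

var RL = {}; // ... library code attaches the agents here ...
if (typeof module !== 'undefined' && module.exports) {
  module.exports.RL = RL;   // require('reinforcejs').RL under Node or a bundler
} else if (typeof window !== 'undefined') {
  window.RL = RL;           // a plain <script> include keeps the current global behaviour
}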

SARSA not working in GridWorld_td

Using the default agent parameters but setting spec.update to 'sarsa', the model simply does not converge to the optimal solution.

// agent parameter spec to play with (this gets eval()'d on Agent reset)
var spec = {}
spec.update = 'sarsa'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.alpha = 0.1; // value function learning rate
spec.lambda = 0.1; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = true; // use replacing or accumulating traces
spec.planN = 0; // number of planning steps per iteration. 0 = no planning

spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update


waterworld.js - forward()

Hi,

First of all, thank you very much for sharing great demos!

In waterworld.js, I don't get this part in forward()

forward: function() {
  // in forward pass the agent simply behaves in the environment
  // create input to brain
  // .....
  for(var i=0;i<num_eyes;i++) {
    var e = this.eyes[i];
    input_array[i*5] = 1.0;    // ???
    input_array[i*5+1] = 1.0;  // ???
    input_array[i*5+2] = 1.0;  // ???
    input_array[i*5+3] = e.vx; // velocity information of the sensed target
    input_array[i*5+4] = e.vy;
    if(e.sensed_type !== -1) {
      // sensed_type is 0 for wall, 1 for food and 2 for poison.
      // lets do a 1-of-k encoding into the input array
      input_array[i*5 + e.sensed_type] = e.sensed_proximity/e.max_range; // normalize to [0,1]
    }
  }

I don't understand why the first three inputs are all 1.0. Shouldn't it be the type of sensed object or something?

On the demo page, it says:

The agent has 30 eye sensors pointing in all directions, and in each direction it observes 5 variables: the range, the type of sensed object (green, red), and the velocity of the sensed object. The agent's proprioception includes two additional sensors for its own speed in both x and y directions. This is a total of 152-dimensional state space.
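One hedged reading of the snippet above (an interpretation, not an authoritative answer): the three slots seem to be per-type proximity channels (wall/food/poison), with 1.0 acting as the default "nothing sensed within max_range" value, and the slot for whatever type was actually sensed is then overwritten with the normalized proximity:

// interpretation only: per-eye layout would be
// [wall proximity, food proximity, poison proximity, vx, vy], defaulting to 1.0 (max range)
input_array[i*5 + e.sensed_type] = e.sensed_proximity/e.max_range; // e.g. a nearby wall -> small value in slot 0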

Failed to load file

For the problem waterworld, when I click the button 'Load a Pretrained Agent', it prompts

Failed to load file:///C:/Users/xuxiyang/Desktop/reinforcejs-master/agentzoo/wateragent.json: Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https.
jquery-2.1.3.min.js:4

Does anyone know how to resolve this and load the saved data?
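A general workaround for this class of error (not specific to reinforcejs): browsers block XHR requests to file:// URLs, so serve the checkout over HTTP and open the page from localhost instead, for example by running python -m http.server 8000 in the reinforcejs-master directory and browsing to http://localhost:8000; the pretrained-agent JSON should then load.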

Multiple Workers

Is it possible to get this to run with multiple workers?
Is there a paper I can look at that explains how this is done?

Multiple layers of neurons?

How hard would it be to implement this?

I'm trying ReinforceJS the 2048 game here: https://github.com/NullVoxPopuli/doctor-who-thirteen-game-ai/blob/master/worker.js#L105

and I've noticed a couple things:

  • the AI gets to its best score (which is not very high) pretty quickly
  • it seems to have trouble beating its best score
  • achieving the best score is likely a fluke of the random nature of tile spawns

Additionally,

  • I'm not sure how long I should expect training to take
  • is a day too long?

idk :D

Question about strategy

Fantastic library!
I have a ton of questions, most of which likely have answers along the lines of "it depends" :) But, the top questions:

  1. In a basic game with many invalid moves (don't crash into a wall, don't play an invalid move, etc) and only a few valid moves, is it normally better to let the system "work out" the rules? I was considering an alternate of only offering the agent a list of valid moves and having it pick among them, but intuitively it seems more confusing - a small shift in valid moves would "off-by-one" the list, and that would be hard for the agent to learn.
  2. I combined a few examples for the spec. Are any of these "bad" in a general game solver?
    spec.update = 'qlearn'; // 'qlearn' or 'sarsa'
    spec.gamma = 0.9; // discount factor, [0, 1)
    spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
    spec.lambda = 0.8; // eligibility trace decay, [0,1). 0 = no eligibility traces
    spec.replacing_traces = false; // use replacing or accumulating traces
    spec.planN = 50; // number of planning steps per iteration. 0 = no planning
    spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
    spec.beta = 0.1; // learning rate for smooth policy update
    spec.alpha = 0.005; // value function learning rate
    spec.experience_add_every = 5; // number of time steps before we add another experience to replay memory
    spec.experience_size = 10000; // size of experience
    spec.learning_steps_per_iteration = 5;
    spec.tderror_clamp = 1.0; // for robustness
    spec.num_hidden_units = 100; // number of neurons in hidden layer
  3. In your examples, you have

env.getNumStates = function() {
  return 9;
};
env.getMaxNumActions...

Any particular reason to have it be a function? (relates back to my #1 question)

Using DQN

Hello,

In the example library usage, the environment is created with the number of states and also the maximum number of actions possible.

For example, if I have 5 possible states, each defined by 2 values, and 3 possible actions, also defined by 2 values, what should the relevant variables of the environment object be? Moreover, what should the state array given to the 'act(state)' method be?

I normally have more states and actions; however, I couldn't get the general idea behind it.

P.S. I know that this question is more suitable for StackOverflow; however, I also think that it would be beneficial for other newcomers.
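As a rough illustration for the setup described above (a sketch only, with made-up numbers): getNumStates() is the length of the feature vector passed to act(), and getMaxNumActions() is how many discrete actions act() can return.

// sketch: a state described by 2 values and 3 possible actions (values are illustrative)
var env = {};
env.getNumStates = function() { return 2; };     // act() will be given a length-2 array
env.getMaxNumActions = function() { return 3; }; // act() returns an integer in {0, 1, 2}

var agent = new RL.DQNAgent(env, { alpha: 0.01 });
var action = agent.act([0.4, -1.2]); // the two values describing the current state
// ... apply `action` in your own simulation ...
var reward = 0.5;                    // a float you compute yourself from the outcome
agent.learn(reward);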

hidden_size is most likely undefined on line 460...

model['Whd'] = new RandMat(output_size, hidden_size, 0, 0.08);

On line 460 there is a call to make a new random matrix, and the second argument is hidden_size, which is only actually defined inside the for loop above it, meaning the argument d should resolve to undefined when calling the RandMat function because it is out of scope.

I only caught this because I am porting your rl.js to C++, and while doing a unit test I came across this one. I then realized JavaScript would most likely have let this one slip right under your nose, or anyone's nose, as these RL learners are so good at learning regardless of coding errors.

I hope you update this one day, as it is a 7-year-old repository. I can imagine what you have learned about RL in 7 years and from working at Tesla. It would be amazing to see some of the newer stuff, like the recent DeepMind paper about continuous action spaces and their "Director" agent; this lib could provide even more generalization. Anyway, thanks for your wonderful code and keep up the great work. I like the way you go about things, and while porting this I can imagine that quite possibly you already made this in C++ and were actually porting it to JS, as you refer to some things in comments as structs.

Thanks, and I hope you see this.

GridWorld: TD, Demo Page: Discounted Reward greater than 1.0?

Using the initial settings, how can the discounted reward of the center field be 1.1? The max reward the agent can get is 1.0 and then the goal is reached and the agent is reset.

Also, if changing the field below to R 1.0, I'd expect the discounted reward to be 10 instead of 9.9, and here 50 instead of 49.90 (screenshots omitted).
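One hedged explanation for values above 1.0, assuming the task is treated as continuing so the return keeps accumulating across resets: if the goal reward of 1.0 can be collected again roughly every T steps, the discounted value is a geometric sum that exceeds the one-shot maximum.

// illustrative arithmetic only; gamma matches the default spec, T is a made-up revisit period
var gamma = 0.9, T = 22;              // suppose the goal is re-reached about every 22 steps
var V = 1 / (1 - Math.pow(gamma, T)); // sum over k of gamma^(k*T) * 1.0
console.log(V);                       // ~1.11, i.e. above the single-step maximum of 1.0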

Could throw error when input array mismatches getNumStates

I had neglected to set getNumStates yet nothing complains.

I guess it's an extra check every time you give an input array to act; a comparison of sizes in each call to setFrom would likely be over the top...

You could validate in Agent.forward before passing to DQNAgent.act, but then, if doing it there, why not in act?

Maybe there is a solution in checking the sizes once for the first call to act?
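A minimal sketch of the one-time check suggested above, done as a wrapper around act() rather than a change to the library itself (makeCheckedAct is a hypothetical helper name):

// validate the state length once, on the first call only
function makeCheckedAct(agent, env) {
  var checked = false;
  return function(s) {
    if (!checked) {
      if (s.length !== env.getNumStates()) {
        throw new Error('state length ' + s.length + ' does not match getNumStates() = ' + env.getNumStates());
      }
      checked = true; // later calls skip the comparison
    }
    return agent.act(s);
  };
}
// var act = makeCheckedAct(agent, env); act(stateArray);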

Question about DQN inputs

I am trying to understand the inputs for the example given here
http://cs.stanford.edu/people/karpathy/reinforcejs/index.html

env.getNumStates()
This is the size of the vector that represents the variables of the current game configuration?

env.getMaxNumActions
For this one, is it the total number of configurations the game can have? Or is it the number of actions the player can currently do in the current game configuration, such as in a grid maze, where the player has up to 4 directions to move, so it would be 4?

Inside the "setInterval" function, "s" is not defined. It is the vector of the variables of the current game configuration that I have to get myself?

And "reward" is something I have to calculate too based on the current "s" vector?

Also, why are getNumStates and getMaxNumActions functions, when they seem to return a constant value? Are they supposed to support returning dynamic values? Can the vector size be allowed to be different at any time? And the max number of actions, is that dynamic too?
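To tie these questions back to the README sketch (a hedged reading, not documentation): getNumStates() is the length of the state vector, getMaxNumActions() is the number of discrete actions act() can return, and both s and reward are produced by your own game loop. getState(), applyAction() and computeReward() below are hypothetical functions you would write yourself:

setInterval(function(){
  var s = getState();           // build the length-getNumStates() feature vector from your game
  var action = agent.act(s);    // an integer in [0, getMaxNumActions()-1]
  applyAction(action);          // hypothetical: advance your game by one step
  var reward = computeReward(); // a float you define; higher means better
  agent.learn(reward);
}, 0);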
