pemami4911 / deep-rl

297 stars, 13 watchers, 193 forks, 93 KB

Collection of Deep Reinforcement Learning algorithms

License: MIT License

Python 100.00%
openai-gym reinforcement-learning

deep-rl's Introduction

deep-rl

Collection of Deep Reinforcement Learning algorithms.

Dependencies:

Tested with Python 2.7 and Python 3.6

So far:

  1. DDPG - Deep Deterministic Policy Gradients, evaluated on the Pendulum-v0 environment in OpenAI Gym.
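
For reference, a quick way to check that the evaluation environment is available locally (a minimal sketch using OpenAI Gym; not part of the repository):

    import gym

    # Minimal sketch: confirm Pendulum-v0 (the environment DDPG is evaluated on) exists locally.
    env = gym.make('Pendulum-v0')
    print(env.observation_space)  # Box(3,) - cos(theta), sin(theta), theta_dot
    print(env.action_space)       # Box(1,) - torque in [-2, 2]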

Places where this code has been used

If you have used this code to do something cool, send me a link and a GIF (via email or pull request) and I'll add it here.

  1. @keithmgould used the same DDPG code to solve the inverted pendulum task in Roboschool. InvertedPendulum demo
  2. @janscholten Deep Reinforcement Learning with Feedback-based Exploration [code]

deep-rl's People

Contributors

afcruzs, pemami4911


deep-rl's Issues

image inputs

Do you have a modification of DDPG for image inputs, or a link to someone else's GitHub code that modifies yours?
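
There is no image-input variant in this repository; as a rough illustration only, one common adaptation is to swap the fully-connected state branch of the actor for a small convolutional encoder (the shapes and layer sizes below are made up):

    import tensorflow as tf
    import tflearn

    # Hypothetical sketch, not the repository's code: a convolutional actor for
    # stacked 84x84 grayscale frames, keeping the tanh/scaling output used by DDPG.
    def create_pixel_actor(action_dim, action_bound):
        inputs = tflearn.input_data(shape=[None, 84, 84, 4])
        net = tflearn.conv_2d(inputs, 32, 8, strides=4, activation='relu')
        net = tflearn.conv_2d(net, 64, 4, strides=2, activation='relu')
        net = tflearn.fully_connected(net, 256, activation='relu')
        w_init = tflearn.initializations.uniform(minval=-0.003, maxval=0.003)
        out = tflearn.fully_connected(net, action_dim, activation='tanh', weights_init=w_init)
        scaled_out = tf.multiply(out, action_bound)  # scale tanh output to the action range
        return inputs, out, scaled_out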

convergence issue

Hi, recently I used your template to learn some simple maneuvers.

But I find that the output always converges to -1 or +1 once the number of episodes is large enough, given output bounds of [-1, 1].

Have you ever run into this situation, or do you know how to solve it?

Best

Divide actor gradients by batch size?

Hello,

Shouldn't the actor gradients take into account that the final gradient will be an average of a batch? As a result, shouldn't the actor gradients be divided by the batch size? I believe tf.gradients just adds all of the partial derivatives for all the individual data points and does not take the mean.
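
For what it's worth, here is a tiny standalone check of that claim (purely illustrative, TF 1.x):

    import numpy as np
    import tensorflow as tf

    # tf.gradients sums the per-example partial derivatives rather than averaging them.
    x = tf.placeholder(tf.float32, [None, 1])
    w = tf.Variable([[1.0]])
    y = tf.matmul(x, w)            # dy_i/dw = x_i for each example
    grad = tf.gradients(y, w)[0]   # implicit grad_ys of ones => sum over the batch

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grad, {x: np.array([[1.0], [2.0], [3.0]])}))  # [[6.]] = 1 + 2 + 3, not the mean 2.0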

Thanks for creating this tutorial!

Some questions!

Hi! I enjoy reading your blog post a lot! Thank you!!

  1. self.actor_gradients = tf.gradients(self.scaled_out, self.network_params, -self.action_gradient)
    is "self.action_gradient" getting multiplied with the original gradients of actor?
    (TF's documentation on this is hard to read)

  2. I notice that DDPG's actor update multiplies by the gradient of the critic w.r.t. the policy's chosen actions. I've seen some other actor-critic implementations where, instead of multiplying by the critic's gradient, they multiply the policy's gradients directly by the critic's output for the policy's chosen actions.

Do you think multiplying by the critic's gradient is unique to DDPG (since DDPG uses action sampling), or are these other implementations potentially wrong?
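
As a rough illustration of what the third argument to tf.gradients (grad_ys) does (not the repository's code; the numbers are made up):

    import tensorflow as tf

    # grad_ys is multiplied into the backprop, so tf.gradients(a, theta, -dQ/da)
    # yields -dQ/da * da/dtheta: a gradient-ascent direction on Q once it is
    # handed to a minimizing optimizer.
    theta = tf.Variable(2.0)
    a = 3.0 * theta                       # stand-in for the actor output, da/dtheta = 3
    upstream = tf.constant(-5.0)          # stand-in for -action_gradient from the critic

    g = tf.gradients(a, theta, upstream)[0]   # = upstream * da/dtheta = -15

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(g))                # -15.0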

There is an error... about Monitor

    AttributeError                            Traceback (most recent call last)
    in ()
         34
         35 if __name__ == '__main__':
    ---> 36     tf.app.run()

    /usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.pyc in run(main, argv)
         42   # Call the main function, passing through any arguments
         43   # to the final program.
    ---> 44   _sys.exit(main(_sys.argv[:1] + flags_passthrough))
         45
         46

    in main(_)
         26             env, MONITOR_DIR, video_callable=False, force=True)
         27     else:
    ---> 28         env = wrappers.Monitor(env, MONITOR_DIR, force=True)
         29
         30     train(sess, env, actor, critic)

    AttributeError: 'module' object has no attribute 'Monitor'

I am using Python 2.7 and TensorFlow 1.0.

What should I do? Could you please help me out?
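
For anyone hitting the same error: it usually means the installed gym release predates the wrappers.Monitor class, so upgrading gym and importing the wrapper explicitly tends to fix it. A hedged sketch, assuming a gym version that ships gym.wrappers.Monitor:

    import gym
    from gym import wrappers

    MONITOR_DIR = './monitor'  # placeholder; use whatever directory the script already passes in

    env = gym.make('Pendulum-v0')
    env = wrappers.Monitor(env, MONITOR_DIR, force=True)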

Possible regression to do with batch normalization

Hi,

I just tried your code for the first time and I was disappointed to see that even after 500+ episodes, the rewards for the Pendulum env were still below -1000. I poked around a little, and after reverting the latest commit (f242533) the algorithm works as expected, achieving good results after around 100 episodes. It seems like the commit above introduced a regression.

why is DDPG so unstable?

I can train a good agent, but the learning curve is quite noisy. Why? Is it an implementation issue or something intrinsic to DDPG?

Actor network output increases to 1, TORCS, TF 1.0.0

Hi,

Thanks for your code.

I tried to use it for training on TORCS; however, my results are not good, and specifically, after a few steps the actions generated by the actor network increase to 1 and stay there. Similar to the following (the top 10 rows, for example):

[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]

Gradients for that set:
[[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]]

I suspect the problem is somewhere around the following line:

    # Combine the gradients here
    self.actor_gradients = tf.gradients(self.scaled_out, self.network_params, -self.action_gradient)

Could you tell me what you think the problem is?

I am using the TF 1.0.0 CPU version.

Thanks

A problem about the DDPG

Hi, I want to implement the DDPG algorithm, and before that I've read your code. It's very useful. I still have some small questions about the code.

1. As in DDPG.py, lines 61 to 66:

    # This gradient will be provided by the critic network
    self.action_gradient = tf.placeholder(tf.float32, [None, self.a_dim])
    # Combine the gradients here
    self.actor_gradients = tf.gradients(
        self.scaled_out, self.network_params, -self.action_gradient)

My question is: tf.gradients(ys, xs) sums up dy/dx over each y in ys, while in the paper the equation for dJ/d(theta) divides by N. I wonder whether I should write the code like this:

    self.actor_gradients = tf.div(tf.gradients(
        self.scaled_out, self.network_params, -self.action_gradient), N)

Looking forward to your reply. Thank you very much.
Contact: [email protected]
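
A self-contained sketch of the normalization being asked about (illustrative only; the tiny one-layer "actor" and the values of s_dim, a_dim and N are made up, not the repository's code):

    import tensorflow as tf

    s_dim, a_dim, N = 3, 1, 64

    states = tf.placeholder(tf.float32, [None, s_dim])
    action_gradient = tf.placeholder(tf.float32, [None, a_dim])  # dQ/da from the critic

    W = tf.Variable(tf.random_normal([s_dim, a_dim]))
    scaled_out = tf.tanh(tf.matmul(states, W))                   # stand-in for the actor output
    network_params = [W]

    # tf.gradients sums over the batch, so divide each gradient by N to get the batch average.
    unnormalized = tf.gradients(scaled_out, network_params, -action_gradient)
    actor_gradients = [tf.div(g, N) for g in unnormalized]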

batch normalization not actually enabled?

Hi, this repo has been very helpful to me as I'm learning DDPG myself. As an exercise to make sure I understood what's going on, I re-implemented a similar DDPG setup using Keras, and in the process I noticed something -- I don't think your batch_normalization layers are ever actually learning (adjusting their weights), so they are essentially no-ops except for the small epsilon value. It looks like with tflearn you need to set is_training to true during training steps: http://tflearn.org/config/#is_training
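
A minimal sketch of the toggle being referred to (assuming tflearn's graph-level training flag, per the link above; not the repository's code):

    import tensorflow as tf
    import tflearn

    sess = tf.Session()
    # ... build the actor/critic networks containing tflearn.batch_normalization(...) here ...
    sess.run(tf.global_variables_initializer())

    tflearn.is_training(True, session=sess)   # enable BN statistics updates for the training step
    # sess.run(train_ops, feed_dict=...)      # the DDPG update from the training loop
    tflearn.is_training(False, session=sess)  # switch back for action selection / evaluation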

Interestingly, with my Keras implementation I get very similar performance to yours when I disable my batch normalization layers. When I enable my batch norm layers, performance is actually much worse and the agent often doesn't solve Pendulum-v0 even after hundreds of episodes.

I found a couple of discussions around the web where other people describe the difficulties they've had getting batch normalization to work well with DDPG, in spite of what the original paper says. For example, this reddit post. It all makes me very curious.

Anyway, sorry, this is all mostly just for my own benefit as I'm learning, but I thought you'd like to know. Thanks again for sharing your code!

In tf.gradients, why is -self.action_gradient needed?

Hi, in the code of ActorNetwork

        self.unnormalized_actor_gradients = tf.gradients(
            self.scaled_out, self.network_params, -self.action_gradient)

Why is -self.action_gradient needed here? grad_ys is -self.action_gradient, but you return self.unnormalized_actor_gradients.

L2 weight decay for Q

The paper mentions "For Q we included L2 weight decay of 0.01 and used a discount factor of gamma = 0.99"

Does that mean that we need to add L2 regularisation to each layer in the critic network?

Maybe something like this in create_critic_network:

    net = tflearn.add_weights_regularizer(net, 'L2', weight_decay=0.01)
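
Another option, sketched under the assumption that the critic is built from tflearn.fully_connected layers (the 400/300 sizes follow the paper, not necessarily this repository's exact code):

    import tensorflow as tf
    import tflearn

    # Illustrative only: tflearn's fully_connected accepts a per-layer regularizer,
    # which is one way to get the paper's L2 weight decay of 0.01 on the critic.
    def create_critic_network(s_dim, a_dim):
        inputs = tflearn.input_data(shape=[None, s_dim])
        action = tflearn.input_data(shape=[None, a_dim])
        net = tflearn.fully_connected(inputs, 400, activation='relu',
                                      regularizer='L2', weight_decay=0.01)
        net = tflearn.fully_connected(tf.concat([net, action], 1), 300, activation='relu',
                                      regularizer='L2', weight_decay=0.01)
        q_out = tflearn.fully_connected(net, 1, regularizer='L2', weight_decay=0.01)
        return inputs, action, q_out

Note that with a hand-written loss and optimizer (as this code uses), the regularization terms collected this way would still need to be added to the critic's loss explicitly.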

Using DDPG for Pong

Hi Patrick,
I'm trying to convert your DDPG pendulum code to solve Pong. I made minimal changes, like input pre-processing and modifying the input and output dimensions. Over a few iterations I notice that the paddle sticks to the bottom of the screen, as the probability of the UP action becomes negligible and DOWN becomes almost equal to one. Since the original sample is for a continuous-action problem and I'm expecting discrete output (up or down), I'm guessing I missed changing some part of the code. Could you kindly look at my code here and point me to the missing piece:
https://gist.github.com/option-greek/dfc9288d5811371f578b2f52dce29f0e

Thanks,
OG
