marload / deeprl-tensorflow2

๐Ÿ‹ Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

License: Apache License 2.0

Python 100.00%
tensorflow machine-learning reinforcement-learning a2c a3c reinforce dqn trpo ppo sac

deeprl-tensorflow2's People

Contributors

marload, pathway

deeprl-tensorflow2's Issues

Reward modification in PPO

PPO_Discrete:

state_batch.append(state)
action_batch.append(action)
reward_batch.append(reward * 0.01)
old_policy_batch.append(probs)

PPO_Continuous:

state_batch.append(state)
action_batch.append(action)
reward_batch.append((reward+8)/8)
old_policy_batch.append(log_old_policy)

In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is transformed as (reward+8)/8. Why are these modifications made, and what effect do they have?
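
For reference, both transformations quoted in this issue are simple affine rescalings of the raw reward. The sketch below only illustrates their numeric effect and does not claim to state the author's motivation. The function names and the sample rewards are assumptions made for illustration; the range of roughly -16 to 0 matches Gym's Pendulum task, which the issue itself does not name.

```python
def scale_discrete(reward):
    """Rescaling quoted from PPO_Discrete: shrink the reward by a factor of 100."""
    return reward * 0.01


def scale_continuous(reward):
    """Rescaling quoted from PPO_Continuous: shift by 8, then divide by 8."""
    return (reward + 8) / 8


if __name__ == "__main__":
    # Hypothetical raw rewards, chosen only to show the mapping.
    for raw in (-16.0, -8.0, 0.0):
        print(raw, "->", scale_continuous(raw))
    print(1.0, "->", scale_discrete(1.0))
```

Under that assumed range, (reward+8)/8 maps rewards into roughly [-1, 1], and multiplying by 0.01 keeps accumulated returns small. Keeping reward magnitudes near unit scale is a common way to keep the value-loss gradients from dominating training, which may be the intent here.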

A3C_continues.py

'Viewer' object has no attribute 'isopen'
File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 81, in close
AttributeError: 'Viewer' object has no attribute 'isopen'
Traceback (most recent call last):
File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 165, in __del__
if self.isopen and sys.meta_path:
if self.isopen and sys.meta_path:
if self.isopen and sys.meta_path:
self.close()
File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 165, in __del__
if self.isopen and sys.meta_path:
AttributeError: 'Viewer' object has no attribute 'isopen'
AttributeError: 'Viewer' object has no attribute 'isopen'
self.close()
File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 81, in close
AttributeError: 'Viewer' object has no attribute 'isopen'
AttributeError: 'Viewer' object has no attribute 'isopen'
if self.isopen and sys.meta_path:
File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 81, in close
AttributeError: 'Viewer' object has no attribute 'isopen'
if self.isopen and sys.meta_path:
AttributeError: 'Viewer' object has no attribute 'isopen'

I don't know how to fix it.

Hyper-parameters for successful DQN Agent

Hi @marload,

Great repository you have here 😄! I am running your DQN script and trying to solve CartPole with it (that is, to consistently score above 200).

I ran the script with the default parameters, but the agent is having trouble learning a successful policy. Over the first 800 episodes of training, the scores just fluctuate between 10 and 100. One episode scored above 200, but it came early in training, when eps would still have been very high, so I think it was due to chance.
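
To make the point about eps concrete, here is a minimal sketch of an exponentially decaying epsilon-greedy schedule. It is an assumption for illustration, not the script's actual schedule: the function name, the per-episode decay, and the start, decay, and minimum values are all hypothetical.

```python
def epsilon_at(episode, start=1.0, decay=0.995, minimum=0.01):
    """Exploration rate after `episode` decay steps (hypothetical values).

    Epsilon shrinks geometrically from `start` and is clipped at `minimum`.
    """
    return max(minimum, start * decay ** episode)


if __name__ == "__main__":
    # Early episodes are mostly random actions; later ones are mostly greedy.
    for episode in (0, 100, 800):
        print(episode, epsilon_at(episode))
```

With a schedule of this shape, an early high score says little about the learned policy, because most actions at that stage are still chosen at random.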

So my question is: if you have trained a successful agent with this algorithm, can you provide me with "working" parameters? Or is DQN just unstable by nature, so that I should run the script a few more times and hope for something better?

I have not reviewed the code thoroughly, because I wanted to see it working first, but at first glance, it looks clean and simple.

Anyway, thanks for posting it on Reddit, not sure why it was deleted. I hope I can learn a thing or two from it since I am working on something similar at the moment. 😄

Have a great day!

Any idea why DQN is slow on CPU and on GPU?

The issue is probably not DQN-specific; DQN_Discrete.py is simply the only module I tried to run, on both my MacBook Pro and Google Colab. It runs okay, but both runs seem to take almost the same time. To activate the GPU, I added the following lines to main():

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

Update:
The wandb report shows 0% GPU utilization; the graphs become available a few minutes after training starts.
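
One thing worth checking first is whether TensorFlow can see a GPU at all. The memory-growth setting above only changes how GPU memory is allocated; it does not by itself move computation onto the GPU. The sketch below uses the same `tf.config.experimental.list_physical_devices` call as the snippet in this issue; the helper names `gpu_summary` and `visible_gpu_names` are hypothetical, and the import is guarded so the script still runs where TensorFlow is not installed.

```python
def gpu_summary(device_names):
    """Summarize a list of GPU device names as a one-line status string."""
    if not device_names:
        return "no GPU visible to TensorFlow; ops will run on the CPU"
    return "GPU(s) visible to TensorFlow: " + ", ".join(device_names)


def visible_gpu_names():
    """Return the names of GPUs TensorFlow can see, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return [device.name for device in gpus]


if __name__ == "__main__":
    names = visible_gpu_names()
    if names is None:
        print("TensorFlow is not installed")
    else:
        print(gpu_summary(names))
```

If the list comes back empty on Colab, the runtime is likely not set to a GPU type. If a GPU is listed but utilization stays near 0%, the bottleneck is probably somewhere other than device placement.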

Problem in A3C continuous

Hello everyone.
I am trying to use A3C continuous, but I am getting an "unrecognized arguments" error. Please see the attached pictures.
image
image

How can I solve this?
