marload / deeprl-tensorflow2

Stars: 580 · Watchers: 19 · Forks: 143 · Size: 614 KB

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

License: Apache License 2.0

Python 100.00%

tensorflow machine-learning reinforcement-learning a2c a3c reinforce dqn trpo ppo sac


deeprl-tensorflow2's Issues

Reward modification in PPO

DeepRL-TensorFlow2/PPO/PPO_Discrete.py

Lines 151 to 154 in 876266d

state_batch.append(state)
action_batch.append(action)
reward_batch.append(reward * 0.01)
old_policy_batch.append(probs)

DeepRL-TensorFlow2/PPO/PPO_Continuous.py

Lines 167 to 170 in 876266d

state_batch.append(state)
action_batch.append(action)
reward_batch.append((reward+8)/8)
old_policy_batch.append(log_old_policy)

In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified, as (reward+8)/8. I don't understand why these modifications are made. What do they do?
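For readers with the same question, here is a minimal sketch of what the two transforms do numerically. It is not taken from the repository: the function names, the discount factor of 0.99, and the example reward values are all assumptions made for illustration, and the issue itself does not say which environments or reward ranges are involved.

```python
def scale_discrete(reward):
    """Transform seen in PPO_Discrete.py: shrink the reward by a factor of 100."""
    return reward * 0.01


def scale_continuous(reward):
    """Transform seen in PPO_Continuous.py: shift by 8, then divide by 8."""
    return (reward + 8) / 8


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t, accumulated from the last step backwards.

    gamma=0.99 is an assumed value, not read from the repository.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Assumed example: an episode of 200 steps with reward +1 at every step.
raw = [1.0] * 200
scaled = [scale_discrete(r) for r in raw]
raw_return = discounted_return(raw)
scaled_return = discounted_return(scaled)
print("raw return:", raw_return, "scaled return:", scaled_return)

# Assumed example: rewards between -16 and 0 land between -1 and 1.
print(scale_continuous(-16.0), scale_continuous(0.0))
```

Both transforms are monotonic, so the ordering of rewards is unchanged. What changes is the magnitude of the returns the critic has to fit, and rescaling of this kind is commonly used to keep value targets and gradients in a moderate range. Whether that is the author's reason is not stated in the issue.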

From_logit in A2C_discrete.py should be False

In the Actor net, it seems that from_logits should be set to False in tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), since you added a softmax in the last layer :)
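To illustrate the point, here is a small pure-Python sketch of sparse categorical cross-entropy. It only emulates the idea behind the from_logits flag and does not call TensorFlow; the helper names and the example logits are hypothetical. With from_logits=True the loss applies a softmax internally, so feeding it values that have already been through a softmax means the softmax is applied twice.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_ce_from_probs(probs, label):
    """Cross-entropy when the inputs are already probabilities
    (the from_logits=False case)."""
    return -math.log(probs[label])


def sparse_ce_from_logits(logits, label):
    """Cross-entropy when the inputs are raw scores
    (the from_logits=True case): softmax first, then the log-loss."""
    return sparse_ce_from_probs(softmax(logits), label)


logits = [2.0, 1.0, 0.1]   # hypothetical pre-softmax outputs of the last layer
label = 0
probs = softmax(logits)    # what a softmax output layer would emit

# Softmax outputs treated as probabilities: the intended loss.
correct = sparse_ce_from_probs(probs, label)

# Softmax outputs treated as logits: softmax is applied a second time.
double_softmax = sparse_ce_from_logits(probs, label)

print("intended loss:", correct)
print("loss with softmax applied twice:", double_softmax)
```

The second value differs from the first because the second softmax flattens the distribution, which also weakens the gradient signal. Either removing the softmax from the last layer or passing from_logits=False would make the two consistent.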

A3C_continues.py

'Viewer' object has no attribute 'isopen'

Traceback (most recent call last):
  File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 165, in __del__
    self.close()
  File "E:\anaconda\envs\tf2\lib\site-packages\gym\envs\classic_control\rendering.py", line 81, in close
    if self.isopen and sys.meta_path:
AttributeError: 'Viewer' object has no attribute 'isopen'

I don't know how to fix it.
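One possible reading of this traceback, offered as an assumption rather than a diagnosis, is that the viewer's constructor stopped before it assigned isopen, so a later close() found the attribute missing. The sketch below reproduces that pattern with hypothetical classes (FragileViewer and SafeViewer are not part of gym) and shows a defensive close() that tolerates a half-built object.

```python
class FragileViewer:
    """Hypothetical stand-in: close() assumes __init__ ran to completion."""

    def __init__(self, fail=False):
        if fail:
            # Simulates window creation failing before isopen is assigned.
            raise RuntimeError("window creation failed")
        self.isopen = True

    def close(self):
        if self.isopen:  # AttributeError if __init__ never set it
            self.isopen = False


class SafeViewer(FragileViewer):
    """Same class with a close() that tolerates a half-built instance."""

    def close(self):
        if getattr(self, "isopen", False):
            self.isopen = False


# object.__new__ builds an instance without running __init__, which mimics
# an object whose constructor exited early.
half_built = object.__new__(FragileViewer)
try:
    half_built.close()
    fragile_raised = False
except AttributeError as exc:
    fragile_raised = True
    print("fragile close failed:", exc)

half_built_safe = object.__new__(SafeViewer)
half_built_safe.close()  # no exception
print("safe close succeeded")
```

If this reading applies, the AttributeError is a symptom, and the earlier failure that interrupted the constructor is the thing to look for in the full console output.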

.

Hyper-parameters for successful DQN Agent

Hi @marload,

Great repository you have here 😄! I am running your DQN script and trying to solve CartPole with it (that is, to consistently score >200).

I ran the script with the default parameters, but the agent is having trouble learning a successful policy. All I get is scores fluctuating between 10 and 100 for the first 800 episodes of training. There was one episode with >200, but it came early in training, when eps would still have been very high, so I think it must have been due to chance.
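To make the remark about eps concrete, here is a minimal sketch of a multiplicative epsilon-greedy schedule. The function name, the per-episode decay, and the values 1.0, 0.995, and 0.01 are all hypothetical; the repository's actual schedule and defaults may differ.

```python
def epsilon_at(episode, eps_start=1.0, eps_decay=0.995, eps_min=0.01):
    """Exploration rate after `episode` multiplicative decay steps.

    All defaults are illustrative assumptions, not the script's settings.
    """
    return max(eps_min, eps_start * eps_decay ** episode)


# Print the exploration rate at a few points in training.
for episode in (0, 10, 100, 800):
    print(episode, epsilon_at(episode))
```

Under a schedule like this, most actions in the first few dozen episodes are random, so an isolated high score early on says little about the learned policy.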

So my question is: if you have trained a successful agent with this algorithm, can you provide me with "working" parameters? Or maybe DQN is just unstable by nature and I should run the script a couple more times and hope for something better?

I have not reviewed the code thoroughly, because I wanted to see it working first, but at first glance, it looks clean and simple.

Anyway, thanks for posting it on Reddit, not sure why it was deleted. I hope I can learn a thing or two from it since I am working on something similar at the moment. 😄

Have a great day!

"PPO_Continuous.py" trained 1000 EP without effect

No changes have been made to the code.
The TensorFlow version is 2.2; will this affect it?

Any idea why DQN is slow on CPU and on GPU?

The issue is not DQN-specific; DQN_Discrete.py is simply the only module I tried to run, both on my MacBook Pro and on Google Colab. It runs okay, but both runs seem to take almost the same time. To activate the GPU, I added the following lines to main():

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

Update:
The wandb report shows 0% GPU utilization; the graphs become available a few minutes after training starts.
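Note that set_memory_growth only changes how GPU memory is allocated; it does not by itself confirm that TensorFlow sees a GPU. A minimal sketch for checking that first is shown below. The helper name visible_gpus is hypothetical, and the function falls back to an empty list when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or [] if none.

    Returns [] as well when TensorFlow itself cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("No GPU visible to TensorFlow; training will run on the CPU.")
```

If the list is empty on Colab, the runtime type is the first thing to check. If a GPU is listed but utilization stays near 0%, one plausible explanation is that the networks are small and most of the time is spent stepping the environment in Python, but that is an assumption the issue does not confirm.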

Problem in A3C continuous

Hello everyone.
I am trying to use A3C continuous, but I am getting an error saying "unrecognized arguments". Please see the attached picture.

How to solve this?
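The picture is not available here, so the cause can only be guessed. "unrecognized arguments" is the message argparse prints when the command line contains something the parser was not told about, which can happen, for example, when a script is launched from a notebook kernel that passes its own flags. The sketch below uses a hypothetical parser with a made-up --gamma flag (not necessarily one the script defines) to show the difference between parse_args and parse_known_args.

```python
import argparse

# Hypothetical parser; the flag name and default are illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument("--gamma", type=float, default=0.99)

# Simulated command line with an extra flag the parser does not define.
argv = ["--gamma", "0.9", "-f", "kernel.json"]

# parse_args exits with "unrecognized arguments" on the extra flag.
try:
    parser.parse_args(argv)
    strict_failed = False
except SystemExit:
    strict_failed = True

# parse_known_args keeps what it understands and returns the rest.
args, extras = parser.parse_known_args(argv)
print("gamma:", args.gamma)
print("ignored:", extras)
```

Switching a script to parse_known_args hides the extra arguments instead of explaining them, so it is worth first checking the exact command that was run against the flags the script defines.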
