cun-bjy / gym-ddpg-keras

Keras implementation of DDPG (Deep Deterministic Policy Gradient) with a PER (Prioritized Experience Replay) option on the OpenAI Gym framework

License: GNU General Public License v3.0

Python 100.00%

ddpg roboschool gym rl reinforcement-learning keras tensorflow2 openai-gym per

gym-ddpg-keras's Introduction

gym-ddpg-keras


Status : `IMPLEMENTING`

Extended Work : gym-td3-keras(TD3)

Experiments

CartPole-v1, link
RoboschoolInvertedPendulum-v1, link
RoboschoolHopper-v1, link

Details from paper

We used Adam (Kingma & Ba, 2014) for learning the neural network parameters with a learning rate of 10^-4 and 10^-3 for the actor and critic respectively. For Q we included L2 weight decay of 10^-2 and used a discount factor of γ = 0.99. For the soft target updates we used τ = 0.001. The neural networks used the rectified non-linearity (Glorot et al., 2011) for all hidden layers. The final output layer of the actor was a tanh layer, to bound the actions. The low-dimensional networks had 2 hidden layers with 400 and 300 units respectively (≈ 130,000 parameters). Actions were not included until the 2nd hidden layer of Q.

Summary

optimizer: Adam
learning rate: 10^-4 (actor), 10^-3 (critic)
weight decay: 10^-2 (L2 regularization on the critic)
discount factor: 0.99 (for the Q-network)
tau: 0.001 (for soft target updates)
activation: ReLU (hidden layers), tanh (actor output layer)
layers: 2 hidden layers with 400 and 300 units
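
The discount factor and tau listed above enter DDPG in two places: the critic's TD target and the soft target update. The following is a minimal sketch of both, not this repository's code. The function names `td_target` and `soft_update` are illustrative assumptions, and the weights are assumed to be a list of numbers or arrays, such as the list returned by a Keras model's `get_weights()`.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')).

    `next_q` is the target critic's value for the next state and the
    target actor's action. No bootstrapping happens on terminal steps.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(online_weights, target_weights, tau=0.001):
    """Return target weights moved a fraction tau toward the online weights."""
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(online_weights, target_weights)]


# Toy usage with plain floats standing in for weight arrays.
print(td_target(reward=1.0, next_q=2.0, done=False))
print(soft_update([1.0, 2.0], [0.0, 0.0], tau=0.5))
```

With Keras models the same helper could be applied as `target.set_weights(soft_update(model.get_weights(), target.get_weights()))`, assuming `model` and `target` share the same architecture.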

Easy Installation

Make an independent environment using virtualenv

# install virtualenv module
sudo apt-get install python3-pip
sudo pip3 install virtualenv

# create a virtual environment named venv
virtualenv venv 

# activate the environment
source venv/bin/activate

To leave the environment, run `deactivate`.

Install the requirements

pip install -r requirements.txt

Run the training script

# training
python train.py

Reference

[1] Continuous control with deep reinforcement learning

@misc{lillicrap2015continuous,
    title={Continuous control with deep reinforcement learning},
    author={Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra},
    year={2015},
    eprint={1509.02971},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}


[2] germain-hug/Deep-RL-Keras

[3] anita-hu/TF2-RL

[4] marload/DeepRL-TensorFlow2

[5] openai/baselines

[6] Improving DDPG via Prioritized Experience Replay


5 out of the last 5 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing.

When train.py runs, TensorFlow emits the warnings below, and training is critically slow.

I am not sure whether this depends on GPU usage.

WARNING:tensorflow:5 out of the last 5 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:6 out of the last 6 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:7 out of the last 7 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:8 out of the last 8 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:9 out of the last 9 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:10 out of the last 10 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:11 out of the last 11 calls to <function Model.make_train_function.<locals>.train_function at 0x7f627418ac80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.

train.py:84: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.pause(0.01)
Segmentation fault (core dumped)

support various models

Ant
HalfCheetah
etc.

---step28(EP0)---
predict:  0.0423583984375 sec
update_network:  0.05187201499938965 sec
total:  0.12126398086547852 sec

---step467(EP2)---
predict:  0.16730880737304688 sec
update_network:  0.2372143268585205 sec
total:  0.4585869312286377 sec
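
How the per-step timings above are collected is not shown here. One way to produce comparable numbers is a small context manager around each phase. This is a hedged sketch using only the standard library. The names `timed` and `report` are hypothetical, and the `time.sleep` calls stand in for the actor's prediction and the network update.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Measure the wall-clock time of the enclosed block and store it under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def report(step, episode, results):
    """Format the timings in the same layout as the log above."""
    lines = [f"---step{step}(EP{episode})---"]
    lines += [f"{label}:  {seconds} sec" for label, seconds in results.items()]
    return "\n".join(lines)


timings = {}
with timed("total", timings):
    with timed("predict", timings):
        time.sleep(0.001)  # stand-in for the actor's forward pass
    with timed("update_network", timings):
        time.sleep(0.001)  # stand-in for one training step
print(report(28, 0, timings))
```

Logging these values every step makes it easy to see whether `predict` or `update_network` is the phase that grows over time.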

train with CUDA

GPU usage should be optional.

looks like it is not being trained

After 20 episodes, the reward is still very poor.

=========EPISODE # 16 ==========
100%|########################################9| 499/500 [00:53<00:00,  8.67it/s]Episode#16, steps:499, rewards:-311.104067
100%|########################################9| 499/500 [00:53<00:00,  9.28it/s]
=========EPISODE # 17 ==========
100%|########################################8| 498/500 [00:51<00:00, 10.16it/s]Episode#17, steps:499, rewards:-211.220928
100%|########################################9| 499/500 [00:51<00:00,  9.67it/s]
=========EPISODE # 18 ==========
100%|########################################8| 498/500 [00:55<00:00,  9.79it/s]Episode#18, steps:499, rewards:-243.705719
100%|########################################9| 499/500 [00:56<00:00,  8.90it/s]
=========EPISODE # 19 ==========
100%|########################################9| 499/500 [00:55<00:00,  7.42it/s]Episode#19, steps:499, rewards:-332.657685
100%|########################################9| 499/500 [00:56<00:00,  8.90it/s]
=========EPISODE # 20 ==========
100%|########################################9| 499/500 [00:55<00:00,  9.19it/s]Episode#20, steps:499, rewards:-231.899928

20.12.29

Poor performance even in CartPole.
Reverting to the original version; something is wrong.

learning status reporting for analysis

basic analysis tool using seaborn
log save & load for bootstrap
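
The two items above could be covered by something as small as a CSV log of episode rewards plus a smoothing helper. This is a sketch under assumptions, not the repository's implementation. The file layout and the names `save_log`, `load_log`, and `moving_average` are hypothetical. The resulting CSV can be plotted with any tool, including seaborn.

```python
import csv


def save_log(path, rewards):
    """Write one row per episode: episode index and total reward."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["episode", "reward"])
        writer.writerows(enumerate(rewards))


def load_log(path):
    """Read the rewards back in episode order."""
    with open(path, newline="") as f:
        return [float(row["reward"]) for row in csv.DictReader(f)]


def moving_average(values, window=5):
    """Trailing mean over up to `window` most recent values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

A trailing average is used so that each smoothed point depends only on episodes already completed, which keeps the curve usable while training is still running.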

cartpole learning (continuous): RoboschoolInvertedPendulum-v1

Reinforcement learning in the RoboschoolInvertedPendulum-v1 environment.