anita-hu / tf2-rl

Reinforcement learning algorithms implemented for Tensorflow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]

License: MIT License

Python 100.00%
reinforcement-learning tensorflow2 openai-gym ddpg dqn sac ppo ae-ddpg tensorboard

tf2-rl's Introduction

Reinforcement Learning Agents

Implemented for Tensorflow 2.0+

New Updates!

  • DDPG with prioritized replay
  • Primal-Dual DDPG for CMDP

Future Plans

  • SAC Discrete

Usage

  • Install the dependencies imported by the scripts (see my tf2 conda env for reference)
  • Each file contains example code that runs training on the CartPole env
  • Training: python3 TF2_DDPG_LSTM.py
  • Tensorboard: tensorboard --logdir=DDPG/logs

Hyperparameter tuning

Agents

Agents were tested using the CartPole env.

Name      On/off policy  Model        Action space support
DQN       off-policy     Dense, LSTM  discrete
DDPG      off-policy     Dense, LSTM  discrete, continuous
AE-DDPG   off-policy     Dense        discrete, continuous
SAC :bug: off-policy     Dense        continuous
PPO       on-policy      Dense        discrete, continuous

Constrained MDP

Name              On/off policy  Model  Action space support
Primal-Dual DDPG  off-policy     Dense  discrete, continuous

Models

Models used to generate the demos are included in the repo. You can also find Q-value, reward, and/or loss graphs there.

Demos

  • DQN Basic, time step = 4, 500 reward
  • DQN LSTM, time step = 4, 500 reward
  • DDPG Basic, 500 reward
  • DDPG LSTM, time step = 5, 500 reward
  • AE-DDPG Basic, 500 reward
  • PPO Basic, 500 reward

tf2-rl's People

Contributors

anita-hu, stepneverstop

tf2-rl's Issues

ValueError: Expected scalar shape, saw shape: (1,).

First, thank you Anita-hu for providing the code. I am running the AE-DDPG code with a continuous action space gym environment and I get the following error. Kindly help me solve it.

self.action_space=spaces.Box(low=-1,high=1,shape=(1,),dtype=np.float32)

File "TF2_AE_DDPG.py", line 212, in async_collection
tf.summary.scalar('Stats/action', action, step=total_steps)
File "C:\New folder\envs\test\lib\site-packages\tensorboard\plugins\scalar\summary_v2.py", line 61, in scalar
tf.debugging.assert_scalar(data)
File "C:\New folder\envs\test\lib\site-packages\tensorflow_core\python\ops\check_ops.py", line 2068, in assert_scalar_v2
assert_scalar(tensor=tensor, message=message, name=name)
File "C:\New folder\envs\test\lib\site-packages\tensorflow_core\python\ops\check_ops.py", line 2098, in assert_scalar
% (message or '', shape,))
ValueError: Expected scalar shape, saw shape: (1,).
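
The traceback shows tf.summary.scalar rejecting a value of shape (1,), which is what a Box action space with shape=(1,) produces. One possible workaround, sketched below, is to unpack the action into plain floats before logging. The helper name scalar_summaries and the "_0", "_1" tag suffixes are hypothetical and are not part of TF2_AE_DDPG.py; the sketch also assumes the action is at most one-dimensional.

```python
def scalar_summaries(tag, value):
    """Split a logged value into (tag, float) pairs, one per component.

    Hypothetical helper: accepts a Python number, a list, or a 1-D array
    (anything that is either float-convertible or iterable).
    """
    try:
        items = list(value)            # works for lists and 1-D arrays
    except TypeError:
        return [(tag, float(value))]   # already a scalar
    if len(items) == 1:
        return [(tag, float(items[0]))]          # shape (1,) -> one scalar
    return [(f"{tag}_{i}", float(v)) for i, v in enumerate(items)]


print(scalar_summaries("Stats/action", [0.5]))

# Assumed usage inside the collection loop (requires TensorFlow):
# for name, v in scalar_summaries("Stats/action", action):
#     tf.summary.scalar(name, v, step=total_steps)
```

Logging one scalar per action dimension keeps the TensorBoard charts readable when the action space has more than one dimension.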

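PPO problem

Hello, I have no problems when training your PPO model on environments with discrete outputs. But when I use an environment with continuous outputs, I get the following error: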

Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in <cell line: 1>
runfile('/Users/pc/Documents/myproject/TF2-RL-master/PPO/TF2_PPO.py', wdir='/Users/pc/Documents/myproject/TF2-RL-master/PPO')
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/pc/Documents/myproject/TF2-RL-master/PPO/TF2_PPO.py", line 282, in
ppo.train(max_epochs=3000, save_freq=50)
File "/Users/pc/Documents/myproject/TF2-RL-master/PPO/TF2_PPO.py", line 214, in train
self.learn(*sampled_data)
File "/Users/pc/Documents/myproject/TF2-RL-master/PPO/TF2_PPO.py", line 168, in learn
self.model_optimizer.apply_gradients(zip(grad, train_variables))
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 672, in apply_gradients
return self._distributed_apply(strategy, grads_and_vars, name,
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 719, in _distributed_apply
update_op = distribution.extended.update(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2630, in update
return self._update(var, fn, args, kwargs, group)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3703, in _update
return self._update_non_slot(var, fn, (var,) + tuple(args), kwargs, group)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3709, in _update_non_slot
result = fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/autograph/impl/api.py", line 595, in wrapper
return func(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 702, in apply_grad_to_update_var
update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/keras/optimizer_v2/adam.py", line 173, in _resource_apply_dense
return gen_training_ops.ResourceApplyAdam(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/util/tf_export.py", line 400, in wrapper
return f(**kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/ops/gen_training_ops.py", line 1427, in resource_apply_adam
_ops.raise_from_not_ok_status(e, name)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf28/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 7164, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'ResourceApplyAdam' OpKernel for 'GPU' devices compatible with node {{node ResourceApplyAdam}}
(OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_DOUBLE, use_locking=true, use_nesterov=false
. Registered: device='XLA_CPU_JIT'; T in [DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_BFLOAT16, DT_COMPLEX128, DT_HALF]
device='GPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
[Op:ResourceApplyAdam]

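Tracing back through the error, it looks like the gradient update cannot be placed on the GPU. My working environment is a Mac M1. If this is a device problem, why do discrete environments (for example MountainCar) work without issues? Is the problem in the tfd.Normal function?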
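
The error message itself says that ResourceApplyAdam was requested with T=DT_DOUBLE while the GPU kernel is registered only for DT_FLOAT, so the variables being updated are float64. One thing to try, assuming float64 values reach the model through the sampled data (NumPy creates float64 arrays by default), is to cast every batch to float32 first. This is only a sketch: the helper name as_float32 and the example arrays are hypothetical, and the issue does not confirm where the float64 dtype actually originates in TF2_PPO.py.

```python
import numpy as np


def as_float32(*arrays):
    """Cast every input to a float32 NumPy array (hypothetical helper)."""
    return tuple(np.asarray(a, dtype=np.float32) for a in arrays)


# Stand-ins for a sampled batch; NumPy makes these float64 by default.
states = np.random.rand(4, 3)
actions = np.random.uniform(-1.0, 1.0, size=(4, 1))

states32, actions32 = as_float32(states, actions)
print(states.dtype, "->", states32.dtype)

# If the script sets the Keras default dtype to float64, switching it with
# tf.keras.backend.set_floatx('float32') before building the model is the
# matching change on the TensorFlow side (requires TensorFlow).
```

Keeping the update on float32 avoids the missing GPU kernel. Running the optimizer step on the CPU, where the message lists a DT_DOUBLE kernel, would be the alternative.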

Help regarding pushing code on GPU

Hi Anita hu,
I am Jewaliddinn shaik, a Ph.D. student at NIT AP, India. I need to force the AE_DDPG code to run in GPU mode. Kindly suggest which lines in the code need to be modified. Thank you in advance; I look forward to your response.
