pythonlessons / reinforcement_learning

351.0 7.0 149.0 89.09 MB

Reinforcement learning tutorials

Home Page: https://pylessons.com/

License: MIT License

Python 100.00%

dqn ddqn dueling-dqn d3qn reinforcement-learning a2c policy-gradient a3c ppo ppo-agent

reinforcement_learning's Issues

module 'tensorflow' has no attribute 'ConfigProto'

I applied the TF backport change by adding %tensorflow_version 1.x, but I still get the error below, together with this message:

TensorFlow is already loaded. Please restart the runtime to change versions.

Was this code tested with TensorFlow 1.x or with TensorFlow 2?

AttributeError Traceback (most recent call last)

in ()
21
22 # configure Keras and TensorFlow sessions and graph
---> 23 config = tf.ConfigProto()
24 config.gpu_options.allow_growth = True
25 sess = tf.Session(config=config)

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
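For context, TensorFlow 2.x no longer exposes ConfigProto and Session at the top level; the TF1-style names live under tf.compat.v1. The sketch below is one possible workaround, not the repository's own fix. The helper names v1_symbol and make_session are hypothetical, and the code takes the TensorFlow module as an argument so that the same function works with either major version.

```python
def v1_symbol(tf_module, name):
    """Look up a TF1-style symbol, preferring tf.compat.v1 when it is available."""
    compat_v1 = getattr(getattr(tf_module, "compat", None), "v1", None)
    if compat_v1 is not None and hasattr(compat_v1, name):
        return getattr(compat_v1, name)
    # Fall back to the top-level name (older TF 1.x builds).
    return getattr(tf_module, name)


def make_session(tf_module):
    """Build a session with GPU memory growth enabled, as in the traceback above."""
    config = v1_symbol(tf_module, "ConfigProto")()
    config.gpu_options.allow_growth = True
    return v1_symbol(tf_module, "Session")(config=config)
```

Typical use would be `import tensorflow as tf` followed by `sess = make_session(tf)`. In Colab, the "already loaded" message means the %tensorflow_version magic ran after TensorFlow had been imported, so the runtime has to be restarted before the magic takes effect.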

Add licence

Hi, like many people I am (probably) using your implementation as a starting point for a project of mine, so it would be great if you added a licence to it.
Many thanks for this awesome repo 👍 :)

break statement is missed in done condition

Reinforcement_Learning/LunarLander-v2_PPO/LunarLander-v2_PPO.py

Line 321 in b5eedc7

I noticed that a 'break' statement is missing at the end of the 'done' condition. Without it, the inner loop seems to run indefinitely and the replay experience keeps growing. I found this while re-implementing this tutorial in PyTorch.
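The pattern being described can be illustrated with a small stand-alone loop. This is only a sketch of the control flow, not the code at the referenced line: ToyEnv, run_episode, and the constant placeholder action are all hypothetical stand-ins for the real environment and agent.

```python
class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done  # next_state, reward, done


def run_episode(env, max_steps=1000):
    """Collect one episode and stop as soon as the environment reports done."""
    states, rewards = [], []
    state = env.reset()
    for _ in range(max_steps):
        action = 0  # placeholder policy
        next_state, reward, done = env.step(action)
        states.append(state)
        rewards.append(reward)
        state = next_state
        if done:
            # A real agent would train on the collected episode here.
            break  # leave the inner loop so the buffers stop growing
    return states, rewards


states, rewards = run_episode(ToyEnv(horizon=5))
print(len(states), sum(rewards))
```

Without the break, this loop would keep stepping a finished episode and appending to the buffers until max_steps is reached.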

library version

Could you add an environment.txt so that it is easier for me to run your code? I got the error below.

Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/minh/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/minh/anaconda3/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "main.py", line 302, in train_threading
action, prediction = agent.act(state)
File "main.py", line 107, in act
prediction = self.Actor.predict(state)[0]
File "/home/minh/anaconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1462, in predict
callbacks=callbacks)
File "/home/minh/anaconda3/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 324, in predict_loop
batch_outs = f(ins_batch)
File "/home/minh/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in call
run_metadata=self.run_metadata)
File "/home/minh/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_1/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_1/bias)
[[{{node dense_1/BiasAdd/ReadVariableOp}}]]
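Until a pinned environment file exists, one way to report which library versions are in use is to query the installed package metadata. The sketch below is an illustration only: the function name installed_versions and the package list are assumptions, and importlib.metadata needs Python 3.8 or newer (the traceback above comes from Python 3.7, where the importlib_metadata backport would be needed instead).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package list is a guess at what this project depends on.
    for name, version in installed_versions(["tensorflow", "keras", "numpy", "gym"]).items():
        print(f"{name}=={version}" if version else f"# {name} is not installed")
```

Including this output in an issue report makes it easier to tell whether an error such as the one above is tied to a particular TensorFlow and Keras combination.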

A problem in PPOAgent

Hi, Rokas:

First of all, thanks for your great tutorial on reinforcement learning. I went through the whole series and learned a lot.

In the PPOAgent I think there may be something wrong with this line. When I vstack discounted_r (shape (n, 1)) and subtract the predicted values (shape (n,)) from it, the advantages end up with shape (n, n). So I think we should not vstack discounted_r, but instead vstack the advantages: advantages = np.vstack(discounted_r - values). The advantages then have shape (n, 1), which is the expected result.

Thanks.
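The shape behaviour described in this issue follows from NumPy broadcasting and can be reproduced in isolation. The arrays below are made-up numbers, not values from the tutorial, and the name broadcast_advantages is only used here for illustration.

```python
import numpy as np

# Made-up data: both arrays start out with shape (n,), here n = 4.
discounted_r = np.array([1.0, 2.0, 3.0, 4.0])
values = np.array([0.5, 1.5, 2.5, 3.5])

# Stacking first: (n, 1) minus (n,) broadcasts to (n, n).
broadcast_advantages = np.vstack(discounted_r) - values

# Subtracting first and then stacking, as the issue suggests: shape (n, 1).
advantages = np.vstack(discounted_r - values)

print(broadcast_advantages.shape)
print(advantages.shape)
```

Only the second form pairs each return with its own value estimate; the first computes every return against every estimate.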

Dueling question

Hello,
Thanks for a great project. It's very useful. I have a question about the model code for the Dueling algorithm, for example in:
Pong-v0_DQN_CNN_TF2.py

Here is the relevant line:
action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)

Let's say our batch looks like this:
a = tf.constant([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])
print('a=', a)
a= tf.Tensor(
[[ 1. 2.]
[-2. 3.]
[ 3. -4.]], shape=(3, 2), dtype=float32)

The result of the "K.mean" function will be a tensor with shape (1, 1):
print('Kmean=', K.mean(a[:, :], keepdims=True))
Kmean= tf.Tensor([[0.5]], shape=(1, 1), dtype=float32)

Shouldn't there be a tensor with shape (3, 1)?
print('Kmean=', K.mean(a[:, :], axis=1, keepdims=True))
Kmean= tf.Tensor(
[[ 1.5]
[ 0.5]
[-0.5]], shape=(3, 1), dtype=float32)

If our batch contains 3 elements, then the mean should be computed separately for each element in the batch. Or am I missing something?
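The numbers in the question can be checked without TensorFlow. The sketch below uses NumPy as a stand-in for K.mean (an assumption made for portability; the reduction semantics of axis and keepdims are the same) and shows that only the per-row mean centres each sample's advantages around zero.

```python
import numpy as np

# Same batch as in the question: 3 samples, 2 actions each.
a = np.array([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])

# No axis given: one mean over the whole batch, shape (1, 1).
global_mean = a.mean(keepdims=True)

# axis=1: one mean per sample, shape (3, 1).
per_row_mean = a.mean(axis=1, keepdims=True)

# Subtracting the per-row mean centres every sample independently.
centered = a - per_row_mean

print(global_mean.shape, per_row_mean.shape)
print(centered.sum(axis=1))
```

With the global mean, the advantage values of one sample would depend on the other samples that happen to be in the same batch, which is what the question is pointing at.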
