greydanus / baby-a3c

A high-performance Atari A3C agent in 180 lines of PyTorch

License: Apache License 2.0

Python 100.00%

a3c actor-critic atari deep-reinforcement-learning pytorch pytorch-a3c pytorch-rl

baby-a3c's Issues

Why sync grad only when grad is None

shared_param.grad is synced only when it is None, here: https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L159. I am confused by this. I would expect the sync to happen unconditionally, that is, every time the local model computes a gradient. Is it synced automatically somewhere else? Thank you for your time.
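
One possible explanation, offered as an assumption rather than the author's confirmed reasoning: assigning the local gradient to the shared parameter binds both names to the same object, so later in-place updates are visible through either name and the assignment only has to happen once. A minimal sketch of that aliasing, using plain Python lists as stand-ins for gradient tensors (`Param` and `sync_grad` are hypothetical names, not PyTorch's or the repository's):

```python
class Param:
    """Hypothetical stand-in for a parameter that carries a gradient buffer."""
    def __init__(self):
        self.grad = None

def sync_grad(local, shared):
    # Mirrors the conditional from the issue: bind only when nothing is bound yet.
    if shared.grad is None:
        shared.grad = local.grad  # binds the same object, does not copy it

local, shared = Param(), Param()
local.grad = [0.0, 0.0]
sync_grad(local, shared)

# An in-place update of the local buffer is visible through the shared one.
local.grad[0] += 1.5
sync_grad(local, shared)  # no-op: shared.grad is already bound
aliased = shared.grad is local.grad
```

The aliasing only holds while the gradient buffer is updated in place. If the framework replaces the buffer with a new object, the shared reference goes stale and the sync would have to run again.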

Loop should break when episode is done

Hi, I believe we should break out of the

for step in range(args.rnn_steps):

loop when done == True. Currently, when the environment signals that the episode is done, the loop keeps running for a few more steps. I am not sure how the Gym environment responds to that, but new values, rewards, etc. keep being appended to the lists, which can't be good for training.
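
A minimal sketch of the behaviour this issue proposes, assuming a toy environment (`ToyEnv` and `collect_rollout` are hypothetical and only mimic the shape of a Gym-style `step`); it is not the repository's training loop:

```python
class ToyEnv:
    """Hypothetical environment whose episode ends after `horizon` steps."""
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done  # (next_state, reward, done)

def collect_rollout(env, rnn_steps):
    """Collect at most `rnn_steps` transitions, stopping at the episode boundary."""
    rewards, done = [], False
    for step in range(rnn_steps):
        state, reward, done = env.step(action=0)
        rewards.append(reward)
        if done:
            env.reset()
            break  # do not mix the next episode into this rollout
    return rewards, done

env = ToyEnv(horizon=3)
env.reset()
rewards, done = collect_rollout(env, rnn_steps=20)
```

An alternative design is to reset the environment inside the loop and keep collecting, in which case the return computation has to treat the `done` step as a boundary.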

finished with exit code 0 quickly when in train mode (can't train)

Hi, everyone. When I run in train mode, the script finishes with exit code 0 within about 10 s, without reporting any error. In test mode it runs normally, with either a single process or multiple processes. Is anyone else facing the same problem?

*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 ***

If I disable all GPUs, I get this error:
*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 ***

If I don't disable the GPU, I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error

followed by:
*** Error in `python': corrupted size vs. prev_size: 0x00000000021dc430 ***

Any idea how to fix it?

why override the default step()

baby-a3c/baby-a3c.py

Lines 72 to 78 in 85899d7

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None: continue
                self.state[p]['shared_steps'] += 1
                self.state[p]['step'] = self.state[p]['shared_steps'][0] - 1 # a "step += 1" comes later
        super.step(closure)

Why did you override the default implementation of step(closure)? The default one computes an exponential moving average. Your implementation doesn't seem to compute the step count, because it always returns None. I looked over torch's documentation for step() but couldn't work out why you chose to override it.
Kindly review the following PR: #9
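
One plausible reading of the override, stated as an assumption and not confirmed by the author: Adam's bias correction depends on a step count, so when the moment estimates are shared between processes, each process's local count arguably has to follow a shared counter as well. The sketch below uses a scalar, pure-Python Adam (`adam_update`, `Worker`, and the variable names are hypothetical, not the repository's code) with two workers that update one set of moments:

```python
import math

def adam_update(param, grad, moments, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update with bias correction for the given 1-based step."""
    moments["exp_avg"] = beta1 * moments["exp_avg"] + (1 - beta1) * grad
    moments["exp_avg_sq"] = beta2 * moments["exp_avg_sq"] + (1 - beta2) * grad ** 2
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = math.sqrt(moments["exp_avg_sq"] / bias2) + eps
    return param - lr * (moments["exp_avg"] / bias1) / denom

class Worker:
    def __init__(self, moments, counter):
        self.moments = moments  # shared by all workers
        self.counter = counter  # shared one-element list
        self.step = 0           # local copy, as each process would hold

    def update(self, param, grad):
        self.counter[0] += 1
        self.step = self.counter[0] - 1  # sync the local count to the shared one
        self.step += 1                   # the "step += 1" that comes later
        return adam_update(param, grad, self.moments, self.step)

moments = {"exp_avg": 0.0, "exp_avg_sq": 0.0}
counter = [0]
a, b = Worker(moments, counter), Worker(moments, counter)

param = 1.0
for worker, grad in [(a, 0.5), (b, 0.4), (a, 0.6), (b, 0.5)]:
    param = worker.update(param, grad)
```

Without the sync, each worker would count only its own two updates even though the shared moments have absorbed four. Separately, the quoted snippet calls `super.step(closure)`; the usual Python 3 form is `super().step(closure)`.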

TensorFlow 2 implementation

Thanks for your great implementation.
I am currently trying to port it to TF2, but I find the SharedAdam part hard to understand and do not know how to implement it in TF2.
Could you kindly give me some tips?
Thank you.

Decreasing entropy factor in loss function

Hi @greydanus ,

I'm wondering whether there is a specific reason why you added a term that decreases the entropy in the loss function. Other A3C implementations I've seen instead add a term that increases the entropy, with its coefficient reduced over time. My understanding is that preserving a small amount of entropy helps by encouraging exploration.

Many Thanks,
Akmal Bakar
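
For reference, the formulation this issue describes (an entropy bonus that encourages exploration) is commonly written by subtracting a scaled entropy from the loss. A minimal sketch; the function names and the coefficients `beta=0.01` and `value_coef=0.5` are illustrative assumptions, not values confirmed from the repository:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log p) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def total_loss(policy_loss, value_loss, probs, beta=0.01, value_coef=0.5):
    # Subtracting beta * H means that minimizing the loss pushes entropy up.
    return policy_loss + value_coef * value_loss - beta * entropy(probs)

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
loss_uniform = total_loss(1.0, 0.0, uniform)
loss_peaked = total_loss(1.0, 0.0, peaked)
```

With everything else equal, the uniform policy gets the lower loss. Whether a given implementation rewards or penalizes entropy comes down to the combined sign of the entropy term and its coefficient, so both need checking together.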

No saved model

Hi, I've run the script in training mode, but even after training finished, running it in test or render mode gives me the "no saved model" message. How do I make it save the model? And how do I render the saved, trained policy once training is over? Thanks in advance for your time!

bugs when episode is done

Dear Author,

I think when an episode is done, hx should be reset. I am not sure whether it's a bug in
https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L144
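
A minimal sketch of the reset this issue asks for, assuming the recurrent state can be represented as a flat list (`next_hidden` and `hidden_size` are hypothetical names; the repository's actual handling at the linked line may differ):

```python
def next_hidden(hx, done, hidden_size):
    """Return the recurrent state to carry into the next step."""
    if done:
        return [0.0] * hidden_size  # fresh state for the new episode
    return hx  # otherwise keep the memory accumulated so far

hx = [0.3, -0.1, 0.7]
kept = next_hidden(hx, done=False, hidden_size=3)
reset = next_hidden(hx, done=True, hidden_size=3)
```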

sum vs. mean

Hi, Sam.

First of all, thanks so much for this code and for making it available. Having such a short implementation definitely helps with understanding the algorithm.

I was hoping you could give me some intuition behind your use of sum instead of mean. In my implementations of REINFORCE, A3C, GAE, and A2C I use mean, and things work fine. Equations in online resources seem to suggest that mean is the right approach, and other implementations also use the mean.

Now, your implementation works very well, too! I tested it myself with Atari games and other environments, and got rock-solid results.

Can you share some insights on the use of sum instead of mean?

Again, thanks so much in advance!
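
A small numeric sketch of how the two reductions relate, assuming a fixed rollout length (the function names and the numbers are made up for illustration). For N steps the summed loss is N times the mean loss, so the gradients differ only by that constant factor:

```python
def loss_sum(advantages, log_probs):
    """Policy-gradient loss summed over the rollout."""
    return -sum(a * lp for a, lp in zip(advantages, log_probs))

def loss_mean(advantages, log_probs):
    """The same loss averaged over the rollout."""
    return loss_sum(advantages, log_probs) / len(advantages)

advantages = [1.0, -0.5, 2.0, 0.25]
log_probs = [-0.7, -1.2, -0.3, -1.6]
ratio = loss_sum(advantages, log_probs) / loss_mean(advantages, log_probs)
```

Here `ratio` equals the rollout length. A constant factor like this can be absorbed into the learning rate, and Adam's update is largely insensitive to gradient scale, which may be why both choices train well. With variable-length rollouts the factor changes from update to update, so the two are no longer equivalent.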

license

Hello,
would you mind adding an open-source license to this project?
