greydanus / baby-a3c
A high-performance Atari A3C agent in 180 lines of PyTorch
License: Apache License 2.0
The shared_param.grad is synced only when it is None here: https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L159. I am a bit confused. I think it should be synced without that condition, that is, whenever the local model computes a gradient. Is it synced automatically somewhere? Thank you for your time.
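One possible reading of the question above, offered as a sketch and not as a statement about the repository: if the guarded assignment binds the shared parameter's gradient to the very same object as the local gradient, later in-place writes to the local gradient would be visible through the shared one without a second assignment. The example uses plain Python lists as stand-ins for tensors, and `Param` and `sync_once` are hypothetical names. Whether this holds in a real training loop depends on the gradients being updated in place instead of being rebound to new objects.

```python
class Param:
    """Hypothetical stand-in for a parameter; `grad` is a list, not a tensor."""

    def __init__(self):
        self.grad = None


def sync_once(local, shared):
    # Mirrors the guarded assignment from the question: bind only when unset.
    if shared.grad is None:
        shared.grad = local.grad  # same object, not a copy


local, shared = Param(), Param()
local.grad = [0.0, 0.0]
sync_once(local, shared)

# An in-place update of the local gradient is visible through the shared one.
local.grad[0] = 1.5
sync_once(local, shared)  # no-op: shared.grad is already bound
aliased = shared.grad is local.grad
in_place_value = shared.grad[0]

# Rebinding the local gradient to a new object breaks the alias.
local.grad = [9.0, 9.0]
sync_once(local, shared)  # still a no-op, so shared keeps the old object
stale_value = shared.grad[0]
```

The last three lines show the failure mode the question worries about: once the local gradient is a different object, the guard prevents any further sync.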
Hi, I believe we should break out of the `for step in range(args.rnn_steps):` loop when `done == True`. Currently, when the environment indicates that the episode is done, the loop continues for a few more steps. I am not sure how the Gym environment responds to that, but new values, rewards, etc. keep being added to the lists, and that can't be good for training.
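The suggestion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `ToyEnv` and `collect_rollout` are hypothetical, the environment simply ends after a fixed number of steps, and a random choice stands in for sampling from the policy.

```python
import random


class ToyEnv:
    """Hypothetical environment: an episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return float(self.t), 1.0, done  # observation, reward, done


def collect_rollout(env, rnn_steps, rng):
    """Collect at most `rnn_steps` transitions, stopping at episode end."""
    rewards, actions = [], []
    done = False
    for _ in range(rnn_steps):
        action = rng.choice([0, 1])  # placeholder for sampling the policy
        _, reward, done = env.step(action)
        actions.append(action)
        rewards.append(reward)
        if done:
            break  # do not step an environment that has already terminated
    return rewards, actions, done


env = ToyEnv(length=3)
env.reset()
rewards, actions, done = collect_rollout(env, rnn_steps=20, rng=random.Random(0))
```

An alternative design is to reset the environment inside the loop and keep collecting, in which case the return computation has to stop discounting at the episode boundary.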
Hi, everyone. When I run in train mode, the code finishes quickly (within 10 s) with exit code 0 and without reporting any error. However, it runs normally in test mode, with either a single process or multiple processes. Is anyone else facing the same problem?
If I disable all GPUs, I get the error
"*** Error in `python': corrupted size vs. prev_size: 0x0000000000863430 *** "
If I don't disable the GPUs, I get the error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA error (3): initialization error
Then
*** Error in `python': corrupted size vs. prev_size: 0x00000000021dc430 ***
Any ideas on how to fix it?
Lines 72 to 78 in 85899d7
Thanks for your great implementation.
I am currently trying to translate it into a TF2 implementation, but I find the SharedAdam part difficult to understand and do not know how to implement it in TF2.
Could you kindly give me some tips?
Thank you.
Hi @greydanus ,
I'm wondering whether there is a specific reason why you've added a term that decreases the entropy in the loss function. In other implementations of A3C I've seen, a term that increases the entropy is added instead, with its coefficient reduced over time. My understanding is that preserving a small amount of entropy helps by encouraging exploration.
Many Thanks,
Akmal Bakar
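To make the sign question above concrete, here is a small sketch of how an entropy term interacts with a loss that is minimized. It assumes the total loss has the form `policy_loss - beta * H(p)`; the function names and `beta = 0.01` are illustrative assumptions, not taken from the repository. Under that form, a more uniform policy gets a lower loss, so minimizing the loss pushes entropy up.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # H(p) = -sum_i p_i * log(p_i); terms with p_i == 0 contribute 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_loss(policy_loss, probs, beta=0.01):
    # Assumed form: entropy is subtracted, so higher entropy lowers the loss.
    return policy_loss - beta * entropy(probs)


uniform = softmax([0.0, 0.0, 0.0, 0.0])
peaked = softmax([5.0, 0.0, 0.0, 0.0])

loss_uniform = total_loss(1.0, uniform)
loss_peaked = total_loss(1.0, peaked)
```

If the term were added with a plus sign instead, the ordering of the two losses would flip and the optimizer would be rewarded for collapsing the policy.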
Hi, I ran the script in training mode, but even after training finished, running it in test or render mode gave me the "no saved model" message. How do I make it save the model? And how do I render the saved, learned policy only once training is over? Thanks in advance for your time!
Dear Author,
I think when an episode is done, hx should be reset. I am not sure whether it's a bug in
https://github.com/greydanus/baby-a3c/blob/master/baby-a3c.py#L144
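The behavior described above can be sketched as a single rule applied after every step. This is only an illustration: `next_hidden` is a hypothetical helper and the hidden state is a plain list standing in for a tensor.

```python
def next_hidden(hx, done, size):
    """Return the recurrent state to use for the next step.

    `hx` is a plain list standing in for a hidden-state tensor.
    """
    if done:
        return [0.0] * size  # fresh state for the new episode
    # Otherwise carry the state forward; with a tensor library this is
    # where the state would be detached from the previous graph.
    return hx


hx = [0.3, -0.7]
carried = next_hidden(hx, done=False, size=2)
reset = next_hidden(hx, done=True, size=2)
```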
Hi, Sam.
First of all, thanks so much for this code and for making it available. Having such a short implementation definitely helps in understanding the algorithm.
I was hoping you could give me some intuition behind your use of `sum` instead of `mean`. In my implementations of REINFORCE, A3C, GAE, and A2C I use `mean` and things work fine. Equations in online resources seem to suggest `mean` is the right approach. Other implementations also use the `mean`.
Now, your implementation works very well, too! I tested it myself with Atari games and other environments, and got rock solid results.
Can you share some insights on the use of `sum` instead of `mean`?
Again, thanks so much in advance!
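One way to think about the sum versus mean question above: for a rollout of N steps, the gradient of the summed loss is exactly N times the gradient of the averaged loss. The sketch below checks this on a toy squared-error loss with a single scalar parameter; the loss and the names are illustrative assumptions, not the repository's loss.

```python
def grad_sum(theta, targets):
    # d/dtheta of sum_i (theta - t_i)^2
    return sum(2.0 * (theta - t) for t in targets)


def grad_mean(theta, targets):
    # d/dtheta of (1/N) * sum_i (theta - t_i)^2
    return grad_sum(theta, targets) / len(targets)


targets = [1.0, 2.0, 4.0, 5.0]
theta = 0.0
g_sum = grad_sum(theta, targets)
g_mean = grad_mean(theta, targets)
ratio = g_sum / g_mean  # equals the number of steps, N
```

With plain SGD the two choices therefore differ only by an effective learning-rate factor of N. With an optimizer that normalizes by gradient magnitude, such as Adam, the difference largely washes out, although gradient-norm clipping applied before the optimizer step would still see the larger gradient.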
Hello,
would you mind adding an open-source license to this project?