mg2033 / a2c Goto Github PK

182.0 182.0 37.0 907 KB

A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow

License: Apache License 2.0

Python 100.00%

a2c actor-critic computer-vision gym openai-gym-agents openai-gym-environments policy-gradient reinforcement-learning

a2c's Issues

number_of_classes

A2C/models/model.py

Line 48 in b9db158

self.img_height, self.img_width, self.num_classes = observation_space_params

Hello sir,

I was trying to understand your code but got confused: what is num_classes? Is it meant to be the number of channels in an input image (3 for RGB, 1 for grayscale)? If so, the name is confusing, since it is called num_classes throughout the project.

New algorithms planned?

Hi, thanks for your work! Do you plan to implement any other algorithms, such as A3C or PPO?

About updating.

Thank you for publishing your A2C code.
In the update block, you use the torch detach method. It seems to me to be the same as stopping the gradient with the no_grad method when calculating the advantage, as in my code.
But my code doesn't learn at all. Is my idea wrong?
Thanks.

How to draw figure for episode reward?

In the summary, I can only find the reward for each environment. Am I supposed to average over all envs?
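
One way to get a single episode-reward curve is to record the total reward of every finished episode from all environments and plot a moving average over them. The following is a minimal sketch, assuming the per-step rewards and done flags of each environment are available as plain lists; the class `EpisodeRewardTracker` and its methods are hypothetical names, not part of this repository.

```python
from collections import deque


class EpisodeRewardTracker:
    """Pools finished-episode rewards from several envs (hypothetical helper)."""

    def __init__(self, num_envs, window=100):
        # Reward accumulated so far in each env's current episode.
        self.running = [0.0] * num_envs
        # Totals of the most recent finished episodes, from any env.
        self.finished = deque(maxlen=window)

    def step(self, rewards, dones):
        """Record one synchronous step: one reward and done flag per env."""
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.running[i] += reward
            if done:
                self.finished.append(self.running[i])
                self.running[i] = 0.0

    def mean_reward(self):
        """Moving average over pooled episodes, or None before any finish."""
        if not self.finished:
            return None
        return sum(self.finished) / len(self.finished)


tracker = EpisodeRewardTracker(num_envs=2)
tracker.step([1.0, 0.0], [False, False])
tracker.step([1.0, 3.0], [True, True])
print(tracker.mean_reward())
```

Logging `mean_reward()` once per update to a single writer would give one curve instead of one per environment.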

LSTM policy

Does the current model implementation also include an lstm policy?

render() and monitor() for custom environments

Hello sir,
Can you please provide an example implementation of the render() and monitor() methods for a custom environment?
Thank you,
Pavel

Update TF to version 2.

Since the TF 2.0 Keras API has been frozen for beta, it's possible to convert the code to TF2 without fear of having to deal with API changes in the future.

Model doesn't make use of the GPU

I have started training the model on Breakout and it is a little slow. It is only using around 500 MB of GPU memory. Even when I increase the number of environments to 20, the GPU usage stays the same. I think this may be the reason OpenAI coded their model the way they did. Theirs uses around 7 GB, at least for the ACER model. I need to check for A2C.

Time to converge

Could you elaborate in the README on how much time, or how many episodes, it takes to converge on the environments?

Does A2C support experience replay?

I read your code and implemented a version with experience replay.
However, I find that the loss explodes after a few frames (almost 1000). The value loss becomes very large and the action loss becomes very negative. Is it a code error, or does A2C not support experience replay in theory?
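
For context, the A2C policy gradient is on-policy: it assumes the actions were sampled from the current policy. Replayed transitions come from an older policy, so off-policy actor-critic methods such as ACER reweight each sample by a truncated importance ratio. A minimal sketch of that weight, assuming the action probabilities under the current policy (`pi_prob`) and under the policy that generated the data (`mu_prob`) are known; the function name and the `clip` parameter are hypothetical.

```python
def truncated_importance_weight(pi_prob, mu_prob, clip=10.0):
    """Return min(clip, pi(a|s) / mu(a|s)) for one replayed action.

    pi_prob: probability of the action under the current policy.
    mu_prob: probability of the action under the behaviour policy
             that was in use when the transition was stored.
    """
    rho = pi_prob / mu_prob
    return min(rho, clip)


# An action that the current policy likes twice as much as the old one did.
print(truncated_importance_weight(0.5, 0.25))
```

Without some correction of this kind, the policy term is computed for data the current policy would not have produced, which is one possible reason for a diverging loss.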

config parameters

Sir, can you please clarify the use of the unroll_time_steps and num_stack config parameters?
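
In A2C implementations of this kind, a parameter like unroll_time_steps usually means the number of steps collected from each environment before one update, and num_stack the number of consecutive frames stacked into one observation. That reading is an assumption here, not something confirmed by the repository. A sketch of both ideas under that assumption, with hypothetical function names:

```python
from collections import deque


def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout of len(rewards) steps.

    bootstrap_value is the critic's estimate for the state after the
    last step; a done flag cuts the bootstrap at an episode boundary.
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        running = reward + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def stack_frames(frames, num_stack):
    """Yield the last num_stack frames at each step, padded with the first frame."""
    buf = deque([frames[0]] * num_stack, maxlen=num_stack)
    for frame in frames:
        buf.append(frame)
        yield list(buf)


# A rollout of 3 steps (unroll length 3) with no episode end.
print(n_step_returns([1.0, 1.0, 1.0], [False, False, False], 0.0, gamma=0.5))
# Observations built from the 2 most recent frames (stack size 2).
print(list(stack_frames(["f0", "f1", "f2"], 2)))
```

A longer unroll gives returns that rely less on the critic's bootstrap estimate, and stacking frames lets a feed-forward policy see motion that a single frame cannot show.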

num_env problem

Hello, I have read the code carefully, and I have some questions about num_env.

1: If this parameter equals 4, is it equivalent to training four models? Or is it a way of accelerating training?

2: With OpenAI baselines I get one summary when training one model with 8 envs, but with your code I get 4 summaries when num_env is 4. Reading the logger code of both, I found that OpenAI adds the infos of all envs to one summary, whereas your code writes each env's info to its own FileWriter summary. Is that right? If I only want one summary, can I simply add up all the infos? If not, how can I get a single summary when I use multiple envs to train one model?

3: When I test Pong using A2C, it takes about 8k to converge, but with OpenAI baselines it takes only about 500 steps to converge, which confuses me.

Any suggestions?

Best.

Help running code

I am not sure what I am doing wrong, but I am in the A2C folder and when I run:

(gym) teves@teves:~/A2C$ python main.py config/breakout.json
usage: main.py [-h] [--version] [--config CONFIG]
main.py: error: unrecognized arguments: config/breakout.json
Add a config file using '--config file_name.json'

or if I run:

(gym) teves@teves:~/A2C$ python main.py --config config/breakout.json
Add a config file using '--config file_name.json'

How shall I run this?
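
The usage line in the first error suggests that main.py takes the config path only through the --config option, which is why the positional form is rejected. A minimal sketch of a parser with that usage, reconstructed from the usage message as an assumption and not taken from the repository's code; `build_parser` and the version string are placeholders.

```python
import argparse


def build_parser():
    """Parser matching: main.py [-h] [--version] [--config CONFIG]."""
    parser = argparse.ArgumentParser(prog="main.py")
    # The version string is a placeholder for illustration only.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0")
    parser.add_argument("--config", default=None, help="path to a JSON config file")
    return parser


# The option form parses; a bare positional path would exit with
# "unrecognized arguments", as in the first command above.
args = build_parser().parse_args(["--config", "config/breakout.json"])
print(args.config)
```

Since the option form parses cleanly in this sketch, the message printed by the second command may come from a later step, such as reading or parsing the JSON file. That is a guess, not something the output above confirms.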
