for any game which set the "action_mask" not equal all 1, for example when creating th

[action_mask error] about lightzero HOT 6 CLOSED

opendilab commented on May 22, 2024

[action_mask error]

from lightzero.

Comments (6)

lewis841214 commented on May 22, 2024

Hi, I ran the following file to test:
python3 ./zoo/box2d/lunarlander/config/lunarlander_disc_gumbel_muzero_config.py

I set the following in the policy dict:
env_type='not_board_games',
action_type = 'varied_action_space',

and set
action_mask[0] = 0
in the env file.
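
The two changes described above can be sketched as follows. This is only an illustration: it assumes the policy settings are a plain Python dict and that the environment builds its mask as a NumPy array. The names `policy_overrides` and `build_action_mask` are hypothetical and are not part of LightZero; only the two config keys and the `action_mask[0] = 0` line come from this thread.

```python
import numpy as np

# Hypothetical fragment: only these two keys are taken from the thread.
policy_overrides = dict(
    env_type='not_board_games',
    action_type='varied_action_space',
)


def build_action_mask(num_actions: int) -> np.ndarray:
    """Return a mask where 1 marks a legal action and 0 a masked one."""
    action_mask = np.ones(num_actions, dtype=np.int8)
    action_mask[0] = 0  # mask out action 0, as in the experiment above
    return action_mask


# The discrete LunarLander action space has 4 actions.
mask = build_action_mask(4)
legal_actions = np.flatnonzero(mask).tolist()
```

With this mask, only actions 1, 2 and 3 remain legal, which is the situation that triggers the behavior reported below.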

If I don't set action_type = 'varied_action_space', the error mentioned above still occurs. After setting action_type = 'varied_action_space', the error disappears, and the reward does increase as the number of training steps increases.

But something odd remains: completed_value becomes -inf throughout the training process, as shown here:

[12-04 22:47:17] INFO collect end: muzero_collector.py:729
episode_count: 16
envstep_count: 1248
avg_envstep_per_episode: 78.0
avg_envstep_per_sec: 224.66893464049363
avg_episode_per_sec: 2.8803709569294056
collect_time: 5.554840067217127
reward_mean: -373.18820628837983
reward_std: 212.4814746359075
reward_max: -135.52152484840005
reward_min: -788.0163128857876
total_envstep_count: 1248
total_episode_count: 32
total_duration: 14.115308258408682
visit_entropy: 1.3028649394086207
completed_value: -inf
[12-04 22:47:17] WARNING NaN or Inf found in input tensor. x2num.py:14
[12-04 22:47:17] WARNING NaN or Inf found in input tensor.

I'm not sure whether this is an issue.

By the way, I have a question:

Today, before you uploaded the pull request with the fix, I was looking at the same place and tried to fix this bug myself. I did exactly the same thing as you did, but I only used the "else" branch to run the code.
The weird thing is this:
the code inside the else branch in the screenshot I captured is independent of
state_index and current_index
so does it just keep producing the same thing? I.e., does the variable "target_policies" keep appending the same thing?

Thanks for your reply!!

puyuan1996 commented on May 22, 2024

Hello, indeed, after following your modifications, we did encounter this issue. We are currently investigating the cause and searching for a solution. Thank you for your patience and feedback.

puyuan1996 commented on May 22, 2024

Hello, I understand your concerns. In previous versions, we did not specifically test scenarios where the action_mask in not_board_games contains zeros. In theory, however, our handling of variable action spaces should extend to not_board_games. We have therefore proposed splitting the original env_type into two variables: env_type and action_type.

In our latest PR #160, we have implemented and optimized this adjustment. We warmly invite you to review and test these modifications. Thank you for your valuable feedback, it is greatly appreciated and beneficial for the advancement of LightZero. Best wishes!

puyuan1996 commented on May 22, 2024

Thank you for your feedback.

Regarding the issue of encountering completed_value: -inf when running lunarlander_disc_gumbel_muzero_config.py, I would like to confirm: did you use only the default configuration, with no additional modifications? Did the problem appear at the very beginning of program execution? On my macOS system, I ran 30K environment steps and did not encounter a similar issue. To pinpoint the problem more accurately, please provide more detailed information.
About your observation that the code segment does not use state_index and current_index, this is because our goal here is to transform the visit count distribution obtained from MCTS search into target_policies that comply with a specific data format. This is mainly accomplished through distributions = roots_distributions[policy_index] and policy_index += 1. I acknowledge that there is redundancy in this section of the code and there are more efficient implementation methods. We will optimize it in the coming weeks. I greatly appreciate your valuable suggestion.
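
As a rough illustration of the transformation described above, the sketch below turns visit counts into full-length target policies while advancing policy_index. It is not LightZero's implementation: the helper `visit_counts_to_policy` is hypothetical, and it assumes that each entry of roots_distributions lists one visit count per legal action, in action order.

```python
from typing import List, Sequence


def visit_counts_to_policy(
    visit_counts: Sequence[int], action_mask: Sequence[int]
) -> List[float]:
    """Expand per-legal-action visit counts into a full-length policy.

    Masked actions (mask == 0) receive probability 0.
    """
    legal = [i for i, m in enumerate(action_mask) if m == 1]
    if len(legal) != len(visit_counts):
        raise ValueError("expected one visit count per legal action")
    total = sum(visit_counts)
    policy = [0.0] * len(action_mask)
    for action, count in zip(legal, visit_counts):
        # Fall back to a uniform policy over legal actions if nothing was visited.
        policy[action] = count / total if total > 0 else 1.0 / len(legal)
    return policy


# Made-up example data: two search roots, action 0 masked in both.
roots_distributions = [[10, 30, 10], [5, 5, 40]]
action_masks = [[0, 1, 1, 1], [0, 1, 1, 1]]

target_policies = []
policy_index = 0
for mask in action_masks:
    distributions = roots_distributions[policy_index]
    policy_index += 1  # each iteration consumes the next root's distribution
    target_policies.append(visit_counts_to_policy(distributions, mask))
```

Because policy_index advances on every iteration, each appended entry comes from a different root, even though the loop body does not read any other index.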

Best wishes!

lewis841214 commented on May 22, 2024

Hi, I've run
lunarlander_disc_gumbel_muzero_config.py

under the default config with
action_type = 'varied_action_space', added at p.43
and
action_mask[0] = 0 added at p.139 in
LightZero-fix-action-mask/zoo/box2d/lunarlander/envs/lunarlander_env.py

and
completed_value: -inf
occurs.

I guess you didn't add action_mask[0] = 0 in the env file, so every element of action_mask is 1. If I don't put action_mask[0] = 0 into the env file, the error doesn't occur either, but that is not what we want, right? This enhancement was created for the case where some actions are masked with 0.

Thanks!

karroyan commented on May 22, 2024

Hi, this problem occurred because masked actions were not handled properly during Gumbel MuZero collection. The error has now been fixed in #178. We welcome you to review the change and test whether it resolves the problem you were facing. Please let us know if you have any other questions or feedback. Thank you for reporting this issue!
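
To show how a single masked action can turn a logged statistic into -inf, here is a small NumPy illustration. It is not the change made in #178, whose details are not shown in this thread; the function `mean_completed_value` and the example numbers are hypothetical, and the sketch assumes that masked actions are represented by -inf before averaging.

```python
import numpy as np


def mean_completed_value(q_values, action_mask):
    """Return (naive_mean, masked_mean) of per-action values.

    naive_mean averages over all actions after masked ones are set to -inf;
    masked_mean averages over legal actions only.
    """
    q = np.asarray(q_values, dtype=np.float64)
    mask = np.asarray(action_mask, dtype=bool)
    # A single -inf entry makes the whole mean -inf.
    naive_mean = float(np.where(mask, q, -np.inf).mean())
    # Restricting the average to legal actions keeps it finite.
    masked_mean = float(q[mask].mean()) if mask.any() else 0.0
    return naive_mean, masked_mean


naive, masked = mean_completed_value([0.5, 1.0, 2.0, 3.0], [0, 1, 1, 1])
```

Here `naive` is -inf while `masked` is 2.0, which matches the pattern in the report: the statistic is only affected when at least one action is masked.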
