
[action_mask error] (lightzero issue, closed)

opendilab commented on May 22, 2024
[action_mask error]

from lightzero.

Comments (6)

lewis841214 commented on May 22, 2024

Hi, I ran the following config to test:
python3 ./zoo/box2d/lunarlander/config/lunarlander_disc_gumbel_muzero_config.py

I set the following in the policy dict:
env_type='not_board_games',
action_type = 'varied_action_space',

and set
action_mask[0] = 0
in the env file.

Without action_type = 'varied_action_space', the error mentioned above still occurs. After setting action_type = 'varied_action_space', the error disappears, and the reward does increase as training proceeds.
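
For anyone trying to reproduce this, the two changes can be sketched roughly as below. This is only an illustration: the `build_obs` helper and the layout of the returned dict are assumptions made for the example, not the actual code in lunarlander_env.py. Only `env_type`, `action_type` and `action_mask[0] = 0` come from this report.

```python
import numpy as np

# Policy config fragment with the two keys mentioned above (all other keys omitted).
policy = dict(
    env_type='not_board_games',
    action_type='varied_action_space',
)


def build_obs(raw_obs, num_actions=4):
    """Hypothetical helper: attach an action mask to a raw observation."""
    action_mask = np.ones(num_actions, dtype=np.int8)  # 1 = legal, 0 = masked
    action_mask[0] = 0  # mask action 0, as in the experiment described above
    return {'observation': raw_obs, 'action_mask': action_mask}


obs = build_obs(np.zeros(8, dtype=np.float32))
legal_actions = np.flatnonzero(obs['action_mask']).tolist()
print(legal_actions)  # [1, 2, 3]
```

With this mask, only actions 1 to 3 should ever be selected during search and collection.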

But one odd thing remains: completed_value becomes -inf during training, as in this log:

[12-04 22:47:17] INFO collect end: muzero_collector.py:729
episode_count: 16
envstep_count: 1248
avg_envstep_per_episode: 78.0
avg_envstep_per_sec: 224.66893464049363
avg_episode_per_sec: 2.8803709569294056
collect_time: 5.554840067217127
reward_mean: -373.18820628837983
reward_std: 212.4814746359075
reward_max: -135.52152484840005
reward_min: -788.0163128857876
total_envstep_count: 1248
total_episode_count: 32
total_duration: 14.115308258408682
visit_entropy: 1.3028649394086207
completed_value: -inf
[12-04 22:47:17] WARNING NaN or Inf found in input tensor. x2num.py:14
[12-04 22:47:17] WARNING NaN or Inf found in input tensor.

I'm not sure whether this is an issue.

By the way, I have a question:
Screenshot from 2023-12-04 23-37-12

Today, before you uploaded the pull request with the fix, I was looking at the same place and tried to fix this bug. I did exactly the same thing as you did, except that I only used the "else" branch.
The odd thing is that the code inside the else branch in my screenshot does not depend on
state_index or current_index.
So does it just keep producing the same thing? I.e., does the variable "target_policies" keep getting the same thing appended?

Thanks for your reply!!


puyuan1996 commented on May 22, 2024

Hello, indeed, after following your modifications, we did encounter this issue. We are currently investigating the cause and searching for a solution. Thank you for your patience and feedback.


puyuan1996 commented on May 22, 2024

Hello, I understand your concerns. In previous versions, we did not specifically test scenarios where the action_mask in not_board_games contains zeros. In theory, however, our handling of variable action spaces should extend to not_board_games. We have therefore proposed splitting the original env_type into two variables: env_type and action_type.

In our latest PR #160, we have implemented and optimized this adjustment. We warmly invite you to review and test these modifications. Thank you for your valuable feedback; it is greatly appreciated and beneficial for the advancement of LightZero. Best wishes!


puyuan1996 commented on May 22, 2024

Thank you for your feedback.

  • Regarding the issue of encountering completed_value: -inf when running lunarlander_disc_gumbel_muzero_config.py, I would like to confirm, did you only use the default configuration and make no additional modifications? Did this problem arise at the very beginning of the program execution? On my macOS system, I executed 30K environment steps and did not encounter a similar issue. In order to pinpoint the problem more accurately, please provide more detailed information.

  • About your observation that the code segment does not use state_index and current_index, this is because our goal here is to transform the visit count distribution obtained from MCTS search into target_policies that comply with a specific data format. This is mainly accomplished through distributions = roots_distributions[policy_index] and policy_index += 1. I acknowledge that there is redundancy in this section of the code and there are more efficient implementation methods. We will optimize it in the coming weeks. I greatly appreciate your valuable suggestion.
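
To make the second point concrete, here is a minimal sketch of how a running `policy_index` can yield a different target policy on each iteration even though `state_index` and `current_index` are never read. It is not the LightZero implementation: `visit_counts_to_policy` is a hypothetical helper, the visit counts are made up, and it assumes that each entry of `roots_distributions` holds counts for the legal actions only.

```python
import numpy as np


def visit_counts_to_policy(visit_counts, action_mask):
    """Hypothetical helper: spread normalized visit counts over the legal actions."""
    action_mask = np.asarray(action_mask)
    legal = np.flatnonzero(action_mask)
    counts = np.asarray(visit_counts, dtype=np.float64)
    policy = np.zeros(len(action_mask), dtype=np.float64)
    total = counts.sum()
    if total > 0:
        policy[legal] = counts / total  # masked actions keep probability 0
    return policy


# Made-up visit counts for two searched roots (three legal actions each).
roots_distributions = [[10, 30, 10], [5, 5, 40]]
action_mask = [0, 1, 1, 1]

target_policies = []
policy_index = 0
for _ in range(len(roots_distributions)):
    # The loop variable is unused; policy_index alone selects the next root.
    distributions = roots_distributions[policy_index]
    policy_index += 1
    target_policies.append(visit_counts_to_policy(distributions, action_mask))

print(target_policies[0])  # probabilities 0, 0.2, 0.6, 0.2
print(target_policies[1])  # probabilities 0, 0.1, 0.1, 0.8
```

Because `policy_index` advances on every pass, each appended entry comes from a different root, so the appended items are not duplicates.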

Best wishes!


lewis841214 commented on May 22, 2024

Hi, I ran
lunarlander_disc_gumbel_muzero_config.py

with the default config, plus
action_type = 'varied_action_space', added at p.43
and
action_mask[0] = 0 added at p.139 in
LightZero-fix-action-mask/zoo/box2d/lunarlander/envs/lunarlander_env.py

and
completed_value: -inf
occurs.

My guess is that you did not add action_mask[0] = 0 to the env file, so all elements of action_mask are 1. If I don't put action_mask[0] = 0 in the env file, the error doesn't occur either, but that is not what we want, right? This enhancement is meant for the case where some actions are masked with 0.

Thanks!


karroyan commented on May 22, 2024

Hi, this problem occurred because masked actions were not handled properly during Gumbel MuZero collection. The error has now been fixed in #178. We welcome you to review the change and test whether it resolves the problem you were facing. Please let us know if you have any other questions or feedback. Thank you for reporting this issue!
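
As a generic illustration (not the actual change in #178, and not LightZero code), the sketch below shows one way a masked action can turn a logged statistic into -inf. It assumes, purely for the example, that the masked action carries a -inf placeholder value; all numbers are made up.

```python
import numpy as np

# Assumption for illustration: action 0 is masked and holds a -inf placeholder.
q_values = np.array([-np.inf, 1.5, -0.5, 2.0])
probs = np.array([0.0, 0.5, 0.25, 0.25])
action_mask = np.array([0, 1, 1, 1], dtype=bool)

# Aggregating over all actions lets the placeholder leak into the result.
naive_mean = q_values.mean()

# Restricting the computation to legal actions keeps the value finite.
legal_value = float(np.sum(probs[action_mask] * q_values[action_mask]))

print(naive_mean)   # -inf
print(legal_value)  # 1.125
```

Indexing by the mask before aggregating is the simplest guard; multiplying by a zero probability instead would not help, since 0 * -inf evaluates to nan.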

