Comments (6)
Hi, I ran the following file to test:
python3 ./zoo/box2d/lunarlander/config/lunarlander_disc_gumbel_muzero_config.py
I set the policy dict as:
env_type='not_board_games',
action_type = 'varied_action_space',
and set
action_mask[0] = 0
in env file.
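For anyone reproducing this, the two changes described above can be sketched roughly as follows. This is only an illustration: the helper name `build_obs` and the observation dict layout are assumptions and are not copied from the actual config or env file. Only the keys `env_type` and `action_type` and the `action_mask[0] = 0` assignment come from the description above.

```python
# Policy section of the config; only the two keys discussed here are shown.
policy = dict(
    env_type='not_board_games',
    action_type='varied_action_space',
)


def build_obs(raw_obs, num_actions=4):
    """Hypothetical helper: attach an action mask to an observation.

    Discrete LunarLander has 4 actions. The dict layout returned here is an
    assumed format for illustration, not the real env implementation.
    """
    action_mask = [1] * num_actions  # 1 = legal, 0 = illegal
    action_mask[0] = 0               # mask out action 0, as in the report
    return {'observation': raw_obs, 'action_mask': action_mask}


obs = build_obs([0.0] * 8)  # LunarLander observations have 8 components
print(obs['action_mask'])
```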
If I don't set action_type = 'varied_action_space', the error mentioned above still occurs. After setting action_type = 'varied_action_space', the error disappeared, and the reward does increase as the number of training steps increases.
But one weird thing remains: completed_value becomes -inf throughout the training process, as shown below:
[12-04 22:47:17] INFO collect end: muzero_collector.py:729
episode_count: 16
envstep_count: 1248
avg_envstep_per_episode: 78.0
avg_envstep_per_sec: 224.66893464049363
avg_episode_per_sec: 2.8803709569294056
collect_time: 5.554840067217127
reward_mean: -373.18820628837983
reward_std: 212.4814746359075
reward_max: -135.52152484840005
reward_min: -788.0163128857876
total_envstep_count: 1248
total_episode_count: 32
total_duration: 14.115308258408682
visit_entropy: 1.3028649394086207
completed_value: -inf
[12-04 22:47:17] WARNING NaN or Inf found in input tensor. x2num.py:14
[12-04 22:47:17] WARNING NaN or Inf found in input tensor.
I'm not sure whether this is an issue.
Today, before you uploaded the pull request with the fix, I was looking at the same place and trying to fix this bug. I did exactly the same thing as you did, except that I only used the "else" branch to run the code.
But the weird thing is:
the code inside the else branch in the screenshot I captured is independent of
state_index and current_index
so does it just keep producing the same thing? I.e., does the variable "target_policies" keep appending the same thing?
Thanks for your reply!!
from lightzero.
Hello, indeed, after following your modifications, we did encounter this issue. We are currently investigating the cause and searching for a solution. Thank you for your patience and feedback.
from lightzero.
Hello, I understand your concerns. In previous versions, we did not specifically test for scenarios where the action_mask in not_board_games contains zeros. However, theoretically, our handling of variable action spaces should be extendable to not_board_games. Therefore, we have proposed to expand the original env_type into two variables: env_type and action_type.
In our latest PR #160, we have implemented and optimized this adjustment. We warmly invite you to review and test these modifications. Thank you for your valuable feedback, it is greatly appreciated and beneficial for the advancement of LightZero. Best wishes!
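To make the idea of a variable action space concrete, here is a minimal sketch in which the set of legal actions is derived from the action mask at each step. The helper name `legal_actions_from_mask` is hypothetical and is not taken from the LightZero code base or from PR #160.

```python
def legal_actions_from_mask(action_mask):
    """Hypothetical helper: return the indices of actions whose mask entry is 1."""
    return [i for i, m in enumerate(action_mask) if m == 1]


# Fixed action space: every action is always legal.
fixed = legal_actions_from_mask([1, 1, 1, 1])

# Varied action space: action 0 is masked out for this step.
varied = legal_actions_from_mask([0, 1, 1, 1])

print(fixed)
print(varied)
```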
from lightzero.
Thank you for your feedback.
- Regarding the issue of encountering completed_value: -inf when running lunarlander_disc_gumbel_muzero_config.py, I would like to confirm: did you only use the default configuration and make no additional modifications? Did this problem arise at the very beginning of the program execution? On my macOS system, I executed 30K environment steps and did not encounter a similar issue. In order to pinpoint the problem more accurately, please provide more detailed information.
- About your observation that the code segment does not use state_index and current_index: this is because our goal here is to transform the visit count distribution obtained from MCTS search into target_policies that comply with a specific data format. This is mainly accomplished through distributions = roots_distributions[policy_index] and policy_index += 1. I acknowledge that there is redundancy in this section of the code and there are more efficient implementation methods. We will optimize it in the coming weeks. I greatly appreciate your valuable suggestion.
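The role of policy_index can be illustrated with a small sketch. This is not the actual LightZero code. The helper `visit_counts_to_policy` and the sample data are hypothetical, and it is assumed that each entry of roots_distributions holds one visit count per legal action. Because policy_index advances on every iteration, each appended target comes from a different root, even though state_index and current_index are not used.

```python
def visit_counts_to_policy(visit_counts, action_mask):
    """Hypothetical helper: turn visit counts over legal actions into a
    probability vector over the full action space (masked actions get 0)."""
    legal = [i for i, m in enumerate(action_mask) if m == 1]
    if len(legal) != len(visit_counts):
        raise ValueError("expected one visit count per legal action")
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("visit counts must not all be zero")
    policy = [0.0] * len(action_mask)
    for action, count in zip(legal, visit_counts):
        policy[action] = count / total
    return policy


# Made-up visit counts for two search roots, each with action 0 masked.
roots_distributions = [[10, 30, 10], [5, 5, 40]]
action_masks = [[0, 1, 1, 1], [0, 1, 1, 1]]

target_policies = []
policy_index = 0
for mask in action_masks:
    distributions = roots_distributions[policy_index]
    policy_index += 1  # move on to the next root's visit counts
    target_policies.append(visit_counts_to_policy(distributions, mask))

print(target_policies)
```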
Best wishes!
from lightzero.
Hi, I've run lunarlander_disc_gumbel_muzero_config.py under the default config, with
action_type = 'varied_action_space',
added at p.43 and
action_mask[0] = 0
added at p.139 in LightZero-fix-action-mask/zoo/box2d/lunarlander/envs/lunarlander_env.py, and completed_value: -inf still occurs.
I guess that you didn't add action_mask[0] = 0 in the env file, so all elements of action_mask are 1. If I don't put action_mask[0] = 0 into the env file, the error doesn't occur either, but this is not what we want, right? This enhancement was created precisely for the case where some actions are masked as 0.
Thanks!
from lightzero.
Hi, this problem occurs because masked actions were not handled properly during Gumbel MuZero collection. The error has now been fixed in #178. We welcome you to review the change and test whether it resolves the problem you were facing. Please let us know if you have any other questions or feedback. Thank you for reporting this issue!
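As a rough illustration of how a masked action can turn an aggregated value into -inf, consider the sketch below. It is not the change made in #178. It assumes, purely for illustration, that a masked action carries -inf as a placeholder value, and the function names are hypothetical. Averaging over all actions then propagates -inf, while restricting the average to legal actions stays finite.

```python
import math


def mean_value(values):
    """Naive mean over all actions, including masked ones."""
    return sum(values) / len(values)


def masked_mean_value(values, action_mask):
    """Mean over legal actions only (mask entry == 1)."""
    legal = [v for v, m in zip(values, action_mask) if m == 1]
    return sum(legal) / len(legal)


action_mask = [0, 1, 1, 1]
# Assumption: the masked action 0 holds -inf so that it is never selected.
values = [-math.inf, 0.5, 1.0, 1.5]

naive = mean_value(values)                     # -inf leaks into the result
safe = masked_mean_value(values, action_mask)  # finite: only legal actions count

print(naive, safe)
```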
from lightzero.