Comments (5)
Greetings, EfficientZero is primarily designed to enhance sample efficiency in environments with image-based inputs. As board games typically do not rely on image inputs, the performance gains from employing EfficientZero in such contexts might not be particularly significant, which is why we have not previously provided configurations for board games. However, if you are interested in exploring the performance of EfficientZero in board game settings, we have now provided a configuration example for TicTacToe in #204. Should you have any questions or wish to engage in further discussion, please feel free to reach out to us at any time.
from lightzero.
I see. If I manage to collect some data, I will share it.
In the meantime, I have been trying to understand the various configurations. I am not sure this is a bug, but it does look like one to me.
LightZero/lzero/policy/muzero.py, line 344 in 29c9afd
The phi transforms are applied regardless of whether MuZero is initialized with categorical rewards or not. This means that when MuZero is used with categorical rewards turned off, it fails at the reward. I reproduced this by passing categorical_distribution=False in the model dict of the config file for the bot version of TicTacToe.
Maybe I am missing something about how to use them. It is unclear to me why one should prefer categorical rewards when using a single float as a reward.
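To make the report concrete, here is a minimal sketch of the kind of guard being described: the scalar-to-categorical step runs only when categorical targets are enabled. The names `scalar_to_two_hot` and `prepare_reward_target` are hypothetical and are not LightZero's API. The value rescaling that MuZero-style implementations usually apply before this step is left out for brevity.

```python
import math


def scalar_to_two_hot(x: float, support_size: int) -> list[float]:
    """Spread a scalar over the two nearest integer bins of
    the support [-support_size, support_size] (hypothetical helper)."""
    # Clamp so the value always falls inside the support.
    x = max(-support_size, min(support_size, x))
    low = math.floor(x)
    weight_high = x - low
    probs = [0.0] * (2 * support_size + 1)
    probs[low + support_size] = 1.0 - weight_high
    if weight_high > 0.0:
        probs[low + support_size + 1] = weight_high
    return probs


def prepare_reward_target(reward: float, categorical_distribution: bool,
                          support_size: int = 3):
    """Return a categorical target only when the model predicts one.

    Assumption: with categorical_distribution=False the model outputs a
    single float, so the target must stay a float as well.
    """
    if categorical_distribution:
        return scalar_to_two_hot(reward, support_size)
    return reward
```

With this shape, `prepare_reward_target(1.0, False)` stays a plain float, while `prepare_reward_target(1.0, True)` yields a 7-bin distribution for the default `support_size=3`.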
from lightzero.
Furthermore, I tried changing this line of code in the TicTacToe env
to
reward = np.array(float(winner == -1)).astype(np.float32)
with the intention of seeing how long it would take MuZero to learn to always aim for a draw, in the vs bot version of the setup.
When I did so, it learned something, but after 153,000 steps and 90 minutes of training it had not managed to learn this perfectly. Is this intended? I understand that MuZero is a complex model, but this should not be particularly harder to learn than the always winning version.
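The modified reward can be read as the sketch below. It assumes that `winner == -1` marks a draw, which is what the stated intention implies. One thing that may be worth checking, as an assumption rather than a confirmed fact about the env: if the same value is also reported while the game is still running, the expression above would reward every non-terminal step. The `done` argument and the function name `draw_reward` are hypothetical.

```python
def draw_reward(done: bool, winner: int) -> float:
    """Return 1.0 only for a finished game that ended in a draw.

    Assumption: winner == -1 encodes a draw, and the same value may also
    be reported before the game ends, hence the explicit done check.
    """
    return float(done and winner == -1)
```

Under these assumptions only a finished, drawn game yields a reward of 1.0. Every other state yields 0.0.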
from lightzero.
> Maybe I am missing something about how to use them. It is unclear to me why one should prefer categorical rewards when using a single float as a reward.
You can find a detailed analysis in the following papers: "Improving Regression Performance with Distributional Losses" (ICML 2018), "Observe and Look Further: Achieving Consistent Performance on Atari" (2018), and "Stop Regressing: Training Value Functions via Classification for Scalable Deep RL" (2024). These studies indicate that the primary advantage of a categorical distribution is that gradients stay more stable when the targets are noisy or non-stationary. That stability is a key factor for performance and scalability, which is why LightZero has this option enabled by default.
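One simple way to see the gradient argument, offered as an illustrative sketch and not as the analysis from those papers: the cross-entropy gradient with respect to the logits is `softmax(logits) - target`, so every component lies in [-1, 1], whereas the squared-error gradient grows with the size of the error. The helper names below are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits: list[float], target_probs: list[float]) -> list[float]:
    """Gradient of cross-entropy w.r.t. the logits: softmax(logits) - target."""
    return [p - t for p, t in zip(softmax(logits), target_probs)]


def mse_grad(prediction: float, target: float) -> float:
    """Gradient of (prediction - target)**2 w.r.t. the prediction."""
    return 2.0 * (prediction - target)


# A categorical target that puts all mass on the largest of 7 bins.
logits = [0.0] * 7
target = [0.0] * 6 + [1.0]
ce_grad = cross_entropy_grad(logits, target)

# A scalar regression target that is far from the current prediction.
scalar_grad = mse_grad(0.0, 100.0)
```

Here every entry of `ce_grad` stays within [-1, 1] no matter how far the target bin is from the predicted mass, while `scalar_grad` scales linearly with the error.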
from lightzero.
> but this should not be particularly harder to learn than the always winning version.
Hello, could you please provide the configuration file for your agent as well as the complete TensorBoard log files? This would be beneficial for our in-depth analysis. Additionally, it is advisable to save some replay data from the training process, so that we can observe the learning behaviors and evolution of the agent.
from lightzero.