opendilab / lightzero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Home Page: https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal

License: Apache License 2.0

Python 88.00% C++ 8.80% Cython 1.75% Shell 0.20% Makefile 0.08% Dockerfile 0.09% CMake 0.12% Jupyter Notebook 0.97%

alpha-beta-pruning alphazero atari board-games continuous-control gomoku monte-carlo-tree-search muzero pytorch reinforcement-learning

lightzero's Introduction

LightZero

Updated on 2024.06.19 LightZero-v0.0.5

LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL). For any questions about LightZero, you can consult the RAG-based Q&A assistant: ZeroPal.

English | 简体中文(Simplified Chinese) | LightZero Paper | 🔥UniZero Paper | 🔥ReZero Paper

Background

The integration of Monte Carlo Tree Search and Deep Reinforcement Learning, exemplified by AlphaZero and MuZero, has achieved unprecedented performance levels in various games, including Go and Atari. This advanced methodology has also made significant strides in scientific domains like protein structure prediction and the search for matrix multiplication algorithms. The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:

Overview

LightZero is an open-source algorithm toolkit that combines MCTS and RL for PyTorch. It provides support for a range of MCTS-based RL algorithms and applications with the following advantages:

Lightweight.
Efficient.
Easy-to-understand.

For further details, please refer to Features, Framework Structure and Integrated Algorithms.

LightZero aims to promote the standardization of the MCTS+RL algorithm family to accelerate related research and applications. A performance comparison of all implemented algorithms under a unified framework is presented in the Benchmark.

Outline

Overview
Installation
Quick Start
Benchmark
Awesome-MCTS Notes
- Paper Notes
- Algo. Overview
Awesome-MCTS Papers
- Key Papers
- Other Papers
Feedback and Contribution
Citation
Acknowledgments
License

Features

Lightweight: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments that LightZero implements can be found here.

Efficient: LightZero uses mixed heterogeneous computing to improve computational efficiency for the most time-consuming parts of MCTS algorithms.

Easy-to-understand: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found here.

Framework Structure

The above picture is the framework pipeline of LightZero. We briefly introduce the three core modules below:

Model: Model is used to define the network structure, including the __init__ function for initializing the network structure and the forward function for computing the network's forward propagation.

Policy: Policy defines the way the network is updated and interacts with the environment, including three processes: the learning process, the collecting process, and the evaluation process.

MCTS: MCTS defines the structure of the Monte Carlo search tree and the way it interacts with the Policy. The implementation of MCTS includes two languages: Python and C++, implemented in ptree and ctree, respectively.
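
To make the role of the search tree more concrete, here is a minimal, library-independent sketch of a tree node with AlphaZero-style PUCT child selection. The class `Node`, its fields, the function `select_child`, and the constant `c_puct` are hypothetical names chosen for illustration; they are not taken from LightZero's ptree or ctree code, whose actual structure may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical search-tree node; not LightZero's actual class."""
    prior: float                      # policy prior P(s, a) from the network
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # sum of backed-up values
    children: Dict[int, "Node"] = field(default_factory=dict)

    def value(self) -> float:
        # Mean backed-up value Q(s, a); 0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.25) -> int:
    """Return the action maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    def score(child: Node) -> float:
        explore = c_puct * child.prior * math.sqrt(node.visit_count) / (1 + child.visit_count)
        return child.value() + explore
    return max(node.children, key=lambda a: score(node.children[a]))


root = Node(prior=1.0, visit_count=10)
root.children = {
    0: Node(prior=0.6, visit_count=8, value_sum=4.0),   # well explored, Q = 0.5
    1: Node(prior=0.4, visit_count=2, value_sum=1.6),   # less explored, Q = 0.8
}
best_action = select_child(root)
```

In this toy tree the less-visited child wins because both its mean value and its exploration bonus are larger; a Policy would call such a selection step repeatedly during each simulation.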

For the file structure of LightZero, please refer to lightzero_file_structure.

Integrated Algorithms

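LightZero is a library with PyTorch implementations of MCTS algorithms (sometimes combined with Cython and C++), including AlphaZero, MuZero, EfficientZero, Sampled EfficientZero, Gumbel MuZero, Stochastic MuZero, and UniZero.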

The environments and algorithms currently supported by LightZero are shown in the table below:

Env./Algo.	AlphaZero	MuZero	EfficientZero	Sampled EfficientZero	Gumbel MuZero	Stochastic MuZero	UniZero
TicTacToe	✔	✔	🔒	🔒	✔	🔒	✔
Gomoku	✔	✔	🔒	🔒	✔	🔒	✔
Connect4	✔	✔	🔒	🔒	🔒	🔒	✔
2048	---	✔	🔒	🔒	🔒	✔	✔
Chess	🔒	🔒	🔒	🔒	🔒	🔒	🔒
Go	🔒	🔒	🔒	🔒	🔒	🔒	🔒
CartPole	---	✔	✔	✔	✔	✔	✔
Pendulum	---	✔	✔	✔	✔	✔	🔒
LunarLander	---	✔	✔	✔	✔	✔	✔
BipedalWalker	---	✔	✔	✔	✔	🔒	🔒
Atari	---	✔	✔	✔	✔	✔	✔
MuJoCo	---	✔	✔	✔	🔒	🔒	🔒
MiniGrid	---	✔	✔	✔	🔒	🔒	✔
Bsuite	---	✔	✔	✔	🔒	🔒	✔
Memory	---	✔	✔	✔	🔒	🔒	✔

(1): "✔" means that the corresponding item is finished and well-tested.

(2): "🔒" means that the corresponding item is on the waiting list (Work In Progress).

(3): "---" means that this algorithm doesn't support this environment.

Installation

You can install the latest development version of LightZero from the GitHub source code with the following commands:

git clone https://github.com/opendilab/LightZero.git
cd LightZero
pip3 install -e .

Kindly note that LightZero currently supports compilation only on Linux and macOS platforms. We are actively working towards extending this support to the Windows platform. Your patience during this transition is greatly appreciated.

Installation with Docker

We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries. Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.

Download the Dockerfile: The Dockerfile is located in the root directory of the LightZero repository. Download this file to your local machine.
Prepare the build context: Create a new empty directory on your local machine, move the Dockerfile into this directory, and navigate into this directory. This step helps to avoid sending unnecessary files to the Docker daemon during the build process.
```
mkdir lightzero-docker
mv Dockerfile lightzero-docker/
cd lightzero-docker/
```
Build the Docker image: Use the following command to build the Docker image. This command should be run from inside the directory that contains the Dockerfile.
```
docker build -t ubuntu-py38-lz:latest -f ./Dockerfile .
```
Run a container from the image: Use the following command to start a container from the image in interactive mode with a Bash shell.
```
docker run -it --rm ubuntu-py38-lz:latest /bin/bash
```
Execute LightZero code inside the container: Once you're inside the container, you can run the example Python script with the following command:
```
python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
```

Quick Start

Train a MuZero agent to play CartPole:

cd LightZero
python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py

Train a MuZero agent to play Pong:

cd LightZero
python3 -u zoo/atari/config/atari_muzero_config.py

Train a MuZero agent to play TicTacToe:

cd LightZero
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py

Customization Documentation

For those interested in customizing environments and algorithms, we provide relevant guides:

Should you have any questions, feel free to contact us for support.

Benchmark

Below are the benchmark results of AlphaZero and MuZero on three board games: TicTacToe, Connect4, Gomoku.

Below are the benchmark results of MuZero, MuZero w/ SSL, EfficientZero and Sampled EfficientZero on three discrete action space games in Atari.

Below are the benchmark results of Sampled EfficientZero with Factored/Gaussian policy representation on three classic continuous action space games: Pendulum-v1, LunarLanderContinuous-v2, BipedalWalker-v3 and two MuJoCo continuous action space games: Hopper-v3, Walker2d-v3.

"Factored Policy" indicates that the agent learns a policy network that outputs a categorical distribution. After manual discretization, the dimensions of the action space for the five environments are 11, 49 (7^2), 256 (4^4), 64 (4^3), and 4096 (4^6), respectively. On the other hand, "Gaussian Policy" refers to the agent learning a policy network that directly outputs parameters (mu and sigma) for a Gaussian distribution.
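
The action-space sizes quoted above follow from discretizing each action dimension into K bins, which gives K^n joint actions for an n-dimensional action. The sketch below reproduces that arithmetic. The (bins, dimensions) pairs are read off the exponents in the text; the helper name `discretize_action_space` and the assumption that bins are evenly spaced in [-1, 1] are illustrative and not stated in the source.

```python
from itertools import product
from typing import List, Tuple


def discretize_action_space(bins_per_dim: int, action_dim: int,
                            low: float = -1.0, high: float = 1.0) -> List[Tuple[float, ...]]:
    """Enumerate all joint actions for an evenly spaced per-dimension grid (assumed layout)."""
    step = (high - low) / (bins_per_dim - 1)
    grid = [low + i * step for i in range(bins_per_dim)]
    return list(product(grid, repeat=action_dim))


# (bins per dimension, action dimensions), inferred from 11, 7^2, 4^4, 4^3, 4^6.
settings = {
    "Pendulum-v1": (11, 1),
    "LunarLanderContinuous-v2": (7, 2),
    "BipedalWalker-v3": (4, 4),
    "Hopper-v3": (4, 3),
    "Walker2d-v3": (4, 6),
}
sizes = {name: len(discretize_action_space(k, n)) for name, (k, n) in settings.items()}
```

The exponential growth in the number of joint actions is one reason a Gaussian policy, which outputs only mu and sigma per dimension, can be attractive for higher-dimensional action spaces.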

Below are the benchmark results of GumbelMuZero and MuZero (under different simulation costs) on four environments: PongNoFrameskip-v4, MsPacmanNoFrameskip-v4, Gomoku, and LunarLanderContinuous-v2.

Below are the benchmark results of StochasticMuZero and MuZero on the 2048 environment with varying levels of chance (num_chances=2 and 5).

Below are the benchmark results of various MCTS exploration mechanisms of MuZero w/ SSL in the MiniGrid environment.

Awesome-MCTS Notes

Paper Notes

The following are the detailed paper notes (in Chinese) of the above algorithms:

You can also refer to the relevant Zhihu column (in Chinese): In-depth Analysis of MCTS+RL Frontier Theories and Applications.

Algo. Overview

The following are the overview MCTS principle diagrams of the above algorithms:

Awesome-MCTS Papers

Here is a collection of research papers about Monte Carlo Tree Search. This section will be continuously updated to track the frontier of MCTS.

Key Papers

LightZero Implemented series

AlphaGo series

MuZero series

MCTS Analysis

MCTS Application

Other Papers

ICML

Scalable Safe Policy Improvement via Monte Carlo Tree Search 2023
- Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
- Key: safe policy improvement online using an MCTS-based strategy, Safe Policy Improvement with Baseline Bootstrapping
- ExpEnv: Gridworld and SysAdmin
Efficient Learning for AlphaZero via Path Consistency 2022
- Dengwei Zhao, Shikui Tu, Lei Xu
- Key: limited amount of self-plays, path consistency (PC) optimality
- ExpEnv: Go, Othello, Gomoku
Visualizing MuZero Models 2021
- Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat
- Key: visualizing the value equivalent dynamics model, action trajectories diverge, two regularization techniques
- ExpEnv: CartPole and MountainCar.
Convex Regularization in Monte-Carlo Tree Search 2021
- Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
- Key: entropy-regularization backup operators, regret analysis, Tsallis entropy
- ExpEnv: synthetic tree, Atari
Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains 2020
- Johannes Fischer, Ömer Sahin Tas
- Key: Continuous POMDP, Particle Filter Tree, information-based reward shaping, Information Gathering.
- ExpEnv: POMDPs.jl framework
- Code
Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search 2020
- Binghong Chen, Chengtao Li, Hanjun Dai, Le Song
- Key: chemical retrosynthetic planning, neural-based A*-like algorithm, ANDOR tree
- ExpEnv: USPTO datasets
- Code

ICLR

The Update Equivalence Framework for Decision-Time Planning 2024
- Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
- Key: imperfect-information games, search, decision-time planning, update equivalence
- ExpEnv: Hanabi, 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe
Efficient Multi-agent Reinforcement Learning by Planning 2024
- Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang
- Key: multi-agent reinforcement learning, planning, multi-agent MCTS
- ExpEnv: SMAC, LunarLander, MuJoCo, and Google Research Football
Become a Proficient Player with Limited Data through Watching Pure Videos 2023
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
- Key: pre-training from action-free videos, forward-inverse cycle consistency (FICC) objective based on vector quantization, pre-training phase, fine-tuning phase.
- ExpEnv: Atari
Policy-Based Self-Competition for Planning Problems 2023
- Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm
- Key: self-competition, find strong trajectories by planning against possible strategies of its past self.
- ExpEnv: Traveling Salesman Problem and the Job-Shop Scheduling Problem.
Explaining Temporal Graph Models through an Explorer-Navigator Framework 2023
- Wenwen Xia, Mincai Lai, Caihua Shan, Yao Zhang, Xinnan Dai, Xiang Li, Dongsheng Li
- Key: Temporal GNN Explainer, an explorer to find the event subsets with MCTS, a navigator that learns the correlations between events and helps reduce the search space.
- ExpEnv: Wikipedia and Reddit, Synthetic datasets
SpeedyZero: Mastering Atari with Limited Data and Time 2023
- Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
- Key: distributed RL system, Priority Refresh, Clipped LARS
- ExpEnv: Atari
Efficient Offline Policy Optimization with a Learned Model 2023
- Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
- Key: Regularized One-Step Model-based algorithm for Offline-RL
- ExpEnv: Atari, BSuite
- Code
Enabling Arbitrary Translation Objectives with Adaptive Tree Search 2022
- Wang Ling, Wojciech Stokowiec, Domenic Donato, Chris Dyer, Lei Yu, Laurent Sartran, Austin Matthews
- Key: adaptive tree search, translation models, autoregressive models
- ExpEnv: Chinese–English and Pashto–English tasks from WMT2020, German–English from WMT2014
What's Wrong with Deep Learning in Tree Search for Combinatorial Optimization 2022
- Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, Tobias Friedrich
- Key: combinatorial optimization, open-source benchmark suite for the NP-hard maximum independent set problem, an in-depth analysis of the popular guided tree search algorithm, compare the tree search implementations to other solvers
- ExpEnv: NP-hard MAXIMUM INDEPENDENT SET.
- Code
Monte-Carlo Planning and Learning with Language Action Value Estimates 2021
- Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim
- Key: Monte-Carlo tree search with language-driven exploration, locally optimistic language value estimates.
- ExpEnv: Interactive Fiction (IF) games
Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design 2021
- Xiufeng Yang, Tanuj Kr Aasawat, Kazuki Yoshizoe
- Key: massively parallel Monte-Carlo Tree Search, molecular design, Hash-driven parallel search
- ExpEnv: octanol-water partition coefficient (logP) penalized by the synthetic accessibility (SA) and large Ring Penalty score.
Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search 2020
- Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu
- Key: parallel Monte-Carlo Tree Search, partition the tree into sub-trees efficiently, compare the observation ratio of each processor.
- ExpEnv: speedup and performance comparison on JOY-CITY game, average episode return on atari game
- Code
Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees 2020
- Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
- Key: meta path planning algorithm, exploits a novel neural architecture which can learn promising search directions from problem structures.
- ExpEnv: a 2d workspace with a 2 DoF (degrees of freedom) point robot, a 3 DoF stick robot and a 5 DoF snake robot

NeurIPS

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios 2023
- Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
- Key: the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
- ExpEnv: ClassicControl, Box2D, Atari, MuJoCo, GoBigger, MiniGrid, TicTacToe, ConnectFour, Gomoku, 2048, etc.
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning 2023
- Zirui Zhao, Wee Sun Lee, David Hsu
- Key: world model (LLM) and the LLM-induced policy can be combined in MCTS, to scale up task planning.
- ExpEnv: multiplication, travel planning, object rearrangement
Monte Carlo Tree Search with Boltzmann Exploration 2023
- Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
- Key: Boltzmann exploration with MCTS, optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective, two improved algorithms.
- ExpEnv: the Frozen Lake environment, the Sailing Problem, Go
Generalized Weighted Path Consistency for Mastering Atari Games 2023
- Dengwei Zhao, Shikui Tu, Lei Xu
- Key: Generalized Weighted Path Consistency, A weighting mechanism.
- ExpEnv: Atari
Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction 2023
- Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao
- Key: probability tree state abstraction, transitivity and aggregation error bound
- ExpEnv: Atari, CartPole, LunarLander, Gomoku
Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions 2022
- Weirui Ye, Pieter Abbeel, Yang Gao
- Key: trade off computation versus performance, virtual expansions, spend thinking time adaptively.
- ExpEnv: Atari, 9x9 Go
Planning for Sample Efficient Imitation Learning 2022
- Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
- Key: Behavioral Cloning, Adversarial Imitation Learning (AIL), MCTS-based RL.
- ExpEnv: DeepMind Control Suite
- Code
Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex 2022
- Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman
- Key: AlphaZero’s internal representations, model probing and behavioral tests, how these concepts are captured in the network.
- ExpEnv: Hex
Are AlphaZero-like Agents Robust to Adversarial Perturbations? 2022
- Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh
- Key: adversarial states, first adversarial attack on Go AIs.
- ExpEnv: Go
Monte Carlo Tree Descent for Black-Box Optimization 2022
- Yaoguang Zhai, Sicun Gao
- Key: Black-Box Optimization, how to further integrate sample-based descent for faster optimization.
- ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization 2022
- Lei Song, Ke Xue, Xiaobin Huang, Chao Qian
- Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
- ExpEnv: NAS-bench problems and MuJoCo locomotion
Monte Carlo Tree Search With Iteratively Refining State Abstractions 2021
- Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
- Key: stochastic environments, Progressive widening, abstraction refining
- ExpEnv: Blackjack, Trap, five by five Go.
Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess 2021
- Gregory Clark
- Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
- ExpEnv: reconnaissance blind chess
POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis 2020
- Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar
- Key: continuous state-action spaces, Hierarchical Optimistic Optimization.
- ExpEnv: CartPole, Inverted Pendulum, Swing-up, and LunarLander.
Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search 2020
- Linnan Wang, Rodrigo Fonseca, Yuandong Tian
- Key: learns the partition of the search space using a few samples, a nonlinear decision boundary and learns a local model to pick good candidates.
- ExpEnv: MuJoCo locomotion tasks, Small-scale Benchmarks
Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions 2020
- Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
- Key: covariate shift problem, Mix&Match combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions)
- Code

Other Conference or Journal

Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search AAAI 2021.
On Monte Carlo Tree Search and Reinforcement Learning Journal of Artificial Intelligence Research 2017.
Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.

Feedback and Contribution

File an issue on GitHub
Open or participate in our discussion forum
Discuss on the LightZero Discord server
Contact our email ([email protected])
We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.

Citation

@article{niu2024lightzero,
  title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
  author={Niu, Yazhe and Pu, Yuan and Yang, Zhenjie and Li, Xueyan and Zhou, Tong and Ren, Jiyuan and Hu, Shuai and Li, Hongsheng and Liu, Yu},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

@article{pu2024unizero,
  title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
  author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
  journal={arXiv preprint arXiv:2406.10667},
  year={2024}
}

@article{xuan2024rezero,
  title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
  author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
  journal={arXiv preprint arXiv:2404.16364},
  year={2024}
}

Acknowledgments

This project has been developed partially based on the following pioneering works on GitHub repositories. We express our profound gratitude for these foundational resources:

We would like to extend our special thanks to the following contributors @PaParaZz1, @karroyan, @nighood, @jayyoung0802, @timothijoe, @TuTuHuss, @HarryXuancy, @puyuan1996, @HansBug for their valuable contributions and support to this algorithm library.

Thanks to all who contributed to this project:

License

All code within this repository is under Apache License 2.0.

lightzero's People

paparazz1 timothijoe lliai cra2ydavid yang0110 hbyido karroyan iphyer logms mistobaan yibit puyuan1996 xianglunkai jackory contropist jayyoung0802 emrul runchglaceon nighood dan-jacobson smallboom liamdgray newablesys standardgalactic keyman9848 aalonso99 harryxuancy mlshenkai lcmaier conerwei fyq0919 redmie leejwuniverse liuxing9848 gohsyi neilsa012 lujingjun hudhfo eltociear prajjwalyd suravshresth kaiwen-hong 0armaan025 kushal34712 pentesterpriyanshu rs-labhub mohitd404 7gao lewis841214 natithan zjowowen davidkaczer ekiefl artemkolmykov tothemoon96 bluevelvetsackofgoldpotatoes nkepling ard-skelling mofamanz valkryhx hyliu1994 lunathanael kunni918 josephdenman albinjal ocw-university selfsim smanolloff r33drichards rdancer calcu-dev da-ev ishaanharry zaozzz1 chevolier depresivna-ryza zbeucler2018 ppuyuj orrkrup jakemanger h-yanagawa avi9700 shengsuosi tokarev-i-v cosmichazel

lightzero's Issues

Errors on new install

After following the steps here by installing the DI dependency and running, on a new install, python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py I now get:

/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/core.py:268: DeprecationWarning: WARN: Function `env.seed(seed)` is marked as deprecated and will be removed in the future. Please use `env.reset(seed=seed)` instead.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/core.py:268: DeprecationWarning: WARN: Function `env.seed(seed)` is marked as deprecated and will be removed in the future. Please use `env.reset(seed=seed)` instead.
  deprecation(
Traceback (most recent call last):
  File "/home/user/py/LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py", line 93, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/user/py/LightZero/lzero/entry/train_muzero.py", line 158, in train_muzero
    new_data = collector.collect(train_iter=learner.train_iter, policy_kwargs=collect_kwargs)
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 383, in collect
    policy_output = self._policy.forward(stack_obs, action_mask, temperature, to_play, epsilon)
  File "/home/user/py/LightZero/lzero/policy/muzero.py", line 520, in _forward_collect
    network_output = self._collect_model.initial_inference(data)
  File "/home/user/py/LightZero/lzero/model/muzero_model_mlp.py", line 170, in initial_inference
    latent_state = self._representation(obs)
  File "/home/user/py/LightZero/lzero/model/muzero_model_mlp.py", line 218, in _representation
    latent_state = self.representation_network(observation)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/py/LightZero/lzero/model/common.py", line 280, in forward
    return self.fc_representation(x)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception ignored in: <function MuZeroCollector.__del__ at 0x7f59b7722310>
Traceback (most recent call last):
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 181, in __del__
    self.close()
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 171, in close
    self._env.close()
  File "/home/user/py/DI-engine/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
    p.send(['close', None, None])
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f59b7722af0>
Traceback (most recent call last):
  File "/home/user/py/LightZero/lzero/worker/muzero_evaluator.py", line 170, in __del__
    self.close()
  File "/home/user/py/LightZero/lzero/worker/muzero_evaluator.py", line 160, in close
    self._env.close()
  File "/home/user/py/DI-engine/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
    p.send(['close', None, None])
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

__emplace_back still causes compilation issues on m1

Hi y'all,

I'm running clang 14.0.3 on an M1 Pro, and installation fails because of a compilation error in:
lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp
specifically line 384:
disc_action_with_probs.__emplace_back(std::make_pair(iter, disturbed_probs[iter]));

When I remove the double underscore from emplace_back, everything compiles and works fine. Working line:

disc_action_with_probs.emplace_back(std::make_pair(iter, disturbed_probs[iter]));

Details:
2021 M1 Pro
MacOS 13.4.1
Apple clang version 14.0.3 (clang-1403.0.22.14.1)

No module named 'lzero.worker.gumbel_muzero_collector'

When trying to run gumbel_muzero:
python3 ./zoo/board_games/tictactoe/config/tictactoe_gumbel_muzero_bot_mode_config.py
on the main branch, this error pops up.
It seems that there is no 'gumbel_muzero_collector' under "lzero/worker/".

Thanks!

How to draw train curve

Hello!
I am glad to see such an awesome open source project.
I would like to plot a training curve like the following graph, but I have no idea how to do this. Can you please tell me how?
I would be very grateful.

A few questions about starting out and the Chess environment?

I'm very interested in using LightZero's chess environment to train some RL chess agents--however, I'm running into a few issues just setting it up. I've installed the module correctly (I think), and have the following initial code:

from LightZero.zoo.board_games.chess.envs.chess_env import ChessEnv
from LightZero.lzero.envs.wrappers.lightzero_env_wrapper import LightZeroEnvWrapper
from telnetlib import Telnet # This is just to silence a warning thrown when I don't have this line in here

env = ChessEnv()
lzero_env = LightZeroEnvWrapper(env, {'is_train': True})

This code throws an AttributeError, specifically:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 2
      1 env = ChessEnv()
----> 2 lzero_env = LightZeroEnvWrapper(env, {'is_train': True})

File [e:\000_work\ChessAI\LightZero\lzero\envs\wrappers\lightzero_env_wrapper.py:31](file:///E:/000_work/ChessAI/LightZero/lzero/envs/wrappers/lightzero_env_wrapper.py:31), in LightZeroEnvWrapper.__init__(self, env, cfg)
     29 super().__init__(env)
     30 assert 'is_train' in cfg, '`is_train` flag must set in the config of env'
---> 31 self.is_train = cfg.is_train
     32 self.cfg = cfg
     33 self.env_name = cfg.env_name

AttributeError: 'dict' object has no attribute 'is_train'

I set cfg to a dictionary because initially when I ran it with no positional argument (or None) for the cfg, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 2
      1 env = ChessEnv()
----> 2 lzero_env = LightZeroEnvWrapper(env, None)

File [e:\000_work\ChessAI\LightZero\lzero\envs\wrappers\lightzero_env_wrapper.py:30](file:///E:/000_work/ChessAI/LightZero/lzero/envs/wrappers/lightzero_env_wrapper.py:30), in LightZeroEnvWrapper.__init__(self, env, cfg)
     22 """
     23 Overview:
     24     Initialize ``self.`` See ``help(type(self))`` for accurate signature;  \
   (...)
     27     - env (:obj:`gym.Env`): the environment to wrap.
     28 """
     29 super().__init__(env)
---> 30 assert 'is_train' in cfg, '`is_train` flag must set in the config of env'
     31 self.is_train = cfg.is_train
     32 self.cfg = cfg

TypeError: argument of type 'NoneType' is not iterable

What is cfg supposed to be: a class, an iterable, or somehow both? I can't find much in the source code. In addition, parts of the chess environment itself seem to be WIP (i.e., the chess_alphazero_sp-mode_config.py file in the chess config folder). Should I be cognizant of that when I'm writing my MCTS RL agents? Thanks in advance.
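The two tracebacks above suggest that cfg has to support both a membership test (`'is_train' in cfg`) and attribute access (`cfg.is_train`, `cfg.env_name`). An attribute-style dictionary such as EasyDict, which the repository's config files import, behaves that way. The following is a minimal sketch of that behaviour only: `AttrDict` is a hypothetical stdlib-only stand-in for EasyDict, and the `env_name` value is an assumption, not taken from the LightZero source.

```python
class AttrDict(dict):
    """Hypothetical stand-in for EasyDict: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            # Mirror normal attribute semantics for missing keys.
            raise AttributeError(name) from exc


# 'chess' is an assumed value; the wrapper only shows that `env_name` is read.
cfg = AttrDict(is_train=True, env_name='chess')

has_flag = 'is_train' in cfg   # the check performed by the assert in the wrapper
flag = cfg.is_train            # the attribute access that failed on a plain dict
```

With the real library, `EasyDict({'is_train': True, 'env_name': ...})` would play the role of `AttrDict` here; which other keys the wrapper reads is not visible in the traceback.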

Leaking illegal actions in SampledEfficientZero

When using the SampledEfficientZero implementation, illegal actions leak into node expansion at the roots if the number of legal actions is less than the number of sampled actions.

In this line:
sampled_actions = torch.multinomial(prob, self.num_of_sampled_actions, replacement=False)
If the number of non-zero elements in prob is less than self.num_of_sampled_actions, torch.multinomial chooses the actions 1, 2, 3, ... until the count equals self.num_of_sampled_actions.

This can then lead to the model choosing illegal actions, failed assertions, and other problems. It is particularly problematic when using SampledEfficientZero on tasks with a finite set of actions that can shrink, such as board games.
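One possible guard is to cap the number of samples at the number of actions with non-zero probability. The sketch below shows the idea in plain Python so it runs without torch; `sample_legal_actions` is a hypothetical helper, not part of LightZero, and it handles a single probability vector rather than a batch.

```python
import random
from typing import List, Optional


def sample_legal_actions(prob: List[float], num_samples: int,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Sample distinct action indices, never returning a zero-probability action."""
    rng = rng or random.Random()
    # Only actions with non-zero probability are candidates.
    candidates = [i for i, p in enumerate(prob) if p > 0]
    weights = [prob[i] for i in candidates]
    # Cap the sample count at the number of legal actions.
    k = min(num_samples, len(candidates))
    chosen = []
    for _ in range(k):
        # Weighted draw, then remove the winner so it cannot be drawn again.
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(idx))
        weights.pop(idx)
    return chosen


prob = [0.0, 0.5, 0.0, 0.3, 0.2, 0.0]  # only actions 1, 3 and 4 are legal
actions = sample_legal_actions(prob, num_samples=5, rng=random.Random(0))
```

In the torch code, the analogous change would presumably be to pass `min(self.num_of_sampled_actions, int(torch.count_nonzero(prob)))` as the sample count for each row. Any downstream code that expects a fixed number of sampled actions would then need padding or masking, which this sketch does not address.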

Training does not use the GPU's full performance, only spikes

I have everything installed correctly, but I don't understand why training does not use the full power of the GPU.

I get some warnings, but I don't think they are the cause.

The installation is done correctly, and the NVIDIA GPU is already used for other AI training, where it works at full performance.

How can I solve this problem? Without full use of the GPU, training takes too much time!

Support for dictionaries as observations?

Does LightZero provide support for complex observation types, such as dictionaries of multiple tensors? If not, where should the change be made? Right now I'm thinking:

hope that all tensors in the dictionary are of the same type
join them and flatten the result
let LZero do its thing
unflatten the merged tensor inside a custom model implementation (override the model class)
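The flatten/unflatten steps above can be sketched as follows. This is an illustration under stated assumptions, not LightZero functionality: `flatten_obs` and `unflatten_obs` are hypothetical helpers, and plain lists of floats stand in for 1-D tensors. With real tensors one would also record each entry's shape and dtype.

```python
from typing import Dict, List, Tuple

Layout = List[Tuple[str, int]]  # (key, number of values) in a fixed order


def flatten_obs(obs: Dict[str, List[float]]) -> Tuple[List[float], Layout]:
    """Concatenate all entries in sorted-key order and remember their sizes."""
    layout = [(key, len(obs[key])) for key in sorted(obs)]
    flat = [value for key, _ in layout for value in obs[key]]
    return flat, layout


def unflatten_obs(flat: List[float], layout: Layout) -> Dict[str, List[float]]:
    """Split the flat vector back into named entries, e.g. inside a custom model."""
    obs, start = {}, 0
    for key, size in layout:
        obs[key] = flat[start:start + size]
        start += size
    return obs


obs = {'position': [0.1, 0.2], 'velocity': [1.0, 2.0, 3.0]}
flat, layout = flatten_obs(obs)
restored = unflatten_obs(flat, layout)
```

Sorting the keys keeps the layout stable across steps, so the layout only needs to be computed once and can be stored in the model.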

An easy way to visualize the agent?

Hello,

I'm looking for a way to actually display what the agent does once trained. I am a bit lost in the MuzeroEvaluator class, for instance. Could you provide guidance so I can try to implement some env.render there? I'll submit a pull request afterwards if successful.
Thanks a lot!

Default lunar lander settings result in RuntimeError during model evaluation

Description

When running lunarlander_cont_sampled_efficientzero_config.py without modification, and then lunarlander_eval.py, I get the following error:

/Users/evan/anaconda3/envs/pooltool_ml/lib/python3.8/site-packages/gym/core.py:329: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/Users/evan/anaconda3/envs/pooltool_ml/lib/python3.8/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
Traceback (most recent call last):
  File "/Users/evan/Software/pooltool_ml/LightZero/zoo/box2d/lunarlander/entry/lunarlander_eval.py", line 58, in <module>
    returns_mean, returns = eval_muzero(
  File "/Users/evan/Software/pooltool_ml/LightZero/lzero/entry/eval_muzero.py", line 61, in eval_muzero
    policy.learn_mode.load_state_dict(torch.load(model_path, map_location=cfg.policy.device))
  File "/Users/evan/Software/pooltool_ml/LightZero/lzero/policy/muzero.py", line 774, in _load_state_dict_learn
    self._learn_model.load_state_dict(state_dict['model'])
  File "/Users/evan/anaconda3/envs/pooltool_ml/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MuZeroModelMLP:
	Missing key(s) in state_dict: "prediction_network.fc_policy_head.0.weight", "prediction_network.fc_policy_head.0.bias", "prediction_network.fc_policy_head.1.weight", "prediction_network.fc_policy_head.1.bias", "prediction_network.fc_policy_head.1.running_mean", "prediction_network.fc_policy_head.1.running_var", "prediction_network.fc_policy_head.3.weight", "prediction_network.fc_policy_head.3.bias".
	Unexpected key(s) in state_dict: "dynamics_network.lstm.weight_ih_l0", "dynamics_network.lstm.weight_hh_l0", "dynamics_network.lstm.bias_ih_l0", "dynamics_network.lstm.bias_hh_l0", "prediction_network.fc_policy_head.main.0.weight", "prediction_network.fc_policy_head.main.0.bias", "prediction_network.fc_policy_head.main.2.weight", "prediction_network.fc_policy_head.main.2.bias", "prediction_network.fc_policy_head.mu.weight", "prediction_network.fc_policy_head.mu.bias", "prediction_network.fc_policy_head.log_sigma_layer.weight", "prediction_network.fc_policy_head.log_sigma_layer.bias".
	size mismatch for dynamics_network.fc_dynamics_1.0.weight: copying a param with shape torch.Size([256, 258]) from checkpoint, the shape in current model is torch.Size([256, 260]).

Steps to reproduce

Run the continuous Sampled EfficientZero config:

lz=<YOUR_LIGHTZERO_DIR>
cd $lz
python zoo/box2d/lunarlander/config/lunarlander_cont_sampled_efficientzero_config.py

Wait until the iteration 0 checkpoint is created, then CTRL+C.

Change directory into the output:

cd data_sez_ctree/lunarlander_cont_sampled_efficientzero_k20_ns50_upc200_rr0.0_seed0

Edit $lz/zoo/box2d/lunarlander/entry/lunarlander_eval.py as follows:

    model_path = './ckpt/iteration_0.pth.tar'
    #model_path = None

Then run the evaluation:

python $lz/zoo/box2d/lunarlander/entry/lunarlander_eval.py

You should receive the above error.
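The key names in the error hint at the cause: the checkpoint contains `lstm`, `mu` and `log_sigma_layer` entries, while the model being loaded is a `MuZeroModelMLP` with a plain `fc_policy_head`. That pattern would be consistent with the eval script building a different model than the one that was trained, though the report does not confirm this. A quick way to inspect such a mismatch is to diff the two key sets. The helper below is a hypothetical sketch that works on plain key lists; with torch one would pass `state_dict['model'].keys()` and `model.state_dict().keys()`.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(checkpoint_keys: Iterable[str],
                         model_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Report keys the model expects but lacks, and keys it does not know."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return {
        'missing': sorted(model - ckpt),     # expected by the model, absent from the checkpoint
        'unexpected': sorted(ckpt - model),  # present in the checkpoint, unknown to the model
    }


# A few key names copied from the error message above.
checkpoint_keys = [
    'dynamics_network.lstm.weight_ih_l0',
    'prediction_network.fc_policy_head.mu.weight',
]
model_keys = [
    'prediction_network.fc_policy_head.0.weight',
]
report = diff_state_dict_keys(checkpoint_keys, model_keys)
```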

I dream of an agent playing 2048 using this library~

May it come true?

CUDA_ERROR_NOT_INITIALIZED when trying to use a GPU accelerated environment

Hi, very impressive work. However, when I try to use a GPU-accelerated env, I get an error like CUDA_ERROR_NOT_INITIALIZED.

The env I use is accelerated by TensorFlow, and it works well on its own. After some troubleshooting, I noticed that your work uses multiprocessing to accelerate data collection from the env, so I believe the error stems from a conflict between multiprocessing and CUDA initialization. I wonder if you could provide some info to help me out.

Thanks a lot.
Here is my code:

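# Imports inferred from the names used below (the original post elided them)
import gym
import numpy as np
from ding.envs import BaseEnv, BaseEnvTimestep
from ding.utils import ENV_REGISTRY


@ENV_REGISTRY.register('myenv')
class Myenv(BaseEnv):
    def __init__(self, cfg: dict) -> None:
        # GPU-related work happens here, mainly initialization (elided)
        pass

    def reset(self):
        # Reset the env and compute obs (elided)
        lightzero_obs_dict = {'observation': obs, 'action_mask': self.action_mask, 'to_play': -1}
        return lightzero_obs_dict

    def step(self, action):
        # Step the env and compute lightzero_obs_dict, rew, done, info (elided)
        return BaseEnvTimestep(lightzero_obs_dict, rew, done, info)
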
    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
        self._seed = seed
        self._dynamic_seed = dynamic_seed
        np.random.seed(self._seed)
    
    def close(self) -> None:
        if self._init_flag:
            pass
        self._init_flag = False

    def __repr__(self) -> str:
        return "My Env"

    @property
    def observation_space(self) -> gym.spaces.Space:
        return self._observation_space

    @property
    def action_space(self) -> gym.spaces.Space:
        return self._action_space
    
    @property
    def reward_space(self) -> gym.spaces.Space:
        return self._reward_space
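If the cause is indeed that CUDA is initialized in the parent process before the env manager creates its worker processes (an assumption; the post does not confirm it), one common pattern is to keep the constructor cheap and defer the GPU setup to the first `reset()`, which runs inside the process that actually steps the env. The sketch below shows only that pattern: `LazyGpuEnv` and `_build_backend` are hypothetical names, and the dictionary returned by `_build_backend` is a placeholder for the real TensorFlow setup.

```python
class LazyGpuEnv:
    """Hypothetical env skeleton: expensive (e.g. GPU) setup is deferred to reset()."""

    def __init__(self, cfg: dict) -> None:
        self._cfg = cfg
        self._init_flag = False  # nothing heavy happens in the constructor
        self._backend = None

    def _build_backend(self) -> dict:
        # Placeholder for the real GPU/TensorFlow initialization.
        return {'device': self._cfg.get('device', 'cpu')}

    def reset(self) -> dict:
        if not self._init_flag:
            # Runs in whichever process calls reset(), i.e. the worker.
            self._backend = self._build_backend()
            self._init_flag = True
        return {'observation': None, 'action_mask': None, 'to_play': -1}

    def close(self) -> None:
        self._backend = None
        self._init_flag = False


env = LazyGpuEnv({'device': 'gpu:0'})
constructed_without_backend = env._backend is None
first_obs = env.reset()
```

Switching the env manager away from subprocesses would be another way to test the hypothesis, at the cost of parallel data collection.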

Unrelated Records During EfficientZero in TensorBoard

Hi, thank you for your great contribution.

I ran the following command:
python3 -u zoo/atari/config/atari_efficientzero_config.py

Afterwards, I checked TensorBoard using the command:
tensorboard --logdir data_ez_ctree

I noticed that there are many unrelated Roland records written. Could you please guide me on how to prevent those results from being recorded?

Bugs caused by some interface adjustments.

After following the README installation steps, I ran python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py and got the following error:

Traceback (most recent call last):
  File "/home/jiangyh/code/LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py", line 93, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/jiangyh/code/LightZero/lzero/entry/train_muzero.py", line 72, in train_muzero
    policy = create_policy(cfg.policy, model=model, enable_field=['learn', 'collect', 'eval'])
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/policy/base_policy.py", line 344, in create_policy
    return POLICY_REGISTRY.build(cfg.type, cfg=cfg, **kwargs)
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/utils/registry.py", line 96, in build
    raise e
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/utils/registry.py", line 82, in build
    return build_fn(*obj_args, **obj_kwargs)
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/policy/base_policy.py", line 90, in __init__
    model = self._create_model(cfg, model)
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/policy/base_policy.py", line 144, in _create_model
    return create_model(model_cfg)
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/model/common/utils.py", line 18, in create_model
    return MODEL_REGISTRY.build(cfg.pop("type"), **cfg)
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/utils/registry.py", line 96, in build
    raise e
  File "/home/jiangyh/miniconda3/envs/lightzero/lib/python3.9/site-packages/ding/utils/registry.py", line 82, in build
    return build_fn(*obj_args, **obj_kwargs)
  File "/home/jiangyh/code/LightZero/lzero/model/muzero_model_mlp.py", line 104, in __init__
    self.representation_network = RepresentationNetworkMLP(
  File "/home/jiangyh/code/LightZero/lzero/model/common.py", line 260, in __init__
    self.fc_representation = MLP(
TypeError: MLP() got an unexpected keyword argument 'output_norm'

I think this may be caused by an interface change in the code.

alphazero MCTS not working: cannot import mcts_alphazero

Hello,

I tried to run the following command:
python -u zoo/board_games/tictactoe/config/tictactoe_alphazero_bot_mode_config.py

but it fails with:
ModuleNotFoundError: No module named 'mcts_alphazero'

Did I miss something obvious that is needed to run tictactoe with alphazero?

It seems that the C implementation of MCTS for alphazero (ctree_alphazero) is different from the one for muzero (ctree_muzero). Is that expected, or is it because this is still under development? I can run tictactoe with muzero (tictactoe_muzero_bot_mode_config.py), and that appears to run normally.

Benefit of MCTS-based RL approaches over PPO, SAC, etc for dynamic robot

Hello, is there any known benefit for the MCTS approaches for dynamic robots? Or what is your take on the matter? Thank you in advance!

Inhomogeneous Shape of GameSegment.child_visit_segment

When I was training with "python -u zoo/board_games/tictactoe/config/tictactoe_muzero_sp_mode_config.py", I encountered an error in the GameSegment.game_segment_to_array() function. The reason for the error was that the self.child_visit_segment could not be converted to np.array due to the different lengths of the lists. What should I do to solve this problem?

How to solve reward dropping after reaching superhuman level

How can I solve the reward dropping after the agent reaches superhuman level? Alternatively, how can I save the model at this peak, before the reward starts dropping?
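For the second half of the question, one generic approach is to keep a copy of the model state whenever an evaluation return beats the best one seen so far. The sketch below is framework-agnostic and makes no claim about LightZero's own checkpointing options: `BestCheckpointKeeper` is a hypothetical helper, and the dictionaries stand in for a real model state (for example a policy's state dict, which one would additionally write to disk).

```python
import copy
from typing import Any, Optional


class BestCheckpointKeeper:
    """Keep a deep copy of the state that produced the highest evaluation return."""

    def __init__(self) -> None:
        self.best_return = float('-inf')
        self.best_state: Optional[Any] = None

    def update(self, eval_return: float, state: Any) -> bool:
        """Return True if `state` became the new best."""
        if eval_return > self.best_return:
            self.best_return = eval_return
            # Deep copy so later training updates do not overwrite the snapshot.
            self.best_state = copy.deepcopy(state)
            return True
        return False


keeper = BestCheckpointKeeper()
# Illustrative returns: performance peaks at the second evaluation, then drops.
for iteration, eval_return in enumerate([10.0, 250.0, 180.0, 90.0]):
    keeper.update(eval_return, {'iteration': iteration})
```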

Upgrade to gymnasium

As gym is no longer maintained, we should upgrade to its recommended fork, gymnasium.

Pip install failing on linux

Hi,

I'm getting the following error when following the installation instruction:

Collecting pygame==2.1.0 (from gym==0.25.1->gym[accept-rom-license]==0.25.1->LightZero==0.0.2)
  Using cached pygame-2.1.0.tar.gz (5.8 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]


      WARNING, No "Setup" File Exists, Running "buildconfig/config.py"
      Using UNIX configuration...

      /bin/sh: 1: sdl2-config: not found
      /bin/sh: 1: sdl2-config: not found
      /bin/sh: 1: sdl2-config: not found
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-jku32nqq/pygame_c061e2c456b84204bbc5fc7abc1e8f5c/setup.py", line 388, in <module>
          buildconfig.config.main(AUTO_CONFIG)
        File "/tmp/pip-install-jku32nqq/pygame_c061e2c456b84204bbc5fc7abc1e8f5c/buildconfig/config.py", line 234, in main
          deps = CFG.main(**kwds)
                 ^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-jku32nqq/pygame_c061e2c456b84204bbc5fc7abc1e8f5c/buildconfig/config_unix.py", line 188, in main
          DependencyProg('SDL', 'SDL_CONFIG', 'sdl2-config', '2.0', ['sdl']),
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-jku32nqq/pygame_c061e2c456b84204bbc5fc7abc1e8f5c/buildconfig/config_unix.py", line 39, in __init__
          self.ver = config[0].strip()
                     ~~~~~~^^^
      IndexError: list index out of range

      Hunting dependencies...

      ---
      For help with compilation see:
          https://www.pygame.org/wiki/Compilation
      To contribute to pygame development see:
          https://www.pygame.org/contribute.html
      ---

      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details

My linux details are:

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Any idea how to solve this?

As an alternative, I also tried just downloading the source code and running it, but that gives me a problem importing ez_tree in lzero/mcts/tree_search/mcts_ctree.py. Trying to fix that by adding import pyximport; pyximport.install(), following a linked suggestion, then leads to a compile error.

Thanks in advance!
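The repeated `sdl2-config: not found` lines in the log suggest that pygame is being built from source and cannot find the SDL2 development files. On Ubuntu these are normally provided by the `libsdl2-dev` package; that is general knowledge rather than something stated in this report. A small pre-flight check, sketched here with the hypothetical helper `has_build_tool`, can confirm whether the tool is visible before retrying the install:

```python
import shutil


def has_build_tool(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


sdl2_available = has_build_tool('sdl2-config')
if not sdl2_available:
    # pygame's source build shells out to sdl2-config, as the log above shows.
    print('sdl2-config not found on PATH; install the SDL2 development package first.')
```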

Training process gets killed due to OOM

Summary of issue

The training process gets killed by the kernel. There is a log in dmesg stating that the reason is "out of memory".

Model: MuZero with self-supervision
Environment: Pong
Architecture is exactly the same as the default one for Atari envs except that:

I am using RGB instead of grayscale (so input to the model is (B, 12, 96, 96) with 4 stacked frames)
I am using a few additional layers in the representation network

The process gets killed after 40k iteration steps (a bit more than 500k environment steps). The Buffer/memory_usage/process log shows that the total memory used starts from 0 and increases a bit faster than linearly to 6e+4, after which the process is killed.

NOTE: I have been able to reproduce the "Quick Start" training run on Pong with the default config. No issue there.

General questions:

Why does the memory used by the process seem to always increase? Is it the replay buffer?
Is there a way to control the memory used from any of the config settings, so that the process does not get killed?
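A back-of-the-envelope estimate helps judge whether stored observations alone could explain the growth. The sketch below assumes that every environment frame is kept in memory exactly once and that nothing is evicted; both are assumptions about the buffer, not facts from the LightZero source, and `frames_memory_gb` is a hypothetical helper.

```python
def frames_memory_gb(num_frames: int, channels: int, height: int, width: int,
                     bytes_per_value: int) -> float:
    """Memory needed to hold `num_frames` frames, in gigabytes (1e9 bytes)."""
    return num_frames * channels * height * width * bytes_per_value / 1e9


env_steps = 500_000  # roughly where the process was killed

rgb_uint8 = frames_memory_gb(env_steps, 3, 96, 96, 1)     # RGB frames stored as uint8
rgb_float32 = frames_memory_gb(env_steps, 3, 96, 96, 4)   # RGB frames stored as float32
gray_uint8 = frames_memory_gb(env_steps, 1, 96, 96, 1)    # default grayscale, uint8
```

Under these assumptions, RGB frames need three times the memory of grayscale ones, and float32 storage needs four times the memory of uint8. If the logged 6e+4 is in MB (the unit is not stated in the report), the float32 RGB figure is of the same order of magnitude.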

[action_mask error]

For any game that sets the "action_mask" to something other than all ones, for example when creating the BaseEnv:

    if not self._continuous:
        action_mask = np.ones(self.discrete_action_num, 'int8')
    else:
        action_mask = None
    
    # Here I set the action 2 to be invalid:
    action_mask[2] = 0
    
    obs = {'observation': obs, 'action_mask': action_mask, 'to_play': -1}
    return BaseEnvTimestep(obs, rew, done, info)

Will result in the following error:

Traceback (most recent call last):
  File "./zoo/custom/pkgir/config/pjk_disc_gumbel_muzero_config.py", line 93, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/LightZero-main/lzero/entry/train_muzero.py", line 174, in train_muzero
    train_data = replay_buffer.sample(batch_size, policy)
  File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 76, in sample
    batch_target_policies_non_re = self._compute_target_policy_non_reanalyzed(
  File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 681, in _compute_target_policy_non_reanalyzed
    batch_target_policies_non_re = np.asarray(batch_target_policies_non_re)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (128, 6) + inhomogeneous part.
Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f8bebff93a0>

After reading the code in game_buffer_muzero around line 661, I found that in the branch

if self._cfg.env_type == 'not_board_games':

the legal_actions are not processed, whereas in the board game case they are.

So I guess action_mask is not supported in the not_board_games scenario?
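The ValueError is what one would expect if each target policy row had one entry per legal action rather than one per action, so that rows with different masks differ in length. A common remedy is to scatter the per-legal-action values into a fixed-length vector. The sketch below illustrates that idea only; `to_fixed_length_policy` is a hypothetical helper and not the code LightZero uses for board games.

```python
from typing import List


def to_fixed_length_policy(visit_counts: List[float], legal_actions: List[int],
                           action_space_size: int) -> List[float]:
    """Place normalized visit counts at their action indices; illegal actions get 0."""
    if len(visit_counts) != len(legal_actions):
        raise ValueError('need exactly one visit count per legal action')
    total = sum(visit_counts)
    policy = [0.0] * action_space_size
    for action, count in zip(legal_actions, visit_counts):
        policy[action] = count / total if total > 0 else 0.0
    return policy


# Two steps with different masks still produce rows of equal length.
rows = [
    to_fixed_length_policy([10, 30, 10], legal_actions=[0, 1, 3], action_space_size=4),
    to_fixed_length_policy([5, 5, 5, 5], legal_actions=[0, 1, 2, 3], action_space_size=4),
]
```

Because every row now has `action_space_size` entries, stacking the rows into a single array no longer hits the inhomogeneous-shape error.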

What is the easiest way to run experiments on colored Atari environments?

Hi,

For a while, I have been trying to run experiments on colored Atari environments with MuZero (that is, with grayscale=False). However, I am having a hard time changing the configs in zoo/atari to do this. What would be the easiest way?

In particular, I have set grayscale=False in the environment file, but I am wondering how to change the obs_shape and observation_shape parameters in the config file. I am also wondering whether there is anything else that I would need to change.

Best
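As a rough guide to the shape question: if frames are stacked along the channel axis, the channel count is the number of stacked frames times the channels per frame. Another report on this page gives (B, 12, 96, 96) for RGB input with 4 stacked frames, which matches that rule. The helper below is a hypothetical sketch of the arithmetic; whether both obs_shape and observation_shape must be set to this value, and whether other settings also depend on the channel count, is not confirmed here.

```python
from typing import Tuple


def stacked_obs_shape(frame_stack: int, height: int, width: int,
                      grayscale: bool) -> Tuple[int, int, int]:
    """Channel-first observation shape when frames are stacked along channels."""
    channels_per_frame = 1 if grayscale else 3
    return (frame_stack * channels_per_frame, height, width)


gray_shape = stacked_obs_shape(4, 96, 96, grayscale=True)
rgb_shape = stacked_obs_shape(4, 96, 96, grayscale=False)
```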

Can't install on fedora 38: package versions have conflicting dependencies

I couldn't install LightZero on Fedora 38 in an isolated environment (virtualenv).

Installation fails with the following error message:

INFO: pip is looking at multiple versions of lightzero to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install di-engine[common-env]==0.4.9 and gym[all]==0.25.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    di-engine[common-env] 0.4.9 depends on ale-py; extra == "common_env"
    gym[all] 0.25.1 depends on ale-py~=0.7.5; extra == "all"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

This can be reproduced consistently with this Dockerfile:

FROM fedora:latest
RUN dnf install -y https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm && dnf -y update && dnf -y install git python3-virtualenv python3-pip python3-devel ffmpeg-libs SDL2-devel SDL2_image-devel SDL2_mixer-devel SDL2_ttf-devel portmidi-devel libavdevice libavc1394-devel zlibrary-devel ccache mesa-libGL mesa-libGL-devel libjpeg-devel && dnf clean all
RUN uname -a
RUN git clone https://github.com/opendilab/LightZero
WORKDIR "/LightZero"
RUN python3 -m venv venv
RUN /LightZero/venv/bin/pip install -e .

Thanks for this amazing work! I plan to develop my own agents using LightZero as a building block!

Low resource utilization with default gomoku_alphazero_sp_mode_config.py

I followed the Installation and Quick Start parts in the README.

However, when I tried the default gomoku_alphazero_sp_mode_config.py, it ran with extremely low resource utilization.

I wonder if I have done something wrong. 🤔 Could you help explain this?

Real-time resource utilization

The machine has 24 logical CPU cores and 1x A100 GPU.

Launch command

python -u zoo/board_games/gomoku/config/gomoku_alphazero_sp_mode_config.py

Content of the config

from easydict import EasyDict

# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
board_size = 6  # default_size is 15
collector_env_num = 32
n_episode = 32
evaluator_env_num = 5
num_simulations = 100
update_per_collect = 50
batch_size = 256
max_env_step = int(1e6)
prob_random_action_in_bot = 0.5
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================
gomoku_alphazero_config = dict(
    exp_name=
    f'data_az_ptree/gomoku_alphazero_sp-mode_rand{prob_random_action_in_bot}_ns{num_simulations}_upc{update_per_collect}_seed0',
    env=dict(
        board_size=board_size,
        battle_mode='self_play_mode',
        bot_action_type='v0',
        prob_random_action_in_bot=prob_random_action_in_bot,
        channel_last=False,  # NOTE
        collector_env_num=collector_env_num,
        evaluator_env_num=evaluator_env_num,
        n_evaluator_episode=evaluator_env_num,
        manager=dict(shared_memory=False, ),
    ),
    policy=dict(
        model=dict(
            observation_shape=(3, board_size, board_size),
            action_space_size=int(1 * board_size * board_size),
            # representation_network_type='conv_res_blocks',  # options={'conv_res_blocks', 'identity'}
            num_res_blocks=1,
            num_channels=32,
        ),
        cuda=True,
        board_size=board_size,
        lr_piecewise_constant_decay=False,
        update_per_collect=update_per_collect,
        batch_size=batch_size,
        optim_type='AdamW',
        learning_rate=0.003,
        weight_decay=0.0001,
        grad_norm=0.5,
        value_weight=1.0,
        entropy_weight=0.0,
        n_episode=n_episode,
        eval_freq=int(2e3),
        num_simulations=num_simulations,
        collector_env_num=collector_env_num,
        evaluator_env_num=evaluator_env_num,
    ),
)

gomoku_alphazero_config = EasyDict(gomoku_alphazero_config)
main_config = gomoku_alphazero_config

gomoku_alphazero_create_config = dict(
    env=dict(
        type='gomoku',
        import_names=['zoo.board_games.gomoku.envs.gomoku_env'],
    ),
    env_manager=dict(type='subprocess'),
    policy=dict(
        type='alphazero',
        import_names=['lzero.policy.alphazero'],
    ),
    collector=dict(
        type='episode_alphazero',
        get_train_sample=False,
        import_names=['lzero.worker.alphazero_collector'],
    ),
    evaluator=dict(
        type='alphazero',
        import_names=['lzero.worker.alphazero_evaluator'],
    )
)
gomoku_alphazero_create_config = EasyDict(gomoku_alphazero_create_config)
create_config = gomoku_alphazero_create_config

if __name__ == '__main__':
    from lzero.entry import train_alphazero
    train_alphazero([main_config, create_config], seed=0, max_env_step=max_env_step)

problem in atari_eval.py

I want to render the agent using atari_eval.py, but I ran into the following problem:

[10-19 11:34:38] WARNING  If you want to use numba to speed up segment tree, please install numba first                                                                      default_helper.py:441
A.L.E: Arcade Learning Environment (version 0.7.5+db37282)
[Powered by Stella]
Traceback (most recent call last):
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/ding/envs/env_manager/base_env_manager.py", line 111, in __init__
    self._observation_space = self._env_ref.observation_space
  File "/remote-home/zzq/13-MCTS-learn/LightZero/zoo/atari/envs/atari_lightzero_env.py", line 154, in observation_space
    return self._observation_space
AttributeError: 'AtariLightZeroEnv' object has no attribute '_observation_space'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/remote-home/zzq/13-MCTS-learn/LightZero/zoo/atari/entry/atari_eval.py", line 31, in <module>
    returns_mean, returns = eval_muzero(
  File "/remote-home/zzq/13-MCTS-learn/LightZero/lzero/entry/eval_muzero.py", line 52, in eval_muzero
    evaluator_env = create_env_manager(cfg.env.manager, [partial(env_fn, cfg=c) for c in evaluator_env_cfg])
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/ding/envs/env_manager/base_env_manager.py", line 528, in create_env_manager
    return ENV_MANAGER_REGISTRY.build(manager_type, env_fn=env_fn, cfg=manager_cfg)
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/ding/utils/registry.py", line 96, in build
    raise e
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/ding/utils/registry.py", line 82, in build
    return build_fn(*obj_args, **obj_kwargs)
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/ding/envs/env_manager/base_env_manager.py", line 120, in __init__
    self._env_ref.reset()
  File "/remote-home/zzq/13-MCTS-learn/LightZero/zoo/atari/envs/atari_lightzero_env.py", line 59, in reset
    self._env = self._make_env()
  File "/remote-home/zzq/13-MCTS-learn/LightZero/zoo/atari/envs/atari_lightzero_env.py", line 55, in _make_env
    return wrap_lightzero(self.cfg, episode_life=self.cfg.episode_life, clip_rewards=self.cfg.clip_rewards)
  File "/remote-home/zzq/13-MCTS-learn/LightZero/zoo/atari/envs/atari_wrappers.py", line 90, in wrap_lightzero
    env = gym.make(config.env_name, render_mode='human')
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/gym/envs/registration.py", line 662, in make
    env = env_creator(**_kwargs)
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/gym/envs/atari/environment.py", line 136, in __init__
    self.seed()
  File "/remote-home/zzq/anaconda3/envs/MCTS/lib/python3.9/site-packages/gym/envs/atari/environment.py", line 196, in seed
    self.ale.loadROM(getattr(roms, self._game))
RuntimeError: Failed to initialize SDL

I need help here.

requirement torch<=1.12.1,>=1.1.0 (from di-engine[common-env]) (from versions: 2.0.0, 2.0.1)

INFO: pip is looking at multiple versions of di-engine[common-env] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch<=1.12.1,>=1.1.0 (from di-engine[common-env]) (from versions: 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch<=1.12.1,>=1.1.0

cannot import name 'ez_tree' from 'lzero.mcts.ctree.ctree_efficientzero' (D:\LightZero-main\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\__init__.py)

Hello!
When I run cartpole_muzero_config.py, something goes wrong. o.O
Do you know how to fix this?

cannot import name 'ez_tree' from 'lzero.mcts.ctree.ctree_efficientzero' (D:\LightZero-main\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\__init__.py)

How to separate training environments and evaluation environments

How can we separate the training environment from the evaluation environment? The goal is to keep the training data and the evaluation data separate during training.
Is there a way to pass two envs at load time, as train_env and eval_env?
Or is there something, like a variable or a method, that lets a custom gym env check whether it was loaded for evaluation or for training?

Thank you for helping.
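One way to approach this is to give each env instance its own copy of the config carrying a flag that says whether it is a training env, and let the env pick its data from that flag. The `is_train` key mirrors the flag that LightZeroEnvWrapper asserts on in another report on this page; everything else in the sketch (`make_env_cfgs`, `SplitDataEnv`, the `train_data` and `eval_data` keys) is hypothetical. As far as I know, DI-engine's BaseEnv exposes `create_collector_env_cfg` and `create_evaluator_env_cfg` hooks where such per-env configs could be produced, but that should be checked against the installed version.

```python
import copy
from typing import Any, Dict, List, Tuple


def make_env_cfgs(base_cfg: Dict[str, Any], collector_env_num: int,
                  evaluator_env_num: int) -> Tuple[List[dict], List[dict]]:
    """Build one config per env, tagged as training or evaluation."""
    def tagged(is_train: bool) -> Dict[str, Any]:
        cfg = copy.deepcopy(base_cfg)  # each env gets an independent copy
        cfg['is_train'] = is_train
        return cfg

    train_cfgs = [tagged(True) for _ in range(collector_env_num)]
    eval_cfgs = [tagged(False) for _ in range(evaluator_env_num)]
    return train_cfgs, eval_cfgs


class SplitDataEnv:
    """Hypothetical custom env that selects its data split from the flag."""

    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.is_train = cfg['is_train']
        self.data = cfg['train_data'] if self.is_train else cfg['eval_data']


base_cfg = {'train_data': [1, 2, 3], 'eval_data': [4, 5]}
train_cfgs, eval_cfgs = make_env_cfgs(base_cfg, collector_env_num=2, evaluator_env_num=1)
train_env = SplitDataEnv(train_cfgs[0])
eval_env = SplitDataEnv(eval_cfgs[0])
```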

Sampled MuZero and Sampled EfficientZero

Hi, thanks for the awesome implementations.

The algorithm names "Sampled MuZero" and "Sampled EfficientZero" seem to be used interchangeably, and I wonder whether both refer to the same algorithm. If so, do they refer to the continuous space version of MuZero (proposed in https://arxiv.org/pdf/2104.06303.pdf) or to the continuous space version of EfficientZero? If not, I cannot find the Sampled MuZero algorithm in the lzero folder; where can I find it, given that your paper mentions it is implemented? Thanks a lot.

Can't get mujoco example working on colab

Hi there,

I've been trying to get this mujoco example working on Google Colab, but unfortunately I get the following error:

You appear to be missing MuJoCo.  We expected to find the file here: /root/.mujoco/mujoco210

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo

    https://github.com/openai/mujoco-py#install-mujoco

Which can be downloaded from the website

    https://www.roboti.us/index.html

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

[/usr/local/lib/python3.10/dist-packages/ding/envs/env_manager/base_env_manager.py](https://localhost:8080/#) in __init__(self, env_fn, cfg)
    110         try:
--> 111             self._observation_space = self._env_ref.observation_space
    112             self._action_space = self._env_ref.action_space

16 frames

AttributeError: 'MujocoEnvLZ' object has no attribute '_observation_space'


During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)

[/usr/local/lib/python3.10/dist-packages/mujoco_py/utils.py](https://localhost:8080/#) in discover_mujoco()
     76         message = MISSING_MUJOCO_MESSAGE.format(mujoco_path)
     77         print(message, file=sys.stderr)
---> 78         raise Exception(message)
     79 
     80     return mujoco_path

Exception: 
You appear to be missing MuJoCo.  We expected to find the file here: /root/.mujoco/mujoco210

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo

    https://github.com/openai/mujoco-py#install-mujoco

Which can be downloaded from the website

    https://www.roboti.us/index.html

I think mujoco-py (which I believe is deprecated) is being installed as a result of gym - just wondering if it makes sense to move to https://github.com/Farama-Foundation/Gymnasium?

Many thanks for any help (sorry if I've mistaken the cause of the error), and many many thanks for an amazing lib! :)

Tensors on different devices when using GPU (SampledEfficientZeroPolicy)

Description

I've noticed that when using a GPU, I receive the following error when using the sampled efficient zero algorithm:

Traceback (most recent call last):
  File "zoo/box2d/lunarlander/config/lunarlander_cont_sampled_efficientzero_config.py", line 102, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/entry/train_muzero.py", line 185, in train_muzero
    log_vars = learner.train(train_data, collector.envstep)
  File "/home/ubuntu/pooltool-ml/miniconda/envs/pooltool_ml/lib/python3.8/site-packages/ding/worker/learner/base_learner.py", line 165, in wrapper
    ret = fn(*args, **kwargs)
  File "/home/ubuntu/pooltool-ml/miniconda/envs/pooltool_ml/lib/python3.8/site-packages/ding/worker/learner/base_learner.py", line 205, in train
    log_vars = self._policy.forward(data, **policy_kwargs)
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/policy/sampled_efficientzero.py", line 395, in _forward_learn
    policy_loss, policy_entropy, policy_entropy_loss, target_policy_entropy, target_sampled_actions, mu, sigma = self._calculate_policy_loss_cont(
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/policy/sampled_efficientzero.py", line 651, in _calculate_policy_loss_cont
    target_sampled_actions_clamped = torch.clamp(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument min in method wrapper_CUDA_clamp_Tensor)

The error does not occur when using a GPU-less environment.

I have seen this in three different environments: bipedal walker, lunar lander, and my own custom environment.
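The error message says the `min` argument of `torch.clamp` is a tensor on a different device from the input. The sketch below illustrates one common way to avoid that class of error, assuming the bounds were created as CPU tensors. It is not the code from `sampled_efficientzero.py`; the helper name `clamp_on_input_device` and the bounds of -1 and 1 are made up for illustration.

```python
import torch


def clamp_on_input_device(x: torch.Tensor, low, high) -> torch.Tensor:
    """Clamp x to [low, high] after moving the bounds to x's device.

    Hypothetical helper: low and high may be Python numbers or tensors
    living on any device.
    """
    low = torch.as_tensor(low, dtype=x.dtype, device=x.device)
    high = torch.as_tensor(high, dtype=x.dtype, device=x.device)
    return torch.clamp(x, min=low, max=high)


# Uses the GPU when one is available, otherwise falls back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actions = torch.tensor([-2.0, 0.5, 2.0], device=device)

# The bounds are deliberately created on the CPU, as in the failing case.
clamped = clamp_on_input_device(actions, torch.tensor(-1.0), torch.tensor(1.0))
print(clamped)
```

Passing plain Python floats as the bounds (for example `torch.clamp(x, -1.0, 1.0)`) sidesteps the device question entirely, which may be simpler when the bounds are constants.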

Steps to reproduce

Make sure you have access to a GPU:

python -c 'import torch ; print("\nIs available: ", torch.cuda.is_available()) ; print("Pytorch CUDA Compiled version: ", torch._C._cuda_getCompiledVersion()) ; print("Pytorch version: ", torch.__version__) ; print("pytorch file: ", torch.__file__) ; num_of_gpus = torch.cuda.device_count(); print("Number of GPUs: ",num_of_gpus)'

On my machine this prints:

Is available:  True
Pytorch CUDA Compiled version:  12010
Pytorch version:  2.1.1+cu121
pytorch file:  /home/ubuntu/pooltool-ml/miniconda/envs/pooltool_ml/lib/python3.8/site-packages/torch/__init__.py
Number of GPUs:  1

Then try either of these runs from the LightZero root:

python zoo/box2d/bipedalwalker/config/bipedalwalker_cont_sampled_efficientzero_config.py
python zoo/box2d/lunarlander/config/lunarlander_cont_sampled_efficientzero_config.py

Environment

$ uname -a
Linux 146-235-206-171 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov  2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ python --version
Python 3.8.10

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

$ nvidia-smi
Tue Dec 12 06:12:19 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     On  | 00000000:07:00.0 Off |                    0 |
|  0%   30C    P8              15W / 150W |      4MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

$ pip freeze
absl-py==2.0.0
ale-py==0.8.1
appdirs==1.4.4
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.1.0
AutoROM==0.4.2
AutoROM.accept-rom-license==0.6.1
backcall==0.2.0
backports.zoneinfo==0.2.1
bitmath==1.3.3.1
black==23.11.0
boto3==1.33.12
botocore==1.33.12
box2d-py==2.3.5
bsuite==0.3.5
cachetools==5.3.2
cattrs==23.2.3
certifi==2023.11.17
cffi==1.16.0
cfgv==3.4.0
chardet==4.0.0
charset-normalizer==3.3.2
click==7.1.2
cloudpickle==3.0.0
cmake==3.27.9
colored==1.4.4
comm==0.2.0
contourpy==1.1.1
cryptography==41.0.7
cycler==0.12.1
Cython==3.0.6
debugpy==1.8.0
decorator==5.1.1
deprecation==2.1.0
DI-engine==0.5.0
DI-toolkit==0.2.0
DI-treetensor==0.4.1
dill==0.3.7
distlib==0.3.7
dm-env==1.6
dm-tree==0.1.8
docker-pycreds==0.4.0
docutils==0.20.1
easydict==1.9
enum-tools==0.11.0
exceptiongroup==1.2.0
executing==2.0.1
Farama-Notifications==0.0.4
fasteners==0.19
filelock==3.13.1
Flask==1.1.4
fonttools==4.46.0
frozendict==2.3.10
fsspec==2023.12.2
gitdb==4.0.11
GitPython==3.1.40
glfw==2.6.3
google-auth==2.25.2
google-auth-oauthlib==1.0.0
graphviz==0.20.1
grpcio==1.60.0
gym==0.25.1
gym-notices==0.0.8
gymnasium==0.29.1
h5py==3.10.0
hbutils==0.9.3
hickle==5.0.2
huggingface-hub==0.19.4
identify==2.5.33
idna==3.6
imageio==2.33.1
importlib-metadata==7.0.0
importlib-resources==6.1.1
iniconfig==2.0.0
ipykernel==6.27.1
ipython==8.12.3
isort==5.13.1
itsdangerous==1.1.0
jaraco.classes==3.3.0
jedi==0.19.1
jeepney==0.8.0
Jinja2==2.11.3
jmespath==1.0.1
joblib==1.3.2
jupyter_client==8.6.0
jupyter_core==5.5.0
keyring==24.3.0
kiwisolver==1.4.5
lazy_loader==0.3
LightZero==0.0.2
llvmlite==0.41.1
lz4==4.3.2
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.0.1
matplotlib==3.7.4
matplotlib-inline==0.1.6
mdurl==0.1.2
minigrid==2.3.1
mizani==0.9.3
more-itertools==10.1.0
mpire==2.8.1
mpmath==1.3.0
msgpack==1.0.7
msgpack-numpy==0.4.8
mujoco==2.2.0
mujoco-py==2.1.2.14
mypy-extensions==1.0.0
nest-asyncio==1.5.8
networkx==3.1
nh3==0.2.15
nodeenv==1.8.0
numba==0.58.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
opencv-python==4.8.1.78
packaging==23.2
Panda3D==1.10.13
panda3d-gltf==1.0.1
panda3d-simplepbr==0.11.2
pandas==2.0.3
parso==0.8.3
pathspec==0.12.1
patsy==0.5.4
pexpect==4.9.0
pickleshare==0.7.5
Pillow==10.1.0
pkginfo==1.9.6
platformdirs==4.1.0
plotnine==0.12.4
pluggy==1.3.0
pprofile==2.1.0
pre-commit==3.5.0
prompt-toolkit==3.0.41
protobuf==4.25.1
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pygame==2.5.2
Pygments==2.17.2
Pympler==1.0.1
pynng==0.7.2
PyOpenGL==3.1.7
pyparsing==3.1.1
pytest==7.0.1
python-dateutil==2.8.2
pytimeparse==1.1.8
pytz==2023.3.post1
PyWavelets==1.4.1
PyYAML==6.0.1
pyzmq==25.1.2
readme-renderer==42.0
redis==5.0.1
regex==2023.10.3
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==1.0.0
responses==0.12.1
rfc3986==2.0.0
rich==13.7.0
rsa==4.9
s3transfer==0.8.2
safetensors==0.4.1
scikit-image==0.21.0
scikit-learn==1.3.2
scipy==1.10.1
seaborn==0.13.0
SecretStorage==3.3.3
sentry-sdk==1.38.0
setproctitle==1.3.3
Shimmy==0.2.1
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
stack-data==0.6.3
statsmodels==0.14.0
sympy==1.12
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
termcolor==2.4.0
threadpoolctl==3.2.0
tifffile==2023.7.10
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.1
tornado==6.4
tqdm==4.66.1
traitlets==5.14.0
transformers==4.36.0
treevalue==1.4.12
triton==2.1.0
trueskill==0.4.5
twine==4.0.2
types-Pillow==10.1.0.2
types-PyYAML==6.0.12.12
typing_extensions==4.9.0
tzdata==2023.3
urllib3==1.26.18
URLObject==2.4.3
virtualenv==20.25.0
wandb==0.16.1
wcwidth==0.2.12
Werkzeug==1.0.1
yapf==0.29.0
yattag==1.15.2
zipp==3.17.0

$ conda list
# packages in environment at /home/ubuntu/pooltool-ml/miniconda/envs/pooltool_ml:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                       1_gnu
absl-py                   2.0.0                    pypi_0    pypi
ale-py                    0.8.1                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
asttokens                 2.4.1                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
autorom                   0.4.2                    pypi_0    pypi
autorom-accept-rom-license 0.6.1                    pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
backports-zoneinfo        0.2.1                    pypi_0    pypi
bitmath                   1.3.3.1                  pypi_0    pypi
black                     23.11.0                  pypi_0    pypi
boto3                     1.33.12                  pypi_0    pypi
botocore                  1.33.12                  pypi_0    pypi
box2d-py                  2.3.5                    pypi_0    pypi
bsuite                    0.3.5                    pypi_0    pypi
ca-certificates           2023.08.22           h06a4308_0
cachetools                5.3.2                    pypi_0    pypi
cattrs                    23.2.3                   pypi_0    pypi
certifi                   2023.11.17               pypi_0    pypi
cffi                      1.16.0                   pypi_0    pypi
cfgv                      3.4.0                    pypi_0    pypi
chardet                   4.0.0                    pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
cmake                     3.27.9                   pypi_0    pypi
colored                   1.4.4                    pypi_0    pypi
comm                      0.2.0                    pypi_0    pypi
contourpy                 1.1.1                    pypi_0    pypi
cryptography              41.0.7                   pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.6                    pypi_0    pypi
debugpy                   1.8.0                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
deprecation               2.1.0                    pypi_0    pypi
di-engine                 0.5.0                    pypi_0    pypi
di-toolkit                0.2.0                    pypi_0    pypi
di-treetensor             0.4.1                    pypi_0    pypi
dill                      0.3.7                    pypi_0    pypi
distlib                   0.3.7                    pypi_0    pypi
dm-env                    1.6                      pypi_0    pypi
dm-tree                   0.1.8                    pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
docutils                  0.20.1                   pypi_0    pypi
easydict                  1.9                      pypi_0    pypi
enum-tools                0.11.0                   pypi_0    pypi
exceptiongroup            1.2.0                    pypi_0    pypi
executing                 2.0.1                    pypi_0    pypi
farama-notifications      0.0.4                    pypi_0    pypi
fasteners                 0.19                     pypi_0    pypi
filelock                  3.13.1                   pypi_0    pypi
flask                     1.1.4                    pypi_0    pypi
fonttools                 4.46.0                   pypi_0    pypi
frozendict                2.3.10                   pypi_0    pypi
fsspec                    2023.12.2                pypi_0    pypi
gitdb                     4.0.11                   pypi_0    pypi
gitpython                 3.1.40                   pypi_0    pypi
glfw                      2.6.3                    pypi_0    pypi
google-auth               2.25.2                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
grpcio                    1.60.0                   pypi_0    pypi
gym                       0.25.1                   pypi_0    pypi
gym-notices               0.0.8                    pypi_0    pypi
gymnasium                 0.29.1                   pypi_0    pypi
h5py                      3.10.0                   pypi_0    pypi
hbutils                   0.9.3                    pypi_0    pypi
hickle                    5.0.2                    pypi_0    pypi
huggingface-hub           0.19.4                   pypi_0    pypi
identify                  2.5.33                   pypi_0    pypi
idna                      3.6                      pypi_0    pypi
imageio                   2.33.1                   pypi_0    pypi
importlib-metadata        7.0.0                    pypi_0    pypi
importlib-resources       6.1.1                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
ipykernel                 6.27.1                   pypi_0    pypi
ipython                   8.12.3                   pypi_0    pypi
isort                     5.13.1                   pypi_0    pypi
itsdangerous              1.1.0                    pypi_0    pypi
jaraco-classes            3.3.0                    pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
jeepney                   0.8.0                    pypi_0    pypi
jinja2                    2.11.3                   pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jupyter-client            8.6.0                    pypi_0    pypi
jupyter-core              5.5.0                    pypi_0    pypi
keyring                   24.3.0                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1
libffi                    3.3                  he6710b0_2
libgcc-ng                 11.2.0               h1234567_1
libgomp                   11.2.0               h1234567_1
libstdcxx-ng              11.2.0               h1234567_1
lightzero                 0.0.3                    pypi_0    pypi
llvmlite                  0.41.1                   pypi_0    pypi
lz4                       4.3.2                    pypi_0    pypi
markdown                  3.5.1                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.0.1                    pypi_0    pypi
matplotlib                3.7.4                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
minigrid                  2.3.1                    pypi_0    pypi
mizani                    0.9.3                    pypi_0    pypi
more-itertools            10.1.0                   pypi_0    pypi
mpire                     2.8.1                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
msgpack                   1.0.7                    pypi_0    pypi
msgpack-numpy             0.4.8                    pypi_0    pypi
mujoco                    2.2.0                    pypi_0    pypi
mujoco-py                 2.1.2.14                 pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0
nest-asyncio              1.5.8                    pypi_0    pypi
networkx                  3.1                      pypi_0    pypi
nh3                       0.2.15                   pypi_0    pypi
nodeenv                   1.8.0                    pypi_0    pypi
numba                     0.58.1                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.18.1                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.101                 pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencv-python             4.8.1.78                 pypi_0    pypi
openssl                   1.1.1w               h7f8727e_0
packaging                 23.2                     pypi_0    pypi
panda3d                   1.10.13                  pypi_0    pypi
panda3d-gltf              1.0.1                    pypi_0    pypi
panda3d-simplepbr         0.11.2                   pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
parso                     0.8.3                    pypi_0    pypi
pathspec                  0.12.1                   pypi_0    pypi
patsy                     0.5.4                    pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    10.1.0                   pypi_0    pypi
pip                       23.3.1           py38h06a4308_0
pkginfo                   1.9.6                    pypi_0    pypi
platformdirs              4.1.0                    pypi_0    pypi
plotnine                  0.12.4                   pypi_0    pypi
pluggy                    1.3.0                    pypi_0    pypi
pprofile                  2.1.0                    pypi_0    pypi
pre-commit                3.5.0                    pypi_0    pypi
prompt-toolkit            3.0.41                   pypi_0    pypi
protobuf                  4.25.1                   pypi_0    pypi
psutil                    5.9.6                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
py                        1.11.0                   pypi_0    pypi
pyasn1                    0.5.1                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pygame                    2.5.2                    pypi_0    pypi
pygments                  2.17.2                   pypi_0    pypi
pympler                   1.0.1                    pypi_0    pypi
pynng                     0.7.2                    pypi_0    pypi
pyopengl                  3.1.7                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pytest                    7.0.1                    pypi_0    pypi
python                    3.8.10               h12debd9_8
python-dateutil           2.8.2                    pypi_0    pypi
python-graphviz           0.20.1                   pypi_0    pypi
pytimeparse               1.1.8                    pypi_0    pypi
pytz                      2023.3.post1             pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.2                   pypi_0    pypi
readline                  8.2                  h5eee18b_0
readme-renderer           42.0                     pypi_0    pypi
redis                     5.0.1                    pypi_0    pypi
regex                     2023.10.3                pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
requests-toolbelt         1.0.0                    pypi_0    pypi
responses                 0.12.1                   pypi_0    pypi
rfc3986                   2.0.0                    pypi_0    pypi
rich                      13.7.0                   pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
s3transfer                0.8.2                    pypi_0    pypi
safetensors               0.4.1                    pypi_0    pypi
scikit-image              0.21.0                   pypi_0    pypi
scikit-learn              1.3.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
seaborn                   0.13.0                   pypi_0    pypi
secretstorage             3.3.3                    pypi_0    pypi
sentry-sdk                1.38.0                   pypi_0    pypi
setproctitle              1.3.3                    pypi_0    pypi
setuptools                66.1.1                   pypi_0    pypi
shimmy                    0.2.1                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
smmap                     5.0.1                    pypi_0    pypi
sniffio                   1.3.0                    pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0
stack-data                0.6.3                    pypi_0    pypi
statsmodels               0.14.0                   pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorboardx              2.6.2.2                  pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2023.7.10                pypi_0    pypi
tk                        8.6.12               h1ccaba5_0
tokenizers                0.15.0                   pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
torch                     2.1.1                    pypi_0    pypi
tornado                   6.4                      pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.14.0                   pypi_0    pypi
transformers              4.36.0                   pypi_0    pypi
treevalue                 1.4.12                   pypi_0    pypi
triton                    2.1.0                    pypi_0    pypi
trueskill                 0.4.5                    pypi_0    pypi
twine                     4.0.2                    pypi_0    pypi
types-pillow              10.1.0.2                 pypi_0    pypi
types-pyyaml              6.0.12.12                pypi_0    pypi
typing-extensions         4.9.0                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
urllib3                   1.26.18                  pypi_0    pypi
urlobject                 2.4.3                    pypi_0    pypi
virtualenv                20.25.0                  pypi_0    pypi
wandb                     0.16.1                   pypi_0    pypi
wcwidth                   0.2.12                   pypi_0    pypi
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.41.2           py38h06a4308_0
xz                        5.4.5                h5eee18b_0
yapf                      0.29.0                   pypi_0    pypi
yattag                    1.15.2                   pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0

Installation fails on MacBook M1 Pro

I am not able to install the package on my M1 Pro MacBook. The problematic library seems to be box2d-py. I already tried installing box2d via brew (brew install box2d) and followed these instructions, without success. Do you have any ideas?

Installation from pypi

❯ pip install LightZero
Collecting LightZero
  Using cached LightZero-0.0.1.tar.gz (194 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      Traceback (most recent call last):
        File "/Users/user/.pyenv/versions/3.10.8/envs/light_zero/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/user/.pyenv/versions/3.10.8/envs/light_zero/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/Users/user/.pyenv/versions/3.10.8/envs/light_zero/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/b5/vc_ln9s912z8_m6gyf6d98_r0000gn/T/pip-build-env-alil5aea/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 355, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/private/var/folders/b5/vc_ln9s912z8_m6gyf6d98_r0000gn/T/pip-build-env-alil5aea/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/b5/vc_ln9s912z8_m6gyf6d98_r0000gn/T/pip-build-env-alil5aea/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 507, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/private/var/folders/b5/vc_ln9s912z8_m6gyf6d98_r0000gn/T/pip-build-env-alil5aea/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
          exec(code, locals())
        File "<string>", line 41, in <module>
        File "<string>", line 25, in _load_req
      FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
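The traceback above ends in `_load_req` with `requirements.txt` not found, which suggests the source distribution was built without that file. Below is a minimal sketch of a requirements loader that fails with a more descriptive message. It is an illustration only, not LightZero's actual `setup.py`, and the function name `load_requirements` is hypothetical.

```python
from pathlib import Path
from typing import List


def load_requirements(path: Path) -> List[str]:
    """Read a requirements file, skipping blank lines and comments.

    Hypothetical helper: raises FileNotFoundError with a hint when the
    file is absent, which can happen if it was left out of the sdist.
    """
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; the source distribution may not include it "
            "(check MANIFEST.in)"
        )
    lines = path.read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```

With setuptools, extra files such as `requirements.txt` are typically added to a source distribution through `MANIFEST.in`; whether that is the cause here is an assumption based only on the traceback.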

Directly installing from repo

Skipped some sections as the complete output is 5000 lines.

❯ pip install -e .
Obtaining file:///Users/user/Code/private/light_zero/LightZero
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting DI-engine>=0.4.7 (from DI-engine[common_env]>=0.4.7->LightZero==0.0.2)
  Using cached DI_engine-0.4.9-py3-none-any.whl.metadata (61 kB)
Collecting gym==0.25.1 (from gym[accept-rom-license]==0.25.1->LightZero==0.0.2)
  Using cached gym-0.25.1-py3-none-any.whl

...

Using cached smmap-5.0.1-py3-none-any.whl (24 kB)
Building wheels for collected packages: LightZero, box2d-py
  Building editable for LightZero (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building editable for LightZero (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [2340 lines of output]
      In file included from /Users/user/Code/private/light_zero/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:39:
      In file included from /Users/user/.pyenv/versions/3.10.8/include/python3.10/Python.h:25:
      In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:64:
      /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:93:16: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
              unsigned char   *_base;
                              ^
      /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:93:16: note: insert '_Nullable' if the pointer may be null
              unsigned char   *_base;
                              ^
                                _Nullable
      /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:93:16: note: insert '_Nonnull' if the pointer should never be null
              unsigned char   *_base;
                              ^
                                _Nonnull
      /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:138:32: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
              int     (* _Nullable _read) (void *, char *, int);
                                                ^
                                                 
...

      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cmath:643:17: note: declared here
      template <class _A1, class _A2>
                      ^
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cmath:646:58: error: expected unqualified-id
                                      std::__promote<_A1, _A2> >::type
                                                               ^
      In file included from Box2D/Box2D_wrap.cpp:3023:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/string:561:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__string/char_traits.h:24:
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cstdio:104:5: error: <cstdio> tried including <stdio.h> but didn't find libc++'s <stdio.h> header.           This usually means that your header search paths are not configured properly.           The header search paths should contain the C++ Standard Library headers before           any C Standard Library, and you are probably using compiler flags that make that           not be the case.
      #   error <cstdio> tried including <stdio.h> but didn't find libc++'s <stdio.h> header. \
          ^
      In file included from Box2D/Box2D_wrap.cpp:3023:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/string:561:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__string/char_traits.h:29:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cwchar:108:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cwctype:54:
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cctype:43:5: error: <cctype> tried including <ctype.h> but didn't find libc++'s <ctype.h> header.           This usually means that your header search paths are not configured properly.            The header search paths should contain the C++ Standard Library headers before           any C Standard Library.
      #   error <cctype> tried including <ctype.h> but didn't find libc++'s <ctype.h> header. \
          ^
      In file included from Box2D/Box2D_wrap.cpp:3023:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/string:561:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__string/char_traits.h:29:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cwchar:108:
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cwctype:59:5: error: <cwctype> tried including <wctype.h> but didn't find libc++'s <wctype.h> header.           This usually means that your header search paths are not configured properly.           The header search paths should contain the C++ Standard Library headers before           any C Standard Library, and you are probably using compiler flags that make that           not be the case.
      #   error <cwctype> tried including <wctype.h> but didn't find libc++'s <wctype.h> header. \
          ^
      In file included from Box2D/Box2D_wrap.cpp:3023:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/string:561:
      In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__string/char_traits.h:29:
      /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cwchar:113:5: error: <cwchar> tried including <wchar.h> but didn't find libc++'s <wchar.h> header.           This usually means that your header search paths are not configured properly.           The header search paths should contain the C++ Standard Library headers before           any C Standard Library, and you are probably using compiler flags that make that           not be the case.
      #   error <cwchar> tried including <wchar.h> but didn't find libc++'s <wchar.h> header. \
          ^
      fatal error: too many errors emitted, stopping now [-ferror-limit=]
      193 warnings and 20 errors generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for box2d-py
  Running setup.py clean for box2d-py
Failed to build LightZero box2d-py
ERROR: Could not build wheels for LightZero, box2d-py, which is required to install pyproject.toml-based projects

random policy is not working for a sampled efficient zero env with continuous actions

How to reproduce it?

Add random_collect_episode_num=10 to the config of a continuous-action env (e.g. lunarlander_cont_sampled_efficientzero_config.py).

What's happening?

I'm getting the following error on this line:

A typo in the comment of _ucb_score

https://github.com/opendilab/LightZero/blame/3d8fb0b22249aec9e80b7eeac80b83df2b68710f/lzero/mcts/ptree/ptree_az.py#L405

N(\text{parent}) should be \sqrt{N(\text{parent})}
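For context, the correction above matches the commonly cited AlphaZero-style PUCT rule, in which the exploration bonus scales with the square root of the parent visit count. The snippet below is a minimal standalone sketch of that rule, not the code in ptree_az.py; the function name, argument names, and the default c_puct value are assumptions chosen for illustration.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.25):
    """Q + c_puct * P * sqrt(N(parent)) / (1 + N(child)).

    Illustrative only: names and the c_puct default are assumptions.
    """
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration


# With 16 parent visits, the bonus uses sqrt(16) = 4 rather than 16.
print(puct_score(q_value=0.0, prior=0.5, parent_visits=16, child_visits=3))  # 0.625
```

Dropping the square root would make the bonus grow linearly with the parent visit count (2.5 instead of 0.625 in this example), which is why the comment matters for anyone reading the formula.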

Docker image

Running the Docker commands exactly as shown gives me a KeyError and BrokenPipeErrors:

Traceback (most recent call last):
File "./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py", line 89, in <module>
train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
File "/opendilab/LightZero/lzero/entry/train_muzero.py", line 155, in train_muzero
stop, reward = evaluator.eval(learner.save_checkpoint, learner.train_iter, collector.envstep)
File "/opendilab/LightZero/lzero/worker/muzero_evaluator.py", line 292, in eval
distributions_dict_no_env_id = {k: v['distributions'] for k, v in policy_output.items()}
File "/opendilab/LightZero/lzero/worker/muzero_evaluator.py", line 292, in <dictcomp>
distributions_dict_no_env_id = {k: v['distributions'] for k, v in policy_output.items()}
KeyError: 'distributions'
Exception ignored in: <function MuZeroCollector.__del__ at 0x7f8f3dce9dc0>
Traceback (most recent call last):
File "/opendilab/LightZero/lzero/worker/muzero_collector.py", line 193, in __del__
self.close()
File "/opendilab/LightZero/lzero/worker/muzero_collector.py", line 182, in close
self._env.close()
File "/usr/local/lib/python3.8/dist-packages/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
p.send(['close', None, None])
File "/usr/lib/python3.8/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f8f3dcea430>
Traceback (most recent call last):
File "/opendilab/LightZero/lzero/worker/muzero_evaluator.py", line 179, in __del__
self.close()
File "/opendilab/LightZero/lzero/worker/muzero_evaluator.py", line 168, in close
self._env.close()
File "/usr/local/lib/python3.8/dist-packages/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
p.send(['close', None, None])
File "/usr/lib/python3.8/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Adding Contributors Section in readme.md

Why a Contributors section: a "Contributors" section in a repo credits and acknowledges
the people who have helped with the project, fosters a sense of community, and tells others
whom to contact with questions or issues related to the project.

Issue type

[✅] Docs

Expected outcome:

If you plan to go ahead with this issue, kindly assign it to me! I would love to work on it. Thank you!

Single Player AlphaZero

Do you plan to support a single-player version of AZ anytime? It would be lovely to compare the different methods on similar benchmarks.

Adding Code-Of-Conduct File to the repo

As this repo is open source, having a code-of-conduct file becomes important. We propose adding a comprehensive Code of Conduct to the repository to ensure a safe, respectful, and inclusive environment for all contributors and users. This code will
serve as a guideline for behavior, promoting diversity, reducing conflicts, and attracting a wider range of perspectives.

Issue type

[✅] Docs

If you plan to go ahead with this issue, kindly assign it to me! I would love to work on it. Thank you!

Get an error when running "pip install -e ."

I am trying to install your package with the command "pip install -e .",
but I get an error about "cl.exe".
Any suggestions are appreciated!

Building wheels for collected packages: LightZero
  Building editable for LightZero (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building editable for LightZero (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [87 lines of output]
      C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\wheel\bdist_wheel.py:100: RuntimeWarning: Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect
        if get_flag("Py_DEBUG", hasattr(sys, "gettotalrefcount"), warn=(impl == "cp")):
      ez_tree.cpp
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(105): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(135): error C2065: 'policy': undeclared identifier
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(141): error C2065: 'policy': undeclared identifier
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(147): error C2065: 'policy': undeclared identifier
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(147): error C2541: 'delete': cannot delete objects that are not pointers
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(503): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(535): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(624): warning C4305: 'initializing': truncation from 'double' to 'float'
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(629): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data
      C:\work\project\pycharm\swing_snake\LightZero-main\lzero\mcts\ctree\ctree_efficientzero\lib/cnode.cpp(629): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data
      Traceback (most recent call last):
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 419, in compile
          self.spawn(args)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 517, in spawn
          return super().spawn(cmd, env=env)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\ccompiler.py", line 1041, in spawn
          spawn(cmd, dry_run=self.dry_run, **kwargs)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\spawn.py", line 70, in spawn
          raise DistutilsExecError(
      distutils.errors.DistutilsExecError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
     
      During handling of the above exception, another exception occurred:
     
      Traceback (most recent call last):
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 344, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 267, in _run_build_commands
          self._run_build_subcommands()
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\editable_wheel.py", line 294, in _run_build_subcommands
          self.run_command(name)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\dist.py", line 1234, in run_command
          super().run_command(command)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\build_ext.py", line 84, in run
          _build_ext.run(self)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 345, in run
          self.build_extensions()
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 548, in build_extension
          objects = self.compiler.compile(
        File "C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 421, in compile
          raise CompileError(msg)
      distutils.errors.CompileError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      C:\Users\Lenovo\AppData\Local\Temp\pip-build-env-5kdgr35h\overlay\Lib\site-packages\setuptools\_distutils\dist.py:988: _DebuggingTips: Problem in editable installation.
      !!
     
              ********************************************************************************
              An error happened while installing `LightZero` in editable mode.
     
              The following steps are recommended to help debug this problem:
     
              - Try to install the project normally, without using the editable mode.
                Does the error still persist?
                (If it does, try fixing the problem before attempting the editable mode).
              - If you are using binary extensions, make sure you have all OS-level
                dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
              - Try the latest version of setuptools (maybe the error was already fixed).
              - If you (or your project dependencies) are using any setuptools extension
                or customization, make sure they support the editable mode.
     
              After following the steps above, if the problem still persists and
              you think this is related to how setuptools handles editable installations,
              please submit a reproducible example
              (see https://stackoverflow.com/help/minimal-reproducible-example) to:
     
                  https://github.com/pypa/setuptools/issues
     
              See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
              ********************************************************************************
     
      !!
        cmd_obj.run()
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for LightZero
Failed to build LightZero
ERROR: Could not build wheels for LightZero, which is required to install pyproject.toml-based projects

AttributeError: 'EasyDict' object has no attribute 'replay_path_gif'

Description

I experienced the following error when running the quickstart command for training tic-tac-toe:

▶▶ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
Traceback (most recent call last):
  File "zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py", line 86, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/Users/evan/Software/LightZero/lzero/entry/train_muzero.py", line 73, in train_muzero
    collector_env = create_env_manager(cfg.env.manager, [partial(env_fn, cfg=c) for c in collector_env_cfg])
  File "/Users/evan/anaconda3/envs/lightzero/lib/python3.8/site-packages/ding/envs/env_manager/base_env_manager.py", line 528, in create_env_manager
    return ENV_MANAGER_REGISTRY.build(manager_type, env_fn=env_fn, cfg=manager_cfg)
  File "/Users/evan/anaconda3/envs/lightzero/lib/python3.8/site-packages/ding/utils/registry.py", line 96, in build
    raise e
  File "/Users/evan/anaconda3/envs/lightzero/lib/python3.8/site-packages/ding/utils/registry.py", line 82, in build
    return build_fn(*obj_args, **obj_kwargs)
  File "/Users/evan/anaconda3/envs/lightzero/lib/python3.8/site-packages/ding/envs/env_manager/subprocess_env_manager.py", line 79, in __init__
    super().__init__(env_fn, cfg)
  File "/Users/evan/anaconda3/envs/lightzero/lib/python3.8/site-packages/ding/envs/env_manager/base_env_manager.py", line 109, in __init__
    self._env_ref = self._env_fn[0]()
  File "/Users/evan/Software/LightZero/zoo/board_games/tictactoe/envs/tictactoe_env.py", line 84, in __init__
    self._replay_path_gif = cfg.replay_path_gif
AttributeError: 'EasyDict' object has no attribute 'replay_path_gif'

Running the cartpole example yields a similar error:

AttributeError: 'EasyDict' object has no attribute 'replay_path'
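Both errors come from reading a config key as an attribute when the key is absent. A possible local workaround, sketched below under assumptions, is either to add the missing key to the env config or to read it with a default. The AttrDict class is a tiny stand-in written for this example so that it runs without EasyDict installed, and the env_name key is hypothetical; this is not how LightZero itself resolves the issue.

```python
class AttrDict(dict):
    """Minimal stand-in for EasyDict: dict keys are readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(f"'AttrDict' object has no attribute '{name}'")


# Hypothetical env config that lacks the 'replay_path_gif' key.
cfg = AttrDict(env_name='tictactoe')

# cfg.replay_path_gif would raise AttributeError here, as in the traceback.
# Reading with a default avoids the crash when the key is missing.
replay_path_gif = cfg.get('replay_path_gif', None)
print(replay_path_gif)  # None
```

Because the stand-in subclasses dict, the usual dict.get method remains available, which is what makes the defaulted read possible.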

To reproduce

I installed with pip install -e . on the main branch, and my Python version is 3.8.10.

I'm very excited to start using LightZero. Thanks in advance for your help.

gumbel_muzero error

When running:
python3 ./zoo/box2d/lunarlander/config/lunarlander_disc_gumbel_muzero_config.py

the following error is raised:

File "/home/1project/LightZero-main/lzero/policy/gumbel_muzero.py", line 561, in _forward_collect
for i, env_id in enumerate(ready_env_id):
TypeError: 'float' object is not iterable

It seems that ready_env_id == 0.0 in this case.

So by changing:

    if ready_env_id is None:
        ready_env_id = np.arange(active_collect_env_num)
    # print('ready_env_id', ready_env_id)
    for i, env_id in enumerate(ready_env_id):

to:

    if ready_env_id is None or ready_env_id == 0.0:
        ready_env_id = np.arange(active_collect_env_num)
    # print('ready_env_id', ready_env_id)
    for i, env_id in enumerate(ready_env_id):

the error is gone.
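The same workaround can be written as a small helper. The sketch below is an assumption-laden variant, not the LightZero code: the helper name is hypothetical, it treats any bare number (not only 0.0) as "no ids supplied", and it uses range instead of np.arange so that it runs without NumPy. It only masks the symptom; it does not explain why a float reaches this argument in the first place.

```python
from numbers import Number


def resolve_ready_env_id(ready_env_id, active_collect_env_num):
    """Return a list of env ids, falling back to 0..n-1 for None or a bare number.

    Hypothetical helper: a scalar such as 0.0 is treated as "not supplied".
    """
    if ready_env_id is None or isinstance(ready_env_id, Number):
        return list(range(active_collect_env_num))
    return list(ready_env_id)


print(resolve_ready_env_id(0.0, 3))     # [0, 1, 2]
print(resolve_ready_env_id(None, 2))    # [0, 1]
print(resolve_ready_env_id([1, 3], 4))  # [1, 3]
```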

Bipedal continuous discretized sampled efficientzero config error

I get the following error when running python zoo/box2d/bipedalwalker/config/bipedalwalker_cont_disc_sampled_efficientzero_config.py from the root directory:

Traceback (most recent call last):
  File "zoo/box2d/bipedalwalker/config/bipedalwalker_cont_disc_sampled_efficientzero_config.py", line 104, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/entry/train_muzero.py", line 160, in train_muzero
    new_data = collector.collect(train_iter=learner.train_iter, policy_kwargs=collect_kwargs)
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/worker/muzero_collector.py", line 411, in collect
    policy_output = self._policy.forward(stack_obs, action_mask, temperature, to_play, epsilon)
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/policy/sampled_efficientzero.py", line 841, in _forward_collect
    legal_actions = [
  File "/home/ubuntu/pooltool-ml/pooltool_ml/LightZero/lzero/policy/sampled_efficientzero.py", line 842, in <listcomp>
    [i for i, x in enumerate(action_mask[j]) if x == 1] for j in range(active_collect_env_num)
TypeError: 'NoneType' object is not iterable

I encountered this error on two different machines.
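The traceback indicates that action_mask[j] is None when the list comprehension tries to enumerate it. The sketch below isolates that comprehension and adds a guard; the helper name is hypothetical, and returning None for an unmasked environment is only a placeholder, since the report does not say what the policy should do in that case.

```python
def legal_actions_from_mask(action_mask, active_collect_env_num):
    """Build per-env legal action lists, tolerating envs that supply no mask.

    Hypothetical helper: a None mask yields None instead of raising TypeError.
    """
    legal_actions = []
    for j in range(active_collect_env_num):
        mask = action_mask[j]
        if mask is None:
            # Placeholder for "no mask supplied"; the right behavior is an open question.
            legal_actions.append(None)
        else:
            legal_actions.append([i for i, x in enumerate(mask) if x == 1])
    return legal_actions


print(legal_actions_from_mask([[1, 0, 1], None], 2))  # [[0, 2], None]
```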

AttributeError: 'SampledEfficientZeroPolicy' object has no attribute 'inverse_scalar_transform_handle'

Steps to reproduce it:

Prepare a Sampled EfficientZero MLP policy that will be used only to collect episodes, e.g.:
Call the forward method of the collect_mode attribute, e.g.:

What should happen?

the forward method returns the policy_output

What's happening?

This AttributeError is thrown, and the application exits with code 1:

What do I think is happening?

I found that self.inverse_scalar_transform_handle is only set in the _init_learn method:

Based on that, I see 3 options here:

a) the collect mode can't be used without the learn mode
b) self.inverse_scalar_transform_handle attribute should be set for the scenarios where the learn mode is not enabled
c) we have to adapt the _forward_collect method, so it only uses the inverse_scalar_transform_handle whenever it's available. e.g.

            if hasattr(self, 'inverse_scalar_transform_handle'):
                pred_values = self.inverse_scalar_transform_handle(pred_values).detach().cpu().numpy()
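For comparison, option (b) could look roughly like the following. This is a minimal sketch with a hypothetical class, not SampledEfficientZeroPolicy itself: only the attribute name inverse_scalar_transform_handle comes from the report, the enable_learn flag and the method names are assumptions, and an identity function stands in for the real transform.

```python
class SketchPolicy:
    """Hypothetical policy; only the attribute name mirrors the report."""

    def __init__(self, enable_learn=True):
        # Option (b): set the handle unconditionally, before any mode-specific init.
        self.inverse_scalar_transform_handle = self._make_inverse_transform()
        if enable_learn:
            self._init_learn()

    @staticmethod
    def _make_inverse_transform():
        # Placeholder: an identity function stands in for the real transform.
        return lambda values: values

    def _init_learn(self):
        # Learn-only state lives here; the handle no longer depends on it.
        self.learn_ready = True

    def forward_collect(self, pred_values):
        return self.inverse_scalar_transform_handle(pred_values)


collect_only = SketchPolicy(enable_learn=False)
print(collect_only.forward_collect([0.5, 1.0]))  # [0.5, 1.0], no AttributeError
```

Compared with the hasattr guard in option (c), this keeps the collect path free of conditionals, at the cost of constructing the handle even when it is never used.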

Which one do you consider the right approach?

Thanks

Clang Issue - `pip install -e .`

Hi, I am running into the following installation issue on an M1-chip Mac (macOS).

Any suggestions are appreciated!

Building wheels for collected packages: LightZero
  Building editable for LightZero (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building editable for LightZero (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [195 lines of output]
      In file included from /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:797:
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:86:27: warning: comparison of integers of different signs: 'int' and 'std::vector<float>::size_type' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < this->value.size(); ++i)
                              ~ ^ ~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:104:31: warning: comparison of integers of different signs: 'int' and 'std::vector<unsigned long>::size_type' (aka 'unsigned long') [-Wsign-compare]
                  for (int i = 1; i < hash.size(); ++i)
                                  ~ ^ ~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:374:18: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
                  for (auto prob : probs)
                       ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:374:28: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
                  for (auto prob : probs)
                                 ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:384:44: error: no member named '__emplace_back' in 'std::vector<std::pair<int, double> >'; did you mean 'emplace_back'?
                          disc_action_with_probs.__emplace_back(std::make_pair(iter, disturbed_probs[iter]));
                                                 ^~~~~~~~~~~~~~
                                                 emplace_back
      /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/include/c++/v1/vector:591:19: note: 'emplace_back' declared here
              void      emplace_back(_Args&&... __args);
                        ^
      In file included from /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:797:
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:327:31: warning: comparison of integers of different signs: 'int' and 'std::vector<float>::size_type' (aka 'unsigned long') [-Wsign-compare]
                  for (int i = 0; i < policy_logits.size(); ++i)
                                  ~ ^ ~~~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:331:31: warning: comparison of integers of different signs: 'int' and 'std::vector<float>::size_type' (aka 'unsigned long') [-Wsign-compare]
                  for (int i = 0; i < policy_logits.size(); ++i)
                                  ~ ^ ~~~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:493:14: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
              for (auto a : this->legal_actions)
                   ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:493:21: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
              for (auto a : this->legal_actions)
                          ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:573:27: warning: comparison of integers of different signs: 'int' and 'std::vector<tree::CAction>::size_type' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < traj.size(); ++i)
                              ~ ^ ~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:591:18: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
                  for (auto a : this->legal_actions)
                       ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:591:25: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
                  for (auto a : this->legal_actions)
                              ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:662:35: warning: comparison of integers of different signs: 'int' and 'std::vector<std::vector<float> >::size_type' (aka 'unsigned long') [-Wsign-compare]
                      for (int i = 0; i < this->legal_actions_list.size(); ++i)
                                      ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:776:31: warning: comparison of integers of different signs: 'int' and 'std::vector<tree::CAction>::size_type' (aka 'unsigned long') [-Wsign-compare]
                  for (int j = 0; j < this->roots[i].legal_actions.size(); ++j)
                                  ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:843:18: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
                  for (auto a : node->legal_actions)
                       ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:843:25: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
                  for (auto a : node->legal_actions)
                              ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:815:15: warning: unused variable 'parent_value_prefix' [-Wunused-variable]
              float parent_value_prefix = 0.0;
                    ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:992:14: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
              for (auto a : root->legal_actions)
                   ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:992:21: warning: range-based for loop is a C++11 extension [-Wc++11-extensions]
              for (auto a : root->legal_actions)
                          ^
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:1057:35: warning: comparison of integers of different signs: 'int' and 'std::map<unsigned long, tree::CNode>::size_type' (aka 'unsigned long') [-Wsign-compare]
                      for (int i = 0; i < parent->children.size(); ++i)
                                      ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~
      lzero/mcts/ctree/ctree_sampled_efficientzero/lib/cnode.cpp:1066:35: warning: comparison of integers of different signs: 'int' and 'std::map<unsigned long, tree::CNode>::size_type' (aka 'unsigned long') [-Wsign-compare]
                      for (int i = 0; i < parent->children.size(); ++i)
                                      ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~
      In file included from /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:801:
      In file included from /private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:5:
      In file included from /private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
      In file included from /private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1940:
      /private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with "          "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
      #warning "Using deprecated NumPy API, disable it with " \
       ^
      /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:6679:3: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
        0, /*tp_print*/
        ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
          Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
          ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
      #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                           ^
      /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:6787:3: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
        0, /*tp_print*/
        ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
          Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
          ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
      #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                           ^
      /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:6896:3: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
        0, /*tp_print*/
        ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
          Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
          ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
      #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                           ^
      /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:7022:3: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
        0, /*tp_print*/
        ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
          Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
          ^
      /opt/homebrew/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
      #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                           ^
      /Users/anuragkoul/PycharmProjects/LightZero/lzero/mcts/ctree/ctree_sampled_efficientzero/ezs_tree.cpp:7130:3: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
        0, /*tp_print*/
        ^
      /opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
          Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
          ^
      /opt/homebrew/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
      #define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                           ^
      26 warnings and 1 error generated.
      Traceback (most recent call last):
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 185, in _compile
          self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 1041, in spawn
          spawn(cmd, dry_run=self.dry_run, **kwargs)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
          raise DistutilsExecError(
      distutils.errors.DistutilsExecError: command '/usr/bin/clang' failed with exit code 1
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 344, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 267, in _run_build_commands
          self._run_build_subcommands()
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/editable_wheel.py", line 294, in _run_build_subcommands
          self.run_command(name)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
          objects = self.compiler.compile(
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 600, in compile
          self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
        File "/private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 187, in _compile
          raise CompileError(msg)
      distutils.errors.CompileError: command '/usr/bin/clang' failed with exit code 1
      /private/var/folders/53/kbc84lyx1kl8rhp418nhwgl80000gn/T/pip-build-env-grfikt64/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
      !!
      
              ********************************************************************************
              An error happened while installing `LightZero` in editable mode.
      
              The following steps are recommended to help debug this problem:
      
              - Try to install the project normally, without using the editable mode.
                Does the error still persist?
                (If it does, try fixing the problem before attempting the editable mode).
              - If you are using binary extensions, make sure you have all OS-level
                dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
              - Try the latest version of setuptools (maybe the error was already fixed).
              - If you (or your project dependencies) are using any setuptools extension
                or customization, make sure they support the editable mode.
      
              After following the steps above, if the problem still persists and
              you think this is related to how setuptools handles editable installations,
              please submit a reproducible example
              (see https://stackoverflow.com/help/minimal-reproducible-example) to:
      
                  https://github.com/pypa/setuptools/issues
      
              See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
              ********************************************************************************
      
      !!
        cmd_obj.run()
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for LightZero
Failed to build LightZero
ERROR: Could not build wheels for LightZero, which is required to install pyproject.toml-based projects

Is there a missing .gitmodules file?

When I use LightZero as a git submodule of my own project, I encounter a fatal error during cloning.

The command I'm using is:

git clone --recurse-submodules https://github.com/ekiefl/myprivaterepo.git

And after decompressing all the objects, I get the following error:

Submodule 'LightZero' (https://github.com/opendilab/LightZero.git) registered for path 'LightZero'
(...)
Submodule path 'LightZero': checked out 'e763282c4ea149259c568058df4f000aafbb1ab1'
fatal: No url found for submodule path 'LightZero/lzero/mcts/ctree/ctree_alphazero/pybind11' in .gitmodules
fatal: Failed to recurse into submodule path 'LightZero'

I'm not sure whether pybind11 is a git submodule of LightZero. If it is, shouldn't a .gitmodules file exist? For example, I declare LightZero as a submodule of my project with a .gitmodules file like so:

[submodule "LightZero"]
	path = LightZero
	url = https://github.com/opendilab/LightZero.git

I have not directly tested this, but if pybind11 is a true submodule, I think the following .gitmodules file should be added to the root directory to reflect that.

[submodule "pybind11"]
	path = lzero/mcts/ctree/ctree_alphazero/pybind11
	url = https://github.com/pybind/pybind11.git

Maybe something like this? I may be incorrect about how submodules work.
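
The check git is effectively complaining about can be sketched as follows: every submodule path recorded in the repository needs a matching `path = ...` entry in .gitmodules. This is only a minimal illustration; the helper names are made up for this example, and the list of recorded paths is taken from the error message above rather than from an inspection of the repository.

```python
import re

def declared_submodule_paths(gitmodules_text):
    """Collect every `path = ...` value from the text of a .gitmodules file."""
    pattern = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)
    return set(pattern.findall(gitmodules_text))

def undeclared_submodules(recorded_paths, gitmodules_text):
    """Return recorded submodule paths that have no entry in .gitmodules."""
    declared = declared_submodule_paths(gitmodules_text)
    return sorted(p for p in recorded_paths if p not in declared)

# Path taken from the error message above (relative to the LightZero root).
recorded = ["lzero/mcts/ctree/ctree_alphazero/pybind11"]

# Assumption: no .gitmodules file exists, modelled here as empty text.
print(undeclared_submodules(recorded, ""))

# The .gitmodules content proposed above.
proposed = (
    '[submodule "pybind11"]\n'
    "\tpath = lzero/mcts/ctree/ctree_alphazero/pybind11\n"
    "\turl = https://github.com/pybind/pybind11.git\n"
)
print(undeclared_submodules(recorded, proposed))
```

With the proposed file the list of undeclared paths comes back empty, which is the condition a recursive clone needs.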

How to activate multi player with to_play and action_mask

How can I make the AI, after taking an action, only observe until it receives the reward for that action, and only then be allowed to take another action? In other words, it should keep observing and learning without acting until the reward arrives.

I think this can be achieved with the following parameters, right? For example by treating it as multi-player, or by specifying which actions are allowed:

to_play=-1
action_mask = np.array([1., 1., 1.], dtype=np.float32)
obs = {'observation': to_ndarray(obs), 'action_mask': action_mask, 'to_play': to_play}

I tried:
to_play=-1
action_mask = [0., 1., 0.], but this gives me an error on child_visit_segment, which ends up as something like a [1] object array.

I also tried:
to_play=-1 for the AI and to_play=1 for the other player
action_mask = np.array([1., 1., 1.], dtype=np.float32)

But I do not fully understand the to_play values or how they affect training.
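
One way to express the "wait until the reward arrives" idea is sketched below, under stated assumptions: the environment has three actions as in the masks above, one of them (index 1, chosen arbitrarily here) is a "wait" action, and to_play stays at -1 as in the first snippet. `build_obs`, `NUM_ACTIONS` and `WAIT_ACTION` are hypothetical names for this example, `np.asarray` stands in for `to_ndarray`, and this does not by itself address the child_visit_segment error.

```python
import numpy as np

NUM_ACTIONS = 3   # matches the 3-element masks used above
WAIT_ACTION = 1   # hypothetical index of a "do nothing / keep observing" action

def build_obs(raw_obs, waiting_for_reward):
    """Build the observation dict, allowing only WAIT_ACTION while waiting."""
    if waiting_for_reward:
        # Only the wait action is legal until the pending reward arrives.
        action_mask = np.zeros(NUM_ACTIONS, dtype=np.float32)
        action_mask[WAIT_ACTION] = 1.0
    else:
        # All actions are legal again.
        action_mask = np.ones(NUM_ACTIONS, dtype=np.float32)
    return {
        'observation': np.asarray(raw_obs, dtype=np.float32),
        'action_mask': action_mask,
        'to_play': -1,  # same value as in the snippet above
    }

obs_waiting = build_obs([0.1, 0.2], waiting_for_reward=True)
obs_free = build_obs([0.1, 0.2], waiting_for_reward=False)
print(obs_waiting['action_mask'], obs_free['action_mask'])
```

The environment's step function would flip `waiting_for_reward` off once the delayed reward has been delivered.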

Computer resource and len error while running stochastic_muzero_2048_config

While running
python3 ./zoo/game_2048/config/stochastic_muzero_2048_config.py

I get an error in GameSegment at line 171, from one of these assertions:
assert len(next_segment_observations) <= self.num_unroll_steps
assert len(next_segment_child_visits) <= self.num_unroll_steps
assert len(next_segment_root_values) <= self.num_unroll_steps + self.num_unroll_steps
assert len(next_segment_rewards) <= self.num_unroll_steps + self.num_unroll_steps - 1

That is, the length of these segments is sometimes larger than the configured limit.
The error seems to occur randomly, so I'm curious whether you have encountered this problem.

Besides, I'm trying to work out what computing resources I need for this kind of RL project. I'm currently using a single CPU and a single GPU on my own computer (13th-gen i5-13600K, GTX 1070) and a single GPU (RTX 3090) on my school computer. Both seem to run very slowly: it takes an hour to reach iter 4200, envstep 786 when running stochastic_muzero_2048_config.py.

By my calculation, if the process needs 1000000 steps to finish, it would take 100 days to run, which does not seem practical.
So may I ask how long it takes you to run this process? Am I missing something, or should I just upgrade my computer?

I really appreciate your work, and thank you for reading my question!
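
For reference, a rough back-of-the-envelope version of that estimate. It assumes the 1000000 steps are environment steps and that throughput stays constant at the quoted 786 env steps per hour. Both are assumptions, and real throughput usually changes during training.

```python
def estimated_days(total_env_steps, env_steps_per_hour):
    """Convert a step budget and an hourly throughput into wall-clock days."""
    hours = total_env_steps / env_steps_per_hour
    return hours / 24

# Figures quoted in the question above.
days = estimated_days(total_env_steps=1_000_000, env_steps_per_hour=786)
print(f"about {days:.0f} days")
```

Under these assumptions the run takes on the order of weeks to months, which is the same order of magnitude as the figure in the question.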

Lacking inference script

In the codebase, there are training and evaluation scripts, which is great. However, there is no inference script with which I can run existing weights on the environment and see visually how the agent performs. Rendering the environment and watching the AI play would be a good addition. Is there already a plan to do this?

Confusion between "battle_mode" and "mcts_mode"

Hello,

I think there is a "bug" in the current version of the AlphaZero code when using the "play_with_bot_mode" mode.
In both tictactoe_env.py and gomoku_env.py, this line is hardcoded:

self.mcts_mode = 'self_play_mode'

So mcts_mode is always set to self_play_mode, no matter what is given in the config.
Moreover, in both the Python tree and the C++ tree of AlphaZero we can find these lines:

self.simulate_env.battle_mode = self.simulate_env.mcts_mode # In ptree_az.py
simulate_env.attr("battle_mode") = simulate_env.attr("mcts_mode"); // In mcts_alphazero.cpp

This means that whatever we give in the config for battle_mode is overridden by mcts_mode, which is always "self_play_mode".

In conclusion, after a quick review of the code, I think mcts_mode should simply be removed and replaced by battle_mode everywhere, because both attributes seem to do exactly the same thing (but I may be wrong).

To reproduce, just run the standard tictactoe in 'play_with_bot_mode' (by running tictactoe_alphazero_bot_mode_config.py) and check that the MCTS is always using "self_play_mode".
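
The override can be illustrated with a small stand-alone sketch. `ToyBoardEnv` and `prepare_simulation` are hypothetical stand-ins written for this example and are not LightZero code. They only mirror the two assignments quoted above.

```python
class ToyBoardEnv:
    """Hypothetical stand-in for tictactoe_env.py / gomoku_env.py."""

    def __init__(self, battle_mode):
        # Value that would come from the user config.
        self.battle_mode = battle_mode
        # Hardcoded, as in the line quoted above.
        self.mcts_mode = 'self_play_mode'


def prepare_simulation(simulate_env):
    """Mirror the assignment quoted from ptree_az.py."""
    simulate_env.battle_mode = simulate_env.mcts_mode
    return simulate_env


env = ToyBoardEnv(battle_mode='play_with_bot_mode')
print(env.battle_mode)   # configured value

sim_env = prepare_simulation(env)
print(sim_env.battle_mode)   # value seen during the tree search
```

After `prepare_simulation`, the configured 'play_with_bot_mode' is gone and the search sees 'self_play_mode', which is the behaviour described in this issue.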
