Update for gymnasium!!!

We have updated the baseline for the gymnasium version of the Lux AI Challenge. You can run the baseline with both the old gym and the new gymnasium version. The new gymnasium version is available here: Luxai-s2-Baseline4Gymnasium

The old gym version is available here: Luxai-s2-Baseline4Gym

Luxai-s2-Baseline

Welcome to the Lux AI Challenge Season 2! This repository serves as a baseline for the Lux AI Challenge Season 2, designed to provide participants with a strong starting point for the competition. Our goal is to provide you with a clear, understandable, and modifiable codebase so you can quickly start developing your own AI strategies.

This baseline includes an implementation of the PPO reinforcement learning algorithm, which you can use to train your own agent from scratch. The codebase is designed to be easy to modify, allowing you to experiment with different strategies, reward functions, and other parameters.
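
For readers new to PPO, the core of the algorithm is its clipped surrogate objective. The following is a minimal pure-Python sketch of that loss, not the code used in this repository: the function name `ppo_clip_loss` and the default `clip_coef=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean clipped surrogate loss over a batch (a value to minimize).

    Hypothetical helper for illustration; a real implementation would
    operate on tensors and add value and entropy terms.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # PPO maximizes the smaller of the two terms; negate to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

Clipping limits how far a single update can move the policy away from the one that collected the data, which is why the clip coefficient is a common parameter to experiment with.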

In addition to the main training script, we also provide additional tools and resources, including scripts for evaluating your AI strategy, as well as useful debugging and visualization tools. We hope these tools and resources will help you develop and improve your AI strategy more effectively.

More information about the Lux AI Challenge can be found on the competition page: https://www.kaggle.com/competitions/lux-ai-season-2

We look forward to seeing how your AI strategy performs in the competition!

Memory leak in train.py 🐛

We logged GPU and RAM usage and found that RAM usage grew quickly until it spilled into swap. We tested with 30 GB and 15 GB of RAM, capped at 1/2 RAM. We think it's a memory leak.

CLI command: python train.py --total-timesteps 1050 --num-envs 1 --save-interval=999 --train-num-collect=1024

Running in WSL, configured according to the documentation in this repo.
Here are the logs:
log.json

This is how we logged it:

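import os

import psutil  # third-party package

# Assumption: the original snippet did not show how `process` was created;
# a psutil handle to the current process matches the memory_info().rss calls.
process = psutil.Process(os.getpid())

# Accumulated RSS growth in bytes, keyed by checkpoint name.
memory_tracker = {}
last_check = process.memory_info().rss

def log_memory(name):
    """Attribute the RSS change since the previous call to checkpoint `name`."""
    global last_check
    current = process.memory_info().rss
    memory_tracker[name] = memory_tracker.get(name, 0) + (current - last_check)
    last_check = current

def print_memory():
    """Print accumulated growth per checkpoint and the current RSS, in GB."""
    memory_tracker_formatted = {}
    for k in memory_tracker:
        memory_tracker_formatted[k] = "{:.2f}".format(memory_tracker[k] / 10**9)
    memory_tracker_formatted["current"] = "{:.2f}".format(process.memory_info().rss / 10**9)
    print("current:", memory_tracker_formatted)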
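
RSS deltas show between which checkpoints the process grows, but not which Python line is responsible. As a complementary check, the standard-library `tracemalloc` module can compare two snapshots and rank source lines by allocation growth. The sketch below is an illustration only: `top_allocation_growth` is a hypothetical helper, and `leaky_buffer` is a stand-in for a few training steps, not code from train.py. Note that `tracemalloc` only traces memory allocated through Python's allocator, so allocations made by native code may not appear.

```python
import tracemalloc


def top_allocation_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew the most."""
    # compare_to() sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for a few training steps: a buffer that is never cleared.
leaky_buffer = []
for step in range(1000):
    leaky_buffer.append([0.0] * 100)

after = tracemalloc.take_snapshot()
growth = top_allocation_growth(before, after)
for stat in growth:
    print(stat)  # file, line number, size difference, and block count
tracemalloc.stop()
```

In a training script, the two snapshots would be taken a number of steps apart so that one-time setup allocations do not dominate the comparison.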

This code is allocating the memory. We think it has something to do with torch models not being detached.

            # beginning of code block
            for player_id, player in enumerate(['player_0', 'player_1']):
                obs[player] += envs.split(next_obs[player])
                dones[train_step] = next_done
                log_memory("-1")

                # ALGO LOGIC: action logic
                # the no_grad() context disables gradient calculation: https://pytorch.org/docs/stable/generated/torch.no_grad.html
                with torch.no_grad():
                    log_memory("0")
                    # under no_grad, any tensors created as a result
                    # of a computation will have their internal requires_grad
                    # state set to False. This means their gradient will not be
                    # calculated by torch. This avoids memory consumption for stuff
                    # that doesn't need gradient calculations.
                    valid_action = envs.get_valid_actions(player_id)
                    # np2torch = lambda x, dtype: torch.tensor(x).type(dtype).to(device).detach()
                    log_memory("1")
                    # calling agent() like a function actually calls agent.forward() under the hood;
                    # it's defined in net.py. Here the observation space is being passed in.
                    # An action space that is neither completely abstract nor completely Lux is
                    # returned. Let's call it the intermediate action space.
                    global_feature = np2torch(next_obs[player]['global_feature'], torch.float32)
                    map_feature = np2torch(next_obs[player]['map_feature'], torch.float32)

                    log_memory("1.25")
                    # Note: np2torch is a helper that converts NumPy arrays to torch tensors (see the commented-out lambda above)
                    action_feature = tree.map_structure(lambda x: np2torch(x, torch.int16), next_obs[player]['action_feature'])
                    log_memory("1.5")
                    va = tree.map_structure(lambda x: np2torch(x, torch.bool), valid_action)
                    log_memory("1.75")

                    logprob, value, raw_action, _ = agent(
                        global_feature,
                        map_feature,
                        action_feature,
                        va
                    )
                    values[player][train_step] = value
                    log_memory("2")
                    # action space arrays are partitioned like so
                    # {"transfer_power": [10, 11, 12]} where 10, 11 and 12 represent the 
                    # actions to perform on environments 0, 1 and 2 (vectorized environments)
                    # This function splits the actions into independent trees per environment:
                    # [{'transfer_power': 10}, {'transfer_power': 11}, {'transfer_power': 12}]
                    valid_actions[player] += envs.split(valid_action)
                    log_memory("3")
                # see above comment, split raw_actions into independent trees per vectorized environment
                actions[player] += envs.split(raw_action)
                action[player_id] = raw_action
                logprobs[player][train_step] = logprob
                log_memory("4")
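
The comments above describe how `envs.split` turns one batched action tree into one tree per vectorized environment. The actual implementation is not shown in this issue, so the following is only a sketch of the described behavior under that assumption; `split_batched_tree` is a hypothetical name, and it handles nested dicts whose leaves are indexable by environment.

```python
def split_batched_tree(tree, num_envs):
    """Split a nested dict of per-env sequences into one nested dict per env.

    Hypothetical stand-in for the behavior described for envs.split.
    """
    def pick(node, env_index):
        # Recurse through dicts; at a leaf, take the entry for this environment.
        if isinstance(node, dict):
            return {key: pick(child, env_index) for key, child in node.items()}
        return node[env_index]

    return [pick(tree, env_index) for env_index in range(num_envs)]


batched = {"transfer_power": [10, 11, 12]}
per_env = split_batched_tree(batched, num_envs=3)
print(per_env)
# [{'transfer_power': 10}, {'transfer_power': 11}, {'transfer_power': 12}]
```

Because the loop appends these per-environment trees with `+=` on every step, it may be worth checking that `obs`, `actions`, and `valid_actions` are reset between rollouts. This is a guess based on the excerpt, not a confirmed cause of the leak.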

roboeden / luxai-s2-baseline
