ronaldosvieira / gym-locm Goto Github PK

OpenAI Gym environments for Legends of Code and Magic, a collectible card game designed for AI research

License: MIT License

Python 100.00%

gym-environment legends-of-code-and-magic reinforcement-learning collectible-card-games

gym-locm's Introduction

# todo: add something here next time I procrastinate

gym-locm's People

Contributors

Stargazers

Watchers

Forkers

dfpetrin luccaaug rattko

gym-locm's Issues

Update dependencies and better manage them

Most of this codebase was developed in 2019, and I have never ensured it supported newer versions of the dependencies. In fact, it does not. :P

Cannot fully reproduce the Coac vs Chad winrate as reported in CEC2020 using NativeAgent

The Coac vs Chad winrate was reported to be 57% in CEC 2020, but I obtained a winrate ~= 80% using locm-runner and NativeAgent.

The evaluation code is:

locm-runner \
    --p1-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Coac/main" \
    --p2-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Chad/agent/target/release/agent" \
    --games 100

where I had commented out the cerr << code for Coac (e.g., here and other similar lines) as I found the self._process.read_nonblocking code from NativeAgent seemed to read both stdout and stderr (a known issue).

And here are the printed results:

...
2022-05-30 22:43:51.392527 Episode 97: 79.38% 20.62%
2022-05-30 22:43:57.315334 Episode 98: 78.57% 21.43%
2022-05-30 22:44:03.639829 Episode 99: 78.79% 21.21%
2022-05-30 22:44:12.195598 Episode 100: 79.00% 21.00%
79.00% 21.00%

NativeAgent class doesn't work with agents that use stderr

As pointed out by @wanhesong, the process infrastructure used in NativeAgent (specifically, self._process.read_nonblocking) considers both stdout and stderr when reading the agent's actions and therefore regards any noise printed on stderr as an actual output of the agent, which leads to errors.

Create and automatize consistency checks

Why we need

The LOCM engine in gym_locm/engine.py should be consistent with the official Java engine. However, it has been a year since I last checked for consistency, and it is a manual (and somewhat painful) process.

What we need

Given a dataset of full matches between two deterministic agents in the official Java engine containing all (state, action, next state) transitions, there should be a script at gym_locm/toolbox that:

Parses the dataset.
Recreate the initial states in gym-locm's engine (including hidden information - make a list of all cards drawn by the players during the match and put those cards at the top of their owner's deck in reverse order).
Play those matches until completion using the same two deterministic agents (use the Java -> Python engine adapter, if needed).
Compare the (state, action, next state) transitions of all matches played at gym-locm's engine to those in the dataset and point out any differences.

Additionally, every commit to the repository should trigger this script (use GitHub actions?).

Add support for Gymnasium

Publish to PyPI

Player features on battle envs are not normalized

`LOCMEnv` sometimes returns `None` instead of a numerical state representation

Hey,

the function LOCMEnv.encode_state returns None when self.state.phase is neither Phase.DECK_BUILDING nor Phase.BATTLE. This happens precisely once in every game. When self.state.phase == Phase.ENDED, neither of the conditions is true and the function implicitly returns None.

gym-locm/gym_locm/envs/base_env.py

Lines 486 to 491 in 270b55b

 def encode_state(self): 

 """Encodes a state object into a numerical matrix.""" 

 if self.state.phase == Phase.DECK_BUILDING: 

 return self._encode_state_deck_building() 

 elif self.state.phase == Phase.BATTLE: 

 return self._encode_state_battle()

This problem is not noticeable in most cases, but RLlib crashes when the state representation is None and I think it would be better to return a vector full of 0s or -1s for example.

Use stable-baseline3 instead of stable-baseline for training draft agents

Allow hand/board randomizing on state representations

In LOCM's battle, theoretically, the order of the cards in a player's hand or on their board should not matter, i.e., a state where a player has cards A and B should be the same state as one where they have B and A. However, in the current battle envs, we present the cards in hand and on the board in a specific order (drawing/playing order), possibly leading to a positional bias when using these envs to train neural networks. For instance, the first card slot on the player's hand will be filled in almost all states seen by the network. The last card slot, however, will rarely be filled.

This issue proposes a simple way to mitigate positional bias: implement a parameter on battle envs that randomize the card slots in the player's hand and lanes, as well as the lanes themselves.

Add Black formatting as a GitHub Action

Coac Agent Implementation

Hi, I guess this line

gym-locm/gym_locm/agents.py

Line 366 in 582dab8

if legal_moves[0].type == ActionType.PASS:

should be something like if len(state.available_actions) == 0:, compared to the original Coac C++ code here:
https://github.com/acatai/Strategy-Card-Game-AI-Competition/blob/master/contest-2020-08-COG/Coac/main.cpp#L1070

Or did I misunderstand the code?

Implement saving models as JSON

Make runner seed agents

Add support for LOCM 1.5

Update the READMEs

Training script: draft needs tensorflow==1.14.0 but battle needs tensorflow>2

Battle training: self-play opponents use deterministic actions while agents don't

Make runner support LOCM 1.5

[Question] Obtaining numerical state for an agent

Hey, I've been working with LOCM for the past couple of weeks and I am at the point now where I would like to use my trained agents in the environment. I have subclassed the Agent abstract class, defined the necessary methods and created the environment as

env = gym.make(
    'LOCM-battle-v0', version='1.5',
    deck_building_agents=[my_draft_agent, opponent],
    battle_agent=opponent,
    reward_functions=['win-loss'],
    reward_weights=[1.0]
)

Now the problem is that during the initialisation, my_draft_agent is used for the draft phase but the agent gets state of the type gym_locm.engine.game_state.State which is unsuitable for a neural network that is used within the agent. Is there any way I can obtain a numerical representation of the state, such as the one returned by env.step?

From what I gathered looking at the source code. State does not have any method that would return the numerical representation of the state. The only place where I found such a method is in the LOCMEnv class which I, unfortunately, cannot access from the agent during the draft phase, I believe. Is there any other way? Thanks!

The engine code and its consistency with the original nim code

Hi, thanks for the great repo which makes life easier for those who want to do the RL training:-) Also, the code structure looks neat and it's easy to single out the desired modules for one's own project (thanks for providing the MIT license:-) )!

I'm trying to figure out how the env.step(...) implements. To my understanding, it seems that you "reproduce" the whole engine code of "the original nim one". In particular, the Python code here receives the input actions (performed by the agents) and modify the State instance accordingly. If this were true, how should we ensure the behaviors (i.e., the (state, action) -> new_state transition) are consistent with the original nim code? (Maybe the only way is to read and compare the nim code and Python code? Also, when locm 1.5 comes out we may need another engine.py as the State and Action can be totally different?)

Create battle envs for LOCM 1.5

Reformat the code with Black

Include action number in the `Action` class for Offline RL use cases

When experimenting with Offline RL, one needs to create a dataset containing (state, action) pairs to pass to a chosen algorithm. In this case, action usually means an integer between 0 and the number of possible actions.

Currently, Gym-LOCM does not allow this as agents return an instance of the Action class which does not store such information. There is, however, an easy fix for this. The LOCMEnv.decode_deck_building_action and LOCMEnv.decode_battle_action methods already contain the correct integer representation of the action. All that is needed is to modify the Action class and its __repr__ method accordingly and pass it the correct action number wherever it is needed. This coupled with a bunch of minor changes in other files should allow training Offline RL algorithms.

I have a working fix that I made for my Offline RL experiments which I can include in a PR once I finish my thesis, if this is something you would like to support.

BTW. I have been using Gym-LOCM for some time now and I have accumulated various fixes for which I'll be creating issues here on GH in case somebody else runs into the same problems. Feel free to ignore them or close them if they are something that you don't see a point in.

The "n-switches" parameter on training.py actually means "amount of times to switch throughout training" and not "amount of episodes to run before switching"

Remove player_hp, enemy_hp, card_draw, and area from creatures on the board

As discussed on acatai/Strategy-Card-Game-AI-Competition#7, in order to be 100% consistent with the original engine, gym-locm should set the player_hp, enemy_hp, card_draw, and area attributes to zero whenever a creature is summoned (moved from hand to board).

	def encode_state(self):
	"""Encodes a state object into a numerical matrix."""
	if self.state.phase == Phase.DECK_BUILDING:
	return self._encode_state_deck_building()
	elif self.state.phase == Phase.BATTLE:
	return self._encode_state_battle()