ronaldosvieira / gym-locm

OpenAI Gym environments for Legends of Code and Magic, a collectible card game designed for AI research

License: MIT License

Python 100.00%
gym-environment legends-of-code-and-magic reinforcement-learning collectible-card-games

gym-locm's Introduction

# todo: add something here next time I procrastinate

gym-locm's People

Contributors

lucca-nas, ronaldosvieira


gym-locm's Issues

Cannot fully reproduce the Coac vs Chad winrate as reported in CEC 2020 using NativeAgent

The Coac vs Chad winrate was reported to be 57% at CEC 2020, but I obtained a winrate of roughly 80% using locm-runner and NativeAgent.

The evaluation code is:

locm-runner \
    --p1-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Coac/main" \
    --p2-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Chad/agent/target/release/agent" \
    --games 100

where I had commented out Coac's cerr << debug output (e.g., here and other similar lines), because the self._process.read_nonblocking call in NativeAgent seemed to read both stdout and stderr (a known issue).
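As an aside, NativeAgent appears to use pexpect (judging by read_nonblocking), and pexpect runs the child on a pseudo-terminal where stdout and stderr are merged. If that is the case, an alternative to editing the agents is to launch each binary through a shell that discards stderr. The snippet below is only a sketch under that assumption, not part of gym-locm:

import pexpect

# Wrap the agent binary in a shell that throws stderr away, so only the
# protocol messages written to stdout reach the pty that pexpect reads.
coac = pexpect.spawn(
    "/bin/sh",
    ["-c", "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Coac/main 2>/dev/null"],
    echo=False,
    encoding="utf-8",
)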

And here are the printed results:

...
2022-05-30 22:43:51.392527 Episode 97: 79.38% 20.62%
2022-05-30 22:43:57.315334 Episode 98: 78.57% 21.43%
2022-05-30 22:44:03.639829 Episode 99: 78.79% 21.21%
2022-05-30 22:44:12.195598 Episode 100: 79.00% 21.00%
79.00% 21.00%

See also the original discussion here.

Create and automate consistency checks

Why we need it

The LOCM engine in gym_locm/engine.py should be consistent with the official Java engine. However, it has been a year since I last checked for consistency, and it is a manual (and somewhat painful) process.

What we need

Given a dataset of full matches between two deterministic agents in the official Java engine, containing all (state, action, next state) transitions, there should be a script in gym_locm/toolbox that:

  • Parses the dataset.
  • Recreates the initial states in gym-locm's engine (including hidden information: make a list of all cards drawn by the players during the match and put those cards at the top of their owner's deck in reverse order).
  • Plays those matches until completion using the same two deterministic agents (using the Java -> Python engine adapter, if needed).
  • Compares the (state, action, next state) transitions of all matches played in gym-locm's engine to those in the dataset and points out any differences (see the comparison sketch below).

Additionally, every commit to the repository should trigger this script (using GitHub Actions?).
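A rough sketch of the comparison step only, assuming both engines dump their matches as JSON-lines files of (state, action, next_state) triples; that file format is an assumption for illustration, not an existing gym-locm artifact:

import json
import sys

def load_transitions(path):
    # one JSON object per line: {"state": ..., "action": ..., "next_state": ...}
    with open(path) as f:
        return [json.loads(line) for line in f]

def compare(java_path, gym_path):
    java = load_transitions(java_path)
    gym = load_transitions(gym_path)
    for i, (expected, actual) in enumerate(zip(java, gym)):
        for key in ("state", "action", "next_state"):
            if expected[key] != actual[key]:
                print(f"transition {i}: {key} differs")
                print(f"  java engine: {expected[key]}")
                print(f"  gym-locm:    {actual[key]}")
    if len(java) != len(gym):
        print(f"transition count differs: {len(java)} vs {len(gym)}")

if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])

Recreating the initial states and replaying the matches would still need the gym-locm engine API, but the diffing above is engine-agnostic.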

`LOCMEnv` sometimes returns `None` instead of a numerical state representation

Hey,

The function LOCMEnv.encode_state returns None when self.state.phase is neither Phase.DECK_BUILDING nor Phase.BATTLE. This happens exactly once in every game: when self.state.phase == Phase.ENDED, neither of the conditions is true and the function implicitly returns None.

def encode_state(self):
    """Encodes a state object into a numerical matrix."""
    if self.state.phase == Phase.DECK_BUILDING:
        return self._encode_state_deck_building()
    elif self.state.phase == Phase.BATTLE:
        return self._encode_state_battle()

This problem is not noticeable in most cases, but RLlib crashes when the state representation is None. I think it would be better to return, for example, a vector full of 0s or -1s.
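A minimal sketch of such a fix, assuming the environment exposes the usual Gym observation_space attribute (if it does not, the expected shape would have to come from elsewhere):

import numpy as np

def encode_state(self):
    """Encodes a state object into a numerical matrix."""
    if self.state.phase == Phase.DECK_BUILDING:
        return self._encode_state_deck_building()
    elif self.state.phase == Phase.BATTLE:
        return self._encode_state_battle()
    # game has ended (Phase.ENDED): return an all-zeros observation
    # instead of falling through to an implicit None
    return np.zeros(self.observation_space.shape, dtype=np.float32)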

Allow hand/board randomizing on state representations

In LOCM's battle, theoretically, the order of the cards in a player's hand or on their board should not matter; i.e., a state where a player has cards A and B should be the same state as one where they have B and A. However, in the current battle envs, we present the cards in hand and on the board in a specific order (drawing/playing order), possibly leading to a positional bias when using these envs to train neural networks. For instance, the first card slot in the player's hand will be filled in almost all states seen by the network; the last card slot, however, will rarely be filled.

This issue proposes a simple way to mitigate positional bias: implement a parameter on the battle envs that randomizes the card slots in the player's hand and lanes, as well as the lanes themselves.
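A tiny illustration of the idea; the function and argument names are hypothetical, not the existing gym-locm API:

import random

def randomize_slots(hand_cards, lanes, rng=random):
    """Shuffle the order of hand cards, of creatures within each lane, and of
    the lanes themselves, leaving each card's own features untouched."""
    rng.shuffle(hand_cards)
    for lane in lanes:
        rng.shuffle(lane)
    rng.shuffle(lanes)

Applying such a permutation before encoding each observation would make every hand and lane slot equally likely to be occupied.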

[Question] Obtaining numerical state for an agent

Hey, I've been working with LOCM for the past couple of weeks, and I am now at the point where I would like to use my trained agents in the environment. I have subclassed the Agent abstract class, defined the necessary methods, and created the environment as follows:

env = gym.make(
    'LOCM-battle-v0', version='1.5',
    deck_building_agents=[my_draft_agent, opponent],
    battle_agent=opponent,
    reward_functions=['win-loss'],
    reward_weights=[1.0]
)

Now the problem is that, during initialisation, my_draft_agent is used for the draft phase, but the agent gets a state of type gym_locm.engine.game_state.State, which is unsuitable for the neural network used within the agent. Is there any way I can obtain a numerical representation of the state, such as the one returned by env.step?

From what I gathered looking at the source code, State does not have any method that would return the numerical representation of the state. The only place where I found such a method is in the LOCMEnv class, which I, unfortunately, cannot access from the agent during the draft phase, I believe. Is there any other way? Thanks!

The engine code and its consistency with the original Nim code

Hi, thanks for the great repo, which makes life easier for those who want to do RL training :-) Also, the code structure looks neat and it's easy to single out the desired modules for one's own project (thanks for providing the MIT license :-))!

I'm trying to figure out how env.step(...) is implemented. To my understanding, it seems that you "reproduce" the whole engine code of "the original Nim one". In particular, the Python code here receives the input actions (performed by the agents) and modifies the State instance accordingly. If this is true, how should we ensure the behaviors (i.e., the (state, action) -> new_state transitions) are consistent with the original Nim code? (Maybe the only way is to read and compare the Nim code and the Python code? Also, when LOCM 1.5 comes out, we may need another engine.py, as State and Action can be totally different?)

Include action number in the `Action` class for Offline RL use cases

When experimenting with Offline RL, one needs to create a dataset containing (state, action) pairs to pass to a chosen algorithm. In this case, action usually means an integer between 0 and the number of possible actions.

Currently, Gym-LOCM does not allow this, as agents return an instance of the Action class, which does not store such information. There is, however, an easy fix. The LOCMEnv.decode_deck_building_action and LOCMEnv.decode_battle_action methods already contain the correct integer representation of the action. All that is needed is to modify the Action class and its __repr__ method accordingly and pass it the correct action number wherever it is needed. This, coupled with a bunch of minor changes in other files, should allow training Offline RL algorithms.
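A rough illustration of that change; the constructor arguments shown here are guesses for illustration, not the real Action signature:

class Action:
    def __init__(self, action_type, origin=None, target=None, number=None):
        self.type = action_type
        self.origin = origin
        self.target = target
        self.number = number  # integer index assigned by the decode_* methods

    def __repr__(self):
        return f"Action #{self.number}: {self.type} {self.origin} {self.target}"

The decode_deck_building_action and decode_battle_action methods would then pass the integer they were given as number when constructing the Action, making each (state, action) pair trivially convertible to the integer form Offline RL libraries expect.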

I have a working fix that I made for my Offline RL experiments, which I can include in a PR once I finish my thesis, if this is something you would like to support.

BTW, I have been using Gym-LOCM for some time now, and I have accumulated various fixes for which I'll be creating issues here on GitHub in case somebody else runs into the same problems. Feel free to ignore or close them if you don't see a point in them.
