peter1591 / hearthstone-ai

A Hearthstone AI based on Monte Carlo tree search and neural nets written in modern C++.

C++ 89.19% Makefile 0.57% C 0.01% C# 9.14% Shell 0.07% Python 1.01%
hearthstone neural-network monte-carlo-tree-search simulation-engine ai

hearthstone-ai's Introduction

Introduction

This is an AI for the card game Hearthstone, originally motivated by AlphaGo. It combines Monte Carlo tree search (with extensions for imperfect-information games), a deep neural network, and a high-performance game engine.

Compete with Mage in basic practice mode. Running on Macbook Pro. AI can easily beat innkeeper (8 = 0). Video

Compete with Warlock in expert practice mode. Running on Macbook Pro. Video

Motivation

  • AlphaGo successfully combines MCTS and deep neural networks to beat humans at Go.
  • Games with hidden information are still a big challenge in many ways.
  • Give it a try on Hearthstone.

Modules

  • Header-only implementation. No dependency. No need to compile anything!
  • Use template programming intensively for higher performance.
  • A judgement framework allowing two agents to compete with each other.
  • Monte Carlo tree search
    • Use Multiple-Observer MCTS to handle hidden information
    • Share tree nodes for identical boards
  • Combine with a neural network
    • Act as a policy network to choose the promising steps with higher probabilities
    • Also act as a default network to play the game in simulation phase
    • Guess the game result for early cutoff

Neural Network

Use TensorFlow for training and prediction. The neural network model is defined in model.py. A neural network model can be trained and integrated into the MCTS agent by the following steps:

  1. Prepare training data.
  2. Train the model using TensorFlow.
  3. Save and freeze the model.
  4. Set the model path to MCTS agent.

A simple example shows the neural network can greatly boost MCTS play strength:

  • A mid-level player knows that Arcane Missiles should generally not be played on the first turn.
  • With a random default policy, it takes more than 300k iterations (8G+ RAM) to realize this.
  • With the neural network as the default policy, it takes fewer than 15k iterations (less than 5 seconds) to realize this.

Similar to AlphaZero proposed by DeepMind, a reinforcement learning pipeline is also implemented. The pipeline works, but requires intensive computational resources to do a great job. Some results are also outlined.

  • Use the logging feature in HearthStone
  • Written in C# since no critical performance issue occurs here.
  • Parse the logs to get a picture of the game board.
  • Use the C# coroutine to parse the logs in a cleaner way.
  • Integrate everything in one piece.
  • Automatically show a suggestion move as you play the game. See videos like this for a demo.

Installation

  1. Install HearthStone on Windows.
  2. Enable logging in Hearthstone, so we can know what the board looks like.
  3. Open the C# project under this folder.
  4. Compile and run it.

Future Works

The goal of the neural network is to guess who is going to win the game by looking only at the current board. Several improvements could be made:

  1. Take history data into account: secret cards, played cards, etc.
  2. Take hand cards into account.
  3. Take card id into account. Currently, only HP/Max-HP/Attack are considered.
  4. Take cards in deck into account.

Hopefully we can achieve better accuracy than the current result (~79%, which is also in line with the results of the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone (https://knowledgepit.fedcsis.org/mod/page/view.php?id=1022)).

I have tried embedding the card id to encode the battlecry and deathrattle features of each different card. Maybe we need to find a better way to generate game data automatically, so the neural network can learn the embeddings separately and hopefully more accurately.

This is now dealt with in a separate repository on GitHub.

Balance Between Wide and Deep

Why wide? Due to randomness, the branching factor is quite large (~4000 when drawing a card), so the game tree contains many nodes.

Why deep? The neural network by itself is not strong enough. We need to think more steps ahead to overcome the weakness in simulation.

In a naive implementation of MCTS, all the child nodes must be expanded before we use the UCB formula to choose a child node and continue the selection stage. A few ideas here:

  1. A fixed probability of continuing the selection stage, even if not all children are expanded.
  2. A dynamic probability based on the remaining thinking time and the current expansion progress.
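
The first idea can be sketched as follows. This is only an illustration under stated assumptions: ChildStats, Ucb1 and ChooseChild are hypothetical names, not the project's API, and the caller is assumed to supply a uniform random number `roll` in [0, 1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-child statistics; visits == 0 means "not expanded yet".
struct ChildStats {
    int visits;
    double total_reward;
};

// UCB1 score of an already-expanded child.
inline double Ucb1(ChildStats const& c, int parent_visits, double exploration) {
    assert(c.visits > 0 && parent_visits > 0);
    double mean = c.total_reward / c.visits;
    return mean + exploration *
        std::sqrt(std::log(static_cast<double>(parent_visits)) / c.visits);
}

// With probability continue_prob, follow the best expanded child by UCB1
// even though some children are still unexpanded; otherwise expand the
// first unexpanded child. Returns the index of the chosen child.
inline std::size_t ChooseChild(std::vector<ChildStats> const& children,
                               int parent_visits, double continue_prob,
                               double roll, double exploration = 1.414) {
    assert(!children.empty());
    std::size_t const none = children.size();
    std::size_t first_unexpanded = none;
    std::size_t best = none;
    double best_score = 0.0;
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i].visits == 0) {
            if (first_unexpanded == none) first_unexpanded = i;
            continue;
        }
        double score = Ucb1(children[i], parent_visits, exploration);
        if (best == none || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    if (best == none) return first_unexpanded;  // nothing expanded yet
    if (first_unexpanded == none || roll < continue_prob) return best;
    return first_unexpanded;
}
```

The second idea would replace the constant `continue_prob` with a value computed from the remaining thinking time and the fraction of children already expanded.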

Share information between nodes

Even if only one card is different, we still need two tree nodes. Otherwise, we would fuse the strategy decisions in Monte Carlo tree search. However, this does not mean that we cannot share information between nodes. On the contrary, AMAF (all-moves-as-first) and RAVE (rapid action value estimation) are based on this basic idea.
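
As a rough illustration of sharing information in the RAVE style, an action's own statistics can be blended with statistics gathered whenever the same action appeared later in an episode. The struct, the function name, and the weighting schedule `beta = k / (k + visits)` are assumptions chosen for simplicity; they are not taken from this project or from a specific paper.

```cpp
#include <cassert>

// Hypothetical statistics kept for one action at one tree node.
struct ActionStats {
    int visits;        // times this action was actually chosen at this node
    double value_sum;  // sum of episode results for those visits
    int amaf_visits;   // times the action appeared later in an episode
    double amaf_sum;   // sum of episode results for those appearances
};

// Blend the direct estimate with the AMAF estimate. The AMAF part
// dominates while the action has few direct visits and fades out as
// direct visits accumulate. k controls how quickly it fades.
inline double BlendedValue(ActionStats const& s, double k) {
    assert(k > 0.0);
    double q = s.visits > 0 ? s.value_sum / s.visits : 0.0;
    double q_amaf = s.amaf_visits > 0 ? s.amaf_sum / s.amaf_visits : 0.0;
    double beta = k / (k + s.visits);
    return (1.0 - beta) * q + beta * q_amaf;
}
```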

Automatic Play bot

Right now, just refer to the move the AI suggests, and perform it manually in the game client.

Demo Videos

Contribution

Just me. Any ideas or help are always welcome.

License

The latest GPL license applies to this project.

Some third-party libraries are used; please also comply with their licenses.

hearthstone-ai's People

Contributors

zappybiby avatar

hearthstone-ai's Issues

Calculate spell damage twice?

CardManipulator's Damage() calls
BoardManipulator's CalculateFinalDamageAmount(),
which calculates spell damage for spell/secret cards.

So, client cards should not add spell damage by themselves?

Switch to simulation within a main action

Do we need to switch to simulation mode within a main action?

For example,
A main action is to decide from (PLAY-CARD, HERO-POWER, END-TURN)

Assume we are in selection mode at this main action node, so the UCB policy is used to choose among these options.
Assume we choose the PLAY-CARD action
Assume this is the FIRST TIME we make this choice, so a new node is added to the game tree.

Now, do we want to switch to simulation mode?

In the current design, we only switch to simulation mode after this MAIN ACTION + SUB ACTIONS are done.
That is, we switch to simulation after we have

  1. added a node for PLAY-CARD
  2. added a node for CHOOSE-HAND-CARD
  3. added a node for CHOOSE-TARGET (if any)
  4. added more nodes for callbacks (if any)

Now, after this main action is done, we switch to simulation mode.

All properties on entity

Similar to tag framework

Pros:

  1. To get attributes, we only need to operate on entities
  2. state::Board acts as an index of the card references, e.g., to quickly enumerate all minions.

Cons:

  1. Many fields on entity

Current decision:

  • Write all properties on entity (i.e., cards::RawCard)

Refine Valid action helper

Refines on state::State:

  • PrepareValidActions() --> returns ValidActionHelper()
    • Do some process
    • Selection stage can save the board after this is called
  • ApplyAction(ValidActionHelper const* = nullptr)

Requirement

  • No overhead. Keep the simulation quick.

Notes

  • Even in the selection stage, the board after 'PrepareValidActions()' is not saved in memory
    • Only save the BoardView
  • Since there's hidden information, a determinization phase runs before each episode. So even if the board is saved, we will not run on that exact board in the following episodes.
  • But, in fact, the hidden information should have nothing to do with preparing the actions.
  • So, maybe the valid action helper should not be implemented within state::State. It should be related to BoardView.

Lower down rate for invalid state

What situations lead to an invalid state?

  1. not enough resource
    a. costs health, but there is not enough health
    b. does not cost health, but there are no crystals
    c. [NOTE] cost might be reduced/added due to some effects
  2. client card cannot be played
  3. client card needs target, but failed
  4. no space for minion
  5. secret already exists
  6. GetDefender() callback returns invalid target
  7. attacker is not attackable
  8. defender VANISHED before attack
  9. hero power is not usable

Need to store (un)enchanted states?

Need to store (un)enchanted states in entity?

Or, when we need to update/re-calculate enchanted states, we...

  1. Load the raw card information from database
  2. If the minion is silenced, add a 'SILENCED' enchantment
    --> which removes divine-shield / charge / spell damage / etc.
  3. Apply all enchantments
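
The recalculation alternative above could look roughly like this. It is a minimal sketch: CardData is a drastically simplified stand-in for the real card structure, and Enchantment, ApplySilence and Recalculate are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified card data; the real entity has many more fields.
struct CardData {
    int attack;
    int max_hp;
    bool taunt;
    bool divine_shield;
    bool charge;
    int spell_damage;
};

// An enchantment is modelled as a function that edits the card data.
using Enchantment = std::function<void(CardData&)>;

// The 'SILENCED' enchantment strips keyword abilities.
inline void ApplySilence(CardData& c) {
    c.taunt = false;
    c.divine_shield = false;
    c.charge = false;
    c.spell_damage = 0;
}

// Rebuild the enchanted state from scratch instead of storing it.
inline CardData Recalculate(CardData const& raw, bool silenced,
                            std::vector<Enchantment> const& enchantments) {
    CardData result = raw;               // 1. start from the raw card data
    if (silenced) ApplySilence(result);  // 2. add the 'SILENCED' enchantment
    for (auto const& e : enchantments) { // 3. apply all enchantments in order
        assert(static_cast<bool>(e));
        e(result);
    }
    return result;
}
```

With this approach only the raw card id, the silenced flag and the enchantment list need to be stored; the cost is a recomputation whenever any of them changes.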

Request to update ReadMe

First I added log.config to C:\Users\[username]\AppData\Local\Blizzard\Hearthstone with the following content:

[Achievements]
LogLevel=1
FilePrinting=true
ConsolePrinting=true
ScreenPrinting=false

[Power]
LogLevel=1
FilePrinting=true
ConsolePrinting=true
ScreenPrinting=false

If this is needed please add to documentation.

Then I followed your directions: I opened the C# project under path\hearthstone-ai-master\vs_projects\GameEngineUI

Then I ran it. Everything compiles and runs. I see a window with 1 button. When I press the button, a file picker appears. What do I do?

I see from the code that it wants to know where the cpp dll is. I click on it, but then it just pops up with a number (427549). What does this mean? How can I find out what the best move is?

Refine cost framework

Some cost modifiers are attached to the entire game (i.e., board), or attached to a particular player

Some of them are permanent effects, some of them are one-turn effects, and some of them are aura effects.

How to deal with them?

Share nodes in MCTS

Use hash table to identify which tree nodes can be shared

TODO: do we really want to share tree nodes?

  • The play history is important in control decks (or even mid-range decks)
  • AMAF or RAVE already relieves the slow-start issue
  • The bright side: this can shrink the memory footprint of the game tree
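
A minimal sketch of the hash-table idea, assuming each board can be reduced to a 64-bit hash. NodeTable, TreeNode and GetOrCreate are hypothetical names, and how the board hash is computed is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical tree node holding only a visit counter.
struct TreeNode {
    int visits = 0;
};

// Maps a board hash to the tree node shared by all paths reaching that board.
// Note: two different boards with the same hash would be merged by mistake;
// a real implementation would also compare the boards themselves.
class NodeTable {
public:
    std::shared_ptr<TreeNode> GetOrCreate(std::uint64_t board_hash) {
        auto it = nodes_.find(board_hash);
        if (it != nodes_.end()) return it->second;
        auto node = std::make_shared<TreeNode>();
        nodes_.emplace(board_hash, node);
        return node;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::unordered_map<std::uint64_t, std::shared_ptr<TreeNode>> nodes_;
};
```

Sharing a node this way discards the play history that led to the board, which is exactly the concern raised in the first bullet above.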

hero can be implemented as a card

Pros:

  1. All targetable objects are now of type 'Card'
  2. Unify logic for attacker / defender

Cons:

  1. The weapon mechanism should be redesigned
  2. One more card type? Say, kCardTypeHero?

Notes:

  1. Hero can be replaced by a card
  2. When hero is placed/replaced, weapon status should be updated

Class card redeclares typename Card in common.h

Rethink the way to process invalid actions

In some states, only a subset of actions are valid

  1. Minions cannot attack (just summoned, or attacked)
  2. No hand card can be played (not enough resource, no required target)
  3. etc.

In current implementation

  1. All actions are numbered from 1
  2. Invalid actions are pre-filtered out as much as possible
  3. The remaining (hopefully) valid actions are re-numbered from 1

Since a policy network might later be used to pick the most promising action, the re-numbering process might not be a good idea:

  • E.g., State 1 has a promising action 'PLAY 3RD CARD'
  • State 2, which is similar to State 1, also has a promising action 'PLAY 3RD CARD'
  • But, because of the re-numbering, the PLAY CARD action might get a different number in each state
  • This might give the underlying policy network (e.g., a deep neural network) a hard time learning

Some thoughts

  1. Do not re-number valid actions. Just filter out an invalid action if it is picked later.
  2. state::State should support finding valid actions more deeply.
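
The first thought can be sketched with a fixed action numbering plus a validity mask, so that 'PLAY 3RD CARD' keeps the same index in every state. The slot layout below (ten hand-card slots, then hero power, then end turn) and the name PickAction are assumptions made for this illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed layout: slots 0..9 play the i-th hand card,
// slot 10 is the hero power, slot 11 ends the turn.
constexpr std::size_t kMaxHandCards = 10;
constexpr std::size_t kActionCount = kMaxHandCards + 2;

// Pick the highest-scoring action among the valid ones, without
// re-numbering. Returns kActionCount when no action is valid.
inline std::size_t PickAction(std::array<double, kActionCount> const& scores,
                              std::array<bool, kActionCount> const& valid) {
    std::size_t best = kActionCount;
    for (std::size_t i = 0; i < kActionCount; ++i) {
        if (!valid[i]) continue;  // invalid actions are masked, not removed
        if (best == kActionCount || scores[i] > scores[best]) best = i;
    }
    return best;
}
```

Here `scores` stands for the output of a policy network; because the indices never shift, the same output unit always means the same action.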

Profile percentage of game state copying

In the current implementation,
before applying each action (i.e., play-card, attack, hero-power, or end-turn),
the game state is saved on the stack.
This game state is restored if the action turns out to be invalid and its application fails.

We can make the game state copy-on-write,
and support a fast response for those methods which are likely to be invoked when applying an invalid action.

Since there is much discussion about the efficiency of copy-on-write data structures,
it's better to postpone this until we've done some profiling.
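
The save-and-restore pattern described above can be sketched like this. GameState, PlayCard and TryPlayCard are hypothetical and tiny; the point is only that the full copy is paid on every action, valid or not, which is what the profiling should measure.

```cpp
#include <cassert>

// Hypothetical, tiny game state; the real one is much larger,
// which is why the cost of copying it matters.
struct GameState {
    int crystals;
    int hand_cards;
};

// Hypothetical action that mutates the state as it goes and only
// discovers part-way through that it is invalid.
inline bool PlayCard(GameState& s, int cost) {
    assert(cost >= 0);
    if (s.hand_cards == 0) return false;
    --s.hand_cards;                       // mutation before validity is known
    if (s.crystals < cost) return false;  // invalid: state is now half-updated
    s.crystals -= cost;
    return true;
}

// Save the state on the stack, apply the action, restore on failure.
inline bool TryPlayCard(GameState& s, int cost) {
    GameState const backup = s;  // full copy, paid on every action
    if (PlayCard(s, cost)) return true;
    s = backup;                  // roll back the half-applied action
    return false;
}
```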

Restart mechanism for invalid actions

Problem

Some of the choices might lead to an invalid action.

Current Design

Remember a tree in both selection and simulation stage.
This tree is rooted from the last main action,
and will be traversed again from this root once an invalid
game state is detected.

Issues in current design

  • When an invalid state is detected, we restart from the last main action
    • The selection/simulation policy is re-calculated, and then applied
      • Issue: [FIXED] should we re-apply the first few choices, except for the last sub-action?
  • Issue: Cannot switch to simulation stage during sub-actions
    • Discussion: is this really beneficial?
  • The tree structures for the selection stage and the simulation stage are totally different
    • Issue: The restart algorithms are totally different. Make some unification?
    • Can we unify the restart steps, and write them in TreeBuilder?
    • Define some interface for the selection/simulation stages
      • GetBoardForMainAction() <-- maybe this should be in TreeBuilder
      • GetPendingSubActions()

Analysis

Why an invalid state?

  • No playable hand card
    • Cannot be easily detected beforehand, since a card might be played by costing health.
  • No available attacker
    • Most cases can be pre-detected by the game simulation engine with ValidActionGetter.
    • Special flags: cannot-attack-to-hero
  • No available defender
    • Most cases can be pre-detected by the game simulation engine with ValidActionGetter.
    • hero is immune
  • No available target
    • A card requires a target, but no target is available

Deal with invalid state

When an invalid state is reached, we cannot finish the current MCTS episode, since that particular move is actually invalid.

Probability of an invalid state

  • No playable hand card
    • No pre-checking for playable hand card.
    • So, all hand cards are considered playable
    • If a player has no crystals left, then no hand card is playable
    • Conclusion: high chance, nearly 100% if no crystals are left (except for cards that cost health instead of crystals).

Several approaches can be taken in this situation.

  1. Discard the current MCTS episode, and restart.
    • The action can be marked as invalid in the selection stage
    • But, in the simulation stage, there's no tree on which to mark this.
  2. Restart from the last main action
  3. Restart from the last sub-action
    • It's possible that this sub-action has no valid action at all. Then we need to restart from the previous sub-action.

Selection stage

A tree is established in selection stage, so we can mark a child as invalid easily.

Simulation stage

As discussed in issue #45, the simulation engine should be able to generate valid actions, or at least generate a valid action with high probability.

Discussions

Need tree for simulation?

If we have a tree for simulation, we can remember which action is invalid, and restart quickly.

Since there's a high chance of an invalid action when picking a playable card (this happens when no crystals are left), we should make the restart fast.

But, for performance, we should lower the rate of invalid states as much as possible.

What happens if there's no tree for simulation, and an invalid state is reached during simulation? We can have a linear (not a tree) data structure to record the blacklisted choices along the path.

Record black-list for choices

A linear data structure to record the black-list choices along the chosen path.

Random node

Assumption: If a state is valid before a random event, then ALL random outcomes should yield a valid state.

That is,

  • If ANY random outcome yields an invalid state, then the state before the random event is invalid.

Interface for stage handler

```
// Make a choice, and modify the progress accordingly.
// @return the choice
int Select(Progress & progress);

// Report that a choice led to an invalid state.
void ReportInvalid(Progress & progress);
```

Data structure to record black list choices

  • A linear structure to record all nodes traversed
  • Re-apply the sub-actions from a saved board
  • Each node consists of
    • ActionType --> random / manual
    • Choices --> consistency check only
    • variant<selection::Progress, simulation::Progress>
  • The 'Progress' class should be copyable
    • It's guaranteed that, only the last progress will be used for restore
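
A simplified sketch of such a linear blacklist follows. It omits the per-node Progress object and always picks the first choice that is not blacklisted, which is an assumption made to keep the example deterministic; ChoiceBlacklist, PathNode and BacktrackExhausted are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class ActionType { kRandom, kManual };

// One entry per node traversed since the saved board.
struct PathNode {
    ActionType type;
    int choices;              // number of options; consistency check only
    std::set<int> blacklist;  // choices known to lead to an invalid state
    int last_choice;          // -1 when no choice is currently selected
};

class ChoiceBlacklist {
public:
    // Called for step `depth` of the current attempt. Returns the first
    // choice not blacklisted, or -1 if every choice is blacklisted.
    int Select(std::size_t depth, ActionType type, int choices) {
        assert(depth <= path_.size());
        if (depth == path_.size()) {
            path_.push_back(PathNode{type, choices, std::set<int>(), -1});
        }
        PathNode& node = path_[depth];
        assert(node.type == type && node.choices == choices);
        node.last_choice = -1;
        for (int c = 0; c < choices; ++c) {
            if (node.blacklist.count(c) == 0) {
                node.last_choice = c;
                break;
            }
        }
        return node.last_choice;
    }

    // The most recent choice at the deepest node led to an invalid state.
    void ReportInvalid() {
        assert(!path_.empty() && path_.back().last_choice >= 0);
        path_.back().blacklist.insert(path_.back().last_choice);
    }

    // Every choice at the deepest node is blacklisted: drop that node and
    // blacklist the parent's choice that led to it.
    void BacktrackExhausted() {
        assert(path_.size() >= 2);
        path_.pop_back();
        ReportInvalid();
    }

private:
    std::vector<PathNode> path_;
};
```

After each ReportInvalid the caller would restore the saved board and replay the sub-actions from depth 0, calling Select again at each step.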

Task list

Selection and simulation stage handler

  • Refactor out progress class
  • Follow new interface
  • DONE

Implement data structure to record black list

  • DONE

Unify logics in TreeBuilder

  • DONE

Analysis

  • Should we switch to simulation within sub-actions?
    • Create another issue

Code refine

  • Simulation stage handler
    • ChooseAction() and ApplyAction() are too similar

Lower down simulation invalid rate

  • Currently, an action applied in the simulation stage succeeds only about 21% of the time
  • This is mainly because we cannot check whether a card can be played before applying the action
  • The game engine should be modified to support this kind of query.
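
A query of this kind might look like the sketch below. All names are hypothetical, cost modifiers are ignored, and the rule that a health cost must leave the hero alive is an assumption of this example rather than a statement about the game rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resource view of the current player.
struct PlayerResources {
    int crystals;
    int hero_hp;
};

// Hypothetical final cost of a card, after any cost modifiers.
struct CardCost {
    int amount;
    bool paid_with_health;
};

// Resource pre-check only; targets, board space and other rules
// are not covered here.
inline bool CanAfford(PlayerResources const& p, CardCost const& c) {
    assert(c.amount >= 0);
    if (c.paid_with_health) {
        return p.hero_hp > c.amount;  // assumption: the hero must survive
    }
    return p.crystals >= c.amount;
}

// Indices of hand cards that pass the resource pre-check.
inline std::vector<std::size_t> AffordableCards(
        PlayerResources const& p, std::vector<CardCost> const& hand) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < hand.size(); ++i) {
        if (CanAfford(p, hand[i])) result.push_back(i);
    }
    return result;
}
```

Letting the simulation policy choose only among the returned indices would remove the "no crystals left" case that currently dominates the invalid actions.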

unify logic in tree builder

  • unify for both selection and simulation
    • extract 'Progress' from selection class?
    • the whole selection/simulation stage handler can be seen as the Progress class
    • but, the simulation stage handler needs a ChoiceBlacklist on the stack

Review interface of manipulators

  • client cards use manipulators, not directly using state::State or FlowControl::FlowContext

    • if an enchantment is bound with an event, the event should be triggered correctly after a minion became a copy of it.
  • enchantments should have a method:

    • AfterAdded()
    • event can only be registered there
      • bring a event manager pointer as a context field
    • called after a minion is copied / transformed-as

stealth overrides taunt

also,
As with Stealth, Taunt minions that are Immune have their Taunt ability temporarily suppressed, and can thus be bypassed.

remove the extra iterator consistency checking mechanism

Currently the minions are stored in a C++ list container (std::list).

When an element is inserted, the other iterators remain valid (as opposed to std::vector).

In the current code, we have implemented our own consistency-checking mechanism,

and when a minion is inserted, the consistency-checking framework invalidates all other iterators.

This is not necessary as long as std::list is used.

Note: when a minion is removed, the other iterators are still valid in std::list,

BUT!! the iterators pointing to the removed minion should be invalidated.

However, since we have no tracking info for such iterators, all iterators are marked as invalidated.

This is a desired behavior since the game engine should not introduce such behavior.
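
The std::list guarantees referred to above can be checked with a small standalone example. The function names are made up for this illustration; the iterator behavior itself is what the C++ standard specifies for std::list.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Inserting into a std::list does not invalidate existing iterators.
inline bool ListIteratorSurvivesInsert() {
    std::list<int> minions = {1, 2, 3};
    auto second = std::next(minions.begin());
    minions.insert(minions.begin(), 0);  // insert before the first element
    minions.push_back(4);                // and after the last one
    return *second == 2;                 // still points at the same element
}

// Erasing an element only invalidates iterators to that element.
inline bool ListIteratorSurvivesOtherErase() {
    std::list<int> minions = {1, 2, 3};
    auto third = std::prev(minions.end());
    minions.erase(minions.begin());      // remove a different element
    return *third == 3;
}
```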

Manipulators should have hierarchy structure

Cards manipulator: set zone, set zone position, etc.
Characters manipulator: all above, attack/defend
Minion manipulator: all above, enchant, aura,

They should have hierarchy like this.
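
One way to express this hierarchy is plain public inheritance, as in the sketch below. Entity and the method names are hypothetical placeholders, not the project's actual manipulator interface.

```cpp
#include <cassert>

// Hypothetical, simplified entity.
struct Entity {
    int zone;
    int zone_pos;
    int attack;
    int hp;
    int attacks_this_turn;
};

// Operations valid for every card.
class CardManipulator {
public:
    explicit CardManipulator(Entity& e) : entity_(e) {}
    void SetZone(int zone) { entity_.zone = zone; }
    void SetZonePosition(int pos) { entity_.zone_pos = pos; }

protected:
    Entity& entity_;
};

// Characters (heroes and minions) add attack/defend operations.
class CharacterManipulator : public CardManipulator {
public:
    using CardManipulator::CardManipulator;
    void TakeDamage(int amount) {
        assert(amount >= 0);
        entity_.hp -= amount;
    }
    void AfterAttack() { ++entity_.attacks_this_turn; }
};

// Minions add enchantment and aura operations on top of that.
class MinionManipulator : public CharacterManipulator {
public:
    using CharacterManipulator::CharacterManipulator;
    void AddAttackEnchantment(int delta) { entity_.attack += delta; }
};
```

A MinionManipulator can then be passed wherever a CardManipulator or CharacterManipulator is expected, while card-only code cannot accidentally call minion-only operations.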

Remove any enchantments on draw

When a card is drawn
remove all its enchantments

restore the aura to default value in card database
(maybe it got silenced before going to graveyard)

In this sense,
maybe we can just store card_id when cards are in deck

or, in other words,
we only need to store the whole Cards::CardData when cards are in HAND / PLAY zone

If a minion is frozen twice, it should thaw at once

Frozen twice --> thawed at once

Taunted twice --> broken at once

Divine shield twice --> broken at once

Stealth twice --> revealed at once

Also, the minion stat can be reduced to below zero since some stats (e.g., taunt) can be removed during game flow (e.g., attack)
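
The behavior above follows naturally if these states are stored as flags rather than counters, as in this sketch. The Minion struct and helper functions are hypothetical and cover only two of the listed states.

```cpp
#include <cassert>

// Hypothetical minion with keyword states stored as flags, not counters.
struct Minion {
    int hp;
    bool frozen;
    bool divine_shield;
};

inline void Freeze(Minion& m) { m.frozen = true; }  // idempotent
inline void Thaw(Minion& m) { m.frozen = false; }   // one thaw clears it

inline void GiveDivineShield(Minion& m) { m.divine_shield = true; }

// The first damaging hit breaks the shield, however often it was granted.
inline void TakeDamage(Minion& m, int amount) {
    assert(amount >= 0);
    if (amount == 0) return;
    if (m.divine_shield) {
        m.divine_shield = false;
        return;
    }
    m.hp -= amount;
}
```

With a counter instead of a flag, a second Freeze would need a second Thaw, and removing a state that was never counted could push the counter below zero.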

refine enchantments framework

Aura enchantments should be applied in order of play.

You play an Amani Berserker and Enrage it, giving it 5 Attack. You then play Humility on it, giving it 1 Attack. You then heal and Enrage it a second time - the new Enrage is at the end of order of play, going after the Humility effect and it now has 4 Attack.

Use std::variant:
each enchantment entry is either a 'normal enchantment' or an 'aura enchantment'.
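
The Amani Berserker example can be reproduced with an ordered list of attack enchantments. This sketch uses a plain tagged struct instead of std::variant to stay short, assumes a base Attack of 2 and an Enrage bonus of +3 (consistent with the 5 Attack quoted above), and uses hypothetical names throughout.

```cpp
#include <cassert>
#include <vector>

// Two kinds of attack enchantment: add a delta, or overwrite the value.
enum class Kind { kAdd, kSet };

struct AttackEnchantment {
    Kind kind;
    int value;
};

// Apply the enchantments strictly in order of play.
inline int ComputeAttack(int base,
                         std::vector<AttackEnchantment> const& ordered) {
    int attack = base;
    for (auto const& e : ordered) {
        switch (e.kind) {
            case Kind::kAdd: attack += e.value; break;
            case Kind::kSet: attack = e.value; break;
        }
    }
    return attack;
}
```

Enrage followed by Humility is the list {add 3, set 1}. After healing, the first Enrage entry is removed, and the second Enrage is appended behind Humility, giving {set 1, add 3}.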

Check all targetable filter

Check all battlecry
--> they should all apply Targetable() filter

Check all spell target
--> they should all apply SpellTargetable() filter

Confused by project structure

Hello @peter1591, I wanted to ask you about the project structure. So I've been trying to resolve https://github.com/zappybiby/hearthstone-ai/issues/1 but I am not seeing any obvious issues with compiling or anything like that.

Now I'm wondering if any of the projects you have in the main repo (HearthstoneAI, MCTS, and vs_projects) are linked together in some way. I've never dealt with a repo with multiple projects (and I am new to coding as well) so the structure here confuses me. Maybe we should rename the solutions? Sorry for being a newbie! I hope to add more to this project soon after I get this resolved.

Cannot run using vs 2017

Severity Code Description Project File Line Suppression State
Error CS0103 The name "GameEngineCppWrapper" does not exist in the current context GameEngineUI D:\Code\hearthstone-ai-master\hearthstone-ai-master\vs_projects\GameEngineUI\Form1.cs 32 Active
