peter1591 / hearthstone-ai

A Hearthstone AI based on Monte Carlo tree search and neural nets written in modern C++.

C++ 89.19% Makefile 0.57% C 0.01% C# 9.14% Shell 0.07% Python 1.01%

hearthstone neural-network monte-carlo-tree-search simulation-engine ai

hearthstone-ai's Introduction

Introduction

This is an AI for the card game HearthStone, originally motivated by AlphaGo! This work combines Monte Carlo tree search (with extensions for imperfect-information games), a deep neural network, and a high-performance game engine.

Competing against the Mage in basic practice mode, running on a MacBook Pro. The AI can easily beat the innkeeper (8 = 0).

Competing against the Warlock in expert practice mode, running on a MacBook Pro.

Motivation

AlphaGo successfully combined MCTS and deep neural networks to beat humans at Go.
Games with hidden information are still a big challenge in many ways.
Let's give it a try on Hearthstone.

Modules

Game engine

Header-only implementation. No dependency. No need to compile anything!
Use template programming intensively for higher performance.

Judgement framework

A judgement framework allowing two agents to compete with each other.

MCTS Agent

Monte Carlo tree search
- Use Multiple-Observer MCTS to handle hidden information
- Share tree nodes for identical boards
Combine with a neural network
- Act as a policy network to choose the promising steps with higher probabilities
- Also act as a default network to play the game in simulation phase
- Guess the game result for early cutoff

Neural Network

Use TensorFlow for training and prediction. The neural network model is defined in model.py. A neural network model can be trained and integrated into the MCTS agent by the following steps:

Prepare training data.
Train the model using TensorFlow.
Save and freeze the model.
Set the model path in the MCTS agent.

A simple example shows the neural network can greatly boost MCTS play strength:

A mid-level player knows that Arcane Missiles should generally not be played on the first turn.
With a random default policy, it takes more than 300k iterations (8G+ RAM) to realize this.
With the neural network as the default policy, it takes fewer than 15k iterations (less than 5 seconds) to realize this.

Reinforcement Learning Pipeline

Similar to AlphaZero proposed by DeepMind, a reinforcement learning pipeline is also implemented. The pipeline works, but it requires intensive computation resources to do a great job. Some results are also outlined.

Game Board Recognition

Use the logging feature in HearthStone
Written in C# since no critical performance issue occurs here.
Parse the logs to get a picture of the game board.
Use the C# coroutine to parse the logs in a cleaner way.

Graphical User Interface

Integrate everything in one piece.
Automatically shows a suggested move as you play the game. See the Demo Videos section for a demo.

Installation

Install HearthStone on Windows.
Enable logging in HearthStone, so we can know what the board looks like.
Open the C# project under this folder.
Compile and run it.

Future Works

Neural Network Improvements

The goal of the neural network is to guess who is going to win the game by looking only at the current board. Several improvements could be made:

Take history data into account: secret cards, played cards, etc.
Take hand cards into account.
Take card id into account. Currently, only HP/Max-HP/Attack are considered.
Take cards in deck into account.

Hopefully we can reach a better accuracy than the current result (~79%, which is also in line with the result of the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone (https://knowledgepit.fedcsis.org/mod/page/view.php?id=1022)).

I have tried embedding the card id to encode the battlecry and deathrattle features of each card. Maybe we need to find a better way to generate game data automatically, so the neural network can learn the embeddings separately and hopefully more accurately.

This is now handled in a separate repository on GitHub.

Balance Between Wide and Deep

Why wide? Due to randomness, the branching factor is quite large (~4000 when drawing a card). So, there are many tree nodes in the game tree.

Why deep? The neural network by itself is not strong enough. We need to think more steps ahead to overcome the weakness in simulation.

In a naive implementation of MCTS, all the child nodes must be expanded before we use the UCB formula to choose a child node and continue in the selection stage. A few ideas here:

A fixed probability to continue in the selection stage, even if not all children are expanded.
A dynamic probability based on the remaining thinking time and the current expansion progress.
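
The first idea can be sketched as follows. This is only an illustration under stated assumptions: `ChildStats`, `Ucb1`, `SelectExpandedChild`, and `ContinueSelection` are hypothetical names, not the repository's types, and the caller is assumed to supply a uniform random sample in [0, 1) as `roll`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics; not the repository's node type.
struct ChildStats {
    double total_reward;  // sum of rewards backed up through this child
    int visits;           // number of times this child was chosen
};

// UCB1 score: mean reward plus an exploration bonus.
double Ucb1(const ChildStats& c, int parent_visits, double exploration) {
    assert(c.visits > 0 && parent_visits > 0);
    return c.total_reward / c.visits +
           exploration * std::sqrt(std::log(static_cast<double>(parent_visits)) / c.visits);
}

// Returns the index of the expanded child with the highest UCB1 score,
// or -1 if no child has been expanded yet.
int SelectExpandedChild(const std::vector<ChildStats>& expanded,
                        int parent_visits, double exploration) {
    int best = -1;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < expanded.size(); ++i) {
        const double score = Ucb1(expanded[i], parent_visits, exploration);
        if (score > best_score) {
            best_score = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Decides whether to keep selecting among the expanded children even though
// some children are still unexpanded. `roll` is a uniform sample in [0, 1).
bool ContinueSelection(std::size_t expanded, std::size_t total,
                       double continue_probability, double roll) {
    if (expanded == 0) return false;     // nothing to select from: must expand
    if (expanded >= total) return true;  // fully expanded: plain UCB selection
    return roll < continue_probability;
}
```

For the dynamic variant, `continue_probability` would be computed from the remaining thinking time and the fraction of children already expanded instead of being a constant.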

Share information between nodes

Even if only one card is different, we still need two tree nodes. Otherwise, we would merge the strategy decisions of different boards in Monte Carlo tree search. However, this does not mean that we cannot share information between nodes. On the contrary, AMAF (all-moves-as-first) and RAVE (rapid action value estimation) are based on this basic idea.

Automatic Play bot

Right now, just refer to the move the AI suggests, and make it manually in the game client.

Demo Videos

First demo video is on!!! https://youtu.be/z0I1nM6_k0w
Another demo video with higher quality: https://youtu.be/L6kr_zJKCQI
First demo video with expert warlock: https://youtu.be/wLvBlKChFW0
Another demo video played with expert warlock innkeeper: https://youtu.be/yVX8nTo8o00

Contribution

Just me. Any ideas or help are always welcome.

License

The latest GPL license applies to this project.

Some third-party libraries are used; please also comply with their licenses.

hearthstone-ai's Issues

Change namespaces to lowercase

Calculate spell damage twice?

CardManipulator's Damage() calls
BoardManipulator's CalculateFinalDamageAmount()
which calculate spell damage for spell/secret cards

So, client cards should not add spell damage by themselves?

secret cards

fix bug: minion enchantment listening to an event + becoming a copy of a minion

the event listened to by the enchantment should be registered after a new minion becomes a copy of the minion.

add a new test:

minion: gain +1/+1 after turn end
faceless manipulator: become the minion
turn end
check both gained +1/+1

Switch to simulation within a main action

Do we need to switch to simulation mode within a main action?

For example,
A main action is to decide from (PLAY-CARD, HERO-POWER, END-TURN)

Assume we are in selection mode at this main action node, so the UCB policy is used to choose among these choices.
Assume we choose the PLAY-CARD action
Assume this is the FIRST TIME we make this choice, so a new node is added to the game tree.

Now, do we want to switch to simulation mode?

In current design, we only switch to simulation mode after this MAIN ACTION + SUB ACTIONS are done.
That is, we switch to simulation after

added a node for PLAY-CARD
added a node for CHOOSE-HAND-CARD
added a node for CHOOSE-TARGET (if any)
more nodes for callback (if any)
Now, after this main action is done, we switch to simulation mode.

All properties on entity

Similar to tag framework

Pros:

To get attributes, we only need to operate on entities
state::Board acts as an index of the card references, e.g., to quickly enumerate all minions.

Cons:

Many fields on entity

Current decision:

Write all properties on entity (i.e., cards::RawCard)

Refine Valid action helper

Refines on state::State:

PrepareValidActions() --> returns ValidActionHelper()
- Do some process
- Selection stage can save the board after this is called
ApplyAction(ValidActionHelper const* = nullptr)

Requirement

No overhead. Keep the simulation quick.

Notes

Even in the selection stage, the board after 'PrepareValidActions()' is not saved in memory
- Only the BoardView is saved
Since there's hidden information, a determinization phase runs before each episode. So even if the board is saved, we will not run on that exact board in the following episodes.
But, in fact, the hidden information should have nothing to do with preparing actions.
So, maybe the valid action helper should not be implemented within state::State. It should be related to BoardView.

Lower down rate for invalid state

What situations lead to an invalid state?

not enough resources
a. costs health, but there is not enough health
b. does not cost health, but there are no crystals
c. [NOTE] the cost might be reduced/increased due to some effects
client card cannot be played
client card needs target, but failed
no space for minion
secret already exists
GetDefender() callback returns invalid target
attacker is not attackable
defender VANISHED before attack
hero power is not usable

Zone PutASide should be SetASide

As title

dead entries should call aura update one more time

Dead entries should call aura update at least one more time,
to remove the aura enchantments.

Maybe an aura manager?

client card should not directly access state::State

Client card needs:

FlowControl::Manipulators
state::EventManager

but should not touch:

state::Cards (zone changer, etc.)
since manipulators might need to trigger events when zone changed

Need to store (un)enchanted states?

Need to store (un)enchanted states in entity?

Or, when we need to update/re-calculate enchanted states, we...

Load the raw card information from database
If the minion is silenced, add a 'SILENCED' enchantment
--> which remove divine-shield / charge / spell damage / etc.
Apply all enchantments
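
The re-calculation option above can be sketched like this. It is a minimal illustration, not the engine's code: `CardStats`, `Enchantment`, `ApplySilence`, and `Recalculate` are hypothetical names, and the enchantment list is assumed to contain only enchantments that are still attached to the minion.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified card stats; the real engine stores more fields.
struct CardStats {
    int attack;
    int max_hp;
    bool divine_shield;
    bool charge;
    int spell_damage;
};

// An enchantment is modelled as a function that edits the stats in place.
using Enchantment = std::function<void(CardStats&)>;

// The 'SILENCED' pseudo-enchantment strips keyword abilities.
void ApplySilence(CardStats& stats) {
    stats.divine_shield = false;
    stats.charge = false;
    stats.spell_damage = 0;
}

// Rebuilds the enchanted stats from scratch instead of storing them.
CardStats Recalculate(const CardStats& raw, bool silenced,
                      const std::vector<Enchantment>& enchantments) {
    CardStats result = raw;                         // 1. raw values from the database
    if (silenced) ApplySilence(result);             // 2. 'SILENCED' goes first
    for (const auto& e : enchantments) e(result);   // 3. then every enchantment, in order
    return result;
}
```

With this approach the entity only needs to keep the card id, the silenced flag, and the enchantment list; the trade-off is the cost of recomputing on every update.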

Request to update ReadMe

First I added log.config to C:\Users\[username]\AppData\Local\Blizzard\Hearthstone with the following content:

[Achievements]
LogLevel=1
FilePrinting=true
ConsolePrinting=true
ScreenPrinting=false

[Power]
LogLevel=1
FilePrinting=true
ConsolePrinting=true
ScreenPrinting=false

If this is needed, please add it to the documentation.

Then I followed your directions: opened the C# project under path\hearthstone-ai-master\vs_projects\GameEngineUI

Then I ran it. Everything compiles and runs. I see a window with one button. When I press the button, a file picker appears. What do I do?

I see from the code that it wants to know where the C++ DLL is. I select it, but then it just pops up a number (427549). What does this mean? How can I get the best move?

Refine cost framework

Some cost modifiers are attached to the entire game (i.e., board), or attached to a particular player

Some of them are permanent effects, some of them are one-turn effects, and some of them are aura effects.

How to deal with them?

Communicate with c++ using c#

visual studio debug visualizer

https://msdn.microsoft.com/en-us/library/ms164759.aspx

An easier way to use state::Manipulators::StateManipulator

state::State state;

Implement state.Manipulate() to replace: state::Manipulators::StateManipulator(state)

remove the C4127 warning suppression once if-constexpr is supported

A /wd4127 compiler option was added to suppress this warning

Potential Typo

In StaticEventTriggerrer.h line 17, class 'Invoker' doesn't resolve. Did you mean to use class 'Invoker2'?

Visualize game tree

D3 JS
https://skillsmatter.com/skillscasts/7460-visualising-game-trees-with-d3-js
https://bl.ocks.org/mbostock/4062045

Freezing attribute

Water Elemental + Betrayal

Emperor Cobra + Betrayal

http://hearthstone.gamepedia.com/Betrayal

So maybe the 'freezing attack' and 'poisonous attack' should be implemented via event triggers

Share nodes in MCTS

Use hash table to identify which tree nodes can be shared

TODO: do we really want to share tree nodes?

The play history is important in control decks (or even mid-range decks)
AMAF or RAVE already relieves the slow-start issue
The bright side is: this can shrink the memory footprint of the game tree
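
A minimal sketch of the hash-table idea, under stated assumptions: `BoardKey`, `TreeNode`, and `NodeTable` are hypothetical names, and the 64-bit board hash is assumed to be computed elsewhere from the visible board. Comparing keys by hash alone means two different boards with colliding hashes would share a node, which a real implementation would have to guard against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical board summary; the hash is assumed to be computed elsewhere.
struct BoardKey {
    std::uint64_t hash;
    bool operator==(const BoardKey& other) const { return hash == other.hash; }
};

struct BoardKeyHasher {
    std::size_t operator()(const BoardKey& key) const {
        return std::hash<std::uint64_t>()(key.hash);
    }
};

struct TreeNode {
    int visits = 0;
    double total_reward = 0.0;
};

// Maps identical boards to one shared node, creating the node on first use.
class NodeTable {
public:
    TreeNode* GetOrCreate(const BoardKey& key) {
        std::unique_ptr<TreeNode>& slot = nodes_[key];
        if (!slot) slot.reset(new TreeNode());
        return slot.get();
    }
    std::size_t Size() const { return nodes_.size(); }

private:
    std::unordered_map<BoardKey, std::unique_ptr<TreeNode>, BoardKeyHasher> nodes_;
};
```

Two paths that reach the same board then update the same statistics, which is where the memory saving comes from, and also why the play history is lost.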

hero can be implemented as a card

Pros:

All targetable objects are now of type 'Card'
Unify logic for attacker / defender

Cons:

Weapon mechanism should be redesigned
One more card type? Say, kCardTypeHero?

Notes:

Hero can be replaced by a card
When hero is placed/replaced, weapon status should be updated

Class card redeclares typename Card in common.h

Two lines affected in common.h:

naming for manipulator and underlying POD structures

manipulator postfix can be removed,

and the underlying POD structures can be added with the 'Data' postfix

Refine EventHookedEnchantment

specify event type in template parameter
check enchantment existence in framework

Rethink the way to process invalid actions

In some states, only a subset of actions are valid

Minions cannot attack (just summoned, or attacked)
No hand card can be played (not enough resource, no required target)
etc.

In current implementation

All actions are numbered from 1
Invalid actions are pre-filtered out as much as possible
The remaining (hopefully) valid actions are re-numbered from 1

Since a policy network might later be used to pick the most promising action, the re-numbering process might not be a good idea
- E.g., State 1 has a promising action 'PLAY 3RD CARD'
- State 2, which is similar to State 1, also has a promising action 'PLAY 3RD CARD'
- But, because of the re-numbering, the PLAY CARD action might get a different number
- This might make it hard for the underlying policy network (e.g., a deep neural network) to learn

Some thoughts

Do not re-number valid actions. Just filter an action out later if it gets picked and turns out to be invalid.
state::State should support finding valid actions more deeply.
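
One way to keep action numbers stable is a fixed action space plus a validity mask, sketched below. The layout of the action space (`kActionCount` and what each slot means) and the name `PickAction` are assumptions made for illustration; the repository may number actions differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed action space: slots 0..9 are "play hand card i",
// slot 10 is hero power, slot 11 is end turn.
constexpr std::size_t kActionCount = 12;

// Picks the valid action with the highest policy score without re-numbering.
// Returns -1 if no action is valid.
int PickAction(const std::array<double, kActionCount>& policy,
               const std::array<bool, kActionCount>& valid) {
    int best = -1;
    for (std::size_t i = 0; i < kActionCount; ++i) {
        if (!valid[i]) continue;  // filter instead of re-numbering
        if (best < 0 || policy[i] > policy[static_cast<std::size_t>(best)]) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Because 'PLAY 3RD CARD' always sits in slot 2, similar states present the same output index to the policy network regardless of which other actions happen to be invalid.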

Refine sfinae

https://jguegant.github.io/blogs/tech/sfinae-introduction.html

Profile percentage of game state copying

In the current implementation,
before applying each action (i.e., play-card, attack, hero-power, or end-turn),
the game state is saved on the stack.
This game state is restored if the action turns out to be invalid and makes the application fail.

We can make the game state copy-on-write,
and provide a fast response for those methods which are likely to be invoked when applying an invalid action.

Since there is much discussion about the efficiency of copy-on-write data structures,
it's better to postpone this until we've done some profiling.
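
The save-and-restore behavior described above can be sketched as follows. `GameState`, `TryPlayCard`, and `ApplyWithRollback` are hypothetical stand-ins for the engine's state and action code; the toy action deliberately modifies the state before discovering it is invalid, which is the situation that makes the copy necessary.

```cpp
#include <cassert>

// Hypothetical minimal state; stands in for the engine's state::State.
struct GameState {
    int crystals;
    int hand_cards;
};

// Tries to play a card costing `cost`. On failure the state may be left
// half-modified, because the invalid action is only detected late.
bool TryPlayCard(GameState& state, int cost) {
    state.hand_cards -= 1;                    // side effect happens first
    if (state.crystals < cost) return false;  // invalid action detected late
    state.crystals -= cost;
    return true;
}

// Applies the action after taking a stack copy, and rolls back if it fails.
bool ApplyWithRollback(GameState& state, int cost) {
    const GameState saved = state;  // full copy on the stack before applying
    if (TryPlayCard(state, cost)) return true;
    state = saved;                  // restore the untouched state
    return false;
}
```

The copy is paid on every action, valid or not, so profiling the share of time spent in that copy is what tells whether copy-on-write is worth the extra complexity.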

Restart mechanism for invalid actions

Problem

Some of the choices might lead to an invalid action.

Current Design

Remember a tree in both selection and simulation stage.
This tree is rooted from the last main action,
and will be traversed again from this root once an invalid
game state is detected.

Issues in current design

When an invalid state is detected, we restart from the last main action
- The selection/simulation policy is re-calculated, and then applied
  - Issue: [FIXED] should we re-apply the first few choices, except for the last sub-action?
Issue: Cannot switch to the simulation stage during sub-actions
- Discussion: is this really beneficial?
The tree structures for the selection stage and the simulation stage are totally different
- Issue: The restart algorithms are totally different. Can we unify them?
- Can we unify the restart steps and write them in TreeBuilder?
- Define some interface for the selection/simulation stages
  - GetBoardForMainAction() <-- maybe this should be in TreeBuilder
  - GetPendingSubActions()
  - GetPendingSubActions()

Analysis

Why an invalid state?

No playable hand card
- Cannot be easily detected beforehand, since a card might be played by paying health.
No available attacker
- Most cases can be pre-detected by the game simulation engine with ValidActionGetter.
- Special flags: cannot-attack-to-hero
No available defender
- Most cases can be pre-detected by the game simulation engine with ValidActionGetter.
- hero is immune
No available target
- A card requires a target, but no target is available

Deal with invalid state

When an invalid state is reached, we cannot finish the current MCTS episode, since that particular move is actually invalid.

Probability of an invalid state

No playable hand card
- No pre-checking for playable hand card.
- So, all hand cards are considered as playable
- If a player has no crystal left, then all hand cards are not playable
- Conclusion: high chance, nearly 100% if no crystal left (except cost health instead of crystals).

Several approaches can be taken in this situation.

Discard current MCTS episode, and restart again.

The action can be marked as invalid in the selection stage
But, in the simulation stage, there's no tree in which to mark this.

Restart from the last main action
Restart from the last sub action

It's possible that this sub action has no valid action at all. In that case we need to restart from the previous sub action.

Selection stage

A tree is established in selection stage, so we can mark a child as invalid easily.

Simulation stage

As discussed in issue #45, the simulation engine should be able to generate valid actions, or at least generate a valid action with high probability.

Discussions

Need tree for simulation?

If we have a tree for simulation, we can remember which action is invalid, and restart quickly.

Since there's a high chance to have an invalid action when picking up a playable card (happens when no crystal left), we should make it fast.

But, for performance, we should lower the rate of invalid states as much as possible.

What happens if there's no tree for simulation, and an invalid state is reached during simulation? We can have a linear (not a tree) data structure to record the black-list choices along the path.

Record black-list for choices

A linear data structure to record the black-list choices along the chosen path.

Random node

Assumption: if a state is valid before a random event, then ALL random outcomes should yield a valid state.

That is,

If ANY random outcome yields an invalid state, then the state before the random event is invalid.

Interface for stage handler

```cpp
// Make a choice, and modify the progress accordingly.
// @return the choice
int Select(Progress& progress);

// Report that a choice led to an invalid state.
void ReportInvalid(Progress& progress);
```

Data structure to record black list choices

A linear structure to record all nodes traversed
Re-apply the sub-actions from a saved board
Each node consists of
- ActionType --> random / manual
- Choices --> consistency check only
- variant<selection::Progress, simulation::Progress>
The 'Progress' class should be copyable
- It's guaranteed that only the last progress will be used for restoring
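
A simplified sketch of such a linear black list is shown below. It is not the structure from the task list: `ChoiceBlacklist` and its methods are hypothetical, it stores only the rejected choice numbers per decision point, and it leaves out the action type and the saved `Progress` objects described above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical sketch: one entry per decision point along the current path.
class ChoiceBlacklist {
public:
    // Called when descending to a new decision point.
    void Push() { levels_.emplace_back(); }

    // Called when backing out of the deepest decision point.
    void Pop() {
        assert(!levels_.empty());
        levels_.pop_back();
    }

    // Marks a choice at the deepest decision point as leading to an invalid state.
    void ReportInvalid(int choice) {
        assert(!levels_.empty());
        levels_.back().insert(choice);
    }

    bool IsBlacklisted(int choice) const {
        return !levels_.empty() && levels_.back().count(choice) > 0;
    }

    // True if all `choice_count` choices at the deepest point are black-listed,
    // meaning the restart has to go one decision point further back.
    bool AllInvalid(std::size_t choice_count) const {
        return !levels_.empty() && levels_.back().size() >= choice_count;
    }

    std::size_t Depth() const { return levels_.size(); }

private:
    std::vector<std::set<int>> levels_;  // linear: index = depth along the path
};
```

Since only the path is stored, memory stays proportional to the depth of the current main action rather than to the size of a tree.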

Task list

Selection and simulation stage handler

Refactor out progress class
Follow new interface
DONE

Implement data structure to record black list

DONE

Unify logics in TreeBuilder

DONE

Analysis

Should we switch to simulation within sub-actions?
- Create another issue

Code refine

Simulation stage handler
- ChooseAction() and ApplyAction() are too similar

Lower down simulation invalid rate

Currently, an action applied in the simulation stage succeeds only about 21% of the time
This is mainly because we cannot check whether a card can be played before applying the action
The game engine should be modified to support this kind of query.

unify logic in tree builder

unify for both selection and simulation
- extract 'Progress' from selection class?
- the whole selection/simulation stage handler can be seen as the Progress class
- but, the simulation stage handler needs a ChoiceBlacklist on the stack

event trigger loop should not be an infinite loop

https://www.youtube.com/watch?v=YlaP_kF823k

Card implementation: Renounce Darkness

Review interface of manipulators

client cards use manipulators, not directly using state::State or FlowControl::FlowContext
- if an enchantment is bound to an event, the event should be triggered correctly after a minion becomes a copy of it.
enchantments should have a method:
- AfterAdded()
- events can only be registered there
  - bring an event manager pointer as a context field
- called after a minion is copied / transformed-as

stealth overrides taunt

also,
As with Stealth, Taunt minions that are Immune have their Taunt ability temporarily suppressed, and can thus be bypassed.

remove the extra iterator consistency checking mechanism

Currently the minions are stored in a C++ list container;

on insertion, the other iterators remain valid (as opposed to std::vector).

In the current code, we have implemented our own consistency-checking mechanism,

and when a minion is inserted, the consistency-checking framework invalidates all other iterators.

This is not necessary as long as std::list is used.

Note: when a minion is removed, the other iterators are still valid in std::list,

BUT!! the iterators pointing to the removed minion should be invalidated.

However, since we have no tracking info for such iterators, all iterators are marked as invalidated.

This is a desired behavior since the game engine should not introduce such behavior.
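
The std::list guarantees this issue relies on can be shown with a small sketch. `Minion`, `Summon`, and `Remove` are hypothetical names; the iterator behavior itself is what the C++ standard specifies for `std::list::insert` and `std::list::erase`.

```cpp
#include <cassert>
#include <iterator>
#include <list>

struct Minion {
    int attack;
};

// Inserts a minion before `pos` and returns an iterator to it.
// Iterators to the other elements of a std::list stay valid after this call.
std::list<Minion>::iterator Summon(std::list<Minion>& board,
                                   std::list<Minion>::iterator pos, Minion minion) {
    return board.insert(pos, minion);
}

// Removes a minion. Only iterators to the erased element are invalidated;
// the returned iterator points to the element that followed it.
std::list<Minion>::iterator Remove(std::list<Minion>& board,
                                   std::list<Minion>::iterator it) {
    return board.erase(it);
}
```

An iterator taken before a `Summon` call can therefore still be dereferenced afterwards, while an iterator to a minion passed to `Remove` must not be used again.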

Review categorized event triggers

Don't use vector, use hash table instead

Client cards should consider to use categorized event triggers instead

Game ai

Q learning + deep neural network

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

New attribute: immune while attacking

Gladiator's Longbow

Gorehowl

Profile Guided Optimization

Runtime analysis of an execution hot path is crucial. It sometimes lets a JIT outperform a statically compiled language like C++.

Profile-guided optimization may be a remedy for this.
https://textslashplain.com/2016/01/10/getting-started-with-profile-guided-optimization/

g++ optimization flags:
-fprofile-generate
-fprofile-use
-fprofile-dir

Card implementation memo: Gorehowl

Refer to Gladiator's Longbow
Apply an enchantment (make it immune), and remove it after attack

Manipulators should have hierarchy structure

Cards manipulator: set zone, set zone position, etc.
Characters manipulator: all above, attack/defend
Minion manipulator: all above, enchant, aura

They should have hierarchy like this.
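
A minimal sketch of that hierarchy using plain inheritance. All class, method, and field names here are illustrative assumptions, and the real manipulators may well use templates rather than a class hierarchy like this one.

```cpp
#include <cassert>

// Hypothetical card record; field names are illustrative only.
struct CardData {
    int zone;
    int zone_position;
    int hp;
    int attack;
};

// Base level: operations valid for every card.
class CardManipulator {
public:
    explicit CardManipulator(CardData& card) : card_(card) {}
    void SetZone(int zone) { card_.zone = zone; }
    void SetZonePosition(int position) { card_.zone_position = position; }

protected:
    CardData& card_;
};

// Characters (heroes and minions) add combat-related operations.
class CharacterManipulator : public CardManipulator {
public:
    explicit CharacterManipulator(CardData& card) : CardManipulator(card) {}
    void Damage(int amount) { card_.hp -= amount; }
};

// Minions add enchantment-related operations on top of that.
class MinionManipulator : public CharacterManipulator {
public:
    explicit MinionManipulator(CardData& card) : CharacterManipulator(card) {}
    void EnchantAttack(int bonus) { card_.attack += bonus; }
};
```

Each level exposes everything from the levels above it, so code holding a `MinionManipulator` can still call `SetZone`, while code holding only a `CardManipulator` cannot enchant.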

Remove any enchantments on draw

When a card is drawn
remove all its enchantments

restore the aura to default value in card database
(maybe it got silenced before going to graveyard)

In this sense,
maybe we can just store card_id when cards are in deck

or, in other words,
we only need to store the whole Cards::CardData when cards are in HAND / PLAY zone

Remove event manager's handler containers controller

already replaced by return value
return false: remove it from container
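
The return-value convention can be sketched like this. `HandlerList` is a hypothetical container, not the engine's event manager; it only illustrates how a handler returning false removes itself without needing a separate controller object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Hypothetical handler container: a handler returns false to unregister itself.
class HandlerList {
public:
    using Handler = std::function<bool(int /*event_arg*/)>;

    void Add(Handler handler) { handlers_.push_back(std::move(handler)); }

    // Calls every handler once; handlers that return false are erased in place.
    void Trigger(int arg) {
        for (auto it = handlers_.begin(); it != handlers_.end();) {
            if ((*it)(arg)) {
                ++it;
            } else {
                it = handlers_.erase(it);
            }
        }
    }

    std::size_t Size() const { return handlers_.size(); }

private:
    std::list<Handler> handlers_;
};
```

A one-shot effect simply returns false from its first invocation, and a permanent one always returns true.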

if a minion is frozen twice, it should be thawed at once

Frozen twice --> thawed at once

Taunted twice --> broken at once

Divine shield twice --> broken at once

stealth twice --> shown at once

Also, the minion stat can be reduced to below zero since some stats (e.g., taunt) can be removed during game flow (e.g., attack)

refine enchantments framework

aura enchantment should be applied in-order

You play an Amani Berserker and Enrage it, giving it 5 Attack. You then play Humility on it, giving it 1 Attack. You then heal and Enrage it a second time - the new Enrage is at the end of order of play, going after the Humility effect and it now has 4 Attack.

use std::variant
each enchantment entry is either a 'normal enchantment' or an 'aura enchantment'

add play order
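
The Amani Berserker example can be reproduced with a small std::variant sketch (requires C++17). The struct names and fields are assumptions for illustration, and treating Enrage as the aura-like entry and Humility as the normal entry is a simplification made for this sketch only.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical entry types; real enchantments carry far more than one number.
struct NormalEnchantment {
    int set_attack;    // e.g. Humility: attack becomes 1
};
struct AuraEnchantment {
    int attack_bonus;  // e.g. Enrage: +3 attack while active
};

using EnchantmentEntry = std::variant<NormalEnchantment, AuraEnchantment>;

// Applies all entries strictly in play order, whatever their kind.
int ComputeAttack(int base_attack, const std::vector<EnchantmentEntry>& in_play_order) {
    int attack = base_attack;
    for (const auto& entry : in_play_order) {
        if (const auto* normal = std::get_if<NormalEnchantment>(&entry)) {
            attack = normal->set_attack;
        } else if (const auto* aura = std::get_if<AuraEnchantment>(&entry)) {
            attack += aura->attack_bonus;
        }
    }
    return attack;
}
```

With a base attack of 2, the order Enrage then Humility gives 1 attack, while Humility then a fresh Enrage gives 4, matching the sequence quoted above. Keeping both kinds in one ordered list is what makes the second Enrage land after Humility.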

Check all targetable filter

Check all battlecry
--> they should all apply Targetable() filter

Check all spell target
--> they should all apply SpellTargetable() filter

Confused by project structure

Hello @peter1591, I wanted to ask you about the project structure. So I've been trying to resolve https://github.com/zappybiby/hearthstone-ai/issues/1 but I am not seeing any obvious issues with compiling or anything like that.

Now I'm wondering if any of the projects you have in the main repo (HearthstoneAI, MCTS, and vs_projects) are linked together in some way. I've never dealt with a repo with multiple projects (and I am new to coding as well) so the structure here confuses me. Maybe we should rename the solutions? Sorry for being a newbie! I hope to add more to this project soon after I get this resolved.

Use event trigger list instead of play order

Death rattle is triggered by play order
Maybe we can use event trigger to check death
Then no need for play order anymore

Cannot run using vs 2017

Severity Code Description Project File Line Suppression State
Error CS0103 The name "GameEngineCppWrapper" does not exist in the current context GameEngineUI D:\Code\hearthstone-ai-master\hearthstone-ai-master\vs_projects\GameEngineUI\Form1.cs 32 Active
