pbsinclair42 / mcts

A simple package to allow users to run Monte Carlo Tree Search on any perfect information domain

License: MIT License

Python 100.00%

mcts's People

Contributors: alexprengere, harrdb12, pbsinclair42, whymustihaveaname

mcts's Issues

Does that mean ONE action always leads to ONE state?

https://github.com/pbsinclair42/MCTS/blob/f4b6226ce06840e51b66bdca4bedbe2c4c143012/mcts.py#L95C35-L95C35

    def expand(self, node):
        actions = node.state.getPossibleActions()
        for action in actions:
            if action not in node.children:
                newNode = treeNode(node.state.takeAction(action), node)
                node.children[action] = newNode
                if len(actions) == len(node.children):
                    node.isFullyExpanded = True
                return newNode

        raise Exception("Should never reach here")

Looking at `node.children[action] = newNode`, does that mean this algorithm assumes that taking the same action multiple times in the same state always produces the same next state?
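For what it's worth, keying `node.children` by action means each action can only ever map to one successor node, so the library does assume deterministic transitions. A toy illustration of that contract (the `CounterState` class is hypothetical, not from the library):

```python
class CounterState:
    """Toy deterministic state: the same action always yields the same successor."""

    def __init__(self, value=0):
        self.value = value

    def takeAction(self, action):
        # No randomness here: repeating an action from the same state
        # produces an identical next state, as the tree structure assumes.
        return CounterState(self.value + action)


s = CounterState(3)
assert s.takeAction(2).value == s.takeAction(2).value
```

With a stochastic `takeAction`, the single child stored under `node.children[action]` would silently stand in for all possible outcomes, which breaks the algorithm's statistics.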

Incorrectly exploring worst possible move for player 2?

Hi! I am having some issues running this library, and I would like some clarification about how/why this implementation works.

It appears that, in this MCTS implementation, the getReward() function (https://github.com/pbsinclair42/MCTS/blob/master/naughtsandcrosses.py) does not vary based on the "perspective" of the player (i.e. which player is currently selecting a move). This means that a reward of 1 means first-player wins, while a reward of -1 means second-player wins. The backpropagation step sums this reward to get the totalReward for a node.

However, if you look at getBestChild() (https://github.com/pbsinclair42/MCTS/blob/master/mcts.py), the following lines are worrying to me:

    nodeValue = child.totalReward / child.numVisits + explorationValue * math.sqrt(
        2 * math.log(node.numVisits) / child.numVisits)
    if nodeValue > bestValue:

Doesn't this mean that, regardless of who the current player is, this function is choosing to explore the child node that is best for player 1 only? Where is the minimax aspect of this algorithm, where the reward is "flipped/inverted" based on which player is playing?

Usually, MCTS implementations will choose one of two strategies:

  1. make the reward relative to the current player (so that +1 means the current player wins, not specifically player 1 or player 2), and during backpropagation add or subtract the rollout result from each node's totalReward depending on which agent that node represents.
  2. keep the reward relative to player 1 (so +1 is always good for player 1), but allow player 2 to minimize that value instead of maximize.

I think that some people choose 1) because it's more general and can apply to games with 1, 2, 3+ players, whereas 2) is limited to two-player adversarial games.
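Strategy 1 can be sketched with negamax-style backpropagation. This is a toy `Node` class and a standalone function, not this repo's `treeNode` or `backpropogate`:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.numVisits = 0
        self.totalReward = 0


def backpropagate(node, reward):
    # Strategy 1: the reward is expressed from the perspective of the
    # player to move at the leaf, so its sign is flipped at every level
    # while walking back toward the root.
    while node is not None:
        node.numVisits += 1
        node.totalReward += reward
        reward = -reward
        node = node.parent


root = Node()
child = Node(parent=root)
backpropagate(child, 1)  # +1 from the child's player's perspective
```

After this call the child accumulates +1 while the root accumulates -1, so each node's statistics reflect its own player's interests, which is exactly what `getBestChild` needs for the selection step to behave adversarially.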

Why does NaughtsAndCrossesState not play so well?

I played against the AI, but it does not seem to play well even in this easy game.
For example,

    def __init__(self):
        self.board = [[1, -1, 0],
                      [1,  1, 0],
                      [-1, 0, -1]]
        self.currentPlayer = 1

I think a human would definitely choose (1, 2), but the code seems to usually choose (0, 2).
Am I using it incorrectly, or what is causing it to pick the wrong move?

Fantasy Football Draft Implementation

I'm trying to implement a rather complicated application of this: a fantasy football draft. I have a class for the draft, league members, and NFL players. Every time it is my turn to draft, I want to simulate the rest of the draft to determine the best possible pick. Can somebody point me in the right direction? I feel like I'm really close. I've added only a few members and players and set the number of rounds to 2 for testing. Obviously these classes are bare bones and I would have to add more constraints; I'm just trying to get the logic to work. Thanks.

from __future__ import division

from copy import deepcopy
from mcts import mcts
from functools import reduce
import operator

# Class for a Member of a League. Has a name, draft position, and me = True if member is me
class Member:
  def __init__(self,name,draft_pos,me=False):
    self.name = name
    self.draft_pos = draft_pos
    self.me = me
    self.roster = []

  @property
  def total_pts(self):
    return sum([p.proj for p in self.roster])

# Class for a NFL Player. Has a name, team, position, and projected Points.
class Player:
  def __init__(self,name,team,pos,proj):
    self.name = name
    self.team = team
    self.pos = pos
    self.proj = proj
    self.drafted = False


  def __repr__(self):
    return f'{self.name} {self.proj}'

# Class for a Draft. Has members, players, and number of rounds
class Draft:
  def __init__(self,members,players,num_rounds):
    self.members = members
    self.players = players
    self.num_rounds = num_rounds

  # Get Players that have been drafted
  @property
  def drafted_players(self):
    return [p for p in self.players if p.drafted == True]

  # Get Players that are undrafted
  @property
  def undrafted_players(self):
    return [p for p in self.players if p.drafted == False]

  # Get current overall pick in draft
  @property
  def current_overall_pick(self):
    return len(self.drafted_players) + 1

  # Get current round of draft
  @property
  def current_round(self):
    return ((self.current_overall_pick -1) // len(self.members)) + 1

  # Get current pick in the round
  @property
  def current_pick_in_round(self):
    return self.current_overall_pick - (len(self.members)*(self.current_round - 1))

  # Get the member that is currently drafting
  @property
  def current_member_drafting(self):
    if self.current_round % 2 != 0:
      return next((m for m in self.members if m.draft_pos == self.current_pick_in_round), None)
    elif self.current_round % 2 == 0:
      return next((m for m in self.members if m.draft_pos == (len(self.members) - self.current_pick_in_round + 1)), None)

  # Draft a player to the member that is currently drafting
  def draft_player(self,player):
    self.current_member_drafting.roster.append(player)
    print(f'{self.current_member_drafting.name} drafted {player.name}')
    player.drafted = True

  # If I am drafting, return 1, else return -1
  def getCurrentPlayer(self):
    if self.current_member_drafting.me == True:
      return 1
    else:
      return -1

  # Get a list of all players available to draft
  def getPossibleActions(self):
    possible_actions = []
    for player in self.undrafted_players:
      possible_actions.append(player)

    return possible_actions

  # Create copy of state and draft player to the current member drafting
  def takeAction(self,action):
    new = deepcopy(self)
    new.draft_player(action)
    return new

  # Node is terminal when all teams are full
  def isTerminal(self):

    if all(len(m.roster) == self.num_rounds for m in self.members):
      return True
    else:
      return False


  def getReward(self):
    me = next((m for m in self.members if m.me), None)
    others = [m for m in self.members if not m.me]
    if all(me.total_pts > m.total_pts for m in others):
      return 1
    else:
      return -1  # numeric loss reward so node totals stay comparable



if __name__=="__main__":
  m1 = Member('m1',1,me=True)
  m2 = Member('m2',2)
  m3 = Member('m3',3)

  members = [m1,m2,m3]

  p1 = Player('Tom Brady','TB','QB',300)
  p2 = Player('Josh Allen','BUF','QB',325)
  p3 = Player('Russell Wilson','DEN','QB',275)
  p4 = Player('Mac Jones','NE','QB',350)
  p5 = Player('Matt Ryan','DEN','QB',250)
  p6 = Player('Drew Lock','NE','QB',225)

  players = [p1,p2,p3,p4,p5,p6]
  draft = Draft(members,players,num_rounds=2)

  searcher = mcts(iterationLimit=10)
  action = searcher.search(initialState=draft)

  print(action)

naughtsandcrosses not playing optimally

I'm trying out your code, just expanding on your demo. I set up the computer to play itself, but it's not doing a very good job, even when I extend the time limit from 1 second to 10. X almost always wins. Any idea why? Am I running the code wrong, or is there something wrong with your algorithm and/or naughtsandcrosses implementation?

I can share my NaughtsAndCrossesState.renderBoard() code if you'd like. Everything else is unchanged from your code.

searcher = mcts(timeLimit=1000)  # named to avoid shadowing the mcts class

state = NaughtsAndCrossesState()
print(state.renderBoard())

while not state.isTerminal():
    action = searcher.search(initialState=state)
    state = state.takeAction(action)
    print(state.renderBoard())

Correctly selecting best possible branch?

Hi,
thank you for your work, really good stuff.
I am a bit confused however on one point of the implementation logic.
Let's assume I need to decide the best possible action between A and B, given an initial state and a number of iteration limit parameter (let's say limit = 1000).

In the first two iterations, until the root node is fully expanded, every branch (every child of the node) is explored. So, going back to our example, after two iterations both branch A and branch B have been explored once. Great.
However, once the node is fully expanded and both branches have been visited once, a number of things can happen:

  • When exploring branch A, A wins. When exploring branch B, B loses
  • When exploring A, A loses. When exploring B, B wins
  • Both A and B win, or both lose.

In the case where A wins (the same applies with the letters reversed if B wins), iteration number 3 calls selectNode() and the best child is selected, in this case A.

The problem I have noticed is that at the end of the 1000 iterations, branch A has been explored 999 times (node.numVisits), while branch B was explored just once, at the beginning.
It looks like getBestChild() gets stuck on the first branch that won and never recovers from there, so at the end of the simulation that is reported as the winning branch.

Something to consider is that I am using continuous rewards rather than binary win/lose (True/False) rewards.
I tried to play with the exploration constant, but didn't help.

Is the implementation correct? Is there anything you would suggest to mitigate this?
Thank you
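One common mitigation, independent of this library, is to normalize continuous rewards into [0, 1] before backpropagation, so the exploitation term stays on the same scale as the UCT exploration term. A sketch, assuming the reward bounds are known in advance:

```python
def normalizeReward(reward, reward_min, reward_max):
    # Map a continuous reward into [0, 1]; with rewards on this scale the
    # usual exploration constants (around 1 / sqrt(2)) behave as intended.
    if reward_max == reward_min:
        return 0.5  # degenerate range: every reward is equally good
    return (reward - reward_min) / (reward_max - reward_min)


normalizeReward(75, 50, 100)  # mid-range reward maps to 0.5
```

If raw rewards are large relative to the exploration constant, the exploration term can never catch up, and the search degenerates into greedily revisiting the first branch that scored well, which matches the behaviour you describe.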

mcts object is not callable after the first usage

As far as I understood, MCTS returns only the single most beneficial next move after its iterations, so I tried putting the mcts object in a for loop to extract the best path. But each time I try to use the object a second time, I get the same error: TypeError: 'mcts' object is not callable.
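That TypeError usually means the class name was shadowed by the instance, as in `mcts = mcts(...)`: after that assignment the name refers to the object, so the second "construction" calls the instance, not the class. A self-contained reproduction (this toy `mcts` class only mimics the constructor, it is not the real library):

```python
class mcts:
    # Stand-in for the library class; any class shows the same behaviour.
    def __init__(self, iterationLimit=None):
        self.iterationLimit = iterationLimit


# After this line the name `mcts` is bound to an *instance*, not the class.
mcts = mcts(iterationLimit=10)

try:
    mcts(iterationLimit=10)  # second "construction" now calls the instance
except TypeError as err:
    message = str(err)       # "'mcts' object is not callable"
```

Binding the instance to a different name, e.g. `searcher = mcts(iterationLimit=10)`, lets you call `searcher.search(...)` repeatedly inside a loop without hitting this error.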

Not entirely understanding the getBestChild function

    def getBestChild(self, node, explorationValue):
        bestValue = float("-inf")
        bestNodes = []
        for child in node.children.values():
            nodeValue = child.totalReward / child.numVisits + explorationValue * math.sqrt(
                2 * math.log(node.numVisits) / child.numVisits)
            if nodeValue >= bestValue:
                bestValue = nodeValue
                bestNodes.append(child)
        return random.choice(bestNodes)

Because bestValue starts at negative infinity, this implementation always appends the first child it examines to bestNodes, and children are never removed from the list, so a worse child could still be selected at random at the end, right?

I might be misunderstanding the process, but that might explain the performance issues raised by the other poster.
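For comparison, a common variant of UCT child selection resets the candidate list whenever a strictly better child is found, so only true ties remain for the random tie-break. This is a sketch, not this repository's code:

```python
import math
import random


def getBestChild(node, explorationValue):
    bestValue = float("-inf")
    bestNodes = []
    for child in node.children.values():
        nodeValue = child.totalReward / child.numVisits + explorationValue * math.sqrt(
            2 * math.log(node.numVisits) / child.numVisits)
        if nodeValue > bestValue:
            bestValue = nodeValue
            bestNodes = [child]        # strictly better: start a fresh list
        elif nodeValue == bestValue:
            bestNodes.append(child)    # exact tie: keep for random tie-break
    return random.choice(bestNodes)
```

With this version, `random.choice` can only ever pick among children that share the maximum UCT value, rather than among every child that was momentarily the best so far.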
