
Comments (7)

kinoc commented on August 19, 2024

I think in this implementation everything is relative to the root-node player/action selector, and in particular to the final evaluation function. It assumes the final evaluation can perform all the min/max operations up to the point of evaluation, so the top-level nodeValues converge to the expected average final reward of each action for player 1, whatever their evaluation function might be. The eventual eval function could be looking for max/max (win/win) over multiple players instead of min/max (win/lose). So it is more of a general action-selection optimizer than something specifically for min/max games (though it can be used for them), and that happens to fit my use case.
One possible "solution" would be to embed a "player eval variable" in the state: either a toggle applied at each level (a float multiplied by -1 at each switch of sides, and fractional for friends or friends of enemies), or a list of evaluation operations to apply to the final static evaluation to get something closer to the desired value.

from mcts.
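A toy sketch of the "player eval variable" idea above, with assumed names (`applyAction`, `staticEval`, and the board representation are stand-ins, not the library's API): the state carries a perspective multiplier that is multiplied by -1 each time the side to move changes, so the final static evaluation comes out root-relative.

```python
def applyAction(board, action):
    # stand-in for real game logic: the board is just a running score
    return board + action

def staticEval(board):
    # stand-in static evaluation from player 1's point of view
    return float(board)

class PerspectiveState:
    def __init__(self, board=0, perspective=1.0):
        self.board = board
        self.perspective = perspective  # +1.0 at the root, -1.0 for the opponent

    def takeAction(self, action):
        # flip the sign at each change of side; a fractional multiplier
        # could model partially aligned players, as kinoc suggests
        return PerspectiveState(applyAction(self.board, action), -self.perspective)

    def getReward(self):
        return self.perspective * staticEval(self.board)
```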

esparano commented on August 19, 2024

I think a simple fix would be to modify
def getReward(self):
to something like:
def getReward(self, agentPerspective):

Edit: I have to think a bit more about the actual implementation but this is generally how I see it done.

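A minimal sketch of what that signature change could look like for a terminal state; the `winner` field and the +1/-1 player encoding are illustrative assumptions, not the library's API.

```python
class TerminalState:
    def __init__(self, winner):
        self.winner = winner  # +1, -1, or 0 for a draw

    def getReward(self, agentPerspective):
        # score the outcome from the asking agent's point of view,
        # rather than always from player 1's
        if self.winner == 0:
            return 0  # a draw is neutral for both agents
        return 1 if self.winner == agentPerspective else -1
```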

esparano commented on August 19, 2024

That would make sense as a possible implementation (#1 in my examples), but I don't think that's what this code actually does. If you look at the "backpropagate" function, a single reward is given to all nodes, regardless of which agent is at play.

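Roughly, the behaviour being described looks like this (an illustrative sketch, not the library's exact code):

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.numVisits = 0
        self.totalReward = 0.0

def backpropagate(node, reward):
    # the same scalar reward is credited at every level of the path,
    # regardless of which agent is to move at each node -- the issue here
    while node is not None:
        node.numVisits += 1
        node.totalReward += reward
        node = node.parent
```

Because both players' nodes accumulate reward with the same sign, the statistics at every level end up optimizing for the same side.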

harrdb12 commented on August 19, 2024

I agree with @esparano on this; there is definitely something wrong with the way it handles reward and the current player. As it is, it always seems to optimize for player 1, causing it to play correctly if it has the first move, but causing it to intentionally try to lose if it goes 2nd.

Right now I'm looking into a fix for this, either editing mcts.py or the getReward function in the Naughts and Crosses example. I think there's also a similar issue with the default value of the exploration constant, causing it to assume that the opponent is going to make bad moves. For example, in the Naughts and Crosses game, the AI will often ignore blocking an opponent who is about to win in favor of setting up its own win in the future.

I'm working on thinking through some fixes for these, and will likely create an issue soon. I'd also like to think about how this can be expanded to handle non-perfect information games, like simple card games. Any help at that time would be appreciated.

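For context on the exploration-constant point: in UCT the constant weights an uncertainty bonus against the running average reward, so a poorly tuned value lets the search trust its own optimistic estimates and under-sample the opponent's strong replies. A generic sketch of the selection score (not the library's exact formula):

```python
import math

def uctValue(childTotalReward, childVisits, parentVisits, explorationConstant):
    exploit = childTotalReward / childVisits  # average reward seen so far
    explore = explorationConstant * math.sqrt(
        2 * math.log(parentVisits) / childVisits)  # uncertainty bonus
    return exploit + explore
```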

pbsinclair42 commented on August 19, 2024

Thanks all for the feedback; you're quite correct, there's an issue here as identified. The easy fix is the one in @harrdb12's pull request; however, this then limits usage to minimax games only. I've merged that for now to fix the immediate bug, but I'll also look into adapting the library to handle n-player games.

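One common way to implement a two-player fix of this kind is the negamax convention: negate the reward at each ply of backpropagation, so every node accumulates reward from the perspective of the player to move there. This is a sketch of the idea, not necessarily what the merged pull request does:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.numVisits = 0
        self.totalReward = 0.0

def backpropagateNegamax(node, reward):
    while node is not None:
        node.numVisits += 1
        node.totalReward += reward
        reward = -reward  # alternate perspective at each ply
        node = node.parent
```

The sign flip assumes strictly alternating, zero-sum play, which is exactly what restricts this fix to two-player adversarial games.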

esparano commented on August 19, 2024

@pbsinclair42 @harrdb12 As mentioned, the suggested fix only works for 2-player adversarial games. The more general approach for n-player games is to have a function (agent, node) -> reward. That way, the agent could not only return a positive reward when agent.id == node.state.currentPlayer (or a negative one when they differ), but could also apply more advanced logic such as agent.team == node.state.currentPlayer.team, or even just always return a positive value for 1-player games.

Basically, it's up to the agents themselves to determine whether the value of a node is "good" or "bad".

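A sketch of that (agent, node) -> reward shape; the `Agent` fields and the `winningTeam` attribute are illustrative assumptions, not part of the library:

```python
class Agent:
    def __init__(self, agentId, team):
        self.id = agentId
        self.team = team

class TeamState:
    def __init__(self, winningTeam):
        self.winningTeam = winningTeam  # e.g. "red", "blue", or None for a draw

def rewardFor(agent, terminalState):
    # each agent scores the same terminal state from its own perspective;
    # a 1-player game could simply always return the raw evaluation
    if terminalState.winningTeam is None:
        return 0.0  # draw: neutral for everyone
    return 1.0 if agent.team == terminalState.winningTeam else -1.0
```

Teammates then share a sign, so the same terminal state can be "good" for several agents at once, which the single-scalar minimax fix cannot express.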

harrdb12 commented on August 19, 2024

Hey @esparano, thanks for the feedback. I'm tinkering around with AI in another game right now (in my spare time), but plan to come back and make some modifications to this repo at some point once I have come up with some good ideas for improvements. I'll be sure to look into your suggestions as well to make it more general when I do.

