
Comments (7)

kinoc commented on August 19, 2024

I think in this implementation everything is relative to the root-node player/action selector, and in particular to the final evaluation function. It assumes the final evaluation can perform all the min/max operations up to the point of evaluation, so the top-level nodeValues converge to the expected average final reward of each action for player 1, whatever their evaluation function might be. The eventual eval function could be looking for max/max (win/win) over multiple players instead of min/max (win/lose). So it is more of a general action-selection optimizer than something specifically for min/max games (though it can be used for them), and that happens to fit my use case.
One possible "solution" would be to embed a "player eval variable" in the state: either a toggle applied at each level (a float multiplied by -1 at each switch of sides, and fractional for friends or friends of enemies), or a list of evaluation operations to apply to the final static evaluation to get something closer to the desired value.

from mcts.
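A toy sketch of the "player eval variable" idea above, with assumed names (`applyAction`, `staticEval`, and the board representation are stand-ins, not the library's API): the state carries a perspective multiplier that is multiplied by -1 each time the side to move changes, so the final static evaluation comes out root-relative.

```python
def applyAction(board, action):
    # stand-in for real game logic: the board is just a running score
    return board + action

def staticEval(board):
    # stand-in static evaluation from player 1's point of view
    return float(board)

class PerspectiveState:
    def __init__(self, board=0, perspective=1.0):
        self.board = board
        self.perspective = perspective  # +1.0 at the root, -1.0 for the opponent

    def takeAction(self, action):
        # flip the sign at each change of side; a fractional multiplier
        # could model partially aligned players, as kinoc suggests
        return PerspectiveState(applyAction(self.board, action), -self.perspective)

    def getReward(self):
        return self.perspective * staticEval(self.board)
```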

esparano commented on August 19, 2024

I think a simple fix would be to modify
def getReward(self):
to something like:
def getReward(self, agentPerspective):

Edit: I have to think a bit more about the actual implementation but this is generally how I see it done.

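A minimal sketch of what that signature change could look like for a terminal state; the `winner` field and the +1/-1 player encoding are illustrative assumptions, not the library's API.

```python
class TerminalState:
    def __init__(self, winner):
        self.winner = winner  # +1, -1, or 0 for a draw

    def getReward(self, agentPerspective):
        # score the outcome from the asking agent's point of view,
        # rather than always from player 1's
        if self.winner == 0:
            return 0  # a draw is neutral for both agents
        return 1 if self.winner == agentPerspective else -1
```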

esparano commented on August 19, 2024

That would make sense as a possible implementation (#1 in my examples), but I don't think that's what this code actually does. If you look at the "backpropagate" function, a single reward is given to all nodes, regardless of which agent is at play.

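Roughly, the behaviour being described looks like this (an illustrative sketch, not the library's exact code):

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.numVisits = 0
        self.totalReward = 0.0

def backpropagate(node, reward):
    # the same scalar reward is credited at every level of the path,
    # regardless of which agent is to move at each node -- the issue here
    while node is not None:
        node.numVisits += 1
        node.totalReward += reward
        node = node.parent
```

Because both players' nodes accumulate reward with the same sign, the statistics at every level end up optimizing for the same side.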

harrdb12 commented on August 19, 2024

I agree with @esparano on this; there is definitely something wrong with the way it handles reward and the current player. As it is, it always seems to optimize for player 1, causing it to play correctly if it has the first move, but causing it to intentionally try to lose if it goes 2nd.

Right now I'm looking into a fix for this, either editing mcts.py or the getReward function in the Naughts and Crosses example. I think there's also a similar issue with the default value of the exploration constant, causing it to assume that the opponent is going to make bad moves. For example, in the Naughts and Crosses game, the AI will often ignore blocking an opponent who is about to win in favor of setting up its own win in the future.

I'm working on thinking through some fixes for these, and will likely create an issue soon. I'd also like to think about how this can be expanded to handle non-perfect information games, like simple card games. Any help at that time would be appreciated.

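For context on the exploration-constant point: in UCT the constant weights an uncertainty bonus against the running average reward, so a poorly tuned value lets the search trust its own optimistic estimates and under-sample the opponent's strong replies. A generic sketch of the selection score (not the library's exact formula):

```python
import math

def uctValue(childTotalReward, childVisits, parentVisits, explorationConstant):
    exploit = childTotalReward / childVisits  # average reward seen so far
    explore = explorationConstant * math.sqrt(
        2 * math.log(parentVisits) / childVisits)  # uncertainty bonus
    return exploit + explore
```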

pbsinclair42 commented on August 19, 2024

Thanks all for the feedback; you're quite correct, there's an issue here as identified. The easy fix is the one in @harrdb12's pull request; however, this then limits usage to minimax games only. I've merged that for now to fix the immediate bug, but I'll also look into adapting the library to handle n-player games.

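One common way to implement a two-player fix of this kind is the negamax convention: negate the reward at each ply of backpropagation, so every node accumulates reward from the perspective of the player to move there. This is a sketch of the idea, not necessarily what the merged pull request does:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.numVisits = 0
        self.totalReward = 0.0

def backpropagateNegamax(node, reward):
    while node is not None:
        node.numVisits += 1
        node.totalReward += reward
        reward = -reward  # alternate perspective at each ply
        node = node.parent
```

The sign flip assumes strictly alternating, zero-sum play, which is exactly what restricts this fix to two-player adversarial games.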

esparano commented on August 19, 2024

@pbsinclair42 @harrdb12 As mentioned, the suggested fix only works for 2-player adversarial games. The more general approach for n-player games is to have a function (agent, node) -> reward. That way, the agent could not only return a positive reward when agent.id == node.state.currentPlayer (or a negative one when they differ), but could also apply more advanced logic such as agent.team == node.state.currentPlayer.team, or even just always return a positive value for 1-player games.

Basically, it's up to the agents themselves to determine whether the value of a node is "good" or "bad".

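A sketch of that (agent, node) -> reward shape; the `Agent` fields and the `winningTeam` attribute are illustrative assumptions, not part of the library:

```python
class Agent:
    def __init__(self, agentId, team):
        self.id = agentId
        self.team = team

class TeamState:
    def __init__(self, winningTeam):
        self.winningTeam = winningTeam  # e.g. "red", "blue", or None for a draw

def rewardFor(agent, terminalState):
    # each agent scores the same terminal state from its own perspective;
    # a 1-player game could simply always return the raw evaluation
    if terminalState.winningTeam is None:
        return 0.0  # draw: neutral for everyone
    return 1.0 if agent.team == terminalState.winningTeam else -1.0
```

Teammates then share a sign, so the same terminal state can be "good" for several agents at once, which the single-scalar minimax fix cannot express.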

harrdb12 commented on August 19, 2024

Hey @esparano, thanks for the feedback. I'm tinkering around with AI in another game right now (in my spare time), but plan to come back and make some modifications to this repo at some point once I have come up with some good ideas for improvements. I'll be sure to look into your suggestions as well to make it more general when I do.

