
Comments (9)

pseudo-rnd-thoughts commented on August 18, 2024

I'm surprised that no one has noticed this issue with the reward function, though admittedly it only affects edge cases.

What changes to the reward do you suggest, in terms of code?
Could you provide some test code for this behaviour? (You can use known seeds and actions to test particular outcomes.)

from gymnasium.

peterhungh3 commented on August 18, 2024

@pseudo-rnd-thoughts

For case 1 (player 21 vs. dealer BJ, and vice versa): currently, the returned reward is 0 (a draw).

Example code to test player = BJ and dealer = 21:

from gymnasium.envs.toy_text.blackjack import BlackjackEnv, sum_hand

env = BlackjackEnv(natural=True)
while True:
    obs, _ = env.reset()
    if obs[0] == 21:  # player BJ: 21 on the first two cards
        action = 0  # stick, so the dealer's hand is played out
        next_obs, reward, terminated, truncated, info = env.step(action)
        # dealer reached 21 with more than two cards, i.e. 21 but not a BJ
        if sum_hand(env.dealer) == 21 and len(env.dealer) > 2:
            print(f"Player: {env.player} & dealer = {env.dealer}, "
                  f"reward = {reward}")
            assert reward == 1.5, reward
            break  # stop once the case has been found and checked

Example code to test player = 21 and dealer = BJ:

env = BlackjackEnv(natural=True)
while True:
    obs, _ = env.reset()
    if sum_hand(env.player) == 21:  # skip player BJ; we want a 21 made with 3+ cards
        continue

    # hit once; a non-busting hit returns reward 0 and does not end the episode
    next_obs, reward, terminated, truncated, info = env.step(1)
    if terminated or sum_hand(env.player) != 21:
        continue
    # stick, so the dealer's hand is played out and the final reward is computed
    next_obs, reward, terminated, truncated, info = env.step(0)
    if sum_hand(env.dealer) == 21 and len(env.dealer) == 2:  # dealer BJ
        print(f"Player: {env.player} & dealer = {env.dealer}, "
              f"reward = {reward}")
        assert reward == -1, reward
        break  # stop once the case has been found and checked

For case 2 (both bust): I've rechecked, and the current code actually handles this.
This was my mistake: I was trying to extend the env to support double-down, and that case happened to enter this code path:
reward = cmp(score(self.player), score(self.dealer))
which would produce a reward of 0, which is incorrect. But the current code already handles the busted-player case in another path.
Nevertheless, the line above still seems a bit "dangerous", because it gave me the impression that it handles all cases.
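
To make the both-bust point concrete, here is a minimal standalone sketch. The helpers below are stand-ins written for this example and are assumed to behave like the ones in the blackjack module (cards are ints, ace = 1, ten-valued cards = 10); the two sample hands are made up.

```python
# Stand-ins for the blackjack helpers (assumed behaviour, not imported
# from gymnasium so that the snippet runs on its own).
def cmp(a, b):
    return float(a > b) - float(a < b)


def usable_ace(hand):
    # An ace can count as 11 if that does not push the hand over 21.
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def is_bust(hand):
    return sum_hand(hand) > 21


def score(hand):
    # A bust hand scores 0, so two bust hands compare as equal.
    return 0 if is_bust(hand) else sum_hand(hand)


player = [10, 6, 8]  # 24, bust
dealer = [10, 5, 9]  # 24, bust
both_bust_reward = cmp(score(player), score(dealer))
print(both_bust_reward)  # 0.0
```

In the unmodified env this combination cannot reach that line, because a busting hit ends the episode with -1 before the dealer plays; it only matters for extensions (such as double-down) that score both hands after the player may already have bust.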


pseudo-rnd-thoughts commented on August 18, 2024

Could you make a PR with the suggested changes and tests for the relevant rules?


frischzenger commented on August 18, 2024

def is_natural(hand):  # Is this hand a natural blackjack?
    return sorted(hand) == [1, 10]

I am also confused about is_natural: why does the hand need to be sorted here?


frischzenger commented on August 18, 2024

    def step(self, action):
        assert self.action_space.contains(action)
        if action:  # hit: add a card to players hand and return
            self.player.append(draw_card(self.np_random))
            if is_bust(self.player):
                terminated = True
                reward = -1.0
            else:
                terminated = False
                reward = 0.0
        else:  # stick: play out the dealers hand, and score
            terminated = True
            while sum_hand(self.dealer) < 17:
                self.dealer.append(draw_card(self.np_random))
            reward = cmp(score(self.player), score(self.dealer))
            if self.sab and is_natural(self.player) and not is_natural(self.dealer):
                # Player automatically wins. Rules consistent with S&B
                reward = 1.0
            elif (
                not self.sab
                and self.natural
                and is_natural(self.player)
                and reward == 1.0
            ):
                # Natural gives extra points, but doesn't autowin. Legacy implementation
                reward = 1.5

        if self.render_mode == "human":
            self.render()
        return self._get_obs(), reward, terminated, False, {}

On this line:
if action: # hit: add a card to players hand and return
After the reset the player should have two cards, but when execution reaches this line the player has three cards. Why?


CloseChoice commented on August 18, 2024

Current rewards seem wrong for 2 edge cases:

  1. Player: Blackjack vs Dealer: 21. Currently this returns a draw (reward = 0), while it should be 1 (because the player wins).
  2. Player and dealer both bust: Currently this returns a draw (reward = 0), while it should be -1 (dealer wins).

I think both cases are handled correctly according to standard rules.
1: (cited from Wikipedia)

A player total of 21 on the first two cards is a "natural" or "blackjack", and the player wins immediately unless the dealer also has one, in which case the hand ties. In the case of a tie ("push" or "standoff"), bets are returned without adjustment.

This should also be the case if both player and dealer hit 21 (without either of them having a blackjack), but I haven't found this stated explicitly on Wikipedia.
2: (also Wikipedia)

Number cards count as their number, the jack, queen, and king ("face cards" or "pictures") count as 10, and aces count as either 1 or 11 according to the player's choice. If the total exceeds 21 points, it busts, and all bets on it immediately lose.

If the player busts, then the dealer does not need to play as indicated by "immediately lose". So the second case should never happen.


CloseChoice commented on August 18, 2024

def is_natural(hand):  # Is this hand a natural blackjack?
    return sorted(hand) == [1, 10]

I am also confused about is_natural: why does the hand need to be sorted here?

Because a natural does not depend on the order: you can have an ace followed by a card worth 10, or the other way round. Sorting first makes the check independent of the order in which the cards were drawn.
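
A small sketch of that order-independence. is_natural is copied from the snippet quoted above; the sample hands are made up for illustration.

```python
def is_natural(hand):  # Is this hand a natural blackjack?
    return sorted(hand) == [1, 10]


# Hypothetical sample hands: ace = 1, any ten-valued card = 10.
print(is_natural([1, 10]))    # True
print(is_natural([10, 1]))    # True: same cards, drawn in the other order
print([10, 1] == [1, 10])     # False: a plain list comparison is order-sensitive
print(is_natural([1, 5, 5]))  # False: 21 with three cards is not a natural
```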

As for your other question:

On this line:
if action: # hit: add a card to players hand and return
After the reset the player should have two cards, but when execution reaches this line the player has three cards. Why?

This should not be possible when I look at the current implementation. Can you write code that reproduces this? I would suspect that you did not call .reset() here.


pseudo-rnd-thoughts commented on August 18, 2024

I believe that @CloseChoice is correct; however, if @peterhungh3 or @frischzenger disagree, please provide an example case where you believe an issue occurs.


peterhungh3 commented on August 18, 2024

@pseudo-rnd-thoughts @CloseChoice

I think what @CloseChoice said is correct, provided that's what gets implemented.

Again, my recommendation is just to make the reward calculation more explicit, because the reward depends on the state of the game, not just on what the score() function I cited above computes.

Back to the case:
If we take a look at the step() function of this file:

If the user has a BJ, the action will be to stick. But the step() function doesn't check whether the user has a BJ; it keeps adding cards to the dealer's hand until its sum is at least 17. As a result, the dealer may get to 21 with, say, 3 cards. Then the cmp() function will return a reward of 0, instead of 1 or 1.5 for the user (the user always wins with a BJ, unless the dealer also has a BJ).

Again, this is a small point. My recommendation is just to make the reward calculation take the state of the game into consideration, not just the score of each hand.
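
A rough sketch of what a state-aware reward could look like. explicit_reward and its natural_bonus parameter are hypothetical names made up here, not part of Gymnasium, and the sketch assumes the rule argued for in this comment: a natural beats any 21 that is not a natural.

```python
def sum_hand(hand):
    # Count one ace (1) as 11 when that does not bust the hand.
    total = sum(hand)
    return total + 10 if 1 in hand and total + 10 <= 21 else total


def is_natural(hand):
    return sorted(hand) == [1, 10]


def explicit_reward(player, dealer, natural_bonus=1.5):
    """Hypothetical helper: final reward from the full state of both hands."""
    if sum_hand(player) > 21:
        return -1.0  # player bust: loses, whatever the dealer holds
    if is_natural(player) and is_natural(dealer):
        return 0.0  # both naturals: push
    if is_natural(player):
        return natural_bonus  # natural beats any non-natural hand, including 21
    if is_natural(dealer):
        return -1.0  # dealer natural beats a player 21 made with 3+ cards
    if sum_hand(dealer) > 21:
        return 1.0  # dealer bust
    p, d = sum_hand(player), sum_hand(dealer)
    return float(p > d) - float(p < d)


print(explicit_reward([1, 10], [5, 6, 10]))  # 1.5
print(explicit_reward([7, 4, 10], [1, 10]))  # -1.0
print(explicit_reward([1, 10], [10, 1]))     # 0.0
```

Each outcome gets its own branch, so the result no longer depends on which code path happened to run before the comparison.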
