Comments (5)

hunkim commented on August 30, 2024

Good catch. For lab 08, we are experimenting with several reward functions.

Also, do you have any comments on:

random_noise = np.random.uniform(0, 1, output_size)
action = np.argmax(action_prob + random_noise)

I guess this is perhaps the correct implementation:

action = np.argmax(np.random.multinomial(n=1, pvals=action_prob, size=1)[0])

I really need help on PG. Please give us more comments. Thanks in advance.
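
A quick sanity check (a minimal sketch; the action_prob values and output_size below are placeholders, not from the lab code): sample many actions with both approaches and compare the empirical frequencies against action_prob. The uniform-noise argmax generally does not follow the policy distribution, while the multinomial draw does.

import numpy as np

np.random.seed(0)
output_size = 2
action_prob = np.array([0.7, 0.3])  # example policy output (placeholder values)

noise_counts = np.zeros(output_size)
multi_counts = np.zeros(output_size)
n_trials = 100000

for _ in range(n_trials):
    # uniform-noise "exploration" from the lab code
    random_noise = np.random.uniform(0, 1, output_size)
    noise_counts[np.argmax(action_prob + random_noise)] += 1

    # sampling directly from the policy distribution
    a = np.argmax(np.random.multinomial(n=1, pvals=action_prob))
    multi_counts[a] += 1

print("target policy     :", action_prob)
print("noise-argmax freq :", noise_counts / n_trials)  # typically biased toward the greedy action
print("multinomial freq  :", multi_counts / n_trials)  # matches action_prob up to sampling error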

kkweon commented on August 30, 2024

If it's a policy gradient, the agent should follow the given policy distribution; it shouldn't just take the argmax, at least while it's training.

In the policy gradient agent for the CartPole case, a single action should be chosen as follows:

actions = [0, 1] # suppose there are two discrete actions
action_prob = [0.7, 0.3] # distribution given from policy network
action = np.random.choice(actions, size=1, p=action_prob)

I haven't actually run the file yet, but for problems like CartPole it's almost always the case that a derivative-free method or any simpler model will outperform policy gradient methods. So I wouldn't be surprised if it's actually doing worse. I still have to check whether the other implementations are correct, though. Will let you know!
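
For example, a minimal helper along these lines (a sketch only; the select_action name is mine, not from the repo) samples from the policy during training and only switches to argmax for evaluation:

import numpy as np

def select_action(action_prob, training=True):
    """Pick one action index from a discrete policy distribution.

    action_prob: 1-D array of probabilities from the policy network.
    training:    sample stochastically when True (required for policy gradients);
                 act greedily when False (evaluation only).
    """
    action_prob = np.asarray(action_prob, dtype=np.float64)
    action_prob = action_prob / action_prob.sum()   # guard against numerical drift
    if training:
        return int(np.random.choice(len(action_prob), p=action_prob))
    return int(np.argmax(action_prob))

print(select_action([0.7, 0.3]))                   # stochastic: 0 about 70% of the time
print(select_action([0.7, 0.3], training=False))   # greedy: always 0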

kkweon commented on August 30, 2024

Today I tested the above code.

It turns out there was a problem with the numpy dtype in the above code: the discounted-reward array defaulted to numpy.int, so the fractional discounted returns were truncated to integers.
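
Here is a minimal sketch of the failure mode (assuming integer rewards, as in CartPole where every step returns a reward of 1): np.zeros_like inherits the integer dtype, so each stored discounted return gets truncated.

import numpy as np

rewards = [1, 1, 1]                              # integer rewards, as CartPole returns them
buggy = np.zeros_like(rewards)                   # dtype is int -> stored values get truncated
fixed = np.zeros_like(rewards, dtype=np.float32)

running_add = 0.0
for t in reversed(range(len(rewards))):
    running_add = running_add * 0.99 + rewards[t]
    buggy[t] = running_add   # e.g. 1.99 is stored as 1
    fixed[t] = running_add

print(buggy)  # truncated integers
print(fixed)  # the intended discounted returns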

The correct implementation of discount_rewards should be:

def discount_correct_rewards(r, gamma=0.99):
    """ take 1D float array of rewards and compute discounted reward """
    discounted_r = np.zeros_like(r, dtype=np.float32)
    running_add = 0
    for t in reversed(range(len(r))):
        running_add = running_add * gamma + r[t]
        discounted_r[t] = running_add

    # discounted_r -= discounted_r.mean()
    # discounted_r /- discounted_r.std()
    return discounted_r

It works well.

Why does the original implementation still work? It's because of the normalization factor, which happens to have a similar corrective effect. That's why people love normalization, I guess, lol.

However, it should also work without normalization: the correct implementation always works, with or without the normalization.
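
To illustrate the point (my own sketch, not code from the repo): after subtracting the mean and dividing by the standard deviation, the truncated integer returns come out close to the properly computed float returns, so the gradient signal is similar.

import numpy as np

def discount(rewards, gamma=0.99, dtype=np.float32):
    out = np.zeros_like(rewards, dtype=dtype)
    running_add = 0.0
    for t in reversed(range(len(rewards))):
        running_add = running_add * gamma + rewards[t]
        out[t] = running_add
    return out

def normalize(x):
    x = x.astype(np.float32)
    return (x - x.mean()) / x.std()

rewards = [1] * 10                       # a 10-step CartPole episode
good = discount(rewards)                 # float returns (correct)
bad = discount(rewards, dtype=np.int32)  # truncated returns (the original bug)

print(normalize(good))
print(normalize(bad))   # roughly the same shape after normalization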

Suggestion

  • Update discount_rewards to the above implementation

hunkim commented on August 30, 2024

Please feel free to fix/send PR.

In addition, could you also fix the max 200-step limit for CartPole in the QN and previous examples?

Thanks in advance!
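
One possible way to lift that limit (a minimal sketch assuming the OpenAI Gym registration API; the CartPole-v2 id and the 10000-step cap are placeholder values, not from the repo) is to register a CartPole variant with a larger max_episode_steps instead of using the stock CartPole-v0:

import gym
from gym.envs.registration import register

# Register a CartPole variant without the 200-step cap of CartPole-v0.
# The id and step limit below are placeholders, not values from the repo.
register(
    id='CartPole-v2',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=10000,
    reward_threshold=9750.0,
)

env = gym.make('CartPole-v2')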

Androbin commented on August 30, 2024

Just noticed a fatal typo:
discounted_r /- discounted_r.std()

Please update for future readers:
discounted_r /= discounted_r.std()
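
For reference, the corrected normalization block would look like this (the epsilon guard is an extra of mine to avoid dividing by a zero standard deviation, not part of the original code):

# optional normalization inside discount_correct_rewards
discounted_r -= discounted_r.mean()
discounted_r /= discounted_r.std() + 1e-8   # epsilon guard added; not in the original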
