How to crack the BipedalWalkerHardcore using eRL in colab? (ai4finance-foundation/elegantrl)

Comments (2)

Yonv1943 commented on May 10, 2024

args.gamma = 0.99 is better than 0.95

args.agent = AgentModSAC() is better than AgentSAC() (AgentSAC can't pass 'BipedalWalkerHardcore-v3')
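
One way to see why a larger gamma may help on a long-episode task like this is the common rule of thumb that a discount factor gamma looks ahead roughly 1 / (1 - gamma) steps. The sketch below is a generic RL heuristic, not ElegantRL code; `effective_horizon` and `steps_until_weight_below` are hypothetical helper names introduced only for illustration.

```python
# Rule-of-thumb "effective horizon" of a discount factor: 1 / (1 - gamma).
# Generic RL heuristic; nothing here is taken from ElegantRL.


def effective_horizon(gamma: float) -> float:
    """Approximate number of future steps that matter under discount gamma."""
    return 1.0 / (1.0 - gamma)


def steps_until_weight_below(gamma: float, threshold: float = 0.01) -> int:
    """Smallest k such that gamma ** k drops below threshold."""
    k = 0
    weight = 1.0
    while weight >= threshold:
        weight *= gamma
        k += 1
    return k


for gamma in (0.95, 0.98, 0.99):
    print(
        f"gamma={gamma}: horizon ~ {effective_horizon(gamma):.0f} steps, "
        f"reward weight < 1% after {steps_until_weight_below(gamma)} steps"
    )
```

By this heuristic, 0.95, 0.98 and 0.99 correspond to horizons of about 20, 50 and 100 steps, so 0.95 discounts away most of what happens more than a few dozen steps ahead.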

Ok, I will add 'BipedalWalkerHardcore-v3' to demo.py.

We have fully upgraded ElegantRL, and it now supports multi-GPU training (1~8 GPUs).

The problem you mentioned has now been resolved. Sorry for the late reply: we have been busy developing the 80-GPU version (cloud platform) of ElegantRL.


Yonv1943 commented on May 10, 2024

How can I reproduce the success shown in the video (https://www.bilibili.com/video/BV1wi4y187tC) using eRL in Colab?

I tried:

    args = Arguments(if_on_policy=False)
    args.agent = AgentSAC()
    args.env = PreprocessEnv(gym.make('BipedalWalkerHardcore-v3'))
    args.reward_scale = 2 ** -1
    args.gamma = 0.95
    args.rollout_num = 2
    args.if_remove = False
    train_and_evaluate_mp(args)

After a million steps (3 hours), the agent scored 37 points (MaxR).

I have added a demo for BipedalWalkerHardcore-v3 in elegantrl/demo.py (line 53). I have not had time to fine-tune the hyperparameters for this env.


    if_train_bipedal_walker_hard_core = 0
    if if_train_bipedal_walker_hard_core:
        # Progress notes for ModSAC on this env:
        # TotalStep: 10e5, TargetReward:   0, UsedTime: 10ks
        # TotalStep: 25e5, TargetReward: 150, UsedTime: 20ks
        # TotalStep: 35e5, TargetReward: 295, UsedTime: 40ks
        # TotalStep: 40e5, TargetReward: 300, UsedTime: 50ks
        args.env = build_env(env='BipedalWalkerHardcore-v3')
        args.target_step = args.env.max_step
        args.gamma = 0.98
        args.net_dim = 2 ** 8
        args.batch_size = args.net_dim * 2
        args.learning_rate = 2 ** -15
        args.repeat_times = 1.5

        args.max_memo = 2 ** 22
        args.break_step = 2 ** 24

        args.eval_gap = 2 ** 8
        args.eval_times1 = 2 ** 2
        args.eval_times2 = 2 ** 5

        args.target_step = args.env.max_step * 1

    # train_and_evaluate(args)  # single process
    args.worker_num = 4
    args.visible_gpu = sys.argv[-1]
    train_and_evaluate_mp(args)  # multiple process
    # args.worker_num = 4
    # args.visible_gpu = '0,1'
    # train_and_evaluate_mp(args)  # multiple GPU
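
The power-of-two settings and the progress notes in the snippet above are easier to compare once written out as plain numbers. The sketch below only does that arithmetic in plain Python and does not import ElegantRL. The `settings`, `progress` and `rates` names are introduced here for illustration, and reading "ks" as kiloseconds (1 ks = 1000 s) is an assumption.

```python
# Decode the power-of-two values from the demo snippet into plain numbers.
net_dim = 2 ** 8
settings = {
    "net_dim": net_dim,
    "batch_size": net_dim * 2,
    "learning_rate": 2 ** -15,
    "max_memo": 2 ** 22,
    "break_step": 2 ** 24,
    "eval_gap": 2 ** 8,
    "eval_times1": 2 ** 2,
    "eval_times2": 2 ** 5,
}
for name, value in settings.items():
    print(f"{name:>13} = {value}")

# Progress notes from the snippet: (total steps, target reward, used time in ks).
# Assumption: "ks" means kiloseconds, i.e. 1 ks = 1000 seconds.
progress = [(10e5, 0, 10), (25e5, 150, 20), (35e5, 295, 40), (40e5, 300, 50)]

rates = []
for total_step, target_reward, used_ks in progress:
    seconds = used_ks * 1000
    rate = total_step / seconds  # environment steps per second
    rates.append(rate)
    print(
        f"reward {target_reward:>3}: {total_step:.0f} steps, "
        f"{seconds / 3600:.1f} h, {rate:.1f} steps/s"
    )
```

Under that reading, the notes work out to roughly 80 to 125 environment steps per second, and the first milestone (about a million steps, a little under 3 hours) only targets a reward of 0. If the notes are taken as typical milestones, a score in the tens after a million steps is early in training, and several million steps appear to be needed to approach 300.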

Here are two results of ElegantRL ModSAC for BipedalWalkerHardcore-v3.