
Comments (4)

JacopoPan commented on June 19, 2024

@ArminBaz the reward function in the latest commit is not the same as it was when I wrote the message above.
Have you tried looking at the performance of the trained agent using the script test_singleagent.py?
The saved results should be under the folder gym-pybullet-drones/experiments/learning/results

$ python ./test_singleagent.py --exp ./results/save-<env>-<algo>-<obs>-<act>-<time-date>

(A total of about -30 over the episode should be OK, as the reward is negative at every point except the desired hover point.)
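To put the -30 figure in perspective, here is a quick back-of-the-envelope calculation. It only reuses the numbers from the evaluation log quoted in this thread (a return of -30.23 over a 242-step episode) and assumes the reward is issued once per step; the variable names are illustrative.

```python
# Numbers taken from the evaluation log quoted in this thread.
episode_return = -30.23   # total reward over one evaluation episode
episode_length = 242      # steps in that episode

# Average penalty per step, assuming one reward is issued per step.
mean_step_reward = episode_return / episode_length
print(f"mean reward per step: {mean_step_reward:.3f}")
```

A small negative value per step is what one would expect from an agent that stays close to, but not exactly at, the hover point.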

from gym-pybullet-drones.

ArminBaz commented on June 19, 2024

@JacopoPan That makes a lot of sense, thank you for getting back so quickly!


JacopoPan commented on June 19, 2024

Hello @amijeet,
apologies if I break workflows (especially around learn.py) as I am actively modifying the code.
If you want to get started on single agent RL, look at this commit and, in particular,

These 2 scripts:

And these 2 classes:

This is a much simplified take-off and hover scenario with a 2-D observation space (z and the velocity along z) and a 1-D action space (one RPM value for all motors).

The reward is 1 for z between 0.75 and 0.99 and 0 otherwise.
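As a minimal sketch, the reward rule described above could look like the function below. This is not the code from TakeoffAviary.py: the function name `hover_band_reward`, the inclusive bounds, and the toy altitude trace are all assumptions made for illustration.

```python
def hover_band_reward(z: float, low: float = 0.75, high: float = 0.99) -> int:
    """Return 1 when the altitude z lies inside the band, 0 otherwise.

    Treating both bounds as inclusive is an assumption; the thread
    only says "between 0.75 and 0.99".
    """
    return 1 if low <= z <= high else 0


# Hypothetical altitude trace: climb from the ground, hover, then overshoot.
altitudes = [0.0, 0.3, 0.6, 0.8, 0.9, 0.95, 1.05]
rewards = [hover_band_reward(z) for z in altitudes]
episode_return = sum(rewards)
print(rewards, episode_return)
```

If the reward is granted once per step, the return of an episode is simply the number of steps spent inside the band, so it can never exceed the episode length.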

In this example, running stable-baselines3's PPO finds a solution in just a few minutes.

$ cd gym-pybullet-drones/experiments/learning/
$ python singleagent.py --env takeoff --algo ppo --pol mlp --input rpm

Output:

Eval num_timesteps=10000, episode_reward=26.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=20000, episode_reward=29.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=30000, episode_reward=58.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=40000, episode_reward=173.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Stopping training because the mean reward 173.00  is above the threshold 100
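For anyone comparing their own runs against this output, a small log parser can report the first evaluation that reaches a given threshold. The sketch below uses only the Python standard library and assumes the log lines follow the "Eval num_timesteps=..., episode_reward=..." format shown above; the helper name `first_eval_at_or_above` is hypothetical, and counting a reward equal to the threshold as a hit is an assumption.

```python
import re

# Matches lines such as:
# "Eval num_timesteps=10000, episode_reward=26.00 +/- 0.00"
EVAL_LINE = re.compile(
    r"Eval num_timesteps=(\d+), episode_reward=(-?\d+(?:\.\d+)?)"
)


def first_eval_at_or_above(log: str, threshold: float):
    """Return (timesteps, reward) of the first evaluation whose mean
    reward reaches the threshold, or None if none does."""
    for match in EVAL_LINE.finditer(log):
        timesteps = int(match.group(1))
        reward = float(match.group(2))
        if reward >= threshold:
            return timesteps, reward
    return None


log = """\
Eval num_timesteps=10000, episode_reward=26.00 +/- 0.00
Eval num_timesteps=20000, episode_reward=29.00 +/- 0.00
Eval num_timesteps=30000, episode_reward=58.00 +/- 0.00
Eval num_timesteps=40000, episode_reward=173.00 +/- 0.00
"""

result = first_eval_at_or_above(log, threshold=100)
print(result)
```

The regular expression also accepts negative rewards, so the same helper works on logs from runs with a different reward function.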

Of course, more complicated tasks with higher-dimensional observation and action vectors can require:

  • More sophisticated reward engineering (see TakeoffAviary.py)
  • And/or a customized architecture for the learning networks (see singleagent.py)

as well as much longer training times. For example, simply making the input 4-D complicates the problem enough that PPO collects only about 1/5 of the reward after more than 15x the number of timesteps:

Eval num_timesteps=680000, episode_reward=31.00 +/- 0.00
Episode length: 86.00 +/- 0.00
New best mean reward!
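The comparison between the two runs can be checked directly from the logged numbers. The snippet below only restates the figures from the two outputs in this comment (173 at 40000 timesteps versus 31 at 680000 timesteps); the dictionary names are illustrative.

```python
# Figures copied from the two evaluation logs in this comment.
simple_run = {"timesteps": 40_000, "reward": 173.0}   # 1-D action space
harder_run = {"timesteps": 680_000, "reward": 31.0}   # 4-D input

reward_ratio = harder_run["reward"] / simple_run["reward"]
timestep_ratio = harder_run["timesteps"] / simple_run["timesteps"]

print(f"reward ratio: {reward_ratio:.2f}")
print(f"timestep ratio: {timestep_ratio:.0f}x")
```

Note that the two episodes also differ in length (86 versus 242 steps), so the raw returns are not a like-for-like measure of per-step performance.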

I don't have all the answers; the purpose of this gym is exactly to try (and let others try) these things.


ArminBaz commented on June 19, 2024

Hey @JacopoPan, forgive me if this is a naive question, as I am still relatively new to reinforcement learning and your library. I just ran singleagent.py (from the most recent commit) on takeoff and noticed that my model seems to learn far more slowly than the one you showed.

It seems that you were able to break the mean reward threshold after 40000 timesteps, while I am stuck at around -30 after about 120000. Do you know why this may be happening, and do you have any suggestions on how to speed up the training? Thanks!

Here is the output for reference:

Eval num_timesteps=110000, episode_reward=-30.23 +/- 0.00
Episode length: 242.00 +/- 0.00
Eval num_timesteps=115000, episode_reward=-30.18 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=120000, episode_reward=-30.15 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=125000, episode_reward=-30.12 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!

