
Comments (11)

JacopoPan avatar JacopoPan commented on July 18, 2024

Thank you again @chris-aeviator,
the whole repo is still partially a work in progress and all feedback is appreciated!

Just a few remarks to make sure we are on the same page:

  • fly.py per se does not involve learning; it is purely a flight simulation using PID control
  • Your implementation of wind can make sense; you might want to randomize the external force. Check out PyBullet's Quick Start Guide to understand how to use applyExternalForce (see the sketch after this list)
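A minimal sketch of that idea, assuming drone_id is the PyBullet body unique id of the quadrotor and that the function is called once per simulation step (PyBullet clears external forces after each stepSimulation); the mean force and noise magnitude are illustrative values:

import numpy as np
import pybullet as p

def apply_wind(drone_id, mean_force=(0.0, 0.005, 0.0), noise_std=0.001):
    # Sample a wind force (in Newtons) around a constant mean along +y.
    force = np.random.normal(loc=mean_force, scale=noise_std)
    # Apply it at the drone's current position, expressed in the world frame.
    pos, _ = p.getBasePositionAndOrientation(drone_id)
    p.applyExternalForce(objectUniqueId=drone_id,
                         linkIndex=-1,              # base link
                         forceObj=force.tolist(),
                         posObj=pos,
                         flags=p.WORLD_FRAME)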

I guess that your problem arises from the fact that the default ActionType used in singleagent.py is one_d_rpm

parser.add_argument('--act', default='one_d_rpm', type=ActionType, help='Help (default: ..)', metavar='')

I.e., the RPMs of all propellers are the same in the learned controller: you would want to change that to just rpm to apply 4 different actions to the propellers. I have to warn you that this is where the learning problem gets much more complicated (even a simple hover might take hours, rather than minutes, to learn).
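For reference, the same default can be switched directly in singleagent.py (a one-line tweak to the argument shown above, equivalent to passing --act rpm on the command line):

# Same argument as above, with the default changed so each propeller
# receives its own RPM command:
parser.add_argument('--act', default='rpm', type=ActionType, help='Help (default: ..)', metavar='')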


chris-aeviator avatar chris-aeviator commented on July 18, 2024

@JacopoPan thanks for your explanation, yes we are on the same page. Since I'm evaluating an airframe design, I mentioned fly.py to test and compare certain parameters, and I also evaluated the wind there.

I'm OK training for hours and this is expected. With the `one_d_rpm` setting (thanks for making that clear!) my loss converged to around -5576 and I stopped the training after 3 hrs. I will now try to understand and implement the necessary separate actions for each of the propellers.

EDIT: are you saying I will just need to set --act rpm to have the 4 rotors controlled independently?

EDIT 2: it seems like so 👍 🚀 🥳


EDIT 3: for anybody interested in my use case: after 1 hr of training with A2C I still got really poor results (not even flying), so I switched to PPO, which shows much higher GPU utilization (up to 75%, compared to a max of 20% on A2C) and seems to perform much better: the time before the craft crashes is 1.5 s after 1 hr of training with A2C, versus about 4.5 s after only ~15 min of training with --alg ppo. Even though I've set --cpu 12, I can only see one core being utilized. I'll keep posting results here.

GPU Utilization --alg ppo


GPU Utilization --alg a2c

steady 20%

EDIT 4:

Training 1hr 20 min in

The vehicle manages to counteract the constant wind force along Y, though the desired Z position (z = 1) is not reached yet.

video-01.10.2021_13.40.48.mp4


JacopoPan avatar JacopoPan commented on July 18, 2024

Yes, not all algorithms might be equally successful. You might want to look at changing the number of neurons and layers in the networks in singleagent.py. Reward shaping and limiting the range of RPMs for each propeller might also be options (e.g., if your wind is along the global y axis and the quad is in the x configuration facing +x, you could simplify the problem by commanding the same RPMs to props 0-1 and 2-3; see the sketch below).
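One way to picture that simplification (purely a sketch, not part of the repo's API: the propeller pairing, the hover_rpm value, and the max_delta range are illustrative assumptions) is to map a learned 2-dimensional action onto the 4-motor RPM command:

import numpy as np

def pairwise_rpm(action_2d, hover_rpm, max_delta=2000.0):
    # Clip the learned 2-dim action to [-1, 1] and mirror it onto the
    # propeller pairs (0, 1) and (2, 3), e.g. to counteract a wind along global y.
    a = np.clip(np.asarray(action_2d), -1.0, 1.0)
    rpm_01 = hover_rpm + max_delta * a[0]   # props 0 and 1
    rpm_23 = hover_rpm + max_delta * a[1]   # props 2 and 3
    return np.array([rpm_01, rpm_01, rpm_23, rpm_23])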

The goal of this repo is to give you the tools to try all these things; I don't think I've solved the entire problem of addressing generic control with RL yet :D

This is an example of hover that was learned by PPO over ~8hrs.

video-10.28.2020_09.45.37.mp4


chris-aeviator avatar chris-aeviator commented on July 18, 2024

In your video the vehicle has a crazy spin around the Z axis when hovering; my training vehicle is tending to flip over (no new best reward for 2 hrs). Would

Reward shaping

mean "punishing it" for this behaviour? Would I, for example, apply a reward-decreasing factor when experiencing high angular velocities, or could I even punish no-go scenarios like flipping over with a -1?


JacopoPan avatar JacopoPan commented on July 18, 2024

Yes, if the reward function does not account for yaw or the z-axis turn rate (as is the case in that example), the learning agent cannot distinguish between a spinning and a non-spinning hover.

You can try to speed up/"guide" your learning agent by customizing _computeReward() (the reward of a given state or state-action pair) and _computeDone() (the conditions for terminating an episode); see the sketch below.
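A hedged sketch of what those two overrides could look like for the "punish spinning / terminate on a flip-over" idea. The import path, the _getDroneStateVector helper, and the state layout (position at indices 0:3, roll-pitch-yaw at 7:10, angular velocity at 13:16) reflect my reading of the repo at the time and may differ between versions; the weights and thresholds are arbitrary:

import numpy as np
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary

class ShapedHoverAviary(HoverAviary):

    def _computeReward(self):
        state = self._getDroneStateVector(0)
        pos, rpy, ang_vel = state[0:3], state[7:10], state[13:16]
        reward = -abs(1.0 - pos[2])                # distance to the desired z = 1
        reward -= 0.1 * np.linalg.norm(ang_vel)    # discourage spinning
        if abs(rpy[0]) > np.pi / 2 or abs(rpy[1]) > np.pi / 2:
            reward -= 1.0                          # "no-go": flipped over
        return reward

    def _computeDone(self):
        state = self._getDroneStateVector(0)
        rpy = state[7:10]
        # End the episode early if the drone has rolled or pitched past 90 degrees.
        return abs(rpy[0]) > np.pi / 2 or abs(rpy[1]) > np.pi / 2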


chris-aeviator avatar chris-aeviator commented on July 18, 2024

Thanks Jacopo for all your help and responsiveness! I'll plan these next steps and leave more findings in this GH issue within the next few days; please feel free to close it (or not) :0 .


JacopoPan avatar JacopoPan commented on July 18, 2024

👌 Note that I don't think there's a silver bullet for those implementations, and every use case is very much of general interest (I'll keep the issue open).


rogerscristo avatar rogerscristo commented on July 18, 2024

Hi @JacopoPan and @chris-aeviator. I've been following this issue and it has helped me a lot to better understand the simulator.
Regarding training the hover task with PPO: @JacopoPan, can you please provide the hyperparameters and reward used to achieve such good results? I've tried training with the default parameters over 50 million timesteps but did not reach anything like your video. I've also tried some reward shaping, as well as tuning the PPO parameters, but again without success.

Thank you in advance!


JacopoPan avatar JacopoPan commented on July 18, 2024

@rogerscristo I didn't tag the commit that result came from, but I remember it was one of those I obtained when testing sa_script.bash and sa_script.slrm on the computing cluster (for 8+ hrs). I don't think the reward has changed much (even if I tried a few variations of it, it has always been either a stepwise function, a distance, or a quadratic distance along the z axis). Originally I had 256 instead of 512 units in the first layer of the networks; I did not touch any other hyperparameter.

I don't think it should be too surprising if some of the training runs do not succeed: during that set of experiments only PPO and SAC produced "decent" policies. My general suggestion is to start a few experiments in parallel and make sure that the network capacities are appropriate for the task at hand by checking that the learning curves are somewhat stable.
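For anyone reproducing this, the layer widths mentioned above are the kind of thing set through stable_baselines3's policy_kwargs. A minimal sketch, assuming the repo registers its single-agent hover env under the id used below (check singleagent.py for the exact env name and architecture in your version):

import gym
import gym_pybullet_drones  # assumed to register the aviary environments
from stable_baselines3 import PPO

env = gym.make("hover-aviary-v0")

# First layer widened from 256 to 512 units, as discussed above; the remaining
# layer sizes are illustrative.
policy_kwargs = dict(net_arch=[512, 256, 128])
model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=1_000_000)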


rogerscristo avatar rogerscristo commented on July 18, 2024

Thank you @JacopoPan for the directions. I will try to generate new learning examples to complement the documentation.

Thanks a lot!


4ku avatar 4ku commented on July 18, 2024

@rogerscristo @chris-aeviator Did you solve this problem? I also have some problems with training. I am trying to train PPO from stable_baselines3 but I don't have any good results.

