Comments (11)
Thank you again @chris-aeviator,
the whole repo is still partially work-in-progress and every feedback is appreciated!
Just a few remarks to make sure we are on the same page:
`fly.py` per se does not involve learning; it is purely a flight simulation using PID control. Your implementation of wind can make sense, but you might want to randomize the external force. Check out PyBullet's Quickstart Guide to understand how to use `applyExternalForce`.
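A minimal sketch of that idea, with the force magnitudes as made-up placeholders (not tuned values) and the PyBullet call isolated in its own helper:

```python
import random

def sample_wind(base=(0.0, 0.005, 0.0), sigma=0.002):
    """Draw a randomized wind force vector in newtons.
    `base` and `sigma` are illustrative magnitudes, not tuned values."""
    return tuple(b + random.gauss(0.0, sigma) for b in base)

def apply_wind(drone_id, force, client=0):
    """Push the drone's base with the sampled force (call once per control step).
    See p.applyExternalForce in PyBullet's Quickstart Guide."""
    import pybullet as p  # imported lazily so the sampler works without PyBullet
    pos, _ = p.getBasePositionAndOrientation(drone_id, physicsClientId=client)
    p.applyExternalForce(drone_id,
                         linkIndex=-1,         # -1 applies the force to the base
                         forceObj=list(force),
                         posObj=pos,           # apply at the base, so no spurious torque
                         flags=p.WORLD_FRAME,  # posObj/forceObj in world coordinates
                         physicsClientId=client)
```

Note that external forces are cleared after each `stepSimulation()`, so the force has to be re-applied every step.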
I guess that your problem arises from the fact that the default `ActionType` used in `singleagent.py` is `one_d_rpm`, i.e., the RPMs of all propellers are the same in the learned controller. You would want to change that to just `rpm` to apply 4 different actions to the propellers. I have to warn you that that's where the learning problem gets a lot more complicated (even a simple hover might take hours, rather than just minutes, to learn).
@JacopoPan thanks for your explanation, yes, we are on the same page. Since I'm evaluating an airframe design, I mentioned `fly.py` because I use it to test and compare certain parameters, and I also evaluated the wind there. I'm OK with training for hours and this is expected; with the `one_d_rpm` setting (thanks for making that clear!) my loss converged to around -5576 and I stopped the training after 3 hrs. I will now try to understand and implement the necessary separate actions for each of the propellers.
EDIT: you are saying I will just need to set `--act rpm` for the 4 rotors to be controlled independently?
EDIT 2: it seems so 👍 🚀 🥳
EDIT 3: for anybody interested in my use case: after 1 hr of training with A2C I still got really poor results (not even flying), so I switched to PPO. It shows much higher GPU utilization (up to 75%, compared to at most 20% with A2C) and seems to perform much better: the time before the craft crashes is 1.5 s after 1 hr of training with A2C, versus about 4.5 s after only ~15 min of training with `--alg ppo`. Even though I've set `--cpu 12`, I can only see one core being utilized. I'll keep posting results here.
[Screenshot: GPU utilization with `--alg ppo`]
[Screenshot: GPU utilization with `--alg a2c`: steady 20%]
EDIT 4: 1 hr 20 min into training, the vehicle manages to counteract the Y-directed constant wind force, though the desired Z-position (1) is not reached yet.
video-01.10.2021_13.40.48.mp4
Yes, not all algorithms might be equally successful. You might want to look at changing the number of neurons and layers in the networks in `singleagent.py`. Reward shaping and trying to limit the range of RPMs for each propeller might also be options (e.g., if your wind is along the global y axis and the quad is in the X configuration facing +x, you might simplify the problem by commanding the same RPMs to props 0-1 and 2-3).
The goal of this repo is to give you the tools to try all these things, I don't think I've solved the entire problem of addressing generic control with RL yet :D
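A rough sketch of that pairing idea: learn only 2 actions and mirror them onto the two motor pairs (the hover RPM and scaling below are placeholders, not the CF2X's actual values):

```python
import numpy as np

def paired_rpms(action_2d, hover_rpm=14000.0, scale=1000.0):
    """Expand a 2-value action into 4 motor RPMs, commanding props 0-1
    together and props 2-3 together, as suggested above. This halves the
    action space at the cost of giving up one control axis.
    `hover_rpm` and `scale` are illustrative placeholders."""
    a01, a23 = float(action_2d[0]), float(action_2d[1])
    return np.array([hover_rpm + scale * a01,   # prop 0
                     hover_rpm + scale * a01,   # prop 1 (mirrors prop 0)
                     hover_rpm + scale * a23,   # prop 2
                     hover_rpm + scale * a23])  # prop 3 (mirrors prop 2)
```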
This is an example of hover that was learned by PPO over ~8hrs.
video-10.28.2020_09.45.37.mp4
In your video the vehicle has a crazy spin around the Z axis when hovering; my training vehicle is tending to flip over (no new best reward for 2 hrs). Would reward shaping mean "punishing it" for this behaviour? Would I, for example, apply a reward-decreasing factor when experiencing high angular velocities, or can I even punish no-go scenarios like flipping over with a -1?
Yes, if the reward function does not account for yaw or the z-axis turn rate (as is the case in that example), the learning agent cannot distinguish between a spinning and a non-spinning hover.
You can try to speed up/"guide" your learning agent by customizing `_computeReward()` (the reward of a given state or state-action pair) and `_computeDone()` (the conditions for terminating an episode).
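As an illustration, here is a sketch of the kind of logic one might put in those overrides. It assumes the repo's 20-element kinematic state vector (position at indices 0-2, roll/pitch/yaw at 7-9, angular velocity at 13-15); the weights and thresholds are made up and would need tuning:

```python
import numpy as np

def shaped_reward(state, target_z=1.0, ang_vel_weight=0.01):
    """Sketch of a shaped reward: quadratic distance to the target height,
    plus a penalty on angular rates to discourage spinning/flipping.
    The 0.01 weight is an arbitrary starting point, not a tuned value."""
    pos_z = state[2]
    ang_vel = state[13:16]
    return -(target_z - pos_z) ** 2 \
           - ang_vel_weight * float(np.linalg.norm(ang_vel)) ** 2

def episode_done(state, max_tilt_rad=1.2):
    """Sketch of an early-termination rule: end the episode (optionally with
    an extra -1 penalty) once roll or pitch exceeds a no-go threshold,
    so the agent stops collecting experience from unrecoverable attitudes."""
    roll, pitch = state[7], state[8]
    return abs(roll) > max_tilt_rad or abs(pitch) > max_tilt_rad
```

Both the angular-velocity penalty and the flip-over termination you describe are standard reward-shaping moves; the main risk is making the penalty so large that the agent prefers doing nothing over attempting to fly.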
Thanks Jacopo for all your help and responsiveness! I'll plan these next steps and leave more findings in this GH issue within the next few days; please feel free to close it (or not) :0 .
👌 Note that I don't think there's a silver bullet for these implementations, and every use case is very much of general interest (I'll keep the issue open).
Hi @JacopoPan and @chris-aeviator. I've been following this issue and it has helped me a lot to understand the simulator better.
Regarding training the hover task with PPO: @JacopoPan, can you please share the hyperparameters and reward used to achieve such good results? I've tried training with default parameters over 50 million timesteps but did not reach anything like your video. I've also tried some reward shaping, as well as tuning the PPO parameters, again without success.
Thank you in advance!
@rogerscristo I didn't tag the commit that result came from, but I remember it was one of those I obtained when I was testing `sa_script.bash` and `sa_script.slrm` on the computing cluster (for 8+ hrs). I don't think the reward has changed much (even if I tried a few variations of it, it has always been either a stepwise function, a distance, or a quadratic distance along the z axis). Originally I had 256 instead of 512 units in the first layer of the networks. I did not touch any other hyperparameter.
I don't think it should be too surprising if some of the training runs do not succeed: during that set of experiments only PPO and SAC produced "decent" policies. My general suggestion is to start a few experiments in parallel and make sure that the network capacities are appropriate for the task at hand by checking that the learning curves are somewhat stable.
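For reference, network sizes in stable-baselines3 are set through `policy_kwargs`. A sketch in the pre-2.0 `net_arch` format (a list of shared layer widths, optionally ending in a dict with separate value/policy heads); the exact shapes below are illustrative, not necessarily what `singleagent.py` ships with:

```python
# Illustrative stable-baselines3 (pre-2.0) network specification:
# two shared 512-unit layers, then separate value ("vf") and
# policy ("pi") heads. Sizes are placeholders to experiment with.
onpolicy_kwargs = dict(
    net_arch=[512, 512, dict(vf=[256, 128], pi=[256, 128])],
)

# It would then be passed when constructing the model, e.g.:
# model = PPO("MlpPolicy", env, policy_kwargs=onpolicy_kwargs, verbose=1)
```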
Thank you @JacopoPan for the directions. I will try to generate new learning examples to complement the documentation.
Thanks a lot!
@rogerscristo @chris-aeviator Did you solve this problem? I also have some problems with training. I am trying to train PPO from stable_baselines3 but I don't get any good results.