@ArminBaz the reward function in the latest commit is not the same as the one from when I wrote the message above.
Have you tried looking at the performance of the trained agent using the script test_singleagent.py?
It should be under the folder gym-pybullet-drones/experiments/learning/results
$ python ./test_singleagent.py --exp ./results/save-<env>-<algo>-<obs>-<act>-<time-date>
(around -30 over the episode should be OK, as there are negative rewards at every point except the desired hover one)
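A reward that is negative everywhere except the hover point suggests a distance-based penalty; the formula below is an illustrative assumption for this kind of shaping, not the repo's actual implementation:

```python
def shaped_reward(z: float, z_target: float = 1.0) -> float:
    """Illustrative shaped reward (an assumption, not the repo's exact
    formula): penalize squared distance from the target hover altitude,
    reaching zero only at the target."""
    return -(z - z_target) ** 2

# The penalty shrinks as the drone approaches the target altitude.
print(shaped_reward(0.0))  # -1.0
print(shaped_reward(0.5))  # -0.25
print(shaped_reward(1.0))  # 0.0
```

Summing small per-step penalties like this over a 242-step episode can plausibly accumulate to a total on the order of -30.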
@JacopoPan That makes a lot of sense, thank you for getting back so quickly!
Hello @amijeet,
apologies if I break workflows (especially around learn.py) as I am actively modifying the code.
If you want to get started on single-agent RL, look at this commit and, in particular, these 2 scripts:
- singleagent.py, which trains using a few of stable-baselines3's algorithms
- test_singleagent.py, which re-runs a model trained with the previous script
And these 2 classes:
This is a much simplified take-off and hover scenario with a 2-D obs space (z and velocity in z) and a 1-D action space (the RPM for all motors).
The reward is 1 for z between 0.75 and 0.99 and 0 otherwise.
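The sparse reward described above can be sketched in a few lines (the standalone function name is illustrative; in the repo this logic lives inside the environment class):

```python
def compute_reward(z: float) -> float:
    """Sparse hover reward: 1 while the drone's altitude z is inside
    the target band [0.75, 0.99], 0 everywhere else."""
    return 1.0 if 0.75 <= z <= 0.99 else 0.0

# Reward accrues only while the drone stays inside the band.
print(compute_reward(0.80))  # 1.0
print(compute_reward(0.50))  # 0.0
```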
In this example, running stable-baselines3's PPO finds a solution in just a few minutes.
$ cd gym-pybullet-drones/experiments/learning/
$ python singleagent.py --env takeoff --algo ppo --pol mlp --input rpm
Output:
Eval num_timesteps=10000, episode_reward=26.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=20000, episode_reward=29.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=30000, episode_reward=58.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=40000, episode_reward=173.00 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Stopping training because the mean reward 173.00 is above the threshold 100
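The run above stops as soon as an evaluation's mean reward crosses the threshold (stable-baselines3 provides this via its StopTrainingOnRewardThreshold callback). The stopping rule itself is simple; the plain-Python function below is an illustration, not part of the library:

```python
def should_stop(mean_rewards, threshold=100.0):
    """Return True once the latest evaluation's mean reward reaches
    the threshold, mirroring the log output above."""
    return bool(mean_rewards) and mean_rewards[-1] >= threshold

# Mean rewards from the evaluations at 10k/20k/30k/40k timesteps.
evals = [26.0, 29.0, 58.0, 173.0]
print(should_stop(evals))  # True: 173.0 is above the threshold 100
```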
Of course, more complicated tasks, using higher-dimensional observation and action vectors, can require:
- More sophisticated reward engineering (see TakeoffAviary.py)
- And/or customizing the learning network architecture (see singleagent.py)
as well as much longer training times. E.g. simply making the input 4-D complicates the problem enough that PPO only collects 1/5 of the reward in 15x the number of iterations:
Eval num_timesteps=680000, episode_reward=31.00 +/- 0.00
Episode length: 86.00 +/- 0.00
New best mean reward!
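The network-architecture customization mentioned above goes through stable-baselines3's policy_kwargs argument. A minimal sketch, where the layer sizes are illustrative (not necessarily what singleagent.py uses) and the list-of-dict net_arch format is the one accepted by SB3 versions from this period:

```python
# Hypothetical policy_kwargs for an SB3 model: separate 2-layer MLP
# heads for the policy (pi) and value function (vf).
policy_kwargs = dict(net_arch=[dict(pi=[512, 512], vf=[512, 512])])

# Passed to the model constructor, e.g.:
# model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs)
print(policy_kwargs["net_arch"][0]["pi"])  # [512, 512]
```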
I don't have all the answers; the purpose of this gym is exactly to try (and let others try) these things.
Hey @JacopoPan, forgive me if this is a naive question as I am still relatively new to reinforcement learning and your library. I just ran singleagent.py (from the most recent commit) on takeoff and I noticed that my model seems to be far slower than the one you showed.
It seems that you were able to break the mean reward threshold after 40000 timesteps, while I am stuck at around -30 after 120000. Do you know why this may be happening, and do you have any suggestions on how to speed up the training? Thanks!
Here is the output for reference:
Eval num_timesteps=110000, episode_reward=-30.23 +/- 0.00
Episode length: 242.00 +/- 0.00
Eval num_timesteps=115000, episode_reward=-30.18 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=120000, episode_reward=-30.15 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!
Eval num_timesteps=125000, episode_reward=-30.12 +/- 0.00
Episode length: 242.00 +/- 0.00
New best mean reward!