Comments (14)
Do I just run "python learn.py" with action type as rpm?
yes
Do I need to set up a new action that does not control yaw?
no, the action will be a vector of size 4 with the desired RPMs of each motor (in fact, plus/minus 5% centered on the hover RPMs)
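As a sketch of what that mapping could look like (the hover RPM constant and helper name below are illustrative, not the repo's exact code), a normalized per-motor command in [-1, 1] can be scaled into the ±5% band around hover:

```python
# Sketch (not the repo's exact code): map a normalized action in [-1, 1]
# to motor RPMs within +/-5% of the hover RPM.
HOVER_RPM = 14468.4  # hypothetical hover RPM, for illustration only

def action_to_rpm(action, hover_rpm=HOVER_RPM, scale=0.05):
    """Map each normalized motor command in [-1, 1] to an RPM in
    [hover_rpm * (1 - scale), hover_rpm * (1 + scale)]."""
    return [hover_rpm * (1.0 + scale * a) for a in action]

# A zero action keeps all four motors at the hover RPM.
print(action_to_rpm([0.0, 0.0, 0.0, 0.0]))
```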
Do I also need to change the reward settings?
What is mainly different in the current HoverAviary is that the reward is always positive (instead of including negative penalties), it is only based on position (the result above also included a reward component based on the velocity), and the environment does not early-terminate if the quadrotor flips or flies out of bounds. It might be necessary to reintroduce some of those details.
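As a hypothetical sketch of those details (weights, bounds, and function names are illustrative, not the repo's code), a velocity penalty and an early-termination check could look like:

```python
import numpy as np

# Illustrative sketch only: a reward with a velocity-based penalty and an
# early-termination check for flipping / flying out of bounds.
TARGET_POS = np.array([0.0, 0.0, 1.0])  # hypothetical hover target

def compute_reward(pos, vel, w_pos=1.0, w_vel=0.1):
    # Positive shaping from position error, minus a penalty on speed.
    pos_term = max(0.0, 2.0 - np.linalg.norm(TARGET_POS - pos))
    return w_pos * pos_term - w_vel * np.linalg.norm(vel)

def is_terminated(pos, rpy, bound=2.0, max_tilt=np.pi / 3):
    # End the episode if the drone leaves a bounding box or tilts too far.
    out_of_bounds = np.any(np.abs(pos) > bound)
    flipped = np.any(np.abs(rpy[:2]) > max_tilt)
    return bool(out_of_bounds or flipped)

print(compute_reward(np.zeros(3), np.zeros(3)))  # stationary at the origin
```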
from gym-pybullet-drones.
Hi @paehal
I added back the truncation condition and trained this in ~10' (this is the current code in main).
RL.mp4
The current version of the script gym_pybullet_drones/examples/learn.py does include re-loading the model and rendering its performance; you should be able to do what you desire by modifying it (I would guess your error arises from not having initialized a PPO model with the target environment before loading the trained model, but I haven't encountered it myself).
The sim/pybullet frequency is the actual physics integration frequency, yes.
The idea of the action buffer is that the policy might be better guided by knowing what the controller had done just before, the proportionality to the control frequency makes it dependent on the wall-clock only, and not the type of controller (but it might be appropriate to change that, depending on application).
For custom SB3 policies, I can only refer you to the relevant documentation: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html
I used different critic/actor network sizes in past SB3 versions but the current focus of this repo is having very few dependencies and compatibility with the simplest/most stock versions of them.
Hi @paehal , I trained stable-baselines3 PPO to do hover with just RPMs (in the plus/minus 5% range of the hover value) back in 2020 without yaw control (as it wasn't penalized in the reward). I agree it's a more difficult RL problem and that's why the base RL aviary class includes simplified action spaces for the 1D and the velocity control cases.
video-10.28.2020_09.45.37.mp4
This was a 4-layer architecture [256, 256, 256, 128] (2 layers shared, 2 separate for qf and pol), with a 12-vector input [position, orientation, velocity, angular velocity] mapped to 4 motor velocities (in the ±5% range around the hover RPMs), after 8 hours and ~5M time steps (48Hz ctrl).
Thanks for the reply and sharing the video. Glad to hear that rpm control has been stable in the past.
I would like to do a study under the same conditions as yours in the latest repository, is that possible?
Here is what I am wondering.
Do I just run "python learn.py" with action type as rpm?
Do I need to set up a new action that does not control yaw?
Do I also need to change the reward settings?
the environment does not early terminate if the quadrotor flips or flies out of bound
Let me confirm: in the latest repository, does the environment terminate if the quadrotor flips or flies out of bounds?
If so, how do I change the simulation setting?
No, you can add that to the method.
(FYI, the reward achieved by a "successful" one-dimensional hover is ~470 (in 3' on my machine); I just tried training the 3D hover, as is, for ~30' and it stopped at a reward of ~250.)
@JacopoPan
Thank you for your response, it was very informative. I tried training in a similar way and obtained the following results. (Although the training time was different, I believe the results are quite close to yours.)
Related to this, I have a question: how can I load a trained model in a different job and save a video of its performance? Even when I set --record_video to True, the video is not saved. Also, when I tried to load a different trained model with the following settings, targeting a model in a specified folder, an error occurred. Since I'm not familiar with stable_baselines3, I would appreciate it if you could help me identify the cause.
if resume and os.path.isfile(filename + '/best_model.zip'):
    path = filename + '/best_model.zip'
    model = PPO.load(path)
    print("Resume Model Complete")
[Error content]
python3.10/site-packages/stable_baselines3/common/base_class.py", line 422, in _setup_learn
assert self.env is not None
AssertionError
In a previous version, there was something like test_learning.py, which, when executed, allowed me to verify the behavior in a video.
Quick response, thank you. By carefully reading the code I was able to understand what you were saying, and I confirmed that the evaluation runs right after training. Since I wanted to run a pretrained model without retraining it, I made some changes to the code to achieve this.
Also, a different question (please let me know if it's better to create a separate issue): I believe that increasing the control_freq generally improves control (e.g., hovering). Here are my questions:
- Is control_freq the same as the frequency of obtaining observations?
- Are there any training settings that need to change when increasing control_freq? I think I probably need to increase gamma, but I'd like to know if there are any other adjustments I should make.
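On the gamma point, one common heuristic (not from this repo, just a sketch) is to keep the wall-clock discount horizon fixed: if the control frequency increases by a factor k, each step covers 1/k of the time, so gamma_new = gamma ** (1/k) preserves the same discounting per second.

```python
# Heuristic sketch: rescale gamma so the discount horizon in seconds stays
# constant when the control frequency changes.
def rescale_gamma(gamma, old_freq, new_freq):
    return gamma ** (old_freq / new_freq)

gamma_48 = 0.99                               # discount at 48 Hz control
gamma_240 = rescale_gamma(gamma_48, 48, 240)  # going to 240 Hz control
print(round(gamma_240, 5))                    # ~0.99799, i.e. gamma increases
```

This matches the intuition above that gamma should increase with control frequency: shorter steps individually need less discounting.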
Ctrl freq is both the frequency at which observations are produced and actions are taken by the environment.
(Sim freq is the frequency at which the PyBullet step is called, normally greater than ctrl freq).
The main thing to note is that the observation contains the actions of the last 0.5 seconds, so increasing the ctrl freq will increase the size of the observation space.
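To make that scaling concrete, here is a small sketch following the ACTION_BUFFER_SIZE = ctrl_freq // 2 line discussed later in the thread; the 12-value kinematic observation and 4-value RPM action match the numbers mentioned above, while the helper itself is illustrative.

```python
# Sketch: how the 0.5 s action buffer grows the observation vector with
# the control frequency.
def obs_size(ctrl_freq, kin_obs=12, action_dim=4):
    action_buffer_size = ctrl_freq // 2   # actions from the last 0.5 s
    return kin_obs + action_buffer_size * action_dim

print(obs_size(48))    # 12 + 24 * 4 = 108
print(obs_size(240))   # 12 + 120 * 4 = 492
```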
Thank you for your reply.
Ctrl freq is both the frequency at which observations are produced and actions are taken by the environment.
My understanding aligns with this, which is great. Is it also correct to say that this PyBullet step is responsible for the actual physics simulation?
The main thing to note is that the observation contains the actions of the last .5 seconds, so increasing the ctrl freq will increase the obs space.
This corresponds to the following part in the code, right?
self.ACTION_BUFFER_SIZE = int(ctrl_freq//2)
I'm asking out of curiosity, but where did the idea of using actions from the last 0.5 seconds as observations come from? Was it from a paper or some other source?
Additionally, if I want to change the MLP network model when increasing ctrl_freq (because the action buffer in the observation becomes large), would the following setup be appropriate? Have you had any experience changing the MLP network structure in a similar situation?
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy
from stable_baselines3.common.vec_env import DummyVecEnv

# Define policy network
class CustomPolicy(ActorCriticPolicy):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs, net_arch=[256, 256])

# Make PPO model using the policy network
# (note: DummyVecEnv expects a list of env constructors, not env instances)
model = PPO(CustomPolicy,
            DummyVecEnv([lambda: train_env]),
            verbose=1)
@JacopoPan
Thank you for your comment. I have run several experiments since last week, and my conclusion so far is that feeding the actions taken in previous steps into the observation leads to unstable learning. Although I haven't fully learned the control at 240Hz yet, I plan to try various conditions in the future. If I have any further questions, I will ask.
@JacopoPan how did you calculate the ~470 reward value for a "successful" hover?