Comments (14)

JacopoPan commented on July 19, 2024

Do I just run "python learn.py" with action type as rpm?

yes

Do I need to set up a new action that does not control yaw?

no, the action will be a vector of size 4 with the desired RPMs of each motor (in fact, a plus/minus 5% range centered on the hover RPMs)
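For intuition, a minimal sketch of such a mapping (the function name and the example hover value are illustrative placeholders; the environment's own HOVER_RPM and scaling should be used in practice):

import numpy as np

def action_to_rpm(action, hover_rpm):
    # Map a policy action in [-1, 1]^4 to per-motor RPMs within +-5% of hover.
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return hover_rpm * (1.0 + 0.05 * action)

# A zero action keeps all four motors at the hover RPM (placeholder value).
print(action_to_rpm([0.0, 0.0, 0.0, 0.0], hover_rpm=16000.0))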

Do I also need to change the reward settings?

The main differences in the current HoverAviary are that the reward is always positive (instead of including negative penalties), it is based only on position (the result above also included a reward component based on velocity), and the environment does not terminate early if the quadrotor flips or flies out of bounds. It might be necessary to reintroduce some of those details.
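As an illustration only, one way those details could be reintroduced in HoverAviary's _computeReward (a sketch, not the repository's implementation: the weights, the penalty, and the state-vector indexing below are assumptions based on the BaseAviary layout):

import numpy as np

def _computeReward(self):
    # Positive shaping terms for position and velocity, plus a crash penalty.
    state = self._getDroneStateVector(0)
    pos, rpy, vel = state[0:3], state[7:10], state[10:13]
    reward = max(0.0, 2.0 - np.linalg.norm(self.TARGET_POS - pos)**2)
    reward += max(0.0, 1.0 - np.linalg.norm(vel))
    if abs(rpy[0]) > 0.4 or abs(rpy[1]) > 0.4:  # large roll/pitch, i.e. flipping
        reward -= 10.0
    return reward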

JacopoPan commented on July 19, 2024

Hi @paehal

I added back the truncation condition and trained this in ~10' (this is the current code in main)

RL.mp4

JacopoPan commented on July 19, 2024

The current version of the script gym_pybullet_drones/examples/learn.py does include re-loading the model and rendering its performance; you should be able to do what you want by modifying it (I would guess your error arises from not having initialized a PPO model with the target environment before loading the trained model, but I haven't encountered it myself).
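For reference, a minimal sketch of reloading a trained policy with an environment attached (the file path is a placeholder; PPO.load accepts an env argument, which avoids the assertion on a missing environment):

from stable_baselines3 import PPO
from gym_pybullet_drones.envs.HoverAviary import HoverAviary

env = HoverAviary()                                   # target environment
model = PPO.load("results/best_model.zip", env=env)   # placeholder path
# model.predict(...) / model.learn(...) can now be called on the loaded model.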

JacopoPan commented on July 19, 2024

The sim/pybullet frequency is the actual physics integration frequency, yes.

The idea of the action buffer is that the policy might be better guided by knowing what the controller did just before; making the buffer size proportional to the control frequency makes it depend only on wall-clock time, not on the type of controller (but it might be appropriate to change that, depending on the application).

For custom SB3 policies, I can only refer you to the relevant documentation: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html

I used different critic/actor network sizes in past SB3 versions, but the current focus of this repo is having very few dependencies and compatibility with the simplest/most stock versions of them.
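For completeness, a sketch of how different actor/critic sizes can be requested with a stock policy in recent SB3 versions (the layer sizes and train_env are placeholders, not recommendations):

from stable_baselines3 import PPO

# Separate layer sizes for the policy (pi) and value function (vf) networks.
model = PPO("MlpPolicy",
            train_env,
            policy_kwargs=dict(net_arch=dict(pi=[256, 256], vf=[512, 256])),
            verbose=1)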

JacopoPan commented on July 19, 2024

Hi @paehal, I trained stable-baselines3 PPO to do hover with just RPMs (in the plus/minus 5% range of the hover value) back in 2020 without yaw control (as it wasn't penalized in the reward). I agree it's a more difficult RL problem and that's why the base RL aviary class includes simplified action spaces for the 1D and the velocity control cases.

video-10.28.2020_09.45.37.mp4

This was a 4-layer architecture [256, 256, 256, 128] (2 layers shared, 2 separate for the value function and policy), with a 12-element input vector [position, orientation, velocity, angular velocity] mapped to 4 motor velocities (in the ±5% range around the hover RPMs), after 8 hours and ~5M time steps (48 Hz ctrl).

paehal commented on July 19, 2024

@JacopoPan

Thanks for the reply and for sharing the video. I'm glad to hear that RPM control has been stable in the past.

I would like to run a study under the same conditions as yours in the latest repository; is that possible?

Here is what I am wondering.

  1. Do I just run "python learn.py" with action type as rpm?
  2. Do I need to set up a new action that does not control yaw?
  3. Do I also need to change the reward settings?

paehal commented on July 19, 2024

the environment does not terminate early if the quadrotor flips or flies out of bounds

Let me confirm: in the latest repository, does the environment terminate if the quadrotor flips or flies out of bounds? If so, how can I change that simulation setting?

JacopoPan commented on July 19, 2024

No, it doesn't; you can add that to the

def _computeTruncated(self):

method.

(FYI, the reward achieved by a "successful" one-dimensional hover is ~470, reached in 3' on my machine; I just tried training the 3D hover, as is, for ~30' and it stopped at a reward of ~250.)
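As an illustration, a possible flip/out-of-bounds truncation (a sketch only; the thresholds are arbitrary, and the state indexing and attribute names are assumptions based on the repository's base classes):

def _computeTruncated(self):
    state = self._getDroneStateVector(0)
    x, y, z = state[0:3]
    roll, pitch = state[7], state[8]
    flipped = abs(roll) > 0.4 or abs(pitch) > 0.4           # ~23 degrees of tilt
    out_of_bounds = abs(x) > 1.5 or abs(y) > 1.5 or z > 2.0
    if flipped or out_of_bounds:
        return True
    # Keep the existing episode time limit.
    return self.step_counter / self.PYB_FREQ > self.EPISODE_LEN_SEC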

paehal commented on July 19, 2024

@JacopoPan
Thank you for your response; it was very informative. I tried training in a similar way and obtained the following results. (Although the training time was different, I believe the results are quite close to yours.)

[image: training results]

Related to this, I have a question: how can I load a trained model in a different job and save a video of its performance? Even when setting --record_video to True, the video is not being saved. Also, when I tried to load a different trained model with the following settings, targeting a model in a specified folder, an error occurred. Since I'm not familiar with stable-baselines3, I would appreciate it if you could help me identify the cause.

if resume and os.path.isfile(filename+'/best_model.zip'):
    path = filename+'/best_model.zip'
    model = PPO.load(path)
    print("Resume Model Complete")

[Error content]
python3.10/site-packages/stable_baselines3/common/base_class.py", line 422, in _setup_learn
assert self.env is not None
AssertionError

In a previous version, there was something like test_learning.py, which, when executed, allowed me to verify the behavior in a video.

paehal commented on July 19, 2024

@JacopoPan

Thank you for the quick response. I was able to understand what you were saying by carefully reading the code, and I confirmed that the evaluation runs right after training. Since I wanted to run a pretrained model without retraining it, I made some changes to the code to achieve this.

Also, this is a different question (please let me know if it's better to create a separate issue), but I believe that increasing control_freq generally improves control (e.g., hovering). So I have the following questions:

  1. Is control_freq the same as the frequency of obtaining observations?
  2. Are there any key points that need to be changed in the learning conditions when increasing control_freq? I think I probably need to increase gamma (see the sketch below), but I'd like to know if there are any other adjustments I should make.
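One common heuristic (not specific to this repository) is to scale gamma so that the effective reward horizon, roughly 1/(1-gamma) steps, stays constant in wall-clock time when the control frequency changes; a sketch:

def gamma_for_horizon(horizon_sec, ctrl_freq_hz):
    # Effective horizon of ~1/(1-gamma) steps, expressed in seconds.
    return 1.0 - 1.0 / (horizon_sec * ctrl_freq_hz)

print(gamma_for_horizon(3.3, 30))    # ~0.990, roughly the usual default at 30 Hz
print(gamma_for_horizon(3.3, 240))   # ~0.999, same wall-clock horizon at 240 Hz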

JacopoPan commented on July 19, 2024

Ctrl freq is both the frequency at which observations are produced and actions are taken by the environment.
(Sim freq is the frequency at which the PyBullet step is called, normally greater than ctrl freq).

The main thing to note is that the observation contains the actions of the last 0.5 seconds, so increasing the ctrl freq will also increase the size of the observation space.
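As a rough worked example (assuming a 4-dimensional RPM action and a buffer of ctrl_freq // 2 past actions):

# How the action buffer inflates the observation as ctrl_freq grows.
for ctrl_freq in (30, 120, 240):
    extra_dims = (ctrl_freq // 2) * 4   # buffered steps x 4 RPM values
    print(f"ctrl_freq={ctrl_freq} Hz -> {ctrl_freq // 2} buffered actions "
          f"-> {extra_dims} extra observation dimensions")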

paehal commented on July 19, 2024

Thank you for your reply.

Ctrl freq is both the frequency at which observations are produced and actions are taken by the environment.

My understanding aligns with this, which is great. Is it also correct to say that this PyBullet step is responsible for the actual physics simulation?

The main thing to note is that the observation contains the actions of the last 0.5 seconds, so increasing the ctrl freq will also increase the size of the observation space.

This corresponds to the following part in the code, right?
self.ACTION_BUFFER_SIZE = int(ctrl_freq//2)

I'm asking out of curiosity, but where did the idea of using actions from the last 0.5 seconds as observations come from? Was it from a paper or some other source?

Additionally, if I want to change the MLP network model when increasing ctrl_freq, because the action buffer becomes too large, would the following setup be appropriate? Have you had any experience with changing the MLP network structure in a similar situation?

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy
from stable_baselines3.common.vec_env import DummyVecEnv

# Define a custom policy network with a fixed architecture
class CustomPolicy(ActorCriticPolicy):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs, net_arch=[256, 256])

# Make a PPO model using the custom policy network
# (DummyVecEnv expects a list of callables returning environments)
model = PPO(CustomPolicy,
            DummyVecEnv([lambda: train_env]),
            verbose=1)

paehal commented on July 19, 2024

@JacopoPan
Thank you for your comment. I have tried several experiments since last week, and my conclusion so far is that feeding the actions taken in previous steps into the observation leads to unstable learning. Although I haven't fully learned the control at 240 Hz yet, I plan to try out various conditions in the future. If I have any further questions, I will ask.

zcase commented on July 19, 2024

@JacopoPan how did you calculate the ~470 reward for a "successful" training run, i.e. the value corresponding to a successful hover?
