
Comments (7)

JacopoPan commented on July 18, 2024

Hi @nbenave,

when you run

python gym-pybullet-drones/examples/learn.py

what you see at the end is a trained model applied to the quadrotor, i.e. line 88:

action, _states = model.predict(obs,

the resulting performance is not great because learn.py is an example script that learns over "only" 10000 steps
model.learn(total_timesteps=10000) # Typically not enough

if you want to look at those 10000 steps, you only need to change this line
env = gym.make("takeoff-aviary-v0")

to

env = gym.make("takeoff-aviary-v0", gui=True) 

however, I think you'll realize that adding the frontend and rendering can make learning prohibitively time-consuming
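
Roughly, the overall flow looks like this. This is only a sketch, not learn.py verbatim: I'm assuming stable-baselines3's A2C and that importing gym_pybullet_drones registers the aviary environments; the idea is to train on a headless env and use gui=True only to watch the result.

import gym
import gym_pybullet_drones  # assumption: importing the package registers the aviary envs
from stable_baselines3 import A2C

train_env = gym.make("takeoff-aviary-v0")              # headless: training stays fast
model = A2C("MlpPolicy", train_env, verbose=1)         # assumption: learn.py may use a different algorithm
model.learn(total_timesteps=10000)                     # "only" 10000 steps

show_env = gym.make("takeoff-aviary-v0", gui=True)     # rendering only for playback
obs = show_env.reset()
for _ in range(3 * show_env.SIM_FREQ):                 # a few seconds of simulation
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = show_env.step(action)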

in singleagent.py I used stable-baselines3's EvalCallback to save a model every time it improves performance

eval_callback = EvalCallback(eval_env,

you might want to do something similar to visualize how the agent changes during learning "offline"
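
For reference, the pattern looks roughly like this (a sketch with an assumed save path and eval_freq, not singleagent.py verbatim); the best model so far is written to disk, so you can replay the checkpoints afterwards in a gui=True environment.

from stable_baselines3.common.callbacks import EvalCallback

eval_env = gym.make("takeoff-aviary-v0")                          # separate env used only for evaluation
eval_callback = EvalCallback(eval_env,
                             best_model_save_path="./logs/",      # hypothetical path: best_model.zip lands here
                             log_path="./logs/",
                             eval_freq=1000,                      # assumption: evaluate every 1000 steps
                             deterministic=True)
model.learn(total_timesteps=10000, callback=eval_callback)

You can then load the saved best_model.zip and run it in a gui=True environment to see how the agent behaved at that point in training.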


nbenave commented on July 18, 2024

Thank you for your quick and detailed answer!

In the "Show performance" code section (lines 72-101), will the environment display the model's top performance?

I have a few short questions, if you can clarify a few things:

  1. Are 10,000 timesteps equivalent to 10 seconds of training?
  2. The reward changes from about -200 in the initial steps and can reach about -20. What is the optimal reward, and what does this numeric value represent in this environment?
  3. On line 81, the range of the for loop is range(3*env.SIM_FREQ). Can you explain why you iterate over SIM_FREQ, and why it's multiplied by 3?

Thanks again.


JacopoPan commented on July 18, 2024

Briefly

  • yes
  1. no, 10'000 calls to env.step(), i.e. 10'000/(env.SIM_FREQ/env.AGGR_PHY_STEPS) seconds of simulated time (see the worked example after this list)
  2. the reward depends on the environment/task; the one you are referring to penalizes the quadrotor for not being at the desired hover position, so it cannot be 0 (the quadrotor does not start at that position), but it gets closer to 0 the faster the quadrotor reaches that position and stays there
  3. it only means you'll be shown an arbitrary number (3*env.AGGR_PHY_STEPS) of seconds of simulation (different environments have different episode lengths)
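
To make the arithmetic concrete, here is a hypothetical worked example; SIM_FREQ=240 Hz and AGGR_PHY_STEPS=5 are assumptions, so check your own environment's values:

SIM_FREQ = 240                                    # assumption: physics stepped at 240 Hz
AGGR_PHY_STEPS = 5                                # assumption: 5 physics steps per env.step()
env_steps_per_second = SIM_FREQ / AGGR_PHY_STEPS  # 48 env.step() calls per simulated second
print(10000 / env_steps_per_second)               # ~208 s of simulated time for 10'000 training steps
print(3 * SIM_FREQ / env_steps_per_second)        # 15 s shown by the range(3*env.SIM_FREQ) loop, i.e. 3*AGGR_PHY_STEPS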


nbenave commented on July 18, 2024

Thank you again, mate.
Now it's more clear to me :)

Another question, about the multi-agent learning.
Is the training for both of the quadcopters? Is each quadcopter trained separately?
Does each of them observe simultaneously, or is there a joint observation?

Is the reward at each step related to the follower, the leader, or both of them?

Thanks!


JacopoPan commented on July 18, 2024

The MARL example in multiagent.py is based on RLlib's centralized critic examples, so yes, both agents learn; there is some postprocessing that goes into creating the observations of each agent, and each agent has its own reward signal.
The multi-agent script, in my intention, was meant as a demonstration of how a multi-agent environment can be used.
The best way to do MARL is still a bit up for debate, imho.
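
Schematically, the per-agent plumbing follows RLlib's MultiAgentEnv convention: every step returns dictionaries keyed by drone id. The sketch below is illustrative only, with placeholder observation sizes and reward values, not the repository's actual aviary code.

import numpy as np

class TwoDroneEnvSketch:                               # illustrative name, not a real class in the repo
    def step(self, action_dict):                       # e.g. {0: rpms_for_drone_0, 1: rpms_for_drone_1}
        obs = {i: np.zeros(12) for i in (0, 1)}        # placeholder: one observation per drone
        rewards = {i: -1.0 for i in (0, 1)}            # placeholder: each drone gets its own reward signal
        dones = {0: False, 1: False, "__all__": False} # "__all__" ends the episode for every agent
        infos = {i: {} for i in (0, 1)}
        return obs, rewards, dones, infos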


nbenave commented on July 18, 2024

Thanks again,
how is the reward calculated in multi-agent learning?

There's a reward for drone 0 and a reward for drone 1, but I don't understand how you calculate the overall reward.

Is there an equation for combining these two rewards into one overall reward?

I'm using TensorBoard, and the mean-reward graph displays only one value, not one per drone.


JacopoPan commented on July 18, 2024

The multi-agent aviary returns a dictionary of rewards because each agent can receive its own signal. How to use these to learn multiple critics/value functions depends on the MARL approach you are implementing (see parameter sharing vs. fully independent learning vs. centralized critic, etc.). Off the top of my head, I don't remember what value you'd see on TB.
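
As an assumption on my part (not a statement about what RLlib actually logs), one common convention for getting a single curve is to sum or average the per-agent rewards, something like:

step_rewards = {0: -1.7, 1: -2.3}                  # hypothetical per-drone rewards from one env.step()
combined_sum = sum(step_rewards.values())          # -4.0: one scalar covering both drones
combined_mean = combined_sum / len(step_rewards)   # -2.0: per-drone average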

