Comments (7)
Hi @nbenave,
when you run

python gym-pybullet-drones/examples/learn.py

what you see at the end is a trained model applied to the quadrotor, i.e. line 88 of gym-pybullet-drones/examples/learn.py (at commit c62e67a).
The resulting performance is not great because learn.py is an example script that learns over "only" 10000 steps (see line 56 in c62e67a).
If you want to watch those 10000 steps, you only need to change line 42 of gym-pybullet-drones/examples/learn.py to

env = gym.make("takeoff-aviary-v0", gui=True)

however, I think you'll realize that adding the frontend and rendering can make the learning prohibitively time consuming.
In singleagent.py I used stable-baselines3's EvalCallback to save a model every time it improves performance; you might want to do something similar to visualize "offline" how the agent changes during learning.
from gym-pybullet-drones.
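The save-on-improvement idea behind EvalCallback can be sketched in plain Python. This is a minimal analogue, not the stable-baselines3 API; `train_step`, `evaluate`, and `save` are hypothetical callables standing in for the learner, an evaluation rollout, and a checkpoint writer:

```python
# Minimal sketch of the "save the model every time evaluation improves"
# pattern that stable-baselines3's EvalCallback implements.
# All names below are illustrative placeholders, not the real API.

def train_with_checkpoints(train_step, evaluate, save, total_steps, eval_freq):
    """Train, evaluating every `eval_freq` steps and saving on improvement."""
    best_reward = float("-inf")
    saved_at = []  # steps at which a new best model was stored
    for step in range(1, total_steps + 1):
        train_step()                       # one learning update
        if step % eval_freq == 0:
            mean_reward = evaluate()       # e.g. average return over a few episodes
            if mean_reward > best_reward:  # only keep strict improvements
                best_reward = mean_reward
                save(step)
                saved_at.append(step)
    return best_reward, saved_at
```

Replaying the checkpoints stored in `saved_at` afterwards gives the "offline" view of how the agent changed during learning, without paying the rendering cost at training time.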
Thank you for your quick and detailed answer!
In the "Show performance" code section (lines 72-101), will the environment display the model's top performance?
I have a few short questions, if you can clarify a few things:
- Are 10,000 timesteps equivalent to 10 seconds of training?
- The reward changes from about -200 at the initial steps and can reach about -20. What is the optimal reward, and what does this numeric value represent in this environment?
- In line 81, the range of the for loop is range(3*env.SIM_FREQ); can you explain why it iterates over SIM_FREQ, and why multiply by 3?
Thanks again.
Briefly:
- yes
- no, 10'000 env.step()'s, i.e. 10'000/(env.SIM_FREQ/env.AGGR_PHY_STEPS) seconds
- the reward depends on the environment/task; the one you are referring to penalizes the quadrotor for not being at the desired hover position. It cannot be 0, as the quadrotor does not start at that position, but it gets smaller the more quickly the quadrotor gets there and stays there
- it only means you'll be shown an arbitrary number (3*env.AGGR_PHY_STEPS) of seconds of simulation (different environments have different episode lengths)
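The timestep-to-seconds conversion above can be made concrete with a small sketch. The numeric values (SIM_FREQ=240, AGGR_PHY_STEPS=5) are assumptions for illustration only; check the attributes of your actual environment:

```python
def steps_to_seconds(num_steps, sim_freq, aggr_phy_steps):
    """Convert environment steps to simulated seconds.

    Each env.step() advances the simulation by `aggr_phy_steps` physics
    ticks, and the physics runs at `sim_freq` Hz, so the effective
    control frequency is sim_freq / aggr_phy_steps.
    """
    control_freq = sim_freq / aggr_phy_steps
    return num_steps / control_freq

# Illustrative values only, not necessarily what learn.py uses:
# 10'000 steps at 240 Hz physics with 5 aggregated steps -> ~208.3 s
print(steps_to_seconds(10_000, sim_freq=240, aggr_phy_steps=5))
```

So 10,000 steps corresponds to far more than 10 seconds of simulated time under these assumed values, which is why the answer above is "no".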
Thank you again mate.
now it's clearer to me :)
Another question about the multi-agent learning.
Is the training for both of the quadcopters? Is each quadcopter trained separately?
Does each of them observe simultaneously, or is there a joint observation?
Is the reward at each step related to the follower, the leader, or both of them?
Thanks!
The MARL example in multiagent.py is based on the centralized-critic examples of RLlib, so yes, both agents learn; there is some postprocessing that goes into creating the observations of each agent, and each agent has its own reward signal.
The multi-agent script was intended as a demonstration of how a multi-agent environment can be used.
The best way to do MARL is still a bit up for debate, imho.
Thanks again,
how is the reward calculated in multi-agent learning?
There's a reward for drone 0 and a reward for drone 1, but I don't understand how you calculate the overall reward.
Is there an equation for combining these two rewards into one overall reward?
I'm using TensorBoard and the mean-reward graph displays only one value, not one per drone.
The multi-agent aviary returns a dictionary of rewards because each agent can receive its own signal. How to use these to learn multiple critics/value functions depends on the MARL approach you are implementing (see parameter sharing vs. fully independent learning vs. centralized critic, etc.). Off the top of my head, I don't remember what value you'd see on TB.
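The per-agent reward dictionary, and two generic ways of collapsing it into the single scalar a logger might plot, can be sketched as follows. The aggregation modes are common MARL conventions, not necessarily what RLlib's TensorBoard curve actually reports:

```python
# Hedged sketch: a multi-agent env step returns one reward per agent,
# e.g. {0: r0, 1: r1}; how (and whether) to aggregate them depends on
# the MARL approach being implemented.

def aggregate_rewards(reward_dict, mode="sum"):
    """Collapse a per-agent reward dict into one scalar."""
    values = list(reward_dict.values())
    if mode == "sum":   # fully cooperative: shared team reward
        return sum(values)
    if mode == "mean":  # average per-agent reward
        return sum(values) / len(values)
    raise ValueError(f"unknown mode: {mode}")

step_rewards = {0: -1.5, 1: -0.5}               # illustrative values
print(aggregate_rewards(step_rewards, "sum"))   # -> -2.0
print(aggregate_rewards(step_rewards, "mean"))  # -> -1.0
```

Under fully independent learning, by contrast, each agent would consume only its own entry of the dictionary and no aggregation would happen at all.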