
Comments (8)

JacopoPan commented on July 19, 2024

Hi @paehal

w.r.t. 1., I don't think that's the issue, but you might double-check by following the SB3 instructions to visualize/evaluate the default models.
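For example, evaluating the saved model along the lines of the SB3 docs could look like this (the save path and the HoverAviary import are assumptions, adjust them to your setup and repo version):

    # Minimal evaluation sketch (paths and env class are assumptions).
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from gym_pybullet_drones.envs.HoverAviary import HoverAviary

    eval_env = HoverAviary(gui=True)                      # GUI on, so you can watch the rollout
    model = PPO.load("results/best_model.zip")            # hypothetical save path

    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
    print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")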

w.r.t. 2., I am a bit confused: if the evaluated policy scores the same as it did during training, that tends to rule out 1. You also say that you obtain twice the reward of the single-agent case, which leads you to believe the learning should be successful, but then that the same problem (what problem?) "also occurs in the single agent scenario".

In general, do not assume that a high reward necessarily means the system is behaving how you desire; RL is known to "game" the simulation. Are you sure that the high reward you see can be achieved if and only if the drones move as/where you want?

Observation length and control frequency CAN affect learning and control performance, but they should not be "breaking" anything (note, however, that the number of steps per episode is proportional to the control frequency, so it changes how many times you collect the reward and thus its value per episode).
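As a back-of-the-envelope example of that last point (the 8 s episode length is just an assumption):

    # How the number of reward terms per episode scales with the control frequency,
    # assuming one reward per control step and an 8 s episode.
    EPISODE_LEN_SEC = 8
    for ctrl_freq in (30, 80):
        print(f"ctrl_freq={ctrl_freq:3d} Hz -> {EPISODE_LEN_SEC * ctrl_freq} reward terms per episode")
    # 240 terms at 30 Hz vs 640 at 80 Hz, so per-episode returns are not directly comparable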


JacopoPan commented on July 19, 2024

I would do the modification inside PPO, creating multiple independent networks that operate on different parts of the obs and act vectors of the environment, but yes, it requires understanding the SB3 implementation to a certain degree of depth.
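As a toy illustration of the slice/recombine step only (all dimensions are made up):

    import numpy as np

    obs = np.arange(24).reshape(2, 12)        # one 12-dim observation row per drone
    obs_0, obs_1 = obs[0], obs[1]             # each network only sees its own row

    act_0 = np.zeros(4)                       # action predicted by network 0
    act_1 = np.ones(4)                        # action predicted by network 1
    action = np.stack([act_0, act_1])         # recombined into the shape env.step() expects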


paehal commented on July 19, 2024

I would like to provide some corrections and additional details related to my previous post about the multiagent experiment. To state the conclusion first, it appears that the problem I encountered with the multiagent setup also occurs in the single agent scenario.

An important detail I initially omitted is that I set ctrl_freq not to 30, but to 80. This might be a significant contributing factor to the issue. Could this change in ctrl_freq be affecting functions like sync? Any insights on this would be highly appreciated.


paehal commented on July 19, 2024

Apologies for the delayed response, and thank you for your answer. I have figured out the cause of the issue: I was setting the target location for the drone movement in the __init__ of HoverAviary.py, but I was not aware that __init__ only runs once when each parallel environment is created. I had assumed that this target location would be updated on every reset, which seems to be why the learning was not successful. I apologize for any inconvenience caused.
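In case it helps others, a minimal sketch of resampling the target on every reset instead of only in __init__ could look like this (TARGET_POS, the gymnasium-style reset signature, and the sampling range are assumptions based on HoverAviary):

    import numpy as np
    from gym_pybullet_drones.envs.HoverAviary import HoverAviary

    class RandomTargetHoverAviary(HoverAviary):
        def reset(self, seed=None, options=None):
            # Draw a new target before each episode, so every rollout
            # (in every parallel environment) trains toward a different point.
            self.TARGET_POS = np.array([np.random.uniform(-1.0, 1.0),
                                        np.random.uniform(-1.0, 1.0),
                                        np.random.uniform(0.5, 1.5)])
            return super().reset(seed=seed, options=options)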

I have two additional questions related to this multiagent simulation. If you know, could you please enlighten me?

  1. When conducting a multiagent simulation, I understand that each agent is trained based on the obs set in the _computeObs function. I want to separate the observational information for each agent. How can I implement this? Specifically, I want to set it up so that the drone of agent No.0 cannot obtain the position information of agent No.1's drone, and vice versa for agent No.1's drone.

  2. Is there a way to display the trajectory of the drones when checking their behavior visually with gui=True during training?

I appreciate your help and look forward to your response.


JacopoPan commented on July 19, 2024

w.r.t. 1, I think the easiest way is to modify the desired DRL agent in SB3 to have a collection of actor and critic networks (a pair for each agent), and simply slice the observation when training and recombine the actions when predicting/testing (effectively, you have N independent RL problems and agents, but note that the environment of each is no longer stationary).
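A rough sketch of the same slicing idea, done from the environment side rather than inside PPO (the class name, the (NUM_DRONES, dim) space shapes, and the scalar shared reward are assumptions, not the library's API):

    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    class SingleAgentView(gym.Env):
        """Expose drone `idx` of a 2-drone environment as an independent single-agent env."""

        def __init__(self, multi_env, idx, other_policy=None):
            self.multi_env = multi_env
            self.idx = idx
            self.other_policy = other_policy                   # frozen policy for the other drone
            obs_dim = multi_env.observation_space.shape[1]     # assuming shape (NUM_DRONES, obs_dim)
            act_dim = multi_env.action_space.shape[1]          # assuming shape (NUM_DRONES, act_dim)
            self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(act_dim,), dtype=np.float32)
            self._last_obs = None

        def reset(self, seed=None, options=None):
            obs, info = self.multi_env.reset(seed=seed, options=options)
            self._last_obs = obs
            return obs[self.idx].astype(np.float32), info      # this drone only sees its own row

        def step(self, action):
            other = 1 - self.idx
            full_action = np.zeros(self.multi_env.action_space.shape, dtype=np.float32)
            full_action[self.idx] = action
            if self.other_policy is not None:                  # the other drone acts with its own (fixed) policy
                full_action[other], _ = self.other_policy.predict(self._last_obs[other], deterministic=True)
            obs, reward, terminated, truncated, info = self.multi_env.step(full_action)
            self._last_obs = obs
            # reward is assumed to be a single shared scalar; slice it instead if it is per-agent
            return obs[self.idx].astype(np.float32), float(reward), terminated, truncated, info

Training would then alternate short learning rounds between two independent PPO models, each wrapped in its own SingleAgentView, which is where the non-stationarity mentioned above shows up.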

w.r.t. 2, you should be able to simply force the GUI for the training environment (e.g., by changing the defaults in the constructors), but it would lead to incredibly slow training; I am not sure it will work too well, especially with multiple agents.
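For instance, something along these lines (MultiHoverAviary and its constructor arguments are assumptions, match them to the actual multi-agent example in the repo):

    from stable_baselines3 import PPO
    from gym_pybullet_drones.envs.MultiHoverAviary import MultiHoverAviary

    train_env = MultiHoverAviary(num_drones=2, gui=True)   # GUI forced on during training: very slow
    model = PPO('MlpPolicy', train_env, verbose=1)
    model.learn(total_timesteps=10_000)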


paehal commented on July 19, 2024

> w.r.t. 1, I think the easiest way is to modify the desired DRL agent in SB3 to have a collection of actor and critic networks (a pair for each agent), and simply slice the observation when training and recombine the actions when predicting/testing (effectively, you have N independent RL problems and agents, but note that the environment of each is no longer stationary).

Thank you for your response. Are you suggesting that we should set up multiple models and train each of them, as in the snippet below? As I asked earlier, my understanding is that in a multi-agent simulation the same policy model is used for all agents. Therefore, if we set up a different model for each agent, does that mean we need to train each of them separately? In any case, this seems like a fairly complex modification, doesn't it?


    # For agent 0
    model_0 = PPO('MlpPolicy',
                  train_env,
                  # tensorboard_log=filename+'/tb/',
                  verbose=1,
                  batch_size=custom_batch_size,
                  **custom_learning_params)

    # For agent 1
    model_1 = PPO('MlpPolicy',
                  train_env,
                  # tensorboard_log=filename+'/tb/',
                  verbose=1,
                  batch_size=custom_batch_size,
                  **custom_learning_params)

    model_0.learn(total_timesteps=int(1e7) if local else int(1e2))
    model_1.learn(total_timesteps=int(1e7) if local else int(1e2))


paehal commented on July 19, 2024

Thanks, I'll ask the experts on the stable-baselines3 GitHub.


paehal commented on July 19, 2024

I apologize for any confusion on my part, but I would like to clarify one thing: the _computeObs function returns the states of all agents, but does each agent make decisions based solely on its own information? Until now, I had assumed that each agent outputs actions based on the information of all agents.

