
Comments (8)

JacopoPan commented on July 19, 2024

Hi @paehal

w.r.t. 1., I don't think that's the issue, but you might double-check by following the SB3 instructions to visualize/evaluate the default models.
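For example, evaluating the saved model along the lines of the SB3 docs could look like this (the save path and the HoverAviary import are assumptions, adjust them to your setup and repo version):

    # Minimal evaluation sketch (paths and env class are assumptions).
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from gym_pybullet_drones.envs.HoverAviary import HoverAviary

    eval_env = HoverAviary(gui=True)                      # GUI on, so you can watch the rollout
    model = PPO.load("results/best_model.zip")            # hypothetical save path

    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
    print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")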

w.r.t. 2., I am a bit confused: if the evaluated policy scores the same as it did during training, that tends to rule out 1. You also say that you obtain twice the reward of the single-agent case, which leads you to believe the learning should be successful, but then that the same problem (what problem?) "also occurs in the single agent scenario".

In general, do not assume that a high reward necessarily means the system is behaving how you desire; RL is known to "game" the simulation. Are you sure that the high reward you see can be achieved if and only if the drones move as/where you want?

Observation length and control frequency CAN affect learning and control performance, but they should not be "breaking" anything (note, however, that the number of steps per episode is proportional to the control frequency, so it changes how many times you collect the reward and thus its value per episode).
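As a back-of-the-envelope example of that last point (the 8 s episode length is just an assumption):

    # How the number of reward terms per episode scales with the control frequency,
    # assuming one reward per control step and an 8 s episode.
    EPISODE_LEN_SEC = 8
    for ctrl_freq in (30, 80):
        print(f"ctrl_freq={ctrl_freq:3d} Hz -> {EPISODE_LEN_SEC * ctrl_freq} reward terms per episode")
    # 240 terms at 30 Hz vs 640 at 80 Hz, so per-episode returns are not directly comparable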


JacopoPan commented on July 19, 2024

I would do the modification inside PPO, creating multiple independent networks that operate on different parts of the obs and act vectors of the environment, but yes, it requires understanding the SB3 implementation to a certain degree of depth.
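As a toy illustration of the slice/recombine step only (all dimensions are made up):

    import numpy as np

    obs = np.arange(24).reshape(2, 12)        # one 12-dim observation row per drone
    obs_0, obs_1 = obs[0], obs[1]             # each network only sees its own row

    act_0 = np.zeros(4)                       # action predicted by network 0
    act_1 = np.ones(4)                        # action predicted by network 1
    action = np.stack([act_0, act_1])         # recombined into the shape env.step() expects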


paehal commented on July 19, 2024

I would like to provide some corrections and additional details related to my previous post about the multiagent experiment. To state the conclusion first, it appears that the problem I encountered with the multiagent setup also occurs in the single agent scenario.

An important detail I initially omitted is that I set ctrl_freq not to 30, but to 80. This might be a significant contributing factor to the issue. Could this change in ctrl_freq be affecting functions like sync? Any insights on this would be highly appreciated.


paehal commented on July 19, 2024

Apologies for the delayed response, and thank you for your answer. I have figured out the cause of the issue: I was setting the target location for the drone movement in the __init__ of HoverAviary.py, but I was not aware that __init__ only runs once when each parallel environment is created. I had assumed that this target location would be updated on every reset, which seems to be why the learning was not successful. I apologize for any inconvenience caused.
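In case it helps others, a minimal sketch of resampling the target on every reset instead of only in __init__ could look like this (TARGET_POS, the gymnasium-style reset signature, and the sampling range are assumptions based on HoverAviary):

    import numpy as np
    from gym_pybullet_drones.envs.HoverAviary import HoverAviary

    class RandomTargetHoverAviary(HoverAviary):
        def reset(self, seed=None, options=None):
            # Draw a new target before each episode, so every rollout
            # (in every parallel environment) trains toward a different point.
            self.TARGET_POS = np.array([np.random.uniform(-1.0, 1.0),
                                        np.random.uniform(-1.0, 1.0),
                                        np.random.uniform(0.5, 1.5)])
            return super().reset(seed=seed, options=options)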

I have two additional questions related to this multiagent simulation. If you know, could you please enlighten me?

  1. When conducting a multiagent simulation, I understand that each agent is trained based on the obs set in the _computeObs function. I want to separate the observational information for each agent. How can I implement this? Specifically, I want to set it up so that the drone of agent No.0 cannot obtain the position information of agent No.1's drone, and vice versa for agent No.1's drone.

  2. Is there a way to display the trajectory of the drones when checking their behavior visually with gui=True during training?

I appreciate your help and look forward to your response.


JacopoPan commented on July 19, 2024

w.r.t. 1, I think the easiest way is to modify the desired DRL agent in SB3 to have a collection of actor and critic networks (a pair for each agent), and simply slice the observation when training and recombine the actions when predicting/testing (effectively, you have N independent RL problems and agents, but note that the environment of each is no longer stationary).
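A rough sketch of the same slicing idea, done from the environment side rather than inside PPO (the class name, the (NUM_DRONES, dim) space shapes, and the scalar shared reward are assumptions, not the library's API):

    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    class SingleAgentView(gym.Env):
        """Expose drone `idx` of a 2-drone environment as an independent single-agent env."""

        def __init__(self, multi_env, idx, other_policy=None):
            self.multi_env = multi_env
            self.idx = idx
            self.other_policy = other_policy                   # frozen policy for the other drone
            obs_dim = multi_env.observation_space.shape[1]     # assuming shape (NUM_DRONES, obs_dim)
            act_dim = multi_env.action_space.shape[1]          # assuming shape (NUM_DRONES, act_dim)
            self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(act_dim,), dtype=np.float32)
            self._last_obs = None

        def reset(self, seed=None, options=None):
            obs, info = self.multi_env.reset(seed=seed, options=options)
            self._last_obs = obs
            return obs[self.idx].astype(np.float32), info      # this drone only sees its own row

        def step(self, action):
            other = 1 - self.idx
            full_action = np.zeros(self.multi_env.action_space.shape, dtype=np.float32)
            full_action[self.idx] = action
            if self.other_policy is not None:                  # the other drone acts with its own (fixed) policy
                full_action[other], _ = self.other_policy.predict(self._last_obs[other], deterministic=True)
            obs, reward, terminated, truncated, info = self.multi_env.step(full_action)
            self._last_obs = obs
            # reward is assumed to be a single shared scalar; slice it instead if it is per-agent
            return obs[self.idx].astype(np.float32), float(reward), terminated, truncated, info

Training would then alternate short learning rounds between two independent PPO models, each wrapped in its own SingleAgentView, which is where the non-stationarity mentioned above shows up.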

w.r.t. 2, you should be able to simply force the GUI for the training environment (e.g., by changing the defaults in the constructors), but it would lead to incredibly slow training; I am not sure it will work too well, especially with multiple agents.
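For instance, something along these lines (MultiHoverAviary and its constructor arguments are assumptions, match them to the actual multi-agent example in the repo):

    from stable_baselines3 import PPO
    from gym_pybullet_drones.envs.MultiHoverAviary import MultiHoverAviary

    train_env = MultiHoverAviary(num_drones=2, gui=True)   # GUI forced on during training: very slow
    model = PPO('MlpPolicy', train_env, verbose=1)
    model.learn(total_timesteps=10_000)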


paehal commented on July 19, 2024

> w.r.t. 1, I think the easiest way is to modify the desired DRL agent in SB3 to have a collection of actor and critic networks (a pair for each agent), and simply slice the observation when training and recombine the actions when predicting/testing (effectively, you have N independent RL problems and agents, but note that the environment of each is no longer stationary).

Thank you for your response. Are you suggesting that we should set up multiple models and train each of them, as in the snippet below? As I asked earlier, my understanding is that in a multi-agent simulation the same policy model is used for all agents. Therefore, if we set up a different model for each agent, does that mean we need to train each of them separately? In any case, this seems like a fairly complex modification, doesn't it?


    # For agent 0
    model_0 = PPO('MlpPolicy',
                  train_env,
                  # tensorboard_log=filename+'/tb/',
                  verbose=1,
                  batch_size=custom_batch_size,
                  **custom_learning_params)

    # For agent 1
    model_1 = PPO('MlpPolicy',
                  train_env,
                  # tensorboard_log=filename+'/tb/',
                  verbose=1,
                  batch_size=custom_batch_size,
                  **custom_learning_params)

    model_0.learn(total_timesteps=int(1e7) if local else int(1e2))
    model_1.learn(total_timesteps=int(1e7) if local else int(1e2))


paehal commented on July 19, 2024

Thanks, I'll ask the experts on the stable-baselines3 GitHub.


paehal commented on July 19, 2024

I apologize for any confusion on my part, but I would like to clarify one thing: the _computeObs function returns the states of all agents, but does each agent make decisions based solely on its own information? Until now, I had assumed that each agent outputs actions based on the information of all agents.

