
Comments (4)

JacopoPan commented on July 18, 2024

The agent is so greedy smh - it only wants one thing.

lol

So now, instead of monitoring the average reward ep_rew_mean, which has some unknown value at the converged solution, you could just look at the length of the episode, because the agent would only have been able to solve the stabilisation problem if it could keep the drone at the set-point for the entire duration of the episode.

Yes, for stabilization, e.g. in the classic CartPole environment, one way to do this is to give a positive reward at each timestep (regardless of state) and to have a done condition that terminates the episode as soon as the robot leaves the desired position.
As you pointed out, the agent will then try to stay in the episode longer to collect more reward.
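To make that concrete, a minimal sketch of such a reward/termination pair (with made-up names and thresholds, not the actual implementation in this repo) could look like this:

```python
import numpy as np

# Illustrative sketch only (hypothetical names, not this repo's API):
# a constant "survival" reward plus a done condition that ends the
# episode as soon as the drone drifts away from the set-point, so a
# longer episode directly implies better stabilization.

TARGET_POS = np.array([0.0, 0.0, 1.0])  # assumed hover set-point [m]
MAX_DIST = 0.2                          # assumed tolerance radius [m]

def compute_reward(drone_pos: np.ndarray) -> float:
    # +1 at every timestep, regardless of the state.
    return 1.0

def compute_done(drone_pos: np.ndarray, step: int, max_steps: int) -> bool:
    # Terminate early if the drone leaves the tolerance region around
    # the set-point, otherwise run until the episode length limit.
    out_of_bounds = np.linalg.norm(drone_pos - TARGET_POS) > MAX_DIST
    return out_of_bounds or step >= max_steps
```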

Does an already trained model still use the reward function at all, or is the specific problem the agent was trained to solve something internal to the model (i.e. passing a set-point to the reward function does nothing for an already trained model)?

Short answer, no.

One path you might want to try (it requires some work): what about training an RL agent on the stabilization task first; then plugging that policy into the action preprocessing; and finally training a second agent that builds on the previously trained one but is allowed to apply small perturbations to its actions in order to navigate waypoints/reach a desired position?
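Very roughly, that second step could be a wrapper that adds the new agent's (clipped) action on top of the frozen stabilization policy's action; the names below are hypothetical and this is only a sketch of the idea, not code from this repo:

```python
import gym
import numpy as np

# Sketch of a "residual" action wrapper (hypothetical, assumes the old
# gym 4-tuple step API): the frozen, pre-trained stabilization policy
# produces the base action, and the agent being trained only learns a
# small correction on top of it to reach waypoints.

class ResidualActionWrapper(gym.Wrapper):
    def __init__(self, env, base_policy, max_perturbation=0.05):
        super().__init__(env)
        self.base_policy = base_policy        # e.g. a frozen SB3 model
        self.max_perturbation = max_perturbation
        self._last_obs = None
        # Note: for real training the wrapper's action_space should also
        # be redefined to the perturbation range; omitted here for brevity.

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, residual_action):
        # Base action from the frozen stabilization policy.
        base_action, _ = self.base_policy.predict(self._last_obs, deterministic=True)
        # The new agent only contributes a small, clipped perturbation.
        perturbation = np.clip(residual_action, -self.max_perturbation, self.max_perturbation)
        obs, reward, done, info = self.env.step(base_action + perturbation)
        self._last_obs = obs
        return obs, reward, done, info
```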

I'm glad you are having fun with this repo, and I want to re-state that having people try the things you are trying to do is exactly why it exists.


alchemi5t commented on July 18, 2024

@GM-Whooshi Just reading this thread gave me so much more knowledge about the codebase than the readme did (not to put the readme down, that helped a lot too). I really think this should be part of an examples readme, if it isn't already! I truly appreciate the detail you've gone into in this issue!

Getting back to the question: did you figure out how to train a model to follow a specific trajectory? If yes, how did you create the trajectory, and did you use the Euclidean norm for the reward function?
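By the Euclidean norm I mean something along these lines (just a sketch with made-up helper names and numbers):

```python
import numpy as np

# Sketch of a Euclidean-norm (distance-based) reward for trajectory
# tracking; names and values are made up for illustration.

def trajectory_reward(drone_pos: np.ndarray, target_pos: np.ndarray) -> float:
    # Negative distance to the current reference point, so the reward
    # is maximal (0) exactly on the trajectory.
    return -float(np.linalg.norm(drone_pos - target_pos))

def circle_waypoint(step: int, num_steps: int, radius: float = 0.3, height: float = 1.0) -> np.ndarray:
    # Example reference trajectory: one circle per episode at a fixed height.
    theta = 2.0 * np.pi * step / num_steps
    return np.array([radius * np.cos(theta), radius * np.sin(theta), height])
```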


ngurnard commented on July 18, 2024

I am also curious as to what @alchemi5t asked!


HimGautam commented on July 18, 2024

@GM-Whooshi, what initialization and activation functions have you used in your neural networks?

