
Comments (4)

JacopoPan commented on July 18, 2024

The agent is so greedy smh - it only wants one thing.

lol

So now, instead of monitoring the average reward ep_rew_mean, which has some unknown value at the converged solution, you could just look at the length of the episode, because the agent would only have been able to solve the stabilisation problem if it could keep the drone at the set-point for the entire duration of the episode.

Yes, for stabilization, e.g. in the classic CartPole environment, one way to do this is to give a positive reward at each timestep (regardless of state) and to have a done condition that terminates the episode as soon as the robot leaves the desired position.
As you pointed out, the agent will then try to stay in the episode longer to collect more reward.
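To make that concrete, a minimal sketch of such a reward/termination pair (with made-up names and thresholds, not the actual implementation in this repo) could look like this:

```python
import numpy as np

# Illustrative sketch only (hypothetical names, not this repo's API):
# a constant "survival" reward plus a done condition that ends the
# episode as soon as the drone drifts away from the set-point, so a
# longer episode directly implies better stabilization.

TARGET_POS = np.array([0.0, 0.0, 1.0])  # assumed hover set-point [m]
MAX_DIST = 0.2                          # assumed tolerance radius [m]

def compute_reward(drone_pos: np.ndarray) -> float:
    # +1 at every timestep, regardless of the state.
    return 1.0

def compute_done(drone_pos: np.ndarray, step: int, max_steps: int) -> bool:
    # Terminate early if the drone leaves the tolerance region around
    # the set-point, otherwise run until the episode length limit.
    out_of_bounds = np.linalg.norm(drone_pos - TARGET_POS) > MAX_DIST
    return out_of_bounds or step >= max_steps
```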

Does an already trained model still use the reward function at all, or is the specific problem the agent was trained to solve something internal to the model (i.e. passing a set-point to the reward function does nothing for an already trained model)?

Short answer, no.

One path you might want to try (it requires some work): what about training an RL agent on the stabilization task first; then plugging that policy into the action preprocessing; and finally training a second agent that builds on the previously trained one but is allowed to apply small perturbations to its actions in order to navigate waypoints/reach a desired position?
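Very roughly, that second step could be a wrapper that adds the new agent's (clipped) action on top of the frozen stabilization policy's action; the names below are hypothetical and this is only a sketch of the idea, not code from this repo:

```python
import gym
import numpy as np

# Sketch of a "residual" action wrapper (hypothetical, assumes the old
# gym 4-tuple step API): the frozen, pre-trained stabilization policy
# produces the base action, and the agent being trained only learns a
# small correction on top of it to reach waypoints.

class ResidualActionWrapper(gym.Wrapper):
    def __init__(self, env, base_policy, max_perturbation=0.05):
        super().__init__(env)
        self.base_policy = base_policy        # e.g. a frozen SB3 model
        self.max_perturbation = max_perturbation
        self._last_obs = None
        # Note: for real training the wrapper's action_space should also
        # be redefined to the perturbation range; omitted here for brevity.

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, residual_action):
        # Base action from the frozen stabilization policy.
        base_action, _ = self.base_policy.predict(self._last_obs, deterministic=True)
        # The new agent only contributes a small, clipped perturbation.
        perturbation = np.clip(residual_action, -self.max_perturbation, self.max_perturbation)
        obs, reward, done, info = self.env.step(base_action + perturbation)
        self._last_obs = obs
        return obs, reward, done, info
```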

I'm glad you are having fun with this repo, and I want to re-state that having people try the things you are trying to do is exactly why it exists.


alchemi5t commented on July 18, 2024

@GM-Whooshi Just reading this thread gave me so much more knowledge about the codebase than the readme did (not to put the readme down, that helped a lot too). I really think this should be part of an examples readme, if it isn't already! I truly appreciate the detail you've gone into in this issue!

Getting back to the question: did you figure out how to train a model to follow a specific trajectory? If yes, how did you create the trajectory, and did you use the Euclidean norm for the reward function?
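By the Euclidean norm I mean something along these lines (just a sketch with made-up helper names and numbers):

```python
import numpy as np

# Sketch of a Euclidean-norm (distance-based) reward for trajectory
# tracking; names and values are made up for illustration.

def trajectory_reward(drone_pos: np.ndarray, target_pos: np.ndarray) -> float:
    # Negative distance to the current reference point, so the reward
    # is maximal (0) exactly on the trajectory.
    return -float(np.linalg.norm(drone_pos - target_pos))

def circle_waypoint(step: int, num_steps: int, radius: float = 0.3, height: float = 1.0) -> np.ndarray:
    # Example reference trajectory: one circle per episode at a fixed height.
    theta = 2.0 * np.pi * step / num_steps
    return np.array([radius * np.cos(theta), radius * np.sin(theta), height])
```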


ngurnard commented on July 18, 2024

I am also curious as to what @alchemi5t asked!


HimGautam commented on July 18, 2024

@GM-Whooshi, what initialization and activation functions have you used in your neural networks?

