aidudezzz / deepbots
A wrapper framework for Reinforcement Learning in the Webots robot simulator using Python 3.
Home Page: https://deepbots.readthedocs.io/
License: GNU General Public License v3.0
Dear authors!
Thanks for the great interface. I am wondering whether it is possible to run several simulations in parallel? Or maybe this is more of a question related to Webots. Doing that would speed up evaluations significantly.
Many thanks in advance!
Hi! Firstly, I would like to say thanks for the work you have done! It's very pleasant to work with the framework you created.
During development I have run into one issue and hope you can help me solve it. The problem is that I am not able to access message information in the get_reward() and is_done() methods of the Supervisor. The main motivation is that I need sensor readings from my robot to shape a reward and stop the episode (e.g. when laser scan readings show a very close obstacle).
supervisor.step([selectedAction])
returns the observations in the required way, but when I call self.get_observations()
inside either get_reward() or is_done(), I cannot read the message sent by the agent (self.handle_receiver()
returns None). I am sure I am missing some important design feature and hope you can help me with that.
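One workaround is to cache the decoded message the first time get_observations() reads it, so get_reward() and is_done() can reuse the cached value instead of calling handle_receiver() again on an already-drained queue. The sketch below illustrates the pattern only; the class and its list-based queue are stand-ins, not the real deepbots API:

```python
class CachingSupervisorSketch:
    """Stand-in for a deepbots supervisor; the real class talks to a
    Webots receiver, here a plain list plays the role of the queue."""

    def __init__(self):
        self._queue = []       # stands in for the Webots receiver queue
        self._last_obs = None  # cached observation, reused within a step

    def handle_receiver(self):
        # Return the next message if one arrived, else None
        return self._queue.pop(0) if self._queue else None

    def get_observations(self):
        # Cache the message so later calls within the same step reuse it
        message = self.handle_receiver()
        if message is not None:
            self._last_obs = message
        return self._last_obs

    def get_reward(self, action):
        # Shape the reward from the cached sensor readings instead of
        # draining the (now empty) receiver queue a second time
        obs = self._last_obs
        return -1.0 if obs is not None and float(obs[0]) < 0.1 else 0.0

    def is_done(self):
        # Same idea: stop the episode on a very close obstacle reading
        obs = self._last_obs
        return obs is not None and float(obs[0]) < 0.1
```

With this pattern, the message is consumed exactly once per step and every later accessor sees the same cached observation.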
Hi, as a newcomer to robot simulation, I am really interested in this plugin, which makes it possible to implement RL algorithms in a 3D Webots simulation environment, so I studied the package in detail. Based on the tutorial implementing the CartPole game, one of the most important aspects of deepbots is the data structure of the observation and action spaces, which includes the Box class for continuous parameters (such as the four observations in CartPole) and the Discrete class for discrete parameters (such as discrete actions). Based on this, I tried to implement an environment for my own project, a path planning project with an e-puck robot.
Firstly, the code below defines the observation space with six parameters (the x-axis position, the y-axis position, and the values of the ps0, ps1, ps6 and ps7 sensors for measuring obstacles) and the three allowed actions of the e-puck:
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                             high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                             dtype=np.float64)
# set the action space (turn left, turn right, go ahead)
self.action_space = Discrete(3)
and the references to the devices on the e-puck robot:
self.robot = self.getSelf()
# wheels
self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)
# proximity sensors
self.ps = []
for ps_name in ['ps0', 'ps1', 'ps2', 'ps3', 'ps4', 'ps5', 'ps6', 'ps7']:
    ps = self.getDevice(ps_name)
    ps.enable(self.timestep)
    self.ps.append(ps)
Additionally, in the apply_action()
function, I have added the behaviors corresponding to the three actions:
def apply_action(self, action):
    action = int(action[0])
    l_speed = 3.14  # default: go straight (action == 0)
    r_speed = 3.14
    if action == 1:    # turn right
        l_speed = 3.14
        r_speed = -3.14
    elif action == 2:  # turn left
        l_speed = -3.14
        r_speed = 3.14
    self.leftMotor.setVelocity(l_speed)
    self.rightMotor.setVelocity(r_speed)
My initial assumption was that if the PPO algorithm compiled without errors, my robot would at least be able to make random movements. Unfortunately, the e-puck robot stays at its initial coordinates and never moves even one step, despite the fact that the speed parameters of the selected action are applied to the motors at every iteration step, according to the printed output (in which the left and right speeds are returned by the getVelocity()
function):
...
new_observation: [0.05999999998225913, 0.09000000009921671, 70.17436453508148, 111.77694130888727, 87.54746205144774, 70.4848960311054]
time: 0.031017780303955078
action: [0]
action: 0
left speed: 3.14
right speed: 3.14
...
In general, the only differences between my project and CartPole are the dimensions of the state-action space (the data-structure classes) and the agent robot (I simply deployed an e-puck robot instead of the four-wheeled one). Besides, the code raises no errors throughout execution apart from this weird non-moving bug. I would be really grateful if you could help me solve it or give any advice on fixing it. Many thanks!!
I get the following error when I run pip install deepbots
Collecting deepbots
Using cached deepbots-1.0.0-py3-none-any.whl (30 kB)
Collecting gym==0.21 (from deepbots)
Using cached gym-0.21.0.tar.gz (1.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in gym setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Is it possible to resolve this? Is it possible to build deepbots from source rather than using pip?
Hello, something goes wrong when I try to use RLlib to train in a deepbots environment. I found that your group is also trying to use Ray or Stable Baselines for training. Could you give me some advice on using Ray in a deepbots environment? Thank you very much.
2021-09-24 15:06:42,723 ERROR trial_runner.py:773 -- Trial SAC_M02RobotSupervisor_f7f86_00000: Error processing event.
Traceback (most recent call last):
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 739, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 746, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/worker.py", line 1623, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::SAC.__init__() (pid=40915, ip=192.168.50.170)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 136, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 592, in __init__
super().__init__(config, logger_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trainable.py", line 103, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 146, in setup
super().setup(config)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 739, in setup
self._init(self.config, self.env_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 175, in _init
num_workers=self.config["num_workers"])
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 827, in _make_workers
logdir=self.logdir)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 111, in __init__
spaces=spaces,
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 440, in _make_worker
spaces=spaces,
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 401, in __init__
self.env = env_creator(env_context)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 1647, in <lambda>
register_env(name, lambda config: env_object(config))
/deepbots/supervisor/controllers/supervisor_emitter_receiver.py:97: DeprecationWarning: get_timestep is deprecated, use .timestep instead warn("get_timestep is deprecated, use .timestep instead"
Even though the robot's handle_emitter is supposed to accept a plain string (which is not treated as an Iterable here), the assert checks only against Iterable.
CSVSupervisorEnv's handle_emitter is also supposed to be able to send plain strings, which is not the case right now.
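A possible fix is to special-case plain strings before the Iterable assert, since in Python a str is technically Iterable but should not be CSV-joined character by character. The helper below is a sketch of the idea, not the exact deepbots code:

```python
from collections.abc import Iterable


def encode_action(action):
    # Plain strings pass through unchanged; anything else must be an
    # iterable of values that gets CSV-joined, mirroring CSVSupervisorEnv
    if isinstance(action, str):
        return action.encode("utf-8")
    assert isinstance(action, Iterable), \
        "The action object should be Iterable or a plain string"
    return ",".join(map(str, action)).encode("utf-8")
```

The same check-for-str-first pattern would apply to both the robot's and the supervisor's handle_emitter implementations.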
OpenAI Gym provides several environments to demonstrate the capabilities of RL on different problems. Deepbots' goal is to demonstrate the capabilities of RL in a 3D, high-fidelity simulator such as Webots, aiming to close the gap between software-based RL problems and real-life scenarios. Along the way, the different environments can be a perfect test bed for researchers. The great thing about OpenAI Gym is that it includes several easy-to-use examples. This project's goal is to implement some of the existing OpenAI Gym examples using deepbots in the Webots simulator.
For example, some OpenAI Gym environments have already been implemented in the deepworlds repository:
Several other OpenAI Gym environments could be replicated in Webots:
Of course, Webots includes a variety of robots that can potentially be used on a wide range of problems. Any ideas are more than welcome. However, we recommend starting with the OpenAI Gym environments, since they are already known to the community and can be solved without much research.
It is highly recommended that contributors interested in this project use stable-baselines algorithms, which are well-established algorithms for Reinforcement Learning problems. Since deepbots version v0.1.3-dev2, stable baselines are supported by the deepbots framework.
Finally, a great enhancement to the deepworlds repository would be a mechanism to run those environments easily and out of the box: a well-established infrastructure with which users can install each environment separately and run it easily. This feature would bring deepbots closer to the OpenAI Gym toolkit. Reference issue#7
Regarding the GSoC proposals: we do not expect you to include all of the environments recommended above, but rather those that interest you most (2-3 environments would be more than great) and fit your timeline. At the end of the program we expect to deliver some of those environments and the setup tool.
Feel free to post your ideas, thoughts or any disagreements.
This new class will cover use cases that, for whatever reason (e.g. performance), do not want to use the emitter-receiver loop. The emitter and receiver classes are basically joined in a RobotSupervisor/SupervisorRobot class that will handle all the functionality.
RobotSupervisor/SupervisorRobot class will act the same as the other existing classes, to guide the user in creating their environment, so a user will create their own class that inherits this one. This class will contain functionality for both the robot (applying the action, getting data from sensors) and the supervisor (gym environment methods, etc.).
Abstract methods defined in RobotSupervisor/SupervisorRobot:
Update:
Current (2020a rev1) version's reset bug affects this too, issue is resolved in latest (rev2) version.
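A sketch of what a user-defined environment might look like under this design, with robot-side and supervisor-side logic in one class. The method names are taken from elsewhere in these notes (apply_action, get_observations, get_reward, is_done); the base class here is a plain stub standing in for the combined RobotSupervisor, not the real deepbots code:

```python
class RobotSupervisorStub:
    """Stub standing in for the combined class; the real one would also
    inherit Webots Supervisor functionality."""

    def apply_action(self, action):
        raise NotImplementedError

    def get_observations(self):
        raise NotImplementedError

    def get_reward(self, action):
        raise NotImplementedError

    def is_done(self):
        raise NotImplementedError


class MyEnv(RobotSupervisorStub):
    """User environment: robot functionality (applying the action) and
    supervisor functionality (gym-style methods) live side by side."""

    def __init__(self):
        self.position = 0.0

    def apply_action(self, action):
        # robot side: drive motors (here just a scalar position update)
        self.position += 0.1 if action == 1 else -0.1

    def get_observations(self):
        return [self.position]

    def get_reward(self, action):
        # supervisor side: reward staying near the origin
        return -abs(self.position)

    def is_done(self):
        return abs(self.position) > 1.0
```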
The RobotEmitterReceiver class should inherit from the Webots Robot class if possible, similarly to the other deepbots classes which inherit from the Webots Supervisor class, so that it can access any Webots method directly.
Right now it needs this:
self.robot = Robot()
and then accesses internal methods as self.robot.getBasicTimeStep() or self.robot.step(self.timestep).
Inheriting the Robot class would make this class more consistent with the other deepbots classes.
I am trying to combine a Panda robot with a Kinect camera. The figure shows the camera definition. I tried to run the Panda demo script, but even though I attached the camera, the Webots simulator did not show an image. Could you please tell me how I can get the image from the Kinect camera and show it in the Webots simulator when using the deepbots framework?
💗💗 It's a nice job!
However, I found a tiny 👾 in supervisor_emitter_receiver.py, L-77: when the receiver gets nothing from the queue, the stale _last_message may break the supervisor's logic. In fact, deepbots-tutorials does not work because of this issue.
So I modified the function below:
def handle_receiver(self):
    if self.receiver.getQueueLength() > 0:
        string_message = self.receiver.getData().decode("utf-8")
        self._last_message = string_message.split(",")
        self.receiver.nextPacket()
    return self._last_message
👉
def handle_receiver(self):
    if self.receiver.getQueueLength() > 0:
        string_message = self.receiver.getData().decode("utf-8")
        self._last_message = string_message.split(",")
        self.receiver.nextPacket()
    else:
        self._last_message = None
    return self._last_message
then everything works smoothly for me 😃.
Make sure all deepbots variables follow the snake_case convention.
Webots 2020a rev2 will fix some issues related to resetting the Webots world without resetting the controllers (https://cyberbotics.com/doc/reference/supervisor#wb_supervisor_simulation_reset).
When the new version is released, deepbots will provide a default implementation for resetting a world, so there would be no need for the end user to implement a reset method, at least a basic one. Of course it will always be possible to override the reset method to add functionality for the use-case.
Related aidudezzz/deepworlds#8
Docker should install webots, deepbots with its generic requirements and then whatever requirements a specific example has, instead of installing specific example requirements by default.
deepbots/deepbots/supervisor/controllers/robot_supervisor.py
Lines 13 to 19 in e99d210
The reset method is described as included in the RobotSupervisor class. This information is dated and no longer valid; the default reset implementation now resides in SupervisorEnv.
deepbots/deepbots/supervisor/controllers/robot_supervisor.py
Lines 25 to 26 in e99d210
Moreover, here the docstring explains the get_default_observation method, which is no longer included in RobotSupervisor but now resides in SupervisorEnv.
Edit: It would also be nice to note in the docstring that RobotSupervisor controllers should run on a Robot node with Supervisor privileges.
Hi there, I'm deeply confused by the concrete communication timing in the emitter-receiver scheme implemented in deepbots, since in Webots it takes one basic timestep to transmit and deliver a message from emitters to receivers, which means the action
On the basis of the above insight, I find that the transitions saved for RL training in the deepbots tutorials are somewhat like
To be honest, my question may not be very clear; I would appreciate it if someone could correct me or clear up my doubt, thanks a lot!
My doubt is somewhat related to this issue
if super(Supervisor, self).step(self.timestep) == -1:
    exit()
self.apply_action(action)
return (
    self.get_observations(),
    self.get_reward(action),
    self.is_done(),
    self.get_info(),
)
In RL, it seems more natural to call apply_action() first and then Supervisor.step(). Otherwise, you will not observe the correct response to your action (it is delayed by one timestep!)
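The one-timestep delay can be illustrated with a toy model. ToyEnv and its methods are made up for illustration and are not part of deepbots; the "physics" simply integrates whatever motor command is currently set, mimicking how the simulator integrates the commands that are in effect when the world is stepped:

```python
class ToyEnv:
    """Toy stand-in for the supervisor loop: state accumulates the
    command that is set at the moment the world is stepped."""

    def __init__(self):
        self.state = 0
        self.command = 0

    def _step_world(self):
        # The simulator integrates the currently set command
        self.state += self.command

    def step_then_apply(self, action):
        # Original ordering: the world integrates the *previous* action
        self._step_world()
        self.command = action
        return self.state

    def apply_then_step(self, action):
        # Suggested ordering: the world integrates *this* action
        self.command = action
        self._step_world()
        return self.state
```

With step_then_apply, the first action has no effect on the state returned for that step; with apply_then_step, the returned state already reflects the action, matching the usual RL transition (s, a, r, s').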
When using the new reset method with the emitter-receiver scheme, supervisor receiver gets two extra messages causing two additional resets. See screenshot showing the console of the regular discrete cartpole modified to use the new reset scheme.
According to the simulationReset() method documentation, the controllers need to be restarted separately. The robot controller seems to send two extra messages after the first time the done condition causes a reset.
According to Webots user guide emitter/receivers queues are flushed when calling reset. This was bugged in earlier versions of Webots (<R2020b, more here).
So far, I've found out that if the robot controller is forced to reset using restartController() as soon as the done condition becomes True (inside the is_done() implementation), one of the two extra messages goes away.
I'm investigating further to find out where the second one comes from and when, and whether it is caused by us or it is caused by Webots.
TO DO
RELEVANT ISSUES
Link to the Webots doc: the customData field can be used to implement robot/supervisor communication without receivers/emitters.
This can be useful when the observation data gathered from the robot is big (e.g. medium/high-resolution images from a camera) and should provide much better performance than packing big observation data into structs to be sent via the emitter.
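For example, the robot controller could serialize its observation into the customData string and the supervisor could parse it back. The helpers below are hypothetical and JSON is just one possible encoding; only the serialization side is shown here, while the actual reads/writes of the customData field would go through the Webots API on each side:

```python
import json


def pack_observation(obs):
    # Hypothetical helper: serialize an observation list into a string
    # suitable for storing in the robot's customData field
    return json.dumps(obs)


def unpack_observation(data):
    # Inverse helper for the supervisor side, which reads the robot
    # node's customData field and reconstructs the observation
    return json.loads(data)
```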
This issue serves to document stuff regarding migration from gym==0.21 to gymnasium. Will update as necessary.
Initially, deepbots was developed to support Reinforcement Learning algorithms; however, we expect that it can easily be extended to support Evolutionary Algorithms. In an evolutionary algorithm, a population of agents is trained and mutated to solve a given task. At every episode the best agents are chosen and mutated in order to reach a good-enough solution.
This project is quite open. We recommend choosing an easy task such as CartPole and adapting it in an evolutionary manner. We expect a grid of different agents that try to solve the problem as the episodes pass. We are open to using any evolutionary algorithm, but we highly recommend a well-established one. Finally, we expect to integrate Evolution-Guided Policy Gradient in Reinforcement Learning as proposed at NIPS 2018.
Any questions about which evolutionary algorithms can be used, general questions or ideas are more than welcome!
Hey guys,
Pretty amazing work here. I actually found this only a day before completing my own Gym & Stable Baselines integration with Webots. But yours is a nice package format, so I actually prefer it, I guess. Alright, here's my question: are you already working on adding Stable Baselines to deepbots, or should I give it a shot?
SupervisorEmitterReceiver class implements the following step method:
def step(self, action):
    self.supervisor.step(self.timestep)
    self.handle_emitter(action)
    return (
        self.get_observations(),
        self.get_reward(action),
        self.is_done(),
        self.get_info(),
    )
self.supervisor.step(self.timestep) is a Webots method for stepping the controller and needs to be part of a conditional, e.g. from the Webots docs:
while supervisor.step(timestep) != -1:
This probably needs to change to:
if self.supervisor.step(self.timestep) == -1:
    exit()
to allow the controller to exit normally.
Hello aidudezzz/deepbots,
Thank you for sharing this amazing support repository for Webots RL simulation.
I am wondering about Sim2Real transformations, specifically within the context of using emitter-receiver configurations in DeepBots. I am seeking guidance or any helpful resources that could steer me in the right direction for this endeavor.
My current project involves a Sim2Real transformation where I am attempting to bridge the gap between the simulated environment and the real-world application using a setup based on DeepBots. I am particularly focused on understanding and implementing the emitter-receiver functionality effectively for this transition.
As a beginner in this area, any advice, tutorials, or shared experiences would be incredibly valuable to me. I am especially interested in any best practices or common pitfalls to avoid during this process.
Thank you in advance for your time and assistance. Your insights will be greatly appreciated and will surely contribute significantly to my learning journey in this exciting field.
I would like to use Webots along with stable baselines. Is that possible to do using deepbots?
Hi,
Do you know how to do randomization in Webots via Python? For example, setting the position of a Solid randomly. I didn't find a way to move an object with Python.
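A sketch of one way to do it: generate a random position in Python, then set the Solid's translation field from a Supervisor controller. The DEF name and bounds below are made up, and the Supervisor field calls are shown as comments from the Webots API as I understand it, so double-check them against the Webots Supervisor documentation:

```python
import random


def random_translation(x_range, y_range, z=0.0):
    """Pick a random (x, y, z) position inside the given bounds."""
    return [random.uniform(*x_range), random.uniform(*y_range), z]


# In a Supervisor controller (Webots API calls, shown as comments here):
# node = supervisor.getFromDef("MY_SOLID")             # DEF name is made up
# translation_field = node.getField("translation")
# translation_field.setSFVec3f(random_translation((-0.5, 0.5), (-0.5, 0.5)))
```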
Possible solution?
Get rid of self._last_message altogether.
New SupervisorCSV:
class SupervisorCSV(SupervisorEmitterReceiver):
    def __init__(
        self, emitter_name="emitter", receiver_name="receiver", time_step=None
    ):
        super(SupervisorCSV, self).__init__(
            emitter_name, receiver_name, time_step
        )

    def handle_emitter(self, action):
        assert isinstance(action, Iterable), \
            "The action object should be Iterable"
        message = (",".join(map(str, action))).encode("utf-8")
        self.emitter.send(message)

    def handle_receiver(self):
        if self.receiver.getQueueLength() > 0:
            string_message = self.receiver.getData().decode("utf-8")
            self.receiver.nextPacket()
            return string_message.split(",")
        else:
            return None
This way, handle_receiver explicitly returns None when the supervisor has not received a new message, instead of returning the last message and implying the same message was received again.