aidudezzz / deepbots
A wrapper framework for Reinforcement Learning in the Webots robot simulator using Python 3.
Home Page: https://deepbots.readthedocs.io/
License: GNU General Public License v3.0
Dear authors!
Thanks for the great interface. I am wondering whether it is possible to run several simulations in parallel? Or maybe this is more of a question related to Webots. Doing that would speed up evaluations significantly.
Many thanks in advance!
Hi! Firstly, I would like to say thanks for the work you have done! It's very pleasant to work with the framework you created.
During development I have run into one issue and hope you can help me solve it. The problem is that I am not able to access message information in the get_reward() and is_done() methods of the Supervisor. The main motivation is that I need sensor readings from my robot to shape a reward and stop the episode (e.g. when laser scan readings show a very close obstacle).
supervisor.step([selectedAction])
returns the observations in the required way, but when I call self.get_observations()
inside either get_reward() or is_done(), I cannot read the message sent by the agent (self.handle_receiver()
returns None). I am sure I am missing some important design feature and hope you can help me with that.
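One workaround is to cache the decoded message the first time get_observations() reads it, so get_reward() and is_done() can reuse the cached value instead of calling handle_receiver() again on an already-drained queue. The sketch below illustrates the pattern only; the class and its list-based queue are stand-ins, not the real deepbots API:

```python
class CachingSupervisorSketch:
    """Stand-in for a deepbots supervisor; the real class talks to a
    Webots receiver, here a plain list plays the role of the queue."""

    def __init__(self):
        self._queue = []       # stands in for the Webots receiver queue
        self._last_obs = None  # cached observation, reused within a step

    def handle_receiver(self):
        # Return the next message if one arrived, else None
        return self._queue.pop(0) if self._queue else None

    def get_observations(self):
        # Cache the message so later calls within the same step reuse it
        message = self.handle_receiver()
        if message is not None:
            self._last_obs = message
        return self._last_obs

    def get_reward(self, action):
        # Shape the reward from the cached sensor readings instead of
        # draining the (now empty) receiver queue a second time
        obs = self._last_obs
        return -1.0 if obs is not None and float(obs[0]) < 0.1 else 0.0

    def is_done(self):
        # Same idea: stop the episode on a very close obstacle reading
        obs = self._last_obs
        return obs is not None and float(obs[0]) < 0.1
```

With this pattern, the message is consumed exactly once per step and every later accessor sees the same cached observation.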
Hi, as a newcomer to robot simulation, I am really interested in this plugin, which makes it possible to implement RL algorithms in a 3D Webots simulation environment, so I studied the package in detail. Based on the tutorial implementing the CartPole game, one of the most important aspects of deepbots is the data structure of the observation and action spaces, which includes the Box class for continuous parameters (such as the four observations in CartPole) and the Discrete class for discrete parameters (such as discrete actions). Based on this, I tried to implement an environment for my own project, a path planning project with an e-puck robot.
Firstly, the code below defines the observation space with six parameters (the x-axis position, the y-axis position, and the values of the ps0, ps1, ps6 and ps7 sensors for measuring obstacles) and the three allowed actions of the e-puck:
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                             high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                             dtype=np.float64)
# set the action space (turn left, turn right, go ahead)
self.action_space = Discrete(3)
and the references to the devices on the e-puck robot:
self.robot = self.getSelf()
# wheels
self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)
# proximity sensors
self.ps = []
for ps_name in ['ps0', 'ps1', 'ps2', 'ps3', 'ps4', 'ps5', 'ps6', 'ps7']:
    ps = self.getDevice(ps_name)
    ps.enable(self.timestep)
    self.ps.append(ps)
Additionally, in the apply_action()
function, I have added the behaviors corresponding to the three actions:
def apply_action(self, action):
    action = int(action[0])
    l_speed = 3.14  # default: go straight (action == 0)
    r_speed = 3.14
    if action == 1:    # turn right
        l_speed = 3.14
        r_speed = -3.14
    elif action == 2:  # turn left
        l_speed = -3.14
        r_speed = 3.14
    self.leftMotor.setVelocity(l_speed)
    self.rightMotor.setVelocity(r_speed)
My initial assumption was that if the PPO algorithm compiled without errors, my robot would at least be able to make random movements. Unfortunately, the e-puck robot stays at its initial coordinates and never moves even one step, despite the fact that the speed parameters of the selected action are applied to the motors at every iteration step, according to the printed output (in which the left and right speeds are returned by the getVelocity()
function):
...
new_observation: [0.05999999998225913, 0.09000000009921671, 70.17436453508148, 111.77694130888727, 87.54746205144774, 70.4848960311054]
time: 0.031017780303955078
action: [0]
action: 0
left speed: 3.14
right speed: 3.14
...
In general, the only differences between my project and CartPole are the dimensions of the state-action space (the data-structure classes) and the agent robot (I simply deployed an e-puck robot instead of the four-wheeled one). Besides, the code raises no errors throughout execution apart from this weird non-moving bug. I would be really grateful if you could help me solve it or give any advice on fixing it. Many thanks!!
I get the following error when I run pip install deepbots
Collecting deepbots
Using cached deepbots-1.0.0-py3-none-any.whl (30 kB)
Collecting gym==0.21 (from deepbots)
Using cached gym-0.21.0.tar.gz (1.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in gym setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Is it possible to resolve this? Is it possible to build deepbots from source rather than using pip?
Hello, something goes wrong when I try to use RLlib to train in a deepbots environment. I found that your group is also trying to use Ray or Stable Baselines for training. Could you give me some advice on using Ray in a deepbots environment? Thank you very much.
2021-09-24 15:06:42,723 ERROR trial_runner.py:773 -- Trial SAC_M02RobotSupervisor_f7f86_00000: Error processing event.
Traceback (most recent call last):
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 739, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 746, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/worker.py", line 1623, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::SAC.__init__() (pid=40915, ip=192.168.50.170)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 136, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 592, in __init__
super().__init__(config, logger_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/tune/trainable.py", line 103, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 146, in setup
super().setup(config)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 739, in setup
self._init(self.config, self.env_creator)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 175, in _init
num_workers=self.config["num_workers"])
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 827, in _make_workers
logdir=self.logdir)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 111, in __init__
spaces=spaces,
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 440, in _make_worker
spaces=spaces,
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 401, in __init__
self.env = env_creator(env_context)
File "/home/cwj/miniconda3/envs/rllib/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 1647, in <lambda>
register_env(name, lambda config: env_object(config))
/deepbots/supervisor/controllers/supervisor_emitter_receiver.py:97: DeprecationWarning: get_timestep is deprecated, use .timestep instead warn("get_timestep is deprecated, use .timestep instead"
Even though the robot's handle_emitter is supposed to accept a plain string (which is not treated as an Iterable here), the assert checks only against Iterable.
CSVSupervisorEnv's handle_emitter is also supposed to be able to send plain strings, which is not the case right now.
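A possible fix is to special-case plain strings before the Iterable assert, since in Python a str is technically Iterable but should not be CSV-joined character by character. The helper below is a sketch of the idea, not the exact deepbots code:

```python
from collections.abc import Iterable


def encode_action(action):
    # Plain strings pass through unchanged; anything else must be an
    # iterable of values that gets CSV-joined, mirroring CSVSupervisorEnv
    if isinstance(action, str):
        return action.encode("utf-8")
    assert isinstance(action, Iterable), \
        "The action object should be Iterable or a plain string"
    return ",".join(map(str, action)).encode("utf-8")
```

The same check-for-str-first pattern would apply to both the robot's and the supervisor's handle_emitter implementations.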
OpenAI Gym provides several environments to demonstrate the capabilities of RL on different problems. Deepbots' goal is to demonstrate the capabilities of RL in a 3D, high-fidelity simulator such as Webots, aiming to close the gap between software-based RL problems and real-life scenarios. Along the way, the different environments can be a perfect test bed for researchers. The great thing about OpenAI Gym is that it includes several easy-to-use examples. This project's goal is to implement some of the existing OpenAI Gym examples using deepbots in the Webots simulator.
For example, some OpenAI Gym environments have already been implemented in the deepworlds repository:
Several other OpenAI Gym environments could be replicated in Webots:
Of course, Webots includes a variety of robots that can potentially be used on a wide range of problems. Any ideas are more than welcome. However, we recommend starting with the OpenAI Gym environments, since they are already known to the community and can be solved without much research.
It is highly recommended that contributors interested in this project use stable-baselines algorithms, which are well-established algorithms for Reinforcement Learning problems. Since deepbots version v0.1.3-dev2, stable baselines are supported by the deepbots framework.
Finally, a great enhancement to the deepworlds repository would be a mechanism to run those environments easily and out of the box: a well-established infrastructure with which users can install each environment separately and run it easily. This feature would bring deepbots closer to the OpenAI Gym toolkit. Reference issue#7
Regarding the GSoC proposals: we do not expect you to include all of the environments recommended above, but rather those that interest you most (2-3 environments would be more than great) and fit your timeline. At the end of the program we expect to deliver some of those environments and the setup tool.
Feel free to post your ideas, thoughts or any disagreements.
This new class will cover use cases that, for whatever reason (e.g. performance), do not want to use the emitter-receiver loop. The emitter and receiver classes are basically joined in a RobotSupervisor/SupervisorRobot class that will handle all the functionality.
RobotSupervisor/SupervisorRobot class will act the same as the other existing classes, to guide the user in creating their environment, so a user will create their own class that inherits this one. This class will contain functionality for both the robot (applying the action, getting data from sensors) and the supervisor (gym environment methods, etc.).
Abstract methods defined in RobotSupervisor/SupervisorRobot:
Update:
Current (2020a rev1) version's reset bug affects this too, issue is resolved in latest (rev2) version.
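A sketch of what a user-defined environment might look like under this design, with robot-side and supervisor-side logic in one class. The method names are taken from elsewhere in these notes (apply_action, get_observations, get_reward, is_done); the base class here is a plain stub standing in for the combined RobotSupervisor, not the real deepbots code:

```python
class RobotSupervisorStub:
    """Stub standing in for the combined class; the real one would also
    inherit Webots Supervisor functionality."""

    def apply_action(self, action):
        raise NotImplementedError

    def get_observations(self):
        raise NotImplementedError

    def get_reward(self, action):
        raise NotImplementedError

    def is_done(self):
        raise NotImplementedError


class MyEnv(RobotSupervisorStub):
    """User environment: robot functionality (applying the action) and
    supervisor functionality (gym-style methods) live side by side."""

    def __init__(self):
        self.position = 0.0

    def apply_action(self, action):
        # robot side: drive motors (here just a scalar position update)
        self.position += 0.1 if action == 1 else -0.1

    def get_observations(self):
        return [self.position]

    def get_reward(self, action):
        # supervisor side: reward staying near the origin
        return -abs(self.position)

    def is_done(self):
        return abs(self.position) > 1.0
```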
The RobotEmitterReceiver class should inherit from the Webots Robot class if possible, similarly to the other deepbots classes which inherit from the Webots Supervisor class, so that it can access any Webots method directly.
Right now it needs this:
self.robot = Robot()
and then accesses internal methods as self.robot.getBasicTimeStep() or self.robot.step(self.timestep).
Inheriting the Robot class would make this class more consistent with the other deepbots classes.
I am trying to combine a Panda robot with a Kinect camera. The figure shows the camera definition. I tried to run the Panda demo script, but even though I attached the camera, the Webots simulator did not show an image. Could you please tell me how I can get the image from the Kinect camera and show it in the Webots simulator when using the deepbots framework?
💗💗 It's a nice job!
However, I found a tiny 👾 in supervisor_emitter_receiver.py, L-77: when the receiver gets nothing from the queue, the stale _last_message may break the supervisor's logic. In fact, deepbots-tutorials does not work because of this issue.
So I modified the function below:
def handle_receiver(self):
    if self.receiver.getQueueLength() > 0:
        string_message = self.receiver.getData().decode("utf-8")
        self._last_message = string_message.split(",")
        self.receiver.nextPacket()
    return self._last_message
👉
def handle_receiver(self):
    if self.receiver.getQueueLength() > 0:
        string_message = self.receiver.getData().decode("utf-8")
        self._last_message = string_message.split(",")
        self.receiver.nextPacket()
    else:
        self._last_message = None
    return self._last_message
then everything works smoothly for me 😃.
Make sure all deepbots variables follow the snake_case convention.
Webots 2020a rev2 will fix some issues related to resetting the Webots world without resetting the controllers (https://cyberbotics.com/doc/reference/supervisor#wb_supervisor_simulation_reset).
When the new version is released, deepbots will provide a default implementation for resetting a world, so there would be no need for the end user to implement a reset method, at least a basic one. Of course it will always be possible to override the reset method to add functionality for the use-case.
Related aidudezzz/deepworlds#8
Docker should install webots, deepbots with its generic requirements and then whatever requirements a specific example has, instead of installing specific example requirements by default.
deepbots/deepbots/supervisor/controllers/robot_supervisor.py
Lines 13 to 19 in e99d210
The reset method is described as included in the RobotSupervisor class. This information is dated and no longer valid; the default reset implementation now resides in SupervisorEnv.
deepbots/deepbots/supervisor/controllers/robot_supervisor.py
Lines 25 to 26 in e99d210
Moreover, here the docstring explains the get_default_observation method, which is no longer included in RobotSupervisor but now resides in SupervisorEnv.
Edit: It would also be nice to note in the docstring that RobotSupervisor controllers should run on a Robot node with Supervisor privileges.
Hi there, I'm deeply confused by the concrete communication timing in the emitter-receiver scheme implemented in deepbots, since in Webots it takes one basic timestep to transmit and deliver a message from emitters to receivers, which means the action
On the basis of the above insight, I find that the transitions saved for RL training in the deepbots tutorials are somewhat like
To be honest, my question may not be very clear; I would appreciate it if someone could correct me or clear up my doubt, thanks a lot!
My doubt is somewhat related to this issue
if super(Supervisor, self).step(self.timestep) == -1:
    exit()
self.apply_action(action)
return (
    self.get_observations(),
    self.get_reward(action),
    self.is_done(),
    self.get_info(),
)
In RL, it seems more natural to call apply_action() first and then Supervisor.step(). Otherwise, you will not observe the correct response to your action (it is delayed by one timestep!)
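The one-timestep delay can be illustrated with a toy model. ToyEnv and its methods are made up for illustration and are not part of deepbots; the "physics" simply integrates whatever motor command is currently set, mimicking how the simulator integrates the commands that are in effect when the world is stepped:

```python
class ToyEnv:
    """Toy stand-in for the supervisor loop: state accumulates the
    command that is set at the moment the world is stepped."""

    def __init__(self):
        self.state = 0
        self.command = 0

    def _step_world(self):
        # The simulator integrates the currently set command
        self.state += self.command

    def step_then_apply(self, action):
        # Original ordering: the world integrates the *previous* action
        self._step_world()
        self.command = action
        return self.state

    def apply_then_step(self, action):
        # Suggested ordering: the world integrates *this* action
        self.command = action
        self._step_world()
        return self.state
```

With step_then_apply, the first action has no effect on the state returned for that step; with apply_then_step, the returned state already reflects the action, matching the usual RL transition (s, a, r, s').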
When using the new reset method with the emitter-receiver scheme, supervisor receiver gets two extra messages causing two additional resets. See screenshot showing the console of the regular discrete cartpole modified to use the new reset scheme.
According to the simulationReset() method documentation, the controllers need to be restarted separately. The robot controller seems to send two extra messages after the first time the done condition causes a reset.
According to Webots user guide emitter/receivers queues are flushed when calling reset. This was bugged in earlier versions of Webots (<R2020b, more here).
So far, I've found out that if the robot controller is forced to reset using restartController() as soon as the done condition becomes True (inside the is_done() implementation), one of the two extra messages goes away.
I'm investigating further to find out where the second one comes from and when, and whether it is caused by us or it is caused by Webots.
TO DO
RELEVANT ISSUES
Link to the Webots doc: the customData field can be used to implement robot/supervisor communication without receivers/emitters.
This can be useful when the observation data gathered from the robot is big (e.g. medium/high-resolution images from a camera) and should provide much better performance than packing big observation data into structs to be sent via the emitter.
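For example, the robot controller could serialize its observation into the customData string and the supervisor could parse it back. The helpers below are hypothetical and JSON is just one possible encoding; only the serialization side is shown here, while the actual reads/writes of the customData field would go through the Webots API on each side:

```python
import json


def pack_observation(obs):
    # Hypothetical helper: serialize an observation list into a string
    # suitable for storing in the robot's customData field
    return json.dumps(obs)


def unpack_observation(data):
    # Inverse helper for the supervisor side, which reads the robot
    # node's customData field and reconstructs the observation
    return json.loads(data)
```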
This issue serves to document stuff regarding migration from gym==0.21 to gymnasium. Will update as necessary.
Initially, deepbots was developed to support Reinforcement Learning algorithms; however, we expect that it can easily be extended to support Evolutionary Algorithms. In an evolutionary algorithm, a population of agents is trained and mutated to solve a given task. At every episode the best agents are chosen and mutated in order to reach a good-enough solution.
This project is quite open. We recommend choosing an easy task such as CartPole and adapting it in an evolutionary manner. We expect a grid of different agents that try to solve the problem as the episodes pass. We are open to using any evolutionary algorithm, but we highly recommend a well-established one. Finally, we expect to integrate Evolution-Guided Policy Gradient in Reinforcement Learning as proposed at NIPS 2018.
Any questions about which evolutionary algorithms can be used, general questions or ideas are more than welcome!
Hey guys,
Pretty amazing work here. I actually found this only a day before completing my own Gym & Stable Baselines integration with Webots. But yours is a nice package format, so I actually prefer it, I guess. Alright, here's my question: are you already working on adding Stable Baselines to deepbots, or should I give it a shot?
SupervisorEmitterReceiver class implements the following step method:
def step(self, action):
    self.supervisor.step(self.timestep)
    self.handle_emitter(action)
    return (
        self.get_observations(),
        self.get_reward(action),
        self.is_done(),
        self.get_info(),
    )
self.supervisor.step(self.timestep) is a Webots method for stepping the controller and needs to be part of a conditional, e.g. from the Webots docs:
while supervisor.step(timestep) != -1:
This probably needs to change to:
if self.supervisor.step(self.timestep) == -1:
    exit()
to allow the controller to exit normally.
Hello aidudezzz/deepbots,
Thank you for sharing this amazing support repository for Webots RL simulation.
I am wondering about Sim2Real transformations, specifically within the context of using emitter-receiver configurations in DeepBots. I am seeking guidance or any helpful resources that could steer me in the right direction for this endeavor.
My current project involves a Sim2Real transformation where I am attempting to bridge the gap between the simulated environment and the real-world application using a setup based on DeepBots. I am particularly focused on understanding and implementing the emitter-receiver functionality effectively for this transition.
As a beginner in this area, any advice, tutorials, or shared experiences would be incredibly valuable to me. I am especially interested in any best practices or common pitfalls to avoid during this process.
Thank you in advance for your time and assistance. Your insights will be greatly appreciated and will surely contribute significantly to my learning journey in this exciting field.
I would like to use Webots along with stable baselines. Is that possible to do using deepbots?
Hi,
Do you know how to do randomization in Webots via Python? For example, setting the position of a Solid randomly. I didn't find a way to move an object with Python.
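A sketch of one way to do it: generate a random position in Python, then set the Solid's translation field from a Supervisor controller. The DEF name and bounds below are made up, and the Supervisor field calls are shown as comments from the Webots API as I understand it, so double-check them against the Webots Supervisor documentation:

```python
import random


def random_translation(x_range, y_range, z=0.0):
    """Pick a random (x, y, z) position inside the given bounds."""
    return [random.uniform(*x_range), random.uniform(*y_range), z]


# In a Supervisor controller (Webots API calls, shown as comments here):
# node = supervisor.getFromDef("MY_SOLID")             # DEF name is made up
# translation_field = node.getField("translation")
# translation_field.setSFVec3f(random_translation((-0.5, 0.5), (-0.5, 0.5)))
```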
Possible solution?
Get rid of self._last_message altogether.
New SupervisorCSV:
class SupervisorCSV(SupervisorEmitterReceiver):
    def __init__(
        self, emitter_name="emitter", receiver_name="receiver", time_step=None
    ):
        super(SupervisorCSV, self).__init__(
            emitter_name, receiver_name, time_step
        )

    def handle_emitter(self, action):
        assert isinstance(action, Iterable), \
            "The action object should be Iterable"
        message = (",".join(map(str, action))).encode("utf-8")
        self.emitter.send(message)

    def handle_receiver(self):
        if self.receiver.getQueueLength() > 0:
            string_message = self.receiver.getData().decode("utf-8")
            self.receiver.nextPacket()
            return string_message.split(",")
        else:
            return None
This way, handle_receiver explicitly returns None when the supervisor has not received a new message, instead of returning the last message and implying the same message was received again.