
acmece / rl-collision-avoidance

303 stars · 7 watchers · 91 forks · 30.11 MB

Implementation of the paper "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning"

Home Page: https://arxiv.org/abs/1709.10082

Languages: Python 68.27%, C++ 30.17%, CMake 1.24%, Shell 0.32%
Topics: ppo, reinforcement-learning, collision-avoidance, ros, crowd-navigation

rl-collision-avoidance's People

Contributors: acmece

rl-collision-avoidance's Issues

Questions about cmdpose tests.py

Hello, I would like to ask what the function of cmdpose tests.py is. Can it be implemented? I would appreciate it if you could answer my question.

how to detect crash

hi,

In the main loop, the environment subscribes to the /crash topic, but I didn't find any evidence of how the crash topic is produced from the distances between the obstacles and the robots.

I am also curious: you create an environment for each robot, so how do the other agents' policies influence the one currently being trained?
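For context, in this fork the /crash signal is published by the modified stageros node (stage_ros-add_pose_and_crash) on the C++ side, and I believe it is derived from Stage's internal stall/collision state rather than from distances computed in Python. Purely as an illustration of the geometric idea, a policy-side approximation assuming circular robots with a hypothetical footprint radius would look like this:

```python
import numpy as np

def detect_crash(positions, robot_radius=0.5):
    """Return a boolean per robot: True if it overlaps another robot.

    positions: (N, 2) array of robot centers.
    robot_radius: assumed circular footprint radius (hypothetical value).
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise center-to-center distances, shape (N, N).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distance
    # Two circular robots collide when their centers are closer than 2 * radius.
    return (dist < 2.0 * robot_radius).any(axis=1)

# Robots at (0, 0) and (0.8, 0) overlap for radius 0.5; the one at (5, 5) does not.
crashed = detect_crash([[0.0, 0.0], [0.8, 0.0], [5.0, 5.0]])
# crashed -> [True, True, False]
```

The real simulator additionally accounts for static obstacles in the world file, which this sketch ignores.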

some questions during training stage 1

Hello, I have some questions about training stage 1.

  1. When I run rosrun stage_ros_add_pose_and_crash stageros -g worlds/stage1.world, the output is as follows, with no visual interface:
    [Loading worlds/stage1.world][threads 16]
    [ INFO] [1610369354.678821077]: Found 1 laser devices and 0 cameras in robot 0
    ...
    [ INFO] [1610369354.990895507]: Found 1 laser devices and 0 cameras in robot 23
  2. And there is no output at all when I run mpiexec -np 24 python2 ppo_stage1.py.
    I don't know why this happens or how to solve it.
    Thank you very much!

Segmentation fault (core dumped) problem

Hello. In order to train with two different policies, I launched 2 MPI jobs and tried to train a total of 7 agents at once.

  • $ rosrun stage_ros_add_pose_and_crash stageros -u /home/nscl/rl_ws/src/RLCA_trainning/worlds/servingbot_agent.world
  • $ mpiexec -np 3 python ppo_stage.py
  • $ mpiexec -np 4 python ppo_stage.py

In the process, a segmentation fault (core dumped) occurs.

  • Segmentation fault (core dumped)

Do you know a solution to this?

Learning proceeds from 300 episodes to 1000 episodes, but an error always occurs after that.

Query on Training the Policy

When training from scratch in stage 1, I am not able to get the policy to converge to the optimal policy, even after training for 10 hours. Is there a way to do a bit of supervised learning initially to form the basic framework for the policy, so that the DRL algorithm does not have to fly blind when training?

Thank you very much for your time.

Output from training

Hello,
Thank you very much for providing this code. A student and I have been following the training example for Stage 1, but when one of the environments reaches the max number of episodes, the code appears to enter an infinite loop, and the other environments do not continue their iterations. Is this supposed to occur? If not, what is the expected output after the number of episodes is completed?

Thanks,

Julio Godoy

Stage 2 error

Env 03, Goal (-07.0, 009.5), Episode 00000, setp 097, Reward 12.6 , Reach Goal,
Env 04, Goal (-12.5, 004.0), Episode 00000, setp 052, Reward -33.4, Crashed,
Env 00, Goal (-18.0, 011.5), Episode 00000, setp 110, Reward 13.0 , Reach Goal,
Env 01, Goal (-18.0, 009.5), Episode 00000, setp 095, Reward 12.9 , Reach Goal,
Env 05, Goal (-12.5, 017.0), Episode 00000, setp 081, Reward -28.1, Crashed,
Env 02, Goal (-07.0, 011.5), Episode 00000, setp 044, Reward -30.0, Crashed,
Traceback (most recent call last):
  File "ppo_stage2.py", line 212, in <module>
    run(comm=comm, env=env, policy=policy, policy_path=policy_path, action_bound=action_bound, optimizer=opt)
  File "ppo_stage2.py", line 120, in run
    obs_size=OBS_SIZE, act_size=ACT_SIZE)
  File "/home/balaji/rover_ws/src/rl-collision-avoidance/model/ppo.py", line 204, in ppo_update_stage2
    obss = obss.reshape((num_step*num_env, frames, obs_size))
ValueError: cannot reshape array of size 1966080 into shape (5632,3,512)
Hi, I got this error when training the second stage with mpiexec -np 44 python ppo_stage2.py. Do I need to change anything to train the model?
Also, I have finished training the first model. Can you give me an idea of how to implement the code on a real robot? If any of you have ideas, please share them here or email me at Gmail([email protected]). This would be a great help.
@Acmece @BingHan0458
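For what it's worth, the sizes in the traceback can be decoded. Assuming the hyperparameters implied by the target shape (128 steps per update, 3 stacked laser frames, observation size 512, and 44 environments, since 5632 = 128 × 44; the exact variable names may differ in the stage-2 script), the flat buffer holds data for only 10 environments, which suggests the world file, NUM_ENV, and the -np 44 rank count are inconsistent with each other:

```python
def infer_num_envs(flat_size, num_step, frames, obs_size):
    """Infer how many environments actually contributed observations.

    reshape((num_step * num_env, frames, obs_size)) only succeeds when
    flat_size == num_step * num_env * frames * obs_size.
    """
    per_env = num_step * frames * obs_size
    if flat_size % per_env != 0:
        raise ValueError("buffer is not a whole number of environments")
    return flat_size // per_env

# Values from the traceback: target shape (5632, 3, 512), with 5632 = 128 * 44.
expected = 128 * 44 * 3 * 512          # 8,650,752 elements expected
actual = 1966080                       # elements actually collected
envs_with_data = infer_num_envs(actual, num_step=128, frames=3, obs_size=512)
# envs_with_data -> 10
```

Under these assumptions, only 10 of the 44 MPI ranks produced rollout data, so checking that the .world file declares as many robots as there are ranks would be a sensible first step.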

Segmentation fault

Hello,
Thank you very much for providing this code. I have a question about the simulator: during training, Stage suddenly crashed and threw a "Segmentation fault" error. Did you encounter that? Thanks a lot.

Error compiling code

priya@ros:~/qlearning_ws$ rosrun stage_ros_add_pose_and_crash stageros -g worlds/stage1.world
[FATAL] [1585235899.273644693]: The world file worlds/stage1.world does not exist.
[FATAL] [1585235899.274600526]: BREAKPOINT HIT
file = /home/priya/qlearning_ws/src/stage_ros-add_pose_and_crash/src/stageros.cpp
line=333

Can you help me understand why this error occurs?

visualize

How do I visualize the result in circle_test.py?

A "segmentation fault" occurred while the program was running.

A segmentation fault occurred after the program had been running for a while; the stack trace is as follows.

Thread 14 "stageros" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd67fc700 (LWP 15891)]
0x00007ffff693d979 in Stg::World::Raytrace(Stg::Ray const&) () from /opt/ros/melodic/lib/libstage.so.4.3.0
(gdb) bt
#0  0x00007ffff693d979 in Stg::World::Raytrace(Stg::Ray const&) () from /opt/ros/melodic/lib/libstage.so.4.3.0
#1  0x00007ffff6933c9a in Stg::ModelRanger::Sensor::Update(Stg::ModelRanger*) () from /opt/ros/melodic/lib/libstage.so.4.3.0
#2  0x00007ffff6934b1a in Stg::ModelRanger::Update() () from /opt/ros/melodic/lib/libstage.so.4.3.0
#3  0x00007ffff69232fd in Stg::Model::UpdateWrapper(Stg::Model*, void*) () from /opt/ros/melodic/lib/libstage.so.4.3.0
#4  0x00007ffff693effe in Stg::World::ConsumeQueue(unsigned int) () from /opt/ros/melodic/lib/libstage.so.4.3.0
#5  0x00007ffff693f08e in Stg::World::update_thread_entry(std::pair<Stg::World*, int>*) () from /opt/ros/melodic/lib/libstage.so.4.3.0
#6  0x00007ffff6b706db in start_thread (arg=0x7fffd67fc700) at pthread_create.c:463
#7  0x00007ffff59b988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Query on stage1_1.pth

Hello,

Could you please explain the difference between loading the stage1_1.pth policy and loading no policy at all? Is stage1_1.pth the product of some form of supervised learning?

convergence

Sorry to bother you. I want to know how many agents you used in stage 1 when training across three PCs. Also, my rewards do not converge; how many episodes did you use? Thanks a lot.

Implementation of the code with only single robot

Thank you for this open-source implementation. I want to train this model for only one robot instead of a team of robots. Is that possible? In that case, which variables do I need to change? Is setting the variable 'NUM_ENV' to 1 enough, or will other changes be required as well?
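Judging from the scripts (each MPI rank builds a StageWorld(OBS_SIZE, index=rank, num_env=NUM_ENV), as seen in the circle_test.py traceback later on this page), three things have to agree for any robot count, single robot included. A sketch of that consistency check, with the coupling semantics assumed rather than confirmed:

```python
def check_config(num_env, mpi_ranks, robots_in_world):
    """Sanity-check the coupling between NUM_ENV, mpiexec -np, and the world file.

    Assumption: each MPI rank constructs one StageWorld(index=rank,
    num_env=NUM_ENV), so the rank count and NUM_ENV must match, and the
    .world file must declare at least that many robots.
    """
    problems = []
    if mpi_ranks != num_env:
        problems.append("mpiexec -np must equal NUM_ENV")
    if robots_in_world < num_env:
        problems.append("world file declares fewer robots than NUM_ENV")
    return problems

# Single-robot setup: NUM_ENV = 1, `mpiexec -np 1 python ppo_stage1.py`,
# and a world file with exactly one robot.
single_ok = check_config(num_env=1, mpi_ranks=1, robots_in_world=1)
# single_ok -> [] (no problems)
```

So setting NUM_ENV = 1 alone is likely insufficient; the launch command and the world file would need to match as well.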

ROS time moved backwards

Hi, I get the following error when I run mpiexec -np 50 python circle_test.py

[ERROR] [1586937246.001479, 18.100000]: ROS time moved backwards: 4165.032s
Traceback (most recent call last):
  File "circle_test.py", line 94, in <module>
    env = StageWorld(OBS_SIZE, index=rank, num_env=NUM_ENV)
  File "/code/rl-collision-avoidance/circle_world.py", line 76, in __init__
    rospy.sleep(1.)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/timer.py", line 164, in sleep
    raise rospy.exceptions.ROSTimeMovedBackwardsException(time_jump)
rospy.exceptions.ROSTimeMovedBackwardsException: ROS time moved backwards
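As background, ROSTimeMovedBackwardsException is raised when the simulated clock on /clock jumps backwards, typically because the simulator was restarted while use_sim_time is set; restarting roscore together with Stage before re-running the test usually clears it. As a mitigation, the sleep can simply be retried. A self-contained sketch (FakeTimeJump stands in for rospy's exception so the example runs without ROS; in circle_world.py one would wrap rospy.sleep(1.) instead):

```python
import time

def retry_on(exc_type, fn, attempts=5, delay=0.0):
    """Call fn(), retrying up to `attempts` times while exc_type is raised."""
    for i in range(attempts):
        try:
            return fn()
        except exc_type:
            if i == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# Stand-in for rospy.exceptions.ROSTimeMovedBackwardsException.
class FakeTimeJump(Exception):
    pass

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:          # fail twice, then succeed
        raise FakeTimeJump
    return "ok"

result = retry_on(FakeTimeJump, flaky)
# result -> "ok" after three calls
```

This only papers over the symptom; if the clock keeps resetting mid-run, fixing the simulator restart order is the real cure.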

Why trained policy is not as good as yours

Hi, I followed all your steps and trained the policy from scratch for stage 1.

I am not able to get a policy as good as yours (it still always crashes), even after training for 12 hours.

May I ask whether you used anything special to train the policy? I have tried many times but cannot get a good policy; training from scratch seems to perform very badly.

Question about ppo algorithm

Hello,
I have two questions about the implementation of PPO.

What is dist_entropy, which is used in the evaluate_action method of CNNPolicy (implemented in net.py) and in the ppo_update_stage1 function?

And I need the theoretical background for this line in ppo.py:
loss = policy_loss + 20 * value_loss - coeff_entropy * dist_entropy
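Some background on that line, sketched rather than taken from the repo's exact code (the coeff_entropy value of 5e-4 below is an assumed placeholder): dist_entropy is the entropy of the Gaussian action distribution the policy outputs, and subtracting coeff_entropy * dist_entropy is PPO's standard entropy bonus, which rewards keeping the policy stochastic and so discourages premature collapse to a deterministic policy. The factor 20 on value_loss is this repo's chosen value-function weight; the paper's formulation uses 1.0.

```python
import numpy as np

def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian policy, summed over action dimensions.

    H = sum_i 0.5 * (log(2 * pi * sigma_i^2) + 1)
    """
    log_std = np.asarray(log_std, dtype=float)
    return float(np.sum(0.5 * (np.log(2.0 * np.pi) + 2.0 * log_std + 1.0)))

def ppo_loss(policy_loss, value_loss, dist_entropy,
             value_coeff=20.0, coeff_entropy=5e-4):
    # Subtracting the entropy term lowers the loss for more stochastic
    # policies, i.e. it acts as an exploration bonus. value_coeff=20 is
    # this repo's choice; the paper weights the value loss by 1.0.
    return policy_loss + value_coeff * value_loss - coeff_entropy * dist_entropy

h = gaussian_entropy([0.0, 0.0])            # two action dims with sigma = 1
loss = ppo_loss(policy_loss=1.0, value_loss=0.5, dist_entropy=h)
```

A larger entropy makes the total loss smaller, so gradient descent trades a little expected return for continued exploration, with coeff_entropy controlling the exchange rate.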

dynamic obstacle model

Nice job! Besides, I remember that dynamic obstacles are also considered in the original paper. I would like to ask how to set up a dynamic obstacle model in the Stage simulator.

some question about code

Hello Professor,
Recently, I have been studying your paper and reproducing your code, and I have some questions as follows:

  1. After training stage 1 and stage 2, I obtained Figure 4 from your paper, but what about the other figures, such as Figure 5? And how can I obtain results such as success rate and extra time in Table II from your code, or by doing some calculations?
  2. How can I get the code for the baselines, such as SL-policy and NH-ORCA?
  3. How long did you train stage 1 and stage 2? What results in the terminal or in the GUI show that the policy has been trained well?
    I hope you can give me some advice, thank you very much!

stage package not found

CMake Error at stage_ros-add_pose_and_crash/CMakeLists.txt:17 (find_package):
By not providing "Find stage.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "stage", but
CMake did not find one.

Could not find a package configuration file provided by "stage" with any of
the following names:

stageConfig.cmake
stage-config.cmake

Environment

Hi, I encountered some environment problems while training the agent. Can you please tell me your CUDA and PyTorch versions? Thanks a lot.

How to train a new model from scratch to pass the test? Is it sufficient to train in stage 2 only?

The provided model stage2.pth works well in the test, but I am confused about how to train such a model from scratch.
Should I train in only stage 2, only stage 1, or train randomly in both stage 1 and stage 2 for some episodes (which I suspect may not converge)?

Currently I am training in stage 2 only, since this world looks complex and diverse. I have tried training for several days, but the robots still collide in the test. Is it right to train in stage 2 only?

I am working on longer training by loading the newest model. I'd appreciate any advice.

Hyperparameter values are different from the paper

Hello,
I have noticed that some hyperparameter values differ between this implementation and the paper. Is there a justification for that?
For example, the coefficient multiplied by the value loss is 1.0 in the paper, while in your implementation it is 20.0.
Also, I would like to ask whether this is the official repo for the paper or your own reimplementation of it.
Thanks, and enjoy the holidays.
