deadlock about geometry2 (closed, 33 comments)

ros commented on July 19, 2024
deadlock

from geometry2.

Comments (33)

tfoote avatar tfoote commented on July 19, 2024

Can you get the backtrace for the other threads?

tfoote commented on July 19, 2024

@bricerebsamen can you try out #93 I think that should resolve this issue.

bricerebsamen commented on July 19, 2024

will do right away. I can't reproduce it reliably, though, but I'll report back asap.

tfoote commented on July 19, 2024

Great, thanks

bricerebsamen commented on July 19, 2024

here is the full backtrace (for all threads), without #93:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff540b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff540b480 in __GI___pthread_mutex_lock (mutex=0x7ffffffefbc8) at ../nptl/pthread_mutex_lock.c:79
#3  0x00000000004311d8 in pthread_mutex_lock (m=0x7ffffffefbc8) at /usr/include/boost/thread/pthread/mutex.hpp:61
#4  boost::mutex::lock (this=0x7ffffffefbc8) at /usr/include/boost/thread/pthread/mutex.hpp:113
#5  0x000000000043124b in boost::unique_lock<boost::mutex>::lock (this=0x7ffffffeedb0) at /usr/include/boost/thread/lock_types.hpp:346
#6  0x00007ffff7211e11 in boost::unique_lock<boost::mutex>::unique_lock(boost::mutex&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#7  0x00007ffff6ebea9f in tf2::BufferCore::cancelTransformableRequest(unsigned long) () from /home/brice/code/workspaces/merged/devel/lib/libtf2.so
#8  0x0000000000441e4c in tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >::add (this=0x7fffffff00e0, evt=...) at /home/brice/code/workspaces/merged/src/driving/external/geometry_experimental/tf2_ros/include/tf2_ros/message_filter.h:381
#9  0x0000000000437f0d in operator() (a0=..., this=0x6d1d18) at /usr/include/boost/function/function_template.hpp:767
#10 message_filters::CallbackHelper1T<ros::MessageEvent<pcl::PointCloud<stdr_velodyne::PointType> const> const&, pcl::PointCloud<stdr_velodyne::PointType> >::call (this=0x6d1d10, event=..., nonconst_force_copy=<optimized out>) at /opt/ros/indigo/include/message_filters/signal1.h:76
#11 0x0000000000437a3f in message_filters::Signal1<pcl::PointCloud<stdr_velodyne::PointType> >::call (this=<optimized out>, event=...) at /opt/ros/indigo/include/message_filters/signal1.h:119
#12 0x0000000000435ea6 in operator() (a0=..., this=0x6d2c58) at /usr/include/boost/function/function_template.hpp:767
#13 ros::SubscriptionCallbackHelperT<ros::MessageEvent<pcl::PointCloud<stdr_velodyne::PointType> const> const&, void>::call (this=0x6d2c50, params=...) at /opt/ros/indigo/include/ros/subscription_callback_helper.h:144
#14 0x00007ffff6194695 in ros::SubscriptionQueue::call() () from /opt/ros/indigo/lib/libroscpp.so
#15 0x00007ffff61524f7 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#16 0x00007ffff6153303 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#17 0x00007ffff61971b5 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#18 0x00007ffff617f7bb in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#19 0x0000000000423a6c in main (argc=1, argv=0x7fffffffd838) at /home/brice/code/workspaces/merged/src/driving/packages/velodyne_localizer/src/velodyne_localizer.cpp:750

Thread 15 (Thread 0x7fffbbfff700 (LWP 14258)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffbbfff700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 14 (Thread 0x7fffc0e89700 (LWP 14257)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc0e89700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 13 (Thread 0x7fffc168a700 (LWP 14256)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc168a700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 12 (Thread 0x7fffc1e8b700 (LWP 14255)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc1e8b700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 11 (Thread 0x7fffc268c700 (LWP 14254)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc268c700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 10 (Thread 0x7fffc2e8d700 (LWP 14253)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc2e8d700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 9 (Thread 0x7fffc368e700 (LWP 14252)):
#0  0x00007ffff6c45b42 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff6c4435e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff5409182 in start_thread (arg=0x7fffc368e700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7fffc3fff700 (LWP 14251)):
#0  0x00007ffff5410b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff5a4e408 in ros::ros_wallsleep(unsigned int, unsigned int) () from /opt/ros/indigo/lib/librostime.so
#2  0x00007ffff5a4f923 in ros::Duration::sleep() const () from /opt/ros/indigo/lib/librostime.so
#3  0x00007ffff5a4e0b8 in ros::Rate::sleep() () from /opt/ros/indigo/lib/librostime.so
#4  0x0000000000428d19 in VelodyneLocalizer::localize (this=0x7ffffffef7f0) at /home/brice/code/workspaces/merged/src/driving/packages/velodyne_localizer/src/velodyne_localizer.cpp:561
#5  0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#6  0x00007ffff5409182 in start_thread (arg=0x7fffc3fff700) at pthread_create.c:312
#7  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7fffd9c09700 (LWP 14231)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff540b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff540b480 in __GI___pthread_mutex_lock (mutex=0x7fffffff01a0) at ../nptl/pthread_mutex_lock.c:79
#3  0x00000000004311d8 in pthread_mutex_lock (m=0x7fffffff01a0) at /usr/include/boost/thread/pthread/mutex.hpp:61
#4  boost::mutex::lock (this=0x7fffffff01a0) at /usr/include/boost/thread/pthread/mutex.hpp:113
#5  0x000000000043124b in boost::unique_lock<boost::mutex>::lock (this=this@entry=0x7fffd9c07f40) at /usr/include/boost/thread/lock_types.hpp:346
#6  0x0000000000440f6e in unique_lock (m_=..., this=0x7fffd9c07f40) at /usr/include/boost/thread/lock_types.hpp:124
#7  tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >::transformable (this=0x7fffffff00e0, request_handle=161, target_frame=..., source_frame=..., time=..., result=tf2::TransformAvailable) at /home/brice/code/workspaces/merged/src/driving/external/geometry_experimental/tf2_ros/include/tf2_ros/message_filter.h:458
#8  0x000000000042cb13 in operator() (a5=<optimized out>, a4=..., a3=..., a2=..., a1=<optimized out>, p=<optimized out>, this=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:619
#9  operator()<boost::_mfi::mf5<void, tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >, long unsigned int, const std::basic_string<char>&, const std::basic_string<char>&, ros::Time, tf2::TransformableResult>, boost::_bi::list5<long unsigned int&, const std::basic_string<char>&, const std::basic_string<char>&, ros::Time&, tf2::TransformableResult&> > (a=<synthetic pointer>, f=..., this=<optimized out>) at /usr/include/boost/bind/bind.hpp:596
#10 operator()<long unsigned int, const std::basic_string<char>, const std::basic_string<char>, ros::Time, tf2::TransformableResult> (a5=<synthetic pointer>, a4=..., a3=..., a2=..., a1=<synthetic pointer>, this=<optimized out>) at /usr/include/boost/bind/bind_template.hpp:174
#11 boost::detail::function::void_function_obj_invoker5<boost::_bi::bind_t<void, boost::_mfi::mf5<void, tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >, unsigned long, std::string const&, std::string const&, ros::Time, tf2::TransformableResult>, boost::_bi::list6<boost::_bi::value<tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >*>, boost::arg<1>, boost::arg<2>, boost::arg<3>, boost::arg<4>, boost::arg<5> > >, void, unsigned long, std::string const&, std::string const&, ros::Time, tf2::TransformableResult>::invoke (function_obj_ptr=..., a0=<optimized out>, a1=..., a2=..., a3=..., a4=<optimized out>) at /usr/include/boost/function/function_template.hpp:153
#12 0x00007ffff6ec86e4 in boost::function5<void, unsigned long, std::string const&, std::string const&, ros::Time, tf2::TransformableResult>::operator()(unsigned long, std::string const&, std::string const&, ros::Time, tf2::TransformableResult) const () from /home/brice/code/workspaces/merged/devel/lib/libtf2.so
#13 0x00007ffff6ebf20d in tf2::BufferCore::testTransformableRequests() () from /home/brice/code/workspaces/merged/devel/lib/libtf2.so
#14 0x00007ffff6ebb907 in tf2::BufferCore::setTransform(geometry_msgs::TransformStamped_<std::allocator<void> > const&, std::string const&, bool) () from /home/brice/code/workspaces/merged/devel/lib/libtf2.so
#15 0x00007ffff7201251 in tf2_ros::TransformListener::subscription_callback_impl(ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&, bool) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#16 0x00007ffff7201042 in tf2_ros::TransformListener::subscription_callback(ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#17 0x00007ffff72098ae in boost::_mfi::mf1<void, tf2_ros::TransformListener, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>::operator()(tf2_ros::TransformListener*, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&) const () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#18 0x00007ffff7208d65 in void boost::_bi::list2<boost::_bi::value<tf2_ros::TransformListener*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, tf2_ros::TransformListener, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>, boost::_bi::list1<boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&> >(boost::_bi::type<void>, boost::_mfi::mf1<void, tf2_ros::TransformListener, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>&, boost::_bi::list1<boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>&, int) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#19 0x00007ffff7208214 in void boost::_bi::bind_t<void, boost::_mfi::mf1<void, tf2_ros::TransformListener, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>, boost::_bi::list2<boost::_bi::value<tf2_ros::TransformListener*>, boost::arg<1> > >::operator()<boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> >(boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#20 0x00007ffff7207595 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, tf2_ros::TransformListener, ros::MessageEvent<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>, boost::_bi::list2<boost::_bi::value<tf2_ros::TransformListener*>, boost::arg<1> > >, void, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#21 0x00007ffff7209afd in boost::function1<void, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&>::operator()(boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&) const () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#22 0x00007ffff7208f48 in boost::detail::function::void_function_obj_invoker1<boost::function<void (boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&)>, void, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> >::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const>) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#23 0x00007ffff720b95a in boost::function1<void, boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> >::operator()(boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const>) const () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#24 0x00007ffff720b156 in ros::SubscriptionCallbackHelperT<boost::shared_ptr<tf2_msgs::TFMessage_<std::allocator<void> > const> const&, void>::call(ros::SubscriptionCallbackHelperCallParams&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#25 0x00007ffff6194695 in ros::SubscriptionQueue::call() () from /opt/ros/indigo/lib/libroscpp.so
#26 0x00007ffff61524f7 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#27 0x00007ffff6153303 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#28 0x00007ffff720289b in tf2_ros::TransformListener::dedicatedListenerThread() () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#29 0x00007ffff720bec9 in boost::_mfi::mf0<void, tf2_ros::TransformListener>::operator()(tf2_ros::TransformListener*) const () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#30 0x00007ffff720bc46 in void boost::_bi::list1<boost::_bi::value<tf2_ros::TransformListener*> >::operator()<boost::_mfi::mf0<void, tf2_ros::TransformListener>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, tf2_ros::TransformListener>&, boost::_bi::list0&, int) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#31 0x00007ffff720b4c7 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, tf2_ros::TransformListener>, boost::_bi::list1<boost::_bi::value<tf2_ros::TransformListener*> > >::operator()() () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#32 0x00007ffff720adcc in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, tf2_ros::TransformListener>, boost::_bi::list1<boost::_bi::value<tf2_ros::TransformListener*> > > >::run() () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#33 0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#34 0x00007ffff5409182 in start_thread (arg=0x7fffd9c09700) at pthread_create.c:312
#35 0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7fffda40a700 (LWP 14218)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ffff6155145 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> const&) () from /opt/ros/indigo/lib/libroscpp.so
#2  0x00007ffff615343d in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#3  0x00007ffff617f5c4 in ros::internalCallbackQueueThreadFunc() () from /opt/ros/indigo/lib/libroscpp.so
#4  0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#5  0x00007ffff5409182 in start_thread (arg=0x7fffda40a700) at pthread_create.c:312
#6  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7fffdac0b700 (LWP 14206)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff617cecb in ros::ROSOutAppender::logThread() () from /opt/ros/indigo/lib/libroscpp.so
#2  0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#3  0x00007ffff5409182 in start_thread (arg=0x7fffdac0b700) at pthread_create.c:312
#4  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7fffdb40c700 (LWP 14205)):
#0  0x00007ffff490dda3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff24f2498 in XmlRpc::XmlRpcDispatch::work(double) () from /opt/ros/indigo/lib/libxmlrpcpp.so
#2  0x00007ffff612b07a in ros::XMLRPCManager::serverThreadFunc() () from /opt/ros/indigo/lib/libroscpp.so
#3  0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#4  0x00007ffff5409182 in start_thread (arg=0x7fffdb40c700) at pthread_create.c:312
#5  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7fffdbc0d700 (LWP 14204)):
#0  0x00007ffff490912d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6134746 in ros::poll_sockets(pollfd*, unsigned long, int) () from /opt/ros/indigo/lib/libroscpp.so
#2  0x00007ffff619ca97 in ros::PollSet::update(int) () from /opt/ros/indigo/lib/libroscpp.so
#3  0x00007ffff6141905 in ros::PollManager::threadFunc() () from /opt/ros/indigo/lib/libroscpp.so
#4  0x00007ffff562aa4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#5  0x00007ffff5409182 in start_thread (arg=0x7fffdbc0d700) at pthread_create.c:312
#6  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fffdc8e3700 (LWP 14202)):
#0  0x00007ffff490912d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fffe6032248 in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#2  0x00007ffff5409182 in start_thread (arg=0x7fffdc8e3700) at pthread_create.c:312
#3  0x00007ffff491647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7f84a80 (LWP 14193)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff540b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff540b480 in __GI___pthread_mutex_lock (mutex=0x7ffffffefbc8) at ../nptl/pthread_mutex_lock.c:79
#3  0x00000000004311d8 in pthread_mutex_lock (m=0x7ffffffefbc8) at /usr/include/boost/thread/pthread/mutex.hpp:61
#4  boost::mutex::lock (this=0x7ffffffefbc8) at /usr/include/boost/thread/pthread/mutex.hpp:113
#5  0x000000000043124b in boost::unique_lock<boost::mutex>::lock (this=0x7ffffffeedb0) at /usr/include/boost/thread/lock_types.hpp:346
#6  0x00007ffff7211e11 in boost::unique_lock<boost::mutex>::unique_lock(boost::mutex&) () from /home/brice/code/workspaces/merged/devel/lib/libtf2_ros.so
#7  0x00007ffff6ebea9f in tf2::BufferCore::cancelTransformableRequest(unsigned long) () from /home/brice/code/workspaces/merged/devel/lib/libtf2.so
#8  0x0000000000441e4c in tf2_ros::MessageFilter<pcl::PointCloud<stdr_velodyne::PointType> >::add (this=0x7fffffff00e0, evt=...) at /home/brice/code/workspaces/merged/src/driving/external/geometry_experimental/tf2_ros/include/tf2_ros/message_filter.h:381
#9  0x0000000000437f0d in operator() (a0=..., this=0x6d1d18) at /usr/include/boost/function/function_template.hpp:767
#10 message_filters::CallbackHelper1T<ros::MessageEvent<pcl::PointCloud<stdr_velodyne::PointType> const> const&, pcl::PointCloud<stdr_velodyne::PointType> >::call (this=0x6d1d10, event=..., nonconst_force_copy=<optimized out>) at /opt/ros/indigo/include/message_filters/signal1.h:76
#11 0x0000000000437a3f in message_filters::Signal1<pcl::PointCloud<stdr_velodyne::PointType> >::call (this=<optimized out>, event=...) at /opt/ros/indigo/include/message_filters/signal1.h:119
#12 0x0000000000435ea6 in operator() (a0=..., this=0x6d2c58) at /usr/include/boost/function/function_template.hpp:767
#13 ros::SubscriptionCallbackHelperT<ros::MessageEvent<pcl::PointCloud<stdr_velodyne::PointType> const> const&, void>::call (this=0x6d2c50, params=...) at /opt/ros/indigo/include/ros/subscription_callback_helper.h:144
#14 0x00007ffff6194695 in ros::SubscriptionQueue::call() () from /opt/ros/indigo/lib/libroscpp.so
#15 0x00007ffff61524f7 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#16 0x00007ffff6153303 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#17 0x00007ffff61971b5 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#18 0x00007ffff617f7bb in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#19 0x0000000000423a6c in main (argc=1, argv=0x7fffffffd838) at /home/brice/code/workspaces/merged/src/driving/packages/velodyne_localizer/src/velodyne_localizer.cpp:750

tfoote commented on July 19, 2024

I believe that this confirms that the fix in #93 will fix the problem.

Threads 1 and 16 are blocked waiting on the lock in question, while thread 7 holds that lock in frame 13.

But thread 7 is then waiting on another lock (speculatively one of the message list locks in message filters), which presumably thread 1 or 16 is holding. With the patch, thread 7 will no longer hold the lock during the user's registered callbacks, and thus will not block the cancelTransformableRequest calls in threads 1 and 16, even though testTransformableRequests invokes arbitrary user callbacks.
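
The general pattern described here, not holding a lock while user callbacks run, can be sketched as follows. This is a minimal C++17 illustration, not the actual #93 patch or the tf2 BufferCore code; `RequestRegistry`, `add`, `cancel`, `testRequests` and `pending` are hypothetical names. Ready callbacks are copied out while the mutex is held and invoked only after it is released, so a callback that re-enters `cancel()` or `add()` does not block on the same mutex.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transformable-request table; not the tf2 API.
class RequestRegistry {
 public:
  using Callback = std::function<void(unsigned long)>;

  unsigned long add(Callback cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    const unsigned long id = next_id_++;
    requests_.emplace(id, std::move(cb));
    return id;
  }

  void cancel(unsigned long id) {
    std::lock_guard<std::mutex> lock(mutex_);
    requests_.erase(id);
  }

  // Called when new data arrives; in this sketch every pending request is ready.
  void testRequests() {
    std::vector<std::pair<unsigned long, Callback>> ready;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (auto& entry : requests_) {
        ready.emplace_back(entry.first, std::move(entry.second));
      }
      requests_.clear();
    }  // mutex_ is released here, before any user code runs
    for (auto& r : ready) {
      r.second(r.first);  // callbacks may safely call add() or cancel()
    }
  }

  std::size_t pending() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return requests_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<unsigned long, Callback> requests_;
  unsigned long next_id_ = 1;
};
```

One consequence of snapshotting is that a request cancelled by another callback in the same batch has already been copied out and may still fire; code using this pattern has to tolerate or re-check that.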

bricerebsamen commented on July 19, 2024

hard for me to confirm that it fixes my deadlock since it happens pretty infrequently. I'll keep testing and I'll report if it happens again.

To automate my testing, I'd like to know: when the test (rostest) is over and all the nodes are killed, what signal is sent to them? My test has a playback node tagged required="true", so when the data playback is over, the test ends. My program is configured to run in gdb with the following launch prefix:

launch-prefix="xterm -e gdb -ex 'set logging overwrite on' -ex 'set logging on' -ex 'set width unlimited' -ex 'set height unlimited' -ex run -ex bt -ex 'thread apply all bt' --args"

If the program crashes or receives a SIGINT it prints the backtrace, but when there is a deadlock it does not return anything at all.

bricerebsamen commented on July 19, 2024

actually, in this case, my rostest ends when the test node returns

bricerebsamen commented on July 19, 2024

I did some extensive testing, and the deadlock is still there.

bricerebsamen commented on July 19, 2024

and to clarify, the first backtrace is for the blocked thread, which is thread 1. There is no thread 16

tfoote commented on July 19, 2024

do you have a backtrace with #93 in place?

bricerebsamen commented on July 19, 2024

it's the same one

tfoote commented on July 19, 2024

Looking at your call tree, I see geometry symbols from /opt/ros but geometry_experimental symbols from your workspace. Can you make sure to build all dependencies of your workspace from source? We don't guarantee stable ABIs, which might cause both this and #92.

bricerebsamen commented on July 19, 2024

I'm trying, but I'm getting some catkin errors. I have both geometry_experimental and geometry in my workspace, and I get this error:

[ 64%] Building CXX object geometry/tf/CMakeFiles/pytf_py.dir/src/tf.cpp.o
/home/brice/code/workspaces/localizer_test/src/geometry/tf/src/tf.cpp: In member function ‘std::string tf::Transformer::allFramesAsDot(double) const’:
/home/brice/code/workspaces/localizer_test/src/geometry/tf/src/tf.cpp:441:50: error: no matching function for call to ‘tf2_ros::Buffer::_allFramesAsDot(double&) const’
   return tf2_buffer_._allFramesAsDot(current_time);
                                                  ^
/home/brice/code/workspaces/localizer_test/src/geometry/tf/src/tf.cpp:441:50: note: candidate is:
In file included from /opt/ros/indigo/include/tf2_ros/buffer_interface.h:35:0,
                 from /opt/ros/indigo/include/tf2_ros/buffer.h:35,
                 from /home/brice/code/workspaces/localizer_test/src/geometry/tf/include/tf/tf.h:48,
                 from /home/brice/code/workspaces/localizer_test/src/geometry/tf/src/tf.cpp:32:
/opt/ros/indigo/include/tf2/buffer_core.h:296:15: note: std::string tf2::BufferCore::_allFramesAsDot() const
   std::string _allFramesAsDot() const;
               ^
/opt/ros/indigo/include/tf2/buffer_core.h:296:15: note:   candidate expects 0 arguments, 1 provided
make[2]: *** [geometry/tf/CMakeFiles/pytf_py.dir/src/tf.cpp.o] Error 1
make[1]: *** [geometry/tf/CMakeFiles/pytf_py.dir/all] Error 2
make: *** [all] Error 2
Invoking "make -j1" failed

why is catkin trying to compile tf.cpp against tf2 from /opt/ros/ and not from my workspace?

bricerebsamen commented on July 19, 2024

in a workspace that contains only geometry and geometry_experimental, catkin_make is able to complete without error. But in a more complex workspace, with both of them and many other stacks, it fails.

bricerebsamen commented on July 19, 2024

removing SYSTEM from the include_directories directive in tf solved that compilation problem, even though I don't understand why.

bricerebsamen commented on July 19, 2024

OK I got my new workspace to compile, with all the dependencies of the workspace built from source. Namely all the following repos plus my own packages:

actionlib  angles  bond_core  class_loader  common_msgs  diagnostics dynamic_reconfigure  gencpp  genlisp  genmsg  genpy  geometry  message_generation  message_runtime  nodelet_core  pcl_msgs  pluginlib  robot_state_publisher  ros  ros_comm  ros_comm_msgs  roscpp_core  rospack  std_msgs geometry_experimental  pcl_conversions  perception_pcl  velodyne

With that, I still experience deadlocks and crashes as described here and in #92, with the exact same backtraces

tfoote commented on July 19, 2024

Can you repost the backtraces, maybe in a gist? I'd like to see the new line numbers, etc. Can you also compile with debug symbols to get more info? -DCMAKE_BUILD_TYPE=RelWithDebInfo

bdou0716 commented on July 19, 2024

Hi Tully,
I am working with Brice on this issue. Here is a link to a GitHub gist containing a gdb backtrace, taken by attaching gdb to the deadlocked process once the deadlock was observed. I will soon send you a version with more debug symbols:
https://gist.github.com/abca9756d0f7acaff72a.git
bertrand

bdou0716 commented on July 19, 2024

Correct link here:
https://gist.github.com/bdou0716/abca9756d0f7acaff72a#file-gdb-bt

bdou0716 commented on July 19, 2024

This backtrace was obtained from a different program than the one Brice has been debugging.

bricerebsamen commented on July 19, 2024

here is the backtrace that I obtained with debug symbols on:
https://gist.github.com/bricerebsamen/100445f376bd6806686c

bdou0716 commented on July 19, 2024

I have updated the gist above with additional debug symbols (for the other program we are testing, not the one Brice has been working on). The link again:
https://gist.github.com/bdou0716/abca9756d0f7acaff72a#file-gdb-bt

tfoote commented on July 19, 2024

Great, those line numbers will be helpful. If it's not the fix from #93 then I'll have to trace usages of the other mutex.

bricerebsamen commented on July 19, 2024

Here is my analysis of my backtrace. Thread 7 is the transform listener's thread. Thread 1 is the ros spin thread, which is processing a point cloud and adding it to the message filter.

In Thread 1, the message filter's queue is full, so it's popping out old messages. In MessageFilter::add it holds MessageFilter::messages_mutex_, and later, in BufferCore::cancelTransformableRequest(), it tries to lock BufferCore::transformable_requests_mutex_ as well.

In Thread 7 (transform listener), it's adding a new transform to the buffer and calls BufferCore::testTransformableRequests(). As I understand it, this acts as a sort of signal: since we have a new transform, we check whether any pending transformable requests can now be processed. This locks BufferCore::transformable_requests_mutex_ and then calls the request's callback, in our case MessageFilter::transformable(), which tries to lock MessageFilter::messages_mutex_. Each thread holds the mutex the other one is waiting for, hence the deadlock.
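
The analysis above describes a lock-order inversion (an "ABBA" deadlock): one thread takes messages_mutex_ and then transformable_requests_mutex_, while the other takes them in the opposite order. As a minimal C++17 sketch of how such an inversion can be flagged without ever hitting the deadlock, assume a hypothetical `LockOrderChecker` that records "held while acquiring" edges between named locks and reports when the reverse edge has already been seen. This is an illustration only, not a feature of tf2 or Boost.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lock-order checker: remembers every "h was held while next was
// acquired" edge and reports when the opposite edge has already been recorded.
class LockOrderChecker {
 public:
  // Returns true if acquiring `next` while holding `held` inverts a known order.
  bool acquire(const std::vector<std::string>& held, const std::string& next) {
    bool inversion = false;
    for (const auto& h : held) {
      if (edges_.count(std::make_pair(next, h)) != 0) {
        inversion = true;  // some path already took `next` before `h`
      }
      edges_.emplace(h, next);
    }
    return inversion;
  }

 private:
  std::set<std::pair<std::string, std::string>> edges_;
};
```

Feeding it the two paths from the backtrace (thread 1: `acquire({"messages_mutex_"}, "transformable_requests_mutex_")`, then thread 7: `acquire({"transformable_requests_mutex_"}, "messages_mutex_")`) makes the second call return true. A checker like this only sees orderings that actually execute; ThreadSanitizer's "lock-order-inversion (potential deadlock)" report applies the same idea automatically.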

bricerebsamen commented on July 19, 2024

it appears to be the exact same behavior in @bdou0716 's backtrace.

@tfoote thanks for working on this. 3 of us here will try to figure out what's happening as well. Hopefully, all together we will find a solution rapidly.

tfoote commented on July 19, 2024

That sounds reasonable. I haven't had a chance to trace it yet. But if that's the case, I think we can likely just add to the contract that, when transformable is called, the messages_mutex_ is already locked for it. Thus we should move the messages_mutex_ lock outside the transformable_request_mutex_; if you always lock in the same order, they won't deadlock.

So moving https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2_ros/include/tf2_ros/message_filter.h#L455 to https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2/src/buffer_core.cpp#L1266 would probably prevent this deadlock.

I'll need to look a little closer at target_frames_mutex_, but I think it's always held in a smaller scope than the other two.
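
The ordering rule proposed above (always take the two locks in the same order) can be illustrated with a minimal C++17 sketch. `messages_mutex` and `requests_mutex` are hypothetical stand-ins for the two mutexes discussed, and `addPath`/`testPath`/`runBoth` are illustrative names. Both code paths take `messages_mutex` first, so the wait cycle seen in the backtrace cannot form. (The thread goes on to note that BufferCore cannot actually reach the MessageFilter's mutex, so this exact move was not applicable in tf2.)

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for MessageFilter::messages_mutex_ and
// BufferCore::transformable_requests_mutex_.
std::mutex messages_mutex;
std::mutex requests_mutex;
long work_done = 0;  // only touched while both mutexes are held

// Path A (shaped like MessageFilter::add -> cancelTransformableRequest).
void addPath() {
  std::lock_guard<std::mutex> outer(messages_mutex);
  std::lock_guard<std::mutex> inner(requests_mutex);
  ++work_done;
}

// Path B (shaped like testTransformableRequests -> transformable), written so
// that it also takes messages_mutex first, matching path A.
void testPath() {
  std::lock_guard<std::mutex> outer(messages_mutex);
  std::lock_guard<std::mutex> inner(requests_mutex);
  ++work_done;
}

// Runs both paths concurrently; with a consistent lock order this always finishes.
long runBoth(int iterations) {
  work_done = 0;
  std::thread a([iterations] { for (int i = 0; i < iterations; ++i) addPath(); });
  std::thread b([iterations] { for (int i = 0; i < iterations; ++i) testPath(); });
  a.join();
  b.join();
  return work_done;
}
```

When both mutexes are visible at a single call site, `std::scoped_lock lock(messages_mutex, requests_mutex);` acquires them with a deadlock-avoidance algorithm regardless of argument order, which removes the need to keep the order consistent by hand.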

tfoote commented on July 19, 2024

Also, we need to verify there's no other codepaths into transformable that would need protection.

jhpanetta commented on July 19, 2024

Moving https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2_ros/include/tf2_ros/message_filter.h#L455 to https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2/src/buffer_core.cpp#L1266 won't resolve the deadlock Brice is talking about. That lock chain involves messages_mutex_ and frame_mutex_, not transformable_request_mutex_.

The interaction between the two problem locks is at https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2_ros/include/tf2_ros/message_filter.h#L378 and https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2/src/buffer_core.cpp#L1309. In message_filter, L378 is the only place where bc_ is called while messages_mutex_ is locked. (There are other places where bc_ is called while the mutex is unlocked, but we can ignore those.) In buffer_core, L1309 is the only place the callbacks are executed at all. A "simple" fix is to wrap the two lines in an unlock/lock sequence.
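
The unlock/lock suggestion can be sketched with `std::unique_lock`, which can be released and re-acquired in the middle of a scope. This is a minimal C++17 sketch with a hypothetical `PendingRequests` class, not the tf2 code. It also speaks to the iterator concern raised in the reply: once the mutex is dropped, another thread (or the callback itself) may erase entries, so the loop never reuses an iterator across the unlock and instead re-finds its position by key.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical one-shot request table; not the tf2 BufferCore API.
class PendingRequests {
 public:
  using Callback = std::function<void(unsigned long)>;

  void add(unsigned long id, Callback cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    requests_[id] = std::move(cb);
  }

  void cancel(unsigned long id) {
    std::lock_guard<std::mutex> lock(mutex_);
    requests_.erase(id);
  }

  // Fires every pending request in key order, dropping the lock around each callback.
  void fireAll() {
    std::unique_lock<std::mutex> lock(mutex_);
    auto it = requests_.begin();
    while (it != requests_.end()) {
      const unsigned long id = it->first;
      Callback cb = std::move(it->second);
      requests_.erase(it);  // one-shot: remove before running user code
      lock.unlock();        // the callback runs without mutex_ held
      cb(id);               // may call add() or cancel()
      lock.lock();
      it = requests_.upper_bound(id);  // old iterators may be stale; re-find by key
    }
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return requests_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<unsigned long, Callback> requests_;
};
```

With this design, a request that a callback adds with a larger key is picked up in the same pass, while one added with a smaller key waits for the next call to `fireAll()`.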

tfoote commented on July 19, 2024

@jhpanetta My proposal to move
https://github.com/ros/geometry_experimental/blob/10370e59d2594e907a71c448a2bfb69dbb2e6996/tf2_ros/include/tf2_ros/message_filter.h#L455 to https://github.com/ros/geometry_experimental/blob/eb8dc6109d44ae59488a294177c1d08c00fc4d89/tf2/src/buffer_core.cpp#L1266 places the messages_mutex_ immediately before the transformable_request_mutex_ thereby avoiding the deadlock that one thread may get the messages_mutex_ and then need the transformable_request_mutex_ while another thread locks transformable_request_mutex_ and needs messages_mutex_.

It's not possible to unlock either lock at https://github.com/ros/geometry_experimental/blob/eb8dc6109d44ae59488a294177c1d08c00fc4d89/tf2/src/buffer_core.cpp#L1309 in the middle of a transformable iterator in that function, and at the higher level the message_mutex_ is locked because we're iterating over the messages and considering erasing them.

@bricerebsamen @bdou0716 can you test this change?

bricerebsamen commented on July 19, 2024

I'm a bit confused: how can BufferCore lock the message_mutex_ in MessageFilter when BufferCore knows nothing about MessageFilter?

tfoote commented on July 19, 2024

Yeah sorry, that's not going to work.

tfoote commented on July 19, 2024

I've switched to a shared lock for the messages mutex, which should allow both entry points to iterate over the messages simultaneously; only the mutations require the unique lock. Please try the shared_lock branch.
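
The shared-lock idea can be sketched with C++17's `std::shared_mutex`. tf2 used Boost at the time, where `boost::shared_mutex` with `boost::shared_lock`/`boost::unique_lock` plays the same role. `MessageQueue` and its methods are hypothetical: read-only iteration takes a `std::shared_lock`, so several entry points can walk the messages at once, while mutations take an exclusive `std::unique_lock`.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message store guarded by a reader/writer lock.
class MessageQueue {
 public:
  void push(std::string msg) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // mutation: exclusive
    messages_.push_back(std::move(msg));
  }

  void popFront() {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // mutation: exclusive
    if (!messages_.empty()) messages_.erase(messages_.begin());
  }

  // Read-only walk: any number of threads may be inside at the same time.
  // `visit` must not call push() or popFront(): this thread already holds the
  // lock in shared mode, and requesting exclusive mode from the same thread
  // would deadlock (formally, undefined behavior).
  void forEach(const std::function<void(const std::string&)>& visit) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    for (const auto& m : messages_) visit(m);
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return messages_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::vector<std::string> messages_;
};
```

The trade-off is visible in the comment on `forEach`: a shared lock only removes contention between readers. In this sketch, any path that erases messages, as MessageFilter::add does when its queue is full, still needs exclusive access.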
