Comments (12)
Hi @Schobesberger A C++ hardware reset script at #9287 (comment) may be a helpful reference. It includes a mechanism to check whether reset has completed and pauses until reset is complete.
from librealsense.
Hi MartyG, thank you for your reply.
Unfortunately, issue #9287 does not fix the problem, for two reasons.
- Our application does not explicitly call device.hardware_reset(); the library itself does so when downloading new firmware. Therefore the trick with ctx.set_devices_changed_callback() cannot be applied.
- More importantly, after device.hardware_reset() is called, the very first thread, which is created in the rs2::context ctx; constructor, never dies.
Sometimes I get core dumps of our production code from the customer in which the machine appears to be running out of memory. There may be an issue when the library loses and re-establishes the connection to the camera due to poor USB signal quality.
Running the following program in gdb shows that the first thread created never dies.
gdb:
50 us ThID 18256 realsense_test: is starting ...
265 us ThID 18256 realsense_test: create rs2::ctx
[New Thread 0xf62f4b40 (LWP 18285)] // this thread never dies !!
[New Thread 0xf5af3b40 (LWP 18286)]
#include <iostream>
#include <mutex>
#include <unistd.h>
#include <syscall.h> // SYS_gettid
// using librealsense-2.54.2
#include <librealsense2/h/rs_device.h>
#include <librealsense2/h/rs_option.h>
#include <librealsense2/h/rs_sensor.h>
#include <librealsense2/hpp/rs_sensor.hpp>
#include <librealsense2/hpp/rs_types.hpp>
#include <librealsense2/rs_advanced_mode.hpp>
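#include <chrono>  // std::chrono::steady_clock, duration_cast
#include <sstream> // std::stringstream
#include <string>  // std::string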
std::recursive_mutex M;
std::chrono::steady_clock::time_point beg;
template <
class result_t = std::chrono::microseconds,
class clock_t = std::chrono::steady_clock,
class duration_t = std::chrono::microseconds
>
auto ts()
{
return std::chrono::duration_cast<result_t>(clock_t::now() - beg).count();
}
std::string tt(void)
{
std::stringstream ss;
ss << ts() << " us ";
return ss.str();
}
std::string str_program_name;
std::string ttx(void)
{
std::stringstream ss;
ss << tt() << std::dec << "ThID " << syscall(SYS_gettid) << " " << str_program_name << ": ";
return ss.str();
}
// to give some pending try_sleeps the time to timeout
int main(int argc, char *argv[])
{
beg = std::chrono::steady_clock::now(); // mark start time point of program
str_program_name = "realsense_test";
std::cout << ttx() << "is starting ...\n";
bool loop_forever = true;
bool resetCompleteIntelRealsense = false;
while (loop_forever)
{
// loop_forever = false;
// code for testing the rs2::device.hardware_reset() command
if (true)
{
std::cout << ttx() << "create rs2::ctx " << std::endl;
rs2::context ctx;
std::cout << ttx() << "ctx.set_devices_changed_callback() " << std::endl;
ctx.set_devices_changed_callback([&](rs2::event_information& info)
{
// loop thru all new devices - that is one that has been reset effectively
for (auto&& dev : info.get_new_devices())
{
std::string devName = "";
std::string devSerialNumber = "";
std::string devFirmware = "";
std::string devProdId = "";
devProdId = dev.get_info(RS2_CAMERA_INFO_PRODUCT_ID);
devSerialNumber = dev.get_info(RS2_CAMERA_INFO_SERIAL_NUMBER);
devName = dev.get_info(RS2_CAMERA_INFO_NAME);
if (devName == "Intel RealSense D455")
{
devFirmware = dev.get_info(RS2_CAMERA_INFO_FIRMWARE_VERSION);
std::cout << ttx() << " in custom devices_changed_callback : camera found. Mark hardware reset to be finished " << std::endl;
resetCompleteIntelRealsense = true;
}
}
});
rs2::device_list devs = ctx.query_devices(RS2_PRODUCT_LINE_DEPTH);
for (rs2::device &&device : devs) // only one device is connected !
{
std::cout << ttx() << " RS2_CAMERA_INFO_NAME " << device.get_info(RS2_CAMERA_INFO_NAME) << std::endl;
std::cout << ttx() << " RS2_CAMERA_INFO_SERIAL_NUMBER " << device.get_info(RS2_CAMERA_INFO_SERIAL_NUMBER) << std::endl;
std::cout << ttx() << " RS2_CAMERA_INFO_FIRMWARE_VERSION " << device.get_info(RS2_CAMERA_INFO_FIRMWARE_VERSION) << std::endl;
std::cout << ttx() << " RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR " << device.get_info(RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR) << std::endl;
resetCompleteIntelRealsense = false;
device.hardware_reset();
while (resetCompleteIntelRealsense == false) // wait until camera seen again
{
int cnt=0;
if (cnt++ > 100) // just to calm down the console output
{
cnt = 0;
std::cout << ttx() << "Wait until HW reset has been completed." << std::endl;
}
}
}
std::cout << ttx() << "Leaving scope of object rs2::context ctx" << std::endl;
}
std::cout << ttx() << " End of loop reached" << std::endl;
} // end while(loop_forever)
return 0;
}
This is the full gdb output
(gdb) r
Starting program: /home/applic/th_realsense_bug/home/applic/exe/realsense_test
warning: File "/home/shared/vendor/install-gcc-i686-6.3-r1452/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /home/shared/vendor/install-gcc-i686-6.3-r1452/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
50 us ThID 18256 realsense_test: is starting ...
265 us ThID 18256 realsense_test: create rs2::ctx
[New Thread 0xf62f4b40 (LWP 18285)]
[New Thread 0xf5af3b40 (LWP 18286)]
[Thread 0xf5af3b40 (LWP 18286) exited]
[New Thread 0xf5af3b40 (LWP 18287)]
[Thread 0xf5af3b40 (LWP 18287) exited]
[New Thread 0xf5af3b40 (LWP 18288)]
[Thread 0xf5af3b40 (LWP 18288) exited]
28100 us ThID 18256 realsense_test: ctx.set_devices_changed_callback()
[New Thread 0xf5af3b40 (LWP 18289)]
[Thread 0xf5af3b40 (LWP 18289) exited]
[New Thread 0xf5af3b40 (LWP 18290)]
[Thread 0xf5af3b40 (LWP 18290) exited]
[New Thread 0xf5af3b40 (LWP 18291)]
[Thread 0xf5af3b40 (LWP 18291) exited]
[New Thread 0xf5af3b40 (LWP 18292)]
[New Thread 0xf52f2b40 (LWP 18293)]
[Thread 0xf52f2b40 (LWP 18293) exited]
[New Thread 0xf52f2b40 (LWP 18294)]
[New Thread 0xf4af1b40 (LWP 18295)]
[New Thread 0xf42f0b40 (LWP 18296)]
[New Thread 0xf3aefb40 (LWP 18297)]
[New Thread 0xf30ffb40 (LWP 18298)]
[New Thread 0xf26ffb40 (LWP 18299)]
[New Thread 0xf1efeb40 (LWP 18300)]
[Thread 0xf30ffb40 (LWP 18298) exited]
[New Thread 0xf16fdb40 (LWP 18301)]
[New Thread 0xf0efcb40 (LWP 18302)]
[New Thread 0xf06fbb40 (LWP 18303)]
[New Thread 0xefefab40 (LWP 18304)]
[New Thread 0xef6f9b40 (LWP 18305)]
[New Thread 0xeeef8b40 (LWP 18306)]
[New Thread 0xf30ffb40 (LWP 18307)]
[Thread 0xf30ffb40 (LWP 18307) exited]
165206 us ThID 18256 realsense_test: RS2_CAMERA_INFO_NAME Intel RealSense D455
165277 us ThID 18256 realsense_test: RS2_CAMERA_INFO_SERIAL_NUMBER 117222250835
165535 us ThID 18256 realsense_test: RS2_CAMERA_INFO_FIRMWARE_VERSION 5.15.1
165646 us ThID 18256 realsense_test: RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR 3.2
[New Thread 0xf30ffb40 (LWP 18308)]
[Thread 0xf30ffb40 (LWP 18308) exited]
[New Thread 0xee4ffb40 (LWP 18324)]
[New Thread 0xedcfeb40 (LWP 18325)]
[New Thread 0xed4fdb40 (LWP 18326)]
[New Thread 0xeccfcb40 (LWP 18327)]
[New Thread 0xec2ffb40 (LWP 18328)]
[New Thread 0xeb8ffb40 (LWP 18329)]
[New Thread 0xeb0feb40 (LWP 18330)]
[Thread 0xec2ffb40 (LWP 18328) exited]
[New Thread 0xea8fdb40 (LWP 18331)]
[New Thread 0xea0fcb40 (LWP 18332)]
[New Thread 0xe98fbb40 (LWP 18333)]
[New Thread 0xe90fab40 (LWP 18334)]
[New Thread 0xe88f9b40 (LWP 18335)]
[New Thread 0xe80f8b40 (LWP 18336)]
[New Thread 0xec2ffb40 (LWP 18337)]
[Thread 0xec2ffb40 (LWP 18337) exited]
2139352 us ThID 18285 realsense_test: in custom devices_changed_callback : camera found. Mark hardware reset to be finished
[Thread 0xeb8ffb40 (LWP 18329) exited]
[Thread 0xf26ffb40 (LWP 18299) exited]
[Thread 0xf5af3b40 (LWP 18292) exited]
[Thread 0xee4ffb40 (LWP 18324) exited]
[Thread 0xedcfeb40 (LWP 18325) exited]
[Thread 0xf4af1b40 (LWP 18295) exited]
[Thread 0xed4fdb40 (LWP 18326) exited]
[Thread 0xf42f0b40 (LWP 18296) exited]
[Thread 0xeccfcb40 (LWP 18327) exited]
[Thread 0xf3aefb40 (LWP 18297) exited]
[Thread 0xf1efeb40 (LWP 18300) exited]
[Thread 0xeb0feb40 (LWP 18330) exited]
[Thread 0xea8fdb40 (LWP 18331) exited]
[Thread 0xf16fdb40 (LWP 18301) exited]
[Thread 0xf0efcb40 (LWP 18302) exited]
[Thread 0xf06fbb40 (LWP 18303) exited]
[Thread 0xea0fcb40 (LWP 18332) exited]
[Thread 0xe98fbb40 (LWP 18333) exited]
[Thread 0xefefab40 (LWP 18304) exited]
[Thread 0xef6f9b40 (LWP 18305) exited]
[Thread 0xeeef8b40 (LWP 18306) exited]
[Thread 0xf52f2b40 (LWP 18294) exited]
2147597 us ThID 18256 realsense_test: Leaving scope of object rs2::context ctx
[Thread 0xe90fab40 (LWP 18334) exited]
[Thread 0xe88f9b40 (LWP 18335) exited]
[Thread 0xe80f8b40 (LWP 18336) exited]
2149009 us ThID 18256 realsense_test: End of loop reached
from librealsense.
I would recommend not using threads unless the project absolutely requires them, as threading can introduce instability that does not occur in non-threaded scripts that perform the same function.
A memory leak that exhausts the computer's memory can occur if something inside your loop should instead be placed outside of it.
from librealsense.
Hi @Schobesberger Do you require further assistance with this case, please? Thanks!
from librealsense.
Hi Marty,
thank you very much for your inquiry.
Unfortunately your suggestions did not help.
Issue: deadlock when device.hardware_reset() is called
As I mentioned, our production code does not call device.hardware_reset() itself; the library does so on its own. Therefore I cannot fix the deadlock problem.
The cancellable_timer.try_sleep() call in the polling_device_watcher.polling() method
class polling_device_watcher : public librealsense::platform::device_watcher
{
...
void polling( dispatcher::cancellable_timer cancellable_timer )
{
// (debug output line, truncated in the original post)
if( cancellable_timer.try_sleep( std::chrono::milliseconds( POLLING_DEVICES_INTERVAL_MS ) ) )
{
finally calls _owner->_was_stopped_cv.wait_for( lock, sleep_time, [&]() { return was_stopped(); } ):
bool try_sleep( Duration sleep_time )
{
using namespace std::chrono;
std::unique_lock<std::mutex> lock(_owner->_was_stopped_mutex);
if( was_stopped() )
{
return false;
}
// wait_for() returns "false if the predicate pred still evaluates to false after the
// rel_time timeout expired, otherwise true"
bool retval = _owner->_was_stopped_cv.wait_for( lock, sleep_time, [&]() { return was_stopped(); } );
The wait_for() call times out after POLLING_DEVICES_INTERVAL_MS (which is 2000 ms). If, for whatever reason, the rs2::context ctx object is destroyed within this time because it leaves its scope, a deadlock involving some nested mutexes occurs.
I have analyzed the deadlock situation in the already posted file
https://github.com/IntelRealSense/librealsense/files/14281898/realsense_deadlock_cleaned_log.txt
Issue: hanging thread and memory exhaustion
The example code creates no threads of its own. The problem lies with the threads that the librealsense library creates by itself.
The example code uses your suggested fix to get rid of the deadlock, namely ctx.set_devices_changed_callback([&](rs2::event_information& info) to check whether the device is online again after a hardware_reset().
With this solution, however, one thread created by librealsense never dies.
I have probably found the reason why the thread never dies, but have no idea how to fix it.
The relevant code of the demo program starts by creating an rs2::context object.
The constructor of the rs2::context ctx; object creates the rs2_context _context object and, indirectly, a librealsense::context object in the factory rs2_create_context(). The factory rs2_create_context() creates the std::shared_ptr<librealsense::context> ctx; shared pointer of rs2_context.
struct rs2_context
{
~rs2_context()
{
ctx->stop();
}
std::shared_ptr<librealsense::context> ctx;
};
context()
{
rs2_error* e = nullptr;
_context = std::shared_ptr<rs2_context>(
rs2_create_context(RS2_API_VERSION, &e),
rs2_delete_context);
This shared pointer is then copied many times at different places. Some examples:
std::vector<std::shared_ptr<device_info>> context::create_devices(platform::backend_device_group devices,
const std::map<std::string, std::weak_ptr<device_info>>& playback_devices,
int mask) const
{
std::vector<std::shared_ptr<device_info>> list;
auto t = const_cast<context*>(this); // While generally a bad idea, we need to provide mutable reference to the devices
// to allow them to modify context later on
auto ctx = t->shared_from_this(); // shared pointer ctx copied !!!
if (mask & RS2_PRODUCT_LINE_D400)
{
auto d400_devices = d400_info::pick_d400_devices(ctx, devices); // shared pointer ctx copied !!!
When the rs2::context ctx object leaves its scope in the demo program, the rs2::context::~context() destructor is called, which calls the deleter rs2_delete_context() of its std::shared_ptr<rs2_context> _context member.
rs2_delete_context() calls delete context, where context is an rs2_context object.
void rs2_delete_context(rs2_context* context) BEGIN_API_CALL
{
VALIDATE_NOT_NULL(context);
delete context; // calls ~rs2_context()
}
The destructor of rs2_context calls ctx->stop() and finally should destroy its member std::shared_ptr<librealsense::context> ctx.
struct rs2_context
{
~rs2_context()
{
std::cout << "~rs2_context(): ctx.use_count() = " << ctx.use_count() << std::endl;
ctx->stop();
}
std::shared_ptr<librealsense::context> ctx;
};
However, the shared pointer std::shared_ptr<librealsense::context> ctx of struct rs2_context has a use_count > 1, so the destructor of librealsense::context is not called, and hence neither is _device_watcher->stop();
context::~context()
{
_device_watcher->stop(); //ensure that the device watcher will stop before the _devices_changed_callback will be deleted
}
I added console output of the use_count of the rs2_context member std::shared_ptr<librealsense::context> ctx.
The use_count of the shared pointer ctx varies between 7 and 15.
Sample output:
..
~rs2_context(): ctx.use_count() = 15
Hope this description helps.
from librealsense.
There is a detailed discussion at #7098 about how threads work.
In that discussion, it is suggested to use the librealsense SDK in V4L2 backend mode. The 'Required Info' provided at the top of your case states that you are using the RSUSB backend, which is 'V4L2 backend = false'. If you have not tried it already, you could build librealsense from source with CMake and the flag -DFORCE_RSUSB_BACKEND=FALSE, which builds the SDK with the V4L2 backend, to see whether it makes a difference to your deadlock problem.
from librealsense.
Hi @Schobesberger Do you require further assistance with this case, please? Thanks!
from librealsense.
Hi Marty,
building librealsense with -DFORCE_RSUSB_BACKEND=FALSE, that is, using the V4L2 backend, makes no difference. Neither the deadlock problem nor the issue of the librealsense::context destructor not being called, which probably leads to the hanging thread, has been solved.
from librealsense.
You mention at the start of this case that you have a sleep() workaround that prevents the deadlock, but that if the very first thread is allowed to keep existing without being destroyed, this apparently causes the computer to run out of memory.
If the first thread cannot be destroyed, an alternative might be to release frames to free up memory using the rs2_release_frame() instruction, as advised at #4006 (comment)
from librealsense.
Hi @Schobesberger After the advice provided in the comment above, do you have an update about this case that you can provide please? Thanks!
from librealsense.
Hi Marty,
Unfortunately I am currently busy with other work and have no time at the moment for further investigation of this topic. My sleep() workaround does not block any library thread, only the main thread. The main thread pauses for some time (until a try_wait() of one of the worker threads created by the library times out within its 2-second timeout period), which prevents the deadlock. My sleep() workaround does technically the same as the workaround in issue #9287.
I experienced growing instability of librealsense when I tried to add more and more console debug output. The process terminates with a segmentation fault increasingly often.
I'll keep you informed as soon as I have new knowledge. Thanks for your help !
from librealsense.
Thanks very much for the update. As you will not be working on the problem for the time being, I will close the issue and you are welcome to re-open it at a future date when you resume working on it. Good luck!
from librealsense.