Comments (12)

MartyG-RealSense commented on June 19, 2024

Hi @Schobesberger A C++ hardware reset script at #9287 (comment) may be a helpful reference. It includes a mechanism to check whether reset has completed and pauses until reset is complete.
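The "pause until reset is complete" idea can be sketched without the SDK. The following is a minimal illustration and not the script from #9287: reset_signal and simulate_reset are hypothetical names, and a background thread stands in for the camera reappearing. In a real program, notify() would presumably be called from the handler passed to ctx.set_devices_changed_callback().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of librealsense): a one-shot flag that one
// thread sets and another thread waits on, with a timeout.
class reset_signal
{
public:
    // Called by the notifying side, e.g. from a devices-changed callback.
    void notify()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _done = true;
        }
        _cv.notify_all();
    }

    // Blocks until notify() was called or the timeout expired.
    // Returns true if the flag was set, false on timeout.
    bool wait_for(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _done; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _done = false;
};

// Simulates a device that reappears after `delay` and reports whether the
// waiting side saw it within `timeout`.
bool simulate_reset(std::chrono::milliseconds delay, std::chrono::milliseconds timeout)
{
    reset_signal signal;
    std::thread device([&] {
        std::this_thread::sleep_for(delay); // stands in for the reset time
        signal.notify();
    });
    const bool seen = signal.wait_for(timeout);
    device.join(); // join before `signal` goes out of scope
    return seen;
}
```

Compared with polling a plain bool in a loop, the waiting thread sleeps until it is notified, and the timeout gives the caller a way out if the device never comes back.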

from librealsense.

Schobesberger commented on June 19, 2024

Hi MartyG, thank you for your reply.

Unfortunately, the approach from issue #9287 does not fix the problem, for two reasons.

  1. Our application does not explicitly call device.hardware_reset(); the library itself does so when downloading new firmware. Therefore the trick with ctx.set_devices_changed_callback() cannot be applied.

  2. More importantly, after device.hardware_reset() is called, the very first thread, which is created within the rs2::context ctx; constructor, never dies.

In addition, I sometimes get core dumps of our production code from the customer in which the machine appears to be running out of memory. Maybe there is an issue when the library loses and re-establishes the connection to the camera due to poor USB signal quality.

Running the following program in gdb shows that the first thread created never dies.
gdb:
50 us ThID 18256 realsense_test: is starting ...
265 us ThID 18256 realsense_test: create rs2::ctx
[New Thread 0xf62f4b40 (LWP 18285)] // this thread never dies !!
[New Thread 0xf5af3b40 (LWP 18286)]

#include <chrono>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

#include <unistd.h>
#include <syscall.h> // SYS_gettid

// using librealsense-2.54.2

#include <librealsense2/rs.hpp> // rs2::context, rs2::device, rs2::event_information
#include <librealsense2/h/rs_device.h>
#include <librealsense2/h/rs_option.h>
#include <librealsense2/h/rs_sensor.h>
#include <librealsense2/hpp/rs_sensor.hpp>
#include <librealsense2/hpp/rs_types.hpp>
#include <librealsense2/rs_advanced_mode.hpp>


 std::recursive_mutex M;

 std::chrono::steady_clock::time_point beg;
 
 template <
    class result_t   = std::chrono::microseconds,
    class clock_t    = std::chrono::steady_clock,
    class duration_t = std::chrono::microseconds
 >
 auto ts() 
 {
   return std::chrono::duration_cast<result_t>(clock_t::now() - beg).count();
 }

 std::string tt(void)
 {
  std::stringstream ss;
  
   ss << ts() << " us ";
   
   return ss.str();
 }

 std::string str_program_name;

 std::string ttx(void)
 {
  std::stringstream ss;
  
   ss << tt() << std::dec << "ThID " << syscall(SYS_gettid) << " " << str_program_name << ": ";
   
   return ss.str();
 }

                                                         // to give some pending try_sleeps the time to timeout 


int main(int argc, char *argv[])
{
  
  beg =  std::chrono::steady_clock::now(); // mark start time point of program
  
  str_program_name = "realsense_test";
  
  std::cout << ttx() <<  "is starting ...\n";
  
  bool loop_forever = true;
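  // Set in the devices-changed callback (a library thread) and polled by main below.
  // NOTE: a std::atomic<bool> would make this cross-thread access race-free.
  bool resetCompleteIntelRealsense = false;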
  
  
  while (loop_forever)
  {  
    // loop_forever = false;  
    
    // code for testing the rs2::device.hardware_reset() command
    if (true)
    {
      std::cout << ttx() << "create rs2::ctx " << std::endl;
    
      rs2::context ctx;

      std::cout << ttx() << "ctx.set_devices_changed_callback() " << std::endl;

      ctx.set_devices_changed_callback([&](rs2::event_information& info)
      {

        // loop thru all new devices - that is one that has been reset effectively
 
         for (auto&& dev : info.get_new_devices())
         {

           std::string devName         = "";
           std::string devSerialNumber = "";
           std::string devFirmware     = "";
           std::string devProdId       = "";

           devProdId       = dev.get_info(RS2_CAMERA_INFO_PRODUCT_ID);
           devSerialNumber = dev.get_info(RS2_CAMERA_INFO_SERIAL_NUMBER);
           devName         = dev.get_info(RS2_CAMERA_INFO_NAME);
           
           if (devName == "Intel RealSense D455")
           {
              devFirmware = dev.get_info(RS2_CAMERA_INFO_FIRMWARE_VERSION);
              
              std::cout << ttx() << " in custom devices_changed_callback : camera  found. Mark hardware reset to be finished " << std::endl;              
              resetCompleteIntelRealsense = true;
           }
          
         }
      });
      

      rs2::device_list devs = ctx.query_devices(RS2_PRODUCT_LINE_DEPTH);
         

      for (rs2::device &&device : devs)  // only one device is connected !
      {

        std::cout << ttx() << " RS2_CAMERA_INFO_NAME                " << device.get_info(RS2_CAMERA_INFO_NAME)                << std::endl;
        std::cout << ttx() << " RS2_CAMERA_INFO_SERIAL_NUMBER       " << device.get_info(RS2_CAMERA_INFO_SERIAL_NUMBER)       << std::endl;
        std::cout << ttx() << " RS2_CAMERA_INFO_FIRMWARE_VERSION    " << device.get_info(RS2_CAMERA_INFO_FIRMWARE_VERSION)    << std::endl;
        std::cout << ttx() << " RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR " << device.get_info(RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR) << std::endl;
     
            
        resetCompleteIntelRealsense = false;
        
        device.hardware_reset();                     

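        int cnt = 0; // declared outside the loop so that it actually counts iterations
        while (resetCompleteIntelRealsense == false) // wait until camera seen again
        {
          if (++cnt >= 100) // just to calm down the console output (about once per second)
          {
            cnt = 0;
            std::cout << ttx() << "Wait until HW reset has been completed." << std::endl;
          }
          usleep(10 * 1000); // poll every 10 ms instead of spinning at full speed
        }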
     }

     std::cout << ttx() << "Leaving scope of object rs2::context ctx" << std::endl;
  }
  
  std::cout << ttx() << " End of loop reached" << std::endl;
   
  } // end while(loop_forever)
 
  return 0;
}

This is the full gdb output

(gdb) r
Starting program: /home/applic/th_realsense_bug/home/applic/exe/realsense_test
warning: File "/home/shared/vendor/install-gcc-i686-6.3-r1452/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/shared/vendor/install-gcc-i686-6.3-r1452/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
50 us ThID 18256 realsense_test: is starting ...
265 us ThID 18256 realsense_test: create rs2::ctx
[New Thread 0xf62f4b40 (LWP 18285)]
[New Thread 0xf5af3b40 (LWP 18286)]
[Thread 0xf5af3b40 (LWP 18286) exited]
[New Thread 0xf5af3b40 (LWP 18287)]
[Thread 0xf5af3b40 (LWP 18287) exited]
[New Thread 0xf5af3b40 (LWP 18288)]
[Thread 0xf5af3b40 (LWP 18288) exited]
28100 us ThID 18256 realsense_test: ctx.set_devices_changed_callback()
[New Thread 0xf5af3b40 (LWP 18289)]
[Thread 0xf5af3b40 (LWP 18289) exited]
[New Thread 0xf5af3b40 (LWP 18290)]
[Thread 0xf5af3b40 (LWP 18290) exited]
[New Thread 0xf5af3b40 (LWP 18291)]
[Thread 0xf5af3b40 (LWP 18291) exited]
[New Thread 0xf5af3b40 (LWP 18292)]
[New Thread 0xf52f2b40 (LWP 18293)]
[Thread 0xf52f2b40 (LWP 18293) exited]
[New Thread 0xf52f2b40 (LWP 18294)]
[New Thread 0xf4af1b40 (LWP 18295)]
[New Thread 0xf42f0b40 (LWP 18296)]
[New Thread 0xf3aefb40 (LWP 18297)]
[New Thread 0xf30ffb40 (LWP 18298)]
[New Thread 0xf26ffb40 (LWP 18299)]
[New Thread 0xf1efeb40 (LWP 18300)]
[Thread 0xf30ffb40 (LWP 18298) exited]
[New Thread 0xf16fdb40 (LWP 18301)]
[New Thread 0xf0efcb40 (LWP 18302)]
[New Thread 0xf06fbb40 (LWP 18303)]
[New Thread 0xefefab40 (LWP 18304)]
[New Thread 0xef6f9b40 (LWP 18305)]
[New Thread 0xeeef8b40 (LWP 18306)]
[New Thread 0xf30ffb40 (LWP 18307)]
[Thread 0xf30ffb40 (LWP 18307) exited]
165206 us ThID 18256 realsense_test:  RS2_CAMERA_INFO_NAME                Intel RealSense D455
165277 us ThID 18256 realsense_test:  RS2_CAMERA_INFO_SERIAL_NUMBER       117222250835
165535 us ThID 18256 realsense_test:  RS2_CAMERA_INFO_FIRMWARE_VERSION    5.15.1
165646 us ThID 18256 realsense_test:  RS2_CAMERA_INFO_USB_TYPE_DESCRIPTOR 3.2
[New Thread 0xf30ffb40 (LWP 18308)]
[Thread 0xf30ffb40 (LWP 18308) exited]
[New Thread 0xee4ffb40 (LWP 18324)]
[New Thread 0xedcfeb40 (LWP 18325)]
[New Thread 0xed4fdb40 (LWP 18326)]
[New Thread 0xeccfcb40 (LWP 18327)]
[New Thread 0xec2ffb40 (LWP 18328)]
[New Thread 0xeb8ffb40 (LWP 18329)]
[New Thread 0xeb0feb40 (LWP 18330)]
[Thread 0xec2ffb40 (LWP 18328) exited]
[New Thread 0xea8fdb40 (LWP 18331)]
[New Thread 0xea0fcb40 (LWP 18332)]
[New Thread 0xe98fbb40 (LWP 18333)]
[New Thread 0xe90fab40 (LWP 18334)]
[New Thread 0xe88f9b40 (LWP 18335)]
[New Thread 0xe80f8b40 (LWP 18336)]
[New Thread 0xec2ffb40 (LWP 18337)]
[Thread 0xec2ffb40 (LWP 18337) exited]
2139352 us ThID 18285 realsense_test:  in custom devices_changed_callback : camera  found. Mark hardware reset to be finished
[Thread 0xeb8ffb40 (LWP 18329) exited]
[Thread 0xf26ffb40 (LWP 18299) exited]
[Thread 0xf5af3b40 (LWP 18292) exited]
[Thread 0xee4ffb40 (LWP 18324) exited]
[Thread 0xedcfeb40 (LWP 18325) exited]
[Thread 0xf4af1b40 (LWP 18295) exited]
[Thread 0xed4fdb40 (LWP 18326) exited]
[Thread 0xf42f0b40 (LWP 18296) exited]
[Thread 0xeccfcb40 (LWP 18327) exited]
[Thread 0xf3aefb40 (LWP 18297) exited]
[Thread 0xf1efeb40 (LWP 18300) exited]
[Thread 0xeb0feb40 (LWP 18330) exited]
[Thread 0xea8fdb40 (LWP 18331) exited]
[Thread 0xf16fdb40 (LWP 18301) exited]
[Thread 0xf0efcb40 (LWP 18302) exited]
[Thread 0xf06fbb40 (LWP 18303) exited]
[Thread 0xea0fcb40 (LWP 18332) exited]
[Thread 0xe98fbb40 (LWP 18333) exited]
[Thread 0xefefab40 (LWP 18304) exited]
[Thread 0xef6f9b40 (LWP 18305) exited]
[Thread 0xeeef8b40 (LWP 18306) exited]
[Thread 0xf52f2b40 (LWP 18294) exited]
2147597 us ThID 18256 realsense_test: Leaving scope of object rs2::context ctx
[Thread 0xe90fab40 (LWP 18334) exited]
[Thread 0xe88f9b40 (LWP 18335) exited]
[Thread 0xe80f8b40 (LWP 18336) exited]
2149009 us ThID 18256 realsense_test:  End of loop reached

MartyG-RealSense commented on June 19, 2024

I would recommend not using threads unless the project absolutely requires it, as it can introduce instability that does not occur in non-threaded scripts that perform the same function.

A memory leak that causes the computer to run out of memory could be caused if there is something in your loop that should be placed outside of the loop.

MartyG-RealSense commented on June 19, 2024

Hi @Schobesberger Do you require further assistance with this case, please? Thanks!

Schobesberger commented on June 19, 2024

Hi Marty,
thank you very much for your inquiry.

Unfortunately, your suggestions did not help.

Issue: deadlock when device.hardware_reset() is called

As I mentioned, our production code does not call device.hardware_reset() itself; the library calls it on its own. Therefore I cannot fix the deadlock problem.

The cancellable_timer.try_sleep() call in the polling_device_watcher::polling() method

class polling_device_watcher : public librealsense::platform::device_watcher
{
    ...

    void polling( dispatcher::cancellable_timer cancellable_timer )
    {
        // (debug output line added for this analysis; its beginning is truncated in the post)
        // ... cancellable_timer.try_sleep() at" << __LINE__ << std::endl;M.unlock();}

        if( cancellable_timer.try_sleep( std::chrono::milliseconds( POLLING_DEVICES_INTERVAL_MS ) ) )
        {

finally calls _owner->_was_stopped_cv.wait_for( lock, sleep_time, [&]() { return was_stopped(); } );

        bool try_sleep( Duration sleep_time )
        {
            using namespace std::chrono;

            std::unique_lock<std::mutex> lock(_owner->_was_stopped_mutex);

            if( was_stopped() )
            {
                return false;
            }   
            // wait_for() returns "false if the predicate pred still evaluates to false after the
            // rel_time timeout expired, otherwise true"
            bool retval = _owner->_was_stopped_cv.wait_for( lock, sleep_time, [&]() { return was_stopped(); } );

The wait_for() call times out after POLLING_DEVICES_INTERVAL_MS (which is 2000 ms). If, for whatever reason, the rs2::context ctx object is destroyed within this interval because it goes out of scope, a deadlock involving some nested mutexes occurs.
I have analyzed the deadlock situation in the file I already posted:
https://github.com/IntelRealSense/librealsense/files/14281898/realsense_deadlock_cleaned_log.txt
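The mechanics of such a cancellable sleep can be illustrated in isolation. The following is a simplified model under my own assumptions, not the librealsense code: cancellable_sleeper, sleep_result and run_sleep are hypothetical names. It shows that wait_for() with a predicate returns early once the stop flag is set and the condition variable is notified, and that the owner joins the worker before the object it sleeps on is destroyed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified model of a cancellable sleep; NOT the librealsense implementation.
class cancellable_sleeper
{
public:
    // Sleeps up to sleep_time. Returns true if the full time elapsed,
    // false if stop() was called before or during the sleep.
    bool try_sleep(std::chrono::milliseconds sleep_time)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        // wait_for() with a predicate returns the predicate's final value:
        // true means "stopped", so the sleep did not run to completion.
        const bool stopped = _cv.wait_for(lock, sleep_time, [this] { return _stopped; });
        return !stopped;
    }

    // Sets the stop flag and wakes up any thread inside try_sleep().
    void stop()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stopped = false;
};

struct sleep_result
{
    bool completed;       // true if the sleep was not cancelled
    long long elapsed_ms; // time until the worker woke up
};

// Starts a worker that sleeps for `sleep_time`, calls stop() after
// `stop_after`, and reports what the worker observed.
sleep_result run_sleep(std::chrono::milliseconds sleep_time, std::chrono::milliseconds stop_after)
{
    using namespace std::chrono;
    cancellable_sleeper sleeper;
    sleep_result result{ false, 0 };
    const auto start = steady_clock::now();

    std::thread worker([&] {
        result.completed = sleeper.try_sleep(sleep_time);
        result.elapsed_ms = duration_cast<milliseconds>(steady_clock::now() - start).count();
    });

    std::this_thread::sleep_for(stop_after);
    sleeper.stop();
    worker.join(); // the worker must be joined before `sleeper` is destroyed
    return result;
}
```

With a 2000 ms sleep that is stopped after 50 ms, the worker wakes up shortly after stop() instead of waiting for the full interval. The sketch only shows the intended behaviour of the primitive; it does not reproduce the nested-mutex deadlock described above.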

Issue: hanging thread and memory exhaustion

The example code uses no threads whatsoever. The problem lies with the threads that the librealsense library creates by itself.
The example code uses your suggested fix for the deadlock: it uses ctx.set_devices_changed_callback([&](rs2::event_information& info) to check whether the device is online again after a hardware_reset().
With this solution, however, one thread created by librealsense never dies.

I have probably found the reason why the thread never dies, but I have no idea how to fix it.

The relevant code of the demo program starts with creating an rs2::context object.

The constructor of the rs2::context ctx; object creates the rs2_context _context object through the factory rs2_create_context(). The factory in turn creates a librealsense::context object and stores it in the std::shared_ptr<librealsense::context> ctx; member of rs2_context.

  struct rs2_context
  {
      ~rs2_context()
      {
          ctx->stop();
      }
      std::shared_ptr<librealsense::context> ctx;
  };

  context()
  {
      rs2_error* e = nullptr;

      _context = std::shared_ptr<rs2_context>(
          rs2_create_context(RS2_API_VERSION, &e),
          rs2_delete_context);

This shared pointer is then copied many times at different places. Some examples:

    std::vector<std::shared_ptr<device_info>> context::create_devices(platform::backend_device_group devices,
                                                                      const std::map<std::string, std::weak_ptr<device_info>>& playback_devices,
                                                                      int mask) const
    {
        std::vector<std::shared_ptr<device_info>> list;

        auto t = const_cast<context*>(this); // While generally a bad idea, we need to provide mutable reference to the devices
        // to allow them to modify context later on
        auto ctx = t->shared_from_this();     // shared pointer ctx copied !!!

        if (mask & RS2_PRODUCT_LINE_D400)
        {
            auto d400_devices = d400_info::pick_d400_devices(ctx, devices);  // shared pointer ctx copied !!!

When the rs2::context ctx object leaves its scope in the demo program, the rs2::context destructor runs and releases its std::shared_ptr<rs2_context> _context member, which invokes the deleter rs2_delete_context().

rs2_delete_context() calls delete context, where context is an rs2_context object.

void rs2_delete_context(rs2_context* context) BEGIN_API_CALL
{
    VALIDATE_NOT_NULL(context);
 
    delete context;   // calls  ~rs2_context() 
 }

The destructor of rs2_context calls ctx->stop() and finally should destroy its member std::shared_ptr<librealsense::context> ctx

struct rs2_context
{
    ~rs2_context() 
    {
      std::cout << "~rs2_context():  ctx.use_count() = " << ctx.use_count() << std::endl; 
      ctx->stop(); 
    }
    std::shared_ptr<librealsense::context> ctx;
};

However, the shared pointer std::shared_ptr<librealsense::context> ctx of struct rs2_context has a use_count > 1 at this point, so the destructor of librealsense::context is not called.

Because the librealsense::context destructor is not called, _device_watcher->stop(); is not called either.

 context::~context()
 {
     _device_watcher->stop(); //ensure that the device watcher will stop before the _devices_changed_callback will be deleted
 }

I added a console output of the use_count of the std::shared_ptr<librealsense::context> ctx member of rs2_context.
The use_count of the shared pointer ctx varies between 7 and 15.

Sample output:

..
~rs2_context():  ctx.use_count() = 15
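The ownership effect described above can be reproduced with plain standard C++. This is a sketch with stand-in types, not librealsense code: inner_context, outer_handle, strong_child and weak_child are hypothetical names that only mimic the roles of librealsense::context, rs2_context and objects holding a copy of the shared pointer. The weak_ptr variant is shown as a general C++ pattern, not as a proposed patch to the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for librealsense::context; records when its destructor runs.
struct inner_context
{
    explicit inner_context(bool& destroyed) : _destroyed(destroyed) {}
    ~inner_context() { _destroyed = true; } // where a watcher would be stopped
    void stop() {}
    bool& _destroyed;
};

// Stand-in for rs2_context: its destructor calls stop(), then releases ONE
// reference to the inner context.
struct outer_handle
{
    ~outer_handle() { ctx->stop(); }
    std::shared_ptr<inner_context> ctx;
};

// A child object that keeps the inner context alive by holding a copy.
struct strong_child { std::shared_ptr<inner_context> ctx; };

// A child object that only observes the inner context.
struct weak_child { std::weak_ptr<inner_context> ctx; };

// Deletes the outer handle while n strong copies still exist and reports
// whether ~inner_context() ran at that point.
bool inner_destroyed_with_strong_children(std::size_t n)
{
    bool destroyed = false;
    auto* handle = new outer_handle{ std::make_shared<inner_context>(destroyed) };
    std::vector<strong_child> children(n, strong_child{ handle->ctx });
    delete handle;    // plays the role of rs2_delete_context()
    return destroyed; // evaluated while the children are still alive
}

// Same experiment, but the children hold weak_ptr copies only.
bool inner_destroyed_with_weak_children(std::size_t n)
{
    bool destroyed = false;
    auto* handle = new outer_handle{ std::make_shared<inner_context>(destroyed) };
    std::vector<weak_child> children(n, weak_child{ handle->ctx });
    delete handle;
    return destroyed;
}
```

With strong copies still alive, deleting the outer handle runs stop() but not the inner destructor; the inner object is destroyed only when the last copy goes away. With weak copies, or with no copies at all, the inner destructor runs as soon as the handle is deleted.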

Hope this description helps.

MartyG-RealSense commented on June 19, 2024

There is a detailed discussion at #7098 about how threads work.

In that discussion, it is suggested to use the librealsense SDK in V4L2 backend mode. In the 'Required Info' information provided at the top of your case, it states that you are using the RSUSB backend, which is 'V4L2 backend = false'. So you could try building librealsense from source code with CMake with the flag -DFORCE_RSUSB_BACKEND=FALSE to build the SDK with the V4L2 Backend to see whether it makes a difference to your deadlock problem if you have not tried it already.

MartyG-RealSense commented on June 19, 2024

Hi @Schobesberger Do you require further assistance with this case, please? Thanks!

Schobesberger commented on June 19, 2024

Hi Marty,

building librealsense with -DFORCE_RSUSB_BACKEND=FALSE, that is, with the V4L2 backend, makes no difference. Neither the deadlock problem nor the issue with the librealsense::context destructor not being called, which probably leads to the hanging thread, has been solved.

MartyG-RealSense commented on June 19, 2024

You mention at the start of this case that you have a sleep() workaround that prevents deadlock, but if the very first thread is allowed to keep existing without being destroyed then this apparently causes the computer to run out of memory.

If the first thread will not destroy then an alternative might be to release frames to free up memory capacity using the rs2_release_frame() instruction, as advised at #4006 (comment)

MartyG-RealSense commented on June 19, 2024

Hi @Schobesberger After the advice provided in the comment above, do you have an update about this case that you can provide please? Thanks!

Schobesberger commented on June 19, 2024

Hi Marty,

Unfortunately, I am currently busy with other work and have no time at the moment for further investigation of this topic. My sleep() workaround does not block any library thread, only the main thread. The main thread pauses for some time (until the try_sleep() of one of the worker threads created by the library times out after its 2-second timeout period), which prevents the deadlock. Technically, my sleep() workaround does the same as the workaround in issue #9287.
I experienced growing instability of librealsense when I tried to add more and more console debug output: the process terminates with a segmentation fault increasingly often.

I'll keep you informed as soon as I have new findings. Thanks for your help!

MartyG-RealSense commented on June 19, 2024

Thanks very much for the update. As you will not be working on the problem for the time being, I will close the issue and you are welcome to re-open it at a future date when you resume working on it. Good luck!
