GithubHelp home page GithubHelp logo

Comments (8)

pavelkumbrasev avatar pavelkumbrasev commented on September 27, 2024

Hi @blonded04,
Sorry I didn't understand your question.
What is the problem you highlighted in gdb?

from onetbb.

blonded04 avatar blonded04 commented on September 27, 2024

Hi @blonded04, Sorry I didn't understand your question. What is the problem you highlighted in gdb?

I get segfaults on that line :(

new (&ctx.my_cpu_ctl_env) d1::cpu_ctl_env(*src_ctl);

Probably something wrong with ctx.my_cpu_ctl_env (null potentially?) or with src_ctl.

from onetbb.

pavelkumbrasev avatar pavelkumbrasev commented on September 27, 2024

It hard to say what went wrong. Could you provide reproducer?

from onetbb.

blonded04 avatar blonded04 commented on September 27, 2024

Sorry, it took a while, however the smallest reproducer I got problems with task_group_context is 65 lines of diff to TBB (Ubuntu, Ryzen 5500u).

My hypothesis is that TBB is not a friend of creating a context inside local_wait_for_all where context already exists. If I'm right, how can I safely change context.

main.cpp:

#include <atomic>
#include <iostream>

// shared_state between tbb and main.cpp
#include <oneapi/tbb/problems.h>
#include <tbb/parallel_for.h>

constexpr unsigned nthread = 12u;

int main() {
    // force nthread threads to join arena (they wont leave because wait-limit in task dispatcher is increased)
    std::atomic<unsigned> spin_barrier(nthread);
    tbb::parallel_for(tbb::blocked_range<int>(0, nthread), [&spin_barrier](tbb::blocked_range<int>) {
        spin_barrier.fetch_sub(1, std::memory_order_release);
        while (spin_barrier.load(std::memory_order_acquire)) {
            asm volatile ("pause\npause\npause\npause");
        }
    });
    std::cout << "parallel_for_finished" << std::endl;

    // enable problematic behaviour
    tbb::set_flag();

    // some time later we will face SEGFAULT
    while (tbb::get_counter() < 1000000u) {
        std::cout << "\t" << tbb::get_counter() << std::endl;
    }

    // we won't even get to it
    tbb::set_flag(false);
}

GDB backtrace for segfault:

Thread 2 "main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff799c700 (LWP 12481)]
tbb::detail::r1::context_guard_helper<false>::set_ctx (ctx=0x7ffff799bcc0, this=0x7ffff799bb80) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/scheduler_common.h:155
155	            curr_cpu_ctl_env.set_env();
(gdb) bt
#0  tbb::detail::r1::context_guard_helper<false>::set_ctx (ctx=0x7ffff799bcc0, this=0x7ffff799bb80) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/scheduler_common.h:155
#1  tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7ffff0001e80, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:311
#2  tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:472
#3  tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.cpp:168
#4  0x000000000040c0c7 in tbb::detail::d1::execute_and_wait (w_ctx=..., wait_ctx=..., t_ctx=..., t=warning: RTTI symbol not found for class 'tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>'
...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/detail/_task.h:191
#5  tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&, tbb::detail::d1::task_group_context&) (context=..., partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:113
#6  tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&, tbb::detail::d1::task_group_context&) (context=..., partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:105
#7  tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&) (partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:102
#8  tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}>(tbb::detail::d1::blocked_range<unsigned int> const&, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&) (body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:230
#9  tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter> (this=this@entry=0x46a480, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:238
#10 0x0000000000407c68 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (waiter=..., t=0x0, this=0x46a480) at /usr/include/c++/11/bits/atomic_base.h:818
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (waiter=..., t=0x0, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:472
#12 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/arena.cpp:137
#13 0x0000000000418b0b in tbb::detail::r1::market::process (this=0x460400, j=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/market.cpp:599
#14 0x000000000041be19 in tbb::detail::r1::rml::private_worker::run (this=0x468e00) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/private_server.cpp:271
#15 0x000000000041bfad in tbb::detail::r1::rml::private_worker::thread_routine (arg=<optimized out>) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/private_server.cpp:221
#16 0x00007ffff7b9c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007ffff7ac1133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Diff in TBB:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 47872941..eaa81b12 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -66,7 +66,7 @@ include(CMakeDependentOption)
 # Handle C++ standard version.
 if (NOT MSVC)  # no need to cover MSVC as it uses C++14 by default.
     if (NOT CMAKE_CXX_STANDARD)
-        set(CMAKE_CXX_STANDARD 11)
+        set(CMAKE_CXX_STANDARD 20)
     endif()
 
     if (CMAKE_CXX${CMAKE_CXX_STANDARD}_STANDARD_COMPILE_OPTION)  # if standard option was detected by CMake
@@ -108,7 +108,7 @@ option(TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH "Disable HWLOC automatic search by pkg
 option(TBB_ENABLE_IPO "Enable Interprocedural Optimization (IPO) during the compilation" ON)
 
 if (NOT DEFINED BUILD_SHARED_LIBS)
-    set(BUILD_SHARED_LIBS ON)
+    set(BUILD_SHARED_LIBS OFF)
 endif()
 
 if (NOT BUILD_SHARED_LIBS)
diff --git a/include/oneapi/tbb/parallel_for.h b/include/oneapi/tbb/parallel_for.h
index 91c7c44c..fd3f6aee 100644
--- a/include/oneapi/tbb/parallel_for.h
+++ b/include/oneapi/tbb/parallel_for.h
@@ -29,6 +29,7 @@
 #include "task_group.h"
 
 #include <cstddef>
+#include <functional>
 #include <new>
 
 namespace tbb {
@@ -109,12 +110,12 @@ struct start_for : public task {
             // defer creation of the wait node until task allocation succeeds
             wait_node wn;
             for_task.my_parent = &wn;
-            execute_and_wait(for_task, context, wn.m_wait, context);
+            d1::execute_and_wait(for_task, context, wn.m_wait, context);
         }
     }
     //! Run body for range, serves as callback for partitioner
     void run_body( Range &r ) {
-        tbb::detail::invoke(my_body, r);
+        my_body(r);
     }
 
     //! spawn right task, serves as callback for partitioner
diff --git a/include/oneapi/tbb/problems.h b/include/oneapi/tbb/problems.h
new file mode 100644
index 00000000..fa380b53
--- /dev/null
+++ b/include/oneapi/tbb/problems.h
@@ -0,0 +1,37 @@
+#pragma once
+
+#include <atomic>
+
+namespace tbb {
+
+namespace internal {
+
+inline std::atomic<bool>& get_flag_impl() {
+    static std::atomic<bool> flag(false);
+    return flag;
+}
+
+inline std::atomic<unsigned>& get_counter_impl() {
+    static std::atomic<unsigned> counter(0u);
+    return counter;
+}
+
+} // namespace internal
+
+inline void set_flag(bool value=true) {
+    internal::get_flag_impl().store(value, std::memory_order_release);
+}
+
+inline bool get_flag() {
+    return internal::get_flag_impl().load(std::memory_order_acquire);
+}
+
+inline unsigned get_counter() {
+    return internal::get_counter_impl().load(std::memory_order_acquire);
+}
+
+inline void increment_counter() {
+    internal::get_counter_impl().fetch_add(1, std::memory_order_release);
+}
+
+} // namespace tbb
\ No newline at end of file
diff --git a/src/tbb/scheduler_common.h b/src/tbb/scheduler_common.h
index 9e103657..ddf082aa 100644
--- a/src/tbb/scheduler_common.h
+++ b/src/tbb/scheduler_common.h
@@ -254,7 +254,7 @@ public:
         // threshold value tuned separately for macOS due to high cost of sched_yield there
         , my_yield_threshold{10 * yields_multiplier}
 #else
-        , my_yield_threshold{100 * yields_multiplier}
+        , my_yield_threshold{10000000 * yields_multiplier}
 #endif
         , my_pause_count{}
         , my_yield_count{}
diff --git a/src/tbb/task_dispatcher.h b/src/tbb/task_dispatcher.h
index f6ff3f17..aa16bf57 100644
--- a/src/tbb/task_dispatcher.h
+++ b/src/tbb/task_dispatcher.h
@@ -30,6 +30,10 @@
 #include "itt_notify.h"
 #include "concurrent_monitor.h"
 
+#include "oneapi/tbb/task_group.h"
+#include "oneapi/tbb/parallel_for.h"
+#include "oneapi/tbb/problems.h"
+
 #include <atomic>
 
 #if !__TBB_CPU_CTL_ENV_PRESENT
@@ -229,6 +233,16 @@ d1::task* task_dispatcher::receive_or_steal_task(
         }
         // Nothing to do, pause a little.
         waiter.pause(slot);
+
+        if (get_flag()) {
+            tbb::parallel_for(
+                tbb::blocked_range<unsigned>{0u, arena_index + 1u}, 
+                [] (tbb::blocked_range<unsigned> range) {
+                    for (unsigned idx = range.begin(); idx < range.end(); idx++) {
+                        increment_counter();
+                    }
+                });
+        }
     } // end of nonlocal task retrieval loop
 
     __TBB_ASSERT(is_alive(a.my_guard), nullptr);

from onetbb.

blonded04 avatar blonded04 commented on September 27, 2024

What happens internally is I end up calling local_wait_for_all inside local_wait_for_all, but not inside any other task.

Is there any assumptions that forbid that kind of behavior and are there any workarounds that respect existing invariants?

from onetbb.

sarathnandu avatar sarathnandu commented on September 27, 2024

Hi @blonded04

It seems you are trying to do a parallel_for inside the steal loop which is something we never tried. Looking at your gdb it's missing debug information (possibly due to optimizations turned on ). So i would like to ask you to run it on debug mode so that assertions should provide some context on what is going on.

But I would like to clarify running parallel_for inside the steal loop is something we never recommend to do.

from onetbb.

sarathnandu avatar sarathnandu commented on September 27, 2024

Hi @blonded04 , can we close this issue?

from onetbb.

blonded04 avatar blonded04 commented on September 27, 2024

Hi @blonded04 , can we close this issue?

Yes! Thank you very much for help

from onetbb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.