Comments (8)
Hi @blonded04,
Sorry I didn't understand your question.
What is the problem you highlighted in gdb?
from onetbb.
Hi @blonded04, Sorry I didn't understand your question. What is the problem you highlighted in gdb?
I get segfaults on that line :(
new (&ctx.my_cpu_ctl_env) d1::cpu_ctl_env(*src_ctl);
Probably something wrong with ctx.my_cpu_ctl_env
(null potentially?) or with src_ctl
.
from onetbb.
It hard to say what went wrong. Could you provide reproducer?
from onetbb.
Sorry, it took a while, however the smallest reproducer I got problems with task_group_context
is 65 lines of diff to TBB (Ubuntu, Ryzen 5500u).
My hypothesis is that TBB is not a friend of creating a context inside local_wait_for_all
where context already exists. If I'm right, how can I safely change context.
main.cpp:
#include <atomic>
#include <iostream>
// shared_state between tbb and main.cpp
#include <oneapi/tbb/problems.h>
#include <tbb/parallel_for.h>
constexpr unsigned nthread = 12u;
int main() {
// force nthread threads to join arena (they wont leave because wait-limit in task dispatcher is increased)
std::atomic<unsigned> spin_barrier(nthread);
tbb::parallel_for(tbb::blocked_range<int>(0, nthread), [&spin_barrier](tbb::blocked_range<int>) {
spin_barrier.fetch_sub(1, std::memory_order_release);
while (spin_barrier.load(std::memory_order_acquire)) {
asm volatile ("pause\npause\npause\npause");
}
});
std::cout << "parallel_for_finished" << std::endl;
// enable problematic behaviour
tbb::set_flag();
// some time later we will face SEGFAULT
while (tbb::get_counter() < 1000000u) {
std::cout << "\t" << tbb::get_counter() << std::endl;
}
// we won't even get to it
tbb::set_flag(false);
}
GDB backtrace for segfault:
Thread 2 "main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff799c700 (LWP 12481)]
tbb::detail::r1::context_guard_helper<false>::set_ctx (ctx=0x7ffff799bcc0, this=0x7ffff799bb80) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/scheduler_common.h:155
155 curr_cpu_ctl_env.set_env();
(gdb) bt
#0 tbb::detail::r1::context_guard_helper<false>::set_ctx (ctx=0x7ffff799bcc0, this=0x7ffff799bb80) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/scheduler_common.h:155
#1 tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7ffff0001e80, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:311
#2 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:472
#3 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.cpp:168
#4 0x000000000040c0c7 in tbb::detail::d1::execute_and_wait (w_ctx=..., wait_ctx=..., t_ctx=..., t=warning: RTTI symbol not found for class 'tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>'
...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/detail/_task.h:191
#5 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&, tbb::detail::d1::task_group_context&) (context=..., partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:113
#6 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&, tbb::detail::d1::task_group_context&) (context=..., partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:105
#7 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<unsigned int> const&, {lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&, tbb::detail::d1::auto_partitioner&) (partitioner=..., body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:102
#8 tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1}>(tbb::detail::d1::blocked_range<unsigned int> const&, tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool)::{lambda(tbb::detail::d1::blocked_range<unsigned int>)#1} const&) (body=..., range=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/../../include/oneapi/tbb/parallel_for.h:230
#9 tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter> (this=this@entry=0x46a480, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:238
#10 0x0000000000407c68 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (waiter=..., t=0x0, this=0x46a480) at /usr/include/c++/11/bits/atomic_base.h:818
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (waiter=..., t=0x0, this=0x46a480) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/task_dispatcher.h:472
#12 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/arena.cpp:137
#13 0x0000000000418b0b in tbb::detail::r1::market::process (this=0x460400, j=...) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/market.cpp:599
#14 0x000000000041be19 in tbb::detail::r1::rml::private_worker::run (this=0x468e00) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/private_server.cpp:271
#15 0x000000000041bfad in tbb::detail::r1::rml::private_worker::thread_routine (arg=<optimized out>) at /home/valery/repos/tbb_test/cmake_deps/onetbb-src/src/tbb/private_server.cpp:221
#16 0x00007ffff7b9c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007ffff7ac1133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Diff in TBB:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 47872941..eaa81b12 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -66,7 +66,7 @@ include(CMakeDependentOption)
# Handle C++ standard version.
if (NOT MSVC) # no need to cover MSVC as it uses C++14 by default.
if (NOT CMAKE_CXX_STANDARD)
- set(CMAKE_CXX_STANDARD 11)
+ set(CMAKE_CXX_STANDARD 20)
endif()
if (CMAKE_CXX${CMAKE_CXX_STANDARD}_STANDARD_COMPILE_OPTION) # if standard option was detected by CMake
@@ -108,7 +108,7 @@ option(TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH "Disable HWLOC automatic search by pkg
option(TBB_ENABLE_IPO "Enable Interprocedural Optimization (IPO) during the compilation" ON)
if (NOT DEFINED BUILD_SHARED_LIBS)
- set(BUILD_SHARED_LIBS ON)
+ set(BUILD_SHARED_LIBS OFF)
endif()
if (NOT BUILD_SHARED_LIBS)
diff --git a/include/oneapi/tbb/parallel_for.h b/include/oneapi/tbb/parallel_for.h
index 91c7c44c..fd3f6aee 100644
--- a/include/oneapi/tbb/parallel_for.h
+++ b/include/oneapi/tbb/parallel_for.h
@@ -29,6 +29,7 @@
#include "task_group.h"
#include <cstddef>
+#include <functional>
#include <new>
namespace tbb {
@@ -109,12 +110,12 @@ struct start_for : public task {
// defer creation of the wait node until task allocation succeeds
wait_node wn;
for_task.my_parent = &wn;
- execute_and_wait(for_task, context, wn.m_wait, context);
+ d1::execute_and_wait(for_task, context, wn.m_wait, context);
}
}
//! Run body for range, serves as callback for partitioner
void run_body( Range &r ) {
- tbb::detail::invoke(my_body, r);
+ my_body(r);
}
//! spawn right task, serves as callback for partitioner
diff --git a/include/oneapi/tbb/problems.h b/include/oneapi/tbb/problems.h
new file mode 100644
index 00000000..fa380b53
--- /dev/null
+++ b/include/oneapi/tbb/problems.h
@@ -0,0 +1,37 @@
+#pragma once
+
+#include <atomic>
+
+namespace tbb {
+
+namespace internal {
+
+inline std::atomic<bool>& get_flag_impl() {
+ static std::atomic<bool> flag(false);
+ return flag;
+}
+
+inline std::atomic<unsigned>& get_counter_impl() {
+ static std::atomic<unsigned> counter(0u);
+ return counter;
+}
+
+} // namespace internal
+
+inline void set_flag(bool value=true) {
+ internal::get_flag_impl().store(value, std::memory_order_release);
+}
+
+inline bool get_flag() {
+ return internal::get_flag_impl().load(std::memory_order_acquire);
+}
+
+inline unsigned get_counter() {
+ return internal::get_counter_impl().load(std::memory_order_acquire);
+}
+
+inline void increment_counter() {
+ internal::get_counter_impl().fetch_add(1, std::memory_order_release);
+}
+
+} // namespace tbb
\ No newline at end of file
diff --git a/src/tbb/scheduler_common.h b/src/tbb/scheduler_common.h
index 9e103657..ddf082aa 100644
--- a/src/tbb/scheduler_common.h
+++ b/src/tbb/scheduler_common.h
@@ -254,7 +254,7 @@ public:
// threshold value tuned separately for macOS due to high cost of sched_yield there
, my_yield_threshold{10 * yields_multiplier}
#else
- , my_yield_threshold{100 * yields_multiplier}
+ , my_yield_threshold{10000000 * yields_multiplier}
#endif
, my_pause_count{}
, my_yield_count{}
diff --git a/src/tbb/task_dispatcher.h b/src/tbb/task_dispatcher.h
index f6ff3f17..aa16bf57 100644
--- a/src/tbb/task_dispatcher.h
+++ b/src/tbb/task_dispatcher.h
@@ -30,6 +30,10 @@
#include "itt_notify.h"
#include "concurrent_monitor.h"
+#include "oneapi/tbb/task_group.h"
+#include "oneapi/tbb/parallel_for.h"
+#include "oneapi/tbb/problems.h"
+
#include <atomic>
#if !__TBB_CPU_CTL_ENV_PRESENT
@@ -229,6 +233,16 @@ d1::task* task_dispatcher::receive_or_steal_task(
}
// Nothing to do, pause a little.
waiter.pause(slot);
+
+ if (get_flag()) {
+ tbb::parallel_for(
+ tbb::blocked_range<unsigned>{0u, arena_index + 1u},
+ [] (tbb::blocked_range<unsigned> range) {
+ for (unsigned idx = range.begin(); idx < range.end(); idx++) {
+ increment_counter();
+ }
+ });
+ }
} // end of nonlocal task retrieval loop
__TBB_ASSERT(is_alive(a.my_guard), nullptr);
from onetbb.
What happens internally is I end up calling local_wait_for_all
inside local_wait_for_all
, but not inside any other task.
Is there any assumptions that forbid that kind of behavior and are there any workarounds that respect existing invariants?
from onetbb.
Hi @blonded04
It seems you are trying to do a parallel_for inside the steal loop which is something we never tried. Looking at your gdb it's missing debug information (possibly due to optimizations turned on ). So i would like to ask you to run it on debug mode so that assertions should provide some context on what is going on.
But I would like to clarify running parallel_for inside the steal loop is something we never recommend to do.
from onetbb.
Hi @blonded04 , can we close this issue?
from onetbb.
Hi @blonded04 , can we close this issue?
Yes! Thank you very much for help
from onetbb.
Related Issues (20)
- Accessible non-virtual destructor HOT 5
- Will using different versions of the library conflict with each other and cause undefined behavior? HOT 4
- TBB cannot be build from FetchContent if OVERRIDE_FIND_PACKAGE is specified
- Build fails within freedesktop runtime HOT 2
- ASAN with gcc11 (RTLD_DEEPBIND flag which is incompatible with sanitizer runtime) HOT 4
- Setting `__TBB_DEFAULT_PARTITIONER` doesn't set partitioner in `parallel_for` with step HOT 3
- Undefined reference linking error HOT 6
- Is oneTBB signal-safe? HOT 1
- getting compile error on tag 2021.9.0 with gcc 8.5.0 HOT 3
- Deadlock issue in OpenBLAS with TBB HOT 4
- 2021.12.0: missing tag/release? HOT 3
- Global task_schedule_observer leads to error on WASM and static lib x86 HOT 9
- doc/README.md is misleading HOT 2
- Immediately scheduling task on a different thread HOT 9
- std::aligned_storage is deprecated in C++23 HOT 1
- Security.md: replace incorrect email address
- Support priority for join_node HOT 3
- CMake FetchContent fails to install oneTBB HOT 3
- Avoid split constructor for the Range concept HOT 4
- Compiling TBB with recent GCC 13.2, 11.4 fails HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from onetbb.