My program parallelizes a 'for' loop in two ways, with OpenMP and with oneTBB, and the oneTBB version runs faster.
What is the difference between OpenMP and oneTBB at a low level (memory organization, thread creation, cache usage)?
The parallel 'for' loop is part of the Gauss algorithm, and oneTBB is faster than OpenMP regardless of the matrix size.
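
For concreteness, here is a minimal sketch of the kind of loop this describes. It assumes "Gauss algorithm" means Gaussian forward elimination on an augmented n x (n+1) matrix without pivoting, and that the parallel loop is the one over the rows below the pivot. The `Matrix` alias, the `forward_eliminate` name, and the `schedule(static)` clause are illustrative choices, not taken from the actual program.

```cpp
#include <cassert>
#include <vector>

// Hypothetical storage for illustration: row-major vector of rows,
// n rows by n+1 columns (the last column is the right-hand side).
using Matrix = std::vector<std::vector<double>>;

// Forward elimination without pivoting.
// Assumes every pivot a[k][k] is nonzero.
void forward_eliminate(Matrix& a) {
    const int n = static_cast<int>(a.size());
    for (int k = 0; k < n; ++k) {
        assert(static_cast<int>(a[k].size()) == n + 1);
        assert(a[k][k] != 0.0);

        // The pivot steps (k) must run in order, but for a fixed k the
        // rows below the pivot are updated independently of each other,
        // so this is the loop that can be parallelized.
        // Without OpenMP enabled the pragma is ignored and the loop
        // simply runs serially.
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < n; ++i) {
            const double factor = a[i][k] / a[k][k];
            for (int j = k; j <= n; ++j) {
                a[i][j] -= factor * a[k][j];
            }
        }
    }
}
```

Written this way, the OpenMP parallel loop is entered once per pivot step, so the work-sharing construct is set up n times over the whole elimination. A oneTBB counterpart would presumably wrap the same row loop in `oneapi::tbb::parallel_for` over a `blocked_range<int>(k + 1, n)`, also once per pivot step.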