OpenMP overhead and Linux kernel version

832 views Asked by user3111657 At 17 December 2013 at 15:20

I used a small test program to measure how well OpenMP parallelizes a recursive computation done in arbitrary precision with the mpfr/gmp libraries. As expected, OpenMP overhead makes the parallel version slower at low precision, but with enough bits the parallel version becomes faster.

The sequential loops look like this:

....
for ( i = 0; i < 1000; i++ ) {
    mpfr_set_d ( z1, 0.0, MPFR_RNDN );
    mpfr_set_d ( z2, 0.0, MPFR_RNDN );
    ...
    iter = 0;
    while ( iter < 10000 ) {
        mpfr_sqr ( tmp1, z1, MPFR_RNDN );
        mpfr_sqr ( tmp2, z2, MPFR_RNDN );
        mpfr_sub ( tr, tmp1, tmp2, MPFR_RNDN );
        mpfr_add ( tr, tr, cr, MPFR_RNDN );
        mpfr_mul_2si ( tmp3, z1, 1, MPFR_RNDN );
        ...
        iter++;
    }
}

and the parallel version:

....
omp_set_dynamic(0);
for ( i = 0; i < 10; i++ ) {
    mpfr_set_d ( z2, 0.0, MPFR_RNDN );
    mpfr_set_d ( z1, 0.0, MPFR_RNDN );
    ...
    iter = 0;
    while ( iter < 10000 ) {
#pragma omp parallel num_threads(4)
        {
            switch ( omp_get_thread_num() ) {
            case 0:
                mpfr_sqr ( tmp1, z1, MPFR_RNDN );
                mpfr_sqr ( tmp2, z2, MPFR_RNDN );
                mpfr_sub ( tr, tmp1, tmp2, MPFR_RNDN );
                mpfr_add ( tr, tr, cr, MPFR_RNDN );
                break;
            case 1:
                mpfr_mul_2si ( tmp3, z1, 1, MPFR_RNDN );
                mpfr_mul ( ti, tmp3, z2, MPFR_RNDN );
                mpfr_add ( ti, ti, ci, MPFR_RNDN );
                break;
            ...
                mpfr_mul_2si ( tti, tti, 1, MPFR_RNDN );
                break;
            }
        }
        mpfr_set ( z1, tr, MPFR_RNDN );
        mpfr_set ( z2, ti, MPFR_RNDN );
        mpfr_set ( d1, ttr, MPFR_RNDN );
        mpfr_set ( d2, tti, MPFR_RNDN );
        iter++;
    }
}
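The parallel version above opens a new parallel region on every pass through the while loop. One variant that may be worth timing is to open the region once and split the work inside it with sections. The following is only a rough sketch of that idea, not the original program: plain double values stand in for mpfr_t so it compiles without MPFR, only the first two cases are modelled, and the names iterate_hoisted and iterate_sequential are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the loop in the question: doubles replace
   mpfr_t, and only the real/imaginary updates (case 0 and case 1) appear. */
void iterate_sequential(double cr, double ci, int max_iter,
                        double *out_re, double *out_im)
{
    assert(max_iter >= 0);
    double z1 = 0.0, z2 = 0.0;
    for (int iter = 0; iter < max_iter; iter++) {
        double tr = z1 * z1 - z2 * z2 + cr;
        double ti = 2.0 * z1 * z2 + ci;
        z1 = tr;
        z2 = ti;
    }
    *out_re = z1;
    *out_im = z2;
}

void iterate_hoisted(double cr, double ci, int max_iter,
                     double *out_re, double *out_im)
{
    assert(max_iter >= 0);
    double z1 = 0.0, z2 = 0.0;   /* current value, shared by the team */
    double tr = 0.0, ti = 0.0;   /* next value, one part per section  */

    /* The parallel region is entered once, not once per iteration. */
    #pragma omp parallel num_threads(2) shared(z1, z2, tr, ti)
    {
        /* iter is declared inside the region, so each thread has its own
           copy and every thread runs the same number of iterations. */
        for (int iter = 0; iter < max_iter; iter++) {
            #pragma omp sections
            {
                #pragma omp section
                tr = z1 * z1 - z2 * z2 + cr;   /* case 0 in the question */

                #pragma omp section
                ti = 2.0 * z1 * z2 + ci;       /* case 1 in the question */
            }   /* implicit barrier: both sections are done here */

            #pragma omp single
            {
                z1 = tr;
                z2 = ti;
            }   /* implicit barrier: all threads see the updated z1, z2 */
        }
    }
    *out_re = z1;
    *out_im = z2;
}
```

Built with gcc -fopenmp, iterate_hoisted should give the same result as iterate_sequential. The two barriers per iteration still cost time, so whether this is any faster than re-entering the region is something to measure on each system rather than assume.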

Running times in seconds system A: Sequential

320 Bits: 11
640 Bits: 16
960 Bits: 21
2560 Bits: 60
5000 Bits: 152

Running times in seconds system A: Parallel

320 Bits: 15
640 Bits: 16
960 Bits: 18
2560 Bits: 32
5000 Bits: 65

Running times in seconds system B: Sequential

320 Bits: 13
640 Bits: 18
960 Bits: 27
2560 Bits: 80
5000 Bits: 202

Running times in seconds system B: Parallel

320 Bits: 51
640 Bits: 54
960 Bits: 56
2560 Bits: 76
5000 Bits: 128

System A is Fedora 19, kernel 3.11.10-200.fc19.x86_64

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

System B is CentOS 6.5, kernel 2.6.32-431.1.2.0.1.el6.x86_64

Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz

ltrace shows about the same percentages for called functions and system calls on both systems. Both systems use the latest gmp, mpfr and gcc versions. Why is system B so much worse than system A, with several times more OpenMP overhead? Has the Linux kernel improved that much in this regard? Are there kernel parameters I should look at? Are CPU hardware differences or limitations responsible? Are there other explanations? Do I have to install Fedora 19 on B to fix this?
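To separate the OpenMP overhead from the mpfr work, it may help to time nearly empty parallel regions on both systems. Below is a minimal sketch under a few assumptions: region_cost_us and now_seconds are hypothetical helper names, timing uses the POSIX clock_gettime call, and the region body does almost nothing, so the figure approximates the cost of entering and leaving a region, not the cost of the real workload.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: average cost in microseconds of entering and
   leaving one nearly empty parallel region with the given team size. */
double region_cost_us(int regions, int threads)
{
    assert(threads > 0);
    volatile int sink = 0;   /* gives the region a visible side effect */

    double start = now_seconds();
    for (int i = 0; i < regions; i++) {
        #pragma omp parallel num_threads(threads)
        {
            #pragma omp master
            sink = sink + 1;   /* only the master thread writes */
        }
    }
    double elapsed = now_seconds() - start;

    return regions > 0 ? elapsed * 1e6 / regions : 0.0;
}
```

Compiled with gcc -fopenmp, a call such as region_cost_us(100000, 4) gives a number that can be compared between system A and system B. Repeating the run with different values of the standard OMP_WAIT_POLICY environment variable (active or passive) is one way to check whether the way idle threads wait plays a role; that is an assumption to test, not a confirmed cause.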

Update: Thanks for the tip. It did change results for system B.

Running times in seconds system B: Parallel

320 Bits: 51 -> 23
640 Bits: 54 -> 26
960 Bits: 56 -> 29
2560 Bits: 76 -> 47
5000 Bits: 128 -> 99

B is still behind A, but the gap has become much smaller.
