I just found out that while this C code gives an ordered list of integers (as expected):
#include <stdio.h>
#include <unistd.h>
#include <omp.h>
int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
}
with both gcc and icc, the following gives undesired behaviour:
#include <stdio.h>
#include <unistd.h>
#include <omp.h>
int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
    usleep(100*omp_get_thread_num());
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);
    usleep(100*omp_get_thread_num());
#pragma omp ordered
    {
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
} 
What I'd love to see is:
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    WORK IS DONE
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9  
But with gcc I get:
    0             (tid=5)
    WORK IS DONE  (tid=5)
      0           (tid=5)
    1             (tid=2)
    WORK IS DONE  (tid=2)
      1           (tid=2)
    2             (tid=0)
    WORK IS DONE  (tid=0)
      2           (tid=0)
    3             (tid=6)
    WORK IS DONE  (tid=6)
      3           (tid=6)
    4             (tid=7)
    WORK IS DONE  (tid=7)
      4           (tid=7)
    5             (tid=3)
    WORK IS DONE  (tid=3)
      5           (tid=3)
    6             (tid=4)
    WORK IS DONE  (tid=4)
      6           (tid=4)
    7             (tid=1)
    WORK IS DONE  (tid=1)
      7           (tid=1)
    8             (tid=5)
    WORK IS DONE  (tid=5)
      8           (tid=5)
    9             (tid=2)
    WORK IS DONE  (tid=2)
      9           (tid=2)
(so everything gets ordered, even the parallelizable work part)
And with icc:
    1             (tid=0)
    2             (tid=5)
    3             (tid=1)
    4             (tid=2)
    WORK IS DONE  (tid=1)
    WORK IS DONE  (tid=3)
      3           (tid=1)
    6             (tid=4)
    7             (tid=7)
    8             (tid=1)
    WORK IS DONE  (tid=0)
    5             (tid=6)
    WORK IS DONE  (tid=2)
      1           (tid=0)
    9             (tid=0)
    WORK IS DONE  (tid=0)
    WORK IS DONE  (tid=5)
    WORK IS DONE  (tid=1)
      9           (tid=0)
      0           (tid=3)
      8           (tid=1)
    WORK IS DONE  (tid=4)
    WORK IS DONE  (tid=6)
      2           (tid=5)
    WORK IS DONE  (tid=7)
      6           (tid=4)
      5           (tid=6)
      4           (tid=2)
      7           (tid=7)
(so nothing gets ordered, not even the ordered regions)
Is using multiple ordered regions within one ordered loop undefined behaviour, or what is going on here? I couldn't find anything in the OpenMP documentation that disallows more than one ordered region per loop.
I know that in this trivial example I could just split the loop like this:
int main() {
  for (int i=0; i<10; i++) {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
  }
#pragma omp parallel for schedule(dynamic)
  for (int i=0; i<10; i++) {
    usleep(100*omp_get_thread_num());
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);
    usleep(100*omp_get_thread_num());
  }
  for (int i=0; i<10; i++) {
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
  }
}
So I'm not looking for a workaround. I really want to understand what is going on here, so that I can handle the real situation without running into anything devastating/unexpected.
I really hope you can help me.
                        
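According to the OpenMP 4.0 API specification, you can't. One of the restrictions on the ordered construct is that, during execution of an iteration of a loop region, a thread must not execute more than one ordered region that binds to the same loop region. Your second example executes two per iteration, so it is non-conforming, and gcc and icc are each free to do something different with it.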
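As a minimal sketch of a conforming layout (my own illustration, not something taken from the specification, and assuming the three phases really can be separated as in your trivial example), each phase can be its own worksharing loop inside a single parallel region, so that no iteration ever reaches more than one ordered region. The function name run_three_phases and its counters are hypothetical and exist only to check the ordering; the thread ids and usleep calls are left out to keep it short:

```c
#include <stdio.h>

/* Hypothetical helper: runs the three phases as three worksharing loops
   inside one parallel region. Each loop iteration executes at most one
   ordered region. Returns 1 if both ordered phases ran in iteration
   order and all work finished before the last phase began, else 0. */
int run_three_phases(int n) {
  int next_first = 0, next_last = 0, done = 0, ok = 1;

#pragma omp parallel shared(next_first, next_last, done, ok)
  {
    /* Phase 1: a single ordered region per iteration. */
#pragma omp for ordered schedule(dynamic)
    for (int i = 0; i < n; i++) {
#pragma omp ordered
      {
        if (i != next_first) ok = 0;
        next_first++;
        printf("%i\n", i); fflush(stdout);
      }
    } /* implicit barrier at the end of the loop */

    /* Phase 2: unordered work, free to run in parallel. */
#pragma omp for schedule(dynamic)
    for (int i = 0; i < n; i++) {
#pragma omp atomic
      done++;
      printf("WORK IS DONE\n"); fflush(stdout);
    } /* implicit barrier at the end of the loop */

    /* Phase 3: again a single ordered region per iteration. */
#pragma omp for ordered schedule(dynamic)
    for (int i = 0; i < n; i++) {
#pragma omp ordered
      {
        if (done != n || i != next_last) ok = 0;
        next_last++;
        printf("  %i\n", i); fflush(stdout);
      }
    }
  }
  return ok;
}
```

Calling run_three_phases(10) from main should print 0 to 9, then ten WORK IS DONE lines, then 0 to 9 again, and return 1. With gcc this needs -fopenmp; without it the pragmas are ignored and the loops simply run serially in the same order.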