I have several similar functions, say A, B and C, and I want to choose one of them with a command-line option. The chosen function is called a billion times, so instead of checking a variable inside the function on every call, I define a function pointer Phi and set it to the desired function just once. However, when I hard-code Phi = A (so no user input is considered), my code runs in ~24 s; when I add an if-else and set Phi to the desired function, it runs in ~30 s with exactly the same parameters (the command-line option sets Phi to A, of course). What is an efficient way to handle this case?
My functions:
double funcA(double r)
{
    return 0;
}

double funcB(double r)
{
    return 1;
}

double funcC(double r)
{
    return r;
}

void computationFunctionFast(Context *userInputs) {
    double (*Phi)(double) = funcA;
    /* computation codes */
}

void computationFunctionSlow(Context *userInputs) {
    double (*Phi)(double);
    switch (userInputs->funcEnum) {
    case A:
        Phi = funcA;
        break;
    case B:
        Phi = funcB;
        break;
    case C:
        Phi = funcC;
    }
    /* computation codes */
}
I've tried gcc, clang and icx with -O2 and -O3. (gcc shows no performance difference between the two cases, but has the worst performance overall.) Although I'm using C, I've tried std::function too. I've also tried defining Phi in different scopes, etc.
Generally, there are a few things here that are slightly bad for performance:
Here's an example based on your code:
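A minimal sketch of what such a switch-based dispatch could look like, assuming an enum whose values are contiguous starting at 0. The names FuncKind, FUNC_A and callPhi are illustrative only and do not appear in the question:

```c
/* Hypothetical selector type; the values are deliberately contiguous. */
typedef enum { FUNC_A = 0, FUNC_B = 1, FUNC_C = 2 } FuncKind;

static double funcA(double r) { (void)r; return 0.0; }
static double funcB(double r) { (void)r; return 1.0; }
static double funcC(double r) { return r; }

/* The switch is evaluated on every call, so the compiler has to
   emit some form of comparison or range check on `kind`. */
double callPhi(FuncKind kind, double r)
{
    switch (kind) {
    case FUNC_A: return funcA(r);
    case FUNC_B: return funcB(r);
    case FUNC_C: return funcC(r);
    default:     return 0.0;
    }
}
```

Calling callPhi(FUNC_C, r) inside the hot loop would re-run the dispatch on each iteration, which is the cost the question tries to avoid by setting Phi once.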
clang 15.0.0 x86_64 -O3 gives:
Even though the numbers I picked are adjacent, the usual compilers fail to optimize out the comparison cmp. Even when I include a default: return 0; it is still there. You can quite easily manually optimize any switch with contiguous indices like this into a function pointer jump table:
clang 15.0.0 x86_64 -O3 gives:
This leads to slightly better code here, as the comparison instruction/branch is now removed. However, this is really a micro-optimization that shouldn't have much impact on performance. You have to benchmark it to see whether there is any improvement.
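The function pointer jump table described above can be sketched roughly as follows. This assumes contiguous enum values starting at 0 and uses C99 designated initializers. PhiFn, phiTable, selectPhi and FUNC_COUNT are hypothetical names chosen for illustration:

```c
/* Hypothetical selector type; FUNC_COUNT is the number of entries. */
typedef enum { FUNC_A = 0, FUNC_B, FUNC_C, FUNC_COUNT } FuncKind;

typedef double (*PhiFn)(double);

static double funcA(double r) { (void)r; return 0.0; }
static double funcB(double r) { (void)r; return 1.0; }
static double funcC(double r) { return r; }

/* The table is indexed directly by the enum value, so no comparison
   is needed to pick the function. */
static const PhiFn phiTable[FUNC_COUNT] = {
    [FUNC_A] = funcA,
    [FUNC_B] = funcB,
    [FUNC_C] = funcC,
};

/* No bounds check here: the caller must pass a valid FuncKind,
   e.g. validated once while parsing the command line. */
PhiFn selectPhi(FuncKind kind)
{
    return phiTable[kind];
}
```

In the question's setting one would do the lookup once, for example double (*Phi)(double) = selectPhi(userInputs->funcEnum); before the hot loop, and then call Phi(r) inside it. Dropping the range check is only safe if the index is validated elsewhere.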
(Also, gcc 12.2 didn't optimize this code as well, which is why I went with clang for this example.)
Godbolt link: https://godbolt.org/z/ja4zerj7o