I run valgrind --tool=callgrind ./executable on the executable file generated by the following code:
```cpp
#include <cstdlib>
#include <stdio.h>
using namespace std;

class XYZ {
public:
    int Count() const { return count; }
    void Count(int val) { count = val; }
private:
    int count;
};

int main() {
    XYZ xyz;
    xyz.Count(10000);
    int sum = 0;
    for (int i = 0; i < xyz.Count(); i++) {
        // My interest is to see how the compiler optimizes the xyz.Count() call
        sum += i;
    }
    printf("Sum is %d\n", sum);
    return 0;
}
```
I make a debug build with the options `-fPIC -fno-strict-aliasing -fexceptions -g -std=c++14`. The release build uses `-fPIC -fno-strict-aliasing -fexceptions -g -O2 -std=c++14`.
Running valgrind generates two dump files. When these files (one file for debug executable, the other for release executable) are viewed in KCachegrind, the debug build is understandable as shown below:
As expected, the function `XYZ::Count() const` is called 10001 times. However, the optimized release build is much harder to decipher, and it is not clear how many times the function is called at all. I am aware that the function call might be inlined. But how does one figure out that it has in fact been inlined? The call graph for the release build is shown below:
There seems to be no indication at all of a call from `main()` to `XYZ::Count() const`.
My questions are:
(1) Without looking at the assembly code generated by the debug/release builds, and only by using KCachegrind, how can one figure out how many times a particular function (in this case `XYZ::Count() const`) is called? In the release build call graph above, the function does not appear to be called even once.
(2) Is there a way to understand the call graph and other details provided by KCachegrind for release/optimized builds? I have already looked at the KCachegrind manual available at https://docs.kde.org/trunk5/en/kdesdk/kcachegrind/kcachegrind.pdf, but I was wondering if there are some useful hacks/rules of thumb that one should look for in release builds.


The output of valgrind is easy to understand: as valgrind + KCachegrind are telling you, this function was not called at all in the release build.
The question is, what do you mean by "called"? If a function is inlined, is it still "called"? Actually, the situation is more complex than it seems at first sight, and your example isn't that trivial.
Was `Count()` inlined in the release build? Sure, kind of. The code transformations during optimization are often quite remarkable, as in your case, and the best way to judge is to look at the resulting assembler (here for clang). It shows that `main` doesn't execute the for-loop at all, but just prints the result (49995000), which is calculated during optimization because the number of iterations is known at compile time.

So was `Count()` inlined? Yes, somewhere during the first steps of optimization, but then the code became something completely different: there is no place where `Count()` was inlined in the final assembler.

So what happens when we "hide" the number of iterations from the compiler, e.g. by passing it via the command line:
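The original listing for this variant is not preserved here. A minimal sketch of what such a change could look like, assuming the count arrives as a string such as `argv[1]`; the helper name `sum_from_arg` is hypothetical, and `main()` is only shown in a comment:

```cpp
#include <cassert>
#include <cstdlib>

class XYZ {
public:
    int Count() const { return count; }
    void Count(int val) { count = val; }
private:
    int count = 0;  // initialized here only to avoid reading garbage
};

// Hypothetical helper: the loop from the question, but the iteration
// count is parsed at run time, so the compiler cannot fold it to a constant.
int sum_from_arg(const char* arg) {
    assert(arg != nullptr);
    XYZ xyz;
    xyz.Count(std::atoi(arg));  // value only known at run time
    int sum = 0;
    for (int i = 0; i < xyz.Count(); i++) {
        sum += i;
    }
    return sum;
}

// A main() using it could look like:
//   int main(int argc, char* argv[]) {
//       if (argc < 2) return 1;
//       std::printf("Sum is %d\n", sum_from_arg(argv[1]));
//   }
```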
In the resulting assembler we still don't encounter a for-loop, because the optimizer can figure out that the call of `Count()` doesn't have side effects, and it optimizes the whole thing out. The optimizer came up with a closed formula built on `(n-1)*(n-2)/2`. On its own this is the sum `i=0..n-2`; together with the last term `n-1` it gives `n*(n-1)/2`, the sum `i=0..n-1`!

Let's now hide the definition of `Count()` in a separate translation unit, `class.cpp`, so the optimizer cannot see its definition. Now we get our for-loop and a call to `Count()` in every iteration. In the most important part of the assembler, the result of `Count()` (in `%rax`) is compared to the current counter (in `%ebx`) in every iteration step. Now, if we run it with valgrind, we can see in the list of callees that `XYZ::Count()` was called 10001 times.

However, for modern tool-chains it is not enough to look at the assembler of the individual translation units: there is a thing called link-time optimization. We can enable it by building with `-flto` (passed both when compiling and when linking). Running the resulting executable with valgrind, we once again see that `Count()` was not called! However, it is worth looking into the machine code (here I used gcc, since my clang installation seems to have an issue with LTO).
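The machine code and build commands referred to here are not preserved. As a stand-in, here is a hypothetical single-file sketch of the split setup discussed above, with both build variants in comments. File names other than `class.cpp`, the helper `sum_loop`, and the exact command lines are assumptions. In a single file the compiler can of course see everything; the split only becomes real once the marked parts are moved into separate files.

```cpp
#include <cassert>

// ---- class.h (hypothetical name): only declarations are visible to main.cpp
class XYZ {
public:
    int Count() const;   // defined out of line, in class.cpp
    void Count(int val);
private:
    int count = 0;
};

// ---- class.cpp: the definitions. In their own translation unit, the
// compiler cannot inline them into main.cpp unless LTO is used.
int XYZ::Count() const { return count; }
void XYZ::Count(int val) { count = val; }

// ---- main.cpp (hypothetical name): the loop from the question; main()
// would create the XYZ object and call this.
int sum_loop(const XYZ& xyz) {
    int sum = 0;
    // Without inlining, Count() runs once per loop test: n + 1 times.
    for (int i = 0; i < xyz.Count(); i++) {
        sum += i;
    }
    return sum;
}

// Possible build commands (assumed; adjust to your tool-chain):
//   g++ -O2 -g -c class.cpp && g++ -O2 -g -c main.cpp
//   g++ class.o main.o -o no_lto                        # no LTO
//   g++ -O2 -g -flto class.cpp main.cpp -o with_lto     # with LTO
//   valgrind --tool=callgrind --dump-instr=yes ./no_lto
```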
We can see that the call to the function `Count()` was inlined, but there is still a for-loop (I guess this is a gcc vs. clang thing). But what is of most interest to you: the function `Count()` is "called" only once. Its value is saved to register `%ecx`, and the remaining loop no longer calls it.

You could also see all of this with the help of KCachegrind, if valgrind were run with the option `--dump-instr=yes`.
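The loop listing itself is not preserved here. As a source-level analogy (not actual compiler output), the gcc + LTO result amounts to hoisting the loop-invariant `Count()` out of the loop. The helper name `sum_hoisted` is hypothetical:

```cpp
#include <cassert>

class XYZ {
public:
    int Count() const { return count; }
    void Count(int val) { count = val; }
private:
    int count = 0;
};

// Analogy for the LTO build: Count() is evaluated once, the result is kept
// in a register (%ecx in the answer), and the loop compares against it.
int sum_hoisted(const XYZ& xyz) {
    const int n = xyz.Count();  // the single remaining "call" (an inlined load)
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

This is also why KCachegrind shows no call edge to `Count()` in such builds: after inlining there is no call instruction left to count, only the instructions of the inlined body, which become visible per instruction when `--dump-instr=yes` is used.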