Compile-time performance of constexpr computation

I have some non-trivial C++17 functions marked constexpr. They perform graph-related computation (depth-first traversal) and general algorithms (e.g., find, sort, unique, ...).

If I try to force evaluation at compile time by putting the result in a constexpr global variable, three things can happen:

  1. For small computations (to give an idea, let's say graphs of ~100 nodes, nodes being more or less integers), compilation is fine (takes ~2s).
  2. With ~500 nodes, compilation takes ~1 min and uses 30 GB of memory (!).
  3. With ~1000 nodes, compilation requires more memory than I can give it, so I never let it finish.

If I remove the constexpr qualifiers and ask for a run-time computation, compilation and execution are both very fast (less than 5s).

I used g++ 8.2 with -O3 -std=c++17.
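Roughly, the setup looks like this (a minimal sketch; count_edges and the graph contents are placeholders for the real code):

    #include <array>

    // Placeholder for the real graph algorithms (DFS, find, sort, unique, ...).
    constexpr int count_edges(const std::array<int, 100>& adjacency)
    {
        int count = 0;
        for (int n : adjacency)
            count += (n >= 0);
        return count;
    }

    constexpr std::array<int, 100> graph{ /* ~100 nodes */ };

    // Putting the result in a constexpr global forces the whole
    // computation to happen at compile time.
    constexpr int result = count_edges(graph);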

Why is it taking so long? Is g++ known to have performance problems with constexpr evaluation? What performance should I expect from constexpr functions during compilation? From what I understand, the compiler turns itself into an interpreter for constexpr computations. But I have absolutely no doubt that evaluating the same program in Python would be very fast, given the very small size of the data.

Edit: Such problems are mentioned here (blog of a GCC developer).

1 Answer

Yakk - Adam Nevraumont

g++ memoizes compile-time structures. What's more, compile-time structures may be created and inspected along both the branch you want to take and the one you do not, unless you are careful.

Exponential blowup is quite possible, and may be what you are seeing.

There are strategies to reduce compile-time complexity. Avoid deep recursion. Pay attention to accumulated symbol length. Ensure that only the branches you want to take have to be examined.

Consider a really simple example:

    std::conditional_t<(A < B), make_type_A<Ts...>, make_type_B<Ts...>>

The writer of this code probably intended to create only one of the two types, but the code as written requires that both be instantiated.
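One way around that (a sketch; defer and lazy_pick_t are illustrative names, not a standard facility) is to select the template first and only then instantiate the winner:

    #include <type_traits>

    // Stand-ins for the expensive type factories above.
    template <class... Ts> struct make_type_A { /* expensive */ };
    template <class... Ts> struct make_type_B { /* expensive */ };

    // Cheap wrapper: instantiating defer<...> costs almost nothing.
    template <template <class...> class F>
    struct defer {
        template <class... Ts> using apply = F<Ts...>;
    };

    // std::conditional_t now only chooses between the two cheap wrappers;
    // only the selected factory is ever instantiated with Ts...
    template <bool Cond, class... Ts>
    using lazy_pick_t =
        typename std::conditional_t<Cond, defer<make_type_A>,
                                          defer<make_type_B>>::template apply<Ts...>;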

This is unlikely to be your problem, but a similar problem could occur when running constexpr code.
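A constexpr analogue of the same trap (a sketch with made-up names): function arguments are evaluated eagerly even during constant evaluation, so hiding the choice inside a helper runs both branches, while the conditional operator evaluates only the one it selects:

    constexpr long expensive_a() { long s = 0; for (int i = 0; i < 1000; ++i) s += i;          return s; }
    constexpr long expensive_b() { long s = 1; for (int i = 0; i < 1000; ++i) s = s * 2 % 997; return s; }

    constexpr long pick(bool c, long a, long b) { return c ? a : b; }

    constexpr bool use_a = true;

    // Both expensive calls run at compile time; one result is thrown away.
    constexpr long bad  = pick(use_a, expensive_a(), expensive_b());

    // Only the selected branch runs.
    constexpr long good = use_a ? expensive_a() : expensive_b();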

For each call, work out the size of the state required. Add up the total state needed. Throw in a 10x overhead.
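For example (illustrative numbers only): a depth-first traversal that recurses 500 levels deep, with each frame holding on the order of 500 node values, keeps roughly 500 × 500 = 250,000 values live at once. At, say, 100 bytes of interpreter bookkeeping per value that is already ~25 MB, and the 10x overhead makes it ~250 MB; state that grows worse than quadratically reaches the tens of gigabytes observed here very quickly.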

You can also work out the O-notation of your problem by completing more than 2 samples: check graphs of size 100, 200, 300, 400 and 500. Try linear graphs, trivial graphs, complete graphs, and random graphs with constant or percentage connectivity.
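One low-tech way to collect those samples (a sketch; GRAPH_SIZE, make_graph and run_algorithm are illustrative names) is to parameterize the size and time each compilation separately:

    #ifndef GRAPH_SIZE
    #define GRAPH_SIZE 100   // override per run: g++ -O3 -std=c++17 -DGRAPH_SIZE=300 ...
    #endif

    // make_graph/run_algorithm stand in for the real code under test.
    constexpr auto graph  = make_graph<GRAPH_SIZE>();
    constexpr auto result = run_algorithm(graph);

Time each compilation (e.g., with time, or /usr/bin/time -v to also capture peak memory) and plot the numbers against GRAPH_SIZE.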

The O-notation of the growth in compile time might help you narrow down where the problem is. Whether it is linear, polynomial or exponential, you are going to be looking at different kinds of problems.

Linear with a sharp inflection means you are hitting a resource bottleneck. Maybe memory. Start graphing other resource usage and see if you can find the bottleneck.

Exponential looks a lot like linear with a cliff unless you plot it on a log scale and zoom in on the "cliff". There may be a narrow region where the exponential part leaves the constant factor behind.

Polynomial gets interesting. The order of the polynomial (a log-log graph can help you find it) can tell you what kind of operation is screwing you over. Much like knowing your conventional algorithm is O(n^3) means you are looking for a triple loop, an O(n^3) compile time means you are somehow instantiating the equivalent of a triple loop.
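To make that concrete (a sketch, not the asker's code): Floyd-Warshall is a classic triple loop over the node count, so run as constexpr its compile-time cost grows as O(n^3):

    #include <array>
    #include <cstddef>

    // All-pairs shortest paths; assumes d uses a large-but-safe value
    // (not INT_MAX, to avoid overflow) for "no edge".
    template <std::size_t N>
    constexpr auto shortest_paths(std::array<std::array<int, N>, N> d)
    {
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t i = 0; i < N; ++i)
                for (std::size_t j = 0; j < N; ++j)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];
        return d;
    }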