I am trying to understand how Eigen::Ref works to see if I can take some advantage of it in my code.
I have designed a benchmark like this
static void value(benchmark::State &state) {
for (auto _ : state) {
const Eigen::Matrix<double, Eigen::Dynamic, 1> vs =
Eigen::Matrix<double, 9, 1>::Random();
auto start = std::chrono::high_resolution_clock::now();
const Eigen::Vector3d v0 = vs.segment<3>(0);
const Eigen::Vector3d v1 = vs.segment<3>(3);
const Eigen::Vector3d v2 = vs.segment<3>(6);
const Eigen::Vector3d vt = v0 + v1 + v2;
const Eigen::Vector3d v = vt.transpose() * vt * vt + vt;
benchmark::DoNotOptimize(v);
auto end = std::chrono::high_resolution_clock::now();
auto elapsed_seconds =
std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
state.SetIterationTime(elapsed_seconds.count());
}
}
I have two more tests like thise, one using const Eigen::Ref<const Eigen::Vector3D> and auto for the v0, v1, v2, vt.
The results of this benchmarks are
Benchmark Time CPU Iterations
--------------------------------------------------------------------
value/manual_time 23.4 ns 113 ns 29974946
ref/manual_time 23.0 ns 111 ns 29934053
with_auto/manual_time 23.6 ns 112 ns 29891056
As you can see, all the tests behave exactly the same. So I thought that maybe the compiler was doing its magic and decided to test with -O0. These are the results:
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
value/manual_time 2475 ns 3070 ns 291032
ref/manual_time 2482 ns 3077 ns 289258
with_auto/manual_time 2436 ns 3012 ns 263170
Again, the three cases behave the same.
If I understand correctly, the first case, using Eigen::Vector3d should be slower, as it has to keep the copies, perform the v0+v1+v2` operation and save it, and then perform another operation and save.
The auto case should be the fastest, as it should be skipping all the writings.
The ref case I think it should be as fast as auto. If I understand correctly, all my operations can be stored in a reference to a const Eigen::Vector3d, so the operations should be skipped, right?
Why are the results all the same? Am I misunderstanding something or is the benchmark just bad designed?
Because everything happens inside a function, the compiler can do escape analysis and optimize away the copies into the vectors.
To check this, I put the code in a function, to look at the assembler:
which turns into this assembler
Notice how the only memory operations are two GP register loads to get start pointer and length, then a couple of memory loads to get the vector content into registers, before we write the result to memory in the end.
This only works since we deal with fixed-sized vectors. With
VectorXdcopies would definitely take place.Alternative benchmarks
Ref is typically used on function calls. Why not try it with a function that cannot be inlined? Or come up with an example where escape analysis cannot work and the objects really have to be materialized. Something like this:
By splitting initialization and usage into non-inline functions, the copies really have to be done. Of course we now change the entire use case. You have to decide if this is still relevant to you.
Purpose of Ref
Ref exists for the sole purpose of providing a function interface that can accept both a full matrix/vector and a slice of one. Consider this:
This interface can only accept a full vector as its input. If you want to call
foo(vector.head(10))you have to allocate a new vector to hold the vector segment. Likewise it always returns a newly allocated vector which is wasteful if you want to call it asoutput.head(10) = foo(input). So we can instead writeand use it as
foo(output.head(10), input.head(10))without any copies being created. This is only ever useful across compilation units. If you have one cpp file declaring a function that is used in another, Ref allows this to happen. Within a cpp file, you can simply use a template.A template will always be faster because it can make use of the concrete data type. For example if you pass an entire vector, Eigen knows that the array is properly aligned. And in a full matrix it knows that there is no stride between columns. It doesn't know either of these with Ref. In this regard, Ref is just a fancy wrapper around
Eigen::Map<Type, Eigen::Unaligned, Eigen::OuterStride<>>.Likewise there are cases where Ref has to create temporary copies. The most common case is if the inner stride is not 1. This happens for example if you pass a row of a matrix (but not a column. Eigen is column-major by default) or the real part of a complex valued matrix. You will not even receive a warning for this, your code will simply run slower than expected.
The only reasons to use Ref inside a single cpp file are
Use with fixed-size types
Since your use case seems to involve fixed-sized vectors, let's consider this case in particular and look at the internals.
The following things will happen when you call foo:
in[6]to the stack. This takes 4 instructions (2 movapd, 2 movsd).The following happens when we call bar:
Ref<Vector>object. For this, we put a pointer toin.data() + 6on the stackNotice how there is barely any difference. Maybe Ref saves a few instructions but it also introduces one indirection more. Compared to everything else, this is hardly significant. It is certainly too little to measure.
We are also entering the realm of microoptimizations here. This can lead to situations like this, where just the arrangement of code results in measurably different optimizations. Eigen: Why is Map slower than Vector3d for this template expression?