pmr::vector is slower than std::vector


I wrote a simple benchmark to measure the gain from using pmr::vector compared with std::vector.

In the benchmark, pmr::vector uses an unsynchronized_pool_resource, with a monotonic_buffer_resource as its upstream. Here is the benchmark code:

#include <benchmark/benchmark.h>
#include <array>
#include <iostream>
#include <memory_resource>
#include <vector>

static void pmrVector(benchmark::State& state)
{
  constexpr size_t BUF_SIZE = 2048;
  std::pmr::pool_options options;
  options.max_blocks_per_chunk = 4;
  options.largest_required_pool_block = 64;

  alignas(8) std::array<char, BUF_SIZE> buffer; // a small buffer on the stack

  //std::cout << options.largest_required_pool_block << std::endl;
  std::pmr::monotonic_buffer_resource pool{std::data(buffer), std::size(buffer)};
  std::pmr::unsynchronized_pool_resource mem(options, &pool);
  for (auto _ : state)
  {
    std::pmr::vector<char> vec{ &mem };
    for (char i = 'a'; i < 'z'; ++i)
    {
      vec.emplace_back(i);
      benchmark::DoNotOptimize(vec);
    }
  }
}
static void stdVector(benchmark::State& state)
{
   
   for (auto _ : state) 
   {
      std::vector<char> vec{};
      for(char i = 'a'; i < 'z';++i)
      {
            
            vec.emplace_back(i);
            benchmark::DoNotOptimize(vec);
            
      }
   }
}
BENCHMARK(pmrVector);
BENCHMARK(stdVector);

The pmr::vector is 3x slower than std::vector. Compared with using only the monotonic buffer, unsynchronized_pool_resource seems to carry a huge penalty. I also ran a benchmark that uses the monotonic buffer only.

Performance comparison using unsynchronized_pool_resource: [benchmark chart, image not preserved]

Performance comparison using only the monotonic buffer: [benchmark chart, image not preserved]

1 answer

Answered by alfC

You have to be very careful about what you are really timing. I made some reasonable changes to your test and I can see an improvement, probably coming mostly from not needing the system-wide allocator. (The advantages of PMR go beyond this.)

  1. Test more elements.
  2. Since you need to test more elements, don't use a stack buffer: use a single heap allocation and don't measure it.
  3. Reset the memory resources at the end of each iteration; otherwise you are misusing them, overflowing the monotonic resource simply because the benchmark loop runs repeatedly.

With all these changes I get results closer to what one would expect: https://quick-bench.com/q/ylppu2cug3S25q1xGrRCGdTEjd4


Important part of the code:

static void pmrVector(benchmark::State& state)
{
  constexpr size_t BUF_SIZE = 1000000000;
  std::pmr::pool_options options;
  options.max_blocks_per_chunk = 40;
  options.largest_required_pool_block = 640;

  // One large heap allocation, made once outside the timed loop.
  char* buffer = new char[BUF_SIZE];
  std::pmr::monotonic_buffer_resource pool{buffer, BUF_SIZE};
  std::pmr::unsynchronized_pool_resource mem(options, &pool);
  for (auto _ : state)
  {
    {
      std::pmr::vector<char> vec{ &mem };
      for (int i = 0; i != 100000000; ++i)
      {
        vec.emplace_back('a');
        benchmark::DoNotOptimize(vec);
      }
    } // vec is destroyed here, before the resources are reset
    // Reset both resources so every iteration starts from an empty buffer.
    mem.release();
    pool.release();
  }
  delete[] buffer;
}
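
The effect described in point 3 can be checked in isolation, without Google Benchmark. The following is only a minimal sketch under some assumptions: it uses a monotonic_buffer_resource alone (no pool resource in front of it), and CountingResource and upstream_hits are hypothetical names introduced for this illustration. CountingResource is plugged in as the fallback upstream and counts how often the monotonic resource has to ask for more memory once its 2048-byte buffer is used up.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper (not part of the original benchmark): counts every
// allocation request it receives, then forwards to the new/delete resource.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Fills a small pmr::vector `iterations` times from a 2048-byte monotonic
// buffer and returns how many times the fallback upstream was asked for memory.
std::size_t upstream_hits(int iterations, bool release_each_round) {
    CountingResource upstream;  // declared first so it outlives `pool`
    alignas(std::max_align_t) std::array<std::byte, 2048> buffer;
    std::pmr::monotonic_buffer_resource pool{buffer.data(), buffer.size(), &upstream};

    for (int round = 0; round < iterations; ++round) {
        {
            std::pmr::vector<char> vec{&pool};
            for (char c = 'a'; c < 'z'; ++c) vec.emplace_back(c);
        }  // vec is destroyed; a monotonic resource does not reuse its memory
        if (release_each_round) pool.release();  // rewind to the start of `buffer`
    }
    return upstream.allocations;
}
```

Calling upstream_hits(1000, true) should never touch the upstream, because release() rewinds the resource to its initial buffer on every round. upstream_hits(1000, false) should hit it at least once, since deallocation is a no-op for a monotonic resource and the buffer eventually runs out. This says nothing about absolute timings; it only shows where the memory comes from.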