I wrote the following program and timed it (using `strace -c`), and there was no meaningful difference in execution time between the version that fully seeds the engine and the one that seeds it from a single value:
#include <random>
#include <array>
#include <algorithm>
#include <functional>
#include <numeric>
#include <execution>
#include <iostream>

#define SEEDING_ENABLED 1

int main()
{
    std::random_device rand_dev { };

#if SEEDING_ENABLED == 1
    // Fill the engine's entire state (std::mt19937::state_size == 624 words) via a seed_seq.
    std::array<int, std::mt19937::state_size> seed_data;
    std::ranges::generate( seed_data, std::ref( rand_dev ) );
    std::seed_seq seq { std::cbegin( seed_data ), std::cend( seed_data ) };
    std::mt19937 engine { seq };
#elif SEEDING_ENABLED == 0
    // Seed from a single 32-bit value only.
    std::mt19937 engine { rand_dev( ) };
#endif

    std::uniform_int_distribution int_dist { 0, 100 };
    std::array<int, 1000> random_numbers { }; // value-initialized; the elements are only placeholders

    // seq, not par: a single shared engine must not be invoked from multiple threads.
    const auto result { std::transform_reduce( std::execution::seq,
                            std::cbegin( random_numbers ), std::cend( random_numbers ),
                            0, std::plus { },
                            [ &engine, &int_dist ]( [[ maybe_unused ]] const auto value )
                            { return int_dist( engine ); } ) };

    std::cout << result << '\n';
}
Isn't `std::mt19937` a huge object (`sizeof(std::mt19937)` is 5000 bytes with libstdc++), and shouldn't initializing all of its internal state take a measurable amount of time? Or am I measuring it the wrong way?

Note: the `transform_reduce` call is only there to keep the compiler from optimizing the engine's initialization away.