It's easy to generate normally distributed data with a desired mean and standard deviation using MathNet.Numerics:
```csharp
IEnumerable<double> sample = MathNet.Numerics.Distributions.Normal.Samples(mean, sd).Take(n);
```
However, with a sufficiently large n you will get values far from the mean, because the normal distribution is unbounded. To put this into context, I have a real-world data set with mean = 15.93 and SD = 6.84. For this data set it is impossible to have a value above 30 or below 0, but I cannot see a way to add upper and lower bounds to the generated data.
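To put numbers on this, assuming the data really follows a normal distribution with those parameters: the bounds sit at the following z-scores, and the standard normal tail probabilities give the share of draws that an unbounded sampler places outside [0, 30].

```latex
z_{\min} = \frac{0 - 15.93}{6.84} \approx -2.33, \qquad
z_{\max} = \frac{30 - 15.93}{6.84} \approx 2.06

P(X < 0) + P(X > 30) = \Phi(z_{\min}) + \bigl(1 - \Phi(z_{\max})\bigr) \approx 0.010 + 0.020 \approx 0.03
```

So roughly 3% of the samples (about 300 out of every 10,000) fall outside the possible range regardless of n; a larger n just makes the most extreme values more extreme.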
I can remove data that falls outside this range, as below, but then the mean and SD of the generated sample differ noticeably (in my opinion; probably not statistically significantly) from the values I requested.
```csharp
Normal.Samples(mean, sd).Where(x => x is >= 0 and <= 30).Take(n);
```
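Filtering like this draws from a truncated normal distribution, whose moments have a known closed form that differs from the parent's. For X ~ N(μ, σ²) restricted to [a, b], with α = (a − μ)/σ, β = (b − μ)/σ, φ the standard normal density and Φ its CDF:

```latex
Z = \Phi(\beta) - \Phi(\alpha)

\operatorname{E}[X \mid a \le X \le b] = \mu + \sigma\,\frac{\varphi(\alpha) - \varphi(\beta)}{Z}

\operatorname{Var}[X \mid a \le X \le b] = \sigma^2\left[1 + \frac{\alpha\,\varphi(\alpha) - \beta\,\varphi(\beta)}{Z} - \left(\frac{\varphi(\alpha) - \varphi(\beta)}{Z}\right)^2\right]
```

Plugging in μ = 15.93, σ = 6.84, a = 0 and b = 30 gives a mean of about 15.78 and an SD of about 6.25. The drop of roughly 9% in the SD is therefore a systematic effect of the truncation rather than sampling noise, and a larger n will not make it go away.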
Is there any way to ensure that the generated values fall within a specified range without affecting the mean and SD of the generated data?
The following proposed solution relies on a specific rule for deriving the standard deviation from the bounds: the standard deviation must be one third of the distance between the mean and the required minimum or maximum, so that each bound lies three standard deviations from the mean.
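Written out, and assuming the mean sits midway between the bounds (which the rule needs for the minimum and maximum to give the same answer):

```latex
\mu = \frac{x_{\min} + x_{\max}}{2}, \qquad
\sigma = \frac{\mu - x_{\min}}{3} = \frac{x_{\max} - \mu}{3} = \frac{x_{\max} - x_{\min}}{6}
```

For bounds 0 and 30 this gives μ = 15 and σ = 5. Since P(|Z| > 3) ≈ 0.27%, a small share of raw normal draws still lands outside the bounds, so a sampler built on this rule has to deal with those stray values somehow. Note also that the rule fixes the SD from the bounds, so it cannot reproduce an arbitrary requested SD such as the 6.84 in the question.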
This first code block is the TruncatedNormalDistribution class, which wraps MathNet's Normal class. The main technique for making a truncated normal distribution is in the constructor. Note the workaround this makes necessary in the Sample method:
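Since the class body is not included here, the following is a minimal, self-contained sketch of the same idea under stated assumptions; it is not the author's code. The hypothetical `TruncatedNormalSketch` class derives the mean and SD from the bounds with the one-third rule in its constructor. It uses a hand-written Box-Muller transform over `System.Random` instead of MathNet's `Normal` so that it needs no NuGet package. It handles the draws that still fall outside the bounds by redrawing them (rejection sampling), which is only one possible form of such a workaround.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the original TruncatedNormalDistribution class:
// a normal distribution restricted to [min, max] via rejection sampling.
public sealed class TruncatedNormalSketch
{
    private readonly double _min;
    private readonly double _max;
    private readonly Random _rng;

    public double Mean { get; }
    public double StdDev { get; }

    // Assumption: the one-third rule, i.e. the mean is midway between the bounds
    // and each bound lies three standard deviations from it.
    public TruncatedNormalSketch(double min, double max, int? seed = null)
    {
        if (!(min < max)) throw new ArgumentException("min must be less than max");
        _min = min;
        _max = max;
        Mean = (min + max) / 2.0;
        StdDev = (max - min) / 6.0;
        _rng = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    // Box-Muller transform: turns two uniform draws into one standard normal draw.
    private double StandardNormal()
    {
        double u1 = 1.0 - _rng.NextDouble(); // in (0, 1], so Log never sees 0
        double u2 = _rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // Rejection step: roughly 0.27% of raw draws fall outside +/-3 SD and are redrawn.
    public double Sample()
    {
        while (true)
        {
            double x = Mean + StdDev * StandardNormal();
            if (x >= _min && x <= _max) return x;
        }
    }

    // Lazily produces n bounded samples.
    public IEnumerable<double> Samples(int n) => Enumerable.Range(0, n).Select(_ => Sample());
}
```

Because rejection discards the tails, the realized SD comes out slightly below `StdDev`: about 4.93 rather than 5 for bounds 0 and 30.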
And here is a usage example. To visualize the results, open each of the two output CSV files in a spreadsheet such as Excel and plot its data as a line chart:
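The usage code itself is not shown here either. As a hedged sketch of one way to produce chart-ready CSV, the hypothetical helper `HistogramCsv.ToCsv` below counts samples into equal-width bins and emits `bin_start,count` rows. Plotted as a line chart, these rows trace the bell shape of the sample. The class name, column names and binning scheme are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: bins samples into equal-width buckets over [min, max]
// and returns CSV text with a header and one "bin_start,count" row per bucket.
public static class HistogramCsv
{
    public static string ToCsv(IEnumerable<double> samples, double min, double max, int bins)
    {
        if (bins <= 0) throw new ArgumentOutOfRangeException(nameof(bins));
        if (!(min < max)) throw new ArgumentException("min must be less than max");

        var counts = new int[bins];
        double width = (max - min) / bins;
        foreach (double x in samples)
        {
            // Values outside the plotted range (and NaN) are skipped, so pick a wide enough range.
            if (double.IsNaN(x) || x < min || x > max) continue;
            int i = (int)((x - min) / width);
            if (i >= bins) i = bins - 1; // x == max (or rounding) goes in the last bucket
            counts[i]++;
        }

        // InvariantCulture keeps '.' as the decimal separator regardless of locale.
        var sb = new StringBuilder("bin_start,count\n");
        for (int i = 0; i < bins; i++)
        {
            sb.Append((min + i * width).ToString(CultureInfo.InvariantCulture))
              .Append(',')
              .Append(counts[i].ToString(CultureInfo.InvariantCulture))
              .Append('\n');
        }
        return sb.ToString();
    }
}
```

For example, `File.WriteAllText("normal.csv", HistogramCsv.ToCsv(Normal.Samples(15.93, 6.84).Take(10000), -15, 45, 60));` (file name and bin range chosen arbitrarily) writes the unbounded sample, whose tails should visibly extend past 0 and 30. Doing the same with a bounded sampler gives a second file for comparison.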