I was wondering if anybody knows of a reliable and fast analytic HDI calculation, preferably for beta distributions.
The definition of the HDI is given in this question, where it is called the "Highest Posterior Density Region".
I am looking for a function that has the following I/O:
Input
- credMass - credible interval mass (e.g., 0.95 for a 95% credible interval)
- a - shape parameter (e.g., number of HEADS coin tosses)
- b - shape parameter (e.g., number of TAILS coin tosses)
Output

- ci_min - minimum bound of the mass (value between 0. and ci_max)
- ci_max - maximum bound of the mass (value between ci_min and 1.)
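In other words, the target interface would be something like this hypothetical stub (the name beta_hdi is purely illustrative):

def beta_hdi(credMass, a, b):
    """Hypothetical signature only. Should return (ci_min, ci_max): the
    narrowest interval containing `credMass` of the Beta(a, b) probability
    mass, with 0. <= ci_min < ci_max <= 1."""
    raise NotImplementedError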
One way to go about it is to speed up this script, which I found in the same question (adapted from R to Python and taken from the book Doing Bayesian Data Analysis by John K. Kruschke). I have used this solution, which is quite reliable, but it's a bit too slow for multiple calls. A speedup of 100X or even 10X would be very helpful!
from scipy.optimize import fmin
from scipy.stats import *

def HDIofICDF(dist_name, credMass=0.95, **args):
    # freeze distribution with given arguments
    distri = dist_name(**args)
    # initial guess for HDIlowTailPr
    incredMass = 1.0 - credMass

    def intervalWidth(lowTailPr):
        return distri.ppf(credMass + lowTailPr) - distri.ppf(lowTailPr)

    # find lowTailPr that minimizes intervalWidth
    HDIlowTailPr = fmin(intervalWidth, incredMass, ftol=1e-8, disp=False)[0]
    # return interval as array([low, high])
    return distri.ppf([HDIlowTailPr, credMass + HDIlowTailPr])
Usage

print(HDIofICDF(beta, credMass=0.95, a=5, b=4))
Word of caution! Some solutions confuse this HDI with the equal-tailed interval (which the said question calls the "Central Credible Region"), which is far easier to compute but does not answer the same question. (E.g., see Kruschke's Why HDI and not equal-tailed interval?)
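For reference, the equal-tailed interval for the same Beta(5, 4) example can be computed directly from the ppf by cutting (1 - credMass)/2 of the mass off each tail; since the HDI is by definition the narrowest interval with the given mass, the equal-tailed interval is never narrower than it:

from scipy.stats import beta

# Equal-tailed 95% interval for Beta(5, 4): chop 2.5% off each tail.
# For a skewed distribution this generally differs from the HDI.
eti = beta.ppf([0.025, 0.975], a=5, b=4)
print(eti)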
Also, this question does not concern the MCMC approaches I have seen in PyMC3 (pymc3.stats.hpd(a), where a is a sample of the random variable), but rather an analytical solution.
Thank you!
Not analytical, but just in case you still find this useful: instead of evaluating the ppf (which can be very slow for many distributions), you can find an approximate solution by evaluating the pdf on a grid of points, something like this.
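A minimal sketch of that grid-based idea, assuming grid points are visited from highest to lowest density and their mass accumulated until the requested amount is covered (the name hdi_grid and the implementation details are my reconstruction, not the answer's exact code):

import numpy as np
from scipy.stats import beta

def hdi_grid(dist, x_vals, mass=0.95):
    # density at each grid point and the mass of each grid cell
    # (the grid is assumed to be equally spaced)
    pdf = dist.pdf(x_vals)
    cell_mass = pdf * (x_vals[1] - x_vals[0])
    # visit grid points from highest to lowest density, accumulating
    # mass until the requested amount is covered
    order = np.argsort(pdf)[::-1]
    n_keep = np.searchsorted(np.cumsum(cell_mass[order]), mass) + 1
    kept = x_vals[order[:n_keep]]
    # for a unimodal pdf the kept points form one contiguous interval
    return kept.min(), kept.max()

# usage: 95% HDI of Beta(a=5, b=4) on a 10,000-point grid
x_vals = np.linspace(0, 1, 10_000)
print(hdi_grid(beta(a=5, b=4), x_vals, mass=0.95))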
Where mass is a number between 0 and 1, and x_vals is a vector of equally spaced values in the support of the distribution. So for the beta, x_vals = np.linspace(0, 1, size) should do the work. You can control the trade-off between accuracy and speed by setting a proper value of size for your problem. How much faster this solution is compared with the other one (using the ppf and optimization) will depend on the details of SciPy's ppf and pdf methods (which differ between distributions), and could even depend on the parameters of the distribution.