Comparison of Approximate Methods for Handling Hyperparameters
(also known as: Hyperparameters: optimize, or integrate out?)
David J C MacKay
I examine two approximate methods for computational implementation of
Bayesian hierarchical models, that is, models which include unknown
hyperparameters such as regularization constants. In the `evidence
framework' the model parameters are integrated over, and the
resulting evidence is maximized over the hyperparameters. The
optimized hyperparameters are used to define a Gaussian approximation
to the posterior distribution. In the alternative `MAP' method, the
true posterior probability is found by integrating over the
hyperparameters. The true posterior is then maximized over the
model parameters, and a Gaussian approximation is made. The
similarities of the two approaches, and their relative merits, are
discussed, and comparisons are made with the ideal hierarchical
Bayesian solution.
In moderately ill-posed problems, integration over hyperparameters
yields a probability distribution with a skew peak which causes
significant biases to arise in the MAP method. In contrast, the
evidence framework is shown to introduce negligible predictive error,
under straightforward conditions.
General lessons are drawn concerning the distinctive properties of
inference in many dimensions.
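The two approximations can be sketched on a toy linear-Gaussian model with known noise precision. This is a minimal illustration, not code from the paper: the variable names (`alpha`, `beta`), the grid search over the evidence, and the fixed-point iteration for the MAP solution are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: y = X w + noise, prior w ~ N(0, alpha^{-1} I),
# known noise precision beta. Only alpha is an unknown hyperparameter.
n, k, beta = 30, 5, 25.0
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(scale=beta**-0.5, size=n)

def log_evidence(alpha):
    """log p(y | alpha): the model parameters w integrated out analytically."""
    A = alpha * np.eye(k) + beta * X.T @ X        # posterior precision of w
    m = beta * np.linalg.solve(A, X.T @ y)        # posterior mean of w
    return (0.5 * k * np.log(alpha)
            - 0.5 * beta * np.sum((y - X @ m) ** 2) - 0.5 * alpha * m @ m
            - 0.5 * np.linalg.slogdet(A)[1])      # + terms constant in alpha

# Evidence framework: maximize p(y | alpha) over the hyperparameter, then
# use the Gaussian posterior for w at the optimized alpha.
alphas = np.logspace(-3, 3, 601)
alpha_ev = alphas[np.argmax([log_evidence(a) for a in alphas])]
w_ev = beta * np.linalg.solve(alpha_ev * np.eye(k) + beta * X.T @ X, X.T @ y)

# MAP method: integrate alpha out under a flat prior on log(alpha), giving
#   log p(w | y) = -(beta/2) ||y - X w||^2 - (k/2) log(||w||^2 / 2) + const.
# Its stationarity condition is ridge regression with an effective
# regularizer alpha_eff = k / ||w||^2, so we maximize by fixed-point iteration.
w_map = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(200):
    alpha_eff = k / (w_map @ w_map)
    w_map = beta * np.linalg.solve(alpha_eff * np.eye(k) + beta * X.T @ X,
                                   X.T @ y)

print("evidence alpha:", alpha_ev, " MAP effective alpha:", alpha_eff)
```

Note that the MAP stationarity condition uses the full dimensionality k in alpha_eff = k / ||w||^2, where the evidence framework's fixed point would use the (smaller) effective number of well-determined parameters; in many dimensions this discrepancy is the kind of bias the abstract attributes to the MAP method.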
Published in Neural Computation.
@ARTICLE{MacKay-hyperparameters,
  AUTHOR ="D. J. C. MacKay",
  TITLE  ="Comparison of Approximate Methods for Handling Hyperparameters",
  JOURNAL="Neural Computation",
  YEAR   ="1999",
  ANNOTE ="Date submitted: 1994/1996/1997; Date accepted: October 1998; Collaborating institutes: none"
}