# Introduction to Gaussian Processes

## David J C MacKay

Feedforward neural networks such as multilayer perceptrons are popular tools for nonlinear regression and classification problems. From a Bayesian perspective, the choice of a neural network model can be viewed as defining a prior probability distribution over nonlinear functions, and the neural network's learning process can be interpreted in terms of the posterior probability distribution over the unknown function. (Some learning algorithms search for the function with maximum posterior probability; others, Monte Carlo methods, draw samples from the posterior distribution.) In the limit of large but otherwise standard networks, \citeasnoun{Radford_book} has shown that the prior distribution over nonlinear functions implied by the Bayesian neural network falls in a class of probability distributions known as Gaussian processes. The hyperparameters of the neural network model determine the characteristic lengthscales of the Gaussian process. Neal's observation motivates the idea of discarding parameterized networks and working directly with Gaussian processes. Computations in which the parameters of the network are optimized are then replaced by simple matrix operations using the covariance matrix of the Gaussian process. In this chapter I will review work on this idea by \citeasnoun{williams_rasmussen:96}, \citeasnoun{Neal_gp}, \citeasnoun{williams:96} and \citeasnoun{Gibbs_MacKay97b}, and will assess whether, for supervised regression and classification tasks, the feedforward network has been superseded.
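To make the phrase "simple matrix operations using the covariance matrix" concrete, here is a minimal sketch of Gaussian process regression in NumPy. It is an illustration under stated assumptions rather than the method of any of the cited papers: the squared-exponential covariance, the names `sq_exp_cov` and `gp_predict`, and all hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np

def sq_exp_cov(xa, xb, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs.

    Assumption: this covariance function is chosen only for illustration.
    """
    d = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, t_train, x_test, noise_var=0.01, lengthscale=1.0):
    """Predictive mean and variance of the noise-free function at x_test."""
    n = len(x_train)
    # Covariance of the observed targets: function covariance plus
    # independent noise, which enters only on the diagonal.
    C = sq_exp_cov(x_train, x_train, lengthscale) + noise_var * np.eye(n)
    k_star = sq_exp_cov(x_test, x_train, lengthscale)

    # Solve with a Cholesky factor instead of forming C^{-1} explicitly.
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t_train))
    mean = k_star @ alpha

    v = np.linalg.solve(L, k_star.T)
    prior_var = sq_exp_cov(x_test, x_test, lengthscale).diagonal()
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var

# Toy data (hypothetical): five samples of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
t_train = np.sin(x_train)
x_test = np.array([0.5, 5.0])
mean, var = gp_predict(x_train, t_train, x_test)
```

No network weights are optimized here: prediction reduces to one factorization of the N x N covariance matrix and a few triangular solves. The predictive variance is small between the training inputs and returns towards the prior variance far from the data.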

Known typos in this paper:

Equation 25 should read:

C_{nn'} = ... + \sigma_{\nu}^2 \delta_{nn'}