The Generalized Boltzmann Machine (Hinton, 2001, personal communication) is a deterministic mapping from an observable space \bx to an energy E(\bx;\bw), parameterized by \bw. The energy defines a probability P(\bx|\bw) = \exp(E(\bx;\bw))/Z(\bw). A maximum-likelihood learning algorithm for this density model is \Delta \bw = \left< \bg \right>_0 - \left< \bg \right>_{\infty}, where \left< \bg \right>_0 is the average of the gradient \bg = \partial E/\partial \bw over points \bx drawn from the data density, and \left< \bg \right>_{\infty} is the average gradient over points \bx drawn from P(\bx|\bw). If T is a Markov chain on \bx-space that has P(\bx|\bw) as its unique invariant density, then \left< \bg \right>_{\infty} can be approximated by taking the data points \bx and hitting each of them I times with T, for some large integer I. The one-step learning algorithm of Hinton (2001) sets I to 1. In this paper I give examples of models E(\bx;\bw) and Markov chains T for which the true likelihood is unimodal in the parameters, yet the one-step algorithm does not necessarily converge to the maximum-likelihood parameters. It is hoped that these negative examples will help pin down the conditions under which the one-step algorithm is a correctly convergent algorithm.
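The update rule above can be sketched in a few lines of code. The following is a minimal illustration, not taken from the paper: the model, the Metropolis chain, and all parameter choices are assumptions. It uses a toy energy E(x;w) = -(x-w)^2/2, so that P(x|w) = exp(E)/Z is a unit-variance Gaussian with mean w and \bg = \partial E/\partial w = x - w; on this benign unimodal model the one-step (I = 1) update happens to settle near the maximum-likelihood answer, whereas the paper's point is that other choices of E and T need not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption, not from the paper):
#   E(x; w) = -(x - w)^2 / 2, so P(x|w) = exp(E)/Z = N(w, 1)
#   g = dE/dw = (x - w)

def grad_E(x, w):
    return x - w

def metropolis_step(x, w, step=1.0):
    """One sweep of a Metropolis chain T with P(x|w) as invariant density."""
    prop = x + step * rng.standard_normal(x.shape)
    log_ratio = (-(prop - w) ** 2 + (x - w) ** 2) / 2.0
    accept = rng.random(x.shape) < np.exp(np.minimum(0.0, log_ratio))
    return np.where(accept, prop, x)

def one_step_update(data, w, eta=0.1, I=1):
    """Delta w = <g>_0 - <g>_I, with <g>_I estimated by hitting the
    data points I times with T.  The one-step algorithm sets I = 1."""
    x = data.copy()
    for _ in range(I):
        x = metropolis_step(x, w)
    return w + eta * (grad_E(data, w).mean() - grad_E(x, w).mean())

data = rng.standard_normal(1000) + 2.0   # data drawn from N(2, 1)
w = 0.0
for _ in range(500):
    w = one_step_update(data, w)
# For this Gaussian toy model, w ends up close to the ML estimate (the data mean).
```

Replacing `I=1` with a large `I` in `one_step_update` recovers the (expensive) maximum-likelihood rule that the one-step algorithm is meant to approximate.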
@misc{MacKayContrastiveDivergence2001,
  author = {MacKay, David J. C.},
  title  = {Failures of the One-Step Learning Algorithm},
  year   = {2001},
  note   = {Unpublished Technical Report},
  url    = {http://www.inference.org.uk/mackay/abstracts/gbm.html}
}