# Failures of the One-Step Learning Algorithm

## David J C MacKay

The Generalized Boltzmann Machine (Hinton, 2001, personal communication)
is a deterministic mapping
from an observable space \bx to an energy function E(\bx;\bw),
parameterized by a parameter vector \bw. The energy defines a
probability P(\bx|\bw) = \exp(E(\bx;\bw))/Z(\bw).
A maximum likelihood learning algorithm for this density model
is \Delta \bw = \left< \bg \right>_0 - \left< \bg \right>_{\infty}
where \left< \bg \right>_0
is the average of the gradient \bg = \partial E/\partial \bw
evaluated at points
\bx drawn from the data density, and
\left< \bg \right>_{\infty} is the average
gradient for points \bx drawn from P(\bx|\bw).
If T is a Markov chain in \bx-space that has P(\bx|\bw) as its
unique invariant density then we can approximate \left< \bg \right>_{\infty}
by taking the data points \bx and hitting each of them I times with
T, where I is a large integer.
In the one-step learning algorithm of Hinton (2001), we set I to 1.
In this paper I give examples of models E(\bx;\bw) and Markov
chains T for which the true likelihood is unimodal in the
parameters, but the one-step algorithm does not
necessarily converge to the maximum likelihood parameters.
It is hoped that these negative examples will help pin down
the conditions under which the one-step algorithm
converges correctly.
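As a concrete illustration (not taken from the paper), the update rules above can be sketched for a toy one-bit model with energy E(x;w) = w x, so that P(x|w) = \exp(w x)/Z(w) for x in {0, 1}, with a Metropolis chain playing the role of T. All function names here are hypothetical. For this particular chain the one-step fixed point happens to agree, in expectation, with the maximum-likelihood solution; the paper's examples concern models and chains where it does not.

```python
import math
import random

def metropolis_step(x, w, rng):
    """One step of a Metropolis chain T whose invariant density is
    P(x|w) = exp(E(x;w))/Z(w), with E(x;w) = w*x and x in {0, 1}."""
    x_new = 1 - x  # propose flipping the bit
    # accept with probability min(1, P(x_new)/P(x)) = min(1, exp(w*(x_new - x)))
    if math.log(rng.random()) < w * (x_new - x):
        return x_new
    return x

def one_step_gradient(w, data, rng):
    """The one-step (I = 1) estimate <g>_0 - <g>_1, where g = dE/dw = x and
    <g>_1 averages g over the data points after one hit of the chain T."""
    g0 = sum(data) / len(data)
    g1 = sum(metropolis_step(x, w, rng) for x in data) / len(data)
    return g0 - g1

def ml_gradient(w, data):
    """Exact maximum-likelihood gradient <g>_0 - <g>_infinity; for this
    one-bit model <g>_infinity = P(x=1|w) = exp(w)/(1 + exp(w)) exactly."""
    g0 = sum(data) / len(data)
    return g0 - math.exp(w) / (1.0 + math.exp(w))

rng = random.Random(0)
data = [1, 1, 1, 0]  # empirical mean 0.75, so the ML answer is w* = log(3)

w = 0.0     # one-step (stochastic) learning
w_ml = 0.0  # exact maximum-likelihood gradient ascent, for comparison
for _ in range(2000):
    w += 0.1 * one_step_gradient(w, data, rng)
    w_ml += 0.1 * ml_gradient(w_ml, data)

# both should end up close to log(3), the one-step estimate noisily so
print(w, w_ml, math.log(3))
```

The one-step trajectory fluctuates around its fixed point because <g>_1 is a noisy Monte Carlo average, whereas the exact gradient ascent converges deterministically; this is the I = 1 end of the spectrum whose I = infinity limit recovers maximum likelihood.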

Downloads:

- postscript (Cambridge UK)
- postscript (Canada mirror)
- pdf (Cambridge UK)
- pdf (Canada mirror)

All postscript files are compressed with gzip; see this page for advice on gzip if needed.

```bibtex
@misc{MacKayContrastiveDivergence2001,
  title  = {Failures of the One-Step Learning Algorithm},
  author = {MacKay, David J. C.},
  year   = {2001},
  note   = {Unpublished Technical Report},
  url    = {http://www.inference.org.uk/mackay/abstracts/gbm.html}
}
```
