# Failures of the One-Step Learning Algorithm

## David J C MacKay

The Generalized Boltzmann Machine (Hinton, 2001, personal communication)
is a deterministic mapping
from an observable space \bx to an energy function E(\bx;\bw),
parameterized by a parameter vector \bw. The energy defines a
probability P(\bx|\bw) = \exp(E(\bx;\bw))/Z(\bw).
A maximum likelihood learning algorithm for this density model
is \Delta \bw = \left< \bg \right>_0 - \left< \bg \right>_{\infty}
where \left< \bg \right>_0
is the average of the gradient \bg = \partial E/\partial \bw
evaluated at points
\bx drawn from the data density, and
\left< \bg \right>_{\infty} is the average
gradient for points \bx drawn from P(\bx|\bw).
If T is a Markov chain in \bx-space that has P(\bx|\bw) as its
unique invariant density then we can approximate \left< \bg \right>_{\infty}
by taking the data points \bx and hitting each of them I times with
T, where I is a large integer.
In the one-step learning algorithm of Hinton (2001), we set I to 1.
In this paper I give examples of models E(\bx;\bw) and Markov
chains T for which the true likelihood is unimodal in the
parameters, but the one-step algorithm does not
necessarily converge to the maximum likelihood parameters.
It is hoped that these negative examples will help pin down
the conditions under which the one-step algorithm
converges correctly.
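As a concrete illustration (not taken from the paper), the update rules above can be sketched for a toy one-bit model with energy E(x;w) = w x, so that P(x|w) = \exp(w x)/Z(w) for x in {0, 1}, with a Metropolis chain playing the role of T. All function names here are hypothetical. For this particular chain the one-step fixed point happens to agree, in expectation, with the maximum-likelihood solution; the paper's examples concern models and chains where it does not.

```python
import math
import random

def metropolis_step(x, w, rng):
    """One step of a Metropolis chain T whose invariant density is
    P(x|w) = exp(E(x;w))/Z(w), with E(x;w) = w*x and x in {0, 1}."""
    x_new = 1 - x  # propose flipping the bit
    # accept with probability min(1, P(x_new)/P(x)) = min(1, exp(w*(x_new - x)))
    if math.log(rng.random()) < w * (x_new - x):
        return x_new
    return x

def one_step_gradient(w, data, rng):
    """The one-step (I = 1) estimate <g>_0 - <g>_1, where g = dE/dw = x and
    <g>_1 averages g over the data points after one hit of the chain T."""
    g0 = sum(data) / len(data)
    g1 = sum(metropolis_step(x, w, rng) for x in data) / len(data)
    return g0 - g1

def ml_gradient(w, data):
    """Exact maximum-likelihood gradient <g>_0 - <g>_infinity; for this
    one-bit model <g>_infinity = P(x=1|w) = exp(w)/(1 + exp(w)) exactly."""
    g0 = sum(data) / len(data)
    return g0 - math.exp(w) / (1.0 + math.exp(w))

rng = random.Random(0)
data = [1, 1, 1, 0]  # empirical mean 0.75, so the ML answer is w* = log(3)

w = 0.0     # one-step (stochastic) learning
w_ml = 0.0  # exact maximum-likelihood gradient ascent, for comparison
for _ in range(2000):
    w += 0.1 * one_step_gradient(w, data, rng)
    w_ml += 0.1 * ml_gradient(w_ml, data)

# both should end up close to log(3), the one-step estimate noisily so
print(w, w_ml, math.log(3))
```

The one-step trajectory fluctuates around its fixed point because <g>_1 is a noisy Monte Carlo average, whereas the exact gradient ascent converges deterministically; this is the I = 1 end of the spectrum whose I = infinity limit recovers maximum likelihood.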

Downloads:

- postscript (Cambridge UK)
- postscript (Canada mirror)
- pdf (Cambridge UK)
- pdf (Canada mirror)

All postscript files are compressed with gzip; see this page for advice on gzip if needed.

```bibtex
@misc{MacKayContrastiveDivergence2001,
  title  = {Failures of the One-Step Learning Algorithm},
  author = {MacKay, David J. C.},
  year   = {2001},
  note   = {Unpublished Technical Report},
  url    = {http://www.inference.org.uk/mackay/abstracts/gbm.html}
}
```
