A Hierarchical Dirichlet Language Model.
David J C MacKay and Linda C B Peto
We discuss a hierarchical probabilistic model whose
predictions are similar to those of the popular language
modelling procedure known as `smoothing'. A number of
interesting differences from smoothing
emerge. The insights gained from a probabilistic view of this
problem point towards new directions for language modelling.
The ideas of this paper are also applicable to other problems
such as the modelling of triphones in speech, and DNA and
protein sequences in molecular biology.
The new algorithm is compared with smoothing
on a two million word corpus. The methods prove
to be about equally accurate, with the hierarchical model
using fewer computational resources.
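
To make the resemblance to smoothing concrete, here is a minimal sketch for a bigram model. It assumes the standard Dirichlet-multinomial predictive form, P(i | j) = (F_ij + alpha * m_i) / (F_j + alpha), where F_ij counts word i following context j and F_j counts context j. The toy corpus, the uniform choice of m, the value of alpha, and all function names are illustrative assumptions, not taken from the paper. In the hierarchical model, alpha and m would be inferred from the data rather than fixed by hand.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count (context, word) pairs and how often each context occurs."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return pair_counts, context_counts


def dirichlet_predictive(word, context, pair_counts, context_counts, m, alpha):
    """Predictive probability under a Dirichlet prior with mean m and strength alpha."""
    f_ij = pair_counts[(context, word)]
    f_j = context_counts[context]
    return (f_ij + alpha * m[word]) / (f_j + alpha)


def interpolated(word, context, pair_counts, context_counts, m, lam):
    """Smoothing-style mixture of a marginal m and the maximum-likelihood bigram."""
    f_j = context_counts[context]
    ml = pair_counts[(context, word)] / f_j if f_j else 0.0
    return lam * m[word] + (1.0 - lam) * ml


# Toy data (illustrative only).
tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
pair_counts, context_counts = bigram_counts(tokens)

# Assumption: a uniform m and a hand-picked alpha, purely for demonstration.
m = {w: 1.0 / len(vocab) for w in vocab}
alpha = 2.0

# With a context-dependent weight lam_j = alpha / (F_j + alpha),
# the two expressions give the same number.
lam_the = alpha / (context_counts["the"] + alpha)
p = dirichlet_predictive("cat", "the", pair_counts, context_counts, m, alpha)
q = interpolated("cat", "the", pair_counts, context_counts, m, lam_the)
total = sum(
    dirichlet_predictive(w, "the", pair_counts, context_counts, m, alpha)
    for w in vocab
)
```

In this sketch the interpolation weight is not a free parameter per context: it shrinks automatically as the context count F_j grows, which is one way the probabilistic view can differ from a smoothing scheme with hand-tuned weights.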