Probabilities over trees: generalizations of the Dirichlet Diffusion
Tree and the Kingman Coalescent
by David MacKay and Tamara Broderick
Sat 29/9/07
What's new: Sun 7/10/07 a pdf file on this topic.
Background and goal
Kingman's coalescent (see [2]
for a review) is a way to define a probability
distribution over ordered binary trees with N labelled leaves.
(Ordered binary trees with N leaves
are binary trees in which each of the (N1)
internal nodes has an integer index in (1,2,...N1)
defining, if you like, the order of the times at which
the divergences in evolution happened.)
The
Dirichlet diffusion tree^{[1]}
is another way to define
a probability distribution over ordered binary trees. (If we discard
the DDT's observable variables and retain the normally hidden
ordered binary tree.)
Both these ways of defining distributions P_N over trees of size N
have the exchangeability consistency property:
if you draw a tree of size N from P_N and delete the
Nth node, you get an (N1)tree whose distribution
is P_{N1}.
The goal of this project is to
find all probability distributions over trees that have
this property.
The Kingman coalescent and DDT both define distributions with
no adjustable parameters at all. We believe that
there are other exchangeabilityconsistent distributions
on trees.
We believe that there is at least a onedimensional
family of such distributions.
Generalization
Here is a way of defining a distribution on trees,
starting from the DDT prescription.
Let the original DDT work like this:

When an individual walks
down a path that n
individuals have taken before,
the divergence probability density at time t in (0,1) is
ρ(t)/n.
(The details of ρ(t) won't matter for us.)

When an individual reaches a junction where (n1,n2) went each
way, he goes in the two directions with probabilities
proportional to
(n1,n2).

We generalize the DDT framework thus:

When an individual walks
down a path that n
individuals have taken before,
the divergence probability density at time t in (0,1) is
ρ(t)/f(n),
where f(n) is a function to be tweaked, satisfying
f(1) = 1.
Formerly, f(n) was n.

When an individual reaches a junction where (n1,n2) went each
way, he goes in the two directions with probabilities
proportional to
p(n1,n2), q(n1,n2)

where p and q are functions to be tweaked
subject to consistency with f(n) and subject to
p+q=1, p>0, q>0
Constraints
Symmetry forces many constraints on the functions f, p, and q, beyond the
obvious p(m,n)=q(n,m).
 Figure 1


Figure 1 illustrates some such constraints, derived by requiring
that the probabilities of making two identical trees of size 4 in different
orders should be equal.
Examples include:
 f(2) = 2 [no choice]
 p(m,m) = 1/2 [for all m]
 p(2,1) = 2/f(3)
 p(3,1)=f(3)/f(4)
 p(2,1) q(3,1) = q(2,1) /2
 p(3,1) q(4,1) = q(3,1) p(3,2)
 p(3,2) q(4,2) = q(3,2) /2
 p(4,1) = f(4) / f(5)
In fact rather than being so verbose, perhaps I can just state
all the constraints like this:
f(n) = f(n1) / p(n1,1)
p(m,n) q(m+1,n)
=
q(m,n) p(m,n+1)

These numerous constraints still leave us with some freedom.
How much freedom exactly? What's the dimensionality of the
space of probability distributions?
We haven't resolved the number of degrees of freedom, but we are pretty sure that it's 1.
f(3), f(4), f(5), f(6).... are not all independent.
For example, if f(3) is chosen then f(4) becomes determined;
and if f(5) is then chosen, f(6) becomes determined.
In fact we now think that f(5) must also be determined by f(3), though
we haven't a proof yet.
The conjectured oneparameter family of distributions
We think that the following functions p and f, parameterized
by a single hyperparameter a in (1,∞), define
a consistent distribution:
p(m,n)(a) = (m+a)/(m+n+2*a)
f(2)(a) = 2
f(n)(a) = f(n1)(a) / p(n1,1)(a)
for example...
f(3)(a) = 2*(3+2*a)/(2+a)
f(4)(a) = 4*(3+2*a)/(3+a)
f(5)(a) = 4*(5+2*a)*(3+2*a)/(4+a)/(3+a)

In the graphs below note the special case a=0
where f3=3, f4=4, and f5=5. This is the original DDT.
Negative values of a give
trees that are more unbalanced than random trees;
positive values of a favour more balanced trees.
Figure 2: Randomly generated
trees with N=50 leaves, from three DDT(α)
distributions 




We thank Richard Durbin, Zoubin Ghahramani, Carl Rasmussen, and
YeeWhye Teh for helpful discussions.
[1] Radford Neal
[3] Gibbs fragmentation trees
by Pitman et al looks relevant.
See also Pitman on arxiv and
Pitman home.
[4] The Standard Additive Coalescent (1997) by
David Aldous, Jim Pitman may also be relevant.
Appendix
Other notes:
 the number of plain binary trees with N labelled leaves
is (2N3)!!.

The number of ordered binary trees with N labelled leaves is given by the recursion
f(N+1) = f(N) * N(N+1)/2
f(2) = 1
i.e.,
f(N) = N!*(N1)! / 2**(N1).

The DDT(alpha=0) does not produce a uniform
distribution over ordered binary trees,
nor a uniform distribution over binary trees.
For example the probability of the balanced tree
on 4 nodes shown at the top left of the top figure on this page
is 3/11=6/22  not 6/18 or 3/15=6/30.
David MacKay
Last modified: Mon Oct 8 15:08:05 2007