tree

Probabilities over trees: generalizations of the Dirichlet Diffusion Tree and the Kingman Coalescent

by David MacKay and Tamara Broderick

Sat 29/9/07

What's new: Sun 7/10/07 a pdf file on this topic.

Background and goal

Kingman's coalescent (see [2] for a review) is a way to define a probability distribution over ordered binary trees with N labelled leaves. (Ordered binary trees with N leaves are binary trees in which each of the (N-1) internal nodes has an integer index in (1,2,...N-1) defining, if you like, the order of the times at which the divergences in evolution happened.)

The Dirichlet diffusion tree[1] is another way to define a probability distribution over ordered binary trees. (If we discard the DDT's observable variables and retain the normally hidden ordered binary tree.)

Both these ways of defining distributions P_N over trees of size N have the exchangeability consistency property: if you draw a tree of size N from P_N and delete the Nth node, you get an (N-1)-tree whose distribution is P_{N-1}.

The goal of this project is to find all probability distributions over trees that have this property. The Kingman coalescent and DDT both define distributions with no adjustable parameters at all. We believe that there are other exchangeability-consistent distributions on trees.

We believe that there is at least a one-dimensional family of such distributions.

Generalization

Here is a way of defining a distribution on trees, starting from the DDT prescription.

Let the original DDT work like this:

  1. When an individual walks down a path that n individuals have taken before, the divergence probability density at time t in (0,1) is
    ρ(t)/n.
    (The details of ρ(t) won't matter for us.)
  2. When an individual reaches a junction where (n1,n2) went each way, he goes in the two directions with probabilities proportional to
    (n1,n2).

We generalize the DDT framework thus:

  1. When an individual walks down a path that n individuals have taken before, the divergence probability density at time t in (0,1) is
    ρ(t)/f(n),
    where f(n) is a function to be tweaked, satisfying f(1) = 1. Formerly, f(n) was n.
  2. When an individual reaches a junction where (n1,n2) went each way, he goes in the two directions with probabilities proportional to
    p(n1,n2), q(n1,n2)
where p and q are functions to be tweaked subject to consistency with f(n) and subject to p+q=1, p>0, q>0

Constraints

Symmetry forces many constraints on the functions f, p, and q, beyond the obvious p(m,n)=q(n,m).
tree2
Figure 1
Figure 1 illustrates some such constraints, derived by requiring that the probabilities of making two identical trees of size 4 in different orders should be equal. Examples include:
  1. f(2) = 2 [no choice]
  2. p(m,m) = 1/2 [for all m]
  3. p(2,1) = 2/f(3)
  4. p(3,1)=f(3)/f(4)
  5. p(2,1) q(3,1) = q(2,1) /2
  6. p(3,1) q(4,1) = q(3,1) p(3,2)
  7. p(3,2) q(4,2) = q(3,2) /2
  8. p(4,1) = f(4) / f(5)

In fact rather than being so verbose, perhaps I can just state all the constraints like this:

f(n) = f(n-1) / p(n-1,1)
p(m,n) q(m+1,n) = q(m,n) p(m,n+1)

These numerous constraints still leave us with some freedom.

How much freedom exactly? What's the dimensionality of the space of probability distributions?

We haven't resolved the number of degrees of freedom, but we are pretty sure that it's 1. f(3), f(4), f(5), f(6).... are not all independent. For example, if f(3) is chosen then f(4) becomes determined; and if f(5) is then chosen, f(6) becomes determined. In fact we now think that f(5) must also be determined by f(3), though we haven't a proof yet.

The conjectured one-parameter family of distributions

We think that the following functions p and f, parameterized by a single hyperparameter a in (-1,∞), define a consistent distribution:

p(m,n)(a) = (m+a)/(m+n+2*a)
f(2)(a) = 2
f(n)(a) = f(n-1)(a) / p(n-1,1)(a)
for example...
f(3)(a) = 2*(3+2*a)/(2+a)
f(4)(a) = 4*(3+2*a)/(3+a)
f(5)(a) = 4*(5+2*a)*(3+2*a)/(4+a)/(3+a)
In the graphs below note the special case a=0 where f3=3, f4=4, and f5=5. This is the original DDT.
f
p

Negative values of a give trees that are more unbalanced than random trees; positive values of a favour more balanced trees.


Figure 2: Randomly generated trees with N=50 leaves, from three DDT(α) distributions
a=10.0
a=0.0
a=-0.5

We thank Richard Durbin, Zoubin Ghahramani, Carl Rasmussen, and Yee-Whye Teh for helpful discussions.
[1] Radford Neal
[3] Gibbs fragmentation trees by Pitman et al looks relevant. See also Pitman on arxiv and Pitman home.
[4] The Standard Additive Coalescent (1997) by David Aldous, Jim Pitman may also be relevant.

Appendix

Other notes:
  1. the number of plain binary trees with N labelled leaves is (2N-3)!!.
  2. The number of ordered binary trees with N labelled leaves is given by the recursion
    f(N+1) = f(N) * N(N+1)/2
    f(2) = 1
    i.e.,
    f(N) = N!*(N-1)! / 2**(N-1).
  3. The DDT(alpha=0) does not produce a uniform distribution over ordered binary trees, nor a uniform distribution over binary trees. For example the probability of the balanced tree on 4 nodes shown at the top left of the top figure on this page is 3/11=6/22 - not 6/18 or 3/15=6/30.

David MacKay
Last modified: Mon Oct 8 15:08:05 2007