The data processing theorem

The data processing theorem states that data processing can only destroy information, never create it. Prove this by considering an ensemble WDR in which w is the state of the world, d is the data gathered, and r is the processed data, so that these three variables form a Markov chain

$$ w \rightarrow d \rightarrow r ; $$

that is, the probability P(w,d,r) can be written as

$$ P(w,d,r) = P(w)\,P(d \mid w)\,P(r \mid d) . $$
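For a chain w → d → r, whose joint probability factorizes as P(w) P(d|w) P(r|d), one consequence worth writing out is that W and R are conditionally independent given D. Using P(w) P(d|w) = P(d) P(w|d), a short derivation under that factorization gives

$$
P(w, r \mid d) = \frac{P(w)\,P(d \mid w)\,P(r \mid d)}{P(d)} = P(w \mid d)\,P(r \mid d) ,
$$

so once d is known, r carries no further information about w.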
Show that the information that R conveys about W, H(W;R), is less than or equal to the information that D conveys about W, H(W;D). Incidentally, this theorem is as much a caution about our definition of 'information' as it is a caution about data processing!
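Before attempting the proof, it can help to check the inequality numerically. The sketch below is a sanity check, not a proof: it draws random discrete distributions P(w), P(d|w) and P(r|d), builds the joint tables for (W,D) and (W,R) from the chain factorization, and compares the two mutual informations (in bits, written H(W;D) and H(W;R) to match the notation above). The helper names `mutual_information`, `random_distribution`, `random_channel` and `chain_informations`, and the alphabet sizes 2 to 4, are illustrative assumptions.

```python
import math
import random


def mutual_information(joint):
    """Mutual information (bits) of a 2-D joint table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with p = 0 contribute nothing
                mi += p * math.log2(p / (p_a[i] * p_b[j]))
    return mi


def random_distribution(n, rng):
    """A random probability vector of length n (hypothetical helper)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_channel(n_in, n_out, rng):
    """A random conditional table channel[x][y] = P(y | x)."""
    return [random_distribution(n_out, rng) for _ in range(n_in)]


def chain_informations(p_w, p_d_given_w, p_r_given_d):
    """Return (H(W;D), H(W;R)) for the chain w -> d -> r."""
    n_w, n_d, n_r = len(p_w), len(p_r_given_d), len(p_r_given_d[0])
    # P(w, d) = P(w) P(d | w)
    joint_wd = [[p_w[w] * p_d_given_w[w][d] for d in range(n_d)]
                for w in range(n_w)]
    # P(w, r) = sum_d P(w, d) P(r | d), using the chain factorization
    joint_wr = [[sum(joint_wd[w][d] * p_r_given_d[d][r] for d in range(n_d))
                 for r in range(n_r)]
                for w in range(n_w)]
    return mutual_information(joint_wd), mutual_information(joint_wr)


rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    n_w, n_d, n_r = (rng.randint(2, 4) for _ in range(3))
    i_wd, i_wr = chain_informations(
        random_distribution(n_w, rng),
        random_channel(n_w, n_d, rng),
        random_channel(n_d, n_r, rng),
    )
    worst_gap = min(worst_gap, i_wd - i_wr)

print(f"smallest H(W;D) - H(W;R) over 1000 random chains: {worst_gap:.3e}")
```

The gap should never be meaningfully negative; it reaches zero when the processing step is invertible (for example, when P(r|d) is the identity), and H(W;R) drops to zero when r ignores d entirely.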

David J.C. MacKay
Sat May 10 23:05:10 BST 1997