Further ideas for other solutions
Position code plus `bits back'
Here's a fun idea.
To encode a file of length N=10,000, of which roughly 100 of the bits are 1s,
we could encode the position of each of the 1s in the file.
Since each position can be represented by a 14bit integer, the compressed
file length will be roughly 100x14 = 1400 bits.
Now, that's some way off from the expected Shannon information content,
800 bits or so.
Why?
Well, an encoding of all the positions has redundancy in it
in the sense that the encoder is free to choose the order of
the encoded bitpositions.
This freedom means that the encoder has the opportunity
to encode not only the 100 bitpositions but also
an arbitrary choice of one from the 100! (one hundred factorial)
possible permutations.
In order to make that choice, the encoder could
subcontract to his friend Fred, who also wants to
communicate over this channel, the decision about the choice of permutation.
Receiving the permuted string of bits, the receiver can then
deduce not only the sparse file but also Fred's choice.
And Fred can use that choice to convey another file of
size log_{2}(100!) bits, which is
very roughly 100 (log_{2} (100/e) ), or 520 bits.
So the net cost of communicating this way is
(total transmitted length in bits)  (length in bits of Fred's hidden transmission)
~= 1400  520 ~= 880 bits.
Which is (pretty near) the expected Shannon information content!
This idea is called bitsback coding.
We encode in an arbitrary way, that requires some extra bits
to resolve the arbitrariness; then claim back the cost of those extra bits
by selling them to Fred.
Now we get to the really fun bit:
can you write a compression method such that the encoder himself
plays the role of Fred?
 i.e., the encoder chooses the permutation of the
bitpositions in a way that conveys some of the bitpositions!
Related concepts

How should Fred turn his 600bit file into a permutation of the 100 bits?
We need an efficient and practical program. And a decoder that
turns a permutation back into bits.

A nearby concept is this: imagine that Joe wants to
communicate an unordered selection of r objects from N objects.
How should he encode this selection in bits?
