First draft: 20 July 2010
Minor revisions: 22 July 2010

# Chaocipher

## Cracking Exhibit 1


This page describes how I went about cracking Exhibit 1 from Silent Years. If you would like to try your hand at cracking Chaocipher, you should do so before reading the rest of this page. Trying to crack a cipher yourself is a lot more fun and instructive than just reading about it.

Figure 1  Determining a partial ring configuration from repeated symbols (panels (a) to (i), each showing the plain-text phrase `ALLGOODQ` and the cipher-text phrase `IROROISO`).

### How to Crack Chaocipher with a Known Plain-text/Cipher-text Pair

The key to breaking Chaocipher is to use repetitions of plain-text or cipher-text symbols to learn something about the state of the rings. Figure 1 and the step-by-step description below demonstrate how this is done. The plain-text phrase is `ALLGOODQ` and the cipher-text phrase is `IROROISO`.

1. Nothing is known about the rings yet (unknown entries are represented by dots in the figure). Note that in this example we start at the second symbol of the phrase rather than the first, because of the useful `LL` repetition in the plain text. A red font colour indicates which symbols are about to be added to the rings; black shows which symbols have already been added; and grey shows which symbols have not yet been added.
2. The state of the rings after having incorporated the `L`/`R` symbols. To fully understand the transition from (a) to (b), refer to Figure 1 in the Introduction. In (b) the ring permutation has already been applied, meaning that this figure corresponds to the rightmost set of rings in the Introduction.
3. We used the fact that we already knew where the third plain-text symbol, `L`, was on the inner ring to add the third cipher-text symbol, `O`, to the outer ring. Due to the permutation the `R` on the outer ring was moved all the way from the zenith to the nadir.
4. To incorporate the next letters (`G`/`R`), the cipher-text symbol was used, since the plain-text symbol had not yet been observed in the plain-text phrase. This means that the `G` was added at its location on the inner ring, opposite `R` on the outer ring, before the permutation was applied.
5. The known `O` on the outer ring was used to add an `O` to the inner ring.
6. The just-added `O` on the inner ring was used to add an `I` to the outer ring. This leaves us stuck on the right-hand side of the phrase since neither the `D` of the plain-text nor the `S` of the cipher-text has been encountered before.
7. However, we can still incorporate the first symbols (`A`/`I`) since we now know where the `I` is on the outer ring. Here the process changes slightly because we have to move in reverse. To arrive at this figure, we had to backtrack all the way to step (a). (Note that in this state we would be ready to encipher the second symbol (`L` to `R`) as the correct plain-text symbol is in the same location as the correct cipher-text symbol.)
8. The cipher-text symbol `I` was used to incorporate the plain-text symbol `A`. To do this in reverse, we followed Figure 1 in the Introduction from right to left. Firstly, the rings were rotated so that the cipher-text symbol is at the zenith. Secondly, the rings were unpermuted. Finally, the plain-text symbol was added at its correct location, adjacent to the cipher-text symbol. Now we have incorporated all of the available symbols and have learned where 7 out of 52 symbols are located on the Chaocipher rings (`A`, `L`, `G` and `O` on the inner ring; `I`, `R` and `O` on the outer ring).
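The rotate-and-permute step that this walkthrough relies on (Figure 1 in the Introduction) can be sketched in Python. This is a minimal sketch assuming the way the Chaocipher algorithm is commonly described: index 0 of each list is the zenith, index 13 the nadir, `left` is the outer (cipher-text) ring and `right` the inner (plain-text) ring. The function names are illustrative. `unpermute` is the reverse step used above for the `A`/`I` pair: it leaves the pair just processed at the zenith of both rings.

```python
# Sketch of the Chaocipher ring step (assumed conventions: index 0 is the
# zenith, index 13 the nadir; "left" is the outer cipher-text ring and
# "right" the inner plain-text ring).


def permute(left, right, i):
    """Advance both rings after the pair at index i has been used."""
    # Outer ring: bring index i to the zenith, then move the symbol at
    # zenith + 1 down to the nadir.
    left = left[i:] + left[:i]
    left.insert(13, left.pop(1))
    # Inner ring: rotate one position further, then move the symbol at
    # zenith + 2 down to the nadir.
    j = (i + 1) % 26
    right = right[j:] + right[:j]
    right.insert(13, right.pop(2))
    return left, right


def unpermute(left, right):
    """Undo permute(); the pair just used ends up at index 0 of both rings."""
    left, right = list(left), list(right)
    left.insert(1, left.pop(13))
    right.insert(2, right.pop(13))
    return left, right[-1:] + right[:-1]  # undo the extra one-step rotation


def encipher(left, right, plain):
    left, right = list(left), list(right)
    out = []
    for p in plain:
        i = right.index(p)   # plain-text symbol on the inner ring
        out.append(left[i])  # cipher-text symbol at the same location
        left, right = permute(left, right, i)
    return "".join(out)


def decipher(left, right, cipher):
    left, right = list(left), list(right)
    out = []
    for c in cipher:
        i = left.index(c)    # cipher-text symbol on the outer ring
        out.append(right[i])
        left, right = permute(left, right, i)
    return "".join(out)
```

`permute` and `unpermute` only move entries around, so they work unchanged on partially known rings in which unknown entries are `None`, which is what the cracking procedure above needs.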

This is essentially the method I used to crack Exhibit 1. There are still some additional problems to address, namely how to deal with gaps in phrases of repeated symbols, and how to find a good starting location in the text for cracking the cipher.

#### Gaps in Sequences of Symbols

To continue with the example in Figure 1, the `D`/`S` symbols have to be incorporated. Since neither symbol was observed in the first part of the phrase, the only option is to try out every possible ring configuration consistent with observing these symbols. Figure 1(i) shows that there are 19 locations where `D`/`S` could be placed; the other 7 locations already hold a plain-text symbol, a cipher-text symbol, or both. For each of the 19 possible ring configurations, adding seen-before symbols can now continue. Some configurations will turn out to be inconsistent with new symbols. For example, when incorporating the final symbols in the example, `Q`/`O`, the configuration shown in Figure 2 is not allowed: according to this configuration `D` should encipher to `O`, but from the text we know that `Q` should encipher to `O`. Since there is an inconsistency, this configuration can be discarded. Cracking the cipher therefore reduces to repeatedly incorporating known symbols, exploring all possible configurations when unknown symbols are found, and discarding configurations that are inconsistent with the plain and cipher texts of the exhibit.
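The branching procedure just described can be written as a breadth-first search over partially known rings. The sketch below is hypothetical (the names `positions` and `crack` are illustrative) and, unlike the walkthrough, only works forwards from the first pair. It represents unknown ring entries as `None`, assumes the usual description of the Chaocipher permutation (index 0 is the zenith, index 13 the nadir), and keeps every configuration that survives all the pairs.

```python
# Hypothetical sketch: breadth-first search over partially known rings.
# A configuration is (outer, inner), two 26-tuples with None for unknown
# entries; index 0 is the zenith and index 13 the nadir (assumed).


def permute(left, right, i):
    """Chaocipher ring step; it only moves entries, so None is fine."""
    left = list(left[i:] + left[:i])
    left.insert(13, left.pop(1))
    j = (i + 1) % 26
    right = list(right[j:] + right[:j])
    right.insert(13, right.pop(2))
    return tuple(left), tuple(right)


def positions(left, right, p, c):
    """Locations at which the pair p/c could sit in this configuration."""
    ip = right.index(p) if p in right else None
    ic = left.index(c) if c in left else None
    if ip is not None and ic is not None:
        return [ip] if ip == ic else []           # both seen: must line up
    if ip is not None:
        return [ip] if left[ip] is None else []   # add c opposite known p
    if ic is not None:
        return [ic] if right[ic] is None else []  # add p opposite known c
    # A gap: any location that is empty on both rings is possible.
    return [k for k in range(26) if left[k] is None and right[k] is None]


def crack(plain, cipher):
    """Return every configuration consistent with the aligned texts."""
    empty = (None,) * 26
    configs = {(empty, empty)}
    for p, c in zip(plain, cipher):
        survivors = set()
        for left, right in configs:
            for k in positions(left, right, p, c):
                new_left, new_right = list(left), list(right)
                new_left[k], new_right[k] = c, p
                survivors.add(permute(new_left, new_right, k))
        configs = survivors  # inconsistent configurations are dropped here
    return configs
```

Each gap multiplies the number of live configurations by the number of free locations, while every seen-before symbol can only keep or discard a configuration. Storing configurations in a set also merges duplicates: for the very first pair, every placement gives the same configuration once the rings are rotated to the zenith.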

Figure 2  An inconsistent configuration (plain-text phrase `ALLGOODQ`, cipher-text phrase `IROROISO`).

#### Finding a Good Starting Point

Since long sequences of repeated symbols are useful, it is a good idea to search through the text for the location with the longest sequence of repeated symbols. Doing such a search shows that the longest sequence

• with 0 gaps has length 6,
• with 1 gap has length 11,
• with 2 gaps has length 19,
• with 3 gaps has length 30, and
• with 4 gaps is the entire text.
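A search of this kind can be sketched as follows. This is a simplified, hypothetical version: it grows each window forwards only, counting a gap whenever a pair's plain-text symbol has not appeared earlier in the window's plain text and its cipher-text symbol has not appeared earlier in the window's cipher text. The walkthrough in Figure 1 also extends phrases backwards (the `A`/`I` step), so this sketch will generally report somewhat shorter lengths than the ones quoted above. The function name and return value are assumptions.

```python
def longest_window(plain, cipher, max_gaps):
    """Longest stretch of aligned text needing at most max_gaps gaps.

    Forward-only simplification: a pair counts as a gap when neither its
    plain-text symbol nor its cipher-text symbol has appeared earlier in
    the window. The first pair of a window is free. Returns (length, start).
    """
    best_len, best_start = 0, 0
    for start in range(len(plain)):
        seen_plain, seen_cipher = set(), set()
        gaps, end = 0, start
        while end < len(plain):
            p, c = plain[end], cipher[end]
            if end > start and p not in seen_plain and c not in seen_cipher:
                if gaps == max_gaps:
                    break  # one gap too many: the window ends here
                gaps += 1
            seen_plain.add(p)
            seen_cipher.add(c)
            end += 1
        if end - start > best_len:
            best_len, best_start = end - start, start
    return best_len, best_start
```

For the `ALLGOODQ`/`IROROISO` example, `longest_window("ALLGOODQ", "IROROISO", 0)` returns `(5, 1)`: the five pairs from `L`/`R` to `O`/`I`. The backwards step that adds `A`/`I` is exactly what this forward-only version misses. The double loop is quadratic in the text length in the worst case, which is still cheap for a text the size of an exhibit.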

This is very encouraging. By searching through fewer than $26^4 \approx 457\,000$ ring configurations (at most 26 choices at each of the four gaps), it is possible to find the configuration that matches the plain text to the cipher text, a task easily accomplished on a computer. It turns out that the task is even easier than that. Because inconsistent ring configurations are discarded as seen-before symbols are encountered, the maximum number of different consistent ring configurations to consider is 444. That is,

• 23 after the first gap,
• 444 after the second gap,
• only 1 after the third gap (all other configurations are discarded), and
• only 1 after the fourth gap.

Now, 444 is not a large number and this code could well have been cracked by a few careful and persistent people. In a later article, on how to crack Exhibit 4, I will show that even fewer possible configurations need to be considered to crack that cipher. I believe that, had the mechanism underlying Byrne's cipher been known 90 years ago, it would eventually have been rejected as too weak for use by the military.

The best starting point for cracking Exhibit 1 is at offset 7187, in the stretch of plain text (upper line) and cipher text (lower line) shown below.

`...RANCEOFTHESECOLONIESUANDSUCHISNOWTHENECESSITYWHICHCONSTRAINSTHEMTOALTERTHEIRFORM...`

`...SREXYRUWMBTXTHYVNGZLXELVTZDCQMVFLCBBYKBMESGHSOEPSKPKEWMEQWCOQNBURIIQBNQOGAAXPEIT...`

From here, two symbol pairs from the phrase are added to the rings without traversing any gaps.


First the `I`/`G` gap is traversed. This does not allow any further symbols to be added to the rings.


Next the `T`/`H` gap is traversed. This allows the addition of one more symbol pair, `Y`/`S`, since `S` is already on the outer ring.


Traversing the third gap (`W`/`O`) brings us a lot further. Note that symbols from both before and after the previous phrase can be added to the rings.


Traversing the final gap (`M`/`U`) allows us to find the full state of the rings and crack the entire cipher.


### How to Reverse Engineer a Key from a Starting Alphabet

 Figure 3  The initial configuration for Exhibit 1.

After having cracked Exhibit 1, the initial ring configuration that was used to encipher the plain-text can be found (see Figure 3). At first glance it might not look as if there is anything very special about this configuration, but there are in fact many runs of alphabetically consecutive symbols on both rings. On the outer (cipher) ring there are `QRST`, `XY`, `LM`, `ZAB`, and `JK`. On the inner (plain) ring there are `YZ`, `EFGH`, `OP`, and `TUV`. Since Chaocipher shuffles alphabets quite quickly, it seems likely that the configuration in Figure 3 is not far removed from a fully ordered initial alphabet (with `ABCD`...`Z` on both rings). The key would be the phrase that takes us from the fully ordered alphabet to the initial alphabet for enciphering or deciphering. Such a key would need to be known by both the party that enciphered the text and the party that wants to decipher it. Here, as a code breaker, I would like to discover what the key was. To guide this search for the key, I worked from the following assumptions.

• Byrne started from the fully ordered alphabet (`ABCD`...`Z`) on both rings, and applied a key to arrive at the initial state for enciphering.
• The key is not very long. This assumption is supported by the fact that there are so many consecutive symbols in Figure 3.
• The key is English or some recognisable phrase. In principle the key could be any random sequence of symbols. For any real use of Chaocipher we might expect the key to be something unrecognisable and unguessable. However, since all the exhibits were challenges set by Byrne to the cryptography community at large, we might expect the key to be something fun or clever.

The brute-force approach would be to search through all possible keys of a particular length until one is found that takes us from the ordered alphabet to the alphabet in Figure 3. This requires too much calculation: there are $26^L$ keys of length $L$, and $L$ is unknown. To make the search feasible, we need some way of pruning the search space.

#### Ring Entropy

In my everyday work in the field of probabilistic inference, entropy is a measure of disorder—the lower the entropy, the more structure there is in whatever is being looked at. Define ring entropy as the total number of symbols on the rings that are not followed by the next symbol in the ordered alphabet. Here the following symbol is located clockwise from the reference symbol. For the configuration in Figure 3, the ring entropy is 37. This definition of entropy has some useful properties.

• The ordered alphabet has entropy 0, and it is the only configuration with zero entropy (up to rotating either ring).
• The maximum entropy of any configuration is 52.
• This measure of entropy is rotationally invariant.
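Ring entropy is straightforward to compute. This minimal sketch assumes that each ring is given as a string or list whose order runs clockwise, and that `Z` is followed by `A` (as in the `ZAB` run noted for Figure 3); the function name is illustrative.

```python
import string

ALPHABET = string.ascii_uppercase


def ring_entropy(*rings):
    """Number of symbols not followed (clockwise) by their successor."""
    total = 0
    for ring in rings:
        for i, symbol in enumerate(ring):
            successor = ALPHABET[(ALPHABET.index(symbol) + 1) % 26]  # Z -> A
            if ring[(i + 1) % len(ring)] != successor:
                total += 1
    return total
```

Rotating a ring only changes where the list starts, not which symbol follows which, so the count is rotationally invariant; reversing the alphabet on both rings gives the maximum of 52.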

The search for the key is now done backwards from the configuration in Figure 3. This configuration has entropy 37, and we want to find symbols that decrease the entropy (i.e. make the rings more ordered) with each backwards step. Since the key is expected to be short, the rings should become more ordered quite quickly when we backtrack correctly. Since every accepted step lowers the entropy by at least one and the entropy cannot fall below zero, this process has to converge or fail after at most 37 steps.
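One possible way to implement this backwards search is a depth-first search that, at each step, tries all 26 locations the zenith could have been at for the previous key symbol, undoes the permutation, and only continues if the entropy drops. This is a sketch under several assumptions: list order is clockwise, index 0 is the zenith and index 13 the nadir, the configuration is only known up to a joint rotation of both rings, and the names `find_key` and `max_steps` are hypothetical. Because the Chaocipher permutation depends only on where the pair sat, not on whether the symbol was enciphered or deciphered, each recovered key symbol comes out as a plain-text/cipher-text pair.

```python
import string

ALPHABET = string.ascii_uppercase


def entropy(left, right):
    """Ring entropy: symbols not followed clockwise by their successor."""
    return sum(
        ring[(i + 1) % 26] != ALPHABET[(ALPHABET.index(s) + 1) % 26]
        for ring in (left, right)
        for i, s in enumerate(ring)
    )


def permute(left, right, i):
    """Forward Chaocipher ring step (index 0 zenith, 13 nadir, assumed)."""
    left = left[i:] + left[:i]
    left.insert(13, left.pop(1))
    j = (i + 1) % 26
    right = right[j:] + right[:j]
    right.insert(13, right.pop(2))
    return left, right


def unpermute(left, right):
    """Inverse of permute(); the pair just used lands at index 0."""
    left, right = list(left), list(right)
    left.insert(1, left.pop(13))
    right.insert(2, right.pop(13))
    return left, right[-1:] + right[:-1]


def find_key(left, right, max_steps=37):
    """Depth-first search backwards, following only entropy-lowering steps.

    Returns (base_left, base_right, key), with key as (plain, cipher) pairs
    in forward order, or None if no such path reaches entropy 0.
    """
    current = entropy(left, right)
    if current == 0:
        return left, right, []
    if max_steps == 0:
        return None
    for r in range(26):  # each possible zenith for the last key symbol
        prev_left, prev_right = unpermute(left[r:] + left[:r], right[r:] + right[:r])
        if entropy(prev_left, prev_right) < current:
            found = find_key(prev_left, prev_right, max_steps - 1)
            if found is not None:
                base_left, base_right, key = found
                return base_left, base_right, key + [(prev_right[0], prev_left[0])]
    return None
```

The default `max_steps=37` mirrors the bound above. The search returns the first path it finds, and a true path along which the entropy does not fall at every step would be missed; that is the price of this pruning.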

#### Results

Applying this search strategy reveals that the plain-text key for Exhibit 1 has length 10 and is `TILNOYHIVK`. Starting from the fully ordered configuration and enciphering this key leaves the rings in the configuration of Figure 3. This is not a very satisfying result since the key is clearly not an English phrase or name. It is also possible that the key was not meant to be enciphered but rather to be deciphered, in order to reach the initial ring configuration. Since we already know the plain-text key, the cipher-text key is easy to find. It is the output from enciphering the plain-text key, namely `THIKKTBDNB`. This is also not very satisfying. Fortunately, with the habit of writing plain-text phrases above their cipher-text counterparts,

`TILNOYHIVK`

`THIKKTBDNB`

comes a revelation. After staring at this key for a while, the phrase `THINK THINK` pops out.


It seems that Byrne used this 10-letter key and a pattern alternating between the inner and outer rings (that is, enciphering some symbols in the key and deciphering others) to arrive at the initial ring configuration for enciphering Exhibit 1.
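The pattern can be checked mechanically by lining up the two keys and asking, position by position, which row supplies each letter of `THINKTHINK`. This is an illustrative sketch; the function name is hypothetical.

```python
PLAIN_KEY = "TILNOYHIVK"
CIPHER_KEY = "THIKKTBDNB"


def rows_spelling(target, plain_key=PLAIN_KEY, cipher_key=CIPHER_KEY):
    """For each position, report which row(s) supply the target letter."""
    rows = []
    for want, p, c in zip(target, plain_key, cipher_key):
        here = set()
        if p == want:
            here.add("plain")
        if c == want:
            here.add("cipher")
        rows.append(here)
    return rows
```

Every position of `THINKTHINK` is covered, the first `T` by both rows. A letter taken from the plain-text row corresponds to enciphering that key symbol and one taken from the cipher-text row to deciphering it; since the ring permutation is the same in both directions, both choices leave the rings in the same state.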

### In Conclusion

Using a known plain-text/cipher-text attack, it was possible to crack the Chaocipher system. Cracking the code

• recovered the small missing parts (hidden phrases) in the plain-text,
• revealed the initial configuration of the Chaocipher alphabets used to encipher the plain-text, and
• revealed the key used to arrive at the initial configuration of the alphabets.

Cracking Exhibit 1 was fun and challenging for an amateur cryptographer armed with a laptop computer and about 48 hours. However, I do not believe Chaocipher would have held up to professional code breakers even in the 1920s, and even without the aid of a computer. In Chaocipher's defence, this attack did require both the plain and cipher texts to be known. A cipher-text only attack would be much more difficult and is, to my knowledge, still an unsolved problem. Furthermore, even slight modifications to the Chaocipher system result in texts with different statistical distributions and make them, without knowing what the modifications were, difficult to analyse or break; Byrne's Exhibits 2 and 3 are cases in point. Future articles will address cracking Exhibit 4, what I've learned from trying to crack Exhibits 2 and 3, and some thoughts on cipher-text only attacks.