
On the Beale Cipher, Part I

April 27th, 2010

Ugh, so right now there is construction going on behind my house; an abandoned house got torn down and I can only imagine Telecable is extending its dominion over it.  The street is zoned residential, not commercial, but the owner seems to be politically well-positioned: he gets to do whatever he wants.  In Mexico, political power is concentrated among a few, and not necessarily the smartest (quite the contrary in fact, the majority seem pretty dumb); I have often wondered why those who aspire to a political post (in the legislative, judicial or executive branches) are not required to take an IQ or aptitude test (... with the smartest being selected): I'd much rather have smart crooks than stupid ones.

At any rate, with all that pounding in the background, I thought I'd share some of my thoughts on the Beale cipher.  I started studying it at the beginning of April, enthusiastic about the prospect of deciphering it completely, but I have become convinced that this is not a task I can accomplish easily or without support; in other words, it's difficult to do on my own.  Besides, I am terrible at crossword puzzles, a skill that seems necessary for guessing certain words.  I can't help but remember my friend Seth K, who could complete the NY Times Sunday crossword in like five minutes.  Anyway: my motivation was the challenge, or the mathematical aspect of it, not the treasure, purportedly worth 40 million US dollars, whose location the first cipher is said to give.  For some background and history, I earnestly recommend Simon Singh's book "The Code Book." It is extremely entertaining, and it has quickly become one of my favorite books.  Of course there's also the Wikipedia article, which is much more succinct.

When I first read about it in the chapter of Singh's book titled "Le Chiffre Indechiffrable," I suspected then, as I do now, that it is not entirely undecipherable.  The reasons for this:

*1.  The Beale cipher is not a (one-time) pad cipher, so it may be "breakable" without having the key itself.

*2.  The fact that certain numbers repeat (some as many as 8 times, like "18", while others appear only once) suggests that the Beale cipher is susceptible to a form of frequency analysis.

*3.  Certain sequences may be important: for example, 64 follows 18 in two different places (positions 124-125 and 187-188 of the first Beale cipher).

*4.  The distance between numbers may be significant.  It would seem that the smaller the difference between two numbers, the stronger the relationship between the letters they encode; thus, the closer they are together, the more information we could potentially squeeze out of them.  Sequence repetitions may also help a lot (64 following 18 twice, for example), and they add credence to points 1 and 2.

5.  The letter encoded by 1 has a different probability distribution over the alphabet than the others do (see the Wikipedia article).  The same goes for 71, which encodes the first letter of the first Beale cipher.  This may be true of other numbers too, but it's not exactly clear which, because we don't know where one word ends and another begins.  We can surely use this to augment our frequency analysis.

6.  It cannot escape us, then, that if the Beale cipher is a book cipher, each number in the code likely represents the first letter of a word from the key.  However, knowing the first-letter frequencies of the key is not likely to be of any help.  There does seem to be another bit of information about the key that would be extremely useful (and that we can partially extract), which I will explain in my next post.
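To make points 2, 3 and 6 a bit more concrete, here is a minimal sketch in Python of the bookkeeping involved.  It is only an illustration, not the analysis I will describe later: the function names are my own invention, and TOY_CIPHER is a made-up list of numbers, not the real Beale text.

```python
from collections import Counter, defaultdict

# Hypothetical toy ciphertext for illustration only (NOT the real Beale numbers).
TOY_CIPHER = [7, 18, 64, 3, 18, 9, 18, 64, 5]


def number_frequencies(cipher):
    """Point 2: count how many times each number occurs in the ciphertext."""
    return Counter(cipher)


def repeated_pairs(cipher):
    """Point 3: find adjacent pairs of numbers that occur more than once.

    Returns a dict mapping each repeated pair to the 1-based positions
    at which the pair starts.
    """
    positions = defaultdict(list)
    for i in range(len(cipher) - 1):
        positions[(cipher[i], cipher[i + 1])].append(i + 1)
    return {pair: where for pair, where in positions.items() if len(where) > 1}


def decode_with_key(cipher, key_text):
    """Point 6: book-cipher decoding, assuming number n stands for the
    first letter of the n-th word of the key text (counting from 1).

    Numbers that fall outside the key are rendered as '?'.
    """
    words = key_text.split()
    letters = []
    for n in cipher:
        if 1 <= n <= len(words):
            letters.append(words[n - 1][0].lower())
        else:
            letters.append("?")
    return "".join(letters)


if __name__ == "__main__":
    print(number_frequencies(TOY_CIPHER).most_common(3))
    print(repeated_pairs(TOY_CIPHER))
    print(decode_with_key([3, 1, 2], "alpha Bravo charlie"))
```

On the toy list, 18 shows up three times and the pair (18, 64) starts at positions 2 and 7, which is the same kind of repetition noted in point 3.  Of course, decode_with_key is only useful if one already has a candidate key text, which is exactly what we lack for the first cipher.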

Later, I will describe how I have incorporated all these pieces of information into an augmented form of the usual frequency analysis.  For now, suffice it to say that I realized the starred statements fairly early; the unstarred ones I noticed only after developing the analysis a bit.