Software for Codebreaking of the Lorenz SZ42

On November 15/16, 2007, the rebuild of Colossus at Bletchley Park was celebrated in a Cipher Event organised by Tony Sale, curator of the British National Museum of Computing, who also headed the Colossus rebuild project. Messages encrypted with a historic Lorenz SZ42 cipher machine were transmitted from the ham radio station at the Heinz Nixdorf Museum in Paderborn to receiving stations in Great Britain. The rebuilt Colossus machine was to perform the code breaking tasks using the historic methods.

The announcement of the Cipher Event stated that "At the same time as the international team receives the enciphered messages, radio amateurs around the world will be able to receive the same radio broadcasts and try their hand at decrypting it. It will be fascinating to see who completes the job first!". As a radio amateur (callsign: DL2KCD) I was intrigued by this challenge. Hams have a culture of contesting. The outstanding work of the cryptographers at Bletchley Park is also important from today's point of view in Germany, as it helped to shorten the lifetime of the Nazi dictatorship. The Cipher Event therefore deserves attention from this country as well.

I became aware of the upcoming event in mid-September 2007 and found that there would be enough time to prepare for it. I figured that standard teletype modems and software would not be of much use, because the cipher stream contains non-printable characters (control characters of the Baudot code) which might not be recorded properly, and also because the historical tone frequencies were going to be used. In addition, any loss of a character in the received ciphertext means that the following text is no longer in sync with the steps of the SZ42 cipher algorithm. Recording a wrong character from time to time does not prevent the code breaking, but losing characters does. I figured that I would need special reception software to analyse the audio data, with a robust method of clock recovery. To perform the subsequent cryptanalysis, I decided to also write my own software. The information on the site that commemorates the work of the great Alan Turing at www.alanturing.net provided excellent material on the SZ42 and its cryptographic weaknesses. After years of C programming and trying other languages (like C++, Java, Haskell, Python), I had started to learn Ada early in 2007. This powerful and beautiful language has become my favourite, and I decided to do the code development as a programming exercise in Ada.
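
The difference between a wrong character and a lost character can be illustrated with a minimal Python sketch. It is not part of the toolchain and does not implement the SZ42; it assumes a generic XOR stream cipher with a random key stream as a stand-in, and all names in it are made up for the illustration. One wrongly received symbol damages one plaintext symbol, whereas one lost symbol shifts the ciphertext against the key stream and garbles almost everything that follows.

```python
import random

def xor_stream(symbols, key):
    """XOR each 5-bit symbol with the key symbol at the same position."""
    return [s ^ k for s, k in zip(symbols, key)]

rng = random.Random(1)
n = 200
plain = [rng.randrange(32) for _ in range(n)]  # stand-in plaintext (5-bit symbols)
key = [rng.randrange(32) for _ in range(n)]    # stand-in key stream, NOT the SZ42
cipher = xor_stream(plain, key)

# Case 1: the symbol at position 50 is received with one bit flipped.
garbled = list(cipher)
garbled[50] ^= 0b00100
dec_garbled = xor_stream(garbled, key)
errors_garbled = sum(a != b for a, b in zip(dec_garbled, plain))

# Case 2: the symbol at position 50 is lost, so everything after it
# is decrypted with the wrong part of the key stream.
dropped = cipher[:50] + cipher[51:]
dec_dropped = xor_stream(dropped, key)
expected = plain[:50] + plain[51:]
errors_dropped = sum(a != b for a, b in zip(dec_dropped, expected))

print("errors after one wrong symbol:", errors_garbled)
print("errors after one lost symbol: ", errors_dropped)
```

In this toy model the text before the lost symbol still decrypts correctly, which is why reliable clock recovery matters more than an occasional bit error.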

On November 15, I was able to receive some of the Cipher Challenge messages and decrypt them after breaking the key (wheel settings). All source code used is provided below. The Ada sources were compiled with GCC/GNAT. The PC used was a laptop with 1.4 GHz CPU, using NetBSD as the operating system. I used the antenna system and the radio transceiver of the club station DL0OV in Bonn to receive the transmissions.

Putting Colossus in a competition with modern computers may be a bit unfair. Colossus was an ingenious construction and a landmark in the history of computing. But technology has evolved very much since then: when fed with a usable ciphertext, the quick_setting program provided below found the settings of all 12 wheels within 46 seconds. During the Cipher Event, I actually spent most of the time on the signal processing work (converting the noisy audio recording into the ciphertext stream of Baudot symbols) rather than on the crypto tasks.

The software was written for my own use and is therefore not well documented, and the user interface is tailored to a machinist looking under the hood; the programs basically form a chain of command line tools. But others might find the algorithms interesting too.

Just grab all the Ada source files from the table at the bottom of this page and compile the programs with a command like

   for i in *.adb; do gnatmake -O3 $i; done

and make sure the resulting executables are in your search path. A source tarball is provided at the bottom of this page. (Originally, I used additional compiler flags for restrictive handling of errors and warnings, like -gnatwa -gnatwe -gnatv -Wuninitialized -Werror. However, since this treats warnings like errors and since GNAT evolves and learns to warn about more and more things, newer versions of GNAT will not compile the software with some of these additional options.)

Then, for each of the three transmissions provided below, put the three input files (A.mp3, process.sh, and key0.txt) in one directory and run the process.sh shell script. It runs the complete toolchain until the plaintext is revealed. For downloading, you can use the tarballs provided at the bottom of this page.

To save space and network bandwidth, the audio data which were originally recorded as 16 bit signed linear with 8000 Hz sample rate are provided here as MP3 files. The scripts assume that you have sox installed to convert the data back to the original format.

Note that execution times may vary slightly because of the use of random numbers to modify the trial settings. Occasionally you may not get the correct result on the first run.

The plaintext contained two upshift characters before and two downshift characters after each space (represented as <<_>> in the plain.out files). This means that the same character was sent twice in succession far more often than in a normal plaintext. This gives rise to pronounced statistical properties in the difference sequences that are considered during Chi setting when Motors and Psi are unknown. The plaintext was therefore rather codebreaker-friendly. The up and down shifts around the spaces are not even necessary, as a space is a space in letter shift and in figure shift.
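
The effect of the doubled shift characters on the difference sequence can be made visible with a small Python sketch. It is an illustration only: the sample sentence is made up, and the function name is hypothetical. The sketch works on symbols rather than on 5-bit codes, which is sufficient because, for any one-to-one encoding, the difference (XOR) of two neighbouring symbols is the all-zero pattern exactly when the two symbols are equal.

```python
def zero_delta_fraction(symbols):
    """Fraction of positions where a symbol equals its successor.

    These are exactly the positions where the difference sequence
    (XOR of neighbouring 5-bit codes) is zero on all five channels.
    """
    pairs = list(zip(symbols, symbols[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

# Made-up sample text without any doubled letters.
sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"

# Spaces written as a single symbol ...
simple_text = sample.replace(" ", "_")
# ... and written with two shift symbols on either side, as in plain.out.
padded_text = sample.replace(" ", "<<_>>")

frac_simple = zero_delta_fraction(simple_text)
frac_padded = zero_delta_fraction(padded_text)
print(f"zero differences without shifts: {frac_simple:.3f}")
print(f"zero differences with <<_>>:     {frac_padded:.3f}")
```

Every zero in the plaintext difference sequence is also a zero in any combination of channels, such as the sum of channels 1 and 2, so padding of this kind raises the counts that a Chi setting attack relies on.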

Transmission of Nov. 15, 2007, 12:00 UTC
This is the first transmission received strong enough in Bonn for decoding.
I mailed my original result to Tony Sale at 13:14 UTC, and to HNF at 13:42 UTC on Nov. 15, 2007.
Input files:
1200UTC/A.mp3 MP3 of 12:00 UTC recorded audio. Ciphertext is transmitted using historic 6-tone signal. The ciphertext transmission starts at 03:08 min after the beginning of the recording.
1200UTC/process.sh Shell script tailored to the 12:00 UTC transmission.
1200UTC/key0.txt Used to supply wheel patterns to quick_setting
Output files:
1200UTC/Ac.txt Ciphertext file with some diagnostic data (first two lines omitted on input to quick_setting)
1200UTC/setting.out Output of quick_setting, the actual counterpart to the work of Colossus
1200UTC/key.txt New key file with settings found by quick_setting
1200UTC/plain.out The plaintext output
1200UTC/Ac.trace A trace file generated with cryptrace

Transmission of Nov. 15, 2007, 16:00 UTC
This transmission was received much stronger, and as far as I understand it was also the first that was received in usable quality by the Colossus team in Great Britain.
I mailed my original result to Tony Sale and HNF at 16:40 UTC on Nov. 15, 2007.
Input files:
1600UTC/A.mp3 MP3 of 16:00 UTC recorded audio. Ciphertext is transmitted with "modern" two-tone signal. Shift was reversed relative to plaintext header. The ciphertext transmission starts at 03:24 min after the beginning of the recording.
1600UTC/process.sh Shell script tailored to the 16:00 UTC transmission.
1600UTC/key0.txt Used to supply wheel patterns to quick_setting
Output files:
1600UTC/Ac.txt Ciphertext file with some diagnostic data (first line omitted on input to quick_setting)
1600UTC/setting.out Output of quick_setting, the actual counterpart to the work of Colossus
1600UTC/key.txt New key file with settings found by quick_setting
1600UTC/plain.out The plaintext output
1600UTC/Ac.trace A trace file generated with cryptrace

Transmission of Nov. 15, 2007, 17:00 UTC
Again a strong signal. There was apparently some problem at DL0HNF and the ciphertext started with some delay. According to the transmission schedule, this was the first Challenge 3 (no wheel settings given) that I could receive. However, I always let the computer search for all 12 wheel settings, so this made no difference to me.
I mailed my original result to Tony Sale and HNF at 17:51 UTC on Nov. 15, 2007.
Input files:
1700UTC/A.mp3 MP3 of 17:00 UTC recorded audio. Ciphertext is transmitted with "modern" two-tone signal. Shift was reversed relative to plaintext header. The ciphertext transmission starts at 12:56 min after the beginning of the recording.
1700UTC/process.sh Shell script tailored to the 17:00 UTC transmission.
1700UTC/key0.txt Used to supply wheel patterns to quick_setting
Output files:
1700UTC/Ac.txt Ciphertext file with some diagnostic data (first line omitted on input to quick_setting)
1700UTC/setting.out Output of quick_setting, the actual counterpart to the work of Colossus
1700UTC/key.txt New key file with settings found by quick_setting
1700UTC/plain.out The plaintext output
1700UTC/Ac.trace A trace file generated with cryptrace

Source code:

Sound recording from PC soundcard
sndrec.c Record raw audio from soundcard. Works under NetBSD.
Demodulating and decoding (not decrypting) of audio data into Baudot symbol stream
baudot.ads Package specification related to Baudot code.
baudot.adb Package body related to Baudot code.
demod.adb Demodulation of raw audio: Arguments are sample rate, a window width, and a list of tone frequencies. Output is a list of amplitude values for each tone.
discrim.adb Combines tone amplitudes by performing a weighted sum
decode.adb Decodes demodulated signal according to Baudot code. The patterns of the Baudot symbols are convolved with the signal. For each time step, the best matching symbol and its amplitude are listed.
extract_pll.adb Extract the symbols and construct the stream of Baudot symbols by evaluating the peaks in the amplitudes listed by the decoder. For solid clock recovery of noisy signals, a phase locked loop (PLL) is used.
extract_direct.adb Same as above, but without the PLL. Can be used for analysis if PLL locks on wrong symbol rate or phase.
fcut.R
ffid.R
top23.R
Some R scripts that help in the analysis of the digitised audio signals. In particular, the audio tones can be found through Fourier analysis. A utility function for cutting a section out of a large audio file is also provided.
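
The idea behind the first two stages (tone amplitudes per window, then a weighted sum) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a translation of demod.adb or discrim.adb: the function names, the non-overlapping windows, the tone frequencies of 1000 Hz and 1200 Hz, and the window of 160 samples are all made up for the example. Each amplitude is obtained by correlating the windowed samples with a complex exponential at the tone frequency.

```python
import cmath
import math

def tone_amplitudes(samples, sample_rate, window, tones):
    """Return one row per window with the amplitude of each tone.

    Scaled so that a pure sine of amplitude 1.0 at a tone frequency
    yields about 1.0 for that tone.
    """
    rows = []
    for start in range(0, len(samples) - window + 1, window):
        row = []
        for freq in tones:
            step = -2j * math.pi * freq / sample_rate
            acc = sum(samples[start + n] * cmath.exp(step * n)
                      for n in range(window))
            row.append(2.0 * abs(acc) / window)
        rows.append(row)
    return rows

def discriminate(row, weights):
    """Combine the tone amplitudes of one window by a weighted sum."""
    return sum(w * a for w, a in zip(weights, row))

# Hypothetical test signal: one window of the first tone, then one
# window of the second tone (values chosen only for this example).
RATE, WINDOW = 8000, 160
TONES = [1000.0, 1200.0]
signal = [math.sin(2 * math.pi * TONES[0] * t / RATE) for t in range(WINDOW)]
signal += [math.sin(2 * math.pi * TONES[1] * t / RATE)
           for t in range(WINDOW, 2 * WINDOW)]

amps = tone_amplitudes(signal, RATE, WINDOW, TONES)
levels = [discriminate(row, [+1.0, -1.0]) for row in amps]
print(amps)
print(levels)
```

With weights of +1 and -1 the discriminator output is positive while the first tone is present and negative while the second is present. Which tone stands for mark and which for space is left open here; a reversed shift simply swaps the sign.
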
SZ42 crypto and code breaking routines
sz42.ads Package specification for SZ42 crypto routines
sz42.adb Package body, implements the algorithms of the SZ42 machine
sz42-text_io.ads Package specification for textual I/O of SZ42 data
sz42-text_io.adb Package body for textual I/O of SZ42 data
score_charts.ads Package specification for handling of score charts (like top ten listings)
score_charts.adb Package body for the above
break_lib.ads Package specification for various cryptographic attacks on the SZ42
break_lib.adb Package body for the above. Core of the crypto attacks.
quick_setting.adb Program that uses the crypto libraries for wheel setting (all wheels). Breaks Chi setting by successive optimisation of random trial settings, then does a brute force search for the best Motor settings, and then breaks Psi settings with random trials again. It does this by calling the procedures provided by package Break_Lib above.
cryptor.adb Program that decrypts (or encrypts) text like the SZ42
cryptrace.adb Program that lists the inner state of the SZ42 for all steps of a decryption
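
To give an idea of the statistics that Chi setting exploits, here is a heavily simplified Python sketch. It is not the algorithm of break_lib.adb or quick_setting: instead of optimising random trial settings, it tries all 41 x 31 start positions of the first two Chi wheels and counts how often the sum of the differences of channels 1 and 2 is zero, in the spirit of the historic "1+2" test. Everything else is an assumption made for the illustration: only two channels are modelled, the wheel patterns are random, a random bit stands in for the motor wheels, and every plaintext character is sent twice to exaggerate the effect of repeated characters.

```python
import random

CHI1_LEN, CHI2_LEN = 41, 31   # lengths of Chi wheels 1 and 2
PSI1_LEN, PSI2_LEN = 43, 47   # lengths of Psi wheels 1 and 2
N = 4000                      # characters in the toy message

def random_pattern(rng, length):
    return [rng.randrange(2) for _ in range(length)]

def delta_pattern(pattern):
    """Cyclic difference of a wheel pattern: each bit XOR its successor."""
    n = len(pattern)
    return [pattern[j] ^ pattern[(j + 1) % n] for j in range(n)]

def delta(bits):
    """Difference sequence of a bit stream."""
    return [a ^ b for a, b in zip(bits, bits[1:])]

def encipher(rng, plain, chi1, chi2, psi1, psi2, s1, s2):
    """Toy two-channel cipher: Chi steps always, Psi only sometimes."""
    z1, z2, psi_pos = [], [], 0
    for i, (p1, p2) in enumerate(plain):
        z1.append(p1 ^ chi1[(s1 + i) % CHI1_LEN] ^ psi1[psi_pos % PSI1_LEN])
        z2.append(p2 ^ chi2[(s2 + i) % CHI2_LEN] ^ psi2[psi_pos % PSI2_LEN])
        psi_pos += rng.randrange(2)   # random stand-in for the motor wheels
    return z1, z2

def count_zeros(dz12, dchi1, dchi2, s1, s2):
    """Count positions where dZ1 + dZ2 + dChi1 + dChi2 is zero."""
    return sum(1 for i, z in enumerate(dz12)
               if (z ^ dchi1[(s1 + i) % CHI1_LEN]
                     ^ dchi2[(s2 + i) % CHI2_LEN]) == 0)

rng = random.Random(2007)
chi1, chi2 = random_pattern(rng, CHI1_LEN), random_pattern(rng, CHI2_LEN)
psi1, psi2 = random_pattern(rng, PSI1_LEN), random_pattern(rng, PSI2_LEN)
true_s1, true_s2 = 17, 5      # settings the search should recover

# Toy plaintext (bits of channels 1 and 2), every character sent twice.
plain = []
while len(plain) < N:
    c = (rng.randrange(2), rng.randrange(2))
    plain += [c, c]
plain = plain[:N]

z1, z2 = encipher(rng, plain, chi1, chi2, psi1, psi2, true_s1, true_s2)
dz12 = [a ^ b for a, b in zip(delta(z1), delta(z2))]
dchi1, dchi2 = delta_pattern(chi1), delta_pattern(chi2)

# Exhaustive search over both start positions; the best count wins.
score, found_s1, found_s2 = max(
    (count_zeros(dz12, dchi1, dchi2, s1, s2), s1, s2)
    for s1 in range(CHI1_LEN) for s2 in range(CHI2_LEN))
print("found:", found_s1, found_s2, "score:", score, "of", len(dz12))
```

At a wrong setting the count hovers around half the message length, while at the correct setting the repeated plaintext characters and the standing Psi wheels push it clearly above that level. The real task is harder because the remaining wheels and the true motor behaviour have to be handled as well.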

Tarballs for download:

Ada sources for complete toolchain:
SZ42_src.tgz

Input files for the three example runs shown above (A.mp3, process.sh, and key0.txt for each):
1200UTC.tar
1600UTC.tar
1700UTC.tar

So what?

You may ask what the significance of this is. From a scientific point of view, nothing much. I merely implemented the solution to a problem that was already solved more than half a century ago. That a modern computer could beat Colossus did not come as a surprise to many.
I view the whole thing like winning a sports competition. The challenge was to intercept radio signals and break the WWII encryption used to encipher them. As in a biathlon, it was not sufficient to be good at just one discipline: the signal processing was as important as the crypto work. The choice of tools was free. My choice was a receiving station at a quiet location, a laptop with NetBSD, and Ada.

Nevertheless, I am pleased by the congratulations from Bletchley Park and the positive press response.

In Memory of Tony Sale

The outstanding person behind the Colossus rebuild project at Bletchley Park was of course Tony Sale. It was sad news to hear that he had passed away in August 2011. I met Tony at Bletchley Park in January 2008 to receive a prize for my contribution to the Cipher Event, and I still fondly remember the friendly reception he gave me. I arrived at BP in the morning when apparently nobody was around except Tony and the guard at the park gate. When I finally found the correct entrance of the hut to the room with the rebuilt Colossus, I greeted the man who showed up from behind the large Colossus rack with the words "You must be Tony Sale!", and he replied "And you must be Joachim". Shortly after, he was explaining all the details of Colossus to me: the mechanism that spins the punched paper tape around, the photo cells that optically scan the data of the tape, the data buses that Colossus has, the logic circuits built with electron tubes, the various counters that can be configured, the unit that detects when a probable solution has been found, and the typewriter that prints out the results. I don't know if Tony would have liked to be called a computer geek, but in a sense he certainly was one - he knew every bit about the internals of Colossus and the methods to break the SZ42 cipher, and also asked how my program solved the task. We were two computer enthusiasts from different generations exchanging ideas. This was a very remarkable meeting to me - and if you think about the history behind Colossus and why the original Colossus had been built in the first place, it is even more so. At that time, certainly no one would have thought a meeting like this would ever be possible.

Joachim Schüth (or Schueth if you have no umlauts)