Extended notes from week 34

Some of the graphics used are on a separate page, for somewhat silly technical reasons (not compatible with Internet Explorer).

Here is the Lisp session from the lecture, somewhat cleaned up.

Breaking the Caesar cipher

Brute force

Our first task was to break the ciphertext "KYZJTZGYVIZJIVRCCPKIZMZRCKFSIVRB", assuming a Caesar cipher was used. The cipertext has been stored in the variable *caesar-sample-ciphertext*. The cipher is broken by brute force, simply trying all 26 possible keys (labeled “offset” below):

shift> *caesar-sample-ciphertext*
"KYZJTZGYVIZJIVRCCPKIZMZRCKFSIVRB"
shift> (break-caesar *caesar-sample-ciphertext*)
offset  0: "kyzjtzgyvizjivrccpkizmzrckfsivrb"
offset  1: "jxyisyfxuhyihuqbbojhylyqbjerhuqa"
offset  2: "iwxhrxewtgxhgtpaanigxkxpaidqgtpz"
offset  3: "hvwgqwdvsfwgfsozzmhfwjwozhcpfsoy"
offset  4: "guvfpvcurevfernyylgevivnygboernx"
offset  5: "ftueoubtqduedqmxxkfduhumxfandqmw"
offset  6: "estdntaspctdcplwwjectgtlwezmcplv"
offset  7: "drscmszrobscbokvvidbsfskvdylboku"
offset  8: "cqrblryqnarbanjuuhcarerjucxkanjt"
offset  9: "bpqakqxpmzqazmittgbzqdqitbwjzmis"
offset 10: "aopzjpwolypzylhssfaypcphsaviylhr"
offset 11: "znoyiovnkxoyxkgrrezxobogrzuhxkgq"
offset 12: "ymnxhnumjwnxwjfqqdywnanfqytgwjfp"
offset 13: "xlmwgmtlivmwvieppcxvmzmepxsfvieo"
offset 14: "wklvflskhulvuhdoobwulyldowreuhdn"
offset 15: "vjkuekrjgtkutgcnnavtkxkcnvqdtgcm"
offset 16: "uijtdjqifsjtsfbmmzusjwjbmupcsfbl"
offset 17: "thiscipherisreallytrivialtobreak"
offset 18: "sghrbhogdqhrqdzkkxsqhuhzksnaqdzj"
offset 19: "rfgqagnfcpgqpcyjjwrpgtgyjrmzpcyi"
offset 20: "qefpzfmebofpobxiivqofsfxiqlyobxh"
offset 21: "pdeoyeldaneonawhhupnerewhpkxnawg"
offset 22: "ocdnxdkczmdnmzvggtomdqdvgojwmzvf"
offset 23: "nbcmwcjbylcmlyuffsnlcpcufnivlyue"
offset 24: "mablvbiaxkblkxteermkbobtemhukxtd"
offset 25: "lzakuahzwjakjwsddqljanasdlgtjwsc"
nil

Clearly, the correct key is 17: “this cipher is really trivial to break” with spaces inserted.

Using statistics

Our next task is to break it again, this time pretending that we know no English. We start by taking a letter count.

shift> (count-letters *caesar-sample-ciphertext*)
#(0 1 3 0 0 1 1 0 4 2 3 0 1 0 0 1 0 3 1 1 0 3 0 0 2 5)

There were 0 As, 1 B, 3 Cs, and so forth. We compare this with the standard letter frequency vector.

shift> (break-caesar-by-statistics *caesar-sample-ciphertext*)
#(0 1 3 0 0 1 1 0 4 2 3 0 1 0 0 1 0 3 1 1 0 3 0 0 2 5)
(901 1093 1346 1000 1407 1223 1612 1164 1383 971 1280 1132 956 1237 1328
 1177 1006 2033 1084 910 1182 1871 967 1103 1427 1239)
((17 2033) (21 1871) (6 1612) (24 1427) (4 1407) (8 1383) (2 1346)
 (14 1328) (10 1280) (25 1239) (13 1237) (5 1223) (20 1182) (15 1177)
 (7 1164) (11 1132) (23 1103) (1 1093) (18 1084) (16 1006) (3 1000)
 (9 971) (22 967) (12 956) (19 910) (0 901))

The function call above returns three values: First, the letter count vector once more. Second, the result of taking the scalar product of the letter count vector with the standard frequency vector (times 1000), rotated 0, 1, 2, …, 25 places. Third is a list of pairs (o p) where p is the scalar product from position o in the previous list. Clearly, offset o=17 seems like the best candidate, and here is what we get when we decrypt with key 17:

shift> (caesar *caesar-sample-ciphertext* 17 t)
"thiscipherisreallytrivialtobreak"

Breaking the Vigenère

The ciphertext to be broken is “ELWLQ UHOMW JPGUP GLGNK GJRFH HEWND UXOFO VCCJL ELZJR TECHW JPGBR TPGII VSSWR TLZMH CLHNK KDSUU NJVIX TEVYW QHBQD UDHCO NLGFH GAPOW VSSLR ALZUL TDHUW KZBNR VSSHR TEVQD UMIMW NTBAZ KEVUF VTJCW AZINR PEVYI KPZXJ TZIHG ECSQV YPFYW CVWHJ VSSCU UEONL QYGUV JLBXO KYUYT WTDGH PEFIO NPRCQ VZDIV KEWIQ YTBWK OZHIU UNZUW VPFYG VZZCI GQFIP VSSNR RZTNK GXCIU KYUGD UEOFL ISHVO KYYYG VZGCJ PLZQL POGNU GYUNK CYRXL TPQNL QYSPH TPHNV VFRCH FTHNK QFUBW HFZFB VSSWR OXOHG GCCZW JPGND VTCHF CAHUL PWOQU GYQYE CESMV JPZVB OTQBD GWGIQ YLGNK GTFUY QHSXH PPASK GOGOU GWMVH YLHWK KYUNK GTFUS RCCUF JHOCW KYUNR TPDLL OLBXW JPAZR TLBSI NLKML PEVYH XZZOW KZBNK KDKUV CYOLJ WXSHW HZFVH KYUWD WEWIX ULBXX UTBAK KDAIV VPLJH TTSHF GOQLH YMINR PNSSR WDHUU VCIHQ KYUZU QXRUQ IPFBH VZZXK KXGYO HJCOQ GGSLV VZDUQ FQCLW WYSZD XZFMW JPPIO F” (grouping ciphertext as shown, with five letters per group, is a common practice). We have this stored in a variable named *vigenere-sample-ciphertext*.

shift> *vigenere-sample-ciphertext*
"ELWLQUHOMWJPGUPGLGNKGJRFHHEWNDUXOFOVCCJLELZJRTECHWJPGBRTPGIIVSSWRTLZMHCLHNKKDSUUNJVIXTEVYWQHBQDUDHCONLGFHGAPOWVSSLRALZULTDHUWKZBNRVSSHRTEVQDUMIMWNTBAZKEVUFVTJCWAZINRPEVYIKPZXJTZIHGECSQVYPFYWCVWHJVSSCUUEONLQYGUVJLBXOKYUYTWTDGHPEFIONPRCQVZDIVKEWIQYTBWKOZHIUUNZUWVPFYGVZZCIGQFIPVSSNRRZTNKGXCIUKYUGDUEOFLISHVOKYYYGVZGCJPLZQLPOGNUGYUNKCYRXLTPQNLQYSPHTPHNVVFRCHFTHNKQFUBWHFZFBVSSWROXOHGGCCZWJPGNDVTCHFCAHULPWOQUGYQYECESMVJPZVBOTQBDGWGIQYLGNKGTFUYQHSXHPPASKGOGOUGWMVHYLHWKKYUNKGTFUSRCCUFJHOCWKYUNRTPDLLOLBXWJPAZRTLBSINLKMLPEVYHXZZOWKZBNKKDKUVCYOLJWXSHWHZFVHKYUWDWEWIXULBXXUTBAKKDAIVVPLJHTTSHFGOQLHYMINRPNSSRWDHUUVCIHQKYUZUQXRUQIPFBHVZZXKKXGYOHJCOQGGSLVVZDUQFQCLWWYSZDXZFMWJPPIOF"

Finding the key length

shift> (break-vigenere *vigenere-sample-ciphertext*)
((15 57) (1 38) (10 36) (3 33) (5 28) (2 23) (6 23) (4 22) (7 20) (13 20)
 (16 19) (11 18) (14 16) (9 15) (8 14) (12 14))

The output is a sorted list of pairs (d c) where c is the number of coinciding letter pairs d places apart in the cipertext. We conclude that the key is likely five letters long, since d=5, 10 and 15 all yield high coincidence counts. On might think at first that d=3 is a good candidate, but the low coincidence counts for d=9 and 12 make that less likely.

Finding the key

shift> (break-vigenere *vigenere-sample-ciphertext* 5)
(((2 8906) (6 6969) (17 6952) (13 5795) (15 5703) (3 5661) (18 5409)
  (21 5254) (19 5249) (16 5241) (7 5160) (14 5137) (1 5010) (22 4912)
  (24 4835) (8 4824) (10 4789) (5 4746) (9 4615) (12 4427) (4 4393)
  (0 4261) (11 4260) (23 4257) (20 4218) (25 4152))
 ((11 9250) (7 6626) (15 5786) (22 5707) (0 5702) (24 5537) (23 5463)
  (4 5401) (12 5307) (18 5260) (5 5215) (19 5164) (21 5112) (20 5077)
  (17 5047) (6 4906) (1 4862) (25 4809) (10 4791) (16 4648) (8 4633)
  (3 4231) (2 4156) (13 3943) (14 3887) (9 3614))
 ((14 8295) (25 6358) (1 6068) (18 5938) (3 5836) (13 5810) (10 5453)
  (2 5376) (21 5365) (7 5250) (24 5197) (6 5106) (15 5020) (20 5016)
  (12 4867) (17 4794) (0 4763) (8 4663) (4 4615) (23 4592) (16 4559)
  (11 4523) (5 4344) (19 4211) (22 4178) (9 3937))
 ((20 8438) (9 6672) (16 5994) (5 5923) (7 5522) (21 5495) (6 5433)
  (1 5398) (13 5314) (8 5178) (4 5119) (3 4998) (0 4969) (24 4920)
  (19 4887) (2 4883) (25 4755) (17 4686) (14 4677) (15 4672) (10 4644)
  (12 4610) (22 4451) (18 4299) (23 4115) (11 4082))
 ((3 8874) (18 6471) (14 5871) (7 5795) (17 5751) (16 5609) (4 5453)
  (2 5405) (22 5390) (10 5374) (25 5115) (6 5077) (9 5061) (19 5053)
  (15 5006) (13 4972) (23 4890) (8 4605) (24 4513) (0 4506) (5 4436)
  (20 4308) (12 4211) (21 4198) (1 4127) (11 4063)))
(2 11 14 20 3)
"CLOUD"

Again, three items are output. The first is a list of lists of pairs. Each list of pairs is just like the list of pairs we get when breaking the caesar cipher above using scalar products. The first list is related to positions 0,5,10,… of the ciphertext, the next, positions 1,6,11,…, then 2,7,12,…, and 3,8,13,…, and finally 4,9,14,…. In each case the first element of the list is the most likely offset. These are collected in the list (2 11 14 20 3). Finally, this list is converted to the text “CLOUD”. This turns out to be the key. Here is the decryption:

shift> (vigenere *vigenere-sample-ciphertext* "cloud" t)
"cairnswasthesameastheydleftitasmalltropicalportontheshoresofthecoralseaatthisearlyhourthetownwasstillasleepbuttheroyalairstationtothenorthwasbustlingwithactivityoutonthefieldgroundcrewsweretakingtheirstationsashandlingequipmentrolledintopositionwinchmotorsclatteredtolifefromthetopofthemooringmastalightblinkedtosignalwindstrengthanddirectioneverettstudieditthoughtfullythecommanderofthestationcaptainlawrencebatesshelbymichaelsonwastheiravowedenemyhedsurelybewatchingtheirapproachwaitingtoreprimandthemforanyflawsintheevolutionthiswasanargumentforbeingcautiousandusinghismostexperiencedcrewbutonceyoustartrunningfromdangerhetoldhimselfyouneverstopandfortunefavorsthebold"
shift>

The plaintext was copied from episode 72 of this series.

Lisp code

I don't expect you to learn Lisp, or to be able to read and understand the following code. But here it is, anyway, in case you're curious: The lisp code I used for this week's lectures.

(defpackage :shift (:use :cl))
(in-package :shift)
 
(defun char-number (char)
  (let ((number (- (char-code (char-downcase char)) (char-code #\a))))
    (if (<= 0 number 25)
	number
	nil)))
 
(defun number-char (number &optional lower)
  (code-char (+ (char-code (if lower #\a #\A)) (mod number 26))))
 
(defvar *letter-frequencies*
  #(82 15 28 43 127 22 20 61 70 2 8 40 24 67 75 19 1 60 63 91 28 10 23 1 20 1))
 
(defun numbered-list (list &optional with-chars lower)
  (loop for i upfrom 0 for item in list collect
       (list (if with-chars (number-char i lower) i) item)))
 
(defun sort-descending (list &optional key)
  (if key (sort list #'> :key key) (sort list #'>)))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Caesar
(defparameter *caesar-sample-ciphertext* "KYZJTZGYVIZJIVRCCPKIZMZRCKFSIVRB")
 
(defun caesar (string &optional (offset 3) decrypt)
  (with-output-to-string (out)
    (loop for char across string
	 for number = (char-number char)
       when number do
	 (write-char (if decrypt
			 (number-char (- number offset) decrypt)
			 (number-char (+ number offset) decrypt))
		     out))))
 
(defun break-caesar (ciphertext)
  (loop for offset below 26 do
       (format t "~&offset ~2d: ~s" offset
	       (caesar ciphertext offset t))))
 
(defun break-caesar-by-statistics (ciphertext)
  (let*((letter-count (count-letters ciphertext))
	(rotated-scalar-products (rotated-scalar-product
				  letter-count *letter-frequencies*)))
    (values letter-count rotated-scalar-products
	    (sort-descending (numbered-list rotated-scalar-products)
			     #'second))))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Vigenere
(defparameter *vigenere-sample-ciphertext* "ELWLQUHOMWJPGUPGLGNKGJRFHHEWNDUXOFOVCCJLELZJRTECHWJPGBRTPGIIVSSWRTLZMHCLHNKKDSUUNJVIXTEVYWQHBQDUDHCONLGFHGAPOWVSSLRALZULTDHUWKZBNRVSSHRTEVQDUMIMWNTBAZKEVUFVTJCWAZINRPEVYIKPZXJTZIHGECSQVYPFYWCVWHJVSSCUUEONLQYGUVJLBXOKYUYTWTDGHPEFIONPRCQVZDIVKEWIQYTBWKOZHIUUNZUWVPFYGVZZCIGQFIPVSSNRRZTNKGXCIUKYUGDUEOFLISHVOKYYYGVZGCJPLZQLPOGNUGYUNKCYRXLTPQNLQYSPHTPHNVVFRCHFTHNKQFUBWHFZFBVSSWROXOHGGCCZWJPGNDVTCHFCAHULPWOQUGYQYECESMVJPZVBOTQBDGWGIQYLGNKGTFUYQHSXHPPASKGOGOUGWMVHYLHWKKYUNKGTFUSRCCUFJHOCWKYUNRTPDLLOLBXWJPAZRTLBSINLKMLPEVYHXZZOWKZBNKKDKUVCYOLJWXSHWHZFVHKYUWDWEWIXULBXXUTBAKKDAIVVPLJHTTSHFGOQLHYMINRPNSSRWDHUUVCIHQKYUZUQXRUQIPFBHVZZXKKXGYOHJCOQGGSLVVZDUQFQCLWWYSZDXZFMWJPPIOF")
 
(defun vigenere (string key &optional decrypt)
  (with-output-to-string (out)
    (loop with offsets = (map 'vector #'char-number key)
	 with pos = 0
	 for char across string
	 for number = (char-number char)
	 when number do
	 (write-char (if decrypt
			 (number-char (- number (aref offsets pos))
				       decrypt)
			 (number-char (+ number (aref offsets pos))
				       decrypt))
		     out)
	 (incf pos)
	 (when (>= pos (length key)) (setf pos 0)))))
 
;;; 1. Finding the keylength
(defun count-coincidences (string offset)
  (loop for i upfrom 0 for j upfrom offset
     while (< j (length string))
     count (eql (aref string i) (aref string j))))
 
(defun coincidence-count-list (string &optional (upto 16))
  (loop for offset from 1 to upto
       collect (count-coincidences string offset)))
 
;;; 2. Finding the key
 
(defun letter-frequencies-sorted (&optional (fvector *letter-frequencies*))
  (sort (loop for i below 26
	   collect (list (number-char i t) (aref fvector i)))
	#'>
	:key #'second))
 
(defun count-letters (string &key (start 0) (step 1))
  (loop with a = (make-array '(26) :element-type 'integer :initial-element 0)
     for i upfrom start by step while (< i (length string))
     for n = (char-number (char string i))
     when n do (incf (aref a n))
     finally (return a)))
 
(defun sample-string (string step &optional (start 0) stop)
  (when (or (not stop) (> stop (length string)))
    (setf stop (length string)))
  (map 'string #'identity
       (loop for i upfrom start by step while (< i stop)
	    collect (char string i))))
 
(defun rotated-scalar-product (a b &optional offset)
  (let ((len (length a)))
    (unless (= len (length b))
      (error "Vector lengths ~d and ~d are different" len (length b)))
    (flet ((do-offset (o) (loop for i below len sum
			       (* (aref a (mod (+ i o) len))
				  (aref b i)))))
      (if offset (do-offset offset)
	  (loop for o below len collect (do-offset o))))))
 
(defun break-vigenere (ciphertext &optional keylength)
  (if keylength
      (loop for start below keylength collect
	   (sort-descending
	    (numbered-list
	     (rotated-scalar-product 
	      (count-letters ciphertext :start start :step keylength)
	      *letter-frequencies*))
	    #'second)
	   into table
	   finally
	   (let ((top (mapcar #'caar table)))
	     (return (values table top (map 'string #'number-char top)))))
      (sort-descending
       (numbered-list
	(coincidence-count-list ciphertext)) #'second)))
2010-08-27, Harald Hanche-Olsen