# Log

##### Last lecture

**17-18/11:** We worked our way through a series of election schemes, starting with a very simple example and working towards a more sophisticated example by formulating attacks and devising countermeasures to the attacks. Along the way, we discussed security requirements for elections in some detail.

##### Previous lectures

**10-11/11**: We looked at an index-calculus type attack when the RSA signature is done by signing a hash of the message, \(\sigma=h(m)^d\). The idea is that if \(h\) has short output, \(h(m)\) is more likely to be a product of small primes. We get relations by finding "harmless" messages \(\{m_j\}\) such that \(h(m_j)\) is always a product of small primes, then ask for signatures on the harmless messages. Afterwards, we do linear algebra to find \(p_i^d\) for all the small primes. Finally, we find a "dangerous" message such that \[h(m) = \prod p_i^{r_i} \quad \Rightarrow \quad \sigma = h(m)^d = \prod (p_i^d)^{r_i} .\]

Next, we looked at Schnorr signatures and argued informally for their security. Then we began studying the Schnorr protocol, a protocol for proving that you know the discrete logaritm of something without revealing anything about what you know. We saw that the Schnorr signature scheme is derived from this protocol. Finally, we looked at applications of Schnorr signatures to various cryptographic problems, namely, proving correct decryption, proving knowledge of what was encrypted, and proving that ciphertexts contain one out of a (small) set of allowed messages.

**3-4/11**: We began studying digital signature schemes, looking at some applications (e.g. securing Diffie-Hellman against man-in-the-middle attacks). The primary example is the "textbook RSA" signature scheme, and we looked at some easy attacks on the scheme. Functionally, "textbook RSA" is restricted to very short messages, and this motivates the need for hash functions: sign the hash of the message instead of the message itself.

We then defined hash functions and the defined various properties we would like hash functions to have. For application to digital signatures, it is essential that the hash function "behaves" like a 1-1 function, otherwise signing the hash is insecure. A *collision* is a pair (*x*,*x*') that hash to the same value. Finding a *second preimage* is to find an *x*' that hashes to the same value as a given, randomly chosen, *x*. Finding second preimages is not easier than finding collisions. If finding collisions is hard, we say that a hash function is *collision resistant* and the hash function more or less behaves like an injective function.

Finding a *preimage* is to find some *x* that hashes to a given, randomly chosen, hash value. Preimages could be easier to find than second preimages. If it is hard to find both preimages and second preimages, we say that the hash function is *one way*. This hash function property can be used to strengthen signature schemes. For instance, for "textbook RSA", it is easy to find valid message-signature pairs without the signing key. But if we hash the message first, this becomes difficult.

We also studied one approach to constructing hash functions, how to construct a hash function on a large set from a hash function (often called a *compression function*) on a not so large set. We proved that if the compression function is one way and collision resistant, then so is the hash function on the large set.

We also looked at one example compression function that was secure if computing discrete logarithms was hard.

**28-28/10:** We studied factoring methods. For RSA to be secure, we need to choose primes such that the product modulus *n* is hard to factor.

Fermat factorisation tries to write *n* as a difference of two squares, by trying small integer squares in order. This will only work if the two prime factors are close together. (This might happen if we use sieving to speed up choosing primes, and choose both primes from the same region.) Choosing random, independent large primes will not be close together.

Pollard's ρ method works from the observations that (a) in a random-looking sequence, we expect repetitions modulo the prime factors before we expect collisions modulo *n*, and (b) the recurrence rule *s*(*n*+1) = *s*(*n*)²+1 mod *n* gives random-looking sequences and allows us to use cycle finding techniques. The time requirement is roughly proportional to the square root of the smallest prime factor. We must use sufficiently large primes.

The *p*-1 method chooses a random integer and tries to guess a multiple of its order modulo one of the primes. The only reasonable strategy is to hope that the order will be a product of powers of small primes, in which case a product of large powers of all the small primes will work. We could choose primes such that *p*-1 is divisible by some large prime, but (a) sufficiently large primes will most likely satisfy this, and (b) there are variants of this method that work regardless. The time requirement of these variants depend on the size of the smallest prime factor. We must use sufficiently large primes.

There are index-calculus methods for factoring integers. The idea is to find *s*, *t* such that *t*² = *s*² (mod *n*), but *t* ≠ ±*s* (mod *n*). How to do this is a fairly straight-forward adaption of index calculus ideas, with many improvements, and the time requirement depends on the size of the modulus. We must use sufficiently large moduluses.

Finally, we discussed why straight RSA is insecure, and how one really encrypts using RSA. We looked at two approaches, RSA-KEM and RSA-OAEP. The former encrypts something random using straight RSA, derives a key from the randomness, then uses that key to encrypt the message with a symmetric key cryptosystem. RSA-OAEP is based on encrypting a short message by first applying an unkeyed Feistel construction before applying straight RSA.

**21/10:** We defined a public key cryptosystem based on a group *G* of order *k*, and two numbers *e* and *d* that are inverses modulo *k*. The encryption key is (*G*,*e*) and we encrypt *m*∈*G* by computing *c*=*m*^*e*. The decryption key is *d* and we decrypt *c* by computing *m*=*c*^*d*.

The first observation is that if an attacker knows *k*, he can easily compute the decryption key from the encryption key and the system is broken. This means that the above cryptosystem really needs a family of groups, and the key generation algorithm shall select a group from that family.

The second observation is that for the group **Z***n**, *n* composite, if we know a multiple of the group order, we can find a non-trivial factor of *n*. This means that, if we use **Z***n** as the group in the above public key cryptosystem, computing the secret key is as hard as finding the prime factorization of *n*.

**20/10:** Using Diffie-Hellman to agree on a shared secret key, and then using the secret key to communicate is a practical approach to secure communication in the "telephone model" of communication. In the "e-mail model" of communication, Diffie-Hellman is impractical because it requires both correspondents to be online at the same time. However, one party (say Bob) can execute the first step of the Diffie-Hellman protocol and publish the message to everyone. He should remember the exponent he used. Afterwards, anyone can complete the protocol and use the resulting key to send a message, along with the completing message in the Diffie-Hellman protocol. Every time Bob receives such a message, he completes the Diffie-Hellman protocol by computing the corresponding shared secret and decrypts the message.

We defined precisely what a public key cryptosystem is and used Textbook ElGamal as an example. Then we studied one attack on ElGamal and looked at countermeasures.

**6-14/10:** We studied elliptic curves. An elliptic curve is defined by an equation of the form *Y*²=*X*³+*AX*+*B*, where 4*A*³+27*B*²≠0. If the coefficients *A* and *B* are in some field **F**, we say that the curve is defined over **F**. The points on the curve are the solutions to the equation from some algebraic closure of **F**, plus a special point at infinity. The rational points are the points with coordinates in **F**, plus the special point at infinity.

We saw how projective coordinates explain where this point at infinity comes from and explain the following postulates: (i) any line of the form *X*=*c* intersects the curve in two points (counting multiplicities) in the plane plus the special point at infinity; and (ii) any line of the form *Y*=*aX*+*b* intersects the curve in three points (counting multiplicities).

From these postulates, we can define a commutative group operation on the curve geometrically by saying that the special point at infinity is zero, and any colinear points sum to zero.

The rational points are closed under this group operation, hence induce a group operation and we get a finite group. We looked at the group structure of this group, then we then superficially studied how to compute the number of points on an elliptic curve.

Finally, we considered algorithms for computing discrete logarithms in this group. For very special curves (we mentioned supersingular and anomalous curves), there are fast or even trivial algorithms, but for most curves, generic algorithms are the best possible.

**30/9:** We looked at the simplest possible index calculus algorithm, for a group of order *n* based on yesterday's setup. We have a factor base and a "factoring map".

Stage one is finding a large set of linear equations involving the unknown discrete logarithms of the factor base elements. We do this by trying random exponents and using the "factoring map" to check if *g*^(*r*_*i*) = Π*l*_*j*^(*a*_*ij*). If this is true, we get one linear equation, namely *r*_*i* ≡ ∑ *a*_*ij* log *l*_*j* (mod *n*). Once we have many such equations, we can solve for the unknown discrete logarithms log *l*_*j*.

Stage two is finding a similar relation that involves the element we are interested in, that is, *xg*^(*r*) = Π*l*_*j*^(*c*_*j*). Once such a relation is found, we get the discrete logarithm from the linear equation log *x* = - *r* + ∑ *c*_*j* log *l*_*j* (mod *n*).

The we began to study algebraic plane curves, the zero sets of polynomials in two variables. We discussed smooth curves and tangent lines, then defined what an elliptic curve defined over a field of characteristic ≠ 2,3 is.

**29/9:** Finite prime fields are straight-forward. For extension fields, we compute in the factor ring **F***p*[*x*]/(*f*(*x*)), where *f*(*x*) is an irreducible polynomial. Addition, subtraction and multiplication is as usual, while division is multiplication by inverses and inverses are found using the extended Euclidian algorithm.

We looked at how to decide if a polynomial is irreducible or not. If *h*(*x*) is reducible, then gcd(*x*^(*p*^*i*)-*x*,*h*(*x*)) ≠ 1 for some 1 ≤ i ≤ deg *h*(*x*)/2. If *h*(*x*) is irreducible, then all of these gcd's will be 1, which follows from the fact that *x*^(*p*^*i*)-*x* is the product of all irreducible polynomials of degree dividing *i* (which was the part missing from the lecture). A proof of this fact can be found Chapter 20 in *A Computational Introduction to Number Theory and Algebra* by Victor Shoup.

Finally, we discussed the setup for simple index calculus algorithms. We have a factor base *B* (small primes for prime fields, irreducible polynomials for extension fields of small characteristic) a "factoring map" and a subset *V* of elements that factor. It is possible to adjust the size of the factor base so that the subset *V* is reasonably large while the "factoring map" remains easy to compute.

**23/9:** We studied *generic group algorithms*, algorithms that work for any group. The basic idea is straight-forward: the algorithm represents group elements somehow, but does not care about the representation, except for computing the group operations and comparing for equality. Making this precise is somewhat technical, but then it is possible to prove that such algorithms cannot hope to compute discrete logarithms quickly.

We also started studying finite fields, in preparation for studying index calculus algorithms.

**22/9:** We studied Pollard's ρ method for computing discrete logarithms in a group of prime order *p*. This algorithm employs three ideas: (i) By the birthday paradox, a "random-looking" sequence produced by some iteration rule *s*_{*i*+1} = *f*(*s*_*i*) will likely begin cycling after some small multiple of sqrt(*p*) iterations. (ii) We can detect this cycle using Floyd's cycle-finding algorithm. (iii) Once we have detected repetitions, then most likely we can recover the discrete logarithm.

**16/9:** We first looked at the structure of prime-power order cyclic groups, and we showed that we can recover the *p*-adic digits of the logarithm by computing logarithms in the prime-ordered subgroup. This is part I of the Pohlig-Hellman algorithm. This reduces the cost of discrete logarithms from ∑*p*_*i*^*r*_*i* to ∑*r*_*i* *p*_*i*. Essentially, it means that the cost of doing discrete logarithms in *G* is dominated by the largest prime dividing *G*.

Afterwards, we looked at Shanks' Baby-step Giant-step algorithm. It uses a time-memory trade-off to reduce time complexity to the square root of the group order, but the space complexity increases.

Coupled with Pohlig-Hellman, we see that the cost of computing discrete logarithms is no larger than roughly ∑*r*_*i* sqrt(*p*_*i*).

**15/9:** We studied the structure of cyclic groups, and then used this structure to show that instead of computing discrete logarithms in *G* itself, we can compute several discrete logarithms in prime-power order subgroups of *G*, then glue the results together using the Chinese remainder theorem. Compared to exhaustive search in *G*, this reduces the cost from ∏*p*_*i*^*r*_*i* to ∑*p*_*i*^*r*_*i*. This is part I of the Pohlig-Hellman algorithm.

**9/9:** We studied the properties of the Jacobi symbol and showed how those properties allow us to compute the Jacobi symbol quickly. Then we showed that then number of elements for which the Jacobi symbol of *a* is congruent to *a*^( (*n*-1)/2 ) modulo *n* is at most half of all elements when *n* is composite. This gives us a reliable test to distinguish primes from composites.

**8/9:** To use Diffie-Hellman, we need groups where it is hard to compute discrete logarithms. **F***q** where *q* is some prime power is a possible candidate. If *q* is prime, we need it to be a large prime, so we need to find large primes.

The prime number theorem says that there are a lot of primes, so if we can recognize primes, we can find primes simply by selecting candidates at random until we find a prime.

On possible idea on how to recognize primes is by using Fermat's little theorem. Define a subgroup *B* of **Z***n** as the subset that satisfies *a*^(*n*-1) = 1. If this subgroup is not all of **Z***n** (as it will be for primes), it can be at most half of the subgroup. We would then get a useful test by choosing a random *a*, which with high probability will not satisfy the requirement. Unfortunately, for Carmichael numbers, the subgroup *B* will be all of **Z***n**, so this test distinguishes non-Carmichael composites from Carmichael numbers and primes. Not what we want.

We then defined the Legendre symbol and showed that for prime numbers the Legendre symbol of *x* equals *x*^( (*p*-1)/2 ). We defined the Jacobi symbol, which we shall use to define a new subgroup of **Z***n** that will equal all of **Z***n** only if *n* is prime.

**2/9:** We defined *message authentication code* (MAC) and studied MACs based on polynomial evaluation. The idea is to consider the message as a polynomial with zero constant term, evaluate the polynomial at some secret and then do something with the result. If we add a second secret to the result, *m*(*k*1) + *k*2, we get a MAC that can be used for one message only, but which is unconditionally secure - no matter how much computational resources the attacker has, he cannot win except with a very small probability.

We also looked briefly at a variation applying a random permutation/block cipher to the result, π(*m*(*k*1)), a MAC based on cipher block chaining mode (CBC-MAC) and a MAC based on hash functions (HMAC).

**1/9:** We defined *symmetric encryption schemes* and continued to discuss systems based on permutations. We saw how the obvious encryption rule *ci* = π(*mi*) did not provide confidentiality for all possible messages, even when π is a random permutation. To get something secure, we have to use *cipher block chaining mode*, or CBC mode. Some examples of block ciphers include *x*→*x*+*k*, *x*→**K***x* where *x* is a vector and **K** is an invertible matrix, *x*→*x*^*a* for some a that has multiplicative inverses modulo the group order, and Rijndael, the Advanced Encryption Standard (AES). Only AES is considered secure, the rest do not look like random permutations.

We also considered *key stream generators*, a family of maps from a set *I* of *nonces* or *initialization vectors* to the set of infinite strings of group elements indexed by a key set, along with an algorithm for computing prefixes of these strings. The generator is secure if it "looks like" a random map from *I* to the set of infinite strings. We construct a crypto system by computing a prefix *z* of *f*_*k*(*i*) for some random *i*, and the ciphertext is (*i*, *m*+*z*).

**26/8:** We first talked about elementary properties of finite cyclic groups and the isomorphism from *G* of order *n* to **Z***n* under addition taking the generator *g* to 1. We shall spend a lot of time on how to compute this map.

What to do with a shared secret? We introduced the word *plaintext*, *key* and *ciphertext*, and discussed encryption and decryption rules. We discuss a few simple rules for the case where Alice wants to send a *single* encrypted message to Bob, based on the formula *c* = *m* + *k*. We discussed *known plaintext* and *brute force* (or exhaustive) search for the key.

In the end, we switched to the rule *c* = π(*m*), where π is a random permutation. We defined *block cipher* to be a family of permutations indexed by a key set plus an algorithm to compute the permutations and their inverses. A block cipher is secure if it "looks like" a random permutation.

**25/8:** We discussed the classical cryptographic problem, confidential communication over a public channel and communication with integrity over a public channel. Then we continued with the weaker problem of "shared secret generation" and began discussing the Diffie-Hellman key agreement protocol.