Technische Universiteit Eindhoven

Latest News (June 16, 2009) See the full paper Marc Stevens, Arjen Lenstra and Benne de Weger, "Chosen-prefix Collisions for MD5 and Applications", submitted to the Journal of Cryptology.
(June 2, 2009) We now have a single block chosen-prefix collision.
(December 30, 2008) On December 30, 2008 the newest application of chosen-prefix collisions for MD5 was presented at the 25th Chaos Communication Congress in Berlin: Creating a rogue CA certificate.
(July 2, 2008) On July 2, 2008 Marc received the "TU/e Afstudeerprijs 2008", i.e. the TU/e Best Master Thesis Award for the best final project in one of the TU/e master's programs completed in 2007. Congratulations!
(February 12, 2008) We've had some influence on the German Signaturgesetz. See the Colliding X.509 certificates for different identities website.
(November 30, 2007) We have two new applications of chosen-prefix collisions in proof of concept.
The first one is Predicting the 2008 US presidential elections using a Sony PlayStation 3.
The second one is a pair of colliding executables, indicating a vulnerability in software integrity protection and code signing systems.
(October 10, 2007) Marc was nominated for the Joop Bautz Information Security Award.
(June 25, 2007) Marc got his grade, a 10. Congratulations!
(May 20, 2007) Our paper has been designated a "notable paper" (one of the three) by the EuroCrypt program committee.
(Feb. 27, 2007) Final paper available below.
(Feb. 7, 2007) Our paper has been accepted at EuroCrypt 2007.
Abstract of the paper We present a novel, automated way to find differential paths for MD5.
As an application we have shown how, at an approximate expected cost of 250 calls to the MD5 compression function, for any two chosen message prefixes P and P', suffixes S and S' can be constructed such that the concatenated values P||S and P'||S' collide under MD5.
Although the practical attack potential of this construction of chosen-prefix collisions is limited, it is of greater concern than random collisions for MD5.
To illustrate the practicality of our method, we constructed two MD5 based X.509 certificates with identical signatures but different public keys and different Distinguished Name fields, whereas our previous construction of colliding X.509 certificates required identical name fields.
Further we constructed a multicollision of twelve PDF files, all having the same MD5 hash value, and we constructed a pair of colliding executables. We speculate on other possibilities for abusing chosen-prefix collisions.
The paper The final paper (v2.0 in pdf-format) is a thorough rewriting of the announcement paper (v1.1 in pdf-format).
It has many more details about the method of finding chosen-prefix collisions, but less details about the certificates.
The final paper is published in the EuroCrypt 2007 Proceedings.

Added November 30, 2007: an updated full paper, with improved methods, improved use of hardware (we now use a Sony PlayStation 3) and our new examples, is in preparation, and will be submitted to the Journal of Cryptology.
Differential Paths As an example, here are the differential paths we have constructed for the colliding certificates.
The collision example The collision example from the colliding certificates, both in binary and in hexadecimal:
   binary files (not human readable): collision1.bin, collision2.bin
   hexadecimal files (human readable): collision1.txt, collision2.txt
   differences in bits (human readable): bitdifference.txt
Verification can be done on a Windows system by e.g. Jem Berkes' md5sums.exe
(on unix, use the md5 command).
Here's a recipe to verify the collision with md5sums.
Visualisations Here are the bitmap images of the IHV differences for the collision used in the colliding certificates, and as shown in the announcement paper.

The image below has 22 lines of 128 pixels each. Each line represents one IHV difference.
Each IHV is seen as a quadruple of unsigned 32-bit integers, and the additive differences of these quadruples are shown in the so called Non-Adjacent Form. Red and blue pixels stand for differences with opposite signs.

The first line corresponds to the initial IHV, showing no difference, of course.
The next three lines correspond to the IHV's found after the first three data blocks, showing random behaviour, as it should.
The fifth line shows 8 triples of bit differences, caused by birthdaying done on the last 96 bits of the fourth data block.
In the next eight lines these eight triples are eliminated one by one, each time using a different differential path.
From the thirteenth line on all is black, indicating a collision.

The image to the right (that reminds some of a Japanese yukata or kimono next to a street sign) also has lines of 128 pixels each.
Each line in between a pair of yellow bars represents again one IHV difference.
Furthermore all internal states of the compression function are shown, each time after one of the 64 steps has finished.
We also have made a nice animated image:

Colliding X.509 Certificates
for Different Identities
As an application we have a method of constructing a pair of colliding X.509 certificates for different identities.

Creating a rogue CA certificate
Added January 1, 2009: The application of colliding certificates has been turned into a realistic attack in work that we did jointly with Alex Sotirov, Jake Appelbaum, David Molnar and Dag Arne Osvik: Creating a rogue CA certificate.
Possible application:
Nostradamus attack
John Kelsey and Tadayoshi Kohno describe a herding attack on using hashes as a commitment to a given message.
You can commit to a message without revealing it, by publishing only a hash of the message. Later on you can prove by revealing the message that you knew it at the time of publication of the commitment hash.
Our construction of chosen-prefix collisions can be used by Nostradamus to prove that he is able to predict events in the future. Here is an example, essentially due to Kelsey and Kohno.

Suppose the soccer match Ajax-Feyenoord is on July 1. Nostradamus publishes a hash value on June 30, and invites people to bet on whether he will be able to predict the outcome of the match. After the match, which happens to be won by Feyenoord, Nostradamus publishes a message with as essential content "Feyenoord will win the match tomorrow", that indeed hashes to the published value. Everybody will admire Nostradamus' prediction capabilities, and he will collect a lot of money from the betting stations.

How did Nostradamus do this? Well, he took three messages:

   m1 = "Feyenoord will win the match tomorrow." + padding
   m2 = "Ajax will win the match tomorrow." + padding
   m3 = "The match tomorrow will end in a draw." + padding

(padding is there to make sure the messages have the same length, which is a multiple of 512 bits). Now only two chosen-prefix collisions are required to let the three messages collide under MD5, as follows:

The block p3 is another padding block to adjust to the proper length. The collision blocks are yellow, the resulting collisions are red. The final collision IHV6 now needs only one additional padding block because of MD-strengthening. Then the hash value is reached that is the commitment to any of the three messages m1, m2, m3. The collision blocks have to be hidden somehow, e.g. in a small bitmap.

Added November 30, 2007: This application has now been realized, see our website on Predicting the 2008 US presidential elections using a Sony PlayStation 3.
Possible application:
Barcode Images
Here are the bitmap images of a pair of "high security" barcodes as shown in the paper.
An actual collision was placed in a pair of bitmap images.
The barcode images as such are not really colliding, they are only an example to show what such images would look like.

There are twelve differences. Four are easy to find. Can you find the other 8?
If you can't, here they are.

The images can be as small as the following bitmaps:

Added November 30: Another application has now been realized, see our website on colliding executables.
Possible application:
Content addressed storage
Some vendors have brought products on the market based on Content addressed storage (a.k.a Content addressable storage). The idea seems to come from the company EMC2, who have implemented it in their Centera product line. Caringo is another company advertising with a similar idea. See also the CAS Community.
The main idea is that the storage address is directly derived from a hash value of the contents of the file to be stored. When the hash function MD5 is used, our chosen prefix collisions could be used to secretly overwrite a file with a different, colliding one.
Note: we do not claim that the products of these companies are bad. We have not investigated these products. We only warn the reader to be cautious when such systems allow the use of broken hash functions.
More on hashes in the Centera products can be found in the paper Collision and Preimage Resistance of the Centera Content Address by Robert Primmer and Carl D’Halluin, a paper which by now has become somewhat outdated. From this paper it becomes clear that the Centera "M naming scheme" might be vulnerable. Robert Primmer tells me that EMC2 is now outphasing the M naming scheme.
Links The HashClash project, Marc Stevens' MSc project on MD5 collision construction.
Creating a rogue CA certificate
Single Block Chosen Prefix Collision
Predicting the 2008 US presidential elections using a Sony PlayStation 3
Colliding executables - a software integrity and code signing vulnerability
Colliding X.509 certificates for different identities
The previous Colliding X.509 Certificates construction
An interesting site on Meaningful Collisions (for SHA-1), about work similar to our method of chosen-prefix collision construction, but focusing on SHA-1.
 / _ \
| / | |
| \_|/
Marc Stevens CWI, Amsterdam, The Netherlands
Arjen K. Lenstra, EPFL, Lausanne, Switzerland
Benne de Weger, TU/e, Eindhoven, The Netherlands

Please send all correspondence to Benne de Weger.
Publication Date February 23, 2007
Latest Modification June 16, 2009