Predicting the winner of the 2008 US Presidential Elections using a Sony PlayStation 3November 30, 2007
|Announcement||We have used a Sony Playstation 3 to correctly predict the outcome of the 2008 US presidential elections. In order not to influence the voters we keep our prediction secret, but commit to it by publishing its cryptographic hash on this website. The document with the correct prediction and matching hash will be revealed after the elections.|
|Fingerprints||Two persons, even if they look similar, usually have different fingerprints. Fingerprints can therefore be used as an additional means to identify people. In the electronic world a similar concept has been developed for documents and messages: a fingerprint of a message is a relatively short sequence of symbols, made in such a way that even if the tiniest change is made in a message, its fingerprint changes completely. Two different messages do not have identical fingerprints. Fingerprints can therefore be used to represent messages.|
|Commitments||A commitment to a message is some type of fingerprint of the message. Because a fingerprint of a message has the properties that it changes completely if the message changes, so that two messages cannot have identical fingerprints, the message can no longer be altered after its fingerprint has been made public. This means that one can commit oneself to a message without revealing it, simply by publishing the fingerprint. If, at some later time, the message is revealed, anyone who knows how to fingerprint a message can check that the published fingerprint matches the one of the message.|
|Cryptographic hash functions||To derive a fingerprint for electronic documents one commonly uses a cryptographic hash function. These are widely known and easy to calculate functions that have the desired properties that even the tiniest change in the message or document results in an entirely different hash value, and that are believed to not allow one to come up with two messages with identical hash values. A well known and widely used cryptographic hash function is MD5.|
|Predicting the future||Suppose somebody is able to predict future events. Then a commitment can be used to show this ability to the world. An electronic document is prepared that unambiguously describes the prediction; the commitment then consists of the cryptographic hash of the document. In order not to unduly influence the natural course of events, the document itself should not be revealed, but just the commitment should be published, in such a way that it cannot be changed afterwards. This can for instance be done in a newspaper or on a publicly accessible website. After the relevant events have taken place, the document is published as well. Everyone can then verify that it contains the correct prediction, and that it has the published hash value. This proves that at the time of publishing the hash, long before the events took place, the publisher already had the document available and knew the prediction.|
|Prediction of the 2008 US presidential election winner||
As a proof of concept of our abilities we publish on this website our commitment for
our correct prediction of the outcome of the 2008 US presidential elections.
We have prepared an electronic document that unambiguously describes our prediction.
This document contains only one name of a candidate, namely of the winner.
The commitment to this document, computed as the MD5 hash value of the document, is
This commitment has been released on November 30, 2007. After the election, in November 2008, or at some later stage after the relevant parties have colluded and the world knows the designated winner of the election, we will release the document. Everybody can then check that our prediction was correct.
|How did we do this?||By making extensive use of the hidden powers of the Sony PlayStation 3.|
|Are we cheating?||
No, of course we are not cheating. We do have a document
that contains our precise and correct prediction of the winner of the 2008 US
presidential elections. That document does hash to the above value.|
|On the other hand||Yes, of course we are cheating. But the statements in the previous paragraph are true, and we did indeed use a PlayStation 3. Below we explain in some detail what we did.|
|Hash functions should be collision resistant||In order for a commitment to be of any value, the cryptographic hash function that is used must be collision resistant. That is, it must be practically impossible to find different messages that have the same hash value. Otherwise, a miscreant can use a single hash value to commit to more than a single message, and at the time the document is revealed show only the one that suits him best.|
|MD5 is not collision resistant||The cryptographic hash function MD5 that we used was shown to be not collision resistant, by prof. Xiaoyun Wang and her co-authors, in 2004 (see the EuroCrypt 2005 paper "How to break MD5 and other hash functions"). Although devastating from a theoretical point of view, many practitioners believed that the practical repercussions of prof. Wang’s attack would be minimal because of the lack of useful structure of the collisions that could be constructed.|
|Meaningful messages||For a hash collision to be useful in realistic attack scenarios, the different messages colliding to the same hash value must be meaningful. Preferably the messages are presented in a well known document format, such as Microsoft Word, PostScript, or Adobe PDF. Prof. Wang's original collisions were not meaningful messages but consisted of rather random looking strings of characters. Nevertheless, several authors have shown how any pair of random looking colliding messages can be used to create collisions between meaningful messages or documents. These constructions have the disadvantage that close inspection of either one of the meaningful colliding messages will reveal the existence of the other: given the original random collision and one of the meaningful messages, the meaningful message colliding with it can be derived. This is due to the fact that, when incorporating a collision into a pair of meaningful documents, prof. Wang's collisions required the documents to be exactly equal, except for the relatively few random looking characters comprising the collision.|
|Chosen-prefix collisions||Last year we showed, with the kind assistance of prof. Wang, how her attack on MD5 can be improved to provide chosen-prefix collisions. Our chosen-prefix collisions have only the requirement that after the collision the documents should be exactly equal. Before the collision the two documents, for which a collision is to be found, can be anything: our chosen-prefix collision finding method will always produce a collision that can be incorporated into the two documents, irrespective of what data is present before the collision.|
|PDFs||Our chosen-prefix collision finding method enables us to produce colliding documents in many different formats, and in such a way that it is not immediately obvious whether another meaningful message that collides with it is known to anyone. We chose, somewhat arbitrarily, the PDF format, having the advantage that its specifications are open, and the PDF format is in wide use in both the Windows and non-Windows communities.|
|The Nostradamus attack and multi-collisions||
In 2005 John Kelsey and Tadayoshi Kohno (see their EuroCrypt 2006 paper
hash functions and the Nostradamus attack") described an attack, in which a miscreant
uses hash collisions to produce one commitment for as many different messages as
he wants. As an application they describe the idea of using hash values as commitments
in predicting the future. They dubbed this attack the Nostradamus attack.
It is based on the ability to make chosen-prefix collisions, and that is exactly
what we can do. With our previously published method to construct chosen-prefix
collisions one can construct pairs of meaningful documents that have the same
hash value. The idea of Kelsey and Kohno can then be used to turn a number of
carefully constructed chosen-prefix collisions into a multi-collision,
consisting of any number of messages colliding into the same hash value.
The idea of Kelsey and Kohno can be visualised in a diamond structure, see the picture below.
(revealing our secret)
We have prepared twelve different predictions, ten of which are shown in the table below.|
All twelve documents we prepared, the ten given above and two hidden ones, have the MD5 hash value
The documents were first carefully prepared as valid PDF documents, with a hidden image object incorporated, containing a sufficient amount of random bits. Then, according to the diamond structure shown above, eleven chosen-prefix collisions were computed, and placed inside the hidden image objects at precisely the proper spots. In this way the twelve documents were turned into an MD5 multi-collision.
|The two hidden documents||Two of the twelve documents that we prepared are not listed above. It should be obvious which two names are missing, given that we want to be able to show the correct prediction with the right hash, once the elections are over. But, given that the missing documents have to fit at the spots for ‘E’ and ‘H’ in the diamond structure, do you know which name goes where, better than by guessing? And, much more challenging, can you find two PDF files with the proper contents and the above MD5 hash? We have such files, but wonder if others can find them too, based on the available side-information.|
As anyone can inspect for the ten documents that are given, the name of each
document refers to the unambiguous
candidate name specified in the document itself as the winner of the elections.
Once the elections are over and we know the designated winner, we simply could have published
the right document (with a good chance that we would have to reveal part of the solution
to the challenge posed above), and thrown away all the others. There's a tiny chance that the actual
winner will not be on the above list of twelve. That is a risk we accept; we apologize to
those serious candidates who have not made it to our list. |
Clearly (and assuming you believe that we have the two missing PDFs), we were not cheating. We have a document that has the announced hash value and that contains the correct prediction.
On the other hand, clearly, we were cheating indeed. We were on purpose imprudent in our choice of cryptographic hash function. Given the recent insights into the weaknesses of MD5, the bottomline of our work is:
It should be obvious that we cannot predict the future, sorry. We only know how to cheat by using ill-chosen commitments to predict the future. We don't know who will be the next US president. Some of us don't even care that much, as (s)he is not going to be our president. We wish all candidates - listed or not - good luck, and may the best one win.
You are invited to verify for yourself the following four essential points:|
|Pretty pictures||We have some pretty pictures showing the differences between the internal states of the MD5 hash function as they evolve during the hash computations.|
|Extendability||Originally we had a multi-collision consisting of just eight documents: A to H. When the 2007 Nobel peace price was announced we decided that it would be prudent to add another four documents, extending the existing collisions, without having to throw away anything of the work done to compute the first multi-collision of eight documents. From the diamond structure above it may be clear that an 8-collision requires seven chosen-prefix collisions, and that extending it to our present 12-collision only four chosen-prefix collisions had to be added.|
How does the Sony PlayStation 3 come into play?
Our chosen-prefix collision finding method is based upon the ability to extend two messages
with a block such that the next internal state of MD5 shows strictly fewer bit differences
than the previous one. By repetition we will eventually find two extended messages that
will have no bit differences left between the two last internal states of MD5, hence they
will have the same MD5 hash value. In the pretty pictures
this process of annihilating all bit differences in a number of steps is shown. |
However, we are only able to eliminate specific bit differences in this manner. Therefore our chosen-prefix collision finding method requires that we first extend the two messages such that the differences between the two last internal states have the specific form that we can eliminate. This first extension can be achieved using a birthday attack, which is computationally the most expensive part of our method. In our previous work on chosen prefix collisions we applied this birthdaying technique in a less advanced way, for which we needed thousands of PCs running for months under the BOINC distributed computing framework. Now we have a more advanced technique, which on a single PC still can take several days. On the PlayStation 3, with its multiple computing cores and each core computing four instantiations of MD5 in SIMD mode, the advanced birthday attack takes just a couple of hours. Essentially, a single PlayStation 3 performs like a cluster of 30 PCs at the price of only one. In our experience, one chosen-prefix collision can be constructed well within 2 days of computation time, using one PlayStation 3 and a quadcore PC.
|Didn't Daum and Lucks do something like this in 2005?||Magnus Daum and Stefan Lucks prepared a pair of PostScript documents colliding under MD5 that show completely different contents on screen (also see the paper "A Note on Practical Value of Single Hash Collisions for Special File Formats" by Max Gebhardt, Georg Illies and Werner Schindler). These attacks are related to ours, but they are less flexible, as they are based on the original random collision finding method of prof. Wang. Among others this implies that they rely on the macro-functionality of "data formats" such as MS Word or PostScript or PDF. Furthermore, each of the two colliding documents contains two complete content blocks (of which only one is shown on screen), which implies that the fraud will become apparent on bit-level inspection of one document only. Also these documents do not allow multi-collisions, and are certainly not extendible. In contrast, our twelve PDFs do not use any macro-functionality of the PDF format, and each file contains only one candidate name. Although our challenge related to the hidden PDF documents may become easier by peeking at the bits in the other files (and by studying the details of the MD5 collision construction), there is no indication of the two missing names in any of the other ten PDFs. If we had chosen a data format that does not allow hidden objects, we could have hidden the collisions in e.g. tiny bitmap images, or parts of images, viz. the barcode images shown on the chosen-prefix collision webpage.|
|Digitally signing the colliding documents||
Our twelve colliding documents all have the same hash value. Applying a
digital signature method to one of them, by performing a private key
operation to the hash value, yields a digital signature that will be
equally valid for each of the twelve documents. We here present a digital signature
made (using the proper PKCS#1 padding on the MD5 hash value) by the private key corresponding
to Marc Stevens'
We invite you to verify this signature against the ten PDF documents and the certificate, with your favourite public key cryptography software. We here make available an OpenSSL recipe and the required downloadables: the binary signature, the certificate, and the public key file. The certificate itself can be validated by the CA certificate, which is available for download as well.
|What other attacks are possible with this technique?||The first chosen-prefix collision was built in colliding X.509 certificates with different identities. A complete description can be found in our EuroCrypt 2007 paper "Chosen-prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". In this paper some other abuse scenarios are described as well. We also implemented a chosen-prefix collision in a pair of executables, as a proof of concept of the vulnerability of software integrity and code signing applications to attacks based on chosen-prefix collisions for MD5.|
|Will SHA-1 undergo the same fate as MD5?||Eventually, most probably it will, though attacking SHA-1 is still much more difficult than attacking MD5. See the IAIK Krypto collision website for the present state of affairs, and do donate your spare cycles to the IAIK SHA-1 collision search.|
|Paper||The chosen-prefix collision construction was described in our, officially dubbed notable, EuroCrypt 2007 paper "Chosen-prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". We are currently preparing an updated version of this paper, providing details of our improved chosen-prefix collision construction, the use of the Sony PlayStation 3 in our computations, and some more details of the multi-collision presented above. This new, full version will be submitted to the Journal of Cryptology.|
|Our real prediction||
Marc Stevens, CWI, Amsterdam, The Netherlands|
Arjen K. Lenstra, EPFL, Lausanne, Switzerland, and Bell Labs, Murray Hill, USA
Benne de Weger, TU/e, Eindhoven, The Netherlands
Please send all correspondence to Benne de Weger.
We are grateful to Dag Arne Osvik at EPFL for his help with the Sony PlayStation 3.
A large part of this work was done while Marc was visiting Arjen at EPFL in August and September 2007.
|Publication Date||November 30, 2007|
|Latest Modification||March 24, 2009|