Nostradamus

Predicting the winner of the 2008 US Presidential Elections using a Sony PlayStation 3

November 30, 2007

Marc Stevens	Centrum voor Wiskunde en Informatica
Arjen Lenstra	EPFL and Bell Labs
Benne de Weger

Romanian translation

Announcement We have used a Sony Playstation 3 to correctly predict the outcome of the 2008 US presidential elections. In order not to influence the voters we keep our prediction secret, but commit to it by publishing its cryptographic hash on this website. The document with the correct prediction and matching hash will be revealed after the elections.

Fingerprints Two persons, even if they look similar, usually have different fingerprints. Fingerprints can therefore be used as an additional means to identify people. In the electronic world a similar concept has been developed for documents and messages: a fingerprint of a message is a relatively short sequence of symbols, made in such a way that even if the tiniest change is made in a message, its fingerprint changes completely. Two different messages do not have identical fingerprints. Fingerprints can therefore be used to represent messages.

Commitments A commitment to a message is some type of fingerprint of the message. Because a fingerprint of a message has the properties that it changes completely if the message changes, so that two messages cannot have identical fingerprints, the message can no longer be altered after its fingerprint has been made public. This means that one can commit oneself to a message without revealing it, simply by publishing the fingerprint. If, at some later time, the message is revealed, anyone who knows how to fingerprint a message can check that the published fingerprint matches the one of the message.

Cryptographic hash functions To derive a fingerprint for electronic documents one commonly uses a cryptographic hash function. These are widely known and easy to calculate functions that have the desired properties that even the tiniest change in the message or document results in an entirely different hash value, and that are believed to not allow one to come up with two messages with identical hash values. A well known and widely used cryptographic hash function is MD5.

Predicting the future Suppose somebody is able to predict future events. Then a commitment can be used to show this ability to the world. An electronic document is prepared that unambiguously describes the prediction; the commitment then consists of the cryptographic hash of the document. In order not to unduly influence the natural course of events, the document itself should not be revealed, but just the commitment should be published, in such a way that it cannot be changed afterwards. This can for instance be done in a newspaper or on a publicly accessible website. After the relevant events have taken place, the document is published as well. Everyone can then verify that it contains the correct prediction, and that it has the published hash value. This proves that at the time of publishing the hash, long before the events took place, the publisher already had the document available and knew the prediction.

Prediction of the 2008 US presidential election winner As a proof of concept of our abilities we publish on this website our commitment for our correct prediction of the outcome of the 2008 US presidential elections. We have prepared an electronic document that unambiguously describes our prediction. This document contains only one name of a candidate, namely of the winner. The commitment to this document, computed as the MD5 hash value of the document, is

3D515DEAD7AA16560ABA3E9DF05CBC80.

This commitment has been released on November 30, 2007. After the election, in November 2008, or at some later stage after the relevant parties have colluded and the world knows the designated winner of the election, we will release the document. Everybody can then check that our prediction was correct.

How did we do this? By making extensive use of the hidden powers of the Sony PlayStation 3.

Are we cheating? No, of course we are not cheating. We do have a document that contains our precise and correct prediction of the winner of the 2008 US presidential elections. That document does hash to the above value.

On the other hand Yes, of course we are cheating. But the statements in the previous paragraph are true, and we did indeed use a PlayStation 3. Below we explain in some detail what we did.

Hash functions should be collision resistant In order for a commitment to be of any value, the cryptographic hash function that is used must be collision resistant. That is, it must be practically impossible to find different messages that have the same hash value. Otherwise, a miscreant can use a single hash value to commit to more than a single message, and at the time the document is revealed show only the one that suits him best.

MD5 is not collision resistant The cryptographic hash function MD5 that we used was shown to be not collision resistant, by prof. Xiaoyun Wang and her co-authors, in 2004 (see the EuroCrypt 2005 paper "How to break MD5 and other hash functions"). Although devastating from a theoretical point of view, many practitioners believed that the practical repercussions of prof. Wang’s attack would be minimal because of the lack of useful structure of the collisions that could be constructed.

Meaningful messages For a hash collision to be useful in realistic attack scenarios, the different messages colliding to the same hash value must be meaningful. Preferably the messages are presented in a well known document format, such as Microsoft Word, PostScript, or Adobe PDF. Prof. Wang's original collisions were not meaningful messages but consisted of rather random looking strings of characters. Nevertheless, several authors have shown how any pair of random looking colliding messages can be used to create collisions between meaningful messages or documents. These constructions have the disadvantage that close inspection of either one of the meaningful colliding messages will reveal the existence of the other: given the original random collision and one of the meaningful messages, the meaningful message colliding with it can be derived. This is due to the fact that, when incorporating a collision into a pair of meaningful documents, prof. Wang's collisions required the documents to be exactly equal, except for the relatively few random looking characters comprising the collision.

Chosen-prefix collisions Last year we showed, with the kind assistance of prof. Wang, how her attack on MD5 can be improved to provide chosen-prefix collisions. Our chosen-prefix collisions have only the requirement that after the collision the documents should be exactly equal. Before the collision the two documents, for which a collision is to be found, can be anything: our chosen-prefix collision finding method will always produce a collision that can be incorporated into the two documents, irrespective of what data is present before the collision.

PDFs Our chosen-prefix collision finding method enables us to produce colliding documents in many different formats, and in such a way that it is not immediately obvious whether another meaningful message that collides with it is known to anyone. We chose, somewhat arbitrarily, the PDF format, having the advantage that its specifications are open, and the PDF format is in wide use in both the Windows and non-Windows communities.

The Nostradamus attack and multi-collisions In 2005 John Kelsey and Tadayoshi Kohno (see their EuroCrypt 2006 paper "Herding hash functions and the Nostradamus attack") described an attack, in which a miscreant uses hash collisions to produce one commitment for as many different messages as he wants. As an application they describe the idea of using hash values as commitments in predicting the future. They dubbed this attack the Nostradamus attack. It is based on the ability to make chosen-prefix collisions, and that is exactly what we can do. With our previously published method to construct chosen-prefix collisions one can construct pairs of meaningful documents that have the same hash value. The idea of Kelsey and Kohno can then be used to turn a number of carefully constructed chosen-prefix collisions into a multi-collision, consisting of any number of messages colliding into the same hash value. The idea of Kelsey and Kohno can be visualised in a diamond structure, see the picture below.

Our multi-collision
(revealing our secret)

We have prepared twelve different predictions, ten of which are shown in the table below.

A:	John Edwards.pdf	G:	Fred Thompson.pdf
B:	John McCain.pdf	H:	(hidden)
C:	Mitt Romney.pdf	I:	Paris Hilton.pdf
D:	Ralph Nader.pdf	J:	Al Gore.pdf
E:	(hidden)	K:	Jeb Bush.pdf
F:	Barack Obama.pdf	L:	Oprah Winfrey.pdf

All twelve documents we prepared, the ten given above and two hidden ones, have the MD5 hash value

3D515DEAD7AA16560ABA3E9DF05CBC80.

The documents were first carefully prepared as valid PDF documents, with a hidden image object incorporated, containing a sufficient amount of random bits. Then, according to the diamond structure shown above, eleven chosen-prefix collisions were computed, and placed inside the hidden image objects at precisely the proper spots. In this way the twelve documents were turned into an MD5 multi-collision.

The two hidden documents Two of the twelve documents that we prepared are not listed above. It should be obvious which two names are missing, given that we want to be able to show the correct prediction with the right hash, once the elections are over. But, given that the missing documents have to fit at the spots for ‘E’ and ‘H’ in the diamond structure, do you know which name goes where, better than by guessing? And, much more challenging, can you find two PDF files with the proper contents and the above MD5 hash? We have such files, but wonder if others can find them too, based on the available side-information.

Some comments

As anyone can inspect for the ten documents that are given, the name of each document refers to the unambiguous candidate name specified in the document itself as the winner of the elections. Once the elections are over and we know the designated winner, we simply could have published the right document (with a good chance that we would have to reveal part of the solution to the challenge posed above), and thrown away all the others. There's a tiny chance that the actual winner will not be on the above list of twelve. That is a risk we accept; we apologize to those serious candidates who have not made it to our list.
Clearly (and assuming you believe that we have the two missing PDFs), we were not cheating. We have a document that has the announced hash value and that contains the correct prediction.
On the other hand, clearly, we were cheating indeed. We were on purpose imprudent in our choice of cryptographic hash function. Given the recent insights into the weaknesses of MD5, the bottomline of our work is:

MD5 should no longer be used as a fingerprint function for electronic documents.

By now, everyone should be aware of this. If you are still using MD5, it would be good to carefully analyse the risks that result from its continued usage.
It should be obvious that we cannot predict the future, sorry. We only know how to cheat by using ill-chosen commitments to predict the future. We don't know who will be the next US president. Some of us don't even care that much, as (s)he is not going to be our president. We wish all candidates - listed or not - good luck, and may the best one win.

Verification

You are invited to verify for yourself the following four essential points:

-	the documents all completely comply with the PDF standard;
-	each of the documents contains (also at bit-level) only one candidate name, in other words: from inspection of one document alone, even at bit-level, it cannot be concluded that it has eleven siblings;
-	the documents are all different; this can be shown by inspection in a PDF viewer, or by comparing files at byte level, or by computing the SHA-1 hash values (this should not be interpreted as support for continued usage of SHA-1);
-	the documents all have the same MD5 hash, namely the one shown above.

See the output of fciv - Microsoft's File Checksum Integrity Verifier utility.

Pretty pictures We have some pretty pictures showing the differences between the internal states of the MD5 hash function as they evolve during the hash computations.

Extendability Originally we had a multi-collision consisting of just eight documents: A to H. When the 2007 Nobel peace price was announced we decided that it would be prudent to add another four documents, extending the existing collisions, without having to throw away anything of the work done to compute the first multi-collision of eight documents. From the diamond structure above it may be clear that an 8-collision requires seven chosen-prefix collisions, and that extending it to our present 12-collision only four chosen-prefix collisions had to be added.

How does the Sony PlayStation 3 come into play? Our chosen-prefix collision finding method is based upon the ability to extend two messages with a block such that the next internal state of MD5 shows strictly fewer bit differences than the previous one. By repetition we will eventually find two extended messages that will have no bit differences left between the two last internal states of MD5, hence they will have the same MD5 hash value. In the pretty pictures this process of annihilating all bit differences in a number of steps is shown.
However, we are only able to eliminate specific bit differences in this manner. Therefore our chosen-prefix collision finding method requires that we first extend the two messages such that the differences between the two last internal states have the specific form that we can eliminate. This first extension can be achieved using a birthday attack, which is computationally the most expensive part of our method. In our previous work on chosen prefix collisions we applied this birthdaying technique in a less advanced way, for which we needed thousands of PCs running for months under the BOINC distributed computing framework. Now we have a more advanced technique, which on a single PC still can take several days. On the PlayStation 3, with its multiple computing cores and each core computing four instantiations of MD5 in SIMD mode, the advanced birthday attack takes just a couple of hours. Essentially, a single PlayStation 3 performs like a cluster of 30 PCs at the price of only one. In our experience, one chosen-prefix collision can be constructed well within 2 days of computation time, using one PlayStation 3 and a quadcore PC.

Didn't Daum and Lucks do something like this in 2005? Magnus Daum and Stefan Lucks prepared a pair of PostScript documents colliding under MD5 that show completely different contents on screen (also see the paper "A Note on Practical Value of Single Hash Collisions for Special File Formats" by Max Gebhardt, Georg Illies and Werner Schindler). These attacks are related to ours, but they are less flexible, as they are based on the original random collision finding method of prof. Wang. Among others this implies that they rely on the macro-functionality of "data formats" such as MS Word or PostScript or PDF. Furthermore, each of the two colliding documents contains two complete content blocks (of which only one is shown on screen), which implies that the fraud will become apparent on bit-level inspection of one document only. Also these documents do not allow multi-collisions, and are certainly not extendible. In contrast, our twelve PDFs do not use any macro-functionality of the PDF format, and each file contains only one candidate name. Although our challenge related to the hidden PDF documents may become easier by peeking at the bits in the other files (and by studying the details of the MD5 collision construction), there is no indication of the two missing names in any of the other ten PDFs. If we had chosen a data format that does not allow hidden objects, we could have hidden the collisions in e.g. tiny bitmap images, or parts of images, viz. the barcode images shown on the chosen-prefix collision webpage.

Digitally signing the colliding documents Our twelve colliding documents all have the same hash value. Applying a digital signature method to one of them, by performing a private key operation to the hash value, yields a digital signature that will be equally valid for each of the twelve documents. We here present a digital signature made (using the proper PKCS#1 padding on the MD5 hash value) by the private key corresponding to Marc Stevens' colliding certificate:

0EAC4046 ED283A74 41A9BFFD 6C1289CC 5C58EB38 8F5C1379 4872BABA 2C1DC1FD 6CD247CC E9B76B18 2A278851 DA4C40C5 4F2F07A6 92AF5406 0DAA840E F7331184 2D15903A 156ED4CA CE7D5456 D9454F66 9CE3B30C F5986351 41E97575 F3C73367 D0FD1F6C 5E5D7C68 7AF32B29 0087FDC1 1366722F FF36466F 36DF6716 3819B566 420AA307 5CC75E80 F2D1EBB1 6AAD0971 B51A1F05 CE993A03 218992A0 730E4405 1548263B 8FD5D3DD 7FED3EEA 5B1088FC FC0AF6E1 5EFBC149 3B0BB3BF F155BC1F 5F24BE50 60EED11A 0F621B74 EF10A4FF 483A6774 66767183 B56D060E A9BEEEBA 3EA72F4B F1879119 BA72F770 007B8050 28FD1BF3 805CFB16 5EF0CEBE D979E35D BA90E198 73451B38 F3C46FE8 AD515D28 B0241193 AE0ACD34 ECB6DB3E 3C7A37DB 38F07E3A 69108521 9B6ABABE E69E8D13 4EBE1B16 5BFB98D2 8C092064 EE586B90 8718B6FA 193E4551 CCFAE2D9 74896A36 B51BE131 B8280E87 EEF0F637 ACEAB970 E3B74AC9 81BB2ED5 B57FEBEC 1F753EB8 9F5BD3D4 3CD3BEBA A8EA2256 416385A3 F5CC8039 6F1C71BB C8A5CABA 3103D41A 87FE7486 2F4034B3 FE44FBBC 685C020F 5CC25722 16321BFD E4C263B8 425ABAE3 91B012E9 E4E4A9AF 7A973B76 3088C0E6 07BF49CA EC7493A0 B6602B31 2F205C4D 9CEABDA4 5E033736 B48BB1CC 242D20F0 F04E3C01 34663D81 98B22D38 3AEEB48D 036EC3AD 271617E5 C6625FEF 159CBA5A 6FE576E0 98AA4DEE E480FFC1 DB8E773D 62587FD1 3A487E77 CCAB610B 8D2C148E 5050279C 81F0811B D490FCEB DD1E0C02 EC9BBE35 D7EDDB30 14B2AC8F 6448B503 FCB2DC61 1ABB7C35 9FCB66C4 369D4025 A7524819 FA8DC40C 4E7EFD89 BEFC32E3 28E0E757 707AB105 47F84995 85D34F83 2460A147 84A93AF7 F28D0ECB D2C74E5E 992A0E31 DA3438B0 3162A215 79587DE0 4A0E9B87 F96A5C0F EDD76AF0 FEDDD2FB 6478E594 93CBDF3A D94139F2 F61F1D13 D744E1B6 4E92A7E5 1E570294 3B0DFBF8 CED66CFE 0B113D1A 4AC2EEEF 52707EA6 8F71B936 6C7C32E2 E3F3B6A4 6C8A3FED 490DDD8C 17E75CAB 00AECC2A 69E0D06A 4C9E1CEC 48403AB1 958A5AE1 E0947135 99AA1DD5 6954C139 7E470E49 6B35333D 7B3AE797 6C812724 BC552251 45CE01A1 CD20A5DD BC71853A D610AB47 70D80F08 748F0C4A D9891B79 84430F0A 824E0AEA 5CB34E9E 0F8071E8 44237DB1 6F280C3A C940FD45 6E4203D4 AB803C75 33777F96 0CFE6D4A 78E9D5C0 467112E5 22877FB8 E359C882 52362A1E 2F2D2CD4 C7CD184A C1C0C26E 743A7B87 64D93E1D 735420FF A9EC6E1F 5747A773 A61DEDA1 BE79D988 42E875BB 16621D5D B454D1E1 25376E83 6FB36189 76AFD3F1 71A08898 36926338 6DE9358B 56482471 3CA137C8 7D40CABC 88CF6332 C9C9A943 17647F38 50EA05A7 62863536 CBC50F34 F3087BA2 1BCA1222 A778C13B 290E3F68 C1989B7F 65459754

We invite you to verify this signature against the ten PDF documents and the certificate, with your favourite public key cryptography software. We here make available an OpenSSL recipe and the required downloadables: the binary signature, the certificate, and the public key file. The certificate itself can be validated by the CA certificate, which is available for download as well.

What other attacks are possible with this technique? The first chosen-prefix collision was built in colliding X.509 certificates with different identities. A complete description can be found in our EuroCrypt 2007 paper "Chosen-prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". In this paper some other abuse scenarios are described as well. We also implemented a chosen-prefix collision in a pair of executables, as a proof of concept of the vulnerability of software integrity and code signing applications to attacks based on chosen-prefix collisions for MD5.

Will SHA-1 undergo the same fate as MD5? Eventually, most probably it will, though attacking SHA-1 is still much more difficult than attacking MD5. See the IAIK Krypto collision website for the present state of affairs, and do donate your spare cycles to the IAIK SHA-1 collision search.

Paper The chosen-prefix collision construction was described in our, officially dubbed notable, EuroCrypt 2007 paper "Chosen-prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities". We are currently preparing an updated version of this paper, providing details of our improved chosen-prefix collision construction, the use of the Sony PlayStation 3 in our computations, and some more details of the multi-collision presented above. This new, full version will be submitted to the Journal of Cryptology.

Our real prediction 3D5 15 DEAD 7AA16560ABA3E9DF05CBC80.

Links

-	Our recent colliding executables attack, showing a vulnerability in software integrity checksums and code signing.
-	The (finished) HashClash project, Marc Stevens' MSc project on fast and smart MD5 collision construction.
-	Our previous work on chosen prefix collisions.
-	Our webpage on colliding X.509 certificates for different identities.
-	Our webpage on Creating a rogue CA certificate.
-	Our old colliding certificates construction.

Authors
___ / _ \ | / | | | \_|/ \____ Marc Stevens, CWI, Amsterdam, The Netherlands
Arjen K. Lenstra, EPFL, Lausanne, Switzerland, and Bell Labs, Murray Hill, USA
Benne de Weger, TU/e, Eindhoven, The Netherlands

Please send all correspondence to Benne de Weger.

Acknowledgements We are grateful to Dag Arne Osvik at EPFL for his help with the Sony PlayStation 3.
A large part of this work was done while Marc was visiting Arjen at EPFL in August and September 2007.

Publication Date November 30, 2007

Latest Modification March 24, 2009

Vanity Counter