December 30, 2008
MD5 considered harmful today
Creating a rogue CA certificate
Alexander Sotirov, Marc Stevens,
[Dec. 31, 2008] Responses from Verisign (RapidSSL), Microsoft and Mozilla.|
[Jan. 2, 2009] Responses from TC TrustCenter and RSA, and a US-CERT Vulnerability Note.
[Jan. 8, 2009] Video and audio files of the 25C3 presentation are available from CCC.
[Jan. 15, 2009] Response from Cisco.
[Mar. 11, 2009] Our paper with details on improved MD5 chosen-prefix collision construction is available.
[Apr. 30, 2009] Our paper is accepted at Crypto 2009.
[June 2, 2009] A new single block chosen-prefix collision.
[June 3, 2009] Our Crypto 2009 paper has won the best paper award.
[June 16, 2009] Full paper now available: Marc Stevens, Arjen Lenstra and Benne de Weger, "Chosen-prefix Collisions for MD5 and Applications", submitted to the International Journal of Applied Cryptography.
[August 22, 2009] The Crypto paper and the best paper award.
[January 12, 2010] Our work made it to number 1 of the Top Ten Web Hacking Techniques of 2009.
[October 28, 2010] This site now enjoys translations into Korean and Belorussian, provided respectively by vangelis and movavi.
We have identified a vulnerability in the Internet Public Key Infrastructure (PKI) used to issue digital certificates for secure websites. As a proof of concept we executed a practical attack scenario and successfully created a rogue Certification Authority (CA) certificate trusted by all common web browsers. This certificate allows us to impersonate any website on the Internet, including banking and e-commerce sites secured using the HTTPS protocol.
Our attack takes advantage of a weakness in the MD5 cryptographic hash function that allows the construction of different messages with the same MD5 hash. This is known as an MD5 "collision". Previous work on MD5 collisions between 2004 and 2007 showed that the use of this hash function in digital signatures can lead to theoretical attack scenarios. Our current work proves that at least one attack scenario can be exploited in practice, thus exposing the security infrastructure of the web to realistic threats.
As a result of this successfull attack, we are currently in possession of a rogue Certification Authority certificate. This certificate will be accepted as valid and trusted by all common browsers, because it appears to be signed by one of the root CAs that browsers trust by default. In turn, any website certificate signed by our rogue CA will be trusted as well. If an unsuspecting user is a victim of a man-in-the-middle attack using such a certificate, they will be assured that the connection is secure through all common security indicators: a "https://" url in the address bar, a closed padlock and messages such as "This certificate is OK" if they chose to inspect the certificate.
This successful proof of concept shows that the certificate validation performed by browsers can be subverted and malicious attackers might be able to monitor or tamper with data sent to secure websites. Banking and e-commerce sites are particularly at risk because of the high value of the information secured with HTTPS on those sites. With a rogue CA certificate, attackers would be able to execute practically undetectable phishing attacks against such sites.
The infrastructure of Certification Authorities is meant to prevent exactly this type of attack. Our work shows that known weaknesses in the MD5 hash function can be exploited in realistic attack, due to the fact that even after years of warnings about the lack of security of MD5, some root CAs are still using this broken hash function.
The vulnerability we expose is not in the SSL protocol or the web servers and browsers that implement it, but in the Public Key Infrastructure. This infrastructure has applications in other areas than the web, but we have not investigated all other possible attack scenarios. So other attack scenarios beyond the web are conceivable, such as in the areas of code signing, e-mail security, and in other areas that use certificates for enabling digital signatures or public key encryption.
The rest of this document will explain our work and its implications in a fair amount of detail. In the interest of protecting the Internet against malicious attacks using our technique, we have omitted the critical details of our sophisticated and highly optimized method for computing MD5 collisions. A scientific paper [SSALMOW] about our method is available as of March 11, 2009, so that the affected Certification Authorities have had some time to remedy this vulnerability.
This work was presented on December 30, 2008 at the 25th Annual Chaos Communication Congress in Berlin.
The presentation was originally announced as "Making the theoretical possible. Attacking a critical piece of Internet infrastructure", but finally was entitled "MD5 considered harmful today: creating a rogue CA certificate".
[added Jan. 8, 2009] Video and audio files of the 25C3 presentation in different formats are available from CCC. Look for files with names like "25c3-3023-en-making_the_theoretical_possible.xyz".
Our demo at the 25c3 was based on "sslsniff" by Moxie Marlinspike. We modified it ever so slightly to be more verbose (and hopefully more visual as a demo), to randomize serial numbers (to fix an issue with Firefox) and to sign with a user selected hash function (this allowed for SHA-1 signatures in the leaf certificate). Jacob Appelbaum misspoke when he said that he had written this tool at the start; he correctly explained the tool's origin at the end of the video.
|1. Outline of the attack||
Our attack scenario basically is as follows. We request a legitimate website certificate from a commercial Certification Authority trusted by all common browsers. Since the request is legitimate, the CA signs our certificate and returns it to us. We have picked a CA that uses the MD5 hash function to generate the signature of the certificate, which is important because our certificate request has been crafted to result in an MD5 collision with a second certificate. This second certificate is not a website certificate, but an intermediary CA certificate that can be used to sign arbitrary other website certificates we want to issue. Since the MD5 hashes of both the legitimate and the rogue certificates are the same, the digital signature obtained from the commercial CA can simply be copied into our rogue CA certificate and it will remain valid.
And here is a similar diagram and description of how our attack scenario may be used to impersonate an existing website.
An effective and efficient countermeasure to remediate this vulnerability is to stop using MD5 for digital signatures. Acceptable alternatives to replace MD5, such as SHA-1, are widespread or, like SHA-2, at least becoming rapidly available. Fortunately most CAs have already taken such steps. It would be best to stop using MD5 for the creation of certificates altogether, as its continued use may lead to severe security problems. As a last resort, in the case that MD5 cannot be discontinued immediately, a CA can take simple alternative countermeasures such as randomizing the serial numbers of all newly issued certificates. We stress that such countermeasures are mere band-aids rather than fundamental solutions.
We also recommend to be similarly careful when using SHA-1 and to avoid its adoption in new products, as theoretically it has been shown that it suffers from similar problems as MD5. While the weaknesses in SHA-1 known so far do at this moment not allow the same type of attack scenarios we have shown to be possible for MD5, experts in the area believe that it is sufficiently likely that before long SHA-1 suffers the same fate as MD5.
|2. Certificate background|
|2.1. The need for certificates||
More and more the web is used for exchanging sensitive information. Think of internet banking websites, websites that require a login password, websites on which transactions with financial value can be performed, etc. Often the sensitivity of the information implies that it should not be exposed to unauthorized persons. The exchange of sensitive information requires that the user of a website wants assurance that the website he or she is visiting is not a rogue web site, but is the genuine website of the organisation he or she has to do business with. This is what certificates provide.
A certificate is a document that contains both an identity and a public key, binding them together by a digital signature. This digital signature is created by an organisation called a Certification Authority (CA). This organisation guarantees that upon creating the digital signature it has checked the identity of the public key owner (e.g. by verifying a passport) and that is has checked that this public key owner is in possession of the corresponding private key. Anybody in possession of the CA's public key can verify the CA's signature on the certificate. In this way the CA guarantees that the public key in the certificate belongs to the individual whose identity is in the same certificate.
|2.2. X.509 certificates||
The worldwide accepted standard for digital certificates is called X.509. An X.509 certificate contains the following parts
|2.3. Certificate hierarchy||
X.509 certificates fit into a hierarchy of CA and end user certificates. A signature on data can be verified by the signer's public key. This public key is linked to the owner's identity by a certificate. This link can be verified by verifying the certificate's signature, using the public key of the issuing CA. This CA public key can be found inside the CA certificate, one layer upwards in the hierarchy. This CA certificate will itself be signed by a CA another layer up. At the top of the hierarchy there is the "root certificate", that is "self signed", and that has to be trusted for its own sake.
Many of these self signed root certificates are installed in browsers or operating systems. Applications using them, such as web browsers, will trust them automatically, i.e. the applications will accept the certificates as authenticated and genuine without explicitly asking the user if it should do so. The user may be notified by small security signs such as a closed padlock symbol in the browser's frame. Every certificate that is in a hierarchy of which the root certificate is trusted, is also automatically trusted.
|3. Hash function background|
|3.1. Hash functions||
The cryptographic operation that uses a private key to sign data does not deal directly with the data itself, but with a purportedly unique representation of this data, that has a predetermined fixed length, is short and therefore convenient to work with. This can be compared to a fingerprint as a purportedly unique, short and convenient representation of a human being. The process of creating such short representations of data is called "hashing". Unfortunately, because of the fixed length of the hash, there must exist pairs of different inputs that yield the same hash value. Good hash functions, however, have the property that finding such pairs is extremely difficult, even though they are guaranteed to exist. This is precisely where MD5 has a crucial weakness.
A "hash function" is a cryptographic function that takes as its input a bit string of any length, performs a deterministic algorithm with this input, and produces as output a bit string of fixed length. This output is called the "hash value" or the "hash" of the input. Other terms that can be found for "hash value" are "fingerprint" or "message digest"
|3.2. Hash function requirements||
A hash function should satisfy the following requirements:
A cryptographic hash function is said to be "cryptographically strong" if no better methods are known than the above mentioned brute force type methods with the mentioned complexities.
|3.3. Hash function history||
In 1990 Rivest designed MD4, a hash function with hash length 128 bits. Collisions for MD4 were found in 1995. In 2005 a method for finding preimages was published.
In 1991 Rivest announced MD4's successor, MD5, an improved design, also with hash length 128. In 1993 some weaknesses in its design were pointed out, but it took until 2004 before collisions were found. Producing collisions is nowadays a matter of seconds on a PC. The more advanced type of collisions used in our method can be produced in a matter of hours on special but commodity hardware.
In 1995 NIST proposed SHA-1, a hash function with hash length 160 bits. In 2005 weaknesses in its design were found, showing that finding collisions takes essentially fewer computations than the brute force method of complexity 280, but so far collisions have not been published. NIST advises to replace SHA-1 by 2010.
From their introduction until the present day, the hash functions MD5 and SHA-1 have been the work horses of many cryptographic systems. The replacement of MD5 in common applications is far from completed. The replacement of SHA-1 has not really taken off yet.
Even if SHA-1 would have lived up to its design objectives, its output length of 160 is too small to justify its prolonged use for more than the short term. NIST recognized this at an early stage, and came in 2001 with the new SHA-2 family of hash functions. So far these have withstood all cryptanalysis. Nevertheless NIST saw the need for mobilizing the cryptographic community to get a deeper understanding of hash function design and to come up with better hash functions for the next 10 years. Therefore it has started an open competition for selecting the successor of SHA-2, dubbed for the moment SHA-3. The winner of this competition is expected to be selected by 2012, and will most probably become the de facto hashing standard for the next decade.
|3.4. The Design of MD5||
MD5 uses the Merkle-Damgård iterative construction. The input bit string is padded to a multiple of 512 bits. The length of the unpadded bit string is incorporated in this padding. Then the padded input bit string is divided into blocks of 512 bits each, hereafter called "input blocks".
The core of MD5 is a compression function. The input blocks are fed to MD5 in
successive calls to the compression function, that uses each input block in
updating a state of 128 bits. This state is called
So if we denote the input blocks by
|3.5. Collisions for MD5||
In 2004 Xiaoyun Wang and Hongbo Yu presented a collision for MD5 consisting of 2
input blocks, neglecting padding. Details of the collision construction method were published at
EuroCrypt 2005 in [WY]. Their method works for any value of the initial
In other words, their method produces on input of any 128 bit
Due to the iterative structure of MD5 and to the fact that
In his 2007 MSc thesis [S] (see also [SLW]), Marc Stevens extended this collision construction method
to so called "chosen-prefix collisions". This means that collisions can be found
between different arbitrary initial
Thus, due to the iterative structure of MD5, the chosen-prefix collision construction method
is able to produce on input of any pair of chosen prefixes
To be more precise, for given
The chosen-prefix collision construction method is feasible in practice, but of essentially
higher complexity than the method using identical arbitrary initial
|4. Colliding certificates history|
The main problem in trying to abuse the type of collisions for MD5 that can be
produced with the available methods is that one does not have sufficient control
over the input blocks that are calculated to let the
For a number of different file types, applications of collisions and chosen-prefix collisions have been described in the literature solving the problem of meaningfulness in various ways, see e.g. "The story of Alice and her Boss". We now concentrate on the applications to certificates.
|4.2. Different public keys||
In 2005 Arjen Lenstra, Benne de Weger and Xiaoyun Wang [LW] showed how
collisions of the identical initial
|4.3. Different identities||
In 2007, due to the work of Marc Stevens [S], chosen-prefix collisions for MD5 had become available. This meant that it had become possible to choose any pair of prefixes in the certificates before the RSA moduli, and get certificates with different chosen identities and different public keys (hiding the collision blocks), and with identical MD5 hash values of the "to-be-signed" parts, thus with identical CA signatures. For an example, see "Colliding X.509 Certificates for Different Identities". This is already a much more interesting application. However, a number of limitations complicate this attack when applied to real Certification Authorities.
The attacker has to choose the entire prefix before the collision generation process can begin. This includes fields that the attacker does not directly control, such as the validity period and serial number of the signed certificate. Succesfully predicting those fields requires significant control over the CA's operational procedures, which was thought to be hard to achieve. In addition, the colliding certificates in "Colliding X.509 Certificates for Different Identities" used 8192 bit RSA keys, which are not accepted by all Certification Authorities.
Nevertheless, both the collision construction method of 2005 and the much better chosen-prefix collision construction method of 2007 on certificates based on MD5 were a firm warning to the security community that MD5 should no longer be used for digital signatures in certificates. As attacks only get better with time, it should not come as a surprise that in 2008 we were able to improve upon the previous work on colliding certificates and obtain a rogue CA certificate signed by a commercial Certification Authority.
|5. Attack details|
In July 2008 we set out to investigate if the colliding certificates attack could be applied to a real Certification Authority. The first step was to identify the CAs that still used MD5. Unfortunately it is not possible to determine the hash function a CA uses from the CA certificate. We had to look at (website) certificates issued by the CAs instead. Over the course of a week we spidered the web and collected more than 100,000 SSL certificates, of which about 30,000 were signed by CAs trusted by Firefox. There were six CAs that had issued certificates signed with MD5 in 2008:
Out of the 30,000 certificates we collected, about 9,000 were signed using MD5, and 97% of those were issued by RapidSSL.
It was quite surprising that so many CAs are still using MD5, considering that MD5 has been known to be insecure since the first collisions were presented in 2004. Since these CAs had ignored all previous warnings by the cryptographic community, we felt that it would be appropriate to attempt a practical attack to demonstrate the risk they present to everybody using a web browser that includes their root CA certificates.
We set out to obtain a legitimate website certificate from this CA for which we would be able to predict the entire contents of the "to-be-signed" part. This certificate would contain a specially crafted collision block in the RSA modulus, which would result in a MD5 collision with a second specially crafted certificate. An ignorant observer, no matter how carefully the certificate is inspected, would most likely fail to notice anything suspicious about it.
Once we have obtained a valid signature from this CA, we would copy it into our second certificate. Since the "to-be-signed" parts of both certificates have the same MD5 hash, the signature would be valid for the second certificate. Thus the real CA, without knowing it, would provide us with a valid signature for a certificate it has never seen, and should never have signed.
The potential of this attack scenario is even greater than just obtaining a rogue certificate for a single secure website. This is because our rogue certificate doesn't have to be a website certificate, but it could be an intermediary CA certificate. Although the certificate originally signed by the real CA has in the "basic constraints" field the flag "CA = FALSE", indicating that this certificate cannot be used to validate other certificates in a certificate chain, our rogue certificate has the same flag set to "CA = TRUE". We are in possession of the private key corresponding to the public key in this rogue CA certificate. As a result we are able to issue any number of certificates to anybody we choose, and they will be recognized as valid certificates by anybody trusting the real CA, which is all Internet users using one of the common web browsers.
|5.2. Validity period and serial number prediction||
The first big challenge was to figure out a way to predict the validity period and serial numbers of the certificates issued by a Certification Authority, since these are the only two fields in the chosen prefix that the attacker does not control directly.
Predicting the validity period of the certificate turned out to be an easy task. The system used by RapidSSL and FreeSSL for issuing certificates is fully automated and each certificate is issued exactly 6 seconds after the user clicks on the final "Ok" button to complete the purchase. Since the timestamps in the certificate validity period field use granularity in seconds, we could easily click this button at an arbitrary time T-6 seconds and get a certificate with a validity from time T to T+(1 year). With full control over the validity period, we move on to solving the serial number prediction problem.
We estimated that it would take us 3 days to generate our pair of colliding certificates, which meant that we needed to predict the serial number of the certificate three days in advance. Most of the certificates in our data set had serial numbers that looked random and thus hard to predict, but we noticed that RapidSSL and FreeSSL were using sequential serial numbers.
One can think of the sequential serial number as a remote counter. Each certificate we purchase from the CA reveals the current value of the counter and increments it by one. During the normal operation of the Certification Authority, the counter is incremented based on the number of the issued certificates. Statistical analysis of the serial numbers from the 9,251 certificates in our RapidSSL data set revealed that on average this number has a pretty low variance. In a three day period from Thursday night to Sunday night, the CA typically issues between 800 and 1000 certificates:
The anomaly with the last data point in the graph above is a result of the 100 certificates we purchased during one particular weekend at the end of September. Without our involvement, the increment during that weekend would have been a little less than 1000.
With this information, we decided on the following plan: On Thursday night, we would purchase a certificate to get the current value of the serial number S. We would predict that on Sunday night (time T) the serial number would be a little less than S+1000. A few hours before time T, we would start purchasing certificates to push up the serial number closer and closer to the target number, finally getting it to S+999 about 30 seconds before time T. If we were lucky and the CA received no other certificate request during those last 30 seconds, we would be able to send a request at time T and get our predicted serial number.
|5.3. Constructing colliding certificates||
In this section a detailed description is given of the contents of the two certificates with identical signatures that we now have. The certificate we obtained from the commercial CA will hereafter be called the "real certificate". Its sibling certificate is our "rogue CA certificate", and it is a sibling in the sense that the "to-be-signed" parts of these certificates were constructed together, to constitute a collision for the MD5 hash function.
The real certificate was requested from the CA by sending it a "Certificate Signing Request" (CSR). The initial engineering task we had to do is to construct the contents of the CSR and the structure and contents of the rogue CA certificate in such a way that the resulting real certificate (constructed by the CA) and the rogue CA certificate (constructed by us) will be perfectly aligned. The second, and much more intricate, engineering task was to predict the contents of various fields in the real certificates, and based on this, construct the contents of various other fields in both certificates, in such a way that we would be able to obtain an MD5 collision.
Complying with the X.509 standard [HPFS], each of the two certificates consists of:
|5.3.1. The real certificate and the CSR||
The structure of the "to-be-signed" part of the real certificate is prescribed by the CA. We have control over contents and length of a few fields only. The structure is as follows:
To obtain this certificate we constructed a Certificate Signing Request (CSR), according to the required PKCS#10 format. This request contains the following data:
The requirements for the construction of the key pair for the real certificate were as follows:
Then we had three "near collision blocks" of 512 bits (i.e. 64 bytes) each. The entire "collision block" thus consists of 96 + 3 x 512 = 1632 bits, i.e. 204 bytes.
At the end of the last near collision block the MD5 collision is established. The collision block was computed by the collision finding method described below. That means that we do not have much control over its contents, we simply have to accept the random looking data that comes out of the cluster of game consoles.
Then we have 256 - 26 - 204 = 26 bytes (= 208 bits) left in the modulus. We used them to make sure that the modulus will have the form of a product of two primes, known to us, in such a way that we can also compute the corresponding private exponent. This is done as follows.
We start with a fixed bit string
This method seems coarse, but we don't know how to do it better, and it works well in
practice. The reason is as follows. The value of
The resulting RSA modulus thus is rather unbalanced: its prime factors have 1824 and 224 bits respectively. This is estimated to be just out of reach of current factoring methods, in particular ECM.
Note that the private key corresponding to the real certificate is used only for
signing the CSR, and that the real certificate itself is used only for providing
us with a valid signature. This certificate and corresponding key pair will
never be used in some real application, though such use would in principle not
harm anybody. An interesting consequence of this is that the creation of a
secure RSA modulus is not necessary at all, we could have taken any number for
An interesting point to make is that the "path length" field in the "basic constraints" field of the CA's root certificate is not set. The path length field is the maximum number of intermediate CA levels in the CA hierarchy that this CA will allow between its own level and the end user level. If the CA would have set the path length to 0, then applications checking for the path length would have rejected certificates signed by our rogue CA, as there the path length is 1: our rogue CA is at the level between the end user certificate and the root CA certificate.
For the collision construction it is absolutely necessary that we are able to predict at bit level the entire contents of the "to-be-signed" part of the certificate up to the point where the collision block starts. The only fields that are not either constant, or directly controlled by us are the validity period and the serial number. However, using the techniques described in section 5.2, we were able to predict the values of these two fields.
|5.3.2. The rogue CA certificate||
The structure of the "to-be-signed" part of the rogue CA certificate is chosen by us (but of course within the limits of the X.509 standard), to facilitate the collision construction process. Here the only field prescribed by the CA is the issuer Distinguished Name. The structure of the rogue CA certificate is as follows:
The field that makes this certificate a "CA certificate" is the "basic constraints" fields, with the value "CA = TRUE".
The RSA key pair, of which the public key is inside the rogue CA certificate, was generated by OpenSSL in a standard key pair generation process. The private key is kept in a safe place and we're not planning on releasing it publicly. This private key is required for issuing end user certificates signed by the rogue CA, so this is quite sensitive.
To minimize this danger, the validity period of the rogue CA certificate is set to a relatively short period in the year 2004 (the month August 2004, when Xiaoyun Wang presented the first MD5 collision at the Crypto Conference in Santa Barbara). This backdating is done on purpose. It means that even if the private key to our rogue CA certificate falls into the wrong hands, the certificate would be rejected by all common browsers because of its expiration date, except when the system clock of the user's computer would be set back to within this short period in 2004. The probability that this will happen unintentionally is negligible.
Note that the serial number we have chosen is a rather low one. We do not know whether an actual certificate issued by this CA with this serial number was already in existence. We took a low number because that most probably implies that a possible other certificate with that serial number has already expired a long time ago.
The bytes 0 - 473 in the real certificate (the fields up to the modulus, and the first 5 bytes of the modulus field which are a predictable header) are pretty much fixed by CA requirements. Those 474 bytes form the "chosen prefix" on the real certificate's side. For this certificate we chose to have a 2048 bit RSA key. The main reason for this size is the fact that we have to hide the collision block in there. Our collision construction method enables us to make collision blocks of 1632 bits, so 2048 seems a reasonable choice. Moreover 2048 bit RSA moduli are quite common, so no suspicion is raised.
At the side of the rogue certificate we could not use the public key modulus for hiding the collision block. The reason for this is that we want to set the CA flag in the "basic constraints" field to the value "TRUE", contrary to the corresponding field in the real certificate. The problem is that this "basic constraints" field occurs after the public key, and with the type of collisions we have, we cannot allow any bit difference after the collision block. So in the rogue CA certificate, both the public key and the extensions up to the "basic constraints" field have to be placed before the collision block starts.
To accomplish this, it was helpful that the subject Distinguished Name in the real certificate has considerable length, that can easily be stretched by choosing an appropriately sized Common Name. At the same time we took the subject Distinguished Name in the rogue CA certificate as short as possible, to allow sufficient space for a full 1024 bit public key and the "basic constraints" field, all this to be placed in the area corresponding to the subject Distinguished Name field of the real certificate. In this way all necessary fields belonging to the "chosen prefix" of the rogue CA certificate could be placed in the first 477 bytes of the "to-be-signed" part.
The next problem that had to be solved is that we need a field in the extensions of the rogue CA certificate in which we have to hide at least 3 × 512 + 96 = 1632 bits (204 bytes) of the collision block. This is random looking data, over which we do not have much control, as this data comes out of the collision construction method, and we do not control that enough to be able to make such a block with some real world interpretation. And we have one more problem, as all data after the public key from the real certificate needs to be copied bitwise into the rogue CA certificate. This data does have an interpretation relevant to the real certificate, but this interpretation most probably is not relevant, even disturbing, for the rogue CA certificate. Together those fields have a considerable size, namely 427 bytes.
These two problems together ask for a place in the rogue CA certificate to hide 427 bytes of data in, some of which are random looking, and some of which have an interpretation that should not be shown in certificate viewers. The solution we adopted is to define a so called "Netscape Comment" block. This is a proprietary extension in which any data can be stored, which will be ignored by most certificate processing applications, including the major browsers. There is a small problem here in that formally the contents of this field must be of type IA5String, and our content is not of that form. An ASN.1 parser that strictly follows the standard will complain about this, as happens e.g. with Peter Gutmann's program "dumpasn1". But as most application software ignores the extension anyway, the certificate with this standard violating field in it will still be accepted in practice. It is conceivable that the 427 bytes could have been hidden in a different extension field (there is even an X.509 certificate with a movie hidden inside).
The Netscape Comment extension requires a header of 23 bytes. So the contents of the extension will start at byte 500, and that now can become exactly the spot where the birthday bits start, in both certificates. This explains the fact that in the real certificate, where the modulus starts at byte 474, there are first 26 random bytes to fill up the space. In a technical sense the "chosen prefixes" of the two certificates end at byte 499. The collision block starts at byte 500. This collision block consists, as said before, of 96 + 3 × 512 = 1632 bits = 204 bytes, so it ends at byte 703.
So from byte 500 until byte 926 there are 427 bytes inside the rogue CA certificate that are quite meaningless to the certificate itself. We call this big, seemingly unnecessary block of data a "tumor". From our point of view it's a benign one.
We start the modulus of the public key of the real certificate at byte 474. As
explained above, for alignment with the rogue CA certificate we need to fill up
the first 26 bytes somehow, so we just chose 26 random bytes to start the
modulus with. We have to think in terms of blocks of 64 bytes, as MD5 splits up
the input in 512 bit blocks. So we now have 12 bytes left until the next 512 bit
block starts with a new
So at byte 500 of the "to-be-signed" part we have reached the stage where in the real certificate we are inside the modulus, and in the rogue CA certificate we are inside the tumor.
The MD5 collision construction method we used consists of two stages: a
"birthdaying" stage and a "near collision" stage. The birthdaying stage is meant
to prepare the next difference in
Then there are three MD5 input blocks of 512 bits each, i.e. 3 × 64 bytes,
byte numbers 512 to 703. Each of the three input blocks was constructed by our
collision finding method to cancel out certain differences in the
In the real certificate we now have 26 + 12 + 3 × 64 = 230 bytes of the modulus. The remaining 26 bytes of the modulus were computed to ensure that it's a secure RSA modulus, as explained above.
Then in the real certificate the public exponent and all "version 3 extensions" follow. In total this required 197 bytes. They were generated in the real certificate by the CA, and then simply copied into the tumor in the rogue CA certificate. One of the fields in there is the "basic constraints" field, set to "CA = FALSE". Interestingly enough, the bytes of this field are also present in the tumor, but the certificate processing software will not recognize them as having any interpretation attached to them, as they are part of the tumor.
After the extensions in the real certificate and the tumor in the rogue CA certificate have ended, at byte 926, the "to-be-signed" part of the certificates are finished, and the MD5 hash function can be called on these 926 byte long strings to produce the colliding value.
What finally follows inside the certificate are the "signature algorithm" field and the "signature" field. Naturally, those fields are identical in both the certificates.
For download: binary files containing both "to-be-signed" parts:
To check that the two "to-be-signed" parts collide, run an MD5 program on the binary files. To check that the two "to-be-signed" parts differ, run e.g. a SHA-1 program on the binary files.
Here are the MD5 Intermediate Hash Values:
Here are the XOR differences of the MD5 Intermediate Hash Values:
The SHA-1 hash values respectively are
The differences in the MD5
Each line represents the 128 bits of one
|5.3.4. Collision construction||
The chosen-prefix collision construction method is basically described in our
EuroCrypt 2007 paper [SLW]. However, some crucial improvements
to this method have been developed that made the present application possible.
Details of those improvements have been made available in an
academic paper [SSALMOW]. The basic
idea of the method is twofold. In the second part of the method differential paths
are constructed that are capable of eliminating special bit differences in the
The first part of the method prepares the
To illustrate the collision construction we have made some nice pictures of bit differences in the internal states of MD5.
The entire inputs for MD5 consist of pairs of 15 input blocks of 512 bits each.
In each compression function call the input
In the pictures above and to the right the differences between
A single attempt for constructing a chosen-prefix collision costs about a little more than a day.
The first stage consisting of the
birthday search is computationally the most expensive. Luckily
it is also very suited for the special SPU cores of the Cell
Processor that the Sony PlayStation 3 uses. We had more than 200 PS3s
at our disposal, located at the "PlayStation
Lab" of Arjen Lenstra's Laboratory for Cryptologic Algorithms at EPFL, Lausanne, Switzerland
(see the picture; acquisition of this cluster was sponsored by EPFL DIT and by a matching equipment grant
from the Swiss National Science Foundation).
The birthdaying takes about 18 hours on the 200 PS3s using 30GB of
memory that was equally divided over the PS3s.
The second stage computes the 3 collision blocks that
|5.4. Real life execution||
For one single attempt to obtain a CA signature we had to guess the serial number and the validity period the CA was going to use a few days in advance. This information is crucial to get the chosen prefix for the real certificate's "to-be-signed" part, and this is input for the collision construction method. Once a collision has been found, the public key modulus has to be computed. Then the Certificate Signing Request could be composed and sent to the CA at the precise time, and then we had to wait for the CA to return a signed certificate, and see if the serial number and validity period were guessed correctly.
We performed our attemps over the weekend to take advantage of the lower load of the CA. Each weekend we would generate from one to three collisions with serial numbers within the predicted range and purchase enough certificates to increment the serial number up to the desired value. Unfortunately, the first three tries failed due to timing issues.
We received a couple of certificates signed only a second too late and a number of times requests by other customers of the CA interfered with our attempts to get the right serial number. Finally, on the fourth weekend we succeeded in getting a certificate with the right serial number signed at the right time.
Each certificate we purchased from RapidSSL cost us USD 45. However, the CA allowed us to reissue each certificate up to 20 times for free, which meant that a single certificate request cost us only USD 2.25. In total, we spent USD 657 on purchasing certificates for this project.
The results of executing our scenario are two certificates. We make them available for download in the common .cer and .pem formats. Moreover we give the output of the "dumpasn1" program applied to the certificates, to presents the certificate structure in a more readable form.
The private key for the certificate is not available for download, but this should not prevent anybody from verifying the authenticity of our certificate.
To see our rogue CA certificate in action, set your computer's system clock to August 2004, and then click https://i.broke.the.internet.and.all.i.got.was.this.t-shirt.phreedom.org/. Do not forget to reset your system clock afterwards.
Here you can download in different formats the example website certificate signed by our rogue CA:
A short summary of our result is that we have come in possession of a "rogue" Certification Authority whose certificate will be accepted by default by most browsers. Thus we are able to issue SSL certificates to any website we like, including rogue websites claiming to be legitimate ones.
This has been possible by exploiting the following weaknesses:
Any website, whether it is secure (i.e. uses SSL) or not, whether it has an MD5-based, SHA-1-based, SHA-256-based, or any other type of certificate, irrespective of which Certification Authority issued the certificate, can be impersonated, in particular not only genuine websites that have an MD5-based certificate are vulnerable.
The rogue Certification Authority we have has been set up for demonstration purposes only. Its certificate has on purpose been backdated, so that it will be recognized by browsers as having expired more that 4 years ago. Moreover the corresponding private key, needed to issue certificates with, will be kept in a safe place to prevent abuse. Thus it is unlikely that with this particular rogue Certification Authority harm can be done.
However, we have shown that it is possible in practice, with some non-negligible but doable effort, to set up such a rogue Certification Authority. If we can do it now, others will be capable of doing it as well in the foreseeable future. In theory it is even conceivable that other groups or individuals already have developed similar skills and knowledge as we have, and are already able to impersonate websites at their will. We have no indication that this is actually the case. Because of this we believe it is important to stop using MD5 as soon as possible.
In combination with known weaknesses in the Domain Name System (DNS) protocol such as Kaminsky's "DNS Flaw" [K2] (see also [OMM]), the vulnerability we exposed opens the door to virtually undetectable phishing attacks. Without being aware of it, users can be redirected to malicious sites that appear exactly the same as the trusted banking or e-commerce websites they believe to be visiting. User passwords and other private data can fall into wrong hands.
We do not want to help persons with criminal intent. Therefore we will for the time being not release the full details of how we have been able to obtain the rogue Certification Authority certificate. It should however be noted that the basic principles of constructing "chosen-prefix collisions" for MD5 were published already in May 2007, see [SLW], and the "Chosen-prefix collisions" website. To make our present results possible these publicly known techniques have been improved at crucial points. These improvements are published in an academic paper [SSALMOW]. Apart from that, somebody who wants to redo our work has to do considerable implementation and optimization efforts. For the time being we do not plan to release such implementation details.
Other applications than secure web communication using SSL might be vulnerable as well. Every Certification Authority that will honor requests for MD5-based certificates and that has sufficiently predictable serial numbers and validity periods, may be vulnerable to similar attacks. This may include Certification Authorities in the areas of e-mail signing and encryption, software signing, non-repudiation services, etc.
We feel safe to say that the impact of our work is very limited in the sense that we think it is highly unlikely that our rogue Certification Authority can be abused by anyone (including ourselves!). We have released sufficient detail of our methods to show that we have a realistic attack scenario. The proof of the pudding is our colliding certificates, available for download on this website, so that everyone can verify our claims, and in the demonstration we have provided. Our intent is to raise awareness that MD5 is broken so drastically that its continued use in digital signature schemes and certificates poses realistic threats.
Our desired impact is that Certification Authorities will stop using MD5 in issuing new certificates. We also hope that use of MD5 in other applications will be reconsidered as well. Proper alternatives are readily available.
|6.1. Revocation issues||
One interesting observation from our work is that the rogue certificate we have created is very hard to revoke using the revocation mechanism available in common browsers. There are two protocols for certificate revocation, CRL and OSCP. Until Firefox 3 and IE 7, certificate revocation was disabled by default. Even in the latest versions, the browsers rely on the certificate to include a URL pointing to a revocation server. Our rogue CA certificate had very limited space and it was impossible to include such a URL, which means that by default both Internet Explorer and Firefox are unable to find a revocation server to check our certificate against.
As a workaround, system admistrators can configure the browsers to always query a company-specific OSCP server for revocation information, but this option is not easily usable by individual end users. This points out an important problem that still needs to be resolved, in case similar attacks result in certificates that need to be revoked in the future.
Another interesting scenario is that our result opens possibilities for Denial of Service attacks on existing Certification Authorities. If an attacker wants to have somebody's certificate revoked (e.g. of a competing web shop), he can use our techniques to obtain a rogue certificate from the same Certification Authority, which has the same serial number as the target certificate. As soon as this rogue certificate gets published, the Certification Authority has no choice but to revoke it. Since revocation happens only on the basis of the serial number of the certificate, at the same time the legitimate certificate of the victim will be revoked. This attack will become extremely powerful if the root certificate of the Certification Authority is targeted. An attacker can thus force revocation of this root certificate, and with that of the entire certificate tree depending on it.
First and foremost, there is no proper excuse for continued use of a broken cryptographic primitive when sufficiently strong alternatives are readily available, for example SHA-2.
Secondly, there is no substitute for security awareness. Openness about security problems, vulnerabilities and technical possibilities is invaluable to make the Internet a safer place. Advice from experts should be taken seriously and early in the process. In this case, MD5 should have been phased out soon after 2004.
This being said, we continue discussing countermeasures that can be taken by different stakeholders.
Users browsing the web might be redirected to a rogue website that has obtained an erroneously trusted SSL certificate. A user cannot do anything to prevent this from happening. She might try to detect it, but we realize that this is somewhat cumbersome for the average web user. The user may click on the proper button or menu item in the browser, to view details of the certificate for the website she is visiting. The used hash function is visible in the "Signature algorithm" field, see the picture to the right, where "md5RSA" means that MD5 was used for signing the certificate. When all certificates in the chain up to the root CA certificate use other hash functions than MD5 such as SHA-1, our attack has not been used.
When MD5 has been used, fraud may be detected by inspection of the certificate at bit level. When the certificate has a "tumor" inside, this may or may not be shown in a user friendly certificate viewer. See the picture on the left, where there is a strange "Netscape comment" field with the value 3. Actually this is a 427 byte field that for technical reasons cannot be shown on screen in this viewer. An expert who can inspect the certificate with tools such as "dumpasn1", will be able to identify such a strange field in the certificate. However, a standard user will likely not notice anything. Therefore inspection of certificates is not a strong countermeasure.
Preventive countermeasures are much more powerful. We now describe the preventive countermeasures that CAs and browser vendors can take, but users cannot.
Certification Authorities are recommended to stop using MD5 altogether. To prevent chosen prefix attacks, they can add a sufficient amount of randomness to the certificate fields, preferably as far to the start of the certificate as possible. The serial number is a good field to use for this, as it comes very close to the beginning of the certificate, and it allows for 20 bytes of randomness. Many Certification Authorities seem to use already random serial numbers, albeit probably for different reasons. The German Signature Law prescribes random serial numbers for qualified SHA-1 certificates (see [B]) to provide some randomness in the hash input. We do note however that this use of randomness in the serial number is a workaround, made possible by lucky choices in the X.509 standard. It is not a bad idea in general to add randomness to a hash input when a possible attacker is able to choose the input. A much more reliable, since designed, solution is to use randomized hashing, see [HK] for a paper, and [BS] for an implementation in Firefox. Such a solution introduces ramdomization as a "mode of operation" for hash functions, which is a much more fundamental approach to the problem than relying on features that happen to be present in existing standards for non-security reasons, or for no reason at all.
Other measures a Certification Authority may take are to make active use of the "pathlength" parameter in the "basic constraints" field. When a CA has the policy of only issuing end user certificates, this policy can be enforced technically by setting the path length to 0. Our specific attack scenario can be detected because there is a three-level hierarchy (see the upper picture). This detection can however be avoided by a small variant of our attack scenario, where the rogue colliding certificate is not a CA certificate but a website certificate in itself (see the lower picture). We note that such a rogue colliding end user certificate may be constructed without a tumor: the collision block can then be hidden again inside the public key. This makes detection much more difficult. Other ways of formatting the certificate may be possible.
An interesting question is whether CAs should revoke existing certificates signed using MD5. One may argue that the present attack scenario has in principle been possible since May 2007, and that therefore all certificates (or all CA certificates) signed with MD5 that have been issued after this date may have been compromised. Whether they really have been compromised is not relevant. What is relevant is that the relying party who needs to trust the certificate does not have a proper way of checking whether the certificate is to be trusted or not. One may even argue that all older certificates based on MD5 should be revoked, as for an attacker constructing rogue certificates it is easy to backdate them to any date in the past he likes, so any MD5-based certificate may be a forgery. On the other hand, one may argue that the likelihood of these scenarios is quite small, and that the cost of replacing many MD5-based certificates may be substantial, so that therefore the risks of continued use of existing MD5-based certificates may be seen as acceptable. Regardless, MD5 should no longer be used for new certificates.
Finally, Certification Authorities can monitor the flow of Certificate Signing Requests that they receive for abnormal series of requests, such as many requests by the same user in quick succession.
Browser and Operating System vendors such as Microsoft (vendor of Windows and Internet Explorer) and Mozilla (vendor of Firefox) can implement pop-up warnings to the users when an MD5-based certificate is encountered. Blocking MD5-based certificates is also possible, but rather drastic. Browser vendors can implement path length checking. Furthermore, it is the browser vendors who determine which Certification Authorities are present in the trust lists inside the browsers or operating systems. This puts them in a good position to put pressure on the Certification Authorities to adopt proper procedures and use strong cryptographic primitives. We have contacted the mentioned browser vendors so that they are aware of the problem.
Website owners can check whether their Certification Authority has proper procedures, notably does not use unacceptable hash functions such as MD5. Website owners can ask their CAs to switch to more secure hash functions such as SHA-2.
|7.1. Responses after publication||
[added Dec. 31, 2008] Verisign, the owner of the RapidSSL brand, has immediately responded when our work became public. See the announcement "This morning's MD5 attack - resolved" by Tim Callan. Some interesting quotes from this blog:
[added Jan. 6, 2009] See also a Verisign press release and a RapidSSL Advisory.
[added Dec. 31, 2008] Microsoft issued a Security Advisory (961509): "Research proves feasibility of collision attacks against MD5", and a Microsoft Technet blog item Information regarding MD5 collisions problem by Damian Hasse.
[added Dec. 31, 2008] Mozilla has a short item in the "Mozilla Security Blog": "MD5 Weaknesses Could Lead to Certificate Forgery" by Johnathan Nightingale.
[added Jan. 2, 2009] TC TrustCenter published a response (in English) as well as a Stellungnahme (auf Deutsch). Our attack does not apply to them because they ceased using automated online certification request processing, and because they use random serial numbers. They say they have started to replace MD5-based certificates.
[added Jan. 2, 2009] RSA has an entry in the "Speaking of Security blog": "A Real New Year's Hash" by Sam Curry.
[added Jan. 2, 2009] US-CERT, the US Department of Homeland Security's Computer Emergency Readiness Team, published Vulnerability Note VU#836068: "MD5 vulnerable to collision attacks". Interesting quotes from this note:
[added Jan. 15, 2009] Cisco published a "Cisco Security Response: MD5 Hashes May Allow for Certificate Spoofing " because they think some of their products may be vulnerable as well.
|8. Frequently asked questions||
Can your proof of concept rogue CA certificate be misused?
Are all digital certificates/signatures broken?
What is the status of collision attacks for SHA-1?
Suppose that a criminal creates by our method his own rogue Certification Authority that is trusted
by all browsers.
What should websites do that have digital certificates signed with MD5?
Where does the title "MD5 considered harmful today" of this website come from?
When and where will the paper with details on the chosen-prefix collision construction method be published?
Will source and/or executable code of the chosen-prefix collision construction method be made public?
How do we know that the CAs still using MD5 havenít already been attacked by somebody else?
What are the known weaknesses in DNS that you refer to in the press release?
How much did your proof of concept research cost?
Why were game consoles used? What other hardware is suitable?
How long would it take for someone else to try and imitate you?
What is the best way to ensure that the attack scenario we developed is not possible in the future?
left to right: Benne, Arjen, Marc, Jake, David, Alex (missing: Dag Arne)
(photo: Alexander Klink)
Independent security researcher,
New York, USA
Centrum Wiskunde & Informatica (CWI),
Amsterdam, The Netherlands
News at the CWI website: English, Nederlands.
Noisebridge, The Tor Project,
San Francisco, USA
LACAL - LAboratory for Cryptologic ALgorithms,
EPFL - École Polytechnique Fédérale de Lausanne,
Computer Science Division,|
University of California at Berkeley,
Dag Arne Osvik|
LACAL - LAboratory for Cryptologic ALgorithms,
EPFL - École Polytechnique Fédérale de Lausanne
Benne de Weger|
EiPSI - Eindhoven Institute for the Protection of Systems and Information,
TU/e - Eindhoven University of Technology,
Eindhoven, The Netherlands
For press and general inquiries, please email firstname.lastname@example.org.
There is a video of Marc explaining the work (in Dutch):
For more information about this project, see the following project websites:
[added Jan. 8, 2009] Video and audio files of the 25C3 presentation in different formats are available from CCC. Look for files with names like "25c3-3023-en-making_the_theoretical_possible.xyz". Interestingly, as a means for verifying the integrity of the downloads they provide the, uhm, MD5 hashes of the files.
[added June 2, 2009] The paper [SSALMOW] has been updated, the final version will appear in the Proceedings of Crypto 2009, and is also available online. This final version also contains details about a new development: we now have a single block chosen-prefix collision.
[added June 16, 2009] The "full paper" [SLW2] is now available, including all the material of the other papers and descriptions of all the applications we have developed. It has been submitted to the International Journal of Applied Cryptography.
|Latest Modification||June 16, 2011|