Vulnerability of software integrity and code signing applications to chosen-prefix collisions for MD5November 30, 2007
|We announce two different Win32 executable files with different functionality but identical MD5 hash values. This shows that trust in MD5 as a tool for verifying software integrity, and as a hash function used in code signing, has become questionable.
|Software Integrity Checksums
|Software is vulnerable to threats on its integrity. For example, when a program is made available as an executable file on the web for downloading, some bits of the file may be accidentally changed during the download process. Or, when a program is stored on a hard disk that becomes victim to a virus attack, the program code may be changed on purpose, to get an executable file that has been infected by the virus. A popular method of thwarting those threats is to compute a checksum of the original executable file, and to publish this checksum along with the program on the download webpage, or to store this checksum in a database. At any later moment the user of the program is then able to recompute the checksum from the file as it is then residing on his computer, and verify the checksum with the stored original value. When the checksum has changed, apparently the file has changed, and should not be trusted any longer.
Software vendors have set up code signing schemes to assist users in
distinguishing software coming from a trusted source from malware. A public key
signature scheme based on PKI (Public Key Infrastructure) is used to provide
the user, along with the software itself, with a digital signature of the
software. Such a signature is unforgeable in a strong cryptographic sense, as it has been
constructed using the software vendors private key. The users can verify this
signature using the corresponding public key, usually made available inside
a certificate. A prime example of such a code signing
scheme is Microsoft Authenticode.
Hash functions play an essential role in the process of making digital signatures. The private key operation is not applied directly to the to be signed file, but to a hash value of this file instead.
|Cryptographic hash functions
|Cryptographic hash functions are widely known and easy to calculate functions that are believed to have the property that no two messages with identical hash values can be found. A well known and widely used cryptographic hash function is MD5. Especially in software integrity applications and code signing programs MD5 has been a popular hash function for many years now. Many websites making software (source code or executable) available for download provide on the website with the software itself its MD5 checksum. Another much used hash function is SHA-1. For the last decade MD5 and SHA-1 have been the default choice for hash functions in most code signing schemes.
|Hash functions should be collision resistant
|In order for a software integrity checksum or a digital signature based on a hash value to be of any value, the cryptographic hash function that is used must be collision resistant. That is, it must be practically impossible to find different messages that have the same hash value. Otherwise, a miscreant can use a single hash value to commit to more than a single file.
|MD5 is not collision resistant
|The cryptographic hash function MD5 was shown to be not collision resistant, by prof. Xiaoyun Wang and her co-authors, in 2004 (see the EuroCrypt 2005 paper "How to break MD5 and other hash functions"). Although devastating from a theoretical point of view, many practitioners believed that the practical repercussions of prof. Wang’s attack would be minimal because of the lack of useful structure of the collisions that could be constructed.
|When incorporating collisions into a pair of files, prof. Wang's original method of constructing collisions required the files to be exactly equal, except for the relatively small random looking block of bytes comprising the collision. Last year we showed, with the kind assistance of prof. Wang, how her attack on MD5 can be improved to provide chosen-prefix collisions. Our chosen-prefix collisions have only the requirement that after the collision the files should be exactly equal. Before the collision the two files, for which a collision is to be found, can be anything: our chosen-prefix collision finding method will always produce a collision that can be incorporated into the two files, irrespective of what data is present before the collision.
|Eric Verheul pointed out to us that random bytes may be appended to various formats, without changing their functionality. This idea also applies to executable files. So, by appending a chosen-prefix collision, any perfectly harmless program can be made to collide with any other program, such as a piece of malware. The two files show unchanged behaviour when executed, one good and the other bad, but now the files have identical MD5 hash values. A digital signature obtained on the harmless program will then be equally valid for the malware.
|Proof of concept
|We now present our proof of concept pair of Win32 executable files that have different functionality but identical MD5 hash values.
|A good executable
We wrote the program
HelloWorld.cpp, with the following C++ source code:
As you can see from the source code, it's completely innocent. Here is the
The MD5 hash value of the executable file
|A bad executable
We also wrote the program
GoodbyeWorld.cpp, with the following C++ source code:
As you can see from the source code, it's not really harmful, just annoying. Here is the
The MD5 hash value of the executable file
Building both executables from source code was done by a standard MS Visual C++ installation.
|Making the executables collide
As a proof of concept we applied our
chosen-prefix collision finding method
to the files
This came down to carefully constructing two blocks of 832 bytes, and appending
them to the end of both files. These 2 times 832 bytes have been constructed such that
the resulting files, renamed to
GoodbyeWorld-colliding.exe, have exactly identical MD5 hash values.
|The colliding executables
Here are the two colliding files, the
HelloWorld-colliding.exe, and the
HelloWorld-colliding.exe has exactly the same good functionality as
has exactly the same annoying functionality as
are different files, containing programs with different functionality,
they both have the MD5 hash value
|Signing the code
To demonstrate the code signing vulnerability, we have set up our own
Code Signing Authority, complete with a public key certificate
and corresponding private key. Then we signed
the colliding executable files using the private key of our Code Signing authority.
The signature is
This signature was made by applying RSA to the standard PKCS#1 padded hash value.
You are invited to verify for yourself the following essential points.
See also the output of fciv - Microsoft's File Checksum Integrity Verifier utility.
Here is a file with all IHVs.
Here is a pretty picture showing the bit differences in the internal states of MD5 as it is applied to the two colliding files.
It is important to note that the hash value shared by the two different
files is a result of the collision construction process. We cannot target
a given hash value, and produce a (meaningful) input bit string hashing
to that given value. In cryptographic terms: our attack is an attack on
collision resistance, not on preimage or second preimage resistance.
This implies that both colliding files
have to be specially prepared by the attacker, before they are published
on a download site or presented for signing by a code signing scheme.
Existing files with a known hash that have not been prepared in this
way are not vulnerable.
What we can easily do, however, is to do a brute force search on a few additional bytes, to e.g. let the first three and the last three bytes of the hash value match with a given target. This already may introduce an additional vulnerability, as many people (including at least two of us) usually only look at the first and last three bytes when checking a hash value.
For abusing a chosen-prefix collision on a software integrity protection or a code signing scheme, the attacker should be able to manipulate the files before they are being hashed and/or signed. This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process. An attacker with such access can most probably do more harm anyway, without the need for chosen-prefix collisions, to get ´official´ digital signatures on malware.
On the other hand, there is the viewpoint of the relying party, i.e. the user downloading hashed or signed code who needs some guarantee that this software can be trusted. This relying party can not be sure anymore that the published hash value or the digital signature is valid for only the executable file he downloaded. There might very well be a sibling file with the same hash value or digital signature, while only one of these siblings went through the proper hashing or signing procedure. Especially when the software integrity verification takes place under the hood, with the user not knowing that the operating system or some hidden application is silently verifying digital signatures on software to be installed, the user may be more easily lured into installing malware.
Note that it is not necessary for an attacker to build both executables from source code. It is perfectly well possible to take as the first file any executable from any source, and as the second file produce a second executable as malware. Then a byte block to be appended to both files can be found such that the resulting files have the same MD5 hash value. If an attacker can then get the first file to be signed, e.g. by the original software vendor, this signature will also be valid for the attacker-constructed malware.
It can be objected that at a bit-level inspection of the executable(s) it will become apparent that there is a strange non-functional random looking block of bytes at the end of the executable. This may be true for specialists knowing what to expect in executables, at least for us it's a complete mystery what should and should not be expected inside executables. Moreover, it could very well be possible to hide the colliding blocks not at the end, but hidden somewhere else in a less conspicuous place of the binary executable files.
Note that with the technique of constructing multi-collisions as described on our Nostradamus website, it is very well feasible to produce as many different (executable) files as one wants, all with the same MD5 hash.
One idea to thwart our attacks is to provide software with self-checking components, e.g. the software checks the integrity of its own executable as the first step of the execution. This might be a countermeasure against our attack. Nevertheless we think that the proper way to go is to avoid relying on broken crypto such as MD5 altogether.
Given the recent insights into the weaknesses of MD5, the bottomline of our work is:
Note that also the collision resistance of SHA-1 does not live up to its design criteria anymore, though attacking SHA-1 is still much more difficult than attacking MD5. See the IAIK Krypto collision website for the present state of affairs.
|Hasn't this been done before?
In December 2004 Dan Kaminsky
and Ondrej Mikle, and later
Peter Selinger, published
similar attacks, based on the Wang-type collisions that require two binary files that differ only in
the colliding blocks. To create such files from two executables with different behaviour that yet
collide under MD5, each of the two files has to contain both executables in full,
somehow using the collision to switch on the one and hide the other.
Our colliding files are based on chosen-prefix collisions. This means that we only have to append a few thousand carefully chosen bytes to each file to reach an MD5 collision. Each file by itself contains only one of the two executables. This is less suspicious.
|Sony PlayStation 3
|As mentioned on the Nostradamus attack website, we make our chosen-prefix collisions using a Sony Playstation 3. The collision built inside the above executables took us less than 2 days to make.
Marc Stevens, CWI, Amsterdam, The Netherlands
Arjen K. Lenstra, EPFL, Lausanne, Switzerland, and Bell Labs, Murray Hill, USA
Benne de Weger, TU/e, Eindhoven, The Netherlands
Please send all correspondence to Benne de Weger.
We thank Eric Verheul for pointing out to us that appending any
string of bytes to many data formats does not change the functionality.
A large part of the work behind this collision construction was done while Marc was visiting Arjen at EPFL in August and September 2007.
We are grateful to Dag Arne Osvik at EPFL for his help with the PlayStation 3.
|November 30, 2007
|January 1, 2009