CodingCrypto's page on Engineyard's Programming Contest

Engineyard posted a programming contest on July 14, 2009. The task was to find a phrase whose SHA-1 hash has minimal Hamming distance from the SHA-1 hash of a target phrase. To make things more interesting, the phrases were restricted to words from a 1000-word dictionary, had to consist of 12 words with spacing, and could be followed by at most 5 (free) printable ASCII characters. The words could have upper and lowercase letters at any position. Time: 30 hours after start.
The contest started on July 20, 2009 when the target phrase and contest dictionary were posted.

Latest news

It's official - winners announced on Engineyard's blog and it's us!!

SHA-1 and ASCII

SHA-1 is a hash function that operates on blocks of 64 bytes at a time. The output of hashing the first block is used as initialization vector (IV) for the second block etc.
ASCII characters are stored in 8 bits (=1 byte), i.e.

BuGS
takes 4 bytes; the string
BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjV
consumes exactly 69 bytes (blanks count as characters!).

The approach

We are not aware of any weakness of SHA-1 that would help solving this contest, so the only way is to hit it with computation power and that means as many tests per second as possible. We reused the first block as often as possible, i.e. for fixed first 64 bytes vary the end. We have no chance of exhausting the search space so we went for the cheapest choice: use 12 words and 12 spaces for the first block, have the second one consist entirely of the 5 characters. Our choices were 8 words of length 4 and 5 of length 5. Each process would start with a random choice of upper and lowercase letters. Scanning the dictionary for 3 minutes after the start we settled on bugs and blanks for the first runs and dirty and cows for the second.
Taking all combinations of upper and lower case letters for the 52 letters in the first part gives 252 different first blocks. Chances of repeating any by random guesses are minimal for only 30 hours. Of course we could have varied the entries more - changed the order of words for each new process to start and choose new words of 4 and 5 letters (and of course one can also take other combinations of lengths) but when each second is worth 3*109 hashes any manual change reduces your chances. The dirty cows didn't start before we burned a fuse in Taiwan and behemoth and the angels and giants had to be recabled - lots of time to think of other fun words.
Each of the first blocks was combined with all combinations of 5 letters for the second block and lots of zeros for padding (processed faster than content). Actually we didn't use ALL combinations - the CPU code did 325 (planning ahead for some extra optimization that didn't happen before the deadline; we didn't start before Sunday ...) while the GPUs did 323*622, with each GPU thread varying only the last two characters. 64 would have been nicer but would lead outside the standard character set a..zA..Z0..9 and we were worried how well they would work on Twitter.

The code

None less than Daniel J. Bernstein (currently visiting TU/e) took the time to write dedicated software for our cluster of Core2 Quad CPUs. The SHA-1 code will be very useful for our future projects and outperformed our first quick OpenSSL hack by a factor of 14 on this.
Dan did also did our CUDA implementation (good shot for a first time CUDA programmer!).
The code is posted here.

Everybody

Paul noticed the contest and brought Dan and Tanja in - just one condition: he would get the iPhone. Sure - sufficiently unlikely to work anyway. To make it more likely and to have more fun with GPUs, Dan and Tanja brought in Bo-Yin and Doug - who might now figure out how to program in Ruby and were very busy getting all machines to work and running the code which included reconfiguring the machines (taking out extra GPUs, moving to a dedicated power plug, restarting after shutdown due to lack of power, ...). Dan and Tanja stayed with the cluster for most of it, did the regular checking, and posted the results (hashbreaker for Dan and CodingCrypto for Tanja).

30 hours of staring at the screens

We started all cluster machines within minutes after the challenge came out (needed the dictionary words first). Had to take down hautbrion after a short while because its CPU was overheating. All other 9 machines stayed with us through the run and worked through about 47*106 hashes per second, working on all 4 cores.

Here is the table of the complete computation power we threw at this. Some updates on owners and grants to thank will follow shortly.

tesla:690822188 hashes/second: 4 GT200 GPUs in a S1070 (two buses)
behemoth: 657871796 hashes/second: 4 GT200b GPUs on 2 GTX 295 cards
unicorn: 337020845 hashes/second: 4 G92 GPUs on 2 9800 GX2 cards
dragon: 220369116?hashes/second: 2 GT200 GPUs on 2 GTX 280 cards
centaur: 179221757 hashes/second: 2 G92 GPUs on 2 8800 GTS cards
giant0: 106109452 hashes/second: 8 C2+ 2.66GHz cores
giant1: 106109452 hashes/second: 8 C2+ 2.66GHz cores
movebank: 94117640 hashes/second: 8 C2+ 2.8GHz cores
angel0: 69961977 hashes/second: 8 K10+ 2.3GHz cores
angel1: 69961977 hashes/second: 8 K10+ 2.3GHz cores
colossus: 68927875 hashes/second: 16 K8 2.2GHz cores
berlekamp: 51981651 hashes/second: 4 C2+ 2.83GHz cores
lanczos: 49740932 hashes/second: 4 C2 2.4GHz cores
giscours: 47603960 hashes/second: 4 C2 2.4GHz cores
moutonrothschild: 47603960 hashes/second: 4 C2 2.4GHz cores
latour: 47603960 hashes/second: 4 C2 2.4GHz cores
maucaillou: 47603960 hashes/second: 4 C2 2.4GHz cores
ausone: 47603960 hashes/second: 4 C2 2.4GHz cores
lafiterothschild: 47603960 hashes/second: 4 C2 2.4GHz cores
yquem: 47603960 hashes/second: 4 C2 2.4GHz cores
margaux: 47603960 hashes/second: 4 C2 2.4GHz cores
dauzac: 47603960 hashes/second: 4 C2 2.4GHz cores
utrecht: 47603960 hashes/second: 4 C2 2.4GHz cores
montgomery: 45636363 hashes/second: 4 K10+ 3GHz cores
ranger: 29530201 hashes/second: 4 K10 2.2GHz cores
miranda: 27178268 hashes/second: 3 (out of 4) Ci7 2.66GHz cores running 2 threads each
ebauche: 19200000 hashes/second: 2 C2 2.4GHz cores

All but two machines run under Linux, the exceptions are two Macintoshes. Programs were written in C++ and CUDA.

With all the machines running we got up to a total of 3079431974 hashes/second, i.e. 332578653192000 hashes/30 hours (but not all machines were up all the time).
Here are some statistics on the difficulty of finding hashes at various Hamming distances.

Average number of hashes: to find hash at distance:Note:
51180834166 39  
160104147904 38  
518231847164 37  
1736777001308 36  
6030475698988 35  
21709712516358 34  
81092161458160 33 (fairly good chance)
314539292928624 32 (some chance)
1267986524618517 31 (would have to be lucky)
5317362845174426 30  

Well, guess we got overly lucky with the distance 30 string. Tesla made the right internal guesses!

Some results of low weight:
34 giscours bUGs buGS buGS bUgS blAnK BlAnk blanK BlanK BUGS BUgS bugS BugS dwCpy
34 margaux BugS BUgs buGs BUgs bLANK Blank bLanK BlAnk BuGs BuGs BUGS bUGs FcBkw
34 angel1 cOWs cOws cOws Cows dirTY dIrTY DIRty DirtY cOWS COWS CoWS COWS qqhmD
34 behemoth cOWs CoWs cowS COws diRTy diRTy diRty dIrtY cOWS COwS COwS CowS kicOo
34 behemoth coWs cOWs coWS cOws dIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows FtEgn
34 behemoth CoWs cOws cOWs COWS DirTY DirTY DirtY dIrty COWS CowS cOwS CoWS csr6B
34 behemoth coWS cOWs cowS cOws dIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows byhia
34 berlekamp BUGs BuGs bUGs BUGS BlAnk BlAnk BLAnK blAnK buGs bUGs bUGs bUGS icxxt
34 ebauche rUBY rubY ruBY ruby tokyo ToKYo tOkYo tokYO rUBy RubY ruBy ruby ifloc
34 lanczos BUgS bugS BUGs BUGs blAnK Blank BlAnk BlANK BuGs BUGs bUGs BuGs ckbBr
34 ranger BuGs BUGS bugs bUgs BlAnk BlanK bLaNk blaNk BUGs BUGS BUGs BuGs banzb
34 unicorn bugS BUGS BUGs bUgs blANk BlanK blAnk bLAnK buGS bugs BuGS bUgs xphKL
34 unicorn bUgS bUGs bUGs BuGS bLank bLANK BLAnK BLaNK BuGS BuGs BuGs bUgs tmFgx
33 behemoth COws cows COws COWs diRTy diRTy diRty dIrtY cOWS COwS COwS CowS omBdF
33 behemoth CoWS coWS CowS cOWS DiRTY DIrtY DirTy DIRtY COWS COWS cOwS cOWS bvpDq
33 movebank BUgs bUGS bUGs bugS BlaNK BLanK blANK bLaNK BugS BUgs BuGs bUgs liwcC
33 ranger BuGs buGS BUgs BUgs BLanK BlanK bLanK blAnk BUGs bUgS bugS buGs CltAB
32 giscours bUgs BuGS bUgS BUgS blAnK BlAnk blanK BlanK BUGS BUgS bugS BugS wrubc
32 behemoth cOWs cOwS cOwS cowS DIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows owjo0
31 centaur bUGs BUgs bUGS bUgS bLAnk BlAnK BlANK BLank buGS bugS bUGS bUGs dCao2
30 tesla BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjV

Results

Nothing is official yet, but from scanning the public postings it looks good for us. A very nice leader chart is by jazzychad who went through the effort of removing invalid or repeated entries and got us reloading continuously. Thanks for the great service of putting everything in humanly accessible and readable form! Tanja and Dan have been sending tweets to engineyard. Our best one is at Hamming distance 30 from the challenge text

BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjV
and we got some other nice small distances, too.
Update: winners are announced and it's us!!!

Cooling down after the challenge and celebrating! cluster after
Last modified: Fri Jul 24 20:03:12 CEST 2009 by Tanja