Phrases like “data breach” and “X number of users compromised” have become commonplace in the headlines.  There has been a recent spate of high profile breaches, each coming from a different source and affecting a different group of victims.  The advice, however, is always the same: if you are a user of a breached site, change your passwords.  This is usually followed by advice along the lines of “you should never use the same password twice” and “never use the password password”.

It’s common industry knowledge that passwords should never be sent or stored in the clear.  Having “yourPassword1234!” sitting in a database for all to read isn’t very safe.  The practiced norm is to use hashing functions: one-way mathematical formulas that turn a password into a random-looking string of letters and numbers, with no practical way to change it back.  Instead of storing “yourPassword1234!”, the database stores “2c3b130b35c7416551f00a501e8de878”.  It’s easy to go from your password to the stored result, but computationally infeasible to calculate the password directly from the stored result.  This raises the question: if it is not possible to compute a password from the stored data, why do I need to reset my passwords when that stored data is leaked?

It is required because it is straightforward for software to guess your password by running candidate passwords through the same hashing function that created the stored data.  The rudimentary approach is simple:

  • Start with the letter A
  • Hash it using the same formula the source system uses
  • Compare the result with the database; if it is found, you now know the password for user X
  • Repeat for B, C, …, ABCDEF, DEFgh1!, … etc
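
The steps above can be sketched in a few lines of Python.  This is only an illustration of the idea, not how hashcat itself works: the function names, the tiny alphabet, and the made-up users and passwords are all assumptions chosen so the example finishes instantly.

```python
import hashlib
from itertools import product


def md5_hex(candidate):
    """Hash a candidate the way the (hypothetical) source system does: plain MD5."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def brute_force(leaked, alphabet, max_length):
    """Try every string up to max_length and return {user: recovered password}.

    `leaked` maps user -> stored hash, standing in for the leaked database.
    """
    # Index users by hash so each guess is checked against everyone at once.
    users_by_hash = {}
    for user, stored in leaked.items():
        users_by_hash.setdefault(stored, []).append(user)

    recovered = {}
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            guess = "".join(letters)
            for user in users_by_hash.get(md5_hex(guess), []):
                recovered[user] = guess
    return recovered


# Stand-in for a leaked table; the users and passwords are invented for the demo.
leaked = {
    "alice": md5_hex("13"),
    "bob": md5_hex("ab1"),
    "carol": md5_hex("a-much-longer-password"),
}
print(brute_force(leaked, alphabet="ab13", max_length=3))
```

The short passwords fall out immediately, while the long one stays hidden simply because the search never reaches it.  Real tools are far smarter about the order in which they guess, but the principle is the same.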

“OK, that makes sense,” you’re thinking, “but doesn’t it take forever for computers to actually do this?”

That particular point is a constant cycle of one-upmanship in the industry.  Technology gets faster, hash functions become quicker to compute, computing all possible hashes becomes quicker, systems get replaced with more complicated hash functions, and from there the cycle repeats.  Consider the following real world example:

I have possession of J. Random Widget, Inc’s database for their online ordering system.  It’s an older system that has been in production for a decade.  That is eons in software years, but the HTML has been regularly updated so the site looks current.  Under the hood, however, it is still using decade-old technology.  The particular hashing algorithm it uses is vanilla MD5.  This might have been perfectly acceptable in 2005 or 2006, but this is 2016.  Prior to writing this article I set up hashcat (https://hashcat.net/hashcat/) on my workstation, and ordered it to start working on the passwords collected in this database.  Hashcat is open source software designed to do just this kind of thing: take a hash and figure out the password that created it.  Keep in mind this software is quite legal: tools like these are the last resort for systems where legitimate users have locked themselves out due to a forgotten or misremembered password.

Because I tend to edit my own work as I write, it has taken me about half an hour to reach this point.  Out of the 2,448 separate passwords found in the database, the software has already recovered 1,336 of them.  These passwords range from as simple as “13” to “Changeme1”.  My workstation, which contains off-the-shelf hardware that is not even the fastest available, is testing possibilities at approximately 6.9 billion per second.  Within a day I imagine it would have discovered all 2,448 passwords.

As I mentioned before, this particular example is using outdated technology.  Modern functions would require much more time per calculation, meaning what took minutes with this example would take days or weeks with current generation processes.  Here is the rub though: I am only using a single workstation not intended for this purpose.  A dedicated attacker could conceivably build a system 5 times as fast as mine for around $1,500.  Given what one could gain access to with the right usernames and passwords, spending $1,500 could easily be considered an “investment”.  There are also cloud hosting options that focus on high performance computing, not to mention more nefarious alternatives such as putting a botnet of infected PCs to work.
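
To make “much more time per calculation” concrete, here is a minimal sketch of a deliberately slow hash using PBKDF2 from Python’s standard library.  PBKDF2 is just one example of this kind of function, not necessarily what any given site uses; the iteration count, the random per-user salt, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os

# Assumed work factor: each guess costs this many rounds instead of one quick MD5.
ITERATIONS = 200_000


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for storage; a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Re-run the same slow hash and compare the results in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("Changeme1")
print(verify_password("Changeme1", salt, stored))
print(verify_password("changeme1", salt, stored))
```

A legitimate login pays the extra cost once.  An attacker pays it for every single guess, which is what stretches minutes into days or weeks.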

This is ultimately why you hear the advice to use a long and random password, and also to use a different password for each account.  Long and random passwords make it hard for these tools to figure out your password, and having a password unique to each account limits the scope of damage if one of your passwords is ever discovered.
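
The effect of length is easy to estimate with some back-of-the-envelope arithmetic.  The sketch below assumes a truly random password drawn from the 95 printable ASCII characters, an attacker who has to try every possibility, and the 6.9 billion guesses per second my workstation manages against MD5; the function names are made up for the example.

```python
GUESSES_PER_SECOND = 6.9e9   # rate observed against MD5 in this article
ALPHABET_SIZE = 95           # assumption: printable ASCII characters


def keyspace(alphabet_size, length):
    """Count every possible password from 1 character up to `length` characters."""
    return sum(alphabet_size ** n for n in range(1, length + 1))


def seconds_to_exhaust(alphabet_size, length, rate=GUESSES_PER_SECOND):
    """Worst-case time to try every password up to `length` at `rate` guesses per second."""
    return keyspace(alphabet_size, length) / rate


for length in (6, 8, 10, 12):
    days = seconds_to_exhaust(ALPHABET_SIZE, length) / 86_400
    print(f"up to {length} characters: about {days:,.1f} days")
```

Each extra character multiplies the work by roughly 95, so a few more characters matter far more than a faster computer does.  The estimate only holds for random passwords, though: a long password built from a common word or pattern gets guessed early.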

And that hashcat run on my workstation?  It’s up to 1,375.