Iain Simpson

Storing Passwords

With the LinkedIn password leak highlighting that a lot of people don’t know how to properly store passwords, I thought I’d offer my take on the issue.

LinkedIn use(d) SHA-1, which is a more secure algorithm than MD5, but doesn’t give much more protection when used for passwords. They didn’t salt their hashes, which has resulted in (reportedly) 60% of them already having been cracked.

You should be (and hopefully are) using salts when storing hashed passwords. A few of you won’t have a clue what I’m on about, so I’ll explain (sorry if this is beneath you).

A hash takes some plaintext: “Password1”, and turns it into a string of characters that looks something like: "2ac9cb7dc02b3c0083eb70898e549b63". That’s a fairly big looking string, but that’s only because it’s encoded in hexadecimal. With MD5, it’s actually only 16 bytes long, or 128 bits. SHA-1 outputs hashes that are 160 bits long, which is only an additional 4 bytes. Those 4 bytes (32 bits) increase the number of different combinations by a factor of 2^32, or roughly 4.3 billion. A longer hash makes it a little more expensive to precompute and store the hashes of every likely password - a lookup table of this kind is commonly called a rainbow table, and several exist for MD5. Try generating the MD5 of some of your simple passwords (after you change them), and google for the hash.
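If you want to check those sizes yourself, here's a quick sketch using Python's standard hashlib module. The variable names are just mine for illustration.

```python
import hashlib

password = b"Password1"

# Raw digests are bytes; hex encoding doubles the length in characters.
md5_digest = hashlib.md5(password).digest()
sha1_digest = hashlib.sha1(password).digest()

print(len(md5_digest), len(md5_digest.hex()))    # 16 32
print(len(sha1_digest), len(sha1_digest.hex()))  # 20 40

# Every extra bit doubles the number of possible outputs.
extra_bits = (len(sha1_digest) - len(md5_digest)) * 8
print(extra_bits, 2 ** extra_bits)               # 32 4294967296
```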

Hashing algorithms are useful for verifying that a file you have is the same as a file someone else has, because they turn what could be a multi-megabyte file into a comparatively short string. Only if the hashes are the same would you need to compare the whole file to be sure that they are identical. You may have heard that MD5 is a “bad” hashing algorithm, but perhaps not understood why. With MD5, it’s possible to force a collision (where the hashes of two different files match), and trick people into thinking that your malicious file is in fact the safe one that they thought they were downloading. This requires the ability to change both files, for example by adding or removing blank space. SHA-1 (Secure Hash Algorithm 1) was designed to address this and to be cryptographically secure, making it much more difficult (aiming towards impossible) to force a collision, although practical collisions have since been demonstrated for SHA-1 too.
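As a rough sketch of the file-checking idea, something like the following would do. The function name `file_digest`, the chunk size and the SHA-256 default are my own choices for illustration, not anything a particular download site uses.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so that
    large files don't have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

You'd then compare `file_digest("download.iso")` (a made-up filename) against the hash that the publisher lists. If they differ, the files differ.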

This makes almost no difference for passwords - you’re not trying to avoid collisions, which are incredibly unlikely when the data is (generally) shorter than the hash. The only real consideration is the slightly longer output making it more difficult to build a rainbow table.

Salting is when you take a random string of characters: "90c91462cd7c52d0a767ab29517bcd31", then add that to the password before hashing:

"90c91462cd7c52d0a767ab29517bcd31Password1" -> "6acf1aed0f3a30943e9c1e6ad82df6de"
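In code, that step might look roughly like this. It's a minimal sketch that sticks with MD5 only to match the example above; `salted_md5` is a name I've made up, and I'm assuming the salt is prepended as hex text.

```python
import hashlib
import secrets

def salted_md5(salt_hex, password):
    """Hash the salt's hex text followed by the password."""
    return hashlib.md5((salt_hex + password).encode("utf-8")).hexdigest()

# 16 random bytes, written out as 32 hexadecimal characters.
salt = secrets.token_hex(16)
digest = salted_md5(salt, "Password1")
print(salt, digest)
```

The salt should come from a proper random source (hence `secrets` rather than `random`), and each password gets its own.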

So that you know what to hash the password with later, you store the salt alongside the hash. Some implementations use a different database field, a separate table, or simply prepend or append the salt to the hash. I prefer to append the salt, because it’s backwards-compatible with unsalted passwords - in case you want to quickly poke a hash into a db manually, or need to migrate a system to using hashed passwords:

"1f9002f1c2e3e6e8c5805a5f6c53f47e90c91462cd7c52d0a767ab29517bcd31"

You can easily separate the hash and salt later, because you know the length of hash that your algorithm generates. In this example I’ve used MD5, which outputs 32 hexadecimal characters.
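Here's a sketch of how storing and splitting could work, assuming a 32-character MD5 hex hash followed directly by the salt. `make_record` and `check_password` are hypothetical names, not from any library.

```python
import hashlib
import hmac
import secrets

HASH_HEX_LEN = 32  # MD5 output is 32 hexadecimal characters

def make_record(password):
    """Return hash + salt as one string, ready for a single db field."""
    salt = secrets.token_hex(16)
    digest = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
    return digest + salt

def check_password(record, password):
    """Split the record by the known hash length, then rehash and compare."""
    digest = record[:HASH_HEX_LEN]
    salt = record[HASH_HEX_LEN:]  # empty for an old unsalted hash
    candidate = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(candidate, digest)
```

The backwards-compatibility falls out for free here: a bare 32-character unsalted hash splits into a hash and an empty salt, so it still verifies.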

Salting your hashes makes passwords a lot more secure in the event that they’re compromised. Instead of hashing a password once and checking the entire database for it, an attacker needs to salt and hash each password they try for every row. This makes things much, much slower. It also makes a precomputed rainbow table useless, because the attacker would need a separate table for every salt.
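To put some numbers on that, here's a toy comparison that counts how many hashes an attacker has to compute in each case. The function names and the tiny guess list are made up for illustration, and real cracking tools are far more sophisticated.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def crack_unsalted(hashes, guesses):
    """One hash per guess, checked against every row at once."""
    wanted = set(hashes)
    found, hash_count = {}, 0
    for guess in guesses:
        h = md5_hex(guess)
        hash_count += 1
        if h in wanted:
            found[h] = guess
    return found, hash_count

def crack_salted(rows, guesses):
    """rows holds (salt, hash) pairs; every guess is rehashed for each row."""
    found, hash_count = {}, 0
    for index, (salt, stored) in enumerate(rows):
        for guess in guesses:
            hash_count += 1
            if md5_hex(salt + guess) == stored:
                found[index] = guess
                break  # move on to the next row once this one is cracked
    return found, hash_count
```

With no salts the work grows with the number of guesses alone. With salts it grows with guesses multiplied by rows, which adds up quickly across millions of accounts.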

You never know how or why your database might be stolen (disgruntled ex-employee?), so make it as difficult as possible to crack, and use really secure passwords on your admin accounts, because attackers will focus their attention on those.

By all means, use a “better” hashing algorithm, but understand why you’re using it, and for fuck’s sake, salt your passwords.

Edit:

Caius Durling points out that there are hashing functions that, unlike MD5 and SHA*, aren’t optimised for speed. Bcrypt is deliberately slow and has an adjustable work factor, which will help against brute-force attacks on your leaked password database, but eventually, everything will be crackable. A usable work factor for bcrypt today might mean that it’s easy to crack in a few years. It’s a trade-off between efficiency and security - remember that this is a constant arms race. What was good enough ten years ago might not cut it today.
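Bcrypt isn't in Python's standard library, so the sketch below swaps in PBKDF2 (which is) to show the same idea of a tunable work factor. The `iterations$salt$key` storage format, the function names and the default iteration count are all my own assumptions for illustration, not a recommendation.

```python
import hashlib
import hmac
import secrets

def hash_password(password, iterations=200_000):
    """Derive a slow, salted hash; more iterations means more work per guess."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the work factor with the hash so it can be raised later.
    return f"{iterations}${salt.hex()}${key.hex()}"

def verify_password(stored, password):
    """Re-derive the key using the stored salt and iteration count."""
    iterations, salt_hex, key_hex = stored.split("$")
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(key.hex(), key_hex)
```

Because the iteration count is stored alongside each hash, you can bump it for new passwords as hardware gets faster, and old records still verify.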
