Password hashing and salting

Warning: This blogpost has been posted over two years ago. That is a long time in development-world! The story here may not be relevant, complete or secure. Code might not be complete or obsoleted, and even my current vision might have (completely) changed on the subject. So please do read further, but use it with caution.

« Make me a sandwich. Ok! memcache internals »

Posted on 02 Feb 2011
Tagged with: [ hash ] [ md5 ] [ nacl ] [ salt ] [ sha ] [ sodiumchloride ]

The thing everybody (should) know is that when you want to secure passwords in - let’s say - a database, you have to hash to them. It’s kind of a golden rule but is it safe enough? Ask a more experienced user and they probably tell you to add some salt. Ask the reason why and they will probably say “it’s because it makes the password longer and more secure”. Even though it is true in effect that using a salt increases the overall security of your hashes BUT it’s not only because your passwords are longer. There is a another (maybe even more important) factor that comes into play, namely the fact they are more secure against rainbow table attacks, but that depends on HOW you season your hashes. Season it incorrectly, and you gain nothing in security even though you think you did….

Hashing

A hashing is a one-way function that creates a (fixed length) string based on a variable sized plain-text. It’s one-way because we can go from a plain-text to a hash, but we cannot recreate the original plain-text from the hash. There are many hash-functions, some of them good, some of them bad, and some of them quick while others are kind of slow. What you need to know is that since the output is fixed length (say: 32 chars but it depend on the hash function). It would mean that there could be multiple plain-texts that generate the same hash. This is called a collision and even though they are rare, they can (and will) occur.

h("hello") => 3462362  
h("you") => 5236272  
h("somethingelse") => 3462362

in this case, the hash function will generate the same hash for both the texts “hello” and “somethingelse”.

Most importantly you need to realize that there is NO MATHEMATICAL WAY to retrieve the plain-text back from a hash. It’s impossible. If I know the hash, I cannot calculate back the plain-text. But I can deduce: if I just try every string or text that is available, generate the hash from it, I can compare it with the original hash. So if I save the hash of a user password into the database (‘1347272’ for instance), the only way I can check if a password is valid, is when somebody enters a password, hash it, and check if the hash matches the hash in the database.

In pseudo-php code:

if (sha1($_POST['password']) == "e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4") {  
    print "You have typed to correct password!";
}

The “e5e9…” is the hash of a secret password (guess which one :p), so only when you have posted the correct password, it will display the string. It uses the sha1() hash function.

Breaking the hash

Now, if you REALLY like to know the password from the snippet above, you probably would enter the hash into Google and you instantly find the correct password. Now suppose we know the password is 6 chars long, and consists of all lowercase letters. We should be able to create a small script that iterates through all the possibilities (starting with “aaaaaa” and ending with “zzzzzz”). On every iteration, we create a sha1 hash, compare it with the “e5e9fa..” hash and if it matches, we have found the password. This is called brute-forcing the hash, and is (currently) the only way to retrieve a plain-text from a hash (plain-text being passwords in our case).

Now, suppose your system only accepts passwords between 4 and 8 characters and can consist of only lowercase letters (just as an example). Somebody evil has got access to your database and got a list of all your user-accounts with additional hashes. How bad would that be? Turns out: very bad. They could write a script to brute-force the hashes and gain access to every account.

Adding some salt

Please note that whatever security measurement we take, once somebody got a hash, they WILL be able to reverse it eventually. The only thing we can do is make it as hard (and thus not efficient for them) to take on your hashes.

We could do 2 things:

Make sure the user must enter a ridiculous long password with loads of special characters (between 20 and 30 chars for instance).
Make the passwords longer yourselves.

Option number 1 is not really feasible. Nobody in their right mind (well sorry, i do have a password that falls into the 20-30 char range) would enter such a long password. Option number 2 is easier to achieve with something called a “salt”. Before we hash a password we add a “salt” to it:

$saltedpassword = sha1(SALT . $password);

What do we gain from this? First of all, if somebody happens to enter ‘1234’ as a password, and the SALT is ‘NaCl’, the hash we store is ‘NaCl1234’, which is hashed with the sha1() function to ‘53a99f2dc3c5ce993609e6ffdc5049e61363f9c7’.

Even though the password has got a length of 4 characters, you need 8 characters to “bruteforce” decrypt the hash. If somebody just got the hash, but not the salt, you would make his life much harder.

If you need to brute-force a 4 character password, which can consists of 26 letters (lowercase ‘a’-‘z’), 26 uppercase letters (‘A’-‘Z’) and 10 digits, the amount of combinations are (26+26+10)^4 = 14.776.336 combinations. A simple php-script could try every combination ranging from ‘a’ to ‘9999’, which would take: (62^4) + (62^3) + (62^2) + (62^1) = 15 018 570 attempts.

Now, with 8 characters, the amount of attempts, would already be:
(62^8) + (62^7) + (62^6) + (62^5) + (62^4) + (62^3) + (62^2) + (62^1) = 2.21919452 × 10^14

Even though this fits on one line, don’t underestimate the amount of tries we need to actually get a match. It’s enormous. So in effect, adding a salt would be completely transparent for a user, but not for an attacker. So we’ve got a very simple way of securing your hashes even more,…. but not really…

Don’t rely on the salinity

Hold on. I just told you about salt, and showed you how it would literally dropped an attacker chance of retrieving the password to absolute 0, and it’s not any more secure? What if an attacker could gain access to your source-codes, where the SALT string resides (can’t be done? How did they get the hashes in the first place then?) In that case an attacker knows the password will start with “NaCl”, and in effect it needs to try only the 15 million combinations again. In effect, eliminating the goal of the salt..

The most important rule of thumb: Make sure your passwords are strong enough to begin with.

A salt means nothing without a good strong password to begin with but it does defend you against something else though…

Brute-force cracking with online databases.

I’ve shown you how many unique combinations you could create with just the 62 different chars (you might add special chars like !#%^ etc, and gain maybe 10 more). A hacker would need to iterate through all possible combinations and just try them all out. Not really feasible but on the internet there are big databases where you can enter a hash, and it will return a plain-text matching that hash (off course, not for all texts, but quite a few and it’s growing every day). Take a look at sites like http://hashcrack.com/

Now, those database consists of millions of records with a “plaintext”-“hash” pair, where it will lookup the hash, and returns the plain-text. Simple. A hacker would just have to insert all your hashes and see if matches will come up.

Now, we add the salt again. Chances are that those databases have a hash for “1234”, but not for “NaCl1234” (some of them have though, so use even a more complex salt). The thing with salt is that a hacker cannot use those big databases anymore and actually has to brute-force every password, even though they know the salt. It would also mean that in order a salt to be effective it doesn’t really have to be secret in order to function (only it will function even better when it’s kept secret). So even though it would not gain you anything against brute-forcing, adding a salt does secure you against the big online databases.

Rainbow tables

Since “the other side can do magic too”, using a fixed SALT and keeping it secret does not work well with another approach that hackers can use. There is a system that works with so-called “rainbow tables”, which are large datafiles with a large amount of hashes that can be created reasonably quickly without using too much space. With those rainbow-tables, it’s very efficient to create your own database-with-hashes on the fly. So if you would know the salt, you could create those datafiles which you could use to lookup the hashes. You would only need to create one dataset, since all passwords are prefixed with the known salt. Sure, it’s annoying for a hacker since they cannot use datafiles that are already available, but creating a custom datafile is feasible enough, especially when the userdata is from an important site. When there is a large userfile with maybe thousands of email-addresses and passwords to find (and off course, most users will enter their passwords on multiple sites so fetching even 10% of those passwords is a very lucrative business).

More salt

Now, just like in a game of chess, we too can defend ourselves against those rainbow table attacks. This time with a variable salt. Instead of using a constant salt, we use a different salt for each user.

An example:

sha1('NaCl-<strong>ujasdfas</strong>-1234')       ==> b9afc2380b4fccef85b3f3da3379d220f1548cad
sha1('NaCl-<strong>jubdsdsb</strong>-1234')       ==> bd3a831fcd202b546307be8ea5b4a76c87700c4e
sha1('NaCl-<strong>jehnwnad</strong>-secret123')  ==> c0678ee585f635b9bbd91cf7b2a5597e8745521c</pre>

First part is the normal “constant” salt, we don’t really need it, but we just add it anyway. Just to annoy hackers and spoil their perfectly good day. Secondly, the part between the two dashes and marked bold is a variable salt which varies for all users (but it doesn’t matter much if two of them are equal). The final part is the actual user password.

Now, one problem: how do we recreate the hash since we do not know the variable part (even the user would not know that)? Easy, we store it - plaintext - together with the password in the user-record. It really doesn’t matter if the salt is publicly available (after making sure that you have a secure password policy), but it will guard you against the fact that a hacker must create a complete new rainbowtable datafile for each user record. It cannot use one file anymore since the “prefix” changes for every file. Another nice thing about using a “variable” salt is that even when 2 users use the same password, the resulting hash will be different.

Hurrah! No more rainbow tables to decrypt all hashes at once!

Conclusion

Unfortunately, there is no way to 100% secure your hashes. It cannot be done. It will always be possible to bruteforce a hash but following a few simple steps, you can create this process too difficult or time consuming. But even then a hacker can target just one account (for instance, the root or administrator account). Make sure those passwords are very strong so bruteforce will be too difficult. Rotate your passwords as well, so when some hacker has found the password after a week or month, it’s already changed by you.

Rainbow-tables can be countered by variable salts, but again, they are no match against a hacker targeting just one single account.

The best hints I can give you:

Passwords should have a safe password policy (depends on the application, but 8+ chars and symbols should be ok)
Passwords should be hashed with a good hashing function (sha1 or better, don’t use md5).
Passwords should be hashed with a strong variable salt so rainbow table attacks are not feasible.
Read and learn more about security. It’s a very difficult area but a very important one nevertheless.

« Make me a sandwich. Ok! memcache internals »