Why use an authentication token instead of the username/password per request?
The author of https://stackoverflow.com/a/477578/14731 recommends:
DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT! [...] use strong salted hashing (bcrypt / phpass) when storing persistent login tokens.
I was under the impression that login cookies served two purposes:
- UX benefit: don't ask the user to login once per request
- Performance benefit: don't need to run a slow algorithm (like bcrypt) per request
It seems the author is invalidating the second point, which makes me wonder: why bother with authentication tokens at all? Instead of mapping the username/password to a randomly-chosen authentication token, why not simply pass the username/password per request using a
Assuming we accept the author's recommendation and use bcrypt per request, what's the advantage of using an authentication token instead a username/password?
When you use an "authentication token", the simple presentation of that token by the client grants access (as long as the token is deemed valid by the server). If you store the tokens "as is" in your server's database, then an attacker who could get a glimpse at your database will immediately learn all the tokens, allowing him to send requests in the name of all users for whom a valid token still exists in the database. This is bad. Hence the recommendation of storing only a hash of the authentication token.
The advice about bcrypt for authentication tokens is misguided. Bcrypt, like all functions dedicated to password hashing, is meant to be used for passwords. Passwords are weak; they are vulnerable to dictionary attacks. To cope with their inherent weakness, the password hashing function must be salted (to prevent parallel attacks and precomputed tables) and must also be awfully slow (through a huge number if internal iterations). This makes password hashing functions expensive. So you don't want to use a password hashing function unless you need it, e.g. to hash a password.
An authentication token is not a password; it is a random value which was generated and remembered by a computer, without any human brain involved in the process. As such, if you generated the token properly (at least 128 bits, obtained from a cryptographically secure PRNG), then there is no need for salts or iterations; just use a plain hash function (even MD5 would be fine there). This will be more efficient.
As for using a random authentication token instead of login+password for each request, there are two main reasons for that:
Performance: as explained above, an authentication token can be verified safely with a simple hash, which will be vastly more efficient than a heavy bcrypt call. You want to keep your precious CPU cycles for when they are really useful, in particular when applying bcrypt to an actual human-managed password.
Client-side storage: the authentication token will be stored on the client as a cookie value. If the login and password are sent back with every request, then they are stored as a cookie on the client. People get nervous when their passwords are written to files -- and such storage is not equivalent (from a security point of view) to the storage of an authentication token, because human user are on the (bad) habit of reusing their passwords for multiple systems, whereas authentication tokens are inherently server-specific.
Applying bcrypt to binary data is dangerous. bcrypt intentionally stops reading the input after seeing byte with value 0. It also stops after reading 72 bytes. So you can bcrypt the hex or base64 representation of a random token, but don't bcrypt raw binary data.
Also, authentication tokens can have a short lifespan. Asking the user to re-authenticate once a day or once a week doesn't seem too bad. But if you ask him to change his password once a day or week he won't be happy.
What algorithm do you recommend for this? A quick OWASP wiki search returned no results. Is a SHA1 fine, or is something stronger (SHA256/SHA512) needed?
@AnonymousPenguin: You only need preimage resistance, so MD5 or SHA1 would *probably* be fine, but there’s no reason not to pick a SHA-2 anyway.
Could you please explain how MD5 would be considered a secure solution? AFAIK there are colision attacks on MD5 and they are very very fast to compute, so if the database leak a collision could be derived or brute force will become much more easier to attack since it's so fast.
@mFeinstein - Ry is correct; this situation only requires *preimage* resistance. A *collision attack* means finding *any* two documents that have the same hash; that won't help you match a *specific* document (especially for randomly generated documents, such as tokens). If collision attack is "not likely"** to find any match, then attack must be brute force birthday attack - see MD5 discussion. ** If dealing with financial data, highly sensitive data, or must guarantee *no* accounts compromised, don't use MD5 here -- there is a (tiny) risk.
Well, as far as I have seen, modern GPUs can generate all possibile options of MD5 hashes pretty quickly, so finding a matching token is feasible with MD5 nowadays.
Agreed. An can quickly reverse the hashes and hijack any session. It'd be much harder to do that if the tokens were hashed with HMAC, especially if they are long-lived (as in "remember me"). Why isn't bcrypt with a low work factor a good choice?
8 x GTX 1080 can do 200 billion MD5 hashes a second. Passwords are low entropy, so brute force with a fast hash algorithm (MD5, Blake family, SHA family, etc) is feasible. That's why it's recommended to use something like argon2, bcrypt, pbkdf2, etc for password hashing. Conversely, a token with high entropy doesn't need a slow algorithm. Brute force is not feasible anyways. A token with 128 bits of entropy is in a space the size of 2^128. 2^128 / 200 billion MD5 hashes a second / 31556952 seconds a year = ~54 quadrillion years.