Generating an unguessable token for confirmation e-mails

  • I'm generating a token to be used in the link of a verification e-mail. I plan on using uniqid(), but its output is predictable, which would allow an attacker to bypass the confirmation e-mail. My solution is to hash it. But that's still not enough, because if the hash function I'm using is discovered then the token is still predictable. The solution is to salt it. That I'm not sure how to do, because if I use a function that gives a variable salt (e.g. in pseudocode hash(uniqid()+time())), then isn't the uniqueness of the hash no longer guaranteed? Should I use a constant salt instead, and would that be good enough (e.g. hash(uniqid()+asd741))?

    I think all answers miss an important point. It needs to be unique. What if openssl_random_pseudo_bytes() produces the same number twice? Then one user wouldn't be able to activate his account. Is people's counterargument that it's unlikely to produce the same number twice? That's why I was considering uniqid(): its output is unique.

    I guess I could use both and append them together.

    Well no hard feelings but they didn't answer the question. It's been too long to remove the down votes.

    @Celeritas Quoting from Tom's answer: “making collisions so utterly improbable that you don't need to worry about them”. The token has to be generated in such a way that it won't collide with a value picked by the attacker, and the attacker may be using `openssl_random_pseudo_bytes` as well, so the token will not collide with another token that you generate (except with less-likely-than-a-meteorite-strike probability).

    @Celeritas **All** of the answers have directly answered your question. The issue is with _your own_ lack of knowledge. Honestly, there's nothing wrong with not knowing stuff; I certainly had no idea about crypto 6 months ago. `openssl_random_pseudo_bytes` gets its random bytes from sources that the OS has put a lot of effort into generating. It gathers _a lot_ of entropy to come up with them. So collisions are very unlikely; they're at least as unlikely as collisions from `uniqid()`, if not more so.

    @TildalWave That's obvious to someone who's been immersed in security or crypto for a while. Please consider that it is not obvious to everyone. The relationship between randomness and uniqueness is actually fairly subtle.

    In practice, using "truly random" bytes is a better way to ensure uniqueness (keeping in mind the birthday problem) than any other source of information. Other methods (like time or UUIDs) require more caution, increase the likelihood of repetitions, or need more organization. And those methods are, of course, more predictable than random data.

  • Tom Leek (correct answer)

    8 years ago

    You want unguessable randomness. Then, use unguessable randomness. Use openssl_random_pseudo_bytes(), which will plug into the local cryptographically strong PRNG. Don't use rand() or mt_rand(), since these are predictable (mt_rand() is statistically good but it does not hold against sentient attackers). Don't compromise, use a good PRNG. Don't try to make something yourself by throwing in hash functions and the like; only sorrow lies at the end of this path.

    Generate 16 bytes, then encode them into a string if you need a string. bin2hex() encodes in hexadecimal; 16 bytes become 32 characters. base64_encode() encodes in Base64; 16 bytes become 24 characters (the last two of which are '=' signs).
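
    As a minimal sketch of the steps above (not code from the original answer): the helper name generateConfirmationToken() is hypothetical, and the check on the $strong flag is an added precaution that assumes the OpenSSL extension is loaded.

```php
<?php
// Hypothetical helper (name and error handling are assumptions, not from the answer).
// Returns a hex-encoded token built from $bytes cryptographically strong random bytes.
function generateConfirmationToken(int $bytes = 16): string
{
    // OpenSSL sets $strong to true when a cryptographically strong source was used.
    $raw = openssl_random_pseudo_bytes($bytes, $strong);
    if ($raw === false || $strong !== true) {
        throw new RuntimeException('No cryptographically strong random source available');
    }
    // bin2hex() doubles the length: 16 bytes become 32 hexadecimal characters.
    return bin2hex($raw);
}

$token = generateConfirmationToken();
echo strlen($token), "\n";                                          // 32
echo strlen(base64_encode(openssl_random_pseudo_bytes(16))), "\n";  // 24
```

    The hexadecimal form is usually the more convenient one for a link, since Base64 output can contain '+', '/' and '=' characters that need URL-encoding.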

    16 bytes is 128 bits, that's the "safe value" making collisions so utterly improbable that you don't need to worry about them. Don't go below that unless you have a good reason (and, even then, don't go below 16 anyway).
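
    To put a rough number on "utterly improbable", here is the usual birthday-bound estimate. It assumes the tokens are independent and uniformly random, and the figure of one billion issued tokens is only an illustrative assumption.

```latex
% Probability that any two of n independent, uniformly random 128-bit tokens collide
p \;\le\; \binom{n}{2}\,\frac{1}{2^{128}} \;=\; \frac{n(n-1)}{2^{129}} \;\le\; \frac{n^{2}}{2^{129}}

% Illustrative case: n = 10^9 tokens
p \;\le\; \frac{10^{18}}{2^{129}} \;\approx\; 1.5 \times 10^{-21}
```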

Licensed under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM