Why does an MD5 hash start with $1$ and a SHA-512 hash with $6$? Isn't that a weakness in itself?

  • I have moved this question from stackoverflow to this place. I know it may be a question about 'opinion' but I am not looking for a private opinion but a source of the final decision to keep it this way.

    I have been taught that nobody tries to open a door they do not know exists. The best defense, then, would be to hide the door. You can see this in old war movies: nobody keeps a hideout in plain sight. It is always covered with something suggesting that 'there is nothing interesting there.'

    I would assume that cryptography works the same way. Why, then, does a hash generated by MD5 start with $1$, revealing first that it is a hash at all and then what kind of hash it is (MD5)?

    Now I see that SHA-512 does exactly the same thing. Isn't that a weakness in itself? Is there a particular reason for doing it this way?

    The main question, then, is: Should I scramble my hash before storing it, to hide this from a potential enemy? If there is no need for that, why not?

    To avoid answers that simply say obscurity is not security, I would propose this picture. It is WWII. You have just received a tip that the SS is coming to your house, suspecting that you are hiding partisans, and this is true. The partisans have no time to escape. You have two choices for where to hide them: in the best safe in the world, or in a hole underneath the floor, hidden so well that even your parents would not suspect it is there. What do you propose? Would you convince yourself that the best safe is the best choice?

    If I know there is a treasure hidden on an island then I would like to know which island it is or I will not start searching.

    I am still not convinced. Chris Jester-Young so far gave me something to think about when suggesting that there can be more algorithms generating the same hash from different data.

    Read the top answer to the linked question. Thomas is dead-on. Secrets are useful, but you want to keep the *right* secret. If you want to prevent attackers from brute-forcing your hash, then make it dependent on a secret key somewhere; *a secret you can keep*. Don't try to make your algorithm secret; it won't work. If you want to add secrecy, then add secrecy. Don't add obscurity.

    A better comparison to the situation with correctly-implemented modern cryptography: The SS knows for *certain* that you and the partisans are inside your home, and your safe has teragrams of food, a teleporter connected to America, and cannot be scratched by all the nuclear bombs in the world. You're darned right you should sit in the safe and stick your tongue out at the attackers.

    “suggesting that there can be more algorithms generating the same hash from different data” No, this does not happen. Not if we're talking about cryptographic hashes (hashes used in hash tables are a completely different matter).

  • dr jimbob

    dr jimbob Correct answer

    7 years ago

    First, there's Kerckhoffs's principle which is always desirable:

    A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

    where in this case the password is the key. So it's not a goal to keep the cryptosystem secret.

    Second, you are wrong about those being MD5 or SHA-512 hashes; the values stored in your /etc/shadow are md5crypt or sha512crypt hashes, which involve a strengthening procedure (many rounds of an MD5 or SHA-512 hash).
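
    To illustrate what "many rounds" buys you, here is a minimal sketch of key stretching. It uses PBKDF2 from Python's standard `hashlib` as a stand-in; this is not the md5crypt or sha512crypt algorithm, and the function name `stretch` and the round count of 5000 are assumptions chosen only for illustration.

```python
import hashlib


def stretch(password: bytes, salt: bytes, rounds: int = 5000) -> bytes:
    """Derive a 64-byte value by iterating HMAC-SHA-512 `rounds` times (PBKDF2).

    Illustrative stand-in only: the real sha512crypt construction differs.
    """
    return hashlib.pbkdf2_hmac("sha512", password, salt, rounds)


password = b"not my real password"
salt = b"saltsalt"

# One plain SHA-512 call: cheap for the defender, and equally cheap per guess
# for an attacker.
single = hashlib.sha512(salt + password).digest()

# Stretched: every guess now costs the attacker thousands of hash computations.
stretched = stretch(password, salt)

print(single.hex()[:16])
print(stretched.hex()[:16])
```

    The output is still deterministic for a given password, salt, and round count, so verification works as before; only the cost per guess goes up.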

    Now if your four choices are md5crypt, sha256crypt, sha512crypt, and bcrypt (the most popular choices on Linux systems), here are four hashes, all generated with $saltsalt$ (or equivalent) as the salt and hashing the password 'not my real password':

    >>> import crypt
    >>> crypt.crypt('not my real password','$1$saltsalt')
    '$1$saltsalt$4iXfpnrgHRXkrDbPymCE4/'
    
    >>> crypt.crypt('not my real password','$5$saltsalt')
    '$5$saltsalt$E0bMpsLR71z8LIvd6p2tD4LZ984JxyD7B9lPLhq4vY7'
    
    >>> crypt.crypt('not my real password','$6$saltsalt')
    '$6$saltsalt$KnqiStSM0GULvZdkTBbiPUhoHemQ7Q06YnvuJ0PWWZbjzx3m0RCc/hCfq54Ro3fOwaJdEAliX9igT9DD2oN1u/'
    
    >>> import bcrypt
    >>> bcrypt.hashpw('not my real password', "$2a$12$saltsaltsaltsaltsalt..")
    '$2a$12$saltsaltsaltsaltsalt..FW/kWpMA84AQoIE.Qg1Tk5.FKGpxBNC'
    

    Even without the annotation, it's fairly straightforward to figure out which scheme each one uses: md5crypt, sha256crypt, sha512crypt, and bcrypt hashes are 34, 55, 98, and 60 characters long respectively (in base64 encoding, including annotation and salt). So unless you suggest truncating the hash or altering the hash's properties, keeping the annotation for consistency doesn't lose any security. It also gives you a way to gracefully update user passwords. If you decide that md5crypt is no longer secure, you can switch users' hashes to bcrypt on their next login (and then, after a period of time, deactivate all accounts left on md5crypt). Or if an algorithm like bcrypt (when it was $2$) needs to be updated because of a flaw in its design, you can readily identify the flawed hashes once the fixed scheme moves to $2a$.
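
    As a rough sketch of both points, the snippet below identifies the scheme from the `$id$` prefix, falls back to the total length when the prefix is unreadable, and flags entries that should be re-hashed on next login. The function names and the choice of bcrypt as the preferred scheme are hypothetical; the length table assumes the 8-character salts and default parameters used in the examples above.

```python
from typing import Optional

# Prefixes as they appear in the example hashes above.
PREFIX_TO_SCHEME = {
    "1": "md5crypt",
    "5": "sha256crypt",
    "6": "sha512crypt",
    "2a": "bcrypt",
}

# Total string lengths quoted above (assumes an 8-character salt for the
# crypt-style schemes and no optional "rounds=" field).
LENGTH_TO_SCHEME = {
    34: "md5crypt",
    55: "sha256crypt",
    98: "sha512crypt",
    60: "bcrypt",
}


def scheme_from_prefix(stored: str) -> Optional[str]:
    """Read the scheme from the "$id$" annotation, or return None."""
    parts = stored.split("$")
    # A well-formed entry looks like "$id$...", so parts[0] is empty.
    if len(parts) < 3 or parts[0] != "":
        return None
    return PREFIX_TO_SCHEME.get(parts[1])


def scheme_from_length(stored: str) -> Optional[str]:
    """Guess the scheme from the length alone, ignoring the annotation."""
    return LENGTH_TO_SCHEME.get(len(stored))


def needs_upgrade(stored: str, preferred: str = "bcrypt") -> bool:
    """True if this entry should be re-hashed the next time the user logs in."""
    return scheme_from_prefix(stored) != preferred


md5_entry = "$1$saltsalt$4iXfpnrgHRXkrDbPymCE4/"
bcrypt_entry = "$2a$12$saltsaltsaltsaltsalt..FW/kWpMA84AQoIE.Qg1Tk5.FKGpxBNC"

print(scheme_from_prefix(md5_entry), scheme_from_length(md5_entry))
print(needs_upgrade(md5_entry), needs_upgrade(bcrypt_entry))
```

    Scrambling the prefix would defeat `scheme_from_prefix` but not `scheme_from_length`, which is why hiding the annotation gains nothing while losing the easy upgrade path.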

    Even worse, you could try saying, "I'm going to modify sha512 with new constants and round keys. That would make it super hard to break, right?" No, it just makes it super hard for you to know you didn't accidentally introduce a major vulnerability. If they can get at your /etc/shadow, they can probably also get at the library used to log you in, and with time they could reverse engineer your hashing scheme; this will be MUCH MUCH simpler than breaking a strong password.

    Again, the expected time to brute force a very strong passphrase stored in a sha256crypt hash is O(2^256); e.g., a billion computers doing a billion sha256crypts per nanosecond (each involving ~5000 rounds of sha256) would take 300000000000000000000000 (3 x 10^23) times the age of the universe to break it. And with sha512crypt, if each of the ~10^80 atoms in the observable universe did a billion sha512crypts every nanosecond, it would still take 10^38 times the age of the universe. (This assumes you have a passphrase with 256 bits and 512 bits of entropy or more, respectively.)
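
    The arithmetic can be checked with a short back-of-the-envelope sketch. The age of the universe (about 13.8 billion years) and the function name `universe_ages` are assumptions of this sketch, and the result is very sensitive to the guessing rate you plug in. Reading the first scenario literally as 10^9 machines at 10^9 guesses per nanosecond (10^27 guesses per second), a full 2^256 search comes out on the order of 10^32 ages of the universe, larger than the figure quoted above; the second scenario comes out on the order of 10^38. Either way the conclusion is the same.

```python
# Assumed age of the universe: about 13.8 billion years, in seconds.
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600


def universe_ages(bits: int, guesses_per_second: float) -> float:
    """Time to try all 2**bits candidates, in multiples of the universe's age."""
    seconds = 2**bits / guesses_per_second
    return seconds / AGE_OF_UNIVERSE_SECONDS


# 1e9 computers x 1e9 guesses per nanosecond x 1e9 nanoseconds per second.
rate_256 = 1e9 * 1e9 * 1e9
# 1e80 atoms x 1e9 guesses per nanosecond x 1e9 nanoseconds per second.
rate_512 = 1e80 * 1e9 * 1e9

ages_256 = universe_ages(256, rate_256)
ages_512 = universe_ages(512, rate_512)

print(f"256-bit search: {ages_256:.1e} ages of the universe")
print(f"512-bit search: {ages_512:.1e} ages of the universe")
```

    The expected time to hit the right passphrase is half of the full search, which does not change the order-of-magnitude picture.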

    I am now convinced, especially by 3 x 10^23. Of course, I would not use this approach in the WWII case ;) Thank you very much for your effort and details.

Licensed under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM