What's the difference between HMAC-SHA256(key, data) and SHA256(key + data)

  • Is there anything different about how secure these two hashing algorithms are? Does HMAC "fuse" the data and the key in a special way that's more security-aware?

  • Yes, HMAC is more complex than simple concatenation.

    As a simplistic example, if you were to simply concatenate key + data, then "key1"+"data" yields identical results to "key"+"1data", which is suboptimal. HMAC will yield different results for each. There are other flaws with simple concatenation in many cases, as well; see cpast's answer for one.

    The specification for HMAC is called RFC2104, which you should read if you have this level of interest.

    In summary, to implement HMAC, you should first:

    Create "ipad", which is 0x36 repeated BLOCKSIZE times. Create "opad", which is 0x5c repeated BLOCKSIZE times.

    • Note that BLOCKSIZE is 64 bytes for MD5, SHA-1, SHA-224, SHA-256, and 128 bytes for SHA-384 and SHA-512, per RFC2104 and RFC4868.

    Then HMAC is defined as:

    HASH(Key XOR opad, HASH(Key XOR ipad, text))

    or, in detail from the RFC,

    (Pretext: The definition of HMAC requires a cryptographic hash function, which we denote by H, and a secret key K. We assume H to be a cryptographic hash function where data is hashed by iterating a basic compression function on blocks of data. We denote by B the byte-length of such blocks.)

    (1) append zeros to the end of K to create a B byte string
        (e.g., if K is of length 20 bytes and B=64, then K will be
         appended with 44 zero bytes 0x00)
    (2) XOR (bitwise exclusive-OR) the B byte string computed in step
        (1) with ipad
    (3) append the stream of data 'text' to the B byte string resulting
        from step (2)
    (4) apply H to the stream generated in step (3)
    (5) XOR (bitwise exclusive-OR) the B byte string computed in
        step (1) with opad
    (6) append the H result from step (4) to the B byte string
        resulting from step (5)
    (7) apply H to the stream generated in step (6) and output
        the result
  • There's actually a very big problem with SHA256(key||data): SHA-256, along with SHA-512, SHA-1, MD5, and all other hashes using the Merkle–Damgård construction, is vulnerable to a length extension attack: given H(x), it's very simple to find H(x||y), even if you only know the length of x, because of how the construction works.

    (Essentially, the construction works like this: You have a variable state that starts at some fixed value specified in the algorithm. You split the input to the hash function into blocks of size specified in the algorithm (padding the last block if it's too short), and for each block, you use the current block and the current state to compute the new state using some special function specified in the algorithm. The value of state after processing the last block is the hash value. With any function using this construction, if you have the length of x, you can compute the padding p used. Then if you have H(x), you have the state after processing every block of x||p, which means you can proceed from there to compute H(x||p||y)).

    That means that an attacker who knows the length of your MAC key and knows a particular value of SHA256(key||data) can easily compute SHA256(key||data||otherdata) for some given otherdata. They can choose most of the other data, but even if they couldn't, it's a fatal flaw in a MAC scheme if an attacker without the key can forge any MAC-data pair from other legitimate MAC-data pairs.

    Incidentally, SHA256(data||key), while not vulnerable to length extension, is vulnerable to collisions in SHA256, which can also produce collisions in the proposed MAC, due to the same iterated construction. HMAC's nesting prevents these and various other attacks. With non-Merkle-Damgård hashes, you don't necessarily need the HMAC construction, though.

    For SHA256(data||key), both the data and the key you mentioned were fixed length, right? If not, SHA256(data||otherdata||key) is still feasible I would suppose?

License under CC-BY-SA with attribution

Content dated before 7/24/2021 11:53 AM