Encryption and compression of Data

  • If we want both encryption and compression during transmission then what will be the most preferable order.

    1. Encrypt then compress
    2. Compress then encrypt

    Do not forget integrity protection. In most settings it is as important as encryption. It can be done with MACs (for example HMAC) in a symmetric setting or signatures in an asymmetric setting.

    That looks like a question from the coursera cryptography course's exams.

  • Polynomial

    Polynomial Correct answer

    9 years ago

    You should compress before encrypting.

    Encryption turns your data into high-entropy data, usually indistinguishable from a random stream. Compression relies on patterns in order to gain any size reduction. Since encryption destroys such patterns, the compression algorithm would be unable to give you much (if any) reduction in size if you apply it to encrypted data.

    Compression before encryption also slightly increases your practical resistance against differential cryptanalysis (and certain other attacks) if the attacker can only control the uncompressed plaintext, since the resulting output may be difficult to deduce.

    EDIT: I'm editing this years later because this advice is actually poor in an interactive case. You should not compress data before encrypting it in most cases. A side-channel attack method known as a "compression oracle" can be used to deduce plaintext data in cases where the attacker can interactively cause strings to be placed into an otherwise unknown plaintext datastream. Attacks on SSL/TLS such as CRIME and BREACH are examples of this.

    However, compression might allow new attacks in some contexts.

    Wow, never even knew about that attack. Great writeup, too!

    Attempting to compress encrypted data is a way of testing the diffusion property of the encryption algorithm, it's also a test for key material; neither should compress at all in a perfect world.

    @lynks It is not, however, a definitive test of randomness. If the encrypted file does not compress, your cipher isn't a complete joke, but may still very well be insecure in the extreme. If the encrypted file does compress, all hope is lost and you may as well hand over the plaintext to the bad guys.

    Also add to that true randomness should contain coincidental patterns from time to time, so in a large enough sample one should be able to compress a few bytes here and there anyway.

    @ewanm89 Potentially. In reality, you need reasonable runs in order to compress, and most algorithms have some sort of metadata overhead per "block" of compressed text. As such, you're unlikely to even break even.

    @ewanm89: The number of possible compressed messages of length *n* cannot be greater than the number of possible messages of length *n*. So, if we average over the set of all possible messages, the average compression ratio (compressed size divided by uncompressed size) cannot be less than 100%. Compression algorithms achieve real-world compression ratios of less than 100% by targeting common patterns at the *expense* of uncommon ones; so, a truly-randomly-generated message will usually have a compression ratio of greater than 100%.

    @ruakh only when you counter in the metadata as Polynomial says, yes, it's unlikely that the pattern will repeat enough for the dictionary to be larger than the amount of compression in the actual data. Of course a block cipher in ECB mode tends to be highly compressible but then it tends not to give random output and is open to dictionary attacks.

License under CC-BY-SA with attribution

Content dated before 7/24/2021 11:53 AM