Compute the AES-encryption key given the plaintext and its ciphertext?
I'm tasked with creating database tables in Oracle which contain encrypted strings (i.e., the columns are RAW). The strings are encrypted by the application (using AES, 128-bit key) and stored in Oracle, then later retrieved from Oracle and decrypted (i.e., Oracle itself never sees the unencrypted strings).
I've come across one column that will only ever hold one of two strings. I'm worried that someone will notice this, figure out what those two values are, and then use them to work out the AES key.
For example, if someone sees that the column is either Ciphertext #1 or #2:
BF,4F,8B,FE, 60,D8,33,56, 1B,F2,35,72, 49,20,DE,C6.
BC,E8,54,BD, F4,B3,36,3B, DD,70,76,45, 29,28,50,07.
and knows the corresponding Plaintexts:
Plaintext #1 ("Detroit"):
44,00,65,00, 74,00,72,00, 6F,00,69,00, 74,00,00,00.
Plaintext #2 ("Chicago"):
43,00,68,00, 69,00,63,00, 61,00,67,00, 6F,00,00,00.
can he deduce that the encryption key is "Buffalo"?
42,00,75,00, 66,00,66,00, 61,00,6C,00, 6F,00,00,00.
I'm thinking that there should be only one 128-bit key that could convert Plaintext #1 to Ciphertext #1. Does this mean I should go to a 192-bit or 256-bit key instead, or find some other solution?
(As an aside, here are two other ciphertexts for the same plaintexts but with a different key.)
Ciphertext #1 A ("Detroit"):
E4,28,29,E3, 6E,C2,64,FA, A1,F4,F4,96, FC,18,4A,C5.
Ciphertext #2 A ("Chicago"):
EA,87,30,F0, AC,44,5D,ED, FD,EB,A8,79, 83,59,53,B7.
[Related question: When using AES and CBC, can the IV be a hash of the plaintext?]
I am adding an answer as a community wiki because I believe that the accepted answer is dangerously misleading. Here's my reasoning:
The question is asking about being able to derive the AES keys. In that regard the accepted answer is correct: that is called a Known-plaintext Attack, and AES is resistant to that kind of attack. So an attacker will not be able to leverage this to derive the key and make off with the whole database.
But there is another, potentially dangerous attack at play here: a Ciphertext Indistinguishability Attack. From Wikipedia:
Ciphertext indistinguishability is a property of many encryption schemes. Intuitively, if a cryptosystem possesses the property of indistinguishability, then an adversary will be unable to distinguish pairs of ciphertexts based on the message they encrypt.
The OP showed us that this column holds one of two possible values, and since the encryption is deterministic (i.e., does not use a random IV), an attacker can see which rows have the same value as each other. All the attacker has to do is figure out the plaintext of that column for a single row of each value, and they've cracked the encryption on the entire column. Bad news if you want that data to stay private - which I'm assuming is why you encrypted it in the first place.
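To make the leak concrete, here is a minimal sketch using the two ciphertexts from the question. The row ids and the `column` layout are made up for illustration; the point is only that no key is needed to group rows or to label a whole group once one row is known.

```python
from collections import defaultdict

# Ciphertext #1 and #2 exactly as posted in the question.
CT1 = bytes.fromhex("BF4F8BFE60D833561BF235724920DEC6")
CT2 = bytes.fromhex("BCE854BDF4B3363BDD70764529285007")

# Hypothetical table contents: row id -> RAW column value.
column = {1: CT1, 2: CT2, 3: CT1, 4: CT1, 5: CT2}


def group_rows_by_ciphertext(col):
    """Group row ids by identical ciphertext; requires no key at all."""
    groups = defaultdict(list)
    for row_id, ciphertext in col.items():
        groups[ciphertext].append(row_id)
    return dict(groups)


def label_rows(col, known_row_id, known_plaintext):
    """Given the plaintext of ONE row, label every row sharing its ciphertext."""
    target = col[known_row_id]
    return {row_id: known_plaintext
            for row_id, ciphertext in col.items()
            if ciphertext == target}


groups = group_rows_by_ciphertext(column)
leaked = label_rows(column, 1, "Detroit")
```

Learning that row 1 is "Detroit" immediately labels rows 3 and 4 as well, and with only two possible values the remaining rows follow by elimination.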
Mitigation: To protect against this, make your encryption non-deterministic (or at least make it appear non-deterministic to the attacker) so that repeated encryptions of the same plaintext yield different ciphertexts. You can, for example, do this by using AES in Cipher Block Chaining (CBC) mode with a random Initialization Vector (IV). Use a secure random number generator to generate a new IV for each row and store the IV in the table. This way, without the key, the attacker cannot tell which rows have matching plaintext.
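A rough sketch of the per-row random IV pattern follows. Big caveat: Python's standard library has no AES, so this sketch swaps in an HMAC-SHA256 keystream as a stand-in cipher. It is not AES-CBC and is not meant for production; in a real application, use AES in a standard mode from a vetted crypto library. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

IV_LEN = 16  # same size as an AES block, as a CBC IV would be


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in keystream (NOT AES): HMAC-SHA256 over iv || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_row(key: bytes, plaintext: bytes):
    """Encrypt one cell with a fresh random IV; store both values in the row."""
    iv = secrets.token_bytes(IV_LEN)  # new IV for every row, from a CSPRNG
    stream = _keystream(key, iv, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return iv, body


def decrypt_row(key: bytes, iv: bytes, body: bytes) -> bytes:
    """Recompute the keystream from the stored IV and undo the XOR."""
    stream = _keystream(key, iv, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(16)            # random 128-bit key
cell = "Detroit".encode("utf-16-le")
row_a = encrypt_row(key, cell)
row_b = encrypt_row(key, cell)           # same plaintext, different IV
```

Because each row gets its own IV, `row_a` and `row_b` differ even though they hold the same plaintext, so the equal-rows pattern is no longer visible in the table.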
My own answer, currently accepted, is a tad old and I wasn't aware of the Ciphertext Indistinguishability Attack at the time. I'd delete my own but it looks like I can't delete an accepted answer.
@VitorPy You can probably flag it for moderator attention to have it un-accepted (in which case, leave it up for historical reasons).
@VitorPy actually moderators have no ability to unaccept an answer (and don't we often wish we did!) You can comment on the question to the OP, and get him to swap it. That said, I don't think your answer is *wrong* per se (the key is still unretrievable, which is what was asked), it is just incomplete wrt what Mike is raising here.
For a block cipher with an n-bit key, if, given a plaintext block and the corresponding ciphertext, the key can be guessed in fewer than 2^(n-1) steps on average, then that block cipher is said to be "broken" and cryptographers will make a point of not using it. AES is not broken (yet). So no worry.
A few things may still be said, though:
- Having a plaintext and the corresponding ciphertext allows an attacker to verify a potential key value.
- The 2^(n-1) value is actually half the size of the key space. The idea is that the attacker can try all possible keys until one matches. On average, he will have to try half the keys before hitting the right one. This assumes that the key space has size 2^n. You can still reduce the key space yourself: e.g., if you decide that your key is the name of a US town, then the number of possible keys is very much lower (there are surely no more than 100000 towns in the USA). Hence, you get the promised 128-bit security only if your key generation process may indeed produce any 128-bit key.
- You apparently encrypt each value by stuffing it directly into an AES core. AES being deterministic, this means that two cells with the same value will yield the same encrypted block, and any attacker may notice that. In other words, you leak information about which cells are equal to each other. Depending on your situation, this may or may not be an issue; you should be aware of it.
- You do not say how you handle values longer than 16 bytes. This is not a simple issue. In general, it requires a chaining mode such as CBC and an Initialization Vector (the details depend on the mode; for CBC, the IV is a 16-byte random value, with a new IV for each encrypted value). This can also fix the information leakage from the previous point.
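The key-space point can be put into numbers with a small sketch. The figure of 100000 towns is the rough upper bound used above, and the UTF-16LE, zero-padded encoding is an assumption that happens to reproduce the "Buffalo" key bytes posted in the question.

```python
import math
import secrets


def expected_trials(keyspace_size: int) -> int:
    """Average number of guesses in an exhaustive search: half the key space."""
    return keyspace_size // 2


# A key drawn uniformly from all 128-bit values.
random_key_trials = expected_trials(2 ** 128)   # 2^127

# A key that is the name of a US town (assumed upper bound on candidates).
TOWN_COUNT = 100_000
town_key_trials = expected_trials(TOWN_COUNT)
town_key_bits = math.log2(TOWN_COUNT)           # effective key strength in bits

# The key from the question: 16 bytes long, but drawn from a tiny set.
buffalo_key = "Buffalo".encode("utf-16-le").ljust(16, b"\x00")

# What a key generation process should do instead: 16 bytes from a CSPRNG.
good_key = secrets.token_bytes(16)
```

A town-name key is still 16 bytes long, but it carries under 17 bits of effective strength, and a known plaintext/ciphertext pair lets the attacker check each candidate name directly.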
According to your definition, AES is broken, since the computational complexity has been reduced by ~3 bits: https://www.schneier.com/blog/archives/2011/08/new_attack_on_a_1.html
The answer: No, the AES key cannot be recovered in this scenario. AES is secure against known-plaintext attacks. This means that even if an attacker knows the plaintext and its corresponding ciphertext (its encryption under some unknown AES key), the attacker cannot recover the AES key. In particular, the attacker cannot recover the AES key any faster than simply trying possible keys one after another -- which is a process that will take longer than the lifetime of our civilization, assuming that the AES key is chosen randomly.
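For a sense of scale, here is a back-of-the-envelope estimate. The rate of 10^12 key trials per second is an assumption picked purely for illustration, not a measured figure for any real hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed attacker speed (illustrative only): one trillion keys per second.
TRIALS_PER_SECOND = 1e12


def years_to_brute_force(key_bits: int, rate: float = TRIALS_PER_SECOND) -> float:
    """Expected years to find a uniformly random key by exhaustive search."""
    expected_trials = 2 ** (key_bits - 1)   # half the key space on average
    return expected_trials / rate / SECONDS_PER_YEAR


years_128 = years_to_brute_force(128)
```

Under that assumption the expected search time for a random 128-bit key comes out above 10^18 years, so moving to 192 or 256 bits does nothing for the problem in the question.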
P.S. I noticed that whatever you are using for encryption does not seem to use an IV. This is a security risk. I don't know what mode of operation you are using, but you should use a well-regarded mode of encryption (e.g., CBC mode encryption, CTR mode encryption) with a random IV. The fact that encrypting the same message multiple times always gives you the same ciphertext every time is a security leak which is better to avoid. You can avoid this leak by using a standard mode of operation with an appropriate IV. (You probably also should use a message authentication code (MAC) to authenticate the ciphertext and prevent modifications to it.)
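The MAC suggestion could look roughly like the encrypt-then-MAC sketch below, using only the standard `hmac` module. The function names and the `iv || ciphertext || tag` layout are assumptions for illustration, the ciphertext is treated as opaque bytes produced elsewhere, and the MAC key should be separate from the encryption key.

```python
import hashlib
import hmac

IV_LEN = 16
TAG_LEN = 32  # HMAC-SHA256 output size


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC tag computed over the IV and the ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes):
    """Verify the tag before anything is decrypted; raise on any mismatch."""
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("blob too short")
    iv = blob[:IV_LEN]
    ciphertext = blob[IV_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return iv, ciphertext
```

With this in place, a value that was modified in the database, or assembled from pieces of other values, fails verification instead of being silently decrypted.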
Salt your encryption.
That way there won't be any patterns in your encryption. (There are other benefits too!)
For encryption, the usual term is not Salt, but Initialization Vector or IV. http://en.wikipedia.org/wiki/Initialization_vector http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
Attacking AES is not as easy as just building a rainbow table. The first thing to realize is that the table would have to account for the initialization vector. As long as you change the IV on a semi-regular basis, building a rainbow table (which is not really realistic anyway) would take a very, very long time: orders of magnitude longer. Since a typical rainbow table is essentially two-dimensional, you would essentially need a cube of result sets to figure out both the IV and the key.
If you read Thomas Pornin's post he goes into pretty great detail as to what this means, in terms of brute forcing the result.
The realistic worry is that someone with access to the database would be able to inject a string from another field (presumably because you're not using a random padding value per element in the column, or seeding).
If you seed the value you won't have this issue, and the only (realistic) attack on the cipher-text itself is made much more difficult.