How to safely store sensitive data like a social security number?

  • I am looking for a way to safely store low-entropy personal information.

    I have the following requirements for the data:

    • Must be able to search (i.e. to look up an existing piece of data) but not view
    • Other systems must be able to recover the real value
    • The system must perform reasonably well (operations should take seconds, not hours)

    I think a system of encrypting the data using a public key is my best option. I can keep the private key offline so the individual value cannot be directly recovered. However, I think an attacker could use the encryption process as an oracle: because the data has low entropy, they could encrypt every possible value and compare the results against the stored ciphertexts.

    Any ideas on how to improve the security of this system? Not collecting this data is not an option. There will be additional layers around this data (access control, logging, physical security, etc.), so I am focusing only on this part of the system.

    What's your threat model? What kinds of attackers, with what resources?

    The main attack channel is assumed to be exploitation of the application itself. However, the application is on an isolated network in a physically secure area. As for attacker resources, it's not a large system, so I doubt much would be aimed at it.

    What do you mean by "able to search ... but not view"? If it is low-entropy data, can I search for all possibilities to view the data?

    Do you need to be able to search only for metadata? Or excerpts from the file itself?

    @AriTrachtenberg You can search using the value as a key but the system will never display the value.

    So, if the values are low-entropy, can't I fuzz through all possible values and establish what is in the database?
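    A minimal sketch of that point, assuming a hypothetical search() function that only answers yes or no, and a 4-digit field instead of a full SSN to keep the loop short. The attacker never sees the key or any stored value, yet enumerating the space through the search interface recovers what is stored. The names _tag, _index, search and enumerate_via_search are illustrative, not from the question.

```python
import hashlib
import hmac
import secrets

# Hypothetical store: values are indexed by a keyed, deterministic tag,
# so raw values are never stored or displayed.
_KEY = secrets.token_bytes(32)


def _tag(value: str) -> bytes:
    return hmac.new(_KEY, value.encode(), hashlib.sha256).digest()


_index = {_tag("4821")}  # the only stored value (hypothetical 4-digit field)


def search(value: str) -> bool:
    """The 'search but not view' interface: it only answers yes or no."""
    return _tag(value) in _index


def enumerate_via_search(space: int = 10_000) -> list[str]:
    # The attacker never needs the key: asking search() about every
    # candidate in a low-entropy space reveals which values are stored.
    return [f"{n:04d}" for n in range(space) if search(f"{n:04d}")]


print(enumerate_via_search())  # ['4821']
```

    Whatever protects the stored values, the search endpoint itself has to be rate-limited or access-controlled, because it is an oracle by design.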

  • David

    Correct answer (7 years ago)

    What you're looking for is deterministic encryption, in which encrypting the same value twice gives the same output. With deterministic encryption under a key K, an attacker would need K to determine which SSN maps to which encrypted value. You can still search the deterministically encrypted data, but only with equality comparisons (==, !=).
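    A sketch of deterministic, keyed, reversible encryption with equality lookup. Python's standard library has no AES, so this swaps in a toy block cipher: a 4-round Feistel network whose round function is HMAC-SHA256 (the Luby-Rackoff construction). It is an illustration only, not a substitute for AES from a vetted library. It assumes the value fits in one 16-byte block (the single-block, ECB-style case) and is zero-padded; encrypt, decrypt and table are hypothetical names.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # one 128-bit block; the value must fit in it
ROUNDS = 4  # Feistel rounds with a keyed pseudorandom round function


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: HMAC-SHA256 over (round index || half), truncated.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, value: str) -> bytes:
    data = value.encode()
    if len(data) > BLOCK:
        raise ValueError("value must fit in one block")
    block = data.ljust(BLOCK, b"\0")  # zero padding; assumes no trailing NULs
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt(key: bytes, ciphertext: bytes) -> str:
    left, right = ciphertext[: BLOCK // 2], ciphertext[BLOCK // 2 :]
    for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
        left, right = _xor(right, _round(key, i, left)), left
    return (left + right).rstrip(b"\0").decode()


key = secrets.token_bytes(32)  # K: shared only with systems that may decrypt
table = {encrypt(key, "123-45-6789"): "row-1"}  # hypothetical DB index

# Same plaintext and key give the same ciphertext, so == lookups work.
print(table.get(encrypt(key, "123-45-6789")))  # row-1
print(decrypt(key, encrypt(key, "123-45-6789")))  # 123-45-6789
```

    Without K, the attacker cannot compute encrypt() for candidate SSNs, which is exactly what separates this from the public-key approach in the question.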

    Examples of deterministic crypto that would work:

    • Block ciphers in ECB mode, if the data fits in a single block
    • Block ciphers in CBC mode, with a static IV.
    • Block ciphers in CBC mode with an IV derived from the plaintext. (Note that you then don't want to store the IV, so decryption is impossible without already knowing the plaintext; this is a search-only option.)

    What won't work:

    • CTR mode with a static IV (an attacker can then combine multiple ciphertexts to recover the keystream and the plaintexts)
    • CBC mode with a random IV (can't search)
    • Any stream cipher (same problem as CTR mode)
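    Why a static IV breaks CTR mode can be shown with XOR alone. A minimal sketch, assuming SHA-256 over (key || nonce || counter) as a stand-in keystream generator (real CTR mode uses a block cipher such as AES), hypothetical SSN values, and an attacker who knows one plaintext (for example, a record they entered themselves).

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Counter-mode keystream; SHA-256 stands in for the block cipher here.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = secrets.token_bytes(32)
STATIC_NONCE = bytes(8)  # the mistake: the same nonce/IV for every record

p1, p2 = b"123-45-6789", b"987-65-4321"
c1 = xor(p1, keystream(key, STATIC_NONCE, len(p1)))
c2 = xor(p2, keystream(key, STATIC_NONCE, len(p2)))

# Even with no known plaintext, c1 XOR c2 equals p1 XOR p2.
# With one known plaintext, the attacker recovers the keystream itself...
recovered_stream = xor(c1, p1)
# ...which decrypts every other record encrypted under the same nonce.
print(xor(c2, recovered_stream))  # b'987-65-4321'
```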

    Note that, in all cases, you are giving up ciphertext indistinguishability, but that's a core requirement of being able to search on the ciphertexts.

    You do need a mechanism to share the key with the other systems that need access to the plaintext, but an attacker who obtains only the database, whether through a backup, SQL injection, or any other attack that exposes just the database, won't be able to discern the plaintexts.

    PKI is not useful here: as you point out, anyone with the public key can enumerate all possible values and recover them if you're using a deterministic public-key cryptosystem (plain, unpadded RSA, for example). A non-deterministic public-key scheme (padded RSA) will not allow you to search on the ciphertexts.
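    To make the enumeration concrete, a sketch using textbook (unpadded) RSA with a tiny, hypothetical key. A real 2048-bit key would not help: the attacker never factors N, they only encrypt candidates with the public key and compare. The search range is narrowed so the sketch finishes quickly; the full 10^9 SSN space is the same loop run longer.

```python
from typing import Optional

# Toy textbook RSA public key (tiny primes, illustration only).
P, Q = 1_000_003, 999_983
N = P * Q  # about 10**12, large enough for any 9-digit SSN
E = 65_537


def encrypt(m: int) -> int:
    # Plain, unpadded RSA: deterministic, and needs only the public (N, E).
    return pow(m, E, N)


def brute_force(c: int, candidates: range) -> Optional[int]:
    # The attacker holds only (N, E) and c, and tries every candidate.
    for m in candidates:
        if encrypt(m) == c:
            return m
    return None


secret_ssn = 123_456_789
ciphertext = encrypt(secret_ssn)
print(brute_force(ciphertext, range(123_450_000, 123_460_000)))  # 123456789
```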

    I would review whether you really need to encrypt small, easily brute forced plaintexts. What is your threat model? Can you protect against these threats in other ways?

    This is close to the answer I was thinking about. However, is there any difference between deterministic block encryption and unpadded public-key encryption?

    If the public key is protected as well as a symmetric key would be, no. If the public key is known to the attacker, it is trivial to enumerate the range of something as small as SSNs.

    +1 for novel implementation of a lossless one-way function.

    CBC with a static IV would leak information. If (and only if) two plaintexts start with the same block, their first ciphertext blocks will also be the same, and likewise for the 2nd, 3rd, ... blocks. The first block where two ciphertexts differ is at the same position as the first block where the two plaintexts differ.
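    A sketch of that prefix leak. It assumes a stand-in for the block cipher: any deterministic keyed function shows the effect, so HMAC-SHA256 truncated to 16 bytes is used (it is not invertible, so this only encrypts). Inputs are assumed to be pre-padded to whole blocks; cbc_encrypt and block_encrypt are hypothetical names.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher: deterministic and keyed.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("sketch assumes input padded to whole blocks")
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        chunk = plaintext[i : i + BLOCK]
        # C_i = E(P_i XOR C_{i-1}), with C_0 = IV
        prev = block_encrypt(key, bytes(a ^ b for a, b in zip(chunk, prev)))
        out += prev
    return out


key = secrets.token_bytes(32)
STATIC_IV = bytes(BLOCK)

# Two plaintexts sharing their first two blocks, differing in the third.
p1 = b"A" * 32 + b"record-one".ljust(BLOCK, b"\0")
p2 = b"A" * 32 + b"record-two".ljust(BLOCK, b"\0")
c1, c2 = cbc_encrypt(key, STATIC_IV, p1), cbc_encrypt(key, STATIC_IV, p2)

blocks_equal = [c1[i : i + BLOCK] == c2[i : i + BLOCK] for i in range(0, 48, BLOCK)]
print(blocks_equal)  # [True, True, False]
```

    For a value that fits in one block, such as an SSN, this reduces to the single-block ECB case, so the leak matters mainly for longer fields.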

    What package should I use to deterministically encrypt SSNs in Django/Python?

Licensed under CC BY-SA with attribution


Content dated before 7/24/2021 11:53 AM
