Why is it wrong to *implement* myself a known, published, widely believed to be secure crypto algorithm?

  • I know the general advice that we should never design¹ a cryptographic algorithm. It has been talked about very extensively on this site and on the websites of professionals of such caliber as Bruce Schneier.

    However, the general advice goes further than that: it says that we should not even implement algorithms designed by people wiser than us, but rather stick to well-known, well-tested implementations made by professionals.

    And this is the part that I couldn't find discussed at length. I also made a brief search of Schneier's website, and I couldn't find this assertion there either.

    Therefore, why are we categorically advised against implementing crypto algorithms as well? I'd most appreciate an answer with a reference to an acclaimed security expert talking about this.

    ¹ More precisely, design to our hearts' content; it might be a good learning experience; but please please please please, never use what we designed.


  • The reason why you want to avoid implementing cryptographic algorithms yourself is because of side-channel attacks.

    What is a side-channel?

    When you communicate with a server, the content of the messages is the "main" channel of communication. However, there are several other ways for you to get information from your communication partner that don't directly involve them telling you something.

    These include, but are not limited to:

    • The time it takes to answer you
    • The energy the server consumes to process your request
    • How often the server accesses the cache to respond to you.

    What is a side-channel attack?

    Simply put, a side-channel attack is any attack on a system involving one of these side-channels. Take the following code as an example:

    public bool IsCorrectPasswordForUser(string currentPassword, string inputPassword)
    {
        // If both strings don't have the same length, they're not equal.
        if (currentPassword.Length != inputPassword.Length)
            return false;

        // If the content of the strings differs at any point, stop and return they're not equal.
        for (int i = 0; i < currentPassword.Length; i++)
        {
            if (currentPassword[i] != inputPassword[i])
                return false;
        }

        // If the strings were of equal length and never had any differences, they must be equal.
        return true;
    }
    

    This code seems functionally correct, and if I didn't make any typos, then it probably does what it's supposed to do. Can you still spot the side-channel attack vector? Here's an example to demonstrate it:

    Assume that a user's current password is Bdd3hHzj (8 characters) and an attacker is attempting to crack it. If the attacker inputs a password that is the same length, both the if check and at least one iteration of the for loop will be executed; but should the input password be either shorter or longer than 8 characters, only the if will be executed. The former case is doing more work and thus will take more time to complete than the latter; it is simple to compare the times it takes to check a 1-char, 2-char, 3-char etc. password and note that 8 characters is the only one that is notably different, and hence likely to be the correct length of the password.

    With that knowledge, the attacker can refine their inputs. First they try every candidate for the first character, from aaaaaaaa through Zaaaaaaa. Almost all of these fail on the first iteration of the for loop. But when they come to Baaaaaaa, two iterations of the loop occur, which again takes more time to run than an input starting with any other character. This tells the attacker that the first character of the user's password is the letter B, and they can now repeat this step to determine the remaining characters.
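    The leak described above can be made visible without a stopwatch. The following is a minimal Python sketch, not the answer's original code: `leaky_equals` mirrors the early-exit comparison and additionally reports how many characters it compared (a stand-in for elapsed time), while `constant_time_equals` shows one common mitigation, the standard library's `hmac.compare_digest`. Both function names are made up for this illustration.

```python
import hmac
from typing import Tuple


def leaky_equals(current: str, attempt: str) -> Tuple[bool, int]:
    """Early-exit comparison; also returns how many characters were compared."""
    if len(current) != len(attempt):
        return False, 0
    compared = 0
    for a, b in zip(current, attempt):
        compared += 1
        if a != b:
            # Stops at the first mismatch, so the work done depends on the secret.
            return False, compared
    return True, compared


def constant_time_equals(current: str, attempt: str) -> bool:
    """Comparison via hmac.compare_digest, which avoids content-dependent early exits."""
    return hmac.compare_digest(current.encode("utf-8"), attempt.encode("utf-8"))


secret = "Bdd3hHzj"
steps_wrong_length = leaky_equals(secret, "short")[1]
steps_wrong_first = leaky_equals(secret, "aaaaaaaa")[1]
steps_right_first = leaky_equals(secret, "Baaaaaaa")[1]
```

    Here a wrong-length guess does no loop work, a wrong first character stops after one comparison, and a correct first character gets to a second comparison, which is exactly the signal the attacker measures. Note that `hmac.compare_digest` may still reveal the length of its inputs.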

    How does this relate to my Crypto code?

    Cryptographic code looks very different from "regular" code. When looking at the above example, it doesn't seem wrong in any significant way. As such, when implementing things on your own, it might not be obvious that code which does what it's supposed to do just introduced a serious flaw.

    Another problem I can think of is that programmers are not cryptographers. They tend to see the world differently and often make assumptions that can be dangerous. For example, look at the following unit test:

    public void TestEncryptDecryptSuccess()
    {
        string message = "This is a test";
        KeyPair keys = MyNeatCryptoClass.GenerateKeyPair();    
    
        byte[] cipher = MyNeatCryptoClass.Encrypt(message, keys.Public);
        string decryptedMessage = MyNeatCryptoClass.Decrypt(cipher, keys.Private);
    
        Assert.AreEqual(message, decryptedMessage);
    }
    

    Can you guess what's wrong? I have to admit, that wasn't a fair question. MyNeatCryptoClass implements RSA and is internally set to use a default exponent of 1 if no exponent is explicitly given.

    And yes, RSA will work just fine if you use a public exponent of 1. It just won't really "encrypt" anything, since x^1 is still x.
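    To see why the unit test above passes anyway, here is a small Python sketch of textbook RSA with toy numbers. The function name and all parameters are hypothetical, there is no padding, and the primes are far too small for real use; it only illustrates that an exponent of 1 round-trips correctly while leaving the message unchanged.

```python
def toy_rsa_apply(value: int, exponent: int, modulus: int) -> int:
    """Textbook RSA operation: value^exponent mod modulus (no padding; illustration only)."""
    return pow(value, exponent, modulus)


# Hypothetical toy parameters; real RSA primes are on the order of 1024 bits or more.
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)

message = 65

# With e = 1 the matching private exponent is also 1, so the round trip "works"...
cipher_bad = toy_rsa_apply(message, 1, n)
roundtrip_bad = toy_rsa_apply(cipher_bad, 1, n)

# ...whereas a sensible exponent actually changes the message.
e_ok = 17
d_ok = pow(e_ok, -1, phi)  # modular inverse, Python 3.8+
cipher_ok = toy_rsa_apply(message, e_ok, n)
roundtrip_ok = toy_rsa_apply(cipher_ok, d_ok, n)
```

    Both round trips return the original message, so a test that only checks "decrypt(encrypt(m)) == m" cannot tell the two apart. Only a check that the ciphertext differs from the plaintext would notice.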

    You might ask yourself who in their right mind would do that, but there are cases of this actually happening.

    Implementation Errors

    Another reason why you might go wrong implementing your own code is implementation errors. As user Bakuridu points out in a comment, bugs in Crypto code are fatal in comparison to other bugs. Here are a few examples:

    Heartbleed

    Heartbleed is probably one of the most well-known implementation bugs when it comes to cryptography. While not directly involving the implementation of cryptographic code, it nonetheless illustrates how monstrously wrong things can go with a comparatively "small" bug.

    While the linked Wikipedia article goes much more in-depth on the issue, I would like to let Randall Munroe explain the issue much more concisely than I ever could:

    [Image: xkcd comic 1354] https://xkcd.com/1354/ - Image licensed under CC BY-NC 2.5

    Debian Weak PRNG Bug

    Back in 2008, there was a bug in Debian which affected the randomness of all further key material used. Bruce Schneier explains the change that the Debian team made and why it was problematic.

    The basic gist is that tools checking for possible problems in C code complained about the use of uninitialized variables. While usually this is a problem, seeding a PRNG with essentially random data is not bad. However, since nobody likes staring at warnings, and being trained to ignore warnings can lead to its own problems down the line, the "offending" code was removed at some point, thus leading to less entropy for OpenSSL to work with.
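    To give a feel for what "less entropy" means in practice: after the change, the process ID was effectively the only input that varied, which leaves at most 32,768 possible seeds. The Python sketch below is an illustration under that assumption, not OpenSSL's or Debian's actual code. `weak_keygen` and `brute_force_pid` are hypothetical names, and `random.Random` merely stands in for a generator seeded from a tiny space.

```python
import random

MAX_PID = 32768  # assumed upper bound on process IDs (the traditional Linux default)


def weak_keygen(pid: int) -> bytes:
    """Hypothetical key generator whose only seed material is the process ID."""
    rng = random.Random(pid)
    return bytes(rng.randrange(256) for _ in range(16))


def brute_force_pid(key: bytes):
    """Recover the seed by simply trying every possible process ID."""
    for pid in range(1, MAX_PID + 1):
        if weak_keygen(pid) == key:
            return pid
    return None


victim_key = weak_keygen(4242)
recovered_pid = brute_force_pid(victim_key)
```

    The 128-bit key looks random, yet the whole key space can be enumerated in well under a second. No amount of looking at a single key would reveal that.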

    Summary

    In summary, don't implement your own Crypto unless it's designed to be a learning experience! Use a vetted cryptographic library designed to make it easy to do it right and hard to do it wrong. Because Crypto is very easy to do wrong.


    Personally, I'd make "implementation errors" the first item as it's far more likely IMO that average joe developer will make a mistake that allows their crypto to be easily broken, as opposed to being hit by a side-channel attack. I'd also recommend emphasising that the OpenSSL projects' maintainers were experienced crypto programmers, and if *they* made such a simple mistake, average joe developer is unlikely to do much better.

    I know I have answered "for", but a really, really good argument against is the well-meaning implementation of "email validation", with thousands of examples that are crap. This is despite the rules being clear, public, and available for the last 30 years.

    @mckenzm Why? RFC-822 defines a regex for a valid email address.

  • The side channel attacks mentioned are a big thing. I would generalize it a bit more. Your crypto library is very high risk/high difficulty code. This is often the library that is trusted to protect the rest of an otherwise soft system. Mistakes here can easily be millions of dollars.

    Worse, you often have to fight your own compiler. The trusted implementations of these algorithms are heavily studied, under many different compilers, and have small tweaks to ensure that compilers don't do bad things. How bad? Well, consider those side channel attacks that everyone is mentioning. You carefully write your code to avoid all of those attacks, doing everything right. Then you run the compiler on it. The compiler doesn't have a clue what you were doing. It can easily see some of those things you did to avoid side channel attacks, see a faster way to do that, and optimize out the code you carefully added in! This has even shown up in kernel code, where a slightly-out-of-order assignment ends up giving the compiler permission to optimize out an error check!

    Detecting things like that can only be done with a disassembler, and lots of patience.

    Oh, and never forget compiler bugs. Last month I spent the better part of a week tracking down a bug in my code which was actually perfectly fine -- it was a known bug in my compiler that was actually causing the problem. Now I got lucky here, in that the bug crashed my program so everybody knew something needed to be done. Some compiler bugs are more subtle.

    The slightly-out-of-order assignment wasn't in crypto code, and is something all C and C++ developers need to be aware of. (Of course, it's potentially more expensive in crypto code).

    @MartinBonner Agreed, not crypto code, though I did hear that it hit MySQL as a vulnerability years ago. They "properly" checked for a circular buffer overflow, but did so using pointer overflow, which is UB, so the compiler compiled their check out. Not crypto, but one of those little things that you can't afford to get wrong in crypto.

    @CortAmmon and the key difference is when crypto is involved, people will **very actively** look for all your flaws and have good reason to never tell you. They will literally throw every trick that currently exist, then move to things that have not been thought of yet.

    _Any_ code, not just crypto code, that "relies" on undefined behavior can't be trusted, period. Code that invokes UB is essentially meaningless and there's no way to contain the undefinedness. The cases you mention where the compiler "ruins" the code by applying an optimization are an artifact of the illformedness of the source it was given. That's not an error on the part of the compiler, but on the programmer.

    One doesn't fight the compiler when it comes to undefined behavior (that would imply it's misbehaving), one instead fights the C language and the way it makes UB hard to spot. If you don't like C, then either don't use it or formally propose a revision.

    On the topic of compilers, there's also the possibility of it being malware.

    @AlexReinking You share an opinion with the GCC devs =) It is true that UB is UB, unless you specifically target *a* compiler and its particular behaviors (in which case it is UB by the spec, but not UB for that compiler). The point of that section was to show how tremendously *subtle* these bugs can be. And it can even work for several versions of your compiler, passing your testing, only to crop up when you upgrade compilers later and the compiler responds to the UB differently.

    @Nelson I've been trying to figure out if that's really a difference or not. I do believe people look very actively at all exploitable software, crypto or not. I think crypto is simply a user-facing library which, if compromised, likely exposes poorly secured software underneath (which depended on the crypto). This is pure conjecture, but I would expect software such as Apache's httpd or the Linux kernel TCP/IP stack are attacked just as brutally as crypto software such as OpenSSL. Then again, I'd also recommend the "do not roll your own" for those sorts of libraries as well!

    @CortAmmon - That can be reasonable, though I'll caution that there's a difference between UB and implementation-defined behavior. If you take that philosophy, then you must also acknowledge that you are dependent on not just the vendor, but the particular version and patch level.

    @CortAmmon Yes, people go after all exploitable software, but crypto is seen as a high-value target because it's used to protect things of value. If an attacker can leverage a flaw in a crypto library that's part of an authentication scheme to gain access to a system, they can do a lot of damage.

    With very high risk/high difficulty code you really want to go for a well-vetted, possibly even open source, crypto library. Many pairs of eyes have a higher chance of spotting errors than a few.

    Paranoid thought on side effects of having a strong culture discouraging people rolling their own: it guarantees that everyone ends up using a tiny number of libraries for all crypto work... making it vastly more tractable for any capable group to compromise the majority of all communication by targeting about a half dozen libraries.

    @Murphy That's conspiracy theory thinking.... which means you fit right in with the crowd =) In all seriousness, that's the idea behind open source. More eyes on the code means more white-hat eyes on the code as well as black-hat eyes. Whether that idea *actually* works out in our favor is something we will learn (or not learn!) in the coming decades! Good addition to the discussion! (If I may proffer a case study to further your argument, Ken Thompson's login backdoor is an example of this revealed all the way back in 1984)

    @CortAmmon a little. My go-to example is the Underhanded C Contest where the goal is to write code that can pass code review while including something underhanded. I kinda wish there was a framework for making it safer for people to roll their own code. Encrypt with standard library first... then drop it into a sandbox where an amateur can layer on their own random method with separate keys without significant risk of weakening the inner layer. Would defeat tactics targeting standard libraries.

    @Murphy that seems like a recipe for failure. Encrypt something then let someone else play with the result. Hmm, let me cache the results of this encrypted password and do something special if someone wants it again. You've just proposed a way to weaken encryption.

    @iheanyi hence why I mentioned some kind of framework to separate the other implementation. If you don't re-use keys or do something similarly stupid, and only work with the encrypted block of data, then you should no more be able to weaken it than all the other random code running on your server doing tasks or the server transmitting that encrypted block over the network should be able to weaken the encryption. (though it is important that the most solid implementation go first) Monoculture in the crypto library market makes the whole industry vulnerable.

    @Murphy hahaha, your "framework" apparently is "don't write buggy code or code that does something wrong".

    @Murphy If I'm reading you correctly, you're talking about using multiple encryption? You first encrypt with something "home brew" for diversity, then encrypt with a standard encryption method before sending it out? Hopefully using different keys, of course.

    @CortAmmon Cascade cipher and it's very important that the best algorithm go first. https://link.springer.com/article/10.1007/BF02620231 And yes you must use different keys. If I was some shadowy figure in one of the big gov intelligence agencies I would probably just drop 10 million on subverting each of the top 10 most used crypto libraries then another few million on a PR campaign focusing on crypto discussion telling everyone to never ever create their own implementations.... and call it a day.

  • The case against rolling your own crypto is that bugs can hide in crypto software without symptoms, even in the face of extensive tests.

    Everything will seem to function perfectly. For example, in a signing/verifying application, the verifier will o.k. valid signatures and reject invalid ones. The signatures themselves will look like gibberish to the eye. But the bug will still be there, waiting for an actual attack.

    Have you ever mistyped a character in your code without noticing, only to have an editor highlight, a compile error, or a runtime error point it out so you could promptly fix it? If there had been no highlight, and the code had compiled and run with no visible symptoms, would you ever have caught that typo? That's the level of gotcha in rolling your own crypto.

  • Even in situations where side-channel attacks are not possible, cryptographic algorithms often have implementation details that are security-critical but not obvious. Two examples:

    • The ECDSA signature algorithm requires the use of a random integer when generating a signature. This integer needs to be different for each signature generated with a given private key. If it's re-used, anyone who gets two signatures can recover the private key using basic modular arithmetic. (Sony made this mistake with the copy protection for the PlayStation 3, using the same number for every signature.)

    • Generating the keypair for the RSA algorithm requires generating two random large prime numbers. Under normal conditions, recovering the private key requires integer factorization or solving the RSA problem, both very slow mathematical operations. If, however, a keypair shares one of its primes with another keypair, then the private keys of both pairs can easily be recovered simply by computing the greatest common divisor of the two public moduli. (A number of routers generate SSL certificates on first powerup, when there isn't much randomness available. From time to time, two routers will happen to generate certificates with overlapping keypairs.)
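    Both failures in the list above come down to a few lines of modular arithmetic, which the following Python sketch works through with toy numbers. All values and function names are hypothetical. The ECDSA part skips the elliptic curve entirely: it treats the signature component r as an opaque value shared by both signatures (which is what nonce reuse produces) and uses the Mersenne prime 2^127 - 1 as a stand-in for the group order.

```python
from math import gcd

# --- RSA: two moduli that share a prime ---------------------------------
# Toy primes; real keys use primes on the order of 1024 bits or more.
shared_p, q1, q2 = 10007, 10009, 10037
n1, n2 = shared_p * q1, shared_p * q2

recovered_p = gcd(n1, n2)  # Euclid's algorithm, fast even for huge moduli
factors_1 = (recovered_p, n1 // recovered_p)
factors_2 = (recovered_p, n2 // recovered_p)


# --- ECDSA: the same nonce k used for two signatures --------------------
def sign_scalar(z: int, k: int, r: int, d: int, n: int) -> int:
    """s = k^-1 * (z + r*d) mod n; r is passed in instead of being derived from k*G."""
    return (pow(k, -1, n) * (z + r * d)) % n


def recover_private_key(z1: int, s1: int, z2: int, s2: int, r: int, n: int):
    """Recover the nonce k and private key d from two signatures that share r."""
    k = ((z1 - z2) * pow((s1 - s2) % n, -1, n)) % n
    d = ((s1 * k - z1) * pow(r, -1, n)) % n
    return k, d


order = 2**127 - 1          # stand-in prime group order (assumption)
private_key = 123456789012345678901234567890
nonce = 987654321098765432109876543210
r = 555555555555555555555   # opaque stand-in for the x-coordinate of nonce*G
z1, z2 = 1111, 2222         # hashes of two different messages

s1 = sign_scalar(z1, nonce, r, private_key, order)
s2 = sign_scalar(z2, nonce, r, private_key, order)
recovered_nonce, recovered_key = recover_private_key(z1, s1, z2, s2, r, order)
```

    In both cases the attacker needs only public data (two moduli, or two signatures and the message hashes), and the correct, "working" outputs give no hint that anything is wrong.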

  • I think the small print says:

    It's OK to implement a cryptographic algorithm as long as your code is bug free and avoids every pitfall on every platform (OS and architecture) where the code will run.

    For example, some implementations probably have extra code to prevent side channel attacks. It's not intrinsic to the algorithm, but it is required to make the implementation safe. This is probably one point of many.

  • It's extremely easy to get cryptography wrong if you implement it yourself and don't have an extremely solid understanding of it. Of the home grown implementations I've seen in my career, I can't think of a single one that did not have catastrophic weaknesses in it that were easily exploited leading to an outright break in most cases or at least a major weakening of the protection.

    Beyond that, even if you have the skill set and understanding to do your own implementation, the possibility of other weaknesses against the implementation itself is high for things like timing attacks or actual bugs in implementation that may leak information directly even if things work correctly in an ideal case. For these cases, it isn't that the implementers have a better understanding necessarily so much as a lot more people have used and tested the implementation and there are far more people looking to make sure it is secure.

    If you implement it yourself, you have a very small number of white hats looking at it and a potentially large number of black hats, so you are outnumbered by attackers. Using a large, widely used implementation balances the number of white and black hat hackers attacking it to be a more even mix.

  • I’d like to offer a slightly different perspective...

    It’s not that nobody should ever implement cryptography. After all, somebody has to do it. It’s just an extremely daunting task and you should ask whether you have the necessary experience and resources at your disposal.

    If you have strong expertise in relevant fields of mathematics and computer science, a strong team of peer reviewers, methodical and careful testing in all environments, keep up with relevant literature, understand the design and implementation pitfalls that are specific to crypto... then sure, go ahead and implement crypto.

    ..and if you think there's nothing already available that is suitable for your use and will be as good as what you will make.

    Well this escalated quickly. I realise this will not be popular but, if you think you know what you are doing and you have a good understanding of the language you're using and the way in which it's being used, then you may well be perfectly safe writing your own implementation of some crypto primitives.

    For example if you are checking hashes match the expected value before booting some firmware, implementing the hashing algorithm is likely to be fine. There is very little feedback to the attacker, if say an incorrect value gets no response at all.

    However, this rarely occurs in the wild, as there is already a SHA256 implementation in every language you can think of: why would you copy it out with some different variable names? What happens is that someone either decides to be clever, wants a different behaviour, or doesn't understand the nuance (like side channels etc).
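    For the firmware scenario above, using the existing implementation is typically a handful of lines. A minimal Python sketch, assuming the expected digest is shipped as a hex string; `firmware_matches` is a hypothetical helper name, while `hashlib.sha256` and `hmac.compare_digest` are standard library calls.

```python
import hashlib
import hmac


def firmware_matches(image: bytes, expected_sha256_hex: str) -> bool:
    """Hash the image with the standard library and compare against the expected digest."""
    actual = hashlib.sha256(image).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# The well-known SHA-256 test vector for the three-byte message "abc".
ABC_DIGEST = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

ok = firmware_matches(b"abc", ABC_DIGEST)
tampered = firmware_matches(b"abd", ABC_DIGEST)
```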

    The thinking among the community seems to be: fewer cowboys is better, so scare everyone off. This may well be right, but I think the advice is easier to follow without making any remarks (like never implement your own) which seem overzealous.

    That said, it's easy to forget that with most programming, you know exactly when you don't understand something, because it doesn't work as you expect. Crypto, on the other hand, always works as you expect in the 'gave the right answer' sense. Knowing what you don't know about crypto that others might know is a non-trivial task. This is a mindset people have difficulty with. Hence people asking "why can't I", not "what's wrong with my proof I can".

    Overall I am minded to think that maintaining the "if you have to ask: don't" stance is probably for the best. But that doesn't mean your thought process is wrong. Just that those giving out the advice can't really be sure it isn't.

    Thankfully SHA-256 is naturally side-channel resistant.

    The idea that "if nobody is allowed to do *X* then, *who* is allowed to do *X*?" is rather common in any question that ends up as "Don't roll your own". The idea of my answer is not that you should never ever implement crypto code. The idea is "If you implement your crypto code, you have to be aware of many more things than your average code" and the run-of-the-mill dev will probably fail at delivering good crypto code.

    @forest Is that sarcasm? Otherwise, what makes SHA-256 naturally side-channel resistant?

    @MechMK1 It isn't sarcasm. SHA-256 is naturally side-channel resistant because it uses additions, fixed-distance rotations, and XORs (that is, it's an _ARX_ function), which are all constant-time operations on most modern processors. Unlike, say, AES, it does not use any secret-dependent lookup tables. This means that even a naïve implementation will resist side channels, whereas implementing AES in constant time requires an understanding of low-level CPU architecture.

    The same is actually also true with MD4, MD5, SHA-1, the rest of the SHA-2 family, and RIPEMD160, which are all very closely related ARX hashes (their designs all stem from MD4 and improve on it in various ways). There are also some _ciphers_ that are naturally side-channel resistant, like ChaCha20. Pretty much any pure ARX function will be easy to implement without worrying about constant time behavior.
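    As an illustration of what "pure ARX" looks like, here is the ChaCha20 quarter round sketched in Python and checked against the quarter-round test vector from RFC 8439. It uses only 32-bit addition, rotation by fixed amounts, and XOR, with no table lookups or branches on the data. Python is used here only to show the structure; the interpreter itself makes no constant-time guarantees, so this is not meant as a hardened implementation.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit word left by a fixed, non-secret amount."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha20 quarter round: only add, rotate, and XOR on 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Quarter-round test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
```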

  • Very simply, older, more widely used encryption software has been subjected to more testing (friendly and unfriendly) and more analysis.

    Moreover, the history of encryption is littered with broken software, some of which was authored by eminent expert cryptographers.

    So the codes that you can have the most confidence in are very often the codes that have been around for a while.

  • Another reason that goes hand in hand with many of the other answers, from the fact that doing encryption well is hard:

    Getting encryption right is expensive.

    Getting encryption wrong is expensive.

    Because properly implementing encryption is such a difficult thing to do that requires oodles of work, from a business perspective it's unlikely to make sense that you implement your own cryptography.

    And then you have the risks of getting it wrong, which could include all kinds of negative consequences, including fines, bad publicity, and worse depending on what you're supposed to be encrypting.

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM