Why are programs written in C and C++ so frequently vulnerable to overflow attacks?

  • When I look at the exploits from the past few years that target implementation bugs, I see that quite a lot of them are in C or C++ code, and a lot of them are overflow attacks.

    • Heartbleed was a buffer over-read in OpenSSL;
    • Recently, a bug in glibc was found that allowed buffer overflows during DNS resolution;

    Those are just the ones I can think of right now, but I doubt that these were the only ones that A) are for software written in C or C++ and B) are based on a buffer overflow.

    Especially concerning the glibc bug, I read a comment stating that if this had happened in JavaScript instead of in C, there wouldn't have been an issue. Even if the code had just been compiled to JavaScript, it wouldn't have been an issue.

    Why are C and C++ so vulnerable to overflow attacks?

    With great power comes great responsibility

    This answer and this answer might be interesting reads. It basically comes down to the design of the language and the level at which it operates.

    @RoraΖ there are tools for compiling C to JavaScript, though, like Emscripten. http://dankaminsky.com/2016/02/20/skeleton/, near the bottom, is what I am referring to.

    Your question is kind of like "Why do only Windows computers get Windows viruses?". Because Windows viruses are only possible on Windows computers. C and C++ get buffer overflow vulnerabilities from their ability to do unchecked pointer arithmetic. Most other languages don't have this capability, and thus can't have buffer overflows. Your question also doesn't consider the popularity of these languages. (Perhaps other languages are MORE problematic, but aren't used as much, and thus have fewer total vulnerabilities.)
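
    To make "unchecked pointer arithmetic" concrete, here is a minimal C++ sketch; the function `sum_range` is hypothetical and exists only for illustration. Nothing in the language records how large the buffer behind `p` is, so the loop trusts whatever `end` the caller passes, and a wrong `end` still compiles.

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Hypothetical helper: sums the bytes in [p, end). The loop trusts `end`
    // completely; no size information travels with the pointer, so an `end`
    // beyond the real buffer would compile and read out of bounds (undefined
    // behavior).
    unsigned sum_range(const unsigned char* p, const unsigned char* end) {
        unsigned total = 0;
        for (; p != end; ++p) {   // pointer arithmetic, no bounds check
            total += *p;
        }
        return total;
    }

    int main() {
        unsigned char buf[4] = {1, 2, 3, 4};
        unsigned ok = sum_range(buf, buf + 4);  // end = one past the last element
        std::printf("sum = %u\n", ok);          // prints "sum = 10"
        assert(ok == 10);
        // sum_range(buf, buf + 100) would compile just as happily; it is
        // deliberately not executed because it reads past the array.
        return 0;
    }
    ```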


    In C++, one of the reasons for buffer overflows is failing to use modern C++ and ignoring safer concepts like the STL. If you use C++ like C, you'll get what you deserve.
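
    A minimal sketch of that contrast, using a hypothetical `greet` function: the C-style version in the comment writes into a fixed 16-byte buffer with `strcpy`/`strcat` and overflows for long names, while the `std::string` version tracks its own length and grows as needed, so there is no fixed-size buffer to overrun.

    ```cpp
    #include <cassert>
    #include <string>

    // C-style version (shown for contrast, not compiled):
    //     char buf[16];
    //     std::strcpy(buf, "Hello, ");
    //     std::strcat(buf, name);   // no length check: overflows for names > 8 chars
    //
    // Hypothetical std::string version: the string manages its own storage.
    std::string greet(const std::string& name) {
        return "Hello, " + name;
    }

    int main() {
        std::string long_name(120, 'x');   // far longer than 16 bytes
        std::string g = greet(long_name);
        assert(g.size() == 7 + 120);       // "Hello, " is 7 characters
        return 0;
    }
    ```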

    There was a brilliant comment before that seems to have been deleted: "It's because a scalpel cuts more than a safety scissors"

    C and C++ are also the most likely to be used for software that is exposed to the most risk and attacks.

    For me, Heartbleed was the result of two bad programming practices: 1. use of the `goto` statement, and 2. the lack of a coding standard such as "forbid `if` statements without braces". With those rules in place, it would not have happened. A lack of unit testing is probably another reason as well... I agree that C and C++ are more prone to this kind of attack, as they are quite low-level languages and it is often the developer's responsibility to prevent misuse.
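
    A minimal C++ sketch of the hazard that a "no `if` without braces" rule guards against; both functions are hypothetical length checks, not code from any real library. Without braces, a stray extra line indented under the `if` is not actually guarded by it and always runs; with braces, the same stray line is trapped inside the block.

    ```cpp
    #include <cassert>

    // Without braces only the first statement after the `if` is conditional.
    // The stray `return 0;` always runs when len >= 0, so the upper-bound
    // check is never reached.
    int validate_unbraced(int len, int max_len) {
        if (len < 0)
            return -1;
            return 0;            // misleading indentation: not guarded by the if
        if (len > max_len)       // unreachable
            return -1;
        return 0;
    }

    // Same stray line, but inside braces it follows `return -1;` and is
    // unreachable and harmless.
    int validate_braced(int len, int max_len) {
        if (len < 0) {
            return -1;
            return 0;            // unreachable, harmless
        }
        if (len > max_len) {
            return -1;
        }
        return 0;
    }

    int main() {
        // A 120-byte request checked against an 85-byte limit:
        assert(validate_unbraced(120, 85) == 0);   // wrongly accepted
        assert(validate_braced(120, 85) == -1);    // correctly rejected
        assert(validate_braced(10, 85) == 0);
        return 0;
    }
    ```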

  • C and C++, unlike most other languages, traditionally do not check for overflows. If the source code says to put 120 bytes in an 85-byte buffer, the CPU will happily do so. This is related to the fact that while C and C++ have a notion of array, this notion is compile-time only. At execution time, there are only pointers, so there is no runtime method to check an array access against the conceptual length of that array.
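
    A minimal C++ sketch of the "compile-time only" point: the same 85-byte array reports its full size where it is declared, but once it is passed to a function it decays to a plain pointer and the length is gone. `size_seen_by_callee` is a hypothetical name used only for illustration, and the pointer size it returns depends on the platform.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    // Inside this function `buf` is only a pointer: the array length the
    // caller knew at compile time is not passed along.
    std::size_t size_seen_by_callee(char* buf) {
        return sizeof buf;   // size of a pointer, not of the caller's array
    }

    int main() {
        char buf[85];
        std::size_t caller = sizeof buf;                // 85, known at compile time
        std::size_t callee = size_seen_by_callee(buf);  // array decays to char*
        std::printf("caller: %zu, callee: %zu\n", caller, callee);
        assert(caller == 85);
        assert(callee == sizeof(char*));
        return 0;
    }
    ```

    This is why C APIs such as `memcpy` take the length as a separate argument, and nothing checks that the length passed matches the real buffer.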

    By contrast, most other languages have a notion of array that survives at runtime, so that all array accesses can be systematically checked by the runtime system. This does not eliminate overflows: if the source code asks for something nonsensical, such as writing 120 bytes into an array of length 85, it still makes no sense. However, this automatically triggers an internal error condition (often an "exception", e.g. an ArrayIndexOutOfBoundsException in Java) that interrupts normal execution and does not let the code proceed. This disrupts execution, and often implies a cessation of the complete processing (the thread dies), but it normally prevents exploitation beyond a simple denial-of-service.
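
    C++ offers this runtime-checked behavior as an opt-in: `std::vector::at()` checks the index and throws `std::out_of_range`, much like the Java exception above. The sketch below (the helper `fill_checked` is hypothetical) tries to write 120 bytes into an 85-element vector; the writes stop at index 85 instead of running past the end.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    // Hypothetical helper: writes `n` copies of `v` starting at index 0 using
    // the bounds-checked at() instead of operator[]. Returns false if a write
    // was rejected.
    bool fill_checked(std::vector<unsigned char>& buf, std::size_t n, unsigned char v) {
        try {
            for (std::size_t i = 0; i < n; ++i) {
                buf.at(i) = v;   // throws std::out_of_range once i >= buf.size()
            }
        } catch (const std::out_of_range&) {
            return false;        // execution stops here; nothing past the end was written
        }
        return true;
    }

    int main() {
        std::vector<unsigned char> buf(85, 0);
        bool fits = fill_checked(buf, 85, 0xAA);       // 85 writes into 85 slots
        bool too_long = fill_checked(buf, 120, 0xBB);  // stopped at index 85
        assert(fits);
        assert(!too_long);
        assert(buf.size() == 85);                      // the vector did not grow
        return 0;
    }
    ```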

    Basically, buffer overflow exploits require the code to perform the overflow (reading or writing past the boundaries of the accessed buffer) and then to keep on doing things beyond that overflow. Unlike C and C++ (and a few others such as Forth or assembly), most modern languages don't allow the overflow to actually occur and instead shoot the offender. From a security point of view this is much better.
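
    Reading past a buffer's end because a length field inside the input was trusted is the pattern behind Heartbleed-style over-reads. The minimal C++ sketch below shows the check that stops it; the record format and `echo_payload` are hypothetical simplifications, not any real protocol's code.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical record format: a 2-byte big-endian payload length followed
    // by the payload. Returns the payload to echo back, or an empty vector if
    // the record is malformed.
    std::vector<std::uint8_t> echo_payload(const std::vector<std::uint8_t>& record) {
        if (record.size() < 2) {
            return {};
        }
        std::size_t claimed = (static_cast<std::size_t>(record[0]) << 8) | record[1];
        // This comparison is the whole defence: without it, the copy below would
        // read past the end of `record` (undefined behavior, and in practice a
        // leak of whatever memory happens to follow the buffer).
        if (claimed > record.size() - 2) {
            return {};
        }
        auto first = record.begin() + 2;
        return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(claimed));
    }

    int main() {
        std::vector<std::uint8_t> honest = {0x00, 0x03, 'a', 'b', 'c'};
        std::vector<std::uint8_t> lying = {0x40, 0x00, 'a', 'b', 'c'};  // claims 16384 bytes
        assert(echo_payload(honest).size() == 3);
        assert(echo_payload(lying).empty());
        return 0;
    }
    ```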

    *"From a security point of view this is much better."* While this is certainly true, it also makes some types of programming -- particularly, operating system programming -- significantly more difficult. Remember that C's heritage traces back to being a programming language designed to implement Unix in a portable manner; for good reason, C is sometimes referred to as **portable assembler**.


    @MichaelKjörling True, but then again, there are plenty of Microsoft Research OSes that build on entirely managed (and in this one way, entirely secure) code, including static verification. Microsoft spends a lot of money on fixing this problem systematically, rather than waiting for people to wise up. As always :D Performance is always tricky, but then again, you get far more opportunities for optimization with managed and reflectable code than you ever get with assembly; for a lot of server software, they even managed to get a sizeable performance increase thanks to that.

    @Luaan I doubt the major difference was moving from "assembly" to "managed code". If anything, it seems more likely to have been due to moving from non-JITed to JITed code. With compilation time optimizations, you have to pick a lowest baseline that you are willing to support. With JITed code, you can optimize for the specific machine you are running on. In principle, you probably could JIT code written in C; I'm not sure if anyone has tried that, though...

    @MichaelKjörling Actually, many of them aren't JIT compiled. They are still compiled for the specific hardware configuration, though. But to get effective JIT compilation, you need a lot of extra information and a lot of limits; C is simply way too freeform to allow many meaningful optimizations even from source code, much less from the compiled code. Even something as simple as those bounds checks: there's no way for a C compiler to do bounds checking for you, since, as far as the compiler knows, you're just manipulating some random pointers. The same goes for safely omitting them.
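
    A minimal sketch of what the compiler would need, assuming we are willing to carry the length around with the pointer: the hypothetical `CheckedView` type below bundles both (C++20's `std::span` bundles a pointer and a length in a similar way). With the length available, `at()` can check each access, and a loop from `begin()` to `end()` needs no per-element check because it cannot leave the range.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <stdexcept>

    // Hypothetical "fat pointer": a raw pointer plus the length it is valid for.
    // Carrying the length is what makes a runtime check possible at all.
    template <typename T>
    struct CheckedView {
        T* data;
        std::size_t len;

        T& at(std::size_t i) const {
            if (i >= len) {
                throw std::out_of_range("CheckedView index out of range");
            }
            return data[i];
        }

        // Iterating from data to data + len stays in bounds by construction,
        // so a checker can safely omit per-element checks here.
        T* begin() const { return data; }
        T* end() const { return data + len; }
    };

    int main() {
        int values[4] = {1, 2, 3, 4};
        CheckedView<int> view{values, 4};

        int sum = 0;
        for (int v : view) {   // unchecked, but provably in bounds
            sum += v;
        }
        assert(sum == 10);

        bool caught = false;
        try {
            view.at(4);        // one past the end: rejected
        } catch (const std::out_of_range&) {
            caught = true;
        }
        assert(caught);
        return 0;
    }
    ```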

Licensed under CC BY-SA with attribution


Content dated before 7/24/2021 11:53 AM