Build a Compiler Bomb

  • Introduction



    You're probably familiar with zip bombs, XML bombs, etc. Put simply, they are (relatively) small files which produce enormous output when interpreted by naïve software. The challenge here is to abuse a compiler in the same way.



    Challenge



    Write some source code which occupies 512 bytes or less and which compiles into a file which occupies the most possible space. Largest output file wins!



    Rules



    OK, so there are a few important clarifications, definitions and restrictions;




    • The output of the compilation must be an ELF file, a Windows Portable Executable (.exe), or virtual bytecode for the JVM or .Net's CLR (other types of virtual bytecode are also likely to be OK if asked for). Update: Python's .pyc / .pyo output also counts.

    • If your language-of-choice can't be compiled directly into one of those formats, transpilation followed by compilation is also allowed (Update: you can transpile multiple times, just so long as you never use the same language more than once).

    • Your source code can consist of multiple files, and even resource files, but the summed size of all these files must not exceed 512 bytes.

    • You cannot use any other input than your source file(s) and the standard library of your language-of-choice. Static linking standard libraries is OK when it's supported. Specifically, no third party libraries or OS libraries.

    • It must be possible to invoke your compilation using a command or series of commands. If you require specific flags when compiling, these count towards your byte limit (e.g. if your compile line is gcc bomb.c -o bomb -O3 -lm, the -O3 -lm part (7 bytes) will be counted (note the initial leading space isn't counted).

    • Preprocessors are permitted only if they are a standard compilation option for your language.

    • The environment is up to you, but in the interests of making this verifiable, please stick to recent (i.e. available) compiler versions and operating systems (and obviously specify which you're using).

    • It must compile without errors (warnings are OK), and crashing the compiler doesn't count for anything.

    • What your program actually does is irrelevant, though it can't be anything malicious. It doesn't even have to be able to start.



    Example 1



    The C program



    main(){return 1;}


    Compiled with Apple LLVM version 7.0.2 (clang-700.1.81) on OS X 10.11 (64-bit):



    clang bomb.c -o bomb -pg


    Produces a file of 9228 bytes. The total source size is 17+3 (for the -pg) = 20 bytes, which is easily within size limit.



    Example 2



    The Brainfuck program:



    ++++++[->++++++++++++<]>.----[--<+++>]<-.+++++++..+++.[--->+<]>-----.--
    -[-<+++>]<.---[--->++++<]>-.+++.------.--------.-[---<+>]<.[--->+<]>-.


    Transpiled with awib to c with:



    ./awib < bomb.bf > bomb.c


    Then compiled with Apple LLVM version 7.0.2 (clang-700.1.81) on OS X 10.11 (64-bit):



    clang bomb.c


    Produces a file of 8464 bytes. The total input here is 143 bytes (since @lang_c is the default for awib it didn't need to be added to the source file, and there are no special flags on either command).



    Also note that in this case, the temporary bomb.c file is 802 bytes, but this counts towards neither the source size nor the output size.



    Final Note



    If an output of more than 4GB is achieved (perhaps if somebody finds a turing complete preprocessor), the competition will be for the smallest source which produces a file of at least that size (it's just not practical to test submissions which get too big).


    If using a transpiler, does the output source code need to be under 512 bytes as well as the input source code?

    @trichoplax only the *original* source needs to be within the byte limit. The source after transpilation can be as big as you like (the Brainfuck example has an intermediate size of 802 bytes, for example)

    Is repeated transpilation allowed?

    @orip interesting; I think I'll say: only if you never have to the same language more than once. So for example, Befunge -> Java -> C would be OK, but Fortran -> Java -> Fortran wouldn't.

    So interpreted languages, unless transpiled, cannot be used for this challenge, correct?

    @LegionMammal978 yes it has to produce one of the file types I specified. But if you think you've found something which is more virtual-machine than interpreted-language, ask about it specifically and it's possible I'll allow it (it's a bit subjective so I wanted to be very restrictive to begin, with the option of opening it up)

    Python is an example of a language regarded as interpreted that compiles to bytecode.

    @trichoplax speak of the devil. I was literally about to ask Dave if python compiles to one of his specified formats. Thanks!

    @trichoplax I wasn't aware of that, but from some reading it looks like yes; compiling to Python bytecode absolutely counts. So for python, the output size would be the sum total size of all your pyc/pyo files. I'll update the question soon with these comment-based updates.

    OK, all updated. I'm off for now (it's late in England!). I'll be back to answer any other questions tomorrow. Good luck!

    A "turing-complete preprocessor" is generally not a preprocessor (step that happens before parsing) at all, but a *metaprogramming system* that gets executed at some point between parsing and codegen. There are several languages that have one, or have had one grafted on in some way.

    How about modifying the source code of a compiler so that anything it compiles is really big? And then use it to compile itself....

    Don't forget fork bombs, which does something similar to your CPU capacity.

    @WGroleau generally in code challenges you can't use anything which was specially-built for the purpose. In fact, using anything developed since the question was posted is generally not allowed.

    When you say 4GB, do you mean 4,294,967,296 bytes (4 * 2^30), or 4,000,000,000 bytes (4 * 10^9)?

    @PatrickRoberts well 4GB officially means 4,000,000,000 bytes (it's GiB for 1024^3), but I can't imagine it would make a lot of difference to any answer. The rule is only there to keep things vaguely sane!

    @Dave reason I ask is because of the footnote about scoring. If you're shooting for as close to 4GB as possible, then it actually matters which of the two it is.

    "perhaps if somebody finds a turing complete preprocessor" Common Lisp's macros have the full language available. It'd be easy to add a literal 4GB array to the source.

    Even the humble C preprocessor is close enough to Turing complete as makes no difference. Although to be honest that says more about the utter uselessness of Turing-completeness as a real-world measurement of power than anything else (for practical engineering TC is rarely useful, and rarely even used; has more negative consequences than positive).

    Are only "commercially available" compilers allowed or would it be allowed to write an own compiler?

    @MartinRosenau - WGroleau already asked a similar question; it's standard in coding challenges that you can use anything which *already existed* when the challenge began.

    Why the 4GB limit? before I saw that rule, I was imagining seeing code one-liners that would create enough material to fill a data-center, which would be awesome.

    @GiantCowFilms: You can still find many such solutions here, ie. ones which can be freely scaled by changing the exponent. (Eg. change the `4**7` in my original solution to `2**32`, and it would theoretically generate 8 exabytes of output) As the rule mentions, it exists because no-one would be able to test solutions which fill a data-center or use a terabyte of RAM, so they would be theoretical at best. (I, for one, had problems testing my own solution, with a machine that has 16GB of RAM, even though the output is only 8.5GB, since it uses significantly more RAM than that :P)

    @Dave can it be a batch file being compiled with http://www.f2ko.de/en/b2e.php ?

  • C, (14 + 15) = 29 byte source, 17,179,875,837 (16 GB) byte executable


    Thanks to @viraptor for 6 bytes off.


    Thanks to @hvd for 2 bytes off and executable size x4.


    This defines the main function as a large array and initialises its first element. This causes GCC to store the entire array in the resulting executable.


    Because this array is bigger than 2GB, we need to provide the -mcmodel=medium flag to GCC. The extra 15 bytes are included in the score, as per the rules.



    main[-1u]={1};

    Don't expect this code to do anything nice when run.


    Compile with:


    gcc -mcmodel=medium cbomb.c -o cbomb



    It took me a while to get round to testing @hvd's suggestion - and to find a machine with enough juice to handle it. Eventually I found a old non-production RedHat 5.6 VM with 10GB RAM, 12GB swap, and /tmp set to a large local partition. GCC version is 4.1.2. Total compile time about 27 minutes.



    Due to the CPU and RAM load, I recommend against doing this compile on any remotely production-related machine.



    Nice. Looks like it's off to a higher start than I'd expected!

    wow, 4GB already. Incidentally, I've been avoiding counting the leading space for extra arguments (to avoid ambiguity with combined flags, etc.) so you can claim +15 bytes rather than +16

    does something break if you use the large mcmodel? saves a byte :)

    @Sparr strangely it does for me. `crtstuff.c:(.text+0x1): relocation truncated to fit: R_X86_64_32 against symbol '__TMC_END__' defined in .data section in cbomb`. Beats me.

    also, I'm pretty sure {1} initializes *all* elements to 1, not just the first.

    Unfortunately with your latest code and GCC4.9 I get this error: `/var/folders/[...].s:6:Repeat < 0, .space ignored`. It looks like it's a mac issue though, so I'll test it again later when I'm on Kubuntu. (your earlier, ~1GB version is verified)

    @Dave I'm testing on Ubuntu 14.04.3 with GCC 4.8.4. Are you sure the `gcc` on your OSX is actually GCC?. The default was switched to LLVM a few releases back.

    I'm playing against my solution here, but... you don't need `a`. You can just use `main[1<<30]={1};`

    @DigitalTrauma yeah I'm using an explicit gcc49 install. Though the default clang (masquerading as gcc) can't handle it either. I'll test it on linux tomorrow, where I expect it will be fine.

    @DigitalTrauma it actually puts the `1,0,0,0,0,0....` as the contents of the main function. That is - the first instruction of main() is `00`, which should fail with invalid instruction exception.

    Why not use `main[1<<31]={1};`?

    @wizzwizz4 2 reasons. 1) 4GB executable is big enough as per the question. 2) `1<<31` > MAX_INT, and thus the compiler sees it as a -ve value.

    @DigitalTrauma But you could use `main[(unsigned int)1<<`... oh. :-/ Thanks for explaining!

    Oh my. This is evil. X froze for several minutes trying to compile that code. I was starting to look for another computer to possibly ssh back in and kill the gcc process before it finally came back to life. Btw. If you want a larger value than `1<<30` then `7<<28` could be an option.

    Took a few minutes, but yup - latest code is confirmed on Kubuntu.

    Would mcmodel=large also work?

    >4gb? That escalated quickly

    Can you explain HOW this behaving of the compiler occurs? I would be interested in it.

    Hmm... I wonder if there are any compilers that would let you get away with switching out the `1<

    On my computer it doesn't even compile, because I have `/tmp` in the RAM.

    @ParthianShot You can use `-1u` to save one more byte. GCC seems to be okay with it, but I don't have enough free disk space to tell for certain.

    @the_Seppi You can use the `TMPDIR` environment variable to have it put temporary files elsewhere.

  • Python 3, 13 byte source, 9,057,900,463 byte (8.5GiB) .pyc-file



    (1<<19**8,)*2

    Edit: Changed the code to the version above after I realized the rules say output size beyond 4GiB doesn't matter, and the code for this one is ever so slightly shorter; The previous code - and more importantly the explanation - can be found below.




    Python 3, 16 byte source, >32TB .pyc-file (if you have enough memory, disk space and patience)



    (1<<19**8,)*4**7

    Explanation: Python 3 does constant folding, and you get big numbers fast with exponentation. The format used by .pyc files stores the length of the integer representation using 4 bytes, though, and in reality the limit seems to be more like 2**31, so using just exponentation to generate one big number, the limit seems to be generating a 2GB .pyc file from an 8 byte source. (19**8 is a bit shy of 8*2**31, so 1<<19**8 has a binary representation just under 2GB; the multiplication by eight is because we want bytes, not bits)


    However, tuples are also immutable and multiplying a tuple is also constant folded, so we can duplicate that 2GB blob as many times as we want, up to at least 2**31 times, probably. The 4**7 to get to 32TB was chosen just because it was the first exponent I could find that beat the previous 16TB answer.


    Unfortunately, with the memory I have on my own computer, I could test this only up to a multiplier of 2, ie. (1<<19**8,)*2, which generated a 8.5GB file, which I hope demonstrates that the answer is realistic (ie. the file size isn't limited to 2**32=4GB).


    Also, I have no idea why the file size I got when testing was 8.5GB instead of the 4GB-ish I expected, and the file is big enough that I don't feel like poking around it at the moment.


    +1, but why don't `(1<<19**8,)*2`? 4GB is enough.

    @ChristianIrwan: Yeah, I'd forgotten that rule, only realized it a few minutes ago and haven't figured out what kind of edit I should make yet. :-)

    Nice. Since this is only 13 bytes, we finally have a challenger to the first-posted answer! I was only able to confirm `1<<18` on my machine (1.5GB) but I'll test it on linux later, where I expect it will work with the full 8GB (not going to try the 32TB version!)

    @Dave: The exact size might depend on the version (1.5GB sounds weird no matter what, though); I was using Python 3.3.5, and used `python -m py_compile asd.py` to generate the .pyc-file.

    @AleksiTorhamo it's small because I had to change it to `1<<18**8` instead of `1<<19**8`. The full version failed with an invalid IO operation (so I'll check it on another OS & filesystem)

    @Dave: Ahhh, I totally misunderstood what you meant. :P

    IIRC, python uses 30 bits per 32-bit word in its integer representation

    Honest question : If I run this in my terminal what will happen?

    Just a 2021 update: constant folding has been fixed in current versions of Python3. I managed to reproduce the compiler bomb with python3.5 from deadsnakes PPA.

    @AnshumanKumar: In Python interactive shell? If you put it into a variable (so eg. `v = (1<<19**8,)*4**7`), and you have enough free memory (a bit over 2GB), nothing much - it'll take a second or two to complete, and the process will use that much more memory. If you *don't* put it into a variable, Python will try to display the number, which means it needs to convert it to decimal first - which will take some time (and more memory) with a number that big. How long? Depends on the computer, but on the order of a year. But after that, it should finally print the multi-gigabyte output for you!

  • C#, about 1 min to compile, 28MB output binary:



    class X<A,B,C,D,E>{class Y:X<Y,Y,Y,Y,Y>{Y.Y.Y.Y.Y.Y.Y.Y.Y y;}}


    Adding more Y's will increase the size exponentially.



    An explanation by Pharap as per @Odomontois' request:



    This answer is abusing inheritance and type parameters to create recursion. To understand what's happening, it's easier to first simplify the problem. Consider class X<A> { class Y : X<Y> { Y y; } }, which generates the generic class X<A>, which has an inner class Y. X<A>.Y inherits X<Y>, hence X<A>.Y also has an inner class Y, which is then X<A>.Y.Y. This then also has an inner class Y, and that inner class Y has an inner class Y etc. This means that you can use scope resolution (.) ad infinitum, and every time you use it, the compiler has to deduce another level of inheritance and type parameterisation.



    By adding additional type parameters, the work the compiler has to do at each stage is further increased.



    Consider the following cases:

    In class X<A> { class Y : X<Y> { Y y;} } type param A has a type of X<A>.Y.

    In class X<A> { class Y : X<Y> { Y.Y y;} } type param A has a type of X<X<A>.Y>.Y.

    In class X<A> { class Y : X<Y> { Y.Y.Y y;} } type param A has a type of X<X<X<A>.Y>.Y>.Y.

    In class X<A,B> { class Y : X<Y,Y> { Y y;} } type param A is X<A,B>.Y and B is X<A,B>.Y.

    In class X<A> { class Y : X<Y> { Y.Y y;} } type param A is X<X<A,B>.Y, X<A,B>.Y>.Y and B is X<X<A,B>.Y, X<A,B>.Y>.Y.

    In class X<A> { class Y : X<Y> { Y.Y.Y y;} } type param A is X<X<X<A,B>.Y, X<A,B>.Y>.Y, X<X<A,B>.Y, X<A,B>.Y>.Y>.Y and B is X<X<X<A,B>.Y, X<A,B>.Y>.Y, X<X<A,B>.Y, X<A,B>.Y>.Y>.Y.



    Following this pattern, one can only imagine1 the work the compiler would have to do to to deduce what A to E are in Y.Y.Y.Y.Y.Y.Y.Y.Y in the definition class X<A,B,C,D,E>{class Y:X<Y,Y,Y,Y,Y>{Y.Y.Y.Y.Y.Y.Y.Y.Y y;}}.



    1 You could figure it out, but you'd need a lot of patience, and intellisense won't help you out here.


    This is more like the sort of insanity I was expecting! Looks like I'm off to reinstall Mono…

    Can you provide an explanation of such notorious effect?

    +1 for doing more than just initializing a large array.

    Here's an example using Try Roslyn and just 3 `Y`s.

    Verified with `mcs bomb.cs`, but I had to add `class M{static void Main(){}}` to the source (still easily within the byte limit). If you have a way to compile it without a Main method please post the command! Also, 1 more `.Y` produces ~146MB, then ~732MB, then… I'll get back to you on the next one!

    @Dave Try `mcs /target:library bomb.cs`

    I saw this question and immediately thought of you. Nice!

    There was a stackoverflow question about this - I remember reading it, but can't find it now. If you have a link, I will greatly appreciate it if you share.

    @StigHemmer are classes not just large arrays? ;)

    @PremierBromanov Is everything not just a large array, then?

    @immibis exactly

    Vladimir Reshetnikov! Good to see you here (and MSE)

    @Dave You could have put `static void Main(){}` in `X`'s definition. The compiler only cares that there is a suitable entry point available, it's not worried about which class it's in (though it still has to be told which suitable entry point is the one you're using).

    I'm going to see how far I can take this with 16GB RAM.

    Nevermind. Crashed trying with 11 `Y`s :D


  • If an output of more than 4GB is achieved (perhaps if somebody finds a turing complete preprocessor), the competition will be for the smallest source which produces a file of at least that size (it's just not practical to test submissions which get too big).




    "Template Haskell" allows Haskell code to be generated at compile-time using Haskell, and is hence a turing complete pre-processor.



    Here's my attempt, parameterised by an arbitrary numerical expression FOO:



    import Language.Haskell.TH;main=print $(ListE .replicate FOO<$>[|0|])


    The magic is the code inside the "splice" $(...). This will be executed at compile time, to generate a Haskell AST, which is grafted on to the program's AST in place of the splice.



    In this case, we make a simple AST representing the literal 0, we replicate this FOO times to make a list, then we use ListE from the Language.Haskell.TH module to turn this list of ASTs into one big AST, representing the literal [0, 0, 0, 0, 0, ...].



    The resulting program is equivalent to main = print [0, 0, 0, ...] with FOO repetitions of 0.



    To compile to ELF:



    $ ghc -XTemplateHaskell big.hs
    [1 of 1] Compiling Main ( big.hs, big.o )
    Linking big ...
    $ file big
    big: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /nix/store/mibabdfiaznqaxqiy4bqhj3m9gaj45km-glibc-2.21/lib/ld-linux.so.2, for GNU/Linux 2.6.32, not stripped


    This weighs in at 83 bytes (66 for the Haskell code and 17 for the -XTemplateHaskell argument), plus the length of FOO.



    We can avoid the compiler argument and just compile with ghc, but we have to put {-# LANGUAGE TemplateHaskell#-} at the beginning, which bumps the code up to 97 bytes.



    Here are a few example expressions for FOO, and the size of the resulting binary:



    FOO         FOO size    Total size    Binary size
    -------------------------------------------------
    (2^10) 6B 89B 1.1MB
    (2^15) 6B 89B 3.6MB
    (2^17) 6B 89B 12MB
    (2^18) 6B 89B 23MB
    (2^19) 6B 89B 44MB


    I ran out of RAM compiling with (2^20).



    We can also make an infinite list, using repeat instead of replicate FOO, but that prevents the compiler from halting ;)


    Welcome to Programming Puzzles and Code Golf. This is a **brilliant** answer, especially for a new user to this site. If you need any help (which I doubt), feel free to ask.

    @wizzwizz4: Yeah, it is a brilliant answer. It's essentially the same as mine, except that in Haskell it requires a special compiler directive to make the metaprogramming work. ;)

    When I compile with GHC 7.8.3 I get "Not in scope: ‘<$>’" (I set the code to `[...].replicate (2^10)<$>[|0|])`). I'm not experienced with Haskell; any hints on how to make this compile?

    Too bad template haskell isn't lazy enough to stream out an infinite executable.

    Hi @Dave the `<$>` function is widely used in Haskell, but was only moved to the "prelude" (the set of functions available by default) in GHC 7.10. For earlier versions you'll need to add `import Control.Applicative;` after the exising `import` statement. I just tried with GHC 7.8.4 and it works.

    I came here to post something exactly like this. Well done.

    Cool; confirmed :)

    It should be easy to make the growth here exponential here. Ben Lippmeier used to show an example on IRC which looked like: main = print x where x1 = (); x2=(x1,x1), x3=(x2,x2)... x=(xn,xn), which should be almost trivial to implement using TH. Then the file size would be proportional to 2^N (where N is FOO but optimised for code size :P) [For food measure, use !x with BangPatterns turned on, to ensure it is actually computer, if the binary doesn't end up being that big]

    @Axman6 Interesting, but remember that `FOO` is an arbitrary Haskell expression of type `Int`. The linear relationship to the *value* of `FOO` is incidental: imagine if we used `2^BAR` instead of `FOO`: that matches your proposed exponential solution without having to change the TH. We could also beat it using factorial, or Ackermann, etc.

    @Axman6 A program of length `L` can't exceed binaries of size `BusyBeaver(L)`. Any "boilerplate" bytes (those which aren't part of our numerical expression) don't count towards the argument of `BusyBeaver`, so for `B` bytes of boilerplate we're limited to `BusyBeaver(L-B)`. Therefore the overwhelmingly best strategy is to minimise boilerplate (in our case, TH): no matter how smart it looks, BusyBeaver is smarter than us, and will make better use of those bytes ( https://news.ycombinator.com/item?id=9060014 ).

    @wizzwizz4 don't be a patronizing asshole

    @user234461 Ok, I won't be... I don't know where that came from, though; I was rubbish at golfing when I first came to this site and am still not as good as this user, so I used bold face to emphasise the compliment. https://www.xkcd.com/677/

  • C++, 250 + 26 = 276 bytes





    template<int A,int B>struct a{static const int n;};
    template<int A,int B>const int a<A,B>::n=a<A-1,a<A,B-1>::n>::n;
    template<int A>struct a<A,0>{static const int n=a<A-1,1>::n;};
    template<int B>struct a<0,B>{static const int n=B+1;};
    int h=a<4,2>::n;


    This is the Ackermann function implemented in templates. I'm not able to compile with h=a<4,2>::n; on my little (6GB) machine, but I did manage h=a<3,14> for a 26M output file. You can tune the constants to hit your platform's limits - see the linked Wikipedia article for guidance.



    Requires -g flag to GCC (because it's all the debug symbols that actually consume any space), and a larger-than-default template depth. My compile line ended up as



    g++ -ftemplate-depth=999999 -g -c -o 69189.o 69189.cpp


    Platform information



    g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
    Linux 3.13.0-46-generic #79-Ubuntu SMP x86_64 GNU/Linux

    I really like this one, but I'm not sure I can accept a .o output, since I did say ELF/.exe/etc. (and compiling this fully optimises it all out!). Still, +1 (and confirmed)

    Update: As Ben Voigt points out on his answer, GCC on Linux *does* generate ELF files as .o output, and I've been able to confirm the <3,14> variant with it, so yup - this is valid.

    I was expecting something absurd to come out of C++ templates. I *wasn't* expecting the Ackermann function.

    won't Fibonacci give you a smaller code and better output size control?

    But we want bigger code! Fibonacci gives almost the same size as pure linear code (but longer compile time that the linear). You could certainly have fun with a static array of size `A+B` in each class, now I think of it...

    Unless you are really careful, the limitation here will be template depth based. Which means that the byte count will grow with the template depth parameter more than anything; only exponential. Seems sad to use Ackermann to get only exponential growth.

  • Here's my C answer from 2005. Would produce a 16TB binary if you had 16TB RAM (you don't).



    struct indblock{
    uint32_t blocks[4096];
    };

    struct dindblock {
    struct indblock blocks[4096];
    };

    struct tindblock {
    struct dindblock blocks[4096];
    };

    struct inode {
    char data[52]; /* not bothering to retype the details */
    struct indblock ind;
    struct dindblock dint;
    struct tindblock tind;
    };

    struct inode bbtinode;

    int main(){}

    "Would produce a 16TB binary if you had 16TB RAM (you don't)." - neither do I have a 16TB hard drive! I can't really verify this one, but it's cool nonetheless.

    I discovered this one by accident and watched the compiler topple over when it ran out of address space.

    In my C++ answer, I wanted to avoid doing this because you run afoul of implementation limits on size of a single object, so my templates instead generate large numbers of more reasonably-sized objects.

    Please do NOT attempt to golf this entry; golfing defeats the intent of the code sample and there are no score benefits to doing so anyway. Code is already GPL'd as of 2005.

    @Joshua: I wasn't golfing the entry, right now it is broken, because the first three lines define a variable, not a type. Then the next structure fails because its member uses an incomplete type.

    @Mego: See the revision history.

    @Joshua: In the future when you review a suggested edit, be sure to read the explanatory comment.

    In which case, what edit did you make to my code Mego? The diff says you edited the whole thing but I can't spot any differences.

    @Joshua: Check the markdown diff. Mego only added the highlighting hint.

    I would assume that the 16TB of data is very very repetitive, therefor you should be able to put both a swapfile and the output file on a compressed FS and then be able to do this, but I'm not sure.

    @shelvacu: The more immediate problem is the code causes the compiler to allocate 16TB RAM at once.

    @Joshua That's why you also put a swapfile on the compressed FS. I'm pretty sure that would work but I'm not sure. Also possibly zram: https://en.wikipedia.org/wiki/Zram

    I tried to compile it to see how fast it would lock up, i included `stdint.h` because it didn't know the type uint32_t. After that it compiles within the second and generates a segmentation fault when i run it.

    the (You Don't) makes this response one that will not age gracefully with time. I (and people of my current time period) still don't. but at some point every one will.

    @AlexBollbach: The compiler's been fixed to not require the RAM since then. But as for 16TB RAM that can't fit in current RAM physical sizes because that would make one bit smaller than one atom. It's not impossible that will be willing to make our computers big enough, but its unlikely. In addition, the actual code for this answer was for 32 bit when I first discovered it. You don't have 16 TB 32 bit RAM for a single process.

    @Joshua: I'm sorry, but there's just no way that that can be correct, as there are 1TB SD cards available, and you could fit ~6 of those per one side of a RAM stick. (So clearly you can fit at least 12TB of storage in the space of a single RAM stick without hitting any fundamental physical limits, and most motherboards have more than one RAM slot) For one atom per bit, you could actually fit on the order of 100-1000TB per square centimeter (1 atom thick), or billions of TB per cubic centimeter.

  • ASM, 61 bytes (29 bytes source, 32 bytes for flags), 4,294,975,320 bytes executable



    .globl main
    main:
    .zero 1<<32


    Compile with gcc the_file.s -mcmodel=large -Wl,-fuse-ld=gold


    The 4GB threshold means you only need `1<<30`! I haven't been able to test this one yet (looks like I'm having some general issues with my gcc installation on this machine, but I'm getting _main not found linker issues with this). I'll try on linux tomorrow & upvote if I get it to work!

    `1<<30` is good enough for C. Since this is assembler, the size is in bytes.

    Ah, makes sense.

    I ran out of memory on my system. It does generate the temporary .o file (4.1G), but the linker cannot cope. (out of memory error) Someone with a lot of ram needs to try this... It does not crash the compiler though, so should still meet the rules.

    @viraptor how *much* ram, exactly?

    @viraptor My system has 32GB of RAM and for kicks I tried to build your code. `as` manages to hand off to `ld`, but `ld` fails with this. Not even `-mcmodel=medium` seems to help.

    @IwillnotexistIdonotexist oh, that's disappointing. Could you try again after adding `.data` as the first line? (mcmodel may actually change the result in that case)

    try forcing use of `gold` linker: `gcc -fuse-ld=gold ...` compiles/links... eek! Finished in 1:29 (89 seconds) and size of 1,073,748,000 bytes.

    Oops, that was the 1<<30 version, 1<<32 version takes more than 5:48, gold dies though. Interesting...

    I'm experimenting with linker scripts. I really want to get this to work :)

    Did it ever finish assembling?

    @Mego Thank you very much! I updated the description.

    Confirmed with GCC on Kubuntu.

  • Plain old C preprocessor: 214 bytes input, 5MB output


    Inspired by my real-world preprocessor fail here.


    #define A B+B+B+B+B+B+B+B+B+B
    #define B C+C+C+C+C+C+C+C+C+C
    #define C D+D+D+D+D+D+D+D+D+D
    #define D E+E+E+E+E+E+E+E+E+E
    #define E F+F+F+F+F+F+F+F+F+F
    #define F x+x+x+x+x+x+x+x+x+x

    int main(void) { int x, y = A; }

    Experiments show that each level of #defines will (as expected) make the output approximately ten times larger. But since this example took more than an hour to compile, I never went on to "G".


    This is kinda like an xml bomb

    Specifically it is an implementation of the original "Billion Laughs".

    This is insane yet simple.

    Wow, this actually causes a segfault in GCC 4.9 and Clang. Which compiler did you use?

    @Dave: gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

    @Dave: Strange. When I compile using make, it compiles, but if I type in the exact same command that make uses, it crashes. And it doesn't seem to be related to environment variables.

    Ironically, the Tiny C Complier (which you would expect to fail even faster than other compilers) actually compiles this successfully in 0.131 seconds on Linux Mint 20.1 Cinnamon.

  • Java, 450 + 22 = 472 bytes source, ~1GB class file



    B.java (golfed version, warning during compilation)



    import javax.annotation.processing.*;@SupportedAnnotationTypes("java.lang.Override")public class B extends AbstractProcessor{@Override public boolean process(java.util.Set a,RoundEnvironment r){if(a.size()>0){try(java.io.Writer w=processingEnv.getFiler().createSourceFile("C").openWriter()){w.write("class C{int ");for(int i=0;i<16380;++i){for(int j=0;j<65500;++j){w.write("i");}w.write(i+";int ");}w.write("i;}");}catch(Exception e){}}return true;}}


    B.java (ungolfed version)



    import java.io.Writer;
    import java.util.Set;

    import javax.annotation.processing.AbstractProcessor;
    import javax.annotation.processing.RoundEnvironment;
    import javax.annotation.processing.SupportedAnnotationTypes;
    import javax.annotation.processing.SupportedSourceVersion;
    import javax.lang.model.SourceVersion;
    import javax.lang.model.element.TypeElement;

    @SupportedAnnotationTypes("java.lang.Override")
    @SupportedSourceVersion(SourceVersion.RELEASE_8)
    public class B extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
    if (annotations.size() > 0) {
    try (Writer writer = processingEnv.getFiler().createSourceFile("C").openWriter()) {
    writer.write("class C{int ");
    for (int i = 0; i < 16380; ++i) {
    for (int j = 0; j < 65500; ++j) {
    writer.write("i");
    }
    writer.write(i + ";int ");
    }
    writer.write("i;}");
    } catch (Exception e) {
    }
    }
    return true;
    }
    }


    Compilation



    javac B.java
    javac -J-Xmx16G -processor B B.java


    Explanation



    This bomb uses Annotation Processors. It needs 2 compile passes. The first pass builds the processor class B. During the second pass the processor creates a new source file C.java, and compiles it to a C.class with a size of 1,073,141,162 bytes.



    There are several limitations when trying to create a big class file:




    • Creating identifiers longer than about 64k results in: error: UTF8 representation for string "iiiiiiiiiiiiiiiiiiii..." is too long for the constant pool.

    • Creating more than about 64k variables/functions results in: error: too many constants

    • There is also a limit of about 64k for the code size of a function.

    • There seems to be a general limit (bug?) in the java compiler of about 1GB for the .class file. If I increase 16380 to 16390 in the above code the compiler never returns.

    • There is also a limit of about 1GB for the .java file. Increasing 16380 to 16400 in the above code results in: An exception has occurred in the compiler (1.8.0_66). Please file a bug ... followed by a java.lang.IllegalArgumentException.


    Neat; you've essentially made your own preprocessor, within the size limit, in a language with a compiler which natively supports custom preprocessors. It's within the rules. The final class was only 0.5GB for me, but I can confirm the method.

    Another example in Java http://habrahabr.ru/post/245333/ - it uses nested `try..finally` (code in finally block is duplicated for normal and exceptional cases) and initializer block (code from initializer block is appended to each constructor)

    I replaced the `ä` by an `i` and adjusted the numbers. Now the bomb should create a 1GB class on any system without any encoding issues. However, it now needs a lot more memory.

    ? extends TypeElement?!?

    @Victor that won't help much because method bytecode is limited to 64kb per method. To get truly large classfiles, you need to use attributes (i.e. annotations). Speaking of which, the theoretical maximum size of a classfile is around 2^65 bytes (max attribute length is 2^32+5, and you can have up to 2^16-1 attributes per field and method, and you can have up to 2^16-1 of each per class).

    @Antimony described method results in ~100MB file

    @Victor - I suppose if you create ~65k constructors, then you could get up to around 4gb. But to go much beyond that, you need to use attributes.

    Better yet, just import java.* and javax.*

  • C, 26 byte source, 2,139,103,367 byte output, valid program



    const main[255<<21]={195};


    Compiled using: gcc cbomb.c -o cbomb (gcc version 4.6.3, Ubuntu 12.04, ~77 seconds)



    I thought I'd try to see how large I could make a valid program without using any command line options. I got the idea from this answer: https://codegolf.stackexchange.com/a/69193/44946 by Digital Trauma. See the comments there as to why this compiles.



    How it works: The const removes the write flag from the pages in the segment, so main can be executed. The 195 is the Intel machine code for a return. And since the Intel architecture is little-endian, this is the first byte. The program will exit with whatever the start up code put in the eax register, likely 0.



    It's only about 2 gig because the linker is using 32 bit signed values for offsets. It's 8 meg smaller than 2 gig because the compiler/linker needs some space to work and this is the largest I could get it without linker errors - ymmv.


    As an interesting aside, the output is 2,078,451 bytes gziped with max compression = 1029:1 compression ratio.

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM