We're no strangers to code golf, you know the rules, and so do I

  • Write the shortest program that prints the entire lyrics of "Never Gonna Give You Up" by Rick Astley.


    • Must output the lyrics exactly as they appear in the above pastebin*. Here's the raw dump: http://pastebin.com/raw/wwvdjvEj

    • Cannot rely on any external resources - all lyrics must be generated by / embedded in code.

    • No use of existing compression algorithms (e.g. gzip / bzip2) unless you include the full algorithm in your code.

    • Use any language, shortest code wins.

    Update, 1st June 2012:

    For solutions containing non-ASCII text, the size of your solution will be counted in bytes, based on UTF-8 encoding. If you use codepoints that cannot be encoded in UTF-8, your solution will not be judged as valid.

    Update, 7th June 2012:

    Thank you all for your awesome solutions! I'll be accepting the shortest answer tomorrow afternoon. Right now, Peter Taylor's GolfScript answer is winning, so get cracking on some improvements if you want to beat him! :)

    *There is a typo in the Pastebin (line 46, "know" should be "known"). You may either replicate it or not at your discretion.

    I think there may be a typo in the pastebin text. In the second to last paragraph before the final chorus (first line): "We've know each other for so long" I think should be "We've **known** each other for so long"

    @w0lf You're correct. I only scanned the lyrics to verify their accuracy, I must've missed that one. I'll accept either "know" or "known".

    I would recommend adding a list of languages to be used (as a link to a list of popular or sth) because I have seen people w defining a language that prints "hello world" with 0 bytes of code :)

    Do you count newlines?

    Maybe next time it would be more fun to define "shortest" as "minimum lines of code", "minimum number of characters" or something like this. When people start hacking away at character encoding level, then the result is just not readable so easily, thus destroying some degree of fun.

    Lines of code is pointless, because in most languages you can just stick everything into one line. If your solution is ASCII, count the characters excluding newlines. If your solution is non-ASCII (i.e. it uses character encoding to shorten the result) you count the bytes.

    26k+ views in just 3 days. WOW...

    It seems a bit counter-intuitive to have the source code of a non-Unicode-based language be measured as if it were UTF-8-encoded. But it's your problem, so you call the shots!

    How in the world did this generate so many views and votes? Whatever he did, I'm going to reverse engineer it.

    @PhiNotPi Good luck reverse engineering "Jeff Atwood tweeting you".

    @PhiNotPi It looks like it was posted on Hacker News.

    It got tweeted via the CodeGolf SE twitter account, then Jeff Atwood and Wrox Press tweeted it too. From there it got onto THN. I got a gold and two silver badges in the space of 5 minutes :)

    I think Ed H.'s 555-byte solution is still the front-runner, actually.

    @breadbox So it is! Missed that one.

    There seems to be some confusion about scoring wrt newlines ... You say "If your solution is ASCII, count the characters **excluding newlines**", so wouldn't that make Ed.H's solution score **554**? (His solution has 556 bytes *including* a newline at the end of each of its two lines of code.)

    @r.e.s. I just copied the value from his post, I didn't manually check the length. You're right, it's 554, not 556. I've updated the question to reflect this.

    Waaait a second. Newlines are not counted? If so, I can shave about 50 bytes off my solution with ease, so you might want to rethink that.

    I'm talking about newlines that are part of code structure. You certainly can't ignore instances where you've used `\n`, or had newlines as part of the song's text. Also, I only see 14 newlines in your solution.

    Oh, ok. Still, if, say, newlines in ruby/python are not counted, how about not counting semicolons in php? ) wouldn't it be simpler to just always count the bytes?

    > Also, I only see 14 newlines in your solution. ...and 30 "q"s, plus about 20 more saved by str_split. Which I could easily change to newlines (and newlines to "q"). Because newline is just a character like every other, why treat it differently?

    If the newlines later become part of the output, they count towards the size of your code.

    +1 for actually accepting the shortest solution on a [code-golf] question. :-)

    You can technically just make an infinite loop that prints a letter each time around. Eventually, the song will be I. There somewhere

    The restriction with UTF-8 makes no sense. What if my code is shorter when encoded as UTF-16 or as Latin-1? You should either count the number of bytes or the number of characters, but leave the encoding up to the author.

    I'd like to see a seed answer.

  • Ed H.

    Ed H. Correct answer

    9 years ago

    Ruby 576 557 556 (552) chars && PHP 543 chars

    Another search-and-replace solution. Note that this form of solution is essentially a Grammar-based Compression Code http://en.wikipedia.org/wiki/Grammar-based_code
    Check out http://www.cs.washington.edu/education/courses/csep590a/07au/lectures/lecture05small.pdf for a simple to understand compression example.

    I've written the substitution rules so that the starting character for each substitution is computed (they are in sequential ASCII order); it need not be present in the transition data.

    s="We; n7trangMsL8loT63Ke rules5s8d8I
    AJull commit4nt'sChatFKink: of6CHldn'tRetKisJrom<[email protected]/A= if?<sk 42DS'tLE 4?;Lo8bli=L7ee..
    I justCannaLE?2Gotta >u=Msta=.|
    Ng1Nlet? downNrun<rH=5desMt?N>cryNsayRoodbyeNtE< lie5hurt?|

    We'T3n [email protected] s8lSg6r hear9<ch: but6;Lo7hyL7BInsideCe both3Cha9Ro: S
    We3KeRa45we;QplB|1)O)NgiT, nPgiT
    (G|iT? up| howFJeel:
    | know|me|<= |
    YH|8s|o |t's been|ing|'re| a|nd|make? | yH| othM|A|ay it
    | w|D|ell| I'm|G|ou|I| f|Lh| t|er|
    (Ooh|eTrQ|RSna | g|on|ve".scan(/[^|]+/){s.gsub!((i+=1).chr,$&)}
    puts s

    implementation notes

    • The above solution is 556 chars long, but scores 552 with the removal of newlines from source code. It is slightly better scoring than the original 556 char solution I had that scored 554.

    • I use "known" instead of "know"; this makes the verse repetition identical and should make the compression better

    • I originally optimized the rules by repeatedly searching for the substitution that would yield the most decrease in current code size. However, I found that RiderOfGiraffe's substitution rules were (slightly) better than mine, so I'm now using a modified version of his rules.

    • I spent time reordering the rules so that a single substitution pass can decompress everything.

    • Due to the computed initial character in the substitution rules, my rule list has a rule for every character in a contiguous ASCII range. I found that having some "dummy" substitution rules was better for code size than having all real rules and coding a final text fixup substitution.

    • This example is small enough that you probably could successfully write a program to find the optimal (least total cost) set of rules. I would not expect doing this would yield size reductions of more than a few bytes.

    older implementation

    This older implementation has 576 characters and started with substitution rules from ugoren's bash/sed implementation. Ignoring the substitution variable renaming, my first 28 substitutions are exactly the same as those performed in ugoren's program. I added a few more to lower the overall byte count. This is possible because my rules are more efficiently represented than those in ugoren's implementation.

    puts"WeM noHtraLersB loJ;6 C rules=so do $
    & full commitment'sGhat<thinkDof;Gouldn'tKet this fromFny oCrKuy.
    -&E if9ask me1~on't @ me:MBo bliEBHee//

    We'J6n each oCr forHo loL;r hear2FchDbut;MBoHhyBH7$nsideGe both6Gha2ADon
    We6 CKame=weM>pl7|
    $ justGanna @:1#otta 8uEerstaE/|
    5?9up5let9down5runFrouE=desert:[email protected] lie=hurt:|(Ooh)5?, nI>?
    (#4| how<feeliL
    |t's been|(Ooh,K4|iJ9up)
    NI>| know|ay it
    |make9|: | you|
    You| $'m |FE |Anna |giJ|tell|Ko| to|the|iL |nd| a| w| s|eJr|ve| g|ng|'re".split("|").inject{|m,o|m.gsub((i+=1).chr,o)}.tr('&~#$',"ADGI")

    I did not try optimizing the substitution rules in this one.

    contest notes

    The search-and-replace decompression scheme works well for this contest because most languages have pre-built routines that can do this. With such a small amount of text to be generated, complex decompression schemes do not seem to be feasible winners.

    I've used only ASCII text and also avoided using unprintable ASCII characters. With these restrictions, each character in your code can only represent up to a maximum of 6.6 bits of information; this is very different than to real compression techniques where you use all 8 bits. In some sense, it is not "fair" to compare to gzip/bzip2 code size because those algorithms will use all 8 bits. A fancier decompression algorithm may be possible if you can include traditionally unprintable ASCII in your strings AND each unprintable character is still written in your code as a single byte.

    PHP Solution

    I justCannaLE?2Gotta >u=Msta=.q
    Ng1Nlet? downNrun<rH=5desMt?N>cryNsayRoodbyeNtE< lie5hurt?q

    We'T3n [email protected] s8lSg6r hear9<ch: but6;Lo7hyL7BInsideCe both3Cha9Ro: S
    We3KeRa45we;QplBq1)O)NgiT, nPgiT
    (GqiT? upq howFJeel:
    q knowqmeq<= q
    YHq8sqo qt's beenqingq'req aqndqmake? q yHq othMqAqay it
    q wqDqellq I'mqGqouqIq fqLhq tqerq
    (OohqeTrQqRSna q gqonqve"),"We; n7trangMsL8loT63Ke rules5s8d8I
    AJull commit4nt'sChatFKink: of6CHldn'tRetKisJrom<[email protected]/A= if?<sk 42DS'tLE 4?;Lo8bli=L7ee..

    The above solution takes the PHP from "a sad dude" and combines it with my substitution rules. The PHP answer turns out to have the shortest decompression code. See http://ideone.com/XoW5t

    My `sed` solution surely can't beat it. I'm working on something that hopefully has a chance - you have 75 bytes of overhead, maybe I'll cut it down (not in Ruby).

    I find **555** bytes in your program (with no trailing newline).

    The OP's scoring method is to exclude newline separators in ascii source code, so it seems this should score **554**.

    Great solution! :)

    Looks like the shortest decompression code was in PHP. Here's a 543 byte solution (my rules, using code snippets from "a sad dude") http://ideone.com/XoW5t

    I suggest putting the PHP solution in your answer. It's a pity if the shortest solution won't be shown, just because neither you nor "a sad dude" can claim full ownership.

    Kudos... I got to 641 in Python trying to automate the finding of which substrings to replace, and yours is 90 less than that!

License under CC-BY-SA with attribution

Content dated before 7/24/2021 11:53 AM