Debunking Stroustrup's debunking of the myth “C++ is for large, complicated, programs only”

  • Stroustrup has recently posted a series of posts debunking popular myths about C++. The fifth myth is: “C++ is for large, complicated, programs only”. To debunk it, he wrote a simple C++ program downloading a web page and extracting links from it. Here it is:


    #include <string>
    #include <set>
    #include <iostream>
    #include <sstream>
    #include <regex>
    #include <boost/asio.hpp>

    using namespace std;

    set<string> get_strings(istream& is, regex pat)
    {
        set<string> res;
        smatch m;
        for (string s; getline(is, s);) // read a line
            if (regex_search(s, m, pat))
                res.insert(m[0]); // save match in set
        return res;
    }

    void connect_to_file(iostream& s, const string& server, const string& file)
    // open a connection to server and open an attach file to s
    // skip headers
    {
        if (!s)
            throw runtime_error{ "can't connect\n" };

        // Request to read the file from the server:
        s << "GET " << "http://" + server + "/" + file << " HTTP/1.0\r\n";
        s << "Host: " << server << "\r\n";
        s << "Accept: */*\r\n";
        s << "Connection: close\r\n\r\n";

        // Check that the response is OK:
        string http_version;
        unsigned int status_code;
        s >> http_version >> status_code;

        string status_message;
        getline(s, status_message);
        if (!s || http_version.substr(0, 5) != "HTTP/")
            throw runtime_error{ "Invalid response\n" };

        if (status_code != 200)
            throw runtime_error{ "Response returned with status code" };

        // Discard the response headers, which are terminated by a blank line:
        string header;
        while (getline(s, header) && header != "\r")
            ;
    }

    int main()
    {
        try {
            string server = "www.stroustrup.com";
            boost::asio::ip::tcp::iostream s{ server, "http" }; // make a connection
            connect_to_file(s, server, "C++.html"); // check and open file

            regex pat{ R"((http://)?www([./#\+-]\w*)+)" }; // URL
            for (auto x : get_strings(s, pat)) // look for URLs
                cout << x << '\n';
        }
        catch (std::exception& e) {
            std::cout << "Exception: " << e.what() << "\n";
            return 1;
        }
    }

    Let's show Stroustrup what a small and readable program actually is.



    1. Download http://www.stroustrup.com/C++.html



    2. List all links:


      http://www-h.eng.cam.ac.uk/help/tpl/languages/C++.html
      http://www.accu.org
      http://www.artima.co/cppsource
      http://www.boost.org
      ...



    You can use any language, but no third-party libraries are allowed.


    Winner


    The C++ answer won by votes, but it relies on a semi-third-party library (which the rules disallow) and, like its close competitor Bash, on a hacked-together HTTP client (it won't work with HTTPS, gzip, redirects, etc.). So Wolfram is the clear winner. Another solution that comes close in size and readability is PowerShell (with improvements from the comments), but it hasn't received much attention. Mainstream languages (Python, C#) came pretty close too.


    Comment on subjectivity


    The most upvoted answer (C++) at the moment the votes were counted was disqualified for not following the rules listed here in the question. There's nothing subjective about it. The second most upvoted answer at that moment (Wolfram) was declared the winner. The third answer's (Bash) "disqualification" is subjective, but it's irrelevant, as that answer didn't lead the vote at any point in time. You can consider it a comment if you like. Furthermore, right now the Wolfram answer is the most upvoted answer, the accepted one, and the declared winner, which means there's zero debate or subjectivity.


    Comments purged as they were all either obsolete or off-topic.

    Clarification: Shall the list of links be as incomplete as Stroustrup's one, i.e. skip any non-http-links that don't include `www` (including the https, ftp, local and anchor ones on that very site) and report false-positives, i.e. non-linked mentions of `http://` as well (not here, but in general)?

    Why is pointing out that ALL of the posted answers don't apply to what the OP asked, obsolete or off-topic?

    To each his own, I've been called worse. If the OP's goal wasn't to try and somehow prove that Stroustrup is wrong, then I'd agree with your assessment. But the entire premise of the question is to show how "your favorite language" can do the same thing as these 50 lines of C++ in far fewer lines of code. The problem is that none of the examples do the same thing. In particular, none of the answers perform any error checking, none of the answers provide reusable functions, and most of the answers don't provide a complete program. The Stroustrup example provides all of that.

    What's sad is his web page isn't even valid UTF-8. Now I've gotta work around that, despite his server advertising `Content-Type: text/html; charset=UTF-8`... I'm gonna email him.

    I wish I'd thought of coming here and asking this question when I read that piece. Certainly C++ is better than it was in the past, but it's by no means optimal.

    @Dunk The other examples don't provide reusable functions because they accomplish the entire functionality of those functions in a single line and it makes no sense to make that a whole function on its own, and the C++ example doesn't perform any error checking that isn't handled natively in almost an identical manner, and the phrase "complete program" is almost meaningless.

    "You can use any language, but no third-party libraries are allowed." I don't think that's a fair requirement considering `boost/asio` is used up there which *is* a third-party library. I mean how will languages that don't include url/tcp fetching as part of its standard library compete?

    @greatwolf They don't. That's the point.

    @Jason - upvote for "C++ example doesn't perform any error checking that isn't handled natively in almost an identical manner".

    Virtually all the answers fail the task (including the original lol) because they don't pick up relative links! I did mine here: http://forum.dlang.org/thread/[email protected]#post-nxcpwmyjfbfbjxqtmrzd:40forum.dlang.org

    Is nobody gonna talk about using regexes to parse HTML? Really? I mean Stroustrup does it himself but at least his regex doesn't rely on the HTML-attribute using `"` and only ever `"` to delimit its value. 9 out of 10 answers here would fail on ``

    @funkwurm Problems with the provided solutions have been mentioned many times, you just need to look through comments. The famous "parsing HTML with regex" answer from SO has been brought up too. Many comments have been removed by the mod though.

    @undergroundmonorail BF++ does. It is giving me strange and deviant thoughts.

    I have to admit, I'm surprised by Stroustrup's claim that most people believe that C++ is used for large programs. I (probably incorrectly) believe the opposite - that for large programs, it's worthwhile to use a language like Java or C# that makes it harder to shoot yourself in the foot!

    It's a little odd to me that Stroustrup's challenge is to write C++ code that imports no third party code and the first line (or so, I'm not going to page back and lose my post thus far) is an import of boost's asio library. It kind of makes OP's opinion suspect. But in any case, comparing different languages in this task is very much like comparing apples and oranges. It doesn't really make much sense to use a hammer to tap in a pin, but it can be done; it doesn't make much sense to write assembly code to extract URLs from a web page but it can be done. I suspect you could write a RoR program

    This code snippet appears in a hacking scene on the Netflix series "Limitless"; Season 1, Episode 10, ~13:05. Proof: http://i.imgur.com/7a16H8y.png

    I'm casting a close vote as lacking an objective winning criterion because the question is tagged [tag:popularity-contest], but the "winner" paragraph at the end suggests the OP is subjectively disqualifying answers.

    @pppery I really wonder what are you trying to achieve by closing this question.

    @pppery The most upvoted answer (C++) at the moment the votes were counted was **disqualified for not following the rules** listed in the question. There's nothing subjective about it. The second most upvoted answer at that moment (Wolfram) was declared the winner. The third answer's (Bash) "disqualification" is subjective, but it's **irrelevant**, as that answer didn't lead the vote at any point in time. You can consider it a comment if you like. Furthermore, right now the Wolfram answer is the most upvoted answer, the accepted one, and the declared winner, which means there's zero debate or subjectivity.

    @pppery Does that mean that if the OP edited out the “Winner” section and just left the answer accepted without any comments, the question would have an objective winning criterion? I agree with the OP here: pop-con is objective, and “no third party libraries”, while not a great thing to ban, still disqualifies the C++ answer, making the Wolfram answer the clear winner. Flag the C++ answer as invalid and for deletion; don’t close the question because of one bad (debatable, but not my point) answer.

    OK, you've convinced me. I'll vote to re-open as soon as the C++ answer is deleted by a moderator (which should have been done in the first place rather than disqualifying it from the winning criteria)

    And my flag was resolved without the answer being deleted, so this question is now in a chicken-egg situation leaving no resolution other than remaining closed.

    @pppery What rule exactly do you follow that says to close a question if one answer violates its rules? Besides closing a question for zero practical reasons other than virtual points, all you have achieved is wasting time. You can't reopen the question anyway, you aren't a mod.

  • swish

    Correct answer

    6 years ago

    Wolfram



    This feels like complete cheating



    Import["http://www.stroustrup.com/C++.html", "Hyperlinks"]


    So just add some honest parsing on top



    Cases[
      Import["http://www.stroustrup.com/C++.html", "XMLObject"],
      XMLElement["a", {___, "href" -> link_, ___}, ___] :>
        link /; StringMatchQ[link, RegularExpression["((http://)?www([./#\\+-]\\w*)+)"]]
      , Infinity]

    Nope, I don't see any cheating here. This challenge is about bringing out the best of your language. And that first line is the epitome of "small and readable".

    An answer that can ignore the silly arguments about catching ftp links. Brilliant.

    Came here to offer this exact solution, pleased to see others have appreciated it as well.

    @MartinBüttner In that case you might want to consider downvoting http://meta.codegolf.stackexchange.com/a/1078/12130

    @DavidMulder Who says I didn't? ;)

    @MartinBüttner Ah, if you're aware of it then you should probably enforce the community consensus though~ (and try to change the consensus if you feel so inclined).

    @DavidMulder Technically, the loophole is currently not valid, since the vote breakdown is +41/-21 (and the loophole question states that loopholes are accepted if there are at least twice as many upvotes as downvotes). A close call, admittedly, but still. ;) Furthermore, this is a popularity contest, not a code golf, and in particular, it's a pop-con about showing how easily this can be done in a given language, which is why I think the loophole doesn't really apply to this challenge anyway (since the challenge basically asks for it).

    @MartinBüttner Oh nice~, I couldn't check that. Sooo, is anything stopping people right now from adding a purpose-built function to their codegolf language forks for each new question? That seems the 'sensible' next step in that case~ (It's the problem I have with most of these challenges, languages like OpenEdge, ColdFusion and Wolfram are what I tend to call framework languages, because they contain not just a language itself, but also very high level purpose-built languages you would normally find in frameworks and/or libraries. The true strength of a language is not in the number of

    purpose-built functions, but in the ease of access to such a function if you need it. For example with node.js this would be a `npm install jsdom` call, whereas with Mathematica you have to search the interwebs for the libraries you need).

    @DavidMulder That's a different issue. Only languages (and language versions) which were available before a challenge was posted, may be used in that challenge. If you want to discuss this further, feel free to join us in chat.

    @MartinBüttner: Actually, it's bashing C++ for not declaring "everything and the kitchen sink" to be part of the standard library, and thus making it unsuitable for really small devices, restricted environments, and other situations...

    I suppose this `Import` statement does true HTML parsing rather than the original code’s naive regex matching. So it’s not equivalent. It’s closer to what you likely want in real life, but still, not doing the same as the original…

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM