Sanitizing input for parameterized queries

  • If I use fully parameterized queries everywhere, is it still necessary and/or security-relevant to somehow sanitize input? E.g. check that mail addresses are valid before sending a parameterized query against the database, or filtering out certain special characters from text?

    I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.

    Right now, we have full unicode support against SQL Server (MySQL seems to have problems with Emojis), I'm not sure how to filter out security risks without losing that property.

  • marcelm

    marcelm Correct answer

    4 years ago

    No, it's not necessary. But please, read on.

    Input sanitization is a horrible term that pretends you can wave a magic wand at data and make it "safe data". The problem is that the definition of "safe" changes when the data is interpreted by different pieces of software.

    Data that may be safe to be embedded in an SQL query may not be safe for embedding in HTML. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe for embedding in all those contexts (and many others) is too restrictive.

    So what should we do? Make sure the data is never in a position to do harm.

    The best way to achieve this is to avoid interpretation of the data in the first place. Parametrized SQL queries is an excellent example of this; the parameters are never interpreted as SQL, they're simply put in the database as, well, data.

    For many other situations, the data still needs to be embedded in other formats, say, HTML. In that case, the data should be escaped for that particular language at the moment it's embedded. So, to prevent XSS, data is HTML-escaped at view time. Not at input time. The same applies to other embedding situations.

    So, should we just pass anything we get straight to the database?

    Maybe. It depends.

    There are definitely things you can check about user input, but this is highly context-dependent. Because sanitization is ill-defined and mis-used, I prefer to call this validation.

    • For example, if some field is an supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
    • You can certainly do some validation on email fields (although some people argue there's not much you can do besides checking for the presence of a @, and they have a good point).
    • You can require comments to have a minimum and maximum length.
    • You should probably verify that any string contains only valid characters for its encoding (e.g., no invalid UTF-8 sequences).
    • You could restrict a username to certain characters, if that makes sense for your userbase.
    • A minimum length for passwords is, of course, incredibly common.

    As you can see, these checks are very context-dependent. And all of them are to help increase the odds you end up with data that makes sense. They are not to protect your application from malicious input (SQL injection, XSS, command injection, etc), because this is not the place to do that.

    Users should be free to type '; DROP TABLE users; -- without their post being rejected or mangled to \'; DROP TABLE users; --. Note that I'm able to include such "malicious" content on sec.SE!

    So, to answer your original question:

    ... is it still necessary and/or security-relevant to somehow sanitize input?

    No, it is not. But please do properly escape the data where needed before outputting it. And consider validations, where applicable.

    I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.

    Then escape or filter the data before outputting to those tools, but don't mangle data in your database.

    But really, those scripts should be fixed, or rewritten with security in mind.

    (MySQL seems to have problems with Emojis)

    Little off-topic, but have a look at the utfmb4 charset for MySQL ;)

    To add to the problem with sanitizing, if you write a sanitizer that strips out e..g HTML script tags, did you remember to check if any new script tags appeared in the "cleaned" version? IIRC that led to a real web vulnerability; they should have rejected the problematic input instead of trying to clean it.

    "A minimum length for passwords is, of course, incredibly common." Your hashing algorithm should take care of this for you, in terms of validating the length of the string being inserted into the database ;-)

    Problem with writing everything back to the database is that another system that isn't very secure may consume that data. For example your bank website allows you to enter a nice redirect script tag when you change your address, okay next time you log into the website it will either just redirect you or show as plain text, so no harm done as it won't affect other customers. What about the system that the call centre uses when you phone up? Would that redirect them to a hostile site when they view your account?

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM