Don't try to sanitize input. Escape output. (2020)
The article discusses the limitations of input sanitization in preventing XSS attacks: filtering unsafe characters can alter user input or provide a false sense of security. Contextual output escaping, combined with input validation, is the safer approach.
The article discusses the limitations of input sanitization in preventing cross-site scripting (XSS) attacks. Simply filtering out unsafe characters can lead to unintended consequences, such as altering user input or providing a false sense of security. The recommended approach is to escape output based on the context in which it will be displayed, ensuring proper handling of characters in scenarios like HTML, JSON, and SQL. The article emphasizes contextual escaping and features like parameterized queries in SQL to prevent vulnerabilities. It also addresses the challenge of allowing users to input HTML or Markdown content, suggesting approaches like whitelisting allowed tags and attributes or using security-vetted libraries. Additionally, it highlights the role of input validation in ensuring data integrity and blocking malicious inputs, and points to further resources for implementing secure coding practices that mitigate XSS and SQL injection risks.
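As a rough illustration of the context-dependent escaping the article describes, using only the Python standard library (the payload string is invented for the example):

```python
import html
import json

user_input = '<script>alert("xss")</script>'

# HTML context: entity-encode markup-significant characters.
safe_html = html.escape(user_input)

# JSON/JavaScript context: serialize instead, which quotes and
# escapes according to that grammar.
safe_json = json.dumps(user_input)

print(safe_html)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
print(safe_json)
```

The same string needs different treatment in each context, which is why a single input-side filter cannot cover them all.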
Related
Simple ways to find exposed sensitive information
Various methods to find exposed sensitive information are discussed, including search engine dorking, Github searches, and PublicWWW for hardcoded API keys. Risks of misconfigured AWS S3 buckets are highlighted, stressing data confidentiality.
Beyond monospace: the search for the perfect coding font
Designing coding fonts involves more than monospacing. Key considerations include hyphens resembling minus signs, aligning symbols, distinguishing zero from O, and ensuring clarity for developers and type designers. Testing with proofing strings is recommended.
'Skeleton Key' attack unlocks the worst of AI, says Microsoft
Microsoft warns of "Skeleton Key" attack exploiting AI models to generate harmful content. Mark Russinovich stresses the need for model-makers to address vulnerabilities. Advanced attacks like BEAST pose significant risks. Microsoft introduces AI security tools.
Htmx does not play well with content security policy
HTMX, a JavaScript framework, presents security challenges due to its handling of HTML tags and external script loading. Despite some security features, HTMX usage raises HTML injection risks, complicating full security implementation.
CSS Can Get You in Jail – Browser renderers, now deemed criminals
The blog post discusses the legal risks of using CSS for numbering in legal documents on websites. It advises hardcoding numbers for accuracy and legal compliance, prioritizing precision over aesthetics.
After some investigating, I figured out how he obtained the data.
He was one of the first 100 users: he set one of his fields to an XSS Hunter payload and slept on it.
Two years later, a developer had a dump of data to test some things on. He loaded the data into a SQL development tool on his Mac and, out of VS Code muscle memory, hit Cmd+Shift+P to open the command bar. In that SQL editor, however, the shortcut opened "Print Preview", and the software rendered the current table view into a webview to ease printing. The XSS payload executed there, and the page content was sent to the researcher.
Escape input; you never know where it will be rendered.
It's worth emphasizing that there's still plenty of scope for sensible input validation. If a field is a number, or one of a known list of items (US States for example) then obviously you should reject invalid data.
But... most web apps end up with some level of free-form text. A comment on Hacker News. A user's bio field. A feedback form.
Filtering those is where things go wrong. You don't want to accidentally create a web development discussion forum where people can't talk about HTML because it gets stripped out of their comments!
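To sketch that distinction in Python (the state set is truncated and the function names are made up for illustration):

```python
US_STATES = {"CA", "NY", "TX"}  # truncated for the sketch

def validate_state(value):
    # Structured field: reject anything outside the known set.
    code = value.strip().upper()
    if code not in US_STATES:
        raise ValueError(f"unknown state: {value!r}")
    return code

def accept_comment(text):
    # Free-form field: store verbatim; escaping happens at render
    # time, so people can still write <html> in their comments.
    return text
```

Validation constrains the shape of structured fields; free text is stored as-is and made safe only when displayed.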
Say one was updating TeX to take advantage of this --- all the normal Unicode character points would then have catcodes set to make them appropriate to process as text (or a matching special character), while "processing-marked-up" characters would then be set up so that for example:
- \ (processing-marked-up variant) would work to begin TeX commands
- # (processing-marked-up variant) would work to enumerate macro command arguments
- & (processing-marked-up variant) would work to delineate table columns
&c.
and the matching "normal" characters, when encountered, would simply be typeset as themselves.
Of course, once the product is in production you can swim in one direction but not fight the current going in the other. You can always move to escaping output, but retroactively sanitizing input is a giant pain in the ass.
But the problem comes in with your architecture, and whether you can discern data you generated from data the customers generated. Choose the wrong metaphors and you end up with partially formatted data existing halfway up your call stack instead of only at the view layer. And now you really are fucked.
Rails has a cheat for this. It sets a single boolean value on the strings which is meant to indicate the provenance of the string content. If it has already been escaped, it is not escaped again. If you are combining escaped and unescaped data, you have to write your own templating function that is responsible for escaping the unescaped data (or it can lie and create security vulnerabilities. "It's fine! This data will always be clean!" Oh foolish man.)
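The idea can be sketched in a few lines of Python; this is a toy analogue of the Rails mechanism, not its actual implementation:

```python
import html

class SafeString(str):
    """Marker type: the content has already been escaped for HTML."""

def h(value):
    # Escape unless the value is already marked as safe.
    if isinstance(value, SafeString):
        return value
    return SafeString(html.escape(str(value)))

# Escaping is applied once; already-escaped data passes through.
print(h("<b>"))     # &lt;b&gt;
print(h(h("<b>")))  # still &lt;b&gt;, not double-escaped
```

The marker records provenance, so combining escaped and unescaped strings does not double-escape; the danger is exactly what the comment describes, code that lies by marking dirty data as safe.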
The better solution is to push the formatting down the stack. But this is a rule that Expediency is particularly fond of breaking.
I think the big problem with just escaping output is that you can accidentally change what the output will actually be in ways that your users can't predict. If I am explaining some HTML in a field and drop `<i>...</i>` in there today, your escaper may escape this properly. But next month when you decide to change your output to actually allow an `<i>` tag, then all of a sudden my comment looks like some italicized dots, which broke it.
Instead if you structure it, and store it in your datastore as a tree of nodes and tags, then next month when you want to support `<i>` you update the input reader to generate the new structure, and the output writer to handle the new tags. You preserve old values while sanitizing or escaping things properly for each platform.
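A minimal Python sketch of that approach, with an invented (tag, children) node format:

```python
import html

ALLOWED_TAGS = {"i", "b"}  # the whitelist can grow over time

def render_html(node):
    # Leaves are plain text and get escaped for the HTML context.
    if isinstance(node, str):
        return html.escape(node)
    tag, children = node
    inner = "".join(render_html(c) for c in children)
    if tag in ALLOWED_TAGS:
        return f"<{tag}>{inner}</{tag}>"
    # Unknown tag: drop the tag but keep its (escaped) content.
    return inner

doc = [("i", ["hello"]), " & world"]
print("".join(render_html(n) for n in doc))  # <i>hello</i> &amp; world
```

Supporting a new tag later means adding it to the whitelist and the writers; the stored structure is unchanged, so old content keeps rendering as the author intended.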
On the SQL side, you could use host parameters (usually denoted by question marks), if the database system you use supports them, which avoids SQL injection problems.
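With Python's built-in sqlite3 module, for example, host parameters look like this (table and data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

malicious = "Robert'); DROP TABLE users;--"
# The ? placeholder binds the value as data, never as SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (malicious,))

row = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchone()
print(row[0])  # stored and retrieved verbatim, no injection
```

The driver sends the value out-of-band, so no escaping of the string is needed at all.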
If you deliberately allow the user to enter SQL queries, there are better ways to handle this. If you use a database system that allows restricting SQL queries (like the authorizer callback and several other functions in SQLite that can be used for this purpose), then you might use that; I think it is better than trying to write a database-independent parser for the SQL code and expecting it to work. Another alternative is to allow the database (in CSV or SQLite format) to be downloaded. If the MIME type is set correctly, a browser or browser extension may let the user open it with their own interface; otherwise, an external program can be used.
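For the SQLite authorizer specifically, Python's sqlite3 exposes it as set_authorizer; a sketch that restricts user-supplied queries to read-only access:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

def readonly_authorizer(action, arg1, arg2, db_name, trigger):
    # Permit only reads; every other action is denied.
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ):
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

conn.set_authorizer(readonly_authorizer)

rows = conn.execute("SELECT x FROM t").fetchall()  # allowed

denied = False
try:
    conn.execute("DROP TABLE t")  # rejected by the authorizer
except sqlite3.DatabaseError:
    denied = True
```

A real deployment would likely also need to permit things like built-in functions, but the point stands: the database enforces the restriction, not a hand-rolled SQL parser.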
Some of the other problems mentioned, and the complexity involved, are due to problems with the messy complexity of HTML and WWW, in general.
For validation, you should of course validate on the back end, and you may do so on the front end too (especially if the data needed for validation is small and is intended to be publicly known). However, if JavaScript is disabled, the form should still be submitted and the server should reply with an error message if validation fails; if JavaScript is enabled, the client can catch the error before sending anything to the server. That way it works either way.
http://www.ranum.com/security/computer_security/editorials/d...
Defining what is valid for an input field and rejecting everything else helps the user catch mistakes. It's not just for security.
Some kinds of information are tricky to sanitize. Names, addresses and such. Especially in an application or site that has global users. Do the wrong thing and you end up aggravating users, who are not able to input something legitimate.
But maybe don't allow, say, a date field to be "la la la" or even "December 47, 2023".
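A strict date parse already does that rejection for you; a Python sketch with an invented helper name:

```python
from datetime import datetime

def parse_date(value):
    # strptime rejects non-dates ("la la la") and impossible
    # dates ("December 47, 2023") alike.
    return datetime.strptime(value, "%B %d, %Y").date()

ok = parse_date("December 4, 2023")

rejected = False
try:
    parse_date("December 47, 2023")  # day 47 does not exist
except ValueError:
    rejected = True
```

Parsing into a real date type also gives you a canonical value to store, rather than whatever string the user happened to type.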
1) You get your input data into the form that is meaningful in the database by validating, sanitising and transforming it. You know what form that data should be in, and that's the only form that belongs in your database. Data isn't just output; sometimes it is processed, queried, and joined on.
2) You correctly format/transform it for each output format. Since you know the normalised form in the database, transforming it for output becomes a simpler job.
It's not just lazy to suggest there's a choice here, it's wrong.
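Those two steps might look like this in Python (formats and helper names are illustrative):

```python
import html
from datetime import datetime

def normalize_date(raw):
    # Step 1: validate and transform input into the canonical
    # stored form (an actual date, not a string).
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()

def format_date_html(d):
    # Step 2: format and escape for the specific output context.
    return html.escape(d.strftime("%B %d, %Y"))

stored = normalize_date(" 2023-12-04 ")
print(format_date_html(stored))  # December 04, 2023
```

Normalisation happens once on the way in; context-specific formatting and escaping happen on the way out, one per output target.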
Escaping/sanitizing on output takes extra cycles/energy that could be spared if the same process were done once upon submission.
Think more sustainably.
This post takes a narrow view of attackers.