Cross-Site Scripting (XSS) is the most pervasive vulnerability present in Web applications today. That being said, it is possible to build Web apps that are impervious to XSS by arming yourself with an understanding of the threat and a basic toolbox of encoding functions.
XSS Review
The attack occurs in a variety of scenarios where data is taken in by your website and then replayed to the user as an executable script. For example, imagine navigating to the following URL:
http://www.contoso.com/shopping?name=<script>eval(name)</script>
If the website were to replay the query string parameter into its HTML markup verbatim, malicious script would execute on the page. Given the same-origin policy security model of the browser, this script could perform actions or access data on behalf of the user behind the keyboard.
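As a minimal sketch of the scenario above, the following console program contrasts replaying the name parameter verbatim with HTML-encoding it first. The ReflectedXssDemo class, its method names, and the "Welcome" markup are hypothetical, and System.Net.WebUtility is used here only so the sketch runs outside an ASP.NET page.

```csharp
using System;
using System.Net;

public static class ReflectedXssDemo
{
    // Hypothetical page fragment: replays the query string value verbatim (vulnerable).
    public static string RenderUnsafe(string name)
    {
        return "<p>Welcome, " + name + "</p>";
    }

    // Same fragment, but the value is HTML-encoded before it reaches the markup.
    public static string RenderEncoded(string name)
    {
        return "<p>Welcome, " + WebUtility.HtmlEncode(name) + "</p>";
    }

    public static void Main()
    {
        string name = "<script>eval(name)</script>"; // value taken from the URL above

        // The browser would parse this as a live SCRIPT element.
        Console.WriteLine(RenderUnsafe(name));

        // Prints: <p>Welcome, &lt;script&gt;eval(name)&lt;/script&gt;</p>
        Console.WriteLine(RenderEncoded(name));
    }
}
```

In the encoded version the browser displays the angle brackets as text instead of interpreting them as markup.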
There are numerous other ways that an XSS vulnerability might arise. For example, imagine your Web application presents a page with a list of users. If one of the users managed to set their visible name to a SCRIPT element, we then have XSS, though this scenario does not involve query string parameters per se.
Alternatively, consider a situation where an onerror attribute results in malicious script execution (as opposed to a SCRIPT element). How many mechanisms like this exist within HTML/JavaScript that enable script execution? It turns out that there are a lot. Fortunately, you don't need to be an XSS expert to prevent XSS vulnerabilities from being introduced in application logic.
Best Practice
Generally speaking, the strategy to pursue in building application code is to encode potentially untrusted content appropriately for the context in which it's being output on the page.
It's worthwhile to define these terms. Potentially untrusted content could be input from the user to the website, or even information stored in a database. If your Web application takes input from the URL, that data is potentially untrusted content because the data could have been supplied by an attacker. Information from cookies is not generally directly suspect because of restrictions enforced by the same-origin policy; however, if the data originally came from the URL or an HTTP POST, you should consider it suspect. Perhaps the easiest way to define potentially untrusted content would be to say that it's any content that the application did not itself define statically in its own business logic.
The context on the page into which the output is placed is also very important to consider as it dictates how you must encode output. Consider the following example in ASP.NET:
<a href="http://contoso.com/app.aspx?var=<%:Server.UrlEncode(UntrustedVar)%>"> <%: UntrustedTitle %> </a>
Notice that the UrlEncode function is used to encode query string data. It's an IIS 6.0 function that converts spaces to + signs and non-alphanumeric characters to their hex equivalents. The default HtmlEncode-based encoding is used in the context of HTML. To understand why different encoding mechanisms are necessary, consider what malicious input might look like if encoding were not in place. In the HREF case, the output might close off the attribute and append a new attribute that would run script, for example:
" onload=[Malicious script]
Whereas in the second case, an effective attack would be:
<script>[Malicious script]</script>
So the various encoding mechanisms must encode different sets of characters to offer effective protection. (In addition, URLs are percent-encoded so that they may be properly parsed by browsers and Web servers, whereas HTML markup is encoded into HTML entities. Using the wrong encoding would create URLs or markup that can't be properly parsed.)
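The difference between the two contexts can be sketched by running each attack string through the encoder for its context. This is an illustration under assumptions: the ContextEncodingDemo class and its method names are hypothetical, alert(1) stands in for the malicious script, and System.Net.WebUtility (.NET Framework 4.5 or later) is a stand-in for Server.UrlEncode and the <%: %> syntax so that the sketch runs outside a page.

```csharp
using System;
using System.Net;

public static class ContextEncodingDemo
{
    // Encoding for a value placed in a query string inside an href attribute.
    public static string ForQueryString(string value)
    {
        return WebUtility.UrlEncode(value);
    }

    // Encoding for a value placed in HTML element content.
    public static string ForHtml(string value)
    {
        return WebUtility.HtmlEncode(value);
    }

    public static void Main()
    {
        string attributeAttack = "\" onload=alert(1)";
        string markupAttack = "<script>alert(1)</script>";

        // The double quote and the space are encoded, so the value
        // can no longer close the href attribute and start a new one.
        Console.WriteLine(ForQueryString(attributeAttack));

        // Prints: &lt;script&gt;alert(1)&lt;/script&gt;
        Console.WriteLine(ForHtml(markupAttack));
    }
}
```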
It is important to understand that each distinct output context requires a different encoding method. Other notable contexts include XML (attributes and markup), CSS, and JavaScript strings.
There is one issue worthy of note: Your application might hand off data to an external control or API to render on the page. In such a case, what encoding should be applied? To find out, you may need to evaluate the security guarantee provided by the external code. It seems reasonable to assume API input is encoded appropriately for the output context, although any particular API might, in fact, push that responsibility to its caller. The documentation for any good API should specify any required encoding necessary to ensure that output is rendered securely on the page. The <%: %> syntax in ASP.NET 4 and later provides a clever solution to this problem, utilizing a new HtmlString type.
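A rough sketch of that behavior follows. The Render helper is hypothetical and only approximates what <%: %> does: a plain string is HTML-encoded, whereas a value that implements IHtmlString (such as HtmlString) is treated as markup that is already safe and is emitted unchanged. It assumes the .NET Framework 4 or later with a reference to System.Web.dll.

```csharp
using System;
using System.Web; // .NET Framework 4 or later; reference System.Web.dll

public static class HtmlStringDemo
{
    // Hypothetical helper approximating <%: value %>:
    // encode unless the value declares itself to be finished markup.
    public static string Render(object value)
    {
        IHtmlString markup = value as IHtmlString;
        if (markup != null)
        {
            return markup.ToHtmlString(); // already encoded by its producer
        }
        return HttpUtility.HtmlEncode(Convert.ToString(value));
    }

    public static void Main()
    {
        // Plain string: prints &lt;b&gt;title&lt;/b&gt;
        Console.WriteLine(Render("<b>title</b>"));

        // HtmlString: prints <b>title</b>
        Console.WriteLine(Render(new HtmlString("<b>title</b>")));
    }
}
```

An API that returns HtmlString therefore takes on responsibility for encoding, and its caller cannot accidentally encode the result a second time.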
The Microsoft AntiXSS Library
All major Web platforms provide some sort of API for output encoding. Microsoft's implementation for ASP.NET is a library of encoding functions referred to as the Microsoft AntiXSS Library. This library has been included in ASP.NET since version 4.5.
The first thing you'll want to do to leverage the AntiXSS Library on ASP.NET 4.5 is to enable it as the default encoder by adding the encoderType attribute to your Web.config file:
<httpRuntime ... encoderType="System.Web.Security.AntiXss.AntiXssEncoder, System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" />
This entry will cause default output encoding functionality in ASP.NET to use the conservative AntiXSS Library encoding. In addition, you may then begin to utilize APIs in the AntiXssEncoder class:
HtmlEncode (leveraged by the <%: %> syntax), HtmlFormUrlEncode, and HtmlAttributeEncode
XmlAttributeEncode and XmlEncode
UrlEncode and UrlPathEncode
CssEncode and JavaScriptStringEncode
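Several of the methods listed above can also be called directly from code. The following is a minimal sketch, assuming the .NET Framework 4.5 or later with a reference to System.Web.dll. The AntiXssContexts class and the sample input are hypothetical, and JavaScriptStringEncode is called through HttpUtility, as in the article's own sample.

```csharp
using System;
using System.Web;                  // HttpUtility
using System.Web.Security.AntiXss; // AntiXssEncoder (.NET Framework 4.5+, System.Web.dll)

public static class AntiXssContexts
{
    public static void Main()
    {
        // Hypothetical attacker-supplied value.
        string untrusted = "\"><script>alert(1)</script>";

        // HTML element content (second argument: use named entities or not).
        Console.WriteLine(AntiXssEncoder.HtmlEncode(untrusted, false));

        // XML attribute value.
        Console.WriteLine(AntiXssEncoder.XmlAttributeEncode(untrusted));

        // Single query string value.
        Console.WriteLine(AntiXssEncoder.UrlEncode(untrusted));

        // Value inside a CSS declaration.
        Console.WriteLine(AntiXssEncoder.CssEncode(untrusted));

        // Contents of a JavaScript string literal.
        Console.WriteLine(HttpUtility.JavaScriptStringEncode(untrusted));
    }
}
```

The same input produces a different result on every line, which is why the output context has to be identified before an encoder is chosen.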
Each of these APIs encodes data for a different context. As described previously, it is very important to make use of the proper function for a given context. Examine the surrounding markup on the page to determine context appropriately and choose the right function or combination of functions. In cases where it's necessary to encode more than once, be aware that order is important. For example:
<a href="http://contoso.com/app.aspx?var=<%:Server.UrlEncode(UntrustedVar)%>"> <%: UntrustedTitle %> </a>
<script>
var x = "<%=HttpUtility.JavaScriptStringEncode(UntrustedVariable)%>";
. . .
</script>
In this example, only a single query string variable is encoded, using the UrlEncode function. UrlEncode and UrlPathEncode are not appropriate for encoding entire URLs. If you need to encode a full URL, it is necessary to, at minimum, validate its URL scheme to avoid allowing URLs with the JavaScript: or VBScript: URL schemes. To do this, construct a new Uri object and then validate the URL scheme as acceptable.
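The scheme check can be sketched as follows. The UrlSchemeValidator class and the IsSafeHttpUrl name are hypothetical, and the allow-list of http and https is an assumption; an application might accept other schemes. An allow-list is used rather than a block-list, so any scheme that is not explicitly accepted is rejected.

```csharp
using System;

public static class UrlSchemeValidator
{
    // Returns true only for absolute URLs whose scheme is http or https.
    public static bool IsSafeHttpUrl(string candidate)
    {
        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
        {
            return false; // not a well-formed absolute URL
        }

        // Uri.Scheme is normalized to lowercase, so "JavaScript:" becomes "javascript".
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }

    public static void Main()
    {
        Console.WriteLine(IsSafeHttpUrl("http://contoso.com/app.aspx?var=1")); // True
        Console.WriteLine(IsSafeHttpUrl("JavaScript:alert(1)"));               // False
        Console.WriteLine(IsSafeHttpUrl("VBScript:MsgBox(1)"));                // False
    }
}
```

A URL that passes this check still needs to be encoded for the context in which it is output, such as an HTML attribute.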
That's really all there is to it. While there are other XSS-related security techniques to evaluate when building your Web technology (such as sandboxing, HTML sanitization, and the like), you will find that proper encoding is what's necessary to prevent the most prevalent XSS bugs.
Conclusion
While XSS remains a pervasive Web threat, a good understanding of Web encoding techniques and their supporting APIs enables you to secure your Web applications. While all modern application platforms provide the necessary APIs to enable output encoding, it's up to individual developers to effectively apply the proper functions in the appropriate context.
The AntiXSS Library is available for download at no cost. Special thanks to Levi Broderick and Barry Dorrans for contributing to this article.
David Ross is a Principal Security Software Engineer on the MSRC Engineering team at Microsoft. Prior to joining MSRC Engineering in 2002, Ross spent his formative years on the Internet Explorer Security Team.