Strip and Parse Proposal
Last Update: 7th April, 2005
Article ID: 6



Introduction

Cross Site Scripting (XSS) vulnerabilities can occur when data collected from the user contains malicious content. Any user input that a site asks for can make the site vulnerable if the site later displays that data back to the user in some way.

This includes indirect user input, such as parameters placed in the URL; for example, a site that displays a requested article by reading the article ID from the URL.

If the site does not parse direct or indirect user input, it is open to abuse by attackers, who may hijack user accounts, retrieve or change user account information, or set or steal stored cookie information.

These attacks are carried out by injecting scripting code, for example HTML or JavaScript, into the data the site processes.

The Strip and Parse Proposal is designed to prevent these sorts of vulnerabilities.

This is done by "stripping" unwanted characters from the user input, and by "parsing" the data when it is displayed or processed, so that it is shown or handled exactly as it was entered.

The reason for examining user data twice, once at input and once at output, is to protect against data coming from, or being shown to, external sources.

Stripping

The stripping part of this proposal involves passing the user input through the tep_sanitize_string() function.

Two actions are performed on the user input data.

Replacing Multiple Spaces

Every run of multiple spaces in the string is replaced with a single space, for example:


Favourite Quote: I do not    want      to fill out this       input field appropriately



becomes:

I do not want to fill out this input field appropriately
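As a minimal sketch of this step, assuming a regular expression replacement, the hypothetical collapse_spaces() helper below (not the osCommerce source of tep_sanitize_string()) replaces every run of two or more space characters with a single space:

```php
<?php
// Hypothetical helper (not the osCommerce source): replaces every run of
// two or more space characters with one space.
function collapse_spaces($string) {
    return preg_replace('/ {2,}/', ' ', $string);
}

echo collapse_spaces('I do not    want      to fill out this       input field appropriately'), "\n";
// prints: I do not want to fill out this input field appropriately
```

Only the space character is matched, following the description above; tabs and newlines are left untouched by this sketch.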


Replacing < and > Characters

All < and > characters in the string are replaced with an underscore "_" character. This prevents HTML and JavaScript from being injected through user input. For example, the following string:

Nickname: this_is_<b>my_nickname</b>


becomes:

this_is__b_my_nickname_/b_
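The replacement can be sketched with str_replace(); replace_angle_brackets() is a hypothetical name used for illustration and is not the osCommerce function:

```php
<?php
// Hypothetical helper (not the osCommerce source): swaps each < and >
// for an underscore so the string can no longer form an HTML tag.
function replace_angle_brackets($string) {
    return str_replace(array('<', '>'), '_', $string);
}

echo replace_angle_brackets('this_is_<b>my_nickname</b>'), "\n";
// prints: this_is__b_my_nickname_/b_
```

Because each character is swapped one for one rather than removed, no new tag can be formed from the characters that surround it.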


The native PHP function strip_tags() is not used for this step, for the reason shown in the following example:

Nickname: this_is_<b<b>>my_nickname</</b>b>


strip_tags() will return the following string:

this_is_<b>my_nickname</b>


because "<b" and ">", and likewise "</" and "b>", only become valid HTML tags after strip_tags() has removed the inner "<b>" and "</b>" tags and joined the remaining characters together.
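The joining effect can be reproduced with a hypothetical single-pass tag remover built on preg_replace(). This is only a model of the problem, not the actual implementation of strip_tags(), and the expected outputs below apply to this sketch:

```php
<?php
// Hypothetical single-pass tag remover, used only to illustrate the
// problem; it is NOT how PHP's strip_tags() is implemented.
function strip_tags_once($string) {
    // Remove every "<...>" sequence that contains no further < or >.
    return preg_replace('/<[^<>]*>/', '', $string);
}

$input = 'this_is_<b<b>>my_nickname</</b>b>';

// Removing the inner tags joins the leftovers into new, valid tags.
echo strip_tags_once($input), "\n";
// prints: this_is_<b>my_nickname</b>

// Replacing the characters instead leaves nothing to reassemble.
echo str_replace(array('<', '>'), '_', $input), "\n";
// prints: this_is__b_b__my_nickname_/_/b_b_
```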

Parsing

The parsing part of this proposal involves passing the data through either the less restrictive output function tep_output_string() or the more restrictive output function tep_output_string_protected().

All data that the user submitted should be passed through the more restrictive tep_output_string_protected() function, which in turn passes it through the native PHP function htmlspecialchars(), so that the data is displayed exactly as the user submitted it (or as it was stored on the server after stripping).

Following the nickname example above, tep_output_string_protected() will show the value as:

this_is__b_my_nickname_/b_


which does not contain any HTML or JavaScript code.

However, if this data were modified by an external source that did not strip the user input appropriately (for example, a CMS that shares the same user account database table as the osCommerce-based online store), tep_output_string_protected() would display the data as:

this_is_<b>my_externally_modified_nickname</b>


without allowing the HTML or JavaScript code to be interpreted by the browser. Without this protection, the browser would render the markup and display it as:

this_is_my_externally_modified_nickname
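Since tep_output_string_protected() is described as passing the data through htmlspecialchars(), the effect can be sketched by calling htmlspecialchars() directly on the externally modified nickname (the value is the hypothetical one from the example above):

```php
<?php
// A nickname written to the shared table by an external source that did
// not strip < and > (hypothetical value from the example above).
$nickname = 'this_is_<b>my_externally_modified_nickname</b>';

// htmlspecialchars() converts special characters such as &, <, > and "
// into HTML entities, so the browser shows the tags as plain text.
echo htmlspecialchars($nickname), "\n";
// prints: this_is_&lt;b&gt;my_externally_modified_nickname&lt;/b&gt;
```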


The less restrictive tep_output_string() function is used when presenting a form to the user for data to be submitted.

If the user submits the form and the server-side validation routines fail on one or more input fields, the data the user submitted should be shown in the input field exactly as they typed it, without breaking the HTML used to present the field.

For example, when asking the user for their nickname, the following HTML code will be used:

Nickname: <input type="text" name="nickname">


If the user submits the form and the server-side validation routines fail on one or more fields, the following HTML code is used for the nickname field:

Nickname: <input type="text" name="nickname" value="this_is_<b>my_nickname</b>">


The value of the nickname in that example is harmless. However, if the nickname entered was:

this_is_"><b>my_nickname</b>


The following HTML code would be used:

Nickname: <input type="text" name="nickname" value="this_is_"><b>my_nickname</b>">


which allows the user to close the <input> tag prematurely and to inject HTML or JavaScript code into the HTML page.
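The broken tag can be reproduced with a hypothetical helper, draw_input_unescaped(), that concatenates the submitted value into the value attribute without any conversion:

```php
<?php
// Hypothetical helper that inserts the submitted value into the value
// attribute without escaping, to show how the tag gets broken.
function draw_input_unescaped($name, $value) {
    return '<input type="text" name="' . $name . '" value="' . $value . '">';
}

echo 'Nickname: ' . draw_input_unescaped('nickname', 'this_is_"><b>my_nickname</b>'), "\n";
// prints: Nickname: <input type="text" name="nickname" value="this_is_"><b>my_nickname</b>">
```

The " in the submitted value ends the attribute, and the following > ends the tag, so the remaining text is treated as page markup.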

The tep_output_string() function can convert a specific set of characters passed to it as a function parameter, much like htmlspecialchars(), or by default converts only the double quote character " into the &quot; entity.

The tep_output_string() function would then display the above example as:

Nickname: <input type="text" name="nickname" value="this_is_&quot;><b>my_nickname</b>">


which closes the <input> tag at the appropriate time and in a safe manner.
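A minimal sketch of the behaviour described for tep_output_string(), assuming strtr() with a character-to-replacement map; output_string_sketch() is a hypothetical name and its signature is not the osCommerce one:

```php
<?php
// Hypothetical sketch: by default convert only the double quote, or
// convert the characters given in $translate (character => replacement).
function output_string_sketch($string, $translate = null) {
    if ($translate === null) {
        $translate = array('"' => '&quot;');
    }
    return strtr($string, $translate);
}

$value = 'this_is_"><b>my_nickname</b>';

echo '<input type="text" name="nickname" value="' . output_string_sketch($value) . '">', "\n";
// prints: <input type="text" name="nickname" value="this_is_&quot;><b>my_nickname</b>">
```

Passing a custom map, for example output_string_sketch('a<b', array('<' => '&lt;')), converts only the characters in that map.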

Using the single quote character ' to delimit attribute values would make tep_output_string() useless, because by default it converts only the double quote character. To keep HTML form field tags consistent, use the tep_draw_*_field() functions defined in includes/functions/html_output.php.
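The following sketch, again using a hypothetical helper that converts only the double quote, shows why single-quoted attribute values defeat this approach:

```php
<?php
// Hypothetical helper: converts only the double quote, like the default
// behaviour described for tep_output_string().
function output_string_sketch($string) {
    return strtr($string, array('"' => '&quot;'));
}

$value = "x' onclick='alert(1)";

// Double-quoted attribute: the single quotes stay inside the value.
echo '<input type="text" name="nickname" value="' . output_string_sketch($value) . '">', "\n";
// prints: <input type="text" name="nickname" value="x' onclick='alert(1)">

// Single-quoted attribute: the first ' ends the value and adds an onclick attribute.
echo "<input type='text' name='nickname' value='" . output_string_sketch($value) . "'>", "\n";
// prints: <input type='text' name='nickname' value='x' onclick='alert(1)'>
```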

The PHP code to address the above example would be:

Nickname: <?php echo tep_draw_input_field('nickname'); ?>


All tep_draw_*_field() functions apply tep_output_string() to user input appropriately, and automatically fill in the input field with the data the user submitted.
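A hypothetical stand-in for a tep_draw_*_field() style helper is sketched below; it does not reproduce the real osCommerce signatures. It always uses double-quoted attributes and refills the field from the submitted data, which in a real page would come from $_POST (or $HTTP_POST_VARS in older PHP versions):

```php
<?php
// Hypothetical helper (not the osCommerce API): draws a text field with
// double-quoted attributes and refills it from the submitted data,
// converting the double quote so the value cannot close the attribute.
function draw_input_field_sketch($name, array $submitted) {
    $field = '<input type="text" name="' . strtr($name, array('"' => '&quot;')) . '"';
    if (isset($submitted[$name]) && is_string($submitted[$name])) {
        $field .= ' value="' . strtr($submitted[$name], array('"' => '&quot;')) . '"';
    }
    return $field . '>';
}

// Simulated form submission after server-side validation failed.
$posted = array('nickname' => 'this_is_"><b>my_nickname</b>');

echo 'Nickname: ' . draw_input_field_sketch('nickname', $posted), "\n";
// prints: Nickname: <input type="text" name="nickname" value="this_is_&quot;><b>my_nickname</b>">
```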

Example Usage


<?php
  $data_stored_successfully = false;

  // Handle the form submission
  if (isset($HTTP_GET_VARS['action']) && $HTTP_GET_VARS['action'] == 'process') {
    // Strip the input (tep_db_prepare_input() calls tep_sanitize_string())
    $nickname = tep_db_prepare_input($HTTP_POST_VARS['nickname']);

    // Escape the value so it cannot break the SQL query
    tep_db_query("insert into users (nickname) values ('" . tep_db_input($nickname) . "')");

    $data_stored_successfully = true;
  }

  $name = '';
  if ($data_stored_successfully == true) {
    // Parse the stored value for safe display in the page
    $name = tep_output_string_protected($nickname); // $nickname is defined above
  }

  echo 'Hello ' . $name . '! Please fill in your nickname.<br><br>';

  echo tep_draw_form('nickname', tep_href_link($PHP_SELF, 'action=process'));

  // tep_draw_input_field() refills the field using tep_output_string()
  echo 'Nickname: ' . tep_draw_input_field('nickname');

  echo '<br><br>';

  echo tep_image_submit('save.gif');

  echo '</form>';
?>



The example above uses tep_db_prepare_input(), which automatically passes the string through tep_sanitize_string(). The tep_db_prepare_input() and tep_db_input() functions together make sure the data used in an SQL query does not break the query.


Resources

The Cross Site Scripting FAQ
http://www.cgisecurity.com/articles/xss-faq.txt

Cross Site Scripting Info
http://httpd.apache.org/info/css-security/

Copyright © 2000-2005 osCommerce. All rights reserved.