Clean Up Those URLs with Apache


WebReview.com: December 1, 2000: Developers: Clean Up Those URLs with Apache

Why Rewrite?

Struggling to keep your URLs orderly or manage redirects with ease? Struggle no more. Apache's mod_rewrite solves the problem.

Further Reading (offsite):

Selected articles on mod_rewrite

Apache URL Rewriting Guide

Another Rewriting Guide



I often hear my grandfather reminisce about a time when the Web was a simpler place. He says to me, "Son, I remember a time when a Web site could be built by creating a few pages and some links. I'm talking about normal links, consisting of a few words and slashes, not the blasted question marks, ampersands, and equal signs that you see today. I used to have to walk six miles in the snow to get to the local Internet cafe."

Although he sometimes exaggerates, I feel his pain. As Web sites grow in size and complexity, it has become necessary to use more than just HTML to efficiently maintain the mammoth amounts of information provided to users. Site administrators are faced with consistently maintaining and moving this information across directories and even servers. Many of these administrators make use of one of Apache's most powerful and confusing modules: mod_rewrite. This module makes it possible to rewrite URLs on the fly, and to redirect users. It also allows you to use regular expressions to parse URLs, resulting in a very powerful site administration tool. In this short tutorial, I'll introduce just three of mod_rewrite's functions: URL Redirection, URL Parsing, and Browser Dependent Redirection. But first, a short introduction to Apache's configuration directives is in order.

The Directives

Much of the Apache server's behavior depends on the configuration settings specified within the httpd.conf file. At the time of writing, this file contained 800+ lines of code and comments. For the typical user, most of these configuration settings, otherwise known as directives, have no bearing on mod_rewrite. However, several of these directives are of particular importance:

<Directory /path/to/some/directory>

The <Directory></Directory> tag set encloses other directives that are to be applied specifically to the directory—and ensuing subdirectories—specified by '/path/to/some/directory'. For example, suppose that I wanted to use login.html as the index page for a particular directory. I could specify this within a <Directory> tag set:

<Directory "/home/httpd/web/restricted">
DirectoryIndex login.html
</Directory>

Note that, unless explicitly overridden, all subdirectories of the named directory will be affected as well. Referring to the above example, unless otherwise stated, the directory '/home/httpd/web/restricted/anotherdirectory' would also look for a file named 'login.html', as would all other subdirectories of the 'restricted' directory.
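
To make the "unless explicitly overridden" part concrete, here is a minimal sketch. The 'public' subdirectory is hypothetical and only there for illustration; the idea is that a block for a more specific path takes precedence over the setting inherited from its parent:

```apache
# Applies to 'restricted' and every directory below it.
<Directory "/home/httpd/web/restricted">
    DirectoryIndex login.html
</Directory>

# Hypothetical subdirectory: this more specific block overrides
# the inherited DirectoryIndex for this directory and its children.
<Directory "/home/httpd/web/restricted/public">
    DirectoryIndex index.html
</Directory>
```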

For more information about the <Directory> directive, check out the Apache documentation on this subject.

RewriteEngine (on | off)

This directive turns the rewriting capabilities on and off. Even if the mod_rewrite module is enabled, you must include this directive in order to use the rewriting features.

RewriteCond conditional_information

This directive acts much like an if statement in conventional programming, defining a condition that must be met in order for a rewriting command to be carried out. You can string several of these directives together, forcing a number of different conditions to be examined.

RewriteRule rewrite_information

This directive actually carries out the URL rewriting. If the RewriteRule is preceded by one or more RewriteCond directives, then it'll only be executed if all of the RewriteCond conditionals are met, and the URL being examined matches the pattern specified by rewrite_information.
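
Here is a rough sketch of how the three directives fit together. The target page /index-text.html and the choice of conditions are my own assumptions for illustration, and the pattern is written for use in httpd.conf itself (note the leading slash):

```apache
RewriteEngine on

# Both conditions must hold, because consecutive RewriteCond lines
# are ANDed by default. Adding the [OR] flag to a condition changes
# that to "this one OR the next one".
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{HTTP_USER_AGENT} ^Lynx

# Runs only if the URL matches the pattern AND the conditions above pass.
# [L] makes this the last rule applied to the request.
RewriteRule ^/index\.html$ /index-text.html [L]
```

The conditions apply only to the RewriteRule that immediately follows them; a second rule further down would need its own RewriteCond lines.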

URL Redirection

As your site grows, structural changes become unavoidable. One common change is to dedicate servers to certain intensive operations, such as database searching. Chances are that many of your regular users have bookmarked the search interface. Using mod_rewrite, you could either keep serving the content behind the old URL, or redirect the browser so that the new URL shows in the address bar, hinting that the user should update his bookmark. That's the power of mod_rewrite: flexibility.

Let's consider an example. Assume that you have moved the search page to a new host. Using mod_rewrite, you can send anyone who requests the old page on to the new site:

RewriteEngine on
RewriteRule ^search/search\.html$ http://search.yoursite.com/

Any user going to http://www.yoursite.com/search/search.html will automatically be redirected to the new URL, http://search.yoursite.com/. Because the substitution is an absolute URL on a different host, mod_rewrite performs an external redirect, so the user's address bar changes to the new URL. Keeping the old address in place while another host serves the content would mean proxying the request instead, using the [P] flag and mod_proxy. (As written, the pattern has no leading slash, which fits a per-directory context such as a .htaccess file in the document root; in httpd.conf itself the pattern would begin with ^/.)

It is still a good idea to make the redirect explicit. This is easily accomplished by adding the flag [R] (which stands for 'redirect') to the above RewriteRule. The flag states the intent plainly and can also carry a status code:

RewriteEngine on
RewriteRule ^search/search\.html$ http://search.yoursite.com/ [R]
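
If the whole search area has moved rather than a single page, the same idea can be stretched a little. The following is only a sketch, assuming the same per-directory context as the rule above and assuming that paths under /search/ map one-to-one onto the new host:

```apache
RewriteEngine on

# (.*) captures whatever follows "search/" and $1 replays it on the new host.
# R=301 answers with "Moved Permanently" instead of the default 302.
# L stops rule processing once this rule has matched.
RewriteRule ^search/(.*)$ http://search.yoursite.com/$1 [R=301,L]
```

A permanent status is the stronger hint to browsers and search engines that the old address should be forgotten, so it is worth using only once you are sure the move is final.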

While this feature of mod_rewrite is certainly useful, this is only a small sample of what it can do. In the next section, I'll use more advanced regular expressions to parse and reformat a URL.

URL Parsing

Another problem that commonly arises as sites grow considerably larger is the need to change the way certain information is retrieved. Imagine for example, that your organization sold books online, and that after years of statically building pages, one for each book, you decide to step into the modern era and use a database and scripting language to dynamically serve these pages. The problem is, links to your books reside all over the Internet, and these links undoubtedly use the old page formatting. How do you fix this problem? Well, you could attempt to track down every Internet link to your catalog, and request that they change it, or you could employ mod_rewrite to reformat the request for you. Take your pick, but I'll choose the latter option, thank you.

Suppose this was the original format of each product page:

http://www.yoursite.com/bookstore/X-XXX-XXXXX-X.html

Where X-XXX-XXXXX-X would be replaced by the ISBN of each book.

The new format looks like this:

http://www.yoursite.com/bookstore/singleview.php?ISBNXXXXXXXXXX

So how do you use mod_rewrite to fix this problem? One simple rule is all it takes:

RewriteRule ^/bookstore/([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)\.html$ /bookstore/singleview.php?ISBN$1$2$3$4

This example shows how powerful mod_rewrite can be, parsing the URL and reformatting it on the fly. You gotta love that.
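
To see what the captured groups are doing, it can help to trace one request by hand. The sketch below is a variant rather than a drop-in replacement: it assumes singleview.php were changed to read a named parameter called isbn (my invention), and it assumes the rule lives in httpd.conf, hence the leading slash.

```apache
RewriteEngine on

# Example request:   /bookstore/0-123-45678-9.html
# Captured groups:   $1=0  $2=123  $3=45678  $4=9
# Rewritten target:  /bookstore/singleview.php?isbn=0123456789
# The last group also accepts X, which ISBN-10 allows as a check digit.
RewriteRule ^/bookstore/([0-9]+)-([0-9]+)-([0-9]+)-([0-9Xx])\.html$ /bookstore/singleview.php?isbn=$1$2$3$4 [L]
```

Since there is no [R] flag and the target is a local path, the rewrite happens inside the server: the visitor's old link keeps working and the address bar still shows the .html URL.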

Browser Dependent Redirection

Another cool feature of mod_rewrite is the capability of automatically determining to which page a user should be forwarded depending upon his browser type. For example, assume that you wanted to redirect the user to a browser-specific page, one for Internet Explorer, one for Opera, and the default "index.html" page for all other browsers. Using mod_rewrite and the server variable HTTP_USER_AGENT, this can be done in a snap:

RewriteCond %{HTTP_USER_AGENT} Opera
RewriteRule ^index.html http://www.yoursite.com/homepage-opera.html [R]


RewriteCond %{HTTP_USER_AGENT} MSIE 
RewriteRule ^index.html http://www.yoursite.com/homepage-IE.html [R]

So what happens? The HTTP_USER_AGENT is first checked for the string 'Opera'. If it exists, then the user must be using the Opera browser, and he's redirected accordingly. The same goes for Internet Explorer (denoted by the string 'MSIE'). If neither RewriteCond is met, then the browser must be Netscape or some other type, and the default 'index.html' page is used.
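
The order of the two blocks matters more than it might seem: some Opera releases identify themselves with a User-Agent string that also contains 'MSIE', so the Opera test has to come first. A slightly more defensive sketch of the same rules, assuming the same page names as above:

```apache
RewriteEngine on

# Checked first, because an Opera User-Agent may also contain "MSIE".
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} Opera [NC]
RewriteRule ^index\.html$ http://www.yoursite.com/homepage-opera.html [R,L]

RewriteCond %{HTTP_USER_AGENT} MSIE [NC]
RewriteRule ^index\.html$ http://www.yoursite.com/homepage-IE.html [R,L]

# No condition matched: the request falls through and index.html is served.
```

Escaping the dot and anchoring the pattern with $ keeps the rules from firing on look-alike names such as index.htmlx, and [L] ends rule processing as soon as a redirect has been chosen.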

The HTTP_USER_AGENT is just one of the many server variables that can be referenced through mod_rewrite. Check out the RewriteCond directive documentation for a complete listing.
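
As one example of another variable, HTTP_HOST holds the host name the browser asked for. The sketch below assumes a server that answers to several names and should always end up on www.yoursite.com; it is written for httpd.conf, so the pattern starts with a slash:

```apache
RewriteEngine on

# Act only when the requested host is something other than the www name...
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
# ...and the client actually sent a Host header.
RewriteCond %{HTTP_HOST} !^$

# Redirect to the same path on the canonical host name.
RewriteRule ^/(.*)$ http://www.yoursite.com/$1 [R=301,L]
```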

In Closing

The complexity of the mod_rewrite module's vast capabilities has certainly brought many developers to tears. But it's exactly these capabilities that make it a technology well worth learning. Take some time to work with the examples I've provided, in addition to the many examples provided in the documentation (see sidebar for links), and see what you can come up with.


Jason has been an active Internet developer since 1995, and is a regular contributor to various online technical publications. He is the author of "A Programmer's Guide to PHP 4.0" published by Apress.

