Guide to Applying 301 Redirects in Apache
Wednesday, November 30, 2005
seomoz.org was hosted under www.socengine.com/seo/ rather than as its own domain. We were moving seomoz.org to its own dedicated server and wanted it to be reached as its own domain rather than as a subdirectory of socengine.com. Visitors requesting anything under www.socengine.com/seo/ had to be redirected to www.seomoz.org. The redirection had to accommodate several file and folder name changes, and it had to use 301 (permanent) redirects in order to be search engine friendly and to work across all web browsers. We also needed to forward http://seomoz.org to http://www.seomoz.org, both for aesthetic purposes and to avoid 301 sabotage.
Solution:
The simplest approach would have been to add 301 redirects to the PHP code that powers SEOmoz.org using PHP’s header function. With the Apache modules mod_rewrite and mod_alias (the latter provides the RedirectMatch directive used below), however, we could match patterns covering entire folders and redirect them to their new URLs without having to go through every PHP script. Also, several of our pages were static HTML, and it was not practical to use JavaScript or META tags for redirection.
Installation:
If your web server does not have mod_rewrite installed, I suggest reading over the Apache documentation for installing modules. It will usually require you to recompile Apache with the option --enable-module=rewrite or --enable-module=most (on Apache 2.x the equivalent option is --enable-rewrite).
If your hosting service does not support mod_rewrite, I would urge your systems administrator to have it installed. Most Apache installations will have mod_rewrite installed by default. Our server is running FreeBSD and mod_rewrite was included by default when installing from the ports collection. Once it is installed, you can verify it is working by adding this line to your Apache configuration file or your .htaccess file; Apache reports an invalid command if the module is missing:
RewriteEngine On
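On a build that loads modules dynamically, the module has to be loaded before the directive is accepted. The lines below are a minimal sketch for httpd.conf, assuming Apache 2.x; the module path is an assumption and differs between platforms and package layouts.

```apache
# Load mod_rewrite as a shared module.
# NOTE: "modules/mod_rewrite.so" is an assumed path; check where your
# installation keeps its modules (it may be libexec/... instead).
LoadModule rewrite_module modules/mod_rewrite.so

# Switch the rewrite engine on. If the module is not available, Apache
# rejects this line as an invalid command instead of starting normally.
RewriteEngine On
```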
Context
The mod_rewrite module operates in per-server context or in per-directory context.
The per-server context requires you to edit the Apache configuration file, httpd.conf, while the per-directory context uses .htaccess files placed in each folder you want to configure. If you do not have access to httpd.conf, you will have to use .htaccess files.
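The two contexts might look like this in practice. This is only a sketch: the directory path is a placeholder, and whether .htaccess files are honoured at all depends on how the administrator has set AllowOverride.

```apache
# httpd.conf: per-server context (main server or inside a <VirtualHost>).
RewriteEngine On

# httpd.conf: allow .htaccess files below this folder to use redirect and
# rewrite directives. "/path/to/docroot" is a placeholder, not a real path.
<Directory "/path/to/docroot">
    AllowOverride FileInfo
    # mod_rewrite in per-directory context needs symlink following enabled.
    Options +FollowSymLinks
</Directory>

# /path/to/docroot/.htaccess: per-directory context. The same directive
# goes into that file once overrides are allowed:
#   RewriteEngine On
```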
Regular Expressions (aka Regexes)
From wikipedia.org:
A regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns.
We will be using regular expressions to match patterns in the client URL and redirect them accordingly. Regular expressions are an invaluable skill for both programmers and systems administrators. To redirect URLs according to the examples in this document, you will only have to understand the basics of using regexes. This is a list of the characters and operators you will use in the regexes described in this document:
- . Period - matches any single character.
- * Asterisk - matches zero or more of the preceding character
- + Plus sign - matches one or more of the preceding character
- ( ) Parentheses - enclosing a pattern in parentheses stores what it matched so it can be used later as $1, $2 and so on. The stored value is referred to as a back-reference.
- (value1|value2) - Enclosing two or more values in parentheses and separating them with a pipe character is the equivalent of saying: “match value1 OR value2.”
Redirecting specific files and folders from one domain to another.
We needed redirection from the old server to the new one with the filenames preserved.
Example
Redirect: http://www.socengine.com/seo/somefile.php
To: http://www.seomoz.org/somefile.php
Solution
Add the following directive:
RedirectMatch 301 /seo/(.*) http://www.seomoz.org/$1
Explanation:
The regular expression /seo/(.*) tells Apache to match the seo folder followed by zero or more of any character. Surrounding the .* in parentheses tells Apache to save the matched string as a back-reference. That back-reference, $1, is placed at the end of the URL we are redirecting to, so the file name requested on the old server is carried over to the new one.
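Because the pattern is not anchored, it also matches when /seo/ appears deeper in the path. A stricter variant, offered here as a sketch rather than as the rule we deployed, anchors the pattern with ^ and $ so that only paths beginning with /seo/ are redirected. The sample paths in the comments are made up for illustration.

```apache
# ^ ties the match to the start of the URL path, $ to its end.
RedirectMatch 301 ^/seo/(.*)$ http://www.seomoz.org/$1

# /seo/somefile.php      -> http://www.seomoz.org/somefile.php
# /seo/guide/intro.html  -> http://www.seomoz.org/guide/intro.html
# /other/seo/page.php    -> not redirected by this rule
```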
Redirecting without preserving the filename
Several files that existed on the old server were no longer present on the new server. Instead of preserving the file names in the redirection (which would result in a 404 Not Found error on the new server), the old files just needed to be redirected to the root URL of the new domain.
Redirect: http://www.socengine.com/seo/someoldfile.php
To: http://www.seomoz.org/
Solution:
Add the following directive:
RedirectMatch 301 /seo/someoldfile.php http://www.seomoz.org/
Explanation:
Because the pattern contains no parentheses, nothing is captured and no back-reference is used: every request for /seo/someoldfile.php is redirected to the root URL, http://www.seomoz.org/.
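When this rule sits alongside the catch-all /seo/(.*) rule from the earlier example, the order matters: redirects defined in the same context are tried in the order they appear, and the first match wins. A sketch of one workable ordering, using anchored and escaped patterns as an illustration:

```apache
# Specific rule first: the retired page goes to the new home page.
# "\." matches a literal period rather than "any character".
RedirectMatch 301 ^/seo/someoldfile\.php$ http://www.seomoz.org/

# Catch-all second: everything else under /seo/ keeps its file name.
RedirectMatch 301 ^/seo/(.*)$ http://www.seomoz.org/$1
```

If the two lines were swapped, the catch-all would match first and send visitors to http://www.seomoz.org/someoldfile.php, which no longer exists.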
Redirecting the GET string
Some of our PHP scripts had different names but the GET string stayed the same. We needed to redirect the visitors to the new PHP scripts while preserving these GET strings. The GET string (more formally, the query string) is the set of characters that comes after the filename in the URL and is used to pass data to a web page. In the URL http://www.seomoz.org/myfile.php?this=that&foo=bar the GET string is “?this=that&foo=bar”.
Redirect: http://www.socengine.com/seo/categorydetail.php?CAT_ID=12345
To: http://www.seomoz.org/artcat.php?CAT_ID=12345
Solution:
Add the following directive:
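RedirectMatch 301 /seo/categorydetail.php http://www.seomoz.org/artcat.php
Explanation:
RedirectMatch compares its regular expression with the path portion of the URL only. The GET string is never part of the text being matched, so a trailing (.*) would have nothing to capture and no back-reference is needed. Apache appends the original GET string to the new URL on its own when it issues the redirect, so a request for /seo/categorydetail.php?CAT_ID=12345 is sent to http://www.seomoz.org/artcat.php?CAT_ID=12345.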
Redirecting while changing file extensions
We had a folder of files on the old server that were mixed HTML and PHP. On the new server these were all PHP and we needed the old HTML files to change to this new extension.
Redirect: http://www.socengine.com/seo/guide/anyfile.html
To: http://www.seomoz.org/articles/anyfile.php
Redirect: http://www.socengine.com/seo/guide/anyfile2.php
To: http://www.seomoz.org/articles/anyfile2.php
Solution:
Add the following directive:
RedirectMatch 301 /seo/guide/(.*)\.(php|html) http://www.seomoz.org/articles/$1.php
Explanation:
(.*) matches zero or more of any character and saves it as the back-reference $1. \.(php|html) tells Apache to match a period followed by either “php” or “html” and saves the extension as the back-reference $2 (although we won’t be using it in this example). Notice we had to escape the period with a backslash. This is to ensure Apache does not interpret the period as meaning “any character” but rather as an actual period. Enclosing “php” and “html” in parentheses and separating them with a pipe “|” character means to match either one of the values. So if it were to say (php|html|css|js|jpg|gif) the regex would match any of the files with the extensions php, html, css, js, jpg, or gif.
Also, if for some reason we needed to preserve the extension we matched, it would be available as the back-reference $2. Back-references are numbered from left to right, one for each set of parentheses in the regular expression.
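To make the role of $2 concrete, here is a hypothetical variant that we did not need ourselves: it moves the files to the new folder but keeps whichever extension was requested.

```apache
# $1 = file name without extension, $2 = the extension that matched.
RedirectMatch 301 ^/seo/guide/(.*)\.(php|html)$ http://www.seomoz.org/articles/$1.$2

# /seo/guide/anyfile.html -> http://www.seomoz.org/articles/anyfile.html
# /seo/guide/anyfile2.php -> http://www.seomoz.org/articles/anyfile2.php
```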
Redirecting canonical hostnames
We needed to redirect any requests that do not start with www.seomoz.org to make sure they include the www. We did this not only because it looks better, but to avoid the now common 301 sabotage.
Redirect: http://seomoz.org/
To: http://www.seomoz.org/
Redirect: http://mail.seomoz.org/
To: http://www.seomoz.org
Redirect: http://seomoz.org/somefile.php
To: http://www.seomoz.org/somefile.php
Solution:
Add the following directive:
RewriteCond %{HTTP_HOST} !^www\.seomoz\.org
RewriteRule ^/(.*) http://www.seomoz.org/$1 [R=301,L]
Explanation:
These directives tell Apache to examine the host the visitor is accessing (in this case: seomoz.org) and, if it does not begin with www.seomoz.org, to redirect the visitor to www.seomoz.org. The exclamation point (!) in front of the pattern negates the comparison, saying “if the host IS NOT www.seomoz.org, then perform RewriteRule.” In our case RewriteRule redirects them to www.seomoz.org while preserving the exact file they were accessing in the back-reference $1. The flags in square brackets make the redirect a 301 (R=301) and stop any further rewrite rules from being processed (L).
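The rule above is written for the per-server context, where the pattern sees the path with its leading slash. In a .htaccess file the leading slash is stripped before matching, so the pattern has to change. The following is a sketch of the per-directory form, assuming the file lives in the document root; the [NC] flag is an optional addition that makes the host comparison case-insensitive.

```apache
# .htaccess in the document root (per-directory context).
RewriteEngine On

# Redirect unless the Host header already starts with www.seomoz.org.
RewriteCond %{HTTP_HOST} !^www\.seomoz\.org [NC]

# No leading slash in the pattern: it is removed in per-directory context.
RewriteRule ^(.*)$ http://www.seomoz.org/$1 [R=301,L]
```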
Conclusion:
By harnessing the power of mod_rewrite and a little regular expression magic we can develop a set of simple rules for redirecting web pages. By using 301 redirects we are maintaining browser compatibility and are staying search engine friendly. If you are interested in learning more, I recommend reading a few of the many regular expression tutorials found on the internet. O’Reilly also has a great book, “Mastering Regular Expressions,” that I highly recommend. I read the book in its entirety many years ago and what I learned from it has proved to be invaluable. I would also read up on mod_rewrite in the URL rewriting guide and the mod_rewrite reference documentation found at the Apache Software Foundation website.