the Ws from URLs
Guest Article by
In September 1999, John Rhodes published "Are You Creating a Path of Resistance?" on WebWord. In the article, John discussed the problem with the Ws - the www prefix before domain names. John rightly recommended that all web sites should be set up to work the same with or without the Ws.
In this article, however, I will go one step further and recommend that sites should be set up to work with or without the Ws, but
also have the Ws automatically removed from the URL using a server-side technique I will henceforth refer to as "removing the
Ws". I will detail why I feel this is a more appropriate solution and explain how this can be done.
Why are the Ws there?
Originally, the purpose of the www prefix on a domain
(e.g., www.webword.com) was to distinguish the World Wide Web part of the domain from other aspects - FTP, Gopher, etc. However, in recent years, the web has dominated nearly all Internet traffic, and a good portion of the population is not even aware of the non-web aspects of the Internet.
So, while they are important from the standpoint of a cultural understanding and reference, the three letters are no more significant than any other prefix. In this day and age, they are extraneous and unnecessary.
Why should we keep them?
Many people have an ingrained habit of using "www." when typing a domain name, and it has become almost second nature for many to prepend the four extra characters onto anything web-related. (Many jokes' punch lines now seem to end with the all-too-familiar "double-you double-you double-you dot," setting you up for a fake web site address before you even hear it.)
I group web sites into three categories with respect to the Ws:
Bad: The site works only with the Ws or only without the Ws. (Incidentally, it's usually the former.)
Better: The site works with or without the Ws.
Best: The site works with or without the Ws, but automatically removes the Ws if they are in the URL.
Fewer and fewer of the sites I find are "Bad," though there are many still out there, and I'm continually surprised at how many I come across. At the very least, all sites should work with or without the Ws, end of story. John's original article made a compelling enough argument for this that I'm not going to try to improve upon it.
Most sites fall into the
"Better" category. This is not horrible, and in some cases this may be good enough.
Very rarely is a site an example of what I call
"Best." I attribute this to several factors:
The people setting up the servers, configurations, and domain technicalities are rarely the people with an understanding of, and an interest in, improving the user experience.
Most people do not know that it is possible to remove the Ws and may never think about removing the Ws, much less consider the benefits of doing so.
Most web sites are hosted on shared hosting services, where individual webmasters aren't given access to the carefully protected files and settings that control this setup. So, even if someone knew where the file/setting was and knew why it should be changed, he/she probably would not be able to gain access to it.
There is virtually no documentation available on how to correctly remove the Ws. Thus, even if someone knew where the change could be made, had access to that file, and understood the benefits of doing so, it would be very difficult for him/her to figure out how to set it up correctly. (Indeed, when I first attempted this over a year ago, I was unable to find any resources and ended up trying a number of different solutions before finding the correct settings.)
Why should they be removed?
I have come up with six reasons why the Ws should be removed from URLs:
They are no longer necessary
People are not confused when they're removed
It leaves four fewer characters to deal with
Browsers interpret Ws/no Ws differently
It provides a consistent domain for advanced scripting
Statistics packages treat Ws/no Ws as different domains
Here are some
1) They are no longer necessary
As John mentioned in his
original article, and I reiterated above, the www prefix is no longer necessary to inform the browser (or the person) that the location in question is a World Wide Web site.
2) People are not confused when they're removed
Removing four characters from a URL will not cause confusion or other problems. Redirection and sites hosted over multiple disparate domains are common, and so long as the site looks and acts the same, and, to some extent, has the same domain name, no problems should result.
3) It leaves four fewer characters to deal with
Four characters (3 Ws, 1 period) might not sound like a lot, and, in most cases, it isn't. Still, it is an extra convenience that is easily provided. For email newsletters, four fewer characters reduce the likelihood that a URL will wrap onto two lines. When passing the URL as a variable or storing it in a database, it's four fewer characters to have to deal with. It leaves more of the important part of the URL to show through in the status bar or address bar. It's four fewer irrelevant characters that people have to process. Each in and of itself is relatively minor, but when you put them all together, they start to add up.
4) Browsers interpret sites with and without the Ws differently.
This is one of the most important reasons from my perspective. Browsers consider www.somedomainname.com and somedomainname.com to be two different sites. From the browser perspective, www.somedomainname.com is simply a subdomain of somedomainname.com. Now, newer browsers are getting smarter (before, my visits to www.webword.com and webword.com were stored in different places of my browser's History), but they're not all the way there yet. Most notably, browsers treat links to the two versions of a site differently.
Let's take this example. Don't click these links, just look at what color they are and continue to read.
Both are links to this article. One should be visited (purple) and one should be unvisited (blue). If Link #1 is purple, then the URL of this page is
webword.com/moving/wwwremoval.html; if Link #2 is purple, then www.webword.com/moving/wwwremoval.html. Now, both links go to the same place, but your browser doesn't realize that, so it thinks you've only visited one of them. (Of course, if you've viewed this page at both URLs, both will appear as visited.) So, mixing links with Ws and no Ws can confuse people.
5) It provides a consistent domain for advanced scripting.
In many scripts (Perl, PHP, etc.), you need to identify what domain you're using the script on, to protect yourself so that someone can't link to your script from another site. In most cases, you have to explicitly identify your domain twice - once with the Ws, once without - even though you're only using it on one site.
In some cases, part of a script may need to check the URL of the page against a constant or string. If you can guarantee that your URLs will not have the "www." prefix, then you only need to write one condition; if visitors may be accessing the www. version of the domain, you need to write additional code to factor that in.
I ran into a scripting problem when accessing a password-protected area of one web site. I manually typed in the URL (without the Ws), entered my username and password, and got in fine. However, once I was in, it became clear that all of the links to other places within the password-protected area were hard-coded URLs that contained the Ws. So, when I wanted to access a page within that area (which should normally not require any additional login or verification), I was forced to log in again, because, to my browser, it appeared as though I was leaving that site and going to another password-protected site, even though the second site was just the original domain with the "www." prefix. (True, this could be classified as a browser flaw instead, but I feel that, in this case, it is a relatively simple scripting fix that should be tidied up and not left to be blamed on the browser.)
6) Statistics packages treat Ws/no Ws as different domains.
Many of the software packages that read server logs and assemble that data into readable reports do not aggregate the information; they treat the non-Ws URL of the domain as a separate site from the URL with the Ws. The meaningful server logs are polluted with needless referral information that claims that the top referrer to somedomainname.com was www.somedomainname.com. To find out what pages are most popular and what links are working most effectively, one has to manually aggregate information, a time-consuming and less-than-exciting process. The amount of time it would take to rewrite the code in the statistics package to account for this error is much longer than it would take to fix the problem at its root - removing the Ws.
There may be additional benefits to removing the Ws that I did not include above; if you know of any, please let me know.
I did not include any reasons why the Ws shouldn't be removed, not to give undue emphasis to my point, but because I can honestly think of none. The only reasons I can think of are reasons why the Ws aren't removed (not why they shouldn't be removed):
Didn't even think about doing it
Didn't know it was possible
Didn't think there were any benefits of doing it
Didn't know how to do it
If you've read this far, I hope I've sufficiently covered the first three, and, in the last part below, I tackle #4.
How do I set up my server to automatically remove the Ws?
Here's the part you've been waiting for: Removing the Ws in Apache. (I deal here with Apache because it is the most widely used server, and because it is the only one that I have experience doing this with. If you know of a way to accomplish this with IIS, iPlanet, or any other web server, please let me know, and that information will be added here and credit will be given to you.)
While this process is actually extremely simple, let me warn you in advance that you will be editing an extremely important file on the server and you will need to restart Apache (but not the server box itself) before the settings take effect. I've never run into any problems doing this, but you may want to make a backup of the file or consult your nearest Apache expert just to be on the safe side.
You need to have access to your Apache configuration file, which is almost always httpd.conf. If you are on a shared web site host, it is very unlikely that you have access to this file. I suggest contacting technical support and explaining what you are trying to do (and forwarding them a copy of this article for reference wouldn't hurt).
You will first need to go to your existing VirtualHost directive for the domain in question. Presumably, your domain was already set up to work with and without the Ws; here's a simplified version of what it probably looks like:
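As a rough sketch only, such a starting block might read as follows. The address 192.0.2.10 and the document root are placeholders chosen for illustration, not values from any real setup, and a real block will usually carry more directives than this. Apache releases before 2.4 also expect a matching NameVirtualHost line when several names share one address.

```apache
# Hypothetical starting point: a single VirtualHost that answers to both names.
# 192.0.2.10 and /var/www/somedomainname are placeholders.
<VirtualHost 192.0.2.10:80>
    ServerName somedomainname.com
    # This alias is what lets the same site respond with the Ws as well.
    ServerAlias www.somedomainname.com
    DocumentRoot /var/www/somedomainname
</VirtualHost>
```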
The ServerAlias is what makes your site work with or without the Ws. Delete just that one line, and create a new VirtualHost directive that looks like this:
Redirect permanent / http://somedomainname.com/
Obviously replace somedomainname.com with your domain name (and use the proper IP number or servername), but other than that, you don't need to do anything else. You should have two VirtualHost directives that look something like this:
Redirect permanent / http://somedomainname.com/
(See, I told you it was pretty simple.) Save that file, and restart Apache. That should take care of it. Your site is now set up to automatically remove the Ws!
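Putting the steps above together, the finished configuration might look something like the sketch below. As before, 192.0.2.10 and the document root are assumed placeholders, and your main VirtualHost will likely contain other directives that should be left alone.

```apache
# Main site: serves content under the bare domain only.
# 192.0.2.10 and /var/www/somedomainname are placeholders.
<VirtualHost 192.0.2.10:80>
    ServerName somedomainname.com
    DocumentRoot /var/www/somedomainname
</VirtualHost>

# Redirect-only host: every request for www.somedomainname.com is answered
# with a permanent (301) redirect to the same path on somedomainname.com.
<VirtualHost 192.0.2.10:80>
    ServerName www.somedomainname.com
    # The trailing slash on the target matters: mod_alias appends whatever
    # follows the matched "/" prefix directly onto the target URL.
    Redirect permanent / http://somedomainname.com/
</VirtualHost>
```

Running `apachectl configtest` before the restart is a cheap way to catch a typo in the file before it takes the server down.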
An explanation of what this
The new VirtualHost directive creates a new server called www.somedomainname.com and has it send all of its traffic to somedomainname.com.
The great thing about doing it this way is that these four lines take care of your entire site - you don't have to re-code anything or create individual redirects by hand. All of your existing content will appear exactly the same, except the Ws will be removed. All links pointing to your site will still work, too.
So now, if someone tries to go to...
...the URL will automatically convert to...
...but the user will get the exact same page. Everything is totally seamless, which is the beauty of doing it this way.
A bit of background on this topic: I initially discovered this problem (and figured out the solution) while working on the April 2001 redesign of xplane.com, and these concepts were originally expressed on October 20, 2001 in my presentation at the 2001 St. Louis Web Developers' Seminar.
I hope this has been informative and enlightening. I think it is a relatively important topic that has not been sufficiently explored. If you have any additional questions, comments, or additions, please let me know.
Lash is currently a User Experience Architect and the head of
Intranet development at Premcor.
He was previously an Information Architect at Xplane
and is the co-founder of the St. Louis Group for Information
Architecture. His personal website is jefflash.com