Abstract
A brief summary of the article
In this article I discuss the development of usable URLs and outline some guidelines that will aid developing them. I also briefly touch on the importance of choosing development frameworks that enable decent URLs.
Content
Main article body
Update on 13th October 2007: Mike Schinkel made me aware of some misused terminology, so I’ve updated the article to amend this.
Update on 19th October 2007: A user at Reddit highlighted how bad my first sentence was, I’ve rewritten it a bit.
Introduction
URL design is arguably one of the most important areas of website development. Not only do URLs generally have huge visual priority in web browsers, but they’re also shown on search result listings and get used for matching search terms. Let’s not forget the usability factors: what does a bunch of seemingly meaningless query strings and numerical database keys tell your users about where they are in the site? This is just the tip of the iceberg when it comes to reasons for investing resources in developing decent URL schemas, yet it’s only in the last couple of years that we’ve seen the emergence of web development frameworks that truly put an emphasis on them. In the slower-to-upgrade enterprise world most sites are still running on frameworks that produce some truly ugly URLs.
The next problem is the URLs people choose once they decide to switch away from query-parameter-infested schemas to ones that use readable URL segments to identify pages. Simply put, just converting your parameters to friendly pieces of text isn’t enough. You may get improved search rankings from your newfound keywords, but all you’ve really done is disguise the problem, and search rankings should never be your primary aim in URL design.
4 Principles of URL design
Having read a number of articles on the subject as well as exploring my own views on the matter I’ve come up with what I believe to be the 4 most important aspects of URL design:
URLs must be Readable
Just by reading a URL a user should be able to make a fairly good guess as to what they will find if they visit it. Titles (converted to a readable slug format so that those nasty %20 things aren’t visible everywhere) should be used instead of numerical ids and the URL should make it clear whereabouts in the overall site structure the resource is.
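As a rough illustration of the slug idea, here is a minimal Python sketch. The function name slugify, the use of hyphens as separators and the decision to drop non-ASCII characters are assumptions made for the example; other conventions (underscores, transliteration) would work just as well.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a human-readable title into a lowercase, hyphen-separated slug."""
    # Decompose accented characters, then drop anything that is not ASCII
    # (an assumption of this sketch; a real site might transliterate instead).
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Remove hyphens left over at either end by leading/trailing punctuation.
    return slug.strip("-")
```

With this sketch, a title such as "URLs must be Readable" becomes urls-must-be-readable, which can sit in a URL path without any percent-encoding.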
Pages with unique content must have Unique URLs
A lot of factors come into play here. The first is that the same content shouldn’t have more than one URL: you should choose your preferred URL for each page on your site, and if a second is required it should simply be a permanent redirect to the first. A search engine will follow the redirect and only index the canonical URL. This applies to the www subdomain, which the majority of websites have set up as optional. Use Apache’s mod_rewrite (or equivalent) to add a permanent redirect that forwards the request to the non-www URL (or the other way round if you really want the www part). Localised sites should also include the locale somewhere in the URL so that the preferred localised version can be bookmarked (not everybody has their browser configured to use their preferred locale). If a page has unique content it must not rely on sessions or cookies to load this content, otherwise it will be invisible to search engines. Brad Fults discusses multilingual URL design in “Designing URLs for Multilingual Web Sites”; his conclusions may not match your own, but the article does an excellent job of explaining some of your options.
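The www redirect is normally handled in the web server configuration, but the decision it makes can be sketched in a few lines of Python. The function name canonical_redirect and the host example.com are placeholders, and the sketch assumes URLs without a port or credentials in the host part.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, preferred_host: str = "example.com"):
    """Return (301, location) if the URL uses the www alias, otherwise None."""
    parts = urlsplit(url)
    # hostname is lowercased by urlsplit; it is None for URLs without a host.
    host = parts.hostname or ""
    if host == "www." + preferred_host:
        # Keep scheme, path and query; only the host changes.
        location = urlunsplit(parts._replace(netloc=preferred_host))
        return 301, location
    # Already on the preferred host (or on an unrelated one): no redirect.
    return None
```

The 301 status is what makes the redirect permanent, which is the signal that lets crawlers and bookmarks settle on a single URL.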
URLs must be Hackable
This follows on from readability and the idea that a URL is not only a location but also a map, much akin to breadcrumb navigation (“Breadcrumb Pattern at Yahoo”). One of the main features of successful breadcrumb navigation is that it doesn’t represent the route you took to get to the page but rather the route home or to other pages. Every URL is constructed from a number of segments separated by forward slashes. For a URL to be hackable the user must be able to repeatedly remove the last segment and still arrive at a valid page that makes sense within the context of the URL. The user must also be able to swap in alternative segments that make sense, like changing “2007” to “2006” or changing “reviews” to “news”. This sounds straightforward enough, but the biggest stumbling block is introducing new URL segments to resolve namespace conflicts (what if someone gives an article a title that conflicts with an existing URL?). This is a bad thing because it means you now have a URL (everything up to and including your new segment) that doesn’t have a page behind it. The solutions range from refactoring your URL schema to simply preventing new content from using existing URLs programmatically.
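To make the "remove the last segment" test concrete, the following Python sketch lists every URL a user would reach by hacking segments off the end. The function name parent_urls and the example URL are hypothetical; on a hackable site each returned URL should serve a sensible page.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list:
    """List the URLs reached by repeatedly removing the last path segment."""
    parts = urlsplit(url)
    # Ignore empty strings produced by leading, trailing or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()
        # Rebuild the path with a trailing slash; the site root is just "/".
        path = "/" + "/".join(segments) + ("/" if segments else "")
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents
```

For http://example.com/reviews/2007/some-title/ this returns http://example.com/reviews/2007/, http://example.com/reviews/ and http://example.com/, which is the list of pages the URL schema has to be able to serve.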
URLs must be meaningful
Put simply, every single part of your URL should mean something not only to the user but also to your system. Let’s say that instead of just using your article title you’re using the numerical id as well; your system may not actually care what the text is as long as the id is right. This violates the idea of unique URLs because now every single article effectively has unlimited URL possibilities. Similarly, keywords shouldn’t be stuffed into the URL if they have no actual effect on what page is returned by the system. If a user gets any part of a URL wrong they should be served a 404 page, which ideally would list the possible URLs the user may have been looking for. By removing useless parts you also ensure that your URL is only as long as it needs to be. Long URLs almost always become unclickable if put in emails because mail readers break the line before running the algorithm that converts the URLs to clickable links.
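One way to keep an id-plus-title URL meaningful is to check the title as well as the id. The Python sketch below assumes a hypothetical /articles/<id>/<slug>/ layout and an in-memory dictionary standing in for a database; it only returns a status code, whereas a real 404 page could also suggest likely URLs.

```python
# Hypothetical data store standing in for a database: article id -> slug.
ARTICLES = {42: "designing-usable-urls"}


def resolve(path: str) -> int:
    """Return an HTTP status for paths shaped like /articles/<id>/<slug>/."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != 3 or segments[0] != "articles":
        return 404
    if not segments[1].isdecimal():
        return 404
    expected_slug = ARTICLES.get(int(segments[1]))
    # The slug must match as well; otherwise any text after the id would
    # resolve and the article would have an unlimited number of URLs.
    if expected_slug is None or segments[2] != expected_slug:
        return 404
    return 200
```

Here /articles/42/designing-usable-urls/ resolves, while the same id followed by any other text is treated as a wrong URL rather than silently accepted.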
Conclusion
The 4 principles outlined above represent my condensed findings on how ideal URL schemas should be constructed. In reality it’s not quite as simple as following 4 relatively straightforward guidelines (and there are probably a number of factors I’ve overlooked that may not fit within the 4 areas), but my opinion is that URL design is easily important enough to justify the work that may go into programming a system that allows good design. In my experience some web development frameworks make it unreasonably difficult to develop decent URLs whilst others can even make it enjoyable; this ease of URL development should be considered an important factor in framework choice.
I have deliberately refrained from mentioning SEO directly until now. I have no objections to SEO, but I feel that if any design decision is made purely for SEO purposes you risk adversely affecting the user experience. However, applied correctly these principles also go hand-in-hand with optimising your URLs for search engines. In fact, you can find the majority of what I’ve said in various SEO articles, just with different motivations behind the decisions. My point was to emphasise the importance of proper URL design and highlight that even if your site is so successful that you don’t need to worry about SEO, you still need to worry about the user experience and therefore URLs.
Thanks to Nathan Smith for helping me out with some thoughts regarding this article. Andrew Ingram is aware that his site doesn’t remotely follow his guidelines but promises to try harder in the future.
Comments
What people have had to say about this article
-
Great article Andrew! It’s always great to find another web professional that is promoting Well Designed URLs.
A few things I’d like to emphasize and/or discuss though.
First, you mention CMSes are starting to respect URL design; right on! But you don’t mention any so I thought I would. I’ve been working with Drupal which does an admirable job (contrast with Joomla, which publishes horrid URLs) and also Django, to name a few. Ruby on Rails does a decent but not great job as Rails couples URL design to controller modeling which might not always be appropriate; you can get around it but it’s a real pita to do so.
Next you say “the same resource shouldn’t have two canonical URLs” which is really a misnomer as the term “canonical” implies only one.
That said, I actually disagree with “the conventional wisdom” that there “should only be one URL per resource but if needed to 301 redirect to the other.” My position is in conflict with many who participate in the W3C standards processes but I believe that they are optimizing for network effects and caching while ignoring usability when they could actually do both if they were not so ideological about it. For example, consider three paths with related breadcrumbs that arrive at the same content. Imagine a car site with the following URLs:
- http://example.com/2007/ford/mustang/
- http://example.com/ford/2007/mustang/
- http://example.com/ford/mustang/2007/
All three URLs logically would display information about a 2007 Ford Mustang but with different breadcrumbs that relate to the URL paths. Why should the site designer be forced to pick one and totally mess with the user by redirecting to a URL with a different set of breadcrumbs than the breadcrumb path down which the user descended?
Actually, by definition the whole “only be one URL per resource and only one resource per URL” is a misnomer because by definition each unique URL is a different resource. By definition! Yes I’m being pedantic, but without being pedantic it is easy to fall into terminology traps related to URLs and get oneself really turned around conceptually.
What you really mean to discuss are multiple URLs that provide roughly equivalent representations. What is missing is a standard way to tell web crawlers via headers that, when referenced from outside the site, one URL should be considered canonical and that all links to the other “equivalent” URLs should be considered a link to the canonical one. This would allow multiple equivalent representations without losing the benefits of the network effect.
Further what else is missing is a way to provide URLs for subsets of (X)HTML content in order that intermediaries may cache the shared subsets of content.
Besides network effects and caching, I know of no other reason it is a “problem” to have “more than one URL for a resource.” Do you? Personally I’d prefer to err on the side of site usability and to lobby the W3C to implement standards to resolve the other issues.
My next point is on a misuse of terminology. You use the term “fragment” to refer to what might be more clearly named “path segments.” It might seem I’m being anal but actually your use conflicts with the officially defined meaning of the term “fragment” in RFC3986. The term refers to the portion of the URL after the hash mark (“#”) which only the client is concerned with. See http://wiki.welldesignedurls.org/Fragment and http://gbiv.com/protocols/uri/rfc/rfc3986.html#fragment
And I want to amplify your comments about SEO vs. URL Design! Definitely, I agree that people should not optimize for search engines at the expense of users. And if they instead optimize for users they actually have a much better chance of having users help them with their SEO by the users being much more likely to distribute links to their site from around the Internet. Of course the SEO spammers will pay no attention to this point as they don’t provide any real value anyway. But bravo on that point!
In summary, I’m really glad to see your article and in the grand scheme of things my points really were just nits. You pressed one of my hot buttons with the “always redirect to canonical URL” suggestion so I felt compelled to mention it. :) Hope to see you write on the subject again in the future.
-
Mike, thanks for your comment, I’ll amend the article to use more accurate terminology when I get the chance – one of the wonders of web publishing :)
I agree that if there is a clear 2-way hierarchy of categories like in your example you might end up with multiple URLs for the same content, the problem is that for each additional segment you massively increase the number of valid URLs.
My preference would be to either have just a 1-way hierarchy of categories, or use a single filtering segment like this:
http://example.com/2007+ford+mustang/
I don’t suggest that a filtering segment is the best choice in this case, but it eliminates the hierarchy issue and provides a mechanism for use when there is no clear hierarchy.
If a hierarchy had to be used, I would still probably choose one ordering to be the preferred one though. The choice comes down to what the primary navigational emphasis of your site is. But I do generally think that category/tag URLs are where it all gets a bit tricky.
-
Simon Willison #
10:17am, 19th October 2007
Besides network effects and caching, I know of no other reason it is a “problem” to have “more than one URL for a resource.” Do you?
I do: people won’t know which URL they should be linking to, and as a result links to a certain resource will be distributed across multiple URLs. This hurts you in a bunch of ways:
- If your site has a search engine, you’ll need a way to tell it which of those URLs should be served up as a result (you don’t want the same page showing up three times).
- External search engines such as Google will have to figure out which is the canonical result. You may end up losing out on PageRank as a result.
- The one that has affected me the most: people will end up saving different URLs for your content on social bookmarking sites such as del.icio.us. This means that you’ll be much less likely to get on the del.icio.us/popular page, the front page of digg, reddit and so forth, which can really help you promote your content.
- The browser’s visited link feature will break – it won’t be able to tell you if you’ve visited a page before because the URL might be different.
I wrote more about this subject here: http://simonwillison.net/2007/Feb/4/urls/
-
Peter #
12:53pm, 19th October 2007
One more rule (and you’re breaking this one):
URLs should be short
This is important for a number of reasons. URLs sometimes get dictated by phone and typed in by hand. They often get sent by non-HTML mail, where line breaks wreak havoc. They’re occasionally referenced from a paper document or presentation slides.
Furthermore, if they’re longer than the address bar, the user gets no benefit from a human-readable URL. I see “http://www.andrewingram.net/articles/readability_uniqueness_h” in my browser—no one will scroll to see the rest of your URL. Compare to “http://www.andrewingram.net/good_urls”.
This is the raison d’être for sites like tinyurl.com.
-
You’re right, I should have been more specific about keeping URLs as short as possible. It’s what the “meaning” section was supposed to be getting at but I overlooked talking about the length of URL slugs.
There is a reason for the very last sentence of the article though. I set up this site a relatively long time before I started to take a closer look at URL design and it is something I intend to address properly when I get round to working on it again.
-
Gonzalo Sanchez #
01:03pm, 19th October 2007
Great content Andrew,
The 4 things you described are very well known in SEO. I know that was not the motive of your article; however, they reinforce the motives for SEO.
Two things that I have some no happy feelings about are, unique URLs wit
-
Maxwell Terry #
01:58pm, 19th October 2007
@Mike Schinkel: Good point about the unneed for canonical URLs. I think the trouble comes down to the confusion about what the hell a URL/URI is supposed to be. I think many see it as an isolated file path, probably because of Microsoft Explorer. Then there’s the basic PHP approach, with arguments after the filename, and GET variables and so on. And then OpenID uses URIs as identities, XRIs try to standardize dynamic identifiers…
In the end, production wise, one has to devise a scheme and go with it. I really like the way web.py (http://webpy.org/) handles URLs, which you just define as regex and pair with a class name, which is your page. It makes it really easy to get any URL (say, the 2007 Ford Mustang example) to the appropriate resource.
-
Eric Monse #
03:37pm, 19th October 2007
Thanks for the insight. I wish I had a nickel for every time I got a URL that was difficult to read. – Eric Monse
-
Simon Willison #
06:10pm, 19th October 2007
Here’s another reason it’s a good idea for every resource or “thing” in your system to have one and only one URL: it means that you can have a function along the lines of “get_url_for_thing(thing)”, which you can then use all over your site, essentially anywhere the “thing” is mentioned. If the thing has more than one URL you have to decide which one you’re going to use every time you want to link to it from within your own site.
-
Robby Slaughter #
01:05am, 20th October 2007
I wrote a related article “Who Cares How This Website Was Built?” a few months ago for my site. Check it out.
http://www.robbyslaughter.com/musings/who-cares-how-this-website-was-built/
-
top star #
08:25pm, 05th November 2007
That article is very helpful, thank you.
-
Paul M. #
04:14pm, 03rd December 2007
What I am missing in this well written article are examples for all 4 points. It would be much easier to understand the topic.
-
That’s a good point Paul, I’ll probably be doing an article on each of the principles which will elaborate and demonstrate what they all mean. It’s clear from some of the Reddit comments that I’ve not explained things as well as I could have.
-
London Escorts #
10:32am, 30th April 2008
Great article! I have to admit that it’s very useful. Thanks for posting it.
Mike Schinkel #
05:24am, 09th October 2007