Abstract
A few quick words about producing readable URL slugs
Content
When I look around for ways to convert a piece of text (such as a blog entry title) into a URL-friendly slug, I generally find only one-line regular-expression replace functions with no real explanation of what's going on. So here's a basic overview of what such an algorithm should do:
- Be aware of which characters are generally allowed in URL segments (here's a good guide). Essentially, any letter or number is allowed. You can probably also allow the single quote (but not the double quote) as the only piece of punctuation available to you; most other punctuation is either reserved or considered unsafe to use.
- Choose a character to stand in for spaces. Literal spaces aren't allowed (strictly, they can be used, but they must be percent-encoded as %20, which looks ugly). Realistically, the choice is between underscores and hyphens. Hyphens are preferable because an underscore can be hidden by the underline when the URL is displayed as a hyperlink.
- Convert your text to lower case. Your URLs should be consistently cased for usability, and lower case is the simplest and most attractive option. If an upper-case version of a URL (or part of a URL) is requested, it should redirect the user to the canonical lower-case version.
- Make sure you never have two separator characters in a row. This shouldn't be an issue if you're careful about the order of the steps.
- Depending on preference and URL length, you might want to remove certain words, such as prepositions, to keep the slug as concise as possible whilst still being reasonably meaningful.
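The steps above can be strung together in a short Java method. This is a minimal sketch, not the author's code: the class name `Slugs`, the `FILLER_WORDS` set and the `canonicalPath` helper are hypothetical, and the filler-word list is only an assumed example of the "certain words" you might choose to drop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class Slugs {
    // Hypothetical list of filler words to drop; tune it to taste.
    private static final Set<String> FILLER_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "in", "on", "to", "for", "and"));

    /** Builds a hyphen-separated, lower-case slug from a title. */
    public static String slugify(String title, boolean dropFillerWords) {
        // 1. Lower-case first, so the character class below only needs a-z.
        String text = title.toLowerCase(Locale.ROOT);

        // 2. Optionally drop filler words (split on anything that is not a slug character).
        if (dropFillerWords) {
            text = Arrays.stream(text.split("[^a-z0-9']+"))
                    .filter(word -> !word.isEmpty() && !FILLER_WORDS.contains(word))
                    .collect(Collectors.joining(" "));
        }

        // 3. Replace every run of disallowed characters with ONE hyphen.
        //    The '+' quantifier is what prevents two hyphens in a row.
        String slug = text.replaceAll("[^a-z0-9']+", "-");

        // 4. Trim any hyphen left at either end (from leading or trailing punctuation).
        return slug.replaceAll("^-+|-+$", "");
    }

    /** Lower-case canonical form of a request path, used to decide whether to redirect. */
    public static String canonicalPath(String path) {
        return path.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(slugify("A Few Words About URL Slugs!", false)); // a-few-words-about-url-slugs
        System.out.println(slugify("A Few Words About URL Slugs!", true));  // few-words-about-url-slugs
    }
}
```

A request handler could compare the incoming path with `canonicalPath(path)` and issue a permanent (301) redirect whenever the two differ.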
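In Java, the following method call is probably a good starting point:
replaceAll("[^a-z0-9']+", "-")
The Ruby equivalent would be:
gsub(/[^a-z0-9']+/, "-")
Both replace each run of characters that are not lower-case letters, digits or single quotes with a single hyphen, which also stops separators from doubling up. You'll need to do some other work before and after these calls (lower-casing the text beforehand, since the character class only covers a-z, and trimming any leading or trailing hyphen afterwards, for example), but they represent the key part of the algorithm.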
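To make the "before and after" work concrete, here is one possible Java pipeline around the key `replaceAll` call. It is a sketch under assumptions: the class name `SlugPipeline` is hypothetical, and the accent-stripping step (decomposing with `java.text.Normalizer` and deleting the combining marks) is one common technique rather than something the article prescribes. Without it, an accented letter falls outside `[a-z0-9']` and is turned into a hyphen.

```java
import java.text.Normalizer;
import java.util.Locale;

public class SlugPipeline {
    public static String slugify(String title) {
        // Before: decompose accented letters into a base letter plus a combining mark,
        // then delete the marks, so the base letters survive the character class below.
        String folded = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");

        // Before: lower-case, because the character class only lists a-z.
        String lower = folded.toLowerCase(Locale.ROOT);

        // The key step: every run of characters outside [a-z0-9'] becomes a single hyphen.
        String hyphenated = lower.replaceAll("[^a-z0-9']+", "-");

        // After: strip any hyphen produced by leading or trailing spaces or punctuation.
        return hyphenated.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        // The title contains an e with an acute accent, written as a Unicode escape.
        System.out.println(slugify("Caf\u00e9 Society: A Review")); // cafe-society-a-review
    }
}
```

This only handles accented Latin letters. Non-Latin scripts such as Japanese would still need a separate transliteration step, which the sketch does not attempt.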
Note: I'm unsure of the best practice for using the straight single quote (apostrophe) in slugs. I couldn't find anything saying it's a bad idea, but I imagine most people would prefer to remove it. That raises the question of whether to collapse apostrophes (dropping them, so "don't" becomes "dont") or to replace them with hyphens ("don-t").
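The two options raised in the note, alongside the article's own choice of keeping the apostrophe, can be compared side by side. This is a hypothetical Java sketch; the class and method names are illustrative only.

```java
import java.util.Locale;

public class ApostropheOptions {
    // Option A: keep apostrophes, as the character class [^a-z0-9'] does.
    public static String keep(String title) {
        return tidy(title.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9']+", "-"));
    }

    // Option B: collapse apostrophes by deleting them before hyphenating ("don't" -> "dont").
    public static String collapse(String title) {
        return tidy(title.toLowerCase(Locale.ROOT).replace("'", "").replaceAll("[^a-z0-9]+", "-"));
    }

    // Option C: treat apostrophes like any other separator ("don't" -> "don-t").
    public static String hyphenate(String title) {
        return tidy(title.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-"));
    }

    // Remove any hyphen left at either end of the slug.
    private static String tidy(String slug) {
        return slug.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        String title = "Don't Panic";
        System.out.println(keep(title));      // don't-panic
        System.out.println(collapse(title));  // dont-panic
        System.out.println(hyphenate(title)); // don-t-panic
    }
}
```

Collapsing keeps words intact at the cost of slightly misspelled slugs, while hyphenating splits contractions into fragments. A typographic apostrophe (’, U+2019) is a different character from the straight quote, so all three methods treat it as a separator.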
Comments
Masklinn
11:21am, 19th October 2007
You also want to make a pre-conversion pass that transforms any accented character into its unaccented version, and any non-ASCII character into an ASCII approximation of it (e.g. romaji for Japanese).

Robby Slaughter
01:32pm, 26th January 2008
Pot, kettle, black? The URL of this article has underscores instead of hyphens!