andrew ingram's homepage

Abstract

A brief summary of the article

Just a quick few words about producing readable URL slugs

Content

Main article body

When I read around for methods of converting a piece of text (such as a blog entry title) to a URL-friendly slug, I generally just find one-line regular expression replace functions without any real explanation of what’s going on. So here’s a basic overview of what such an algorithm should be doing:

  1. Be aware of which characters are generally allowed in URL segments (here’s a good guide). Basically, any letters or numbers are allowed. You can probably also allow a single quote (but not double quotes) as the only piece of punctuation available to you; most other punctuation is reserved or considered dangerous to use.
  2. Choose a character to stand in for spaces. Normal spaces are technically allowed, but they have to be encoded (as %20), so they end up looking quite ugly. Realistically you can use underscores or hyphens. Hyphens are preferable because, unlike underscores, they don’t get disguised by the underline when the URL is displayed as a hyperlink.
  3. Convert your text to lower-case. Your URLs should be consistently cased for usability and lower-case is the easiest and most attractive option. If an upper-case version of a URL (or part of a URL) is accessed it should redirect the user to the canonical lower-case version.
  4. Make sure you don’t end up with two spacing characters in a row. This shouldn’t be an issue if you’re careful about the order of each step.
  5. Depending on preference and URL length, you might want to remove certain words, such as prepositions, to keep the slug as concise as possible whilst still being relatively meaningful.

    In Java, the following method call is probably a good starting point:

    replaceAll("[^a-z0-9']+", "-")

    The Ruby equivalent would be:

    gsub(/[^a-z0-9']+/, "-")

    You’ll need to do some other things before and after the above methods (such as lower-casing the text first and trimming stray hyphens from the ends afterwards), but they represent the key part of the algorithm.
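To make the "before and after" work concrete, here is a minimal Ruby sketch of the whole list built around the gsub call above. The method name slugify, the stop_words option and the exact order of the steps are assumptions made for illustration, not an established recipe.

```ruby
# Hypothetical helper: the name, the stop_words option and the step order
# are assumptions, not part of any library.
def slugify(text, stop_words: [])
  # Step 3: lower-case first, so that A-Z survive the character filter.
  slug = text.downcase
  # Steps 1, 2 and 4: every run of disallowed characters becomes ONE hyphen,
  # so two spacing characters can never end up next to each other.
  slug = slug.gsub(/[^a-z0-9']+/, "-")
  # Step 5 (optional): drop unwanted words. Rejecting empty strings also
  # removes the stray hyphen left at the start of the slug, if any.
  words = slug.split("-").reject { |w| w.empty? || stop_words.include?(w) }
  words.join("-")
end

puts slugify("  Producing Readable URL Slugs!  ")
# prints "producing-readable-url-slugs"
puts slugify("A Quick Word About the Slugs", stop_words: %w[a about the])
# prints "quick-word-slugs"
```

Note that this sketch treats anything outside a-z, 0-9 and the single quote as a separator, so accented or other non-ASCII letters are turned into hyphens rather than approximated.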

    Note: I’m uncertain about the best practice for using the vertical single quote in slugs. I couldn’t see anything that said it’s a bad idea, but I imagine most people would prefer to just remove it. This raises the issue of whether to collapse single quotes (so that “don’t” becomes “dont”) or to replace them with hyphens (giving “don-t”).
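The two ways of getting rid of single quotes can be compared side by side with a small Ruby sketch. The method name slugify_without_quotes and the apostrophes option are hypothetical, chosen only to show the difference; neither choice is presented here as the accepted practice.

```ruby
# Hypothetical helper: the name and the :collapse / :hyphenate values
# are assumptions made for this comparison.
def slugify_without_quotes(text, apostrophes: :collapse)
  slug = text.downcase
  # :collapse deletes the quote, so the letters around it stay one word.
  # :hyphenate leaves it in place for the next step to turn into a hyphen.
  slug = slug.delete("'") if apostrophes == :collapse
  # The single quote is no longer in the allowed set here.
  slug = slug.gsub(/[^a-z0-9]+/, "-")
  # Trim any hyphens left at either end.
  slug.gsub(/\A-+|-+\z/, "")
end

puts slugify_without_quotes("Don't Panic")
# prints "dont-panic"
puts slugify_without_quotes("Don't Panic", apostrophes: :hyphenate)
# prints "don-t-panic"
```

Collapsing tends to keep contractions readable as a single word, while hyphenating leaves a lone letter behind; which reads better is a matter of taste.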

Comments

What people have had to say about this article

  1. Masklinn #
    11:21am, 19th October 2007

    You also want to make a pre-conversion pass that will transform any accented character in the non-accented version, and any non-ascii character into an ascii approximation of it (e.g. romaji for japanese)

  2. Robby Slaughter #
    01:32pm, 26th January 2008

    Pot, kettle, black? The URL of this article has underscores instead of hyphens!

Colophon

Andrew Ingram is a 22 year old Brit with far too many opinions. He hopes to one day be able to legitimately call himself a designer. He currently resides in Royal Leamington Spa, Warwickshire which sounds more sophisticated than it really is.
