User-friendly short URL aliases with words or emojis

First off, I’m not talking about quicklinks or SEO of URL paths on a site you’re in control of, but about shorter aliases for third-party addresses. As far as I can see, there are several major reasons why one would want to use such a service:

  • limited display space (e.g. on Twitter)
  • manual input (copied from a slide show, TV screen, print-out …)
  • aural transmission (phone conversation …)
  • memorization
  • traffic analytics and tracking (a “benefit” of aliasing, not of shortening)

The three bullet points in the middle are UX issues. Most popular services have a very short domain name (3 to 5 letters overall plus a single dot), but they differ in the set of characters they use (the alphabet), hence in how many characters are available (the base), and in the length of the path string.

The alphabet is usually one of these:

  1. lowercase Latin letters abcdefghijklmnopqrstuvwxyz – 26
  2. uppercase Latin letters ABCDEFGHIJKLMNOPQRSTUVWXYZ – 26
  3. international decimal digits 0123456789 – 10
  4. Latin letters, #1 and #2 – 52
    • with confusable lI excluded – 50
    • with confusable lI aliasing each other – 51
  5. lowercase and digits, #1 and #3 – 36
    • with confusable l1 excluded – 34
    • with confusable l1 aliasing each other – 35
  6. letters and digits, #1, #2 and #3 – 62
    • with confusable lI1, O0, Z2, S5, G6 excluded – 51
    • with confusable lI1, O0, Z2, S5, G6 aliasing each other – 56
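To double-check the counts in the list above, here is a small Python sketch that derives each base from the character sets. The confusable groups are the ones listed; the function name `base_size` and its three modes are just illustrative.

```python
import string

# The three base character sets from the list
LOWER = string.ascii_lowercase   # #1: 26 symbols
UPPER = string.ascii_uppercase   # #2: 26 symbols
DIGITS = string.digits           # #3: 10 symbols


def base_size(chars, confusable_groups=(), mode="keep"):
    """Count distinguishable symbols in an alphabet.

    mode="keep":    confusable characters stay as they are
    mode="exclude": every member of every confusable group is removed
    mode="alias":   each confusable group collapses into one symbol
    """
    if mode not in ("keep", "exclude", "alias"):
        raise ValueError(f"unknown mode: {mode}")
    symbols = set(chars)
    for group in confusable_groups:
        members = set(group) & symbols  # only characters actually in the alphabet
        if mode == "exclude":
            symbols -= members
        elif mode == "alias" and members:
            symbols -= members
            symbols.add(min(members))  # one representative stands for the group
    return len(symbols)


ALPHABETS = {
    "#4 letters": (LOWER + UPPER, ["lI"]),
    "#5 lowercase + digits": (LOWER + DIGITS, ["l1"]),
    "#6 letters + digits": (LOWER + UPPER + DIGITS, ["lI1", "O0", "Z2", "S5", "G6"]),
}

for name, (chars, groups) in ALPHABETS.items():
    keep, exclude, alias = (base_size(chars, groups, m) for m in ("keep", "exclude", "alias"))
    print(f"{name}: {keep} / excluded {exclude} / aliased {alias}")
```

Aliasing keeps one symbol per group, which is why it always lands between the plain and the excluded count.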

There’s some variation in which characters are considered confusable: one could add letters whose upper- and lowercase forms look a lot alike (cC, fF, kK, oO, pP, sS, uU, vV, wW, xX, yY, zZ), or limit the set to just lI1 and O0. Anyhow, the number of unique short URLs is base^length. If the length is variable, a quick approximation is to increment the base by 1 and use the maximum length as the exponent; this slightly overestimates the count.
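A quick Python sketch comparing the exact number of aliases for variable-length paths with the (base + 1)^max_length rule of thumb. It assumes path lengths from 1 up to the maximum; the function names are made up for illustration.

```python
def exact_count(base, max_len, min_len=1):
    """Distinct paths with a length from min_len to max_len inclusive."""
    return sum(base ** n for n in range(min_len, max_len + 1))


def quick_estimate(base, max_len):
    """Rule of thumb: (base + 1) ** max_len.

    Expanding with the binomial theorem shows it is at least
    base**0 + base**1 + ... + base**max_len, so it never undercounts.
    """
    return (base + 1) ** max_len


for base in (10, 36, 62):
    exact = exact_count(base, 5)
    estimate = quick_estimate(base, 5)
    print(f"base {base}, up to 5 chars: exact {exact:,}, estimate {estimate:,} "
          f"({estimate / exact - 1:.1%} high)")
```

For up to five alphanumeric characters (base 62) the exact count is 931,151,402 and the estimate is 992,436,543, so the rule of thumb overshoots by about 7 percent there; with a fixed length of five it is simply 62^5 = 916,132,832.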

For instance, five alphanumeric characters (#6) give 62^5 ≈ 916 million possible strings, and nine digits (#3) give 10^9, i.e. a billion. Let’s assume we want that kind of entropy to limit the chance of ending up at the wrong short URL. Without training or special effort, the human brain is good at remembering up to about seven chunks of information. Such chunks can be single arbitrary characters, but also simple numbers (e.g. 15, 200), words or pictures.
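To see the trade-off between alphabet size and number of chunks, this Python sketch computes how many symbols are needed to reach a target keyspace, and conversely how large an alphabet (for example a word list) must be for a fixed number of chunks. The target of 62^5 aliases mirrors the five-alphanumeric example; the helper names are hypothetical.

```python
def chunks_needed(base, keyspace):
    """Fewest symbols from an alphabet of size `base` covering `keyspace` aliases."""
    if base < 2:
        raise ValueError("an alphabet needs at least two symbols")
    chunks, combinations = 0, 1
    while combinations < keyspace:
        combinations *= base
        chunks += 1
    return chunks


def base_needed(chunks, keyspace):
    """Smallest alphabet size whose `chunks`-symbol combinations cover `keyspace`.

    This is the integer k-th root of keyspace, rounded up; the float estimate
    is corrected with exact integer arithmetic.
    """
    base = max(1, round(keyspace ** (1 / chunks)))
    while base ** chunks < keyspace:
        base += 1
    while base > 1 and (base - 1) ** chunks >= keyspace:
        base -= 1
    return base


TARGET = 62 ** 5  # five alphanumeric characters, about 916 million aliases

for base in (10, 36, 62):
    print(f"alphabet of {base}: {chunks_needed(base, TARGET)} chunks")
for chunks in (2, 3, 4):
    print(f"{chunks} chunks: alphabet of {base_needed(chunks, TARGET):,}")
```

With that target, digits need 9 chunks (beyond the seven-chunk limit), lowercase plus digits need 6 and full alphanumerics 5. Going the other way, 2, 3 or 4 chunks need alphabets of 30,268, 972 or 174 symbols, so a list of a few hundred entries corresponds to roughly three or four chunks.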

[Image: XKCD comic “Password Strength”, comparing ‘Tr0ub4dor&3’ with ‘correct-horse-battery-staple’]

Would it make sense for UX purposes to use a larger but familiar alphabet and fewer chunks? Since chunks can consist of multiple characters, the string, and hence the URL, could become longer, though. I’m thinking of two to four of either …

  1. short common words from a dictionary of hundreds of entries in the user’s language or just English, possibly with a single-character separator (e.g. -) between them, or
  2. Unicode emojis (with the server handling encoding and canonicalization issues, e.g. variation selectors, Fitzpatrick scale and ZWJ sequences) – there are more than 100 smiley, human and animal face emoji alone.
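One possible way for the server to canonicalize an emoji path before lookup, sketched in Python with only the standard library. The policy is an assumption: it drops the text/emoji presentation selectors (U+FE0E, U+FE0F) and the Fitzpatrick skin-tone modifiers (U+1F3FB to U+1F3FF) so that variants map to the same symbol, and it simply rejects ZWJ sequences, assuming the alphabet contains only single-code-point emoji. The function name is hypothetical, and the input is assumed to be the path without its leading slash.

```python
import unicodedata
from urllib.parse import unquote

# Presentation selectors: U+FE0E (text style) and U+FE0F (emoji style)
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}
# Fitzpatrick skin-tone modifiers U+1F3FB..U+1F3FF
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}
ZERO_WIDTH_JOINER = "\u200D"


def canonicalize_emoji_path(raw_path):
    """Map a (possibly percent-encoded) emoji path to a canonical string.

    Assumed policy: selectors and skin tones are dropped; ZWJ sequences
    are rejected because the alphabet is assumed to hold only
    single-code-point emoji.
    """
    # Browsers usually send non-ASCII paths percent-encoded as UTF-8
    text = unicodedata.normalize("NFC", unquote(raw_path))
    if ZERO_WIDTH_JOINER in text:
        raise ValueError("ZWJ sequences are not part of this alphabet")
    return "".join(
        ch for ch in text
        if ch not in VARIATION_SELECTORS and ch not in SKIN_TONE_MODIFIERS
    )


# Thumbs up with a medium skin tone, percent-encoded as a browser might send it
print(canonicalize_emoji_path("%F0%9F%91%8D%F0%9F%8F%BD") == "\U0001F44D")  # True
```

After this step every remaining code point is one chunk and could be looked up in the alphabet, e.g. with a dict from emoji to index. Supporting ZWJ sequences or flags instead would require segmenting the path into grapheme clusters first.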

Am I right that this would make more humane short URLs? Emojis (or other symbols) would also be more compact visually than alphabetic letters.
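Mapping a numeric alias ID to words works like any base-N encoding, with the word list playing the role of the digits. A minimal Python sketch, assuming a fixed number of words per alias, a hyphen as separator and a hypothetical eight-entry list of color names (a real list would need hundreds of entries to reach a keyspace of hundreds of millions):

```python
# Hypothetical word list; a real one would need hundreds of entries.
WORDS = ["red", "blue", "green", "gold", "pink", "lime", "teal", "navy"]
SEPARATOR = "-"


def encode(n, words=WORDS, chunks=3):
    """Write alias number n as `chunks` words, most significant word first."""
    base = len(words)
    if not 0 <= n < base ** chunks:
        raise ValueError("number out of range for this many words")
    parts = []
    for _ in range(chunks):
        n, digit = divmod(n, base)
        parts.append(words[digit])
    return SEPARATOR.join(reversed(parts))


def decode(alias, words=WORDS):
    """Inverse of encode; raises KeyError for words not in the list."""
    index = {word: i for i, word in enumerate(words)}
    n = 0
    for word in alias.lower().split(SEPARATOR):
        n = n * len(words) + index[word]
    return n


print(encode(9))                 # red-blue-blue
print(decode("Red-Blue-Blue"))   # 9
```

Lowercasing on decode makes the alias tolerant of capitalization, which matters for aural transmission; the same scheme works for an emoji alphabet by using emoji instead of words and an empty separator.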

How would one decide which words or emojis to use for their alphabet?
I guess words should be well known, and both easy to spell (to avoid typos) and unambiguous in their spelling (e.g. not colo[u]r or fuchsia). They could even come from a limited subset (e.g. color names), which could reduce the chance of coincidental combinations that together have an unintended meaning (e.g. douche-bag). The same applies to emoji, where one should probably exclude newer ones and those that differ significantly between vendors. Emoji short URLs seem mostly appropriate where input via a soft keyboard or similar IME is anticipated (i.e. on mobile phones and tablets) or provided (by websites or apps).
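Those selection criteria could be applied mechanically to a candidate list. This Python sketch assumes hypothetical inputs (a few color names, a set of words with variant spellings, a set of hard-to-spell words and a blocklist of adjacent word pairs); compiling real versions of those lists is the actual work, and the length limits of 3 to 6 letters are an arbitrary choice.

```python
import re

# Hypothetical inputs for illustration only
CANDIDATES = ["red", "colour", "grey", "fuchsia", "teal", "jade", "lime", "amber", "ochre"]
VARIANT_SPELLINGS = {"color", "colour", "gray", "grey", "ocher", "ochre"}
HARD_TO_SPELL = {"fuchsia"}
BLOCKED_PAIRS = {("douche", "bag")}  # adjacent words with an unintended joint meaning


def usable_word(word, min_len=3, max_len=6):
    """Keep short, plain lowercase words with a single accepted spelling."""
    return (
        re.fullmatch(r"[a-z]+", word) is not None
        and min_len <= len(word) <= max_len
        and word not in VARIANT_SPELLINGS
        and word not in HARD_TO_SPELL
    )


def alias_is_acceptable(words):
    """Reject an alias if any two adjacent words form a blocked pair."""
    return not any((a, b) in BLOCKED_PAIRS for a, b in zip(words, words[1:]))


word_list = [w for w in CANDIDATES if usable_word(w)]
print(word_list)  # ['red', 'teal', 'jade', 'lime', 'amber']
```

In a generator, an alias rejected by `alias_is_acceptable` would simply be skipped in favor of the next ID; restricting the list to one semantic field, as suggested above, should make such rejections rare.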