UX principles for multilanguage websites?

Different websites--even the largest and most popular--take different approaches to localization, and I suspect that not all of them are very user friendly. I have looked at examples online and read articles, as well as some questions on this forum (such as "Bilingual website usability" and "Language of language names in the language selector?").

Below is my best understanding of the principles, which I suspect can still be improved. Let me know any improvements.

For a large brand with a presence in multiple countries:

  • each country has an independent website, on a country code TLD for most countries and on .com for the US.
  • if the user goes to a country code TLD, they go directly to the website for that country. If the user goes to the generic .com, then their location is determined from their IP address, and they go to the site for their location: .com for the US or when no match is found, or a country code TLD for the country they are in. Some locations may be mapped to neighboring countries, where appropriate, if they do not have their own website.
  • if the country has multiple languages that are commonly spoken, then language negotiation is used to read the user's language preference from the browser settings. A default language is set for each country, in case language negotiation fails to find a matching language.
  • a flag icon, with the name of the country in the currently selected language, appears with other navigation links near the top of the website. The flag indicates the country the website is targeted for, and the language the name is written in indicates the current language; clicking it lets the user select which country's website they wish to see.
  • countries with multiple languages appear multiple times in the list, each one showing the same flag
  • each item shows a country name and language name (for example, "US - English") in the language itself. Items are sorted in Unicode order to allow for consistent sorting across languages.
  • if the user manually selects a locale, their preference is saved in their account (if logged in to an account) or as a cookie (if not logged in)
  • each page has a content identifier--most likely a URL path on the .com site (although not all of these identifiers will exist as true URLs). When the user changes national websites, the new site tries to find a match for the content identifier, moving up one level of the path at a time until a match is made. For example, suppose the user is on a page of the Japan site about an upcoming special event in Tokyo. Let's say the content identifier is /news/events/tokyo/groundbreaking-ceremony. The user switches to the website for Turkey, which has no page with a /news/events/tokyo/groundbreaking-ceremony content identifier. The lookup moves a level up; no page is found with a /news/events/tokyo content identifier. It moves up another level; a page is found with a /news/events content identifier (the actual URL, of course, would be quite different and in Turkish). The user lands on a page about upcoming company events relevant to Turkey. Since there is no exact match for the page they were looking at, the site finds something as close as possible rather than just dumping them on the home page.
  • for multi-language national websites, the language is set as a subdomain. The Switzerland localized website for French speakers would be something like fr.mycompany.ch, while the German version would be de.mycompany.ch.
  • each national website may have wildly different content and design, related to what the company is doing or selling in that country or how they wish to portray themselves, but every page is translated into every supported language for that country.
  • when the user changes languages on the same national website, they go to the same URL, but with the other subdomain, loading the same page in the newly selected language.
  • each website/language appropriately sets the lang attribute
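
The content identifier fallback described in the list above could be sketched roughly as follows. This is only an illustration under my own assumptions: the function name find_closest_page and the set of Turkish content identifiers are hypothetical, and a real site would look identifiers up in its CMS rather than in an in-memory set.

```python
from typing import Set


def find_closest_page(content_id: str, available: Set[str]) -> str:
    """Return the closest content identifier that exists on the target site.

    Walks up the path one level at a time and falls back to the
    home page ("/") only when nothing along the way matches.
    """
    segments = [s for s in content_id.split("/") if s]
    while segments:
        candidate = "/" + "/".join(segments)
        if candidate in available:
            return candidate
        segments.pop()  # drop the last path level and try again
    return "/"  # last resort: home page


# Hypothetical content identifiers available on the Turkey site
turkey_pages = {"/", "/news", "/news/events", "/products"}

print(find_closest_page("/news/events/tokyo/groundbreaking-ceremony", turkey_pages))
# prints /news/events
```

The returned identifier would then be mapped to the real (Turkish) URL by whatever routing the national site uses.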

That's the best I can think of to allow complete marketing freedom in each geographic market, while trying to keep things as simple for the user as possible. IMHO, the switcher should be obvious and easy to find, but take little space in the design. The only case where I think the approach outlined above would falter is when someone resides in (or is interested in) a particular country but speaks a language not commonly spoken there. A German visitor could see the national website for Germany in German, or the Mexican website in Spanish, but could not see the Mexican website in German. Since each website would be so different, it would not be economical to translate a national website into a language not commonly spoken in that country.

Note: I realize flags should not be used for language, only for countries. However, when dealing with the combination of country and language, I think it's easier to show multiple flag/country/language items rather than make the user go through multiple steps, such as first selecting a country and then selecting a language.
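
To make the combined flag/country/language list concrete, here is a small sketch of how the items might be built and sorted. The LocaleItem type, the build_selector function, and the sample labels are my own hypothetical names and data. Python compares strings by Unicode code point by default, which is the "Unicode order" I have in mind.

```python
from typing import List, NamedTuple


class LocaleItem(NamedTuple):
    country_code: str  # used to pick the flag icon
    label: str         # country and language name, written in that language


def build_selector(items: List[LocaleItem]) -> List[LocaleItem]:
    """Sort selector items by label in Unicode code point order."""
    return sorted(items, key=lambda item: item.label)


# Hypothetical sample data; Switzerland appears once per language
items = [
    LocaleItem("CH", "Schweiz - Deutsch"),
    LocaleItem("CH", "Suisse - Français"),
    LocaleItem("US", "US - English"),
    LocaleItem("TR", "Türkiye - Türkçe"),
    LocaleItem("JP", "日本 - 日本語"),
    LocaleItem("DE", "Deutschland - Deutsch"),
]

for item in build_selector(items):
    print(item.country_code, item.label)
```

One thing this makes visible: code point order is the same for every visitor, but it places non-Latin scripts after all Latin entries and separates the two Switzerland items whenever their names start differently.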

Now we move on to simpler websites: websites where the content is the same everywhere and we only care about languages, not locations. Many popular information websites would not want to create completely different content for each country; they simply want to offer their content translated into multiple languages to engage a larger readership, regardless of location. Here are my thoughts on that situation.

  • language negotiation is used to read the user's language preference from the browser. That language is selected, or the default language is used if no match is made
  • for purposes of SEO and clarity, use subdomains (not subfolders) for languages.
  • instead of location-focused symbols like flags or a globe, the Language Icon (http://www.languageicon.org/) is shown with other navigation near the top of the page, along with the name of the current language (displayed in the current language). This shows what is currently selected and makes it clear that the user can click that to change the language.
  • when the list of languages appears, each language name is shown in its own language, sorted by Unicode order
  • when an RTL language is used, sidebars should be displayed on the opposite side, the dir attribute (or the CSS direction property) of text blocks should be changed, and horizontal menus should be aligned in the opposite direction. (On a technical note, I suspect this would be best accomplished by setting an rtl or ltr class on the body element, based on the language, and then writing two versions of the relevant CSS rules.)
  • if the user manually selects a different language, this preference is saved to their account (if logged in) or as a cookie (if not logged in)
  • any internal link the user follows takes them to that URL in the currently selected language. If they change languages, they access the same page, but using the subdomain for the new language
  • the only comments displayed on a blog post are those made on the same language version of the site. You don't see all the comments, but you see the comments that are (presumably) in the same language as the content.
  • each website/language appropriately sets the lang attribute
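
The language negotiation step in the list above could look something like the sketch below. It reads the Accept-Language header the browser sends and picks the first supported language, falling back to the default. The function names are my own, and the matching is deliberately simplified (exact tag first, then the primary subtag); it is not a full implementation of the standard matching rules.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # sort by descending quality, then by original position
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def negotiate_language(header: str, supported: List[str], default: str) -> str:
    """Pick the best supported language, or the default if none matches."""
    by_lower = {code.lower(): code for code in supported}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ch" -> "fr"
        if primary in by_lower:
            return by_lower[primary]
    return default


print(negotiate_language("fr-CH, fr;q=0.9, en;q=0.8", ["de", "fr", "it"], default="de"))
# prints fr
```

A manually selected language (from the account or cookie) would take priority over this result; negotiation only supplies the initial guess.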

What about heavily crowdsourced websites, like Wikipedia? That would be a less common situation; the crux there is that the website is basically the same across languages, but not every page is translated into every language. I think the Wikipedia approach would be the best here: show the list of available languages in the sidebar. This list will be longer for some articles and shorter for others.

I tried to read and research what I could and look at existing websites, but that provided me with more conflict and fewer solid best practices than I hoped, so most of this is based on whatever sounds to me like it makes the most sense at the moment. I'm sure it's not perfect; I have many doubts. For example, I don't know if it's better to have a small language selector at the top of the site, or to have a list of available languages spelled out at the bottom... I could go on and on. In any case, please let me know your thoughts--which of these best practices would you correct, and why? I want to learn the best approach from a UX perspective.