Tuesday, March 29, 2016

The Guide to International Website Expansion: Hreflang, ccTLDs, & More!

Posted by katemorris

Growth. Revenue, visits, conversions. We all want to see growth. For many, focusing on a new set of potential customers in another market (international, for instance) is a source of growth. It can sometimes seem like an easy expansion. If your current target market is in the US, UK, or Australia, the other two look promising. Same language, same content — all you need is to set up a site for them and target it at them, right?

International expansion is more complicated than that. The ease of expansion depends highly on your business, your resources, and your customers. How you approach expansion and scale it over time takes consideration and planning. Once you’ve gone down a path of URL structure and a process for marketing and content, it’s difficult to change.

This guide is here to help you go down the international expansion path on the web, focused on ensuring your users see the right content for their query in the search engines. This guide isn’t about recommendations for translation tools or how to target a specific country. It is all about international expansion from a technical standpoint that will grow with your business over time.

At the end is a bonus! A flow chart to help you troubleshoot international listings showing up in the wrong place in the SERPs. Have you ever wondered why your Canadian page showed for a user in the US? This will help you figure that out!

Before we begin: Terminology

ccTLD – A country-specific top-level domain. These are assigned by ICANN and are geo-targeted automatically in Google Search Console.

gTLD – A generic top-level domain. These are not country-specific, and if used for country-specific content, they must be geo-targeted inside of Google Search Console or Bing Webmaster Tools. Examples include .com, .net, and .tv (technically Tuvalu's ccTLD, but treated as generic by Google). Examples from Google found here.

Subdomain – A major section of a domain, distinguished by a change to the characters before the root domain. The most-used standard subdomain is www. Many sites start with www.domain.com as their main subdomain. Subdomains can be used for many reasons: marketing, region targeting, branded micro sites, and more.

Subfolder – A section of a subdomain/domain. Subfolders are sections marked by a trailing slash. Examples include www.domain.com/subfolder, or in terms of this guide, www.domain.com/en or www.domain.ca/fr.

Parameter – A modifier of a URL that either tracks a path of a user to the content or changes the content on the page based on the parameters in the URL. These are often used to indicate the language of a page. An example is www.domain.com/page1?lang=fr, with lang being the parameter.

Country – A recognized country that has a ccTLD assigned by ICANN or an ISO code. Google uses ISO 3166-1 Alpha-2 country codes for hreflang.

Region – Collections of countries that the general public groups together based on geography. Examples include the EU or the Middle East. These are not countries and cannot be geo-targeted at this time.

Hreflang – A tag used by Google to allow website owners to indicate that a specific page has a copy in another language. The tags indicate all other translated versions of that page along with the language. The language tags can have regional dialects to distinguish between language differences like British English and American English. These tags can reside on-page or in XML sitemaps.

Meta language – The language-distinguishing tag used by Bing. This tag merely informs Bing of the language of the current page.

Geo-targeting – Both Bing Webmaster Tools and Google Search Console allow website owners to claim a specific domain, subfolder, or subdomain, and inform the search engine that the content in that domain or section is developed for and targeted at the residents of a specific country.

Translation – Changing content from one language or regional dialect to another language or regional dialect. This should never be done with a machine, but rather always performed by someone fluent in that language or regional dialect.

Understanding country and language targeting

The first step in international expansion planning is to determine your target. Country targeting and language targeting are often confused. Most businesses start international expansion wanting to do one of two things:

  1. Target users that speak another language.
    Example – A business in Germany: “We should translate our content to French.”
  2. Target users that live in another part of the world.
    Example – A business in Australia: “We should expand into the UK.”

False associations: Country and language

The first issue people run into is associating a country and a language. Many of the world’s top languages have root countries that share the same name; specifically, France/French, Germany/German, Portugal/Portuguese, Spain/Spanish, China/Chinese, Japan/Japanese, and Russia/Russian. Many of these languages are used in a number of other countries, however. Below is a list of the top languages used by Internet users.

[Image: the top languages used by Internet users]

Please note this is not the list of top languages in the world; that is a vastly different list. This list is based on Internet usage. And there are some languages that only have one country set as the official language, but users exist in other countries that browse the Internet with that language as their preferred language. An example might be a Japanese national working in the US setting up a new office.

Another note: the “main” country chosen above is either the country where the language originated (as with English) or the country whose name matches or resembles the language name. This is how many people associate languages with countries in most instances, but those assumptions are not correct.

Flags and languages

We must disassociate languages and countries. There are too many times when a country flag is used to note a language change on a site. Flags should only be used when the country is being targeted, not the language.


Web technology and use impacts targeting

The second issue arises in the execution. The business in Germany from the first few examples might hire a translator from France and translate their content to French. From there, the targeting can get confused based on where that content is placed and how it is tagged.

Below are some ways the business might post the translated content. This table looks at a variety of combinations of ccTLDs, gTLDs, subfolders, subdomains, hreflang tagging, and geo-targeting. Each combination of URL setup and tagging results in different targeting in the eyes of the search engines, and that changes the base number of Internet users in the targeted group.

[Image: table of URL structure, hreflang tagging, and geo-targeting combinations and the targeting each produces]

Given the above, you can see that the implementation is not as straightforward as it might seem. There's no single right answer in the above possible implementations. However, many of them change the focus of the original target market (speakers of the French language) and that has an impact on the base target market.

International search strategy tool

This is what many of us face when trying to do international expansion. There is conflicting data on what should be done. This is why I developed a tool to help businesses determine which route they should take in international expansion. It helps them determine what their real focus should be (language, country, or if they need to use both) and narrows down the list of choices above while understanding their business needs, resources, and user needs. It's developed over the years from a flow chart, to a poorly designed tool, to a better-structured tool found by clicking the link in the image below.

Start with those questions and then come back here when you have other questions. That’s what the rest of this guide is about. It’s broken down into three types of targeting:

  1. Language
  2. Country
  3. Hybrid (multiple countries with multiple languages)

No one type is easier than another. You really need to choose the path early on and use what you know of your business, user needs, and resources.

Language targeting

Language-only targeting can seem like the easiest route to take, as it doesn’t require a major change and multiple instances of marketing plans. Country-focused targeting requires new targeted content for each targeted country. There are far fewer languages in the world than countries. In addition, if you target the major world languages, you could potentially start with a base of millions of users that speak those languages.

However, language targeting involves two very tricky components: translation and language tagging. If either of these components is not done right, it can cause major issues with user experience and indexation.

Translation

The first rule of working with languages and translation is NEVER machine translate. Machine translation is highly inaccurate. I was just at an all-inclusive resort in Mexico, and you could tell the translations were done by a machine, not a person. Using machine translations produces a very poor user experience and poor SEO targeting as well.

Translations of content should always be done by a human who is fluent in both the target language and the original language of the content. If you are dealing with regional variations, it is recommended to get a fluent translator who is native to and/or living in that area.

Spending the right resources on translation will ensure the best user experience and the most organic traffic.

Language tagging: Hreflang and meta language

When people hear about translation and international expansion, the first thing they think about is the hreflang tag. Relative to the Internet, the hreflang tag is new: it launched in late 2010. As of when this post was written, it is only used by Google. If the bulk of your traffic comes from Google and you are translating only, this is of use to you. However, do know that Bing uses a different tag format, called the meta language tag.

Tips: Ensure that every translated page carries an hreflang tag pointing to every other translated instance of that page. I prefer the tags be put in XML sitemaps (instructions here) to keep the tagging off the page, as any code removed from the page reduces page load time, no matter how slightly. Do what works for your team.
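To make the sitemap approach concrete, here is a minimal sketch in Python that writes hreflang annotations into an XML sitemap using `xhtml:link` elements. The function name `build_hreflang_sitemap`, the input shape, and the example URLs are assumptions made for illustration; they are not from a specific tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(page_groups):
    """Build a sitemap string with hreflang alternates.

    page_groups is a list of dicts (hypothetical input shape). Each dict maps
    an hreflang code to the URL of that version of one page.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in page_groups:
        for url in versions.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # Every version lists all versions of the page, itself included,
            # so each translation points to every other translation.
            for code, alt_url in versions.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    rel="alternate",
                    hreflang=code,
                    href=alt_url,
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_hreflang_sitemap([
        {
            "en": "http://www.domain.com/en/page1",
            "fr": "http://www.domain.com/fr/page1",
        }
    ]))
```

Generating the annotations from one mapping per page keeps the set of translations in a single place, which makes it harder for one version to be left out.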

What about x-default?

One of the tagging mistakes that happens most often is using x-default. Many people misunderstand its use. X-default was added to the hreflang markup family to help Google serve un-targeted pages, like those from IKEA and FedEx, to users for whom the site has no language-targeted content or whom Google doesn’t know where to place. This tag is not meant to set the "original" page.
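As a rough illustration of where x-default fits, the sketch below builds on-page hreflang link tags and adds an x-default entry only when an un-targeted page (such as a language or country selector) is supplied. The helper name `hreflang_link_tags` and the URLs are hypothetical.

```python
from html import escape


def hreflang_link_tags(versions, x_default=None):
    """Return <link> tags for one page's translations.

    versions maps hreflang codes to URLs. x_default, if given, should be the
    un-targeted page (for example a selector page), not the "original" page.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in versions.items()
    ]
    if x_default:
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return tags


if __name__ == "__main__":
    for tag in hreflang_link_tags(
        {"en": "http://www.domain.com/en/", "fr": "http://www.domain.com/fr/"},
        x_default="http://www.domain.com/",
    ):
        print(tag)
```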

Checking for tagging issues

Once you have your tagging live (or on a testing server that is crawlable by Google but not indexable), you can check for issues inside of Google Search Console. This will let you know what tag issues you are having and where they're located.
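Some problems can also be caught before launch. Since every translated page should point to every other translation, one simple check is that each alternate links back to the page that references it. The sketch below assumes you have already collected each page's declared alternates into a dictionary; the function name `find_missing_return_tags` and the data shape are illustrative assumptions, and this is not a substitute for the Search Console report.

```python
def find_missing_return_tags(annotations):
    """Find alternates that do not link back.

    annotations maps a page URL to a dict of {hreflang code: alternate URL}
    declared on that page. Returns (page, target) pairs where target does not
    list page among its own alternates.
    """
    missing = []
    for page, alternates in annotations.items():
        for target in alternates.values():
            if target == page:
                continue  # a self-reference needs no return tag
            # The target must be a known page and must point back to this one.
            if page not in annotations.get(target, {}).values():
                missing.append((page, target))
    return missing


if __name__ == "__main__":
    en = "http://www.domain.com/en/"
    fr = "http://www.domain.com/fr/"
    # The French page forgot to list the English page.
    print(find_missing_return_tags({
        en: {"en": en, "fr": fr},
        fr: {"fr": fr},
    }))
```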

URL selections

Choosing the URL structure of your language extensions is totally up to you. If you are focusing on language targeting only, don’t use a ccTLD. Those are meant for targeting a specific country, not a language. ccTLDs automatically geo-target and that selection cannot be changed. Your other choices are subfolder, subdomain, and parameter. They're listed below in order of my professional preference and why.

  1. Subfolders provide a structure that's easier to build upon and develop as your site and business grow and change. You might not want to target specific countries now or have the resources, but you may someday. Setting up a subfolder structure allows you to use the same structure on any ccTLDs or subdomains you add for country sections in the future. Your developers will appreciate this choice because it's scalable for hreflang tags, as well.
  2. Parameters allow a backup system in case your tagging fails in a site update in the future. Parameters can be defined in Google as being used to modify the language on the page. If your other tags are lost, that parameter setting is still telling Google that the content is being translated.
    Using a parameter for language is also scalable for future plans and easy for tagging, like subfolders. The downsides are that they're ugly and might accidentally be negated by a misplaced rel canonical tag in the future.
  3. Subdomains for language targeting are my least favorite option. Only use them if they're the only option you have, by decree of your technical team. Using subdomains for languages means that if you change plans to target countries in the future, you'll lose many options for URLs there. To follow the same structure for each country, you would need to use ccTLDs; while those are the strongest signal for geo-targeting, they are also the option that requires the most investment.

Notice that ccTLDs are not on this list. Those are only for geo-targeting. Unless you're changing your content to focus on a specific country, do not use ccTLDs. I say this multiple times for a reason: too many websites make this mistake.
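To compare the three language-only options side by side, here is a small sketch that builds the URL for one page under each structure. The function `language_url` and its parameters are hypothetical, and domain.com is the same placeholder domain used throughout this guide.

```python
from urllib.parse import urlencode


def language_url(path, lang, structure, domain="domain.com"):
    """Build a language-targeted URL on a gTLD (never a ccTLD).

    structure is one of "subfolder", "parameter", or "subdomain".
    """
    path = "/" + path.lstrip("/")
    if structure == "subfolder":
        return f"http://www.{domain}/{lang}{path}"
    if structure == "parameter":
        return f"http://www.{domain}{path}?{urlencode({'lang': lang})}"
    if structure == "subdomain":
        return f"http://{lang}.{domain}{path}"
    raise ValueError(f"unknown structure: {structure}")


if __name__ == "__main__":
    for structure in ("subfolder", "parameter", "subdomain"):
        print(structure, language_url("page1", "fr", structure))
```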

Detecting languages

Many companies want to try to make the website experience as easy as possible for the user. They attempt to detect the user’s preferences without needing input from the user. This can cause problems with languages.

There are a few ways to try to determine a user’s language preferences. The most-used are browser settings and IP address. It is not recommended to ever use the IP address for language detection. An IP address can show an approximate user location, but not their preferred language. IP-based location is also highly inaccurate (just the other day I was "in" North Carolina, and I live in Austin), and Google still only crawls from a US IP address. Any automatic redirects based on IP should be avoided.

If you choose to try to guess at the user’s language preference when they enter your site, you can use the browser’s language setting or the IP address and ask the user to confirm the choice. Using JavaScript to do this will ensure that Googlebot does not get confused. Pair this with a good XML sitemap and the user can have a great interaction. Plus, the search engines will be able to crawl and index all of your translated content.
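The browser's language setting reaches the server as the Accept-Language request header. The paragraph above suggests doing the prompt in JavaScript; the sketch below only shows the guessing logic, written in Python as a server-side assumption. The function names and the `available` set are hypothetical, and the result is meant to drive a "Would you prefer this page in French?" prompt, not an automatic redirect.

```python
def preferred_languages(accept_language):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(accept_language.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a tag with no q value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by quality (descending), then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def suggest_language(accept_language, available, default="en"):
    """Pick a language to SUGGEST to the user; never redirect on this."""
    for tag in preferred_languages(accept_language):
        if tag in available:
            return tag
        base = tag.split("-")[0]  # fall back from "fr-ca" to "fr"
        if base in available:
            return base
    return default


if __name__ == "__main__":
    print(suggest_language("fr-CA,fr;q=0.9,en;q=0.8", {"en", "fr"}))
```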

Country targeting, AKA geo-targeting

If your business or content changes depending on the location of the user, country targeting is for you. This is the most common answer for those businesses in retail. If you offer a different set of products, if you have different shipping, pricing, grouping structure, or even different images and descriptions, this is the way to go.

Example: If a greeting card business in the US wanted to expand to Australia, not only are the prices and products different (some different holidays), the Christmas cards are VASTLY different. Think of Christmas in summer, as it is in Australia, and only being able to pick from cards with winter scenes!

Don’t go down the geo-targeting route if your content or offerings don’t change or you don’t have the resources to change the content. If you launch country-targeted content in any URL structure (ccTLD, subdomain, or subfolder) and the content is identical, you run the risk of users coming across another country’s section.

Check out the flow chart at the end to help figure out why one version of your site might be ranking over another.

Example: As a web development service in Canada, you want to expand into the US. Your domain at the moment is www.webdevexpress.ca (totally made up!). You buy www.webdevexpress.us (that’s the ccTLD for the US, by the way). Nothing really needs to change, so you just use the same content and go live. A few months down the road, US clients are still seeing www.webdevexpress.ca when they do a brand name search. The US domain is weaker (fewer links, mentions, etc.) and has the same content! Google is going to show the more relevant, stronger page when everything is the same.

Regions versus countries

Knowing what country or which countries you want to focus on in expansion is usually decided before you determine how to get there. That's what spawns the conversation.

There's one misconception that can throw off the whole process of expansion, and that is that you can target a region with geo-targeting. As of right now, you can purchase a regional top-level domain like .eu, but those are treated as generic top-level domains like .com or .net.

The search engines only operate geo-targeting in terms of countries right now. The Middle East and the European Union are collections of countries. If you set up a site dedicated to a region, there are no geo-targeting options for you.

One workaround is to select a primary country in that region, perhaps one in which you have offices, and geo-target to that country. It’s possible to rank for terms in that primary language in surrounding countries. We see this all the time with Canada and the US. If the content is relevant to the searcher, it’s possible to rank no matter where the searcher is.

Example: If you’re anywhere other than the UK, Google "fancy dress" — you see UK sites, right? At least in the US, "fancy dress" is not a term we use, so the most relevant content is shown. I can’t think of a good Canadian/US term, but I guarantee there are some out there!

URL selections

The first thing to determine in geo-targeting beyond the target countries is URL structure. This is immensely important because once you choose a structure, every country expansion should follow that. Changing URL structure in the future is difficult and costly when it comes to short-term organic traffic.

In order of my professional preference, your choices are:

  1. Subfolders. As with the language/translation option, this is my preferred setup, as it utilizes the same domain and subdomain across the board. This translates to utilizing some of the power you already built with other country-focused areas (or the initial site). This setup works well for adding different translations within one country (hybrid approach) down the line.
    Note: If you go with subfolders on both, always lead with the country, then language down the line.
    Example:
    www.domain.com/us/es (US-focused, in Spanish language) or www.domain.com/ca/fr (Canada-focused, in Canadian French).
  2. ccTLDs. This is the strongest signal that you're focusing your content on a specific country. They geo-target automatically (one less step!), but that has a downside as well. If you started with a ccTLD and expanded later, you can’t geo-target a subfolder within a ccTLD at this point in time.
    Example: www.domain.ca/us will not work to target the US. The target will remain Canada. It might rank in the US, depending on the term competition and relevance, but you can’t technically geo-target the /us subfolder within the Canadian ccTLD.
  3. Subdomains. My last choice, because while you're still on the same root domain, there's that old SEO part of me that thinks a subdomain loses some equity from the main domain. BUT, if your tech team prefers this, there's nothing wrong with using a subdomain to geo-target. You'll need to claim each subdomain in Search Console and Bing Webmaster Tools and set the geo-target for each, just as you would with subfolders.
    Example: gb.domain.com

Content changes

The biggest question asked when someone embarks on country-targeting expansion is: “How much does my content need to change to not be duplicated?” In short — there is no magic number. No metric. There isn’t a number of sentences or a percentage. How much your content needs to change per country site or subsite is entirely up to your target market and your business.

You'll need to do research into your new target market to determine how your content should change to meet their needs. There are a number of ways you might change your content to target a new country. The most common are:

Product differentiation

If you offer a different set of products or services to different countries by removing those that are not in demand, outlawed, or otherwise not wanted, or by adding new products for that country specifically, that is changing your site content.

Example #1: Amazon sells the movie "Elf" in the US and the UK, but they are different products. DVDs sold in Europe are region-coded for Europe and might not play on US players.

Example #2: Imagine you're a drugstore in the UK and want to expand to the US. One of your products, 2.5% Selenium Sulphide, is not approved for use in the US. This is one among hundreds or thousands of products that are different.

Naming schema

The meaning of product names can change in different countries. How a specific region terms a product or service can change as well, making it necessary to change your product or service naming schema.

Keyword usage

Like the above, the words you use to describe your products or services might change in a new country. This can look like translation, but if it’s the change of just a few terms, it’s not considered full translation. There's a fine line between these two things. If you realize that the only thing you're changing is the wording between US and UK English, for example, you might not need to geo-target at all and can instead mark the different pages as translations.

Keyword use change example: "Mum" versus "Mom" or "Mother" when it comes to Happy Mother’s Day cards. You need to offer different cards in this and other categories because of the country change. This is more than a word change, so it’s a case of geo-targeting — not just translation.

Translation change example: Etsy.com. Down at the bottom of the page, you can change your language setting. I set mine to UK English, and words like "favourite" started to show up. If this sounds like what you would need to do and your content would not change otherwise (Etsy shows all content to all users regardless of their location), consider translation only.

Pricing structure

Pricing is one of the most common things that change in country-specific content. There's the issue of different currency, but more than that, different countries have different supply and demand markets that should and will change your pricing structure.

Imagery changes

When dealing with different cultures, sometimes you find the need to change your site imagery. If you’ve never explored psychology, I highly recommend checking out The Web Psychologist – Nathalie Nahai and some of her talks. Understanding your new target market’s culture is imperative to marketing effectively.

Example: Samsung changes the images on their UK versus China sites to change the focus from an individualistic to a collectivistic culture. See my presentation at SearchLove San Diego for more examples.

Laws, rules, and regulations

One of the most important ways to change your content is to satisfy the local laws and regulations. This is going to depend on each business. You might deal with tons, while others might deal with none. Check out local competitors — the biggest you can identify — to see what you might need to do.

Example: If you move into the UK and set cookies on your visitor’s machine, you have to alert them to the use of cookies. This is not a law in the US and is easily missed.

User experience and IP redirects

When people start moving into other countries, one of the things they want to ensure is that users get to the right content. This is especially important when products change and the purchase of an incorrect product would cause issues for the user, or the product isn’t available to them. Your customer service, user experience, or legal team is going to ask that you redirect users to the correct country. Everyone gets to the right place and the headaches lessen.

There isn’t anything wrong with asking a user to select the country they reside in and set a cookie, but many people don’t want to bother their users. Therefore, they detect the user’s IP address and then force a redirect from there. There are two problems with this setup.

  1. IP addresses are inaccurate – I was in Seattle, WA once and my IP had me in Washington, DC. No kidding. Look at that distance on a map. Think about that distance in terms of Europe and how much might change there.
  2. Google crawls from California – For the time being, using an IP-based forced redirect will ensure your international content is not indexed. Google will only ever see the US content if you do a forced redirect.

You can deal with this by detecting the country using the IP address (or, for organic traffic, the version of Google the user came from) and using a JavaScript popup to ask what their preferred country is, then setting a cookie with that preference. Even if the user clicks on another country’s content in the future, they will be redirected to their own.
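The decision logic described above can be sketched as follows. This is an assumption about how one might wire it up, not a prescribed implementation: `country_action` is a hypothetical helper, the IP lookup is assumed to happen elsewhere, and the "prompt" result stands for the JavaScript popup. Because crawlers do not carry the preference cookie, they are never redirected.

```python
def country_action(requested, cookie_country=None, ip_country=None):
    """Decide what to do for a request to one country's section.

    Returns an (action, country) tuple where action is:
      "serve"    - show the requested section as-is
      "prompt"   - show the section plus a popup suggesting another country
      "redirect" - send the user to the country they explicitly chose earlier
    """
    if cookie_country:
        # Only a preference the user confirmed (stored in a cookie)
        # is allowed to trigger a redirect.
        if cookie_country != requested:
            return ("redirect", cookie_country)
        return ("serve", requested)
    if ip_country and ip_country != requested:
        # The IP is only a guess, so ask instead of forcing a redirect.
        return ("prompt", ip_country)
    return ("serve", requested)


if __name__ == "__main__":
    print(country_action("ca", ip_country="us"))      # first visit, no cookie
    print(country_action("us", cookie_country="ca"))  # returning user
```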

No hreflang??

If you went through that tool, you noticed that my geo-targeting plan does not include hreflang. Many other people disagree with me on this point, saying that the more signals you can send, the better.

Before I get into why I don’t recommend setting up hreflang between country-targeted sub-sites, let me make one thing clear. Setting up hreflang will not hurt your site if you are really focusing on country targeting and it’s not that intricate of a setup yet (more on that later). Let’s say you're in Canada and want to open a US-targeted site. Your content changes because your products change, your prices change, your shipping info changes. You create domain.com/us and geo-target it to the US. You can add hreflang between each page that is the same between the two sub-sites, such as two products that exist in both locations. The hreflang will not hurt.

Example: Say you don’t have the resources at the moment to change your content to fully target the UK, so you only translate your content a bit between your US (domain.com) and UK (domain.co.uk) sites, with plans to change the content down the road. In that case, an hreflang tag between those two domains can help Google understand the content change and who you're targeting.

Why I don’t recommend hreflang for geo-targeting only

Hreflang was meant to help Google understand when two pages are exactly the same, but translated. It works much like a canonical tag (which is why using another canonical can be detrimental to the hreflang working) in that you have multiple versions of one page with slight changes.

Many people get confused because there's the ability to use country codes in the hreflang tags. This is for when you need to tell Google of a dialect change. An example would be if you have two sub-sites that are identical, but the American English has been changed to British English. It's not meant to inform Google that content that's targeted at a different country is targeted at that country.
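Because the country part of an hreflang value must be an ISO 3166-1 Alpha-2 code, a quick format check can catch slips such as "en-uk" (the code for the United Kingdom is GB). The sketch below is illustrative only: `check_hreflang` is a hypothetical helper, the two code sets are deliberately tiny samples rather than the full ISO lists, and it ignores script subtags such as zh-Hans for simplicity.

```python
import re

# Small illustrative samples, NOT the complete ISO 639-1 / ISO 3166-1 lists.
LANGUAGES = {"en", "fr", "de", "es", "pt", "zh", "ja", "ru"}
COUNTRIES = {"US", "GB", "CA", "AU", "FR", "DE", "ES", "PT", "CN", "JP", "RU", "MX"}


def check_hreflang(value):
    """Return None if value looks valid, otherwise a short error message."""
    if value.lower() == "x-default":
        return None
    match = re.fullmatch(r"([A-Za-z]{2})(?:-([A-Za-z]{2}))?", value)
    if not match:
        return "expected 'll' or 'll-CC' (language, optional country)"
    language, country = match.group(1).lower(), match.group(2)
    if language not in LANGUAGES:
        return f"'{language}' is not a known language code"
    if country and country.upper() not in COUNTRIES:
        return f"'{country}' is not a known country code"
    return None


if __name__ == "__main__":
    for value in ("en-gb", "en-uk", "fr", "us", "en_US"):
        print(value, "->", check_hreflang(value) or "ok")
```

Note that a country code on its own ("us") is rejected: hreflang values always start with a language.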

When I recommend geo-targeting only, I make it very clear to clients that going down this route means you really need to change the content. International business is so much more than just translation. Translating content only might hurt your conversion rates if you miss some aspect of the new target market.

Hiring content writers in that country that understand the nuances is very important. I worked for a British company for 4 years, so I get some of the differences, but things continually surprise me still. I would never feel comfortable as an American writing content for a British audience.

I also don’t recommend hreflang in most geo-targeting cases, because the use of geo-targeting and hreflang can get really confusing. This has led to incorrect hreflang tags in the past that have wreaked havoc on Google's understanding of the site structure.

Example: A business starts off with a Canadian domain (domain.ca) and a French domain (domain.fr). They use hreflang between the English for Canada and French for France using the code below. They then add a US site and the code is modified to add a line for the US content.

<link rel="alternate" hreflang="en" href="http://domain.ca/" />
<link rel="alternate" hreflang="fr" href="http://domain.fr/" />
<link rel="alternate" hreflang="en-us" href="http://domain.com/" />

This looks odd because there is one English-language page with no regional modifications that is on a Canadian-targeted domain. There is a US regional English dialect version on a generic top-level domain (.com is generic and not US-specific, but people use it that way).

Remember, this is a bot that's trying to logic out a structure. For a user that prefers UK English, there is no logical choice. The general English is a Canadian site and the general TLD is in US English. This is where we get some of the inconsistencies with international targeting.

You might be saying things like “That would never happen!” and “They should have changed the first English to Canadian English (en-ca)!”, but if you've ever dealt with hurried developers (they really do have at least 50 requests at once sometimes) you'll know that they, like search bots, prefer consistency.

Hreflang should not be needed in geo-targeting cases because, if you're really going to target a new country-specific market, you should treat them as a whole new market and create content just for them. If you can’t, or don’t think it’s needed, then providing language translations is probably all you need to do at the moment. And hreflang in geo-targeting cases can lead to messy code that confuses the search engines. The less we confuse them, the better the results are!

Hybrid targeting

Finally, there is the route I call "hybrid," or utilizing both geo-targeting and translation. This is what most major retail corporations should be doing if they're international. Due to laws, currency, market changes, and cultural changes, there is a big need for geo-targeted content. But in addition to that, there are countries that require multiple language versions. There might be anywhere from one to a few hundred used languages in a single country! Here are the top countries that use the web and how many recognized languages are used in each.

[Image: the top countries that use the web and the number of recognized languages used in each]

Do you need to translate into all 31 languages used in the US? Probably not. But if 50% of your target market in Canada prefers Canadian French as their primary language, the translation investment might be a good one.

In cases where a geo-targeted site (ccTLD use) or sub-site (subdomain or subfolder) needs more than one language, then there is the need to geo-target the site or sub-site and then use hreflang within that country-specific site.

This statement can be confusing, so let me show you what I mean:

[Image: diagram of geo-targeted sites or sub-sites with hreflang used within each country-specific site]

This requires a good amount of planning and resources, so if you need to embark on this path in the future, start setting up the structure now. If you need to go the hybrid route, I recommend the following URL structures for language and country targeting. As with before, these are in order of my professional preference and are all focused on content targeted to Canada in Canadian French.

(Country structure/Language structure)

  1. Subfolder/Subfolder
    Example: domain.com/ca/fr
  2. Subfolder/Parameter
    Example: domain.com/ca/page.html?lang=fr
  3. ccTLD/Subfolder
    Example: domain.ca/fr
  4. ccTLD/Parameter
    Example: domain.ca/page.html?lang=fr
  5. Subdomain/Subfolder
    Example: ca.domain.com/fr
  6. Subdomain/Parameter
    Example: ca.domain.com/page.html?lang=fr
  7. ccTLD/Subdomain (not recommended, nor are the other combinations I intentionally left out)
    Example: fr.domain.ca
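The recommended combinations above can be expressed as URL templates. The sketch below reproduces the examples from the list and then shows the hreflang codes one might use between the language versions inside a single country section. The names `TEMPLATES`, `hybrid_url`, and `country_site_hreflang` are hypothetical, and the ccTLD templates assume the ccTLD matches the ISO country code, which is not always true (the UK uses .uk while its ISO code is GB).

```python
# (country structure, language structure) -> URL pattern, in order of preference.
TEMPLATES = {
    ("subfolder", "subfolder"): "{name}.com/{cc}/{lang}",
    ("subfolder", "parameter"): "{name}.com/{cc}/page.html?lang={lang}",
    ("cctld", "subfolder"): "{name}.{cc}/{lang}",
    ("cctld", "parameter"): "{name}.{cc}/page.html?lang={lang}",
    ("subdomain", "subfolder"): "{cc}.{name}.com/{lang}",
    ("subdomain", "parameter"): "{cc}.{name}.com/page.html?lang={lang}",
}


def hybrid_url(country_structure, language_structure, cc="ca", lang="fr", name="domain"):
    """Build an example URL for one country/language combination."""
    template = TEMPLATES[(country_structure, language_structure)]
    return template.format(name=name, cc=cc, lang=lang)


def country_site_hreflang(cc, languages):
    """Map hreflang codes to URLs for the languages within ONE country section.

    Uses the subfolder/subfolder structure (country first, then language).
    """
    return {
        f"{lang}-{cc}": hybrid_url("subfolder", "subfolder", cc=cc, lang=lang)
        for lang in languages
    }


if __name__ == "__main__":
    for key in TEMPLATES:
        print(key, hybrid_url(*key))
    print(country_site_hreflang("ca", ["en", "fr"]))
```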

The hybrid option is where the hreflang setup can get the most messed up. Make sure you have mapped everything out before implementing, and ensure you're considering future business plans as well.

I hope this helps clear up some of the confusion around international expansion. It really is specific to each individual business, so take the time to plan. Happy expansion!

Troubleshooting International SEO: A flowchart

[Image: flow chart for troubleshooting international listings that show up in the wrong place in the SERPs]

