Tuesday, October 13, 2015

The Anatomy of a Link - What Makes a Good (and Bad) Link?

Posted by Paddy_Moogan

The following is an excerpt from The Linkbuilding Book, an e-book by Paddy Moogan available for purchase and download. This chapter, entitled "The Anatomy of a Link," offers deeper insight into what makes for a quality link. Enjoy!

Not all links are created equal. One part of the Google algorithm is the number of links pointing at your website, but it would be foolish for Google to use a raw count without taking the quality of those links into account. Otherwise it would be a free-for-all, with everyone trying to get as many links as possible regardless of quality.

Back in the early days of search engine optimization, it was pretty much a free-for-all because the search engines were not as good at determining the quality of a link. Even the introduction of PageRank, combined with anchor text as a signal, didn’t deter link spammers. As search engines have become more advanced, they have been able to expand the link-related signals they use beyond raw numbers. They can look at a number of factors that combine to give them an indicator of quality. More to the point, they can tell whether a link is likely to be a genuine, editorially given link or a spammy one.

These factors are outlined in more detail below. There is something important to remember here, though: to a certain degree, it isn't the link itself you care about so much as the page and the domain you are getting the link from. Knowing what these factors are helps set the scene for the types of links you should (and shouldn’t) be getting for your own website.

Before diving into the finer details of links and linking pages, I want to take a much broader look at what makes a good link. To me, there are three broad elements of a link:

  • Trust
  • Diversity
  • Relevance

If you can get a link that ticks off all three of these, you’re onto a winner! In reality, this is quite hard to do consistently, but you should always keep it in the back of your mind.

Links that are trusted

In an ideal world, all of your links would come from trusted websites. By trust, we usually mean what Google thinks of a website; some people also refer to this as authority. As we’ve discussed, Google came up with PageRank as a way to objectively measure the trust of every page it finds on the web, based on the links pointing to it. Generally, the more PageRank a page has, the more trusted it is by Google, the more likely it is to rank, and the more likely it is to help you rank better if it links to you.

However, there is another concept here that you need to be aware of—TrustRank.

TrustRank differs from PageRank in that it is designed to be harder to game if you’re a spammer. Taken from the TrustRank paper, written in 2004:

Let us discuss the difference between PageRank and TrustRank first. Remember, the PageRank algorithm does not incorporate any knowledge about the quality of a site, nor does it explicitly penalize badness. In fact, we will see that it is not very uncommon that some site created by a skilled spammer receives high PageRank score. In contrast, our TrustRank is meant to differentiate good and bad sites: we expect that spam sites were not assigned high TrustRank scores.

Source: http://www.vldb.org/conf/2004/RS15P3.PDF

If you click through to this PDF to read the full paper on TrustRank, you’ll notice that it is a joint effort between Stanford and Yahoo. There was some confusion as to who came up with the original idea for TrustRank because of this. Also, a patent granted to Google in 2009 referring to “Trust Rank” describes a very different process to the one in the original paper from 2004.

For now, we’re going to briefly discuss the idea of TrustRank from 2004 and how it may be used by the search engines to calculate trust.

Let’s start with this simple diagram:

Starting from the left-hand side, imagine that you have a list of websites that you trust 100%. It may include sites like the BBC, CNN, The New York Times, etc. This “seed list” contains no spam whatsoever because these are super high-quality websites with a high level of editorial control. One step to the right is a list of websites that are one link away from the trusted seed set. The amount of spam increases ever so slightly, but not by much. Hat tip to Rand for the original visualization of this.

Now go to the far right of the diagram, and we can see that, even if a list of websites is just three links away from the trusted seed set, websites in that list are more likely to be spam—as many as 14% of them, in fact.

Therefore, the search engines could define their own trusted seed set of websites and use it as a starting point for crawling the web. As they crawl through these websites and follow the external links, they can see how far away any given website is from the trusted seed set. The implication is that the further a website is from the seed set, the more likely it is to be spam. While this isn’t an exact science, when you think of the billions of pages online that need to be measured for trust, it is a highly scalable approach, and the tests in the original paper showed that it worked well, too.
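To make the "distance from the seed set" idea concrete, here is a minimal Python sketch that counts how many link hops each site is from the nearest trusted seed, using a breadth-first search. It is a simplification: the 2004 paper propagates trust scores with a biased PageRank computation rather than counting hops, and the link graph and site names below are entirely hypothetical.

```python
from collections import deque

def distance_from_seeds(links, seeds):
    """Return the minimum number of link hops from any seed site to each reachable site.

    links: dict mapping a site to the list of sites it links out to.
    Sites that cannot be reached from the seed set are simply absent from the result.
    """
    dist = {site: 0 for site in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            if target not in dist:  # first visit is the shortest path in BFS
                dist[target] = dist[site] + 1
                queue.append(target)
    return dist

# Hypothetical link graph for illustration only.
links = {
    "bbc.co.uk": ["blog-a.com", "news-b.com"],
    "nytimes.com": ["news-b.com"],
    "blog-a.com": ["forum-c.com"],
    "forum-c.com": ["spammy-d.com"],
}
seeds = ["bbc.co.uk", "nytimes.com"]
print(distance_from_seeds(links, seeds))
```

Here spammy-d.com ends up three hops from the seeds, which is the kind of site the diagram suggests is more likely to be spam.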

Links that are diverse

There are two types of diversity that I want to cover here:

  • Diversity of linking domains
  • Diversity of link type

Both of these are important if we want to build good links and have a strong, robust link profile.

Diversity of linking domains simply means getting links from lots of different domains—not the same ones over and over again. I discuss this in much more detail below.

Diversity of link type means getting links from different types of domains. If all of your links are from web directories, that isn’t very diverse. If all of your links come from press release syndicators, that isn’t very diverse. I’m sure you see what I mean. A natural link profile will contain links from many different types of websites.
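If you have a list of your backlinks, both kinds of diversity are easy to summarize. The Python sketch below assumes you have exported (source URL, link type) pairs from a backlink tool and labeled the link types yourself; the labels and URLs shown are hypothetical. It treats "www.example.com" and "example.com" as the same domain.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Hostname of a URL, lowercased, with a leading "www." removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def link_diversity(backlinks):
    """Return (number of unique linking domains, share of links per link type).

    backlinks: list of (source_url, link_type) tuples. The link_type labels
    are ones you assign yourself, e.g. "directory", "press release", "blog".
    """
    if not backlinks:
        return 0, {}
    domains = {domain_of(url) for url, _ in backlinks}
    type_counts = Counter(link_type for _, link_type in backlinks)
    total = len(backlinks)
    return len(domains), {t: n / total for t, n in type_counts.items()}

# Hypothetical backlink data for illustration.
backlinks = [
    ("http://www.dir-one.com/listing/1", "directory"),
    ("http://dir-one.com/listing/2", "directory"),
    ("http://dir-two.com/page", "directory"),
    ("http://someblog.net/post", "blog"),
]
unique_domains, type_share = link_diversity(backlinks)
print(unique_domains, type_share)
```

A profile where one link type dominates, as directories do here, is the kind of low-diversity pattern described above.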

Links that are relevant

Here, the word "relevant" refers not to the page that the link is on but to the link itself. Anchor text allowed Google to discover the possible topic of a page without even having to crawl it, and it became a strong signal for them.

Therefore, we need to acquire links to our website that are relevant to us. One way to do this is to have the anchor text contain a keyword that we are targeting. However, caution is needed here in light of Google's 2012 updates, namely Penguin, which had a massive effect on link building.

I won't go into too much detail here so that I don't distract from the goal of this chapter, but the key takeaway is that building links with overly commercial, keyword-focused anchor text is a lot riskier than it used to be. It can still work, but it poses a risk if you overdo it.

Elements of a page that may affect the quality of a link

As we discussed at the start of this chapter, Google does not simply look at the raw number of links pointing at your website. It looks at many other factors to try to assess the quality of a link and how much value it should pass to the target page. Below, we will take a detailed look at what these factors could be and what they mean for your work as a link builder.

Some of these factors are mentioned in a patent filed by Google in 2004 and granted in 2010, which became known as the “reasonable surfer” model. It basically outlines how various elements of a link, as well as the page containing the link, may affect how Google treats a link.

Below we’ll take a look at these and explore how they may affect your work and what you need to remember about each of them.

Number of other outgoing links on a page

If the link pointing to your website is among hundreds or thousands of other outgoing links on a single page, then chances are that it isn't as valuable. From a user’s point of view, a page with hundreds of links is probably not particularly useful. There are, of course, exceptions, but on the whole, these types of pages do not provide a good user experience. There is also a chance that such pages were created only for links and have little real content on them, which is another signal of a poor user experience.

Also, going back to our knowledge of how PageRank works, the more outgoing links there are on a page, the less value each of those links passes. This isn’t a hard and fast rule, though, and it has been the topic of hot debate in the SEO industry for many years, particularly in relation to PageRank sculpting, which is discussed in another chapter.
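The dilution effect follows from the simplified formula in the original PageRank paper, where a page passes d × PR / C to each page it links to, with d the damping factor (the paper uses 0.85) and C the number of outgoing links. The sketch below only illustrates that textbook model; how Google weights links today is not public.

```python
def value_per_link(page_rank, outgoing_links, damping=0.85):
    """PageRank passed through each outgoing link under the classic, simplified model.

    The page's PageRank (scaled by the damping factor) is split evenly across
    all of its outgoing links, so each extra link dilutes the others.
    """
    if outgoing_links < 1:
        raise ValueError("the page must have at least one outgoing link")
    return damping * page_rank / outgoing_links

for n in (10, 100, 1000):
    print(n, value_per_link(1.0, n))
```

Going from 10 to 1,000 outgoing links cuts the share each link passes by a factor of 100 in this model.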

How this affects your work as an SEO

When seeking to get links from existing pages on a website, as opposed to new pages, take a look at the number of other outgoing links on a page using a tool such as Search Status (Firefox) or OpenSEO stats (Chrome). If the number looks very high, then you may want to consider whether the link is worth going for and spending time acquiring. Obviously you should take account of other factors too, such as whether the domain is a particularly strong one to get a link from, even if it is among hundreds of other links.
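If you would rather count outgoing links in a script than with a browser extension, Python's standard html.parser module is enough. This sketch assumes you have already downloaded the page's HTML; it counts http(s) links whose host differs from the page's own host (so "www.example.com" and "example.com" are treated as different hosts), and the sample markup is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class OutgoingLinkCounter(HTMLParser):
    """Collects absolute URLs of <a href> links that point to other hosts."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.external_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        # Skip mailto:, javascript: and similar, and links to the page's own host.
        if parsed.scheme in ("http", "https") and parsed.hostname != self.page_host:
            self.external_links.append(absolute)

def count_outgoing_links(html, page_url):
    parser = OutgoingLinkCounter(page_url)
    parser.feed(html)
    parser.close()
    return len(parser.external_links)

html = (
    '<p><a href="/about">About</a> <a href="https://other.com/x">X</a> '
    '<a href="http://third.org">Y</a> <a href="mailto:me@example.com">Mail</a></p>'
)
print(count_outgoing_links(html, "https://example.com/links"))
```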

You may also want to consider whether there is a genuine reason for the high number of other links on the page. If there is, then the link may still be worth going for. One thing you should definitely look out for is a lot of links to other websites that are not related to the topic of your page. In particular, look for links that appear to go to gambling, poker, pills, and health websites. If you see these, then you may be looking at a link exchange page, where the webmaster has only put those links in place because he or she got a link back from each site being linked to. These are the types of reciprocal links that Google does not like to see and will probably reduce the value of.
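A crude first pass at spotting these pages can be automated by checking outgoing URLs and anchor text for off-topic terms. This is only a rough heuristic sketch: the term list is hypothetical and will need tuning, and it cannot replace looking at the page yourself.

```python
# Hypothetical list of terms; tune it to the off-topic niches you keep running into.
SUSPECT_TERMS = ("gambling", "casino", "poker", "pills")

def suspicious_links(links, terms=SUSPECT_TERMS):
    """Return the URLs whose address or anchor text mentions any suspect term.

    links: list of (url, anchor_text) tuples taken from the page being assessed.
    """
    flagged = []
    for url, anchor in links:
        haystack = f"{url} {anchor}".lower()
        if any(term in haystack for term in terms):
            flagged.append(url)
    return flagged

outgoing = [
    ("http://best-poker-site.example/", "Play poker online"),
    ("http://gardening.example/tips", "Pruning tips"),
    ("http://cheap-pills.example/", "Buy now"),
]
flagged = suspicious_links(outgoing)
print(f"{len(flagged)} of {len(outgoing)} outgoing links look off-topic: {flagged}")
```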

One good rule of thumb here is to think whether the page is of value to a real user and whether someone may actually click through to your website from it. If the answer to both of these is no, then it may not be the best link to pursue.

The page having a penalty or filter applied

This is a bit of a controversial one. Google's official line has traditionally been that links from bad pages can't hurt you. There have been a few comments from Google employees to the contrary, but, on the whole, its stance has stayed the same. However, we have seen this stance downplayed a little in recent years, with comments from Googlers becoming a lot softer and no longer explicitly saying that links from bad pages can’t hurt you. My own personal experience (and that of many SEOs) is that links from bad or penalized pages can hurt you, and of course we’ve seen the Penguin update reduce the rankings of many websites that had low-quality links pointing at them. The release of the disavow tool was pretty much an acknowledgement that bad links could hurt you, since it gave you a way to deal with them.

I can see why Google, up until recently, held this public stance. They do not want to encourage people to deliberately point bad links at their competitors in an effort to hurt their rankings. The fact is that this is a practice which does happen a lot more than people think. Therefore, I feel it is one that every SEO should be aware of and know how to deal with. We will get into a lot more detail on identifying and removing link-based penalties in a later chapter, but for now we will stick within the context of this chapter.

How this affects your work as an SEO

You need to be able to identify links from pages which may be low quality in the eyes of Google. You also need to be able to spot low-quality pages when identifying possible link targets. We will explore a method for identifying large numbers of low-quality links in a link profile later on.

The quality of other websites being linked to from that page

There is the concept of being within a "bad neighborhood" when it comes to your link profile. This stems from the idea that if your website is seen to be clustered and associated with a load of other low-quality websites, it could be hurt and its trust lowered. One way to get into a bad neighborhood is to get links from the same places as low-quality, spammy websites. So if your website is linked to from the same page as 25 other websites, most of which are low quality, it isn't a good signal to send to Google.

This ties in with your analysis of the number of outgoing links on a page which we discussed earlier. Quite often, you will find that pages with very high numbers of outgoing links will have lower editorial standards. This naturally means that they are more likely to be linking to lower-quality websites. There is also the possibility that the list of links isn't checked very often for quality.

You definitely want to avoid instances of your website getting links from the same pages as low-quality websites. This helps Google see that you are a genuine website that doesn’t partake in any low-quality link building. If you find one or two instances of getting these types of links, then you probably will not have any issues. But if you find that you are getting many of your links from low-quality pages and bad neighborhoods, then you will want to take a closer look and see if these links are hurting you.

How this affects your work as an SEO

It can be hard to investigate the quality of every website being linked to from the page you are considering as a link target. You could do some scraping and assess the quality of outgoing links using some metrics, but doing this at scale can be quite intensive and take a lot of time. What I’d advise is developing your gut feeling and instincts for link building. Many experienced link builders will be able to look at a page and know right away if the outgoing links are to low-quality websites. This gut feeling only comes with time and practice.

Personally, if I look at a page of links and it looks like a link exchange page that doesn’t appeal to me as a user, it probably isn’t a high-quality page. I’d also look for lots of exact match keyword links to other websites, which is a tell-tale sign of low editorial standards.

Again, it can help to put yourself in the position of a user and assess whether the page is genuinely useful or not.

Number of incoming links to the page

If the page you are getting a link from has lots of other links pointing at it, that gives the page a level of authority that is then passed on to your website. Chances are, if the page is a genuinely good resource, it will accrue links over time, giving it a level of link equity that many spammy pages will never get. Therefore, a link from a page with lots of link equity is going to be far more valuable to you.

At the same time, if this page is a genuinely good resource, its editorial standards will be a lot higher, and you’ll have a tougher time getting your link placed. This is actually a good thing: the harder a link is to get, the more valuable it usually is.

How this affects your work as an SEO

When you are looking at a page as a possible link target, take a quick look at a few metrics to get a feel for how strong that page is and how many links it has. By far, the quickest way to do this is to have a few extensions or plugins added to your browser that can instantly give you some info. For example, if you have the Moz Toolbar installed, you can get a quick measure of the Page Authority and the number of links Moz has discovered pointing to that page.

Number of incoming links to the domain

This is similar to the above factor, but looking at the number of links pointing to the domain as a whole instead. The more quality links a domain has, the more likely it is to be a high-quality website.

Age of the domain

I'm not sure, personally, whether the age of a domain is strictly a factor, but with age comes authority if the website is high-quality. Also, if you get a link from a brand new domain, naturally that domain is not going to be that strong, as it has not had time to get many links. The reality is that you can’t affect the age of a domain, so you shouldn’t worry about it too much. The only way you could really use it is to filter a huge set of possible link targets. For example, you could filter link targets to show only those that are more than two years old, which may give you a slightly higher-quality set of results.

How this affects your work as an SEO

As mentioned, you can’t really affect this factor, so it's generally something you shouldn’t worry too much about. You can use it as a way to filter large sets of link targets, but there are many better metrics to use than domain age.
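If you do use domain age as a filter, it is simple to apply once you have registration dates. The sketch below assumes you already have (domain, registration date) pairs, for example from WHOIS lookups or a third-party tool's export; it does not fetch that data itself, and the domains shown are hypothetical.

```python
from datetime import date

def filter_by_domain_age(targets, min_years=2, today=None):
    """Keep link targets whose domain was registered at least min_years ago.

    targets: list of (domain, registration_date) tuples.
    """
    today = today or date.today()
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:  # today is 29 February and the target year is not a leap year
        cutoff = today.replace(year=today.year - min_years, day=28)
    return [domain for domain, registered in targets if registered <= cutoff]

targets = [
    ("old.example", date(2010, 5, 1)),
    ("new.example", date(2015, 1, 1)),
    ("edge.example", date(2013, 10, 13)),
]
print(filter_by_domain_age(targets, today=date(2015, 10, 13)))
```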

Link from within a PDF

Within a PDF file, you can link out to external websites, much in the same way you can on a normal webpage. If this PDF is accessible on the web, the search engines are capable of crawling it and finding the links.

How this affects your work as an SEO

In most cases, your day-to-day work will probably not be affected that much, given that many link building techniques involve standard links on webpages. But if you work in an industry where PDFs are regularly created and distributed in some form (e.g. white papers), you should take the time to make sure you include links and that they point to the right pages.

In this case, you can also take advantage of various websites that offer submission of PDFs or white papers to get more links. This can work well because some of these sites may not usually link to you from a standard web page. You need to ensure you're submitting to relevant and quality websites; otherwise, the links you get are not likely to make much difference to you in terms of rankings or traffic.

The page being crawlable by the search engines

This is a big one. If the search engines never find the page where your link is placed, the link will never count. This is usually not a problem, but it is something you should be aware of. The main way a page can be blocked is with a robots.txt file, so you should get into the habit of checking that pages are crawlable by the search engines. You can use this simple JavaScript bookmarklet to test if a page is blocked in robots.txt.
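If you want to run the same check from a script, Python's standard urllib.robotparser can evaluate a robots.txt file against a URL. In this sketch the robots.txt content is passed in as a string (in practice you would download it from the site's /robots.txt); the rules shown are hypothetical. Note that this parser applies rules in file order and does not support the * and $ path wildcards Google understands, so treat its answer as an approximation of Googlebot's behavior.

```python
from urllib import robotparser

def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
"""

print(is_crawlable(ROBOTS_TXT, "https://example.com/private/page.html"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/post"))
```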

There are other ways that a page may be blocked from search engines, keeping them from discovering your links. For example, if a page has elements such as JavaScript or AJAX, it is possible that search engines may not be able to crawl those elements. If your link is inside one of these elements, it may never be discovered and counted.

In general, the search engines are getting much better at discovering links and content within these elements, so it isn't something to worry about too much, but you should be aware of it.

To check whether or not a page is cached by Google, you can simply type “cache:” before the URL and enter it into the Google Chrome address bar. If the page is cached, you will see a copy of it. If it isn’t cached, you'll see an error page instead.

Elements of a link that affect its quality

Above, we have looked at the elements of a page that can affect the quality of a link. We must also consider what elements of a link itself the search engines can use to assess its quality and relevance. They can then decide how much link equity to pass across that link.

As mentioned above, many of these elements are part of the “reasonable surfer” model and may include things such as:

  • The position of the link on the page (e.g. in the body, footer, sidebar, etc.)
  • Font size/color of the link
  • If the link is within a list, and the position within that list
  • If the link is text or an image, and if it is an image, how big that image is
  • Number of words used as the anchor text

There are more, and we’ll look at a few in more detail below, starting with the basic anatomy of a link.
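As a rough illustration of those parts, the following Python sketch uses the standard library's html.parser to split each <a> element into its URL (href), anchor text, rel attribute, and title. The sample markup and example.com URL are made up for illustration.

```python
from html.parser import HTMLParser

class LinkAnatomy(HTMLParser):
    """Splits each <a> element into its URL, anchor text, rel, and title."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            self._current = {
                "url": attrs.get("href"),
                "anchor_text": "",
                "rel": attrs.get("rel") or "",
                "title": attrs.get("title"),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["anchor_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace so the anchor text reads as it would on screen.
            self._current["anchor_text"] = " ".join(self._current["anchor_text"].split())
            self.links.append(self._current)
            self._current = None

sample = '<a href="https://www.example.com/" rel="nofollow" title="Example site">Example anchor text</a>'
parser = LinkAnatomy()
parser.feed(sample)
parser.close()
print(parser.links)
```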

URL

The most important part of a link is the URL that is contained within it. If the URL is one that points to your website, then you’ve built a link. At first glance, you may not realize that the URL can affect the quality and trust that Google puts into that link, but in fact it can have quite a big effect.

For example, the link may point to a URL that:

  • Goes through lots of redirects
  • Is blocked by a robots.txt file
  • Is a spammy page (e.g. keyword stuffed, sells links, or machine-generated)
  • Contains viruses or malware
  • Contains characters that Google can’t/won’t crawl
  • Contains extra tracking parameters at the end of the URL

All of these things can alter the way that Google handles that link. It could choose not to follow the link, or it could follow the link but choose not to pass any PageRank across it. In extreme cases, such as linking to spammy pages or malware, Google may even choose to penalize the page containing the link to protect its users. Google does not want its users to visit pages that link to spam and malware, so it may decide to take those pages out of its index or make them very hard to find.

How this affects your work as an SEO

In general, you probably don’t need to worry too much on a daily basis about this stuff, but it is certainly something you need to be aware of. For example, if you’re linking out to other websites from your own, you really need to make sure that the page you’re linking to is good quality. This is common sense, really, but SEOs tend to take it a lot more seriously when they realize that they could receive a penalty if they don’t pay attention!

In terms of getting links, there are a few things you can do to make your links as clean as possible:

  • Avoid getting links to pages that redirect to others—certainly avoid linking to a page that has a 302 redirect because Google doesn't tend to pass PageRank across these unless they're in place for a long time
  • Avoid linking to pages that have tracking parameters on the end, because sometimes Google will index two copies of the same page and the link equity will be split. If you absolutely can’t avoid doing this, then you can use a rel=canonical tag to tell Google which URL is the canonical so that they pass the link equity across to that version
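When you control the URL you ask people to link to, a small helper can strip tracking parameters first. The sketch below uses Python's urllib.parse; the utm_* names are common Google Analytics campaign parameters used here as an assumed example list, so adjust it to the parameters your site actually uses. Remaining query values are re-encoded, so their percent-encoding may be normalized.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed example list of tracking parameters; adjust for your own site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

def strip_tracking_params(url, tracking=TRACKING_PARAMS):
    """Return url with any tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in tracking
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

print(strip_tracking_params("https://example.com/page?utm_source=newsletter&id=42&utm_medium=email"))
```

The cleaned URL is also the natural candidate for the rel=canonical tag on the tracked versions of the page.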

Position of the link on a page

As a user, you are probably more likely to click on links in the middle of the page than in the footer. Google understands this, and in 2004 it filed a patent which was covered very well by Bill Slawski. The patent outlined a model which became known as the “reasonable surfer” model (which we briefly mentioned earlier), and it included the following:

“Systems and methods consistent with the principles of the invention may provide a reasonable surfer model that indicates that when a surfer accesses a document with a set of links, the surfer will follow some of the links with higher probability than others.

This reasonable surfer model reflects the fact that not all of the links associated with a document are equally likely to be followed. Examples of unlikely followed links may include “Terms of Service” links, banner advertisements, and links unrelated to the document.”

Source: http://www.seobythesea.com/2010/05/googles-reasonable-surfer-how-the-value-of-a-link-may-differ-based-upon-link-and-document-features-and-user-data/

The following diagram, courtesy of Moz, helps explain this a bit more:

With crawling technology improving, the search engines are able to find the position of a link on a page as a user would see it and, therefore, treat it appropriately.

If you’re a blogger and you want to share a really good resource with your readers, you are unlikely to put the link in the footer, which very few of them will actually read. Instead, you’re likely to place it front and center in your blog post so that as many people as possible see it and click on it. Now, compare this to a link in your footer to your earnings disclosure page. It seems a little unfair to pass the same amount of link equity to both pages, right? You’d want to pass more to the genuinely good resource than to a standard page that users don’t care much about.
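One way to picture the difference from an even split is to divide a page's link equity in proportion to how likely each link is to be clicked. The patent does not publish any numbers, so the position weights below are purely hypothetical; the Python sketch only shows the proportional-split idea.

```python
# Hypothetical click likelihoods by position; these are NOT values from the patent.
POSITION_WEIGHTS = {"body": 0.6, "sidebar": 0.25, "footer": 0.15}

def split_equity(links, equity=1.0, weights=POSITION_WEIGHTS):
    """Split a page's passable equity across its links in proportion to click likelihood.

    links: list of (url, position) tuples, where position is a key of weights.
    """
    total = sum(weights[position] for _, position in links)
    return {url: equity * weights[position] / total for url, position in links}

shares = split_equity([("/great-resource", "body"), ("/earnings-disclosure", "footer")])
print(shares)
```

Under an even split each link would get half; with these made-up weights, the body link gets four times as much as the footer link.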

Anchor text

For SEOs, this is probably second in importance to the URL, particularly as Google still puts a lot of weight on it as a ranking signal, even if, arguably, it isn’t as strong a signal as it used to be.

Historically, SEOs have worked very hard to make anchor text of incoming links the same as the keywords which they want to rank for. So, if you wanted to rank for “car insurance,” you’d try to get a link that has “car insurance” as the anchor text.

However, since the rollout of Penguin into search results, SEOs have started to be a lot more cautious with their approach to anchor text. Many SEOs reported that a high proportion of unnatural anchor text in a link profile led to a penalty from Google after Penguin was launched.

The truth is that an average blogger, webmaster, or Internet user will NOT link to you using your exact keywords. It is even more unlikely that lots of them will! Google seems to be finally picking up on this and hitting websites that have overdone their anchor text targeting.

Ultimately, you want the anchor text in your link profile to be a varied mix of words. Some of it should be keyword-focused, some of it focused on the brand, and some of it not focused on anything at all. This helps reduce the chance of you being put on Google’s radar for having unnatural links.
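To see how varied your own anchor text is, you can bucket each anchor as an exact-match keyword, a branded anchor, or something else. This Python sketch assumes you supply your own target keywords and brand terms; the "Acme" brand and sample anchors are hypothetical, and the thresholds for what counts as "too much" exact match are a judgment call the code does not make for you.

```python
from collections import Counter

def anchor_text_mix(anchors, target_keywords, brand_terms):
    """Return the share of anchors that are exact-match keywords, branded, or other."""
    labels = ("exact_match", "brand", "other")
    if not anchors:
        return {label: 0.0 for label in labels}
    keywords = {k.lower() for k in target_keywords}
    brands = [b.lower() for b in brand_terms]
    counts = Counter()
    for anchor in anchors:
        text = " ".join(anchor.lower().split())  # normalize case and whitespace
        if text in keywords:
            counts["exact_match"] += 1
        elif any(brand in text for brand in brands):
            counts["brand"] += 1
        else:
            counts["other"] += 1
    return {label: counts[label] / len(anchors) for label in labels}

anchors = ["car insurance", "Car Insurance", "Acme Insure", "click here", "www.acmeinsure.com"]
print(anchor_text_mix(anchors, target_keywords=["car insurance"], brand_terms=["acme"]))
```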

Nofollow vs. followed

The nofollow attribute was adopted in 2005 by Yahoo, Google, and MSN (now Bing) and was intended to tell the search engines when a webmaster didn’t trust the website they were linking to. It was also intended to be a way of declaring paid links, such as advertising.

In terms of the quality of a link, if it has the nofollow attribute applied, it shouldn’t pass any PageRank. This effectively means that nofollow links are not counted by Google and shouldn’t make any difference when it comes to organic search results.

Therefore, when building links, you should always try to get followed links, since these are the ones that can help you rank better. Having said that, having a few nofollow links in your profile is natural, and you should also think about the other benefit of a link: traffic. If a link is nofollow but sends you lots of targeted traffic, then it is worth building.
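When auditing links, remember that rel is a space-separated list of tokens and that HTML treats them case-insensitively, so "external NoFollow" is still a nofollow link. A minimal Python sketch for deciding whether a link is followed and what share of a set of links is followed:

```python
def is_followed(rel):
    """True unless the rel attribute value contains the nofollow token."""
    tokens = (rel or "").lower().split()
    return "nofollow" not in tokens

def followed_share(rel_values):
    """Fraction of links (given by their rel values, None if absent) that are followed."""
    if not rel_values:
        return 0.0
    return sum(is_followed(rel) for rel in rel_values) / len(rel_values)

print(followed_share([None, "nofollow", "external NoFollow", "noopener"]))
```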

Link title

The link title is set using the title attribute of a link's <a> tag. If you’re not very familiar with it, take a look at this page for some examples and explanations.

The intention here is to help provide more context about the link, particularly for accessibility, as it provides people with more information if they need it. If you hover over a link without clicking it, most modern browsers should display the link title, much in the same way they’d show the ALT text of an image. Note that it is not meant to be a duplication of anchor text; it is an aid to help describe the link.

In terms of SEO, the link title doesn’t appear to carry much weight at all when it comes to ranking. In fact, according to this forum thread, Google appeared to confirm at PubCon in 2005 that it does not use it. Obviously this was a few years ago now, but my testing seems to confirm this as well.

Text link vs. image link

This section so far has been discussing text-based links, by which we mean a link that has anchor text containing standard letters or numbers. It is also possible to get links directly from images. The HTML for this looks slightly different:

Notice the addition of the <img> tag, whose src attribute points to the image itself. Also note that there is no anchor text, such as we’d usually find with a text link. Instead, the ALT text (in this example, the words “Example Alt Text”) is used.

My limited testing on this has shown that the ALT text acts in a similar way to anchor text, but doesn’t appear to be as powerful.
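If you are auditing your own links, it helps to record the ALT text of image links in place of anchor text. This Python sketch uses html.parser to return, for each <a> element, its visible text or, when there is none, the ALT text of any image inside it. The sample markup and URLs are hypothetical.

```python
from html.parser import HTMLParser

class AnchorTextExtractor(HTMLParser):
    """For each <a> element, records its visible text, or the ALT text of images inside it."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._text = None  # list of text fragments while inside an <a>, else None
        self._alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._text, self._alts = [], []
        elif tag == "img" and self._text is not None:
            alt = dict(attrs).get("alt")
            if alt:
                self._alts.append(alt)

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            text = " ".join("".join(self._text).split())
            # Fall back to ALT text when the link has no visible text (an image link).
            self.results.append(text if text else " ".join(self._alts))
            self._text = None

extractor = AnchorTextExtractor()
extractor.feed(
    '<a href="https://www.example.com/"><img src="logo.png" alt="Example Alt Text"></a> '
    '<a href="/x">Plain text link</a>'
)
extractor.close()
print(extractor.results)
```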

Link contained within JavaScript/AJAX/Flash

In the past, the search engines have struggled with crawling certain web technologies, such as JavaScript, Flash, and AJAX. They simply didn’t have the resources or technology to crawl through these relatively advanced pieces of code, which were mainly designed for users with full browsers capable of rendering them. For a user interacting with a web page in a browser, executing things like JavaScript and Flash is pretty straightforward. However, a search engine crawler isn’t like a standard web browser and doesn’t interact with a page the way a user does.

This meant that if a link to your website was contained within a piece of JavaScript code, it was possible that the search engines would never see it, meaning your link would not be counted. Believe it or not, this actually used to be a good way of intentionally stopping them from crawling certain links.

However, the search engines and the technology they use have developed quite a bit, and they are now more capable of understanding things like JavaScript. They can sometimes execute it and find what happens next, such as links and content being loaded. In May 2014, Google released a blog post explicitly stating that they were trying harder to get better at understanding websites that used JavaScript. They also released a new feature in Google Webmaster Tools so that we could better see when Google was having problems with JavaScript.

But this doesn’t mean that we don’t have to worry about things. My advice is still to make links as clean as possible and make it straightforward for the search engines to find your links. This means building them in simple HTML, if at all possible.

How this affects your work as an SEO

You should also know how to check if a search engine can find your link. This is actually pretty straightforward and there are a few methods:

  • Disabling Flash, JavaScript, and AJAX in your browser using a tool like the Web Developer Toolbar for Chrome and Firefox
  • Checking the cache of your page
  • Looking at the source code and seeing if the linked-to page is there and easy to understand
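The source-code check can be scripted. This Python sketch looks only at the raw HTML (no JavaScript is executed) and reports whether it contains a plain <a href> pointing at your host. Because search engines can now execute some JavaScript, a negative result means "not easy to find", not "never found". The sample pages and host are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class HrefCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def link_in_source(html, page_url, target_host):
    """True if the raw HTML contains a plain <a href> pointing at target_host.

    Script contents are treated as text, so links that only appear after
    JavaScript runs are not counted.
    """
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    for href in collector.hrefs:
        if urlparse(urljoin(page_url, href)).hostname == target_host.lower():
            return True
    return False

plain = '<p>See <a href="https://mysite.example/guide">this guide</a>.</p>'
scripted = (
    '<div id="x"></div><script>var a = document.createElement("a"); '
    'a.href = "https://mysite.example/guide"; document.getElementById("x").appendChild(a);</script>'
)
print(link_in_source(plain, "https://blog.example/post", "mysite.example"))
print(link_in_source(scripted, "https://blog.example/post", "mysite.example"))
```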

Text surrounding the link

There was some hot debate around this topic toward the end of 2012, mainly fueled by this Whiteboard Friday on Moz, where Rand predicted that anchor text, as a signal, would become weaker. In this video, Rand gave a number of examples of what appeared to be strange rankings for certain websites that didn’t have any exact match anchor text for the keyword being searched. In Rand’s opinion, a possible reason for this could be co-occurrence of related keywords which are used by Google as another signal.

It was followed by a post by Bill Slawski which gave a number of alternative reasons why these apparently strange rankings may be happening. It was then followed by another great post by Joshua Giardino which dove into the topic in a lot of detail. I’d recommend a good read of these posts.

Having said all of that, there is a belief that Google could use the text surrounding a link to infer some relevance, particularly when the anchor text isn’t very descriptive (such as “click here”). If you’re building links, you may not always have control of the anchor text itself, let alone the content surrounding it. But if you do, then it's worth thinking about how you can surround the link with related keywords and possibly tone down the use of exact keywords in the anchor text itself.
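To review what text actually surrounds a link you have placed, you can pull a window of words either side of the anchor text. This Python sketch is purely an auditing aid under the assumption that nearby words are what matters; it makes no claim about how, or whether, Google uses surrounding text. The sample sentence is hypothetical.

```python
import re

def surrounding_words(text, anchor, window=5):
    """Return (words before, words after) the first occurrence of anchor in text, or None."""
    match = re.search(re.escape(anchor), text, flags=re.IGNORECASE)
    if match is None:
        return None
    before = text[:match.start()].split()[-window:]
    after = text[match.end():].split()[:window]
    return " ".join(before), " ".join(after)

text = "For a detailed comparison of car insurance quotes for new drivers, click here to read our guide."
print(surrounding_words(text, "click here", window=6))
```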

This has been a chapter from Paddy Moogan's e-book, The Linkbuilding Book, available for purchase and download.


