How to find and implement internal linking opportunities at scale

Internal linking is a goldmine for beating your competitors if you use it right, and there is plenty of evidence to back that up. Nevertheless, it's often underused.

Imagine a situation that shouldn't be too difficult to picture if you're not new to our field: a client whose revenue depends heavily on a few strategic commercial keywords, and a blog with quite decent pieces of content that have little to no impact on the business bottom line. They generate traffic, but no sales or other micro-conversions.

In this situation, a business could ask why they should keep investing in content. At the end of the day, there seems to be no business impact, so why even bother? You could argue that this traffic helps build the brand, which is indeed true. But in my experience, the brand contributes more to SEO results than the other way around.

Need proof for that statement? Take a look at this graph.

  • Payfit is a French payroll company. You've probably never heard of them unless you work in HR in France, Spain or the UK. But believe me, in France they have a strong brand.
  • Coover is a website that built a content machine to scale its traffic starting in 2019. Payroll is not the only vertical they target, but it is part of what they do.

Do you see how fast Payfit caught up with Coover in terms of visibility? That's because a strong brand and a good content strategy work well together. Coover could not have matched this pace, because its brand is almost non-existent.

I’m sure there are counter-examples out there, but believe me, this is the reality in most cases.

Going back to our initial topic: if my content doesn't generate sales and doesn't help my brand, why even bother? Well, there are two key elements: topical relevance and internal linking.

Topical relevance is the process used by search engines to ascertain whether a web page is relevant to whatever you are looking for. For instance, you wouldn't expect to see a travel brand ranking on a YMYL query, would you? A strong backlink profile can help you rank for whatever you want (see my article on subdomain leasing), but this is not the norm.

If you have plenty of articles on license plates and you sell license plates, you are more likely to rank better for commercial terms than a competitor without these articles, provided you leverage internal linking. All other things being equal, obviously.

The situation we often face

Internal linking is simple in theory, but we often face legacy debt. It is not rare to find websites with hundreds of articles written over the years and little to no internal linking. Picture hundreds of articles with no links at all.

There are usually two explanations for this situation:

  • A complete gap between the content & the SEO strategy. Two teams that don’t communicate and don’t understand how they can work together. More common than you might think.
  • A lack of clear SEO guidelines. You can't imagine the impact simple guidelines can have. I'm not even talking about how to conduct proper keyword research here, but simple things such as how to define internal links. Having them written down in some documentation helps a lot.

The usual approach

After the audit, the following steps are usually taken to deal with the problem. Don't get me wrong, they are enough in most cases, but you'll see in the next section that there is a better process to achieve the same result.

STEP 1: Add some internal links manually

For the main commercial keywords, you look for other articles on the blog that contain these keywords. For instance, you could use a Google search operator such as site: to do that.

This is exhausting and time-consuming: you have to manually check what Google returns to ensure that:

  • The link would make sense. Just because the keyword is mentioned in the content doesn't mean you have to add a link.
  • The link is not already present.
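If you do go down the manual route, you can at least generate the searches in bulk instead of typing them one by one. This is a minimal sketch, assuming your articles live under a path such as /blog: it builds one Google search URL per keyword, combining the site: operator with an exact-match phrase in quotes. The function name and parameters are hypothetical, and you still have to review each results page by hand.

```python
from urllib.parse import quote_plus


def site_search_urls(domain, keywords, path=""):
    """Build one Google search URL per keyword (hypothetical helper).

    Each query restricts results to `domain` + `path` with the site: operator
    and wraps the keyword in quotes for an exact-phrase match.
    """
    urls = {}
    for keyword in keywords:
        query = f'site:{domain}{path} "{keyword}"'
        # quote_plus encodes spaces as "+" and escapes ":", "/" and quotes
        urls[keyword] = "https://www.google.com/search?q=" + quote_plus(query)
    return urls


if __name__ == "__main__":
    for kw, url in site_search_urls("example.com", ["car mats"], path="/blog").items():
        print(kw, url)
```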

You can achieve something better with Screaming Frog (or any decent crawler) by following these steps:

  1. Find a way to identify the main content. If we stick to our previous example, it's a div whose class is entry-content.
  2. Launch a crawl and configure it to extract that content.
  3. Use the Excel export to check whether the content contains a specific text string. If you merge it with Google Search Console data and the All Inlinks report, you can actually flag articles that contain the keyword but not the link. This beats using a Google search operator, but good luck if you need to edit these articles for clients with thousands of them. Also, if you have a lot of money keywords, you will have a hard time building a formula that lets Excel find them all.
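The merge described in step 3 can be sketched outside Excel. This is a minimal sketch, assuming you have already loaded the crawler's custom extraction into a dict of URL to main-content text, and the All Inlinks export into a set of (source, destination) URL pairs; the function and variable names are hypothetical, and no particular export column layout is assumed.

```python
import re


def pages_missing_link(page_texts, inlinks, keyword, target_url):
    """Return URLs whose main content mentions `keyword` but that do not
    already link to `target_url` (hypothetical helper).

    page_texts: {url: extracted main-content text}
    inlinks:    set of (source_url, destination_url) pairs
    """
    # Whole-word, case-insensitive match so "mat" does not match "format"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sorted(
        url
        for url, text in page_texts.items()
        if url != target_url                 # never link a page to itself
        and pattern.search(text)             # the keyword is mentioned
        and (url, target_url) not in inlinks  # but the link is missing
    )
```

Because the check is done per keyword in a loop rather than in a single spreadsheet formula, adding more money keywords does not make the logic harder to maintain.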

STEP 2: Update some content

Once the obvious links are added, you come up with some kind of update calendar where you apply a better internal linking logic. Not only would you add missing links using the logic explained in step 1, but you'd also improve your content to make sure that these links are always added.

At the end of the day, an article without a link to at least one commercial page (where it makes sense) has no commercial value, assuming the incoming traffic is not likely to convert.

A better approach

As I explained, this approach works for small projects, but if your client has thousands of articles, you'll spend a lot of time on the implementation. Can't we speed things up? Of course we can.

Most CMSs, such as WordPress, rely on a database where the data is stored. Even when you use the UI, what you're actually doing is writing to a database that your website uses to display content to your users. Here is an example of what is stored inside the wp_posts table. You can take a look at the official documentation if you need further information on the available fields.

Why am I explaining this? Because if the content is stored in a database, you can apply your changes in bulk directly in the database without having to open the UI. You see where I'm going with this, right?

But first, we need a logic.

Define your money keywords

In this example, we'll focus on our money keywords. We want to ensure that if one of our articles mentions one of them, we add a link to our best page for that query. You can get this list from GSC by applying some basic filters you may already be familiar with.

I wouldn't keep the full list; the top 1,000 keywords (in terms of traffic) are enough for a proof of concept (POC). GSC is great, but it often reports odd keywords, so you can't keep them all.

Obviously, you can complete this list with your keyword research.
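The top-N selection can be scripted once you have a GSC export. This is a minimal sketch, assuming the export has been loaded as (query, page, clicks) rows; the function name and the 1,000 default are illustrative, and the filtering of odd or branded queries is left to you.

```python
from collections import defaultdict


def top_queries(rows, n=1000):
    """Return the `n` queries with the most total clicks (hypothetical helper).

    rows: iterable of (query, page, clicks) tuples from a GSC export.
    Clicks are summed across pages, since one query can map to several pages.
    """
    totals = defaultdict(int)
    for query, _page, clicks in rows:
        totals[query.strip().lower()] += clicks  # normalize case/whitespace
    # Sort by clicks descending, then alphabetically for a stable order
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked[:n]]
```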

Define the target page per query

For these queries, you need to define which page you want to rank. GSC will often report more than one page for a query, and you can't define more than one target for a link, can you? The most common reasons are:

  • Cannibalization: you may have more than one page targeting the same query. Not always a bad thing if you manage to rank both of them (such as the example below), but still something I would avoid in most cases.
  • Sitelinks: Google can rank more than one page through sitelinks. In this case, we'd have plenty of impressions but not a lot of clicks in GSC.

The best way to define the target page at scale: take the page with the highest number of clicks for each query. It's not bulletproof, but it works in the vast majority of cases, and a manual check is enough to fix the remaining issues.
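The "most clicks wins" rule is easy to apply to a query-and-page export. This is a minimal sketch, assuming (query, page, clicks) rows as input; the function name is hypothetical. Ties go to the page seen first, which is exactly the kind of case worth checking manually.

```python
from collections import defaultdict


def target_page_per_query(rows):
    """Map each query to the page with the most clicks (hypothetical helper).

    rows: iterable of (query, page, clicks) tuples.
    """
    # Sum clicks per (query, page) in case normalization merges rows
    clicks = defaultdict(int)
    for query, page, n in rows:
        clicks[(query.strip().lower(), page)] += n

    best = {}  # query -> (page, clicks)
    for (query, page), n in clicks.items():
        # Strictly greater: on a tie, the first page encountered is kept
        if query not in best or n > best[query][1]:
            best[query] = (page, n)
    return {query: page for query, (page, _) in best.items()}
```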

Define the articles you want to alter

Especially if this is the first time you apply such a process, I strongly advise starting with a reduced number of articles to modify at scale. A couple of hundred is enough to start with.

As you may already know, an internal link has more impact if it is in the main content of a page with a lot of traffic and/or plenty of backlinks. You can use your top-performing articles from GSC and start with them, for instance.
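Picking the first batch from your top-performing articles can be as simple as a filter and a sort. This is a minimal sketch, assuming a {url: clicks} mapping from GSC and that blog articles share a URL prefix such as /blog/; the names and the default of 200 are hypothetical. You could weight backlinks in as well if you have that data.

```python
def pick_articles(page_clicks, blog_prefix, limit=200):
    """Return up to `limit` blog URLs with the most clicks (hypothetical helper).

    page_clicks: {url: clicks}
    blog_prefix: URL prefix that identifies articles, e.g. "https://example.com/blog/"
    """
    articles = [(url, c) for url, c in page_clicks.items() if url.startswith(blog_prefix)]
    # Most clicks first; URL as a tie-breaker for a deterministic order
    articles.sort(key=lambda uc: (-uc[1], uc[0]))
    return [url for url, _ in articles[:limit]]
```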

Check if the article contains your keyword

For these articles, extract the content and check if it contains one of your keywords. You don’t have to crawl them if you have access to the database, a simple SQL query would do the job. For instance, the following would extract HTML code for all published articles:

select
    wp_posts.ID,
    wp_posts.post_title,
    wp_posts.post_content
from wp_posts
where
    wp_posts.post_status = 'publish'
    -- update this value to match your setup: standard WordPress posts use 'post';
    -- this example uses a custom post type ('articulo') from a Spanish site
    and wp_posts.post_type = 'articulo'

The result? A table with the HTML code generated by WordPress through the UI when you created the entries. Easy!

This is where handling the process in Excel alone gets tricky. We don't want to add links inside heading tags, so from the HTML stored in the database you need to extract only the content inside <p> and <li> tags.

I'm sure there is a way to build a formula for this, but I have no idea how to achieve it in Excel. In Python, BeautifulSoup makes it easy. You can then write a simple function that flags which of the keywords from the previous step appear in your content.

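import re

def find_keywords(text, keywords):
    """Return the keywords that appear in text as whole words (case-insensitive)."""
    found_keywords = []
    for keyword in keywords:
        # \b anchors prevent partial matches, e.g. "mat" inside "format"
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found_keywords.append(keyword)
    return found_keywords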
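For the extraction step (keeping only the text inside <p> and <li>, never headings), here is a dependency-free sketch. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; with BeautifulSoup the equivalent starting point would be soup.find_all(["p", "li"]). The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect text inside <p> or <li> elements, ignoring headings and other blocks."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.depth = 0      # number of <p>/<li> elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "li"):  # html.parser passes tag names in lowercase
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("p", "li") and self.depth > 0:
            self.depth -= 1
            self.chunks.append("\n")  # keep words from adjacent blocks apart

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def body_text(post_content):
    """Return the <p>/<li> text of a post_content HTML string, whitespace-normalized."""
    parser = ParagraphTextExtractor()
    parser.feed(post_content)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

The output of body_text is what you would pass to a keyword-matching function, so a keyword that only appears in an <h2> is never flagged.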

At the end of this process, you should double-check what you’d implement. I like to output a simple table summarizing what the changes would do:

  • query: the anchor text used in the link
  • page: the page we’d link
  • COUNTA of post_name: the number of pages where we’d add this link

This is a simple way to double-check everything before moving forward with the implementation. I wouldn’t skip it.
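If you build the list of planned links in Python, the same summary can be produced without a pivot table. This is a minimal sketch, assuming one (post_name, query, page) tuple per planned link; the function name is hypothetical. Like COUNTA, it counts rows, so deduplicate beforehand if a post could appear twice for the same query.

```python
from collections import Counter


def summarize(planned_links):
    """Return (query, page, n_posts) rows, most frequent first (hypothetical helper).

    planned_links: iterable of (post_name, query, page) tuples.
    """
    counts = Counter((query, page) for _post, query, page in planned_links)
    # most_common() sorts by count; equal counts keep first-seen order
    return [(query, page, n) for (query, page), n in counts.most_common()]
```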

Implementing the changes

Once you’ve validated everything, you can bulk-add these links. Again, I’m unsure how you’d achieve that with Excel.

For the cases included in the Excel file we used for the double-check, we replace the query found in the text with a proper link. There are some extra validation steps you must add, obviously, such as checking that the page is not already linked elsewhere in the article, but you get the point.
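Here is a minimal sketch of that replacement step, including the "already linked" check. It assumes post_content stores explicit <p>/<li> tags (as the block editor does; classic-editor content often omits <p> tags, which WordPress adds at display time) and that links use double-quoted href attributes. It links only the first whole-word occurrence inside a <p> or <li>, never touches text already inside an <a> element, and returns the article unchanged if it already links to the target. The function name is hypothetical, and regex-based HTML handling is a simplification, so review the output before writing anything back.

```python
import html
import re

# Split HTML into existing <a>...</a> links, other tags, and plain text
TOKEN_RE = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)
TAG_NAME_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z0-9]+)")


def add_link_once(article_html, keyword, target_url):
    """Link the first eligible mention of `keyword` to `target_url`.

    Returns (new_html, changed). Assumes double-quoted href attributes.
    """
    if f'href="{target_url}"' in article_html:
        return article_html, False  # the article already links to the target

    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    parts = TOKEN_RE.split(article_html)
    depth = 0  # number of <p>/<li> elements we are currently inside
    for i, part in enumerate(parts):
        if part.startswith("<"):
            m = TAG_NAME_RE.match(part)  # comments like <!-- wp:... --> don't match
            if m and m.group(2).lower() in ("p", "li"):
                depth += -1 if m.group(1) else 1
            continue  # never edit inside tags or existing links
        if depth > 0:
            match = pattern.search(part)
            if match:
                # Keep the original casing of the matched text as anchor text
                anchor = f'<a href="{html.escape(target_url)}">{match.group(0)}</a>'
                parts[i] = part[:match.start()] + anchor + part[match.end():]
                return "".join(parts), True
    return article_html, False
```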

You can then replace the HTML you had in your WP database with the new code, including the link.
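Writing the edited HTML back is a plain UPDATE on post_content. This is a minimal sketch that uses the standard library's sqlite3 with an in-memory stand-in for wp_posts so it runs anywhere; a real WordPress install usually runs on MySQL or MariaDB, where a driver such as PyMySQL would use %s placeholders instead of ?. Take a database backup before running anything like this. The function name is hypothetical.

```python
import sqlite3


def write_back(conn, updates):
    """Write edited HTML back to wp_posts.post_content (hypothetical helper).

    updates: iterable of (post_id, new_html) pairs.
    """
    with conn:  # commits on success, rolls back if any UPDATE fails
        conn.executemany(
            "UPDATE wp_posts SET post_content = ? WHERE ID = ?",
            [(new_html, post_id) for post_id, new_html in updates],
        )


if __name__ == "__main__":
    # In-memory stand-in: the real wp_posts table has many more columns
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
    conn.execute("INSERT INTO wp_posts VALUES (1, '<p>old</p>')")
    write_back(conn, [(1, "<p>new</p>")])
    print(conn.execute("SELECT post_content FROM wp_posts WHERE ID = 1").fetchone()[0])
```

Using executemany inside a single transaction means a failed row leaves the table exactly as it was, rather than half-updated.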

The process is not straightforward and will come with challenges depending on the project you work on. But it will allow you to significantly improve your internal linking before refining it further with better manual edits. See it as a quick win for your client, using a process you can iterate on later.

Don't be surprised if you don't find thousands of links to add with this process: it depends heavily on the content (duh!), but also on how grammatically natural the keywords in your vertical are. For instance, "car mat audi" is not something you'd use in a text. You'd write something like "a mat for your Audi".

Final words

Internal linking is key, especially for big projects.

But you need to be smart to find optimization opportunities at scale, and this process is one of many ways to find such room for improvement.