A tiny bit of code in the wrong place. A poorly thought-out website structure. A missing forward slash on an internal link. All of these could seem completely innocuous. However, work on the technical side of SEO for any length of time and these small errors will send shivers down your spine. I’ve been auditing websites for years, and I’ve seen my fair share of technical SEO terrors. Be warned: I’m going to run through some of the more common technical issues I’ve seen below, but this article is not for the easily spooked!
The case of the disappearing website
Thankfully this is not something I’ve been guilty of doing myself, but it is definitely an issue I’ve encountered in the wild. A prospective client comes to you with a confusing situation: they were ranking really well for some traffic-driving keywords, but following a website migration they’re nowhere to be seen in the search results. Checking for the usual suspects, “have redirects been implemented?” and “has an XML sitemap been uploaded to Google Search Console?”, does not yield any solution. One check of the robots.txt and it all becomes clear. In an attempt to keep the development website out of the search results whilst it was being built, a “disallow” directive was added to the robots.txt. These two simple lines, “User-agent: *” followed by “Disallow: /”, were overlooked during the launch process, and as a result a once well-performing website has been rendered off-limits to the search bots. Although it is a quick fix, the results can be devastating if this code is left unnoticed for long. I’ve been asked to audit sites that have had this issue for over a year. You can imagine how well they were ranking until it was discovered. Thankfully, seeing this on other sites has made me hypervigilant when helping launch a new site. It’s definitely not a mistake I want to make!
I always advise there be a robust post-launch checklist in place for anyone looking to build a website. It helps to make sure everything crucial is covered off and nothing is overlooked. Make sure you keep updating the checklist after each website launch until there is a process in place to keep pesky issues like forgotten “disallow” commands in check.
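One item on such a checklist could be automated. The sketch below uses Python’s standard urllib.robotparser module to test whether a robots.txt file blocks a crawler from the home page. The function name, the example.com address and the “Googlebot” default are my own placeholders, and fetching the live robots.txt is left out, so treat it as a starting point rather than a finished check.

```python
from urllib.robotparser import RobotFileParser


def blocks_crawlers(robots_txt, user_agent="Googlebot",
                    home_url="https://www.example.com/"):
    """Return True if this robots.txt content stops user_agent fetching home_url.

    home_url and the default user agent are placeholders for illustration.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, home_url)


# The leftover development rules described above.
leftover = "User-agent: *\nDisallow: /"
# A more typical live file that only hides one hypothetical folder.
healthy = "User-agent: *\nDisallow: /admin/"

print(blocks_crawlers(leftover))  # True: the whole site is off-limits
print(blocks_crawlers(healthy))   # False: the home page can be crawled
```

Running something like this against the live robots.txt straight after launch would make a forgotten “Disallow: /” hard to miss.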
Redirects gone rogue
A common task when working on the SEO of a website is redirecting pages. Whether it’s one URL returning a 404 code or an entire website moving to a new domain, there are many instances where a redirect is needed. I find mapping redirects quite therapeutic: it’s a systematic task, there is a known structure to it and it’s not that taxing to work on. Unfortunately, due to its methodical nature, working through a long list of URLs that need redirecting can lead to wandering attention. If due care isn’t taken, it’s very easy to redirect a page to a URL that isn’t really relevant, or to accidentally create a redirect loop. A very talented SEO friend of mine recently confessed to redirecting an e-commerce site’s “cart” page to the home page, meaning no transactions could be carried out on the site. A truly terrifying experience. It’s easily done, especially if a list of soft 404s has been downloaded from Google Search Console and not thoroughly checked to determine whether any of them are actually live pages.
My advice? Take care with every URL you are redirecting. If you think it is returning a 404 error code and that’s why you are redirecting it, just check that it definitely is! If you are redirecting a URL as part of a migration, have a word with the site’s developers and make sure the page you are redirecting to will definitely exist once the site goes live!
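Some of that care can be handed to a script before the redirect map goes anywhere near a live server. The sketch below assumes the map is a simple dictionary of old path to new path, and that you keep your own list of pages that must never be redirected (the paths shown are made up). It flags any source URL that ends up in a loop and any protected page that has crept into the list. It does not check whether a URL really returns a 404; that still needs a live request.

```python
def find_redirect_problems(redirects, protected_paths):
    """Flag redirect-map entries that loop or that touch a protected page.

    redirects: dict mapping source path -> target path.
    protected_paths: set of paths that should never be redirected.
    """
    loops, protected_hits = [], []
    for source, target in redirects.items():
        if source in protected_paths:
            protected_hits.append(source)
        # Follow the redirect hops from this source until they stop or repeat.
        visited = {source}
        while target in redirects:
            if target in visited:
                loops.append(source)
                break
            visited.add(target)
            target = redirects[target]
    return {"loops": loops, "protected": protected_hits}


# Hypothetical redirect map for illustration only.
redirect_map = {
    "/old-a": "/old-b",
    "/old-b": "/old-a",   # sends visitors straight back: a loop
    "/cart": "/",         # the mistake described above
    "/summer-sale": "/sale",
}

print(find_redirect_problems(redirect_map, {"/cart", "/checkout"}))
# {'loops': ['/old-a', '/old-b'], 'protected': ['/cart']}
```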
Spider traps: a sticky situation
This next example is unfortunately a common occurrence on the web, and not something the SEO auditing the website is usually responsible for: the spider trap. This is normally flagged when you know your website isn’t huge, yet you’ve managed to go for lunch, come back, and find your crawler still crawling the site. A spider trap is essentially a set of pages on a website which causes a website crawling bot to get stuck visiting pages of little value to users of the site.
Pages that are created automatically by applications on a website, and that link to further dynamic pages, are a frequent cause of spider traps. A calendar widget, for instance, could have a page detailing all the events in January 2019 with a link that allows a visitor to click to view a page with February 2019’s events. Each page has a link to the next month’s events page. Unfortunately, there might not be an end to these pages, and with a spare half an hour and a lot of patience you could click through to April 3530 if you wanted to. The search bots are designed to follow links on a webpage to discover new pages. Although a human might get bored of clicking a “view next month” link, a search bot would not. It would therefore continue to follow the links to these blank pages until the time it is allowed to spend on the site runs out.
The problem with spider traps for search engines is that they take up the bots’ valuable time and resources, distracting them from the pages on the site the webmaster intends to have crawled and indexed. They can also add a large number of low-value pages to the search bots’ list of pages on the website, bringing the overall quality of the site down.
Spider traps are quite easy to spot with crawling software; however, it can be difficult to determine what has caused them. It helps to identify the anchor text of the link that is leading to these unwanted pages. Once that’s been identified, it is a lot simpler to correct the link or add a nofollow meta tag if necessary.
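If your crawler can export the list of URLs it found, one rough way to spot a trap is to group URLs that differ only in their numbers and see which group is suspiciously large. The sketch below does that with Python’s standard library. The URL structure, the function name and the threshold of 50 are all assumptions for illustration; a real site’s trap may not vary by numbers at all.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def suspect_trap_patterns(urls, threshold=50):
    """Group URLs that differ only by numbers and return the oversized groups."""
    patterns = Counter()
    for url in urls:
        parts = urlsplit(url)
        path_and_query = parts.path + ("?" + parts.query if parts.query else "")
        # Replace every run of digits so that year=2019 and year=2020 match.
        patterns[re.sub(r"\d+", "{n}", path_and_query)] += 1
    return {pattern: count for pattern, count in patterns.items()
            if count >= threshold}


# Hypothetical crawl export: two ordinary pages plus a runaway calendar.
crawled = ["https://www.example.com/about", "https://www.example.com/contact"]
crawled += [
    f"https://www.example.com/events?year={year}&month={month}"
    for year in range(2019, 2030)
    for month in range(1, 13)
]

print(suspect_trap_patterns(crawled))
# {'/events?year={n}&month={n}': 132}
```

A flagged pattern is only a lead. A large product catalogue with numeric IDs would trip the same check, so the pages still need a human look.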
The undead page
Pages returning a 404 code are a natural part of the internet. Products go out of stock, a page is turned off, or a link to a non-existent page is created; there are many reasons a page might return a 404 code. They can be frustrating for users of a website as they are unable to find the page they wanted, but there are fixes that work in this situation, such as redirecting the page to another relevant one or even creating a custom 404 page that helps users navigate to another page they want to view.
I’ve noticed some strange practices being carried out over the past few years to make sure 404 pages are eliminated. I’ve seen a worrying number of websites employing an automatic redirect for any page that should be returning a 404 code. Rather than someone identifying the best page to redirect each 404-returning URL to, they are automatically redirected to the home page or, worse still, to a page at “/404” that has been set up to mimic the look of a page returning a genuine 404 response. There have also been occasions where I’ve seen a page that should be returning a 404 error code: the original resource at that URL is no longer there and the page looks like an error page, yet the response code is a 200 “OK”. This dead page is, in fact, alive.
All of these attempts to deal with a 404 page fall far short of ideal. Returning a success code, rather than a 404 (or even a 410 “gone”), suggests to the search engines that this is a live page that is intentionally part of the website. It will cause the search bots to crawl this low-quality page, and even index it if allowed.
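A quick way to test for these behaviours is to request a URL that you know does not exist and look at what comes back. The sketch below classifies the result of such a request. It assumes you already have the final path, status code and page body from your crawler or browser, and the error phrases are my own guesses at typical wording rather than any standard list.

```python
# Assumed wording of a typical error page; adjust for the site being audited.
ERROR_PHRASES = ("page not found", "404", "no longer available")


def classify_missing_url_response(requested_path, final_path, status_code, body):
    """Classify how a site answered a request for a URL that should not exist."""
    if final_path != requested_path:
        # The missing URL was forwarded elsewhere, e.g. to "/" or "/404".
        return "blanket redirect"
    if status_code in (404, 410):
        return "genuine error"
    text = body.lower()
    if status_code == 200 and any(phrase in text for phrase in ERROR_PHRASES):
        # Looks like an error page but claims to be a live one.
        return "soft 404"
    return "live page"


print(classify_missing_url_response("/no-such-page", "/", 200, "<h1>Welcome</h1>"))
# blanket redirect
print(classify_missing_url_response("/no-such-page", "/no-such-page", 200,
                                    "<h1>Page Not Found</h1>"))
# soft 404
```

Only the “genuine error” result is the behaviour you want from a URL that was never meant to exist.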
The best-practice way of dealing with pages that have been turned off is simple. If the page has an obvious replacement, then redirect the old URL to the alternative page. This allows users to find another page that will be useful to them and gives the search engines another page to pass existing ranking power to. If the page is gone and is never going to return, it is perfectly acceptable to change the page to return a 410 “gone” response code. This informs the search engines that the page will not be returning, so there is no need to continue to crawl it to check if it’s back. This possibly doesn’t give your users the best experience, but as long as you signpost them to other pages that are relevant to them it shouldn’t cause a dead end in their journey.
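The decision described here is small enough to write down as a rule. The sketch below is one way of expressing it, not a piece of any particular CMS or server. The function and parameter names are hypothetical, and the use of a 301 (permanent) redirect is my assumption, since a permanent redirect is the usual choice when a page has been replaced for good.

```python
from typing import Optional, Tuple


def response_for_retired_page(replacement_url: Optional[str],
                              gone_for_good: bool) -> Tuple[int, Optional[str]]:
    """Pick a status code (and redirect target, if any) for a retired page."""
    if replacement_url:
        # An obvious replacement exists: send users and bots there.
        # 301 is an assumption; the article only says "redirect".
        return 301, replacement_url
    if gone_for_good:
        # No replacement and no plan to bring the page back.
        return 410, None
    # Otherwise an honest 404 with a helpful custom error page.
    return 404, None


print(response_for_retired_page("/new-product", gone_for_good=True))  # (301, '/new-product')
print(response_for_retired_page(None, gone_for_good=True))            # (410, None)
print(response_for_retired_page(None, gone_for_good=False))           # (404, None)
```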
Stay safe out there
I have many more stories of horrifying SEO mistakes I’ve come across on websites. It’s crucial to make sure the technical foundation of your digital properties is solid; otherwise you could be in for a horrendous time trying to rank them. Conduct regular checks and audits to prevent development changes or content tweaks from causing long-term headaches. Remember, a well-performing website is a marketer’s dream, but a site riddled with technical problems becomes a nightmare fast!