The Internet is an interconnected network that works by simple logic. To visit a website, a link has to exist pointing to the page that you want to view.
Consider Google as part of that. You can enter a search term or keyword and find a landing page related to the search you have performed, which will in turn lead you to the page you are looking for. Or you might get a direct link from an external source, such as a blog article or news story. Linking is what allows us to reach our desired page or resource.
So, what happens when there’s a page you want to navigate to, but there’s no link to get you there? And how does this impact SEO? Pages like this are referred to as orphan pages.
What is an orphan page?
An orphan page is one that search engines may have difficulty discovering because it has no internal links from elsewhere on your website.
These pages tend to get ignored and fall through the cracks because search engine crawlers, like Googlebot, can only discover them from the sitemap.xml file or from backlinks, and users can only get to them if they know the specific URL.
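The definition above can be read as a set difference: the URLs you know about (from a sitemap, for instance) minus the URLs that can be reached by following internal links from the homepage. The sketch below is a minimal illustration of that idea. It assumes a hypothetical in-memory link graph and sitemap list rather than a real crawl, and the URLs and the `find_orphan_pages` name are made up for the example.

```python
from collections import deque

def find_orphan_pages(internal_links, known_urls, start_url):
    """Return known URLs that cannot be reached from start_url via internal links."""
    reachable = {start_url}
    queue = deque([start_url])
    # Breadth-first walk over internal links, the way a crawler follows them.
    while queue:
        page = queue.popleft()
        for target in internal_links.get(page, []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(set(known_urls) - reachable)

# Hypothetical site: /old-promo is in the sitemap, but no page links to it.
internal_links = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/"],
    "/old-promo": ["/"],  # links out, but receives no internal links
}
sitemap_urls = ["/", "/blog", "/blog/post-1", "/products", "/old-promo"]

orphans = find_orphan_pages(internal_links, sitemap_urls, "/")
print(orphans)
```

Note that outgoing links do not help: a page is an orphan when nothing inside the site points to it, whatever it links to itself.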
What causes orphan pages?
Usually, orphan pages are not created on purpose; they occur for various reasons. The most common cause is a lack of processes for handling site migrations, navigation changes, site redesigns, out-of-stock products, and testing or dev pages.
However, there are some instances where orphan pages may also be intentional, as with PPC landing pages that are used for certain campaigns, or any instance where you do not want the page to be part of the user journey.
For example, we have recently been creating landing pages with Unbounce for a client’s PPC campaigns because of a CRO issue with the main website. These pages do not need to be linked from within the main site, as they are specific to the PPC campaign.
The effect of orphan pages on SEO
Orphan pages cause two main problems in SEO:
- Low rankings and traffic: Even if they contain great content, orphan pages typically don’t rank well in SERPs or get much organic search traffic.
- Wasted crawl budget: Low-value orphan pages can waste the crawl budget, taking away from your important pages.
Search engines have a hard time finding orphan pages because they use links to discover new content and to understand a page’s significance within the website.
Here’s what Google says:
Google searches the web with automated programs called crawlers, looking for pages that are new or updated. […] We find pages by many different methods, but the main method is following links from pages that we already know about.
For example, let’s say you publish a new webpage and forget to link to it from elsewhere on your site. If the page isn’t in your sitemap and has no backlinks, Google’s crawler has no way of knowing it exists, so the page is unlikely to be found or indexed.
Even worse, a page with no internal links receives no PageRank from the other pages on your site. PageRank is Google’s way of understanding the significance of a page by counting the number of “votes of popularity” (links) it gets.
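To make the “votes” idea concrete, here is a simplified PageRank iteration on a tiny, made-up site. This is a textbook-style sketch, not Google’s actual implementation: the 0.85 damping factor, the page names, and the requirement that every page has at least one outgoing link are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. Assumes every page has at least one outgoing link
    and every link target is itself a key of `links`."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)
            for target in targets:
                # Each link passes on an equal share of the linking page's score.
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Hypothetical site: nothing links to /orphan, so it gets no "votes".
links = {
    "/": ["/a", "/b"],
    "/a": ["/"],
    "/b": ["/"],
    "/orphan": ["/"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:10s} {score:.4f}")
```

In this toy model the orphan page never rises above the baseline score, because no other page passes any of its own score on to it.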
Identifying orphan pages
Some orphan pages will show up in your reports but don’t need to be chased down. This includes, for example, pages that return 4XX status codes.
Have you ever clicked on a link only to find it leads to a “page not found” message? This is common, and such URLs can show up as orphaned pages. Search engines like Google or Bing eventually drop these URLs, so they are not considered permanent orphan pages. You don’t need to do anything in this case, as it won’t harm your ranking ability.
But, there are cases where you’ll need to take action, and that’s where tools like Oncrawl come in handy to help you find orphan pages on your site.
Finding orphan pages with Oncrawl
Oncrawl offers a cross analysis that lets you go further than simple crawl data by combining it with log analysis. This gives you visibility over all of your pages: those in the structure and those outside the structure that Google still crawls.
From that data, you can gauge which pages are not linked in the structure but still generate SEO visits, and thus organic traffic. Those pages are still valuable for SEO.
Oncrawl also clearly shows how many orphan pages and active orphan pages you have, and displays their distribution by page group so that you can determine where these pages are located.
You can also analyze the proportion of orphan pages and pages in the structure among all pages known by Google and Oncrawl by page group.
It is also useful to know whether those orphan pages generate SEO visits. If they do, linking them into the structure could improve their performance and help drive organic traffic to other areas of the site.
Using Oncrawl, you can also see whether Google is spending too much crawl budget on your orphan pages.
Finally, you can compare your active and inactive orphan pages and see which ones don’t generate any SEO traffic, meaning that they hold little to no value for SEO.
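The cross analysis described above can be approximated by hand if you have a list of crawled URLs and some log data. The sketch below is not Oncrawl’s API or method; it is a rough illustration that assumes three hypothetical inputs: the URLs found by crawling the site structure, Googlebot hit counts per URL taken from server logs, and organic (SEO) visits per URL.

```python
def classify_orphans(crawled_urls, googlebot_hits, seo_visits):
    """Find URLs that Googlebot requests but the crawl never reached,
    then split them into active (with SEO visits) and inactive orphans."""
    in_structure = set(crawled_urls)
    orphans = set(googlebot_hits) - in_structure
    active = sorted(url for url in orphans if seo_visits.get(url, 0) > 0)
    inactive = sorted(orphans - set(active))
    # Share of Googlebot requests spent on orphan pages (a crawl budget hint).
    total_hits = sum(googlebot_hits.values())
    orphan_hits = sum(googlebot_hits[url] for url in orphans)
    crawl_share = orphan_hits / total_hits if total_hits else 0.0
    return {"active": active, "inactive": inactive, "orphan_crawl_share": crawl_share}

# Hypothetical data for illustration only.
crawled_urls = ["/", "/shoes", "/blog"]
googlebot_hits = {"/": 40, "/shoes": 30, "/blog": 10, "/summer-sale-2019": 15, "/test-page": 5}
seo_visits = {"/": 200, "/shoes": 120, "/summer-sale-2019": 35}

report = classify_orphans(crawled_urls, googlebot_hits, seo_visits)
print(report)
```

Active orphans are candidates for relinking into the structure, while a high crawl share spent on inactive orphans suggests wasted crawl budget.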
How to fix orphan pages
There are two kinds of orphan pages: the expected kind and the unexpected kind.
Here are a few reasons for expected orphan pages:
- Pages linked from external websites that now redirect. Redirected URLs are all orphans, since internal links should always point directly to the correct page.
- Expired pages on a website with many short-lived pages. They expire while the site is being crawled, so this only becomes a problem if they remain orphans for too long.
- Pages returning errors that have been corrected but that Google still crawls for some time.
On the other hand, orphan pages can also occur unintentionally and become an issue. For example:
- Pages that are no longer linked in the structure because of navigation criteria (for example, pages that have dropped off category pages or internal search result pages). Those pages should always be linked in the structure if they generate organic traffic.
- Expired pages still returning content: some websites stop linking to old content that has expired but do not deliver the right status code (like a 404 or a redirect to a newer version). The expired page is thus still available.
- Pages that have not been migrated correctly: there is no redirection and the old content is still available.
- Syntax errors when creating canonical tags. These produce wrong URLs, which may return HTTP 200 or errors.
- Syntax errors when creating sitemaps. These produce wrong URLs that can deliver content and duplicates, or return HTTP errors.
The route you take to fix your orphan pages will depend on what type they are. So, the first thing to do when we see a high volume of orphan pages is to check what they look like and if they are to be expected or not.
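A quick first pass at separating the two kinds is to look at the status code each orphan URL returns. The sketch below rests on a simplifying assumption that is ours, not a rule: redirects and client errors (3XX and 4XX) are treated as expected, and everything else, especially pages still returning content with a 200, is flagged for review. The URLs are hypothetical, and a status code alone cannot tell you whether a page was orphaned on purpose.

```python
def triage_orphans(pages):
    """Split (url, status_code) pairs into expected orphans and ones to review."""
    expected, review = [], []
    for url, status in pages:
        if 300 <= status < 500:
            # Redirects and 4XX errors: usually expected, should fade over time.
            expected.append(url)
        else:
            # 200s (content still served) and anything else need a closer look.
            review.append(url)
    return expected, review

# Hypothetical orphan URLs with the status codes they return.
pages = [
    ("/old-category", 301),
    ("/expired-offer", 404),
    ("/expired-offer-still-live", 200),
    ("/products//shoes", 200),  # e.g. a malformed URL from a sitemap
]
expected, review = triage_orphans(pages)
print("expected:", expected)
print("review:", review)
```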
Orphan pages that are valuable for site visitors should be incorporated into your site’s internal linking structure to make them easier for users and search engines to find.
Orphan pages that were intentionally not linked to, like landing pages for ads, should be noindexed to prevent them from appearing in organic search results.
Most SEO plugins have made this as easy as checking a box, but you can also do it manually by copying and pasting this into the <head> section of the page:
<meta name="robots" content="noindex" />
For more information about noindexing a page, see Google’s documentation on blocking search indexing with noindex.
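If you want to verify that the tag actually made it into a page, a small check like the one below can help. It is a minimal sketch using Python’s standard `html.parser` module; the `RobotsMetaParser` class and `has_noindex` function are names invented for this example, and the check only looks at the robots meta tag, not at an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        directives = [part.strip() for part in content.split(",")]
        if name == "robots" and "noindex" in directives:
            self.noindex = True

def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

landing_page = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
regular_page = '<html><head><meta name="robots" content="index, follow"></head><body></body></html>'
print(has_noindex(landing_page))
print(has_noindex(regular_page))
```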
Merge or consolidate the content
Orphan pages with the same or similar content to another page should be merged. This means consolidating the content and redirecting the orphan URL to the other page.
Orphan pages that offer no value for visitors and serve no other purpose (e.g., a paid traffic campaign) should be deleted.
For example, an unused CMS theme page can be removed. The URL will then return a 404 and naturally drop out of search results over time.
If the page has backlinks, you may want to redirect the URL to another relevant page to preserve link equity after deleting.
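The options above can be summarized as a small decision function. This is only a sketch of the reasoning in this section: the `OrphanPage` fields, the order in which the checks run, and the wording of the recommendations are assumptions made for illustration, and real cases will often need human judgment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrphanPage:
    url: str
    valuable_to_visitors: bool
    intentional: bool = False           # e.g. a landing page for a paid campaign
    duplicate_of: Optional[str] = None  # URL of a page with the same or similar content
    has_backlinks: bool = False

def recommend_fix(page):
    """Suggest one of the fixes described in this section for an orphan page."""
    if page.intentional:
        return "noindex"
    if page.duplicate_of:
        return f"merge content and redirect to {page.duplicate_of}"
    if page.valuable_to_visitors:
        return "add internal links"
    if page.has_backlinks:
        return "delete and redirect to a relevant page"
    return "delete and let the URL return a 404"

# Hypothetical pages for illustration.
examples = [
    OrphanPage("/ppc/spring-offer", valuable_to_visitors=True, intentional=True),
    OrphanPage("/guide-old", valuable_to_visitors=True, duplicate_of="/guide"),
    OrphanPage("/size-chart", valuable_to_visitors=True),
    OrphanPage("/press-2015", valuable_to_visitors=False, has_backlinks=True),
    OrphanPage("/theme-sample-page", valuable_to_visitors=False),
]
for page in examples:
    print(page.url, "->", recommend_fix(page))
```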
Looking at rows and rows of orphan page errors and trying to make sense of heavy technical jargon is intimidating, but it doesn’t have to be. Finding and fixing orphan pages doesn’t have to be painstaking if you have the right tools to get the job done.