Orphan pages are pages that can’t be reached from any other page of the website, so users can’t find them by browsing it. The word ‘orphan’ refers to the absence of parent pages, that is, pages with links pointing to the page in question. An orphan page is also almost impossible for search engines to find through your internal linking, as bots discover pages by following links when they crawl a website.
OnCrawl knows this well, which is why our cross analysis lets you combine your crawl data with your log analysis. This way you can discover all your pages. You see the ones present in your website structure, including those Google does not crawl. You also see the ones Google crawls even though they are missing from your structure, which are your orphan pages.
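Conceptually, finding orphan pages is a set difference between two lists of URLs. The sketch below assumes you have already exported two lists as absolute URLs. The first holds the URLs reached by a crawl that follows internal links. The second holds the URLs requested by Googlebot, taken from your server logs. The function names and the normalization rule are hypothetical illustrations, not OnCrawl’s implementation.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Hypothetical comparison rule: lowercase scheme and host, '/' for an
    empty path, keep the query string, drop the #fragment."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def find_orphans(crawled_urls, log_urls):
    """URLs the bot requested (logs) that the crawl never reached via internal links."""
    in_structure = {normalize(u) for u in crawled_urls}
    seen_by_bot = {normalize(u) for u in log_urls}
    return sorted(seen_by_bot - in_structure)


def find_uncrawled(crawled_urls, log_urls):
    """URLs present in the site structure that the bot never requested."""
    in_structure = {normalize(u) for u in crawled_urls}
    seen_by_bot = {normalize(u) for u in log_urls}
    return sorted(in_structure - seen_by_bot)


if __name__ == "__main__":
    crawled = ["https://example.com/", "https://example.com/shoes", "https://example.com/about"]
    logs = ["https://example.com/shoes", "https://EXAMPLE.com/old-promo", "https://example.com/#top"]
    print(find_orphans(crawled, logs))    # ['https://example.com/old-promo']
    print(find_uncrawled(crawled, logs))  # ['https://example.com/about']
```

Normalizing both sides first matters. Without it, trivial differences such as host case or a `#fragment` would make a linked page look like an orphan.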
OnCrawl also displays the distribution of your orphan pages by page group, so you can see which parts of your website they are located in.
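A page group is a rule that assigns each URL to a section of the site. This minimal sketch assumes a hypothetical rule that uses the first path segment as the group. Real page groups are usually defined with your own rules, so treat this only as an illustration of counting orphans per group.

```python
from collections import Counter
from urllib.parse import urlsplit


def page_group(url: str) -> str:
    """Hypothetical grouping rule: first path segment, e.g. /blog/post-1 -> 'blog'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "(home)"


def orphans_by_group(orphan_urls):
    """Count orphan URLs per page group, largest groups first."""
    return Counter(page_group(u) for u in orphan_urls).most_common()


if __name__ == "__main__":
    orphans = [
        "https://example.com/blog/a",
        "https://example.com/blog/b",
        "https://example.com/products/x",
        "https://example.com/",
    ]
    print(orphans_by_group(orphans))  # [('blog', 2), ('products', 1), ('(home)', 1)]
```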
Why do we get orphan pages?
Here are a few reasons why some orphan pages are expected:
- Redirected URLs that are still linked from external websites. Redirected pages should all be orphans, since internal links should always point directly to the final page.
- Expired pages on websites with many short-lived pages. These pages can expire while the crawl is running, but they can become a problem if they remain orphans for too long.
- Pages that returned errors which have since been fixed, but that Google keeps crawling for a while.
Best practices for orphan pages
- Link every page that could generate traffic (such as category pages or internal search result pages) into your website’s structure.
- Avoid syntax errors in canonical tags. They create incorrect URLs that may return either an HTTP 200 or an error.
- Make sure your expired content returns the appropriate status code (a 404, or a redirect to a newer version).
- Set up your sitemap carefully to avoid any syntax errors.
- Reattach to your website structure the orphan pages you have identified that bring you the most value.
Make sure you are not wasting valuable organic traffic!
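Several of these practices come down to making sure every URL you publish is well formed. The sketch below uses only Python’s standard library. It checks the `<loc>` entries of a standard sitemap (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`). It reports the entries that are not absolute http(s) URLs or that contain whitespace. The same hypothetical `is_well_formed` check could be applied to the `href` of your canonical tags. The checks chosen here are an assumption and far from exhaustive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def is_well_formed(url: str) -> bool:
    """True if url is an absolute http(s) URL with a host and no inner whitespace."""
    if not url or any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. an unbalanced IPv6 bracket in the host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def malformed_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> values of a <urlset> sitemap that fail is_well_formed.

    Raises xml.etree.ElementTree.ParseError if the XML itself is broken,
    which is also a sitemap syntax error worth fixing.
    """
    root = ET.fromstring(sitemap_xml)
    # Surrounding whitespace from pretty-printing is tolerated; inner whitespace is not.
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [loc for loc in locs if not is_well_formed(loc)]


if __name__ == "__main__":
    sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/shoes</loc></url>
      <url><loc>/relative-path</loc></url>
      <url><loc>https://example.com/new page</loc></url>
      <url><loc>htps://example.com/typo</loc></url>
    </urlset>"""
    print(malformed_sitemap_urls(sitemap))
    # ['/relative-path', 'https://example.com/new page', 'htps://example.com/typo']
```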
If you have any questions regarding orphan pages, feel free to drop us a line @OnCrawl_CS