The webinar "How to grow money pages from orphan pages" is part of the SEO in Orbit series and aired on June 4th, 2019. For this episode, we asked the question: What if your orphan pages are the secret to earning more traffic and conversions? How do you take a page from an SEO orphan page all the way to the top? OnCrawl’s François Goube and Eric Enge explore the process of transforming orphan pages.
SEO in Orbit is the first webinar series sending SEO into space. Throughout the series, we discussed the present and the future of technical SEO with some of the finest SEO specialists and sent their top tips into space on June 27th, 2019.
Presenting Eric Enge
Eric was named 2018 Search Personality of the Year at the Drum Search Awards, 2016 Search Personality of the Year at the US Search Awards, and 2016 Search Marketer of the Year at the Landys. Eric has been speaking about digital marketing for more than a decade. He keynotes many conferences every year, is lead co-author of The Art of SEO, and also writes columns for sites such as Search Engine Land and Moz.
Eric is GM of Digital Marketing at Perficient (NASDAQ: PRFT), where his business unit offers content marketing, SEO, and social media services. His team’s clients include many of the world’s brands. Prior to Perficient, Eric was the Founder and CEO of Stone Temple Consulting (STC), a digital marketing agency based in Massachusetts, which was bought by Perficient in July of 2018.
3 interesting facts about Eric:
In 1984, Eric won a foosball (table football) world championship.
Eric has also camped outside on the northern Minnesota border in February, when temperatures dropped as low as -30 degrees Fahrenheit (about -34 degrees Celsius). That is cold enough that if you throw a pot of boiling water into the air, it comes down frozen as snow.
Eric has walked about 25 feet barefoot over red-hot coals at a Tony Robbins event. You have to know the right way to walk to avoid severe burns, but fortunately the sole of the human foot is not a particularly good conductor of heat.
This episode was hosted by François Goube, serial entrepreneur, and Co-Founder and CEO of OnCrawl.
What are orphan pages
An orphan page is a page on your site that receives no internal links from any other page on your site. Because nothing links to it, it cannot be found naturally by crawling the site.
Google tends to frown on these pages if they start ranking because they’re not part of your website’s ecosystem.
Discovering orphan pages
Crawling does play a role in finding orphan pages. You’ll want to first crawl your site to obtain a list of all of the pages that are linked into the structure of your site.
Then, hopefully, the tools you use to build the site will give you insight into all of the URLs that they generate, such as through an automatically generated sitemap or any other list of URLs.
You can then compare these two lists and find the URLs in the list of generated pages that weren’t found in the crawl. These are orphan pages.
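The comparison described above is a set difference. Here is a minimal sketch in Python, assuming the list of generated URLs comes from a standard sitemaps.org `<urlset>` sitemap and that the crawled URLs have already been exported from your crawler. The function names and example URLs are hypothetical, and real data usually needs URL normalization (trailing slashes, case, tracking parameters) before comparing.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(generated: set[str], crawled: set[str]) -> set[str]:
    """URLs the site generates that a link-following crawl never reached."""
    return generated - crawled


# Hypothetical example data.
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/guide</loc></url>
  <url><loc>https://example.com/old-campaign</loc></url>
</urlset>"""
crawled = {"https://example.com/", "https://example.com/guide"}

print(sorted(find_orphans(urls_from_sitemap(sitemap), crawled)))
# ['https://example.com/old-campaign']
```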
OnCrawl also discovers orphan pages by looking at other lists of known URLs and comparing them to crawled URLs: lists from log data, analytics data, and many other sources.
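The same idea extends to several sources at once. The sketch below is not OnCrawl’s implementation; it simply assumes you already have one set of URLs per source (sitemap, log data, analytics) and records which sources know about each uncrawled URL. That provenance can help answer whether an orphan is expected or the result of a mistake. Source names and paths are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def orphans_by_source(crawled: set[str], sources: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each known-but-uncrawled URL to the sorted names of the sources that list it."""
    found_in = defaultdict(set)
    for source_name, urls in sources.items():
        for url in urls - crawled:
            found_in[url].add(source_name)
    return {url: sorted(names) for url, names in found_in.items()}


# Hypothetical URL lists (paths only, already normalized).
sources = {
    "sitemap": {"/", "/guide", "/old-campaign"},
    "logs": {"/guide", "/old-campaign", "/spring-promo"},
    "analytics": {"/spring-promo"},
}
crawled = {"/", "/guide"}

result = orphans_by_source(crawled, sources)
for url in sorted(result):
    print(url, result[url])
# /old-campaign ['logs', 'sitemap']
# /spring-promo ['analytics', 'logs']
```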
The largest number of orphan pages Eric has ever found on a single website was more than 100 million. These pages were not in the site’s general link graph, although they cross-linked to each other. It was an impressive set of pages, and the company ran into problems because of it.
You should treat orphan pages differently based on whether or not they provide you with some level of value. Two major signs of value are receiving organic traffic, and receiving backlinks.
If pages provide value, you should link to them to integrate them into the link graph of your site.
If you have pages that offer no value, Eric’s advice is to delete them and have your server return a 404 or a 410 status to pull them out of the picture entirely.
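As an illustration of returning 410 for retired pages, here is a minimal sketch using Python’s standard `http.server`. It assumes a hypothetical hard-coded set of retired paths; in production this is usually configured in your web server or CMS rather than written by hand. 410 Gone signals a deliberate removal, while 404 Not Found would also take the pages out of the picture.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical orphan paths that were reviewed and judged to have no value.
RETIRED_PATHS = {"/old-campaign", "/test-page-2017"}


def status_for(path: str) -> HTTPStatus:
    """Decide the response status for a request path (query string ignored)."""
    if urlsplit(path).path in RETIRED_PATHS:
        # 410 Gone: removed on purpose; a 404 would also work.
        return HTTPStatus.GONE
    return HTTPStatus.OK


class OrphanAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        if status is HTTPStatus.OK:
            self.wfile.write(b"ok\n")


# To try it locally (blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8000), OrphanAwareHandler).serve_forever()
```

Keeping the decision in `status_for` separates the policy (which URLs are gone) from the server plumbing, so the same rule can be reused wherever responses are generated.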
François would start by asking whether the situation is normal, or whether it’s the result of a mistake, such as someone deleting links to those pages.
– How Google discovers orphan pages
Some orphan pages can have traffic. The question to ask here is “how does Google discover pages if your site has no links to them?”
There are multiple ways this can happen:
Your CMS or ecommerce platform might create an XML sitemap for you, which might contain the orphan pages’ URLs.
A URL might be shared in a marketing campaign.
Orphan pages might be highly inter-linked, so if one gets discovered, others may also be discovered.
– Ranked orphan pages as a sign of good content
Because orphan pages are a weak spot in the link graph and don’t benefit from popularity signals on a website level, if they get ranked, it probably means they have good content.
– Orphan pages with organic traffic
If an orphan page gets organic traffic (that is, enough organic traffic to be interesting), you definitely want to save it.
– Orphan pages with only bot traffic
If the orphan page is getting bot traffic but not organic traffic, that’s even more worrisome than if it has no traffic at all. This means Google found it but didn’t like it enough to rank it.
There’s a higher risk that pages like this will be seen as a doorway page, or a page used in other spam techniques.
– Crawl budget waste
When there’s bot traffic on orphan pages with little to no value, you’re spending your crawl budget on pages that don’t matter.
Crawl budget is precious, and if you have many pages of this type that deliver no traffic, the waste can add up quickly. That budget could instead be spent on pages that deserve to rank. In SEO terms, this is like shooting yourself in the foot.
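Server logs are one way to see how much bot activity lands on orphan pages. The sketch below assumes access logs in the Combined Log Format and counts requests whose user agent contains "Googlebot" for a given set of orphan paths. User agents can be spoofed, and Google documents verifying Googlebot with reverse DNS lookups; this sketch does not do that. The log lines and paths are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request and user-agent fields of a Combined Log Format line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, orphan_paths: set[str]) -> Counter:
    """Count requests whose user agent claims to be Googlebot, per orphan path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = urlsplit(match.group("path")).path  # drop any query string
        if path in orphan_paths:
            hits[path] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [04/Jun/2019:10:00:00 +0000] "GET /old-campaign HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [04/Jun/2019:10:05:00 +0000] "GET /old-campaign?ref=mail HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [04/Jun/2019:11:00:00 +0000] "GET /old-campaign HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [04/Jun/2019:12:00:00 +0000] "GET /guide HTTP/1.1" 200 2048 "-" "{GOOGLEBOT_UA}"',
]
print(googlebot_hits(sample_log, {"/old-campaign"}))
# Counter({'/old-campaign': 2})
```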
– Orphan pages with backlinks
You also might have orphan pages that receive backlinks, and links have value in the SEO world. You would definitely want to wrap these into the rest of the site or, if they get no organic traffic, at least redirect them to the closest related page on the main site.
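The signals discussed in this section (organic traffic, bot-only traffic, backlinks) can be combined into a simple triage rule. This is a sketch of one reading of the webinar’s advice, not a rule given by the speakers: the traffic threshold is hypothetical, and treating bot-only pages as the first candidates for removal is our own inference from the crawl budget argument.

```python
from dataclasses import dataclass


@dataclass
class OrphanPage:
    url: str
    organic_visits: int  # e.g. monthly organic sessions from analytics
    bot_hits: int        # e.g. Googlebot requests found in server logs
    backlinks: int       # external links pointing at the URL


# Hypothetical cut-off for "enough organic traffic to be interesting".
MIN_ORGANIC_VISITS = 10


def triage(page: OrphanPage) -> str:
    """Suggest an action for one orphan page."""
    if page.organic_visits >= MIN_ORGANIC_VISITS:
        return "integrate"          # ranks without internal links: link it in
    if page.backlinks > 0:
        return "integrate_or_301"   # keep the link value: link it in, or 301 it
    if page.bot_hits > 0:
        return "remove_first"       # crawled but not ranked: wastes crawl budget
    return "remove"                 # no visible value: return 404 or 410


pages = [
    OrphanPage("/buying-guide", organic_visits=120, bot_hits=40, backlinks=3),
    OrphanPage("/2017-press-kit", organic_visits=0, bot_hits=5, backlinks=8),
    OrphanPage("/tag/misc-42", organic_visits=0, bot_hits=300, backlinks=0),
    OrphanPage("/draft-copy", organic_visits=0, bot_hits=0, backlinks=0),
]
for page in pages:
    print(page.url, triage(page))
```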
Page importance and the internal link graph
When a page is part of the website’s link graph, it gains certain advantages over an orphan page. Most obviously, it gets links, though they’re internal ones.
One way to think about this is the following: Google relies on webmasters to give it clues about which pages are important, and the internal link graph is probably the most important clue we can give. Pages linked from the home page are seen as more important. Pages that are far from the home page are seen as less important. When pages have no links at all (orphans), we’re essentially saying they aren’t important.
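"Distance from the home page" is usually measured as click depth: the minimum number of links to follow from the home page. A breadth-first search over the internal link graph computes it, and any page the search never reaches is an orphan. This is a simplified proxy that SEO crawlers commonly report, not a description of how Google scores pages; the link graph below is hypothetical.

```python
from __future__ import annotations

from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the home page: depth = minimum number of clicks."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/category", "/about"],
    "/category": ["/product-a"],
    "/product-a": [],
    "/about": [],
    "/orphan": ["/product-a"],  # links out, but nothing links in
}
depths = click_depths(site, "/")
orphans = sorted(set(site) - set(depths))
print(depths)   # {'/': 0, '/category': 1, '/about': 1, '/product-a': 2}
print(orphans)  # ['/orphan']
```

Note that `/orphan` linking out to other pages does not help it: depth only flows along links pointing *into* a page.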
Once you’ve reintegrated orphan pages into your site’s link graph, you should expect to see improvements in the page’s performance:
Ranking for more keywords
Ranking higher on keywords it currently ranks for
More organic search traffic
Relevance to rest of site
François and Eric are both presuming that the orphan pages they’re discussing have a reasonable degree of relevance to the website. If the site does movie reviews and the orphan pages are about toaster ovens, the difference is likely too great for the answers given previously to be useful.
Disallow and noindex strategies
You can use a Disallow rule in your robots.txt file to tell Google not to crawl the pages. Eric points out that you can also use robots.txt to noindex pages. Although Google’s official stance is that this capability is not supported, it appears to work.
[Note: About a month after this webinar aired, in July 2019, Google announced that they would no longer continue their unofficial support for noindex directives in the robots.txt file. Their official statement declares:
“[W]e’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019. For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options.”
These alternative options include noindex meta robots tags, 404 and 410 status codes, password protection, disallow in robots.txt, and the Google Search Console URL Removal tool.]
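To check which URLs a Disallow rule blocks, Python’s standard `urllib.robotparser` can evaluate a robots.txt file. This sketch assumes a hypothetical rule blocking a `/campaigns/` directory. Keep in mind that Disallow controls crawling, not indexing: a blocked URL can still be indexed if it is linked from elsewhere, and Google cannot see a noindex tag on a page it is not allowed to crawl. The standard-library parser also uses simpler matching than Google’s, so treat it as an approximation for complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking a directory of campaign landing pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /campaigns/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/campaigns/spring-sale", "https://example.com/guide"):
    print(url, parser.can_fetch("Googlebot", url))
# https://example.com/campaigns/spring-sale False
# https://example.com/guide True
```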
Eric strongly recommends noindexing pages with a noindex meta robots tag instead.
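When auditing orphan pages, it helps to know which ones already carry a noindex meta robots tag. Here is a minimal sketch using Python’s standard `html.parser`; it reads `<meta name="robots">` and `<meta name="googlebot">` tags only. It does not check the `X-Robots-Tag` HTTP header, which can also carry noindex, and the sample HTML is hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindexed(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(is_noindexed('<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(is_noindexed('<head><meta name="robots" content="index, follow"></head>'))    # False
```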
However, this raises a question: if you have orphan pages that are noindexed, why do you have them? There can be reasons for this, such as marketing campaigns that shared a URL in emails, newsletters, or press releases, but that was never intended to be part of the main site experience.
Impact on a site as a whole
– Recent Google core algorithm updates
Since March 2018, Google has made a series of core algorithm updates. There’s been a lot of speculation about the goal behind these updates. In Eric’s view, it’s fair to say they’ve been focused on user intent and a site’s ability to meet it.
Sites that do a really good job of offering great depth and breadth of content have seen superior results and strong growth through these updates. Your orphan pages can add to the depth and breadth of your site.
– Long-tail search questions and details
In addressing depth and breadth of content on your site, your orphan pages may be able to answer user questions that don’t have much search volume. They can be pages that go into greater specifics or detail.
In short, it can help your whole site if the orphan pages you link back into the website’s link graph can make your content offering more robust.
– Google ranks an overall site experience
Sites don’t get ranked one page at a time today. The whole experience created by a site is taken into account. If adding orphan pages back in improves your site’s offering, you make yourself more attractive to Google to rank the site. Consequently, other pages will benefit, too.
– What results to expect from technical implementations
People have recently asked Eric how much of an increase in search traffic they could expect after implementing machine-readable entity IDs on their site. This whole class of questions assumes that one specific, isolated technical change will double a site’s SEO traffic. It’s not that simple.
It’s a question of creating user value.
– Googlebot-image and visual search
Visual search and Googlebot-image will become more important. This includes using your phone’s camera to search for something you’ve taken a picture of. Most people don’t realize how big Google image search already is. It’s something like 20% of all search, based on data from Jumpshot.
François shares that OnCrawl has a very large ecommerce client who discovered, when OnCrawl analyzed where their traffic was coming from, that half of their sales came from Google Image queries. The client was then able to focus on image search and has achieved great results.
Many people will find this is a great area for optimization.
– Orphans with backlinks but poor content
If you have orphan pages with backlinks but poor content, Eric would advise improving the content. He would only consider using a 301 redirect if the content can’t be improved.
– Recent North Face incident
Sometimes even respected brands try to do things that they shouldn’t. Not everybody understands the web ecosystem and its ethics as well as the practicing SEO community.
Any judgment would also have to take into account the internal culture at North Face, which Eric is not in a position to assess.
Wikipedia is also pretty good at defending itself. Its editors can see when brands are editing pages that they shouldn’t be, and they were able to fend off North Face’s program.
– Awareness of crawl budget in the SEO community
The “deep veteran SEO types” have a good understanding of crawl budget.
However, Eric thinks that understanding of crawl budget gets very weak very quickly as you move outside that inner circle. Many people misunderstand what crawl budget is.
Eric worked with a website that had 200 million pages because of an involved, complex faceted navigation. He got them to take the sort orders, the filters, and even some of the basic facets, and reimplement them with Ajax. This meant that filtering and sorting no longer created a new web page each time. This brought the number of pages down from 200 million to 200 thousand, a 1000:1 reduction.
In the short term, their traffic dropped by 50%. Eric had predicted this, given such a large change in the site structure and the way Google crawls by sampling pages in a non-consecutive manner. However, the site recovered within 4 months and doubled its original traffic within 8 months, thanks to the dramatically more efficient use of crawl budget. The pages that were removed weren’t pages that ranked or could rank.
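Why faceted navigation explodes page counts comes down to multiplication: every facet and sort order multiplies the number of crawlable URLs. The sketch below uses entirely hypothetical facet counts for one category, assumes single-select facets (each facet is either unset or set to one value), and assumes only the brand facet keeps crawlable URLs after the rest moves to Ajax. It illustrates the arithmetic, not the actual site Eric worked on.

```python
from math import prod

# Hypothetical facets for one category: facet name -> number of values.
FACETS = {"color": 12, "size": 8, "brand": 40, "price_band": 6}
SORT_ORDERS = 5

# Every facet contributes (values + 1) options because it can also be unset.
before = prod(n + 1 for n in FACETS.values()) * SORT_ORDERS
# After the change: only brand pages keep their own URLs; filters and sorts are Ajax.
after = FACETS["brand"] + 1

print(before, after, before // after)  # 167895 41 4095

# The reduction reported in the webinar, for comparison:
print(200_000_000 // 200_000)  # 1000
```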
– Use of entity IDs
Using Schema.org markup or entity IDs is definitely useful. It gives the bots that visit your site faster confirmation of who you are and what you offer, strengthens associations, and increases confidence.
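One common way to express entity IDs is Schema.org JSON-LD with an `@id` for the entity and `sameAs` links to profiles that describe the same entity. We assume this is the kind of markup meant here; the organization name and every URL below are placeholders. The sketch builds the snippet with Python’s `json` module so it stays valid JSON.

```python
import json

# Placeholder organization details; replace with your own URLs and profiles.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://www.example.com/#organization",
    "name": "Example Co.",
    "url": "https://www.example.com/",
    "sameAs": [
        "https://www.linkedin.com/company/example",
        "https://twitter.com/example",
    ],
}

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2)
    + "\n</script>"
)
print(snippet)
```

Reusing the same `@id` on every page that mentions the organization lets the markup on different pages point to one entity.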
“If you use orphans as landing pages in a paid search campaign, then you might want to noindex them.”
SEO in Orbit went to space
If you missed our voyage to space on June 27th, catch it here and discover all of the tips we sent into space.
Rebecca is the Content Manager at OnCrawl. She's a fan of content strategy, data analysis, and anything technical. She regularly writes articles for the OnCrawl blog, but you can also find her on Twitter.