How to find ALL your URLs before a website migration

June 30, 2020 - 6 min reading time - by Helen Pollitt

No one likes website migrations. They are an anxious time, and rightly so. Get something wrong and you could see a drastic downturn in your organic traffic levels. Miss a step and you could see your rankings plummet.

There are plenty of great guides out there on the considerations you should make before migrating a website. I want to focus on one very important aspect: the redirects.

When deciding which URLs need to be redirected, it’s crucial that you have a complete list of URLs to work from. You might not need to redirect many of them, but you need to make sure you aren’t overlooking any important ones.

So, where do you start when gathering together your list of URLs?

Google Analytics

Google Analytics, or your web tracking tool of choice, will contain a lot of information about the URLs that are being visited on your site.

Set the date range to a period long enough to capture a significant number of URLs. This range will vary depending on how popular the website is. Usually, somewhere around 6-12 months of data will be sufficient.

Go to Behaviour > Site Content > All Pages

This will give you a list of all URLs that have been visited, regardless of what traffic channel the visitors came from.

Although the focus might be on putting redirects in place to maintain organic traffic to the site, other channels may also need their URLs updated or redirected. For instance, if the URL structure on the new site is changing, then PPC adverts, email campaigns and social media adverts might need their URLs updated or redirected.

Export this list.

If you are only concerned with finding URLs that you know visitors are landing on from the organic search results, go instead to Acquisition > Channels, select Organic, and then add a secondary dimension of Landing Pages. This will give you a list of any URLs that have been clicked on from the organic search results.

Google Search Console

Another set of URLs worth including in your list for redirect consideration is the one found in Google Search Console.

These URLs differ from the ones available in Google Analytics because they have been seen in the search results but not necessarily clicked on. This means there might be URLs available in Google Search Console that are not shown in Google Analytics.

Go to Google Search Console > Performance and click on Pages. Set the date range to cover as much data as you have (the maximum is 16 months), then export this list.

Whilst you are in Google Search Console, visit the “Error” and “Excluded” reports under “Coverage”. These will give you an idea of which URLs Google has found for your site that aren’t being displayed in the SERPs for various reasons.

This might highlight some pages that are actually valuable to you that need fixing and redirecting to the new site.

Crawl

Your next step is to crawl your website using a crawling tool.

Set the crawler to respect the robots.txt and follow all internal links.

If you’re using OnCrawl, for example, go to Set up a new crawl to adjust the crawler settings. The crawler will respect the robots.txt and will follow links by default:

Link follow defaults in OnCrawl’s crawl setup.

In OnCrawl, you can also extend the list of known URLs by connecting other sources of information. Maybe you have a list of URLs from your backlink tool, or in a spreadsheet used by the SEA and paid search teams.

In the crawl profile setup in OnCrawl, scroll down to Analysis > Data ingestion to add lists of URLs from outside sources.

Once the crawl has finished, go to Show analysis > Tools > Data explorer.

This will be the most comprehensive list you can get of the URLs that search engines could discover by crawling links within your website.

As you crawl you will notice that some URLs will return a 301 or 302 status code. These are just as important to note down as those returning a 200 code. The redirecting URLs need to be considered as part of your new redirect file depending on what will happen to existing redirects when the site has migrated.

You can export these lists and open them in Excel.

Old redirects

If you can, try to get a complete list of any redirects that are already in existence beyond those that you found in your crawl.

Some CMSs will allow you to export lists of redirects that have been added manually through them. If you can’t access your redirects this way, you can also check your server config files.

It is important to know what redirects are already in place on the website because you may need to make sure these remain active after the migration. It may be that these redirects need to be updated to prevent too many redirect hops, or to stop redirect loops occurring.
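
To make the idea of hops and loops concrete, here is a minimal Python sketch. It assumes you have already exported the existing redirects into a simple mapping of source URL to target URL; the function name, the `max_hops` limit and the sample paths are all hypothetical and only there for illustration.

```python
def resolve_redirect(start, redirects, max_hops=10):
    """Follow `start` through the redirect map.

    Returns a tuple of (final_url, hops, is_loop).
    """
    seen = {start}
    current = start
    hops = 0
    while current in redirects and hops < max_hops:
        current = redirects[current]
        hops += 1
        if current in seen:
            # We have come back to a URL already visited: a redirect loop.
            return current, hops, True
        seen.add(current)
    return current, hops, False


# Hypothetical existing redirects: source URL -> target URL.
redirects = {
    "/old-shoes": "/shoes",
    "/shoes": "/footwear",
    "/a": "/b",
    "/b": "/a",
}

for source in redirects:
    final, hops, is_loop = resolve_redirect(source, redirects)
    if is_loop:
        print(f"{source}: redirect loop")
    elif hops > 1:
        print(f"{source}: {hops} hops, could point straight to {final}")
```

Any source reported with more than one hop is a candidate for updating so that it points directly at its final destination on the new site.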

What to do with your list

You will be left with a fairly long list of URLs that might need redirecting. What do you do with it?

Remove duplicates

The first task is to make sure you only have each URL listed once. Put the URLs into a spreadsheet like Excel and “remove duplicates”.

This will likely whittle your list down a lot as there will certainly be overlap in the URLs you have found through Google Analytics, Google Search Console and your crawl.
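
If you would rather script this step than use a spreadsheet, the following Python sketch shows one possible approach. It assumes your exports may mix bare paths (such as `/shoes`) with full URLs, so it converts everything to an absolute URL before comparing. `SITE_ROOT`, `normalise` and `merge_unique` are hypothetical names, and the rules applied (lower-case the host, drop the fragment) are assumptions you may want to adjust for your own site.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

SITE_ROOT = "https://www.example.com"  # assumption: the live site's origin


def normalise(url, site_root=SITE_ROOT):
    """Return an absolute URL with a lower-cased host and no fragment."""
    absolute = urljoin(site_root + "/", url.strip())
    parts = urlsplit(absolute)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def merge_unique(*url_lists):
    """Combine several URL lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            clean = normalise(url)
            if clean not in seen:
                seen.add(clean)
                merged.append(clean)
    return merged


# Hypothetical exports from analytics, Search Console and a crawl.
analytics = ["/shoes", "/about"]
search_console = ["https://www.example.com/shoes", "https://www.example.com/contact"]
crawl = ["https://WWW.example.com/about#team"]

all_urls = merge_unique(analytics, search_console, crawl)
```

Here `all_urls` ends up with three entries, because `/shoes` and `/about` each appear in two of the sources in slightly different forms.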

Crawl your development site

Once your development site has had all the pages created on it that are going to exist when launched, and you are certain there will be no further changes to the URLs, crawl it.

This will give you a list of all the possible URLs on the site that are available to be redirected to.

Export this list and then find and replace all instances of your development site domain with the domain the site will go live on. For example, replace https://example.staging-site.com with https://www.example.com.

Take this list of URLs and add it to the top of your spreadsheet of potential URLs for redirection. Then highlight all the URLs and remove duplicates again.

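This will remove any URLs in the bottom list that also exist in the top list, because Excel keeps the first occurrence of each value. This means you are removing URLs from your potential redirects list that won’t need to be redirected. The new site’s own URLs will still be sitting at the top of the sheet, so set that block aside as your list of possible redirect targets.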
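
The same find-and-replace and comparison can be sketched in Python. This is only an illustration under a few assumptions: both lists contain full URLs in the same format, the two domains are the example ones used above, and the function names are hypothetical.

```python
STAGING_ROOT = "https://example.staging-site.com"
LIVE_ROOT = "https://www.example.com"


def to_live_url(staging_url):
    """Rewrite a staging URL to the address it will have after launch."""
    if staging_url.startswith(STAGING_ROOT):
        return LIVE_ROOT + staging_url[len(STAGING_ROOT):]
    return staging_url


def urls_needing_review(candidate_urls, staging_urls):
    """Return candidates that will not exist at the same address on the new site."""
    surviving = {to_live_url(url) for url in staging_urls}
    return [url for url in candidate_urls if url not in surviving]


# Hypothetical data: the staging crawl and the de-duplicated candidate list.
staging_crawl = [
    "https://example.staging-site.com/shoes",
    "https://example.staging-site.com/contact",
]
candidates = [
    "https://www.example.com/shoes",
    "https://www.example.com/old-page",
]

to_review = urls_needing_review(candidates, staging_crawl)
```

In this example only `https://www.example.com/old-page` is left in `to_review`, since `/shoes` will still exist at the same address after launch.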

Check status codes

Next, take your list of URLs with the duplicates removed and upload it back into your crawling tool.

Run the crawl of these particular URLs and identify what status code is being returned for each. This will help you when it comes to deciding which URLs need redirecting and which don’t.
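
A crawling tool is the easiest way to do this, but for a short list a small script can do the same job. The sketch below uses only the Python standard library and sends a HEAD request without following redirects, so a 301 is reported as a 301 rather than as the status of its destination. `fetch_status` and `group_by_status` are hypothetical names, and bear in mind that some servers answer HEAD requests differently from GET requests, so treat the results as a first pass.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url` without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection_cls = http.client.HTTPSConnection
    else:
        connection_cls = http.client.HTTPConnection
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = connection_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def group_by_status(statuses):
    """Turn a {url: status_code} mapping into {status_code: [urls]}."""
    groups = {}
    for url, status in statuses.items():
        groups.setdefault(status, []).append(url)
    return groups
```

You could build the mapping with something like `{url: fetch_status(url) for url in urls}` and pass it to `group_by_status` to see at a glance which URLs return a 200, which already redirect, and which return an error.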

Your final list

After these steps you should be left with a list of URLs that you know definitely received some traffic over the past 6-12 months, have been shown to searchers in the SERPs or are available to Google when it crawls your site. You will also know what status code they currently return.

From here you can begin looking at the best URL to redirect them to.

For some sites there might be so many URLs in the list that redirecting them all would put too much load on the server. In that case, review this final list and decide whether each URL merits a redirect or not.

Conclusion

The first step to redirecting your URLs for a migration is understanding what URLs are already available on the live site. Follow these steps and you will have a comprehensive list to analyse.

Helen is a tech-focused Managing Director at Arrows Up with a passion for all things digital. With over 10 years’ experience in marketing, she is focused on creating strategic solutions for clients, building teams and delivering comprehensive training and talks.