How to view URLs rendered by Google in Google Analytics

June 11, 2021 - 16  min reading time - by Lino Uruñuela

First of all, special thanks to Christian for helping me with the translation!

It’s been several years since Google started making huge improvements in its capacity to crawl and get the content loaded with Javascript in the same way a real user with a mobile phone would see it.

Nevertheless, Google won’t be able to crawl all the URLs it finds, even if it improves its infrastructure, because the cost of crawling and rendering (the process of transforming an HTML document into a visual structure) is far greater than simply crawling and fetching the HTML of a URL. Google itself has acknowledged this, so it’s important to make sure Google finds, renders and indexes the most important URLs of our site, especially on bigger sites and/or sites that depend on JS being executed by the client (CSR, Client-Side Rendering).

Years ago, in order to evaluate the content of a URL and decide whether to add it to the index, Google only needed to fetch the URL’s HTML (as well as the images linked there). That HTML was the content Google used to index and rank the URL, regardless of whether it was later modified by Javascript during rendering.

Now, with the hype of Javascript frameworks that modify the HTML client-side, Google needs to get the HTML, the JS code, the CSS styles and other resources (such as images, fonts, etc.) in order to render the content and get the final HTML, so it can decide whether it goes to the index or not.

This whole process is not done in one pass, the way a normal user would experience it: it’s done in two steps. First, Google crawls the URL and gets the “unrendered” HTML (as it has been doing until now). Some time later (there is no fixed delay), it fetches the rest of the resources referenced in that HTML and tries to render the page to get the final, rendered HTML. This second step is referred to as the “second wave of indexing”.

We don’t need a lot of technical knowledge to understand that, in order to crawl and index a number of URLs, the cost of rendering them is much greater than just getting the unrendered HTML, both in time and resources. So, using the same finite amount of resources, Google will be able to crawl and index fewer URLs if it needs to render them. That’s why Google needs to prioritize which URLs to render and which ones not.

In order to decide which URL should be crawled next, Google calculates the probability of that URL having changed since the last time it was crawled, taking into account other factors such as the PageRank of each URL or whether the webmaster has configured any specific setting regarding crawl frequency. This makes sense, as it’s useless to spend limited resources crawling something that hasn’t changed.

I would like to share this article with you, as I think it’s not really well known and it can be very revealing for understanding how Google decides which URL to crawl next. It’s written by Google’s engineers and it’s a mathematical abstraction to solve a real problem. Don’t be scared by the mathematical formulas: it’s clearly explained for people who are not data scientists.

After deciding which URL to crawl next, Googlebot needs to decide, for each crawled URL, if it should render that URL or not, and if it decides to render a URL it will need all the resources to accomplish that. In order to decide if it needs to request each of the needed resources, it probably uses a similar mathematical process but with some differences like cache times, cost of obtaining the resource, etc.

Due to all of this, it’s very important to know which URLs from our site are being crawled by Google, and also which ones are being rendered. An easy way of obtaining that information, which we use at Funnel▼Punk (where we work with big websites), is analysing the server logs (here is a post about that on my blog, in Spanish, and another one on Oncrawl’s blog), which gives a full picture of what Googlebot is doing on our site. Log analysis can be tedious and expensive for a lot of people, so I would like to share a way of tracking which URLs are being rendered by Googlebot in Google Analytics.


Tracking URLs rendered by Google

The method is relatively simple, at least for any dev team and for any webmaster used to working with PHP or similar. It has 3 steps:

  1. Add javascript code
    The code will detect when Googlebot has executed Javascript the same way a normal user would, and will load an image using Javascript (a transparent pixel).
  2. Server config
    Configure the server to execute a PHP file (or any other programming language used in the backend) when the transparent pixel’s URL is requested.
  3. Send the data to Google Analytics
    Our PHP file will check if Googlebot is really Googlebot and, if so, will send the data to Google Analytics.

Add javascript code
In the different experiments I’ve run, I’ve verified that Googlebot executes Javascript only when the code doesn’t need any user interaction. For example, Googlebot will execute any Javascript code that is triggered by the onload or onready events. In this example, we are going to create a function that will be triggered by the onload event, that is, when all the elements of the page are loaded.

This function will check if the User Agent matches any of Google’s known bots and, if so, it will load an image (a transparent pixel), which we will name TransparentPixelGooglebot.gif.

<script>
window.addEventListener("load", function(){
    // User Agent substrings of the Google bots we want to detect (case-insensitive)
    var botPattern = "googlebot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google";
    var re = new RegExp(botPattern, 'i');
    var userAgent = navigator.userAgent; 

    if (re.test(userAgent)) {
        var client = new XMLHttpRequest();
        // Encode the page URL so its own "?", "&" and "#" don't break the pixel's query string
        var trackRenderURL='https://www.mecagoenlos.com/TransparentPixelGooglebot.gif?OriginUrl='+encodeURIComponent(window.location.href);
        
        client.open('GET',trackRenderURL);
        client.setRequestHeader('Content-Type', 'text/plain;charset=UTF-8');
        client.send(null);
    }
});
</script>

Anytime Googlebot accesses a page and executes its Javascript, our function will be triggered and will request the “TransparentPixelGooglebot.gif” image, adding a parameter to the image URL that specifies which page URL has been accessed.

In this variable we compose the full URL that will be requested to load our “TransparentPixelGooglebot.gif” image, adding the accessed URL as a parameter. The User Agent that is requesting it is sent automatically in the request headers.

var trackRenderURL='https://www.mecagoenlos.com/TransparentPixelGooglebot.gif?OriginUrl='+encodeURIComponent(window.location.href);
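
The detection and URL-building logic can also be pulled into two small functions that are easy to check outside the browser. This is only a minimal sketch: the names isGoogleUserAgent and buildPixelUrl are hypothetical and not part of the original snippet, and it assumes the page URL should be percent-encoded before being appended as a parameter.

```javascript
// Same pattern list as the snippet above (case-insensitive).
const BOT_PATTERN = /googlebot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google/i;

// Hypothetical helper: true when the User Agent looks like one of Google's bots.
function isGoogleUserAgent(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Hypothetical helper: appends the rendered page URL to the pixel URL.
// encodeURIComponent keeps "?", "&" and "#" in the page URL from being
// read as part of the pixel's own query string.
function buildPixelUrl(pixelUrl, pageUrl) {
  return pixelUrl + "?OriginUrl=" + encodeURIComponent(pageUrl);
}

const sampleUserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const samplePixel = "https://www.mecagoenlos.com/TransparentPixelGooglebot.gif";

if (isGoogleUserAgent(sampleUserAgent)) {
  console.log(buildPixelUrl(samplePixel, "https://www.mecagoenlos.com/post?id=1&lang=es"));
}
```

Keep in mind that the User Agent check only filters by name. Anyone can send a Googlebot User Agent, which is why the request is verified again on the server.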

Server config (.htaccess)

Our next step is to configure our server so that anytime the URL of the pixel (TransparentPixelGooglebot.gif) is requested, a PHP file (GooglebotRenderJS.php) is executed.

In order to do this, we have to make some changes in our .htaccess file (as we are using an Apache server and PHP as the backend programming language).

These two specific lines are the ones that will make that happen:

RewriteCond %{REQUEST_URI} TransparentPixelGooglebot.gif
RewriteRule TransparentPixelGooglebot.gif(.*)$ https://www.mecagoenlos.com/GooglebotRenderJS.php$1

As you can guess, the parameters included with the pixel request are propagated so that the PHP file (GooglebotRenderJS.php) can “read” them.
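
To illustrate what “reading” those parameters means, here is a small sketch of the parsing step, written in Javascript for consistency with the tracking snippet (the backend in this article is PHP, which does the same through $_GET). The function name getOriginUrl is hypothetical; the fallback to the referrer mirrors what the PHP file does when OriginUrl is missing.

```javascript
// Hypothetical helper: extracts the rendered page URL from a pixel request.
// requestUrl is the full URL that was requested, referer is the Referer header (may be empty).
function getOriginUrl(requestUrl, referer) {
  const params = new URL(requestUrl).searchParams;
  // searchParams.get() returns the decoded value, or null when the parameter is absent
  return params.get("OriginUrl") || referer || "";
}

const requested =
  "https://www.mecagoenlos.com/GooglebotRenderJS.php?OriginUrl=https%3A%2F%2Fwww.mecagoenlos.com%2Fpost%3Fid%3D1";
console.log(getOriginUrl(requested, ""));
```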


Send the data to Google Analytics from the PHP file

In our last step, we create the PHP file (GooglebotRenderJS.php) that will be executed anytime the pixel (TransparentPixelGooglebot.gif) is requested.

This file will:

  1. Check, using reverse DNS, if the request was actually made by Googlebot or a fake Googlebot using Googlebot’s User Agent
  2. Identify what type of bot it is (Googlebot Mobile, Images, Ads, etc.)
  3. Send the data to Google Analytics (using Google Analytics’ measurement protocol) inside an event where we will assign the following variables:
  • Event Category: “GoogleRenderFromHtaccess”
  • Event Action: Rendered URL (the referrer of the pixel request)
  • Event Label: A string concatenating the User Agent, the IP and whether the bot is the real Googlebot (“Real”) or a fake one (“Fake”). I send the three of them to GA in order to be able to see if the identification of Googlebot is working correctly.
  • *Important: I stored the IP only for a couple of days in order to test if everything was working correctly. I stopped doing that afterwards, just in case there is any problem with data protection laws.
<?php

header("Pragma-directive: no-cache");
header("Cache-directive: no-cache");
header("Cache-control: no-cache");
header("Pragma: no-cache");
header("Expires: 0");
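// Rendered URL: prefer the OriginUrl parameter, fall back to the Referer header
if (!empty($_GET["OriginUrl"]))
    $src=$_GET["OriginUrl"];
else
    $src = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''; 
$UA=isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : ''; 
$RenderParameters=isset($_GET["RenderParameters"]) ? $_GET["RenderParameters"] : '';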
// Returns true when $Ip reverse-resolves to a googlebot.com hostname and that
// hostname resolves back to the same IP (reverse DNS plus forward confirmation).
function GoogleCheker($Ip){
    $hostname=gethostbyaddr($Ip);
    // gethostbyaddr() returns false on malformed input and the unmodified IP on failure
    if (!$hostname || $hostname == $Ip)
        return false;
    // Anchor the match so hosts such as googlebot.example.com are rejected
    if (!preg_match("/\.googlebot\.com\.?$/i",$hostname))
        return false;
    return gethostbyname($hostname) == $Ip;
}
// Same check for hostnames under the wider google.com domain.
function GoogleChekerExtend($Ip){
    $hostname=gethostbyaddr($Ip);
    if (!$hostname || $hostname == $Ip)
        return false;
    if (!preg_match("/\.google\.com\.?$/i",$hostname))
        return false;
    return gethostbyname($hostname) == $Ip;
}
$botname="Init";
$bots = array('Mediapartners-Google[ /]([0-9.]{1,10})' => 'Google Mediapartners',
    'Mediapartners-Google' => 'Google Mediapartners',
    'Googl(e|ebot)(-Image)/([0-9.]{1,10})' => 'Google Image',
    'Googl(e|ebot)(-Image)/' => 'Google Image',
    '^gsa-crawler' => 'Google',
    'Googl(e|ebot)(-Sitemaps)/([0-9.]{1,10})?' => 'Google-Sitemaps',
    'GSiteCrawler[ /v]*([0-9.a-z]{1,10})?' => 'Google-Sitemaps',
    'Googl(e|ebot)(-Sitemaps)' => 'Google-Sitemaps',
    'Mobile.*Googlebot' => 'Google-Mobile',
    '^AdsBot-Google' => 'Google-AdsBot',
    '^Feedfetcher-Google' => 'Google-Feedfetcher',
    'compatible; Google Desktop' => 'Google Desktop',
    'Googlebot' => 'Googlebot');

foreach( $bots as $pattern => $bot ) {
if ( preg_match( '#'.$pattern.'#i' , $UA) == 1 )
{
    $botname = preg_replace ( "/\\s{1,}/i" , '-' , $bot );
    break;
}
}

if(GoogleCheker($_SERVER['REMOTE_ADDR']))
    $isGoogle="Real";
elseif(GoogleChekerExtend($_SERVER['REMOTE_ADDR']))
        $isGoogle="Extend";
    else
        $isGoogle="Fake";

class BotTracker  {
    
    static function track($s, $params){
        
        
            
            $bot = "";
            
            $data = array( 
                'v'	=> 1, 
                'tid'	=> 'UA-XXXXXXX-1',
                'cid'	=> self::generate_uuid(), 
                't'	=> 'event',
                'dh'	=> $s['HTTP_HOST'], 
                'dl'	=> $s['REQUEST_URI'], 
                'dr'	=> $s['HTTP_REFERER'],	
                'dp'	=> $s['REQUEST_URI'], 
                'dt'	=> $params['page_title'], 
                'ck'	=> $s['HTTP_USER_AGENT'], 
                'uip'	=> $s['REMOTE_ADDR'],
                'ni'	=> 1,
                'ec'	=> 'GoogleRenderFromHtaccess',
                'el'	=> $params['UA']." - ".$params["RenderParameters"]." -" .$params['botname']." - ".$params['isGoogle']."- ip: ".$s['REMOTE_ADDR'], //delete after test
                //'el'	=> $params['UA']." - ".$params["RenderParameters"]." -" .$params['botname']." - ".$params['isGoogle'],
                'ea'	=> $params['RenderedURL']
            );
            
            $url = 'http://www.google-analytics.com/collect';
            $content = http_build_query($data); 
    
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $s['HTTP_USER_AGENT']);
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 0);
            curl_setopt($ch, CURLOPT_TIMEOUT_MS, 0);
            curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
            curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
            curl_setopt($ch, CURLOPT_POST, 1);
            curl_setopt($ch,CURLOPT_ENCODING , "gzip");
            curl_setopt($ch, CURLOPT_POSTFIELDS, $content);
            $result = curl_exec($ch);
            $info= curl_getinfo($ch);
            curl_close($ch);
        }
        static private function generate_uuid() {
        
        return sprintf( '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
            mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ),
            mt_rand( 0, 0xffff ),
            mt_rand( 0, 0x0fff ) | 0x4000,
            mt_rand( 0, 0x3fff ) | 0x8000,
            mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff )
        );
    }
    
    
    
    
    
}
 BotTracker::track($_SERVER, array("page_title"=>"VirtualRenderTitle","RenderedURL"=>$src,"isGoogle"=>$isGoogle,"botname"=>$botname,"UA"=>$UA,"RenderParameters"=>$RenderParameters));

?>
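
For readers who prefer to see the first two steps in isolation, the verification and bot identification can be sketched as follows, in Javascript for consistency with the tracking snippet. This is a sketch under assumptions, not the code running on my site: the function names are hypothetical, the resolver is passed in so the logic can be tried without real DNS queries, only IPv4 is covered, and the pattern list is a shortened version of the one in the PHP file.

```javascript
// Assumption: only hostnames under googlebot.com or google.com count as Google's,
// mirroring the two checks in the PHP file.
const GOOGLE_HOST = /\.(googlebot|google)\.com\.?$/i;

function isGoogleHostname(hostname) {
  return GOOGLE_HOST.test(hostname || "");
}

// `resolver` is any object with reverse(ip) and resolve4(hostname) returning promises,
// for example require("node:dns").promises in Node.js. resolve4 covers IPv4 only.
async function verifyGooglebot(ip, resolver) {
  try {
    const hostnames = await resolver.reverse(ip); // reverse (PTR) lookup
    for (const hostname of hostnames) {
      if (!isGoogleHostname(hostname)) continue;
      const addresses = await resolver.resolve4(hostname); // forward lookup
      if (addresses.includes(ip)) return true; // both directions agree
    }
  } catch (err) {
    // A failed lookup means the request cannot be verified
  }
  return false;
}

// Ordered patterns: the first match wins, so specific bots come before plain "Googlebot".
const BOT_TYPES = [
  [/Mediapartners-Google/i, "Google-Mediapartners"],
  [/Googl(e|ebot)-Image/i, "Google-Image"],
  [/^AdsBot-Google/i, "Google-AdsBot"],
  [/Mobile.*Googlebot/i, "Google-Mobile"],
  [/Googlebot/i, "Googlebot"],
];

function classifyBot(userAgent) {
  for (const [pattern, name] of BOT_TYPES) {
    if (pattern.test(userAgent || "")) return name;
  }
  return "Init"; // same default label as the PHP file
}

// Usage with a stubbed resolver, so no real DNS query is made
const stubResolver = {
  reverse: async () => ["crawl-66-249-66-1.googlebot.com"],
  resolve4: async () => ["66.249.66.1"],
};
verifyGooglebot("66.249.66.1", stubResolver).then((isReal) => {
  console.log(isReal ? "Real" : "Fake", classifyBot("Googlebot-Image/1.0"));
});
```

Checking both directions matters: the owner of an IP controls its reverse DNS record, so the hostname alone proves nothing until it resolves back to the same IP.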

Check if our setup is working in Google Analytics

Everything is set up! Now we can check if everything is working as expected. To do that, we can use the Real time report from Google Analytics and select the “Events” report. On another tab, we open Search Console, go to the property of our website and use the URL Inspector to force Google to crawl and render any of our URLs. If everything is working, you will see new events in the Google Analytics real time events report.

As you will see, these events won’t be counted as active users in our site, because the event is configured with the “nonInteraction” parameter.

If we click on the event category “GoogleRenderFromHtaccess”, we will be able to see the User Agent, the IP and if the bot has been identified as Real or Fake.
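
To make the event structure easier to see, here is a sketch of the body that is posted to the Measurement Protocol, reduced to the parameters that define the event. It is written in Javascript for consistency with the tracking snippet; buildEventPayload is a hypothetical name, the tracking ID is a placeholder, and the client ID is passed in by the caller instead of being generated as in the PHP class.

```javascript
// Hypothetical helper: builds the form-encoded body of an event hit.
function buildEventPayload({ trackingId, clientId, category, action, label }) {
  const params = new URLSearchParams({
    v: "1",          // protocol version
    tid: trackingId, // property ID (placeholder in the usage below)
    cid: clientId,   // anonymous client ID
    t: "event",      // hit type
    ec: category,    // Event Category
    ea: action,      // Event Action: the rendered URL
    el: label,       // Event Label: User Agent, bot name, Real/Fake
    ni: "1",         // non-interaction hit
  });
  return params.toString();
}

const body = buildEventPayload({
  trackingId: "UA-XXXXXXX-1",
  clientId: "35009a79-1a05-49d7-b876-2b884d0f825b", // example value only
  category: "GoogleRenderFromHtaccess",
  action: "https://www.mecagoenlos.com/post?id=1",
  label: "Googlebot - Real",
});
console.log(body);
```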

Tracking errors generated by Google trying to render a URL

We have already seen how we can track and check which URLs are being rendered by Google. But we can go further and track which Javascript errors are generated when Google tries to render URLs of our site.

When Javascript is rendered, errors can be generated that are only visible on the user’s browser (and not on our server), so keeping track of those errors is not an easy task.

Nowadays, if we want to check what Javascript errors are generated when Googlebot renders our URLs, we can only do it using the URL Inspector in Search Console:

  1. Inspect a URL.
  2. Click on “Test Live URL”.
  3. Check if there are any errors.

Doing this manually for a lot of URLs is a lot of work, but we can use the code I just showed you to track if there are any Javascript errors when Googlebot tries to render our URLs.

Example of an error generated on purpose to check if the code is working:

Add Javascript code
As in the previous example, we register an event listener, but this time it captures any Javascript error, using this line of code: "window.addEventListener('error', function(e)".

Anytime an error is generated, a function that will allow us to save those errors and send them to Google Analytics will be executed. This will be very similar to what we did in the previous example with the caveat that this function will only be executed when there is a Javascript error.

window.addEventListener('error', function(e) {
        var botPattern = "googlebot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google";
        var re = new RegExp(botPattern, 'i');
        var userAgent = navigator.userAgent; 
        if (re.test(userAgent)) {
            var client = new XMLHttpRequest();
            // encodeURIComponent keeps each value safe inside the query string
            var ErrorsURLPixel='https://www.mecagoenlos.com/TransparentPixelGooglebotError.gif?OriginUrl='+encodeURIComponent(window.location.href)+'&textError='+encodeURIComponent(e.message)+'&LineError='+encodeURIComponent(String(e.lineno))+'&UA='+encodeURIComponent(userAgent);
        
            client.open('GET',ErrorsURLPixel);
            client.setRequestHeader('Content-Type', 'text/plain;charset=UTF-8');
            // A GET request carries no body
            client.send(null);
        }
    });

This code will execute the function that will load another transparent pixel (TransparentPixelGooglebotError.gif), adding as parameters the URL being rendered, the error message, the line where it happened and the User Agent, generating a request to a URL like this:

var ErrorsURLPixel='https://www.mecagoenlos.com/TransparentPixelGooglebotError.gif?OriginUrl='+encodeURIComponent(window.location.href)+'&textError='+encodeURIComponent(e.message)+'&LineError='+encodeURIComponent(String(e.lineno))+'&UA='+encodeURIComponent(userAgent);
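
An alternative way to compose this URL is to let URLSearchParams do the encoding, which avoids concatenating each parameter by hand. This is only a sketch: buildErrorPixelUrl is a hypothetical name, and the object passed in the usage example stands in for the real error event that the browser provides.

```javascript
// Hypothetical helper: builds the error pixel URL from the fields of an error event.
function buildErrorPixelUrl(pixelUrl, pageUrl, errorEvent, userAgent) {
  const params = new URLSearchParams({
    OriginUrl: pageUrl,
    textError: errorEvent.message || "",
    LineError: String(errorEvent.lineno || 0), // 0 when the line is unknown
    UA: userAgent,
  });
  return pixelUrl + "?" + params.toString();
}

// Stand-in for a real error event
const sampleError = { message: "Uncaught TypeError: x is not a function", lineno: 42 };

console.log(
  buildErrorPixelUrl(
    "https://www.mecagoenlos.com/TransparentPixelGooglebotError.gif",
    "https://www.mecagoenlos.com/post?id=1",
    sampleError,
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  )
);
```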

Server config (.htaccess)
In the same way as in the previous example, we will add some rules to the .htaccess file to detect when the pixel is requested and execute a PHP file:

RewriteCond %{REQUEST_URI} TransparentPixelGooglebotError.gif
RewriteRule TransparentPixelGooglebotError.gif(.*)$ https://www.mecagoenlos.com/GooglebotErrorRenderJS.php$1

That way, whenever “https://www.mecagoenlos.com/TransparentPixelGooglebotError.gif” is requested, the “GooglebotErrorRenderJS.php” file will be executed.

PHP File
This PHP file will check if Googlebot is real and send the data to Google Analytics using an event with the category “ErrorsGoogleRender”, using the rendered URL as the event Action and the error itself as the event Label.

<?php

header("Pragma-directive: no-cache");
header("Cache-directive: no-cache");
header("Cache-control: no-cache");
header("Pragma: no-cache");
header("Expires: 0");
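// Rendered URL: prefer the OriginUrl parameter, fall back to the Referer header
if (!empty($_GET["OriginUrl"]))
    $src=$_GET["OriginUrl"];
else
    $src = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''; 
$UA=isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : ''; 
$RenderParameters=isset($_GET["RenderParameters"]) ? $_GET["RenderParameters"] : '';
// Error details sent by the Javascript error listener
$textError=isset($_GET["textError"]) ? $_GET["textError"] : '';
$lineError=isset($_GET["LineError"]) ? $_GET["LineError"] : '';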
// Returns true when $Ip reverse-resolves to a googlebot.com hostname and that
// hostname resolves back to the same IP (reverse DNS plus forward confirmation).
function GoogleCheker($Ip){
    $hostname=gethostbyaddr($Ip);
    // gethostbyaddr() returns false on malformed input and the unmodified IP on failure
    if (!$hostname || $hostname == $Ip)
        return false;
    // Anchor the match so hosts such as googlebot.example.com are rejected
    if (!preg_match("/\.googlebot\.com\.?$/i",$hostname))
        return false;
    return gethostbyname($hostname) == $Ip;
}
// Same check for hostnames under the wider google.com domain.
function GoogleChekerExtend($Ip){
    $hostname=gethostbyaddr($Ip);
    if (!$hostname || $hostname == $Ip)
        return false;
    if (!preg_match("/\.google\.com\.?$/i",$hostname))
        return false;
    return gethostbyname($hostname) == $Ip;
}
$botname="Init";
$bots = array('Mediapartners-Google[ /]([0-9.]{1,10})' => 'Google Mediapartners',
    'Mediapartners-Google' => 'Google Mediapartners',
    'Googl(e|ebot)(-Image)/([0-9.]{1,10})' => 'Google Image',
    'Googl(e|ebot)(-Image)/' => 'Google Image',
    '^gsa-crawler' => 'Google',
    'Googl(e|ebot)(-Sitemaps)/([0-9.]{1,10})?' => 'Google-Sitemaps',
    'GSiteCrawler[ /v]*([0-9.a-z]{1,10})?' => 'Google-Sitemaps',
    'Googl(e|ebot)(-Sitemaps)' => 'Google-Sitemaps',
    'Mobile.*Googlebot' => 'Google-Mobile',
    '^AdsBot-Google' => 'Google-AdsBot',
    '^Feedfetcher-Google' => 'Google-Feedfetcher',
    'compatible; Google Desktop' => 'Google Desktop',
    'Googlebot' => 'Googlebot');

foreach( $bots as $pattern => $bot ) {
if ( preg_match( '#'.$pattern.'#i' , $UA) == 1 )
{
    $botname = preg_replace ( "/\\s{1,}/i" , '-' , $bot );
    break;
}
}

if(GoogleCheker($_SERVER['REMOTE_ADDR']))
    $isGoogle="Real";
elseif(GoogleChekerExtend($_SERVER['REMOTE_ADDR']))
        $isGoogle="Extend";
    else
        $isGoogle="Fake";

class BotTracker  {
    
    static function track($s, $params){
        
        
            
            $bot = "";
            
            $data = array( 
                'v'	=> 1, 
                'tid'	=> 'UA-XXXX-1',
                'cid'	=> self::generate_uuid(), 
                't'	=> 'event',
                'dh'	=> $s['HTTP_HOST'], 
                'dl'	=> $s['REQUEST_URI'], 
                'dr'	=> $s['HTTP_REFERER'],	
                'dp'	=> $s['REQUEST_URI'], 
                'dt'	=> $params['page_title'], 
                'ck'	=> $s['HTTP_USER_AGENT'], 
                'uip'	=> $s['REMOTE_ADDR'],
                'ni'	=> 1,
                'ec'	=> 'ErrorsGoogleRender',
                'el'	=> $params['textError']." (line:".$params['lineError'].") - ".$params['UA']." - " .$params['botname']." - ".$params['isGoogle']."- ip: ".$s['REMOTE_ADDR'], //delete after test
                //'el'	=> $params['UA']." - ".$params["RenderParameters"]." -" .$params['botname']." - ".$params['isGoogle'],
                'ea'	=> $params['RenderedURL']
            );
            
            $url = 'http://www.google-analytics.com/collect';
            $content = http_build_query($data); 
    
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $s['HTTP_USER_AGENT']);
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 0);
            curl_setopt($ch, CURLOPT_TIMEOUT_MS, 0);
            curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
            curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
            curl_setopt($ch, CURLOPT_POST, 1);
            curl_setopt($ch,CURLOPT_ENCODING , "gzip");
            curl_setopt($ch, CURLOPT_POSTFIELDS, $content);
            $result = curl_exec($ch);
            $info= curl_getinfo($ch);
            curl_close($ch);
        }
        static private function generate_uuid() {
        
        return sprintf( '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
            mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ),
            mt_rand( 0, 0xffff ),
            mt_rand( 0, 0x0fff ) | 0x4000,
            mt_rand( 0, 0x3fff ) | 0x8000,
            mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff )
        );
    }
    
    
    
    
    
}
 BotTracker::track($_SERVER, array("page_title"=>"VirtualRenderTitle","RenderedURL"=>$src,"isGoogle"=>$isGoogle,"botname"=>$botname,"UA"=>$UA,"RenderParameters"=>$RenderParameters,"textError"=>$textError,"lineError"=>$lineError));

?>

Now we can already see which Javascript errors are happening when Google tries to render our URLs.

Send data to Google Analytics from our PHP file
With this implementation, we can see which specific Javascript errors are being generated when Google tries to render our URLs, and in which specific URLs they are happening.

I have a lot of other info in mind to track regarding Google’s rendering process, like checking if Googlebot tries some interactions (like a scroll, a click or any other Javascript event), but I’ll save that for another post. I hope you liked it!

Lino is co-founder of FunnelPunk and an SEO consultant and web analyst in Donostia, Spain, where he offers his services to national and international clients. In addition to SEO, Lino has extensive knowledge in web analytics, software development and BigData, always oriented to the needs of each project.