SEO log file analysis is a powerful way to improve your technical and on-site SEO. You cannot get crawled, parsed, indexed and ranked by search engines without solid technical foundations, and log files are the only data source that is 100% accurate about how search engines actually crawl your website. Log analysis helps you rank higher and earn more traffic, conversions and sales. But before digging into its benefits, let’s understand what log analysis is.
What is a web server log file?
A web server log file is a file output by a web server containing ‘hits’, i.e. a record of all requests that the server has received.
The data is stored anonymously and includes details such as the date and time at which the request was made, the requesting IP address, the URL/content requested, and the user-agent of the browser.
These files typically exist for technical site auditing and troubleshooting, but they can be extremely valuable for SEO auditing as well. (Source: Builtvisible)
In practice, when someone types a URL, like http://www.oncrawl.com, into a browser, here is what happens: the browser splits the URL into 3 parts:
- Protocol
- Server name
- File name
The server name (oncrawl.com) is converted into an IP address via the domain name server. The connection established between the browser and the dedicated web server allows the requested file to be reached: an HTTP GET request is sent to the web server for the right page, which is then rendered as the visible page you see on the screen. Each of these requests is regarded as a ‘hit’ by the web server.
The appearance of a log file depends on the type of server and its configuration (Apache, IIS, etc.), but there are attributes you will almost always find:
- Server IP
- Timestamp (date & time)
- HTTP status code
- Method (GET / POST)
- Request URL (aka: URL stem + URL query)
Other attributes can also be added, like:
- Host name
- Bytes downloaded
- Time taken
- Request/Client IP
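To make these attributes concrete, here is a minimal sketch of how a single hit in Apache’s “combined” log format can be parsed into the fields listed above. The sample line and field names are illustrative assumptions; adjust the pattern to your own server’s format.

```python
import re

# A hypothetical hit in Apache "combined" format (sample data, not real logs).
LINE = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /blog/seo-tips?page=2 HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# One named group per attribute we care about: IP, timestamp, method,
# request URL, status code, bytes downloaded and user-agent.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<user_agent>[^"]*)"'
)

hit = PATTERN.match(LINE).groupdict()
print(hit['ip'], hit['method'], hit['url'], hit['status'])
```

In a real analysis you would apply this pattern line by line over the whole log file, skipping lines that fail to match.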
Log file analysis and SEO, what’s the point?
Log file analysis allows you to understand exactly how search engines are crawling your website, since every request made to the hosting web server is saved. You just need to filter by user-agent and client IP to access crawl details. You can then analyze crawler behavior on your website and answer the following questions:
- Is your crawl budget spent efficiently?
- What accessibility errors were met during crawl?
- Where are the areas of crawl deficiency?
Those 3 questions are just a quick overview of the potential of log analysis. It can also help you determine whether your site architecture is optimized, whether you have site performance issues, and more.
Technical SEO insights you can find in log data
There are different metrics you can look at in your log files to improve your SEO.
Bot crawl volume
Bot crawl volume refers to the number of requests made by Googlebot, Bingbot, Baidu, Yahoo, Yandex, etc. over a given period of time. Bot crawl volume shows you whether a specific search engine has been crawling your site. For instance, if you want to get found in China but Baidu is not crawling you, that is an issue.
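Once bot hits are extracted, measuring crawl volume is a simple aggregation. A minimal sketch, assuming hits have already been reduced to (date, bot name) pairs:

```python
from collections import Counter

# Hypothetical pre-parsed bot hits: (date, bot) pairs extracted from the logs.
hits = [
    ('2023-10-10', 'Googlebot'), ('2023-10-10', 'Googlebot'),
    ('2023-10-10', 'bingbot'),   ('2023-10-11', 'Googlebot'),
]

volume = Counter(hits)
print(volume[('2023-10-10', 'Googlebot')])   # → 2
# A bot that never appears (e.g. Baiduspider here) is the red flag described above.
print(volume[('2023-10-10', 'Baiduspider')])  # → 0
```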
Crawl budget waste
A crawl budget refers to the number of pages a search engine will crawl each time it visits your site. This budget is linked to the authority of the domain and proportional to the flow of link equity through the website.
In practice, this crawl budget can be wasted on irrelevant pages. Let’s say you have a budget of 1,000 pages per day: you want those 1,000 pages to be the ones you need to appear in the SERPs. If you have fresh content you want indexed but no budget left, then Google won’t index this new content. That’s why you want to watch where you spend your crawl budget with log analysis.
Temporary 302 redirects
These kinds of redirects are not optimal for your SEO, as they don’t pass along the “link juice” of external links from the old URL to the new one. Prefer permanent 301 redirects. Log data analysis can help you spot these redirections.
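Spotting temporary redirects in parsed log data is a one-line filter. A sketch, using hypothetical hits:

```python
# Hypothetical pre-parsed hits with their HTTP status codes.
hits = [
    {'url': '/old-page', 'status': 302},
    {'url': '/blog',     'status': 200},
    {'url': '/promo',    'status': 302},
]

# Unique URLs answered with a temporary 302, candidates for a permanent 301.
temporary_redirects = sorted({h['url'] for h in hits if h['status'] == 302})
print(temporary_redirects)  # → ['/old-page', '/promo']
```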
Response code errors
Log data analysis can help you spot status code errors like 4XX and 5XX, which can badly impact your SEO. Understanding the different HTTP status codes can help you rank higher.
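A quick way to surface these errors is to count 4XX/5XX responses per URL, so the worst offenders can be fixed first. A sketch over hypothetical hits:

```python
from collections import Counter

# Hypothetical pre-parsed hits.
hits = [
    {'url': '/missing', 'status': 404},
    {'url': '/missing', 'status': 404},
    {'url': '/api',     'status': 500},
    {'url': '/blog',    'status': 200},
]

# Error count per URL, most frequent first.
errors = Counter(h['url'] for h in hits if h['status'] >= 400)
print(errors.most_common())  # → [('/missing', 2), ('/api', 1)]
```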
Crawl priority
You can set your crawl priority in your XML sitemap or by adjusting your internal linking structure. This prevents Google from ignoring some pages or sections of your site. In practice, analyzing your logs can highlight URLs or directories that bots rarely crawl. For instance, if you want a specific blog post to rank for a targeted search query but it is located in a directory that Google only visits once every 6 months, then you will miss the opportunity to gain organic search traffic from this specific post for up to 6 months.
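To find rarely crawled sections, you can bucket bot hits by top-level directory. A minimal sketch over hypothetical crawled paths:

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URLs requested by search engine bots.
bot_urls = ['/blog/post-1', '/blog/post-2', '/products/a', '/blog/post-3']

# Count bot hits per top-level directory; sparsely crawled sections stand out.
by_directory = Counter('/' + urlsplit(u).path.split('/')[1] for u in bot_urls)
print(by_directory)  # → Counter({'/blog': 3, '/products': 1})
```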
Duplicate URL crawling
URL parameters like filters or tracking tags can result in crawl budget waste, since search engines end up crawling different URLs that serve the same content. Search Engine Land wrote an excellent article about how to fix this problem.
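You can measure this waste by grouping crawled URLs by path, ignoring query strings. A sketch over hypothetical crawled URLs:

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URLs crawled by bots, including parameter variants.
crawled = ['/shoes?color=red', '/shoes?color=blue', '/shoes?sort=price', '/hats']

# Count crawl hits per path, with query strings stripped.
variants = Counter(urlsplit(u).path for u in crawled)

# Paths crawled more than once are likely parameter duplicates of one page.
duplicated = {path: n for path, n in variants.items() if n > 1}
print(duplicated)  # → {'/shoes': 3}
```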
Last crawl date
Log file analysis can point out when Google last crawled a specific page you want to be indexed quickly.
As we said before, crawl budget is closely linked to the authority of the domain and proportional to the flow of link equity through the website. Because Google does not want to waste crawling time on low-quality websites, you should check Googlebot’s real-time activity on your website to see whether bots are spending enough time on your content.
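Extracting the last crawl date per URL from bot hits is a simple maximum over timestamps. A sketch with hypothetical data:

```python
from datetime import datetime

# Hypothetical Googlebot hits: (URL, timestamp) pairs from the logs.
bot_hits = [
    ('/launch-post', datetime(2023, 10, 1)),
    ('/launch-post', datetime(2023, 10, 9)),
    ('/old-post',    datetime(2023, 6, 2)),
]

# Keep the most recent hit per URL.
last_crawl = {}
for url, ts in bot_hits:
    if url not in last_crawl or ts > last_crawl[url]:
        last_crawl[url] = ts

print(last_crawl['/launch-post'].date())  # → 2023-10-09
```

A page you want indexed quickly that has not been hit for weeks is a signal to improve its internal linking or sitemap priority.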
We offer an SEO log file analyzer at OnCrawl. You can download it as open source and perform your log analysis for free (you only pay your hosting costs). Or you can have it hosted by sending us your logs, and we will give you access to our OnCrawl Advanced Platform.