Using SEO log file analysis is a highly relevant and important way to improve your technical and on-site SEO. Today, you cannot get your site crawled, parsed, indexed, and ranked by search engines without solid, relevant SEO. Log files are the only data source that is 100% accurate and that truly helps you understand how search engines crawl your website. Log analysis helps you rank higher and earn more traffic, conversions, and sales. But before digging into the benefits, let’s understand what log analysis is.
A web server log is a journal kept by the web server containing ‘hits’, that is, records of all the requests the server has received.
The data is stored anonymously and includes details such as the date and time at which the request was made, the requesting IP address, the URL or content requested, and the user-agent sent by the browser.
These files typically exist for technical site auditing and troubleshooting, but they can be extremely valuable for SEO auditing as well. (Builtvisible)
In fact, when someone types a URL, like https://www.oncrawl.com, into a browser, here is what happens: the browser splits the URL into three parts: the protocol, the server name, and the file name.
The server name (oncrawl.com) is converted into an IP address via the domain name server. The connection established between the browser and the corresponding web server allows you to reach the requested file. An HTTP GET request is then sent to the web server for the right page, which then sends back the content of the page you see displayed on the screen. Each of these requests is therefore regarded as a ‘hit’ by the web server.
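To make the ‘hit’ concrete, here is a minimal sketch in Python, using only the standard library, of the two steps described above: resolving the server name and sending an HTTP GET request. The URL and the User-Agent string are just placeholders.

```python
# Minimal sketch of what the browser does: resolve the server name, then send
# an HTTP GET request. Every request like this is recorded as a 'hit' in the
# web server's log file. Uses only the Python standard library.
import socket
import urllib.request

url = "https://www.oncrawl.com/"   # example URL from the article
host = "www.oncrawl.com"

# 1. The server name is converted into an IP address via DNS.
ip_address = socket.gethostbyname(host)
print(f"{host} resolves to {ip_address}")

# 2. An HTTP GET request is sent to the web server, including a User-Agent
#    header -- the same header that later identifies the client in the logs.
request = urllib.request.Request(url, headers={"User-Agent": "example-client/1.0"})
with urllib.request.urlopen(request) as response:
    print(f"Status: {response.status}, bytes received: {len(response.read())}")
```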
The appearance of a log file depends on the type of server and the configuration used (Apache, IIS, etc.), but there are elements of a record that you will always be able to find: the client IP address, the date and time of the request, the method and URL requested, and the HTTP status code returned.
Additional attributes can be added, such as the user-agent, the referrer, the number of bytes transferred, or the time taken to serve the request.
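As an illustration, here is what a record might look like in an Apache-style ‘combined’ log format, along with a short Python sketch that splits it into its fields. The sample line and its values are invented, and your server’s exact layout may differ.

```python
# Illustrative parse of an Apache-style "combined" log record. The sample line
# and its values are invented; real layouts vary with server configuration.
import re

sample = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /blog/log-file-analysis/ HTTP/1.1" 200 5316 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

record = LOG_PATTERN.match(sample).groupdict()
for field, value in record.items():
    print(f"{field:>10}: {value}")
```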
Log file analysis allows you to understand exactly how search engines crawl your website, since every request made to the hosting web server is saved. You just need to filter by user-agent and client IP to access the crawl details for search engine bots. You can then analyze crawler behavior on your website by examining when, how frequently, and on which pages crawlers are present. This lets you answer the following questions: When do search engine bots crawl my website? How frequently do they come back? Which pages do they crawl, and which do they ignore?
These 3 questions are just a quick overview of the potential of log analysis. Log analysis can also help you determine whether your site architecture is optimized, if you have site performance issues, and so on.
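For example, here is a minimal Python sketch of the filtering step described above: it keeps only Googlebot hits from an access log and double-checks the claimed IPs with a reverse DNS lookup. The file name access.log and the Apache-style combined format are assumptions about your setup.

```python
# Sketch: pull search-engine bot hits out of an access log by user-agent, and
# confirm a claimed Googlebot IP with a reverse DNS lookup. (A full check also
# resolves the hostname back to the IP.)
import re
import socket

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)

def is_verified_googlebot(ip: str) -> bool:
    """Real Googlebot IPs resolve to *.googlebot.com or *.google.com hostnames."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    return hostname.endswith((".googlebot.com", ".google.com"))

with open("access.log") as log_file:
    for line in log_file:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        hit = match.groupdict()
        if "Googlebot" in hit["user_agent"] and is_verified_googlebot(hit["ip"]):
            print(hit["timestamp"], hit["status"], hit["url"])
```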
There are several metrics you can monitor in your log files to improve your SEO.
Bot crawl volume refers to the number of requests made by Googlebot, Bingbot, Baiduspider, Yandex, Yahoo, and so on over a given period of time. Bot crawl volume shows you whether you have been crawled by a specific search engine. For instance, if you want to be found in China but Baidu is not crawling your site, that is a problem.
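As a rough sketch, and assuming an Apache-style access.log, the following Python snippet counts daily hits per bot based on well-known user-agent substrings.

```python
# Sketch: daily crawl volume per search-engine bot, identified by user-agent
# substrings. "access.log" and the combined log format are assumptions.
import re
from collections import Counter

LOG_PATTERN = re.compile(r'\[(?P<day>[^:\]]+)[^\]]*\].*"(?P<user_agent>[^"]*)"$')

BOTS = ["Googlebot", "bingbot", "Baiduspider", "YandexBot", "Slurp"]  # Slurp = Yahoo

volume = Counter()
with open("access.log") as log_file:
    for line in log_file:
        match = LOG_PATTERN.search(line.rstrip())
        if not match:
            continue
        for bot in BOTS:
            if bot.lower() in match["user_agent"].lower():
                volume[(match["day"], bot)] += 1

# Dates sort as strings here; good enough for a quick look.
for (day, bot), hits in sorted(volume.items()):
    print(f"{day}  {bot:<12} {hits} hits")
```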
A crawl budget refers to the number of pages a search engine will crawl on your site in a given period of time, usually a day. This budget is linked to the authority of the domain and proportional to the flow of link equity through the website.
This crawl budget is often wasted on irrelevant pages. Let’s say you have a budget of 1,000 pages per day. You want those 1,000 crawled pages to be the ones that appear in the SERPs. But bots may be crawling old pages, duplicate pages, redirected pages, or other pages that aren’t important to your SEO strategy. If you have fresh content you want indexed but no budget left, Google won’t index this new content. That’s why you want to watch where you spend your crawl budget with log analysis.
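One possible way to estimate this waste, sketched below, is to measure what share of Googlebot hits lands on the URLs you actually want indexed. The file names access.log and priority_urls.txt (e.g. a list exported from your XML sitemap) are hypothetical.

```python
# Sketch: estimate how much of the crawl budget goes to URLs you don't care
# about, by comparing Googlebot hits against the URLs you want indexed.
# "access.log", "priority_urls.txt" and the log format are assumptions.
import re
from urllib.parse import urlsplit

REQUEST = re.compile(r'"(?:GET|HEAD) (?P<url>\S+) [^"]*"')

# One URL (or path) per line, e.g. exported from your XML sitemap.
with open("priority_urls.txt") as f:
    priority_paths = {urlsplit(line.strip()).path for line in f if line.strip()}

useful_hits, wasted_hits = 0, 0
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line:
            continue
        match = REQUEST.search(line)
        if not match:
            continue
        path = urlsplit(match["url"]).path
        if path in priority_paths:
            useful_hits += 1
        else:
            wasted_hits += 1

total = useful_hits + wasted_hits
if total:
    print(f"Googlebot hits on priority pages: {useful_hits}/{total} "
          f"({100 * useful_hits / total:.1f}%)")
```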
Temporary (302) redirects are not optimized for your SEO because they use up a lot of crawl budget: search engines return frequently to check whether the temporary redirect from the old URL to the new one is still in place. Prefer permanent 301 redirects. Log data analysis can help you spot these redirects.
Log data analysis can also help you spot status code errors, such as 4XX and 5XX responses, that can hurt your SEO. Understanding the different HTTP status codes can help you rank higher.
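A simple sketch of this check, assuming an Apache-style access.log: list the redirects and error responses that bots hit most often, so you can see which 302s to turn into 301s and which 4XX/5XX pages to fix.

```python
# Sketch: surface redirects (3xx) and errors (4xx/5xx) that search-engine bots
# keep hitting. "access.log" and the combined log format are assumptions.
import re
from collections import Counter

LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (?P<url>\S+) [^"]*" (?P<status>\d{3}) ')

problem_hits = Counter()
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line and "bingbot" not in line:
            continue
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        status = match["status"]
        if status.startswith(("3", "4", "5")):
            problem_hits[(status, match["url"])] += 1

# Most-crawled redirects and errors first: prime candidates for fixing.
for (status, url), hits in problem_hits.most_common(20):
    print(f"{status}  {hits:>5} bot hits  {url}")
```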
You can set your crawl priority through your XML sitemap or your internal linking structure. This prevents Google from ignoring some pages or sections of your site.
In fact, analyzing your logs can highlight URLs or directories that bots rarely crawl. For instance, if you want a specific blog post to rank for a targeted search query but it sits in a directory that Google only visits once every 6 months, you will miss out on organic search traffic from that post for up to 6 months before Google crawls it again.
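Here is a minimal sketch of that kind of check, assuming an Apache-style access.log: it records the most recent Googlebot visit per top-level directory and reports how many days ago it happened.

```python
# Sketch: how recently Googlebot visited each top-level directory. Sections
# with an old "last crawled" date are candidates for better internal linking
# or sitemap priority. "access.log" and the combined format are assumptions.
import re
from datetime import datetime, timezone

LOG_PATTERN = re.compile(r'\[(?P<timestamp>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) [^"]*"')

last_crawled = {}  # directory -> most recent Googlebot visit
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line:
            continue
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        when = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        path = match["url"].split("?")[0]
        directory = "/" + path.lstrip("/").split("/")[0]
        if directory not in last_crawled or when > last_crawled[directory]:
            last_crawled[directory] = when

now = datetime.now(timezone.utc)
for directory, when in sorted(last_crawled.items(), key=lambda item: item[1]):
    age_days = (now - when).days
    print(f"{directory:<30} last crawled {age_days} days ago")
```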
URL parameters, such as filters or tracking tags, can result in crawl budget waste, since search engines end up crawling many different URLs that serve the same content. Search Engine Land published an excellent article about how to fix this problem.
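A short sketch of how you might quantify this from your logs, again assuming an Apache-style access.log: count Googlebot hits on URLs that carry a query string, grouped by parameter name.

```python
# Sketch: measure how many bot hits go to parameterized URLs (filters, tracking
# tags, etc.), grouped by parameter name. "access.log" is an assumption.
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

REQUEST = re.compile(r'"(?:GET|HEAD) (?P<url>\S+) [^"]*"')

hits_per_parameter = Counter()
parameterized_hits = 0
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line:
            continue
        match = REQUEST.search(line)
        if not match:
            continue
        query = urlsplit(match["url"]).query
        if query:
            parameterized_hits += 1
            for parameter in parse_qs(query):
                hits_per_parameter[parameter] += 1

print(f"Googlebot hits on URLs with parameters: {parameterized_hits}")
for parameter, hits in hits_per_parameter.most_common(10):
    print(f"  {parameter:<20} {hits} hits")
```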
Log file analysis can point out when Google last crawled a specific page you want to be indexed quickly.
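Sketched below, assuming an Apache-style access.log and a hypothetical target path, is one way to pull that date out of your logs.

```python
# Sketch: find the last time Googlebot requested a specific page.
# "access.log" and TARGET_PATH are assumptions; adjust to your site.
import re
from datetime import datetime

TARGET_PATH = "/blog/new-article/"   # hypothetical page you want indexed
LOG_PATTERN = re.compile(r'\[(?P<timestamp>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) ')

last_seen = None
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line:
            continue
        match = LOG_PATTERN.search(line)
        if not match or match["url"].split("?")[0] != TARGET_PATH:
            continue
        when = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        if last_seen is None or when > last_seen:
            last_seen = when

print(f"Last Googlebot crawl of {TARGET_PATH}: {last_seen or 'never seen in this log'}")
```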
If you monitor your logs regularly, you can also track how long it takes between the moment you publish content, the moment it is crawled, and the moment you get your first organic hits. This will help you plan your content calendar for seasonal campaigns or events tied to a specific date.
As we said before, crawl budget is closely linked to the authority of the domain and proportional to the flow of link equity through the website. Because Google does not want to waste crawl time on a low-quality website, you will want to monitor Googlebot’s real-time activity on your site to check whether bots are spending enough time on your content.
Changes in bot activity on your website are early warning signs of algorithm changes or site changes that affect your SEO. With log analysis, you can spot changes before they become issues.
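As a simple illustration of that kind of monitoring, the sketch below (assuming an Apache-style access.log) compares Googlebot hits over the last 7 days to the 7 days before and flags a significant drop.

```python
# Sketch: flag a drop in Googlebot activity by comparing the last 7 days of
# bot hits to the 7 days before that. "access.log" is an assumption.
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_PATTERN = re.compile(r'\[(?P<timestamp>[^\]]+)\]')

daily_hits = Counter()
with open("access.log") as log_file:
    for line in log_file:
        if "Googlebot" not in line:
            continue
        match = DAY_PATTERN.search(line)
        if not match:
            continue
        when = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        daily_hits[when.date()] += 1

today = datetime.now(timezone.utc).date()
last_week = sum(daily_hits[today - timedelta(days=offset)] for offset in range(1, 8))
week_before = sum(daily_hits[today - timedelta(days=offset)] for offset in range(8, 15))

print(f"Googlebot hits, last 7 days: {last_week}, previous 7 days: {week_before}")
if week_before and last_week < 0.7 * week_before:
    print("Warning: bot activity dropped by more than 30% week over week.")
```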
We offer an SEO log file analyzer at OnCrawl. We have a free open source version and a cloud-based version, integrated into our Technical SEO platform, where you can upload your logs through a secure, private FTP space and get additional insights through cross-analysis.