Server log files are like the “black box” of a website: they record everything that happens on your site.
Log files include information on who visited your site, what pages they looked at, how long they stayed, and even if they ran into any errors. These logs store details such as the client’s IP address, what type of device or web browser they’re using, and much more.
The client can be anything from a human user to a search engine bot or a privately hosted scraper. It’s the search engine bot part we’re most interested in, as bots provide insight into how search engines are crawling your website.
Crawling is how search engines, like Google, visit and understand what’s on your website. So, if search engines have a hard time crawling your site, this can cause issues in the processing (indexing) and then ranking phases.
This type of information is especially valuable for capturing the potential of newly published content or recent updates and ensuring that these changes are reflected in search engine rankings as quickly as possible.
By analyzing log files in real-time, you can rapidly identify any issues regarding crawl errors, inefficiencies, or bottlenecks, and promptly make adjustments. Let’s take a look at how log file analysis can be useful specifically in the SEO industry.
The basics of log files
Before we examine where log file analysis is useful, it’s important to understand some of the basics. Server log files are text files that automatically record events and transactions that happen on a web server.
Think of them as a detailed diary for your website, jotting down every single visit and action that occurs.
These logs are stored on the server where your website is hosted, and they provide valuable insights into how both humans and search engine bots interact with your site.
The following is a quick rundown of some of the key pieces of information that you can find in a server log file:
- IP Address: This is the unique address of the computer or server that’s making a request to your website. If Google is crawling your site, you’ll see Google’s IP address here.
- User Agent: This tells you what type of browser or service is accessing your website. For example, it could be Chrome for a human user or Googlebot for Google’s search engine crawler.
- Timestamp: This is the exact date and time when the visit or action happened. It’s like the timestamp on a receipt, showing you when something took place.
- Requested URL: This is the specific page or resource that was asked for. For example, if someone visited your homepage, you might see something like “/index.html” here.
- HTTP Status Codes: These are three-digit codes that tell you the outcome of the request. For instance, a “200” means everything is okay, and the page loaded correctly. A “404” means the page wasn’t found, and a “500” suggests a server error.
- Referrer: This shows where the visitor came from. If they clicked a link on another website to get to yours, you’ll see that website’s URL here.
- Bytes Transferred: This indicates the size of the file that was sent in response to a request. It helps you understand the amount of data your server is serving up.
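To make these fields concrete, here is a minimal sketch of parsing a single log line in Python. It assumes the common Apache/Nginx “combined” log format; field order and contents vary by server configuration, so treat the pattern and sample line as illustrative:

```python
import re

# Regex for the Apache/Nginx "combined" log format (an assumption --
# check your server configuration, as field order can differ).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# A made-up sample line for illustration.
line = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 5316 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

match = LOG_PATTERN.match(line)
if match:
    entry = match.groupdict()
    print(entry["ip"])      # the requesting IP address
    print(entry["status"])  # the HTTP status code, e.g. "200"
    print(entry["url"])     # the requested URL, e.g. "/index.html"
```

Each named group maps directly onto one of the fields listed above, which makes the parsed entries easy to filter and aggregate later.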
For the most part, when you request these files from a client (or developer), they will be cleansed to only include specific user-agents such as Googlebot and Bingbot. This avoids transferring or unnecessarily handling user PII (Personally Identifiable Information).
The cleansing of log files to only include the user-agents (specifically search engine bots) reduces the file size, and allows you to analyze a clean set of data.
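A cleansing pass like the sketch below keeps only lines whose user-agent contains a known crawler token. The token list is illustrative, and because user-agent strings can be spoofed, production filtering should also verify the requesting IP (for example, Googlebot via reverse DNS):

```python
# Illustrative crawler tokens -- extend as needed. User-agent strings
# can be spoofed, so also verify the requesting IP in production.
BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot", "yandexbot")

def is_bot_line(line: str) -> bool:
    """Return True if the log line's user-agent mentions a known bot."""
    lowered = line.lower()
    return any(token in lowered for token in BOT_TOKENS)

# Made-up sample lines for illustration.
raw_lines = [
    '66.249.66.1 ... "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"',
    '203.0.113.5 ... "Mozilla/5.0 (Windows NT 10.0) Chrome/118.0"',
]

bot_lines = [l for l in raw_lines if is_bot_line(l)]
print(len(bot_lines))  # only the Googlebot line survives
```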
Routine log file analysis
Regularly examining your server log files can offer great insight into your website and how Google (and other search engines) crawl it. This is especially important if you run a large, enterprise website, or a website with high URL change frequency, such as an e-commerce site.
Ongoing issue identification
Log file analysis isn’t a silver bullet, but ongoing or routine log file analysis can help improve crawl efficiency, especially on larger/enterprise websites. During this analysis there are a handful of important metrics to keep an eye on.
On their own, and in isolation, these aren’t high-priority issues, but if you notice them trending upward or affecting URLs they shouldn’t, you have early sight of a developing problem.
When a page is not found, a “404 error” is recorded in the server log file. This is a signal to you that either a page has been deleted or a link is broken. These errors can prevent search engines from efficiently crawling your site, affecting your SEO performance.
Codes like “500” or “503” indicate server issues that may be preventing search engines from accessing your site altogether. These need immediate attention because they can result in pages or even the whole site getting de-indexed (depending on the scale of the issue).
Multiple redirects (for example, from “http” to “https” to a “www” version of the site) can slow down crawling. You can spot these in your log files and streamline the process, helping search engines crawl more efficiently.
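To watch these metrics over time, a simple tally makes upward trends in 404s, 5xx errors, and redirects easy to spot. This sketch assumes you have already parsed each log entry into a dict with a `status` field; the sample entries are made up for illustration:

```python
from collections import Counter

# Hypothetical parsed entries -- in practice these come from your
# log parser, one dict per request.
entries = [
    {"url": "/", "status": 200},
    {"url": "/old-page", "status": 404},
    {"url": "/promo", "status": 301},
    {"url": "/checkout", "status": 500},
    {"url": "/old-page", "status": 404},
]

buckets = Counter()
for e in entries:
    s = e["status"]
    if s == 404:
        buckets["not_found"] += 1      # deleted pages or broken links
    elif s >= 500:
        buckets["server_error"] += 1   # needs immediate attention
    elif 300 <= s < 400:
        buckets["redirect"] += 1       # watch for redirect chains
    else:
        buckets["ok"] += 1

print(dict(buckets))
# {'ok': 1, 'not_found': 2, 'redirect': 1, 'server_error': 1}
```

Run daily or weekly over cleansed bot logs, these counts give you the trend lines described above rather than one-off snapshots.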
Log file analysis post-major website changes
Changing your website’s domain or undergoing significant URL architecture changes is often a necessary step for business growth and rebranding. However, these changes come with potential risks that can impact SEO and user experience.
Post-migration log file analysis can help identify missed, misconfigured, or broken redirects, or reveal whether Google is wasting time crawling old URLs and redirects instead of prioritizing the high-performing pages of the most recent version of the website.
Broken links & misconfigured redirects
When you move your website or change the URL structure, links that pointed to your old address can break, leading to 404 errors. This not only disrupts user experience but can also impact the time it takes for Google to process your URL changes.
If redirects are not configured correctly, you could end up creating infinite loops or pointing users to the wrong pages, further deteriorating user experience and crawl efficiency.
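One way to catch these problems before they hurt crawl efficiency is to trace each redirect chain and flag loops or overly long chains. This is a sketch assuming you have extracted a source-to-destination redirect map from your server configuration or crawl data; the map below is hypothetical:

```python
# Hypothetical redirect map (source URL -> destination URL).
redirects = {
    "/old-a": "/old-b",
    "/old-b": "/new-page",
    "/loop-1": "/loop-2",
    "/loop-2": "/loop-1",
}

def trace(start: str, max_hops: int = 10):
    """Follow a redirect chain, reporting loops and long chains."""
    seen, current = [], start
    while current in redirects:
        if current in seen:
            return seen + [current], "loop"
        seen.append(current)
        if len(seen) > max_hops:
            return seen, "too long"
        current = redirects[current]
    return seen + [current], "ok"

chain, verdict = trace("/old-a")
print(chain, verdict)  # ['/old-a', '/old-b', '/new-page'] ok

chain, verdict = trace("/loop-1")
print(verdict)         # loop
```

A chain of more than one hop (like `/old-a` above) is a candidate for collapsing into a single direct redirect.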
Crawl area focus
Post-migration, you’ll need to keep a closer eye on your server log files to understand how search engines are interacting with your new site architecture.
Log files can reveal which sections or URLs of your website are being crawled more or less frequently. For instance, if you see that your blog posts are being crawled frequently but your product pages aren’t, you might need to look into why that’s happening.
By looking at the IP addresses and User Agents in your log files, you can identify how often search engine bots are crawling your site. High crawl frequency is generally a good sign but can become problematic if your server resources are limited.
After a migration or major URL change, search engine bots may end up wasting their crawl budget on non-existent or less-important pages. This could lead to your more critical pages getting crawled and indexed less frequently, negatively affecting your site’s visibility in search results.
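One way to see where crawl budget is going is to group bot hits by top-level URL section. This sketch assumes you have already extracted the requested URLs from cleansed bot log entries; the sample URLs are made up:

```python
from collections import Counter

# Hypothetical Googlebot-requested URLs -- real data would come from
# your cleansed log files.
bot_hits = [
    "/blog/post-1", "/blog/post-2", "/blog/post-3",
    "/products/widget",
    "/old-site/page",  # a legacy URL still being crawled post-migration
]

def section(url: str) -> str:
    """Return the first path segment, e.g. '/blog/post-1' -> 'blog'."""
    parts = url.strip("/").split("/")
    return parts[0] if parts[0] else "(root)"

by_section = Counter(section(u) for u in bot_hits)
print(by_section.most_common())
# [('blog', 3), ('products', 1), ('old-site', 1)]
```

If a legacy section such as `old-site` keeps attracting hits while your product pages are barely crawled, that imbalance is exactly the crawl budget waste described above.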
Tools for log file analysis
Log file analysis can take time, but there are tools to facilitate the process. Depending on your level of involvement in your site’s SEO, different tools may be more useful than others.
If you’re a technical SEO or work with a website that aggregates a lot of information – an e-commerce site for example – you’re likely looking for an in-depth analysis of your log files. In this case, Oncrawl could be a useful solution.
Oncrawl’s SEO log analyzer processes hundreds of millions of logs per day to provide information about how, when, and where search engine bots and visitors interact with your site. This in turn helps you to identify how quickly and how often your site responds to requests, where any problems may occur, and how to fix them.
On the other hand, if you work in IT or in a tech department, your work may be more focused on debugging. In that case, a tool like Splunk, which incorporates log and machine data into its analysis, could be useful.
Likewise, the needs of someone working in site security will likely differ. With the threat of cyber attacks, certain sites require specialized log analysis tools that can be integrated with other security tools. For example, a tool like Graylog can be used both for real-time security log analysis and for forensic analysis of log data after a cyber attack.
This list is by no means exhaustive, but it’s important to find the log analyzer that best fits your needs.
Log monitoring has been around for quite some time and as search engines grow increasingly sophisticated, so too will your analysis process have to evolve.
Optimizing the visibility and performance of your website requires more advanced techniques than ever before and real-time log file analysis has emerged as a crucial element in this evolving landscape.
Now, with the growing usage of machine learning and AI, will these things have an even greater impact on how data is analyzed? Or will future data protection measures result in certain consequences on how search engines crawl your site and what information they can collect? Only time will tell, but rest assured that we will always have logs.