
How to do log file analysis for free?

June 17, 2016 - 2 min reading time - by Emma Labrador

Log file analysis helps you understand how search engines crawl a website and how that crawl impacts SEO. These insights help you improve your crawlability and SEO performance.
With this data, you can analyze crawl behavior and answer interesting questions like:

  • Is your crawl budget spent efficiently?
  • What accessibility errors were encountered during the crawl?
  • Where are the areas of crawl deficiency?
  • What are your most active pages?
  • Which pages does Google not know about?

The great thing is that you can also do it for free. Oncrawl offers an open source log analyzer.

It will help you spot and monitor:

  • Unique pages crawled by Google
  • Crawl frequency by group of pages
  • Active and inactive pages
  • Status codes

How does it work?

1 - Install Docker

Install Docker Toolbox.
Launch the Docker Quickstart Terminal to start.
Copy the IP address shown in the terminal (by default, 192.168.99.100); you will need it later.
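
If you are not sure which IP to use, you can also retrieve it directly from Docker Machine (a minimal sketch, assuming the virtual machine keeps the default name "default" that Docker Toolbox creates):

  • MacBook-Air:~ cogniteev$ docker-machine ip default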


Then, download the oncrawl-elk release: https://github.com/cogniteev/oncrawl-elk/archive/1.1.zip
Run these commands in the terminal to create a directory and unzip the file:

  • MacBook-Air:~ cogniteev$ mkdir oncrawl-elk
  • MacBook-Air:~ cogniteev$ cd oncrawl-elk/
  • MacBook-Air:oncrawl-elk cogniteev$ unzip ~/Downloads/oncrawl-elk-1.1.zip

Then run:

  • MacBook-Air:oncrawl-elk cogniteev$ cd oncrawl-elk-1.1/
  • MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml up -d

Docker Compose will download all the necessary images from Docker Hub; this may take a few minutes. Once the containers have started, you can enter the following address in your browser: http://DOCKER-IP:9000. Make sure to replace DOCKER-IP with the IP you copied earlier.
You should see the Oncrawl-ELK dashboard, but there is no data yet. Let’s get some data to analyze.
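
If the dashboard does not load, it is worth checking that all containers started correctly. These are standard Docker Compose commands (run them from the oncrawl-elk-1.1 directory; the exact output depends on your setup):

  • MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml ps
  • MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml logs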


2 - Import log files

Importing data is as easy as copying access log files to the right folder. Logstash automatically starts indexing any file found at logs/apache/*.log or logs/nginx/*.log.
If your web server is powered by Apache or NGINX, make sure your logs use the combined log format. They should look like this:

127.0.0.1 - - [28/Aug/2015:06:45:41 +0200] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
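
If your server does not already log in the combined format, enabling it is usually a one-line change. The lines below are only a sketch: the log paths are examples and your configuration files may be organized differently.

Apache (the combined format is predefined):
  CustomLog /var/log/apache2/access.log combined

NGINX (combined is the format used by default when none is specified):
  access_log /var/log/nginx/access.log combined;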

Drop your .log files into the logs/apache or logs/nginx directory accordingly.
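
For example, assuming you have already downloaded an Apache access log from your server into your Downloads folder (the file name is only an example):

  • MacBook-Air:oncrawl-elk-1.1 cogniteev$ cp ~/Downloads/access.log logs/apache/access.log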

3 - Play

Go back to http://DOCKER-IP:9000. You should now see figures and graphs. Congrats!


You can also combine this data with crawl data to get a complete view of your SEO performance. You will be able to detect active orphan pages, check crawl ratio by depth or by page group, and find much more interesting information. To learn more about combined analysis, you can check this page.

Emma Labrador
Emma was the Head of Communication & Marketing at Oncrawl for over seven years. She contributed articles about SEO and search engine updates.

Responses to “How to do log file analysis for free?”

  1. Xavier says:

    Hi Emma,

    I just did everything you wrote here, but I get all the blocks (Unique pages crawled, Bot hits, etc.) showing: “An error occurred.

    Sorry, the page you are looking for is currently unavailable.
    Please try again later.”

    Any help?

    Thanks.

  2. juju34 says:

    Do you know how to create my own report? I want to see inactive pages, for example…

    • Emma says:

      Hi,
      The ELK stack on which the open source OnCrawl ELK is built is best suited to perform analysis on individual log events.

      Detecting active and inactive pages goes beyond individual log events: it requires grouping events per URL and, for each URL, detecting whether it received at least one SEO visit during the selected period. Although this may be possible using pipeline aggregations in Elasticsearch, they are not currently exposed in Kibana (https://github.com/elastic/kibana/issues/4584) and this is a very advanced topic.

      The other option is to compute these analyses with other data processing engines and send enriched objects to Elasticsearch. That’s precisely what the OnCrawl platform does.

      If you are interested in testing our hosted log monitoring, you can create a test account at https://app.oncrawl.com and contact us in the in-app chat.

      Hope this helps.

  3. Wiehan says:

    Thanks for the share.

    I’m having issues and I’m unclear on the part where it reads “Run these commands in the terminal to create a directory and unzip the file”.

    Do you have screenshots to further illustrate the instructions?

    Sounds like a really useful tool and can’t wait to test your software.

  4. Sam says:

    Hi Emma,
    OK for the Apple tutorial, it looks nice, but how about a Windows 10 tutorial? ;)
    Thank you in advance.
    Kind regards