Log file analysis helps you understand how search engines crawl a website and how that crawling affects SEO. These insights help you improve your crawlability and SEO performance.
With this data, you can analyze crawl behavior and answer questions like:
- Is your crawl budget spent efficiently?
- What accessibility errors were encountered during the crawl?
- Where are the areas of crawl deficiency?
- What are my most active pages?
- Which pages does Google not know about?
The great thing is that you can do it for free: Oncrawl offers an open source log analyzer.
It will help you spot:
- Unique pages crawled by Google
- Crawl frequency by group of pages
- Active and inactive pages
- Status codes, which you can monitor over time
How does it work?
1- Install Docker
Install Docker Toolbox and launch the Docker Quickstart Terminal.
Note the IP address displayed in the terminal (typically 192.168.99.100); you will need it later.
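If you lose track of that address, you can display it again with docker-machine (with Docker Toolbox, the machine is usually named default):
- MacBook-Air:~ cogniteev$ docker-machine ip default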
Then download the oncrawl-elk release: https://github.com/cogniteev/oncrawl-elk/archive/1.1.zip
Run these commands in the terminal to create a directory and unzip the file:
- MacBook-Air:~ cogniteev$ mkdir oncrawl-elk
- MacBook-Air:~ cogniteev$ cd oncrawl-elk/
- MacBook-Air:oncrawl-elk cogniteev$ unzip ~/Downloads/oncrawl-elk-1.1.zip
Then move into the extracted directory and start the stack:
- MacBook-Air:oncrawl-elk cogniteev$ cd oncrawl-elk-1.1/
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml up -d
Docker Compose will download all the necessary images from Docker Hub; this may take a few minutes. Once the containers have started, enter the following address in your browser: http://DOCKER-IP:9000, replacing DOCKER-IP with the IP address you noted earlier.
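If the page does not load, a quick sanity check is to list the containers and their logs from the same directory (these are standard docker-compose commands):
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml ps
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml logs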
You should see the Oncrawl-ELK dashboard, but there is no data yet. Let’s get some data to analyze.
2- Import log files
Importing data is as easy as copying your access log files into the right folder: Logstash automatically indexes any file found at logs/apache/*.log or logs/nginx/*.log.
If your web server is powered by Apache or Nginx, make sure your logs use the combined log format. Lines should look like this:
127.0.0.1 - - [28/Aug/2015:06:45:41 +0200] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
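If your server is not already logging in this format, the combined format is predefined in both servers. For example (the log paths are only examples; adjust them to your setup):
Apache: CustomLog /var/log/apache2/access.log combined
Nginx: access_log /var/log/nginx/access.log combined;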
Drop your .log files into the logs/apache or logs/nginx directory, as appropriate.
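For example, from the oncrawl-elk-1.1 directory (the source path and file name are only examples):
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ cp /var/log/apache2/access.log logs/apache/access-2015-08-28.log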
3- Play
Go back to http://DOCKER-IP:9000. You should now see figures and graphs. Congrats!
You can also combine this data with crawl data for a complete view of your SEO performance. You will be able to detect active orphan pages, check crawl ratio by depth or by page group, and much more. To learn more about combined analysis, you can check this page.
Hi Emma,
I just did everything you wrote here, but all the blocks (Unique pages crawled, Bots hits, etc.) show: “An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.”
Any help?
Thanks.
Here is the error: the interface can’t find this address: http://192.168.99.100:9000/app/kibana
Hi,
Sorry for the inconvenience. You should try this link: https://github.com/cogniteev/oncrawl-elk#troubleshoot
Hope that helps.
Thanks for the link. I tried a full restart, but I get this (in Kitematic):
“.kibana index initialization failed. Abort!
Waiting for kibana-config container
.kibana-config index is now opened.
.kibana index initialization failed. Abort!
Fatal: .kibana-config index does not exists. Timeout error.”
If somebody can help…
Unfortunately, Kitematic does not seem to support docker-compose yet (https://github.com/docker/kitematic/issues/137). Can you try with the Docker Quickstart Terminal, as described above?
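From the Quickstart Terminal, in the oncrawl-elk-1.1 directory, a restart would look something like this (stop the containers, then bring them back up):
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml stop
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml up -d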
Do you know how to create my own report? I want to find inactive pages, for example.
Hi,
The ELK stack on which the open source OnCrawl ELK analyzer is built is best suited to analyzing individual log events.
Detecting active and inactive pages goes beyond individual log events: it requires grouping events per URL and, for each URL, detecting whether it received at least one SEO visit during the selected period. Although this may be possible using pipeline aggregations in Elasticsearch, they are not currently exposed in Kibana (https://github.com/elastic/kibana/issues/4584), and this is a very advanced topic.
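For reference, a minimal sketch of what such a query could look like, sent directly to Elasticsearch with curl. The index pattern, the field names, and the way SEO visits are detected here (a wildcard on the referer) are all assumptions that depend on your Logstash configuration; bucket_selector also requires Elasticsearch 2.0 or later, and inline scripting may need to be enabled:
# list URLs that appear in the logs but received zero SEO visits (inactive pages)
curl -XPOST 'http://DOCKER-IP:9200/logstash-*/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "per_url": {
      "terms": { "field": "url", "size": 0 },
      "aggs": {
        "seo_visits": { "filter": { "wildcard": { "referer": "*google*" } } },
        "inactive": {
          "bucket_selector": {
            "buckets_path": { "visits": "seo_visits._count" },
            "script": "visits == 0"
          }
        }
      }
    }
  }
}'
Each bucket that survives the bucket_selector is a URL that was hit in the logs but received no SEO visit over the queried period.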
The other option is to compute these analyses with other data processing engines and send enriched objects to Elasticsearch. That’s precisely what the OnCrawl platform does.
If you are interested in testing our hosted log monitoring, you can create a test account at https://app.oncrawl.com and contact us in the in-app chat.
Hope this helps.
Thanks for sharing.
I’m having issues and am unclear on the part where it reads “Run these commands in the terminal to create a directory and unzip the file”.
Do you have screenshots to further illustrate the instructions?
Sounds like a really useful tool and can’t wait to test your software.
Hi Emma,
The Mac tutorial looks nice, but how about a Windows 10 tutorial? ;)
Thank you in advance.
Kind regards