
URL Count Statistics

When I moved away from WordPress to Saaze, see Moved Blog To eklausmeier.goip.de, I first used StatCounter to count how many times my blog posts were visited. Later I added Matomo, or Google Analytics, or Yandex Metrika. It turned out that all these counting mechanisms use quite large JavaScript libraries. These JavaScript libraries, which the visitor's browser has to download, are often far larger than my actual web pages. So clearly, this is unbalanced.

I tried to use goaccess to get an overview of which blog posts are read a lot, and which are not. Unfortunately, goaccess is not well suited for that, as you have to filter out all bots and crawlers, and goaccess does not show you how the visits to individual blog posts evolve over time. I wrote about goaccess here: Using GoAccess with Hiawatha Web-Server. AWStats does not provide this functionality either. I wrote about AWStats here: AWStats and Hiawatha. So, again, I had to implement it myself.

The output of the statistics is a single web page. I use the DataTables JavaScript library to show the URLs in a table, and Apache ECharts to draw the bar charts. The only input is the access.log file. This file can be filtered beforehand to get rid of bots and crawlers.

As the central data structure I use a hash H whose key is the method (GET/POST/etc.) + requested URL + protocol, e.g.,

GET /img/LitMotorBike.jpg HTTP/1.1

Its value is an array with six entries. The entries in the array are:

  1. simple counter
  2. sum of transferred bytes
  3. stack of counters for the year in question
  4. stack of counters for the combination year+month
  5. stack of counters for the combination year+week
  6. consecutive number used for indexing an Apache ECharts array
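
To make the layout of H concrete, here is a minimal sketch in Python. It is not the Perl code from blogurlcnt. It assumes the access.log lines follow the Common/Combined Log Format, which may differ from the real log layout, and it models each "stack of counters" as a dictionary keyed by period. The function name `update` and the regular expression are hypothetical.

```python
import re
from datetime import datetime

# Assumption: Common/Combined Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2023:13:55:36 +0200] "GET /x HTTP/1.1" 200 2326 ...
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def update(H, line):
    """Update hash H with one access.log line; return True if the line parsed."""
    m = LOG_RE.search(line)
    if m is None:
        return False
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    iso = ts.isocalendar()          # (ISO year, ISO week, weekday)
    key = m.group("req")            # method + URL + protocol
    if key not in H:
        # [counter, bytes, per year, per year+month, per year+week, chart index]
        H[key] = [0, 0, {}, {}, {}, len(H)]
    entry = H[key]
    entry[0] += 1
    entry[1] += size
    periods = (
        (2, f"{ts.year}"),
        (3, f"{ts:%Y-%m}"),
        (4, f"{iso[0]}-W{iso[1]:02d}"),
    )
    for slot, period in periods:
        entry[slot][period] = entry[slot].get(period, 0) + 1
    return True
```

Calling `update(H, line)` for every line of the filtered log leaves one six-element entry per distinct request in H. The last element is assigned once, when the key is first seen, so it can serve as a stable index into a chart array.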

The rest of the Perl code is basically just updating this central hash H. The Perl script is in GitHub: blogurlcnt.

Once H is populated, I generate the data for DataTables as a JavaScript array urlFrequency. I also populate the JavaScript array ecoption[] for Apache ECharts.
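
As a rough illustration of this step, the following Python sketch turns a populated H into the two JavaScript arrays. The array names urlFrequency and ecoption come from the text above. Everything else is an assumption: the column order of the table rows, the use of per-month counts for the bars, and the function name `to_javascript`. The actual Perl script may emit a different layout.

```python
import json

def to_javascript(H):
    """Render H as JavaScript source defining urlFrequency and ecoption."""
    rows = []
    ecoption = [None] * len(H)
    for key, (count, nbytes, by_year, by_month, by_week, idx) in H.items():
        # Assumed table columns: chart index, request, hits, bytes
        rows.append([idx, key, count, nbytes])
        months = sorted(by_month)
        # One ECharts bar-chart option per URL, stored at its index
        ecoption[idx] = {
            "title": {"text": key},
            "xAxis": {"type": "category", "data": months},
            "yAxis": {"type": "value"},
            "series": [{"type": "bar", "data": [by_month[m] for m in months]}],
        }
    rows.sort(key=lambda r: r[2], reverse=True)   # most visited first

    def js(value):
        # Escape "</" so the data cannot close an enclosing <script> tag
        return json.dumps(value).replace("</", "<\\/")

    return (
        "const urlFrequency = " + js(rows) + ";\n"
        "const ecoption = " + js(ecoption) + ";\n"
    )
```

The returned string can be placed inside a `<script>` element of the generated page. A click handler on a table row could then look up `ecoption[idx]` and pass it to the chart's `setOption`.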

I always use this Perl script blogurlcnt in conjunction with accesslogFilter, which I described here: Filtering Bots and Crawlers from Access.log. Reason: I am not interested in getting statistics of bots & crawlers, but in statistics of real readers of my own postings. Therefore I call:

accesslogFilter /tmp/access.log | blogurlcnt > /srv/http/urlstat.html

When initially loading the resulting HTML page:

One of the nice features of DataTables is that you can easily filter the data in the table. When filtering for "lua" and clicking the first entry: