Konstantin - 4 months ago
Perl Question

Counting daily visitors in a log file



I have a visitor log file that spans more than 1.5 years. Every line represents a page load. Each line has the following structure:

2016-08-05 00:48:10 +0200 -> 170.67.51.153 -> Beijing - Beijing Shi: China -> http://example.com/?ref=1676 -> Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) -> AS55966 Beijing Baidu Netcom Science and Technology Co., Ltd. -> Beijing Baidu Netcom Science and Technology Co. -> 0.9301


I used " -> " to delimit fields.

My log file is about 50MB in size, and it takes a long time to parse the whole file for today's or yesterday's visitor count, because the relevant lines are of course at the end of the file.

I would like to use the "tac" command, which is a reverse "cat", or some similar technique to read the lines in reverse order. My first attempt (to get the daily visitors for 2016-08-04, for example) was:

tac visitor_log.txt|grep 2016-08-04|cut -d " " -f 5|sort|uniq|wc -l


It does output the visitor count, but unfortunately it is also time-consuming because it reads through the whole file: one can't tell "grep" to stop once the previous line matched and the current line doesn't.

Should I perhaps emulate "tac" in Ruby to get the daily visitor count efficiently? Or should I use some flip-flop technique, which is possibly available in "sed"? Unfortunately I don't know "sed" at all.

Answer

It's hard to know how to help without more information, but this Perl program will display the number of visits (one per log line) for every day logged.

The program expects the input file as a parameter on the command line. The output is as simplistic as the sample data you have given, and shows a single visit on 5 August 2016.

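use strict;
use warnings 'all';

# Number of log lines seen for each date, keyed by "YYYY-MM-DD"
my %visits;

# Read every line of the file(s) named on the command line
while ( <> ) {
    # Skip lines that do not begin with a date
    next unless /^(\d\d\d\d-\d\d-\d\d)/;
    ++$visits{$1};
}

# ISO dates sort chronologically when sorted as strings
for my $date ( sort keys %visits ) {
    printf "%s  --  %d\n", $date, $visits{$date};
}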

output

2016-08-05  --  1
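
The program above counts one visit per log line. If what is wanted is distinct visitors per day, as in the question's "sort|uniq" pipeline, a rough shell sketch could look like the following. It assumes the fields are separated by " -> " and that the second field is the IP address, as in the sample line; the function name unique_visitors_per_day is made up for illustration.

```shell
# Count distinct IP addresses per day.
# Assumption: fields are separated by " -> ", field 1 is the timestamp
# and field 2 is the IP address, as in the sample log line.
unique_visitors_per_day() {
    awk -F ' -> ' '
        # Only consider lines that start with a YYYY-MM-DD date
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            date = substr($1, 1, 10)      # date prefix of the timestamp
            if (!seen[date, $2]++)        # first time this IP appears on this date
                count[date]++
        }
        END {
            for (date in count)
                printf "%s  --  %d\n", date, count[date]
        }
    ' "$@" | sort
}
```

Called as "unique_visitors_per_day visitor_log.txt", it prints one line per date in the same format as the Perl program.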

It should take only a second or two if your file is really only 50MB.

I have tested by replicating the line you show to create a 50MB file, and it is processed in less than half a second, reporting 162,823 visits on one day.
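
If even a full pass turns out to be too slow, the early exit that the question asks for can be sketched with "tac" and awk. This is only a sketch: it assumes the log lines are in chronological order, that GNU "tac" is installed, and that the IP address is the second " -> "-delimited field. The function name visitors_on is hypothetical.

```shell
# Count distinct IPs for one day, reading the log from the end and
# stopping at the first line that is older than that day.
# Assumptions: lines are in chronological order; `tac` (GNU coreutils) exists.
visitors_on() {
    day=$1      # target date, YYYY-MM-DD
    file=$2     # path to the log file
    tac "$file" | awk -F ' -> ' -v day="$day" '
        # Skip anything that does not start with a YYYY-MM-DD date
        !/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { next }
        { date = substr($1, 1, 10) }
        date > day  { next }      # newer than the target day: keep going back
        date < day  { exit }      # older: nothing further back can match
        !seen[$2]++ { n++ }       # same day: count each IP once
        END { print n + 0 }       # prints 0 when no line matched
    '
}
```

For example, "visitors_on 2016-08-04 visitor_log.txt" prints a single number. ISO dates compare correctly as plain strings, which is what lets awk decide when to stop.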

I suggest that you load your log file into a database so that you can query it more easily. That way you will have to process the log file just once; thereafter your queries will be close to instantaneous.