How to generate full visitor count from an Apache log file

16 Jan 2012 | CPOL | 3 min read
Count how many hits were generated from each IP address and show the top 10 sources.

Introduction

In the previous article, I described how to create a report from an Apache log file for the number of hits from localhost vs. elsewhere. That script can be easily changed to provide a report for any single IP address vs. the rest of the world just by replacing the IP address with another address.

It can also be changed to produce a full visitor count, showing how many hits came from each IP address. From there it is easy to show the top 10 sources, or to filter them in some other way.

Background

To recap: in the default format, each line of the Apache log file starts with the IP address of the client, like this:

127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...
217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...

That means that if we take any single line and put it in the $line variable, we can extract the IP address, which is everything before the first space, with the following code:

PERL
my $length = index ($line, " ");
my $ip = substr($line, 0, $length);
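The index/substr pair assumes that every line contains a space. When no space is found, index returns -1 and substr would then return the line without its last character. As a minimal sketch of a more defensive variant, the hypothetical helper extract_ip below (not part of the original script) returns undef for such lines so the caller can skip them:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script:
# returns the text before the first space, or undef if the
# line has no space or starts with one.
sub extract_ip {
    my ($line) = @_;
    my $length = index($line, " ");
    return if $length <= 0;   # index() returns -1 when no space is found
    return substr($line, 0, $length);
}

my $ip = extract_ip('139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...');
print "$ip\n";   # prints 139.12.0.2
```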

Using the code

In order to count an arbitrary set of strings, we need a data structure that maps strings to scalar values. In Perl, this data structure is called an "associative array", or "hash" for short. In other languages, a similar thing might be called a map, a dictionary, or a look-up table.

A hash is basically an unordered set of key-value pairs, where the keys are unique strings and the values can be any scalar value (number, string, or a reference).

In Perl, a hash is marked with the percent sign (%). So we declare the %count hash to hold the mapping from IP address to number of hits. Most of the code is the same as in the previous example, but instead of incrementing two separate scalars, we increment the elements of the hash using the following construct:

PERL
$count{$ip}++;

When we encounter an IP address for the first time, $count{$ip} does not exist yet. Reading a missing hash element yields undef, and when undef is used in a numerical operation such as the ++ auto-increment, it is treated as the number 0. (The ++ operator does this silently, even under use warnings.) The value becomes 1, and the operation also creates the corresponding entry in the hash: the key-value pair automatically springs into existence. This is loosely referred to as autovivification.
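A small sketch of this behavior, using exists to check for the key before and after the increment (the variable names $seen_before and $seen_after are only for illustration):

```perl
use strict;
use warnings;

my %count;
my $ip = '127.0.0.1';

my $seen_before = exists $count{$ip};   # false: no entry for this IP yet
$count{$ip}++;                          # undef is treated as 0, value becomes 1
$count{$ip}++;                          # the entry exists now, value becomes 2
my $seen_after = exists $count{$ip};    # true

printf "before: %s, after: %s, hits: %d\n",
    ($seen_before ? 'yes' : 'no'),
    ($seen_after  ? 'yes' : 'no'),
    $count{$ip};
# prints: before: no, after: yes, hits: 2
```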

As you can see, the hash grows automatically. Perl does all the memory management.

Once this is done, we have a hash in which each key is an IP address and each value is the number of times that IP address appears in the file. The keys function takes a hash and returns its keys in no particular order. This code prints all the IP addresses with the corresponding number of hits:

PERL
foreach my $ip (keys %count) {
    print "$ip   $count{$ip}\n";
}

Code

The full script is here:

PERL
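#!/usr/bin/perl
use strict;
use warnings;

# The log file name is the first command-line argument
my $file = shift or die "Usage: $0 FILENAME\n";
open my $fh, '<', $file or die "Could not open '$file': $!";

my %count;   # maps IP address => number of hits

while (my $line = <$fh>) {
    # The IP address is everything before the first space
    my $length = index($line, " ");
    next if $length <= 0;   # skip lines that have no such field
    my $ip = substr($line, 0, $length);
    $count{$ip}++;
}
close $fh;

# Print every IP address with its hit count (in no particular order)
foreach my $ip (keys %count) {
    print "$ip   $count{$ip}\n";
}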
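To try the counting loop without a real log file, the following sketch feeds it the four sample lines from the Background section through an in-memory filehandle. Opening a reference to a string this way is standard Perl; the $log variable and the sorted output at the end are additions for the demonstration only.

```perl
use strict;
use warnings;

# The four sample lines from the Background section
my $log = <<'END_LOG';
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...
217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...
END_LOG

# Open the string as if it were a file
open my $fh, '<', \$log or die "Could not open in-memory log: $!";

my %count;
while (my $line = <$fh>) {
    my $length = index($line, " ");
    my $ip = substr($line, 0, $length);
    $count{$ip}++;
}
close $fh;

# Sorted here only to make the output order predictable
foreach my $ip (sort keys %count) {
    print "$ip   $count{$ip}\n";
}
```

With this input, 127.0.0.1 is counted twice and the other two addresses once each.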

Points of interest

Of course it would be nicer to have them sorted and this code will do it:

PERL
foreach my $ip (sort keys %count) {
    print "$ip   $count{$ip}\n";
}

But this sorts the IP addresses as strings, character by character based on the ASCII table, so for example 10.0.0.1 comes before 9.0.0.1. Probably not very interesting.
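If ordering by address is wanted after all, one possible approach is to compare the addresses numerically, octet by octet. The sketch below assumes every key is a well-formed dotted-quad IPv4 address; the helper ip_key is hypothetical, and the 9.0.0.1 entry and all counts are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a dotted-quad IPv4 address into a
# 4-byte string, so that string comparison matches numeric order.
sub ip_key {
    my ($ip) = @_;
    return pack 'C4', split /\./, $ip;
}

# Sample data for illustration only
my %count = (
    '217.1.20.22' => 1,
    '127.0.0.1'   => 2,
    '139.12.0.2'  => 1,
    '9.0.0.1'     => 3,
);

my @sorted_ips = sort { ip_key($a) cmp ip_key($b) } keys %count;

foreach my $ip (@sorted_ips) {
    print "$ip   $count{$ip}\n";
}
# 9.0.0.1 is listed first; a plain string sort would put it last
```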

A better sorting might be this:

PERL
foreach my $ip (reverse sort { $count{$a} <=> $count{$b} } keys %count) {
    print "$ip   $count{$ip}\n";
}

Here we sort the keys according to their corresponding values and then reverse the order to get the IPs with the largest numbers first. The key expression is the following, so let's take it apart:

PERL
reverse sort { $count{$a} <=> $count{$b} } keys %count

You can sort any list of strings.

PERL
my @sorted = sort @strings;

By default, sort compares the values pairwise as strings, based on the ASCII table, and returns the sorted list.

You can also sort them using any other condition. E.g., the length of the strings:

PERL
my @sorted = sort { length($a) <=> length($b) } @strings;

Perl's sort() function takes any two values it wants to compare, puts them in the two variables $a and $b, and evaluates the block. The block must return a negative number, zero, or a positive number (which is exactly what the <=> operator does), and based on that result, sort decides which of the two values comes first.

PERL
sort { $count{$a} <=> $count{$b} } keys %count

This code does the same, but it sorts the keys of the hash, and when comparing two keys, the block compares the values that belong to those keys. The result is in increasing order, so if we would like to display the IP with the largest number of hits first, we need to reverse the result:

PERL
reverse sort { $count{$a} <=> $count{$b} } keys %count
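As an alternative sketch, the same descending order can be obtained without reverse by swapping $a and $b in the block. Adding a secondary comparison (here a plain string comparison of the IP addresses, which is an assumption and not part of the original) makes the order of IPs with equal counts predictable:

```perl
use strict;
use warnings;

# Sample data for illustration only
my %count = (
    '127.0.0.1'   => 2,
    '139.12.0.2'  => 1,
    '217.1.20.22' => 1,
);

# $b before $a gives descending counts;
# "or $a cmp $b" breaks ties by comparing the IP strings
my @by_hits = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_   $count{$_}\n" for @by_hits;
```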

In the last example, we do the same, but when displaying the results we use a helper variable to limit the output to the top two IP addresses. Setting $top to 10 would show the top 10 instead.

PERL
my $top = 2;
foreach my $ip (reverse sort { $count{$a} <=> $count{$b} } keys %count) {
    print "$ip   $count{$ip}\n";
    $top--;
    if ($top <= 0) {
        last;
    }
}
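Another way to limit the output, shown here only as a sketch, is to take an array slice of the sorted list instead of counting down inside the loop. The names @sorted, $last, and @top_ips and the sample counts are made up for the illustration.

```perl
use strict;
use warnings;

# Sample data for illustration only
my %count = (
    '127.0.0.1'   => 2,
    '139.12.0.2'  => 1,
    '217.1.20.22' => 5,
);
my $top = 2;

my @sorted = reverse sort { $count{$a} <=> $count{$b} } keys %count;

# Avoid running past the end when there are fewer than $top addresses
my $last = $top < @sorted ? $top - 1 : $#sorted;
my @top_ips = @sorted[0 .. $last];

foreach my $ip (@top_ips) {
    print "$ip   $count{$ip}\n";
}
```

The slice keeps the loop body free of bookkeeping, at the cost of building the full sorted list first, which the original loop does as well.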

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Instructor / Trainer Self Employed
Israel
I started programming on a Casio calculator running some mini-BASIC in 1982 while I was in high school in Budapest, Hungary.

Since then I have switched hardware, programming language, and country of residence.

Today I live in Israel and provide Perl training all over the world, both by traveling to clients and online as a video course under the "Perl Maven" brand. I also run a weekly newsletter about Perl called "Perl Weekly" and initiated an IDE for Perl, written in Perl, called "Padre, the Perl IDE".

Comments and Discussions

 
My vote of 3
Jonathan [Darka], 25-Jan-12 3:29

Suggestion: Warning about Apache log format
Peter_in_2780, 16-Jan-12 12:50

Re: Warning about Apache log format
Gabor Szabo (szabgab), 16-Jan-12 19:15

Re: Warning about Apache log format
Peter_in_2780, 16-Jan-12 19:44

Gabor Szabo (szabgab) wrote:
BTW wouldn't such change break all the log parsers out there?

Yes! My Apache logs are so sparse I can afford to view them, and I use a more human-friendly log format. (Well, me-friendly, and I think I'm human! ;-P )

Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994.
