How to build a simple website analysis service (like Google Analytics)

16 Jan 2014, CPOL, 5 min read
What's behind website analysis services (like Google Analytics) and how to build one.
Introduction   

Hello everybody!   

This is my second article on CodeProject, and it is the natural sequel to the first, which was dedicated to developing a visitor counter with an auto-refresh feature. This time I want to explain how an analysis service like Google Analytics works to provide real-time information about what happens on a website. I will also explain how to build an (extremely simplified) one.

As I said before, my English is very poor, but I hope you will help me improve the quality of the text. As usual, I use .NET and VB to speed up development, but all the methods described in this article can be implemented in any programming language.

If you use Google Analytics, you can see some spectacular features: the number of active visitors, the pages currently being viewed, the behavior of users and much more, all of it in real time.

But someone may ask: how does this work? How is it made? We will try to answer these questions.

First, we can observe that many analysis services (Google's is not the only one) require you to add JavaScript code to all your pages. Essentially, in order to work properly, all your pages must follow this HTML template:

HTML
<html>
    <head>
      <script type="text/javascript" src="SERVICE_URL/file.js"></script>
    </head>
    <body>
     ... page contents ...
    </body>
</html>
 
So, the first question could be: what does file.js do?

When you add the <script> tag to your header, your browser downloads third-party code and executes it. What this code does is described in the next section.

Note that to download a file, your browser must send an HTTP request to the server. On the server side, a lot of information about the source of the request is available: for example, the server can read your IP address, your remote host name and your browser's name. This alone provides enough information to build a small tracking system.
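
To make this concrete, here is a minimal sketch of what a server can read from a single request before any script runs. It assumes a Node.js-style request object (a `headers` map plus `socket.remoteAddress`); the function name `describeRequest` and all sample values are hypothetical and are not part of the prototype described in this article.

```javascript
// Hypothetical helper: summarise what the server learns from one HTTP request.
// 'req' is assumed to look like a Node.js http.IncomingMessage.
function describeRequest(req) {
    var headers = req.headers || {};
    return {
        ip: (req.socket && req.socket.remoteAddress) || "",
        userAgent: headers["user-agent"] || "",
        referer: headers["referer"] || "",      // usually the page that included file.js
        language: headers["accept-language"] || ""
    };
}

// Usage with a hand-made request object (sample values only).
var info = describeRequest({
    socket: { remoteAddress: "203.0.113.7" },
    headers: {
        "user-agent": "ExampleBrowser/1.0",
        "referer": "http://www.example.com/products.html"
    }
});
console.log(info.ip + " | " + info.userAgent + " | " + info.referer);
```

Missing headers simply come back as empty strings, so the caller never has to check for undefined values.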

A simple schematization of an analysis server is shown in the following figure.

Image 1 

As you can see, for each page loaded in your browser, a copy of file.js is downloaded from the analysis server. This means that you allow the server to do some things on your behalf (nothing hazardous, since JavaScript runs in a sandbox).

So, the second question could be: "What kind of information does the server read from my browser, and how can it read it? And how does my browser send this information to the server?"

The following image answers these questions.

Image 2

First, in order to identify the user uniquely, a UUID is generated. This UUID is stored in a cookie and is used as the user identifier (the IP address is not reliable for this, because it can change between two connections from the same user, for example when the user is behind NAT). Then the current location (the URL), the browser name, the OS name and version and other information can be read by the code in file.js and sent to the server.

Each 'track' packet has the form [UUID],{User_DATA}, and the server only needs to manage a set of [UUID, User_DATA] pairs in order to provide the analysis features: the set is a hashtable with the UUID as key and an ArrayList as value.
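
The structure can be sketched in a few lines of JavaScript. This is only an illustration of the idea, assuming a `Map` in place of the .NET Dictionary and an array in place of the ArrayList; the names `recordPing`, `lastSeen`, `location` and `history` are my own and do not appear in the prototype.

```javascript
// Sketch of the server-side set: UUID -> { lastSeen, location, history }.
// All names here are illustrative assumptions.
function recordPing(store, uuid, location, now) {
    var entry = store.get(uuid);
    if (!entry) {
        // First packet from this UUID: create an empty record
        entry = { lastSeen: null, location: "", history: [] };
        store.set(uuid, entry);
    }
    entry.lastSeen = now;          // time of the last action
    entry.location = location;     // page currently viewed
    entry.history.push({ at: now, location: location });  // page history
    return entry;
}

var store = new Map();
recordPing(store, "u1", "/home.html", new Date(2014, 0, 16, 10, 0, 0));
recordPing(store, "u1", "/cart.html", new Date(2014, 0, 16, 10, 1, 0));
recordPing(store, "u2", "/home.html", new Date(2014, 0, 16, 10, 1, 30));
console.log(store.size, store.get("u1").history.length);
```

One record per UUID is enough to answer all four questions in the feature list: the counts come from `lastSeen`, the current page from `location`, and the page history from `history`.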

So, our next goal is to build a prototype that provides the following features (which are also available in Google Analytics):

  1. Number of active visitors 
  2. Number of connected visitors  
  3. Current page viewed by connected visitors  
  4. Page History for each connected visitor   

The following figure shows an example of a 'console', accessible on the server side, that shows the current website status. (I use my homemade analysis service daily on some of my e-commerce websites. Obviously, the image has been edited to hide the users' IP addresses.)

Image 3 

Background              

For the purposes of this article, you need to know what we mean by UUID, hashtable (named Dictionary in .NET) and ArrayList. We only need a copy of Microsoft Web Developer Express (a free download from the Microsoft website).

Since we use Microsoft IIS, we exploit the Application object, which lives within the application pool until the pool is recycled. This simple implementation does not save data in a DBMS, so all data is lost on every IIS reset. If you want, however, you can use the Application_End and Application_Start events to save and restore data between the DBMS and application memory.
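
The save-and-restore idea can be sketched independently of IIS. The example below is an assumption-laden illustration in JavaScript: `snapshot` stands for what a shutdown handler could write to storage, and `restore` for what a start-up handler could read back. Both function names are hypothetical, and the actual storage (DBMS or file) is left out.

```javascript
// Turn the in-memory map into a JSON string (what a shutdown handler could save).
function snapshot(visitors) {
    return JSON.stringify(Array.from(visitors.entries()));
}

// Rebuild the map from a saved string (what a start-up handler could load).
// An empty or missing string yields an empty map.
function restore(json) {
    return new Map(json ? JSON.parse(json) : []);
}

var visitors = new Map();
visitors.set("u1", { location: "/home.html", lastSeen: "2014-01-16T10:00:00Z" });

var saved = snapshot(visitors);
var reloaded = restore(saved);
console.log(reloaded.get("u1").location);
```

Dates are kept as ISO strings here so that they survive the round trip through JSON unchanged.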

Using the code      

Now let us see how to structure the project in order to track user activity on a website. In my previous article I talked about a simple hashtable used to manage visitors in order to build a real-time counter. Now I want to extend that code to handle the other information we want.

Client side 

First, we need to implement the logic inside file.js, that is, the creation of the UUID, the reading of the information and the sending of it to the server. In particular, we need to:

  1. Read and set cookies (functions getCookie and setCookie)
  2. Create an asynchronous Ajax call (function getXMLReq)
  3. Generate a UUID (body of the file)
  4. Read the browser location and send it (function __as_ping)

My simple implementation of file.js follows:

JavaScript
// Address of the tracking server. The server communicates this address when the browser downloads this file.
var NETSELL_STAT = 'http://localhost:82';


function getCookie(c_name, remote) {
    // Read a normal cookie ('remote' is a placeholder and is not used here).
    var c_start, c_end;
    if (document.cookie.length > 0) {
        c_start = document.cookie.indexOf(c_name + "=");
        if (c_start != -1) {
            c_start = c_start + c_name.length + 1;
            c_end = document.cookie.indexOf(";", c_start);
            if (c_end == -1) c_end = document.cookie.length;
            return unescape(document.cookie.substring(c_start, c_end));
        }
    }
    return "";
}

function setCookie(c_name, value, expiredays, remote) {
    var cookiebody;
    var exdate = new Date();
    // Note: despite its name, 'expiredays' is added as seconds here.
    exdate.setSeconds(exdate.getSeconds() + expiredays);
    //exdate.setDate(exdate.getDate() + expiredays);

    // Without 'expiredays' no expiry is written, so this is a session cookie.
    cookiebody = c_name + "=" + escape(value) +
        ((expiredays == null) ? "" : ";expires=" + exdate.toUTCString());

    if (remote != null) {
        // remote cookie: send cookies to LogonServ (not implemented here)
    }
    else // normal cookie
        document.cookie = cookiebody;
}

function getXMLReq() {
    var xmlhttp;
    if (window.XMLHttpRequest) {// code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    }
    else {// code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    return xmlhttp;
}

// Check for the UUID of this user (if it does not exist, create one)
var uuid = getCookie("site_uuid");
if (uuid == "") {
    var d = new Date();
    var rnd = Math.floor((Math.random() * 100000) + 1);
    // Not an RFC 4122 UUID: just the current date and time mixed with a random number.
    uuid = d.getDate() + '_' + d.getMonth() + '_' + d.getFullYear() + '_' + rnd + '_' + d.getSeconds() + '_' + d.getMilliseconds() + '_' + d.getMinutes() + '_' + d.getHours();
    setCookie("site_uuid", uuid);
}

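// Send the UUID and the current location to the server (the ping)
function __as_ping() {
    var ping = getXMLReq();
    // encodeURIComponent keeps '&', '?' and '#' in the page URL from breaking the query string
    ping.open("GET", NETSELL_STAT + "/srv/serverside.aspx?TYPE=PING&UUID=" + encodeURIComponent(uuid) + '&L=' + encodeURIComponent(location.href), true);
    ping.send();
}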

__as_ping();

Once all the data has been read and sent, the client has nothing more to do.
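
One detail of the ping deserves a closer look: the page URL travels inside another URL, so characters such as '&' and '?' must not be allowed to split the query string. The sketch below shows one way to assemble the ping address with every value percent-encoded. The helper name `buildPingUrl` and the sample addresses are hypothetical; only the `/srv/serverside.aspx?TYPE=PING&UUID=...&L=...` layout comes from file.js.

```javascript
// Hypothetical helper: build the ping URL with each value percent-encoded,
// so that '&', '?' and '=' inside the page URL stay part of the L parameter.
function buildPingUrl(base, uuid, href) {
    return base + "/srv/serverside.aspx?TYPE=PING" +
        "&UUID=" + encodeURIComponent(uuid) +
        "&L=" + encodeURIComponent(href);
}

var pingUrl = buildPingUrl(
    "http://localhost:82",
    "4_0_114_123",
    "http://shop.example/list.aspx?cat=1&page=2"
);
console.log(pingUrl);
```

On the server, the query-string parser decodes the value again, so the L parameter arrives as the complete original URL.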

Server side 

The server, on the other hand, must manage all the information about the users. Earlier I talked about a hashtable (the Dictionary); below you can see a simple implementation of it.

First, we need to initialize the memory space where we want to keep the data.

In the global.asax file we write:

VB
Sub Application_Start(ByVal sender As Object, ByVal e As EventArgs)
    ' When the application starts

    Application.Add("LastReset", Date.Now)
    ' We make sure that the 'memory' is available
    SyncLock Application
        Dim ActiveUser As Dictionary(Of String, decorablePosition)
        ActiveUser = CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        If IsNothing(ActiveUser) Then
            ' Create the dictionary once and store it in the Application object
            ActiveUser = New Dictionary(Of String, decorablePosition)
            Application.Add("ActiveUser", ActiveUser)
        End If
    End SyncLock
End Sub

Next, we only need to store the 'track' packets coming from the client side. We can create an aspx page (the page file.js sends its data to) named serverside.aspx with the following content:

<%@ Page Language="VB" AutoEventWireup="false" CodeFile="serverside.aspx.vb" Inherits="srv_serverside" %>
<%@ Import Namespace="System.collections.generic" %>
<%@Import Namespace="DIBIASI.CALCE_Min.ABSTRACT.TDA.UTILS" %>
<%
    ' On PING we check whether the UUID is known,
    ' then save the last action date and time (and location, and IP)
    If Request("TYPE") = "PING" Then
        Dim UUID As String = Request("UUID")
        ' Fetch the shared dictionary once instead of casting it on every line
        Dim users As Dictionary(Of String, decorablePosition) = CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        SyncLock users
            If Not users.ContainsKey(UUID) Then
                users.Add(UUID, New decorablePosition)
                users(UUID).setValueOf("LOCATION_STORY", New ArrayList)
            End If
            users(UUID).setValueOf("DATE", Date.Now)
            users(UUID).setValueOf("LOCATION", Request("L"))
            CType(users(UUID).getValueOf("LOCATION_STORY"), ArrayList).Add(Date.Now & "|" & Request("L"))
            users(UUID).setValueOf("IPADDR", Request.UserHostAddress)
        End SyncLock
    End If
 %>

Finally, we only need to use the stored data, proceeding (for example) as follows.

First, we need to compute the total number of users: it is the number of entries in the dictionary, because we have one UUID for each user. Second, we want to compute the connected users: we iterate over all the entries of the dictionary and count only those whose last action is at most 240 seconds old.

The number of active users is determined in the same way (last action at most 60 seconds old). Finally, we can get the page a user is currently viewing by reading the "LOCATION" field.
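
The counting rule can be sketched on its own, away from the page markup. The following JavaScript is only an illustration under simple assumptions: `lastSeenByUuid` is a plain object mapping each UUID to the date of its last ping, and the function name `countVisitors` is hypothetical. The 240-second and 60-second thresholds are the ones given above.

```javascript
// Thresholds taken from the text: connected <= 240 s, active <= 60 s.
var CONNECTED_SECONDS = 240;
var ACTIVE_SECONDS = 60;

// 'lastSeenByUuid' is assumed to map UUID -> Date of the last ping.
function countVisitors(lastSeenByUuid, now) {
    var result = { total: 0, connected: 0, active: 0 };
    Object.keys(lastSeenByUuid).forEach(function (uuid) {
        var idleSeconds = Math.abs(now - lastSeenByUuid[uuid]) / 1000;
        result.total += 1;                       // one entry per UUID
        if (idleSeconds <= CONNECTED_SECONDS) {
            result.connected += 1;
            if (idleSeconds <= ACTIVE_SECONDS) result.active += 1;
        }
    });
    return result;
}

var now = new Date(2014, 0, 16, 12, 0, 0);
var counts = countVisitors({
    a: new Date(2014, 0, 16, 11, 59, 30),  // 30 s ago: active and connected
    b: new Date(2014, 0, 16, 11, 57, 0),   // 180 s ago: connected only
    c: new Date(2014, 0, 16, 11, 0, 0)     // 1 h ago: counted in the total only
}, now);
console.log(counts.total, counts.connected, counts.active);  // prints 3 2 1
```

Every active user is also a connected user, which is why the active check is nested inside the connected one.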

Below is an example of a page that uses the stored data.

 

ASPX
<%@ Page Language="vb" AutoEventWireup="false" CodeBehind="stats.aspx.vb" Inherits="Analysis.stats" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    
    
<div style="font-family:Tahoma;background-color:#f6f6f6;float:left;border:1px solid #e6e6e6;width:20%;height:180px;text-align:center;vertical-align:middle;">
<%
    Dim ConnectedUser As Integer = 0
    Dim actu As Integer = 0
    Dim visitorFromLastReset As Integer = 0
    Dim visitorToday As Integer = 0
    Dim ActiveKart As Integer = 0
    dim euroinKart as double=0

    For Each it As KeyValuePair(Of String, decorablePosition) In CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        '# count visit from last reset
        visitorFromLastReset += 1
        '# count visit today
        If Format(CDate(it.Value.getValueOf("DATE")), "yyyyMMdd") = Format(Date.Now, "yyyyMMdd") Then
            visitorToday += 1
        End If
        '# count connected users
        If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 240 Then
            ConnectedUser += 1
            
            '# count active users
            If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 60 Then
                actu += 1
            End If
        End If
    Next it
 %>        
    <table width="100%">
    <tr>
        <td><%=Format(Application("LastReset"),"dd/MM/yy HH.mm") %></td>
        <td>Today Visitors</td>
    </tr>
    <tr>
        <td><span style="font-size:1.3em;"><%=visitorFromLastReset%></span></td>
        <td><span style="font-size:1.3em;color:Blue;"><%=visitorToday%></span></td>
    </tr>
    </table>

    <table width="100%">
    <tr>
        <td>Connected Now</td>
        <td></td>
    </tr>
    <tr>
    <td><span style="font-size:1.3em;"><%=ConnectedUser%></span></td>
    <td></td>
    </tr>
    </table>
    Active Now
    <br />
    <span style="font-size:2em;color:blue;"><%=actu%></span>
    
</div>   
 

 <!-- show active page for each user -->
<br />
<div style="font-family:tahoma;font-size:0.8em;display:block;float:left;border:1px solid #e6e6e6;width:99%;height:200px;overflow:auto;text-align:center;vertical-align:middle;">
<table border="0" cellspacing="0" cellpadding="0">
<%      
    Dim foreColor As String = "#000"
    Dim LOCATION As String = ""
    Dim RASCL As String = ""
    For Each it As KeyValuePair(Of String, decorablePosition) In CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 240 Then
            foreColor="#000"
            If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 60 Then
                foreColor = "#33CC33"
            End If
            
            LOCATION = it.Value.getValueOf("LOCATION").ToString.Split("/")(it.Value.getValueOf("LOCATION").ToString.Split("/").Length - 1)
            RASCL = " <strong>" & mid(it.Value.getValueOf("RASCL"),1,20) & "</strong>"
%>
        <tr style="color:<%=foreColor%>">            
            <td style="width:35%;padding:1px;" align="left"><span><a href="followUserAction.aspx?IPADDR=<%=it.Value.getValueOf("IPADDR") %>" target="_blank"><%=it.Value.getValueOf("IPADDR") %> <%=RASCL %></a></span></td>
            <td align="left"><span><%=LOCATION%></span></td>
        </tr>
<%           
        End If
    Next it
 %>    
 </table>
</div> 
</body>
</html>

In the zip file attached to this article you can find a complete prototype that runs in Web Developer Express. (Remember to start debugging on port 82, or change the path in file.js.)

History   

14/01/2014: draft release 

16/01/2014: first release

16/01/2014: English revision by Bruno Interlandi

17/01/2014: zip file re-uploaded




License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer BiTS - Softmining
Italy
Luigi Di Biasi is a PhD student in computer science at the University of Salerno.

He is sole director of Arrowsoft.IT SRLs and CIO of Softmining SRL. He is the founder of dibiasi.it materiale elettrico.

He studied Computer Science at the University of Salerno in 2013. From 2014 to 2016 he was a research fellow (assegnista di ricerca) for PRIN 2010-2011 Data-Centric Genomic Computing (GenData 2020).

He has collaborated on the development of GRIMD (grid for molecular dynamics), which permits the parallel and distributed computation of many kinds of scientific software; on the development of YADA (Yet Another Docking Approach), a tool that improves and refines the predictions made by VINA Autodock; and on the implementation of the ProtComp alignment-free tools, a new algorithm for sequence analysis that outperforms existing methods. He has collaborated on 7 scientific papers.

In 2017 he was the developer responsible for the RIRIBOX Project (in collaboration with Acea Pinerolese and Ecofficina SRL) and for energii.dk (in collaboration with Peme IVS). All these projects involved collaborations between Italian, Danish and American developers.

In 2016 he was the developer responsible for the "Una Buona Occasione" Project (in collaboration with Ecofficina SRL). He developed all the edutainment games related to this project (16 WebGL games).

Currently, his research activities cover AI (algorithm development and applications) and game development.


Comments and Discussions

 
Suggestion: A better user ID
ObiWan_MCC, 20-Jan-14 0:15
A better way to generate the unique IDs would be the following

JavaScript
function createUUID() {
  // http://www.ietf.org/rfc/rfc4122.txt
  var s = [];
  var hexDigits = "0123456789abcdef";
  for (var i = 0; i < 36; i++) {
    s[i] = hexDigits.substr(Math.floor(Math.random() * 0x10), 1);
  }
  // bits 12-15 of the time_hi_and_version field to 0100 (version 4)
  s[14] = "4";
  // bits 6-7 of the clock_seq_hi_and_reserved to 01
  s[19] = hexDigits.substr((parseInt(s[19], 16) & 0x3) | 0x8, 1);
  s[8] = s[13] = s[18] = s[23] = "-";
  var uuid = s.join("");
  return uuid;
}


The above code generates a standard RFC 4122 GUID, which can be used inside your tracking code. You may also consider avoiding the IIS Application object and using .NET caching instead (System.Web.Caching or System.Runtime.Caching) to store your information.