
URL/Web Addresses Logger


Nov 3, 2006


This application can be used to track the web URLs of the current user and store them in a log file in the root folder or in any other folder.


Introduction

First of all, English is not my native language, so apologies in advance for any mistakes; I would still like to share my knowledge about how to log web URLs in a hidden way. The main requirement for this article is the WinPcap 3.1 SDK, which is used to capture network packets. It can capture all kinds of packets, so you can use it to log long links, short links, bare domain names, HTTP links to picture formats like JPG or BMP, and so on. This article, however, focuses on ports 8080 and 1080 (you can change them inside the code according to your requirements).

Background

This application is used to track all web URLs of the current user and store them in a log file in the root folder or in any other folder. If you need the WinPcap libraries (SDK), visit the WinPcap website; you might also need the complete Win32 SDK. Four C++ files are at the heart of this project: WinpcapSniffer.h, WinpcapSniffer.cpp, UrlSniffer.h, and UrlSniffer.cpp.

Using the code

The WinpcapSniffer class is the main class that handles the initialization and shutdown routines of the WinPcap functions; it is simply a wrapper class. The WinpcapSniffer::InitializeWinpCapSniffer() function initializes the WinPcap adapter and filter functions.

int WinpcapSniffer::InitializeWinpCapSniffer()
{
    InitializeAdapter(2);   // open the second adapter in the list
    InitializeFilter();     // compile and apply the packet filter

    return 1;
}
The constant 2 passed as the argument to InitializeAdapter(2); tells it to automatically select the second adapter from the list displayed on screen, e.g., the LAN card or the modem adapter; these adapters are the paths to the Internet, the LAN, etc. Most people, like me, have the LAN adapter in the second position, which is why I pass 2 so that it is selected automatically when the console starts. InitializeFilter(); releases resources and initializes the filter that captures packets on specific ports. Before calling it, you must set a filter string if you want to monitor specific ports: store the string in the member variable m_strPacket_filter through the setter function SetFilterString(string). E.g., "tcp port 8080 or tcp port 1080" tells it to capture packets only on ports 8080 and 1080. For more information about the WinPcap SDK, visit its website.
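The article does not list InitializeFilter() itself. With WinPcap it presumably boils down to pcap_compile() and pcap_setfilter(); here is a minimal sketch, assuming adhandle has already been opened by InitializeAdapter() and m_strPacket_filter is a std::string member holding the filter text:

int WinpcapSniffer::InitializeFilter()
{
    // Hypothetical sketch; the real implementation ships with the
    // article's source code and may differ.
    struct bpf_program fcode;

    // Compile the human-readable filter into BPF byte code. The last
    // argument is the netmask; the WinPcap samples pass 0xffffff
    // (255.255.255.0) when it is not known.
    if(pcap_compile(adhandle, &fcode,
                    (char *)m_strPacket_filter.c_str(), 1, 0xffffff) < 0)
        return 0;

    // Attach the compiled filter to the capture handle.
    if(pcap_setfilter(adhandle, &fcode) < 0)
        return 0;

    return 1;
}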

Now that we have the wrapper class, we can use it for any purpose we want. Since our requirement is to capture the URLs of websites, I have created another class called UrlSniffer, which inherits from WinpcapSniffer. In the constructor of UrlSniffer, the filter string should be set first:

UrlSniffer::UrlSniffer()
{
    this->SetFilterString("tcp port 8080 or tcp port 1080");

}
Now, UrlSniffer seems logical, as this class is made only to watch the specific ports that Internet Explorer uses. (You can add other ports to this string as well; just put an "or" between them. For more explanation, see the WinPcap documentation on filters.)
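For example, a constructor that also watches plain HTTP and a common proxy port (the extra port numbers here are just an illustration, not part of the article's code) could set:

this->SetFilterString("tcp port 80 or tcp port 8080 "
                      "or tcp port 1080 or tcp port 3128");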
int UrlSniffer::InitializeUrlSniffer()
{
    InitializeWinpCapSniffer();

    pcap_loop(adhandle, 0,PacketHandler, NULL);

    return 1;
}
pcap_loop is the function that takes a user-defined function as the handler for captured packets. It takes four arguments, in which adhandle is the capture handle declared in the parent class as pcap_t *adhandle;, and PacketHandler is the user-defined static function responsible for receiving packets and then processing them according to the user's requirements.
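For reference, the WinPcap prototype is int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user); passing cnt = 0 makes it capture indefinitely. Tying everything together, the console application presumably reduces to something like the following (the article does not show main(), so this is only a sketch):

#include "UrlSniffer.h"

int main()
{
    UrlSniffer sniffer;              // constructor sets the port filter
    sniffer.InitializeUrlSniffer();  // opens the adapter, then blocks
                                     // inside pcap_loop() indefinitely
    return 0;
}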
/* Callback function invoked by libpcap for every incoming packet */
void PacketHandler(u_char *param, const struct pcap_pkthdr *header, 
                   const u_char *pkt_data)
{
    ip_header *ih;

    /* retrieve the position of the IP header */
    ih = (ip_header *) (pkt_data + 14); // skip the 14-byte Ethernet header

    bool bFoundUrl=false;

    /* tlen is in network byte order, so convert it before use */
    string data = UrlSniffer::FilterNetworkPacket(
                  (char*)pkt_data,(int)ntohs(ih->tlen),bFoundUrl);

    if(bFoundUrl)
    {
        string Urldata = UrlSniffer::ExtractUrlOnly(data);

        g_urls.insert(Urldata);
        printf("%s\n %d",Urldata.c_str(),g_urls.size());
    }

}

The code is self-explanatory: pkt_data is the buffer containing the raw packet data, and ih = (ip_header *) (pkt_data + 14); skips the 14-byte Ethernet header to locate the IP header, whose tlen field holds the total length of the IP packet.
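The ip_header struct itself is not listed in the article; it presumably matches the one from the WinPcap tutorial samples, roughly:

/* Presumed layout, borrowed from the WinPcap tutorial samples;
   the article's actual definition may differ slightly. */
typedef struct ip_address
{
    u_char b1, b2, b3, b4;   // the four bytes of a dotted-quad address
} ip_address;

typedef struct ip_header
{
    u_char  ver_ihl;         // version (4 bits) + header length (4 bits)
    u_char  tos;             // type of service
    u_short tlen;            // total length (network byte order)
    u_short identification;  // identification
    u_short flags_fo;        // flags (3 bits) + fragment offset (13 bits)
    u_char  ttl;             // time to live
    u_char  proto;           // protocol
    u_short crc;             // header checksum
    ip_address saddr;        // source address
    ip_address daddr;        // destination address
    u_int   op_pad;          // option + padding
} ip_header;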

bool bFoundUrl=false;
// right now we don't know whether this packet contains a URL

string data = UrlSniffer::FilterNetworkPacket(
              (char*)pkt_data,(int)ntohs(ih->tlen),bFoundUrl);
FilterNetworkPacket gets the real thing we want. As we know, a GET request is issued whenever we visit a site, so this function checks whether the current packet is a GET request; if it is, it filters the data down to readable ASCII characters and sends the result back to the caller, setting bFoundUrl to true. Here is the filter code for GET:
string UrlSniffer::FilterNetworkPacket(const char 
       *r_szDataToFilter,int iLen,bool &r_bFoundUrl)
{
    bool bIsUrlMsg=false;   // becomes true once "GET " has been seen
    string strFiltered;

    for(int iLoop=1; iLoop < iLen; iLoop++)
    {
        if(!bIsUrlMsg)
        {
            // guard iLoop+2 so the look-ahead never reads past the buffer
            if(iLoop + 2 < iLen
               && r_szDataToFilter[iLoop-1]=='G'
               && r_szDataToFilter[iLoop]=='E'
               && r_szDataToFilter[iLoop+1]=='T'
               && r_szDataToFilter[iLoop+2]==' ')
            {
                bIsUrlMsg = r_bFoundUrl = true;
            }
        }

        if(bIsUrlMsg)
        {
            if(RequiredData(r_szDataToFilter[iLoop-1]))
                strFiltered+=r_szDataToFilter[iLoop-1];
        }
    }
    return strFiltered;

}

The bool RequiredData(char) function simply checks whether the character is human-readable. If it finds garbage or non-ASCII characters, it returns false; otherwise, it returns true.

bool RequiredData(char c)
{
    bool flag = false;

    // letters, digits, punctuation, and whitespace count as readable
    if(isalnum(c) || ispunct(c) || isspace(c))
    {
        flag = true;
    }

    return flag;
}

Inside PacketHandler, the last few statements simply extract the URLs, like this:

if(bFoundUrl)
{
    string Urldata = UrlSniffer::ExtractUrlOnly(data);

    // Urldata holds whatever address came with the GET
    // request: domain, link, picture, and so on
    g_urls.insert(Urldata);
    printf("%s %d\n",Urldata.c_str(),(int)g_urls.size());
}

g_urls is a set object, and it contains all the unique addresses found so far, which could be domains, long names, file names, pictures, etc., whatever came as the URL of an HTTP GET request. UrlSniffer::ExtractUrlOnly simply checks whether the current link ends with .net, .com, or .org; if it does, it creates a log file (an HTM file) at the root and writes those addresses to it. (We could write anything, including long names, but right now I am only retrieving simple names; the long ones are all still saved in g_urls.)

string UrlSniffer::ExtractUrlOnly(const string r_szDataToFilter)
{
    string strUrlData;

    int iGetEnd = r_szDataToFilter.find("HTTP/1.0");
    
    if(iGetEnd>-1)
    {
        strUrlData = r_szDataToFilter.substr(3,
                     (int)r_szDataToFilter.length());
        iGetEnd = strUrlData.find("HTTP/1.0");
        strUrlData = strUrlData.substr(0,iGetEnd);

        //now strUrlData is in shape to be used however you like

        if(strUrlData.length() < 7)
            return strUrlData;   // too short to hold a domain suffix

        int iEndPart = -1;
        string strEndpart = strUrlData.substr(strUrlData.length()-7,5);

        //I just need .com, .net, and .org kind of sites, so
        //I am putting in a few if statements
        if(((iEndPart = (int)strEndpart.find(URL_PROPERADDRESS1)) != -1) ||
           ((iEndPart = (int)strEndpart.find(URL_PROPERADDRESS2)) != -1) ||
           ((iEndPart = (int)strEndpart.find(URL_PROPERADDRESS3)) != -1))
        {
            AddWebAddressLog(strUrlData); //this just maintains the log
        }
    }

    return strUrlData;
}

These are the macros:

#define URL_PROPERADDRESS1 ".com"
#define URL_PROPERADDRESS2 ".net"
#define URL_PROPERADDRESS3 ".org"
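Neither g_urls nor AddWebAddressLog() is listed in the article, so their exact definitions are an assumption. Given the uniqueness requirement and the HTM log described above, they plausibly look something like this (the log file name and path are made up for the sketch):

#include <set>
#include <string>
#include <fstream>
using namespace std;

// Presumed declaration: a set keeps only unique addresses, which
// matches the behavior the article describes for g_urls.
set<string> g_urls;

// Hypothetical sketch of AddWebAddressLog(): appends each address
// as a clickable link to an HTM log file at the drive root.
void AddWebAddressLog(const string &r_strUrl)
{
    ofstream log("C:\\UrlLog.htm", ios::app);   // assumed file name

    if(log.is_open())
    {
        log << "<a href=\"http://" << r_strUrl << "\">"
            << r_strUrl << "</a><br>" << endl;
    }
}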

Conclusion

I don’t claim that this is the only way to capture packets and maintain a URL logger, but it is one way to create one, and the example can be adapted for a variety of purposes. If I couldn’t explain something well, then apologies in advance… I hope I have shared some useful knowledge with beginners and gurus alike.