Analyzing Syslog files can be easy...

By Alexander Golde, 17 Apr 2013

Reading in syslog files is easy; many scripting languages provide the means to do that. But is it also possible to answer a quick analysis request with these languages?

Your task: extract the IP addresses from the text portion of each message and count how often every single address occurs.

Time allotted: none. Your boss is standing behind you...

Result

The whole solution: only a handful of operators.

OriginalLog

The first step was to read in the compressed logfile and convert it into a table.

You can inspect the first part of the resulting table by hovering the mouse over the output connector 'o1' of 'strexplode1'. It is a vector (a table without a column header) with 100001 rows.
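For comparison, here is how the same first step might look in a scripting language. This is only a minimal Python sketch, not part of the FlowSheet: it assumes the logfile is gzip-compressed text, and the function name `read_log_lines` is made up for illustration.

```python
import gzip


def read_log_lines(path):
    """Return the lines of a gzip-compressed text log as a list of strings.

    Assumption: the file is gzip-compressed and mostly UTF-8; undecodable
    bytes are replaced instead of aborting the whole read.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        # One list entry per log line, comparable to the one-column vector
        # described above.
        return [line.rstrip("\n") for line in handle]
```

Calling `len(read_log_lines(path))` on your own file gives the row count of the vector.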

Our next job is to split the strings into columns.


We route the vector into a macro, 'GetPriority'. Note the lock icon: it marks this macro as an operator class. Operator classes are a great way to create reusable code. If you need to change the behaviour of an operator class, say because of an error, you can do so without having to alter every instance of the macro in this or other FlowSheets. You only have to update the FlowSheet once...

Because we can expect four spaces as delimiters, a 'strexplode' operator is the best fit here. It receives the remaining part of the message:

and returns this table:

The single parts of the message are now joined together into a new table.
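The splitting step can be sketched in Python as well. The exact line layout of the log is not shown here, so this sketch assumes the classic syslog form: a leading `<PRI>` value (facility * 8 + severity) followed by a message whose first four spaces separate the columns. Both function names are hypothetical, and this is not a description of what the 'GetPriority' macro does internally.

```python
def split_priority(line):
    """Split '<PRI>rest' into (facility, severity, rest).

    Assumption: PRI encodes facility * 8 + severity, as in classic syslog.
    Lines without a valid '<PRI>' prefix come back as (None, None, line).
    """
    if not line.startswith("<") or ">" not in line:
        return None, None, line
    pri_text, rest = line[1:].split(">", 1)
    if not pri_text.isdigit():
        return None, None, line
    facility, severity = divmod(int(pri_text), 8)
    return facility, severity, rest


def split_columns(rest, delimiters=4):
    """Split on the first `delimiters` spaces; the free text stays in one piece."""
    return rest.split(" ", delimiters)


facility, severity, rest = split_priority("<34>Oct 11 22:14:15 mymachine su: failed")
columns = split_columns(rest)
```

Limiting the number of splits keeps the text portion intact in the last column, which is what the next step works on.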

Now we extract the IP address from the text column. This is also just a combination of string splits, no magic.

We attach the IP address to the table.
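In a script you could pull the address out of the text like this. Note that this sketch swaps in a different technique: a regular expression plus a validity check, instead of the string splits used in the FlowSheet. It assumes IPv4 addresses in dotted notation, and the name `extract_ip` is made up for illustration.

```python
import ipaddress
import re

# Four dot-separated groups of one to three digits; validity is checked below.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def extract_ip(text):
    """Return the first valid IPv4 address found in `text`, or None."""
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            # Rejects look-alikes such as 999.1.1.1.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        return candidate
    return None
```

Applied to every row of the text column, this yields the extra column of addresses; rows without an address get `None`.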

Now the fun part. We have about 100,000 IP addresses now. In a traditional scripting language you might start writing loops at this point; we use the operator dcCompressWizard from the DataCube library instead.

We just need to set two checkboxes and select the accumulation method.

Result: a new table containing the unique IP addresses along with their access counts.
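To be fair to the scripting languages, the counting step does not need hand-written loops there either. Here is a minimal Python sketch using the standard library's `collections.Counter`; it is a stand-in for comparison, not a description of how dcCompressWizard works, and the name `count_accesses` is hypothetical.

```python
from collections import Counter


def count_accesses(addresses):
    """Return (address, count) pairs, most frequent address first.

    Rows without an address (None) are skipped.
    """
    counts = Counter(address for address in addresses if address is not None)
    return counts.most_common()


table = count_accesses(["10.0.0.1", "10.0.0.2", "10.0.0.1", None])
```

Each pair in `table` corresponds to one row of the result table: a unique address and its access count.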

Your boss is happy! (It took only 5:35 minutes.)

A big advantage is that you can inspect the results after each step in the processing chain.

You develop with and not for the data...

Feel your data flow...

Download the FlowSheet here (zip file)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alexander Golde
Software Developer (Senior) ANKHOR Software GmbH
Germany
No Biography provided

Article Copyright 2013 by Alexander Golde