
Collect and Compare Log Statistics using LogJoin

22 Oct 2013, BSD license
The LogJoin tool helps collect unstructured data from text files and join it into a simple table for easy analysis.
<?xml version="1.0" encoding="utf-8" ?>
<!--
   Copyright (c) 2013, Yuriy Nelipovich

   If you find this application useful, or if you have any questions, suggestions,
   bug reports, or donations, please email me: dev.yuriy.n@gmail.com
-->

<configuration>
  <configSections>
    <section name="parameters" type="LogJoin.Config.Parameters, LogJoin"/>
  </configSections>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
  </startup>

  <!-- Configuration of the application is below -->
  <parameters>
    <!-- <expressions> element contains a list of regular expressions used to parse content of input files. -->
    <expressions>
      <!-- An expression is defined in an <expr> element. You should specify a unique name for each expression.
           An expression can be multiline, meaning that a single match can span several lines (see System.Text.RegularExpressions.RegexOptions.Multiline for details).
           When parsing input text, the expression extracts a set of values, which are then stored in a Record. Each new match of the expression produces a new Record.
           To capture the values, the expression should define named groups. In the example below, the group (?<messageAuthor>\w+) captures the author's name.
           Some of the values should form a unique key for the Record, see the example below. -->
      <expr name="recipientRecord" multiline="true"><![CDATA[^New message received, ID: (?<messageID>\d+), sender: (?<messageAuthor>\w+)\. Processing it\.\r\nTitle: (?<title>[\w ]+)\r\nCurrent time: (?<timeReceived>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\r\nIs spam: (?<isSpam>\w+)\r\n]]></expr>
      
      <expr name="senderRecord" multiline="false"><![CDATA[^Meesage sent from account (?<messageAuthor>\w+)\. Message ID: (?<messageID>\d+), time: (?<timeSent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}); title: (?<title>[\w ]+); words count: (?<wordsCount>\d+)]]></expr>
      
    </expressions>
    
    <!-- <inputs> element contains a list of inputs that provide data for processing -->
    <inputs>
      <!-- An input is a text file (or files) containing values for Records. The input should have a unique name
           that plays the same role as a table name in SQL queries.
           The "lines" attribute defines how many lines of text in the input file may correspond to one Record. In other words,
           this is the maximum number of lines that the regular expression can match. -->
      <input name="recipient" lines="5">
        
        <!-- Path to the directory containing the input file (or files) -->
        <dirPath>..\..\sample</dirPath>
    
        <!-- Name of the input file (or file mask for multiple files) -->
        <fileName>log.recipient.txt</fileName>
        
        <!-- Name of the regular expression (defined above) that will be used to parse the input -->
        <regEx>recipientRecord</regEx>
        
        <!-- Set of column names (fields) that each Record contains. These names must correspond to named groups defined for the expression -->
        <columns>timeReceived, messageAuthor, messageID, isSpam, title</columns>
        
        <!-- Set of values that together form a unique key for each Record. Order of key values is important.
             This key is used in Left Join operation between different inputs, see details below. -->
        <key>messageAuthor, messageID</key>
      </input>
      
      <!-- The second input produces a set of Records for the left side of the Join operation -->
      <input name="sender" lines="1">
        
        <dirPath>..\..\sample</dirPath>
        
        <!-- This input contains a set of text files, ordered by file name -->
        <fileName>log.sender*.txt</fileName>
        
        <regEx>senderRecord</regEx>
        
        <columns>messageAuthor, messageID, timeSent, title, wordsCount</columns>
        
        <!-- Keys of each input should have the same order and the same format
             because when joining Records from different inputs the key values are compared as strings. -->
        <key>messageAuthor, messageID</key>
      </input>
    </inputs>
    
    <!-- Path to the output file. The file contains a table in CSV format.
         The file name may contain a string format argument ({0}) that receives a DateTime value (by default, the current date and time) -->
    <output>..\..\sample\result-{0:dd-MM-yyyy}.csv</output>
    
    <!-- Delimiter for values in output file -->
    <outputDelimiter>,</outputDelimiter>
  </parameters>

</configuration>
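
The comments in the configuration describe three steps: each regular expression match produces a Record from its named groups, a subset of the columns forms the Record's key, and Records from the "sender" input are left-joined to Records from the "recipient" input by comparing key values as strings. The Python sketch below illustrates that idea only; it is not LogJoin's implementation. The sample log text, the helper names (parse, key_of, left_join) and the "sender."/"recipient." column prefixes are assumptions made for the illustration, and the patterns are rewritten in Python's (?P<name>...) syntax.

```python
import re

# Python writes named groups as (?P<name>...); the config uses the .NET form (?<name>...).
# \r?\n is used so the sketch also accepts Unix line endings (an assumption).
RECIPIENT = re.compile(
    r"^New message received, ID: (?P<messageID>\d+), sender: (?P<messageAuthor>\w+)\. Processing it\.\r?\n"
    r"Title: (?P<title>[\w ]+)\r?\n"
    r"Current time: (?P<timeReceived>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\r?\n"
    r"Is spam: (?P<isSpam>\w+)\r?\n",
    re.MULTILINE,
)
SENDER = re.compile(
    r"^Meesage sent from account (?P<messageAuthor>\w+)\. "
    r"Message ID: (?P<messageID>\d+), "
    r"time: (?P<timeSent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}); "
    r"title: (?P<title>[\w ]+); words count: (?P<wordsCount>\d+)",
    re.MULTILINE,
)

# Same key columns, in the same order, as the <key> elements in the config.
KEY = ("messageAuthor", "messageID")
RECIPIENT_COLUMNS = ("timeReceived", "messageAuthor", "messageID", "isSpam", "title")


def parse(pattern, text):
    """Each match of the expression yields one record (a dict of named-group values)."""
    return [m.groupdict() for m in pattern.finditer(text)]


def key_of(record):
    """Build the composite key; values stay strings, so they are compared as strings."""
    return tuple(record[name] for name in KEY)


def left_join(left, right, right_columns):
    """Keep every left record; fill right-hand columns with '' when no key matches."""
    index = {key_of(r): r for r in right}
    rows = []
    for rec in left:
        match = index.get(key_of(rec), {})
        row = {f"sender.{c}": v for c, v in rec.items()}
        row.update({f"recipient.{c}": match.get(c, "") for c in right_columns})
        rows.append(row)
    return rows


# Hypothetical log content, shaped to fit the two expressions.
sender_log = (
    "Meesage sent from account alice. Message ID: 1, time: 2013-10-22 10:00:00; "
    "title: Hello there; words count: 12\n"
    "Meesage sent from account bob. Message ID: 2, time: 2013-10-22 10:05:00; "
    "title: Status report; words count: 40\n"
)
recipient_log = (
    "New message received, ID: 1, sender: alice. Processing it.\n"
    "Title: Hello there\n"
    "Current time: 2013-10-22 10:00:03\n"
    "Is spam: False\n"
)

rows = left_join(parse(SENDER, sender_log), parse(RECIPIENT, recipient_log), RECIPIENT_COLUMNS)
for row in rows:
    print(row)
```

In this sketch the message from bob has no matching recipient Record, so its recipient columns are left empty; how LogJoin itself represents unmatched rows and duplicate column names is not specified by the configuration file.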


License

This article, along with any associated source code and files, is licensed under The BSD License


Written By
Software Developer CactusSoft
Belarus