Click here to Skip to main content
Click here to Skip to main content

Tagged as

Go to top

Part 2 – Bits for Event Patterns

, 5 Aug 2009
Rate this:
Please Sign up or sign in to vote.
I promised last time (http://blogs.msdn.com/microsoftbob/archive/2009/06/03/the-power-of-bits-for-historical-event-analysis.aspx) that I would follow up with some code to provide a proof of concept for using bit masks for date patterns.  I have good news!  I have it working in my simulation applicat

I promised last time (http://blogs.msdn.com/microsoftbob/archive/2009/06/03/the-power-of-bits-for-historical-event-analysis.aspx) that I would follow up with some code to provide a proof of concept for using bit masks for date patterns.  I have good news!  I have it working in my simulation application and have prepared a little demo for how this works with a Windows form. 

I found it wasn’t too difficult to use .NET BitArrays and convert to varbinary and also was able to do this using SQL-to-LINQ.  The tricky part was writing a table-valued function that could actually interpret the bit mask and return back all the associated dates.  This would probably be better done via SQLCLR with a .NET function, but I’m out of practice with doing .NET CLR functions so will put that off for another iteration.

So, first of all here’s a screen shot of the testing form. 

image

The goal of the form was just to prototype the concept by allowing the specification of a day of week pattern to repeat for a particular period.  In the above example, I wanted to generate a bit pattern that would contain 14 bits (actually 16 bits -8 bits for each week - as I explain later) representing whether an event occurred on a given day offset from Monday, June 22 and then store the data in a database.

Here’s the table definition I used to store the data:

CREATE TABLE [dbo].[TestBitArray](

    [PeriodStartDate] [date] NOT NULL,

    [PeriodLength] [smallint] NOT NULL,

    [EventName] [varchar](50) NOT NULL,

    [SelectedWeekDayMask] [tinyint] NOT NULL,

    [DateRangeBitArray] [varbinary](50) NOT NULL,

 CONSTRAINT [PK_TestBitArray] PRIMARY KEY CLUSTERED

(

    [PeriodStartDate] ASC,

    [PeriodLength] ASC,

    [EventName] ASC

))

Here’s what ends up in the database

PeriodStartDate PeriodLength EventName SelectedWeekDayMask DateRangeBitArray
6/22/2009 14 TestEvent1 74 0x4A4A

Please excuse me for using a compound primary key, this is just a quick and dirty example!  I wanted to test with different combinations of period start dates, lengths, and event names.

Just to contrast different ways to deal with bits, I decided to store my selection bit mask as a simple tinyint rather than as a byte.  I use tinyint, since the max number of bits that can be set for a week is 7 for each day, which works out to 127 maximum value using low order bits.

Without getting too bogged down, (full code is in the attachment) here’s what the .NET code fragment looks like to calculate the bit mask and display the items in the list box.

BitArray ba = new BitArray((((periodDays-1) / 7) * 8) + 8 , false);
this.listBoxResults.Items.Clear();

DateTime currentDate = periodStartDate;

int j = 0;
for (int i = 0; i < periodDays; i++)
{
    if ((i % 7) == 0)
    {
        j++; // Don't set trailing bit, so we store 7 days per byte - simplifies comparisons
    }
    currentDate = periodStartDate.Add(new TimeSpan(i, 0, 0, 0));
    int currentDayOfWeekIndex = (int)currentDate.DayOfWeek;
    ba.Set(i+j, (checkForDayOfWeek(currentDate.DayOfWeek)));
    this.listBoxResults.Items.Add(currentDate.ToShortDateString() + " : " + ba[i+j].ToString());
}
_ba = ba;

Note, the extra bit not being used so that each week is allocated to 1 byte, so 2 weeks of data is actually 14 bytes.    I decided to go ahead and “waste” a bit so that each week is in a single byte.  This makes it easier to look at the data byte-by-byte, which is the only way to natively view it from T-SQL and see what the patterns are for each week.  I could have just strung them all together, but didn’t think it worth the space savings and it’s not too hard to work around in the functions on both the .NET and SQL side.

Storing to the database is a bit tricky and requires some helper functions that are included in the attachment.

TestBitArrayDataDataContext dc = new TestBitArrayDataDataContext();
dc.UpsertTestBitArray(
    this.dateTimePickerStartDate.Value,
    (short)this.numericUpDownPeriodLength.Value,
    BitArrayConversions.ConvertBitMaskToByte(selectedDaysMask),
   BitArrayConversions.BitArrayToByteArray(_ba),this.textBoxEvent.Text);
dc.SubmitChanges();

It is simple to read integer-type (tinyint, smallint, integer) data type into a bit array as well as to load from an array of varbinary.

For the selected days bit mask:

BitArray selectedDays = new BitArray(System.BitConverter.GetBytes(tb.SelectedWeekDayMask));

For the computed period detailed history bitmap:

_ba = new BitArray(tb.DateRangeBitArray.ToArray());

Now, the fun part is the SQL function to return the data, again for brevity, I’ve not pasted the supporting functions but are included in the TestBitArray.sql attachment in the zip.

CREATE FUNCTION [dbo].[udf_GetDatesForBitMask]

(

     -- Add the parameters for the function here

      @PeriodStartDate date,

      @DateBitMask varbinary(8000)

)

RETURNS

  @PeriodDates TABLE

(

      EventDate DateTime

)

AS

BEGIN

      -- Fill the table variable with the rows for your result set

      DECLARE @ByteLength smallint = LEN(@DateBitMask)

      DECLARE @ByteString varchar(8000) = master.dbo.fn_varbintohexstr(@DateBitMask)

      DECLARE @BytePointer smallint = 0

      DECLARE @EventDate datetime

      WHILE @BytePointer < @ByteLength

      BEGIN

            DECLARE @ByteValue tinyint = dbo.fn_ConvertHexUnsignedCharToTinyInt(SUBSTRING(@ByteString,3+@BytePointer*2,2))

            DECLARE @BitPower smallint = 7

            WHILE @BitPower > 0 -- First bit not used

            BEGIN

                  DECLARE @BitTestValue smallint = POWER(2.00,@BitPower)

                  IF @ByteValue >= @BitTestValue

                  BEGIN

                        SET @EventDate = DATEADD(DD,@BitPower - 1 + @BytePointer * 7 ,@PeriodStartDate)

                        SET @ByteValue = @ByteValue - @BitTestValue

                        INSERT @PeriodDates (EventDate) Values (@EventDate)

                  END

                  SET @BitPower = @BitPower -1

            END

            SET @BytePointer = @BytePointer + 1

      END

      RETURN

END

The ConvertHexUnsignedCharToTinyInt function is based on a similar function for converting to varbinary from string at http://sqlblog.com/blogs/peter_debetta/archive/2007/03/09/t-sql-convert-hex-string-to-varbinary.aspx

So, what do we get when we actually query the table and join with the function?

/****** Script for SelectTopNRows command from SSMS  ******/
SELECT [EventName]
      ,[PeriodStartDate]
      ,[PeriodLength]
      ,[Eventdate]
      ,master.dbo.fn_varbintohexstr(DateRangeBitArray) as BitArray
  FROM [dbo].[TestBitArray]
  CROSS APPLY dbo.udf_GetDatesForBitMask(PeriodStartDate,DateRangeBitArray)
  Where EventName = 'TestEvent1'
  Order by PeriodStartDate, Eventdate

EventName

 

 

PeriodStartDate

 

 

PeriodLength

 

 

Eventdate

 

 

BitArray

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

6/22/2009

 

 

0x4a4a

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

6/24/2009

 

 

0x4a4a

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

6/27/2009

 

 

0x4a4a

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

6/29/2009

 

 

0x4a4a

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

7/1/2009

 

 

0x4a4a

 

 

TestEvent1

 

 

6/22/2009

 

 

14

 

 

7/4/2009

 

 

0x4a4a

 

 

Voila, we’ve stored 2 weeks of history for an event in a single row of a table at a cost of just 2 bytes and we are able to query it with T-SQL and no .CLR by joining to a table valued function.  A month could be stored in 5 bytes and whole year’s worth of history could be stored in 53 bytes.

For our example,l we repeated the same days for each week, but that is just a limitation of our test harness, we could have stored a different pattern of bits within each byte to represent dates. 

Later, I’ll show more practical value with the simulation example including fuzzy logic matching to show for example what stocks traded together with statistical confidence within a particular period of time, by merely comparing bit masks enumerating a few hundred summary rows instead of enumerating through hundreds of thousands of history rows.  The main application I see for this design pattern is analytic applications that want to be able to rapidly compare correlations of entities based on events associated with periods over time.  This is also useful to reduce storage requirements and I/O performance if there is a huge mass of history that is linked to a specific event during a period where you only need to track if the event occurred and not the details of the event itself.

I’ve included an attachment with all of the code, if you wish to try it out.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

bobleith

United States United States
No Biography provided

Comments and Discussions

 
Generalfull code is in the attachment PinmemberAlphaData14-Apr-11 8:45 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web03 | 2.8.140916.1 | Last Updated 5 Aug 2009
Article Copyright 2009 by bobleith
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid