Click here to Skip to main content
13,351,731 members (54,036 online)
Click here to Skip to main content
Add your own
alternative version


28 bookmarked
Posted 31 Dec 2008

Data Mining with SQL Server 2005 without Analysis Services

, 31 Dec 2008
Rate this:
Please Sign up or sign in to vote.
A C# implementation of CHAID for MSSQL 2005 à la Cognos scenario.


Long time ago, I worked with a Cognos data mining product called Scenario. I liked it so much that I decided to write my own implementation of the underlying CHAID algorithm (see Background below). Back then, I wrote it in Borland C++ Builder v. 4 (see BCB source download above) and tested it against Scenario. Having got what I wanted (identical results), I lost further interest and put my code in cold storage ...

After almost a decade, I am facing a data mining task again. In the meantime, the world has changed quite a bit. The old Cognos Scenario seems to have been forgotten (and I do not have access to it any more), and databases have evolved, too. At work, we have a SQL Server 2005 database suitable for data mining, and CHAID looks like a good candidate for the method. Naturally, I remembered my code, and looked for a way to reuse it as much as possible. I could not use it directly because of the change in the database platform; my Borland Database Engine license does not include any support for MS SQL Server. I briefly considered trying to bridge the gap with a third-party component, but I failed to find a free one on the web. Then, I discovered Microsoft Chart Controls for .NET Framework 3.5, and decided to re-write my application in C#. I must admit I was also inspired by the "Data Mining with SQL Server 2005" article here on CodeProject. Likewise, I listed this article as Intermediate, because I am not a professional programmer myself. The tough part might be understanding the CHAID algorithm and the statistical background, Bonferonni adjustments, etc.


CHAID stands for Chi-square Automatic Interaction Detection. My implementation is based on an older article by G. V. Kass (An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics 29, No. 2, 1980, pp. 119-127). The Cognos documentation is publicly unavailable, but referring to it would be a major advantage, too. (I think I used to have access to a document entitled something like "How Scenario Works".)

Using the code

The code can be used to analyze categorical data where all predictors are "free" (not monotonic). Both dependent and predictor variables should be represented as strings in the database. I have only tested my application against a single MSSQL table named Records, but I believe database views could be used as well. For a quick start, I would recommend setting up the Records table as follows:

USE [DatabaseName]
/****** Object:  Table [dbo].[Records]    Script Date: 12/31/2008 22:27:05 ******/
CREATE TABLE [dbo].[Records](
    [ID] [int] NOT NULL,
    [FlavorPreference] [nvarchar](255) NULL,
    [FreeGoProductSelected] [nvarchar](255) NULL,
    [WrappingColorPreference] [nvarchar](255) NULL,
    [PricePoint] [nvarchar](255) NULL,
    [WouldBuy] [nvarchar](255) NULL,
    [SurveyLocation] [nvarchar](255) NULL,
    [Age] [nvarchar](255) NULL,
    [FlavorCode] [nvarchar](255) NULL,
    [Gender] [nvarchar](255) NULL,
    [WebAccess] [nvarchar](255) NULL,
    [LikesCamping] [nvarchar](255) NULL,
    [GoKnowledge] [nvarchar](255) NULL,
    [ID] ASC

I have not includes any data in the downloads, but I am sure you can fill in your table with your own test data. Please do not forget to edit the app.config file.

I do not think anyone would want to actually use the original C++ implementation; it is only meant for reference. Please note, there is no configuration available, e.g., table and column names are hard-coded there.

If you succeed in building the C# application and populating your MSSQL database, you should be able to see something like this, with the corresponding BCB screenshot at the top.

Points of interest

If you compare the C# code (MSSQL) with the original C++ version (BDE, MS Access 97), you will notice that the new SQL SELECT statements (auto-generated behind the scenes) include "order by" clauses as well as "group by" ones. For me, it was quite a surprise to find out it was necessary to specify both clauses in the case of SQL Server. Another lesson learned was the importance of the MultipleActiveResultSets=True specification in the database connection string, see app.config.


  • December 31, 2008 - Initial publishing.


This article, along with any associated source code and files, is licensed under A Public Domain dedication


About the Author

Czech Republic Czech Republic
No Biography provided

You may also be interested in...


Comments and Discussions

GeneralExcellent! Pin
Fayu Zhang8-Feb-09 11:48
memberFayu Zhang8-Feb-09 11:48 
QuestionA little more description? Pin
Alan Balkany6-Jan-09 4:47
memberAlan Balkany6-Jan-09 4:47 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.180111.1 | Last Updated 31 Dec 2008
Article Copyright 2008 by kvlcek
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid