Click here to Skip to main content
Licence CPOL
First Posted 2 Apr 2010
Views 55,679
Downloads 4,412
Bookmarked 62 times

Apriori Algorithm

By Omar Gamil | 25 Apr 2010
Implementation of the Apriori algorithm in C#.

1

2

3
3 votes, 6.8%
4
41 votes, 93.2%
5
4.95/5 - 44 votes
μ 4.95, σa 1.05 [?]

Apriori Source Code

Introduction

In data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation).

Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).

Overview

The whole point of the algorithm (and data mining, in general) is to extract useful information from large amounts of data. For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is acquired from the association rule below:

Keyboard -> Mouse [support = 6%, confidence = 70%]

Support: The percentage of task-relevant data transactions for which the pattern is true.

Support (Keyboard -> Mouse) = eq_1.JPG

Confidence: The measure of certainty or trustworthiness associated with each discovered pattern.

Confidence (Keyboard -> Mouse) = eq_2.JPG

The algorithm aims to find the rules which satisfy both a minimum support threshold and a minimum confidence threshold (Strong Rules).

  • Item: article in the basket.
  • Itemset: a group of items purchased together in a single transaction.

How Apriori Works

  1. Find all frequent itemsets:
    • Get frequent items:
      • Items whose occurrence in database is greater than or equal to the min.support threshold.
    • Get frequent itemsets:
      • Generate candidates from frequent items.
      • Prune the results to find the frequent itemsets.
  2. Generate strong association rules from frequent itemsets
    • Rules which satisfy the min.support and min.confidence threshold.

High Level Design

1-Apriori_HLD_Big.JPG

Low Level Design

Apriori_LLD.jpg

Using the Code

Tricky Part 1: Generating Candidates

private Dictionary<string, double> GenerateCandidates(Dictionary<string, 
                                   double> dic_FrequentItems)
{
    Dictionary<string, double> dic_CandidatesReturn = new Dictionary<string, double>();
    for (int i = 0; i < dic_FrequentItems.Count - 1; i++)
    {
        string strFirstItem = Alphabetize(dic_FrequentItems.Keys.ElementAt(i));
        for (int j = i + 1; j < dic_FrequentItems.Count; j++)
        {
            string strSecondItem = Alphabetize(dic_FrequentItems.Keys.ElementAt(j));
            string strGeneratedCandidate = GetCandidate(strFirstItem, strSecondItem);
            if (strGeneratedCandidate != string.Empty)
            {
                strGeneratedCandidate = Alphabetize(strGeneratedCandidate);
                double dSupport = GetSupport(strGeneratedCandidate);
                dic_CandidatesReturn.Add(strGeneratedCandidate, dSupport);
            }
        }
    }
    return dic_CandidatesReturn;
}

private string GetCandidate(string strFirstItem, string strSecondItem)
{
    int nLength = strFirstItem.Length;
    if (nLength == 1)
    {
        return strFirstItem + strSecondItem;
    }
    else
    {
        string strFirstSubString = strFirstItem.Substring(0, nLength - 1);
        string strSecondSubString = strSecondItem.Substring(0, nLength - 1);
        if (strFirstSubString == strSecondSubString)
        {
            return strFirstItem + strSecondItem[nLength - 1];
        }
        else
            return string.Empty;
    }
}

Tricky Part 2: Generating Rules

private List<clssRules> GenerateRules()
{
    List<clssRules> lstRulesReturn = new List<clssRules>();
    foreach (string strItem in m_dicAllFrequentItems.Keys)
    {
        if (strItem.Length > 1)
        {
            int nMaxCombinationLength = strItem.Length / 2;
            GenerateCombination(strItem, nMaxCombinationLength, ref lstRulesReturn);
        }
    }
    return lstRulesReturn;
}

private void GenerateCombination(string strItem, int nCombinationLength, 
             ref List<clssRules> lstRulesReturn)
{
    int nItemLength = strItem.Length;
    if (nItemLength == 2)
    {
        AddItem(strItem[0].ToString(), strItem, ref lstRulesReturn);
        return;
    }
    else if (nItemLength == 3)
    {
        for (int i = 0; i < nItemLength; i++)
        {
            AddItem(strItem[i].ToString(), strItem, ref lstRulesReturn);
        }
        return;
    }
    else
    {
        for (int i = 0; i < nItemLength; i++)
        {
            GetCombinationRecursive(strItem[i].ToString(), strItem, 
                          nCombinationLength, ref lstRulesReturn);
        }
    }
}

private string GetCombinationRecursive(string strCombination, string strItem, 
               int nCombinationLength, ref List<clssRules> lstRulesReturn)
{
    AddItem(strCombination, strItem, ref lstRulesReturn);
    char cLastTokenCharacter = strCombination[strCombination.Length - 1];
    int nLastTokenCharcaterIndex = strCombination.IndexOf(cLastTokenCharacter);
    int nLastTokenCharcaterIndexInParent = strItem.IndexOf(cLastTokenCharacter);
    char cNextCharacter;
    char cLastItemCharacter = strItem[strItem.Length - 1];
    if (strCombination.Length == nCombinationLength)
    {
        if (cLastTokenCharacter != cLastItemCharacter)
        {
            strCombination = strCombination.Remove(nLastTokenCharcaterIndex, 1);
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];
            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, 
                    nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }
    else
    {
        if (strCombination != cLastItemCharacter.ToString())
        {
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];
            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, 
                    nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }
}

Example

A database has five transactions. Let the min sup = 50% and min con f = 80%.

3-DB.JPG

Solution

Step 1: Find all Frequent Itemsets

4-Example.JPG

Frequent Itemsets

{A}   {B}   {C}   {E}   {A C}   {B C}   {B E}   {C E}   {B C E}

Step 2: Generate strong association rules from the frequent itemsets

5-StrongRules.JPG

Lattice

Closed Itemset: support of all parents are not equal to the support of the itemset.

Maximal Itemset: all parents of that itemset must be infrequent.

Keep in mind:

6-Set.JPG

7-Lattice.JPG

Itemset {c} is closed as support of parents (supersets) {A C}:2, {B C}:2, {C D}:1, {C E}:2 not equal support of {c}:3.

And the same for {A C}, {B E} & {B C E}.

Itemset {A C} is maximal as all parents (supersets) {A B C}, {A C D}, {A C E} are infrequent.

And the same for {B C E}.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Omar Gamil

Software Developer

Egypt Egypt

Member
Enthusiastic programmer/researcher, passionate to learn new technologies, interested in problem solving,data structures, algorithms and automation.
 
If you have a question\suggestion about one of my articles, or you want an algorithm implemented in C#, feel free to contact me.
 
Résumé
vWorker Account

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralThanks Pinmemberhimanshu21p20:37 6 Jan '12  
QuestionMy 5 Pinmembertheanil10:39 6 Jan '12  
QuestionVisual basics PinmemberMember 843790912:02 27 Nov '11  
Generalmy vote of 5 Pinmembershehbazshkh0:29 22 Nov '11  
GeneralMy vote of 5 PinmemberBillWoodruff23:44 6 Nov '11  
QuestionDoes this code only work for single character ? Pinmemberhaem savla7:27 15 Oct '11  
Questionproblem Pinmemberabarna1221:00 10 Oct '11  
AnswerRe: problem PinmemberOmar Gamil0:47 11 Oct '11  
Questionerror i got.. PinmemberMember 79873119:06 8 Oct '11  
AnswerRe: error i got.. PinmemberOmar Gamil9:15 8 Oct '11  
GeneralRe: error i got.. PinmemberMember 79873110:43 9 Oct '11  
GeneralRe: error i got.. PinmemberOmar Gamil0:42 11 Oct '11  
Questionapriori algorithm Pinmembersalv038:03 27 Sep '11  
GeneralMy vote of 5 Pinmembermilianoo23:19 6 Sep '11  
QuestionHelp on modifications PinmemberMember 81100981:36 25 Jul '11  
AnswerRe: Help on modifications PinmemberOmar Gamil2:46 25 Jul '11  
GeneralRe: Help on modifications PinmemberMember 81100984:39 25 Jul '11  
GeneralRe: Help on modifications PinmemberOmar Gamil0:17 26 Jul '11  
GeneralRe: Help on modifications PinmemberMember 81100980:31 26 Jul '11  
Questionrevision of borderset PinmemberMember 408286019:28 5 Jul '11  
QuestionInput help in C++ for prices' algorithm(similar to apriori) Pinmemberadiiscool323:54 27 May '11  
AnswerRe: Input help in C++ for prices' algorithm(similar to apriori) Pinmembersalv038:05 27 Sep '11  
GeneralMy vote of 5 PinmemberFilip D'haene8:09 24 May '11  
Generalhello Pinmembersalsaaabail0:19 13 May '11  
GeneralRe: hello PinmemberOmar Gamil1:09 13 May '11  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web03 | 2.5.120209.1 | Last Updated 25 Apr 2010
Article Copyright 2010 by Omar Gamil
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid