Click here to Skip to main content
14,236,176 members

Step-by-Step Guide to Implement Machine Learning XII - Apriori

Rate this:
3.70 (6 votes)
Please Sign up or sign in to vote.
3.70 (6 votes)
4 Jul 2019CPOL
Easy to implement machine learning

Introduction

Apriori is an algorithm to learn frequent item set and association rule. Apriori is a bottom-up method, which extends an object to frequent item set at each iteration. When there is no object to be extended, algorithm terminates.

Apriori Model

Frequent itemset

Frequent itemset is the set of objects which usually occur simultaneously.

The traditional method will be traversing the dataset several times. To reduce the computation, researchers identified something called the Apriori principle. The Apriori principle helps us reduce the number of possible interesting itemsets. The Apriori principle says that if an itemset is frequent, then all of its subsets are frequent. It does not seem useful at first sight. However, if we turn it inside out, it says if a itemset is not frequent, all of its supersets are not frequent. It means that in the bottom-up process, if the bottom itemset is not frequent, we don't need to process the set at top. For example, there exists a set A= {1, 2, 3, 4}. All possible itemsets generated by A are shown below:

If itemset {0, 1} is not frequent, all the itemsets which contain {0, 1} are not frequent, which means that we can skip these itemset as marked with blue circle.

The generation of frequent itemset includes the following steps:

  1. First, calculate the singleton set. All the elements are regarded as the candidate set.
    def createSingletonSet(self, data):
        singleton_set = []
        for record in data:
            for item in record:
                if [item] not in singleton_set:
                    singleton_set.append([item])
        singleton_set.sort()
        singleton_set = list(map(frozenset, singleton_set))   # generate a invariable set
        return singleton_set
    
  2. Merge singleton set to generate more candidate set.
    def createCandidateSet(self, frequent_items, k):
        candidate_set = []
        items_num = len(frequent_items)
        # merge the sets which have same front k-2 element
        for i in range(items_num):
            for j in range(i+1, items_num):
                L1 = list(frequent_items[i])[: k-2]
                L2 = list(frequent_items[j])[: k-2]
                if L1 == L2:
                    candidate_set.append(frequent_items[i] | frequent_items[j])
        return candidate_set
    
  3. Calculate the support degree of all the candidate sets and select candidate sets whose support degrees are larger than the minimum support degree as the frequent itemsets.

    [公式]

    def calculateSupportDegree(self, data, candidate_set):
        sample_sum = len(data)
        data = map(set, data)                   # transform data into set
    
        # calculate the frequency of each set in candidate_set appearing in data
        frequency = {}
        for record in data:
            for element in candidate_set:
                if element.issubset(record):  # elements in record
                    frequency[element] = frequency.get(element, 0) + 1
    
        # calculate the support degree for each set
        support_degree = {}
        frequent_items = []
        for key in frequency:
            support = frequency[key]/sample_sum
            if support >= self.min_support:
                frequent_items.insert(0, key)
            support_degree[key] = support
        return frequent_items, support_degree
    
  4. Combination of the above functions:
    def findFrequentItem(self, data):
        singleton_set = self.createSingletonSet(data)
        sub_frequent_items, sub_support_degree =
                  self.calculateSupportDegree(data, singleton_set)
        frequent_items = [sub_frequent_items]
        support_degree = sub_support_degree
        k = 2
    
        while len(frequent_items[k-2]) > 0:
            candidate_set = self.createCandidateSet(frequent_items[k-2], k)
            sub_frequent_items, sub_support_degree =
                     self.calculateSupportDegree(data, candidate_set)
            support_degree.update(sub_support_degree)
            if len(sub_frequent_items) == 0:
                break
            frequent_items.append(sub_frequent_items)
            k = k + 1
        return frequent_items, support_degree
    

Association Rule

Association rule is that something takes place leading to other things happening. For example, there is a frequent itemset {beer, diaper}. There may be a association rule diaper->beer, which means that someone buy diaper may also by beer. Note that the rule is unidirectional, which means rule beer->diaper may be not correct.

The generation of association rule includes the following steps:

  1. Calculate the confidence degree of each frequent itemset, and choose the itemsets whose confidence degree are larger than the mimimum confidence degree.

    [公式]

    def calculateConfidence(self, frequent_item, support_degree, candidate_set, rule_list):
        rule = []
        confidence = []
        for item in candidate_set:
            temp = support_degree[frequent_item]/support_degree[frequent_item - item]
            confidence.append(temp)
            if temp >= self.min_confidence:
                rule_list.append((frequent_item - item, item, temp))  # association rule
                rule.append(item)
        return rule
    
  2. Merge frequent itemsets to generate more association rules:
    def mergeFrequentItem(self, frequent_item, support_degree, candidate_set, rule_list):
        item_num = len(candidate_set[0])
        if len(frequent_item) > item_num + 1:
            candidate_set = self.createCandidateSet(candidate_set, item_num+1)
            rule = self.calculateConfidence
                 (frequent_item, support_degree, candidate_set, rule_list)
            if len(rule) > 1:
                self.mergeFrequentItem(frequent_item, support_degree, rule, rule_list)
    
  3. Combination of the above functions:
    def generateRules(self, frequent_set, support_degree):
        rules = []
        for i in range(1, len(frequent_set)):    # generate rule from sets
                                                 # which contain more than two elements
            for frequent_item in frequent_set[i]:
                candidate_set = [frozenset([item]) for item in frequent_item]
                if i > 1:
                    self.mergeFrequentItem
                      (frequent_item, support_degree, candidate_set, rules)
                else:
                    self.calculateConfidence
                       (frequent_item, support_degree, candidate_set, rules)
        return rules
    

Conclusion and Analysis

Apriori is a simple algorithm to generate frequent itemsets and association rules. But it is more suitable sprase dataset. Now, we have a dataset as follows. Let's see the result of Apriori. The result is a tuple as (X, Y, confidence degree).

The related code and dataset in this article can be found in MachineLearning.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

danc1elion
Germany Germany
Danc1elion is interested in Machine Learning/Signal Processing/VoIP. Now, he is focusing on writing a tutorial of machine learning for beginner.

Comments and Discussions

 
Questioncannot see images 5 and 6 Pin
Maniezzo20-Jun-19 23:03
professionalManiezzo20-Jun-19 23:03 
QuestionMessage Closed Pin
20-Jun-19 10:41
memberMember 1450720820-Jun-19 10:41 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Article
Posted 20 Jun 2019

Stats

10K views
7 bookmarked