11,642,658 members (65,932 online)

# ID3 Decision Tree Algorithm - Part 1

, 1 Feb 2012 CPOL 36.8K 26
 Rate this:
ID3 Decision Tree Algorithm - Part 1 (Attribute Selection Basic Information)

## Introduction

Iterative Dichotomiser 3 or ID3 is an algorithm which is used to generate decision tree, details about the ID3 algorithm is in here. There are many usage of ID3 algorithm specially in the machine learning field. In this article, we will see the attribute selection procedure uses in ID3 algorithm. Attribute selection section has been divided into basic information related to data set, entropy and information gain has been discussed and few examples have been used to show How to calculate entropy and information gain using example data.

## Attribute selection

Attribute selection is the fundamental step to construct a decision tree. There two term Entropy and Information Gain is used to process attribute selection, using attribute selection ID3 algorithm select which attribute will be selected to become a node of the decision tree and so on. Before dive into deep need to introduce few terminology used in attribute selection process,

 Outlook Temperature Humidity Wind Play ball Sunny Hot High Weak No Sunny Hot High Strong No Overcast Hot High Weak Yes Rain Mild High Weak Yes Rain Cool Normal Weak Yes Rain Cool Normal Strong No Overcast Cool Normal Strong Yes Sunny Mild High Weak No Sunny Cool Normal Weak Yes Rain Mild Normal Weak Yes Sunny Mild Normal Strong Yes Overcast Mild High Strong Yes Overcast Hot Normal Weak Yes Rain Mild High Strong No Total 14
Fig 1: Data set to calculate Entropy and Information gain using ID3 Algorithm

### Attributes

In the above table, Day, Outlook, Temperature, Humidity, Wind, Play ball denotes as attributes,

#### Class(C) or Classifier

among these attributes Play ball refers as Class(C) or Classifier. Because based on Outlook, Temperature, Humidity and Wind we need to decide whether we can Play ball or not, that’s why Play ball is a classifier to make decision.

#### Collection (S)

All the records in the table refer as Collection (S).

### Entropy Calculation

Entropy is one kind of measurement procedure in information theory, details about Entropy is in here. In here, we will see how to calculate Entropy of given set of data. The test data used in here is Fig 1 data,

Entropy(S) = n=1-p(I)log2p(I)

p(I) refers to the proportion of S belonging to class I i.e. in the above table we have two kinds of class {No, Yes} with {5, 9} (in here 5 is the total no of No and 9 is the total no of Yes), the collection size is S=14. So the p(I) over C for the Entire collection is No (5/14) and Yes (9/14).

log2p(I), refers to the log2(5/14) and log2(9/14) over C.

is over c i.e. summation of all the classifier items. In this case summation of -p(No/S)log2p(No/S) and -p(Yes/S)log2p(Yes/S) = -p(No/S)log2p(No/S) + -p(Yes/S)log2p(Yes/S)

So,

Entropy(S) = ∑-p(I)log2p(I)

=) Entropy(S) = -p(No/S)log2p(No/S) + -p(Yes/S)log2p(Yes/S)

=) Entropy(S) = ((-(5/14)log2(5/14)) + (-(9/14)log2(9/14)) )

=) Entropy(S) = (-0.64285 x -0.63743) + (-0.35714 x -1.485427)

=) Entropy(S) = 0.40977 + 0.5305096 = 0.940

So the Entropy of S is 0.940

### Information Gain G(S,A)

Information gain is the procedure to select a particular attribute to be a decision node of a decision tree. Please see here for details about information gain.

Information gain is G(S,A)

where S is the collection of the data in the data set and A is the attribute for which information gain will be calculated over the collection S. So if Gain(S, Wind) then it refers gain of Wind over S.

Gain(S, A) = Entropy(S) - ( ( |Sv|/|S| ) x Entropy(Sv) )

Where,

S is the total collection of the records.

A is the attribute for which gain will be calculated.

v is all the possible of the attribute A, for instance in this case if we consider Windy attribute then the set of v is { Weak, Strong }.

Sv is the number of elements for each v for instance Sweak = 8 and Sstrong = 6.

is the summation of ( ( |Sv|/|S| ) x Entropy(Sv) ) for all the items from the set of v i.e. ( ( |Sweak|/|S| ) x Entropy(Sweak) ) + ( ( |Sstrong|/|S| ) x Entropy(Sstrong) )

So if we want to calculate information gain of Wind over the collection set S using following formula,

Gain(S, A) = Entropy(S) - ∑( ( |Sv|/|S| ) x Entropy(Sv) )

then the it will be as below,

=) Gain(S, Wind) = Entropy(S) - ( ( |Sweak|/|S| ) x Entropy(Sweak) ) - ( ( |Sstrong|/|S| ) x Entropy(Sstrong) )

from the above table for the Wind attribute there are two types of data ( Weak, Strong ) So new data set will be divided into following two subsets as below,

 Outlook Temperature Humidity Wind Play ball Sunny Hot High Weak No Overcast Hot High Weak Yes Rain Mild High Weak Yes Rain Cool Normal Weak Yes Sunny Mild High Weak No Sunny Cool Normal Weak Yes Rain Mild Normal Weak Yes Overcast Hot Normal Weak Yes Total 8
 Outlook Temperature Humidity Wind Play ball Sunny Hot High Strong No Rain Cool Normal Strong No Overcast Cool Normal Strong Yes Sunny Mild Normal Strong Yes Overcast Mild High Strong Yes Rain Mild High Strong No Total 6
Fig 2. Fig 1 data has been divided by types of Wind which is Weak and Strong

So, the set of Wind attribute is (Weak, Strong) and total number of elements set is (8,6).

=) Gain(S, Wind) = 0.940 - ( (8/14 ) x Entropy(Sweak) ) - ( ( 6/14 ) x Entropy(Sstrong) )

As mentioned earlier, How to calculate Entropy, now we will calculate Entropy of Weak, there two classifier No and Yes, for Weak the set of classifier is (No,Yes) with number of elements set is (2,6) and Strong has set of (No,Yes) is (3,3).

Entropy(Sweak) = ∑-p(I)log2p(I) = ( - ( (2/8)xlog2(2/8) ) ) + ( - ( (6/8)xlog2(6/8) ) )

=) Entropy(Sweak) = calculated value Value Of Sweak using entropy calculation procedure mentioned earlier.

and

Entropy(Sstrong) = ∑-p(I)log2p(I) = ( - ( (3/6)xlog2(3/6) ) ) + ( - ( (3/6)xlog2(3/6) ) )

=) Entropy(Sstrong) = calculated value ValueOf Sstrong using entropy calculation procedure mentioned earlier.

So Gain(S, Wind) will be as below,

=) Gain(S, Wind) = 0.940 - ( (8/14 ) x Entropy(Sweak) ) - ( ( 6/14 ) x Entropy(Sstrong) )

=) Gain(S,Wind) = 0.940 - Value Of Sweak - Value Of Sstrong

=) Gain(S, Wind) = 0.048

So the information gain of Wind over S is 0.048. using the same procedure it is possible to calculate information gain for Outlook, Temperature and Humidity. Based on the ID3 algorithm highest gained will be selected for the decision node, in here Outlook, as a result the data set showed in Fig 1 will be divided s below,

 Outlook Temperature Humidity Wind Play ball Sunny Hot High Weak No Sunny Hot High Strong No Sunny Mild High Weak No Sunny Cool Normal Weak Yes Sunny Mild Normal Strong Yes Total 5
 Outlook Temperature Humidity Wind Play ball Overcast Hot High Weak Yes Overcast Cool Normal Strong Yes Overcast Mild High Strong Yes Overcast Hot Normal Weak Yes Total 4
 Outlook Temperature Humidity Wind Play ball Rain Mild High Weak Yes Rain Cool Normal Weak Yes Rain Cool Normal Strong No Rain Mild Normal Weak Yes Rain Mild High Strong No Total 5
Fig 3: Revised data set divided using Outlook elements Sunny, Overcast and Rain

Outlook has three different values Sunny, Overcast,Rain which will be used as decision nodes.

So, using above three new sets, information gain will be calculated for Temperature, Humidity and Wind and based on calculated information gain value further node will be selected to generate decision tree nodes. For example, if we want to calculate information gain of Humidity against Sunny then,

Gain(S, A) = Entropy(S) - ( ( |Sv|/|S| ) x Entropy(Sv) )

=) Gain(S, A) = ∑-p(I)log2p(I) - ∑( ( |Sv|/|S| ) x Entropy(Sv) ) =) Gain(Ssunny, Humidity) = (- 3/5log23/5 - 2/5log22/5) - ( ( 3/5 ) x Entropy(Shigh) ) - ( ( 2/5 ) x Entropy(Slow) )

where,
3/5 refers to Total number of No/ Total number of No and Yes
2/5 refers to Total number of Yes/ Total number of No and Yes from the set of Sunny

=) Gain(Ssunny, Humidity) = (0.4421 +0.528 ) - ( ( 3/5 ) x ∑-p(I)log2p(I) ) - ( ( 2/5 ) x ∑-p(I)log2p(I) ) )

=) Gain(Ssunny, Humidity) = (0.9708 ) - ( ( 3/5 ) x (- (3/3log2 3/3) - ( 0/3log2 0/3 ) ) ) - ( ( 2/5 ) x ( -(0/2)log2 0/2 - (2/2log22/2) ) ) )

where,
3/3 refers to the Total number of No/Total number of No and Yes for the set of high
0/3 refers to the Total number of Yes/Total number of No and Yes for the set of high
0/2 refers to the Total number of No/Total number of No and Yes for the set of low
2/2 refers to the Total number of Yes/Total number of No and Yes for the set of low

=) Gain(Ssunny, Humidity) = 0.9708 - 3/5 x (-1 x 0 ) - 2/5 x ( 0 - ( 1x 0))

=) Gain(Ssunny, Humidity) = 0.9708

So the information gain of Humidity against Ssunny set is 0.9708. Similar way, we can calculate information gain for Temperature, Wind against Ssunny.

## References

During this writing following site being used as reference,

History

Version 1.0

## Share

 Software Developer Australia
Designer and Architect.
Author of the

## You may also be interested in...

 First PrevNext
 Any Other ways Santhoshobject23-Aug-13 20:22 Santhoshobject 23-Aug-13 20:22
 Re: Any Other ways Mohammad A Rahman23-Aug-13 22:44 Mohammad A Rahman 23-Aug-13 22:44
 Re: Any Other ways Santhoshobject25-Aug-13 16:32 Santhoshobject 25-Aug-13 16:32
 Thank you ! Anoos11015-May-13 7:44 Anoos110 15-May-13 7:44
 Vote of 5 VikasAgarwal8412-Jan-13 20:18 VikasAgarwal84 12-Jan-13 20:18
 Re: Vote of 5 Mohammad A Rahman13-Jan-13 4:38 Mohammad A Rahman 13-Jan-13 4:38
 My vote of 4 Duc Hien Vuong7-May-12 19:09 Duc Hien Vuong 7-May-12 19:09
 Re: My vote of 4 Mohammad A Rahman7-May-12 20:28 Mohammad A Rahman 7-May-12 20:28
 My vote of 5 rumpa_kazol15-Nov-11 21:14 rumpa_kazol 15-Nov-11 21:14
 Excellent rumpa_kazol15-Nov-11 2:57 rumpa_kazol 15-Nov-11 2:57
 Last Visit: 31-Dec-99 18:00     Last Update: 2-Aug-15 14:41 Refresh 12 Next »