Introduction

Iterative Dichotomiser 3 or ID3 is an algorithm which is used to generate decision tree, details about the ID3 algorithm is in here. There are many usage of ID3 algorithm specially in the machine learning field. In this article, we will see the attribute selection procedure uses in ID3 algorithm. Attribute selection section has been divided into basic information related to data set, entropy and information gain has been discussed and few examples have been used to show How to calculate entropy and information gain using example data.

Attribute selection

Attribute selection is the fundamental step to construct a decision tree. There two term Entropy and Information Gain is used to process attribute selection, using attribute selection ID3 algorithm select which attribute will be selected to become a node of the decision tree and so on. Before dive into deep need to introduce few terminology used in attribute selection process,

Outlook	Temperature	Humidity	Wind	Play ball
Sunny	Hot	High	Weak	No
Sunny	Hot	High	Strong	No
Overcast	Hot	High	Weak	Yes
Rain	Mild	High	Weak	Yes
Rain	Cool	Normal	Weak	Yes
Rain	Cool	Normal	Strong	No
Overcast	Cool	Normal	Strong	Yes
Sunny	Mild	High	Weak	No
Sunny	Cool	Normal	Weak	Yes
Rain	Mild	Normal	Weak	Yes
Sunny	Mild	Normal	Strong	Yes
Overcast	Mild	High	Strong	Yes
Overcast	Hot	Normal	Weak	Yes
Rain	Mild	High	Strong	No
			Total	14

Fig 1: Data set to calculate Entropy and Information gain using ID3 Algorithm

Attributes

In the above table, Day, Outlook, Temperature, Humidity, Wind, Play ball denotes as attributes,

Class(C) or Classifier

among these attributes Play ball refers as Class(C) or Classifier. Because based on Outlook, Temperature, Humidity and Wind we need to decide whether we can Play ball or not, that’s why Play ball is a classifier to make decision.

Collection (S)

All the records in the table refer as Collection (S).

Entropy Calculation

Entropy is one kind of measurement procedure in information theory, details about Entropy is in here. In here, we will see how to calculate Entropy of given set of data. The test data used in here is Fig 1 data,

Entropy(S) = ∑_n=1-p(I)log₂p(I)

p(I) refers to the proportion of S belonging to class I i.e. in the above table we have two kinds of class {No, Yes} with {5, 9} (in here 5 is the total no of No and 9 is the total no of Yes), the collection size is S=14. So the p(I) over C for the Entire collection is No (5/14) and Yes (9/14).

log₂p(I), refers to the log₂(5/14) and log₂(9/14) over C.

∑ is over c i.e. summation of all the classifier items. In this case summation of -p(No/S)log₂p(No/S) and -p(Yes/S)log₂p(Yes/S) = -p(No/S)log₂p(No/S) + -p(Yes/S)log₂p(Yes/S)

So,

Entropy(S) = ∑-p(I)log₂p(I)

=) Entropy(S) = -p(No/S)log₂p(No/S) + -p(Yes/S)log₂p(Yes/S)

=) Entropy(S) = ((-(5/14)log₂(5/14)) + (-(9/14)log₂(9/14)) )

=) Entropy(S) = (-0.64285 x -0.63743) + (-0.35714 x -1.485427)

=) Entropy(S) = 0.40977 + 0.5305096 = 0.940

So the Entropy of S is 0.940

Information Gain G(S,A)

Information gain is the procedure to select a particular attribute to be a decision node of a decision tree. Please see here for details about information gain.

Information gain is G(S,A)

where S is the collection of the data in the data set and A is the attribute for which information gain will be calculated over the collection S. So if Gain(S, Wind) then it refers gain of Wind over S.

Gain(S, A) = Entropy(S) - ∑( ( |S_v|/|S| ) x Entropy(S_v) )

Where,

S is the total collection of the records.

A is the attribute for which gain will be calculated.

v is all the possible of the attribute A, for instance in this case if we consider Windy attribute then the set of v is { Weak, Strong }.

S_v is the number of elements for each v for instance Sweak = 8 and Sstrong = 6.

∑ is the summation of ( ( |S_v|/|S| ) x Entropy(S_v) ) for all the items from the set of v i.e. ( ( |S_weak|/|S| ) x Entropy(S_weak) ) + ( ( |S_strong|/|S| ) x Entropy(S_strong) )

So if we want to calculate information gain of Wind over the collection set S using following formula,

Gain(S, A) = Entropy(S) - ∑( ( |S_v|/|S| ) x Entropy(S_v) )

then the it will be as below,

=) Gain(S, Wind) = Entropy(S) - ( ( |S_weak|/|S| ) x Entropy(S_weak) ) - ( ( |S_strong|/|S| ) x Entropy(S_strong) )

from the above table for the Wind attribute there are two types of data ( Weak, Strong ) So new data set will be divided into following two subsets as below,

Outlook	Temperature	Humidity	Wind	Play ball
Sunny	Hot	High	Weak	No
Overcast	Hot	High	Weak	Yes
Rain	Mild	High	Weak	Yes
Rain	Cool	Normal	Weak	Yes
Sunny	Mild	High	Weak	No
Sunny	Cool	Normal	Weak	Yes
Rain	Mild	Normal	Weak	Yes
Overcast	Hot	Normal	Weak	Yes
			Total	8

Outlook	Temperature	Humidity	Wind	Play ball
Sunny	Hot	High	Strong	No
Rain	Cool	Normal	Strong	No
Overcast	Cool	Normal	Strong	Yes
Sunny	Mild	Normal	Strong	Yes
Overcast	Mild	High	Strong	Yes
Rain	Mild	High	Strong	No
			Total	6

Fig 2. Fig 1 data has been divided by types of Wind which is Weak and Strong

So, the set of Wind attribute is (Weak, Strong) and total number of elements set is (8,6).

=) Gain(S, Wind) = 0.940 - ( (8/14 ) x Entropy(S_weak) ) - ( ( 6/14 ) x Entropy(S_strong) )

As mentioned earlier, How to calculate Entropy, now we will calculate Entropy of Weak, there two classifier No and Yes, for Weak the set of classifier is (No,Yes) with number of elements set is (2,6) and Strong has set of (No,Yes) is (3,3).

Entropy(S_weak) = ∑-p(I)log₂p(I) = ( - ( (2/8)xlog₂(2/8) ) ) + ( - ( (6/8)xlog₂(6/8) ) )

=) Entropy(S_weak) = calculated value Value Of S_weakusing entropy calculation procedure mentioned earlier.

and

Entropy(S_strong) = ∑-p(I)log₂p(I) = ( - ( (3/6)xlog₂(3/6) ) ) + ( - ( (3/6)xlog₂(3/6) ) )

=) Entropy(S_strong) = calculated value ValueOf S_strongusing entropy calculation procedure mentioned earlier.

So Gain(S, Wind) will be as below,

=) Gain(S, Wind) = 0.940 - ( (8/14 ) x Entropy(S_weak) ) - ( ( 6/14 ) x Entropy(S_strong) )

=) Gain(S,Wind) = 0.940 - Value Of S_weak - Value Of S_strong

=) Gain(S, Wind) = 0.048

So the information gain of Wind over S is 0.048. using the same procedure it is possible to calculate information gain for Outlook, Temperature and Humidity. Based on the ID3 algorithm highest gained will be selected for the decision node, in here Outlook, as a result the data set showed in Fig 1 will be divided s below,

Outlook	Temperature	Humidity	Wind	Play ball
Sunny	Hot	High	Weak	No
Sunny	Hot	High	Strong	No
Sunny	Mild	High	Weak	No
Sunny	Cool	Normal	Weak	Yes
Sunny	Mild	Normal	Strong	Yes
			Total	5

Outlook	Temperature	Humidity	Wind	Play ball
Overcast	Hot	High	Weak	Yes
Overcast	Cool	Normal	Strong	Yes
Overcast	Mild	High	Strong	Yes
Overcast	Hot	Normal	Weak	Yes
			Total	4

Outlook	Temperature	Humidity	Wind	Play ball
Rain	Mild	High	Weak	Yes
Rain	Cool	Normal	Weak	Yes
Rain	Cool	Normal	Strong	No
Rain	Mild	Normal	Weak	Yes
Rain	Mild	High	Strong	No
			Total	5

Fig 3: Revised data set divided using Outlook elements Sunny, Overcast and Rain

Outlook has three different values Sunny, Overcast,Rain which will be used as decision nodes.

So, using above three new sets, information gain will be calculated for Temperature, Humidity and Wind and based on calculated information gain value further node will be selected to generate decision tree nodes. For example, if we want to calculate information gain of Humidity against Sunny then,

Gain(S, A) = Entropy(S) - ∑( ( |S_v|/|S| ) x Entropy(S_v) )

=) Gain(S, A) = ∑-p(I)log₂p(I) - ∑( ( |S_v|/|S| ) x Entropy(S_v) ) =) Gain(S_sunny, Humidity) = (- 3/5log₂3/5 - 2/5log₂₂/5) - ( ( 3/5 ) x Entropy(S_high) ) - ( ( 2/5 ) x Entropy(S_low) )

where,
3/5 refers to Total number of No/ Total number of No and Yes
2/5 refers to Total number of Yes/ Total number of No and Yes from the set of Sunny

=) Gain(S_sunny, Humidity) = (0.4421 +0.528 ) - ( ( 3/5 ) x ∑-p(I)log₂p(I) ) - ( ( 2/5 ) x ∑-p(I)log₂p(I) ) )

=) Gain(S_sunny, Humidity) = (0.9708 ) - ( ( 3/5 ) x (- (3/3log₂ 3/3) - ( 0/3log₂ 0/3 ) ) ) - ( ( 2/5 ) x ( -(0/2)log₂ 0/2 - (2/2log₂2/2) ) ) )

where,
3/3 refers to the Total number of No/Total number of No and Yes for the set of high
0/3 refers to the Total number of Yes/Total number of No and Yes for the set of high
0/2 refers to the Total number of No/Total number of No and Yes for the set of low
2/2 refers to the Total number of Yes/Total number of No and Yes for the set of low

=) Gain(S_sunny, Humidity) = 0.9708 - 3/5 x (-1 x 0 ) - 2/5 x ( 0 - ( 1x 0))

=) Gain(S_sunny, Humidity) = 0.9708

So the information gain of Humidity against S_sunnyset is 0.9708. Similar way, we can calculate information gain for Temperature, Wind against S_sunny.

References

During this writing following site being used as reference,

History

Version 1.0