Introduction
Iterative Dichotomiser 3 or ID3 is an algorithm which is used to generate decision tree, details about the ID3 algorithm is in here. There are many usage of ID3 algorithm specially in the machine learning field. In this article, we will see the attribute selection procedure uses in ID3 algorithm. Attribute selection section has been divided into basic information related to data set, entropy and information gain has been discussed and few examples have been used to show How to calculate entropy and information gain using example data.
Attribute selection
Attribute selection is the fundamental step to construct a decision tree. There two term Entropy and Information Gain is used to process attribute selection, using attribute selection ID3 algorithm select which attribute will be selected to become a node of the decision tree and so on. Before dive into deep need to introduce few terminology used in attribute selection process,
Outlook  Temperature  Humidity  Wind  Play ball 
Sunny  Hot  High  Weak  No 
Sunny  Hot  High  Strong  No 
Overcast  Hot  High  Weak  Yes 
Rain  Mild  High  Weak  Yes 
Rain  Cool  Normal  Weak  Yes 
Rain  Cool  Normal  Strong  No 
Overcast  Cool  Normal  Strong  Yes 
Sunny  Mild  High  Weak  No 
Sunny  Cool  Normal  Weak  Yes 
Rain  Mild  Normal  Weak  Yes 
Sunny  Mild  Normal  Strong  Yes 
Overcast  Mild  High  Strong  Yes 
Overcast  Hot  Normal  Weak  Yes 
Rain  Mild  High  Strong  No 
   Total  14 
Fig 1: Data set to calculate Entropy and Information gain using ID3 Algorithm
Attributes
In the above table, Day, Outlook, Temperature, Humidity, Wind, Play ball denotes as attributes,
Class(C) or Classifier
among these attributes Play ball refers as
Class(C) or Classifier. Because based on Outlook, Temperature, Humidity and Wind
we need to decide whether we can Play ball or not, that’s why Play ball is a
classifier to make decision.
Collection (S)
All the records in the table refer as Collection (S).
Entropy Calculation
Entropy is one kind of measurement procedure in information theory, details about Entropy is in here. In here, we will see how to calculate Entropy of given set of data. The test data used in here is Fig 1 data,
Entropy(S) = ∑_{n=1}p(I)log_{2}p(I)
p(I) refers
to the proportion of S belonging to class I i.e. in the above table we have
two kinds of class {No, Yes} with {5, 9} (in here 5 is the total no of No and 9 is the total no of Yes), the collection size is S=14. So the p(I) over C for the
Entire collection is No (5/14)
and Yes (9/14).
log_{2}p(I), refers to the log_{2}(5/14) and log_{2}(9/14) over C.
∑ is
over c i.e. summation of all the classifier items. In this case summation of 
p(No/S)log_{2}p(No/S) and

p(Yes/S)log_{2}p(Yes/S) = 
p(No/S)log_{2}p(No/S) +

p(Yes/S)log_{2}p(Yes/S)
So,
Entropy(S) = ∑p(I)log_{2}p(I)
=) Entropy(S) = p(No/S)log_{2}p(No/S) + p(Yes/S)log_{2}p(Yes/S)
=) Entropy(S) = (((5/14)log_{2}(5/14)) + ((9/14)log_{2}(9/14)) )
=) Entropy(S) = (0.64285 x 0.63743) + (0.35714 x 1.485427)
=) Entropy(S) = 0.40977 + 0.5305096 = 0.940
So the Entropy of S is 0.940
Information Gain G(S,A)
Information gain is the procedure to select a particular attribute to be a decision node of a decision tree. Please see here for details about information gain.
Information gain is G(S,A)
where S is the collection of the data in the data set and A is the
attribute for which information gain will be calculated over the collection S.
So if Gain(S, Wind) then it refers gain of Wind over S.
Gain(S, A) = Entropy(S)  ∑( ( S_{v}/S
) x Entropy(S_{v})
)
Where,
S is the total collection of the
records.
A is the attribute for which gain will
be calculated.
v is all the possible of the attribute
A, for instance in this case if we consider Windy attribute then the set of v is
{ Weak, Strong }.
S_{v} is the number of elements for each v
for instance Sweak
= 8 and Sstrong = 6.
∑ is the summation of ( ( S_{v}/S
) x Entropy(S_{v})
) for all the items from the set of v i.e. ( ( S_{weak}/S ) x Entropy(S_{weak}) ) +
( ( S_{strong}/S ) x Entropy(S_{strong}) )
So if we want to calculate information gain of Wind over the collection set S using following formula,
Gain(S, A) = Entropy(S)  ∑( ( S_{v}/S
) x Entropy(S_{v})
)
then the it will be as below,
=) Gain(S, Wind) = Entropy(S)  ( ( S_{weak}/S ) x
Entropy(S_{weak}) )  ( ( S_{strong}/S ) x Entropy(S_{strong}) )
from the above table for the Wind attribute there are two types of data ( Weak, Strong ) So new data set will be divided into following two subsets as below,
Outlook  Temperature  Humidity  Wind  Play ball 
Sunny  Hot  High  Weak  No 
Overcast  Hot  High  Weak  Yes 
Rain  Mild  High  Weak  Yes 
Rain  Cool  Normal  Weak  Yes 
Sunny  Mild  High  Weak  No 
Sunny  Cool  Normal  Weak  Yes 
Rain  Mild  Normal  Weak  Yes 
Overcast  Hot  Normal  Weak  Yes 
   Total  8 
Outlook  Temperature  Humidity  Wind  Play ball 
Sunny  Hot  High  Strong  No 
Rain  Cool  Normal  Strong  No 
Overcast  Cool  Normal  Strong  Yes 
Sunny  Mild  Normal  Strong  Yes 
Overcast  Mild  High  Strong  Yes 
Rain  Mild  High  Strong  No 
   Total  6 
Fig 2. Fig 1 data has been divided by types of Wind which is Weak and Strong
So, the set of Wind attribute is (Weak, Strong) and total number of elements set is (8,6).
=) Gain(S, Wind) = 0.940  ( (8/14 ) x Entropy(S_{weak}) ) 
( ( 6/14 ) x Entropy(S_{strong}) )
As mentioned earlier, How to calculate Entropy, now we will calculate Entropy of Weak, there two classifier No and Yes, for Weak the set of classifier is (No,Yes) with number of elements set is (2,6) and Strong has set of (No,Yes) is (3,3).
Entropy(S_{weak}) = ∑p(I)log_{2}p(I) = (  ( (2/8)xlog_{2}(2/8) ) )
+ (  ( (6/8)xlog_{2}(6/8) ) )
=) Entropy(S_{weak}) = calculated value Value Of S_{weak }using entropy calculation procedure mentioned earlier.
and
Entropy(S_{strong}) = ∑p(I)log_{2}p(I) = (  ( (3/6)xlog_{2}(3/6) )
) + (  ( (3/6)xlog_{2}(3/6) ) )<o:p />
=) Entropy(S_{strong}) = calculated value ValueOf S_{strong }using entropy calculation procedure mentioned earlier.
So
Gain(S, Wind) will be as below,
=) Gain(S, Wind) = 0.940  ( (8/14 ) x Entropy(S
_{weak}) )  ( ( 6/14 ) x Entropy(S
_{strong}) )
<o:p />
=) Gain(S,Wind) = 0.940  Value Of S_{weak}  Value Of S_{strong}<o:p />
=) Gain(S, Wind) = 0.048<o:p />
So the information gain of Wind over S is 0.048. using the same procedure it is possible to calculate information gain for Outlook, Temperature and Humidity. Based on the ID3 algorithm highest gained will be selected for the decision node, in here Outlook, as a result the data set showed in Fig 1 will be divided s below,
Outlook  Temperature  Humidity  Wind  Play ball 
Sunny  Hot  High  Weak  No 
Sunny  Hot  High  Strong  No 
Sunny  Mild  High  Weak  No 
Sunny  Cool  Normal  Weak  Yes 
Sunny  Mild  Normal  Strong  Yes 
   Total  5 
Outlook  Temperature  Humidity  Wind  Play ball 
Overcast  Hot  High  Weak  Yes 
Overcast  Cool  Normal  Strong  Yes 
Overcast  Mild  High  Strong  Yes 
Overcast  Hot  Normal  Weak  Yes 
   Total  4 
Outlook  Temperature  Humidity  Wind  Play ball 
Rain  Mild  High  Weak  Yes 
Rain  Cool  Normal  Weak  Yes 
Rain  Cool  Normal  Strong  No 
Rain  Mild  Normal  Weak  Yes 
Rain  Mild  High  Strong  No 
   Total  5 
Fig 3: Revised data set divided using Outlook elements Sunny, Overcast and Rain
Outlook has three different values Sunny, Overcast,Rain which will be used as decision nodes.
So, using above three new sets, information gain will be calculated for Temperature, Humidity and Wind and based on calculated information gain value further node will be selected to generate decision tree nodes. For example, if we want to calculate information gain of Humidity against Sunny then,
Gain(S, A) = Entropy(S)  ∑( ( S_{v}/S ) x Entropy(S_{v}) )
=) Gain(S, A) = ∑p(I)log_{2}p(I)  ∑( ( S_{v}/S ) x Entropy(S_{v}) ) =) Gain(S_{sunny}, Humidity) = ( 3/5log_{2}3/5  2/5log_{22}/5)  ( ( 3/5 ) x Entropy(S_{high}) )  ( ( 2/5 ) x Entropy(S_{low}) )
where,
3/5 refers to Total number of No/ Total number of No and Yes
2/5 refers to Total number of Yes/ Total number of No and Yes from the set of Sunny
=) Gain(S_{sunny}, Humidity) = (0.4421 +0.528 )  ( ( 3/5 ) x ∑p(I)log_{2}p(I) )  ( ( 2/5 ) x ∑p(I)log_{2}p(I) ) )
=) Gain(S_{sunny}, Humidity) = (0.9708 )  ( ( 3/5 ) x ( (3/3log_{2} 3/3)  ( 0/3log_{2} 0/3 ) ) )  ( ( 2/5 ) x ( (0/2)log_{2} 0/2  (2/2log_{2}2/2) ) ) )
where,
3/3 refers to the Total number of No/Total number of No and Yes for the set of high
0/3 refers to the Total number of Yes/Total number of No and Yes for the set of high
0/2 refers to the Total number of No/Total number of No and Yes for the set of low
2/2 refers to the Total number of Yes/Total number of No and Yes for the set of low
=) Gain(S_{sunny}, Humidity) = 0.9708  3/5 x (1 x 0 )  2/5 x ( 0  ( 1x 0))
=) Gain(S_{sunny}, Humidity) = 0.9708
So the information gain of Humidity against S_{sunny }set is 0.9708. Similar way, we can calculate information gain for Temperature, Wind against S_{sunny}.
References
During this writing following site being used as reference,
History
Version 1.0