Click here to Skip to main content
15,891,529 members
Articles / Programming Languages / C#
Article

ID3 Decision Tree Algorithm in C#

Rate me:
Please Sign up or sign in to vote.
3.05/5 (44 votes)
21 Oct 20031 min read 308.1K   20.2K   58   56
Sample of ID3 Decision Tree Algorithm in C#

Image 1

Introduction

The algorithm ID3 (Quinlan) uses the method top-down induction of decision trees. Given a set of classified examples a decision tree is induced, biased by the information gain measure, which heuristically leads to small trees. The examples are given in attribute-value representation. The set of possible classes is finite. Only tests, that split the set of instances of the underlying example languages depending on the value of a single attribute are supported.

Details

Depending on whether the attributes are nominal or numerical, the tests either

  • have a successor for each possible attribute value, or
  • split according to a comparison of an attribute value to a constant, or depending on if an attribute value belongs to a certain interval or not.

The algorithm starts with the complete set of examples, a set of possible tests and the root node as the actual node. As long as the examples propagated to a node do not all belong to the same class and there are tests left,

  • a test with highest information gain is chosen,
  • the corresponding set of successors is created for the actual node,
  • each example is propagated to the successor given by the chosen test,
  • ID3 is called recursively for all successors.

How it works

The core of sample is builded with 3 classes (Attribute, TreeNode and DecisionTreeID3).

  • TreeNode - are the nodes of the decision tree;
  • Attribute - is the class with have a name e any possible values;
  • DecisionTreeID3 - is the class what get a data samples and generate a decision tree.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Web Developer iddi
Brazil Brazil
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Generalerror Pin
lqht25-Apr-10 8:05
lqht25-Apr-10 8:05 
GeneralError Pin
strepto125-Apr-10 6:21
strepto125-Apr-10 6:21 
Generalhi Pin
rodalyn15-Mar-10 18:15
rodalyn15-Mar-10 18:15 
Generalhelp:id3 decision tree Pin
std_112-Mar-10 1:34
std_112-Mar-10 1:34 
GeneralRe: help:id3 decision tree Pin
Sama Ramba15-Oct-10 22:36
Sama Ramba15-Oct-10 22:36 
GeneralRe: help:id3 decision tree Pin
leavetrace24-Oct-10 23:08
leavetrace24-Oct-10 23:08 
GeneralRe: help:id3 decision tree Pin
leavetrace24-Oct-10 23:13
leavetrace24-Oct-10 23:13 
GeneralRe: help:id3 decision tree Pin
leavetrace24-Oct-10 23:16
leavetrace24-Oct-10 23:16 
class ID3

package guiID3;<br />
import java.text.DecimalFormat;<br />
import java.util.ArrayList;<br />
import java.util.Date;<br />
import java.util.LinkedList;<br />
import java.util.List;<br />
import java.util.ListIterator;<br />
import java.util.Random;<br />
import javax.swing.JOptionPane;<br />
<br />
<br />
public class ID3 implements Runnable  {<br />
        private Attribute [] attributesArray;<br />
	private DataInstance [] dataInstancesArray;<br />
	private GUIID3JFrame caller;<br />
	private boolean interrupted;<br />
        private int numClasses;<br />
        private Thread cvThread;<br />
        private int[] count;<br />
        private int[][][] freqTij;<br />
        private double [][] infoTij;<br />
	<br />
        <br />
  public ID3(GUIID3JFrame caller){<br />
		this.caller = caller;<br />
		interrupted = false;<br />
	}<br />
<br />
  public void start() {<br />
	  	cvThread = new Thread(this);<br />
	  	try {<br />
			<br />
		  	cvThread.start();<br />
		  	caller.getJProgressBar1().setIndeterminate(true);<br />
		} catch (Exception e) {<br />
			e.printStackTrace();<br />
		}<br />
	}<br />
	<br />
	public void stop(){<br />
		cvThread = null;<br />
		interrupted = true;<br />
	}<br />
	<br />
	<br />
  <br />
    public void run() {<br />
       String targetAttribute="";<br />
     //   Thread currentThread = Thread.currentThread();<br />
    //    caller.getJTextArea1().setLineWrap(true);<br />
        int  arraysSize = attributesArray.length - 1;<br />
        int  numClasses = attributesArray[arraysSize].getTypes().length; <br />
        ArrayList usedAttributes=new ArrayList();<br />
        TreeNode root = mountTree(dataInstancesArray, "result", attributesArray,usedAttributes);<br />
        printNode(root, " ", caller);<br />
        stop();<br />
        <br />
    }<br />
    public static void printNode(TreeNode root,String tabs, GUIID3JFrame caller)<br />
		{<br />
                    <br />
			caller.getJTextArea1().append(tabs + "|" + root.getMAttribute().getAttributeName() + "|"+"\n");<br />
			<br />
			if (root.getMAttribute().getIntTypesArray() != null)<br />
			{<br />
				for (int i = 0; i < root.getMAttribute().getIntTypesArray().length; i++)<br />
				{<br />
					caller.getJTextArea1().append(tabs + "\t" + "<" + root.getMAttribute().getTypes()[i] + ">"+"\n");<br />
                                        <br />
					TreeNode childNode = root.getChildByBranchName(root.getMAttribute().getTypes()[i]);<br />
					printNode(childNode,"\t" + tabs,caller);<br />
				}<br />
			}<br />
		}<br />
    <br />
     private boolean checkSameClass(DataInstance[] dataInstancesArray, int targetClass,Attribute [] attributesArray)<br />
		{<br />
			boolean isEqual = true;<br />
			for(int n=0; n < dataInstancesArray.length && isEqual; n++){<br />
			if( dataInstancesArray[n].getClassValue()== targetClass)<br />
				isEqual = true;<br />
			else<br />
			         isEqual = false;<br />
                        }<br />
			return isEqual;<br />
		}<br />
<br />
		<br />
		private double calcEntropy(int [] count,int total)<br />
		{<br />
                        double result=0.0;<br />
			double [] ratio=new double[count.length];<br />
                        for(int i=0;i<=count.length-1;i++){<br />
			 ratio[i]=(double)count[i]/total;<br />
                         if (ratio[i]!=0.0){<br />
                            ratio[i]=-(double)(ratio[i])* log2(ratio[i]);<br />
                            result=result+ratio[i];<br />
                        }<br />
                        }<br />
			return result;<br />
		<br />
                }<br />
<br />
		private int[] getValuesToAttribute(DataInstance [] dataInstancesArray, Attribute att, int value, Attribute[] attributes)<br />
		{<br />
			int[] count = new int[attributes[attributes.length-1].getIntTypesArray().length];<br />
<br />
			for(DataInstance d:dataInstancesArray)<br />
			{<br />
				if(d.getValues()[att.getAttributeAsInt()]== value){ <br />
				  for(int i=0;i<=count.length-1;i++){<br />
                                    if(d.getClassValue()==i){<br />
                                        count[i]++;<br />
                                        }<br />
			           }<br />
                                }   <br />
                        }<br />
                        return count;<br />
		}<br />
<br />
		<br />
		private double gain(DataInstance [] dataInstancesArray, Attribute att, Attribute[] attributes,double mEntropySet)<br />
		{   <br />
			int [] values = att.getIntTypesArray();<br />
                     <br />
			double sum = 0.0;<br />
<br />
			for (int i = 0; i < values.length; i++)<br />
			{<br />
				<br />
				int [] count=getValuesToAttribute( dataInstancesArray, att, values[i],attributes);<br />
                                int s=0;<br />
				for(int j=0;j<=count.length-1;j++)<br />
                                    s=s+count[j];<br />
				double entropy = calcEntropy(count,s);				<br />
				sum += -(double)(s)/dataInstancesArray.length * entropy;<br />
			}<br />
			return mEntropySet + sum;<br />
		}<br />
<br />
		<br />
		private Attribute getBestAttribute( DataInstance [] dataInstancesArray, Attribute[] attributes,double mEntropySet)<br />
		{<br />
			double maxGain = 0.0;<br />
			Attribute result = null;<br />
<br />
			for( int i=0;i<attributes.length-1;i++)<br />
			{<br />
                        <br />
				double aux = gain(dataInstancesArray, attributes[i] ,attributes, mEntropySet);<br />
				if (aux > maxGain)<br />
				{<br />
					maxGain = aux;<br />
					result = attributes[i];<br />
				}<br />
			}<br />
                      <br />
			return result;<br />
		}<br />
<br />
                <br />
		<br />
<br />
    <br />
<br />
        	private int [] countFrqClass(DataInstance [] dataInstancesArray,Attribute[] attributes){<br />
                        int  arraysSize = attributes.length - 1;<br />
		        numClasses = attributes[arraysSize].getTypes().length;<br />
		<br />
			int[] count = new int[numClasses];<br />
<br />
			for(DataInstance sample :dataInstancesArray)<br />
			{<br />
				int index = sample.getClassValue();<br />
				count[index]++;<br />
			}<br />
                        return count;<br />
                }<br />
<br />
       <br />
		<br />
		private String getMostCommonValue(DataInstance [] dataInstancesArray,Attribute[] attributes)<br />
		{<br />
                     <br />
                        int []count=countFrqClass(dataInstancesArray,attributes);<br />
			int MaxIndex = 0;<br />
			int MaxCount = 0;<br />
<br />
			for(int i = 0; i < count.length; i++)<br />
			{<br />
				if (count[i] > MaxCount)<br />
				{<br />
					MaxCount = count[i];<br />
					MaxIndex = i;<br />
				}<br />
			}<br />
<br />
			return attributes[attributes.length - 1].getTypes()[MaxIndex]  ;<br />
		}<br />
<br />
		private TreeNode internalMountTree(DataInstance [] dataInstancesArray, Attribute[] attributes,String targetAttribute, ArrayList usedAttributes)<br />
		{<br />
                  for(int i=0;i<attributes[attributes.length-1].getIntTypesArray().length;i++){<br />
                      if (checkSameClass(dataInstancesArray,i ,attributes) == true){<br />
                              return new TreeNode(new Attribute(attributes[attributes.length-1].getTypes()[i]));<br />
                      }<br />
                  }<br />
//                    if (checkSameClass(dataInstancesArray,0 ,attributes) == true){<br />
//                              return new TreeNode(new Attribute("yes"));<br />
//                        }<br />
//			if (checkSameClass(dataInstancesArray, 1,attributes) == true){<br />
//                            	return new TreeNode(new Attribute("no"));<br />
//			  }<br />
                        <br />
//                       if (checkSameClass(dataInstancesArray, 2,attributes) == true){<br />
//                             System.out.println("class"+ "hard");<br />
//				return new TreeNode(new Attribute("hard"));<br />
//                       }<br />
			if (attributes.length == 0)<br />
				return new TreeNode(new Attribute(getMostCommonValue(dataInstancesArray, attributes)));		<br />
		        int mTotal = dataInstancesArray.length;<br />
                        int[] count = countFrqClass(dataInstancesArray, attributes);<br />
	                double mEntropySet = calcEntropy(count,mTotal);<br />
			Attribute bestAttribute = getBestAttribute( dataInstancesArray,attributes,mEntropySet); <br />
                        usedAttributes.add(bestAttribute.getAttributeName());<br />
                      <br />
			TreeNode root = new TreeNode(bestAttribute);<br />
                        ArrayList listOfInstances=new ArrayList();<br />
                        <br />
                        for(int value:bestAttribute.getIntTypesArray()){<br />
                          <br />
                             listOfInstances.clear();<br />
                              for(int i=0;i<dataInstancesArray.length;i++){<br />
                                if(dataInstancesArray[i].getValues()[bestAttribute.getAttributeAsInt()]==value){<br />
                                     listOfInstances.add((DataInstance)dataInstancesArray[i]);         <br />
                                 }<br />
                               }<br />
<br />
			     DataInstance [] trainingSetClone = new DataInstance[listOfInstances.size()];<br />
                         <br />
                             int  y=0;<br />
                             for(ListIterator i2=listOfInstances.listIterator(); i2.hasNext();){<br />
                              trainingSetClone[y]=(DataInstance)i2.next();<br />
                              y++;<br />
                               }<br />
                          <br />
				ArrayList aAttributes = new ArrayList(attributes.length - 1);<br />
				for(int i = 0; i < attributes.length-1; i++)<br />
				{<br />
					if (usedAttributes.contains(attributes[i].getAttributeName())!=true)<br />
						aAttributes.add(attributes[i]);<br />
			<br />
                                }<br />
			      if (trainingSetClone.length== 0)<br />
				{<br />
					return new TreeNode(new Attribute(getMostCommonValue(trainingSetClone,attributes )));<br />
				}<br />
				else<br />
				{	<br />
                                        Attribute[ ] attributeSet=(Attribute[ ])aAttributes.toArray(new Attribute[aAttributes.size()]);<br />
					TreeNode ChildNode = mountTree(trainingSetClone,targetAttribute,attributeSet,usedAttributes);<br />
					root.AddTreeNode(ChildNode, value);<br />
				}<br />
			}<br />
			return root;<br />
		}<br />
<br />
<br />
             <br />
       public void setAttributesArray(Attribute[] attributesArray) {<br />
        this.attributesArray = attributesArray;<br />
    }<br />
<br />
    public void setDataInstancesArray(DataInstance[] dataInstancesArray) {<br />
        this.dataInstancesArray = dataInstancesArray;<br />
    }<br />
	<br />
   public  static double log2(double d) {<br />
		return Math.log(d)/Math.log(2.0);<br />
	}<br />
   <br />
    public TreeNode mountTree(DataInstance[] dataInstancesArray, String targetAttribute,Attribute [] attributesArrayused, ArrayList usedAttributes)<br />
		{<br />
			this.dataInstancesArray = dataInstancesArray;<br />
			return internalMountTree(dataInstancesArray,attributesArray ,targetAttribute,usedAttributes );<br />
		}<br />
<br />
}<br />
<br />


-----class MyFileReader-----


package guiID3;<br />
import java.io.BufferedReader;<br />
import java.io.File;<br />
import java.io.FileReader;<br />
import java.io.IOException;<br />
import java.io.StreamTokenizer;<br />
import java.util.ArrayList;<br />
import java.util.List;<br />
import java.util.ListIterator;<br />
import java.io.*;<br />
<br />
import javax.swing.JOptionPane;<br />
<br />
    public class MyFileReader {<br />
    private String relation;<br />
    <br />
    private DataInstance [] dataInstancesArray;<br />
    private Attribute [] attributesArray;<br />
    <br />
    private boolean fileIsOk;<br />
    <br />
    public MyFileReader(File file){<br />
        String lastWord = "";<br />
        boolean gotNext, error=false;<br />
        <br />
        List attributesNameList = new ArrayList();<br />
        List attributesList = new ArrayList();<br />
        List attributeValuesList = new ArrayList();  <br />
        List dataInstancesList = new ArrayList();<br />
        <br />
        try{<br />
            <br />
            BufferedReader in = new BufferedReader(new FileReader(file));<br />
            StreamTokenizer st = new StreamTokenizer(in);<br />
            <br />
            //st.resetSyntax();<br />
            //st.quoteChar(39);<br />
            st.commentChar('%');<br />
            st.ordinaryChars(45,62);<br />
            st.wordChars(95,95);<br />
            st.wordChars(45,63);<br />
            st.wordChars(38,39);<br />
            st.whitespaceChars(',',',');<br />
            st.nextToken();<br />
            <br />
            while(st.ttype != StreamTokenizer.TT_EOF){<br />
                gotNext = false;<br />
                if(st.ttype == 64){ // (64 = @)<br />
                    st.nextToken();<br />
                    <br />
                    if(st.sval.trim().compareToIgnoreCase("RELATION") == 0){<br />
                        st.nextToken();<br />
                        relation = st.sval;<br />
                    } else if(st.sval.trim().compareToIgnoreCase("ATTRIBUTE") == 0){<br />
                        st.nextToken();<br />
                        <br />
                        attributesNameList.add(st.sval);<br />
                        attributeValuesList.clear();<br />
                        <br />
                        int linenum = st.lineno();<br />
                        st.nextToken();<br />
                        while(linenum == st.lineno()){<br />
                            if(st.ttype == StreamTokenizer.TT_WORD){<br />
                                attributeValuesList.add(st.sval);<br />
                                lastWord = st.sval;<br />
                            }<br />
                            st.nextToken();<br />
                            if(linenum != st.lineno() && lastWord.trim().compareToIgnoreCase("REAL") == 0){<br />
                                JOptionPane.showMessageDialog(null, "<html><font face=Dialog size=3>This version of GUI Ant-Miner can only handle nominal attributes.</font></html>", "Error", JOptionPane.ERROR_MESSAGE);<br />
                                fileIsOk = false;<br />
                                throw new IOException("File error");<br />
                            }<br />
                            gotNext = true;<br />
                        }<br />
                        <br />
//                        java.util.Iterator iter = attributeValuesList.iterator();<br />
//                        while ( iter.hasNext()){<br />
//                            System.out.println(iter.next());<br />
//                        }<br />
//                       <br />
                        <br />
                        <br />
                        attributesList.add(attributeValuesList.toArray(new String[attributeValuesList.size()]));<br />
                        <br />
//                        java.util.Iterator<String[]> i = attributesList.iterator();<br />
//                        <br />
//                        while ( i.hasNext()){<br />
//                            String[] arr = i.next();<br />
//                            for ( String s : arr)<br />
//                                System.out.print(s + "\t");<br />
//                            System.out.println ();<br />
//                        }<br />
                        <br />
                        <br />
                    } else if(st.sval.trim().compareToIgnoreCase("DATA") == 0){<br />
                        st.nextToken();<br />
                        while(st.ttype != StreamTokenizer.TT_EOF){<br />
                            List dataList = new ArrayList();<br />
                            int linenum = st.lineno();<br />
                            while(linenum == st.lineno()){<br />
                                if(st.ttype == StreamTokenizer.TT_WORD)<br />
                                    dataList.add(st.sval);<br />
                                st.nextToken();<br />
                                gotNext = true;<br />
                            }<br />
                            dataInstancesList.add(dataList);<br />
                        }<br />
                    }<br />
                }// end of if start with  @<br />
                if(!gotNext)// if gotNext!= true<br />
                    st.nextToken();<br />
            }<br />
            <br />
            <br />
        } catch (IOException e) {<br />
            error = true;<br />
            //caller.setInitialState();<br />
        }<br />
        <br />
        if(!error){<br />
            fileIsOk = true;<br />
            <br />
            int x, y;<br />
            x=0;<br />
            attributesArray = new Attribute[attributesList.size()];<br />
            <br />
            //System.out.println("\n" + attributesList.size());<br />
            for(ListIterator i=attributesList.listIterator(); i.hasNext();){<br />
                Attribute attribute = new Attribute((String[])i.next());<br />
                attribute.setAttributeName((String)attributesNameList.get(x));<br />
                 attribute.setAttributeAsInt(x);<br />
                attributesArray[x] = attribute;<br />
                x++;<br />
            }<br />
            <br />
            x=0;<br />
            <br />
            dataInstancesArray = new DataInstance[dataInstancesList.size()];<br />
            for(ListIterator i=dataInstancesList.listIterator(); i.hasNext();){<br />
                ArrayList tempList = (ArrayList)i.next();<br />
                int dataArray[] = new int[attributesList.size()];<br />
                y=0;<br />
                for(ListIterator i2=tempList.listIterator(); i2.hasNext();){<br />
                    dataArray[y] = attributesArray[y].indexOf((String)i2.next());<br />
                    y++;<br />
                }<br />
                DataInstance dataInstance = new DataInstance(dataArray);<br />
                dataInstancesArray[x++] = dataInstance;<br />
            }<br />
            <br />
//              for(ListIterator i=dataInstancesList.listIterator(); i.hasNext();){<br />
//                ArrayList tempList = (ArrayList)i.next();<br />
//                for (Object e:tempList)<br />
//                 System.out.print(e+" ");<br />
//                System.out.println();<br />
//              }<br />
//            for(int i=0;i<dataInstancesArray.length;i++){<br />
//                DataInstance dinstance = dataInstancesArray[i];<br />
//                 int[] arr = dinstance.getValues();<br />
//                for( int g:arr)<br />
//                System.out.print(g+" ");<br />
//                System.out.println();<br />
//            }<br />
            <br />
           <br />
        }<br />
        <br />
        <br />
    }<br />
    <br />
    public boolean fileIsOk(){<br />
        return fileIsOk;<br />
    }<br />
    public int getAttributesNo(){<br />
        return attributesArray.length;<br />
    }<br />
    public int getInstancesNo(){<br />
        return dataInstancesArray.length;<br />
    }<br />
    public Attribute[] getAttributesArray(){<br />
        return attributesArray;<br />
    }<br />
    public DataInstance[] getDataInstancesArray(){<br />
        return dataInstancesArray;<br />
    }<br />
    public String getRelation(){<br />
        return relation;<br />
    }<br />
    <br />
    /**<br />
     * @param attributePos<br />
     * @return<br />
     */<br />
    public int calculateMissing(int attributePos){<br />
        int count=0;<br />
        for(int x=0; x < dataInstancesArray.length; x++){<br />
            if(dataInstancesArray[x].getValues()[attributePos] == -1)<br />
                count++;<br />
        }<br />
        return count;<br />
    }<br />
    <br />
    /**<br />
     * Determines the number of occurences of the attribute values in the dataset<br />
     * @param attributePos<br />
     * @return<br />
     */<br />
    public int [] generateStats(int attributePos){<br />
        int count;<br />
        int [] typesArray = (attributesArray[attributePos]).getIntTypesArray();<br />
        int [] returnArray = new int[typesArray.length];<br />
        for(int type=0; type < typesArray.length; type++){<br />
            count = 0;<br />
            for(int x2=0; x2 < dataInstancesArray.length; x2++){<br />
                if(dataInstancesArray[x2].getValues()[attributePos] == type)<br />
                    count++;<br />
            }<br />
            returnArray[type] = count;<br />
        }<br />
        return returnArray;<br />
    }<br />
    <br />
    <br />
}<br />

GeneralRe: help:id3 decision tree Pin
leavetrace24-Oct-10 23:18
leavetrace24-Oct-10 23:18 
GeneralMy vote of 1 Pin
Code Deamon20-Apr-09 4:27
Code Deamon20-Apr-09 4:27 
Questioncan not work with more than two class Pin
leavetrace1-Jan-09 18:50
leavetrace1-Jan-09 18:50 
GeneralI think there's a mistake, correct me if i'm wrong Pin
rock_me1-Jul-08 6:56
rock_me1-Jul-08 6:56 
QuestionDo u interested in association rule? Pin
alice_nhan27-Nov-07 4:39
alice_nhan27-Nov-07 4:39 
Answerfind duplicate item in childsample [modified] Pin
alice_nhan27-Nov-07 4:34
alice_nhan27-Nov-07 4:34 
GeneralRe: find duplicate item in childsample Pin
leavetrace12-Jan-09 11:36
leavetrace12-Jan-09 11:36 
GeneralTranslated Version (English) of this Algorithm Pin
kirankonathala20-Apr-07 10:16
kirankonathala20-Apr-07 10:16 
GeneralRe: Translated Version (English) of this Algorithm Pin
kroket30-Dec-07 0:40
kroket30-Dec-07 0:40 
GeneralRe: Translated Version (English) of this Algorithm Pin
jingweidu25-Mar-08 0:58
jingweidu25-Mar-08 0:58 
GeneralRe: Translated Version (English) of this Algorithm Pin
Vaasugi16-Aug-08 21:16
Vaasugi16-Aug-08 21:16 
GeneralRe: Translated Version (English) of this Algorithm Pin
wesemy19-Aug-08 3:51
wesemy19-Aug-08 3:51 
GeneralRe: Translated Version (English) of this Algorithm Pin
mikaal10018-Mar-09 6:49
mikaal10018-Mar-09 6:49 
GeneralRe: Translated Version (English) of this Algorithm Pin
u06050943-Apr-09 16:47
u06050943-Apr-09 16:47 
GeneralRe: Translated Version (English) of this Algorithm Pin
Code Deamon20-Apr-09 4:16
Code Deamon20-Apr-09 4:16 
GeneralComments in Portugese Pin
kirankonathala19-Apr-07 12:59
kirankonathala19-Apr-07 12:59 
GeneralRe: Comments in Portugese Pin
Code Deamon20-Apr-09 4:17
Code Deamon20-Apr-09 4:17 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.