Posted 22 Jan 2012


# Naive Bayes Classifier

A C# implementation of the naive Bayes classifier algorithm as described on Wikipedia.

## Introduction

This is a simple probabilistic classifier based on Bayes' theorem, following the Wikipedia article. The project contains source files that can be included in any C# project.

The Bayesian classifier calculates the most probable output class for a given input. New training data can be added at runtime to refine the classifier.

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.
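The independence assumption can be written compactly. As a sketch, with $C$ denoting the class and $x_1, \dots, x_n$ the observed feature values (symbols introduced here for illustration only), the posterior is proportional to the prior times the product of the per-feature likelihoods:

$$P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} p(x_i \mid C)$$

In the apple example, each of "red", "round", and "about 4 inches" would contribute one factor $p(x_i \mid \text{apple})$.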

### Bayesian interpretation

In the Bayesian (or epistemological) interpretation, probability measures a degree of belief. Bayes' theorem then links the degree of belief in a proposition before and after accounting for evidence. For example, suppose somebody proposes that a biased coin is twice as likely to land heads as tails. The degree of belief in this might initially be 50%. The coin is then flipped a number of times to collect evidence. Belief may rise to 70% if the evidence supports the proposition.

For proposition A and evidence B,

• P(A), the prior, is the initial degree of belief in A.
• P(A | B), the posterior, is the degree of belief having accounted for B.
• P(B | A) / P(B) represents the support B provides for A.
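
The three quantities listed above are tied together by Bayes' theorem itself: the posterior equals the prior scaled by the support that B provides for A.

$$P(A \mid B) = \frac{P(B \mid A)}{P(B)} \, P(A)$$

When the support factor $P(B \mid A) / P(B)$ is greater than 1, the evidence raises the degree of belief in A; when it is less than 1, the evidence lowers it.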

### Sex classification

Problem: classify whether a given person is male or female based on measured features. The features are height, weight, and foot size.

#### Training

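An example training set is shown below.

| sex | height (feet) | weight (lbs) | foot size (inches) |
|---|---|---|---|
| male | 6 | 180 | 12 |
| male | 5.92 (5'11") | 190 | 11 |
| male | 5.58 (5'7") | 170 | 12 |
| male | 5.92 (5'11") | 165 | 10 |
| female | 5 | 100 | 6 |
| female | 5.5 (5'6") | 150 | 8 |
| female | 5.42 (5'5") | 130 | 7 |
| female | 5.75 (5'9") | 150 | 9 |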

The classifier created from the training set using a Gaussian distribution assumption would be:

| sex | mean (height) | variance (height) | mean (weight) | variance (weight) | mean (foot size) | variance (foot size) |
|---|---|---|---|---|---|---|
| male | 5.855 | 3.5033e-02 | 176.25 | 1.2292e+02 | 11.25 | 9.1667e-01 |
| female | 5.4175 | 9.7225e-02 | 132.5 | 5.5833e+02 | 7.5 | 1.6667e+00 |
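
The variances in this table are consistent with the unbiased sample estimator, which divides by n − 1 rather than n. The following is a minimal sketch of how such parameters could be computed; the class and method names (`GaussianStats`, `Mean`, `SampleVariance`) are hypothetical and are not the article's own `Helper` class, whose source is not shown here.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class GaussianStats
{
    // Arithmetic mean of the values.
    public static double Mean(double[] values)
    {
        return values.Average();
    }

    // Unbiased sample variance: sum of squared deviations divided by (n - 1).
    public static double SampleVariance(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("At least two values are required.");

        double mean = Mean(values);
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));
        return sumOfSquares / (values.Length - 1);
    }
}
```

For the male heights `{ 6, 5.92, 5.58, 5.92 }`, `Mean` gives 5.855 and `SampleVariance` gives approximately 3.5033e-02, in line with the first row of the table.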

Let's say we have equiprobable classes, so P(male) = P(female) = 0.5. This prior is an assumption rather than something derived from the problem, so it may not be appropriate in general. If we instead estimate P(C) from class frequencies in the training set, we happen to get the same value here, since 4 of the 8 samples fall in each class.

Below is a sample to be classified as a male or female.

| sex | height (feet) | weight (lbs) | foot size (inches) |
|---|---|---|---|
| sample | 6 | 130 | 8 |

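We wish to determine which posterior is greater, male or female. For the classification as male, the posterior is given by:

$$\text{posterior}(\text{male}) = \frac{P(\text{male})\, p(\text{height} \mid \text{male})\, p(\text{weight} \mid \text{male})\, p(\text{foot size} \mid \text{male})}{\text{evidence}}$$

For the classification as female, the posterior is given by:

$$\text{posterior}(\text{female}) = \frac{P(\text{female})\, p(\text{height} \mid \text{female})\, p(\text{weight} \mid \text{female})\, p(\text{foot size} \mid \text{female})}{\text{evidence}}$$

The evidence (also termed the normalizing constant) may be calculated since the sum of the posteriors equals one:

$$\text{evidence} = P(\text{male})\, p(\text{height} \mid \text{male})\, p(\text{weight} \mid \text{male})\, p(\text{foot size} \mid \text{male}) + P(\text{female})\, p(\text{height} \mid \text{female})\, p(\text{weight} \mid \text{female})\, p(\text{foot size} \mid \text{female})$$

The evidence may be ignored since it is a positive constant that scales both posteriors equally, so it does not change which one is greater. (Normal densities are always positive.) We now determine the sex of the sample.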

P(male) = 0.5

$$p(\text{height} \mid \text{male}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(6 - \mu)^2}{2\sigma^2}\right) \approx 1.5789$$

where μ = 5.855 and σ² = 3.5033e-02 are the parameters of the normal distribution, previously determined from the training set. Note that a value greater than 1 is acceptable here: it is a probability density rather than a probability, because height is a continuous variable.

p(weight | male) = 5.9881e-06

p(foot size | male) = 1.3112e-3

posterior numerator (male) = their product = 6.1984e-09

P(female) = 0.5

p(height | female) = 2.2346e-1

p(weight | female) = 1.6789e-2

p(foot size | female) = 2.8669e-1

posterior numerator (female) = their product = 5.3778e-04

Since the posterior numerator is greater in the female case, we predict the sample is female.
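
The arithmetic of this worked example can be reproduced with a few lines of C#. The sketch below is illustrative only: `PosteriorSketch`, `NormalPdf`, and `PosteriorNumerator` are hypothetical names, not part of the article's source, and the parameters are the rounded means and variances from the table above.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PosteriorSketch
{
    // Gaussian probability density at x for the given mean and variance.
    public static double NormalPdf(double x, double mean, double variance)
    {
        double diff = x - mean;
        return Math.Exp(-diff * diff / (2.0 * variance))
               / Math.Sqrt(2.0 * Math.PI * variance);
    }

    // Prior multiplied by the per-feature densities (the posterior numerator).
    public static double PosteriorNumerator(
        double prior, double[] sample, double[] means, double[] variances)
    {
        double score = prior;
        for (int i = 0; i < sample.Length; i++)
        {
            score *= NormalPdf(sample[i], means[i], variances[i]);
        }
        return score;
    }
}
```

Calling `PosteriorNumerator(0.5, new double[] { 6, 130, 8 }, ...)` with the male parameters `{ 5.855, 176.25, 11.25 }` and `{ 3.5033e-02, 1.2292e+02, 9.1667e-01 }` gives approximately 6.1984e-09; with the female parameters `{ 5.4175, 132.5, 7.5 }` and `{ 9.7225e-02, 5.5833e+02, 1.6667e+00 }` it gives approximately 5.3778e-04, up to rounding of the inputs.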

## Using the code

    // Requires: using System; using System.Data;
    DataTable table = new DataTable();
    table.Columns.Add("Sex");
    table.Columns.Add("Height", typeof(double));
    table.Columns.Add("Weight", typeof(double));
    table.Columns.Add("FootSize", typeof(double));

    // Training data: class label first, then the numeric features.
    table.Rows.Add("male", 6, 180, 12);
    table.Rows.Add("male", 5.92, 190, 11);
    table.Rows.Add("male", 5.58, 170, 12);
    table.Rows.Add("male", 5.92, 165, 10);
    table.Rows.Add("female", 5, 100, 6);
    table.Rows.Add("female", 5.5, 150, 8);
    table.Rows.Add("female", 5.42, 130, 7);
    table.Rows.Add("female", 5.75, 150, 9);
    table.Rows.Add("transgender", 4, 200, 5);
    table.Rows.Add("transgender", 4.10, 150, 8);
    table.Rows.Add("transgender", 5.42, 190, 7);
    table.Rows.Add("transgender", 5.50, 150, 9);

    Classifier classifier = new Classifier();
    classifier.TrainClassifier(table);

    // Output would be "transgender".
    Console.WriteLine(classifier.Classify(new double[] { 4, 150, 12 }));
    Console.Read();

TrainClassifier stores the training table and builds a second table, "Gaussian", holding the mean and variance of each feature for every class:

    // Requires: using System.Data; using System.Linq;
    // (AsEnumerable and Field<T> come from System.Data.DataSetExtensions.)
    public void TrainClassifier(DataTable table)
    {
        // dataSet is a DataSet field of the Classifier class (not shown here).
        dataSet.Tables.Add(table);

        // One row per class: the label, then mean and variance of each feature.
        DataTable GaussianDistribution = dataSet.Tables.Add("Gaussian");
        GaussianDistribution.Columns.Add(table.Columns[0].ColumnName);

        // One Mean and one Variance column per feature column.
        for (int i = 1; i < table.Columns.Count; i++)
        {
            GaussianDistribution.Columns.Add(table.Columns[i].ColumnName + "Mean");
            GaussianDistribution.Columns.Add(table.Columns[i].ColumnName + "Variance");
        }

        // Distinct class labels (first column) with their row counts.
        var results = (from myRow in table.AsEnumerable()
                       group myRow by myRow.Field<string>(table.Columns[0].ColumnName) into g
                       select new { Name = g.Key, Count = g.Count() }).ToList();

        for (int j = 0; j < results.Count; j++)
        {
            DataRow row = GaussianDistribution.Rows.Add();
            row[0] = results[j].Name;

            // Filter expression selecting the rows of the current class.
            string filter = string.Format("{0} = '{1}'",
                table.Columns[0].ColumnName, results[j].Name);

            int a = 1;
            for (int i = 1; i < table.Columns.Count; i++)
            {
                row[a] = Helper.Mean(SelectRows(table, i, filter));
                row[++a] = Helper.Variance(SelectRows(table, i, filter));
                a++;
            }
        }
    }

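Classify multiplies the per-feature Gaussian densities for each class and returns the class with the highest score:

    // Requires: using System; using System.Collections.Generic; using System.Data; using System.Linq;
    public string Classify(double[] obj)
    {
        Dictionary<string, double> score = new Dictionary<string, double>();

        // Class labels, in the same order as the rows of the "Gaussian" table.
        var results = (from myRow in dataSet.Tables[0].AsEnumerable()
                       group myRow by myRow.Field<string>(
                           dataSet.Tables[0].Columns[0].ColumnName) into g
                       select new { Name = g.Key, Count = g.Count() }).ToList();

        for (int i = 0; i < results.Count; i++)
        {
            List<double> subScoreList = new List<double>();
            int a = 1, b = 1;
            for (int k = 1; k < dataSet.Tables["Gaussian"].Columns.Count; k = k + 2)
            {
                double mean = Convert.ToDouble(dataSet.Tables["Gaussian"].Rows[i][a]);
                double variance = Convert.ToDouble(dataSet.Tables["Gaussian"].Rows[i][++a]);

                // Density of the b-th feature value under this class's Gaussian.
                double result = Helper.NormalDist(obj[b - 1], mean, Helper.SquareRoot(variance));
                subScoreList.Add(result);
                a++; b++;
            }

            // Product of the per-feature densities. Starting from 1 ensures that
            // a density of exactly 0 is not overwritten by a later factor.
            double finalScore = 1;
            for (int z = 0; z < subScoreList.Count; z++)
            {
                finalScore = finalScore * subScoreList[z];
            }

            // 0.5 is the prior from the two-class example. As a constant shared
            // by all classes, it does not change which score is largest.
            score.Add(results[i].Name, finalScore * 0.5);
        }

        double maxOne = score.Max(c => c.Value);
        var name = (from c in score
                    where c.Value == maxOne
                    select c.Key).First();

        return name;
    }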

The Classifier class is easy to use and exposes two methods, TrainClassifier and Classify. To train the classifier, build a DataTable whose first column holds the class label and whose remaining columns hold the numeric features, then pass it to TrainClassifier. The example above uses height, weight, and foot size to classify sex.
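
One possible refinement, not used in the article's code, is to score classes in log space. Multiplying several small densities can underflow to zero when there are many features or when a sample lies far from every class mean; summing logarithms avoids that while preserving the ranking of the classes. The sketch below assumes every variance is strictly positive, and all names in it (`LogScoring`, `LogNormalPdf`, `LogScore`, `ArgMax`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class LogScoring
{
    // Natural log of the Gaussian density at x; assumes variance > 0.
    public static double LogNormalPdf(double x, double mean, double variance)
    {
        double diff = x - mean;
        return -0.5 * Math.Log(2.0 * Math.PI * variance)
               - diff * diff / (2.0 * variance);
    }

    // Log prior plus the sum of the per-feature log densities.
    public static double LogScore(
        double prior, double[] sample, double[] means, double[] variances)
    {
        double score = Math.Log(prior);
        for (int i = 0; i < sample.Length; i++)
        {
            score += LogNormalPdf(sample[i], means[i], variances[i]);
        }
        return score;
    }

    // Label with the largest log score.
    public static string ArgMax(IDictionary<string, double> logScores)
    {
        return logScores.OrderByDescending(kv => kv.Value).First().Key;
    }
}
```

Because the logarithm is monotonic, the class with the largest log score is the same class that the product-based score would select, provided the product has not underflowed.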

Please let me know if better code is possible.

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

