12,454,477 members (56,811 online)
Technical Blog
Add your own
alternative version

174.6K views
8.4K downloads
103 bookmarked
Posted

# Hidden Markov Models in C#

, 5 Dec 2010 CPOL
 Rate this:
Please Sign up or sign in to vote.
Hidden Markov Models (HMM) are stochastic methods to model temporal and sequence data. They are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following and bioinformatics.

Hidden Markov Models (HMM) are stochastic methods to model temporal and sequence data. They are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.

This code has also been incorporated in Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.

## Contents

1. Introduction
2. Definition
3. Algorithms
4. Using the code
5. Remarks
6. Acknowledgements
7. See also
8. References

## Introduction

Hidden Markov Models were first described in a series of statistical papers by Leonard E. Baum and other authors in the second half of the 1960s. One of the first applications of HMMs was speech recognition, starting in the mid-1970s. Indeed, one of the most comprehensive explanations on the topic was published in “A Tutorial On Hidden Markov Models And Selected Applications in Speech Recognition”, by Lawrence R. Rabiner in 1989. In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences, in particular DNA. Since then, they have become ubiquitous in the field of bioinformatics.

Dynamical systems of discrete nature assumed to be governed by a Markov chain emits a sequence of observable outputs. Under the Markov assumption, it is also assumed that the latest output depends only on the current state of the system. Such states are often not known from the observer when only the output values are observable.

Hidden Markov Models attempt to model such systems and allow, among other things, (1) to infer the most likely sequence of states that produced a given output sequence, to (2) infer which will be the most likely next state (and thus predicting the next output) and (3) calculate the probability that a given sequence of outputs originated from the system (allowing the use of hidden Markov models for sequence classification).

The “hidden” in Hidden Markov Models comes from the fact that the observer does not know in which state the system may be in, but has only a probabilistic insight on where it should be.

## Definition

Hidden Markov Models can be seem as finite state machines where for each sequence unit observation there is a state transition and, for each state, there is a output symbol emission.

### Notation

Traditionally, HMMs have been defined by the following quintuple:

$\lambda = (N, M, A, B, \pi)$

where

• N is the number of states for the model
• M is the number of distinct observations symbols per state, i.e. the discrete alphabet size.
• A is the NxN state transition probability distribution given in the form of a matrix A = {aij}
• B is the NxM observation symbol probability distribution given in the form of a matrix B = {bj(k)}
• π is the initial state distribution vector π = {πi}

Note that, if we opt out the structure parameters M and N we have the more often used compact notation

$\lambda = (A, B, \pi)$

### Canonical problems

There are three canonical problems associated with hidden Markov models, which I'll quote from Wikipedia:

1. Given the parameters of the model, compute the probability of a particular output sequence. This requires summation over all possible state sequences, but can be done efficiently using the Forward algorithm, which is a form of dynamic programming.
2. Given the parameters of the model and a particular output sequence, find the state sequence that is most likely to have generated that output sequence. This requires finding a maximum over all possible state sequences, but can similarly be solved efficiently by the Viterbi algorithm.
3. Given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities. In other words, derive the maximum likelihood estimate of the parameters of the HMM given a dataset of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the Baum-Welch algorithm or the Baldi-Chauvin algorithm. The Baum-Welch algorithm is an example of a forward-backward algorithm, and is a special case of the Expectation-maximization algorithm.

The solution for those problems are exactly what makes Hidden Markov Models useful. The ability to learn from the data (using the solution of problem 3) and then become able to make predictions (solution to problem 2) and able to classify sequences (solution of problem 2) is nothing but applied machine learning. From this perspective, HMMs can just be seem as supervisioned sequence classifiers and sequence predictors with some other useful interesting properties.

### Choosing the structure

Choosing the structure for a hidden Markov model is not always obvious. The number of states depend on the application and to what interpretation one is willing to give to the hidden states. Some domain knowledge is required to build a suitable model and also to choose the initial parameters that an HMM can take. There is also some trial and error involved, and there are sometimes complex tradeoffs that have to be made between model complexity and difficulty of learning, just as is the case with most machine learning techniques.

Additional information can be found on http://www.cse.unsw.edu.au/~waleed/phd/tr9806/node12.html.

## Algorithms

The solution to the three canonical problems are the algorithms that makes HMMs useful. Each of the three problems are described in the three subsections below.

### Evaluation

The first canonical problem is the evaluation of the probability of a particular output sequence. It can be efficiently computed using either the Viterbi-forward or the Forward algorithms, both of which are forms of dynamic programming.

The Viterbi algorithm originally computes the most likely sequence of states which has originated a sequence of observations. In doing so, it is also able to return the probability of traversing this particular sequence of states. So to obtain Viterbi probabilities, please refer to the Decoding problem referred below.

The Forward algorithm, unlike the Viterbi algorithm, does not find a particular sequence of states; instead it computes the probability that any sequence of states has produced the sequence of observations. In both algorithms, a matrix is used to store computations about the possible state sequence paths that the model can assume. The forward algorithm also plays a key role in the Learning problem, and is thus implemented as a separate method.

/// <span class="code-SummaryComment"><summary>
</span>

### Decoding

The second canonical problem is the discovery of the most likely sequence of states that generated a given output sequence. This can be computed efficiently using the Viterbi algorithm. A trackback is used to detect the maximum probability path travelled by the algorithm. The probability of travelling such sequence is also computed in the process.

/// <span class="code-SummaryComment"><summary>
</span>

### Learning

The third and last problem is the problem of learning the most likely parameters that best models a system given a set of sequences originated from this system. Most implementations I've seem did not consider the problem of learning from a set of sequences, but only from a single sequence at a time. The algorithm below, however, is fully suitable to learn from a set of sequences and also uses scaling, which is another thing I have not seem in other implementations.

The source code follows the original algorithm by Rabiner (1989). There are, however, some known issues with the algorithms detailed in Rabiner's paper. More information about those issues is available in a next section of this article entitled “Remarks”.

/// <span class="code-SummaryComment"><summary>
</span>

## Using the code

Lets suppose we have gathered some sequences from a system we wish to model. The sequences are expressed as a integer array such as:

int[][] sequences = new int[][]
{
new int[] { 0,1,1,1,1,1,1 },
new int[] { 0,1,1,1 },
new int[] { 0,1,1,1,1 },
new int[] { 0,1, },
new int[] { 0,1,1 },
};


For us, it can be obvious to see that the system is outputting sequences that always start with a zero and have one or more ones at the end. But lets try to fit a Hidden Markov Model to predict those sequences.
// Creates a new Hidden Markov Model with 2 states for
//  an output alphabet of two characters (zero and one)
HiddenMarkovModel hmm = new HiddenMarkovModel(2, 2);

// Try to fit the model to the data until the difference in
//  the average likelihood changes only by as little as 0.01
hmm.Learn(sequences, 0.01);


Once the model is trained, lets test to see if it recognizes some sequences:
// Calculate the probability that the given
//  sequences originated from the model
double l1 = hmm.Evaluate(new int[] { 0, 1 });       // l1 = 0.9999
double l2 = hmm.Evaluate(new int[] { 0, 1, 1, 1 }); // l2 = 0.9999

double l3 = hmm.Evaluate(new int[] { 1, 1 });       // l3 = 0.0000
double l4 = hmm.Evaluate(new int[] { 1, 0, 0, 0 }); // l4 = 0.0000


Of course the model performs well as this a rather simple example. A more useful test case would consist of allowing for some errors in the input sequences in the hope that the model will become more tolerant to measurement errors.
int[][] sequences = new int[][]
{
new int[] { 0,1,1,1,1,0,1,1,1,1 },
new int[] { 0,1,1,1,0,1,1,1,1,1 },
new int[] { 0,1,1,1,1,1,1,1,1,1 },
new int[] { 0,1,1,1,1,1         },
new int[] { 0,1,1,1,1,1,1       },
new int[] { 0,1,1,1,1,1,1,1,1,1 },
new int[] { 0,1,1,1,1,1,1,1,1,1 },
};

// Creates a new Hidden Markov Model with 3 states for
//  an output alphabet of two characters (zero and one)
HiddenMarkovModel hmm = new HiddenMarkovModel(2, 3);

// Try to fit the model to the data until the difference in
//  the average likelihood changes only by as little as 0.0001
hmm.Learn(sequences, 0.0001);

// Calculate the probability that the given
//  sequences originated from the model
double l1 = hmm.Evaluate(new int[] { 0,1 });      // 0.9999
double l2 = hmm.Evaluate(new int[] { 0,1,1,1 });  // 0.9166

double l3 = hmm.Evaluate(new int[] { 1,1 });      // 0.0000
double l4 = hmm.Evaluate(new int[] { 1,0,0,0 });  // 0.0000

double l5 = hmm.Evaluate(new int[] { 0,1,0,1,1,1,1,1,1 }); // 0.0342
double l6 = hmm.Evaluate(new int[] { 0,1,1,1,1,1,1,0,1 }); // 0.0342


We can see that, despite having a very low probability, the likelihood values for the sequences containing a simulated measurement error are greater than the likelihoods for the sequences which do not follow the sequence structure at all.

In a subsequent article, we will see that those low values for the likelihoods will not be a problem because HMMs are often used in sets to form sequence classifiers. When used in such configurations, what really matters is which HMM returns the highest probability among others in the set.

## Remarks

A practical issue in the use of Hidden Markov Models to model long sequences is the numerical scaling of conditional probabilities. The probability of observing a long sequence given most models is extremely small, and the use of these extremely small numbers in computations often leads to numerical instability, making application of HMMs to genome length sequences quite challenging.

There are two common approaches to dealing with small conditional probabilities. One approach is to rescale the conditional probabilities using carefully designed scaling factors, and the other approach is to work with the logarithms of the conditional probabilities. For more information on using logarithms please see the work entitled “Numerically Stable Hidden Markov Model Implementation”, by Tobias P. Mann.

### Known issues

The code on this article is based on the Tutorial by Rabiner. There are, however, some problems with the scaling and other algorithms. An errata depicting all issues is available in the website “An Erratum for ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition’” and is maintained by Ali Rahimi. I have not yet verified if the implementation presented here also suffers from the same mistakes explained there. This code has worked well under many situations, but I cannot guarantee its perfectness. Please use at your own risk.

## Acknowledgements

Thanks to Guilherme C. Pedroso, for the help with the Baum-Welch generalization for multiple input sequences. He has also co-written a very interesting article using hidden Markov models for gesture recognition, entitled “Automatic Recognition of Finger Spelling for LIBRAS based on a Two-Layer Architecture” published in the 25th Symposium On Applied Computing (ACM SAC 2010).

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

## About the Author

 Engineer Xerox Research Center Europe Brazil
Computer and technology enthusiast, interested in artificial intelligence and image processing. Has a Master's degree on Computer Science specialized on Image and Signal Processing, with expertise on Machine Learning, Computer Vision, Pattern Recognition and Data Mining systems. Author of the Accord.NET Framework for developing scientific computing applications.

If you would like to hire good developers to build your dream application, please check out DaitanGroup, one of the top outsourcing companies in Brazil. This company, located in Brazil's Sillicon Valley but with US-based offices, has huge experience developing telecommunications software for large and small companies worldwide.

## You may also be interested in...

 Pro Pro

## Comments and Discussions

 First PrevNext
 How can i use this framework to clustering with EM algorithm? PokemonCraft5-Sep-15 23:42 PokemonCraft 5-Sep-15 23:42
 Re: How can i use this framework to clustering with EM algorithm? César de Souza6-Sep-15 1:43 César de Souza 6-Sep-15 1:43
 states Michael Borg29-Jan-15 2:03 Michael Borg 29-Jan-15 2:03
 Evaluation function kentmc4-Dec-14 2:33 kentmc 4-Dec-14 2:33
 Re: Evaluation function César de Souza4-Dec-14 3:08 César de Souza 4-Dec-14 3:08
 GitHub? KommuSoft15-Sep-14 2:19 KommuSoft 15-Sep-14 2:19
 Re: GitHub? César de Souza15-Sep-14 3:20 César de Souza 15-Sep-14 3:20
 Re: GitHub? KommuSoft15-Sep-14 13:19 KommuSoft 15-Sep-14 13:19
 Emission matrix Perora8-Sep-14 23:03 Perora 8-Sep-14 23:03
 Re: Emission matrix César de Souza8-Sep-14 23:53 César de Souza 8-Sep-14 23:53
 Last Visit: 31-Dec-99 18:00     Last Update: 29-Aug-16 20:45 Refresh 12345678 Next »

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.160826.1 | Last Updated 5 Dec 2010
Article Copyright 2010 by César de Souza
Everything else Copyright © CodeProject, 1999-2016
Layout: fixed | fluid