65.9K
CodeProject is changing. Read more.
Home

Basic Word Prediction in C#

starIconstarIconstarIcon
emptyStarIcon
starIcon
emptyStarIcon

3.67/5 (3 votes)

Nov 24, 2016

CPOL

1 min read

viewsIcon

8420

In this little post, I will go through a small and very basic prediction engine written in C# for one of my projects.

In this little post, I will go through a small and very basic prediction engine written in C# for one of my projects.
Basically, what it does is the following:

  • It will collect data in the form of lists of strings.
  • Given an input, it will give back a list of predictions of the next item. In falling probability order.

So no rocket science, but a good and basic prediction engine that can be built upon depending on what scenarios you want to solve.

First, Some Unit Tests

[TestMethod]
public void PredictorTest_NoMatch()
{
    var target = new Predictor();
    var actual = target.Predict("The rabbit-hole went straight on like a tunnel for some way");
    Assert.AreEqual(0, actual.Count);
}
[TestMethod]
public void PredictorTest_SingleMatch()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_OtherNoise()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    target.Predict("Alice took up the fan and gloves");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_TwoResultsInOrder()
{
    var target = new Predictor();
    target.Predict("Alice thought she might as well wait");
    target.Predict("Alice thought this a very curious thing");
    target.Predict("Alice thought she had never seen such a curious");
    var actual = target.Predict("Alice thought");
    Assert.AreEqual(2, actual.Count);
    Assert.AreEqual("she", actual.First());
    Assert.AreEqual("this", actual.Last());
}

So, nothing too fancy. You provide it with input and as expected, it returns a list of possible next values. Examples are from Alice's Adventures in Wonderland by Lewis Carroll, the Project Gutenberg edition.

Predictor Class

public class Predictor
{
    private readonly Dictionary<string, Dictionary<string, int>> 
              _items = new Dictionary<string, Dictionary<string, int>>();
    private readonly char[] _tokenDelimeter = {' '};
    public List<string> Predict(string input)
    {
        var tokens = input.Split(_tokenDelimeter, StringSplitOptions.RemoveEmptyEntries);
        var previousBuilder = new StringBuilder();
        Dictionary<string, int> nextFullList;
        foreach (var token in tokens)
        {
            nextFullList = GetOrCreate(_items, previousBuilder.ToString());
            if (nextFullList.ContainsKey(token))
                nextFullList[token] += 1;
            else
                nextFullList.Add(token, 1);

            if (previousBuilder.Length > 0)
                previousBuilder.Append(" ");
            previousBuilder.Append(token);
        }
        nextFullList = GetOrCreate(_items, previousBuilder.ToString());
        var prediction = (from x in nextFullList
            orderby x.Value descending
            select x.Key).ToList();

        return prediction;
    }
    private static T GetOrCreate<T>(Dictionary<string, T> d, string key)
    {
        if (d.ContainsKey(key))
        {
            return d[key];
        }
        var t = Activator.CreateInstance<T>();
        d.Add(key, t);
        return t;
    }
}

It's not smart, and it requires having seen quite a lot of samples to work well.
An additional improvement could be to add storage of previous/next token combinations so that if the full search doesn't yield any result, a guess could be made based on only the previous word. But I guess that is up to you.

All code provided as-is. This is copied from my own code-base, and may need some additional programming to work.

Until next time: Work to Live, Don’t Live to Work