Click here to Skip to main content
15,886,100 members
Articles / Programming Languages / C#

Basic Word Prediction in C#

Rate me:
Please Sign up or sign in to vote.
3.67/5 (3 votes)
24 Nov 2016CPOL1 min read 8.3K   6  
In this little post, I will go through a small and very basic prediction engine written in C# for one of my projects.

Image 1

In this little post, I will go through a small and very basic prediction engine written in C# for one of my projects.
Basically, what it does is the following:

  • It will collect data in the form of lists of strings.
  • Given an input, it will give back a list of predictions of the next item. In falling probability order.

So no rocket science, but a good and basic prediction engine that can be built upon depending on what scenarios you want to solve.

First, Some Unit Tests

C#
[TestMethod]
public void PredictorTest_NoMatch()
{
    var target = new Predictor();
    var actual = target.Predict("The rabbit-hole went straight on like a tunnel for some way");
    Assert.AreEqual(0, actual.Count);
}
C#
[TestMethod]
public void PredictorTest_SingleMatch()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
C#
[TestMethod]
public void PredictorTest_SingleMatch_OtherNoise()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    target.Predict("Alice took up the fan and gloves");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
C#
[TestMethod]
public void PredictorTest_SingleMatch_TwoResultsInOrder()
{
    var target = new Predictor();
    target.Predict("Alice thought she might as well wait");
    target.Predict("Alice thought this a very curious thing");
    target.Predict("Alice thought she had never seen such a curious");
    var actual = target.Predict("Alice thought");
    Assert.AreEqual(2, actual.Count);
    Assert.AreEqual("she", actual.First());
    Assert.AreEqual("this", actual.Last());
}

So, nothing too fancy. You provide it with input and as expected, it returns a list of possible next values. Examples are from Alice's Adventures in Wonderland by Lewis Carroll, the Project Gutenberg edition.

Predictor Class

C#
public class Predictor
{
    private readonly Dictionary<string, Dictionary<string, int>> 
              _items = new Dictionary<string, Dictionary<string, int>>();
    private readonly char[] _tokenDelimeter = {' '};
    public List<string> Predict(string input)
    {
        var tokens = input.Split(_tokenDelimeter, StringSplitOptions.RemoveEmptyEntries);
        var previousBuilder = new StringBuilder();
        Dictionary<string, int> nextFullList;
        foreach (var token in tokens)
        {
            nextFullList = GetOrCreate(_items, previousBuilder.ToString());
            if (nextFullList.ContainsKey(token))
                nextFullList[token] += 1;
            else
                nextFullList.Add(token, 1);

            if (previousBuilder.Length > 0)
                previousBuilder.Append(" ");
            previousBuilder.Append(token);
        }
        nextFullList = GetOrCreate(_items, previousBuilder.ToString());
        var prediction = (from x in nextFullList
            orderby x.Value descending
            select x.Key).ToList();

        return prediction;
    }
    private static T GetOrCreate<T>(Dictionary<string, T> d, string key)
    {
        if (d.ContainsKey(key))
        {
            return d[key];
        }
        var t = Activator.CreateInstance<T>();
        d.Add(key, t);
        return t;
    }
}

It's not smart, and it requires having seen quite a lot of samples to work well.
An additional improvement could be to add storage of previous/next token combinations so that if the full search doesn't yield any result, a guess could be made based on only the previous word. But I guess that is up to you.

All code provided as-is. This is copied from my own code-base, and may need some additional programming to work.

Until next time: Work to Live, Don’t Live to Work

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect
Sweden Sweden
http://dreamstatecoding.blogspot.com

Comments and Discussions

 
-- There are no messages in this forum --