TDD the Anagrams Kata

Richard A. Dalton

Rate me:

5.00/5 (5 votes)

4 Dec 2012CPOL10 min read

28.6K

148

A worked example of Test Driving the anagrams kata

Download source code - 9.2 KB

Introduction

This article presents my C# implementation of the Anagrams Kata as described at www.cyber-dojo.com

Write a program to generate all potential anagrams of an input string. 
For example, the potential anagrams of "biro" are 
biro bior brio broi boir bori 
ibro ibor irbo irob iobr iorb 
rbio rboi ribo riob roib robi 
obir obri oibr oirb orbi orib

Background

A Kata is a form of deliberate practice. Katas are small coding exercise that a programmer completes on a daily basis. Rather than attempt a new Kata each day the programmer should work on the same Kata repeatedly until completing the Kata is almost automatic.

The benefits include greater efficiency in the way you use your development tools, Better understanding of features of your programming language, and a better grasp of TDD.

Using the code

The attached project contains the final version of the code. To get a benefit from the article you shouldn't skip straight to the final code. Try the exercise yourself, or follow along with the steps I take.

The Kata

Let's write a test.

[Test]
public void NoCharacters()
{
    var expected = new List<string> {""};
    Assert.That(Anagrams.Of(""), Is.EqualTo(expected));
}

As is almost always the case, we start with the degenerate case of an empty string. We don't do this solely to ensure the degenerate case is handled (that's part of it), but with this first test we're thinking about the API of our new code.

You can see that the Function we're writing will be a static method called 'Of' on a class called 'Anagrams', which allows us the somewhat fluent usage 'Anagrams.Of(s)'. This is simple Test Driven Design, we're writing tests as if the desired code already exists, which in theory gives us the best API we could possibly hope for.

To make that test pass. We'll need a Class, a Method, and we'll return a literal that satisfies the test.

public class Anagrams 
{
    public static List<string> Of(string s)
    {
        return new List<string> {""};
    }
}

We have GREEN. We might have been tempted to just return s and be done with it. It would have passed the test, and it would have handled the case of one character strings also. Thinking about passing tests that we haven't even written yet suggests we're getting ahead of ourselves. Let's write the test for the one character example.

[Test]
public void OneCharacter()
{
    var expected = new List<string> { "A" };
    Assert.That(Anagrams.Of("A"), Is.EqualTo(expected));
}

Now we've got a failing test, let's get it passing. We can return 's' as mentioned above, and both our tests would pass. It's ok to take these kinds of steps, if a test fails unexpectedly we can revert the changes and try again a little more slowly.

For the sake of the exercise, let's go slow for now and look at a technique that will serve us well later when things are more complicated. We'll isolate the existing working code, make the new test pass with a branch in the code, then try to eliminate the branch. Here's a way of getting both tests passing.

public static List<string> Of(string s)
{
    if(s == "")
        return new List<string> {""};
    return new List<string>{"A"};
}

The 'if' statement handles the empty string case, so our new code doesn't break it. We use a literal value "A" to get the new test passing as simply as possible. This leaves us with duplication between the literal value "A" in the test and the same value in the code. We can remove the duplication by generalising the solution for One character, returning s is the more general form of returning "A".

public static List<string> Of(string s)
{ 
    if(s == "")
        return new List<string> {""};
    return new List<string>{s};
}

It's clear now that returning s will make both our tests pass, so let's refactor.

public static List<string> Of(string s)
{
    return new List<string>{s};
}

Time for another test. How about two characters?

[Test]
public void TwoCharacters()
{
    var expected = new List<string> { "AB", "BA" };
    Assert.That(Anagrams.Of("AB"), Is.EqualTo(expected));
}

Let's repeat the same trick of protecting the existing working code with an if statement, and making the test pass using the simplest possible implementation.

public static List<string> Of(string s)
{
    if(s.Length<=1)
        return new List<string>{s};
    return new List<string>
               {
                   "AB", 
                   "BA"
               };
}

We're back to having duplication between our test and our code, let's break that.

public static List<string> Of(string s)
{
    if(s.Length<=1)
        return new List<string>{s};
    return new List<string>
               {
                   s, 
                   s.Substring(1,1) + s.Substring(0,1)
               };
}

That'll work for any strings from 0 to 2 characters. Next test, three characters.

[Test]
public void ThreeCharacters()
{
    var expected = new List<string> { "ABC", "ACB", "BAC", "BCA", "CAB", "CBA" };
    Assert.That(Anagrams.Of("ABC"), Is.EqualTo(expected));
}

And to make this pass? You should know the ropes by now. Isolate existing code, and hard code a solution.

public static List<string> Of(string s)
{
    if (s.Length <= 1)
        return new List<string> { s };
    if(s.Length == 2)
    {
        return new List<string>
               {
                   s, 
                   s.Substring(1,1) + s.Substring(0,1)
               };
    }
    return new List<string>
               {
                   "ABC", 
                   "ACB", 
                   "BAC", 
                   "BCA", 
                   "CAB", 
                   "CBA"
               };
}

There's a pattern emerging here. Lets split up the strings to make it more clear.

return new List<string>
               {
                   "A" + "BC", 
                   "A" + "CB", 
                   "B" + "AC", 
                   "B" + "CA", 
                   "C" + "AB", 
                   "C" + "BA"
               };

We can replace the first character on each row.

return new List<string>
               {
                   s.Substring(0,1) + "BC", 
                   s.Substring(0,1) + "CB", 
                   s.Substring(1,1) + "AC", 
                   s.Substring(1,1) + "CA", 
                   s.Substring(2,1) + "AB", 
                   s.Substring(2,1) + "BA"
               };

What has that achieved? We still have literal values on each line. Well, we do, but they look familiar "BC" and "CB" look like the anagrams of "BC". Same applies to "AC", "CA", and "AB", "BA". We've discovered that part of the job of generating anagrams for three characters is to generate anagrams for two characters. Let's see if we can make that explicit.

return new List<string>
               {
                   s.Substring(0,1) + Anagrams.Of("BC")[0], 
                   s.Substring(0,1) + Anagrams.Of("BC")[1], 
                   s.Substring(1,1) + "AC", 
                   s.Substring(1,1) + "CA", 
                   s.Substring(2,1) + "AB", 
                   s.Substring(2,1) + "BA"
               };

If we get Anagrams.Of("BC") we'll get two results. The first will be "BC" the second will be "CB". We can repeat this change for the remaining literal values.

return new List<string>
{
   s.Substring(0,1) + Anagrams.Of("BC")[0], 
   s.Substring(0,1) + Anagrams.Of("BC")[1], 
   s.Substring(1,1) + Anagrams.Of("AC")[0], 
   s.Substring(1,1) + Anagrams.Of("AC")[1], 
   s.Substring(2,1) + Anagrams.Of("AB")[0], 
   s.Substring(2,1) + Anagrams.Of("AB")[1]
};

Hang on, we still have literal values on all 6 lines. Are we making any progress at all? or are we just moving the problem around? We have made progress. We've actually eliminated half the literals. Now we're almost in a position to remove those literals completely. Remember the original string 's' has a value of "ABC", so to get the "BC" of the first two lines, we need to drop the first character ("A") from "ABC".

Time to do a little more TDDing, let's pretend we have the function we need.

return new List<string> 
{
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[0], 
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[1], 
   s.Substring(1,1) + Anagrams.Of("AC")[0], 
   s.Substring(1,1) + Anagrams.Of("AC")[1], 
   s.Substring(2,1) + Anagrams.Of("AB")[0], 
   s.Substring(2,1) + Anagrams.Of("AB")[1]
};

This won't compile, we don't have the DropCharacter function. Let's get the test passing in the simplest possible way.

private static string DropCharacter(string s, int index)
{
    return "BC";
}

Strictly speaking I've skipped a step there. What I would actually do is create the function so that everything compiles, but have it return the wrong value, so that I have a failing test. Then by having the function return "BC" I get the test to Green. This reassures me that my tests are actually joined up to the code.

We're still have literal value duplication between our tests and our code, but let's get DropCharacter fully working before we tackle that. Right now it works if we want to drop the first character ("A"). Let's try to expand it's use.

return new List<string> 
{
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[0], 
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[1], 
   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[0], 
   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[1], 
   s.Substring(2,1) + Anagrams.Of("AB")[0], 
   s.Substring(2,1) + Anagrams.Of("AB")[1]
};

This will give us a failing test. The next failing test doesn't always have to be a new test. We actually have all of the tests we need. What has happened here is that we've tried to make one part of our code more general, in doing so we've identified a hard coding issue elsewhere in the code. We are effectively test driving the 'DropCharacter' function by proxy. Our 'Anagrams.Of' function is using 'DropCharacter' in ways that it can't handle.

You should know the drill for getting back to green by now, protect the working code with an 'if' and get the new test passing as simply as possible.

private static string DropCharacter(string s, int index)
{ 
    if(index == 0)
        return "BC";
    return "AC";
}

Let's sort out the final case for 'DropCharacter' and then see where we stand.

return new List<string>
{
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[0], 
   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[1], 
   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[0], 
   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[1], 
   s.Substring(2,1) + Anagrams.Of(DropCharacter(s,2))[0], 
   s.Substring(2,1) + Anagrams.Of(DropCharacter(s,2))[1]
};

To get this passing we'll use another 'if'

private static string DropCharacter(string s, int index)
{
    if(index == 0)
        return "BC";
    if(index == 1)
        return "AC";
    return "AB";
}

Time for some refactoring now. We need to eliminate these literal values once and for all. Dropping the first and last characters is easy.

private static string DropCharacter(string s, int index)
{
    if(index == 0)
        return s.Substring(1,2);
    if(index == 1)
        return "AC";
    return s.Substring(0,2);
}

We've hard coded the parameters to Substring, which means we're still tied to strings of three characters. One step at a time. Let's sort out the case where we're dropping the middle character, and then see if we can generalise for any string length.

if(index == 1)
    return "A" + "C";

The letter "B" has been dropped, "A" and "C" represent the parts of the string before and after the dropped letter. So, we split the string and deal with each part seperately.

if(index == 1)
    return s.Substring(0,1) + s.Substring(2,1);

Here's where we stand.

private static string DropCharacter(string s, int index)
{
    if(index == 0)
        return s.Substring(1,2);
    if(index == 1)
        return s.Substring(0,1) + s.Substring(2,1);
    return s.Substring(0,2);
}

We've eliminated the dependency on literal strings, but we've replaced it with a dependency on strings a of particular length. That's progress of sorts. We have the string 's' and the position of the dropped character 'index' to work with. We need to trace these hard coded values back to these two inputs. This can be a little tricky.

Let's start with the portion of the string, before the dropped character. There are two places in the function where that piece of string is referred to, and by using the 'index' variable we can eliminate some of the hard coding of the number of characters.

private static string DropCharacter(string s, int index)
{
    if(index == 0)
        return s.Substring(1,2);
    if(index == 1)
        return s.Substring(0,index) + s.Substring(2,1);
    return s.Substring(0,index);
}

We can introduce a 'before' variable that makes this explicit.

private static string DropCharacter(string s, int index)
{
    var before = s.Substring(0, index);
    if (index == 0)
        return s.Substring(1,2);
    if(index == 1)
        return before + s.Substring(2,1);
    return before;
}

Now we need to deal with the portion of the string that comes after the dropped character. Again we have two examples of this in the code. We can see that the first parameter to Substring, the starting position of the substring seems to be 'index + 1'. Let's plug that in.

private static string DropCharacter(string s, int index)
{
    var before = s.Substring(0, index);
    if (index == 0)
        return s.Substring(index + 1, 2);
    if(index == 1)
        return before + s.Substring(index + 1, 1);
    return before;
}

The second parameter to Substring isn't quite so simple, it depends on both the index of the dropped character and the length of the original string. Nonetheless it's reasonably easy to figure out.

private static string DropCharacter(string s, int index)
{
    var before = s.Substring(0, index);
    if (index == 0)
        return s.Substring(index + 1, s.Length - (index + 1));
    if(index == 1)
        return before + s.Substring(index + 1, s.Length - (index + 1));
    return before;
}

For clarity, let's introduce another variable to really make this all obvious.

private static string DropCharacter(string s, int index)
{
    var before = s.Substring(0, index);
    var after = s.Substring(index + 1, s.Length - (index+1));
    if (index == 0)
        return after;
    if(index == 1)
        return before + after;
    return before;
}

And, we can simplify this, as follows

private static string DropCharacter(string s, int index)
{
    var before = s.Substring(0, index);
    var after = s.Substring(index + 1, s.Length - (index+1));
    return before + after;
}

Our final task is to turn our attention back to the 'Anagrams.Of' function and sort it out.

public static List<string> Of(string s)
{
    if (s.Length <= 1)
        return new List<string> { s };
    if(s.Length == 2)
    {
        return new List<string>
               {
                   s, 
                   s.Substring(1,1) + s.Substring(0,1)
               };
    }
    return new List<string>
               {
                   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[0], 
                   s.Substring(0,1) + Anagrams.Of(DropCharacter(s,0))[1], 
                   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[0], 
                   s.Substring(1,1) + Anagrams.Of(DropCharacter(s,1))[1], 
                   s.Substring(2,1) + Anagrams.Of(DropCharacter(s,2))[0], 
                   s.Substring(2,1) + Anagrams.Of(DropCharacter(s,2))[1]
               };
}

Those six lines of code that create the six anagrams of a three character word look promising. There's a reasonably obvious loop happening there.

The first thing we need to do is get rid of the list initialiser code, so that we can add items to the list using a loop.

var anagrams = new List<string>();
anagrams.Add(s.Substring(0, 1) + Anagrams.Of(DropCharacter(s, 0))[0]);
anagrams.Add(s.Substring(0, 1) + Anagrams.Of(DropCharacter(s, 0))[1]);
anagrams.Add(s.Substring(1, 1) + Anagrams.Of(DropCharacter(s, 1))[0]);
anagrams.Add(s.Substring(1, 1) + Anagrams.Of(DropCharacter(s, 1))[1]);
anagrams.Add(s.Substring(2, 1) + Anagrams.Of(DropCharacter(s, 2))[0]);
anagrams.Add(s.Substring(2, 1) + Anagrams.Of(DropCharacter(s, 2))[1]);
return anagrams;

Now let's get a loop working for us and delete some of this code.

var anagrams = new List<string>();
for (int i = 0; i < 3; i++ )
{
    anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[0]);
    anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[1]);                
}
return anagrams;

We can nest another loop to reduce things even further

for (int i = 0; i < 3; i++ )
    for (var j = 0; j < 2; j++)
        anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);

The upper bounds of both loop are hardcoded. Let's get rid of those.

for (int i = 0; i < s.Length; i++ )
    for (var j = 0; j < s.Length - 1; j++)
        anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);

It took a while to get here, but now we can tidy up the 'Anagrams.Of' function.

public static List<string> Of(string s)
{ 
    if (s.Length <= 1)
        return new List<string> { s };
    var anagrams = new List<string>();
    for (var i = 0; i < s.Length; i++ )
        for (var j = 0; j < s.Length - 1; j++)
            anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);
    return anagrams;
}

In terms of a Kata we could maybe call this done, the goal after all is more about the tools and process than the final code. But there's a big problem. Our final code is wrong.

We've managed to introduce a bug. We screwed up in our "careful" step by step refinement of the code.

We've tested our Anagrams function up to strings of length 3. We're reaching a point where creating tests for longer strings is a pain simply because of the sheer size of the expected results list. However, if we try running for longer strings, even a string of 4 characters, it will give incorrect results.

We can prove this without doing a full list of the expected results. Try the following test

        [Test]
public void CountOfAnagramsOfFourCharacters()
{
    Assert.That(Anagrams.Of("ABCD").Count, Is.EqualTo(24));
}

We know that the answer should be 24 because the number of results should be the factorial of the original string length. If you didn't know that already, you do now. Strictly speaking this whole article is about Permutations, not Anagrams.

Anyhow, back to the test. It will fail. Instead of the 24 results that we expect, we get 12. Why so? How could this happen?

Well, remember when I said that replacing literal values can be tricky? In this case we've been bitten in the ass by that very problem. We made an assumption and it led us astray.

Our nested loops originally looked like this:

for (int i = 0; i < 3; i++)
    for (var j = 0; j < 2; j++)
        anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);

And we got rid of the literal values in the loop guards by refining to this

for (int i = 0; i < s.Length; i++)
    for (var j = 0; j < s.Length - 1; j++)
        anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);

The loop guard of 3 in the outer loop is replaced by the length of the string, which is right because we're iterating over the characters in the string.

What about the guard of j<2 in the inner loop?

2 is 3-1, right?

j < s.Length - 1?

We get to green, job done?

Except we're not done. This is a case of a coincidental pass. It's what happens when we mindlessly refactor the code, without really thinking about what's going on.

The inner loop is iterating over all of the anagrams of the characters that remain after we drop a character. In the case of a string with three characters we're left with 2 characters, and there are 2 anagrams of any two character string. We've mixed up our 2's.

Instead of the code above, we should have done the following

public static List<string> Of(string s)
{
    if (s.Length <= 1)
        return new List<string> { s };
    var anagrams = new List<string>();
    for (var i = 0; i < s.Length; i++)
        for (var j = 0; j < Anagrams.Of(DropCharacter(s, i)).Count; j++)
            anagrams.Add(s.Substring(i, 1) + Anagrams.Of(DropCharacter(s, i))[j]);
    return anagrams;
}

That's fixed the bug, but it's pretty ugly, not to mention inefficient.

But, we're at green, and legitimately this time, so we can refactor with confidence.

for (var i = 0; i < s.Length; i++) 
{ 
    var anagramsOfRest = Anagrams.Of(DropCharacter(s, i));
    for (var j = 0; j < anagramsOfRest.Count; j++)
        anagrams.Add(s.Substring(i, 1) + anagramsOfRest[j]);
}

We can tweak that a little more:

for (var i = 0; i < s.Length; i++)
{
    var anagramsOfRest = Anagrams.Of(DropCharacter(s, i));
    foreach (var anagramOfRest in anagramsOfRest)
        anagrams.Add(s.Substring(i, 1) + anagramOfRest);                    
}

And one more tweak

for (var i = 0; i < s.Length; i++)
{
    var droppedCharacter = s.Substring(i, 1);
    var anagramsOfRest = Anagrams.Of(DropCharacter(s, i));
    foreach (var anagramOfRest in anagramsOfRest)
        anagrams.Add(droppedCharacter + anagramOfRest);
}

And that's it. Here's the full listing of Tests and Code.

namespace AnagramKata
{
    [TestFixture]
    public class AnagramTests
    {
        [Test]
        public void NoCharacters()
        {
            var expected = new List<string> { "" };
            Assert.That(Anagrams.Of(""), Is.EqualTo(expected));
        }
        [Test]
        public void OneCharacter()
        {
            var expected = new List<string> { "A" };
            Assert.That(Anagrams.Of("A"), Is.EqualTo(expected));
        }
        [Test]
        public void TwoCharacters()
        {
            var expected = new List<string> { "AB", "BA" };
            Assert.That(Anagrams.Of("AB"), Is.EqualTo(expected));
        }
        [Test]
        public void ThreeCharacters()
        {
            var expected = new List<string> { "ABC", "ACB", "BAC", "BCA", "CAB", "CBA" };
            Assert.That(Anagrams.Of("ABC"), Is.EqualTo(expected));
        }
        [Test]
        public void CountOfAnagramsOfFourCharacters()
        {
            Assert.That(Anagrams.Of("ABCD").Count, Is.EqualTo(24));
        }
    }
    public class Anagrams
    {
        public static List<string> Of(string s)
        {
            if (s.Length <= 1)
                return new List<string> { s };
            var anagrams = new List<string>();
            for (var i = 0; i < s.Length; i++)
            {
                var droppedCharacter = s.Substring(i, 1);
                var anagramsOfRest = Anagrams.Of(DropCharacter(s, i));
                foreach (var anagramOfRest in anagramsOfRest)
                    anagrams.Add(droppedCharacter + anagramOfRest);
            }
            return anagrams;
        }
        private static string DropCharacter(string s, int index)
        {
            return s.Substring(0, index) + s.Substring(index + 1, s.Length - (index + 1));
        }
    }
}

Conclusion

In this article we've looked at the concept of a Kata as a means of deliberate practice. We've seen how functionality can be added incrementally, driven by tests. We've also seen how we can protect existing code while making new tests pass in the simplest possible way, and how to then generalise those simple specific solutions to remove duplication.

Finally we looked beyond Kata's at how bugs can be introduced and missed even when using TDD, and how we should isolate bugs using a failing test and then fix the bug leaving all tests passing.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Richard A. Dalton

Ireland

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.