Introduction

After learning the MiniMax algorithm, I decided to practice it on Tic Tac Toe. The algorithm is simple to implement. However, it took me much more time than I expected. So, I would like to share what I have learned here.

MiniMax algorithm with alpha beta pruning

The shortest description of MiniMax that I can find is from Wikipedia. I implemented it this way:

public int MiniMaxShortVersion(int depth, int alpha, int beta, out Board childWithMax)
{
    childWithMax = null;
    if (depth == 0 || IsTerminalNode())
    {
        //When it is turn for PlayO, we need to find the minimum score.
        RecursiveScore = m_Score;
        return m_TurnForPlayerX ? m_Score : -m_Score;
    }
    foreach (Board cur in GetChildren())
    {
        Board dummy;
        int score = -cur.MiniMaxShortVersion(depth - 1, -beta, -alpha, out dummy);
        if (alpha < score)
        {
            alpha = score;
            childWithMax = cur;
            if (alpha >= beta)
            {
                break;
            }
        }
    }
    RecursiveScore = alpha;
    return alpha;
}
//To call the function:
MiniMaxShortVersion(depth, int.MinValue + 1, int.MaxValue - 1, out ret);

The description on Wikipedia is very short and elegant, but there are some things to watch out:

First, we cannot use int.MinValue and int.MaxValue as -/+infinity. -int.MaxValue is not a valid integer. So, I used int.MinValue+1 and int.MaxValue-1 instead.

Second, the terminal nodes include nodes with no successors, and game-over nodes (winner already decided). In my initial implementation, I forgot the game-over nodes.

Finally, the description on Wikipeida seems to indicate that the node type (min/max) doesn't matter. However, the heuristic value needs to change the sign based on whether the node is a min or max node. In the longer version descriptions, the heuristic score doesn't have to change, which makes the coding much easier. In order to find where the mistake is, I had to implement the longer version:

public int MiniMax(int depth, bool needMax, int alpha, int beta, out Board childWithMax)
{
    childWithMax = null;
    System.Diagnostics.Debug.Assert(m_TurnForPlayerX == needMax);
    if (depth == 0 || IsTerminalNode())
    {
        RecursiveScore = m_Score;
        return m_Score;
    }
    foreach (Board cur in GetChildren())
    {
        Board dummy;
        int score = cur.MiniMax(depth - 1, !needMax, alpha, beta, out dummy);
        if (!needMax)
        {
            if (beta > score)
            {
                beta = score;
                childWithMax = cur;
                if (alpha >= beta)
                {
                    break;
                }
            }
        }
        else
        {
            if (alpha < score)
            {
                alpha = score;
                childWithMax = cur;
                if (alpha >= beta)
                {
                    break;
                }
            }
        }
    }
    RecursiveScore = needMax ? alpha : beta;
    return RecursiveScore;
}

The second version takes a few more minutes to type. But it saves me a couple hours from debugging. It has another advantage: when its RecurisveScore is positive, Player X is winning. When the RecurisveScore is negative, Player O is winning. The shorter version on Wikipedia requires more logic.

Heuristic score in Tic Tac Toe

The Minimax algorithm can be applied to many games. Its implementation doesn't change for a different game. However, the challenge part is to define the heuristic score. I found that it is not easy even for a game as simple as Tic Tac Toe. There are totally 8 rows in a Tic Tac Toe board. The rules to calculate the score are:

For each row, if there are both X and O, then the score for the row is 0.
If the whole row is empty, then the score is 1.
If there is only one X, then the score is 10.
If there are two Xs, then the score is 100.
If there are 3 Xs, then the score is 1000, and the winner is Player X.
For Player O, the score is negative.
Player X tries to maximize the score.
Player O tries to minimize the score.
If the current turn is for Player X, then the score of Player X has more advantage. I gave the advantage rate as 3.

The source code for computing the heuristic score is:

int GetScoreForOneLine(GridEntry[] values)
{
    int countX=0, countO=0;
    foreach (GridEntry v in values)
    {
        if (v == GridEntry.PlayerX)
            countX++;
        else if (v == GridEntry.PlayerO)
            countO++;
    }
    if (countO == 3 || countX == 3)
    {
        GameOver = true;
    }
    //The player who has turn should have more advantage.
    int advantage = 1;
    if (countO == 0)
    {
        if (m_TurnForPlayerX)
            advantage = 3;
        return (int)System.Math.Pow(10, countX)*advantage;
    }
    else if (countX == 0)
    {
        if (!m_TurnForPlayerX)
            advantage = 3;
        return -(int)System.Math.Pow(10, countO) * advantage;
    }
    return 0;
}
void ComputeScore()
{
    int ret = 0;
    int[,] lines = { { 0, 1, 2 }, { 3, 4, 5 }, 
                   { 6, 7, 8 }, { 0, 3, 6 }, 
                   { 1, 4, 7 }, { 2, 5, 8 }, 
                   { 0, 4, 8 }, { 2, 4, 6 } 
                   };
    for (int i = lines.GetLowerBound(0); i <= lines.GetUpperBound(0); i++)
    {
        ret += GetScoreForOneLine(new GridEntry[] 
           { m_Values[lines[i, 0]], 
             m_Values[lines[i, 1]], 
             m_Values[lines[i, 2]] });
    }
    m_Score = ret;
}

Interesting Tic Tac Toe move

Because Player X always makes the first move, it is more difficult for Player O to win. The program prints all the winning moves for Player O. I found these moves for Player O are interesting:

[1]Winner is PlayerO:
OXX    OXX    OXX    OXX    OXX    OXX
--- => O-- => O-- => OO- => OO- => OOO
---    ---    X--    X--    X-X    X-X
[2]Winner is PlayerO:
OX-    OX-    OX-    OX-    OX-    OXO
--X => --X => X-X => XOX => XOX => XOX
---    O--    O--    O--    O-X    O-X
[3]Winner is PlayerO:
X--    X--    X-X    X-X 
XOX => XOX => XOX => XOX 
--O    O-O    O-O    OOO 
[4]Winner is PlayerO:
OX-    OX-    OXX    OXX 
X-O => X-O => X-O => XOO 
X--    X-O    X-O    X-O 
[5]Winner is PlayerO:
XXO    XXO    XXO    XXO 
X-- => X-- => XX- => XX- 
-O-    OO-    OO-    OOO

The program can also print out all the winning moves for Player X. The rules for Player O to defend are:

If Player X makes the first move at the corner, then Player O has to take the center.
If Player X makes the first move at the center, then Player O has take the corners.
If Player X makes the first move at the top edge, then Player O cannot take the side edges or unconnected corners.

Two of the interesting moves for Player X are:

[1]Winner is PlayerX:
X--    X--    X--    X-X    XOX    XOX
--- => --- => O-- => O-- => O-- => OX-
--O    X-O    X-O    X-O    X-O    X-O
[2]Winner is PlayerX:
X-O    X-O    X-O    X-O    X-O    X-O
--- => X-- => X-- => XX- => XXO => XXO
---    ---    O--    O--    O--    O-X

Similar Tic Tac Toe boards

When I printed out the Tic Tac Toe moves, I found that many moves were redundant because they were very similar. I defined a class Transform to find out all the similar Tic Tac Toe boards.

class Transform
{
    const int Size = 3;
    delegate Point TransformFunc(Point p);
    public static Point Rotate90Degree(Point p)
    {
        //012 -> x->y, y->size-x
        return new Point(Size - p.y -1, p.x);
    }
    public static Point MirrorX(Point p)
    {
        //012 -> 210
        return new Point(Size - p.x -1, p.y);
    }
    public static Point MirrorY(Point p)
    {
        return new Point(p.x, Size - p.y -1);
    }
    List<TransformFunc> actions = new List<TransformFunc>();
    public Point ActOn(Point p)
    {
        foreach (TransformFunc f in actions)
        {
            if(f!=null)
                p = f(p);
        }
        return p;
    }
    Transform(TransformFunc op, TransformFunc[] ops)
    {
        if(op!=null)
            actions.Add(op);
        if (ops!=null && ops.Length > 0)
            actions.AddRange(ops);
    }
    public static List<Transform> s_transforms = new List<Transform>();
    static Transform()
    {
        for (int i = 0; i < 4; i++)
        {
            TransformFunc[] ops = 
              Enumerable.Repeat<TransformFunc>(Rotate90Degree, i).ToArray();
            s_transforms.Add(new Transform(null, ops));
            s_transforms.Add(new Transform(MirrorX, ops));
            s_transforms.Add(new Transform(MirrorY, ops));
        }
    }
}

When a board can be transformed into another board, they are considered similar.

Conclusion

In this article, I implemented Tic Tac Toe in two versions of the MiniMax algorithm. One is shorter but more difficult to debug. The longer version takes more time to type in, but saves a lot of trouble when debugging.