## Introduction

I was interested in Needleman Wunsch Algorithm and I was searching for a good example in C#, but I had no luck. So I figured this might help someone.

## Background

I wrote this based on this article.

There are three steps:

- Initialization step
- Matrix Fill step
- Traceback step

Let's say we have:

- sequence #1 GAATTCAGTTA and
- sequence #2 GGATCGA
- and scoring scheme is assumed
- S
_{i,j} = 2 if the residue at position i of sequence #1 is the same as the residue at position j of sequence #2 (match score); otherwise
- S
_{i,j} = -1 (mismatch score)
- w = -2 (gap penalty)

### Initialization Step

The first row and the first column of the matrix can be initially filled with `0`

as below:

and the code would be as follows:

int refSeqCnt = refSeq.Length + 1;
int alineSeqCnt = alignSeq.Length + 1;
int[,] scoringMatrix = new int[alineSeqCnt, refSeqCnt];
for (int i = 0; i < alineSeqCnt; i++)
{
scoringMatrix[i, 0] = 0;
}
for (int j = 0; j < refSeqCnt; j++)
{
scoringMatrix[0, j] = 0;
}

### Matrix Fill Step

For each position, M_{i,j} is defined to be the maximum score at position i,j.

M_{i,j = Maximum[}

_{Mi-1,j-1 + Si,j (Match/mismatch in the diagonal),}

M_{i,j-1} + w (gap in sequence #1)

M_{i-1,j} + w (gap in sequence #2)

for (int i = 1; i < alineSeqCnt; i++)
{
for (int j = 1; j < refSeqCnt; j++)
{
int scroeDiag = 0;
if (refSeq.Substring(j - 1, 1) == alignSeq.Substring(i - 1, 1))
scroeDiag = scoringMatrix[i - 1, j - 1] + 2;
else
scroeDiag = scoringMatrix[i - 1, j - 1] + -1;
int scroeLeft = scoringMatrix[i, j - 1] - 2;
int scroeUp = scoringMatrix[i - 1, j] - 2;
int maxScore = Math.Max(Math.Max(scroeDiag, scroeLeft), scroeUp);
scoringMatrix[i, j] = maxScore;
}
}

### Traceback Step

After the matrix fill step, the maximum global alignment score for the two sequences is `3`

. The traceback will determine the actual alignment(s) that results in the maximum score.

This gives alignment of:

char[] alineSeqArray = alignSeq.ToCharArray();
char[] refSeqArray = refSeq.ToCharArray();
string AlignmentA = string.Empty;
string AlignmentB = string.Empty;
int m = alineSeqCnt - 1;
int n = refSeqCnt - 1;
while (m > 0 && n > 0)
{
int scroeDiag = 0;
if (alineSeqArray[m - 1] == refSeqArray[n - 1])
scroeDiag = 2;
else
scroeDiag = -1;
if (m > 0 && n > 0 && scoringMatrix[m, n] == scoringMatrix[m - 1, n - 1] + scroeDiag)
{
AlignmentA = refSeqArray[n - 1] + AlignmentA;
AlignmentB = alineSeqArray[m - 1] + AlignmentB;
m = m - 1;
n = n - 1;
}
else if (n > 0 && scoringMatrix[m, n] == scoringMatrix[m, n - 1] - 2)
{
AlignmentA = refSeqArray[n - 1] + AlignmentA;
AlignmentB = "-" + AlignmentB;
n = n - 1;
}
else {
AlignmentA = "-" + AlignmentA;
AlignmentB = alineSeqArray[m - 1] + AlignmentB;
m = m - 1;
}
}

I tested with 14739 bp sequence but it was a bit slow. If you have a better and faster solution, please let me know. I just wanted to share this with someone who might be interested in this algorithm in C#. You are welcome to give me any advice or suggestions.

Now I would like to research the multiple sequence alignment.