Click here to Skip to main content
Click here to Skip to main content

Tagged as

Needleman Wunsch Algorithm in C#

, 23 Aug 2013 CPOL
Rate this:
Please Sign up or sign in to vote.
Sequence Alignment using Needleman Wunsch algorithm in C#

Introduction

I was interested in Needleman Wunsch Algorithm and I was searching for a good example in C#, but I had no luck. So I figured this might help someone.

Background

I wrote this based on this article.

There are three steps:

  1. Initialization step
  2. Matrix Fill step
  3. Traceback step

Let's say we have:

  • sequence #1 GAATTCAGTTA and
  • sequence #2 GGATCGA
  • and scoring scheme is assumed
    • Si,j = 2 if the residue at position i of sequence #1 is the same as the residue at position j of sequence #2 (match score); otherwise
    • Si,j = -1 (mismatch score)
    • w = -2 (gap penalty)

Initialization Step

The first row and the first column of the matrix can be initially filled with 0 as below:

and the code would be as follows:

int refSeqCnt = refSeq.Length + 1;
int alineSeqCnt = alignSeq.Length + 1;

int[,] scoringMatrix = new int[alineSeqCnt, refSeqCnt];

//Initialization Step - filled with 0 for the first row and the first column of matrix
for (int i = 0; i < alineSeqCnt; i++)
{
    scoringMatrix[i, 0] = 0;
}

for (int j = 0; j < refSeqCnt; j++)
{
    scoringMatrix[0, j] = 0;
}    

Matrix Fill Step

For each position, Mi,j is defined to be the maximum score at position i,j.

Mi,j = Maximum[

Mi-1,j-1 + Si,j (Match/mismatch in the diagonal),

Mi,j-1 + w (gap in sequence #1)

Mi-1,j + w (gap in sequence #2)

//Matrix Fill Step
for (int i = 1; i < alineSeqCnt; i++)
{
    for (int j = 1; j < refSeqCnt; j++)
    {
        int scroeDiag = 0;
        if (refSeq.Substring(j - 1, 1) == alignSeq.Substring(i - 1, 1))
            scroeDiag = scoringMatrix[i - 1, j - 1] + 2;
        else
            scroeDiag = scoringMatrix[i - 1, j - 1] + -1;

        int scroeLeft = scoringMatrix[i, j - 1] - 2;
        int scroeUp = scoringMatrix[i - 1, j] - 2;

        int maxScore = Math.Max(Math.Max(scroeDiag, scroeLeft), scroeUp);

        scoringMatrix[i, j] = maxScore;
    }
} 

Traceback Step

After the matrix fill step, the maximum global alignment score for the two sequences is 3. The traceback will determine the actual alignment(s) that results in the maximum score.

This gives alignment of:

  • GAATTCAGTTA
  • GGA-TC-G--A
//Traceback Step
char[] alineSeqArray = alignSeq.ToCharArray();
char[] refSeqArray = refSeq.ToCharArray();

string AlignmentA = string.Empty;
string AlignmentB = string.Empty;
int m = alineSeqCnt - 1;
int n = refSeqCnt - 1;
while (m > 0 && n > 0)
{
    int scroeDiag = 0;

    //Remembering that the scoring scheme is +2 for a match, -1 for a mismatch and -2 for a gap
    if (alineSeqArray[m - 1] == refSeqArray[n - 1])
        scroeDiag = 2;
    else
        scroeDiag = -1;

    if (m > 0 && n > 0 && scoringMatrix[m, n] == scoringMatrix[m - 1, n - 1] + scroeDiag)
    {
        AlignmentA = refSeqArray[n - 1] + AlignmentA;
        AlignmentB = alineSeqArray[m - 1] + AlignmentB;
        m = m - 1;
        n = n - 1;
    }
    else if (n > 0 && scoringMatrix[m, n] == scoringMatrix[m, n - 1] - 2)
    {
        AlignmentA = refSeqArray[n - 1] + AlignmentA;
        AlignmentB = "-" + AlignmentB;
        n = n - 1;
    }
    else //if (m > 0 && scoringMatrix[m, n] == scoringMatrix[m - 1, n] + -2)
    {
        AlignmentA = "-" + AlignmentA;
        AlignmentB = alineSeqArray[m - 1] + AlignmentB;
        m = m - 1;
    }
} 

I tested with 14739 bp sequence but it was a bit slow. If you have a better and faster solution, please let me know. I just wanted to share this with someone who might be interested in this algorithm in C#. You are welcome to give me any advice or suggestions.

Now I would like to research the multiple sequence alignment.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

BlackMirrh
Software Developer (Senior)
United States United States
ASP.NET MVC 3/4, C#, jQuery, HTML5, CSS3, javascript, JSP, Java, MS-SQL, Oracle, DBA, Silverlight, C++

Comments and Discussions

 
Questionsimilarity score PinmemberMember 100298287-Mar-15 11:21 
Questionquestion Pinmembermona mohamed9317-Nov-14 9:21 
QuestionAffine Gap PinmemberMember 112059244-Nov-14 5:14 
QuestionTruncated Sequences PinmemberMember 1117836826-Oct-14 1:55 
AnswerRe: Truncated Sequences PinmemberBlackMirrh29-Oct-14 13:44 
QuestionFantastic! PinmemberNorman Paterson14-Jan-14 16:27 
AnswerRe: Fantastic! [modified] PinmemberBlackMirrh15-Jan-14 5:56 
QuestionUploading a full code would be helpful PingroupGrasshopper.iics16-Aug-13 20:52 
AnswerRe: Uploading a full code would be helpful PinmemberBlackMirrh19-Aug-13 4:23 
GeneralRe: Uploading a full code would be helpful PingroupGrasshopper.iics19-Aug-13 10:41 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.150327.1 | Last Updated 23 Aug 2013
Article Copyright 2013 by BlackMirrh
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid