5,666,979 members and growing! (17,153 online)
Email Password   helpLost your password?
General Programming » Programming Tips » General     Intermediate

Matrix/Vector Multiplication Optimization

By aurelien400

This article describes a way to make matrix vector multiplication faster.
VC6, C++Windows, WinXPVS6, Visual Studio, Dev

Posted: 14 Sep 2005
Updated: 14 Sep 2005
Views: 41,224
Bookmarked: 13 times
Announcements
Loading...



Search    
Advanced Search
Sitemap
6 votes for this Article.
Popularity: 2.56 Rating: 3.29 out of 5
1 vote, 16.7%
1
0 votes, 0.0%
2
2 votes, 33.3%
3
2 votes, 33.3%
4
1 vote, 16.7%
5

Sample Image

Introduction

The purpose of this article is to show how to speed up some matrix operations. This improvement is obtained by using assembly language with SSE (Streaming SIMD Extension) technology.

Background

In this article, we are going to see how to improve the speed of the mathematical operation: Matrix Vector multiplication. I have chosen this operation because it is often used in programming (image processing, 2D, 3D etc.).

Since the Pentium III Processor, Intel encloses the SSE technology. This technology was included in order to improve image, video, audio processing. The SSE technology uses SIMD (Single Instruction Multiple Data) instructions. For example, this allows to perform four multiplications (on single precision float) with only one instruction. By using these characteristics, I tried to speed up the matrix/vector multiplication.

Requirement

In order to use the SSE instruction set, you need a computer with a Pentium III processor or newer. Next, Visual C++ can't compile the SSE instructions without the Processor Pack. (Instructions to install it in the Zip file.)

Using the code

The program provided by the link on the top performs a matrix/vector multiplication. It displays the time spent in the C++ function and the time spent in the assembly function. It also displays the matrix and the two vectors (multiplication and result).

You can find two ways to proceed this operation (one in C++ and another in assembler). The code shows you the time (in CPU clock cycles) spent in each function.

First in the C++ function, the code is optimized when you compile in Release mode. So the matching must be done in the Release mode.

Here is the main of the project:

//

// The main contain the call to the functions,

// declarations, and time measurement


int main(int argc, char* argv[])
{
    //Enter the size of the matrix


    int size;

    int i;

    printf("Enter the size of the matrix:\n");
    scanf("%d",&size);
    
    // Allocate memory

    float* matrix=(float*) malloc (size*size*sizeof(float));
    float* vector=(float*) malloc (size*sizeof(float));
    float* result=(float*) malloc (size*sizeof(float));
    float* matrix1=(float*) malloc (size*size*sizeof(float));
    for(i=0; i<size*size; i++)
        matrix1[i]=(float)i;

    // Writting values in the matrix and vector

    MatrixVectorWritting(matrix, vector, size);

     // Benchmark the two fonctions

     __int64 t1=GetTime();
     for(i=0; i<100; i++)
         result=MatrixVector_C(matrix,vector, size);
     __int64 t2=GetTime();
     __int64 time_C=t2-t1;
     printf("Time spend en C++ fonction: %d clock cycles.\n", 
                                                     time_C);

     __int64 t3=GetTime();
     for(i=0; i<100; i++)
         result=MatrixVector_SSE(matrix1, vector, size);
     __int64 t4=GetTime();
     __int64 time_SSE=t4-t3;
     printf("Time spend en Asm SSE fonction:" 
            " %d clock cycles.\n",time_SSE);       

     // Display the matrix and the two vectors

     MatrixVectorDisplay(matrix, vector, result, size);

     // Display the time improvement in percent


     TimeImprove(time_C,time_SSE);  
     return 0;
}

We can see in the main function that we make a "for" loop (100 times) on each function we compare. At the beginning of each loop we read the CPU clock. And at the end of each loop we read again the CPU timer clock. Then we take the difference between the two times read. And we obtain the number of clock cycles spent in each function (independent of the CPU used for the test).

Here is the code used to read the CPU timer called RDTSC:

/// Read the computer's timer RDTSC

__int64 GetTime()
{
    __int64 clock;
    __asm
    {
        rdtsc                        // Resad the RDTSC Timer

        mov    dword ptr[clock], eax // Store the value in EAX and EDX Registers

        mov    dword ptr[clock+4], edx
    }
    return clock;
}

Points of Interest

As you can see with the program, there are some matrix dimensions where the assembly function is not very efficient. This happens when the size of the matrix is lower than 3x3. For very low sizes of the matrix the code is not very optimized. But it's possible to make a specific function for the matrix/vector multiplication of 3x3 or 2x2 matrixes. You can find a white paper on the Intel web site.

History

Some sections of the code can be improved. So if I find anything else I'll update the code. If you have some ideas, you can propose.

Links

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

aurelien400



Location: France France

Other popular Programming Tips articles:

Article Top
Sign Up to vote for this article
You must Sign In to use this message board.
FAQ FAQ Noise ToleranceSearch Search Messages 
 Layout  Per page   
 Msgs 1 to 4 of 4 (Total in Forum: 4) (Refresh)FirstPrevNext
GeneralAbout using SSE in managed C++membermalei200323:09 28 May '06  
Generaltime measurementmembermanoelaudaz8:28 11 May '06  
Generalnice article but...memberSaurabh.Garg1:54 15 Sep '05  
GeneralWhat about BLAS?memberNeWi13:40 14 Sep '05  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

PermaLink | Privacy | Terms of Use
Last Updated: 14 Sep 2005
Editor: Smitha Vijayan
Copyright 2005 by aurelien400
Everything else Copyright © CodeProject, 1999-2008
Web19 | Advertise on the Code Project