Why does my MMX code run slower than the C++ code (commented out, shown in green)? The results are the same; only the speed differs.
#include <mmintrin.h>   // MMX intrinsics: __m64, _mm_set_pi16, _mm_add_pi16, ...

void tom::add(void* btr)
{
    // Note: align(8) here applies to the pointer variable itself,
    // not to the buffer it points to.
    __declspec(align(8)) short* b = (short*)btr;

    int j;

/*
    for(j = 0; j < 4; j++)
    {
        /// 1st stage add.
        int s0 = (int)(b[j]   + b[j+3]);
        int s3 = (int)(b[j]   - b[j+3]);
        int s1 = (int)(b[j+1] + b[j+2]);
        int s2 = (int)(b[j+1] - b[j+2]);

        /// 2nd stage add.
        b[j]    = (short)(s0 + s1);
        b[j+8]  = (short)(s0 - s1);
        b[j+4]  = (short)(s2 + (s3 << 1));
        b[j+12] = (short)(s3 - (s2 << 1));
    }//end for j...
*/

    __m64* b1 = (__m64*)b;
    j = 0;
    // Gather every fourth element into one register
    // (the first argument of _mm_set_pi16 is the highest lane).
    __m64 f0 = _mm_set_pi16(b[j+12], b[j+8],  b[j+4], b[j]);
    __m64 f1 = _mm_set_pi16(b[j+13], b[j+9],  b[j+5], b[j+1]);
    __m64 f2 = _mm_set_pi16(b[j+14], b[j+10], b[j+6], b[j+2]);
    __m64 f3 = _mm_set_pi16(b[j+15], b[j+11], b[j+7], b[j+3]);
    for(j = 0; j < 4; j += 4)
    {
        // stage one add
        __m64 s0 = _mm_add_pi16(f0, f3);
        __m64 s3 = _mm_sub_pi16(f0, f3);
        __m64 s1 = _mm_add_pi16(f1, f2);
        __m64 s2 = _mm_sub_pi16(f1, f2);
        // stage two add
        b1[j]   = _mm_add_pi16(s0, s1);
        b1[j+2] = _mm_sub_pi16(s0, s1);
        b1[j+1] = _mm_add_pi16(s2, _mm_slli_pi16(s3, 1));
        b1[j+3] = _mm_sub_pi16(s3, _mm_slli_pi16(s2, 1));
    }
    _mm_empty();   // clear the MMX state before any floating-point code runs
}

1 solution

Your MMX version contains the loop for(j = 0; j < 4; j += 4). It is not needed: it executes exactly once, so you can remove it and take j = 0.
Are you using compiler optimizations, and have you enabled the SSE instructions? Generally speaking, code optimized by the compiler tends to be considerably faster than what you can write by hand in assembly.
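
To illustrate the first point, here is a minimal sketch of what the MMX routine computes once the loop is dropped and j is taken as 0. It is written in plain C++ rather than intrinsics so that it compiles on any target. The names add_lanes and f are made up for this illustration and do not appear in the original post. The 16 scalar reads in the gather loop correspond to the four _mm_set_pi16 calls; it may be that this gather, rather than the arithmetic, accounts for a good part of the MMX version's cost, but that is an assumption to check by measuring.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the original post: a plain C++ mirror of
// the MMX routine with the single-iteration loop removed (j fixed at 0).
// f[i][k] stands for 16-bit lane k of the __m64 value f<i>.
void add_lanes(std::int16_t* b)
{
    assert(b != nullptr);

    std::int16_t f[4][4];
    // Same gather as the _mm_set_pi16 calls: lane k of f<i> is b[4*k + i].
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            f[i][k] = b[4 * k + i];

    for (int k = 0; k < 4; ++k) {
        // Stage one add.
        int s0 = f[0][k] + f[3][k];
        int s3 = f[0][k] - f[3][k];
        int s1 = f[1][k] + f[2][k];
        int s2 = f[1][k] - f[2][k];

        // Stage two add; the four stores match b1[0], b1[1], b1[2], b1[3].
        // Multiplying by 2 replaces the 1-bit left shift of the MMX code.
        b[k]      = static_cast<std::int16_t>(s0 + s1);
        b[4 + k]  = static_cast<std::int16_t>(s2 + s3 * 2);
        b[8 + k]  = static_cast<std::int16_t>(s0 - s1);
        b[12 + k] = static_cast<std::int16_t>(s3 - s2 * 2);
    }
}
```

Because all 16 inputs are copied into f before anything is written back, the sketch can update b in place, just as the MMX version loads f0 to f3 before its stores.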
 
Comments
SMART LUBOBYA 28-Sep-10 11:08am    
I am using the Visual Studio 2008 compiler. SSE is enabled.
Sauro Viti 28-Sep-10 11:14am    
With the SSE instruction set enabled, it is possible that the compiler itself unrolls your non-MMX version using SSE... Try disabling the SSE instruction set ;-)
SMART LUBOBYA 29-Sep-10 5:09am    
Reason for my vote of 5
Automatic vote of 5 for accepting answer.
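
One way to test the suggestion above is to time both versions under each compiler setting. The following is only a sketch under assumptions: time_calls and transform_fn are hypothetical names, the routine under test is assumed to take a pointer to a 16-element block of shorts, and a member function such as tom::add would need a small free-function wrapper to fit that signature.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>

// Hypothetical harness, not from the original thread.
using transform_fn = void (*)(std::int16_t*);

// Runs fn `iterations` times, each time on a fresh copy of `input`,
// and returns the total elapsed time in nanoseconds.
long long time_calls(transform_fn fn, const std::int16_t (&input)[16], int iterations)
{
    assert(fn != nullptr && iterations > 0);

    std::int16_t block[16];
    volatile std::int16_t sink = 0;  // discourages the optimizer from dropping the work

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::memcpy(block, input, sizeof block);  // every call sees the same data
        fn(block);
        sink = block[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Timing the scalar and the MMX routine this way, once with SSE code generation enabled and once with it disabled, should show whether compiler vectorization explains the gap. The copy inside the loop is included in the measurement, so only differences between runs are meaningful, and a large iteration count helps to smooth out timer noise.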

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)