The mullo instruction does not seem to give a good result; I expected 6,000,000 for each result. What is the proper use of this instruction?

C++
#include "stdafx.h"
#include <mmintrin.h>

int _tmain(int argc, _TCHAR* argv[])
{
    short B[4]={2000,2000,2000,2000};
    short C[4]={3000,3000,3000,3000};
    __m64*b =(__m64*)B;
    __m64*c =(__m64*)C;
    __m64 r0,r1;
    r0 =_mm_mullo_pi16(*b,*c);
    r1 =_mm_add_pi16(*b,*c);

    return 0;
}
Updated 3-Oct-10 5:14am

1 solution

Are you by any chance getting 36224 as a result? _mm_mullo_pi16 only returns the lower 16 bits of the result. Presumably you should use a _mm_mulhi_pi16 as well to get the whole 32-bit result.

See MSDN for more.

Charles Keepax
 
 
Comments
SMART LUBOBYA 5-Oct-10 8:07am    
yes, with 'mullo' i get 36224 and with 'mulhi', i get 91. more confusing. 2000x3000=6,000,000 for each array pair multiplication. kindly, help.
Charles Keepax 5-Oct-10 9:19am    
That would be about right; you need to stick them together to get your answer. 36224 in hex is 0x8D80 and 91 in hex is 0x5B. Shift the hi result left by 16 bits, then OR them together, and you will get 0x5B8D80, which is hex for 6000000.
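To make the combine step concrete, here is a minimal scalar sketch in plain C with no MMX involved. The helper name combine_hi_lo is hypothetical, made up for illustration only. It assumes hi is the 16-bit value produced by the signed mulhi and lo is the 16-bit value produced by mullo for the same pair of inputs.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any library): rebuilds one signed 32-bit
   product from the two 16-bit halves that mulhi and mullo produce for the
   same pair of inputs. */
int32_t combine_hi_lo(int16_t hi, uint16_t lo)
{
    /* Shift as unsigned so that a negative hi half is well defined,
       then OR in the low half. */
    uint32_t bits = ((uint32_t)(uint16_t)hi << 16) | lo;

    /* Reinterpret the 32 bits as a signed value (two's complement assumed). */
    return (int32_t)bits;
}
```

With the values from this thread, combine_hi_lo(91, 36224) should come out as 0x5B8D80, i.e. 6000000. Using AND instead of OR would give 0, because the shifted hi half and the lo half have no set bits in common.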
SMART LUBOBYA 7-Oct-10 4:00am    
i tried to follow your suggested steps above. i still need help. here is what i tried:
#include "stdafx.h"
#include <mmintrin.h>

int _tmain(int argc, _TCHAR* argv[])
{
short B[4]={2000,2000,2000,2000};
short C[4]={3000,3000,3000,3000};
__m64*b =(__m64*)B;
__m64*c =(__m64*)C;
__m64 r0,r1,r2,r3,r4;
r0 =_mm_mullo_pi16(*b,*c);
r1 =_mm_mulhi_pi16(*b,*c);
r2 =_mm_slli_pi16(r1,15);
r3 =_mm_and_si64(r2,r0);
r4 =_mm_add_pi16(*b,*c);
_mm_empty();
return 0;

}
SMART LUBOBYA 7-Oct-10 4:01am    
which shift should i use?
Charles Keepax 7-Oct-10 4:27am    
Your variables r0-r4 are effectively arrays of four 16-bit numbers. A 16-bit number can only hold 0 to 65535 if unsigned, or -32768 to 32767 if signed, which is what you are using. 6,000,000 is too large to store in a 16-bit number, so you will need to extract it into a 32-bit number. The following code (assuming int is 32-bit on your system) does this for the first number. It is almost certainly a poor way to use the MMX extensions, but the best way to use them would depend rather heavily on context.

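unsigned short B[4]={2000,2000,2000,2000};
unsigned short C[4]={3000,3000,3000,3000};
unsigned short D[4]={0,0,0,0}; // receives the low 16 bits of each product
unsigned short E[4]={0,0,0,0}; // receives the high 16 bits of each product
__m64*b =(__m64*)B;
__m64*c =(__m64*)C;
__m64*d =(__m64*)D;
__m64*e =(__m64*)E;

*d =_mm_mullo_pi16(*b,*c);
*e =_mm_mulhi_pi16(*b,*c); // treats inputs as signed, so values must not exceed 32767
_mm_empty(); // clear the MMX state before any floating-point code runs

int temp = E[0];
temp = (temp<<16) | D[0]; // (0x5B<<16) | 0x8D80 = 0x5B8D80 = 6000000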
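If all four products are needed, one possible approach is to stay in MMX registers and interleave the low and high halves with the unpack intrinsics instead of extracting them one by one. The sketch below assumes an x86 target with MMX support and a compiler that provides mmintrin.h. The function name mul4_16x16_to_32 is hypothetical, and memcpy is used in place of the pointer casts above only to sidestep alignment and aliasing concerns.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiplies four signed 16-bit pairs and writes the
   four full 32-bit products to out[0..3]. */
void mul4_16x16_to_32(const int16_t a[4], const int16_t b[4], int32_t out[4])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);   /* load 4 x 16-bit values */
    memcpy(&vb, b, sizeof vb);

    __m64 lo = _mm_mullo_pi16(va, vb);   /* low 16 bits of each product */
    __m64 hi = _mm_mulhi_pi16(va, vb);   /* high 16 bits of each signed product */

    /* Interleave the 16-bit words so that each 32-bit lane holds
       (hi << 16) | lo for one product (x86 is little-endian). */
    __m64 p01 = _mm_unpacklo_pi16(lo, hi);   /* products 0 and 1 */
    __m64 p23 = _mm_unpackhi_pi16(lo, hi);   /* products 2 and 3 */

    memcpy(&out[0], &p01, sizeof p01);
    memcpy(&out[2], &p23, sizeof p23);

    _mm_empty();   /* clear the MMX state before any floating-point code runs */
}
```

Called with {2000,2000,2000,2000} and {3000,3000,3000,3000}, this should fill out with four copies of 6000000. Whether this beats a plain scalar loop depends on the surrounding code, so treat it as a starting point.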
