Introduction to MMX Programming
By Alex Fr

This article shows an example of image processing using the Intel MMX™ technology.
VC7.1, Win2K, WinXP, Win2003, MFC, VS.NET2003, Dev
Visual Studio .NET 2003 supports a set of MMX Intrinsics that allow MMX instructions to be used directly from C++ code, without writing assembly. Reading the MSDN MMX topics [2] together with the Intel Software manuals [1] is a good way to learn the basics of MMX programming.
MMX technology implements the SIMD (single-instruction, multiple-data) execution model. Consider the following programming task: adding some value to each element of a BYTE array. The algorithm for this task may be written as follows:
for each b in array
b = b + n
In more detail:

for each b in array
{
load b into a register
add n to the register
write the result from the register back to memory
}
Processors with Intel MMX support have eight 64-bit registers, each of which may hold 8 bytes, 4 words, or 2 double-words. MMX is a set of instructions that load numeric data (bytes, words, double-words) into the MMX registers, perform arithmetic and logical operations on them, and write the results back to memory. Using MMX technology, the algorithm may be written as follows:

for each 8 members in array
{
load 8 members to the MMX register
add n to each byte in one operation
write the result from the register back into memory
}
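The two pseudocode loops can be modeled in portable C++ as follows. This is a sketch without MMX (the function name AddToBytes is hypothetical): the inner 8-step loop stands in for what a single MMX instruction does on a whole register, and a scalar tail loop handles arrays whose length is not a multiple of 8. Like `b = b + n` on a BYTE, the addition wraps around past 255.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds n to every byte of data, 8 bytes per block.
// Plain C++, no MMX; the sum wraps around like b = b + n on a BYTE.
void AddToBytes(std::uint8_t* data, std::size_t count, std::uint8_t n)
{
    assert(data != nullptr || count == 0);
    std::size_t i = 0;

    // Main loop: one block of 8 bytes per iteration. With MMX, this inner
    // loop would be one instruction working on a whole 64-bit register.
    for (; i + 8 <= count; i += 8)
    {
        for (std::size_t lane = 0; lane < 8; ++lane)
            data[i + lane] = static_cast<std::uint8_t>(data[i + lane] + n);
    }

    // Tail: the 0..7 bytes that do not fill a complete block of 8.
    for (; i < count; ++i)
        data[i] = static_cast<std::uint8_t>(data[i] + n);
}
```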
A C++ programmer using MMX Intrinsics doesn't work with the MMX registers directly. Instead, there is a 64-bit (8-byte) __m64 type and a set of functions that perform arithmetic and logical operations on it. The C++ compiler takes care of register allocation and code optimization.
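The idea that one 64-bit register, or one __m64 value, can hold 8 bytes, 4 words or 2 double-words can be modeled with an ordinary 64-bit integer. The Reg64 struct below is hypothetical (it is not part of the MMX Intrinsics) and only reads lanes; lane 0 is always the least significant part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 64-bit MMX register (or one __m64 value).
// The same 64 bits can be read as 8 bytes, 4 words or 2 double-words;
// lane 0 is the least significant part.
struct Reg64
{
    std::uint64_t bits;

    std::uint8_t Byte(int k) const      // k = 0..7
    {
        assert(k >= 0 && k < 8);
        return static_cast<std::uint8_t>(bits >> (8 * k));
    }

    std::uint16_t Word(int k) const     // k = 0..3
    {
        assert(k >= 0 && k < 4);
        return static_cast<std::uint16_t>(bits >> (16 * k));
    }

    std::uint32_t DWord(int k) const    // k = 0..1
    {
        assert(k >= 0 && k < 2);
        return static_cast<std::uint32_t>(bits >> (32 * k));
    }
};
```

On x86, which is little-endian, loading 8 consecutive bytes from memory puts the byte at the lowest address into lane 0, so the first pixel of each group of 8 ends up in byte lane 0.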
The Visual C++ MMXSwarm sample [4] shows the use of MMX technology in image processing. It contains a set of wrapper classes that simplify working with MMX Intrinsics, and shows how to perform image processing operations on various types of images (monochrome, 24-bit RGB, 32-bit RGB, etc.). This article is a simple introduction to C++ MMX programming; anyone interested in this technology is strongly encouraged to read the MMXSwarm sample.
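MMX Intrinsics are available through the emmintrin.h file (it pulls in mmintrin.h, where the MMX intrinsics themselves are declared): #include <emmintrin.h>. Since MMX Intrinsics are expanded inline by the compiler rather than being library functions, there are no .lib files to link.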
Variables of type __m64 are automatically aligned on 8-byte boundaries.
To check whether the processor supports MMX, use the cpuid assembly instruction. See details in this sample and in the Intel Software manuals [1].
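A minimal sketch of such a check, assuming a compiler that provides a CPUID helper: MSVC's `__cpuid` from `<intrin.h>` or GCC/Clang's `__get_cpuid` from `<cpuid.h>`. CPUID leaf 1 reports MMX support in bit 23 of EDX. The function name CpuSupportsMMX is hypothetical, and the demo project's own check may look different.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Hypothetical helper: true if CPUID leaf 1 reports MMX (EDX bit 23).
bool CpuSupportsMMX()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 1);                    // info[3] receives EDX
    return ((info[3] >> 23) & 1) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                    // CPUID leaf 1 is not available
    return ((edx >> 23) & 1u) != 0;
#else
    return false;                        // not an x86 target, so no MMX
#endif
}
```

Every x86-64 processor implements MMX, so on 64-bit x86 the function should always return true; the check matters mainly for older 32-bit processors.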
Consider adding 1 to a BYTE variable that has the value 255. In wraparound mode the result is 0 (the carry bit is ignored); in saturation mode the result is 255. The same effect occurs at the low end of the range: for example, 1 - 2 = 0 for the BYTE type in saturation mode (and 255 in wraparound mode). MMX addition and subtraction instructions each come in saturated and wraparound variants (for example, paddusb and paddb). The demo project for this article uses only the saturated variants.
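The two modes can be demonstrated on single bytes in portable C++. This sketch uses hypothetical helper names; per byte lane, the wraparound functions behave like paddb/psubb and the saturating ones like paddusb/psubusb.

```cpp
#include <cassert>
#include <cstdint>

// Wraparound: keep only the low 8 bits of the exact result.
std::uint8_t AddWrap(std::uint8_t a, std::uint8_t b) { return std::uint8_t(a + b); }
std::uint8_t SubWrap(std::uint8_t a, std::uint8_t b) { return std::uint8_t(a - b); }

// Unsigned saturation: clamp the exact result to the range 0..255.
std::uint8_t AddSat(std::uint8_t a, std::uint8_t b)
{
    int r = int(a) + int(b);
    return std::uint8_t(r > 255 ? 255 : r);
}

std::uint8_t SubSat(std::uint8_t a, std::uint8_t b)
{
    int r = int(a) - int(b);
    return std::uint8_t(r < 0 ? 0 : r);
}
```

With these helpers, 255 + 1 gives 0 (wraparound) or 255 (saturation), and 1 - 2 gives 255 (wraparound) or 0 (saturation).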
MMX8 is an SDI application that performs simple processing on a monochrome image with 8 bits per pixel. The source image or the result of its processing is shown in the window. The new ATL class CImage is used to load the image from resources and to show it in the window. Two operations are performed on the image: inversion and brightness change. Each operation may be done in one of the following ways:
C++ image inversion function:
```cpp
void CImg8Operations::InvertImageCPlusPlus(
    BYTE* pSource,
    BYTE* pDest,
    int nNumberOfPixels)
{
    for ( int i = 0; i < nNumberOfPixels; i++ )
    {
        *pDest++ = 255 - *pSource++;
    }
}
```

The best way to find the required MMX instruction is to read the Intel Software manuals [1]. The name of the required assembly MMX instruction can be found in the short MMX technology overview (Volume 1, Chapter 8). The detailed instruction definition is in Volume 2, and it also gives the name of the corresponding C++ compiler intrinsic. Some C++ MMX intrinsics are composite (translated into more than one assembly instruction); these have to be looked up directly in the MSDN documentation [2].
The summary of all MMX instructions used in the MMX8 sample is shown in the following table:
| Required Function | Assembly Instruction | MMX Intrinsic |
|---|---|---|
| Empty the MMX state (prevents collisions with floating-point operations) | emms | _mm_empty |
| Unsigned subtraction with saturation of each byte in two 64-bit operands | psubusb | _mm_subs_pu8 |
| Unsigned addition with saturation of each byte in two 64-bit operands | paddusb | _mm_adds_pu8 |
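The following sketch is a plain C++ reference model of what paddusb/_mm_adds_pu8 and psubusb/_mm_subs_pu8 compute on 64-bit operands. The names AddsPu8 and SubsPu8 are hypothetical and no MMX is used; a model like this can be handy for checking MMX results on test data.

```cpp
#include <cassert>
#include <cstdint>

// Model of paddusb: each of the 8 byte lanes is added independently with
// unsigned saturation; no carry crosses from one lane into the next.
std::uint64_t AddsPu8(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t r = 0;
    for (int lane = 0; lane < 8; ++lane)
    {
        unsigned x = unsigned(a >> (8 * lane)) & 0xFFu;
        unsigned y = unsigned(b >> (8 * lane)) & 0xFFu;
        unsigned s = x + y;
        if (s > 0xFFu) s = 0xFFu;                 // clamp at 255
        r |= std::uint64_t(s) << (8 * lane);
    }
    return r;
}

// Model of psubusb: per-lane unsigned subtraction, clamped at 0.
std::uint64_t SubsPu8(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t r = 0;
    for (int lane = 0; lane < 8; ++lane)
    {
        unsigned x = unsigned(a >> (8 * lane)) & 0xFFu;
        unsigned y = unsigned(b >> (8 * lane)) & 0xFFu;
        unsigned d = x > y ? x - y : 0u;          // clamp at 0
        r |= std::uint64_t(d) << (8 * lane);
    }
    return r;
}
```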
Image inversion function in C++ with MMX Intrinsics:
```cpp
void CImg8Operations::InvertImageC_MMX(
    BYTE* pSource,
    BYTE* pDest,
    int nNumberOfPixels)
{
    __int64 nAllOnes = 0;
    nAllOnes = ~nAllOnes;               // 0xffffffffffffffff

    // 8 pixels are processed in one loop
    int nLoop = nNumberOfPixels/8;

    __m64* pIn = (__m64*) pSource;      // input pointer
    __m64* pOut = (__m64*) pDest;       // output pointer
    __m64 tmp;                          // work variable

    _mm_empty();                        // emms

    __m64 n1 = Get_m64(nAllOnes);

    for ( int i = 0; i < nLoop; i++ )
    {
        tmp = _mm_subs_pu8 (n1 , *pIn); // Unsigned subtraction with
                                        // saturation.
                                        // tmp = n1 - *pIn for each byte
        *pOut = tmp;

        pIn++;                          // next 8 pixels
        pOut++;
    }

    _mm_empty();                        // emms

    // Remaining pixels (fewer than 8) are processed without MMX.
    for ( int i = nLoop * 8; i < nNumberOfPixels; i++ )
        pDest[i] = (BYTE) (255 - pSource[i]);
}

__m64 CImg8Operations::Get_m64(__int64 n)
{
    union __m64__m64
    {
        __m64 m;
        __int64 i;
    } mi;

    mi.i = n;
    return mi.m;
}
```

Since the functions execute in a very short time, I call them a number of times to see a significant difference. Calculation times on my computer:
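Because 255 - x can never go below 0 for a byte, the saturation in InvertImageC_MMX never actually triggers, and for an 8-bit value 255 - x is simply the bitwise complement ~x. The following portable sketch (hypothetical function InvertImage64, no MMX) uses that identity to invert 8 pixels per step with an ordinary 64-bit integer; with MMX the same idea would be an XOR with an all-ones value (pxor, `_mm_xor_si64`). The sketch also handles the 0..7 pixels that do not fill a complete block of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: dest[i] = 255 - src[i] for an 8-bit image.
// For a byte, 255 - x == ~x, so complementing a 64-bit word inverts
// 8 pixels at once. memcpy avoids alignment and aliasing problems.
void InvertImage64(const std::uint8_t* src, std::uint8_t* dest, std::size_t count)
{
    assert((src != nullptr && dest != nullptr) || count == 0);
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8)
    {
        std::uint64_t block;
        std::memcpy(&block, src + i, 8);
        block = ~block;                  // 8 inversions in one operation
        std::memcpy(dest + i, &block, 8);
    }
    for (; i < count; ++i)               // leftover 0..7 pixels
        dest[i] = std::uint8_t(255 - src[i]);
}
```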
Brightness is changed in the simplest way: by adding a value to, or subtracting it from, each pixel of the image. The conversion functions are slightly more complicated because they need two different branches, one for positive and one for negative changes.
C++ function for changing an image brightness:
```cpp
void CImg8Operations::ChangeBrightnessCPlusPlus(
    BYTE* pSource,
    BYTE* pDest,
    int nNumberOfPixels,
    int nChange)
{
    if ( nChange > 255 )
        nChange = 255;
    else if ( nChange < -255 )
        nChange = -255;

    BYTE b = (BYTE) abs(nChange);

    int i, n;

    if ( nChange > 0 )
    {
        for ( i = 0; i < nNumberOfPixels; i++ )
        {
            n = (int)(*pSource++ + b);

            if ( n > 255 )
                n = 255;

            *pDest++ = (BYTE) n;
        }
    }
    else
    {
        for ( i = 0; i < nNumberOfPixels; i++ )
        {
            n = (int)(*pSource++ - b);

            if ( n < 0 )
                n = 0;

            *pDest++ = (BYTE) n;
        }
    }
}
```

Changing the image brightness using C++ with MMX Intrinsics:
```cpp
void CImg8Operations::ChangeBrightnessC_MMX(
    BYTE* pSource,
    BYTE* pDest,
    int nNumberOfPixels,
    int nChange)
{
    if ( nChange > 255 )
        nChange = 255;
    else if ( nChange < -255 )
        nChange = -255;

    BYTE b = (BYTE) abs(nChange);

    int i;      // declared here because all loops below reuse it

    // make 64 bits value with b in each byte
    __int64 c = b;

    for ( i = 1; i <= 7; i++ )
    {
        c = c << 8;
        c |= b;
    }

    // 8 pixels are processed in one loop
    int nNumberOfLoops = nNumberOfPixels / 8;

    __m64* pIn = (__m64*) pSource;      // input pointer
    __m64* pOut = (__m64*) pDest;       // output pointer
    __m64 tmp;                          // work variable

    _mm_empty();                        // emms

    __m64 nChange64 = Get_m64(c);

    if ( nChange > 0 )
    {
        for ( i = 0; i < nNumberOfLoops; i++ )
        {
            tmp = _mm_adds_pu8(*pIn, nChange64);    // Unsigned addition
                                                    // with saturation.
                                                    // tmp = *pIn + nChange64
                                                    // for each byte
            *pOut = tmp;

            pIn++;                                  // next 8 pixels
            pOut++;
        }
    }
    else
    {
        for ( i = 0; i < nNumberOfLoops; i++ )
        {
            tmp = _mm_subs_pu8(*pIn, nChange64);    // Unsigned subtraction
                                                    // with saturation.
                                                    // tmp = *pIn - nChange64
                                                    // for each byte
            *pOut = tmp;

            pIn++;                                  // next 8 pixels
            pOut++;
        }
    }

    _mm_empty();                        // emms

    // Remaining pixels (fewer than 8) are processed without MMX.
    for ( i = nNumberOfLoops * 8; i < nNumberOfPixels; i++ )
    {
        int n = ( nChange > 0 ) ? pSource[i] + b : pSource[i] - b;
        pDest[i] = (BYTE) ( n > 255 ? 255 : ( n < 0 ? 0 : n ) );
    }
}
```

Notice that the sign of the nChange parameter is checked once, outside the loop, and not thousands of times inside it. Calculation times on my computer:
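The shift-and-OR loop in ChangeBrightnessC_MMX replicates the byte b into all 8 lanes of a 64-bit value. The sketch below (hypothetical names, plain C++) reproduces that loop and shows an equivalent single multiplication by 0x0101010101010101; mmintrin.h also offers `_mm_set1_pi8` for building such a value directly as an __m64.

```cpp
#include <cassert>
#include <cstdint>

// Same shift-and-OR loop as in ChangeBrightnessC_MMX, on an unsigned type.
std::uint64_t BroadcastByteLoop(std::uint8_t b)
{
    std::uint64_t c = b;
    for (int i = 1; i <= 7; i++)
        c = (c << 8) | b;
    return c;
}

// Equivalent: 0x0101010101010101 has a 1 in every byte, and since b <= 255
// the eight shifted copies of b never overlap, so no carries occur.
std::uint64_t BroadcastByteMul(std::uint8_t b)
{
    return std::uint64_t(b) * 0x0101010101010101ULL;
}
```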
The MMX32 project performs operations on a 32-bit-per-pixel RGB image. The operations are inversion and changing the image color balance (multiplying each color component by a coefficient).
MMX multiplication is more complicated than addition or subtraction, because the result of a multiplication is not the same size as its operands. For example, if the operands are of type BYTE, the result needs a WORD. This requires additional conversions, so the difference between the C++ and MMX execution times is minimal (5-10%).
Changing an image color balance using C++ with MMX Intrinsics:
```cpp
void CImg32Operations::ColorsC_MMX(
    BYTE* pSource,
    BYTE* pDest,
    int nNumberOfPixels,
    float fRedCoefficient,
    float fGreenCoefficient,
    float fBlueCoefficient)
{
    // Coefficients as 8.8 fixed-point numbers (factor * 256).
    // Factors should stay in the range 0..1: larger values overflow
    // the 16-bit products computed by _mm_mullo_pi16 below.
    int nRed = (int)(fRedCoefficient * 256.0f);
    int nGreen = (int)(fGreenCoefficient * 256.0f);
    int nBlue = (int)(fBlueCoefficient * 256.0f);

    // make multiplication coefficient
    // (word lanes from low to high: blue, green, red, 0 for alpha)
    __int64 c = 0;
    c = nRed;
    c = c << 16;
    c |= nGreen;
    c = c << 16;
    c |= nBlue;

    __m64 nNull = _m_from_int(0);       // null
    __m64 tmp = _m_from_int(0);         // work variable

    _mm_empty();                        // emms

    __m64 nCoeff = Get_m64(c);

    DWORD* pIn = (DWORD*) pSource;      // input pointer
    DWORD* pOut = (DWORD*) pDest;       // output pointer

    for ( int i = 0; i < nNumberOfPixels; i++ )
    {
        tmp = _m_from_int(*pIn);                // tmp = *pIn (write to low
                                                // 32 bits)

        tmp = _mm_unpacklo_pi8(tmp, nNull );    // convert low 4 bytes of
                                                // tmp to 4 words;
                                                // high byte of each word
                                                // is taken from nNull

        tmp = _mm_mullo_pi16 (tmp , nCoeff);    // multiply each word in
                                                // tmp by the word in nCoeff,
                                                // keeping the low word of
                                                // each result

        tmp = _mm_srli_pi16 (tmp , 8);          // shift each word in tmp
                                                // right by 8 bits (/256)

        tmp = _mm_packs_pu16 (tmp, nNull);      // Pack with unsigned
                                                // saturation.
                                                // Convert 4 words from tmp
                                                // to 4 bytes and write them
                                                // to low 32 bits of tmp.
                                                // Convert 4 words from nNull
                                                // to 4 bytes and write them
                                                // to high 32 bits of tmp.

        *pOut = _m_to_int(tmp);                 // *pOut = tmp (low 32 bits)

        pIn++;
        pOut++;
    }

    _mm_empty();                        // emms
}
```

See additional details in the demo project source code.
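To make the steps of ColorsC_MMX easier to follow, here is a scalar model of one loop iteration in portable C++. It is a sketch, not the demo's code: the function name ScalePixelModel is hypothetical, and it assumes the pixel's low byte is blue (B, G, R, A order in memory), which matches the order of the words in nCoeff. Writing it out shows two consequences of the MMX sequence: the alpha byte always becomes 0, because the fourth coefficient word is 0, and a factor much above 1.0 overflows the 16-bit product, so the result wraps around instead of saturating.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one ColorsC_MMX iteration for a 32-bit
// pixel with blue in the low byte. nRed/nGreen/nBlue are factor * 256.
std::uint32_t ScalePixelModel(std::uint32_t pixel, int nRed, int nGreen, int nBlue)
{
    // Word lanes of nCoeff: 0 = blue, 1 = green, 2 = red, 3 = 0 (alpha).
    const int coeff[4] = { nBlue, nGreen, nRed, 0 };
    std::uint32_t result = 0;

    for (int lane = 0; lane < 4; ++lane)
    {
        // _mm_unpacklo_pi8 with zero: widen the byte to a 16-bit word.
        std::uint16_t w = std::uint16_t((pixel >> (8 * lane)) & 0xFFu);
        // _mm_mullo_pi16: keep only the low 16 bits of the product.
        w = std::uint16_t(w * coeff[lane]);
        // _mm_srli_pi16 by 8: logical shift right, i.e. divide by 256.
        w = std::uint16_t(w >> 8);
        // _mm_packs_pu16: saturate to 0..255 (w is already <= 255 here).
        std::uint32_t b = w > 255 ? 255u : w;
        result |= b << (8 * lane);
    }
    return result;
}
```

For example, with nRed = 256 (factor 1.0), nGreen = 128 (factor 0.5) and nBlue = 0, the pixel 0x80402010 becomes 0x00401000.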
The MMXSwarm C++ sample uses both MMX and integer SSE2 instructions.
Last Updated: 8 Jul 2003. Editor: Rob Manderson
Copyright 2003 by Alex Fr. Everything else Copyright © CodeProject, 1999-2010.