CImage pixel access performance optimization

23 Jul 2007, CPOL, 3 min read
The article describes a simple performance improving wrapper for the CImage class.

Introduction

As you probably know, especially if you've found this article through Google :), the pixel access performance of the ATL CImage class is terrible: every GetPixel or SetPixel call is routed through GDI. Although this is a fairly common complaint, I could not find even a simple (and certainly not the best :) ) solution published anywhere. Based on the Bitmap usage extension library, I've created a simple wrapper class that provides pixel access by reading and writing the bitmap bits directly. With minor refactoring, it could easily be separated from CImage to provide pixel access to standalone DIBs as well.

The class is designed so that projects already using CImage can be optimized or extended easily, and so that new projects can use CImage without worrying about pixel access performance.

The wrapper class public interface and usage

C++
class CImagePixelAccessOptimizer
{
public:
    CImagePixelAccessOptimizer( CImage* _image );
    CImagePixelAccessOptimizer( const CImage* _image );
    ~CImagePixelAccessOptimizer();

    COLORREF GetPixel( int _x, int _y ) const;
    void SetPixel( int _x, int _y, const COLORREF _color );
};

If you need fast per-pixel access in your code, all you have to do is create a temporary stack variable of the CImagePixelAccessOptimizer class and route your SetPixel and GetPixel calls through that optimizer object instead of calling the CImage object directly. An example from my turf is a trivial image rotation:

C++
CImagePixelAccessOptimizer tempImageOpt( pTmpImage );
CImagePixelAccessOptimizer currImageOpt( pCurrentImage );
for( unsigned x=0; x < uOrgWidth; ++x )
{
    for( unsigned y=0; y < uOrgHeight; ++y )
    {
        tempImageOpt.SetPixel( uOrgHeight - y - 1, x, currImageOpt.GetPixel( x, y ) );
    }
}

It's probably not the fastest way to rotate images, but it works, and shows the point quite well.

Some internals

The class encapsulates simple techniques, found here and there, for reading pixel information directly from the DIB bits according to their native format. Using a temporary object keeps the calling code as simple as possible while still leaving room for optimization. Everything that stays constant between GetPixel and SetPixel calls is computed once and remembered in the constructor of the CImagePixelAccessOptimizer class: the row width of the DIB, the palette table, and the image dimensions.
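
As a rough illustration of the values worth caching, here is a minimal sketch that does not depend on CImage. The names DibRowBytes and PixelAccessCache are hypothetical and are not taken from the article's source. The only fact it relies on is that DIB scan lines are padded to a multiple of 4 bytes. With a real CImage, the same information is presumably available from CImage::GetBits() and CImage::GetPitch(); note that the pitch is negative for bottom-up DIBs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the article's source): number of bytes one
// DIB scan line occupies. DIB rows are padded to a multiple of 4 bytes.
inline std::size_t DibRowBytes( std::size_t widthInPixels, unsigned bitCount )
{
    const std::size_t bitsPerRow = widthInPixels * bitCount;
    return ( ( bitsPerRow + 31 ) / 32 ) * 4;
}

// Hypothetical cache of everything that stays constant between pixel calls.
struct PixelAccessCache
{
    std::uint8_t* bits;      // first byte of row 0
    std::size_t   rowBytes;  // padded row size, computed once
    unsigned      bitCount;  // 1, 4, 8, 16, 24 or 32
    int           width;
    int           height;

    PixelAccessCache( std::uint8_t* b, int w, int h, unsigned bpp )
        : bits( b )
        , rowBytes( DibRowBytes( static_cast< std::size_t >( w ), bpp ) )
        , bitCount( bpp )
        , width( w )
        , height( h )
    {
    }

    // Cheap bounds check that uses only the cached dimensions.
    bool PositionOK( int x, int y ) const
    {
        return x >= 0 && x < width && y >= 0 && y < height;
    }
};
```

For a 3-pixel-wide 24-bit image, for example, the 9 bytes of pixel data per row are padded to 12.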

Thanks to this, the GetPixel and SetPixel methods may be really fast, coming down to just a single switch statement and a quite simple table indirection or two.

C++
inline COLORREF CImagePixelAccessOptimizer::GetPixel( int _x, int _y ) const
{
    ASSERT( PositionOK( _x, _y ) );

    FOR_GET_SET_PIXEL_ASSERT( const COLORREF color = m_image->GetPixel( _x, _y ) );

    const RGBQUAD* rgbResult = NULL;
    RGBQUAD tempRgbResult;
    switch( m_bitCnt )
    {
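    case 1:        //Monochrome
        // Test the pixel's bit and turn the result into palette index 0 or 1;
        // the raw masked value (0x80, 0x40, ...) is not a valid index.
        rgbResult = &m_colors[ ( *(m_bits + m_rowBytes*_y + _x/8) & 
                                    (0x80 >> _x%8) ) ? 1 : 0 ];
        break;
    case 4:
        // Even x lives in the high nibble (shift it down), odd x in the low nibble.
        rgbResult = &m_colors[ (_x&1) ? ( *(m_bits + m_rowBytes*_y + _x/2) & 0x0f )
                                      : ( *(m_bits + m_rowBytes*_y + _x/2) >> 4 ) ];
        break;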
    case 8:
        rgbResult = &m_colors[ *(m_bits + m_rowBytes*_y + _x) ];
        break;
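    case 16:
        {
            // Assumes the default 5-5-5 layout (no BI_BITFIELDS masks).
            const WORD packed = *(LPWORD)(m_bits + m_rowBytes*_y + _x*2);
            const BYTE blue5  = (BYTE)(0x001F & packed);
            const BYTE green5 = (BYTE)(0x001F & (packed >> 5));
            const BYTE red5   = (BYTE)(0x001F & (packed >> 10));

            // Expand each 5-bit channel to 8 bits; the raw 0..31 values
            // would come out nearly black.
            tempRgbResult.rgbBlue  = (BYTE)((blue5  << 3) | (blue5  >> 2));
            tempRgbResult.rgbGreen = (BYTE)((green5 << 3) | (green5 >> 2));
            tempRgbResult.rgbRed   = (BYTE)((red5   << 3) | (red5   >> 2));
            rgbResult = &tempRgbResult;
        }
        break;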
    case 24:
        rgbResult = (LPRGBQUAD)(m_bits + m_rowBytes*_y + _x*3);
        break;
    case 32:
        rgbResult = (LPRGBQUAD)(m_bits + m_rowBytes*_y + _x*4);
        break;
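    default:
        //error: unsupported bit depth
        ASSERT( false );
        // Return early so that release builds do not dereference the NULL rgbResult below.
        return CLR_INVALID;
    }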

    const COLORREF rgbResultColorRef = RGB( rgbResult->rgbRed, 
                   rgbResult->rgbGreen, rgbResult->rgbBlue );
    GET_SET_PIXEL_ASSERT( rgbResultColorRef == color );

    return rgbResultColorRef;
}
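
The bit manipulation for the packed formats is easy to get wrong, so it can help to check it in isolation. The following is a minimal sketch on plain byte buffers, independent of CImage. The helper names are hypothetical, and the 16-bit expansion assumes a 5-5-5 layout and uses a common bit-replication scheme that may not match GDI's own conversion bit for bit.

```cpp
#include <cstdint>

// 1 bpp: the leftmost pixel of each byte is the most significant bit.
inline unsigned PaletteIndex1bpp( const std::uint8_t* row, int x )
{
    return ( row[ x / 8 ] >> ( 7 - x % 8 ) ) & 0x01;
}

// 4 bpp: even x is stored in the high nibble, odd x in the low nibble.
inline unsigned PaletteIndex4bpp( const std::uint8_t* row, int x )
{
    const std::uint8_t packed = row[ x / 2 ];
    return ( x & 1 ) ? ( packed & 0x0Fu ) : ( packed >> 4 );
}

// Expand a 5-bit channel to 8 bits by replicating its top bits,
// so that 0 maps to 0 and 31 maps to 255.
inline std::uint8_t Expand5To8( unsigned value5 )
{
    return static_cast< std::uint8_t >( ( value5 << 3 ) | ( value5 >> 2 ) );
}
```

Both index helpers return a value that can be used directly as an index into a 2-entry or 16-entry palette.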

Debugging

If you run into problems where colors come out wrong or land in the wrong places, try uncommenting the following line:

C++
#define ENABLE_GET_SET_PIXEL_VERIFICATION

It will enable checks in which the optimized results will be compared with the behavior provided by the CImage class itself - please report any issues that you find.

The code used for the checks may be seen in the above example. If ENABLE_GET_SET_PIXEL_VERIFICATION is defined, then GET_SET_PIXEL_ASSERT becomes a "standard" ASSERT ( ;) ) statement, and FOR_GET_SET_PIXEL_ASSERT becomes just the enclosed statement. If ENABLE_GET_SET_PIXEL_VERIFICATION is not defined, then both defines give empty statements. Thanks to this, you can enable additional code and assertions using that code with a single define while keeping the code clean and simple at the same time (no three line #ifdefs).
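
The two macros might be defined roughly as follows. This is a sketch based only on the description above, not the actual definitions from the header. It uses the standard assert in place of ASSERT so that it compiles outside MFC/ATL, and SlowReferenceSquare and FastSquare are hypothetical stand-ins for CImage::GetPixel and the optimized path.

```cpp
#include <cassert>

// Comment this line out to compile the checks away.
#define ENABLE_GET_SET_PIXEL_VERIFICATION

#ifdef ENABLE_GET_SET_PIXEL_VERIFICATION
    // Verification on: assert the condition, keep the enclosed statement.
    #define GET_SET_PIXEL_ASSERT( condition )      assert( condition )
    #define FOR_GET_SET_PIXEL_ASSERT( statement )  statement
#else
    // Verification off: both macros expand to nothing useful.
    #define GET_SET_PIXEL_ASSERT( condition )      ( (void)0 )
    #define FOR_GET_SET_PIXEL_ASSERT( statement )
#endif

// Hypothetical slow reference implementation (valid for v >= 0).
inline int SlowReferenceSquare( int v )
{
    int result = 0;
    for( int i = 0; i < v; ++i )
        result += v;
    return result;
}

// Hypothetical fast path that is verified against the reference
// only when ENABLE_GET_SET_PIXEL_VERIFICATION is defined.
inline int FastSquare( int v )
{
    FOR_GET_SET_PIXEL_ASSERT( const int expected = SlowReferenceSquare( v ) );
    const int result = v * v;
    GET_SET_PIXEL_ASSERT( result == expected );
    return result;
}
```

Because the macro argument keeps its commas inside parentheses, a call such as m_image->GetPixel( _x, _y ) can be passed to FOR_GET_SET_PIXEL_ASSERT without extra wrapping.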

By default, CImagePixelAccessOptimizer does not use the additional verification, as it would bring us back to where we started performance-wise :).

Success story ;)

Using this method, I have eliminated practically all pixel access performance issues from ImageViewer, my simple image viewing and bad-pixel detecting program. In a matter of hours, pixel access performance went from a major usability issue to a non-issue, and for you it will take seconds. :)

To be honest, it's probably not the best idea to use the built-in CImage class at all, but if you're already there or don't want to install or link third-party libraries into your project, then this simple wrapper, located in a single header, may be just the thing you need. You get it for free, with one catch :). While running the code from the Bitmap usage extension library, I found an issue that caused a "memory can't be read" error: the code copied a whole RGBQUAD structure from the end of the 24-bit DIB, so the reserved member of the RGBQUAD structure lay outside the memory allocated for the DIB. If you find anything like this, or images on which the code does not work correctly, please let me know.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Comments and Discussions

 
Re: Performance
Rafal Struzyk, 24-Jul-07 6:25
Btw, I decided to check this: even Visual Studio 2003 Standard edition (a supposedly non-optimizing compiler) compiles this switch into a jump through a lookup table, like this:
switch( m_bitCnt )
00412FE0 mov eax,dword ptr [this]
00412FE3 mov ecx,dword ptr [eax+10h]
00412FE6 mov dword ptr [ebp-18h],ecx
00412FE9 mov edx,dword ptr [ebp-18h]
00412FEC sub edx,1
00412FEF mov dword ptr [ebp-18h],edx
00412FF2 cmp dword ptr [ebp-18h],1Fh
00412FF6 ja $L187955+1Bh (413149h)
00412FFC mov eax,dword ptr [ebp-18h]
00412FFF movzx ecx,byte ptr [eax+41318Fh]
00413006 jmp dword ptr [ecx*4+413173h]
{
case 1: //Monochrome
rgbResult = &m_colors[ *(m_bits + m_rowBytes*_y + _x/8) & (0x80 >> _x%8) ];
0041300D mov edx,dword ptr [this]
00413010 mov eax,dword ptr [edx+14h]
...

I also ran an experiment: I removed the switch so that the code handles only a single "test" image, and I tried a variant that uses virtual function calls:
No switch at all:
Took 469, avg: 469.000000
Took 594, avg: 531.500000
Took 422, avg: 495.000000
Took 594, avg: 519.750000
Took 422, avg: 500.200000
Took 578, avg: 513.166667
Took 422, avg: 500.142857
Took 578, avg: 509.875000
Took 437, avg: 501.777778
Took 578, avg: 509.400000

Standard switch statement:
Took 547, avg: 547.000000
Took 704, avg: 625.500000
Took 484, avg: 578.333333
Took 687, avg: 605.500000
Took 500, avg: 584.400000
Took 687, avg: 601.500000
Took 500, avg: 587.000000
Took 687, avg: 599.500000
Took 500, avg: 588.444444
Took 687, avg: 598.300000

Virtual function calls:
Took 500, avg: 500.000000
Took 703, avg: 601.500000
Took 453, avg: 552.000000
Took 641, avg: 574.250000
Took 453, avg: 550.000000
Took 640, avg: 565.000000
Took 469, avg: 551.285714
Took 641, avg: 562.500000
Took 469, avg: 552.111111
Took 640, avg: 560.900000

Manual inline no switch:
Took 329, avg: 329.000000
Took 453, avg: 391.000000
Took 312, avg: 364.666667
Took 453, avg: 386.750000
Took 297, avg: 368.800000
Took 438, avg: 380.333333
Took 312, avg: 370.571429
Took 437, avg: 378.875000
Took 313, avg: 371.555556
Took 437, avg: 378.100000

Manual inline, switch:
Took 437, avg: 437.000000
Took 609, avg: 523.000000
Took 406, avg: 484.000000
Took 625, avg: 519.250000
Took 407, avg: 496.800000
Took 609, avg: 515.500000
Took 391, avg: 497.714286
Took 593, avg: 509.625000
Took 406, avg: 498.111111
Took 610, avg: 509.300000

So there is a small difference between the switch and the virtual function call, but at ~5% it's not really noticeable, and that's with Visual Studio Standard, which did not inline the GetPixel and SetPixel methods or optimize them much. Most of the gain comes from inlining the calls to GetPixel and SetPixel (another hint against virtual function calls); in that case, the switch implementation is about 10% faster than virtual function calls.

The full version of Visual Studio, or gcc, would probably hoist the switch out of the two loops that iterated over the images in my test (the rotation code from the article above), and the performance would then be comparable with "Manual inline no switch", which beats virtual function calls by roughly 1.5x in the numbers above.

best regards
Rafal
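
The hoisting that the comment above hopes the optimizer will do can also be done by hand. The following is a minimal sketch on a plain byte buffer, not code from the article or the benchmark. The function names are hypothetical, and only the 24-bit and 32-bit formats are handled. The per-pixel loop is a template on the pixel size, so the format switch runs once per image instead of once per pixel.

```cpp
#include <cstddef>
#include <cstdint>

// Inner loop specialized at compile time for one pixel size.
// Sums the blue channel, which is the first byte of a BGR(A) pixel.
template< unsigned BytesPerPixel >
std::uint64_t SumBlue( const std::uint8_t* bits, std::size_t rowBytes,
                       int width, int height )
{
    std::uint64_t sum = 0;
    for( int y = 0; y < height; ++y )
    {
        const std::uint8_t* row = bits + rowBytes * static_cast< std::size_t >( y );
        for( int x = 0; x < width; ++x )
            sum += row[ static_cast< std::size_t >( x ) * BytesPerPixel ];
    }
    return sum;
}

// The format switch is evaluated once, outside both loops.
inline std::uint64_t SumBlueAnyFormat( const std::uint8_t* bits, std::size_t rowBytes,
                                       int width, int height, unsigned bitCnt )
{
    switch( bitCnt )
    {
    case 24: return SumBlue< 3 >( bits, rowBytes, width, height );
    case 32: return SumBlue< 4 >( bits, rowBytes, width, height );
    default: return 0;   // other bit depths are omitted in this sketch
    }
}
```

The trade-off is that every whole-image operation needs its own templated loop, whereas the per-pixel GetPixel and SetPixel interface keeps the calling code unchanged.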