
Fast Dyadic Image Scaling with Haar Transform

By Chesnokov Yuriy, 18 Oct 2007
 

Introduction

This article presents a fast dyadic image down sampling class based on the Haar transform. It extends the BaseFWT2D class from my other article, 2D Fast Wavelet Transform Library for Image Processing, for this specific purpose. It is MMX optimized and applies wherever image processing calls for dyadic down sampling by a factor of 2, 4, 8, 16, 32 ... pow(2, N). I use this code as a preprocessing step in face detection.

Background

You need to be familiar with the Haar transform.

Using the Code

I've put together a console project that allocates an RGB array for a 640x480 image and runs the down sampling several times to gather statistics and output the average time. I used a precision time counter that I downloaded from The Code Project some time ago. On my 2.2GHz TravelMate running Vista, down sampling this image to 80x60 (eight times smaller along each side) takes 5-6ms.
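
The averaging loop can be sketched with the standard std::chrono clock in place of the precision counter. This is only an illustration under my own assumptions: averageMs is a hypothetical helper that is not part of the project, which uses its own tic()/toc() pair.

```cpp
#include <chrono>

// Hypothetical helper (not part of the article's project): calls `work`
// `runs` times and returns the mean wall-clock time of one call in milliseconds.
template <typename Work>
double averageMs(unsigned int runs, Work work)
{
    using timer_clock = std::chrono::steady_clock;
    if (runs == 0)
        return 0.0;

    const timer_clock::time_point start = timer_clock::now();
    for (unsigned int i = 0; i < runs; i++)
        work();
    const std::chrono::duration<double, std::milli> total = timer_clock::now() - start;

    return total.count() / runs;
}
```

With an initialized ImageResize object it could be called as averageMs(20, [&] { resize.resize(pBGR); }). Timing the whole loop at once, rather than each call separately, keeps the clock overhead out of the measurement.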

The classes in the project are:

  • vec1D //1D vector wrapper
  • vec2D //2D vector wrapper
  • BaseFWT2D //abstract base class for 2D FWT
  • Haar : public BaseFWT2D //Haar based down sampling
  • ImageResize //provides RGB data down sampling

You can learn about vec1D and BaseFWT2D from my 2D Fast Wavelet Transform Library for Image Processing article and about vec2D from my other article 2D Vector Class Wrapper SSE Optimized for Math Operations.

The ImageResize class contains three objects of class Haar for red, green and blue channels down sampling. First, you need to initialize the ImageResize object to specific width, height and down sampling ratio:

  • void init(unsigned int w, unsigned int h, float zoom = 0.125f);

The zoom parameter is the down sampling factor: the resulting image is 1/zoom times smaller along each dimension. The default (0.125f) gives an image down sampled 8 times. Only zoom values of 1/2, 1/4, 1/8, ... 1/pow(2,N) are supported.
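
If you want to check a zoom value before passing it to init(), a test for exact powers of two could look like the sketch below. dyadicLevels is a hypothetical helper of mine and not part of ImageResize; I do not know how the class itself reacts to other values.

```cpp
#include <cmath>

// Hypothetical helper (not part of ImageResize): returns N such that
// zoom == 1 / 2^N, or -1 when zoom is not a dyadic down sampling factor.
int dyadicLevels(float zoom)
{
    if (!(zoom > 0.0f) || zoom >= 1.0f)
        return -1;

    int exponent = 0;
    // frexp splits zoom into mantissa * 2^exponent with mantissa in [0.5, 1).
    const float mantissa = std::frexp(zoom, &exponent);

    // An exact power of two always has mantissa 0.5: 0.125f == 0.5 * 2^-2.
    if (mantissa != 0.5f)
        return -1;

    return 1 - exponent;   // zoom == 2^(exponent - 1) == 1 / 2^(1 - exponent)
}
```

For the default zoom of 0.125f this gives N = 3, that is three successive halvings of the image.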

Then you can proceed with down sampling incoming images with either of the overloaded functions:

  • int resize(const unsigned char* pBGR);
  • int resize(const unsigned char* pR, const unsigned char* pG, 
                const unsigned char* pB) const;

The first one takes an interleaved stream in which the first byte of each triplet is the blue channel and the last one is red (BGR order). The second takes the three channels in separate buffers.

//your bitmap data goes in that fashion
//unsigned char* pBGR = new unsigned char[width*height*3];

unsigned int width = 640;
unsigned int height = 480;
float zoom = 0.25;

ImageResize resize;
resize.init(width, height, zoom);

//keep resizing incoming data after initialization.
resize.resize(pBGR);

To access down sampled image, the following functions are defined:

  • char** getr() const;
  • char** getg() const;
  • char** getb() const;

Note they provide 2D char pointers to the data in char range -128 ... 127.

//print out resized red channel
char** pr = resize.getr();
for(unsigned int y = 0; y < height * zoom; y++) {
        for(unsigned int x = 0; x < width * zoom; x++)
                wprintf(L" %d", (pr[y][x] + 128));
        wprintf(L"\n");
}
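
To turn the three down sampled channels back into a single bitmap buffer, the rows can be interleaved again while undoing the -128 offset. The sketch below assumes the same BGR byte order that resize() takes as input; packBGR is a hypothetical helper and not part of the ImageResize class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ImageResize): packs three planar channels
// stored as char rows in the range -128 ... 127 into one interleaved BGR
// buffer of unsigned bytes in the range 0 ... 255.
std::vector<unsigned char> packBGR(char* const* pr, char* const* pg, char* const* pb,
                                   unsigned int w, unsigned int h)
{
    std::vector<unsigned char> bgr(static_cast<std::size_t>(w) * h * 3);
    std::size_t i = 0;
    for (unsigned int y = 0; y < h; y++) {
        for (unsigned int x = 0; x < w; x++) {
            // Undo the -128 offset before narrowing back to a byte.
            bgr[i++] = static_cast<unsigned char>(pb[y][x] + 128);
            bgr[i++] = static_cast<unsigned char>(pg[y][x] + 128);
            bgr[i++] = static_cast<unsigned char>(pr[y][x] + 128);
        }
    }
    return bgr;
}
```

Here w and h are the dimensions of the down sampled image, for example (unsigned int)(width * zoom) and (unsigned int)(height * zoom). Rows in the returned buffer are tightly packed, so a bitmap format that pads each row to a multiple of 4 bytes needs an extra copy step.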

You can also access down sampled gray version of the RGB bitmap after resize() call with:

  • inline const vec2D* gety() const;

It returns a pointer to a vec2D holding the gray image. I've written an SSE optimized rgb2y(int r, int g, int b) function that converts a single RGB triplet to a gray pixel, but this version of the class currently uses plain floating point arithmetic and relies on the compiler's SSE optimization instead. That actually runs slightly faster than my hand-written SSE function (I have to look into that later).

The Haar extension of BaseFWT2D is pretty simple. I've provided implementations of the virtual functions BaseFWT2D::transrows() and BaseFWT2D::transcols(). I have not implemented BaseFWT2D::synthrows() and BaseFWT2D::synthcols(), since this is a down sampling class and does not do up sampling yet. Both functions are MMX optimized. The math behind the Haar transform here is that you take 2 consecutive pixels and calculate their mean. So you first halve the size of the image along the horizontal direction and then do the same along the vertical. This is easy column wise, but within a single row you have to separate the even and odd pixels and then average them in parallel.

I do it this way:

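unsigned char* sour;   //source row, 16 consecutive pixels are read per step

__m64 m00FF;
m00FF.m64_u64 = 0x00FF00FF00FF00FF;   //keeps the low byte of every 16-bit word

__m64 *msour = (__m64 *)sour;

//even coeffs: pixels 0, 2, ... 14 packed into 8 bytes
__m64 even = _mm_packs_pu16(_mm_and_si64
    (*msour, m00FF), _mm_and_si64(*(msour + 1), m00FF));
//odd coeffs: pixels 1, 3, ... 15 packed into 8 bytes
__m64 odd = _mm_packs_pu16(_mm_srli_pi16(*msour, 8), _mm_srli_pi16(*(msour + 1), 8));

msour += 2;   //advance to the next 16 pixels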
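
For reference, one complete Haar level (rows first, then columns) can also be written in plain scalar C++. The sketch below is not the MMX code from the project: haarHalve is a hypothetical function, it works on unsigned bytes in a flat buffer, assumes even dimensions, and rounds each average down, which may differ from the rounding in the optimized version.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not the MMX code from the project): one Haar
// down sampling level on a single 8-bit channel stored row by row.
// w and h are assumed to be even; the result is (w / 2) x (h / 2).
std::vector<unsigned char> haarHalve(const std::vector<unsigned char>& src,
                                     unsigned int w, unsigned int h)
{
    const unsigned int hw = w / 2, hh = h / 2;

    // Rows first: average each even pixel with its odd neighbour.
    std::vector<unsigned char> rows(static_cast<std::size_t>(hw) * h);
    for (unsigned int y = 0; y < h; y++)
        for (unsigned int x = 0; x < hw; x++) {
            const unsigned int a = src[static_cast<std::size_t>(y) * w + 2 * x];
            const unsigned int b = src[static_cast<std::size_t>(y) * w + 2 * x + 1];
            rows[static_cast<std::size_t>(y) * hw + x] = static_cast<unsigned char>((a + b) / 2);
        }

    // Then columns: average each even row with the odd row below it.
    std::vector<unsigned char> dst(static_cast<std::size_t>(hw) * hh);
    for (unsigned int y = 0; y < hh; y++)
        for (unsigned int x = 0; x < hw; x++) {
            const unsigned int a = rows[static_cast<std::size_t>(2 * y) * hw + x];
            const unsigned int b = rows[static_cast<std::size_t>(2 * y + 1) * hw + x];
            dst[static_cast<std::size_t>(y) * hw + x] = static_cast<unsigned char>((a + b) / 2);
        }
    return dst;
}
```

Calling it N times in a row, each time on the previous result with halved dimensions, gives a down sampling of pow(2, N).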

Points of Interest

The Haar class could be modified to use SSE2 integer intrinsics for even faster processing. I hope to implement that later and submit an update; if someone is eager to add SSE2 support in the meantime, please let me know. I bet SSE2 could do the same 640x480 to 80x60 down sampling in about 1-2ms.
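
A possible starting point for such a port is sketched below: the same even/odd split as in the MMX code, done on 128-bit registers, followed by a byte average. This is an untimed sketch under my own assumptions, not code from the project. haarRowSSE2 is a hypothetical function that works on unsigned bytes, and _mm_avg_epu8 rounds the average up, which may not match the rounding of the existing class.

```cpp
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAAR_HAVE_SSE2 1
#endif

// Hypothetical sketch, not code from the project: averages neighbouring pixel
// pairs of one row. n is the number of source pixels and is assumed to be even;
// dst receives n / 2 pixels. Averages round up, as _mm_avg_epu8 does.
void haarRowSSE2(const unsigned char* src, unsigned char* dst, std::size_t n)
{
    std::size_t i = 0;
#ifdef HAAR_HAVE_SSE2
    const __m128i m00FF = _mm_set1_epi16(0x00FF);
    for (; i + 32 <= n; i += 32) {
        const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i + 16));
        // even pixels: low byte of every 16-bit word
        const __m128i even = _mm_packus_epi16(_mm_and_si128(a, m00FF), _mm_and_si128(b, m00FF));
        // odd pixels: high byte of every 16-bit word
        const __m128i odd = _mm_packus_epi16(_mm_srli_epi16(a, 8), _mm_srli_epi16(b, 8));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i / 2), _mm_avg_epu8(even, odd));
    }
#endif
    // Scalar tail (and fallback without SSE2), with the same rounding.
    for (; i + 2 <= n; i += 2)
        dst[i / 2] = static_cast<unsigned char>((src[i] + src[i + 1] + 1) / 2);
}
```

Each iteration of the vector loop consumes 32 source pixels and writes 16, twice as many as the MMX version handles per step.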

History

  • 18th October, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

Chesnokov Yuriy
Engineer
Russian Federation
Member
Former Cambridge University postdoc (http://www-ucc-old.ch.cam.ac.uk/research/yc274-research.html), Department of Chemistry, Unilever Centre for Molecular Informatics, where I worked on the problem of complexity analysis of cardiac data.
 
As a subsidiary result we achieved 1st place in the annual PhysioNet/Computers in Cardiology Challenge 2006: QT Interval Measurement (http://physionet.org/challenge/2006/)
 
My research interests are: digital signal processing in medicine, image and video processing, pattern recognition, AI, computer vision.
 
My recent publications are:
 
Complexity and spectral analysis of the heart rate variability dynamics for distant prediction of paroxysmal atrial fibrillation with artificial intelligence methods. Artificial Intelligence in Medicine. 2008. V43/2. PP. 151-165 (http://dx.doi.org/10.1016/j.artmed.2008.03.009)
 
Face Detection C++ Library with Skin and Motion Analysis. Biometrics AIA 2007 TTS. 22 November 2007, Moscow, Russia. (http://www.dancom.ru/rus/AIA/2007TTS/ProgramAIA2007TTS.html)
 
Screening Patients with Paroxysmal Atrial Fibrillation (PAF) from Non-PAF Heart Rhythm Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 459–463 (http://www.cinc.org/archives/2007/pdf/0459.pdf)
 
Distant Prediction of Paroxysmal Atrial Fibrillation Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 455-459 (http://www.cinc.org/archives/2007/pdf/0455.pdf)
 
Individually Adaptable Automatic QT Detector. Computers in Cardiology 2006. V. 33. PP. 337-341 (http://www.cinc.org/archives/2006/pdf/0337.pdf)


Comments and Discussions

 
Question: A question | member zeebedee | 14 Aug '12 - 6:06
Hi, thanks for your article. I am a C# developer and have limited experience in C++. At the start of your article you said that you perform certain operations before the face detection process. My questions are: what processing do you perform on an image before face detection, what does it do, and what process do you use to perform the actual face detection? Are you talking about OpenCV?
 
Thanks for your time...
Answer: Re: A question | member Chesnokov Yuriy | 20 Aug '12 - 5:10
no, it is not opencv
the face detection is based on PCA transform and neural networks for classification
Чесноков

Question: How to use resize code with HBITMAP? | member Member 3345095 | 9 Sep '09 - 2:52
No doubt, this is a great article. Though i am not with a DSP background, yet i was able to understand almost all of this article. I highly appreciate Chesnokov Yuriy for writing such a great article and hope see more from him in future.
 
My question is as follows:
I have a HBITMAP and want to resize it. After resize, i want to have another HBITMAP or an object of CxImage, CBitmapEx etc. Can you plz tell me how to do this?
 
Regards,
Answer: Re: How to use resize code with HBITMAP? | member Chesnokov Yuriy | 9 Sep '09 - 20:02
Many thanks.
All you need is to get the raw pixel data from HBITMAP handle, or any other object
 
Чесноков

General: Re: How to use resize code with HBITMAP? | member Member 3345095 | 9 Sep '09 - 22:50
Thanks for the reply.
 
Actually my problem is to reconstruct the image after calling resize on pixel data. As told in article, i m accessing down sampled image using the following functions:
char** getr() const
char** getg() const
char** getb() const
 
But i do not know how to correctly combine individual colors arrays into a BITMAP again.
 
What i m doing is to take a screen capture of desktop (1366 X 768 ) and scale down it to 640 x 480. Following is the code snipped that i m using:
 

//following 4 lines gets the desktop screenshot
HDC hDesktopCompatibleDC=CreateCompatibleDC(hDesktopDC);
HBITMAP DesktopCompatibleBitmap=CreateDIBSection(hDesktopDC,&bmpInfo,DIB_RGB_COLORS,&pBits,NULL,0);
SelectObject(hDesktopCompatibleDC,hDesktopCompatibleBitmap);
BitBlt(hDesktopCompatibleDC,0,0,nWidth,nHeight,hDesktopDC,0,0,SRCCOPY|CAPTUREBLT);
 
//After capturing desktop screenshot, pBits contains the pixels
//Now, i m resizing the captured bitmap with the help of ur code.
 
unsigned int steps = 20;
unsigned int width = 1366;
unsigned int height = 768;
ImageResize resize;
resize.init(width, height, 0.5f);
for (unsigned int i = 0; i < steps; i++)
{
resize.resize((const unsigned char *)pBits );
}
char** pr = resize.getr();
char** pb = resize.getb();
char** pg = resize.getg();
 
DWORD *ptr=new DWORD[1366/2*480/2];
memset(ptr,0,1366/2*480/2);
 
for(unsigned int y = 0; y < height * 0.125f; y++)
{
for(unsigned int x = 0; x < width * 0.125f; x++)
{
ptr[y* ((int)(height * 0.5f)) + x]=RGB(pr[y][x]+128 ,pg[y][x]+128,pb[y][x]+128);
}

}
 
HBITMAP h=::CreateBitmap(1366/2,480/2,1,32,ptr);
CxImage abc;
abc.CreateFromHBITMAP(h);
if (abc.IsValid())
abc.Save("c:\\scaleddown.bmp",1);

 
when i open "c:\\scaleddown.bmp", i do not see a scaled dwon bitmap but a mix of colors.
 
Looking for ur kind help.
 
Regards,
General: Re: How to use resize code with HBITMAP? | member Member 3345095 | 14 Sep '09 - 22:06
Dear Chesnokov Yuriy,
 
I am Waiting for your reply. I shall be thankful if you can reply at your first preference.
 
Thanks in advance.
Question: use this code under Wince | member wolf_of_it | 13 Nov '08 - 1:55
Hi, Sir:
When I try to use your code under WinCE, I find that this code has compile errors: many MMX functions and types cannot be found.
How to deal with this problem?
 
Thanks.
 
wolf
Answer: Re: use this code under Wince | mvp Chesnokov Yuriy | 13 Nov '08 - 2:51
simple enough ;-) write a non-MMX Haar transform.
 
chesnokov

General: Resizing from 352x288 to 176x144 | member captainc/c++ | 26 Nov '07 - 3:04
Hi,
 
My question is not going to be very interesting, since I am not doing anything with DSP, especially not in postgraduate level. But, as a common people trying to get benefit from your research, I am wondering how long does it take to resize CIF into QCIF in 30fps?
 
And, with your code, what are the measures need to be taken? (implementation)
 
Thanks a bunch.
 
~God Bless The Internet~

Answer: Re: Resizing from 352x288 to 176x144 | member Chesnokov Yuriy | 26 Nov '07 - 3:51
//your bitmap data goes in that fashion
//unsigned char* pBGR = new unsigned char[width*height*3];
 
unsigned int width = 352;
unsigned int height = 288;
float zoom = 0.5;
 
ImageResize resize;
resize.init(width, height, zoom);
 
//keep resizing incoming data after initialization.
resize.resize(pBGR);
 
//the resized image in separate channels
char** r = resize.getr(); //get Red channel
char** g = resize.getg(); //get Green channel
char** b = resize.getb(); //get Blue channel
 
The RGB resizing takes about 5ms on 2Ghz -> 200fps
 

 
chesnokov
General: Re: Resizing from 352x288 to 176x144 | member captainc/c++ | 26 Nov '07 - 4:15
Hi Yuriy,
 
Thanks for your response.
 
This is what I did so far:
if(isVideoSend)
{
//change from CIF to QCIF format
unsigned int steps = 20;
unsigned int width = IMAGE_WIDTH_LOCAL; //352
unsigned int height = IMAGE_HEIGHT_LOCAL; //288
float zoom = 0.5;
unsigned char* pBGR = new unsigned char[width * height * 3];
pBGR = (unsigned char*) data; //pRGB is the Entry point
ImageResize resize;
resize.init(width, height, zoom); //downsized into half
 
wprintf(L" downsampling 352x288 RGB image to 176x144 image\n\n");
 
__int64 ms = 0;
for (unsigned int i = 0; i < steps; i++) {
tic();
resize.resize(pBGR);
ms += toc();
}
 
wprintf(L"\n avrg time: %dms for 640x480 image", int(float(ms) / float(steps)));
//log.WriteString("\n Converting to YUV format..");

//Convert the data from rgb format to YUV format
ConvertRGB2YUV(QCIF_WIDTH,QCIF_HEIGHT,data,yuv); //Yuriy, what's the output point of your source code?
 

// Reset the counter
count=0;
 
//Compress the data...to h263
cparams.format=CPARAM_QCIF;
cparams.inter = CPARAM_INTRA;
cparams.Q_intra = 8;
cparams.data=yuv; // Data in YUV format...
CompressFrame(&cparams, &bits);

// Transmit the compressed frame
//log.WriteString("Transmitting the frame");
dvideo.SendVideoData(cdata,count);
}
but, it gave me error LNK2019: unresolved external symbol "public: __int64 __thiscall CVideoNetDlg::toc(void)"
AND
error LNK2019: unresolved external symbol "public: void __thiscall CVideoNetDlg::tic(void)"
 
I have added __64 toc() and tic()in ".h" and ".cpp" of this respective file.
 
What went wrong? And anything else I have overlooked?
 
~God Bless The Internet~

Question: Re: Resizing from 352x288 to 176x144 | member captainc/c++ | 26 Nov '07 - 15:16
what do toc() and tic() do?
 
~God Bless The Internet~

Question: Re: Resizing from 352x288 to 176x144 | member captainc/c++ | 27 Nov '07 - 21:52
Hi Yuriy!
 
I look at the basefwt.cpp and ImageResize.cpp, and I found out that each of red, green, and blue data from pBRG are stored and later downsized using the function below,
 
int BaseFWT2D::trans(unsigned int scales, unsigned int th)
{
if (m_status <= 0)
return -1;
 
J = scales;
TH = th;
unsigned int w = m_width;
unsigned int h = m_height;
 
for (unsigned int j = 0; j < J; j++) {
transrows(tspec2d, spec2d, w, h);
transcols(spec2d, tspec2d, w, h);
w /= 2;
h /= 2;
TH /= 4;
}
 
return 0;
}
 
Here you took red data in J variable and scales in th variable. //isn't it?
 
How to put them together in pBRG after they've been downsized into 176x144? Does Haar Algorithm solve upsizing?
 
PS: Hey, get a broadband soon will ya!

 
~God Bless The Internet~

Answer: Re: Resizing from 352x288 to 176x144 | member Chesnokov Yuriy | 28 Nov '07 - 3:01
All I can say is that I cannot help you with learning to program.
 
You do not need to call resize.init() multiple times once you have initialized it to a specific width and height.
 
J is the scale, th is threshold.
 
To put everything together use getr(), getg(), getb() :) and read them into an RGB buffer.
 
chesnokov

General: Re: Resizing from 352x288 to 176x144 | member captainc/c++ | 28 Nov '07 - 21:57
:) OK Yuriy,
 
But, I still don't understand about your main(), since I want to sample until I stop video capturing button in MFC.
 
And another thing is I notice that this will make RGB to fit YUV422 conversion, what should I do to make it to fit YUV420 conversion?
 

Thanks a million.
 
~God Bless The Internet~

General: Re: Resizing from 352x288 to 176x144 | member captainc/c++ | 29 Nov '07 - 3:37
// Convert from RGB24 to YUV420
//
int ConvertRGB2YUV(int w,int h,unsigned char *bmp,unsigned int *yuv)
{
/*
unsigned int *u,*v,*y,*uu,*vv;
unsigned int *pu1,*pu2,*pu3,*pu4;
unsigned int *pv1,*pv2,*pv3,*pv4;
unsigned char *r,*g,*b;
int i,j;
*/
 
uu_uu=new unsigned int[w*h];
vv_vv=new unsigned int[w*h];
 
if(uu_uu==NULL || vv_vv==NULL)
return 0;
 

 
y_y=yuv;
u_u=uu_uu;
v_v=vv_vv;
 
// Get r,g,b pointers from bmp image data....
ImageResize imres;
float zoom = 0.5;
if(count==0)
imres.init(IMAGE_WIDTH_LOCAL, IMAGE_HEIGHT_LOCAL, zoom); //downsized into half
 
__int64 ms = 0;

 
for (unsigned int i = 0; i < sizeof(yuv); i++)
{
tic();
imres.resize(bmp);
ms += toc();
}
 
r_temp = imres.getr();
g_temp = imres.getg();
b_temp = imres.getb();
 
// the original containers
//r_r=bmp;
//g_g=bmp+1;
//b_b=bmp+2;
 
//check values of each r , g, b
if(count<2)
{

Msg(TEXT("r_r=0x%d, g_g=0x%d, b_b=0x%d"), r_temp, g_temp, b_temp);
count++;
}
 
My question is, why r_temp, g_temp and b_temp holds 0x000000 address after iteration 1 (count = 0)?

 
~God Bless The Internet~

General: broken link | member double(U) | 20 Nov '07 - 7:29
you wrote :
>> http://vasc.ri.cmu.edu/idb/images/face/frontal_images/images.html link does not work.
 

try this one:
http://vasc.ri.cmu.edu/idb/images/face/frontal_images/
 
In the lower part you will find a link to "tar file" containing all files. This works fine.
 
double(U)
General: Face Detection | member Dark.Elf.ipl | 27 Oct '07 - 0:00
Hi,
I'm working on Face Detection using Haar-like features and/or convolutional NN. What's your approach?
 
Ihor
General: Re: Face Detection | member Chesnokov Yuriy | 27 Oct '07 - 1:24
Hi
 
Image down sampling -> Skin and motion detection -> rough NN, SVM non-faces rejection -> PCA projection, NN classification on image pyramid.
 
You're talking about Viola and Jones approach. Is it better than standard PCA? My code SSE optimized detects at 15fps rate on 640x480 on 2.2Ghz.
 
I'm planning to post it about a week later or so.
 
chesnokov
General: Re: Face Detection | member Dark.Elf.ipl | 27 Oct '07 - 5:59
Hi,
 
Thank you for your answer.
 
>Image down sampling -> Skin and motion detection -> rough NN, SVM non-faces rejection -> PCA projection, NN classification on image pyramid.
 
What detection rate/false alarms have you achieved on CMU test set?
What NN's architecture?
 
>You're talking about Viola and Jones approach. Is it better than standard PCA? My code SSE optimized detects at 15fps rate on 640x480 on 2.2Ghz.
 

General: Re: Face Detection | member Chesnokov Yuriy | 27 Oct '07 - 18:28
I did not test it on the CMU database. Currently I have not got broadband, and I trained it only on my own images collected over the years and on some from friends. It detects me in real time without leaving a single frame undetected. I tested it on some girls' pics from the internet and it gave about a 95% correct rate on the pics I downloaded.
 
chesnokov
General: Re: Face Detection | member Dark.Elf.ipl | 28 Oct '07 - 22:24
Every serious article on FD presents detection rate / false positives on CMU test set. This makes very easy to evaluate different FD methods.
 
Just follow the link
http://vasc.ri.cmu.edu/idb/images/face/frontal_images/images.tar
download and try this set.
 
I think it would be interesting to you (and me) to compare your results with others.
 
Ihor
General: Re: Face Detection | member Chesnokov Yuriy | 28 Oct '07 - 23:57
Absolutely agree, though this is not the face detection article yet.
I'm currently not able to download more than a couple of megs over my GPRS mobile connection, and their http://vasc.ri.cmu.edu/idb/images/face/frontal_images/images.html link does not work.
By the time I post the paper you may evaluate it yourself on their work; don't forget to retrain it on the 200meg CBCL database.
 
chesnokov

General: Re: Face Detection | member double(U) | 20 Nov '07 - 7:31
sorry, something went wrong.
please see the "broken link" post above.
 
double(U)
General: Re: Face Detection (Cont.) | member Dark.Elf.ipl | 27 Oct '07 - 6:09
>You're talking about Viola and Jones approach. Is it better than standard PCA? My code SSE optimized detects at 15fps rate on 640x480 on 2.2Ghz.
 
15fps with motion detection + skin color segmentation + ...?
 
Viola & Jones achieved 64ms on 320x240 image.
Det. rate on CMU test set = 76%/10 false alarms, 88%/31FA, 91%/50FA
 
Ihor

Web04 | 2.6.130523.1 | Last Updated 18 Oct 2007
Article Copyright 2007 by Chesnokov Yuriy
Everything else Copyright © CodeProject, 1999-2013