Easily load JPEG, GIF, BMP, and PNG images and distinguish between text and regular images

Mukit, Ataul

Aug 5, 2010

CPOL

See how easy it is to load JPEG, GIF, BMP, and PNG images with MFC and process bitmap data to determine if the image is a text image/scanned document or a regular picture.

Download source code - 310 KB

Introduction

This article shows how to easily load a JPG, BMP, GIF, or PNG image using the CImage class and process the bitmap data to find out whether the image is a text image or a regular image. A text image here means a scanned document or a screenshot containing many lines of text.

The main purpose of the article is to show an easy way to load JPG, BMP, GIF, and PNG images and work with the raw bitmap data.

Background

Some time back, I was working on a project which loads an image and detects the image skew angle. The skew angle is the angle by which the image is rotated away from its intended orientation, for example because of a problem with scanning or a tilted camera. In a scanned document, the skew can be thought of as the deviation of the text lines from the horizontal (a 0 degree angle).

Therefore, the first task was to load a JPG/BMP image and determine whether it is a scanned document or a normal picture. The algorithms used to detect the skew of a scanned document and a normal image are quite different from each other, so developing a fast mechanism to distinguish between scanned documents and normal photos was very important.

Hence I came up with a pretty fast algorithm to distinguish between scanned documents and other pictures, which was good enough for the images I was dealing with.

The CImage Class

The CImage class is a fabulous helper shared by ATL and MFC, and is available with Microsoft Visual Studio 2005. This class makes loading JPG, BMP, GIF, and PNG images super easy. To use this class, all you have to do is add the following headers:

#include <afxstr.h>
#include <atlimage.h>

Loading an image with this class is super simple. Just call the Load method. For example:

BOOL CScannedDocTestDoc::OnOpenDocument(LPCTSTR lpszPathName)
{
    // here m_Image is a member variable
    HRESULT hr = m_Image.Load(lpszPathName);
    return SUCCEEDED(hr);
}

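Getting hold of the raw bitmap data loaded through the CImage class is also pretty easy. This is how I did it:

// nDataSize is the size of the bitmap data in bytes
// (see GetDataSize in the "Using the code" section)
byte* pBitmapBits = new byte[nDataSize];
if(pBitmapBits != NULL)
{
    // copy the pixel data of the loaded image into our own buffer
    ::GetBitmapBits((HBITMAP)m_Image, nDataSize, pBitmapBits);
}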

Once you get hold of the bitmap data, you first convert it to gray scale bitmap data and then pass it to the function IsTextImage to find out whether the image is a scanned document/text image or a normal picture. This is what the function looks like:

static bool IsTextImage(byte* imageData, int imageWidth, int imageHeight)
{
    const int blacklimit = 20;
    const int greylimit = 140;

    const int contrast_offset = 80;

    // Holds the count of different patterns in the processed data
    long color_pattern_count[200000];
    
    //to avoid division by zero.. set it to 1
    color_pattern_count[B2G] = 1;
    color_pattern_count[G2B] = 1;
    color_pattern_count[G2W] = 1;
    color_pattern_count[W2G] = 1;
    color_pattern_count[B2W] = 1;
    color_pattern_count[W2B] = 1;    
    
    color_pattern_count[B2B] = 1;
    color_pattern_count[W2W] = 1;
    color_pattern_count[G2G] = 1;    
        
    long prev_color[256];
    long cur_color[256];

    int i;
    for(i = 0; i < 256; i++)
    {
        cur_color[i]  =  0;
        prev_color[i] = 0;
    }

    for(i = 0; i <= blacklimit; i++)
    {
        cur_color[i]  = C_B;
        prev_color[i] = P_B;
    }

    for(i = blacklimit + 1 + contrast_offset; i <= greylimit; i++)
    {
        cur_color[i]  = C_G;
        prev_color[i] = P_G;
    }

    for(i = greylimit + 1 + contrast_offset; i <= 255; i++)
    {
        cur_color[i]  = C_W;
        prev_color[i] = P_W;
    }
    
    byte* buffer = imageData;
    
    int y, x;
    
    int line_count = 0;

    byte prev_pixel;
    byte cur_pixel;
    long change_pattern = 0;
    long white_amt = 0;
    long n = -1;
    for(y = 0; y < imageHeight; y+=10)
    {        
        n++;
        white_amt = 0;
        for(x = 1; x < imageWidth; x++)
        {
            prev_pixel = buffer[y*imageWidth + (x-1)];
            cur_pixel = buffer[y*imageWidth + x];
                
            change_pattern = prev_color[prev_pixel] + cur_color[cur_pixel];
            if((prev_color[prev_pixel]) && (cur_color[cur_pixel]))
            {            
                color_pattern_count[change_pattern]++;
                if(change_pattern == W2W)
                {
                    white_amt++;
                }
            }
        }
            
        if(((double)white_amt/(double)x) > 0.85) //it's a white line
        {
            line_count++;
        }
    }
        
    double B2GRatio = (double)abs(color_pattern_count[B2G] - 
           color_pattern_count[G2B]) / (double)color_pattern_count[B2G];
    double G2WRatio = (double)abs(color_pattern_count[G2W] - 
           color_pattern_count[W2G]) / (double)color_pattern_count[G2W];
    double B2WRatio = (double)abs(color_pattern_count[B2W] - 
           color_pattern_count[W2B]) / (double)color_pattern_count[B2W];

    double line_count_ratio = (n != 0) ? (double)line_count/(double)n : 0.0;
        
    if(B2GRatio == 0.0 && G2WRatio == 0.0 && B2WRatio == 0.0)
    {
        B2GRatio = (double)abs(color_pattern_count[B2G] + 
                    color_pattern_count[G2B]) / (double)color_pattern_count[W2W];
        G2WRatio = (double)abs(color_pattern_count[G2W] + 
                    color_pattern_count[W2G]) / (double)color_pattern_count[W2W];
        B2WRatio = (double)abs(color_pattern_count[B2W] + 
                    color_pattern_count[W2B]) / (double)color_pattern_count[W2W];
    }
        
    bool bRet = false;
    if( (B2WRatio > (B2GRatio + G2WRatio))
            || (B2GRatio > (B2WRatio + G2WRatio))  )

    {
        bRet = true;

        if(line_count_ratio < 0.5 || line_count_ratio > 1.0)
        {
            bRet = false;
        }
    }
    return bRet;
}

Keep in mind that the bitmap data passed to this function is expected to be gray scale data generated from the original bitmap, with one byte per pixel and no padding between rows (the function indexes it as y*imageWidth + x).
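The listing relies on constants (P_B, C_B, B2W, and so on) that are defined in the downloadable source and are not shown in the article. The values below are purely hypothetical and only illustrate one way such constants could work: each previous-pixel code and each current-pixel code is non-zero, and every sum of one of each is distinct, so the sum can serve as an index into color_pattern_count.

```cpp
#include <set>

// Hypothetical values, NOT taken from the article's source download.
// Codes for the class of the previous pixel (0 means "unclassified").
const int P_B = 1, P_G = 2, P_W = 3;
// Codes for the class of the current pixel, spaced so sums never collide.
const int C_B = 10, C_G = 20, C_W = 30;

// Each transition index is the sum of one previous code and one current code.
const int B2B = P_B + C_B, B2G = P_B + C_G, B2W = P_B + C_W;
const int G2B = P_G + C_B, G2G = P_G + C_G, G2W = P_G + C_W;
const int W2B = P_W + C_B, W2G = P_W + C_G, W2W = P_W + C_W;

// Returns true when all nine previous/current sums are distinct,
// which is what lets a single sum identify the transition.
bool TransitionCodesAreUnique()
{
    const int prev[] = { P_B, P_G, P_W };
    const int cur[]  = { C_B, C_G, C_W };
    std::set<int> seen;
    for (int p : prev)
        for (int c : cur)
            seen.insert(p + c);
    return seen.size() == 9;
}
```

With codes like these, only nine entries of the counter array are ever touched, so the real constants presumably follow a similar scheme with larger values.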

How IsTextImage Works

The algorithm is simple but effective, and it worked for about 99% of the kinds of images I had to deal with. The main idea is to count how often the color changes between contrasting levels along a row, and to guess from those counts whether the image is a text image or a normal image. In other words, it counts the transitions from black to white and vice versa, and from black to gray and vice versa. If black-to-white transitions (or vice versa) dominate and the other patterns are rare, or in some cases black-to-gray transitions (or vice versa) dominate, the image is assumed to be a text image.

The general concept is that a document or text image contains many short black strokes that form the letters, so the color changes between black and white frequently. For gray-to-black transitions (and vice versa), gray is treated like white, because some text images have a grayish background instead of a white one.
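As a rough illustration of the transition counting idea, here is a minimal sketch for a single row of gray scale pixels. The names Classify, RowCounts, and CountTransitions are made up for this example. The thresholds are the ranges that the listing above ends up with (0 to 20 black, 101 to 140 gray, 221 to 255 white). Unlike IsTextImage, the sketch merges both directions of each transition into one counter and does not reproduce the ratio test.

```cpp
#include <cstddef>
#include <vector>

// Pixel classes; values that fall in the gaps between ranges are ignored.
enum PixelClass { kNone, kBlack, kGray, kWhite };

PixelClass Classify(unsigned char v)
{
    if (v <= 20)              return kBlack; // 0..20
    if (v >= 101 && v <= 140) return kGray;  // 101..140
    if (v >= 221)             return kWhite; // 221..255
    return kNone;
}

struct RowCounts
{
    long blackWhite; // black->white plus white->black
    long blackGray;  // black->gray plus gray->black
    long grayWhite;  // gray->white plus white->gray
};

// Counts class changes between horizontally adjacent pixels of one row.
RowCounts CountTransitions(const std::vector<unsigned char>& row)
{
    RowCounts counts = { 0, 0, 0 };
    for (std::size_t x = 1; x < row.size(); ++x)
    {
        const PixelClass a = Classify(row[x - 1]);
        const PixelClass b = Classify(row[x]);
        if (a == kNone || b == kNone || a == b)
            continue; // unclassified pixel or no change of class

        const bool hasBlack = (a == kBlack || b == kBlack);
        const bool hasWhite = (a == kWhite || b == kWhite);
        if (hasBlack && hasWhite) ++counts.blackWhite;
        else if (hasBlack)        ++counts.blackGray;
        else                      ++counts.grayWhite;
    }
    return counts;
}
```

A row of text on a white background should give a large blackWhite count and small values for the other two, while a photograph tends to spread its transitions across grayWhite and the ignored gaps.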

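The algorithm also takes white lines into account. A sampled row (every 10th row is examined) counts as a white line when more than 85% of it is white. If fewer than 50% of the sampled rows are white lines, the image is rejected as a text image, because text is expected to have white gaps between the blackish or grayish lines. If every sampled row is a white line, the image is rejected as well.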
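The white line check can be sketched on its own as follows. IsWhiteRow and PassesWhiteLineCheck are hypothetical names, and the caller is assumed to pass only the rows it wants sampled. The sketch follows the arithmetic of the listing: the number of white rows is divided by one less than the number of sampled rows, which is what pushes the ratio above 1.0 when every sampled row is white.

```cpp
#include <cstddef>
#include <vector>

// A row is "white" when white-to-white neighbour pairs make up more than
// 85% of its width (white meaning a gray value of 221 or above).
bool IsWhiteRow(const std::vector<unsigned char>& row)
{
    if (row.empty())
        return false;
    long whitePairs = 0;
    for (std::size_t x = 1; x < row.size(); ++x)
        if (row[x - 1] >= 221 && row[x] >= 221)
            ++whitePairs;
    return static_cast<double>(whitePairs) /
           static_cast<double>(row.size()) > 0.85;
}

// Accepts when the white-row ratio lies in [0.5, 1.0].
bool PassesWhiteLineCheck(
    const std::vector<std::vector<unsigned char> >& sampledRows)
{
    long whiteRows = 0;
    for (const std::vector<unsigned char>& row : sampledRows)
        if (IsWhiteRow(row))
            ++whiteRows;

    // Same divisor as the listing: sampled row count minus one.
    const long n = static_cast<long>(sampledRows.size()) - 1;
    const double ratio =
        (n > 0) ? static_cast<double>(whiteRows) / static_cast<double>(n) : 0.0;
    return ratio >= 0.5 && ratio <= 1.0;
}
```

Note that because the pair count is divided by the full width, a very narrow row (fewer than 7 pixels) can never pass the 85% test even when it is entirely white.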

An image with many gray-to-white transitions (and vice versa) is most likely a normal image and not a text image.

Using the code

The code for using the IsTextImage function is as follows:

First, you gather info about the image from the CImage class instance:

int nPitch = m_Image.GetPitch();
// Here m_Image can be a global variable or a member variable of a class

int nWidth    = m_Image.GetWidth();
int nHeight = m_Image.GetHeight();
    
int nBytesPerPixel = m_Image.GetBPP() / 8;

Then use this info to get the size of the data. In the source code uploaded, there is a function:

static unsigned int GetDataSize(unsigned int w, unsigned int h, unsigned int bitdepth);

which can be used to get the data size. Using this, a buffer can be reserved to create space for a gray scale image which is to be passed to IsTextImage. A function like the following can be used to obtain a gray scale image from the bitmap data passed (this function code is also available with the uploaded source code):

static byte* GetGrayScaleImage(const byte* imageData, int width,
       int height, unsigned int bytesPerPixel, int pitch)
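The bodies of these two helpers are only in the download, so the following is a guess at what they might do, not the author's code. DataSizeSketch assumes each row is padded to a multiple of 4 bytes. GrayScaleSketch assumes pixels are stored in B, G, R order with at least 3 bytes per pixel, uses the absolute value of the pitch as the row stride, and applies the common 0.299/0.587/0.114 luminance weights. It returns a tightly packed buffer with one byte per pixel, which is the layout IsTextImage indexes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for GetDataSize: rows padded to 4-byte multiples.
unsigned int DataSizeSketch(unsigned int w, unsigned int h,
                            unsigned int bitdepth)
{
    const unsigned int bytesPerRow = ((w * bitdepth + 31u) / 32u) * 4u;
    return bytesPerRow * h;
}

// Hypothetical stand-in for GetGrayScaleImage.
// Returns an empty vector when the input cannot be converted.
std::vector<unsigned char> GrayScaleSketch(const unsigned char* imageData,
                                           int width, int height,
                                           unsigned int bytesPerPixel,
                                           int pitch)
{
    std::vector<unsigned char> gray;
    if (!imageData || width <= 0 || height <= 0 || bytesPerPixel < 3)
        return gray;

    gray.resize(static_cast<std::size_t>(width) * height);
    // The pitch may be negative for bottom-up bitmaps; only its size is used.
    const std::size_t stride = static_cast<std::size_t>(std::abs(pitch));

    for (int y = 0; y < height; ++y)
    {
        const unsigned char* row = imageData + y * stride;
        for (int x = 0; x < width; ++x)
        {
            const unsigned char* px = row + x * bytesPerPixel; // B, G, R
            // Integer form of 0.299*R + 0.587*G + 0.114*B
            gray[static_cast<std::size_t>(y) * width + x] =
                static_cast<unsigned char>(
                    (px[2] * 299 + px[1] * 587 + px[0] * 114) / 1000);
        }
    }
    return gray;
}
```

Returning a std::vector instead of a raw pointer is a design choice of this sketch; it removes the need for the caller to free the buffer.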

Then pass the gray scale data along with its width and height to:

bool IsTextImage(byte* imageData, int imageWidth, int imageHeight)

which returns true if the image is a text image, and false otherwise.

So to summarize it all, here is the code at a glance:

BOOL CScannedDocTestDoc::OnOpenDocument(LPCTSTR lpszPathName)
{
    m_Image.Destroy();
    HRESULT hr = m_Image.Load(lpszPathName);
    
    if(SUCCEEDED(hr))
    {
        int nPitch = m_Image.GetPitch();

        int nWidth    = m_Image.GetWidth();
        int nHeight = m_Image.GetHeight();
        
        int nBytesPerPixel = m_Image.GetBPP() / 8;

        if(nBytesPerPixel)
        {
            unsigned int nDataSize = 
              ImageFunctions::GetDataSize(nWidth, nHeight, m_Image.GetBPP());
            
            byte* pBitmapBits = new byte[nDataSize];
            if(pBitmapBits != NULL)
            {
                ::GetBitmapBits((HBITMAP)m_Image, nDataSize, pBitmapBits);
                
                byte* pGrayScaleBits = ImageFunctions::GetGrayScaleImage(pBitmapBits, 
                                       nWidth, nHeight, nBytesPerPixel, nPitch);
                if(pGrayScaleBits != NULL)
                {
                    m_bTextImage = 
                      ImageFunctions::IsTextImage(pGrayScaleBits, nWidth, nHeight);
                    // assumes GetGrayScaleImage allocates the buffer with new[]
                    delete[] pGrayScaleBits;
                }

                delete[] pBitmapBits; // allocated with new[], so release with delete[]
            }
        }

        return TRUE;
    }

    return FALSE;
}

I have kept the image routines code inside a namespace called ImageFunctions.

Points of Interest

The IsTextImage function readily returns false when the image is a very colorful picture. Black and white images with sharp edges are harder: if their pattern resembles a scanned document or ordinary writing, they can be mistaken for text images.

Acknowledgements

Tanzim Husain: who was a mentor in my early development years as well as a great inspiration.

History

Article uploaded: 5th August, 2010.