Screen Scraper in Managed Code

Chris Gorecki

Apr 7, 2008

GPL3

Efficient code for finding a sub-image in the desktop window.

Download source - 2.08 MB

Introduction

Finding a sub-image within a large image, such as the screen, can be a time-consuming task. The screen on my laptop contains over a million pixels, and to be sure of finding an image within the screen, it is necessary to examine most of them, possibly more than once.

My goal when writing this article was to write a program that could efficiently find all the occurrences of a sub-image within the screen image, using only MFC and managed code. In the end, I resorted to using unsafe pointers in C# to boost performance, because operational speed was one of my primary criteria.

Background

Screen scraping is an idea that has been around for quite some time. Originally, I believe it was used to refer to the idea of extracting data from web pages by examining the HTML code. A more recent idea is using technologies such as OCR and image processing to extract useful information from other forms of media. The problem of sub-image location that is solved here could have many applications in the areas of search, gaming, AI, and others.

Using the code

The algorithm I adopted first finds a pixel, within the sub-image that is being searched for, that has an infrequently occurring ARGB value. This pixel value, since it is relatively rare, should be more indicative of an occurrence of the sub-image.

private void Init(Bitmap bmImage)
{
    image = new int[bmImage.Width, bmImage.Height];
    Hashtable repeats = new Hashtable();

    for (int x = 0; x < bmImage.Width; x++)
    {
        for (int y = 0; y < bmImage.Height; y++)
        {
            image[x, y] = bmImage.GetPixel(x, y).ToArgb();

            // The pixel value has not been found before
            if (!repeats.ContainsKey(image[x, y]))
            {
                // a = {number times found, x location, y location}
                int[] a = { 1, x, y };
                repeats.Add(image[x, y], a);
            }
            else
            {
                // Increment the number of times the value has been found
                ((int[])repeats[image[x, y]])[0]++;
            }
        }
    }

    // Find the pixel value that has been found the least number of times
    int min = int.MaxValue, ix = -1, iy = -1;
    foreach (DictionaryEntry de in repeats)
    {
        int[] a = (int[])de.Value;
        if (a[0] < min)
        {
            min = a[0];
            ix = a[1];
            iy = a[2];
        }
    }

    pixels.Add(image[ix, iy], new Point(ix, iy));
}

To find the least frequent pixel value, a hash table is populated with each pixel value that occurs in the image, along with the number of times that it occurs and the location where it was first seen. Then, it is a simple matter of iterating through the table to find the value with the minimum number of occurrences. Two class-level data structures are used here. The first is image, which is a 2D array containing the pixel values of the sub-image. The second is the Hashtable pixels, which, in the last line of code, is populated with the least frequently occurring pixel value as the key and the location of that pixel as the value.
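The counting step can also be sketched on its own, away from the Bitmap class. The following is only an illustration under a few assumptions: the class name RarePixel and the method FindRarest are hypothetical and not part of the download, the image is already held as an int[x, y] array of ARGB values, and a generic Dictionary stands in for the Hashtable used in Init.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original source.
public static class RarePixel
{
    // Returns the location of the first-seen pixel whose value occurs
    // least often. image[x, y] holds one ARGB value per pixel.
    public static (int X, int Y) FindRarest(int[,] image)
    {
        int width = image.GetLength(0), height = image.GetLength(1);
        if (width == 0 || height == 0)
            throw new ArgumentException("Image must not be empty.", nameof(image));

        // value -> { times found, x of first occurrence, y of first occurrence }
        var counts = new Dictionary<int, int[]>();
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                if (counts.TryGetValue(image[x, y], out int[] entry))
                    entry[0]++;                       // seen before: bump the count
                else
                    counts.Add(image[x, y], new[] { 1, x, y });
            }
        }

        // Pick the entry with the smallest count
        int[] best = null;
        foreach (int[] entry in counts.Values)
        {
            if (best == null || entry[0] < best[0])
                best = entry;
        }
        return (best[1], best[2]);
    }
}
```

When several values share the minimum count, which one is returned depends on the dictionary's enumeration order, so callers should not rely on a particular choice in that case.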

The first challenge encountered when actually writing the code was capturing the screen image without falling back on the Win32 API. To do this, I resorted to using the SendKeys method to activate the PrintScreen button and then grab the resulting screen image off of the Clipboard. Of course, this has the downside of clearing whatever was on the Clipboard before. When trying to fix the undesirable clearing behavior, I managed to obtain a disconnected context error in relation to COM objects, which was a first for me.

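private static Bitmap getDesktopBitmap()
{
    // Simulate Ctrl+PrintScreen so the screen image lands on the Clipboard
    SendKeys.SendWait("^{PRTSC}");
    // GetImage returns null when the Clipboard holds no image
    Image shot = Clipboard.GetImage();
    if (shot == null)
        throw new InvalidOperationException("No screen image on the Clipboard.");
    Bitmap bm = new Bitmap(shot);
    // This also discards whatever was on the Clipboard before the capture
    Clipboard.Clear();
    return bm;
}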

Once the screen image has been captured, all that remains to be done is to find if there are any occurrences of the sub-image within the screen image.

public unsafe List<Point> findImages()
{
    Bitmap bm = getDesktopBitmap();
    BitmapData bmd = bm.LockBits(new Rectangle(0, 0, bm.Width, bm.Height),
                                ImageLockMode.ReadOnly, bm.PixelFormat);
    List<Point> results = new List<Point>();
    foundRects = new List<Rectangle>();

    try
    {
        for (int y = 0; y < bmd.Height; y++)
        {
            // Horizontal line of pixels in the bitmap data
            byte* scanline = (byte*)bmd.Scan0 + (y * bmd.Stride);

            for (int x = 0; x < bmd.Width; x++)
            {
                int xo = x * PIXLESIZE;
                // Blue, green, and red bytes, with alpha forced to 0xff
                byte[] buff = { scanline[xo], scanline[xo + 1],
                                scanline[xo + 2], 0xff };
                int val = BitConverter.ToInt32(buff, 0);

                // Rare sub-image pixel value found in the desktop image,
                // outside every sub-image located so far
                if (pixels.ContainsKey(val) && notFound(x, y))
                {
                    Point loc = (Point)pixels[val];

                    // Top-left corner the sub-image would have here
                    int sx = x - loc.X;
                    int sy = y - loc.Y;
                    // Subimage occurs in desktop image
                    if (imageThere(bmd, sx, sy))
                    {
                        results.Add(new Point(sx, sy));
                        // Remember the area covered by this occurrence
                        foundRects.Add(new Rectangle(sx, sy, bmImage.Width,
                                                             bmImage.Height));
                    }
                }
            }
        }
    }
    finally
    {
        // Release the locked bits even if the search throws
        bm.UnlockBits(bmd);
    }

    return results;
}

private unsafe bool imageThere(BitmapData bmd, int sx, int sy)
{
    // The candidate area must lie entirely inside the desktop image;
    // otherwise the pointer reads below would leave the bitmap data
    if (sx < 0 || sy < 0 ||
        sx + bmImage.Width > bmd.Width || sy + bmImage.Height > bmd.Height)
        return false;

    for (int iy = 0; iy < bmImage.Height; iy++)
    {
        // Horizontal line of pixels in the bitmap data
        byte* scanline = (byte*)bmd.Scan0 + ((sy + iy) * bmd.Stride);

        for (int ix = 0; ix < bmImage.Width; ix++)
        {
            // Offset into the scan line
            int xo = (sx + ix) * PIXLESIZE;
            // Build a 32-bit ARGB value from the blue, green, and red
            // bytes, forcing the alpha byte to 0xff
            byte[] buff = { scanline[xo], scanline[xo + 1],
                            scanline[xo + 2], 0xff };
            // Pixel value
            int val = BitConverter.ToInt32(buff, 0);

            if (val != image[ix, iy])
                return false;
        }
    }

    return true;
}

private bool notFound(int x, int y)
{
    Point p = new Point(x, y);
    foreach (Rectangle r in foundRects)
    {
        if (r.Contains(p))
            return false;
    }

    return true;
}

The first step in the process is to lock the bits in the bitmap to obtain the bitmap data. This allows us to use a pointer into the bitmap to access the pixel values directly, instead of calling bm.GetPixel(x, y).ToArgb() for every pixel. This is where the necessary performance increase comes from.

To obtain a particular pixel value from the bitmap data, a scan line is first determined. A scan line can be thought of as a single horizontal row in the bitmap. As seen in the line:

scanline = (byte*)bmd.Scan0 + (y * bmd.Stride)

the scan line can be determined by taking the address of the first pixel in the bitmap (Scan0), and adding to it the y position of the scan line (the number of lines it is from the top of the image) multiplied by the number of bytes in each scan line (Stride). We now have a pointer to the row of bytes at height y, which is the row containing the pixel whose value we are trying to find. The x offset into the scan line is simply the number of bytes per pixel times the number of pixels we are looking into the scan line. However, there is a little trouble here. It turns out that using the Print Screen method of capturing the desktop returns a bitmap that uses 32 bits for the RGB values of a pixel, with the last 8 bits being 0xff. Since the image array is populated with the ARGB values of the sub-image bitmap, we must convert from one format to the other. This is achieved by the following lines of code:

int xo = x * PIXLESIZE;
byte[] buff = {scanline[xo],scanline[xo + 1],scanline[xo + 2], 0xff};
int val = BitConverter.ToInt32(buff, 0);

All together, this is functionally equivalent to val = bm.GetPixel(x, y).ToArgb().
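To make the offset arithmetic concrete, here is a small sketch that performs the same lookup on an ordinary byte array instead of a pointer. It is an illustration only: the class ScanLine and the method ReadArgb are hypothetical names, and the buffer is assumed to be laid out like locked bitmap data, with rows stride bytes apart and each pixel stored as blue, green, red, optionally followed by a fourth byte.

```csharp
using System;

// Hypothetical helper, not part of the original source.
public static class ScanLine
{
    // Reads the pixel at (x, y) from a raw pixel buffer and returns it as
    // a 32-bit ARGB value (0xAARRGGBB) with the alpha byte forced to 0xff.
    public static int ReadArgb(byte[] data, int stride, int bytesPerPixel,
                               int x, int y)
    {
        // Start of the row, plus the pixel's position within the row
        int offset = y * stride + x * bytesPerPixel;

        byte b = data[offset];
        byte g = data[offset + 1];
        byte r = data[offset + 2];

        // On a little-endian machine this equals
        // BitConverter.ToInt32(new byte[] { b, g, r, 0xff }, 0)
        return unchecked((int)0xff000000) | (r << 16) | (g << 8) | b;
    }
}
```

Building the value with shifts avoids allocating a four-byte array for every pixel, which may matter when the lookup runs over a million times per screen.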

So, now, all that is left to do is find if the value of the screen image pixel is the same as the rare sub-image pixel value that we placed in the Hashtable earlier. But, first, we check to see if the x, y location of the screen pixel we are examining is contained within an area of a sub-image we have previously located. If it is, we just move on, to avoid finding the same image more than once. The list of rectangles, foundRects, is used for this purpose, as it contains a rectangle of the same dimensions and location as each sub-image that has been found.

To determine if the sub-image occurs in the screen image, imageThere does a pixel-by-pixel examination, and returns true if all the pixels match up. A single different pixel is taken to mean that the sub-image does not occur, and thus false is returned.
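The whole search can be summarized in a pointer-free sketch that works on plain int[x, y] arrays. This is a simplified illustration rather than the code from the download: SubImageSearch, FindAll, and Matches are hypothetical names, the anchor location is passed in by the caller, and the foundRects bookkeeping is left out, so overlapping occurrences are reported as well.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original source.
public static class SubImageSearch
{
    // Returns the top-left corner of every occurrence of 'sub' in 'screen'.
    // (anchorX, anchorY) is the location in 'sub' of the rare pixel used to
    // spot candidate positions.
    public static List<(int X, int Y)> FindAll(int[,] screen, int[,] sub,
                                               int anchorX, int anchorY)
    {
        int sw = screen.GetLength(0), sh = screen.GetLength(1);
        int w = sub.GetLength(0), h = sub.GetLength(1);
        int anchor = sub[anchorX, anchorY];
        var results = new List<(int X, int Y)>();

        for (int y = 0; y < sh; y++)
        {
            for (int x = 0; x < sw; x++)
            {
                if (screen[x, y] != anchor)
                    continue;                 // not a candidate position

                // Where the sub-image's top-left corner would have to be
                int sx = x - anchorX, sy = y - anchorY;
                if (sx < 0 || sy < 0 || sx + w > sw || sy + h > sh)
                    continue;                 // would not fit on the screen

                if (Matches(screen, sub, sx, sy))
                    results.Add((sx, sy));
            }
        }
        return results;
    }

    // Pixel-by-pixel comparison; a single differing pixel rejects the match
    private static bool Matches(int[,] screen, int[,] sub, int sx, int sy)
    {
        for (int iy = 0; iy < sub.GetLength(1); iy++)
            for (int ix = 0; ix < sub.GetLength(0); ix++)
                if (screen[sx + ix, sy + iy] != sub[ix, iy])
                    return false;
        return true;
    }
}
```

The expensive Matches call only runs where the anchor value appears, which is why a rare anchor keeps the search fast and a common one (a white image on a white screen, for example) slows it down.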

Points of interest

There are a couple of things to keep in mind when using this program:

The images loaded must be in 32-bit ARGB format
The time it takes to run depends on the existence of a good unique pixel value

On a 2 GHz Athlon with a 15.4" screen, about 0.5 sec. for most images
About 30 sec. for small white images on a white screen background

The Print Screen functionality was only tested on Windows XP, and may not work the same on Vista, etc.

History

4/8/2008 - Original article.