Image Classification Using File Header Information

May 12, 2006

How to classify images using file header information.

Introduction

This is a program in ANSI C that reads image file headers and classifies the images by type.

Background

I had a lot of images of different types that had lost their file extensions, and no other way of classifying them. So I wrote this small program in ANSI C to classify the images by reading their header information. The code is intended to run on several platforms, such as Windows and Linux.

Input

The program takes one or more file names (including paths) as command-line arguments and processes each file in turn.

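#include <stdio.h>
#include <string.h>   /* memcmp, used by DisplayImageType */

int DisplayImageType(char *pData);   /* the core function, shown below */

int main(int argc, char *argv[])
{
    FILE *File;          /* file handle */
    char pbData[10];     /* buffer to read the file header into */
    int nImageType = 0;
    int nNumberOfFileProcessed;

    /* start from 1 because argv[0] is the executable itself */
    for (nNumberOfFileProcessed = 1;
         nNumberOfFileProcessed < argc;
         nNumberOfFileProcessed++)
    {
        /* "rb": binary mode, so no newline translation on Windows */
        File = fopen(argv[nNumberOfFileProcessed], "rb");
        if (File == NULL)
            continue;    /* skip files that cannot be opened */

        /* read the first 10 bytes; files shorter than that are skipped */
        if (fread(pbData, 1, sizeof pbData, File) == sizeof pbData)
        {
            nImageType = DisplayImageType(pbData);
            printf("%s: type %d\n", argv[nNumberOfFileProcessed], nImageType);
        }
        fclose(File);
    }
    return 0;
}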

The first 10 bytes of each file are read and compared with predefined header signatures. Each file is classified as one of the following image types:

 TIFFINTEL:     "Tiff image for Intel processor"
 TIFFMOTOROLA:  "Tiff image for Motorola processor"
 GIF87a:        "GIF87a Image"
 GIF89a:        "GIF89a Image"
 PNG:           "PNG Image"
 JPEGJFIF:      "JPEG JFIF compliant image"
 JPEGEXIF:      "JPEG EXIF compliant image"
 JPEGAPP2:      "JPEG with APP2 marker"
 JPEGAPP3:      "JPEG with APP3 marker"
 JPEGAPP4:      "JPEG with APP4 marker"
 JPEGAPP5:      "JPEG with APP5 marker"
 JPEGAPP6:      "JPEG with APP6 marker"
 JPEGAPP7:      "JPEG with APP7 marker"
 JPEGAPP8:      "JPEG with APP8 marker"
 JPEGAPP9:      "JPEG with APP9 marker"
 JPEGAPPA:      "JPEG with APPA marker"
 JPEGAPPB:      "JPEG with APPB marker"
 JPEGAPPC:      "JPEG with APPC marker"
 JPEGAPPD:      "JPEG with APPD marker"
 JPEGAPPE:      "JPEG with APPE marker"
 JPEGAPPF:      "JPEG with APPF marker"
 BITMAP:        "Bitmap file"
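The type constants listed above are not defined anywhere in the article. The following is a minimal sketch that gives them values and maps each one back to its description. The enum numbering and the helper name `ImageTypeDescription` are assumptions for illustration, not the author's definitions.

```c
/* Hypothetical numbering: the article does not show how these
   constants are defined, so the values here are an assumption. */
enum ImageType {
    TIFFINTEL = 1, TIFFMOTOROLA, GIF87a, GIF89a, PNG,
    JPEGJFIF, JPEGEXIF,
    JPEGAPP2, JPEGAPP3, JPEGAPP4, JPEGAPP5, JPEGAPP6, JPEGAPP7,
    JPEGAPP8, JPEGAPP9, JPEGAPPA, JPEGAPPB, JPEGAPPC, JPEGAPPD,
    JPEGAPPE, JPEGAPPF,
    BITMAP
};

/* Hypothetical helper: returns the description listed above for a
   type constant, or "Unknown image type" for any other value. */
const char *ImageTypeDescription(int nImageType)
{
    /* must stay in the same order as the enum above */
    static const char *const szDescriptions[] = {
        "Tiff image for Intel processor",
        "Tiff image for Motorola processor",
        "GIF87a Image",
        "GIF89a Image",
        "PNG Image",
        "JPEG JFIF compliant image",
        "JPEG EXIF compliant image",
        "JPEG with APP2 marker", "JPEG with APP3 marker",
        "JPEG with APP4 marker", "JPEG with APP5 marker",
        "JPEG with APP6 marker", "JPEG with APP7 marker",
        "JPEG with APP8 marker", "JPEG with APP9 marker",
        "JPEG with APPA marker", "JPEG with APPB marker",
        "JPEG with APPC marker", "JPEG with APPD marker",
        "JPEG with APPE marker", "JPEG with APPF marker",
        "Bitmap file"
    };

    if (nImageType < TIFFINTEL || nImageType > BITMAP)
        return "Unknown image type";
    return szDescriptions[nImageType - TIFFINTEL];
}
```

A caller can then print `ImageTypeDescription(nImageType)` instead of a bare number. Note that `<windows.h>` already declares a `BITMAP` structure type, so a program that includes it would need a different name for that constant.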

Customization

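The code may be customized and reused to suit your requirements. All you need to do is call the core function, DisplayImageType, from your own program. It takes a buffer holding at least the first 10 bytes of a file and returns the matching type constant.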

int DisplayImageType(char *pData)
{
    /* compare file headers to determine the file type */
    if (!memcmp(pData, szTiffHeaderForIntel, 3))          /* little-endian TIFF */
        return TIFFINTEL;
    else if (!memcmp(pData, szTiffHeaderForMotorola, 3))  /* big-endian TIFF */
        return TIFFMOTOROLA;
    else if (!memcmp(pData, szPNGHeader, 8))
        return PNG;
    else if (!memcmp(pData, szGIF87aHeader, 6))
        return GIF87a;
    else if (!memcmp(pData, szGIF89aHeader, 6))
        return GIF89a;
    else if (!memcmp(pData, szBMPHeader, 2))
    {
        /* bytes 3 to 6 (offsets 2-5) hold the file size;
           bytes 7 to 10 (offsets 6-9) are reserved and must be zero */
        static const char szNull[4] = { 0, 0, 0, 0 };
        if (!memcmp(pData + 6, szNull, 4))
            return BITMAP;
    }
    else if (!memcmp(pData, szJPEGCommonHeader, 3))
    {
        /* the 4th byte is the second byte of the marker after SOI */
        switch ((unsigned char)pData[3])
        {
        case 0xE0: return JPEGJFIF;   /* APP0 */
        case 0xE1: return JPEGEXIF;   /* APP1 */
        case 0xE2: return JPEGAPP2;
        case 0xE3: return JPEGAPP3;
        case 0xE4: return JPEGAPP4;
        case 0xE5: return JPEGAPP5;
        case 0xE6: return JPEGAPP6;
        case 0xE7: return JPEGAPP7;
        case 0xE8: return JPEGAPP8;
        case 0xE9: return JPEGAPP9;
        case 0xEA: return JPEGAPPA;
        case 0xEB: return JPEGAPPB;
        case 0xEC: return JPEGAPPC;
        case 0xED: return JPEGAPPD;
        case 0xEE: return JPEGAPPE;
        case 0xEF: return JPEGAPPF;
        default:   break;             /* not an APPn marker */
        }
    }
    return -1;   /* header not recognized */
}
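Because the APPn markers occupy the contiguous range 0xE0 to 0xEF, the sixteen-case switch can be reduced to a range check on the fourth byte plus its low nibble. A minimal sketch, using a hypothetical helper name `JpegAppMarkerIndex`:

```c
/* Hypothetical helper. Every JPEG file starts with the SOI marker
   FF D8; the next marker begins with FF. If that marker is APPn
   (0xE0..0xEF), its low nibble is n. Returns n (0..15), or -1 if the
   buffer does not start with SOI followed by an APPn marker.
   The buffer must hold at least 4 bytes. */
int JpegAppMarkerIndex(const unsigned char *pData)
{
    if (pData[0] != 0xFF || pData[1] != 0xD8 || pData[2] != 0xFF)
        return -1;                 /* no SOI + marker prefix */
    if ((pData[3] & 0xF0) != 0xE0)
        return -1;                 /* marker is not APP0..APP15 */
    return pData[3] & 0x0F;        /* APPn -> n */
}
```

If the type constants are numbered consecutively from JPEGJFIF to JPEGAPPF (an assumption, since their definitions are not shown), the whole switch becomes `return JPEGJFIF + n;` for any `n >= 0`. As with the switch, a JPEG whose first marker after SOI is not APPn (for example a quantization table, FF DB) is not classified.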

You will also need to define the header signatures.

char szTiffHeaderForMotorola[] = "MM\0*";    /* 4D 4D 00 2A: big-endian TIFF */
char szTiffHeaderForIntel[]    = "II*";      /* 49 49 2A (00): little-endian TIFF */
char szPNGHeader[]             = "\x89PNG\r\n\x1a\n";
char szGIF87aHeader[]          = "GIF87a";
char szGIF89aHeader[]          = "GIF89a";
/* start of every JPEG file: the SOI marker (FF D8) plus the FF
   that begins the next marker */
char szJPEGCommonHeader[]      = "\xFF\xD8\xFF";
/* EOI (end of image) marker; for future use */
char szJPEGCommonEOI[]         = "\xFF\xD9";
/* "BM"; the following 4 bytes hold the file size and the next 4
   are reserved (zero) */
char szBMPHeader[]             = "\x42\x4D";
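Keeping each signature next to its exact length stops the compare length and the signature from drifting apart. The sketch below is an alternative layout, not the author's code: the `Signature` table and the `MatchSignature()` helper are hypothetical, and the table leaves out the BMP reserved-byte check, which would still need to be applied after a "BM" match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each signature stored with its byte length. */
struct Signature {
    const char *szName;
    const char *pBytes;
    size_t nLength;
};

static const struct Signature kSignatures[] = {
    { "TIFF (Intel, little-endian)", "II*\0",             4 },
    { "TIFF (Motorola, big-endian)", "MM\0*",             4 },
    { "PNG",                         "\x89PNG\r\n\x1a\n", 8 },
    { "GIF87a",                      "GIF87a",            6 },
    { "GIF89a",                      "GIF89a",            6 },
    { "JPEG",                        "\xFF\xD8\xFF",      3 },
    { "BMP",                         "BM",                2 },
};

/* Returns the name of the first signature that is a prefix of the
   buffer, or NULL if none matches. nSize is the number of bytes
   actually read, so short files never cause an over-long compare. */
const char *MatchSignature(const unsigned char *pData, size_t nSize)
{
    size_t i;
    for (i = 0; i < sizeof kSignatures / sizeof kSignatures[0]; i++) {
        if (nSize >= kSignatures[i].nLength &&
            memcmp(pData, kSignatures[i].pBytes, kSignatures[i].nLength) == 0)
            return kSignatures[i].szName;
    }
    return NULL;
}
```

Adding a format then means adding one table row rather than another `else if` branch.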

Limitations

For now, the code recognizes only the image types listed above. I hope to update it to handle more types, and future versions will also report further details about each image, such as its resolution and size.
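As an illustration of the kind of detail mentioned above, the dimensions of a PNG image can be read directly from its header. The PNG specification requires the IHDR chunk to come first, so after the 8-byte signature, a 4-byte chunk length and the 4-byte type "IHDR", the width and height follow as 4-byte big-endian integers at offsets 16 and 20. The helper names below are hypothetical, and the sketch needs the first 24 bytes of the file rather than 10.

```c
#include <stddef.h>
#include <string.h>

/* Reads a 4-byte big-endian unsigned integer. */
static unsigned long ReadBigEndian32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Hypothetical helper: extracts width and height from the IHDR chunk.
   Returns 1 on success, 0 if the buffer is shorter than 24 bytes or
   does not start with a PNG signature followed by IHDR. */
int PngDimensions(const unsigned char *pData, size_t nSize,
                  unsigned long *pWidth, unsigned long *pHeight)
{
    static const unsigned char kPng[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n' };
    static const unsigned char kIhdr[4] = { 'I', 'H', 'D', 'R' };

    if (nSize < 24 || memcmp(pData, kPng, 8) != 0 ||
        memcmp(pData + 12, kIhdr, 4) != 0)
        return 0;
    *pWidth  = ReadBigEndian32(pData + 16);  /* offsets 16-19 */
    *pHeight = ReadBigEndian32(pData + 20);  /* offsets 20-23 */
    return 1;
}
```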