Image Classification Using File Header Information






May 12, 2006
How to classify image files using their header information.
Introduction
This is a program in ANSI C that reads image file headers and classifies the images by type.
Background
I had many images of different types whose file extensions had been lost, and no other way of classifying them. So I wrote this small ANSI C program to classify the images by reading their header information. The code runs on several platforms, including Windows and Linux.
Input
The program takes one or more file names (including their paths) as command-line arguments and processes each file in turn.
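#include <stdio.h>
#include <string.h>

int DisplayImageType(char *pData); /* core classification function, defined below */

int main(int argc, char *argv[])
{
    FILE *pFile;                /* file being examined */
    char pbData[10];            /* buffer for the file header */
    int nImageType = 0;
    int nNumberOfFileProcessed;

    /* start from 1 because argv[0] is the program itself */
    for (nNumberOfFileProcessed = 1; nNumberOfFileProcessed < argc; nNumberOfFileProcessed++)
    {
        pFile = fopen(argv[nNumberOfFileProcessed], "rb"); /* open in binary mode */
        if (pFile == NULL)
            continue;           /* skip files that cannot be opened */

        memset(pbData, 0, sizeof(pbData));       /* files shorter than 10 bytes leave zeros */
        fread(pbData, 1, sizeof(pbData), pFile); /* read the first 10 bytes */
        fclose(pFile);

        nImageType = DisplayImageType(pbData);   /* classify the header */
    }
    return 0;
}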
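The loop above simply fills a 10-byte buffer for each file. If you want the read step isolated and testable, a helper along the lines of the hypothetical read_header() below could be used. It is a sketch using only standard C I/O, not part of the original program; it reports how many bytes were actually read, so very short files can be told apart from files whose header happens to contain zeros.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
   Reads up to `len` bytes from the start of the file at `path` into `buf`.
   Bytes not filled (for files shorter than `len`) are set to zero, so a later
   signature comparison never sees stale data from a previously read file.
   Returns the number of bytes actually read, or -1 if the file cannot be opened. */
long read_header(const char *path, unsigned char *buf, size_t len)
{
    FILE *fp = fopen(path, "rb"); /* "rb": binary mode, no newline translation */
    size_t n;

    if (fp == NULL)
        return -1;
    memset(buf, 0, len);
    n = fread(buf, 1, len, fp);
    fclose(fp);
    return (long)n;
}
```

A caller can then skip files where the returned count is smaller than the longest signature it needs to compare (8 bytes for PNG, for example).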
The first 10 bytes of each file are compared with a set of predefined header signatures, and the file is classified as one of the following image types:
TIFFINTEL: "Tiff image for Intel processor"
TIFFMOTOROLA: "Tiff image for Motorola processor"
GIF87a: "GIF87a Image"
GIF89a: "GIF89a Image"
PNG: "PNG Image"
JPEGJFIF: "JPEG JFIF compliant image"
JPEGEXIF: "JPEG EXIF compliant image"
JPEGAPP2: "JPEG with APP2 marker"
JPEGAPP3: "JPEG with APP3 marker"
JPEGAPP4: "JPEG with APP4 marker"
JPEGAPP5: "JPEG with APP5 marker"
JPEGAPP6: "JPEG with APP6 marker"
JPEGAPP7: "JPEG with APP7 marker"
JPEGAPP8: "JPEG with APP8 marker"
JPEGAPP9: "JPEG with APP9 marker"
JPEGAPPA: "JPEG with APPA marker"
JPEGAPPB: "JPEG with APPB marker"
JPEGAPPC: "JPEG with APPC marker"
JPEGAPPD: "JPEG with APPD marker"
JPEGAPPE: "JPEG with APPE marker"
JPEGAPPF: "JPEG with APPF marker"
BITMAP: "Bitmap file"
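For the formats in the list above that are identified by a fixed byte sequence at offset 0, the comparison can also be expressed as a lookup table instead of an if/else chain, which makes adding a new type a one-line change. The sketch below is an alternative design, not the author's code: the struct, table, and classify_header() names are hypothetical, and BMP and JPEG are left out because they need the extra checks that the article's function performs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table-driven classifier (not from the original article). */
struct signature {
    const char *name;   /* description reported for a match */
    const char *bytes;  /* expected header bytes (may contain '\0') */
    size_t len;         /* number of bytes to compare */
};

static const struct signature signatures[] = {
    { "Tiff image for Intel processor",    "II\x2A\x00",        4 },
    { "Tiff image for Motorola processor", "MM\x00\x2A",        4 },
    { "PNG Image",                         "\x89PNG\r\n\x1a\n", 8 },
    { "GIF87a Image",                      "GIF87a",            6 },
    { "GIF89a Image",                      "GIF89a",            6 },
};

/* Returns the description of the first signature found at the start of
   `hdr`, or NULL if none matches. `hdr_len` is the number of valid bytes. */
const char *classify_header(const unsigned char *hdr, size_t hdr_len)
{
    size_t i;

    for (i = 0; i < sizeof signatures / sizeof signatures[0]; i++) {
        if (hdr_len >= signatures[i].len &&
            memcmp(hdr, signatures[i].bytes, signatures[i].len) == 0)
            return signatures[i].name;
    }
    return NULL;
}
```

Passing the number of bytes actually read as `hdr_len` keeps a short file from matching a signature longer than the file itself.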
Customization
The code can be adapted to your own requirements. To reuse it, call the core function, DisplayImageType(), from your own program.
#include <string.h>

int DisplayImageType(char *pData)
{
    /* compare the file header with each known signature */
    if (!memcmp(pData, szTiffHeaderForIntel, 4))
        return TIFFINTEL;
    else if (!memcmp(pData, szTiffHeaderForMotorola, 4))
        return TIFFMOTOROLA;
    else if (!memcmp(pData, szPNGHeader, 8))
        return PNG;
    else if (!memcmp(pData, szGIF87aHeader, 6))
        return GIF87a;
    else if (!memcmp(pData, szGIF89aHeader, 6))
        return GIF89a;
    else if (!memcmp(pData, szBMPHeader, 2))
    {
        /* bytes 2-5 hold the file size; bytes 6-9 are reserved and must be zero */
        static const char szNull[4] = { 0, 0, 0, 0 };
        if (!memcmp(pData + 6, szNull, 4))
            return BITMAP;
    }
    else if (!memcmp(pData, szJPEGCommonHeader, 3))
    {
        /* the fourth byte identifies the APPn marker that follows SOI */
        switch ((unsigned char)pData[3])
        {
        case 0xE0: return JPEGJFIF;
        case 0xE1: return JPEGEXIF;
        case 0xE2: return JPEGAPP2;
        case 0xE3: return JPEGAPP3;
        case 0xE4: return JPEGAPP4;
        case 0xE5: return JPEGAPP5;
        case 0xE6: return JPEGAPP6;
        case 0xE7: return JPEGAPP7;
        case 0xE8: return JPEGAPP8;
        case 0xE9: return JPEGAPP9;
        case 0xEA: return JPEGAPPA;
        case 0xEB: return JPEGAPPB;
        case 0xEC: return JPEGAPPC;
        case 0xED: return JPEGAPPD;
        case 0xEE: return JPEGAPPE;
        case 0xEF: return JPEGAPPF;
        default: break;
        }
    }
    return 0; /* no known signature matched */
}
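The JPEG branch of DisplayImageType() relies on the fact that the APP0 to APP15 markers are the consecutive bytes 0xE0 to 0xEF, so the switch is equivalent to subtracting 0xE0. The APP0 and APP1 segments also carry an identifier right after their 2-byte length field, at offset 6: "JFIF" for JFIF files and "Exif" for Exif files, which still fits in a 10-byte header buffer. The sketch below, with hypothetical function names, shows both ideas; it is an illustration rather than part of the original program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns n (0..15) if the buffer starts with the JPEG
   SOI marker (FF D8) immediately followed by an APPn marker (FF E0..FF EF),
   otherwise -1. */
int jpeg_app_index(const unsigned char *hdr, size_t len)
{
    if (len < 4 || hdr[0] != 0xFF || hdr[1] != 0xD8 || hdr[2] != 0xFF)
        return -1;
    if (hdr[3] < 0xE0 || hdr[3] > 0xEF)
        return -1;
    return hdr[3] - 0xE0; /* APP0 -> 0, APP1 -> 1, ..., APP15 -> 15 */
}

/* Hypothetical helper: checks for the segment identifier that follows the
   2-byte segment length, i.e. starting at offset 6 of the file. */
int jpeg_has_identifier(const unsigned char *hdr, size_t len, const char *id)
{
    size_t n = strlen(id);
    return len >= 6 + n && memcmp(hdr + 6, id, n) == 0;
}
```

Checking the identifier as well as the marker number is a slightly stricter test than the switch alone, since an APP0 marker by itself does not guarantee a JFIF segment.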
You will also need to define the header signatures that DisplayImageType() compares against.
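/* TIFF: byte-order mark, then the number 42 stored in that byte order */
const char szTiffHeaderForMotorola[] = "MM\x00\x2A"; /* big-endian */
const char szTiffHeaderForIntel[]    = "II\x2A\x00"; /* little-endian */
const char szPNGHeader[]    = "\x89PNG\r\n\x1a\n";
const char szGIF87aHeader[] = "GIF87a";
const char szGIF89aHeader[] = "GIF89a";
/* every JPEG starts with the SOI marker (FF D8) followed by the 0xFF of the next marker */
const char szJPEGCommonHeader[] = "\xFF\xD8\xFF";
/* end-of-image (EOI) marker, for future use */
const char szJPEGCommonEOI[] = "\xFF\xD9";
/* "BM"; the next 4 bytes hold the file size and the 4 after that are reserved (zero) */
const char szBMPHeader[] = "\x42\x4D";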
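The two TIFF signatures differ because a TIFF file starts with "II" (little-endian, the "Intel" order) or "MM" (big-endian, the "Motorola" order), followed by the 16-bit number 42 stored in that byte order. Instead of matching fixed strings, the header can be decoded directly. The sketch below uses hypothetical names and is only an illustration of that check, not part of the original program.

```c
#include <stddef.h>

/* Hypothetical result type: byte order announced by a TIFF header. */
enum tiff_order { TIFF_NONE, TIFF_LITTLE_ENDIAN, TIFF_BIG_ENDIAN };

/* Bytes 0-1 are "II" or "MM"; bytes 2-3 must hold 42 in that byte order. */
enum tiff_order tiff_byte_order(const unsigned char *hdr, size_t len)
{
    unsigned int magic;

    if (len < 4)
        return TIFF_NONE;
    if (hdr[0] == 'I' && hdr[1] == 'I') {
        magic = hdr[2] | (hdr[3] << 8);   /* little-endian 16-bit value */
        return magic == 42 ? TIFF_LITTLE_ENDIAN : TIFF_NONE;
    }
    if (hdr[0] == 'M' && hdr[1] == 'M') {
        magic = (hdr[2] << 8) | hdr[3];   /* big-endian 16-bit value */
        return magic == 42 ? TIFF_BIG_ENDIAN : TIFF_NONE;
    }
    return TIFF_NONE;
}
```

Decoding the value this way makes it explicit why a big-endian header reads 4D 4D 00 2A rather than 4D 4D 2A 00.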
Limitations
For now, the code handles only the image types listed above. I hope to extend it to more image types in future versions, and to report further details about each image, such as its resolution and size.