![](iptc/wizard.jpg) This is our sample file

![](iptc/iptcsample.gif) The sample application

![](iptc/psinfo.gif) File info from Adobe Photoshop
Introduction
Until I had to write an application that extracts image information from a JPEG image, I didn't know that a JPEG file can contain more than the pure image information (size, colors and image data). It can also hold quite a lot of textual and other information, such as copyright notices, captions and keywords.
Adobe Photoshop can be used to edit this information from the File->File information menu. (This is probably the right place to mention that all Photoshop names here are translated from a German copy of Photoshop and thus might not exactly match the names in the English version.)
OK, so you can edit the data with Photoshop. But what if you would like to use this data in your own application, let's say to store the images in a database along with the extracted information? That's the task of this class. You pass a file name, and the member variables are filled. You can even modify them and write them back to the JPEG file. However, this currently only works with files that already contain IPTC information.
Decoding the file format
When I started developing my application, I searched for a similar application to reduce my work. I didn't find one. I even had a hard time gathering information on the JPEG file format specification (without buying some books), let alone a specification of the Photoshop headers. So I fired up the hex editor and tried to find out how things are stored. The structure of the Photoshop-specific headers described here is therefore more or less based on findings from some sample files. If you have found a complete specification, feel free to drop me a line or post it in the comments.
The basic structure is as follows. More information on that can be found here and at The Graphics File Formats Page.
JPEG image

| Contents | Name | Description |
| --- | --- | --- |
| 0xFF 0xD8 | SOI | Start of image |
| Segments (see below) | | |
| 0xFF 0xD9 | EOI | End of image |

JPEG segments

| Description |
| --- |
| Segment marker (2 bytes) |
| Segment size (2 bytes, big-endian; counts the size field itself and the data, but not the marker) |
| Segment data |

Some JPEG segment markers

| Contents | Name | Description |
| --- | --- | --- |
| 0xFF 0xE0 | APP0 | Application marker (JFIF header, found in most JPEG files) |
| 0xFF 0xDB | DQT | Define quantization table |
| 0xFF 0xC0 | SOF0 | Start of frame |
| 0xFF 0xC4 | DHT | Define Huffman table |
| 0xFF 0xDA | SOS | Start of scan |
| 0xFF 0xED | APP13 | This is the marker where Photoshop stores its information |

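
The layout in the tables above can be walked with a few lines of code. The following is a minimal Python sketch, not the code of the sample class: the name `iter_segments` is made up for illustration. It relies on the segment size being a big-endian value that counts its own two bytes, and it stops at SOS because compressed image data follows that segment.

```python
import struct

# Markers that stand alone and carry no size field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def iter_segments(data: bytes):
    """Yield (marker_byte, payload) for each segment after SOI, up to SOS or EOI.

    Hypothetical helper for illustration; payload excludes marker and size field.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file: missing SOI marker")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte: 0xFF may be repeated before the real marker byte.
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            if marker == 0xD9:  # EOI: end of image
                return
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment header")
        (size,) = struct.unpack(">H", data[pos:pos + 2])
        if size < 2 or pos + size > len(data):
            raise ValueError("truncated or invalid segment")
        yield marker, data[pos + 2:pos + size]
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            return
        pos += size
```

Filtering the result for marker byte `0xED` gives the payload of the segment that Photoshop writes its information into.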
The Photoshop segment
The APP13 segment (marker 0xFF 0xED) is the one we are after. This is where the undocumented area starts.
APP13 segment

| Contents | Description |
| --- | --- |
| 0xFF 0xED | APP13 marker |
| Segment size (2 bytes) excl. marker | |
| Photoshop 3.0\x00 | Photoshop identification string |
| 8BIM segments (see below) | |

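
Other programs may use the 0xFF 0xED marker too, so it seems wise to check the identification string before looking for 8BIM data. A small Python sketch of that check follows; the function name `photoshop_resource_data` is an assumption for illustration, and its input is the segment data that follows the two size bytes.

```python
PHOTOSHOP_ID = b"Photoshop 3.0\x00"  # identification string incl. terminating zero byte


def photoshop_resource_data(segment_data: bytes):
    """Return the bytes after the identification string, or None if it is absent.

    Hypothetical helper: segment_data is the 0xFF 0xED segment content
    without the marker and without the 2-byte size field.
    """
    if not segment_data.startswith(PHOTOSHOP_ID):
        return None
    return segment_data[len(PHOTOSHOP_ID):]
```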
A JPEG file from Photoshop has various 8BIM headers (I don't know the real name). The one with the type 0x04 0x04 contains the textual information. The image URL is stored in a different header, which is why it is currently not supported by the demo class. Other headers contain a thumbnail image and other information.
Photoshop 6 introduced a slight variation in this header segment. Basically the 4 byte padding has been replaced by a header description text of variable length. The updated sample can now handle these files as well.
8BIM segment

| Description |
| --- |
| 8BIM Segment marker (4 bytes) |
| Segment type (2 bytes) |
| Zero byte padding (4 bytes) |
| Segment size (2 bytes excl. marker, type, padding and size) |
| Segment data |

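
Here is a hedged Python sketch of a reader for these blocks. It does not follow the table above byte for byte. Instead it uses the layout given in Adobe's published Photoshop file format documentation: after the 2-byte type comes a length-prefixed name padded to an even number of bytes, then a 4-byte big-endian size, then the data, again padded to an even length. An empty name (two zero bytes) followed by a size below 64 KB looks exactly like four zero bytes plus a 2-byte size, and a non-empty name accounts for the Photoshop 6 variation. The name `iter_8bim` is made up for illustration.

```python
import struct


def iter_8bim(data: bytes):
    """Yield (type_id, name, payload) for each 8BIM block in data.

    Hypothetical helper: data is expected to start right after the
    "Photoshop 3.0\\x00" identification string.
    """
    pos = 0
    while data[pos:pos + 4] == b"8BIM":
        (type_id,) = struct.unpack(">H", data[pos + 4:pos + 6])
        pos += 6
        # Name: one length byte, then the text, padded to an even total length.
        name_len = data[pos]
        name = data[pos + 1:pos + 1 + name_len]
        pos += 1 + name_len
        if (1 + name_len) % 2:
            pos += 1
        (size,) = struct.unpack(">I", data[pos:pos + 4])
        pos += 4
        payload = data[pos:pos + size]
        if len(payload) != size:
            raise ValueError("truncated 8BIM block")
        yield type_id, name, payload
        # The data is padded to an even length as well.
        pos += size + (size % 2)
```

The block whose `type_id` equals `0x0404` is the one that carries the textual information.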
The 8BIM header that holds the text is subdivided into even more blocks, each prefixed by 0x1C 0x02. These blocks finally contain the information. Multiple blocks of the same type (e.g. Keywords) form a list.
0x1C 0x02 segment

| Description |
| --- |
| 0x1C 0x02 Segment marker (2 bytes) |
| Segment type (1 byte) |
| Segment size (2 bytes excl. marker, type and size) |
| Segment data |

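
The table translates into a short loop. The Python sketch below collects the blocks into a dictionary of lists, so repeated types such as Keywords end up as a list. The function name `parse_iptc` is an assumption for illustration, the values are left as raw bytes because the text encoding is not described here, and sizes are assumed to fit the plain 2-byte field.

```python
import struct
from collections import defaultdict


def parse_iptc(data: bytes) -> dict:
    """Collect 0x1C 0x02 blocks into {segment_type: [value, ...]}.

    Hypothetical helper: data is the content of the 8BIM block of type 0x0404.
    """
    fields = defaultdict(list)
    pos = 0
    # Each block: 0x1C, record byte, type byte, 2-byte big-endian size, data.
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record = data[pos + 1]
        seg_type = data[pos + 2]
        (size,) = struct.unpack(">H", data[pos + 3:pos + 5])
        value = data[pos + 5:pos + 5 + size]
        if len(value) != size:
            raise ValueError("truncated 0x1C block")
        pos += 5 + size
        if record == 0x02:  # only keep the 0x1C 0x02 blocks described above
            fields[seg_type].append(value)
    return dict(fields)
```

In the IPTC numbering, type 25 is Keywords and type 120 is the caption, so `parse_iptc(payload).get(25, [])` would return the keyword list.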
The sample application
With a description of the format, it is easy to write an application that scans through the file and extracts the interesting bytes. This is exactly what the provided sample class does.
Have fun with it!