Reading IPTC APP13 Segment Header Information from JPEG Images

ThomasBBrown

27 Apr 2011

A simple class illustrating how to read metadata info tags from JPEG files using C# .NET

Download source - 2.86 KB

Introduction

Recently, my boss asked me to build an app that can read the info tags buried inside JPEG files. Knowing nothing at the time about metadata standards, I set off on a bumpy search for information on the subject. Unfortunately, because I did not yet know the acronym IPTC (International Press Telecommunications Council), I failed to find a beautiful CodeProject article about it by Christian Tratz, which I only discovered when I tried to post my own work on the subject.

To cut a long story short, it took me quite some time, spent analyzing and reverse engineering cryptic bits and pieces of PHP samples, to come up with this simple C# class, which parses a JPEG file and extracts tags from its Photoshop 3.0 section. That section lives in the APP13 application segment, identified by the marker bytes 0xFF 0xED (APP14, with marker bytes 0xFF 0xEE, is a different Adobe segment). I strongly recommend reading the theory behind metadata in JPEG files located here.

The JPEGMetaData class has a constructor that takes the path of the JPEG file on disk. For clarity, it keeps the marker strings it searches for in a separate hash table, PS3Tags. The APP13 section starts with the marker bytes 0xFF 0xED and should contain the zero-terminated string “Photoshop 3.0”. Within the section, various tags may exist, depending on whether the author of the image, or whoever last saved it in an app like Photoshop or Photo Mechanic, populated any of the available metadata fields. If any of the sought fields is not found in the metadata, an appropriate message is returned to the user.

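public JPEGMetaData(string FileName)
{
    // Marker strings to search for; each character stands for one raw byte.
    PS3Tags.Add("PS3SectionHeader", "\u00FF\u00ED");            // segment marker bytes 0xFF 0xED
    PS3Tags.Add("PS3SectionIDTag", "Photoshop 3.0\u0000");      // zero-terminated signature
    PS3Tags.Add("PS3SectionObjNameTag", "\u001C\u0002\u0005");  // record 2, dataset 5 (ObjectName)
    PS3Tags.Add("PS3SectionHeadlineTag", "\u001C\u0002\u0069"); // record 2, dataset 105 (Headline)
    PS3Tags.Add("PS3SectionCaptionTag", "\u001C\u0002\u0078");  // record 2, dataset 120 (Caption-Abstract)

    // Load the whole file, then isolate the Photoshop 3.0 section.
    JPEGContentBuffer = LoadJPEG(FileName);
    PS3SectionContentBuffer =
        ExtractPS3ContentSection(PS3Tags["PS3SectionHeader"],
                                 PS3Tags["PS3SectionIDTag"]);

    // Extract each tag's content (or a "not available" message) up front.
    PS3TagContents.Add("PS3SectionObjNameTag",
        ExtractTag(PS3Tags["PS3SectionObjNameTag"].ToString()));
    PS3TagContents.Add("PS3SectionHeadlineTag",
        ExtractTag(PS3Tags["PS3SectionHeadlineTag"].ToString()));
    PS3TagContents.Add("PS3SectionCaptionTag",
        ExtractTag(PS3Tags["PS3SectionCaptionTag"].ToString()));
}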
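The constructor relies on ExtractPS3ContentSection, which is not listed in this article. As an illustration only, here is a minimal sketch of how such a lookup could be done directly on the raw bytes by walking the JPEG segments. The class name App13Locator and the method FindPhotoshopPayload are hypothetical names of mine, not part of the attached source. The sketch assumes a well-formed file in which every segment before the image data carries a two-byte length, and it does not handle padding bytes between segments.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the article's JPEGMetaData class.
public static class App13Locator
{
    // "Photoshop 3.0" followed by a zero byte, as raw ASCII bytes.
    private static readonly byte[] Signature =
        Encoding.ASCII.GetBytes("Photoshop 3.0\0");

    // Returns the bytes that follow the signature inside the first matching
    // 0xFF 0xED segment, or null if no such segment is found.
    public static byte[] FindPhotoshopPayload(byte[] jpeg)
    {
        // A JPEG file starts with the SOI marker FF D8.
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            return null;

        int pos = 2;
        while (pos + 4 <= jpeg.Length && jpeg[pos] == 0xFF)
        {
            byte marker = jpeg[pos + 1];
            if (marker == 0xDA) break; // start of scan: compressed data follows

            // Big-endian length; it counts its own two bytes but not the marker.
            int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
            if (length < 2 || pos + 2 + length > jpeg.Length) break;

            int dataStart = pos + 4;
            int dataLength = length - 2;
            if (marker == 0xED && StartsWith(jpeg, dataStart, dataLength, Signature))
            {
                int payloadLength = dataLength - Signature.Length;
                byte[] payload = new byte[payloadLength];
                Buffer.BlockCopy(jpeg, dataStart + Signature.Length,
                                 payload, 0, payloadLength);
                return payload;
            }
            pos += 2 + length; // jump to the next segment marker
        }
        return null;
    }

    private static bool StartsWith(byte[] data, int offset, int count, byte[] prefix)
    {
        if (count < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[offset + i] != prefix[i]) return false;
        return true;
    }
}
```

Walking segment by segment avoids accidentally matching the bytes 0xFF 0xED somewhere inside the compressed image data, which a plain substring search over the whole file could do.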

The raw JPEG file is loaded internally and converted to a string, which is stored in a private string field, JPEGContentBuffer, for further slicing.

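private string LoadJPEG(string FileName)
{
    // Read the whole file in one call. A single FileStream.Read is not
    // guaranteed to fill the buffer, and File.ReadAllBytes also closes
    // the file if an exception is thrown.
    byte[] RAWdata = File.ReadAllBytes(FileName);

    // ISO-8859-1 maps every byte value to the character with the same code,
    // so the string can be searched and indexed like the raw bytes.
    // Encoding.Default varies by platform and does not guarantee this.
    return Encoding.GetEncoding("ISO-8859-1").GetString(RAWdata, 0, RAWdata.Length);
}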
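Treating binary data as a string only works if the conversion is lossless, that is, if every byte becomes exactly one character and can be turned back into the same byte. The following sketch checks that property for ISO-8859-1. The class name ByteStringRoundTrip and its methods are hypothetical names used for illustration and are not part of the attached source.

```csharp
using System;
using System.Text;

// Hypothetical demo, not part of the article's JPEGMetaData class.
public static class ByteStringRoundTrip
{
    // ISO-8859-1 (Latin-1) maps byte value n to the character with code n.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    public static string ToBinaryString(byte[] data)
    {
        return Latin1.GetString(data);
    }

    public static byte[] ToBytes(string text)
    {
        return Latin1.GetBytes(text);
    }

    // Returns true if all 256 byte values survive bytes -> string -> bytes
    // and each character's code equals the original byte value.
    public static bool RoundTripsAllByteValues()
    {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) all[i] = (byte)i;

        string text = ToBinaryString(all);
        if (text.Length != all.Length) return false;
        for (int i = 0; i < 256; i++)
            if (text[i] != (char)i) return false;

        byte[] back = ToBytes(text);
        if (back.Length != all.Length) return false;
        for (int i = 0; i < 256; i++)
            if (back[i] != all[i]) return false;
        return true;
    }
}
```

The same check is not guaranteed to pass with Encoding.Default, whose result depends on the platform: it is the system ANSI code page on .NET Framework and UTF-8 on .NET Core and later.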

The class exposes a single hash table, PS3TagContents, which holds the contents of the following three major IPTC tags, identified by Adobe as:

IPTC ApplicationRecord Tags
Tag ID	Tag name	Format
5	ObjectName	string[0,64]
105	Headline	string[0,256]
120	Caption-Abstract	string[0,2000]
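The tag IDs in the table are decimal, while the constructor writes them as hexadecimal escape sequences (105 is 0x69, 120 is 0x78). As a small sketch of how the three-byte search pattern relates to the table, the hypothetical helper below (IptcTagMarker is my own name, not part of the attached source) builds the pattern from a tag ID. The bytes 0x1C and 0x02 are taken from the constructor's marker strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's JPEGMetaData class.
public static class IptcTagMarker
{
    public const byte TagMarker = 0x1C;         // first byte of every tag header
    public const byte ApplicationRecord = 0x02; // record number used by these tags

    // The three tags covered by the article, keyed by decimal tag ID.
    public static readonly Dictionary<byte, string> Names =
        new Dictionary<byte, string>
        {
            { 5, "ObjectName" },
            { 105, "Headline" },
            { 120, "Caption-Abstract" }
        };

    // Builds the 3-byte header that precedes the length and the tag content.
    public static byte[] ForDataSet(byte tagId)
    {
        return new byte[] { TagMarker, ApplicationRecord, tagId };
    }
}
```

For example, IptcTagMarker.ForDataSet(105) yields the bytes 0x1C 0x02 0x69, which is the same sequence as the "\u001C\u0002\u0069" string registered for the Headline tag.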

The actual data extraction is performed in the ExtractTag method of the class. It searches for the three-byte tag header, reads the two-byte block length that follows it (high byte first), and then extracts that many bytes of content starting right after the length.

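private string ExtractTag(string currTagSought)
{
    // Ordinal search: the tag is a sequence of raw byte values, not
    // culture-sensitive text.
    int pos = PS3SectionContentBuffer.IndexOf(currTagSought,
                                              System.StringComparison.Ordinal);

    // IndexOf returns -1 when the tag is absent; 0 is a valid position.
    // The 3-byte tag header must be followed by a 2-byte length.
    if (pos < 0 || pos + 5 > PS3SectionContentBuffer.Length)
        return currTagSought + " is not available!";

    pos += 3; // skip the 3-byte tag header

    // Big-endian 16-bit length. This assumes the buffer holds one character
    // per byte, with each character's code equal to the byte value.
    int BlockSize = (PS3SectionContentBuffer[pos] * 256) +
                    PS3SectionContentBuffer[pos + 1];
    pos += 2; // skip the length field

    // Guard against a truncated or corrupt block.
    if (pos + BlockSize > PS3SectionContentBuffer.Length)
        return currTagSought + " is not available!";

    // One character per byte, so the content is simply the next BlockSize characters.
    return PS3SectionContentBuffer.Substring(pos, BlockSize);
}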
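The same header, length, content layout can also be read directly from a byte array, which avoids the string conversion altogether. The sketch below is an alternative under stated assumptions, not the method used by the attached class: IptcDataSetReader and ReadApplicationRecord are hypothetical names, the text is assumed to be ISO-8859-1 (real files may use another encoding), only the first occurrence of each tag is kept, and blocks whose length field has the high bit set (the extended-length form) are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the article's JPEGMetaData class.
public static class IptcDataSetReader
{
    // Scans for blocks of the form 1C 02 <tag id> <len hi> <len lo> <content>
    // and returns a map from tag ID to decoded content.
    public static Dictionary<int, string> ReadApplicationRecord(byte[] section)
    {
        var result = new Dictionary<int, string>();
        // Assumption: tag text is ISO-8859-1 encoded.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        int pos = 0;
        while (pos + 5 <= section.Length)
        {
            // Not a tag header for record 2: move on by one byte.
            if (section[pos] != 0x1C || section[pos + 1] != 0x02)
            {
                pos++;
                continue;
            }

            int tagId = section[pos + 2];
            int length = (section[pos + 3] << 8) | section[pos + 4]; // big-endian
            int start = pos + 5;

            // Skip extended-length blocks and blocks that run past the buffer.
            if ((length & 0x8000) != 0 || start + length > section.Length)
            {
                pos++;
                continue;
            }

            if (!result.ContainsKey(tagId))
                result[tagId] = latin1.GetString(section, start, length);

            pos = start + length; // continue after this block's content
        }
        return result;
    }
}
```

Like the string search in ExtractTag, this byte scan can in principle match a 0x1C 0x02 pair that is not a real tag header, so it is best applied only to the isolated Photoshop 3.0 section rather than to the whole file.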

Finally, the harvested metadata can be written to the console by invoking the DisplayAllTags() method of the class.

I hope this helps someone in their quest to process JPEG metadata tags and spares them the scramble I went through to get this functionality together. The full source code is attached, along with a sample harness for the class.

History

27th April, 2011: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
