
Decode quoted-printable data by Regex


4.50/5 (4 votes)

Feb 13, 2012

CPOL

A "one-liner" to decode quoted-printable data.

Overview

Some data, such as MHTML, contains parts that are encoded as quoted-printable data. The format is quite simple:

  • All printable ASCII characters may be represented by themselves, except the equal sign
  • Space and tab may remain as plain text unless they appear at the end of a line
  • All other bytes are represented by an equal sign followed by two hex digits representing the byte value
  • No encoded line may be longer than 76 characters: a longer line is split by soft line breaks, each a trailing equal sign followed by a line break, which the decoder removes
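
Read in the other direction, these rules describe an encoder. The following is only a minimal sketch to make them concrete, not a complete implementation: the class name QuotedPrintableEncoderSketch, the choice of UTF-8 for turning text into bytes, and the CRLF line ends in the output are all assumptions of mine.

```csharp
using System;
using System.Text;

public static class QuotedPrintableEncoderSketch
{
    // Encodes text line by line. UTF-8 and CRLF output are assumptions of this sketch.
    public static string Encode(string text)
    {
        var result = new StringBuilder();
        string[] lines = text.Replace("\r\n", "\n").Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0) result.Append("\r\n");   // hard line break taken from the input
            EncodeLine(Encoding.UTF8.GetBytes(lines[i]), result);
        }
        return result.ToString();
    }

    static void EncodeLine(byte[] bytes, StringBuilder result)
    {
        int column = 0;
        for (int i = 0; i < bytes.Length; i++)
        {
            byte b = bytes[i];
            bool isLast = i == bytes.Length - 1;
            // Printable ASCII except '=' stays as is; space and tab stay unless they end the line.
            bool literal = (b >= 33 && b <= 126 && b != (byte)'=')
                        || ((b == 32 || b == 9) && !isLast);
            string token = literal ? ((char)b).ToString() : "=" + b.ToString("X2");

            // Reserve one column for the soft-break '=' so that no line exceeds 76 characters.
            if (column + token.Length > 75)
            {
                result.Append("=\r\n");
                column = 0;
            }
            result.Append(token);
            column += token.Length;
        }
    }
}
```

For instance, Encode("a=b ") gives "a=3Db=20": the equal sign and the trailing space are escaped, everything else is kept.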

Example

The following quoted-printable encoded text...

This is a long text with some line break and some encoding of the equal sig=
n (=3D). Any line longer than 76 characters are broken up into lines of 76 =
characters with a trailing equal sign.

...results in the following after decoding...

This is a long text with some line break and some encoding of the equal sign (=). Any line longer than 76 characters are broken up into lines of 76 characters with a trailing equal sign.

The Trick

I came up with the following Regex since I could not find a suitable class in the .NET framework to decode quoted-printable data.

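string raw = ...;
// "=XX" becomes the character with that code; a soft line break ("=" at the end of a line) is removed.
// Note: every "=XX" is turned into one char, so multi-byte encodings such as UTF-8 are not reassembled.
string txt = Regex.Replace(raw, @"=([0-9a-fA-F]{2})|=\r?\n",
              m => m.Groups[1].Success
                   ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                   : "");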
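
The one-liner turns every =XX escape into one character. That is fine for ASCII and Latin-1 data, but not for text where one character takes several bytes, such as UTF-8. Below is a hedged sketch of a byte-aware variant: it removes the soft line breaks first and then decodes each run of consecutive escapes as a whole. The class name QuotedPrintableDecoderSketch and the charset parameter are my own assumptions. The charset would normally come from the Content-Type header of the encoded part.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class QuotedPrintableDecoderSketch
{
    // A soft line break: '=' directly followed by CRLF or a bare LF.
    static readonly Regex SoftBreak = new Regex(@"=\r?\n");

    // One or more consecutive =XX escapes.
    static readonly Regex EscapeRun = new Regex(@"(?:=[0-9a-fA-F]{2})+");

    // charset is an assumption of this sketch, e.g. Encoding.UTF8.
    public static string Decode(string raw, Encoding charset)
    {
        // Join the lines first so that a byte sequence split by a soft break becomes one run.
        string joined = SoftBreak.Replace(raw, "");

        return EscapeRun.Replace(joined, m =>
        {
            // Each escape is three characters long: '=' plus two hex digits.
            var bytes = new byte[m.Length / 3];
            for (int i = 0; i < bytes.Length; i++)
                bytes[i] = Convert.ToByte(m.Value.Substring(3 * i + 1, 2), 16);
            return charset.GetString(bytes);
        });
    }
}
```

Usage would be something like QuotedPrintableDecoderSketch.Decode(raw, Encoding.UTF8). The sketch assumes that all bytes of a multi-byte character are escaped, which holds for UTF-8 but not necessarily for other multi-byte charsets.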

Where to go from here

Once you have the decoded text, you can, for example, strip off all HTML tags and decode the remaining HTML entities:

string textonly = HttpUtility.HtmlDecode(Regex.Replace(txt, @"<[\S\s]*?>", ""));
Console.WriteLine("{0}", textonly);

Input:

<a href="#print_link">Expression&lt;Action&lt;T&gt;&gt; expr = s =&gt; Console.WriteLine(&quot;{0}&quot;, s);

Output:

Expression<Action<T>> expr = s => Console.WriteLine("{0}", s);
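
The order of the two steps matters: the tags are stripped first and the entities decoded afterwards, otherwise the decoded &lt; and &gt; would be taken for tags and removed too. Here is the same step as a small self-contained sketch. The names TagStripSketch and ToPlainText are hypothetical, and I use WebUtility.HtmlDecode from System.Net instead of HttpUtility, assuming you would rather not reference System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripSketch
{
    public static string ToPlainText(string html)
    {
        // 1. Remove everything from a '<' up to the next '>' (lazy match, may span lines).
        string withoutTags = Regex.Replace(html, @"<[\S\s]*?>", "");

        // 2. Only now turn entities such as &lt; and &quot; back into plain characters.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Like the one-liner, this is a crude filter and not an HTML parser: a '>' inside an attribute value or a script block would confuse it.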

Finally, the plain text can be searched for a pattern, e.g. the names of all variables declared as Expression<Action<...>>:

// Collect the variable name captured by group 1 of every match.
var q = from m in Regex.Matches(textonly,
               @"Expression\s*<\s*Action\s*<\s*\w+\s*>\s*>\s*(\w+)\s*=")
               .Cast<Match>()
        select m.Groups[1].Value;
// Print the names with a running number.
int n = 0;
foreach (var v in q)
    Console.WriteLine("{0}: Expression<Action<T>> {1}", ++n, v);

Possible output:

1: Expression<Action<T>> calculate
2: Expression<Action<T>> print
3: Expression<Action<T>> store

Summary

Performance may not be optimal, but it keeps me going with my other tasks... ;-)