
Decode quoted-printable data by Regex

14 Feb 2012, CPOL
A "one-liner" to decode quoted-printable data.

Overview


Some data formats, such as MHTML, contain parts that are encoded as a quoted-printable data stream. The format is quite simple:



  • All printable ASCII characters may be represented by themselves, except the equal sign
  • Space and tab may remain as plain text unless they appear at the end of a line
  • All other bytes are represented by an equal sign followed by two hex digits representing the byte value
  • No encoded line may be longer than 76 characters: a longer line is split by a soft line break, i.e., a trailing equal sign directly before the line break
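
To make the four rules above concrete, here is a minimal encoder sketch. The name `EncodeQuotedPrintable` is hypothetical (it is not a .NET API), and the sketch assumes that CRLF pairs in the input are hard line breaks that should be kept. It breaks a little early, so that the trailing equal sign still fits within the 76 characters.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a .NET API): encodes bytes by the four rules above.
static string EncodeQuotedPrintable(byte[] data)
{
    var sb = new StringBuilder();
    int lineLength = 0;
    for (int i = 0; i < data.Length; i++)
    {
        byte b = data[i];
        if (b == '\r' && i + 1 < data.Length && data[i + 1] == '\n')
        {
            // Assumption: a CRLF in the input is kept as a hard line break.
            sb.Append("\r\n");
            lineLength = 0;
            i++;
            continue;
        }
        // Space and tab must be encoded when they end a line (or the data).
        bool atLineEnd = i + 1 == data.Length
            || (i + 2 < data.Length && data[i + 1] == '\r' && data[i + 2] == '\n');
        bool isBlank = b == ' ' || b == '\t';
        bool literal = (b > 32 && b < 127 && b != '=') || (isBlank && !atLineEnd);
        string token = literal ? ((char)b).ToString() : "=" + b.ToString("X2");
        // Keep room for the trailing "=" so no encoded line exceeds 76 characters.
        if (lineLength + token.Length > 75)
        {
            sb.Append("=\r\n");
            lineLength = 0;
        }
        sb.Append(token);
        lineLength += token.Length;
    }
    return sb.ToString();
}

Console.WriteLine(EncodeQuotedPrintable(Encoding.ASCII.GetBytes("1 + 1 = 2 ")));
// prints: 1 + 1 =3D 2=20
```

The equal sign and the trailing space are the only bytes that get escaped here; everything else passes through unchanged.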

Example


The following quoted-printable encoded text...


This is a long text with some line break and some encoding of the equal sig=
n (=3D). Any line longer than 76 characters are broken up into lines of 76 =
characters with a trailing equal sign.

...results in the following after decoding...


This is a long text with some line break and some encoding of the equal sign (=). Any line longer than 76 characters are broken up into lines of 76 characters with a trailing equal sign.

The Trick


I came up with the following Regex since I could not find a suitable class in the .NET framework to decode quoted-printable data.


C#
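// requires: using System; using System.Text.RegularExpressions;
string raw = ...;
// "=XX" becomes the char with hex code XX (correct for ASCII/ISO-8859-1 content only);
// a soft line break ("=" followed by CRLF or a bare LF) is removed.
string txt = Regex.Replace(raw, @"=([0-9a-fA-F]{2})|=\r?\n",
              m => m.Groups[1].Success
                   ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                   : "");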
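
The Regex maps every =XX sequence to exactly one character, which is fine for ASCII and ISO-8859-1 content. For parts in a multi-byte charset such as UTF-8, one possible variation is to collect the decoded bytes first and apply the charset afterwards. The following is only a sketch under that assumption; `DecodeQuotedPrintable` is a hypothetical name, and the charset has to be taken from the part's Content-Type header by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: decodes quoted-printable text to bytes, then applies a charset.
static string DecodeQuotedPrintable(string raw, Encoding charset)
{
    var bytes = new List<byte>();
    for (int i = 0; i < raw.Length; i++)
    {
        char c = raw[i];
        if (c == '=' && i + 2 < raw.Length
            && Uri.IsHexDigit(raw[i + 1]) && Uri.IsHexDigit(raw[i + 2]))
        {
            // "=XX": one encoded byte.
            bytes.Add(Convert.ToByte(raw.Substring(i + 1, 2), 16));
            i += 2;
        }
        else if (c == '=' && i + 2 < raw.Length && raw[i + 1] == '\r' && raw[i + 2] == '\n')
        {
            i += 2; // soft line break with CRLF: drop it
        }
        else if (c == '=' && i + 1 < raw.Length && raw[i + 1] == '\n')
        {
            i += 1; // soft line break with a bare LF: drop it
        }
        else
        {
            // Literal characters in quoted-printable data are expected to be ASCII.
            bytes.Add((byte)c);
        }
    }
    return charset.GetString(bytes.ToArray());
}

string decoded = DecodeQuotedPrintable("Gr=C3=BC=\r\nezi (=3D)", Encoding.UTF8);
Console.WriteLine(decoded == "Gr\u00FCezi (=)");
// prints: True
```

The two bytes C3 BC form a single UTF-8 character here, which a byte-to-char mapping would turn into two separate characters.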

Where to go from here


Once you have the decoded text, you can, for example, strip all HTML tags and decode the HTML entities:


C#
// HttpUtility is in the System.Web namespace (requires a reference to System.Web.dll).
string textonly = HttpUtility.HtmlDecode(Regex.Replace(txt, @"<[\S\s]*?>", ""));
Console.WriteLine("{0}", textonly);

Input:


<a href="#print_link">Expression&lt;Action&lt;T&gt;&gt; expr = s =&gt; Console.WriteLine(&quot;{0}&quot;, s);

Output:


Expression<Action<T>> expr = s => Console.WriteLine("{0}", s);
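
Where a reference to System.Web is not wanted, `System.Net.WebUtility.HtmlDecode` should do the same job for this case. Below is a minimal, self-contained sketch of the step; the helper name `StripHtml` is hypothetical, and the Regex is a rough tag remover, not an HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: removes anything that looks like a tag, then decodes HTML entities.
static string StripHtml(string html)
{
    string withoutTags = Regex.Replace(html, @"<[\S\s]*?>", "");
    return WebUtility.HtmlDecode(withoutTags);
}

string sample = "<a href=\"#print_link\">Expression&lt;Action&lt;T&gt;&gt; expr = s =&gt; "
              + "Console.WriteLine(&quot;{0}&quot;, s);";
Console.WriteLine(StripHtml(sample));
// prints: Expression<Action<T>> expr = s => Console.WriteLine("{0}", s);
```

The tags are removed before the entities are decoded, so that a decoded &lt; or &gt; is never mistaken for part of a tag.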

Finally, the plain text can be searched for some pattern, e.g.:


C#
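// requires: using System.Linq; using System.Text.RegularExpressions;
var q = from m in Regex.Matches(textonly,
               @"Expression\s*<\s*Action\s*<\s*\w+\s*>\s*>\s*(\w+)\s*=")
               .Cast<Match>()
        select m.Groups[1].Value;
// Print each captured variable name with a running 1-based index.
int n = 0;
foreach (string v in q)
    Console.WriteLine("{0}: Expression<Action<T>> {1}", ++n, v);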

Possible output:


1: Expression<Action<T>> calculate
2: Expression<Action<T>> print
3: Expression<Action<T>> store

Summary


Performance may not be optimal, but it keeps me going with my other tasks... ;-)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity to computer language analysis, testing, as well as quality management.

More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).

Comments and Discussions

Reason for my vote of 5: Good Tip
ProEnggSoft, 24-Feb-12 20:06
