Decode quoted-printable data by Regex

Andreas Gieriet, 14 Feb 2012, CPOL
A "single-liner" to decode quoted-printable data.

Overview

Some data formats, such as MHTML, contain parts that are encoded as a quoted-printable data stream. The format is quite simple:

  • All printable ASCII characters may be represented by themselves, except the equal sign
  • Space and tab may remain as plain text unless they appear at the end of a line
  • All other bytes are represented by an equal sign followed by two hex digits representing the byte value
  • No encoded line may be longer than 76 characters; longer lines are broken with a trailing equal sign (a soft line break), which is removed again on decoding
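The four rules above can be sketched as a small encoder. This is only an illustration under stated assumptions, not part of the original tip: the class `QuotedPrintableEncoderSketch` and the method `EncodeLine` are hypothetical names, and the input is assumed to be a single line of text without line breaks.

```csharp
using System;
using System.Text;

public static class QuotedPrintableEncoderSketch
{
    // Hypothetical helper: encodes one line of text (no CR/LF in the input).
    public static string EncodeLine(string text, Encoding charset)
    {
        const int MaxLineLength = 76;
        var result = new StringBuilder();
        int lineLength = 0;
        byte[] bytes = charset.GetBytes(text);

        for (int i = 0; i < bytes.Length; i++)
        {
            byte b = bytes[i];
            bool isLast = i == bytes.Length - 1;

            // Printable ASCII except '=' stays as is; space and tab stay as is
            // unless they are the very last byte of the text.
            bool literal = (b >= 33 && b <= 126 && b != (byte)'=')
                           || ((b == 32 || b == 9) && !isLast);
            string token = literal ? ((char)b).ToString() : "=" + b.ToString("X2");

            // Keep one column free for the "=" of a soft line break.
            if (lineLength + token.Length > MaxLineLength - 1)
            {
                result.Append("=\r\n");
                lineLength = 0;
            }
            result.Append(token);
            lineLength += token.Length;
        }
        return result.ToString();
    }
}
```

For instance, `QuotedPrintableEncoderSketch.EncodeLine("a=b", Encoding.ASCII)` yields `a=3Db`. The sketch always reserves the last column for a possible soft line break, so no output line exceeds 76 characters.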

Example

The following quoted-printable encoded text...

This is a long text with some line break and some encoding of the equal sig=
n (=3D). Any line longer than 76 characters are broken up into lines of 76 =
characters with a trailing equal sign.

...results in the following after decoding...

This is a long text with some line break and some encoding of the equal sign (=). Any line longer than 76 characters are broken up into lines of 76 characters with a trailing equal sign.

The Trick

I came up with the following Regex since I could not find a suitable class in the .NET framework to decode quoted-printable data.

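string raw = ...;
// "=XX" becomes the character with hex code XX; a trailing "=" (soft line break) is dropped.
// "\r?\n" also accepts input whose line endings were normalized to a bare LF.
// Each byte maps directly to one char, so this is only correct for ASCII/Latin-1 content.
string txt = Regex.Replace(raw, @"=([0-9a-fA-F]{2})|=\r?\n",
              m => m.Groups[1].Success
                   ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                   : "");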
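The Regex above turns every escaped byte directly into one character. That works for ASCII and Latin-1 content, but multi-byte character sets such as UTF-8 need the bytes to be collected first and decoded as a whole. The following is a minimal sketch of that variant, not the author's method: `QuotedPrintableSketch.Decode` is a hypothetical helper, and the caller is assumed to know the charset of the encoded part.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableSketch
{
    // Hypothetical helper: decodes quoted-printable text into bytes first,
    // then interprets the bytes with the given charset.
    public static string Decode(string raw, Encoding charset)
    {
        var bytes = new List<byte>(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c != '=')
            {
                // Assumption: literal characters in the input are plain ASCII.
                bytes.Add((byte)c);
                continue;
            }
            // Soft line break: "=" followed by a bare LF or by CRLF.
            if (i + 1 < raw.Length && raw[i + 1] == '\n') { i += 1; continue; }
            if (i + 2 < raw.Length && raw[i + 1] == '\r' && raw[i + 2] == '\n') { i += 2; continue; }
            // Escaped byte: "=" followed by two hex digits.
            if (i + 2 < raw.Length && Uri.IsHexDigit(raw[i + 1]) && Uri.IsHexDigit(raw[i + 2]))
            {
                bytes.Add(Convert.ToByte(raw.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }
            // Malformed escape: keep the equal sign literally.
            bytes.Add((byte)'=');
        }
        return charset.GetString(bytes.ToArray());
    }
}
```

With this sketch, `QuotedPrintableSketch.Decode("caf=C3=A9", Encoding.UTF8)` yields `café`, whereas a byte-to-char conversion would produce two separate characters for the two escaped bytes.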

Where to go from here

Once you have the decoded text, you can, for example, strip off all HTML tags:

string textonly = HttpUtility.HtmlDecode(Regex.Replace(txt, @"<[\S\s]*?>", ""));
Console.WriteLine("{0}", textonly);

Input:

<a href=""#print_link"">Expression&lt;Action&lt;T&gt;&gt; expr = s =&gt; Console.WriteLine(&quot;{0}&quot;, s);

Output:

Expression<Action<T>> expr = s => Console.WriteLine("{0}", s);
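The tag-stripping step can also be wrapped in a small helper. The sketch below is an assumption-laden variant rather than the article's code: `HtmlTextSketch.StripTags` is a hypothetical name, and it uses `System.Net.WebUtility.HtmlDecode` instead of `HttpUtility.HtmlDecode` so that no reference to System.Web is needed.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper: removes everything between '<' and the next '>'
    // (non-greedy), then resolves entities such as &lt; and &quot;.
    public static string StripTags(string html)
    {
        string withoutTags = Regex.Replace(html, @"<[\S\s]*?>", "");
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Tags are removed before the entities are decoded; otherwise a decoded `&lt;` would be mistaken for the start of a tag. Note that this is a plain text-level approach, not an HTML parser, so script or style content between tags is kept.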

Finally, the plain text can be searched for some pattern, e.g.:

var q = from m in Regex.Matches(textonly,
               @"Expression\s*<\s*Action\s*<\s*\w+\s*>\s*>\s*(\w+)\s*=")
               .Cast<Match>()
        select m.Groups[1].Value;
q.Aggregate(0, (n, v) => { Console.WriteLine("{0}: Expression<Action<T>> {1}", ++n, v); return n; });

Possible output:

1: Expression<Action<T>> calculate
2: Expression<Action<T>> print
3: Expression<Action<T>> store
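If the matches are needed as data rather than as console output, the same pattern can be collected into a list first. This is a sketch only; `ExpressionNameSketch.FindNames` is a hypothetical helper name that does not appear in the original tip.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExpressionNameSketch
{
    // Hypothetical helper: returns the variable names of all
    // "Expression<Action<...>> name =" declarations found in the text.
    public static List<string> FindNames(string text)
    {
        return Regex.Matches(text,
                   @"Expression\s*<\s*Action\s*<\s*\w+\s*>\s*>\s*(\w+)\s*=")
               .Cast<Match>()
               .Select(m => m.Groups[1].Value)
               .ToList();
    }
}
```

The resulting list can then be printed with an ordinary loop and a counter, which keeps the query itself free of side effects.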

Summary

Performance may not be optimal, but it keeps me going with my other tasks... ;-)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Andreas Gieriet
Founder eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity to computer language analysis, testing, as well as quality management.
 
More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).

Comments and Discussions

 
ProEnggSoft, 24-Feb-12 21:06: Reason for my vote of 5: Good Tip


Last Updated 14 Feb 2012
Article Copyright 2012 by Andreas Gieriet