Posted 27 Jul 2007

MIME Compliant Parser

, 8 Jul 2008
An attempt to separate MIME parsing from mail protocol.


This article and its code sample aim to disconnect MIME parsing functionality from any mail protocol, i.e. it aims to implement RFC2045 without coupling it too tightly with either the POP3 or IMAP protocol.


My original motivation for writing this code was a need for mail download automation. I started to look around for free or open source alternatives. However, the projects I found either lacked full support for attachments or were not modular enough. I therefore decided to write my own POP3 client. After struggling for a while with a quick hack, I soon realised that I had to read the relevant RFCs. I then realised that POP3 (RFC 1939) only transports messages and that attachments are handled by MIME (RFC 2045, 2046 etc.). This gave me the idea of writing a parser which could be used with both IMAP and POP3.

The main feature, as stated above, is that the code separates MIME functionality from any mail transfer protocol.

It is also an attempt to parse MIME messages on the fly, i.e. it reads a portion of the stream and parses it before reading the next portion. This behaviour should keep memory consumption down. As one might notice, the code also uses a StringBuilder to collect the whole message source, which works against the goal of minimizing memory consumption. However, this StringBuilder could easily be removed if one does not need access to the whole message source.

The library is also written to be as "pluggable" as possible, i.e. I have tried to keep its classes loosely coupled. To achieve this I have published all functionality as interfaces and used dependency injection wherever possible.


When I first started out with this project, I read many articles. Among the ones I read was this one written by Peter Huber SG here at Code Project. It covers much of the topic on MIME. However I found it too tightly coupled with the POP3 protocol to fit my needs. Nevertheless Peter explains the MIME concept in detail which helped me a lot in starting to grasp the concept. Other excellent sources of information are sites such as this and this.

Using the Code

When reading the RFC 2045 specification, one soon recognizes that everything revolves around a concept called the entity. Since the entity is so central to MIME, I have tried to model a class hierarchy which depicts concepts such as "Message", "Entity", "Body part" and "Body" as they are described in the RFC 2045 specification.

Screenshot - RFC2045.gif

The main entry point for the library is MIMER.RFC2045.MailReader, which implements MIMER.IMailReader. The IMailReader interface contains only one method signature, "Read".

IMailMessage Read(ref System.IO.Stream dataStream, IEndCriteriaStrategy

The Read function requires a System.IO.Stream and a MIMER.IEndCriteriaStrategy. The IEndCriteriaStrategy should reference an object with a method which can determine when the stream has reached the end of a mail message. Hence it should be possible (even if not implemented yet) to extend this MIME parser fairly easily to work with IMAP as well. In theory, adding IMAP support would only require one to write a class which implements the MIMER.IEndCriteriaStrategy interface and then pass an instance of this class to the MailReader. In the worst case one would have to write a new IMailReader. Even then, much of the functionality spread among the supporting classes could probably be reused.
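As a rough illustration of the strategy idea, the sketch below shows what a POP3-style end criteria could look like. The member signatures of the real MIMER.IEndCriteriaStrategy are not listed in this article, so the IEndCriteria interface, the IsEndReached method and its parameters are assumptions made for the example. It relies on the fact that POP3 (RFC 1939) ends a multi-line response with a line containing only a period.

```csharp
using System;

// Hypothetical stand-in for MIMER.IEndCriteriaStrategy; the real
// interface may expose a different method signature.
public interface IEndCriteria
{
    // Returns true when the first 'size' chars of 'data' end a message.
    bool IsEndReached(char[] data, int size);
}

// POP3 terminates a multi-line response with CRLF "." CRLF (RFC 1939).
public sealed class Pop3EndCriteria : IEndCriteria
{
    private static readonly char[] Terminator = { '\r', '\n', '.', '\r', '\n' };

    public bool IsEndReached(char[] data, int size)
    {
        if (data == null || size < Terminator.Length || size > data.Length)
            return false;

        // Compare the tail of the buffer with the terminator sequence.
        int offset = size - Terminator.Length;
        for (int i = 0; i < Terminator.Length; i++)
        {
            if (data[offset + i] != Terminator[i])
                return false;
        }
        return true;
    }
}
```

An IMAP variant would plug in the same way, although IMAP usually announces the length of a literal up front, so its check would more likely count bytes than look for a terminator line.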

The IMailReader interface is the most general (RFC822) definition of a mail reader. Since the RFC822 specification predates the MIME specifications (RFC2045 etc.), the Read method of this interface returns an IMailMessage, which does not support attachments.

public interface IMailMessage
{
    MailAddress From { get; set; }
    MailAddressCollection To { get; set; }
    MailAddressCollection CarbonCopy { get; set; }
    MailAddressCollection BlindCarbonCopy { get; set; }
    string Subject { get; set; }
    string Source { get; set; }
    string TextMessage { get; set; }
    bool IsNull();
}

However, the MIMER.RFC2045.MailReader also has a ReadMimeMessage method which returns an IMimeMailMessage which is a specialization of the IMailMessage interface, and this interface supports attachments.

IMimeMailMessage ReadMimeMessage(ref System.IO.Stream dataStream,
                IEndCriteriaStrategy endOfMessageCriteria);

public interface IMimeMailMessage : IMailMessage
{
    IDictionary<string, string> Body{}
    IList<IAttachment> Attachments{}
    IList<IMimeMailMessage> Messages{} //Added in version 0.4
    IList<AlternateView> Views{}
    System.Net.Mail.MailMessage ToMailMessage();
}


The library implements decoders for the base64 and quoted-printable encodings. The IDecoder interface publishes the signatures expected by MIMER.RFC2045.MailReader, which can therefore easily be extended with more decoders to support more encodings.

public interface IDecoder
{
    bool CanDecode(string encoding);
    byte[] Decode(ref System.IO.Stream dataStream);
    byte[] Decode(ref string data);
}

public MailReader(IList<IDecoder> decoders)
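To make the decoder contract more concrete, here is a minimal sketch of a base64 decoder written against the IDecoder signatures listed above. The class name SketchBase64Decoder is hypothetical, and the library's own decoder may well be implemented differently; the sketch simply delegates to Convert.FromBase64String from the .NET base class library.

```csharp
using System;
using System.IO;

// Interface as listed in the article.
public interface IDecoder
{
    bool CanDecode(string encoding);
    byte[] Decode(ref Stream dataStream);
    byte[] Decode(ref string data);
}

// Hypothetical decoder for illustration; not the library's own class.
public sealed class SketchBase64Decoder : IDecoder
{
    public bool CanDecode(string encoding)
    {
        // Content-Transfer-Encoding values are case-insensitive.
        return string.Equals(encoding, "base64", StringComparison.OrdinalIgnoreCase);
    }

    public byte[] Decode(ref Stream dataStream)
    {
        // The reader is deliberately not disposed so that the
        // caller's stream stays open.
        var reader = new StreamReader(dataStream);
        string text = reader.ReadToEnd();
        return Decode(ref text);
    }

    public byte[] Decode(ref string data)
    {
        // Convert.FromBase64String ignores white space such as the
        // line breaks found in encoded MIME bodies.
        return Convert.FromBase64String(data);
    }
}
```

A decoder like this would be added to the IList<IDecoder> handed to the MailReader constructor, and the reader would presumably pick it by calling CanDecode with the value of the Content-Transfer-Encoding field.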

Header Fields

Much of the work in parsing mail messages is done by reading and parsing Fields. The most basic field is defined in the RFC 822 specification. Conceptually it contains a "name" and a "body". This definition is implemented in the MIMER.RFC822.Field.

From the RFC822 specification:

field       =  field-name ":" [ field-body ] CRLF
field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">
field-body  =  field-body-contents
               [CRLF LWSP-char field-body]

public class Field
{
    public string Name{}
    public string Body{}
}
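The grammar above can be turned into code fairly directly. The following is a minimal sketch, not the library's FieldParser: the FieldSketch class and its TryParse method are hypothetical names. It first unfolds the field (removes a CRLF that is followed by white space) and then splits it into the "name" and "body" parts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldSketch
{
    // A folded field continues on the next line after CRLF + white space.
    private static readonly Regex Fold = new Regex(@"\r\n(?=[ \t])");

    // field-name: one or more chars excluding CTLs, SPACE and ":".
    private static readonly Regex FieldLine = new Regex(
        @"^(?<name>[^\x00-\x20\x7F:]+):(?<body>.*)$", RegexOptions.Singleline);

    // Returns false if 'raw' is not a syntactically valid field.
    public static bool TryParse(string raw, out string name, out string body)
    {
        name = null;
        body = null;
        if (raw == null) return false;

        // Unfold, then drop the CRLF that terminates the field.
        string unfolded = Fold.Replace(raw, string.Empty).TrimEnd('\r', '\n');
        Match m = FieldLine.Match(unfolded);
        if (!m.Success) return false;

        name = m.Groups["name"].Value;
        body = m.Groups["body"].Value.Trim();
        return true;
    }
}
```

Given the folded field "Content-Type: multipart/mixed;" followed by a continuation line holding the boundary parameter, the sketch yields the name "Content-Type" and a single-line body containing both parts.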

The RFC2045 specification does however extend the RFC822 field definition with fields such as Content-Type. These definitions are implemented in MIMER.RFC2045.ContentTypeField and MIMER.RFC2045.ContentTransferEncodingField.

public class ContentTypeField : MIMER.RFC822.Field
{
    public string Type{}
    public string SubType{}
    public StringDictionary Parameters{}
}

public class ContentTransferEncodingField : MIMER.RFC822.Field
{
    public string Encoding{}
}
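For comparison, the split into type, subtype and parameters can also be sketched with the ContentType class that ships with .NET in System.Net.Mime. This is a swapped-in technique for illustration only; the library itself uses its own regular-expression based parsers, and the ContentTypeSketch class below is a hypothetical name.

```csharp
using System;
using System.Net.Mime;

public static class ContentTypeSketch
{
    // Splits a Content-Type field body into type, subtype and boundary.
    public static void Split(string fieldBody, out string type,
                             out string subType, out string boundary)
    {
        // ContentType parses the media type and its parameters.
        var contentType = new ContentType(fieldBody);

        // MediaType has the form "type/subtype".
        string[] parts = contentType.MediaType.Split('/');
        type = parts[0];
        subType = parts.Length > 1 ? parts[1] : string.Empty;

        // Only multipart entities are expected to carry a boundary.
        boundary = contentType.Boundary;
    }
}
```

The boundary value is what a multipart parser would later use as the delimiter between body parts.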


The logic of the field parsing is divided among the FieldParser classes all of which implement the IFieldParser interface.

public interface IFieldParser
{
    void Parse(ref IList<RFC822.Field> fields, ref string fieldString);
}

The parsing is implemented with regular expressions as far as possible. The patterns imitate the definitions found in the RFCs as closely as possible.

public class FieldParser:IFieldParser
    protected readonly string m_QuotedPairPattern = "\x5C\x5C[\x00-" +
    protected readonly string m_DtextPattern =
protected readonly string m_AtomPattern = "[^][()<>@,;:." +
protected readonly string m_UnfoldPattern = "\x0D\x0A\x5Cs";
protected readonly string m_FieldPattern = "[^\x00-\x20\x7F:]{1,}:{1,1}.+";
protected readonly string m_FieldNamePattern = "[^\x00-\x20\x7F:]{1,}(?=:)";
protected readonly string m_QuotedStringPattern = "\x22(?:(?:(?:\x5C\x5C" +
protected readonly string m_CtextPattern = "[^()\x5C\x5C]+";

Since the RFC2045 specification leaves room for future media subtypes, the parsing functionality needed some easy way to be extended. This I have attempted to resolve by defining a virtual CompilePattern() method.

public class FieldParser:IFieldParser
    public virtual void CompilePattern(){}
public class ContentTypeFieldParser:RFC822.FieldParser, IFieldParser
        protected IList<string>

        public override void CompilePattern()
                m_SubType = new Regex("((?<=multipart/)"
            + m_MultipartSubtypesBuilder.ToString() + "|" +
                "(?<=text/)" +
                m_TextSubtypesBuilder.ToString() + "|" + "(?<=image/)" +
                m_ImageSubtypesBuilder.ToString() + "|"+
                "(?<=application/)" +
                m_ApplicationSubtypesBuilder.ToString() + "|"+
                "(?<=message/)" +
                m_MessageSubtypesBuilder.ToString() + "|" +
                "(?<=audio/)" +
                m_AudioSubtypesBuilder.ToString() + ")",
            // This should be called if we want to add functionality in
            //this method but let base build/compile it

Because the IList<string> m_ApplicationSubtypes field is declared as protected, child classes can access it. They can therefore add new application subtypes without rewriting the whole parsing logic. A child implementation might then look something like this:

public class ExtendedContentTypeFieldParser:RFC2045.ContentTypeFieldParser
    public override void CompilePattern()
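Since the listings here are abbreviated, the following self-contained sketch may help to show the extension mechanism end to end. The classes SubtypeParserSketch and ExtendedSubtypeParserSketch, the IsKnownSubtype method and the subtype values are all hypothetical stand-ins; only the idea of a protected subtype list combined with a virtual CompilePattern() comes from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Simplified stand-in for the article's ContentTypeFieldParser.
public class SubtypeParserSketch
{
    protected readonly IList<string> m_ApplicationSubtypes =
        new List<string> { "octet-stream", "postscript" };

    protected Regex m_SubType;

    // Builds an alternation of the known subtypes which must be
    // preceded by "application/" and followed by the end of the token.
    public virtual void CompilePattern()
    {
        var escaped = new List<string>();
        foreach (string subtype in m_ApplicationSubtypes)
            escaped.Add(Regex.Escape(subtype));

        m_SubType = new Regex(
            "(?<=application/)(?:" + string.Join("|", escaped) + @")(?![^\s;])",
            RegexOptions.IgnoreCase);
    }

    public bool IsKnownSubtype(string contentType)
    {
        if (m_SubType == null) CompilePattern();
        return m_SubType.IsMatch(contentType);
    }
}

public class ExtendedSubtypeParserSketch : SubtypeParserSketch
{
    public override void CompilePattern()
    {
        // Register the extra subtype, then let the base class compile.
        if (!m_ApplicationSubtypes.Contains("pdf"))
            m_ApplicationSubtypes.Add("pdf");
        base.CompilePattern();
    }
}
```

The child class only touches the list and delegates the actual pattern construction to the base class, which is the point of keeping the list protected and the method virtual.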


Since I first wrote this article, a few issues with the code have surfaced. Among these was the one pointed out by fellow coder "Lex1985". It turns out that I had, embarrassingly enough, forgotten to implement support for embedded messages (message/rfc822).

Embedded messages are essentially messages within a message, i.e. there can be any number of messages nested within another message. This is truly recursive behaviour. Since I treated an embedded message (message/rfc822) as a type of multipart entity, the parser looked for a boundary when reading it from the stream. However, a 'Content-Type' header field does not have to carry a boundary parameter, and it was this assumption that made the code throw an exception stating that it "could not find the mandatory delimiter in multi part entity". Aside from this, parsing an embedded message differs from parsing other multipart entities. Like all other entities, an embedded message has descriptive 'Content-' headers, but it also has its own message headers.

// These are the descriptive content headers of the entity
 Content-Type: message/rfc822

// These are the message headers
 Received: by
                 Mon, 20 Aug 2007 21:00:36 +0200
 Content-class: urn:content-classes:message
 Subject: VB:
 Date: Mon, 20 Aug 2007 21:00:28 +0200
 MIME-Version: 1.0
 Content-Type: multipart/mixed;
 Message-ID: <13176CE1A8A2C4428E514E5E603A56C0039BC7@
 Thread-Index: AcfjWkdexeNWdEWXRm6O87G7fcacpwAAhhFE
 References: <13176CE1A8A2C4428E514E5E603A56C06802@
 From: "client" <>

 This is a multi-part message in MIME format.


This forces the flow of parsing an embedded message to be a bit different from the parsing of a 'normal' multipart entity. When the parser finds a multipart entity with content-type defined as "Content-Type: message/rfc822" it must create a new message and recursively call upon itself. The call-trace of the parsing is as follows:

public IMimeMailMessage ReadMimeMessage(ref System.IO.Stream dataStream, 
                IEndCriteriaStrategy endOfMessageCriteria)


private string ParseMessage(ref Stream dataStream, 
            ref Message message, IList<RFC822.Field> fields)


private string CreateEntity(ref Stream dataStream, 
            ref IMultipartEntity parent, out IEntity entity)

It is within the CreateEntity method that the parser has to call itself recursively when it comes upon a message/rfc822 entity.

private string CreateEntity(ref Stream dataStream, 
            ref IMultipartEntity parent, out IEntity entity)
        entity = null;
        IList<RFC822.Field> fields;
        int cause = ParseFields(ref dataStream, out fields);
        if (cause > 0)
            foreach (RFC822.Field contentField in fields)
                if (contentField is ContentTypeField)
                    ContentTypeField contentTypeField = 
                    contentField as ContentTypeField;

                    if (m_FieldParser.CompositeType.IsMatch
                        MultipartEntity mEntity = new MultipartEntity();
                        mEntity.Fields = fields;
                        entity = mEntity;
                        entity.Parent = parent;

                    // It is here we must call upon our self when 
                    // finding a multipart entity of type message/rfc822
                        if (Regex.IsMatch(contentTypeField.Type, 
                        "(?i)message") &&
                            Message message = new Message();
                            IList<RFC822.Field> messageFields;
                            cause = ParseFields(ref dataStream, 
                        out messageFields);
                            message.Fields = messageFields;
                             message.Parent = mEntity;
                             if(cause > 0)
                                return ParseMessage(ref dataStream, 
                        ref message, messageFields);
                            mEntity.Delimiter = ReadDelimiter
                        (ref contentTypeField);
                            return parent.Delimiter;
                    else if (m_FieldParser.DescriteType.IsMatch
                        entity = new Entity();
                        entity.Fields = fields;
                        entity.Parent = parent;
                        return parent.Delimiter;
        return string.Empty;

It is this recursive call that has been added. However, some changes were also needed in the RFC2045.IMimeMailMessage definition to support embedded messages. The RFC2045.IMimeMailMessage now looks like this:

public interface IMimeMailMessage : IMailMessage
{
    IDictionary<string, string> Body{}
    IList<IAttachment> Attachments{}
    IList<IMimeMailMessage> Messages{}
    IList<AlternateView> Views{}
    System.Net.Mail.MailMessage ToMailMessage();
}

This design makes it possible to read any number of recursively embedded messages. For example, a message nested three levels deep can be accessed through code like this:


IMimeMailMessage m = ReadMimeMessage(ref s, endCriteria);
string subject = m.Messages[0].Messages[0].Messages[0].Subject;
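When the nesting depth is not known in advance, the Messages list can be walked recursively instead of being indexed by hand. The sketch below assumes a minimal stand-in class, NestedMessageSketch, which carries only the Subject and Messages members; the MessageWalker helper is likewise hypothetical and not part of the library.

```csharp
using System.Collections.Generic;

// Minimal stand-in for IMimeMailMessage: only the members used here.
public class NestedMessageSketch
{
    public string Subject { get; set; }
    public IList<NestedMessageSketch> Messages { get; } =
        new List<NestedMessageSketch>();
}

public static class MessageWalker
{
    // Collects the subjects of a message and of all embedded
    // messages, depth-first.
    public static IList<string> CollectSubjects(NestedMessageSketch root)
    {
        var subjects = new List<string>();
        Visit(root, subjects);
        return subjects;
    }

    private static void Visit(NestedMessageSketch message, List<string> subjects)
    {
        if (message == null) return;
        subjects.Add(message.Subject);
        foreach (NestedMessageSketch child in message.Messages)
            Visit(child, subjects);
    }
}
```

The same traversal should work against the real interface by replacing the stand-in type with IMimeMailMessage.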

To Sum Up

This article and its code have aimed to explain my attempt at implementing a MIME-compliant parser which is not too tightly coupled with the POP3 mail protocol. Hopefully you can use the source completely or partially in your own coding. The code is not yet stable or thoroughly tested, and it can most definitely be improved with regard to both architecture and performance. I do however think its overall architecture and idea are worth studying. Also look out for my next article, which will describe the use of this library in a POP3 compliant client library.


  • 2007-07-27: Article created
  • 2007-08-10: Zip file updated
  • 2007-09-03: Article and Zip file updated
  • 2008-06-13: Zip file updated
  • 2008-07-08: Zip file updated


This article, along with any associated source code and files, is licensed under The BSD License


About the Author

Web Developer
Sweden
No Biography provided

Article Copyright 2007 by smithimage