rEmail Part 3 – MIME Introduction

Robert James Metcalf


11 Jun 2005


rEmail is a set of tutorials I am writing describing how I am building tools to download, interpret and send email messages. MIME is the format email messages are encoded in and this article describes the design of my class to decode it.

Introduction

rEmail is a set of tutorials I am writing describing how I am building tools to download, interpret and send email messages (including attachments). This article is the third in the series and describes how I intend to interpret the MIME messages previously downloaded.

It would be fair to point out that there are many articles on this and other sites that detail how you can grab messages from POP servers, and you could say the previous two articles are just repeating this. MIME on the other hand is hardly covered at all and without shelling out big bucks it is very difficult to find resources to help programmers to write applications that use it. In this article, I have documented how I decided to structure my classes that deal with MIME.

Sources

The subject of MIME, essential for sending and receiving attachments, is lightly covered for a reason. The simple process of sending and receiving emails has been designed in such a way that it is not in fact simple at all. Programmers wanting to create software that uses MIME need to start by reading RFC 822, then continue with RFCs 2045 to 2049, which cover MIME itself. There are other related RFCs, but taking these alone gives:

RFC 822 – 47 Pages
RFC 2045 – 31 Pages
RFC 2046 – 44 Pages
RFC 2047 – 15 Pages
RFC 2048 – 21 Pages
RFC 2049 – 24 Pages

When I have finished all my articles, readers should have a simple class that shows a message pretty much as an email program would, with a subject, body, attachments and so on. As someone who wants to write code that reads and sends email with attachments, I would like to download this final state and get on with the next task. The reason I am breaking down my articles on MIME is the complexity of the format. First of all, I will only implement and test the bits that I need, and I am sure that some readers will want to do something with this class that I don’t. Secondly, my class may pass the tests I put it through, but I have no doubt that it will contain bugs and will not properly match what the RFCs require. If $soft can’t do it with armies of highly paid programmers, you shouldn’t expect me to write bug-free, standards-compliant code.

My strategy to reduce the amount of time that fellow programmers will spend trying to get their programs working is to explain my design exactly and, importantly, how to extend it to meet further needs. So the next few articles will take you through a series of classes that are under the bonnet of my MIME implementation. This is why this article has almost no code, but reading it will help you understand the code in the next article.

My Design

MIME is somewhat recursive. You have a main message that can be split into two parts: a header and a body. The header contains the to, from and subject fields, as well as some more fields that describe what the body is. The body is separated from the header by a blank line. Some of the fields in the header tell you how to interpret the body. The body could be one of the ‘multipart’ types, which are basically more parts, each with its own header and body. This recursive nature forms the basis of my design.
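To make the header/body split concrete, here is a minimal sketch of cutting a raw part at its first blank line. SplitPart is a hypothetical helper written for this illustration, not part of the rMIME classes, and it assumes the whole part is already in memory as a std::string and has a non-empty header.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of rMIME): splits a raw part into its
// header and body at the first blank line. Accepts CRLF or bare LF breaks.
// Assumes the header is not empty, i.e. the text does not start with a
// blank line.
std::pair<std::string, std::string> SplitPart(const std::string& raw)
{
    // A blank line shows up as two consecutive line breaks.
    std::string::size_type pos = raw.find("\r\n\r\n");
    std::string::size_type skip = 4;

    // Use a bare-LF blank line instead if it comes first (or is the only one).
    std::string::size_type lf = raw.find("\n\n");
    if (lf != std::string::npos && (pos == std::string::npos || lf < pos)) {
        pos = lf;
        skip = 2;
    }

    if (pos == std::string::npos) {
        // No blank line: treat everything as header, with an empty body.
        return std::make_pair(raw, std::string());
    }
    return std::make_pair(raw.substr(0, pos), raw.substr(pos + skip));
}
```

Because a multipart body is itself a sequence of parts, the same function could be applied again to each piece found between the boundary lines, which is where the recursion comes from.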

Part: A piece of message that has a header and a body associated with it.

The basic idea of my design is summarized in the diagram above. It shows the main classes of object I will use and their relationships. rMIME_part will carry all the functionality common to all the different types of body I can have. The other rMIME classes inherit from it and add to the functionality. This diagram shows the relationship between the types of object. It is also useful to see what the structure of the instances of the actual objects will be:

This is a shot from an appWizard program I threw together to test the functionality of my code. (It will be included in a future article in this series, hopefully with another pane on the right showing the contents of the selected object.) It shows how the objects will relate to each other. As you can see, the rMIME_message class is the main class that is created first. Some messages are made up of just simple text bodies; others have more complicated structures. You will note the Null Item lines with ‘?’ icons. These are the bodies that are held as char buffers rather than as instances of other objects. For some of them, the application/vnd.ms-excel types in this example, I will create more rMIME_* classes to hold and access the functionality to deal with them. For text/plain, I have decided not to do this. At the moment, I see no reason why accessing the char buffer is not good enough for my purposes. This may not be the case for all readers. If you need to interpret the charset parameter of the content-type header field, for example, you will need to create another class derived from rMIME_part.
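The class relationships described above might be sketched roughly as follows. Only the class names rMIME_part, rMIME_message and rMIME_MultiPart come from this article. Every member shown here (header, body, children, content, CountParts) is an assumption made for illustration, and the real classes in the next article may look quite different.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch only: the class names come from the article, the members are assumed.
class rMIME_part
{
public:
    virtual ~rMIME_part() {}

    std::string header;  // raw header text of this part
    std::string body;    // raw body, kept as a plain character buffer

    // Number of parts in this subtree, counting this one.
    virtual std::size_t CountParts() const { return 1; }
};

// A part whose body is made up of further parts.
class rMIME_MultiPart : public rMIME_part
{
public:
    std::vector<std::unique_ptr<rMIME_part>> children;

    std::size_t CountParts() const override
    {
        std::size_t n = 1;
        for (const auto& child : children) n += child->CountParts();
        return n;
    }
};

// The top-level object; it is created first and owns whatever its body became.
class rMIME_message : public rMIME_part
{
public:
    std::unique_ptr<rMIME_part> content;  // null when the body stays a buffer

    std::size_t CountParts() const override
    {
        return 1 + (content ? content->CountParts() : 0);
    }
};
```

A body that stays a char buffer (the ‘?’ items in the screenshot) corresponds to a plain rMIME_part here. Supporting a new body type would mean deriving another class from rMIME_part and overriding what it needs.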

At this point, I would like to note an issue that bugged me for some time before I concluded that this method was a good way of doing things. The issue is the header fields. An rMIME_message instance will have from, to, and subject header fields. These fields relate to the object they are found in. However, other header fields, namely content-type and the other content header fields, relate to the body of the part they are found in. I decided that this was acceptable, but you must be clear in your mind when reading the header variables as to which object they refer to.
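One way to keep the two kinds of field apart is to test the field name. The MIME RFCs reserve header field names beginning with "Content-" for describing a body, and field names are not case-sensitive. DescribesBody below is a hypothetical helper for illustration, not something taken from the rMIME classes.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true when a header field name describes the body of
// its part (the "Content-" fields), false when it describes the object
// itself (From, To, Subject and so on). The comparison ignores case.
bool DescribesBody(const std::string& fieldName)
{
    const std::string prefix = "content-";
    if (fieldName.size() < prefix.size()) return false;

    for (std::string::size_type i = 0; i < prefix.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(fieldName[i]);
        if (std::tolower(c) != prefix[i]) return false;
    }
    return true;
}
```

With this, "Content-Type" and "content-transfer-encoding" are reported as describing the body, while "Subject" and "To" are not.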

That is the main idea of the structure.

I also needed to put some thought into how this structure is created. The stages for a received email will be roughly as follows:

Create an instance of rMIME_message.
Load the message into its temporary buffer.
Process the message (this does the job of creating objects for all the parts if it can, and decoding the message so it’s not in base64 or quoted-printable any more).
If the process function returned true, you can call rMIME_message member functions to access the information.

This will translate to something like:

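// f is assumed to be a FILE* already opened in binary mode on a saved message.
rMIME_message* pMes = new rMIME_message();
pMes->ResetBuffer(BUFFER_SIZE * 8);

int bytes_read = 0;
int total_bytes_read = 0;
char buf[BUFFER_SIZE];

// fread returns 0 at end of file or on error, so loop on its result
// rather than on feof(), which only becomes true after a failed read.
while ((bytes_read = (int)fread(buf, sizeof(char), BUFFER_SIZE, f)) > 0)
{
    total_bytes_read += bytes_read;
    pMes->AddToBuffer(buf, bytes_read);
}
fclose(f);

// Build the part objects and decode the bodies; give up if that fails.
if (!pMes->ProcessBuffer()) {
    delete pMes;
    return false;
}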
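The processing stage mentions decoding base64. As a rough idea of what that involves, here is a minimal, self-contained decoder sketch. DecodeBase64 is a hypothetical name, this is not the code ProcessBuffer uses, and it is deliberately forgiving: it skips line breaks and any other character outside the base64 alphabet, and stops at the first '=' padding character.

```cpp
#include <string>

// Hypothetical helper: decodes base64 text into raw bytes.
std::string DecodeBase64(const std::string& in)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    unsigned int bits = 0;  // accumulates 6-bit groups
    int count = 0;          // number of not-yet-emitted bits held in 'bits'

    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '=') break;                    // padding marks the end
        std::string::size_type v = alphabet.find(in[i]);
        if (v == std::string::npos) continue;       // skip CR, LF and the like

        bits = (bits << 6) | static_cast<unsigned int>(v);
        count += 6;
        if (count >= 8) {
            // A whole byte is available: emit its top 8 pending bits.
            count -= 8;
            out += static_cast<char>((bits >> count) & 0xFF);
        }
    }
    return out;
}
```

For example, DecodeBase64("SGVsbG8=") gives "Hello", and a line break in the middle of the input makes no difference to the result.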

One of the requirements of my project is that I will need to forward received messages unaltered, with all the header fields intact and untouched, so the class will need to be able to rebuild itself. Due to the recursive nature of MIME, it should be easy to attach the rMIME_message I received to another rMIME_message object that I will send. (I will probably actually use an rMIME_MultiPart message to forward it with some body text as well.)

Of course, you will have to create MIME messages for sending. I think the process may look something like:

Create an instance of rMIME_message.
Add sub objects to it.
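To give a feel for what the finished text of such a message looks like, here is a minimal sketch that assembles a multipart/mixed entity from a list of sub parts. SimplePart and BuildMultipart are hypothetical names used only for this illustration and are not the rMIME interface. The sketch assumes the bodies are already encoded and that the caller picks a boundary string that does not occur in any of them.

```cpp
#include <string>
#include <vector>

// Hypothetical type for illustration; not one of the rMIME classes.
struct SimplePart
{
    std::string contentType;  // for example "text/plain"
    std::string body;         // body text, already encoded if necessary
};

// Builds the text of a multipart/mixed entity with CRLF line breaks.
// The boundary must not appear inside any of the bodies.
std::string BuildMultipart(const std::vector<SimplePart>& parts,
                           const std::string& boundary)
{
    std::string out = "MIME-Version: 1.0\r\n";
    out += "Content-Type: multipart/mixed; boundary=\"" + boundary + "\"\r\n";
    out += "\r\n";  // blank line ends the header

    for (std::vector<SimplePart>::size_type i = 0; i < parts.size(); ++i) {
        out += "--" + boundary + "\r\n";  // each part starts with --boundary
        out += "Content-Type: " + parts[i].contentType + "\r\n";
        out += "\r\n";                    // blank line ends the part header
        out += parts[i].body + "\r\n";
    }

    out += "--" + boundary + "--\r\n";    // closing delimiter has a trailing --
    return out;
}
```

A real message would also need from, to and subject fields in the top header. They are left out here to keep the part structure visible.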

Next...

That’s it so far. The next article will include code to receive and interpret MIME messages. It won’t use the code from rTCPIP or rPOP but these will be linked in at a later stage. It will include my MIME message viewer. This program can read the files created by the save email program provided in the last article. Don’t forget to build up a stock of test email messages. This is what I have done when writing the code and when I first ran it, it crashed on pretty much every message I gave it. I have ironed out quite a few bugs in the system simply by opening loads of messages with this program, and making it work! I hope I have found most of the problems but I daresay that there are some MIME messages out there that can break it.

Another thing to note is that I have been having problems with the saved message files. Firstly, if I alter them and save them in Notepad, it messes up all the line breaks. I think this is a line break mismatch: email messages are specified to use CRLF line breaks, which is also the Windows convention, so the saved files may contain some other form of line break that Notepad does not handle. When my final system is built, these classes will need to read data directly from the buffer, not from a file, so I will have to live with this. Also, the Windows search facility will not search these files. This is annoying me and I would be interested to know if anyone else has similar problems with these files.
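If a test message has had its line breaks altered by an editor, one possible repair is to rewrite every line break as CRLF before handing the text to the parser. ToCRLF is a hypothetical helper sketched for this purpose. It assumes the whole file has been read into a std::string and that a lone CR or a lone LF was meant as a line break.

```cpp
#include <string>

// Hypothetical helper: rewrites CRLF, bare CR and bare LF line breaks
// so that every line break in the result is CRLF.
std::string ToCRLF(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += "\r\n";
            // Swallow the LF of an existing CRLF pair so it is not doubled.
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
        } else if (in[i] == '\n') {
            out += "\r\n";
        } else {
            out += in[i];
        }
    }
    return out;
}
```

This is only safe for the textual parts of a message. Base64 and quoted-printable bodies are plain text, so they survive it, but raw 8-bit or binary content would not.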

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By

Robert James Metcalf

Web Developer

United Kingdom
