Introduction
There are already other articles here on The Code Project that show how to decode Base64 and Quoted-Printable, but they all use MFC. I needed some code that didn't use MFC, so I wrote AMMimeUtils.
I wrote these classes because I was working with receiving and sending emails and Usenet messages. Almost all email messages and attachments are either Base64 or Quoted-Printable encoded. Attachments in Usenet messages are often UU encoded. I still need to write a class to handle this, but it might come in a later version.
When you get an email, the subject and other header fields might also be encoded, so this code also includes a function to decode these fields. Different mail programs encode the subject in different ways. The following text:
Just a small text (for demo), and some more text...
can look like this:
=?iso-8859-1?Q?Just a small text =28for demo=29, and some more text...?=
or like this:
Just a small text =?iso-8859-1?Q?=28for demo=29?=, and some more text...
The first form is easy, because the entire string is encoded with Quoted-Printable (the ?Q? part means Quoted-Printable). In the second form only part of the string is encoded, so we have to take the leading non-encoded part, decode the encoded part, take the trailing non-encoded part, and join the 3 parts to get the final subject.
I made a function char* MimeDecodeMailHeaderField(char *s); to handle this. If you have a string called s containing the subject you want to decode, simply call it like this: s = MimeDecodeMailHeaderField(s);
Now s contains the decoded text.
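The splitting described above can be sketched as follows. This is not the code from AMMimeUtils; it is a minimal stand-alone illustration, and the names HexValue, DecodeQ and DecodeHeaderFieldSketch are hypothetical. It assumes only the Q encoding needs handling, copies the decoded bytes without any charset conversion, leaves B (Base64) encoded words untouched, and does not drop whitespace between adjacent encoded words.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int HexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes the payload of a Q-encoded word: "_" is a space, "=XX" is one byte.
static std::string DecodeQ(const std::string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '_') {
            out += ' ';
        } else if (text[i] == '=' && i + 2 < text.size() &&
                   HexValue(text[i + 1]) >= 0 && HexValue(text[i + 2]) >= 0) {
            out += static_cast<char>(HexValue(text[i + 1]) * 16 + HexValue(text[i + 2]));
            i += 2;
        } else {
            out += text[i];
        }
    }
    return out;
}

// Copies plain parts unchanged and decodes every =?charset?Q?text?= part.
std::string DecodeHeaderFieldSketch(const std::string& field)
{
    const std::size_t npos = std::string::npos;
    std::string out;
    std::size_t pos = 0;
    while (pos < field.size()) {
        const std::size_t start = field.find("=?", pos);
        if (start == npos) break;
        const std::size_t q1 = field.find('?', start + 2);  // end of charset
        const bool shaped = q1 != npos && q1 + 2 < field.size() && field[q1 + 2] == '?';
        const std::size_t end = shaped ? field.find("?=", q1 + 3) : npos;
        if (end == npos) {
            // Not a complete encoded word: keep the "=?" literally and move on.
            out.append(field, pos, start + 2 - pos);
            pos = start + 2;
            continue;
        }
        out.append(field, pos, start - pos);                // leading plain part
        if (std::toupper(static_cast<unsigned char>(field[q1 + 1])) == 'Q')
            out += DecodeQ(field.substr(q1 + 3, end - (q1 + 3)));
        else
            out.append(field, start, end + 2 - start);      // e.g. B: left as-is here
        pos = end + 2;
    }
    if (pos < field.size()) out.append(field, pos, npos);   // trailing plain part
    return out;
}
```

With both example subjects from above, the sketch should give back the same plain text. Returning a new std::string also avoids reallocating the caller's buffer in place.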
I have 2 classes CBase64Utils and CQPUtils for general encoding and decoding of Base64 and Quoted-Printable. The interface looks like:
class CBase64Utils
{
private:
    int ErrorCode;
public:
    int GetLastError() {return ErrorCode;};
    CBase64Utils();
    ~CBase64Utils();
    char* Decode(char *input, int *bufsize);
    char* Encode(char *input, int bufsize);
};
class CQPUtils
{
private:
    char* ExpandBuffer(char *buffer, int UsedSize,
                       int *BufSize, bool SingleChar = true);
    int ErrorCode;
public:
    int GetLastError() {return ErrorCode;};
    char* Decode(char *input);
    char* Encode(char *input);
    CQPUtils();
    ~CQPUtils();
};
The only difference between the two classes is in the Decode() and Encode() functions. Quoted-Printable is always text, so CQPUtils::Decode() takes only one parameter, the string containing the encoded text, and returns a pointer to a new buffer containing the decoded text. Base64 might be an encoded binary file, so CBase64Utils::Decode() also stores the length of the returned buffer in the bufsize variable. That makes it possible to save the decoded buffer as a binary file.
Both classes have a GetLastError() function. If it returns zero after you decode something, everything is fine. If it returns non-zero, there was an error in the input, but you still get the (maybe) encoded/decoded result.
Right now, this code only has the functions I needed when I wrote it. In the future, I might add some better error handling.
If you want to know more about MIME and email messages, you can take a look at:
- RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One, Format of Internet Message Bodies
- RFC 2046 - Multipurpose Internet Mail Extensions (MIME) Part Two, Media Types
- RFC 2047 - Multipurpose Internet Mail Extensions (MIME) Part Three, Message Header Extensions for Non-ASCII Text
- RFC 822 - STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES
|
void CUFMovImage::GetData(unsigned char* str)
{
    int pBuffsize = strlen((char *) str); //Set pBuffsize !!!
    pBase64 = Decode((char*)str, &pBuffsize);
    if (image4.m_pPicture != NULL)
        image4.UnLoad();
    if (image4.m_pPicture == NULL)
    {
        image4.LoadFromBuffer((unsigned char*)pBase64, pBuffsize);
    }
}
|
I think there's a bug in the function that decodes the mail header. My problem was that realloc ran out of memory. To fix that, I declared another char *, allocated memory for it with malloc, worked on that string instead, and returned it at the end.
Gr
|
Hi there. I use CBase64Utils to decode Base64-encoded HTML and get the original code back. However, there is always an extra character at the end of the decoded HTML. I tried other decoders, and also tried decoding the same source with some email clients, and with those I get the correct HTML. I am just wondering if it is a bug.
|
The bugs are still in the source code, even though they have been known for some time. Is it too difficult to update the sources?
(Lost 1 hour to understand the 8-bit sign problem)
|
I found 2 bugs: one when encoding and one when decoding.
If you attempt to encode a 1 or 2 byte buffer, nothing will encode. The fix is to add (about line 266):
+ if (bufsize > 2)
+ {
  while (count <= bufsize) ...
(and, of course, the closing brace!)
When decoding, the wrong length was reported (from line 423). Delete the line
resultlen++;
in the 'for' loop, and then after the loop, add
int nBits = (count % 4) * 6;
resultlen += nBits / 8;
I tested this with:
#define MAX_BITS 2000 // I varied this, started small
...
char bits[MAX_BITS];
for (int i = 1; i < MAX_BITS; ++i)
{
    for (int l = 0; l < i; ++l)
        bits[l] = i;
    UINT nSize = l;
    Base64 encode; // my wrapper
    std::string moreBits = encode.Encode(bits, nSize); // minor change to api, guts are the same
    Base64 decode; // separate object as I let it do the memory management
    char* pOutput;
    size_t len;
    decode.Decode(moreBits, bits, len);
    ASSERT(len == nSize);
    ASSERT(memcmp(pOutput, bits, len) == 0);
}
...
David Connet http://www.agilityrecordbook.com
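The length arithmetic in the decoding fix above can be checked in isolation. The following is a minimal sketch, not part of CBase64Utils; the function name Base64DecodedLengthSketch is hypothetical. It assumes the standard Base64 alphabet and that everything outside the alphabet (padding "=", CR, LF) is skipped. Each symbol carries 6 bits, so every full group of 4 symbols gives 3 bytes and a partial group gives (count % 4) * 6 / 8 bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes a Base64 string decodes to.
// Symbols outside the alphabet (padding, line breaks) are ignored.
std::size_t Base64DecodedLengthSketch(const std::string& encoded)
{
    std::size_t symbols = 0;
    for (char c : encoded) {
        const bool inAlphabet = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '+' || c == '/';
        if (inAlphabet) ++symbols;
    }
    // 4 symbols -> 3 bytes; a trailing partial group of 2 or 3 symbols
    // contributes 1 or 2 bytes (6 bits per symbol, rounded down to whole bytes).
    return symbols / 4 * 3 + (symbols % 4) * 6 / 8;
}
```

For example, "TWFu" (the encoding of "Man") gives 3, "TWE=" gives 2 and "TQ==" gives 1.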
|
I was using your code for some time and I found that it sometimes fails to decode the Quoted-Printable format of some e-mail messages.
I found that the problem is that those messages don't use capital letters in the QP text.
Here is the modified code to handle decoding:
bool ok = true;
for (i = 0; i < 2; i++)
{
    if (hexmap[ toupper(input[i]) ] == SKIP)
    {
        //we have an error, or a linebreak, in the encoding...
        ok = false;
        if (input[i] == '\r' && input[i + 1] == '\n')
        {
            input += 2;
            //*(result++) = '\r';
            //*(result++) = '\n';
            break;
        }
        else
        {
            //we have an error in the encoding...
            bError = TRUE;
            //s--;
        }
    }
    mid[i] = toupper(input[i]);
}
My modification was to add the toupper calls in those 2 places.
Best regards! Irek
Check out my software at: http://www.ireksoftware.com
|
I'm not sure I would call that an improvement. The MIME RFC specifies that the hexadecimal digits in Quoted-Printable must always be uppercase ("Uppercase letters must be used; lowercase letters are not allowed." / RFC 2045). So using toupper would violate this requirement and hence make your implementation a bit buggier.
|
I agree. But the world is not perfect. There is a lot of software around that produces content that does not conform to the RFCs. If you want to create a reliable e-mail reader (as in my case), you also need to handle messages that are not fully RFC compliant. Otherwise your users will start complaining that your e-mail reader sometimes doesn't work, while MS Outlook has no problem decoding the same message.
Check out my software at: http://www.ireksoftware.com
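For what it is worth, RFC 2045 itself notes in its guidance for decoders that a robust implementation might choose to recognize lowercase hex digits as the corresponding uppercase ones, even though an encoder must not produce them. A lenient decoder along those lines could look like the sketch below. It is not the CQPUtils code; the names HexDigitValue and QPDecodeLenientSketch are hypothetical, and it assumes soft line breaks are written as "=" followed by CRLF and that malformed escapes are copied through unchanged.

```cpp
#include <string>

// Hypothetical helper: accepts both "3D" and "3d" style digits; -1 if not hex.
static int HexDigitValue(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Lenient Quoted-Printable body decoder (a sketch, not a full implementation).
std::string QPDecodeLenientSketch(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '=') {               // ordinary character: copy it
            out += in[i++];
            continue;
        }
        if (in.compare(i + 1, 2, "\r\n") == 0) {
            i += 3;                       // soft line break: emit nothing
            continue;
        }
        if (i + 2 < in.size()) {
            const int hi = HexDigitValue(static_cast<unsigned char>(in[i + 1]));
            const int lo = HexDigitValue(static_cast<unsigned char>(in[i + 2]));
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out += in[i++];                   // malformed escape: keep the "=" literally
    }
    return out;
}
```

Because the result is a std::string with an explicit length, decoded bytes with the value zero do not truncate the output.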
|
Line: 322 in AMMimeUtils.cpp
Function: CBase64Utils::Encode(char *input, int bufsize)
320: unsigned char mid = (256 - (0 - *s));
321: tmp |= mid;
322: //tmp |= *s;
323: tmp <<= 8;
324: count++;
325: s++;
KG
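The patch above works around sign extension: where char is signed, a byte above 0x7F is negative, and OR-ing it into tmp sets all the higher bits as well. A plain cast to unsigned char has the same effect as the arithmetic on line 320. The sketch below shows the idea in a small stand-alone encoder; it is not the CBase64Utils code, the name Base64EncodeSketch is hypothetical, and it assumes no line breaks are inserted in the output. It also pads 1 and 2 byte tails, which is the other encoding problem reported in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone encoder: no line breaks, standard alphabet.
std::string Base64EncodeSketch(const char* data, std::size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < len; i += 3) {
        const std::size_t n = (len - i < 3) ? len - i : 3;  // bytes in this group
        unsigned long group = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            group <<= 8;
            // The cast keeps bytes above 0x7F from being sign-extended.
            if (k < n) group |= static_cast<unsigned char>(data[i + k]);
        }
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += (n > 1) ? alphabet[(group >> 6) & 0x3F] : '=';
        out += (n > 2) ? alphabet[group & 0x3F] : '=';
    }
    return out;
}
```

Three 0xFF bytes should encode to "////", and the short inputs "M" and "Ma" to "TQ==" and "TWE=".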
|
Your software doesn't work if you have a lot of "=?iso-8859-1?Q?" in the string (exception...), and you have some trouble with your free() calls!!!! DevCrazy
|
I can't see how this class will be able to encode/decode binary data, since the input values are char instead of unsigned char. I tried to encode an array of numbers without success.
|
I also needed this feature. In fact, char is not the problem here. The problem is that the output is treated as an ASCII string. Here is a modified, stand-alone decode function plus the unchanged hex table. I also decided to drop some of the pointers (I don't like using raw pointers too much) and used std::string instead.
This is how I call this function to decode a QP binary attachment from a MIME message and write it to a disk file:
char* pPointerToQPData;
std::string sOutput;
long lFileSize = QuotedPrintable_DecodeEx(pPointerToQPData, sOutput);
CFile file;
file.Open("C:\\file.bin", CFile::modeCreate|CFile::modeReadWrite);
file.Write(sOutput.c_str(), lFileSize);
file.Close();
//I hope this will help.
//Best regards, Irek Zielinski, Krakow, Poland
---- the code:
#include <string>
#include <cstring>
#define SKIP '\202'
const char hexmap[] = { SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, 10, 11, 12, 13, 14, 15, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP };
/* Return value: length of binary data */
int QuotedPrintable_DecodeEx(const char *input, std::string& a_sOutput)
{
    BOOL bError = FALSE;
    int iFileLen = 0;
    a_sOutput.resize(strlen(input)+1);

    while (*input != '\0') //loop through the entire string...
    {
        if (*input == '=') //woops, needs to be decoded...
        {
            int i; //declared here so both loops below can use it
            for (i = 0; i < 3; i++) //is s more than 3 chars long...
            {
                if (input[i] == '\0')
                {
                    //error in the decoding...
                    bError = TRUE;
                    return iFileLen;
                }
            }
            char mid[3];
            input++; //move past the "="
            //let's put the hex part into mid...
            bool ok = true;
            for (i = 0; i < 2; i++)
            {
                //cast so that bytes above 0x7F give a valid table index
                if (hexmap[(unsigned char)input[i]] == SKIP)
                {
                    //we have an error, or a linebreak, in the encoding...
                    ok = false;
                    if (input[i] == '\r' && input[i + 1] == '\n')
                    {
                        input += 2;
                        //*(result++) = '\r';
                        //*(result++) = '\n';
                        break;
                    }
                    else
                    {
                        //we have an error in the encoding...
                        bError = TRUE;
                        //s--;
                    }
                }
                mid[i] = input[i];
            }
            //now we just have to convert the hex string to an char...
            if (ok)
            {
                input += 2;
                int m = hexmap[(unsigned char)mid[0]];
                m <<= 4;
                m |= hexmap[(unsigned char)mid[1]];
                a_sOutput[iFileLen++] = m;
            }
        }
        else
        {
            if (*input != '\0')
            {
                a_sOutput[iFileLen++] = *(input++);
            }
        }
    }
    return iFileLen;
}
Check out my software at: http://www.ireksoftware.com
|
Hi,
If it's not too much trouble, can you tell me the algorithm that I can use with your class to predetermine the encoded length of an unencoded buffer?
For example, let's say that I will be encoding a file of data and the file length is 100. I'm looking for an algorithm to tell me how large the encoded data will be without doing the actual encoding.
Thanks for any help that you can offer.
--Paul K.
|
I see the line in your BASE64 encoder routine:
int alsize = ((bufsize * 4) / 3);
char *finalresult = (char*)calloc(alsize + ((alsize / MaxLineLength) * 2) + (10 * sizeof(char)), sizeof(char));
So you do the standard calculation of * 4 / 3, then add 2 bytes for every line break. But what about the 10 extra bytes tacked on at the end? Are they there to account for any possible "=" padding characters?
Would this be correct to get the EXACT number of bytes required for the encoded buffer?
int alsize = ((bufsize * 4) / 3); // Basic calc.
alsize = alsize + ((alsize / MaxLineLength) * 2) // Add CRLFs
while( bufsize % 4 != 0 ) // Add padding amount if needed.
    alsize++;
I'd just do the encoding and get the length of the string returned, but in my case I have to know the length beforehand.
--Paul K.
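One way to get an exact figure is to round the input up to whole 3-byte groups first, since Base64 always emits 4 characters per group, padding included. The loop in the question would not do this: it tests bufsize, which never changes inside the loop. The sketch below is an assumption-laden illustration, not the formula used inside CBase64Utils: the name Base64EncodedLengthSketch is hypothetical, the default line length of 76 is an assumed value standing in for MaxLineLength, and it assumes a CRLF is inserted after every full line but not after the last one. If the encoder also ends the output with a CRLF or a terminating zero, add those bytes on top.

```cpp
#include <cstddef>

// Hypothetical helper: exact Base64 output size for inputBytes of input.
// lineLen is the number of Base64 characters per line; 0 means no line breaks.
std::size_t Base64EncodedLengthSketch(std::size_t inputBytes, std::size_t lineLen = 76)
{
    // Every started group of 3 input bytes becomes 4 output characters,
    // so the "=" padding is already included here.
    const std::size_t chars = (inputBytes + 2) / 3 * 4;
    if (lineLen == 0 || chars == 0) return chars;
    // One CRLF between consecutive lines, none after the final line (assumption).
    const std::size_t breaks = (chars - 1) / lineLen;
    return chars + breaks * 2;
}
```

For the 100 byte file in the example, that is 136 characters plus one CRLF, so 138 bytes under these assumptions.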
|
This code returns a truncated string:
CQPUtils qp2;
char *result22 = qp2.Encode(buf);
Example: Microsoft welcome message in Outlook after calling this routine looks like:
Welcome to
Microsoft Outlook 2000
One Window to Your World of Information
=9
---- The string is much longer than this. Please help
|
I think I found the bug: when mids = "9", mids[1] = '\0', so it will insert '\0' and terminate the string!
//add the hex value for the char...
char mids[3];
itoa(mid, mids, 16);
strupr(mids);
*(fresult++) = '=';
*(fresult++) = mids[0];
*(fresult++) = mids[1];
UsedSize += 3;
LineLen += 2;
s++;
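One way around the single-digit case is to avoid itoa altogether and pick both hex digits from a lookup string, so values below 0x10 keep their leading zero. This is a minimal sketch, not a patch to CQPUtils::Encode; the function name QPEscapeByteSketch is hypothetical.

```cpp
#include <string>

// Hypothetical helper: formats one byte as a Quoted-Printable escape "=XX".
// Both digits are always written, so 9 becomes "=09" rather than "=9" + '\0'.
std::string QPEscapeByteSketch(unsigned char byte)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "=";
    out += digits[byte >> 4];     // high nibble
    out += digits[byte & 0x0F];   // low nibble
    return out;
}
```

A tab character (value 9) then comes out as "=09", which matches the truncation point reported in the Outlook example above.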
|
Change the line
//check to see if it's a legal base64 char...
while (base64map[*s] == SKIP)
to
//check to see if it's a legal base64 char...
while (base64map[(unsigned char)(*s)] == SKIP)
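The reason for the cast: where char is signed, a byte above 0x7F is a negative number, so base64map[*s] reads memory in front of the table. The sketch below shows the same lookup pattern with a 256-entry table built at run time. The names MakeBase64ReverseTable and IsBase64Char are hypothetical and not part of AMMimeUtils, and -1 stands in for the SKIP marker.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: maps every possible byte to its 6-bit value, or -1.
static std::array<int, 256> MakeBase64ReverseTable()
{
    std::array<int, 256> table;
    table.fill(-1);
    const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (std::size_t i = 0; i < alphabet.size(); ++i)
        table[static_cast<unsigned char>(alphabet[i])] = static_cast<int>(i);
    return table;
}

// True if c belongs to the Base64 alphabet. The cast to unsigned char keeps
// the index in the range 0..255 even when char is a signed type.
bool IsBase64Char(char c)
{
    static const std::array<int, 256> table = MakeBase64ReverseTable();
    return table[static_cast<unsigned char>(c)] >= 0;
}
```

A byte such as 0xE9 is then simply reported as not being a Base64 character instead of indexing outside the table.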
|
I'll take a look at it, but I have been testing the class with both executable and zip files, and it seems to be working fine...
- Anders
Money talks, but all mine ever says is "Goodbye!"
|
//we have some remaining chars, now decode them...
for (int i = 0; i < 4 - (count % 4); i++)
{
    std <<= 6;
    resultlen++;
}
///*** should be***
//we have some remaining chars, now decode them...
for (int i = 0; i < 4 - (count % 4); i++)
    std <<= 6;
resultlen += (count % 4) - 1;