I am trying to find out how to analyze raw data of a Gmail email and save its elements (i.e. Body, Subject, Date and attachments). I found code samples for parsing multi-part data which I think can be used.

Raw Gmail Data - See this explanation[^]

What I am looking for is a solution specific to Gmail raw data, which is a complex example of MIME with multiple types of embedded elements (images, HTML / rich text, attachments). My goal is to extract these elements and store them separately. I am not looking for an interactive protocol such as POP3 but a static approach: given this raw data, one should be able to get the entire email along with its elements, even when offline.

IDE - I am using Visual Studio Ultimate, C++ along with Win32 API.

What I have tried:

For example, this article[^] seems to have the building blocks for parsing such an email. However, I am looking for a solution dedicated to this kind of raw data, which is quite complex, combining various elements and attachments in one file (or block of data).

Here is my current code.

// The function name is illustrative; the original post shows only the parameters.
void ParseRawMail(LPCSTR szMailId, LPCSTR szMailBody)
{
    // Skip leading white space
    while ((*szMailBody == ' ') || (*szMailBody == '\r') || (*szMailBody == '\n'))
    {
        szMailBody++;
    }

    // Move to the text that follows the opening <pre> tag of the raw message
    char deli[] = "<pre class=\"raw_message_text\" id=\"raw_message_text\">";
    szMailBody = strstr(szMailBody, deli);
    szMailBody += strlen(deli);

    CStringA Body = szMailBody;
    Body = Body.Left(Body.Find("<//pre><//div><//div><//div><//body><//html>"));
    Body = Body.Mid(Body.Find("<html>"));

    szMailBody = Body.GetString();
    MIMELIB::CONTENT c;
    if (c.Parse(szMailBody) != MIMELIB::MIMEERR::OK)
    {
        return;
    }
    // Get some headers
    auto senderHdr = c.hval("From");
    string strDate = c.hval("Date");    // Example Sat, 13 Jan 2018 07:54:39 -0500 (EST)
    auto subjectHdr = c.hval("Subject");

    auto a1 = c.hval("Content-Type", "boundary");
    // Not a multi-part mail if empty
    // Then use c.Decode() to get and decode the single part body
    if (a1.empty())
    {
        return;
    }
    vector<MIMELIB::CONTENT> Contents;
    MIMELIB::ParseMultipleContent2(szMailBody, strlen(szMailBody), a1.c_str(), Contents);

    for (size_t i = 0; i < Contents.size(); i++)
    {
        vector<char> d;
        string type = Contents[i].hval("Content-type");
        d = Contents[i].GetData(); // Decodes from Base64 or Quoted-Printable
    }
}
Updated 15-Jan-18 7:33am

You already have a thread at the mentioned article, which seems to have solved your problem, but here is an answer for this question as well:

There is no Gmail-specific "raw" data format. It is the format of mail messages as defined by RFC 2822: Internet Message Format[^] and related RFCs such as RFC 2045 to 2049 for the MIME extensions.

Those RFCs contain the necessary information to write a parser.
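
To give a feel for what those RFCs describe, here is a minimal sketch of the multipart splitting from RFC 2046: the parts of a multipart body are separated by lines consisting of "--" plus the boundary string, and the last part is followed by "--boundary--". The helper name SplitMultipart is hypothetical and not part of mimelib. The sketch assumes CR-LF line ends and does not verify that the first delimiter starts at the beginning of a line.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of mimelib): split a multipart body into its
// raw parts. Each returned part still contains its own headers and its
// still-encoded content.
std::vector<std::string> SplitMultipart(const std::string& body, const std::string& boundary)
{
    std::vector<std::string> parts;
    const std::string delim = "--" + boundary;
    size_t pos = body.find(delim);
    while (pos != std::string::npos)
    {
        size_t afterDelim = pos + delim.size();
        // "--boundary--" closes the multipart body
        if (body.compare(afterDelim, 2, "--") == 0)
            break;
        // The part starts after the line break that ends the delimiter line
        size_t start = body.find("\r\n", afterDelim);
        if (start == std::string::npos)
            break;
        start += 2;
        // The part ends at the CR-LF that precedes the next delimiter
        size_t next = body.find("\r\n" + delim, start);
        if (next == std::string::npos)
            break; // No closing delimiter: stop instead of guessing
        parts.push_back(body.substr(start, next - start));
        pos = next + 2;
    }
    return parts;
}
```

Each returned part can then be treated like a small message of its own: headers, a blank line, and the encoded content. Nested multipart parts would need the same function applied again with the inner boundary.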

Example code using the mimelib.h file from the mentioned article. Compiled and tested with VS 2017. Requires /Zc:strictStrings-.

#include "stdafx.h"

#include <windows.h>
#include <WinInet.h>
#include <string>
#include <sstream>
#include <vector>
#include <memory>
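#include <intrin.h>
#include <sys/stat.h>   // _stat
#include <cstdio>       // fopen_s, fread, fwrite, wprintf
#include <cstdlib>      // _itoa_s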

using namespace std;

#include "mimelib.h"

#pragma comment(lib, "crypt32")

MIMELIB::MIMEERR ParsePart(MIMELIB::CONTENT& c, const char* szPart = "")
{
    MIMELIB::MIMEERR merr = MIMELIB::MIMEERR::OK;
    auto boundary = c.hval("Content-Type", "boundary");
    // Single part
    if (boundary.empty())
    {
        std::string strPart = (szPart && *szPart) ? szPart : "1";
        auto typeHdr = c.hval("Content-Type");
        if (typeHdr.empty())
        {
            wprintf(L"Part %hs: Default (single)\n", strPart.c_str());
            typeHdr = "text/plain;";
        }
        else
        {
            wprintf(L"Part %hs: %hs\n", strPart.c_str(), typeHdr.c_str());
        }
        auto fileName = c.hval("Content-Disposition", "filename");
        if (fileName.empty())
        {
            // Create a file name from part and an extension that matches the content type
            std::string ext = "txt";
            auto subTypeS = typeHdr.find('/');
            auto subTypeE = typeHdr.find(';');
            if (subTypeS != std::string::npos && subTypeE > subTypeS)
            {
                // Use the subtype between '/' and ';' (or the end of the value)
                ext = typeHdr.substr(subTypeS + 1, subTypeE - subTypeS - 1);
            }
            if (ext == "plain")
                ext = "txt";
            else if (ext == "octet-stream")
                ext = "bin";
            fileName = "Part";
            fileName += strPart;
            fileName += '.';
            fileName += ext;
        }
        // Get the decoded body of the part
        // (DecodeData() is the mimelib call that decodes Base64 / Quoted-Printable)
        vector<char> partData;
        c.DecodeData(partData);
        // TODO: Decode fileName if it is inline encoded
        FILE *f;
        errno_t err = fopen_s(&f, fileName.c_str(), "wb");
        if (err)
        {
            char errBuf[128];
            strerror_s(errBuf, err);
            fwprintf(stderr, L" Failed to create file %hs: %hs\n", fileName.c_str(), errBuf);
        }
        else
        {
            fwrite(partData.data(), 1, partData.size(), f);
            fclose(f);
            wprintf(L" Saved part to file %hs\n", fileName.c_str());
        }
    }
    else
    {
        // Decoded part of mail (full mail with top level call)
        auto data = c.GetData();
        // Split it into the boundary separated parts
        vector<MIMELIB::CONTENT> Contents;
        merr = MIMELIB::ParseMultipleContent2(data.data(), data.size(), boundary.c_str(), Contents);
        if (MIMELIB::MIMEERR::OK == merr)
        {
            int part = 1;
            for (auto & cp : Contents)
            {
                // Build a hierarchical part number such as "1.2"
                std::string strPart;
                if (szPart && *szPart)
                {
                    strPart = szPart;
                    strPart += '.';
                }
                char partBuf[16];
                _itoa_s(part, partBuf, 10);
                strPart += partBuf;
                // Recurse: a part may itself be multipart
                ParsePart(cp, strPart.c_str());
                ++part;
            }
        }
    }
    return merr;
}

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        fwprintf(stderr, L"Usage: ParseMail <file>\n");
        return 1;
    }
    struct _stat st;
    if (_stat(argv[1], &st))
    {
        fwprintf(stderr, L"File %hs not found\n", argv[1]);
        return 1;
    }
    FILE *f = NULL;
    // Binary mode: the mail content must be passed unchanged (CR-LF line ends)
    errno_t err = fopen_s(&f, argv[1], "rb");
    if (err)
    {
        char errBuf[128];
        strerror_s(errBuf, err);
        fwprintf(stderr, L"File %hs can't be opened: %hs\n", argv[1], errBuf);
        return 1;
    }
    char *buf = new char[st.st_size + 1];
    size_t nRead = fread(buf, 1, st.st_size, f);
    fclose(f);
    // Terminate the buffer after the bytes actually read
    buf[nRead] = 0;

    MIMELIB::CONTENT c;
    MIMELIB::MIMEERR merr = c.Parse(buf);
    if (merr != MIMELIB::MIMEERR::OK)
    {
        fwprintf(stderr, L"Error parsing mail file %hs\n", argv[1]);
    }
    else
    {
        auto senderHdr = c.hval("From");
        auto dateHdr = c.hval("Date");
        auto subjectHdr = c.hval("Subject");
        wprintf(L"From: %hs\n", senderHdr.c_str());
        wprintf(L"Date: %hs\n", dateHdr.c_str());
        wprintf(L"Subject: %hs\n\n", subjectHdr.c_str());
        merr = ParsePart(c);
    }
    delete[] buf;
    return (merr == MIMELIB::MIMEERR::OK) ? 0 : 1;
}

Example output for a multipart mail:
From: [redacted]
Date: Tue, 26 Sep 2017 09:44:15 +0200
Subject: =?ISO-8859-1?Q?WG=3A_Haftverzichtserkl=E4rung_f=FCr_[...]_Fa=2E_S
iS?=; =?ISO-8859-1?Q?_-_EMB_168_-_12=2E10=2E2017?=

Part 1.1: text/plain; charset="UTF-8"
 Saved part to file Part1.1.txt
Part 1.2: text/html; charset="UTF-8"
 Saved part to file Part1.2.html
Part 2: application/octet-stream; name="HaVerzSiS.pdf"
 Saved part to file HaVerzSiS.pdf
Part 3: image/jpeg; name="Liegeplatz FS EMB.jpg"
 Saved part to file Liegeplatz FS EMB.jpg
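
The Subject line in the output above is still in its encoded form ("encoded-word" syntax from RFC 2047). As a rough sketch of how a single such word could be decoded, assuming only the "Q" encoding is needed and that no charset conversion is done: the helper name DecodeQEncodedWord is hypothetical and not part of mimelib, and the decoded bytes stay in the charset named by the word.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: decode one RFC 2047 "Q" encoded-word such as
// "=?ISO-8859-1?Q?WG=3A_Test?=". On success 'charset' receives the charset
// name and 'bytes' the decoded bytes (still in that charset).
// Returns false if the input is not a Q encoded-word.
bool DecodeQEncodedWord(const std::string& word, std::string& charset, std::string& bytes)
{
    if (word.size() < 8 || word.compare(0, 2, "=?") != 0 ||
        word.compare(word.size() - 2, 2, "?=") != 0)
        return false;
    const size_t end = word.size() - 2;       // Index of the closing "?="
    const size_t q1 = word.find('?', 2);      // '?' that ends the charset
    if (q1 == std::string::npos || q1 + 2 >= end || word[q1 + 2] != '?')
        return false;
    if (std::toupper(static_cast<unsigned char>(word[q1 + 1])) != 'Q')
        return false;                         // "B" (Base64) words are not handled here
    charset = word.substr(2, q1 - 2);
    bytes.clear();
    for (size_t i = q1 + 3; i < end; ++i)
    {
        const char ch = word[i];
        if (ch == '_')
        {
            bytes += ' ';                     // '_' stands for a space
        }
        else if (ch == '=' && i + 2 < end &&
                 std::isxdigit(static_cast<unsigned char>(word[i + 1])) &&
                 std::isxdigit(static_cast<unsigned char>(word[i + 2])))
        {
            // "=XX" is a byte given as two hex digits
            bytes += static_cast<char>(std::stoi(word.substr(i + 1, 2), nullptr, 16));
            i += 2;
        }
        else
        {
            bytes += ch;
        }
    }
    return true;
}
```

A complete header decoder would also have to find the encoded-words inside the header value, handle "B" words, and convert from the named charset to the one used by the application.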
Michael Haephrati 9-Jan-18 5:34am    
Unfortunately the thread correspondence didn't solve my problem. Can you point me to a source code example for parsing this format (RFC 2822: Internet Message Format)?
Jochen Arndt 9-Jan-18 5:59am    
I had to search for one too but "c++ mime parser" gets lots of results. I would look for a well known / often used library.

I also suggest extending the code from the article to find out where the parsing fails. Maybe your mail is not well formed (unlikely, because a mail client would complain about it), the article code is not RFC compliant, or it does not support a specific feature used by your mail.

You might also add the failing mail content to your question (anonymised and with shortened MIME data lines = only first and last line) and/or at the article. Then I and others may have a look.
Michael Haephrati 9-Jan-18 6:02am    
It's not about a specific mail but about mails in general, and I am reading the emails directly from [...]. If you use Gmail you can try with any Gmail message and see.
Jochen Arndt 9-Jan-18 6:24am    
Maybe it is a problem of using the article code and/or getting the mail content. You have to pass it as it is (a std::string). That is: No conversions for line feeds (must be CR-LF) or character encodings (must be 7/8 bit).

For example save the content to a file and check it with a text and/or hex editor. If it looks OK, read the file into a char[] buffer, append a NULL byte and assign that to a std::string. Then it should work. If not, you should ask again in the article forum.
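
A small sketch of that check, assuming the mail was saved to a file: read it in binary mode so nothing is converted, then verify that every line end is CR-LF. The helper names ReadRawFile and HasOnlyCrLfLineEnds are made up for this example.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a file unchanged. Binary mode keeps the C++
// runtime from translating CR-LF to LF on Windows.
bool ReadRawFile(const std::string& path, std::string& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// Hypothetical helper: returns true if every LF in the text is preceded by a CR.
bool HasOnlyCrLfLineEnds(const std::string& text)
{
    for (size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r'))
            return false;
    }
    return true;
}
```

If HasOnlyCrLfLineEnds() returns false for the saved content, the line ends were probably converted somewhere between the source and the parser.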
Michael Haephrati 9-Jan-18 6:58am    
If I save the data into a file, and name it test.html, it will be shown properly. Can I hire you privately to do such job?
Here is a copy of a typical Gmail message (interestingly from you):
Delivered-To: anonymousnam AT
Received: by with SMTP id 70csp1706370jao;
        Fri, 12 Jan 2018 02:08:02 -0800 (PST)
X-Google-Smtp-Source: ACJfBovDfSZaL48gp2hiXRdWrkQ2fN4ADImypAfgO6nn3bL9YXe9pyOS1NCsj6nejU8n0AFGoP8W
X-Received: by with SMTP id o75mr23661888iod.219.1515751682675;
        Fri, 12 Jan 2018 02:08:02 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1515751682; cv=none;; s=arc-20160816;
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;; s=arc-20160816;
ARC-Authentication-Results: i=1;;
       spf=pass ( domain of anonymousnam AT designates as permitted sender) smtp.mailfrom=anonymousnam AT
Return-Path: <anonymousnam AT>
Received: from ( [])
        by with ESMTPS id l64si3747147iof.279.2018.
        for <anonymousname AT>
        (version=TLS1 cipher=AES128-SHA bits=128/128);
        Fri, 12 Jan 2018 02:08:02 -0800 (PST)
Received-SPF: pass ( domain of anonymousname AT designates as permitted sender) client-ip=;
       spf=pass ( domain of anonymousname AT designates as permitted sender) smtp.mailfrom=anonymousname AT
Message-Id: <>
Received: from CP-WEB2 ( []) by (Postfix) with ESMTP id 2FB2E1E0DE8 for <anonymousname AT>; Fri, 12 Jan 2018 04:55:19 -0500 (EST)
MIME-Version: 1.0
From: CodeProject Answers <anonymousname AT>
To: anonymousname AT>
Date: 12 Jan 2018 05:08:01 -0500
Subject: CodeProject | A reply was posted to your comment
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.="><html><head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dus-ascii"=
<meta name=3D"viewport" content=3D"width=3Ddevice-width">
</head><body style=3D"background-color: white;font-size: 14px; font-family:=
 'Segoe UI', Arial, Helvetica, sans-serif">
<style type=3D"text/css">
body,table,p,div { background-color:white; }
body,table,p,div { background-color:white; }
body, td, p,h1,h2,h6,h3,h4,li,blockquote,div{ font-size:14px; font-family: =
'Segoe UI', Arial, Helvetica, sans-serif; } =20
h1 {font-size: 26px; font-weight: bold; color: #498F00; margin-bottom:5px;m=
argin-top:0px;} =20
h2 { font-size: 24px; font-weight: 500; }
h4 { font-size: 16px; }
h3 {font-size: 11pt; font-weight:bold;} =20
h6 {font-size:6pt;color:#666;margin:0;} =20
table =09=09=09{ width: 100%;} =20
table.themed =09{ background-color:#FAFAFA; } =20
a =09=09=09=09{ text-decoration:none;} =20
a:hover =09=09{ text-decoration:underline;} =20
.tiny-text=09=09{ font-size: 12px; }
.desc =09=09=09{ color:#333333; font-size:12px;}
.themed td  =09{ padding:2px; } =20
.themed .alt-item { background-color:#FEF9E7; } =20
.header =09=09{ font-weight:bold; background-color:#FF9900; vertical-align:=
middle;} =20
.footer =09=09{ font-weight:bold; background-color: #488E00; color:White; v=
ertical-align:middle; }
.signature =09=09{ border-top: solid 1px #CCCCCC; padding-top:0px; margin-t=
op:10px; max-height:150px; overflow:auto;}

.content-list=09=09{ margin-bottom: 17px;}
.content-list-item=09{ margin:     10px 0; }
.doctype img=09=09{ vertical-align:bottom; padding-right:3px;}
.entry=09=09=09    { font-size: 14px; line-height:20px; margin: 0;}
.title=09=09=09    { font-size: 16px; font-weight:500; padding:0; }
.entry=09=09=09=09{ font-size: 14px; color:#666; }
.author, .author a  { font-size: 11px; font-weight:bold; }
.location=09=09    { font-size: 11px; font-weight:bold; color: #999}
.summary            { font-size: 12px; color: #999; padding: 0px 0 10px; }
.theme-fore         { color: #f90; }
.theme-back         { background-color: #f90; }
<table cellspacing=3D"1" cellpadding=3D"3" class=3D"header" border=3D"0" st=
yle=3D"background-color: #FF9900;width: 100%;font-weight: bold;vertical-ali=
gn: middle"><tbody><tr><td style=3D"font-size: 14px; font-family: 'Segoe UI=
', Arial, Helvetica, sans-serif">
<img border=3D"0" src=3D"
/Img/logo225x40.gif" width=3D"225" height=3D"40"></td></tr></tbody></table>

<p style=3D"background-color: white;font-size: 14px; font-family: 'Segoe UI=
', Arial, Helvetica, sans-serif">Michael Haephrati has pos=
ted a reply to your comment about=20
"<a href=3D"
de-Cplusplus-Microsoft-UTF-Nati?cmt=3D969334#cmt969334" style=3D"text-decor=
ation: none">Conversion to Unicode (C++, Microsoft, UTF-16, Native Windows)=

<blockquote style=3D"font-size: 14px; font-family: 'Segoe UI', Arial, Helve=
tica, sans-serif">Apparently there is no expiration date to questions and w=
hen I looked for unanswered questions, I got here...</blockquote>

<hr class=3D"divider" noshade=3D"noshade" size=3D"1">
<div style=3D"background-color: white;font-size: 14px; font-family: 'Segoe =
UI', Arial, Helvetica, sans-serif"><a href=3D"" =
style=3D"text-decoration: none">CodeProject</a></div>
<div class=3D"small" style=3D"background-color: white;font-size: 14px; font=
-family: 'Segoe UI', Arial, Helvetica, sans-serif">Note: T=
his message has been sent from an unattended email box.</div>

So you can see that the message headers are control labels followed by a colon. The start of the message body is separated from the message headers by a blank line, and may be rich text, HTML or plain text. The mail RFC lists all the possible header label names.
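
As a minimal sketch of that layout (not a full RFC parser): split at the first blank line, then split each header line at the first colon. Lines that start with a space or tab are continuations of the previous header ("folded" headers, like the Received lines above). The names SplitMessage and HeaderList are hypothetical, and CR-LF line ends are assumed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Header fields as (name, value) pairs, in the order they appear
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: split a raw message into header fields and body.
// Returns false if no blank line separates the headers from the body.
bool SplitMessage(const std::string& raw, HeaderList& headers, std::string& body)
{
    headers.clear();
    const size_t headEnd = raw.find("\r\n\r\n");
    if (headEnd == std::string::npos)
        return false;
    body = raw.substr(headEnd + 4);

    size_t pos = 0;
    while (pos < headEnd)
    {
        // A CR-LF always exists at or before headEnd
        const size_t eol = raw.find("\r\n", pos);
        const std::string line = raw.substr(pos, eol - pos);
        pos = eol + 2;
        if (line.empty())
            continue;
        if ((line[0] == ' ' || line[0] == '\t') && !headers.empty())
        {
            // Folded line: unfold by appending it to the previous value
            headers.back().second += line;
            continue;
        }
        const size_t colon = line.find(':');
        if (colon == std::string::npos)
            continue; // Not a header field; ignored in this sketch
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(" \t")); // Trim leading white space
        headers.emplace_back(line.substr(0, colon), value);
    }
    return true;
}
```

The body returned here is still encoded as stated by the Content-Transfer-Encoding header (quoted-printable in the mail shown above), so it has to be decoded before it is displayed or saved.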
Jochen Arndt 12-Jan-18 6:06am    
Now we and mail harvesters run by spammers know your Gmail address.

You should make them both anonymous.
Richard MacCutchan 12-Jan-18 8:58am    
Yes, thanks for noticing.
Michael Haephrati 12-Jan-18 7:01am    
Is this your solution? The question is how such raw data can be converted into the email's components, such as inline photos, attachments, etc.
Richard MacCutchan 12-Jan-18 9:06am    
Images, attachments etc are usually converted to base64 encoding. As mentioned by me and Jochen, all this information is freely available. Perhaps you could show some of your code and explain exactly what your problem is.
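
For illustration, a minimal Base64 decoder could look like the sketch below. The helper name DecodeBase64 is made up for this example; it skips line breaks and other characters outside the Base64 alphabet and stops at the first '=' padding character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: decode Base64 text as used for MIME attachments.
std::vector<unsigned char> DecodeBase64(const std::string& text)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<unsigned char> out;
    unsigned int buffer = 0; // Collects the 6-bit groups
    int bits = 0;            // Number of bits in 'buffer' not yet written
    for (char ch : text)
    {
        if (ch == '=')
            break;           // Padding marks the end of the data
        const size_t idx = alphabet.find(ch);
        if (idx == std::string::npos)
            continue;        // Skip CR, LF and anything else outside the alphabet
        buffer = (buffer << 6) | static_cast<unsigned int>(idx);
        bits += 6;
        if (bits >= 8)
        {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

The decoded bytes of an attachment part can be written to a file opened in binary mode as they are.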
Michael Haephrati 12-Jan-18 10:52am    
I updated the question with my recent source code. The code starts from the point where szMailBody contains the email's raw data. Using mimelib, I am trying to extract all elements from the raw data (embedded images, HTML body, attachments, etc.), assuming that this can be done statically, i.e. without any interaction with the Gmail server.
The problem: I expect Contents to contain all elements, but it doesn't.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
