Click here to Skip to main content
15,885,546 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I have several thousand emails with order confirmations from online sales. I have exported the emails to a CSV, but the body of the email is contained within one cell. The body of the email contains the primary content that I want to utilize.

My goal is to convert the body content into columns for each type of data. There is a Order Number, Transaction Date/Time, Shipping Address, Billing Address, Email and order detail with line items and pricing.

I'd be happy with the name, email, address and phone number of each client. A bonus would be to have the other data with order details.

I've tried the text to column in Excel, but there are no clear line breaks that I'm able to utilize. I'm a novice with this type of data.

The actual body content seems pretty well structured. Could someone help me with VBA or Excel tips to accomplish my goal? I hope I'm posting this correctly, here is a sample of the layout. All the emails have the same layout/format, but the address lines vary depending on the country of origin.

In the sample below, the format looks good, except the purchase details appear to be wrapped.

Thanks in advance!

"The following order was placed with Merchant:

Order Number : 5
Placed       : 03/19/2014 20:37:09 CDT

Ship To:                                 Bill To:
FIRSTNAME LASTNAME                       FIRSTNAME LASTNAME
email@domain.com                         email@domain.com
5559991111                               5559991111
                                         
                                         
123 Address                              123 Address
City ST 55555                            Portland OR 97210
US                                       US

Code           Name                                    Quantity     Price/Ea.         Total
-------------------------------------------------------------------------------------------
V-BotAAS2      Item Name                                      1        $29.95        $29.95
                                                                                      
                                                            Shipping: Price+:         $6.95
                                                                   Sales Tax:         $0.00
-------------------------------------------------------------------------------------------
                                                                       Total:        $36.90
"
Posted

It's quite often asked question. I have answered it few times. Please use SearchBox[^] at the right-top corner of this site.

Member 10684287 wrote:
I have several thousand emails with order confirmations from online sales. I have exported the emails to a CSV, but the body of the email is contained within one cell. The body of the email contains the primary content that I want to utilize.


I'd suggest you to NOT export emails into csv files, but read it directly from Outlook. There is few properties of MailItem object, such as HTMLBody, wich stores body of email in html format. When invoice is between <table> and </table> tags, you'll be able to read it using RegEx[^]. Please, follow above link to visit my past answers.
 
Share this answer
 
Comments
CHill60 23-Mar-14 14:00pm    
5'd
Maciej Los 23-Mar-14 14:22pm    
Thank you, Chill ;)
Member 10684287 24-Mar-14 0:26am    
Maciej thanks for the response. I did plenty of searches, but sometimes finding the answer is knowing the right question. I searched this site and Google, and found examples of this situation, but wasn't certain how to apply the response to my specific issue. I appreciate you taking the time, and I'll definitely scan your articles for more detail!

First a couple of warnings ...
1. We don't normally just produce code unless the OP has tried something for themselves
2. You're lucky I got intrigued by this, but I haven't made it particularly tidy - there are better ways of doing some bit and you will have to sort out the different address formats for yourself.
3. I wouldn't have done this in VBA - but I've assumed that's the requirement. This solution will port across to VB.NET without too many issues.

How I approached this...
Quote:
the body of the email is contained within one cell
- OK, that means I can just treat it as a string and split it across the lines.
Then I hit my first problem - the same one you did
Quote:
but there are no clear line breaks that I'm able to utilize

So I utilised the immediate window in Excel VBA and typed
? asc(mid$(s, 46,1))
which gave me the answer 10. That means I can treat each cell as a string and split it using chr$(10) to give me the body of the text as an array of strings.
VB
Dim s As String
s = Range("A1").Text

Dim lines() As String
lines = Split(s, Chr$(10))


Then I took what I knew about the text based on observation. I knew I was going to be interested in specific words or phrases, so I put the text into some constants ...
VB
Const order_no = "Order Number : "
Const bill_to = "Bill To:"
Const shipping = "Shipping: Price+:"
Const salestax = "Sales Tax:"
Const total = "Total:"  'NB must have the colon on the end to diffentiate from cpTotal below
Const headers = "Quantity     Price/Ea."    'This identifies the start of the order list
'The text that indicates the individual components within each order
Const cpName = "Name"
Const cpQuant = "Quantity"
Const cpPrice = "Price/Ea."
Const cpTotal = "Total"

Also from observation I noticed that I was going to need to know not just an overview of what the line contained, but also in which position (column) certain things could be found, so
VB
'the column position for each of the components
Dim bill_to_pos As Integer
Dim pCode As Integer
Dim pName As Integer
Dim pQuant As Integer
Dim pPrice As Integer
Dim pTotal As Integer

Then it was a simple case of moving through the array, determining what, if anything, I could get out of each line, and extracting the data I needed using string functions.

Here's the entire function ... not elegant but it works
VB
Option Explicit
Sub Test()

    Const order_no = "Order Number : "
    Const bill_to = "Bill To:"
    Const shipping = "Shipping: Price+:"
    Const salestax = "Sales Tax:"
    Const total = "Total:"  'NB must have the colon on the end to diffentiate from cpTotal below
    Const headers = "Quantity     Price/Ea."    'This identifies the start of the order list
    'The text that indicates the individual components within each order
    Const cpName = "Name"
    Const cpQuant = "Quantity"
    Const cpPrice = "Price/Ea."
    Const cpTotal = "Total"

    'the column position for each of the components
    Dim bill_to_pos As Integer
    Dim pCode As Integer
    Dim pName As Integer
    Dim pQuant As Integer
    Dim pPrice As Integer
    Dim pTotal As Integer


    Dim order_number As String
    Dim ship_to_name As String
    Dim bill_to_name As String
    Dim ship_to_email As String
    Dim bill_to_email As String
    Dim ship_to_phone As String
    Dim bill_to_phone As String

    Dim ship_to_addr1 As String
    Dim bill_to_addr1 As String
    Dim ship_to_addr2 As String
    Dim bill_to_addr2 As String
    Dim ship_to_addr3 As String
    Dim bill_to_addr3 As String
    Dim shipcost As String
    Dim taxcost As String
    Dim totalcost As String

    'These will hold details for the order items
    Const MaxNoItems = 5    'Set this to whatever you want
    Dim bInList As Boolean
    Dim Codes(MaxNoItems) As String
    Dim ItemNames(MaxNoItems) As String
    Dim Quants(MaxNoItems) As String
    Dim Prices(MaxNoItems) As String
    Dim Totals(MaxNoItems) As String
    Dim NoOfItems As Integer

    Dim s As String
    s = Range("A1").Text

    Dim lines() As String
    lines = Split(s, Chr$(10))

    bInList = False 'This will be set to true if we are in the list of items
    NoOfItems = 0   'This is a counter of the number of items found in the order list

    Dim i As Integer
    Dim p As Integer
    For i = 0 To UBound(lines)

        'Get the order number
        p = InStr(lines(i), order_no)
        If p > 0 Then
            order_number = Mid$(lines(i), p + Len(order_no))
        End If

        'Get the position where bill to address starts
        p = InStr(lines(i), bill_to)
        If p > 0 Then
            bill_to_pos = p
            'Assume Names are on the next line
            i = i + 1   'Note be careful as we are amending the loop index
            ship_to_name = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_name = Mid$(lines(i), bill_to_pos)
            i = i + 1
            ship_to_email = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_email = Mid$(lines(i), bill_to_pos)
            i = i + 1
            ship_to_phone = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_phone = Mid$(lines(i), bill_to_pos)
            i = i + 3   'NB skipping blank lines
            ship_to_addr1 = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_addr1 = Mid$(lines(i), bill_to_pos)
            i = i + 1
            ship_to_addr2 = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_addr2 = Mid$(lines(i), bill_to_pos)
            i = i + 1
            ship_to_addr3 = Mid$(lines(i), 1, bill_to_pos - 1)
            bill_to_addr3 = Mid$(lines(i), bill_to_pos)
        End If

        p = InStr(lines(i), headers)
        If p > 0 Then
            'Work out where on the line each of the components will start/end
            pCode = 1
            pName = InStr(lines(i), cpName)
            pQuant = InStr(lines(i), cpQuant)
            pPrice = InStr(lines(i), cpPrice)
            pTotal = pPrice + Len(cpPrice)  ' NOT InStr(lines(i), cpTotal)
            bInList = True
            i = i + 1   'Skip over this line so that we don't try to process it again below
        End If

        'NB handling the shipping and total bit before I handle the individual items
        'This is because if we come across "Shipping: Price+:" we know there are no
        'more items to deal with and we can set bInList back to False
        p = InStr(lines(i), shipping)
        If p > 0 Then
            shipcost = Mid$(lines(i), p + Len(shipping))
            bInList = False
        End If

        p = InStr(lines(i), salestax)
        If p > 0 Then
            taxcost = Mid$(lines(i), p + Len(salestax))
        End If

        p = InStr(lines(i), total)
        If p > 0 Then
            totalcost = Mid$(lines(i), p + Len(total))
        End If

        'This bit handles the individual items
        If bInList Then
            'ignore blank lines and separators
            If Trim$(lines(i)) <> "" And Left$(lines(i), 3) <> "---" Then
                NoOfItems = NoOfItems + 1
                Codes(NoOfItems) = Mid$(lines(i), pCode, pName - 1)
                ItemNames(NoOfItems) = Mid$(lines(i), pName, pQuant - pName - 1)
                Quants(NoOfItems) = Mid$(lines(i), pQuant, pPrice - pQuant - 1)
                Prices(NoOfItems) = Mid$(lines(i), pPrice, pTotal - pPrice - 1)
                Totals(NoOfItems) = Mid$(lines(i), pTotal)
            End If
        End If
    Next

    Debug.Print "Order Number  : " + order_number
    Debug.Print ""
    Debug.Print "Ship to Name  : " + ship_to_name
    Debug.Print "Ship to email : " + ship_to_email
    Debug.Print "Ship to Phone : " + ship_to_phone
    Debug.Print "Ship to Addr1 : " + ship_to_addr1
    Debug.Print "Ship to Addr2 : " + ship_to_addr2
    Debug.Print "Ship to Addr3 : " + ship_to_addr3
    Debug.Print ""
    Debug.Print "Bill To Name  : " + bill_to_name
    Debug.Print "Bill to email : " + bill_to_email
    Debug.Print "Bill to Phone : " + bill_to_phone
    Debug.Print "Bill to Addr1 : " + bill_to_addr1
    Debug.Print "Bill to Addr2 : " + bill_to_addr2
    Debug.Print "Bill to Addr3 : " + bill_to_addr3
    Debug.Print ""
    Debug.Print "Shipping Cost : " + shipcost
    Debug.Print "Sales Tax     : " + taxcost
    Debug.Print "Total Cost    : " + totalcost
    Debug.Print ""
    'These will hold details for the order items
    Debug.Print "Number of items : " + CStr(NoOfItems)
    For i = 1 To NoOfItems
        Debug.Print "-- Code  : " + Codes(NoOfItems)
        Debug.Print "-- Name  : " + ItemNames(NoOfItems)
        Debug.Print "-- Price : " + Prices(NoOfItems)
        Debug.Print "-- Total : " + Totals(NoOfItems)
    Next


End Sub

And here is the output from all those Debug.Print statements at the end (Note I changed some of the numbers to prove I was picking up the correct text)
Order Number  : 5

Ship to Name  : FIRSTNAME LASTNAME
Ship to email : email@domain.com
Ship to Phone : 5559991111
Ship to Addr1 : 123 Address
Ship to Addr2 : City ST 55555
Ship to Addr3 : US

Bill To Name  : FIRSTNAME LASTNAME
Bill to email : email@domain.com
Bill to Phone : 5559991111
Bill to Addr1 : 123 Address
Bill to Addr2 : Portland OR 97210
Bill to Addr3 : US

Shipping Cost :          $6.95
Sales Tax     :          $0.00
Total Cost    :         $36.90

Number of items : 1
-- Code  : V-BotAAS2
-- Name  : Item Name
-- Price :    $29.9
-- Total :         $59.90


It's down to you to convert those strings into something useful.
Also watch out because you can't have macros in CSV files, you may have to convert to XLS or have the spreadsheet import the CSV via another macro
 
Share this answer
 
Comments
Maciej Los 23-Mar-14 13:21pm    
Chill60, why are you using undefined constants? VBA treats them as variant data type. It consumes much more memory than it should.
[EDIT]
+4 for effort!
Please, see my answer ;)
CHill60 23-Mar-14 14:00pm    
Very true ... I draw your attention to my warning #2 :-) I was being lazy. Good point though and I hope the OP and any other reader takes notice of it.
I do prefer your solution (solutions!). I only started to play as I'm re-acquainting myself with the latest versions of Office (... and it was 3 days old)
Member 10684287 24-Mar-14 0:23am    
CHill60 thank you for the detailed response and the time you took to address my issue in detail. I'll give this a try and follow up. I really appreciate it.
CHill60 24-Mar-14 7:11am    
That's my pleasure - thank you for the thank you! Do note my responses to Maciej - his is definitely the better solution, but if you use mine for a tactical "get-it-working-now" fix, note that I should have done things like Const order_no As String = "Order Number : " and not just Const order_no = "Order Number : "

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900