Extracting Attachments from Outlook Mailboxes using C#






How to extract attachments from Outlook mailboxes using C#
Introduction
My personal mailbox, with emails going back to the late 90s, is full of old attachments that bloat the PST file but aren't really needed. The PST file, with attachments, is now around 40 GB.
I decided to write a simple C# console app to extract them to reduce the size of my PST file.
The application itself will perform a few simple tasks:
- Find the root folder in the Outlook Datastore
- Iterate recursively through the folder structure
- Iterate through each email message in each folder, looking for attachments
- When found, save each attachment in a folder structure on the hard disk, representing the Outlook folder structure
Prerequisites
Firstly, create a C# console application in Visual Studio, targeting .NET Framework 4.5 or higher.
The application makes use of the Microsoft.Office.Interop.Outlook assembly, so you'll need to add this as a reference in your project.
The Outlook Primary Interop Assembly (PIA) Reference provides help for developing managed applications for Outlook 2013 and 2016. It extends the Outlook 2013 and 2016 Developer Reference from the COM environment to the managed environment, allowing you to interact with Outlook from a .NET application.
You also need to have Microsoft Outlook installed on your PC - otherwise the Interop assembly has nothing to talk to.
Learn more on MSDN.
Iterating through Outlook Accounts
Before we can go through each folder and email in Outlook, we need to find an actual account, and build the root folder from this.
The root folder is in the format \\foldername\, and the inbox is located one level below this, at \\foldername\Inbox\.
To do this, we simply iterate through the Outlook.Application.Session.Accounts collection.
Outlook.Application Application = new Outlook.Application();
Outlook.Accounts accounts = Application.Session.Accounts;
foreach (Outlook.Account account in accounts)
{
    Console.WriteLine(account.DisplayName);
}
From these, we can derive the root folder name.
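The full listing at the end of this article builds the root path as \\ followed by the account's DisplayName, then walks it one segment at a time. The string handling involved can be sketched on its own, without Outlook. The class name, method names and the sample account name below are made up for illustration.

```csharp
using System;

static class FolderPathSketch
{
    // Build the root folder path for an account display name, e.g. \\Personal Folders
    public static string RootPath(string displayName)
    {
        return @"\\" + displayName;
    }

    // Split an Outlook-style folder path into its segments,
    // dropping the leading \\ in the same way the full listing does.
    public static string[] Segments(string folderPath)
    {
        if (folderPath.StartsWith(@"\\", StringComparison.Ordinal))
        {
            folderPath = folderPath.Remove(0, 2);
        }
        return folderPath.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // "Personal Folders" is a hypothetical account display name
        string root = RootPath("Personal Folders");
        foreach (string segment in Segments(root + @"\Inbox\Receipts"))
        {
            Console.WriteLine(segment);
        }
    }
}
```

The first segment identifies the store for the account, and each later segment names a folder one level further down.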
Recursing Through Folders
Using the function below, we initially pass it the root folder. It then looks for any child (sub) folders and passes each one to itself recursively, following the folder structure until it reaches the end.
static void EnumerateFolders(Outlook.Folder folder)
{
    Outlook.Folders childFolders = folder.Folders;
    if (childFolders.Count > 0)
    {
        foreach (Outlook.Folder childFolder in childFolders)
        {
            // We only want Inbox folders - ignore Contacts and others
            if (childFolder.FolderPath.Contains("Inbox"))
            {
                Console.WriteLine(childFolder.FolderPath);
                // Call EnumerateFolders using childFolder,
                // to see if there are any sub-folders within this one
                EnumerateFolders(childFolder);
            }
        }
    }
}
Iterating through Emails in a Folder and Listing Their Attachments
Using the function below, we initially pass it the current folder. It will then iterate through the folder.Items object, which contains a collection of the actual email messages in the Outlook folder.
Each email is returned as an item, containing the property .Attachments.Count, which indicates how many attachments the email message has.
Where this is not zero (!= 0), we simply list out each attachment in the email. From here, you can save the attachment, delete it, or otherwise process it however you wish.
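static void IterateMessages(Outlook.Folder folder)
{
    var fi = folder.Items;
    if (fi != null)
    {
        foreach (Object item in fi)
        {
            // A folder can also hold non-mail items (meeting requests,
            // reports and so on); a direct cast would throw on those.
            Outlook.MailItem mi = item as Outlook.MailItem;
            if (mi == null)
            {
                continue;
            }
            var attachments = mi.Attachments;
            if (attachments.Count != 0)
            {
                // The Attachments collection is 1-based
                for (int i = 1; i <= attachments.Count; i++)
                {
                    Console.WriteLine("Attachment: " + attachments[i].FileName);
                }
            }
        }
    }
}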
Looking for Specific Types of Attachments
It's quite common for Outlook to store embedded images (such as logos in an email) and other files you wouldn't normally need as attachments, so I create an array of the extensions that I'd like to extract, ignoring those that aren't useful to me.
By comparing the attachment filename to the array of extensions, I can then determine what to keep.
As this is only performing a basic string comparison, any file whose name contains one of the strings in the array will be identified. For example, both helloworld.doc (Office) and helloworld.docx (Office Open XML format, from Office 2007 onwards) contain .doc, so both will be identified. The comparison is case-sensitive, which is why the full code lower-cases the filename first.
// attachment extensions to save
string[] extensionsArray = { ".pdf", ".doc",
    ".xls", ".ppt", ".vsd", ".zip",
    ".rar", ".txt", ".csv", ".proj" };
if (extensionsArray.Any(mi.Attachments[i].FileName.Contains))
{
    // the filename contains one of the extensions
}
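Because the check above is a substring match, a name such as logo.pdf.png would also be picked up. If that matters, one possible alternative (not what the full listing in this article does) is to compare only the real extension, ignoring case, while still letting .doc match .docx. The sketch below assumes the filename contains no characters that are invalid in a path; the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

static class ExtensionFilter
{
    // Same list of extensions as in the article
    static readonly string[] Wanted =
    {
        ".pdf", ".doc", ".xls", ".ppt", ".vsd",
        ".zip", ".rar", ".txt", ".csv", ".proj"
    };

    // True when the file's actual extension starts with one of the wanted
    // extensions, so ".docx" still matches ".doc" but "logo.pdf.png" does not.
    public static bool ShouldExtract(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        if (string.IsNullOrEmpty(extension))
        {
            return false;
        }
        return Wanted.Any(w => extension.StartsWith(w, StringComparison.OrdinalIgnoreCase));
    }

    static void Main()
    {
        Console.WriteLine(ShouldExtract("Report.PDF"));   // True
        Console.WriteLine(ShouldExtract("notes.docx"));   // True
        Console.WriteLine(ShouldExtract("logo.pdf.png")); // False
    }
}
```

The prefix match on the extension keeps the convenient behaviour described above (.doc also covers .docx) while ignoring matches that occur in the middle of the filename.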
Saving and Deleting the Attachments
Saving each attachment is remarkably easy: the assembly provides a method that saves it to the local disk. In the example below, pathToSaveFile is the full destination path including the file name, such as c:\temp\report.pdf.
mi.Attachments[i].SaveAsFile(pathToSaveFile);
Similarly, deleting an attachment is as simple as invoking the .Delete method.
mi.Attachments[i].Delete();
In the example code below, we save each attachment to a folder based on the structure:
(basepath)(accountname)(folderstructure)(sender)
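The full listing builds this path by plain string concatenation. As a rough alternative, the sketch below assembles the same structure with Path.Combine and replaces characters that are not allowed in file or directory names, on the assumption that a sender address or attachment name may occasionally contain such characters. The class name, method names and sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class SavePathSketch
{
    // Replace characters that the current platform does not allow
    // in a file or directory name with an underscore.
    public static string Sanitize(string name)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        return new string(name.Select(c => invalid.Contains(c) ? '_' : c).ToArray());
    }

    // folderPath is an Outlook-style path such as \\Account\Inbox\Receipts;
    // its first segment is the account name, the rest is the folder structure.
    public static string Build(string basePath, string folderPath,
                               string sender, string fileName)
    {
        var parts = new List<string> { basePath };
        parts.AddRange(folderPath
            .Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Sanitize));
        parts.Add(Sanitize(sender));
        parts.Add(Sanitize(fileName));
        return Path.Combine(parts.ToArray());
    }

    static void Main()
    {
        // All values here are hypothetical sample data
        string path = Build(@"c:\temp\emails", @"\\Account\Inbox\Receipts",
                            "someone@example.com", "invoice.pdf");
        Console.WriteLine(path);
    }
}
```

Directory.CreateDirectory can then be called on the directory part of the result (Path.GetDirectoryName) before saving the attachment to the full path.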
Download
You can download the code to this project from GitHub, or check out the code below.
The Full Code
// Outlook Attachment Extractor
// Version 0.1
// Build 2015-Oct-18
// Written by Matthew Proctor
// www.matthewproctor.com
using System;
using System.Linq;
using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;

namespace OutlookAttachmentExtractor
{
    class Program
    {
        // Path where attachments will be saved
        static string basePath = @"c:\temp\emails\";
        // Total size (in bytes) of all the matching files - displayed
        // after processing to indicate possible PST file size saving.
        // A long is used because the total can exceed the range of an int.
        static long totalfilesize = 0;

        static void Main(string[] args)
        {
            EnumerateAccounts();
        }

        // Uses recursion to enumerate Outlook subfolders.
        static void EnumerateFolders(Outlook.Folder folder)
        {
            Outlook.Folders childFolders = folder.Folders;
            if (childFolders.Count > 0)
            {
                // loop through each childFolder (aka sub-folder) in current folder
                foreach (Outlook.Folder childFolder in childFolders)
                {
                    // We only want Inbox folders - ignore Contacts and others
                    if (childFolder.FolderPath.Contains("Inbox"))
                    {
                        // Write the folder path.
                        Console.WriteLine(childFolder.FolderPath);
                        // Call EnumerateFolders using childFolder,
                        // to see if there are any sub-folders within this one
                        EnumerateFolders(childFolder);
                    }
                }
            }
            // pass folder to IterateMessages which processes individual email messages
            Console.WriteLine("Looking for items in " + folder.FolderPath);
            IterateMessages(folder);
        }

        // Loops through each item (aka email) in a folder
        static void IterateMessages(Outlook.Folder folder)
        {
            // attachment extensions to save
            string[] extensionsArray = { ".pdf", ".doc", ".xls", ".ppt", ".vsd",
                ".zip", ".rar", ".txt", ".csv", ".proj" };
            // Iterate through all items ("messages") in a folder
            var fi = folder.Items;
            if (fi != null)
            {
                try
                {
                    foreach (Object item in fi)
                    {
                        // Skip non-mail items (meeting requests, reports and so on);
                        // a direct cast would throw and end the loop early
                        Outlook.MailItem mi = item as Outlook.MailItem;
                        if (mi == null)
                        {
                            continue;
                        }
                        var attachments = mi.Attachments;
                        // Only process item if it has one or more attachments
                        if (attachments.Count != 0)
                        {
                            // Create a directory to store the attachment
                            if (!Directory.Exists(basePath + folder.FolderPath))
                            {
                                Directory.CreateDirectory(basePath + folder.FolderPath);
                            }
                            // Loop through each attachment (the collection is 1-based)
                            for (int i = 1; i <= mi.Attachments.Count; i++)
                            {
                                // Check whether any of the strings in the
                                // extensionsArray are contained within the filename
                                var fn = mi.Attachments[i].FileName.ToLower();
                                if (extensionsArray.Any(fn.Contains))
                                {
                                    // Create a further sub-folder for the sender
                                    string senderPath = basePath + folder.FolderPath + @"\" + mi.Sender.Address;
                                    if (!Directory.Exists(senderPath))
                                    {
                                        Directory.CreateDirectory(senderPath);
                                    }
                                    totalfilesize = totalfilesize + mi.Attachments[i].Size;
                                    string filePath = senderPath + @"\" + mi.Attachments[i].FileName;
                                    if (!File.Exists(filePath))
                                    {
                                        Console.WriteLine("Saving " + mi.Attachments[i].FileName);
                                        mi.Attachments[i].SaveAsFile(filePath);
                                        // Uncomment next line to delete attachment after saving it.
                                        // Deleting shifts the indexes of the remaining attachments,
                                        // so loop backwards (from Count down to 1) if you enable it.
                                        // mi.Attachments[i].Delete();
                                    }
                                    else
                                    {
                                        Console.WriteLine("Already saved " + mi.Attachments[i].FileName);
                                    }
                                }
                            }
                        }
                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine("An error occurred: " + e.Message);
                }
            }
        }

        // Retrieves the email address for a given account object
        static string EnumerateAccountEmailAddress(Outlook.Account account)
        {
            try
            {
                if (string.IsNullOrEmpty(account.SmtpAddress) || string.IsNullOrEmpty(account.UserName))
                {
                    Outlook.AddressEntry oAE = account.CurrentUser.AddressEntry as Outlook.AddressEntry;
                    if (oAE.Type == "EX")
                    {
                        Outlook.ExchangeUser oEU = oAE.GetExchangeUser() as Outlook.ExchangeUser;
                        return oEU.PrimarySmtpAddress;
                    }
                    else
                    {
                        return oAE.Address;
                    }
                }
                else
                {
                    return account.SmtpAddress;
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                return "";
            }
        }

        // Displays introduction text, lists each Account, and prompts user to select one for processing.
        static void EnumerateAccounts()
        {
            Console.Clear();
            Console.WriteLine("Outlook Attachment Extractor v0.1");
            Console.WriteLine("---------------------------------");
            int id;
            Outlook.Application Application = new Outlook.Application();
            Outlook.Accounts accounts = Application.Session.Accounts;
            string response = "";
            while (true)
            {
                id = 1;
                foreach (Outlook.Account account in accounts)
                {
                    Console.WriteLine(id + ":" + EnumerateAccountEmailAddress(account));
                    id++;
                }
                Console.WriteLine("Q: Quit Application");
                response = Console.ReadLine().ToUpper();
                if (response == "Q")
                {
                    Console.WriteLine("Quitting");
                    return;
                }
                if (response != "")
                {
                    int choice;
                    // TryParse avoids an exception when the input is not a number
                    if (Int32.TryParse(response.Trim(), out choice) && choice >= 1 && choice < id)
                    {
                        Outlook.Account selected = accounts[choice];
                        Console.WriteLine("Processing: " + selected.DisplayName);
                        Console.WriteLine("Processing: " + EnumerateAccountEmailAddress(selected));
                        Outlook.Folder selectedFolder = GetFolder(@"\\" + selected.DisplayName);
                        if (selectedFolder != null)
                        {
                            EnumerateFolders(selectedFolder);
                        }
                        Console.WriteLine("Finished Processing " + selected.DisplayName);
                        Console.WriteLine("Total size of matching attachments so far: " + totalfilesize + " bytes");
                        Console.WriteLine("");
                    }
                    else
                    {
                        Console.WriteLine("Invalid Account Selected");
                    }
                }
            }
        }

        // Returns Folder object based on folder path
        static Outlook.Folder GetFolder(string folderPath)
        {
            Console.WriteLine("Looking for: " + folderPath);
            Outlook.Folder folder;
            string backslash = @"\";
            try
            {
                if (folderPath.StartsWith(@"\\"))
                {
                    folderPath = folderPath.Remove(0, 2);
                }
                String[] folders = folderPath.Split(backslash.ToCharArray());
                Outlook.Application Application = new Outlook.Application();
                folder = Application.Session.Folders[folders[0]] as Outlook.Folder;
                if (folder != null)
                {
                    for (int i = 1; i <= folders.GetUpperBound(0); i++)
                    {
                        Outlook.Folders subFolders = folder.Folders;
                        folder = subFolders[folders[i]] as Outlook.Folder;
                        if (folder == null)
                        {
                            return null;
                        }
                    }
                }
                return folder;
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                return null;
            }
        }
    }
}
Testing
I've tested this code on mailboxes hosted in an on-premises Exchange 2013 environment, on Office 365, and on a POP3/IMAP mailbox as well - all functioned in exactly the same way.
Further Reading
The links below provide more information on how to use the Outlook Interop service.