The cyotek.com domain receives an awful lot of spam, and much of it is sent to email addresses that don't exist. However, as we currently have catch-alls enabled, we receive it regardless. This is compounded by the fact that I tend to create a unique email address for each website or service I interact with, and it's impossible to remember them all!
As a first step towards deleting the catch-alls, I wanted to see how many unique @cyotek.com addresses were in use. The simplest way of finding these would be to scan PST files - we have email going back to 2002 in these files, and there's the odd backup elsewhere going back even further. The last time I used OLE Automation with Outlook was back in the days of VB6, and I well recall being plagued with permission dialogs each time I so much as dreamed of accessing the API. Still, I thought I'd take a look.
Setting up
Note: I tested this project on an Outlook profile which has loaded a primary PST, an archive PST, and a Gmail account. I haven't tested this with any other type of account (for example Exchange) or with accounts using non-SMTP email addresses. Caveat emptor!
The first thing to do is add a reference to the Outlook COM objects. I have VS2010 and VS2012 installed on this machine, and one of them has installed a bunch of prepared Office Interop DLLs into the GAC. Handy, I won't have to create my own! Adding a reference to the Microsoft Outlook 14.0 Object Library added three references to my project: Microsoft.Office.Interop.Outlook.dll, Office.dll and stdole.
Note: Depending on your version of VS / .NET Framework, the references may have a property named Embed Interop Types which defaults to true. When left at this, you may have problems debugging as you won't be able to access the objects properly through the Immediate window, instead getting an error similar to "Member 'To' on embedded interop type 'Microsoft.Office.Interop.Outlook.MailItem' cannot be evaluated while debugging since it is never referenced in the program. Consider casting the source object to type 'dynamic' first or building with the 'Embed Interop Types' property set to false when debugging".
Probably a good idea to set this to false before debugging your code!
Connecting to Outlook
All the code below assumes that you have a using Microsoft.Office.Interop.Outlook; statement at the top of your code file.
Connecting to Outlook is easy enough: just create a new instance of the Application interface. We'll use it as the root for everything else.
Application application;
application = new Application();
Remember I mentioned permission dialogs? Older versions of Outlook used to prompt for permissions. Outlook 2010 just seems to quietly get on with things. The only thing I've noticed is that if you try to create a new Application when Outlook isn't currently running, it will be silently started, and the system tray icon will have a slightly different appearance and a tooltip informing you that some other program is using Outlook. Much nicer than previous behaviours!
Getting Account Folders
The Session property of the Application interface returns a NameSpace that details your Outlook setup, and allows access to accounts, profile details etc. However, for this project, the only thing I care about is the Folders property, which returns a collection of MAPIFolder objects. In my case, it was the three top level folders for my profile - I was actually somewhat surprised that the Gmail account was loaded.
Now that we have a folder, we can scan it by enumerating the Items property. As Outlook folders can contain items of various types, you need to check the item type - I'm looking for MailItem objects in order to extract those addresses.
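As a rough illustration of the two steps just described, the sketch below walks the top level folders of the profile and counts the MailItem objects in each of their immediate children. It is a minimal console program and not part of the example project: the FolderSummary class and the output format are made up for illustration, and it assumes Outlook is installed with a reference to the Outlook interop library added to the project.

```csharp
using System;
using Microsoft.Office.Interop.Outlook;

// Hypothetical helper, not taken from the example project.
internal static class FolderSummary
{
  private static void Main()
  {
    // Starts Outlook in the background if it isn't already running
    Application application = new Application();

    // One top level folder per store (PST, account) loaded in the profile
    foreach (MAPIFolder root in application.Session.Folders)
    {
      foreach (MAPIFolder child in root.Folders)
      {
        int mailCount = 0;

        // Folders can hold appointments, contacts and so on,
        // so only count the items that really are mail
        foreach (object item in child.Items)
        {
          if (item is MailItem)
            mailCount++;
        }

        Console.WriteLine("{0}: {1} mail item(s)", child.FolderPath, mailCount);
      }
    }
  }
}
```

This only looks one level below each top level folder; a full scan needs to recurse through every sub folder.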
Pulling out email addresses
Each MailItem has Sender, To and Recipients properties. To seems to be just a string version of Recipients and so shall be completely ignored - why bother parsing it manually when Recipients already does it for you? The Sender property returns an AddressEntry, and each item in the Recipients collection (a Recipient) offers an AddressEntry property. So we're all set!
The following code snippet is from the example project, and shows how I scan a source MAPIFolder looking for MailItem objects.
protected virtual void ScanFolder(MAPIFolder folder)
{
  this.CurrentFolderIndex++;
  this.OnFolderScanning(new MAPIFolderEventArgs(folder, this.FolderCount, this.CurrentFolderIndex));

  // items
  foreach (object item in folder.Items)
  {
    if (item is MailItem)
    {
      MailItem email;

      email = (MailItem)item;

      // add the sender of the email
      if (this.Options.HasFlag(Options.Sender))
        this.ProcessAddress(email.Sender);

      // add the recipients of the email
      if (this.Options.HasFlag(Options.Recipient))
      {
        foreach (Recipient recipient in email.Recipients)
          this.ProcessAddress(recipient.AddressEntry);
      }
    }
  }

  // sub folders
  if (this.Options.HasFlag(Options.SubFolders))
  {
    foreach (MAPIFolder childFolder in folder.Folders)
      this.ScanFolder(childFolder);
  }
}
When I find an AddressEntry to process, I call the following functions:
protected virtual void ProcessAddress(AddressEntry addressEntry)
{
  if (addressEntry != null && (addressEntry.AddressEntryUserType == OlAddressEntryUserType.olSmtpAddressEntry || addressEntry.AddressEntryUserType == OlAddressEntryUserType.olOutlookContactAddressEntry))
    this.ProcessAddress(addressEntry.Address);
  else if (addressEntry != null)
    Debug.Print("Unknown address type: {0} ({1})", addressEntry.AddressEntryUserType, addressEntry.Address);
}

protected virtual void ProcessAddress(string emailAddress)
{
  int domainStartPosition;

  // check for a null or empty address first, as calling IndexOf on null would throw
  domainStartPosition = !string.IsNullOrEmpty(emailAddress) ? emailAddress.IndexOf("@") : -1;

  if (domainStartPosition != -1)
  {
    bool canAdd;

    // optionally only keep addresses belonging to the included domains
    if (this.Options.HasFlag(Options.FilterByDomain))
      canAdd = this.IncludedDomains.Contains(emailAddress.Substring(domainStartPosition + 1));
    else
      canAdd = true;

    if (canAdd)
      this.EmailAddresses.Add(emailAddress);
  }
}
Although I'm scanning my entire PST, I don't want every single email address in there - I ran it once and it brought back just over 5000 addresses. What I want is the addresses tied to the domains I own, so I added some filtering for this. With this filtering enabled it returned a more manageable 497 unique addresses. Not that I'll be creating 497 aliases on the email server!
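To show the filtering idea in isolation, here is a small standalone sketch that needs no Outlook reference. The DomainFilter class and its Filter method are hypothetical names for illustration, not the example project's code. It also assumes that addresses and domains should be compared case-insensitively, which the project may or may not do, depending on which collection types back IncludedDomains and EmailAddresses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the example project.
internal static class DomainFilter
{
  // Returns the unique addresses whose domain appears in includedDomains.
  public static ISet<string> Filter(IEnumerable<string> addresses, IEnumerable<string> includedDomains)
  {
    // Assumption: treat both domains and whole addresses as case-insensitive
    HashSet<string> domains = new HashSet<string>(includedDomains, StringComparer.OrdinalIgnoreCase);
    HashSet<string> results = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    foreach (string address in addresses)
    {
      // skip missing values before searching them
      if (string.IsNullOrEmpty(address))
        continue;

      int domainStartPosition = address.IndexOf('@');

      // skip anything that doesn't look like an email address
      if (domainStartPosition == -1)
        continue;

      // the set silently ignores addresses it already contains
      if (domains.Contains(address.Substring(domainStartPosition + 1)))
        results.Add(address);
    }

    return results;
  }

  private static void Main()
  {
    string[] found = { "Shop@cyotek.com", "shop@cyotek.com", "someone@example.com", null, "not an address" };
    ISet<string> unique = Filter(found, new[] { "cyotek.com" });

    Console.WriteLine(unique.Count); // prints 1
  }
}
```

Using a set means duplicates are dropped as they are added, so the final count is the number of unique addresses without any extra pass over the data.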
Wrapping up
This was a lot easier than I was expecting, and in fact this is probably the smoothest piece of COM interop I've done with .NET yet. No strange errors, no being forced to compile in 32-bit mode, It Just Works.
You can find the example project in the link below.
Downloads
- OutlookEmailAddressExtract.zip (27.15 KB)
Original URL of this content is http://www.cyotek.com/blog/extracting-email-addresses-from-outlook?source=rss