Extracting EMail Addresses From a Document or String






Jul 19, 2003

Introduction
This project shows how to extract email addresses from a document or string.
Background
I was listening to the most recent .NET Rocks episode, in which Carl Franklin mentioned a class exercise that asked attendees to extract email addresses from a string. He said the exercise took some people a couple of hours to complete in VB 6.0. I had just been working with the System.Text.RegularExpressions namespace, and I thought this would be quite easy in .NET.
Using the code
The sample application opens a Word document, rich text document, or text file and gives you all the email addresses contained within. It uses Word (late-bound, so it is version independent) to open the .DOC or .RTF files.
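As a rough illustration of the late-bound approach, the sketch below reads the text of a file and only starts Word for .DOC or .RTF files. It is not the sample application's actual code: the module name DocumentTextReader and the function ReadDocumentText are hypothetical, and it assumes Windows with Word installed, .NET 2.0 or later (for File.ReadAllText), and Option Strict Off so that calls on Object are resolved at run time.

```vbnet
Option Strict Off ' late-bound calls on Object require Option Strict Off

Imports Microsoft.VisualBasic
Imports System.IO
Imports System.Runtime.InteropServices

Module DocumentTextReader
    ' Hypothetical helper (not from the sample): returns the plain text of a file.
    Public Function ReadDocumentText(ByVal fileName As String) As String
        Dim ext As String = Path.GetExtension(fileName).ToLowerInvariant()
        If ext <> ".doc" AndAlso ext <> ".rtf" Then
            Return File.ReadAllText(fileName)   ' plain text: Word is not needed
        End If

        ' CreateObject binds by ProgID at run time, so no reference to a
        ' specific Word version is compiled into the application.
        Dim wordApp As Object = CreateObject("Word.Application")
        Try
            Dim doc As Object = wordApp.Documents.Open(Path.GetFullPath(fileName))
            Try
                Return CStr(doc.Content.Text)
            Finally
                doc.Close(False)                ' close without saving changes
            End Try
        Finally
            wordApp.Quit()
            Marshal.ReleaseComObject(wordApp)
        End Try
    End Function
End Module
```

The string returned by a helper like this can be passed straight to the extraction method described next.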
The heart of the sample application is the method listed below. It uses the Regex.Matches method to search the string for matches to the regular expression provided. You then just need to enumerate the returned MatchCollection to extract the email addresses.
Perhaps the biggest challenge is to construct the proper regular expression for the search. I went to The Regular Expression Library to search for the one used here.
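Before the method itself, it may help to see how that expression breaks down. The sketch below uses the same pattern as the method, split into its three capture groups purely for readability; the module name PatternBreakdown, the constant EmailPattern, and the sample address are illustrative and not part of the original project.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternBreakdown
    ' Same expression as the article's method, concatenated from its parts:
    '   group 1: local part, "@", group 2: domain, ".", group 3: top-level domain
    Public Const EmailPattern As String = _
        "([a-zA-Z0-9_\-\.]+)" & "@" & "([a-zA-Z0-9_\-\.]+)" & "\." & "([a-zA-Z]{2,5})"

    Sub Main()
        Dim m As Match = Regex.Match("Write to support@mail.example.org.", EmailPattern)
        Console.WriteLine(m.Value)            ' support@mail.example.org
        Console.WriteLine(m.Groups(1).Value)  ' support
        Console.WriteLine(m.Groups(2).Value)  ' mail.example
        Console.WriteLine(m.Groups(3).Value)  ' org
    End Sub
End Module
```

Note that the pattern is deliberately loose. It does not validate addresses, and because the last group allows only 2 to 5 letters, an address under a longer top-level domain would be matched only up to its fifth letter.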
Imports System.Text.RegularExpressions
'.......................
Private Function ExtractEmailAddressesFromString(ByVal source As String) _
        As String()
    Dim mc As MatchCollection
    Dim i As Integer
    ' expression garnered from www.regexlib.com - thanks guys!
    mc = Regex.Matches(source, _
        "([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")
    Dim results(mc.Count - 1) As String
    For i = 0 To results.Length - 1
        results(i) = mc(i).Value
    Next
    Return results
End Function
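A minimal usage sketch follows, assuming a console application. The module name ExtractDemo and the sample sentence are made up for illustration, and the function is repeated in compact form (and made Public) only so the example compiles on its own.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Compact copy of the function above so this sketch is self-contained.
    Public Function ExtractEmailAddressesFromString(ByVal source As String) As String()
        Dim mc As MatchCollection = Regex.Matches(source, _
            "([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")
        Dim results(mc.Count - 1) As String
        For i As Integer = 0 To results.Length - 1
            results(i) = mc(i).Value
        Next
        Return results
    End Function

    Sub Main()
        Dim sample As String = _
            "Contact alice@example.com or bob.smith@mail.example.org today."
        For Each address As String In ExtractEmailAddressesFromString(sample)
            Console.WriteLine(address)
        Next
        ' Prints:
        '   alice@example.com
        '   bob.smith@mail.example.org
    End Sub
End Module
```

When the input contains no addresses, mc.Count is 0 and the function returns an empty array, so the loop simply does nothing.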