Email Address Validation: Explained in Detail, Code for Production Quality WPF TextBox

30 Mar 2020, Public Domain
A WPF TextBox which can validate email addresses according to your needs, plus a detailed description of the many forms a valid email address can take.
Validating email addresses is difficult because the specification of what constitutes a valid email address allows immense variety. Translating that specification into a regular expression leads to huge, complicated expressions which still might not cover all cases. This article describes how email address validation can reasonably be done and provides a production-quality WPF TextBox. The control lets the user enter only valid keystrokes, can make the email address a required field, and alerts the user if he tries to close the window without saving his changes.

Introduction

Writing a WPF control which validates email addresses is a challenge, because nearly all Unicode characters and many formats are allowed. The specifications allow great variety, but in reality only a few formats are used, because the exotic ones might get rejected by some email software (clients like Outlook, servers like Exchange). The control must be flexible enough to meet your requirements, whether you need strict checking with precise formatting or just want to alert the user if he has keyed in a strange-looking email address. Of course, it is best to prevent invalid input in the first place by controlling which keystrokes the user can make. Since this control is part of WpfWindowsLib, it doesn't let the user save the data if the email address is required and missing. It also informs the user if he tries to close the window without saving the data.

Email Address Format

The relevant specification can be found at RFC 5322: Internet Message Format: Address Specification.

It defines a valid email address in 2 steps:

1. Address

This higher level specifies, among other things, that an Address can consist of a display-name and an addr-spec, which is the part we usually mean when we talk about email addresses. The Address could look like this:

John Doe<John.Doe@example.com>

  • "John Doe" is the display name. It is not used for the routing of the email, but allows the email address to be shown in a more user friendly form.
  • "<John.Doe@example.com>" is called angle-addr in the specification and contains the addr-spec (the email address used for routing the email) in angle brackets.

The display name is optional. If there is no display name, the angle brackets are not needed either.
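
To see the two levels side by side, here is a minimal sketch that lets the .NET class System.Net.Mail.MailAddress separate the display name from the addr-spec. This is not the code used by EmailTextBox; the helper name AddressParts is made up for this illustration, and MailAddress does not accept every form RFC 5322 allows.

```csharp
using System;
using System.Net.Mail;

public static class AddressParts
{
    // Splits an RFC 5322 "Address" into display name and addr-spec using the
    // parser built into .NET. AddressParts is a hypothetical helper name.
    public static (string DisplayName, string AddrSpec) Split(string address)
    {
        // MailAddress throws a FormatException if it cannot parse the string.
        var parsed = new MailAddress(address);
        // DisplayName is an empty string when the input has no display name.
        return (parsed.DisplayName, parsed.Address);
    }
}
```

Calling AddressParts.Split("John Doe <John.Doe@example.com>") yields the display name and the addr-spec as separate strings; for a bare addr-spec the display name is empty.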

2. addr-spec

This part describes what usually is called an email address:

John.Doe@example.com

Basically, the address has two parts:

local-part @ domain-part

The domain-part is the internet DNS address of the email server, while local-part is the name of the "mailbox" within that email server. The domain-part is quite well defined and needs to be understood for the routing of email addresses by everyone, while the exact meaning of the local-part is defined by the receiving email server software and the sender does not necessarily need to understand the local-part structure. The specification wants to give the email server as much freedom as possible, which makes it hard to validate if an email address is actually correct.

The only commonly agreed requirement for a valid email address is:

There must be two parts separated by exactly one '@'.

But even this simple specification is not always correct, because the following is also a valid email address:

"John@Doe"@example.com

The first '@' is inside a quoted string. All visible ASCII characters (i.e., from 0x21 to 0x7E) may be used in the local-part when they are quoted. They can be placed between two double quotes '"', or a single special character can be preceded by a backslash '\':

John\@Doe@example.com (this is the same address as the one above with the quoted string).
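
The quoting rules above can be sketched as a tiny state machine that looks for the one '@' that is neither inside double quotes nor preceded by a backslash. This is only an illustration under those two rules; the class and method names are hypothetical and it is not taken from WpfWindowsLib.

```csharp
using System;

public static class AddrSpecSplitter
{
    // Returns the index of the '@' separating local-part from domain-part,
    // or -1 if there is not exactly one unquoted '@'.
    public static int FindSeparator(string addrSpec)
    {
        bool inQuotes = false; // currently between two double quotes
        bool escaped = false;  // previous character was a backslash
        int separator = -1;
        for (int i = 0; i < addrSpec.Length; i++)
        {
            char c = addrSpec[i];
            if (escaped) { escaped = false; continue; } // taken literally
            if (c == '\\') { escaped = true; continue; }
            if (c == '"') { inQuotes = !inQuotes; continue; }
            if (c == '@' && !inQuotes)
            {
                if (separator >= 0) return -1; // a second unquoted '@'
                separator = i;
            }
        }
        // An unterminated quote or a trailing backslash counts as an error.
        return (inQuotes || escaped) ? -1 : separator;
    }
}
```

With this sketch, both "John@Doe"@example.com and John\@Doe@example.com report the second '@' as the separator, while an address with no '@', or with several unquoted ones, returns -1.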

Valid Email Addresses

(Inspired by https://en.wikipedia.org/wiki/Email_address#Examples and RFC 3696 Application Techniques for Checking and Transformation of Names: Restrictions on email addresses)

simple@example.com
The domain should contain a '.', because the root domain cannot be the address of the email server. Of course, as with most rules, there is an exception: example@localhost. This domain address is not for the Internet but for the company's internal network.

x@example.com
One-letter local-part is ok.

John.Doe@example.com
The period '.' character follows some special rules: it cannot be the first or the last character of the local-part or domain-part, and there cannot be two consecutive dots like '..'.

Gmail treats 'John.Doe' and 'JohnDoe' as the same address. As mentioned above, it is up to the receiving email server how it interprets the addresses of its mailboxes. But this poses a problem when the email address is used to identify individuals, for example on a login page: should it treat John.Doe@example.com and JohnDoe@example.com as two different people?
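
The dot rules can be written down in a few lines. The following is a sketch for an unquoted local-part or domain-part; DotRule and HasValidDots are names invented for this example.

```csharp
using System;

public static class DotRule
{
    // True if the unquoted part is not empty, does not start or end with '.'
    // and contains no two consecutive dots.
    public static bool HasValidDots(string part) =>
        part.Length > 0
        && part[0] != '.'
        && part[part.Length - 1] != '.'
        && !part.Contains("..");
}
```

For example, DotRule.HasValidDots("John.Doe") is true, while ".John", "John." and "john..doe" are all rejected.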

-minus-sign-@example.com
_under_score_@example.com
Hyphens '-' and underscores '_' are accepted everywhere.

John.Doe+Filter@example.com
This is legal, but not all email software may understand it. Some email servers use "John.Doe" as the actual mailbox name and ignore the '+' and whatever follows it up to the '@'.

Actually, all of these characters are legal in the local-part: ! # $ % & ' * + - / = ? ^ _ ` . { | } ~ However, not all email software accepts them. Poor O’Leary@example.com: some email clients just send his mail to OLeary@example.com.

Display Names

The '<' and '>' characters are not in the above list because they have a special meaning: they separate a display name from the real email address:
My Name <MyName@Example.com>
A display name can only appear before the email address in angle brackets. It can contain the same characters as an email address, plus blanks.

Comments

The round brackets '(' and ')' are also missing from the list above. They enclose comments:
Name(Comment1)<(Comment2)Name(Comment3)@(Comment4)Example.com(Comment5)>(Comment6)
Comments are in round brackets. A comment can contain any printable ASCII character except '(', ')' and '\'. This is strictly according to RFC 5322, but I guess a lot of email software will not interpret it properly. Some even use a comment as the display name described above, which is wrong. If possible, don't use comments.

Quoting

" "@example.org
The space between the quotes is the name of the mailbox. Between two double quotes, nearly anything goes, which makes validation difficult.

"john..doe"@example.org
A quoted double dot, which is not allowed without quotes.

Some\@saple@example.com
The first @ should be treated as a simple character and not as the control character separating local-part from domain-part.

John\ Doe@example.com
Spaces are only allowed when quoted (in RFC parlance, quoted also means preceded by a backslash '\').

John.\\Doe@example.com
The first backslash '\' makes the second backslash '\' an ordinary character.

Some Strange Looking Valid Addresses

"very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com

mailhost!username@example.org
Bangified host route used for uucp mailers

user%example.com@example.org
% escaped mail route to user@example.com via example.org

Domain-Part Requirements

The domain-part is a DNS hostname consisting of letters, digits, hyphens and dots. In rare cases, an IP address can be used instead, enclosed in square brackets:
John.Doe@[192.168.0.0]
user@[IPv6:2001:db8::1]

Use of Non-ASCII Characters (UTF8)

RFC 6530, RFC 6531 and RFC 6532 specify how any Unicode character (encoded as UTF8) can be used in email addresses, if the email software supports it. Since emails are often forwarded from server to server, it might very well be that one of these servers does not support UTF8 and returns an error message to the sender that the email could not be delivered. For this reason, it is safer not to use UTF8 in email addresses, although the following examples are actually valid addresses:

Pelé@example.com
δοκιμή@παράδειγμα.δοκιμή
我買@屋企.香港
संपर्क@डाटामेल.भारत

The domain-part is especially troublesome, because the actual IP address must be looked up from a DNS server, which might not support UTF8. For that reason, Punycode was invented, which encodes Unicode in pure ASCII that a non-UTF8 domain server can also handle. But does your email software support Punycode?
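
In .NET, the conversion between a Unicode domain name and its Punycode form is available through System.Globalization.IdnMapping. The sketch below only wraps the two calls; the wrapper class DomainEncoding is a made-up name and nothing here is part of EmailTextBox.

```csharp
using System;
using System.Globalization;

public static class DomainEncoding
{
    private static readonly IdnMapping Idn = new IdnMapping();

    // Converts a Unicode domain name into its ASCII (Punycode) form.
    // Labels that needed encoding start with "xn--".
    public static string ToAscii(string domain) => Idn.GetAscii(domain);

    // Converts a Punycode domain name back into its Unicode form.
    public static string ToUnicode(string asciiDomain) => Idn.GetUnicode(asciiDomain);
}
```

A domain-part such as παράδειγμα.δοκιμή could be passed through DomainEncoding.ToAscii() before the DNS lookup; a pure ASCII name like example.com comes back unchanged.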

As mentioned before, the goal is to use email addresses which will not cause trouble. Using UTF8 based addresses is asking for trouble. If you must, accept UTF8, but your life is much easier if you don't.

Length Restrictions

In addition to restrictions on syntax, there is a length limit on email addresses. That limit is a maximum of 64 characters (UTF8 bytes) in the local-part and a maximum of 255 characters (UTF8 bytes) in the domain-part, for a total length of 320 characters (64 + 1 for the '@' + 255).
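
Because the limits are counted in UTF8 bytes, string.Length is not the right measure once non-ASCII characters are allowed. A minimal sketch of the check, with the hypothetical names LengthRule and IsWithinLimits, could look like this:

```csharp
using System;
using System.Text;

public static class LengthRule
{
    public const int MaxLocalPartBytes = 64;
    public const int MaxDomainPartBytes = 255;

    // Checks both parts against the limits given above, counted in UTF8 bytes.
    public static bool IsWithinLimits(string localPart, string domainPart) =>
        Encoding.UTF8.GetByteCount(localPart) <= MaxLocalPartBytes
        && Encoding.UTF8.GetByteCount(domainPart) <= MaxDomainPartBytes;
}
```

A local-part of 33 'é' characters is only 33 characters long but needs 66 bytes in UTF8, so it exceeds the limit.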

Invalid Email Addresses

As mentioned above, there are many valid structures for email addresses. But there are also illegal ones:

Abc.example.com
No @ character

A@b@c@example.com
Only one @ is allowed outside quoted strings

john..doe@example..com
Double dots '..' are not allowed in either part

a"b(c)d,e:f;g<h>i[j\k]l@example.com
None of the special characters in this local-part are allowed outside quotation marks

just"not"right@example.com
Quoted strings must be dot separated or the only element making up the local-part

1234567890123456789012345678901234567890123456789012345678901234+x@example.com
Local part is longer than 64 characters

Recommendations

  • Warn the user if he enters a strange looking email address. Most likely it is a typo, but it might also be just a strange looking but valid email address. Let the user decide.
  • Help the user to limit the number of errors he can make by reducing the character set he can enter. If possible, avoid UTF8.

It is very hard, some say nearly impossible, to write an email validation that correctly identifies all legal and all illegal email addresses. And even if that were possible, it still wouldn't mean that the email address actually works. The only way to verify that is to send an email and wait for a reply. Therefore, it is better to use a relatively simple validation which alerts the user if something looks strange, so that typos are found, but to leave the final decision to the user.
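
What such a relatively simple validation might look like is sketched below. It is not the validation shipped with EmailTextBox; the names SimpleEmailCheck and LooksStrange, as well as the warning texts, are invented for this illustration. It deliberately flags rare but valid forms, such as quoted local-parts, because the user makes the final decision anyway.

```csharp
using System;

public static class SimpleEmailCheck
{
    // Returns true if the address looks unusual; 'reason' then holds a short
    // explanation that can be shown to the user. It does not reject anything.
    public static bool LooksStrange(string email, out string reason)
    {
        reason = "";
        int at = email.IndexOf('@');
        if (at < 0) { reason = "There is no '@'."; return true; }
        if (at != email.LastIndexOf('@')) { reason = "There is more than one '@'."; return true; }

        string localPart = email.Substring(0, at);
        string domainPart = email.Substring(at + 1);
        if (localPart.Length == 0 || domainPart.Length == 0)
        {
            reason = "Something is missing before or after the '@'.";
            return true;
        }
        if (domainPart.IndexOf('.') < 0) { reason = "The domain has no '.'."; return true; }
        if (email.Contains("..")
            || localPart.StartsWith(".", StringComparison.Ordinal)
            || localPart.EndsWith(".", StringComparison.Ordinal)
            || domainPart.StartsWith(".", StringComparison.Ordinal)
            || domainPart.EndsWith(".", StringComparison.Ordinal))
        {
            reason = "A '.' is in an unusual place.";
            return true;
        }
        return false;
    }
}
```

With this sketch, simple@example.com passes silently, while example@localhost produces a warning about the missing '.' in the domain, which the user can then confirm or correct.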

Using the Code

<wwl:CheckedWindow x:Class="Samples.SampleWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:wwl="clr-namespace:WpfWindowsLib;assembly=WpfWindowsLib"
        SizeToContent="WidthAndHeight">
  <StackPanel>
    <Label Content="Name (required)"/>
    <wwl:CheckedTextBox x:Name="NameTextBox" MinWidth="100" IsRequired="True"/>
    <Label Content="Email (not required)"/>
    <wwl:EmailTextBox x:Name="EmailTextBox" MinWidth="100"/>
    <Button x:Name="SaveButton" Content="_Save"/>
  </StackPanel>
</wwl:CheckedWindow>

Image 1

The upper window is the data entry window with a Name TextBox, where the user has to enter some data before saving, and the EmailTextBox. The name is required, but the user has not entered it yet; that is why its background is khaki and why the Save button is disabled. The user has entered an email address and then tried to close the window without saving the data. A second window with a warning opened, and the EmailTextBox's background turned light green to show the user which data has changed. For a detailed explanation of how this works, see my article, Base WPF window functionality for data entry.

Image 2

In this screenshot, the user has entered a name. The Save button is therefore enabled. The user clicks the Save Button, but gets a warning because the email address looks strange (no '.' in the domain-part) and can then decide if the email address should be saved or if some further editing is needed.

Configuring EmailTextBox

In many cases, you don't need to configure anything. You might want to set IsRequired in XAML or, if the user wants to edit some existing data, call Initialise() passing the existing email address and isRequired as parameters from code behind.

Instance Properties

Some properties can be set individually for every EmailTextBox:

  • IsRequired (DependencyProperty): Does the user need to provide a value for this control?
  • MaxLength (DependencyProperty): The maximum number of characters that can be manually entered into the text box.

Static Properties

Some properties apply to every EmailTextBox and are therefore declared as static:

  • AsciiSpecialChars (string): Characters allowed in addition to letters and digits in the local-part of the email address. Default: ".@-_+". To allow more characters, assign your own string or call EmailTextBox.SetExtendedAsciiSpecialChars() or EmailTextBox.SetExtendedQuotedAsciiSpecialChars().
  • IsBlankAllowed: Set to true if the user should be able to key in a blank.
  • IsInternationalCharSetAllowed: Set to true if the user should be able to use Unicode characters greater than 0x7F.
  • IsValidEmailChar (Func<char, bool>): Gets called to validate if the character the user just entered is allowed in the local-part of an email address. Assign your own function if you want to use a different validation.
  • IsValidDnsChar (Func<char, bool>): Gets called to validate if the character the user just entered is allowed in the domain-part of an email address. Assign your own function if you want to use a different validation.
  • IsValidEmail (Func<string, bool>): Gets called to validate the complete email address once the keyboard focus is leaving the EmailTextBox. Assign your own function if you want to use a different validation.
  • ShowLooksStrangeWarning (Func<EmailTextBox, bool>): Gets called when IsValidEmail detects a problem. Assign your own function if you want to display the problem differently.
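
As an illustration of the Func<char, bool> shape used by IsValidEmailChar, here is a sketch of a replacement function which accepts only ASCII letters, digits and the default special characters listed above. The class name CustomCharRules and the field name IsAsciiEmailChar are hypothetical, and the assignment in the comment assumes the static property can be set as described in the list above.

```csharp
using System;

public static class CustomCharRules
{
    // The default the article gives for AsciiSpecialChars.
    public const string SpecialChars = ".@-_+";

    // Accepts ASCII letters, digits and the special characters above.
    public static readonly Func<char, bool> IsAsciiEmailChar = c =>
        (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || SpecialChars.IndexOf(c) >= 0;

    // Assumed usage, e.g. at application start (needs WpfWindowsLib):
    //   EmailTextBox.IsValidEmailChar = CustomCharRules.IsAsciiEmailChar;
}
```

A function like this rejects blanks, double quotes and every character above 0x7F, which matches the recommendation to keep the accepted character set small.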

Getting the Code

The latest version is available from GitHub: https://github.com/PeterHuberSg/WpfWindowsLib.

Download or clone everything to your PC, which gives you a solution WpfWindowsLib with the following projects:

  • WpfWindowsLib: (.Dll) to be referenced from your other solutions, contains EmailTextBox
  • Samples: WPF Core application showing all WpfWindowsLib controls
  • WpfWindowsLibTest: with a few WpfWindowsLib unit tests

History

  • 19th March, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under a Public Domain dedication.

About the Author

Peter Huber SG
Software Developer (Senior)
Singapore Singapore
Retired SW Developer from Switzerland living in Singapore

Interested in WPF projects.

Comments and Discussions

 
Re: Cool, but not necessary (Peter Huber SG, 25-Mar-20 0:38)
The EmailTextBox does much more than just validate the email address. It also prevents the user from keying in invalid characters and prevents the window from being closed when the email address has changed. There are many more WPF controls providing similar functionality in WpfWindowsLib.

By the way: a regex is not the best way to validate an email address. A state machine is better, because the email address specification is itself a kind of state machine.
