When it comes to database design, the fundamentals and the little details make all the difference. Some feel that a few bytes here and a few bytes there on each record is no big deal. Why worry about field sizes when memory, disk, and processor are so cheap? Well, multiply these few bytes over a few million records, and it quickly becomes something to think about. Add the overhead of indexes, inefficient data retrieval, and updating, and these few bytes can be more than the straw that breaks the camel's back, it could be several bails of hay.
Let us look at the simple phone number. Almost every business database has fields and records to store contact information. Every contact has at least one phone number, and most people have several. Between home, work, fax, cell, and pager numbers, it is not uncommon to have five or more numbers for a single individual. Coming up with an efficient way to store these numbers is a good idea.
Character Data Types
Most databases I have seen (and most I have designed) have stored phone numbers in a character field. Typically, this could be as big as a
varchar(22) field which can store the formatted North American phone number with an extension, in the form "(999) 555-1212 ext1234". This is the easiest way to store the number, but also the least efficient. I won't talk about the
nvarchar(22) version which happens to be twice as big for no benefit. Since this is a
varchar field, let's assume that the average field size is ~17 characters since not all phone numbers would have an extension.
Storing raw data with formatting is never a good idea (this should be handled by the user interface), so stripping out the non-data characters leaves us with a field looking like "9995551212 1234", with the extension delimited by a space. Things are looking better, with a the field defined as
varchar(15) and an average field size of ~12 bytes; but of course, we can do even more.
Numeric Data Types
The first thing to realize about phone numbers is that they are in fact numbers. They contain no other characters, so why not store them in numeric type fields? One approach would be to store the phone number in a
bigint field and the extension in a separate
smallint field. The
int datatype is not large enough to hold the number and area code, while the next available number,
bigint, can easily store over 18 digits. It is the same story with
smallint. We want to be able to hold extension numbers up to 9999, and
tinyint is only good for 255 while
smallint has lots of room for four digit extensions. Using these two data types results in a total of 10 bytes to store the phone number.
This is good, but is it possible to do this with one field, to eliminate the overhead of the extra field? Of course, it is! This looks like a good candidate for the
numeric data types, which are fixed width data types, meaning they do not lose precision as the number becomes larger. We can be clever and store the extension in the fraction portion of the number defined with four digits. To store a 10 digit phone number with extension, we could define the data type as
decimal(14,4). This would require 9 bytes of storage.
decimal data types are stored in 4 byte chunks, so defining the data type as
decimal(19,4) also takes the same 9 bytes of storage but gives us a few extra digits for international phone numbers, if required, at no extra cost. If you require more digits in the extension, you can define the number as
Cutting the data storage requirement in half is pretty good, but we can still do a little better. SQL Server also has the
money data type which is just like the
decimal data type but is a predefined, built-in type, so it only requires 8 bytes to store a four digit extension and a phone number with up to 14 digits.
money datatype, the data space requirements are cut to less than half compared to the basic
varchar version. In fact, if we where to compare apples to apples, the equivalent non-formatted
varchar size would be 19 characters since the
money version can store up to 14 digits. Besides the obvious benefits of a smaller storage requirement, operations on fixed numeric data types are faster than that on other types. Fixed width data types don't require the overhead of tracking the actual bytes stored in the field. Indexes are also more efficient for the same reason. Don't forget, that if you choose to index your phone number field, the index size will also be less than half the size of the bloated
varchar version, so you get double savings and efficiency gain. Also, data integrity is built-in since a numeric data type only accepts numbers; no funny characters can be persisted in the database, by default.
Defining the Telephone Data Type
Now that we have determined the type of data to store phone numbers is
money, we can explicitly create a phone number data type. SQL Server lets you create User Defined Types (UDTs) using the SQL Server 2005 command:
CREATE TYPE [dbo].[phonenumber] FROM [money] NULL
or by selecting the Programmability/Types/User-defined Data Type menu in SQL Server Management Studio. Creating a User Defined Type for a phone number allows you to explicitly state the intent of the data field. It makes a cleaner, more defined database implementation. We can also bind rules to the type to enforce data integrity.
money data type is a numeric data type which accepts negative numbers. Phone numbers should only be positive, so let's add a rule to prohibit negative numbers. In SQL Server 2005, this looks like:
CREATE RULE PhoneNumber_Domain
AS @Value > 0
The rule must be bound to the new data type using the command:
EXEC sp_bindrule 'PhoneNumber_Domain', 'phonenumber'
With a user defined data type and a bound custom rule, it is easy to implement phone numbers as a custom type wherever a phone number field is required.
In Telephone Numbers in SQL Server – Part 2 – Formatting, I will talk about presenting phone numbers in SQL Server using User Defined Functions (UDFs).