Introduction
Sometimes you'd like the SQL Server itself to automatically generate a
sequence for entities in your table as they are created. For example,
assigning each new Customer added to your table a unique
"CustomerNumber". There are effectively two ways to do using the
built-in features that T-SQL provides:
- Identity Columns - An identity is a common "auto
generated" primary key to use in a SQL Server database these days. An
identity is simply an integer value that "auto increments" as each new
row is added to your table. You can specify when it should start and how
it should increment when you add the column to your table:
alter table YourTable add ID int identity(start, increment)
- GUID Columns - A "GUID" is a Globally Unique Identifier that
can be assigned a unique yet random long string of characters like
"B3FA6F0A-523F-4931-B3F8-0CF41E2A48EE". You can either use the NEWID()
function when inserting into your table or set a default like this to
implement a GUID column in your tables:
alter table YourTable add ID uniqueidentifier default newid()
However, we often see questions in the forums regarding how to create
other types of auto-generated sequences in tables. For example, you
might want your customers to automatically be assigned "Customer
Numbers" as formatted like this:
C0001
C0002
...
C9998
C9999
In other cases, people would like to use incrementing letters instead
of numbers, some combination of both, or for the digits to be "reset"
on some specific condition, and so on.
The most important and crucial part of implementing this is not
writing the code! It is clearly defining your specification and
ensuring that it is logical and works for you. Before you can write code
that will automatically generate sequences for you, you must consider:
- How many numbers will you ever need? Does your specification handle this?
- What happens when values are deleted? Are they re-used?
- Are these sequences dependant on data that might ever change?
What happens to these values when the data does change? Does it make
sense, then, to incorporate this data into your sequence algorithm?
- If you have a complicated rule (i.e., "AA-00" through "ZZ-99"),
is every step and possibility clearly defined? In this example, what
comes after AA-99? Is it "BA-00", "AB-00", "BB-00", or something else?
What comes after "ZZ-99"?
So, the very first step is to clearly, accurately, and completely
define how your sequence values will be generated. You must explicitly
map out how to handle all possible situations and you must do some
research to ensure that your specification will work for the data you
are handling. A primary key of "A0" through "Z9" will only work for
26*10 = 260 values -- is this really what you want?
There are a few different approaches you can take in order to facilitate this behavior in SQL Server, so let's take a look.
Option 1: Determine the next value by querying a table
This first approach is usually the most common, and in my opinion
also the worst. What people try to do here is to query the existing
table to retrieve the last sequence value created, and then use that
value to create the next one.
For example, you might create a User-Defined Function that you could use in a trigger like this:
create function NextCustomerNumber()
returns char(5)
as
begin
declare @lastval char(5)
set @lastval = (select max(customerNumber) from Customers)
if @lastval is null set @lastval = 'C0001'
declare @i int
set @i = right(@lastval,4) + 1
return 'C' + right('000' + convert(varchar(10),@i),4)
end
This can cause some issues, however:
- What if two processes attempt to add a row to the table at the exact same time? Can you ensure that the same value is not generated for both processes?
- There can be overhead querying the existing data each time you'd like to insert new data
- Unless this is implemented as a trigger, this means that all
inserts to your data must always go through the same stored procedure
that calculates these sequences. This means that bulk imports, or moving
data from production to testing and so on, might not be possible or
might be very inefficient.
- If it is implemented as a trigger, will it work for a set-based
multi-row INSERT statement? If so, how efficient will it be? This
function wouldn't work if called for each row in a single set-based
INSERT -- each NextCustomerNumber() returned would be the same value.
Overall, if this approach is absolutely required, then it's what
you've got to do, but be sure that you consider the next two options
first, which are much easier to implement and will generally work in
most cases.
Option 2: Make it a presentation issue
The most common rules seem to be the simple ones, such as previous
example ("C0000" to "C9999"), or something similar. Looking at this
closely, we see that it is really just simply a number from 1-x,
formatted with leading zeroes to be 4 digits, and then prefixed with a
"C". Which means that all the database needs to do is generate a number
from 1-x, which means ... why not just use an identity and let your
front-end format the sequence value? Do you really need to
store it in the database as a VARCHAR? What does this gain? If you
simply use an identity and return an integer to the front-end, it is
trivial to format it with a "C" in the front and with the necessary
leading zeroes. By doing this, you have all of the advantages of a
built-in SQL Server generated identity value with none of the headaches
-- no worries about concurrency, performance, set-based triggers, and so
on.
So, really carefully think to yourself: "Do I really need to format a simple integer
in T-SQL and store that formatted value in my tables? Or can I simply
use that integer internally throughout the database and format it any
way I want at my presentation layer?"
The advantage of this approach is simplicity, but the
disadvantage is that you must rely on your front-end applications and
reports to understand how to format your codes.
Option 3: Let an Identity be your guide
If you really need to store the actual, auto-generated sequence in
your tables, or if a simple incrementing integer formatted a certain way
isn't enough, there is another simple option: Use a regular identity
column internally as your table's primary key, but then use some math to
calculate your external "auto-generated sequence" in another column
using the identity value generated. This column can be stored in your
table via a trigger, added as a computed column, or calculated using a
View. You can implement it any way you wish.
The advantage of this approach is that we are using the
database's built-in feature that guarantees that no two identities in a
table will be alike, and that every row will get one; therefore, if we
map each integer to a unique value in our sequence, we are guaranteed
that all of our sequence values will also be unique. And this will work
for set-based operations as well.
Let's start with the simple "C0000" - "C9999" example. First, let's create our Customers table like this:
create table Customers
(
dbID int identity not null primary key,
CustomerName varchar(100)
)
Note that the dbID column is standard, database-generated identity
which will be our physical primary key of the table. However, we will
add a CustomerNumber column which will be what we expose to the outside
world in the "C0000" format, as described.
Let's create a function accepts an integer, and uses that integer to return our CustomerNumber:
create function CustomerNumber (@id int)
returns char(5)
as
begin
return 'C' + right('0000' + convert(varchar(10), @id), 4)
end
Using that function, we can simply add a computed column to our table like this:
alter table Customers add CustomerNumber as dbo.CustomerNumber(dbID)
Or, we could also create a column in our table to store the Customer Number, and use a trigger to populate it:
alter Customers add CustomerNumber varchar(10)
create trigger Customers_insert on Customers
after insert as
update
Customers
set
Customers.customerNumber = dbo.CustomerNumber(Customers.dbID)
from
Customers
inner join
inserted on Customers.dbID= inserted.dbID
Using either method, once they are in place, we can simply insert
into our table, and for each Row added a unique "Customer Number" is
assigned:
insert into Customers (CustomerName) values ('jeff')
select * from Customers
returns:
(1 row(s) affected)
dbID CustomerName CustomerNumber
1 jeff C0001
(1 row(s) affected)
The key to making this work is the formula that maps a 1:1 relation between integers and your sequence values.
That example was very trivial to implement; let's try some more complicated ones.
For example, suppose that the CustomerNumber will be in this format:
AAAA
AAAB
...
AAAZ
AABA
...
ZZZZ
How can we take an integer and map it to this? In this case, it is
the same way you convert digits from decimal to hexadecimal or any other
"base" -- we are converting from a base 10 (decimal) to base 26 (A-Z).
Since we are working with a 4 "digit" value in base 26, this gives us 26
to the power of 4 different possibilities -- 456,976 different Customer
Numbers can generated. (Remember the most important step before writing
any code: is this acceptable?)
A simple algorithm to convert our integers to this base 26 number could be like this:
Starting with an integer identity value x:
- x mod 26 corresponds to the RIGHTMOST letter (0=A, 25=Z)
- x divided by 26 (26 to the power of 1) mod 26 corresponds to the NEXT letter from right to left
- x divided by 26*26 (26 to the power of 2) mod 26 corresponds to the NEXT letter, from right to left.
- x divided by 26*26*26 (26 to the power of 3) mod 26 corresponds to the LEFTMOST (first) letter.
To convert a number from 0-25 to a letter from A-Z, we add 65 to it
and use the CHAR() function, since the ASCII value of "A" is 65 and the
rest of the letters all follow in sequence (i.e., "B" is 66, "C" is 67,
etc).
Thus, to implement this method, our CustomerNumber function becomes:
create function CustomerNumber (@id int)
returns char(5)
as
begin
return char(@id / power(26,3) % 26 + 65) +
char(@id / power(26,2) % 26 + 65) +
char(@id / 26 % 26 + 65) +
char(@id % 26 + 65)
end
Equally as important as writing the algorithm out on paper is then
testing your implementation. We can do some simple testing like this to
help us ensure that our function is working:
select x, dbo.CustomerNumber2(x)
from
(
select 1 as x union all
select 25 union all
select 26 union all
select 27 union all
select 51 union all
select 52
) x
which returns:
x
----------- -----
1 AAAB
25 AAAZ
26 AABA
27 AABB
51 AABZ
52 AACA
(6 row(s) affected)
And that appears to do what we need. You should of course test the higher values as well.
The key is that we have mapped our integer values using our
function to Customer Numbers from AAAA-ZZZZ, all guaranteed to be
unique, and we don't need to worry about set-based inserts or lookups on
our existing table.
Let's try one more. How about
AA000
AA001
...
AA999
AB000
AB001
...
AZ999
BA000
BA001
...
ZZ999
In this one, the left 2 digits are base 26; the right 3 are base 10. This gives us 26*26*10*10*10 = 676,000 possible values.
We use the same basic algorithm, starting from an integer x and working right to left:
- x mod 10 equals that is the rightmost digit
- x / 10 mod 10 equals the next digit
- x / 10*10 mod 10 equals the next digit
- x / 10*10*10 mod 26 equals the next letter
- x / 26*10*10*10 mod 26 equals the first letter.
Let's put this logic into a User Defined Function, which makes testing easier:
create function CustomerNumber(@i int)
returns char(5)
as
begin
return (char(@i / 26000 % 26 + 65) +
char(@i / 1000 % 26 + 65) +
char(@i / 100 % 10 + 48) +
char(@i / 10 % 10 + 48) +
char(@i % 10 + 48))
end
And let's test this function to ensure that it works:
select x, dbo.customerNumber(x)
from
(
select 9 as x union all
select 99 union all
select 100 union all
select 999 union all
select 1000 union all
select 25999 union all
select 26000 union all
select 51999 union all
select 52000
) x
x
----------- -----
9 AA009
99 AA099
100 AA100
999 AA999
1000 AB000
25999 AZ999
26000 BA000
51999 BZ999
52000 CA000
(9 row(s) affected)
And, again, that is just a guideline, but you should do as much
testing as you can before implementing any of these features to ensure
that they accurately do what you need. The key is to identify and test
the "boundary" numbers where the values need to reset or change in a
more complicated manner than simply incrementing a single digit.
If you need to add dashes, or a constant prefix or suffix or
anything else, you can easily do it as well, all in your UDF. The key is
to make your logic entirely dependant on an integer value, and to map
each integer value to a unique value in your designated sequence. By
doing this, you are letting SQL Server do the hard part -- ensure that
your keys are consistent and unique -- but now you have the flexibility
of creating your "CustomerNumbers" or other values in the exact format
that you choose.
Summary
So, if you do find that you need to custom sequences of some sort at the database layer, here's my recommendation:
- Be sure that your code generation algorithm is well defined and handles your needs
- Be sure that it isn't just a presentation issue
- Write a UDF to map integers to your sequence values and let SQL Server generate those integers via an Identity column
- Test your UDF on its own thoroughly to ensure that it produces unique values in the format that you want
- Decide if a computed column, trigger, or View is the best way to incorporate these codes into your database layer
- Use identities internally as physical primary keys to your
tables and for relations. (Note that this is optional; you may wish to
use your sequence column, but be sure that it is indexed and constrained
properly in your database.)