Introduction
This article shows the pros and cons of using the INT or GUID types on primary keys. Hope it helps you decide which suites your project best.
Using INT on primary keys
Advantages
- Small amount of storage size (and integer is 4 bytes).
- Increased readability, practical use in testing, and easy to remember.
- Chronology of data; if two records are in ascending order, we can deduce that the second record was inserted after the first one.
- Support for functions that return the last primary key generated (
@@IDENTITY, SCOPE_IDENTITY()).
Disadvantages
- Difficulty in the case of merging tables (the need to make remapping because the primary keys may be duplicated).
- Hard to work with distributed tables.
- Primary key in the form of
INT/BIGINT is LUID, local unique identifier, which is only used locally in a table.
- After a large number of operations (insert, delete), the counter for primary key can be reset, bringing the problem of chronology to 1.
Using GUID on primary keys
Advantages
- Easy to merge tables.
- Easy to work with distributed tables.
- The primary key is uniquely identified in the entire system (the number of machines or tables doesn't matter).
- Don't have problems when inserting a large number of operations.
Disadvantages
- Bigger storage size (16 bytes - four times the size of an integer).
- Don't have the chronology assumption.
- Hard to remember and use in testing.
- Don't benefit from a mechanism to obtain the last generated primary key (MS SQL Server).
My decision
I had a hard time deciding which solution suites best for my project, but I finally used GUIDs because of the advantages mentioned. But this happened only once, the rest of my projects have databases with INT primary keys. Hope someone finds this useful. Special thanks to Iancu Caputa for helping with this article.
|
| | Msgs 1 to 10 of 10 (Total in Forum: 10) (Refresh) | FirstPrevNext |
|
 |
|
 |
OK but what about replication. how do you link other tables as a reference ( i.e freign key). do you use a guid for the link or what. jsut a question. good article
cheers, Donsw My Recent Article : Backup of Data files - Full and Incremental
|
|
|
|
 |
|
 |
Have you considered using composite primary key? One of them is identity field, and another - table id (constant value for table). It would help with merging tables, but you still have readability.
|
|
|
|
 |
|
 |
When comparing INT with GUID I would mention another GUID disadvantage. If you use NEWID() T-SQL function to generate unique identifiers you should be aware of the fragmentation problem. Usually tables using GUIDs are highly fragmented due to the nonsequential nature of GUIDs. If you use NEWSEQUENTIALID() T-SQL function for generating sequential GUIDs you should be concerned with the privacy, since given any GUID all the others may be guessed without any problem. The following T-SQL code shows high fragmentation level for table populated with GUIDs:
IF object_id('TBL_GUIDS') is not null DROP TABLE TBL_GUIDS;
CREATE TABLE TBL_GUIDS(guid_col uniqueidentifier primary key default (NEWID()), updated_date datetime default (GETDATE()));
declare @counter int SET @counter = 0 WHILE @counter < 10000 BEGIN INSERT INTO TBL_GUIDS DEFAULT VALUES; SET @counter = @counter + 1; END
SELECT * FROM sys.dm_db_index_physical_stats (db_id(), object_id('TBL_GUIDS'), NULL, NULL, NULL);
If you run the same code, but instead of using GUID you use BIGINT and IDENTITY, you will experience low fragmentation. It is not a secret that fragmentation has very severe impact on SQL Server performance. So if performance is a concern, don't use GUIDs.
|
|
|
|
 |
|
 |
Nice pointed out. I didn't thought of fragmentation, but it really has an impact on performance.
modified on Wednesday, May 14, 2008 1:41 AM
|
|
|
|
 |
|
 |
Can you clarify what you mean by the following?
Geyzerskiy Dmitriy wrote: If you use NEWSEQUENTIALID() T-SQL function for generating sequential GUIDs you should be concerned with the privacy
Is framentation an issue when you use an index based on some other field. For example, a user table where username must be unique. You can put your unique index on username, but use the guid key in FK relationships. I typically use this approach in user tables because a user name can be changed over time. In most tables I use an int or bigint primary key and index as appropriate based on the estimated number of rows in the application lifespan.
|
|
|
|
 |
|
 |
Mike Lang wrote: Can you clarify what you mean by the following?
Geyzerskiy Dmitriy wrote: If you use NEWSEQUENTIALID() T-SQL function for generating sequential GUIDs you should be concerned with the privacy
If privacy is a concern, do not use the NEWSEQUENTIALID() function. It is possible to guess the value of the next generated GUID, and therefore access data associated with that GUID. As for the fragmentation issue I think you will find more info here: [Fragmentation and Database Performance].
|
|
|
|
 |
|
 |
If you put an index on other field it is out of scope for this discussion. We're talking here about using GUIDs for primary key. Primary keys are always represented by index either clustered or nonclustered. Both of them suffer from fragmentation issues if the values you insert are nonsequential. NEWID() function generates unique but non sequential IDs, NEWSEQUENTIALID() function generate sequential IDs but exposes some sort of privacy breach.
|
|
|
|
 |
|
 |
Did you give performance any consideration before making your choice? I've wondered for a long time about the difference in performance between fetching a row based on an Int versus a Guid but have never made time to test it out.
|
|
|
|
 |
|
 |
Performance wasn't an issue for me, I mean the cost of maintenance vs. the performance is tricky. If you need speed and don't have the problems I've mentioned in Disadvantages section for INT's, you should probably use INT's. I've tested once the performance fetching 1.000.0000 GUIDs on a 1.5 GHz with 1GB RAM. I used two techniques of generating the GUIDs: 1) NEWID() 2) DECLARE @aGuid UNIQUEIDENTIFIER SET @aGuid = CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER) (algorithm named COMBs, by Jimmy Nilsson)
The results were the following: 1) fetching 1.000.0000 GUIDs was 10 times consuming than fetching 1.000.0000 INTs 2) fetching 1.000.0000 GUIDs was 1.2 times consuming than fetching 1.000.0000 INTs (almost the same)
After this test I was convinced that using GUIDs was the best solution for my project, since I've kinda needed the advantages specified. The best thing to do is take a shot and test this yourself, see what results you get.
|
|
|
|
 |
|
 |
"Performance wasn't an issue for me, I mean the cost of maintenance vs. the performance is tricky. If you need speed and don't have the problems I've mentioned in Disadvantages section for INT's, you should probably use INT's."
Are you implying that integers are faster but GUIDs are easier to maintain?
|
|
|
|
 |
|
|
General
News
Question
Answer
Joke
Rant
Admin
Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads.