|
There are two important facts, which probably no one can deny:
1) AutoIncrement integers are not useful for replication or merging records of different databases.
2) GUIDs are not useful as a primary key and therefore as a foreign key in linked tables.
Because you work more often within one database and replicate/merge data infrequently, I think an application designer should use AutoIncrements as primary keys.
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
There is absolutely no need for GUIDs; no user, no developer, nore a replication algorithm can handle GUIDs (... duplicates ...). Merging (or importing/exporting) data by "identification code" means, that the user defines, which data is merged and which is duplicated.
Best regards
Joachim van de Bruck
|
|
|
|
|
I'm going to respectfully disagree. If you have ever worked with data in an enterprise, you will know that this data must last forever. And, this same data is very often combined with other data, be it from other divisions or just other applications.
Even though my recommendation primarily applies to Enterprise Applications, I also recommend using GUIDs for smaller applications as it ensure data integrity. For example, SplendidCRM has more than 359 tables and they all use a GUID primary key. A significant number of these tables manage relationships between to other tables, such as ACCOUNTS_CONTACTS being a relationship table between Accounts and Contacts. Now, imagine if through programmer error, a Lead GUID got into the ACCOUNTS_CONTACTS table. This error can be easily caught via some validation function or via a foreign key constraint. However, if all the primary keys were integers, then this same programming error might go unnoticed as there would be no way to distinguish from Lead ID 77890 and Contact ID 77890.
So while you can promote the performance benefits of integer keys all day, I would still pick a slow robust application, than a fast buggy one. If speed is indeed a concern, then simply buy a faster server. In my opinion, it is much cheaper to buy a faster server than it is to pay a developer to fix bugs.
|
|
|
|
|
Thank you for your reply.
As noted before, there are advantages and disadvantages to both, integers and GUIDs. So you have to weight it to your personal needs. To diminish the performance lack of GUIDs, you can use sequential GUIDs - NEWSEQUENTIALID() in Sql Server - and to diminish the lack of uniqueness of integers for all tables, you can (very often, but not always) use different starting points and different increments in each table.
In one of our (older) applications, we use only GUID primary keys in all of the 134 tables. Validating new functions and procedures is sometimes really awful, because the colleagues have to search for specific GUIDs in a huge amount of data. Integers are easier to discover by a human eye, aren't they?
Your example is logically consistent, but when a programmer puts a lead id into a column, where a contact id is expected, the error can be easily recognized because of the absence of the rows for the specific lead id and at the latest because of the oversize of the rows for the contact id. This kind of programming error normally happens in an early state of the development, doesn't it?
Fast or robust? No user likes a slow application, and you cannot guarantee the robustness, because you have to reckon on coding errors. Robustness isn't increased by GUIDs, but by testing, testing, testing. On the other hand, integer primary keys can make the application significant faster, especially if the GUIDs are not sequential. I like the application to be fast and robust.
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you? Or what do you think about unique, user defined "identification codes" for merging data?
Best regards
Joachim van de Bruck
modified on Monday, June 1, 2009 7:30 AM
|
|
|
|
|
jvdb2508 wrote: No user likes a slow application, and you cannot guarantee the robustness
I keep seeing this argument as though if you replace guid's with ints the app will magically speed up to the point the user can barely keep up with it. In my experience that is patently false.
I *can* tell you from my personal experience what a user will not only not like but will villify you for and what can ruin not only their business but your own and that is lost data. You can test until you're blue in the face but you won't ever be able to test for every crazy thing that will happen at the end users site be it hardware failure, malicious actions or just plain stupid actions.
I don't mean to judge you but your arguments don't seem to match my experience in the real world dealing with thousands of end users of our software world wide for decades now. When it's a choice between safety and future scalability of data and performance, performance loses every time and the end user will thank you for it. Performance is fine when it's noticeable to the end user but it should never be the first consideration. Performance can be adequate to blindingly fast and the end user won't care one bit but lose one bit of their data or tell them they can't scale up as their business grows and there's really bad real world consequences.
jvdb2508 wrote: My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
jvdb2508 wrote: Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
jvdb2508 wrote: There are two important facts, which probably no one can deny
That's a bold statement.
A fact which I think no one with any amount of real world experience can disagree with is that there are no absolutes in anything, only the appropriate technique or tool for the situation.
Newbies unfortunately need to hang on to some kind of framework before they develop the skills to recognize what technique or technology to apply to a situation. Making absolute statements is unhelpful to them.
jvdb2508 wrote: As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. In the modern era of replication and scaling of applications it's wise to uniquely identify every record completely apart from anything the user enters. While not appropriate necessarily in every case it's rare that it isn't and the currently most acceptable and useful way to do that is with a Guid. Anything else is failing to plan for future growth of either the software or the company using it.
Examples that illustrate why this is a bad idea are extremely easy to come by, for example let's say you write an inventory management application and allow the user to enter a part number and use that as a unique way to identify a record. How hard is it to imagine that the end user will decide to use manufacturer part numbers and suddenly they will get a new part from a new manufacturer and it's exactly the same as an existing part. Or last names or phone numbers or addresses when a company goes nationwide or global after starting out in one city. Then come the inevitable workarounds and it spirals into a mess that never needed to happen in the first place.
Using a Guid as a surrogate key ensures at the very least that there will never be a collision between data no matter how it gets handled. It's cheap insurance in this day and age and the end user never needs to see them or interact with them. I personally think it would be madness to not use them in almost every situation with few exceptions because I've seen just about every app I write grow and gain requirements for many years after I initially thought they would just be small projects.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
There are two important facts, which probably no one can deny
That's a bold statement.
Sorry for my bad english (not my native language); I thought that 'probably' ensures, that I do not claim absoluteness, and the content of the 'facts' does it too.
John C wrote:
jvdb2508 wrote:
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. ...
Sorry again, but I can't understand your argument. As noted before, these 'identification codes' are not the primary key of the table. Let's say, that it is because of my bad english, that you didn't get my idea or otherwise I do not get your argument. Maybe you did not read my explanations in an earlier message (it's located on the second page, and therefore hard to find).
John C wrote:
jvdb2508 wrote:
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
John C wrote:
jvdb2508 wrote:
Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
A discussion without contrary standpoints is boring. But I miss your factual arguments. Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
Well, it's because of my bad english - or where did I say, that performance is more important than robustness?
Let me try a last statement about GUIDs and primary keys, I'm taking pain to phrase it factually:
If you replicate-copy one database into two databases, both databases can deal with any kind of primary key (either integers or guids) as long as new records henceforth are inserted in only one of the new databases. Of course, you want to insert new records in both databases and replicate-merge both of them or - different scenario - you have to merge two different databases into one or you have to import foreign data into your database. As mentioned before, you cannot merge data by integers! But ...
... if you use guids instead of integers to merge data, the only - but nevertheless important - advantage is, that you will not lose any data. On the other hand, if the same record was inserted in both databases, or if you import data, which already exists, you will get duplicates. The user has no loss of data, but now has to handle these duplicates. We can argue now, whether the duplicates hassle more then the loss of data, but to avoid this, I suggested the use of the 'identification codes'. They are not the primary keys of any table, but they are unique and defined by the user for merging data and for nothing else. That means, the user can define, whether the records of different sources are merged or duplicated. He/She can change the identification code to his needs and he can decide during or before an import, whether the data will be duplicated or overwrite the existing data or will be ignored.
No facts, just my opinion ...
I propagate the use of 'identification codes' for merging data from different sources. If a database has no 'identification code', you can define one before you import or merge. To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
Best regards
Joachim van de Bruck
|
|
|
|
|
Joachim your english is excellent you don't need to make any excuses for it.
jvdb2508 wrote: Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
jvdb2508 wrote: To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
Well as with anything it depends on the specific requirements and a good convincing case can be made for any practice if the circumstances merit it. For that reason we must speak in general terms for things that apply to the most common cases and in my opinion the most common cases are optimal with a Guid.
I disagree strenuously with relying on user entered data for any internally fundamental process. I come to this point of view after over a decade of publishing our own software and having to deal with thousands of users globally who are unbelievably creative at breaking things. Simply put I see too much chance for collision with user entered unique identification codes.
In our largest app we have something like 60 different tables in it all normalized to a high degree, I need to be absolutely sure that no record anywhere in the entire database is sharing an identification with any other record, once I'm assured of this then it opens up all manner of possibilities and gives me great peace of mind that I can save the users from themselves no matter what they enter or do to the database or the computer hosting it.
At least we can agree that integers are not optimal in most cases.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
I can't agree and I think, that you were arguing against the use of 'identification codes' as internal key, which is responsible for any process of data logic. Be sure, that the user entered unique icd is only used for matching data from different sources.
Who else, but the user can decide, whether two records from different sources, shall henceforth exist side by side or which one of them will survive while the other one will die. I like to give the user the tools and he has to handle them carefully and responsibly.
Our biggest customer uses our enterprise application to create monthly time schedules for 2,000 stuff members. There are 50,000 entries per month, wheryby each entry is linked to at least 9 master data tables in up to 4 levels. The entries are partially exported for time recording machines and partially imported back with corrected starting times and durations. Observing and clearing the time schedule - gross wage reports and invoices - will produce many additional records, because each entry is splitted into parts where each part is waged or invoiced different. At least the data is aggregated and exported for additional processes (net wage, payments, accounting, ...). All in all - including workflow management - we have 600,000 records per month, 8 millions per year.
The difference between processing integers or guids can not be recognized with a small amount of data. But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys. Especially the algorithm, which fills up to 80% of the time schedules handling user preferences and specifications, stuff qualifications, and minimizing costs and thereby maximizing earnings benefits from integer primary keys. And when a user is working on a time schedule, he wants to see the impacts of his planning immediately. It depends on the complexity of the queries, the number of linking levels, the overall amount of data. If your application potentially/eventually has to handle more than half a million records per month sometimes, I advise not to start with guids.
Some of our customers use a distributed database system. That's where we revert to Sql Servers built in replication capabilities, and where the database itself has added the indexed rowguid columns. Of course we provide the user with the functions to detect and handle duplicates. For automated replication guids are somehow essential, but not as the primary key in every table.
That's my experience, which is contrary to your's, isn't it? I cannot say, which of our experiences is more valuable, but be sure, that I do not work in a sandbox.
Best regards
Joachim van de Bruck
modified on Thursday, June 4, 2009 2:23 AM
|
|
|
|
|
Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
Discussion point #1: GUIDs vs. Ints for merging databases. You stated yourself that merging is not possible with Ints. Ergo, if you need merging or anticipate merging, do NOT use Ints. In 99% of the cases you'll still get enough performance with sequential GUIDs. Period.
Discussion point #2, which is completely orthogonal/independent/complementary (pick your favorite), is the issue of detecting duplicates during merge operations. JohnC nowhere stated that you couldn't have natrual 'keys' in addition to the GUIDs, you just wouldn't use them for joining, etc. The merging problem you have whether you use GUIDs or not, and it needs to be addressed on a system-by-system basis. And if you're happy you can even constrain the natural 'key' column to unique values.
<blockquote class="FQ"><div class="FQA">jvdb2508 wrote:</div>But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys.</blockquote>
Interesting, so using ints instead of GUIDs in fact increased the time by a factor of 8 OK, I'm sure it's a typo but you still chould have tried switching to sequential GUIDs. No point in complaining about the obvious. I successfully (and performantly) used tables with a couple millions of records, GUID IDs, joins across at least 5 tables and pretty complex filter queries... on a laptop (alright, I used a fast eSATA HDD).
Oh, and before I forget it, I still see some contradiction here. On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
With kind regards, Mr. 495784930
|
|
|
|
|
cwienands wrote: Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
I think you predominantly got it. My points are:
(1) Use AutoIncrement Integers as primary keys and for linking tables.
(2) Use additional 'natural keys' for import/export and joining data.
(3) Use additional GUIDs solely for master data replication on distributed databases.
(4) Points (1) to (3) apply nearly allways, but there may be exceptions.
The origin - in the article and assured by JohnC and others - was, that using GUIDs instead of Integers as primary keys is enough or all for linking, joining and replicating data. I think, that there are at the least three different tasks and I feel good, when each task has its own data.
My other contribution to this discussion was, that if you use guids as primary keys, then use sequential ones.
cwienands wrote: On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
That's where you did not get it. I suggested to eat apples AND oranges AND pears, not apples OR pears. On the other hand I know many accountants who think that an integer is a natural key.
Best regards
Joachim van de Bruck
|
|
|
|
|
Religion and programming doesn't go well together.
I have signficant high performance e-commerce experience and must caution anybody reading this article that Guids is not as wonderful as the author proclaims them to be.
1) Due to the random nature of Guids they fragment any index on them very quickly. On a DB with many inserts this is a recipe for disaster. Especially your clustered index (Ever tried to rebuild an index on a 100gig table on a production machine that has to continue transacting?)
2) Guids are significantly slower than integers for linking.
3) Memory is a very PRECIOUS commodity on large DBs wasting 4 times as much memory space is frankly said not a way to create linear scaling.
4) A very large percentage of queries are sequential or between two ranges like. Making SQL and other DBMS life easy to cache only portions and achieve maximum performance at the same time (provided you are using ints). With Guids any caching algorithm is likely to be SIGNIFICANTLY slower as access is no longer linear or mostly linear but random and random is not good for disks and caching.
5) Hard disk IO is also a significant bottleneck quite often and an index with two guids will take 32 bytes per record where the same index with 2 ints will take only 8 bytes.
There are many more but lets leave it at that for now. In short if performance is what you are after then ints will win this fight hands down. If you want some of the benefits of Guids then use it as a candidate key but not a primary key. Using a guids only policy will work well on a small scale but for anything beyond that be careful there is likely to be as many advantages of using guids as reasons not to.
I have seen this over and over again, why don't applications scale? because the people who designed them could identify the things that will prevent them from scaling in the first place.
|
|
|
|
|
Thank you for your comments. The points you have made are reasonable and follow conventional wisdom. However, I believe that it is important to recognize that both perspectives are valid. While I agree that a 100,000,000 record table might be better served using integer keys, I will also say that the size and performance difference between a guid key and an integer key when the table has 1000 records is completely irrelevant.
So, there is a threshold whereby a guid key will be noticeably slower than an integer key. The question becomes, where is that threshold. Ten years ago, I might have placed that threshold at 50,000 records, but today, with the typical server being a quad-core with many gigs of RAM, that threshold is closer to 1,000,000 records.
|
|
|
|
|
My view is that Guids are only a good idea typically where you have small tables with few inserts.
Where you need to exchange ids with other systems or for security use guids sparringly. Think twice before using it as a primary key on a table.
Guids make good candidate keys. It is always cheaper to rebuild a candidate key than your clustered index by a mile. Also with Sql enterprise you can get away with an online build which basically means you have no downtime or don't lock the table.
In my experience true enterprise applications have millions of records and sometimes billions. Sadly very few programmers/architects seem to understand the importance of designing for size.
I once inherited a DB that used the numeric type for most of its keys. A rather complex report took about 30 minutes to an hour to run on this system. During which time you can pretty much forget about capturing data. Sql can very easily and effectively use multiple processors to do the job. The primary enhancement was to get rid of the numeric type. The total reporting time was reduced to a few seconds. The difference? A lot less IO and fewer CPU cycles. Delays are cummulative and simple fixes like this can really make the world of difference.
It is naive to think that guids is ALWAYS only slightly slower than ints. It can be order of magnitudes slower on complex queries. Most enterprise applications get complex eventually.
|
|
|
|
|
While you may be technically correct in your arguments, I disagree with the gist of it; in my decades of experience exactly in this area with small to very large sites running on many different platforms there is now at this time no penalty of *any* significance in the real world for using Guid's and many benefits.
I come from an old school background where every bit of optimization used to matter to a high degree and a decade ago I would have agreed with you wholeheartedly but in these days of superfast cheap hardware there is no need to hold on to much of those optimization relics any more. I believe, and my experience has proven this to me, that these days it is of utmost important to concentrate on normalization and proper DB schema design where you can save a heck of a lot more than a few bytes per record that a Guid consumes and other optimizations that can shave real world noticeable to the end user amounts of time off operations.
Using ints just goes against the grain of modern development and limits scalability in critical ways.
This kind of bit fiddling which was once hard a hard won skill and supremely important is now just irrelevant at best and at worst bad advice for newbies who know no better.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
I have developed a corporate level Content Management System and initially, I was using integer PK. In fact, I still am and I was using them to link. However, my CMS has the ability for offline editing of the pages and the need to push the changes to the live / online server.
The CMS also hosts many sites. The problem was that the online DB could also be changed with any of the other sites, thereby knocking out all my relationships. My solution was to quickly change the DB schema to hold a guid (seperate from the PK) and link by guid in the Data Access Layer. This worked and allowed me to deploy the offline sites when I need to, without having to worry about PK relationships.
So, depending on the scenario, I still use auto inc integers, but for scalable and multi-system solutions, I will use guids.
|
|
|
|
|
I can agree with that. I often use guids where sensitive keys may end up in sensitive places or there are other good reasons for using them. Rarely do I use them for primary keys though and if you do you have be very careful with where and how.
|
|
|
|
|
Using a guids only approach is like having one tool in your toolbox, a hammer and that hammer will fix everything.
If guids is so wonderful why didn't MS, Oracle and everyone else not force it in with a boot?
If you understand caching, paging and how an RDBMS works then you will know the answer to this.
|
|
|
|
|
As I said in another post with this article an experienced developer knows which tool to use when. That being said I see no compelling argument not to use Guid's in the vast majority of software development out there.
As I said before there is no significant real world penalty for using Guid's over ints in anything I've ever seen or worked with and these kind of geeky obsessions with optimization are a disservice to the end user when there are so many benefits for them in a system designed using Guids from the start.
I could sit in an ivory tower all day and argue quite convincingly against a *lot* of modern software development practices and a lot of other ivory tower types or newbies that had "heard" about it would agree with me and 5 vote me all day but I work and live in the real world down in the trenches where this stuff actually matters, where you can't afford to hold on to these old fashioned and mostly irrelevant points of view at the expense of the end user if you want to keep a roof over your head.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
Geeky obsession? ok if that is what you call it. Tiger Woods once took off a year or something like that to perfect his swing (he is scared somebody is going to catch up). Lance Armstrong is extremely tuned to what he does so much so he can detect the smallest change in his bike set up. Sports pro's worry and make the smallest adjustments, so small they appear irrelevant to must of us.
I read a good quote recently that said, "all victories are the result of arranging the unobvious". Proper systems design is similar, to you may be geeky but I can assure any young developer the small details matter. Once you have created a terrabyte monster you simple cannot change course easily it is a lot more than that but being tuned into the small things is good.
So to some it will appear my views are old fashioned and irrelevant, the zealots you can never convince or give another point of view, they are already smarter than the rest.
To those with a more healthy point of view, I would like to re-iterate be careful to use guids as primary keys. There are circumstances you may get away with but that is like putting with your driver in golf it is most probably a bad idea in most cases.
|
|
|
|
|
No, I refute that entirely.
Being obsessed with irrelevant optimization is completely negative. It takes time away better spent on *relevant* optimization, designing features properly, testing etc etc. What you are describing is in modern times a completely irrelevant optimization for almost any case that the average developer is going to run across.
Pursuing irrelevant optimization is robbing your employer to stroke your own ego to no useful purpose.
If the end user won't notice it, it's NEVER an optimization worth pursuing.
Geeky obsession sums it up nicely.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
...It seems to me that Tom is the most moderate and open minded here. He's just saying "use GUIDs, but with wisdom". And also "don't take assumptions, be careful that sometimes, little small peformance gains can lead to really big change in query timings".
Others, and especially you, John, changed this into a flame war about GUIDs yes or no.
I agree with you Tom that optimization is important and the lack of it is the reason why the Shuttle can fly to space with an 8086 CPU while we need a Pentium Dual Core to write a letter...
And John, who talked about "irrelevant optimization"? Tom brought a real world example of a hundreds times faster query (30 minuts to few seconds) with an optimization. Would you call it "irrelevant"? Some optimization are irrelevant, some are relevant. If your style is "never optimize" you will loose many advantages (and your customers many moneys, by being forced to buy bigger, faster servers). Lucky you that customers are not technicians and will follow your advice of buying a new server, instead of changing their software supplier... ))
|
|
|
|
|
I don't know who you are or why you jumped into this discussion that is all but over but picking and choosing statements to mischaracterize out of context is no basis for any useful or meaningful discussion.
csantia66 wrote: And John, who talked about "irrelevant optimization"? Tom brought a real world example of a hundreds times faster query (30 minuts to few seconds) with an optimization. Would you call it "irrelevant"
I'm fairly certain that if you are in fact interested in the discussion and take the time to read what I said about irrelevant optimization I qualified it as anything the end user doesn't notice. As well if you care enough to read my posts you would see that I was making an argument for the majority of the cases for production code not the more rare or obscure edge conditions which can be used to justify any argument (i.e. I'm certain there is a very good case to be made for using a piece of cheese to fix some precision equipment, interesting perhaps but not worth injecting into a discussion about the majority of cases) which any pedant can come up with to jump into any discussion irrelevantly.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
<blockquote class="FQ"><div class="FQA">John C wrote:</div>I don't know who you are or why you jumped into this discussion </blockquote>
ok.
I'm nobody.
You are the God of Programming.
Please forgive me for my intrusion.
And, by the way, the statement "don't do irrelevant optimization" is something as obvious as meaningless. What you you keep ranting out is that integer vs GUIDs is an irrelevant optimization, which, at least in one case, was proved wrong by Tom.
And that's all I have to say about this issue.
|
|
|
|
|
Sorry that might have come out wrong, what I meant is you were no where in this discussion earlier then you jumped in took what I said out of context and started attacking me with it.
Irrelevant optimization is most certainly not obvious, it's there to bite the unwary in the butt at all times and very often seen the in the domain of people who come from a background where it was one of the most important job requirements back in the day. It's a clear dividing line between programmers who move on and are flexible and keep learning and those who do not.
If I was interviewing a programmer and using a typical modern day business application as an example for discussion to feel them out and they started ranting on about optimizing database access by using ints instead of guid's I'd quickly shuffle them to the bottom of the pile.
However if I was running Google's database department then I might have the opposite reaction.
The fact is that for the very vast majority of anything that anyone here will ever be working on Guid's are almost always going to be the way to go and while there may be a few edge cases it's unproductive to characterize them as somehow invalidating the very common modern and widespread use and usefulness of Guid's.
Disputing something obvious by citing an obscure case is a common enough tactic on any message board, highly dubious how it helps a newbie reading this, perhaps saying that there are cases where it can be useful but accepting that it is rare would be a classier way to go than fighting tooth an nail about it but hey that's what freedom of speech is all about I guess.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
I agree with Tom on this. Guids are a great tool when needed, but using them on very large databases and tables results in fragmentation and slow queries. We even have to resort to (gasp) natural keys sometimes.
The difference between a 200 GB db and a 400 GB database doesn't seem significant until you have to back it up, restore it, and warehouse it. Also, try moving them around from server to server.
You can have an array of Cray's running and you won't be able to do some of these reports unless you pick the correct clustered indexes. The disks simply can't spin fast enough and there isn't enough memory.
(edit)
I also like the idea of the article. Wanted to make sure I didn't confuse that here. I look forward to reading more in the future.
Wes
A picture is worth a thousand words but it takes 3,000 times the disk space. ~Author Unknown
My Site
|
|
|
|
|