|
Absolutely agree on all counts. This guy should be sued.
|
|
|
|
|
Meh, if it is as he describes he did nothing wrong legally but it's not something I'd be proud of personally or want to share with others on a site such as this.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
Article is mis-titled and frankly starting a business by essentially copying someone else's software and hard work while technically legal isn't particularly instructive to write about in my opinion.
An article with this title should be about an idea, the trials and tribulations of design, implementation, hiring programmers if necessary, internationalization issues, installer issues, development team, scheme used for version control and backup and code safety, translation services if used, documentation process, testing process, sales process, marketing process, distribution, support systems and processes, trademarks, payment processing, taxes, incorporation or proprietorship, license agreements, licensing systems, ways to deal with difficult customers (i.e. if and when to say no), financial arrangements necessary, ongoing maintenance, how it impacted your lifestyle personally, sacrifices made, things gained and learned along the way etc etc. Frankly, from my own experience to do it justice it would take a book or at least 10 large articles here at a guess.
Aside from all that I agree with the use of Guid's, there are some dinosaur mentalities still out there, perhaps an article just on that topic would be useful since people seem to be operating from very outdated points of view.
modified on Tuesday, June 2, 2009 3:04 AM
|
|
|
|
|
I have to disagree, Paul's line of reasoning is how most successful businesses are run. Almost no product is something entirely new. Small steps are much easier to do successfully, that means starting out with something existing, and making improvements it where you think it's lacking. Starting from scratch is usually a bad idea, you will make the same mistakes that the other guy has made, and assuming you are statistically just as smart as the other guy, you'll end up with just about the same thing. It makes much more sense to take something out there, analyze its strong and weak points and improve from there.
Look around... google... wasn't anything new, they just did it better. C#... you could call it a rip off of java, but ended up being better than java. Asian cars are often rip offs of other cars, but asian companies are learnt from it and put their own spin on it. Just by doing people gain more insight. In the end the amount of work that's put into something will make the difference, and mostly it's not a single brilliant novel concept that makes a good product, it's all about execution.
Wout
|
|
|
|
|
Or at least it started interesting. However most of it was about GUIDs and I don't think it was worth explaining in so many words, why GUIDs were chosen for primary keyse. The main topic was about starting a company not "are GUIDs good for primary keys and why".
|
|
|
|
|
Please cover more than technical stuff, did you incorporate?
Where, Nevada for Free or Massachusetts for $500 a year forever?
Did you go S2 corp or LLC and why?
Did you get a business license form your town hall so you could open a bank account?
Did you pay off the BBB to get listed on their web site?
|
|
|
|
|
Use of Guids section is out of place with title.
|
|
|
|
|
Congrats, Good article and i look forward to your more of your experiences.
|
|
|
|
|
There are two important facts, which probably no one can deny:
1) AutoIncrement integers are not useful for replication or merging records of different databases.
2) GUIDs are not useful as a primary key and therefore as a foreign key in linked tables.
Because you work more often within one database and replicate/merge data infrequently, I think an application designer should use AutoIncrements as primary keys.
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
There is absolutely no need for GUIDs; no user, no developer, nore a replication algorithm can handle GUIDs (... duplicates ...). Merging (or importing/exporting) data by "identification code" means, that the user defines, which data is merged and which is duplicated.
Best regards
Joachim van de Bruck
|
|
|
|
|
I'm going to respectfully disagree. If you have ever worked with data in an enterprise, you will know that this data must last forever. And, this same data is very often combined with other data, be it from other divisions or just other applications.
Even though my recommendation primarily applies to Enterprise Applications, I also recommend using GUIDs for smaller applications as it ensure data integrity. For example, SplendidCRM has more than 359 tables and they all use a GUID primary key. A significant number of these tables manage relationships between to other tables, such as ACCOUNTS_CONTACTS being a relationship table between Accounts and Contacts. Now, imagine if through programmer error, a Lead GUID got into the ACCOUNTS_CONTACTS table. This error can be easily caught via some validation function or via a foreign key constraint. However, if all the primary keys were integers, then this same programming error might go unnoticed as there would be no way to distinguish from Lead ID 77890 and Contact ID 77890.
So while you can promote the performance benefits of integer keys all day, I would still pick a slow robust application, than a fast buggy one. If speed is indeed a concern, then simply buy a faster server. In my opinion, it is much cheaper to buy a faster server than it is to pay a developer to fix bugs.
|
|
|
|
|
Thank you for your reply.
As noted before, there are advantages and disadvantages to both, integers and GUIDs. So you have to weight it to your personal needs. To diminish the performance lack of GUIDs, you can use sequential GUIDs - NEWSEQUENTIALID() in Sql Server - and to diminish the lack of uniqueness of integers for all tables, you can (very often, but not always) use different starting points and different increments in each table.
In one of our (older) applications, we use only GUID primary keys in all of the 134 tables. Validating new functions and procedures is sometimes really awful, because the colleagues have to search for specific GUIDs in a huge amount of data. Integers are easier to discover by a human eye, aren't they?
Your example is logically consistent, but when a programmer puts a lead id into a column, where a contact id is expected, the error can be easily recognized because of the absence of the rows for the specific lead id and at the latest because of the oversize of the rows for the contact id. This kind of programming error normally happens in an early state of the development, doesn't it?
Fast or robust? No user likes a slow application, and you cannot guarantee the robustness, because you have to reckon on coding errors. Robustness isn't increased by GUIDs, but by testing, testing, testing. On the other hand, integer primary keys can make the application significant faster, especially if the GUIDs are not sequential. I like the application to be fast and robust.
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you? Or what do you think about unique, user defined "identification codes" for merging data?
Best regards
Joachim van de Bruck
modified on Monday, June 1, 2009 7:30 AM
|
|
|
|
|
jvdb2508 wrote: No user likes a slow application, and you cannot guarantee the robustness
I keep seeing this argument as though if you replace guid's with ints the app will magically speed up to the point the user can barely keep up with it. In my experience that is patently false.
I *can* tell you from my personal experience what a user will not only not like but will villify you for and what can ruin not only their business but your own and that is lost data. You can test until you're blue in the face but you won't ever be able to test for every crazy thing that will happen at the end users site be it hardware failure, malicious actions or just plain stupid actions.
I don't mean to judge you but your arguments don't seem to match my experience in the real world dealing with thousands of end users of our software world wide for decades now. When it's a choice between safety and future scalability of data and performance, performance loses every time and the end user will thank you for it. Performance is fine when it's noticeable to the end user but it should never be the first consideration. Performance can be adequate to blindingly fast and the end user won't care one bit but lose one bit of their data or tell them they can't scale up as their business grows and there's really bad real world consequences.
jvdb2508 wrote: My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
jvdb2508 wrote: Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
jvdb2508 wrote: There are two important facts, which probably no one can deny
That's a bold statement.
A fact which I think no one with any amount of real world experience can disagree with is that there are no absolutes in anything, only the appropriate technique or tool for the situation.
Newbies unfortunately need to hang on to some kind of framework before they develop the skills to recognize what technique or technology to apply to a situation. Making absolute statements is unhelpful to them.
jvdb2508 wrote: As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. In the modern era of replication and scaling of applications it's wise to uniquely identify every record completely apart from anything the user enters. While not appropriate necessarily in every case it's rare that it isn't and the currently most acceptable and useful way to do that is with a Guid. Anything else is failing to plan for future growth of either the software or the company using it.
Examples that illustrate why this is a bad idea are extremely easy to come by, for example let's say you write an inventory management application and allow the user to enter a part number and use that as a unique way to identify a record. How hard is it to imagine that the end user will decide to use manufacturer part numbers and suddenly they will get a new part from a new manufacturer and it's exactly the same as an existing part. Or last names or phone numbers or addresses when a company goes nationwide or global after starting out in one city. Then come the inevitable workarounds and it spirals into a mess that never needed to happen in the first place.
Using a Guid as a surrogate key ensures at the very least that there will never be a collision between data no matter how it gets handled. It's cheap insurance in this day and age and the end user never needs to see them or interact with them. I personally think it would be madness to not use them in almost every situation with few exceptions because I've seen just about every app I write grow and gain requirements for many years after I initially thought they would just be small projects.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
There are two important facts, which probably no one can deny
That's a bold statement.
Sorry for my bad english (not my native language); I thought that 'probably' ensures, that I do not claim absoluteness, and the content of the 'facts' does it too.
John C wrote:
jvdb2508 wrote:
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. ...
Sorry again, but I can't understand your argument. As noted before, these 'identification codes' are not the primary key of the table. Let's say, that it is because of my bad english, that you didn't get my idea or otherwise I do not get your argument. Maybe you did not read my explanations in an earlier message (it's located on the second page, and therefore hard to find).
John C wrote:
jvdb2508 wrote:
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
John C wrote:
jvdb2508 wrote:
Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
A discussion without contrary standpoints is boring. But I miss your factual arguments. Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
Well, it's because of my bad english - or where did I say, that performance is more important than robustness?
Let me try a last statement about GUIDs and primary keys, I'm taking pain to phrase it factually:
If you replicate-copy one database into two databases, both databases can deal with any kind of primary key (either integers or guids) as long as new records henceforth are inserted in only one of the new databases. Of course, you want to insert new records in both databases and replicate-merge both of them or - different scenario - you have to merge two different databases into one or you have to import foreign data into your database. As mentioned before, you cannot merge data by integers! But ...
... if you use guids instead of integers to merge data, the only - but nevertheless important - advantage is, that you will not lose any data. On the other hand, if the same record was inserted in both databases, or if you import data, which already exists, you will get duplicates. The user has no loss of data, but now has to handle these duplicates. We can argue now, whether the duplicates hassle more then the loss of data, but to avoid this, I suggested the use of the 'identification codes'. They are not the primary keys of any table, but they are unique and defined by the user for merging data and for nothing else. That means, the user can define, whether the records of different sources are merged or duplicated. He/She can change the identification code to his needs and he can decide during or before an import, whether the data will be duplicated or overwrite the existing data or will be ignored.
No facts, just my opinion ...
I propagate the use of 'identification codes' for merging data from different sources. If a database has no 'identification code', you can define one before you import or merge. To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
Best regards
Joachim van de Bruck
|
|
|
|
|
Joachim your english is excellent you don't need to make any excuses for it.
jvdb2508 wrote: Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
jvdb2508 wrote: To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
Well as with anything it depends on the specific requirements and a good convincing case can be made for any practice if the circumstances merit it. For that reason we must speak in general terms for things that apply to the most common cases and in my opinion the most common cases are optimal with a Guid.
I disagree strenuously with relying on user entered data for any internally fundamental process. I come to this point of view after over a decade of publishing our own software and having to deal with thousands of users globally who are unbelievably creative at breaking things. Simply put I see too much chance for collision with user entered unique identification codes.
In our largest app we have something like 60 different tables in it all normalized to a high degree, I need to be absolutely sure that no record anywhere in the entire database is sharing an identification with any other record, once I'm assured of this then it opens up all manner of possibilities and gives me great peace of mind that I can save the users from themselves no matter what they enter or do to the database or the computer hosting it.
At least we can agree that integers are not optimal in most cases.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
I can't agree and I think, that you were arguing against the use of 'identification codes' as internal key, which is responsible for any process of data logic. Be sure, that the user entered unique icd is only used for matching data from different sources.
Who else, but the user can decide, whether two records from different sources, shall henceforth exist side by side or which one of them will survive while the other one will die. I like to give the user the tools and he has to handle them carefully and responsibly.
Our biggest customer uses our enterprise application to create monthly time schedules for 2,000 stuff members. There are 50,000 entries per month, wheryby each entry is linked to at least 9 master data tables in up to 4 levels. The entries are partially exported for time recording machines and partially imported back with corrected starting times and durations. Observing and clearing the time schedule - gross wage reports and invoices - will produce many additional records, because each entry is splitted into parts where each part is waged or invoiced different. At least the data is aggregated and exported for additional processes (net wage, payments, accounting, ...). All in all - including workflow management - we have 600,000 records per month, 8 millions per year.
The difference between processing integers or guids can not be recognized with a small amount of data. But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys. Especially the algorithm, which fills up to 80% of the time schedules handling user preferences and specifications, stuff qualifications, and minimizing costs and thereby maximizing earnings benefits from integer primary keys. And when a user is working on a time schedule, he wants to see the impacts of his planning immediately. It depends on the complexity of the queries, the number of linking levels, the overall amount of data. If your application potentially/eventually has to handle more than half a million records per month sometimes, I advise not to start with guids.
Some of our customers use a distributed database system. That's where we revert to Sql Servers built in replication capabilities, and where the database itself has added the indexed rowguid columns. Of course we provide the user with the functions to detect and handle duplicates. For automated replication guids are somehow essential, but not as the primary key in every table.
That's my experience, which is contrary to your's, isn't it? I cannot say, which of our experiences is more valuable, but be sure, that I do not work in a sandbox.
Best regards
Joachim van de Bruck
modified on Thursday, June 4, 2009 2:23 AM
|
|
|
|
|
Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
Discussion point #1: GUIDs vs. Ints for merging databases. You stated yourself that merging is not possible with Ints. Ergo, if you need merging or anticipate merging, do NOT use Ints. In 99% of the cases you'll still get enough performance with sequential GUIDs. Period.
Discussion point #2, which is completely orthogonal/independent/complementary (pick your favorite), is the issue of detecting duplicates during merge operations. JohnC nowhere stated that you couldn't have natrual 'keys' in addition to the GUIDs, you just wouldn't use them for joining, etc. The merging problem you have whether you use GUIDs or not, and it needs to be addressed on a system-by-system basis. And if you're happy you can even constrain the natural 'key' column to unique values.
<blockquote class="FQ"><div class="FQA">jvdb2508 wrote:</div>But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys.</blockquote>
Interesting, so using ints instead of GUIDs in fact increased the time by a factor of 8 OK, I'm sure it's a typo but you still chould have tried switching to sequential GUIDs. No point in complaining about the obvious. I successfully (and performantly) used tables with a couple millions of records, GUID IDs, joins across at least 5 tables and pretty complex filter queries... on a laptop (alright, I used a fast eSATA HDD).
Oh, and before I forget it, I still see some contradiction here. On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
With kind regards, Mr. 495784930
|
|
|
|
|
cwienands wrote: Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
I think you predominantly got it. My points are:
(1) Use AutoIncrement Integers as primary keys and for linking tables.
(2) Use additional 'natural keys' for import/export and joining data.
(3) Use additional GUIDs solely for master data replication on distributed databases.
(4) Points (1) to (3) apply nearly allways, but there may be exceptions.
The origin - in the article and assured by JohnC and others - was, that using GUIDs instead of Integers as primary keys is enough or all for linking, joining and replicating data. I think, that there are at the least three different tasks and I feel good, when each task has its own data.
My other contribution to this discussion was, that if you use guids as primary keys, then use sequential ones.
cwienands wrote: On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
That's where you did not get it. I suggested to eat apples AND oranges AND pears, not apples OR pears. On the other hand I know many accountants who think that an integer is a natural key.
Best regards
Joachim van de Bruck
|
|
|
|
|
Religion and programming doesn't go well together.
I have signficant high performance e-commerce experience and must caution anybody reading this article that Guids is not as wonderful as the author proclaims them to be.
1) Due to the random nature of Guids they fragment any index on them very quickly. On a DB with many inserts this is a recipe for disaster. Especially your clustered index (Ever tried to rebuild an index on a 100gig table on a production machine that has to continue transacting?)
2) Guids are significantly slower than integers for linking.
3) Memory is a very PRECIOUS commodity on large DBs wasting 4 times as much memory space is frankly said not a way to create linear scaling.
4) A very large percentage of queries are sequential or between two ranges like. Making SQL and other DBMS life easy to cache only portions and achieve maximum performance at the same time (provided you are using ints). With Guids any caching algorithm is likely to be SIGNIFICANTLY slower as access is no longer linear or mostly linear but random and random is not good for disks and caching.
5) Hard disk IO is also a significant bottleneck quite often and an index with two guids will take 32 bytes per record where the same index with 2 ints will take only 8 bytes.
There are many more but lets leave it at that for now. In short if performance is what you are after then ints will win this fight hands down. If you want some of the benefits of Guids then use it as a candidate key but not a primary key. Using a guids only policy will work well on a small scale but for anything beyond that be careful there is likely to be as many advantages of using guids as reasons not to.
I have seen this over and over again, why don't applications scale? because the people who designed them could identify the things that will prevent them from scaling in the first place.
|
|
|
|
|
Thank you for your comments. The points you have made are reasonable and follow conventional wisdom. However, I believe that it is important to recognize that both perspectives are valid. While I agree that a 100,000,000 record table might be better served using integer keys, I will also say that the size and performance difference between a guid key and an integer key when the table has 1000 records is completely irrelevant.
So, there is a threshold whereby a guid key will be noticeably slower than an integer key. The question becomes, where is that threshold. Ten years ago, I might have placed that threshold at 50,000 records, but today, with the typical server being a quad-core with many gigs of RAM, that threshold is closer to 1,000,000 records.
|
|
|
|
|
My view is that Guids are only a good idea typically where you have small tables with few inserts.
Where you need to exchange ids with other systems or for security use guids sparringly. Think twice before using it as a primary key on a table.
Guids make good candidate keys. It is always cheaper to rebuild a candidate key than your clustered index by a mile. Also with Sql enterprise you can get away with an online build which basically means you have no downtime or don't lock the table.
In my experience true enterprise applications have millions of records and sometimes billions. Sadly very few programmers/architects seem to understand the importance of designing for size.
I once inherited a DB that used the numeric type for most of its keys. A rather complex report took about 30 minutes to an hour to run on this system. During which time you can pretty much forget about capturing data. Sql can very easily and effectively use multiple processors to do the job. The primary enhancement was to get rid of the numeric type. The total reporting time was reduced to a few seconds. The difference? A lot less IO and fewer CPU cycles. Delays are cummulative and simple fixes like this can really make the world of difference.
It is naive to think that guids is ALWAYS only slightly slower than ints. It can be order of magnitudes slower on complex queries. Most enterprise applications get complex eventually.
|
|
|
|
|
While you may be technically correct in your arguments, I disagree with the gist of it; in my decades of experience exactly in this area with small to very large sites running on many different platforms there is now at this time no penalty of *any* significance in the real world for using Guid's and many benefits.
I come from an old school background where every bit of optimization used to matter to a high degree and a decade ago I would have agreed with you wholeheartedly but in these days of superfast cheap hardware there is no need to hold on to much of those optimization relics any more. I believe, and my experience has proven this to me, that these days it is of utmost important to concentrate on normalization and proper DB schema design where you can save a heck of a lot more than a few bytes per record that a Guid consumes and other optimizations that can shave real world noticeable to the end user amounts of time off operations.
Using ints just goes against the grain of modern development and limits scalability in critical ways.
This kind of bit fiddling which was once hard a hard won skill and supremely important is now just irrelevant at best and at worst bad advice for newbies who know no better.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
I have developed a corporate level Content Management System and initially, I was using integer PK. In fact, I still am and I was using them to link. However, my CMS has the ability for offline editing of the pages and the need to push the changes to the live / online server.
The CMS also hosts many sites. The problem was that the online DB could also be changed with any of the other sites, thereby knocking out all my relationships. My solution was to quickly change the DB schema to hold a guid (seperate from the PK) and link by guid in the Data Access Layer. This worked and allowed me to deploy the offline sites when I need to, without having to worry about PK relationships.
So, depending on the scenario, I still use auto inc integers, but for scalable and multi-system solutions, I will use guids.
|
|
|
|
|
I can agree with that. I often use guids where sensitive keys may end up in sensitive places or there are other good reasons for using them. Rarely do I use them for primary keys though and if you do you have be very careful with where and how.
|
|
|
|
|
Using a guids only approach is like having one tool in your toolbox, a hammer and that hammer will fix everything.
If guids is so wonderful why didn't MS, Oracle and everyone else not force it in with a boot?
If you understand caching, paging and how an RDBMS works then you will know the answer to this.
|
|
|
|
|