|
Hi Paul,
I enjoyed your article, and have no problem that it is slightly "variant" from the "modal" type of CP article, but you clearly state this is an introduction to a series of articles, and I can't understand folks who would react so quickly to that.
The advice you offer to programmers out of work, based on your experiences, seems very timely, well thought-out, and practical to me.
The idea that someone who comes along and decides to implement their own solution in a broad arena of technical development, like databases, is "poaching" in some way on some other companies' products reminds me of the idea of patenting "brown shoes"
That said, I think you gave your knee-jerk critics some ammunition by perhaps talking too much about the specifics of the other CRM system in your comparison, and it would be more "tactful," perhaps, not to mention the "other company" by name.
If use of GUID's as primary keys is a major "value proposition" of your vision of a CRM, why shouldn't you discuss it at length? Using the term "religion" does, imho, offer up a kind of "troll bait," however.
I appreciate the way you responded to the various replies on use of GUID's (as well as the astute technical observations of some of the "dissenting posts") : that discussion was very valuable to me.
thanks, Bill
"Many : not conversant with mathematical studies, imagine that because it [the Analytical Engine] is to give results in numerical notation, its processes must consequently be arithmetical, numerical, rather than algebraical and analytical. This is an error. The engine can arrange and combine numerical quantities as if they were letters or any other general symbols; and in fact it might bring out its results in algebraical notation, were provisions made accordingly." Ada, Countess Lovelace, 1844
|
|
|
|
|
Very enlightening and interesting
However, you may consider mentioning less about the use of GUID and other tech details. My two cents.
Thanks.
|
|
|
|
|
Your line of thought makes a lot of sense to me. Just ignore the one-votes; they just don't get how new products are created. You are on the right track, sir; just keep going and keep improving your CRM.
Wout
|
|
|
|
|
And why did you feel the need to break this up into two separate "articles"? Now, I have to reject two instead of just one.
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997 ----- "...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001
|
|
|
|
|
I want to thank all that have commented on article 1. My intent was to provide my personal experience so that others can benefit and your feedback has been greatly appreciated.
I have just posted article 2 at http://www.codeproject.com/KB/aspnet/splendid-guide-article2.aspx and I encourage everyone to continue to provide feedback.
|
|
|
|
|
A Programmer's Guide to Starting a Software Company and Building an Enterprise Application
1. steal someone else's software
2. use GUIDs
Hardly instructive...
|
|
|
|
|
Absolutely agree on all counts. This guy should be sued.
|
|
|
|
|
Meh, if it is as he describes he did nothing wrong legally but it's not something I'd be proud of personally or want to share with others on a site such as this.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
The article is mis-titled and, frankly, starting a business by essentially copying someone else's software and hard work, while technically legal, isn't particularly instructive to write about, in my opinion.
An article with this title should be about an idea, the trials and tribulations of design, implementation, hiring programmers if necessary, internationalization issues, installer issues, development team, scheme used for version control and backup and code safety, translation services if used, documentation process, testing process, sales process, marketing process, distribution, support systems and processes, trademarks, payment processing, taxes, incorporation or proprietorship, license agreements, licensing systems, ways to deal with difficult customers (i.e. if and when to say no), financial arrangements necessary, ongoing maintenance, how it impacted your lifestyle personally, sacrifices made, things gained and learned along the way etc etc. Frankly, from my own experience to do it justice it would take a book or at least 10 large articles here at a guess.
Aside from all that, I agree with the use of Guids. There are some dinosaur mentalities still out there; perhaps an article just on that topic would be useful, since people seem to be operating from very outdated points of view.
modified on Tuesday, June 2, 2009 3:04 AM
|
|
|
|
|
I have to disagree; Paul's line of reasoning is how most successful businesses are run. Almost no product is something entirely new. Small steps are much easier to do successfully, and that means starting out with something existing and making improvements to it where you think it's lacking. Starting from scratch is usually a bad idea: you will make the same mistakes that the other guy has made, and assuming you are statistically just as smart as the other guy, you'll end up with just about the same thing. It makes much more sense to take something out there, analyze its strong and weak points and improve from there.
Look around... Google... wasn't anything new, they just did it better. C#... you could call it a rip-off of Java, but it ended up being better than Java. Asian cars are often rip-offs of other cars, but Asian companies have learnt from them and put their own spin on them. Just by doing, people gain more insight. In the end the amount of work that's put into something will make the difference, and mostly it's not a single brilliant novel concept that makes a good product, it's all about execution.
Wout
|
|
|
|
|
Or at least it started out interesting. However, most of it was about GUIDs, and I don't think it was worth explaining in so many words why GUIDs were chosen for primary keys. The main topic was about starting a company, not "are GUIDs good for primary keys and why".
|
|
|
|
|
Please cover more than technical stuff, did you incorporate?
Where, Nevada for Free or Massachusetts for $500 a year forever?
Did you go S2 corp or LLC and why?
Did you get a business license from your town hall so you could open a bank account?
Did you pay off the BBB to get listed on their web site?
|
|
|
|
|
Use of Guids section is out of place with title.
|
|
|
|
|
Congrats, good article, and I look forward to more of your experiences.
|
|
|
|
|
There are two important facts, which probably no one can deny:
1) AutoIncrement integers are not useful for replication or merging records of different databases.
2) GUIDs are not useful as a primary key and therefore as a foreign key in linked tables.
Because you work more often within one database and replicate/merge data infrequently, I think an application designer should use AutoIncrements as primary keys.
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
There is absolutely no need for GUIDs; no user, no developer, nor a replication algorithm can handle GUIDs (... duplicates ...). Merging (or importing/exporting) data by "identification code" means that the user defines which data is merged and which is duplicated.
Best regards
Joachim van de Bruck
|
|
|
|
|
I'm going to respectfully disagree. If you have ever worked with data in an enterprise, you will know that this data must last forever. And, this same data is very often combined with other data, be it from other divisions or just other applications.
Even though my recommendation primarily applies to Enterprise Applications, I also recommend using GUIDs for smaller applications as it ensures data integrity. For example, SplendidCRM has more than 359 tables and they all use a GUID primary key. A significant number of these tables manage relationships between two other tables, such as ACCOUNTS_CONTACTS being a relationship table between Accounts and Contacts. Now, imagine if through programmer error, a Lead GUID got into the ACCOUNTS_CONTACTS table. This error can be easily caught via some validation function or via a foreign key constraint. However, if all the primary keys were integers, then this same programming error might go unnoticed as there would be no way to distinguish Lead ID 77890 from Contact ID 77890.
So while you can promote the performance benefits of integer keys all day, I would still pick a slow, robust application over a fast, buggy one. If speed is indeed a concern, then simply buy a faster server. In my opinion, it is much cheaper to buy a faster server than it is to pay a developer to fix bugs.
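To make the ACCOUNTS_CONTACTS scenario concrete, here is a minimal sketch, not SplendidCRM's actual schema. It assumes SQLite as a stand-in for SQL Server, reduces every table to a single ID column, and invents the helper names (`stray_lead_is_caught`, `per_table_integers`, `guids`) purely for illustration. It writes a Lead key into the Contact column and reports whether the foreign key constraint rejects it.

```python
import sqlite3
import uuid

# Hypothetical, stripped-down schema: only the key columns are modelled.
SCHEMA = """
CREATE TABLE ACCOUNTS (ID TEXT PRIMARY KEY);
CREATE TABLE CONTACTS (ID TEXT PRIMARY KEY);
CREATE TABLE LEADS    (ID TEXT PRIMARY KEY);
CREATE TABLE ACCOUNTS_CONTACTS (
    ACCOUNT_ID TEXT NOT NULL REFERENCES ACCOUNTS(ID),
    CONTACT_ID TEXT NOT NULL REFERENCES CONTACTS(ID)
);
"""


def per_table_integers():
    """Key generator that mimics one auto-increment counter per table."""
    counters = {}

    def new_id(table):
        counters[table] = counters.get(table, 0) + 1
        return str(counters[table])

    return new_id


def guids():
    """Key generator that hands out a random GUID regardless of table."""
    return lambda table: str(uuid.uuid4())


def stray_lead_is_caught(new_id):
    """Insert a Lead key where a Contact key belongs; True if the DB rejects it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    account_id = new_id("ACCOUNTS")
    contact_id = new_id("CONTACTS")
    lead_id = new_id("LEADS")
    conn.execute("INSERT INTO ACCOUNTS (ID) VALUES (?)", (account_id,))
    conn.execute("INSERT INTO CONTACTS (ID) VALUES (?)", (contact_id,))
    conn.execute("INSERT INTO LEADS (ID) VALUES (?)", (lead_id,))
    try:
        # The programming error: lead_id is used instead of contact_id.
        conn.execute(
            "INSERT INTO ACCOUNTS_CONTACTS (ACCOUNT_ID, CONTACT_ID) VALUES (?, ?)",
            (account_id, lead_id),
        )
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


if __name__ == "__main__":
    print("integer keys, error caught:", stray_lead_is_caught(per_table_integers()))
    print("GUID keys, error caught:   ", stray_lead_is_caught(guids()))
```

With per-table counters the Lead and the Contact both get key 1, so the constraint is satisfied and the bad row goes in. With GUIDs the Lead key does not exist in CONTACTS, so the insert fails.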
|
|
|
|
|
Thank you for your reply.
As noted before, there are advantages and disadvantages to both integers and GUIDs, so you have to weigh them against your own needs. To reduce the performance penalty of GUIDs, you can use sequential GUIDs - NEWSEQUENTIALID() in SQL Server - and to reduce the lack of uniqueness of integers across tables, you can (very often, but not always) use different starting points and different increments in each table.
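One possible reading of "different starting points and different increments" can be sketched like this. It assumes a fixed, known list of tables (the table names and the helpers `make_identity_generators` and `table_of` are hypothetical), gives each table its own seed, and uses the number of tables as the shared increment, so no two tables ever hand out the same integer.

```python
from itertools import count, islice

TABLES = ["ACCOUNTS", "CONTACTS", "LEADS"]  # hypothetical table names


def make_identity_generators(tables=TABLES):
    """One integer sequence per table: seed = position + 1, increment = table count."""
    step = len(tables)
    return {name: count(start=offset + 1, step=step)
            for offset, name in enumerate(tables)}


def table_of(key, tables=TABLES):
    """Recover the owning table from a key produced by the scheme above."""
    return tables[(key - 1) % len(tables)]


if __name__ == "__main__":
    generators = make_identity_generators()
    for name, generator in generators.items():
        print(name, list(islice(generator, 4)))
```

The weakness is the "not always" part: the scheme breaks as soon as a table is added that was not planned for, and it does nothing for keys generated in a second database.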
In one of our (older) applications, we use only GUID primary keys in all of the 134 tables. Validating new functions and procedures is sometimes really awful, because the colleagues have to search for specific GUIDs in a huge amount of data. Integers are easier to discover by a human eye, aren't they?
Your example is logically consistent, but when a programmer puts a lead id into a column, where a contact id is expected, the error can be easily recognized because of the absence of the rows for the specific lead id and at the latest because of the oversize of the rows for the contact id. This kind of programming error normally happens in an early stage of the development, doesn't it?
Fast or robust? No user likes a slow application, and you cannot guarantee the robustness, because you have to reckon on coding errors. Robustness isn't increased by GUIDs, but by testing, testing, testing. On the other hand, integer primary keys can make the application significantly faster, especially if the GUIDs are not sequential. I like the application to be fast and robust.
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you? Or what do you think about unique, user defined "identification codes" for merging data?
Best regards
Joachim van de Bruck
modified on Monday, June 1, 2009 7:30 AM
|
|
|
|
|
jvdb2508 wrote: No user likes a slow application, and you cannot guarantee the robustness
I keep seeing this argument as though if you replace guid's with ints the app will magically speed up to the point the user can barely keep up with it. In my experience that is patently false.
I *can* tell you from my personal experience what a user will not only not like but will vilify you for, and what can ruin not only their business but your own, and that is lost data. You can test until you're blue in the face but you won't ever be able to test for every crazy thing that will happen at the end user's site, be it hardware failure, malicious actions or just plain stupid actions.
I don't mean to judge you but your arguments don't seem to match my experience in the real world dealing with thousands of end users of our software world wide for decades now. When it's a choice between safety and future scalability of data and performance, performance loses every time and the end user will thank you for it. Performance is fine when it's noticeable to the end user but it should never be the first consideration. Performance can be adequate to blindingly fast and the end user won't care one bit but lose one bit of their data or tell them they can't scale up as their business grows and there's really bad real world consequences.
jvdb2508 wrote: My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
jvdb2508 wrote: Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
jvdb2508 wrote: There are two important facts, which probably no one can deny
That's a bold statement.
A fact which I think no one with any amount of real world experience can disagree with is that there are no absolutes in anything, only the appropriate technique or tool for the situation.
Newbies unfortunately need to hang on to some kind of framework before they develop the skills to recognize what technique or technology to apply to a situation. Making absolute statements is unhelpful to them.
jvdb2508 wrote: As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. In the modern era of replication and scaling of applications it's wise to uniquely identify every record completely apart from anything the user enters. While not appropriate necessarily in every case it's rare that it isn't and the currently most acceptable and useful way to do that is with a Guid. Anything else is failing to plan for future growth of either the software or the company using it.
Examples that illustrate why this is a bad idea are extremely easy to come by, for example let's say you write an inventory management application and allow the user to enter a part number and use that as a unique way to identify a record. How hard is it to imagine that the end user will decide to use manufacturer part numbers and suddenly they will get a new part from a new manufacturer and it's exactly the same as an existing part. Or last names or phone numbers or addresses when a company goes nationwide or global after starting out in one city. Then come the inevitable workarounds and it spirals into a mess that never needed to happen in the first place.
Using a Guid as a surrogate key ensures at the very least that there will never be a collision between data no matter how it gets handled. It's cheap insurance in this day and age and the end user never needs to see them or interact with them. I personally think it would be madness to not use them in almost every situation with few exceptions because I've seen just about every app I write grow and gain requirements for many years after I initially thought they would just be small projects.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
There are two important facts, which probably no one can deny
That's a bold statement.
Sorry for my bad english (not my native language); I thought that 'probably' ensures, that I do not claim absoluteness, and the content of the 'facts' does it too.
John C wrote:
jvdb2508 wrote:
As posted earlier in this thread, I prefer user defined, unique "identification codes" for merging databases or to import/export rows. An "external primary key", a user can handle because he defined it.
*Extremely* bad idea to let user entered data uniquely identify a record in any way. No matter if you use Guid's or not this is just plain horrible advice and should be disregarded completely by any new programmers out there reading this. ...
Sorry again, but I can't understand your argument. As noted before, these 'identification codes' are not the primary key of the table. Let's say, that it is because of my bad english, that you didn't get my idea or otherwise I do not get your argument. Maybe you did not read my explanations in an earlier message (it's located on the second page, and therefore hard to find).
John C wrote:
jvdb2508 wrote:
My main point was, that GUIDs are not the general solution for replication and import/export, because they will result in duplicates. I assume, that you agree, don't you?
I don't, not one iota.
John C wrote:
jvdb2508 wrote:
Or what do you think about unique, user defined "identification codes" for merging data?
Horrible idea not to be considered under any circumstances for any application that is not in-house only and tiny.
A discussion without contrary standpoints is boring. But I miss your factual arguments. Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
Well, it's because of my bad english - or where did I say, that performance is more important than robustness?
Let me try a last statement about GUIDs and primary keys, I'm taking pain to phrase it factually:
If you replicate-copy one database into two databases, both databases can deal with any kind of primary key (either integers or guids) as long as new records henceforth are inserted in only one of the new databases. Of course, you want to insert new records in both databases and replicate-merge both of them or - different scenario - you have to merge two different databases into one or you have to import foreign data into your database. As mentioned before, you cannot merge data by integers! But ...
... if you use guids instead of integers to merge data, the only - but nevertheless important - advantage is, that you will not lose any data. On the other hand, if the same record was inserted in both databases, or if you import data, which already exists, you will get duplicates. The user has no loss of data, but now has to handle these duplicates. We can argue now, whether the duplicates hassle more then the loss of data, but to avoid this, I suggested the use of the 'identification codes'. They are not the primary keys of any table, but they are unique and defined by the user for merging data and for nothing else. That means, the user can define, whether the records of different sources are merged or duplicated. He/She can change the identification code to his needs and he can decide during or before an import, whether the data will be duplicated or overwrite the existing data or will be ignored.
No facts, just my opinion ...
I propagate the use of 'identification codes' for merging data from different sources. If a database has no 'identification code', you can define one before you import or merge. To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
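The difference between the two merge strategies can be sketched roughly as follows. This is an illustration only: the row layout (`id`, `code`, `name`), the function names and the three `on_match` policies are assumptions that paraphrase "duplicated or overwrite the existing data or will be ignored".

```python
import uuid

POLICIES = ("ignore", "overwrite", "duplicate")


def new_row(code, name):
    """A record with a GUID primary key plus a user-defined identification code."""
    return {"id": str(uuid.uuid4()), "code": code, "name": name}


def merge_by_guid(target, source):
    """Union on the GUID key: nothing is lost, but equal records are not recognised."""
    merged = {row["id"]: row for row in target}
    for row in source:
        merged.setdefault(row["id"], row)
    return list(merged.values())


def merge_by_code(target, source, on_match="ignore"):
    """Match on the identification code and let the caller pick the policy."""
    if on_match not in POLICIES:
        raise ValueError(f"unknown policy: {on_match}")
    result = list(target)
    by_code = {row["code"]: row for row in result}
    for row in source:
        existing = by_code.get(row["code"])
        if existing is None or on_match == "duplicate":
            result.append(row)
        elif on_match == "overwrite":
            # Keep the target's primary key so rows linked to it stay valid.
            replacement = {**row, "id": existing["id"]}
            result[result.index(existing)] = replacement
            by_code[row["code"]] = replacement
        # on_match == "ignore": the existing row wins, nothing to do
    return result


if __name__ == "__main__":
    site_a = [new_row("EMP-001", "Ada")]
    site_b = [new_row("EMP-001", "Ada L."), new_row("EMP-002", "Bob")]
    print(len(merge_by_guid(site_a, site_b)), "rows when merging by GUID")
    print(len(merge_by_code(site_a, site_b)), "rows when merging by code")
```

Merging by GUID keeps both EMP-001 rows because their keys differ; merging by code leaves the decision to whoever chooses the policy.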
Best regards
Joachim van de Bruck
|
|
|
|
|
Joachim, your English is excellent; you don't need to make any excuses for it.
jvdb2508 wrote: Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
jvdb2508 wrote: To count on guids for merging data is doing things by halves - to merge integers is doing things on the wrong track.
Well as with anything it depends on the specific requirements and a good convincing case can be made for any practice if the circumstances merit it. For that reason we must speak in general terms for things that apply to the most common cases and in my opinion the most common cases are optimal with a Guid.
I disagree strenuously with relying on user entered data for any internally fundamental process. I come to this point of view after over a decade of publishing our own software and having to deal with thousands of users globally who are unbelievably creative at breaking things. Simply put I see too much chance for collision with user entered unique identification codes.
In our largest app we have something like 60 different tables in it all normalized to a high degree, I need to be absolutely sure that no record anywhere in the entire database is sharing an identification with any other record, once I'm assured of this then it opens up all manner of possibilities and gives me great peace of mind that I can save the users from themselves no matter what they enter or do to the database or the computer hosting it.
At least we can agree that integers are not optimal in most cases.
"Creating your own blog is about as easy as creating your own urine, and you're about as likely to find someone else interested in it." -- Lore Sjöberg
|
|
|
|
|
John C wrote: jvdb2508 wrote:
Okay, you do not agree, but why? And you think, that my idea is horrible - again: why? The only answer I can see is "your experience", something I will not deny.
If you re-read my message which you are replying to, you will see that I gave several concrete real world examples why I think this is a bad idea.
I can't agree and I think, that you were arguing against the use of 'identification codes' as internal key, which is responsible for any process of data logic. Be sure, that the user entered unique icd is only used for matching data from different sources.
Who else, but the user can decide, whether two records from different sources, shall henceforth exist side by side or which one of them will survive while the other one will die. I like to give the user the tools and he has to handle them carefully and responsibly.
Our biggest customer uses our enterprise application to create monthly time schedules for 2,000 staff members. There are 50,000 entries per month, whereby each entry is linked to at least 9 master data tables in up to 4 levels. The entries are partially exported for time recording machines and partially imported back with corrected starting times and durations. Observing and clearing the time schedule - gross wage reports and invoices - will produce many additional records, because each entry is split into parts where each part is waged or invoiced differently. Finally, the data is aggregated and exported for additional processes (net wage, payments, accounting, ...). All in all - including workflow management - we have 600,000 records per month, 8 million per year.
The difference between processing integers or guids cannot be recognized with a small amount of data. But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys. Especially the algorithm, which fills up to 80% of the time schedules handling user preferences and specifications, staff qualifications, and minimizing costs and thereby maximizing earnings benefits from integer primary keys. And when a user is working on a time schedule, he wants to see the impacts of his planning immediately. It depends on the complexity of the queries, the number of linking levels, the overall amount of data. If your application potentially/eventually has to handle more than half a million records per month sometimes, I advise not to start with guids.
Some of our customers use a distributed database system. That's where we revert to Sql Servers built in replication capabilities, and where the database itself has added the indexed rowguid columns. Of course we provide the user with the functions to detect and handle duplicates. For automated replication guids are somehow essential, but not as the primary key in every table.
That's my experience, which is contrary to yours, isn't it? I cannot say which of our experiences is more valuable, but be sure that I do not work in a sandbox.
Best regards
Joachim van de Bruck
modified on Thursday, June 4, 2009 2:23 AM
|
|
|
|
|
Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
Discussion point #1: GUIDs vs. Ints for merging databases. You stated yourself that merging is not possible with Ints. Ergo, if you need merging or anticipate merging, do NOT use Ints. In 99% of the cases you'll still get enough performance with sequential GUIDs. Period.
Discussion point #2, which is completely orthogonal/independent/complementary (pick your favorite), is the issue of detecting duplicates during merge operations. JohnC nowhere stated that you couldn't have natural 'keys' in addition to the GUIDs, you just wouldn't use them for joining, etc. You have the merging problem whether you use GUIDs or not, and it needs to be addressed on a system-by-system basis. And if you like, you can even constrain the natural 'key' column to unique values.
jvdb2508 wrote: But in our scenario, preparing the reports - not the printing itself - and other daily tasks like searching a substitute for an omitted employee who is available and qualified to do the job was increased up to the factor 8 after we have replaced the not even sequential guids with integer primary keys.
Interesting, so using ints instead of GUIDs in fact increased the time by a factor of 8? OK, I'm sure it's a typo, but you still could have tried switching to sequential GUIDs. No point in complaining about the obvious. I successfully (and performantly) used tables with a couple of million records, GUID IDs, joins across at least 5 tables and pretty complex filter queries... on a laptop (alright, I used a fast eSATA HDD).
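A GUID surrogate key and a unique natural 'key' can sit side by side in one table. The sketch below is only an illustration of that point: it uses SQLite instead of SQL Server, and the EMPLOYEES and SCHEDULE_ENTRIES tables, the EMPLOYEE_CODE column and the helper functions are all made-up names.

```python
import sqlite3
import uuid


def make_db():
    """In-memory database with a GUID primary key and a unique natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE EMPLOYEES (
            ID            TEXT PRIMARY KEY,      -- GUID, used for every join
            EMPLOYEE_CODE TEXT NOT NULL UNIQUE,  -- user-defined, used for matching
            NAME          TEXT NOT NULL
        );
        CREATE TABLE SCHEDULE_ENTRIES (
            ID          TEXT PRIMARY KEY,
            EMPLOYEE_ID TEXT NOT NULL REFERENCES EMPLOYEES(ID)
        );
    """)
    return conn


def add_employee(conn, code, name):
    """Insert an employee and return the generated GUID."""
    employee_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO EMPLOYEES (ID, EMPLOYEE_CODE, NAME) VALUES (?, ?, ?)",
        (employee_id, code, name),
    )
    return employee_id


def add_schedule_entry(conn, employee_id):
    """Link a schedule entry to an employee through the GUID only."""
    entry_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO SCHEDULE_ENTRIES (ID, EMPLOYEE_ID) VALUES (?, ?)",
        (entry_id, employee_id),
    )
    return entry_id


def codes_on_schedule(conn):
    """Join through the GUID and report the current natural keys."""
    rows = conn.execute(
        "SELECT e.EMPLOYEE_CODE FROM SCHEDULE_ENTRIES s "
        "JOIN EMPLOYEES e ON e.ID = s.EMPLOYEE_ID ORDER BY e.EMPLOYEE_CODE"
    )
    return [code for (code,) in rows]
```

Because the links go through the GUID, the user can renumber EMPLOYEE_CODE without touching any child table, while the UNIQUE constraint still refuses a second record with the same code.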
Oh, and before I forget it, I still see some contradiction here. On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
With kind regards, Mr. 495784930
|
|
|
|
|
cwienands wrote: Joachim, what's your point? You're throwing apples and pears in a basket and argue those are oranges.
I think you predominantly got it. My points are:
(1) Use AutoIncrement Integers as primary keys and for linking tables.
(2) Use additional 'natural keys' for import/export and joining data.
(3) Use additional GUIDs solely for master data replication on distributed databases.
(4) Points (1) to (3) apply nearly always, but there may be exceptions.
The origin - in the article and assured by JohnC and others - was, that using GUIDs instead of Integers as primary keys is enough or all for linking, joining and replicating data. I think, that there are at the least three different tasks and I feel good, when each task has its own data.
My other contribution to this discussion was, that if you use guids as primary keys, then use sequential ones.
cwienands wrote: On the one hand you're arguing for int keys for performance reason but then the next sentence you're arguing we should use natural, customer-defined keys. I doubt that your customers are happy with using int-based natural keys
That's where you did not get it. I suggested to eat apples AND oranges AND pears, not apples OR pears. On the other hand I know many accountants who think that an integer is a natural key.
Best regards
Joachim van de Bruck
|
|
|
|
|
Religion and programming don't go well together.
I have significant high-performance e-commerce experience and must caution anybody reading this article that Guids are not as wonderful as the author proclaims them to be.
1) Due to the random nature of Guids they fragment any index on them very quickly. On a DB with many inserts this is a recipe for disaster. Especially your clustered index (Ever tried to rebuild an index on a 100gig table on a production machine that has to continue transacting?)
2) Guids are significantly slower than integers for linking.
3) Memory is a very PRECIOUS commodity on large DBs; wasting 4 times as much memory space is, frankly, not a way to create linear scaling.
4) A very large percentage of queries are sequential or cover a range between two values. That makes life easy for SQL and other DBMSs, which can cache only portions and still achieve maximum performance (provided you are using ints). With Guids any caching algorithm is likely to be SIGNIFICANTLY slower, as access is no longer linear or mostly linear but random, and random is not good for disks and caching.
5) Hard disk IO is also a significant bottleneck quite often and an index with two guids will take 32 bytes per record where the same index with 2 ints will take only 8 bytes.
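The size arithmetic in points 3) and 5), and the insert pattern behind point 1), can be checked with a small sketch. It is a crude illustration, not a model of any real B-tree: the "index" is just a sorted Python list, "mid-index insert" is used as a rough proxy for the inserts that cause page splits, and the function names and the seeded random GUIDs are assumptions made for repeatability.

```python
import bisect
import random
import struct
import uuid

GUID_BYTES = len(uuid.uuid4().bytes)  # a GUID is 128 bits
INT_BYTES = struct.calcsize("<i")     # a 32-bit integer


def index_entry_bytes(key_bytes, columns=2):
    """Key bytes per index entry for an index over `columns` key columns."""
    return key_bytes * columns


def mid_index_inserts(keys):
    """Count inserts that land somewhere other than the end of the sorted index."""
    ordered = []
    landed_in_middle = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            landed_in_middle += 1
        ordered.insert(position, key)
    return landed_in_middle


def random_guids(n, seed=0):
    """Random 128-bit GUID values, seeded so that runs are repeatable."""
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128)) for _ in range(n)]


if __name__ == "__main__":
    print("two GUID columns:", index_entry_bytes(GUID_BYTES), "bytes per entry")
    print("two int columns: ", index_entry_bytes(INT_BYTES), "bytes per entry")
    print("mid-index inserts, sequential ints:", mid_index_inserts(range(1000)))
    print("mid-index inserts, random GUIDs:   ", mid_index_inserts(random_guids(1000)))
```

Sequential keys always append at the end, while nearly every random GUID lands in the middle of the existing entries, which is where the fragmentation complaint comes from.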
There are many more, but let's leave it at that for now. In short, if performance is what you are after then ints will win this fight hands down. If you want some of the benefits of Guids then use them as a candidate key but not as a primary key. Using a Guids-only policy will work well on a small scale, but for anything beyond that be careful: there are likely to be as many reasons not to use Guids as advantages in using them.
I have seen this over and over again. Why don't applications scale? Because the people who designed them could not identify the things that would prevent them from scaling in the first place.
|
|
|
|
|