|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
Why is There a Problem?Database developers have historically used a somewhat cryptic system for naming database tables and fields. Originally a result of the limitations of database management systems (DBMS), these naming schemes - or the lack thereof - have been adopted by convention and tradition. However, as database applications become more complex, with more tables and larger teams of developers, and as developers come and go, it becomes much more important to implement a robust and disciplined naming scheme for database objects. A well-defined naming scheme becomes more important when you adopt object relational mapping (ORM) technologies or automatic code generation. This article presents one scheme that has worked for me for years in multiple successful commercial applications. Perhaps the most common practice in database naming is to use all one case (upper or lower) with words separated by underscores, and heavily abbreviated. Fixed case names have become the standard because database systems are traditional case-insensitive in table and field names (or required fixed case). It is also a holdover from academia where use of a single case is common not just in database design, but in programming languages (for example, C language is traditionally taught in lower case and FORTRAN in upper case). Separating words with underscores is a natural result of fixed case names and also is commonplace in academia. The heavy use of abbreviations is a holdover from the days of dBase and other systems where there was a stringent limit on table and field name lengths. Many database designers schooled in the older systems still retain this use of abbreviated names in a severe case of the psychological principle of learned helplessness. Most database books still follow these principles. Consequently, even a recent book on SQL Server 2005 uses examples of tables and fields such as: Table: With fields:
The SQL for a SELECT customers.*, orders.* FROM customers INNER JOIN orders ON _
customers.cust_id = orders.cust_id
There are several problems with such a naming scheme:
I have been developing database systems for over twelve years. My work includes custom applications, web-based database, and "shrink-wrapped" applications. I have been the lead designer for the database and had to suffer my own early mistakes. I have been forced to work with database designs developed by others - some good and some bad. And I have dug into the details of database designs for competing products in order to import their data into my company's products. Finally, I have developed an extensive set of automatic code generation tools that allows me to develop and modify robust database applications quickly. From this experience, I have come up with a database naming system that I have used for at least five years on multiple projects and found it to be very satisfactory. Even if you don't want to adopt every feature of this naming scheme, I am confident that you will find some useful naming tips in this article. My database naming scheme has some basic properties:
I will explain my reasons for each below. Naming TablesIn most databases, there are three types of tables:
In my naming scheme, I like to prefix each table name with one of three prefixes to indicate the type of table. I use the following prefixes:
AdvantagesI find that this table naming system has several advantages.
Singular/Plural NamesNote that in my data tables above, all table names are singular, i.e. Other Table TypesIn your development, you may find it advantageous to include other table types such as:
Each of these could have their own prefix as well. Naming FieldsOne of the big problems in traditional database naming schemes is that it is common to have the same field name in two different tables that will end up joined together. While this problem is not insurmountable, it is inconvenient and should be avoided for two reasons:
I have designed my field naming scheme so that no two data tables ever have the same field name (this is not so important for link tables and picklist tables because they are rarely joined on each other). For each table name, I come up with a three-character prefix unique for each data table. For example the prefix for tblCustomer
tblOrder
Now when I do a join on these two tables by customer ID, I do not have a duplicate field name because the customer ID field in the customers table has a different name from the customer ID field in the orders table. Never Use Spaces in Field NamesOne thing you should never do when naming fields is put spaces in your names - even if your DBMS (such as Microsoft Access) supports it. Putting spaces in table names results in horribly ugly and unreadable SQL statements. It is ironic, because DBMS vendors allowed for spaces in field names so that the tables themselves looked nicer to neophyte programmers and database developers. It works well when graphical query building tools generate the SQL behind the scenes. But once your database application becomes more complex and you become more adept at SQL, you will regret these "friendly names". Naming Fields of Common Purpose (Notes, Status, Name, Date, etc.)As you develop your database, you may notice that many tables will have some fields of common purpose. These are typically fields called
You can see here how using prefixes to indicate the owner table is already convenient to distinguish order notes from customer notes on a However, I recommend this method:
The difference is subtle but very important if your data tables get mapped to objects in your program code. In my program code, I create objects that wrap the data in my data tables. But I don't carry over the field name prefixes into the program code. Those prefixes are for database naming. I use my own code naming system for coding. In the first case, my It may sound minor, but as your projects grow to include hundreds of code files, naming objects to make searches easier becomes more important and you will appreciate the effort. Field Names for Link Tables and Picklist TablesRemember that for many-to-many relationships, you will be joining your data tables with link tables, so you don't want link tables to have the same field names as data tables. That means that your table
Likewise, I like to prefix picklist table fields with
A Note on Hungarian NotationSome database designers use a type of Hungarian notation - prefixing field names with a value to indicate field type, such as
Our SQL Statement RevisitedNow let's see what our SELECT tblCustomer.*, tblOrder.* FROM tblOrder INNER JOIN tblCustomer _
ON tblOrder.ordCustomerKey = tblCustomer.cusCutomerKey
Now the result set is guaranteed to have all unique names, even if both tables have a Primary KeysTraditionally, primary keys in databases have been integers. There are some good reasons: Integers take up a minimal amount of space, they sort quickly, and most database systems allow for an incremental increase in the key field so that integers are consecutive and unique. Microsoft Access defaults to AutoNumber integers as primary keys, so database neophytes who get their start in with that program become accustomed to using integers as primary keys. However, I prefer an alternative - the Globally Unique Identifier (GUID, pronounced goo-id), or more specifically, a GUID string. GUIDs can be generated by most operating systems and even some database systems such as SQL Server 2005 (which supports GUIDs as a field type). Just to clarify, GUIDs are 128-bit values guaranteed to be unique worldwide if they are created on a computer that has a network card. If they are created on a computer that does not have a network card, there is some very small chance that they will not be unique, but because they incorporate a time-stamp and random numbers, duplicate GUIDs are practically impossible if not reasonable unlikely. Using GUIDs as primary keys has two advantages: 1. Merging DatabasesThe first advantage will not apply to most projects, but when it does apply, GUIDs are far superior to integers. That involves merging databases. Suppose you sell a commercial contact management product that uses a relational database for storage. You use integers as primary keys and integers automatically increment with each new record. Every person who uses your products is going to have a Now consider the same situation, but each primary key is a GUID instead of an integer. Everyone who uses your contact manager is going to have completely unique IDs for primary key values. Now, merging data is simply a matter of appending the records from one database to another. No key values have to change, and all relationships between tables remain intact. I have developed a number of commercial database applications. Using GUIDs makes it easy for customers to merge data from different databases - something that does happen - particularly when upgrading from one database version to another. Such a scenario may be uncommon for one-off database projects such as a custom inventory management application for a program vendor. But what if there is a corporate merger in the future? Any database with GUIDs as primary keys is much easier to integrate with other systems provided that the other systems also support GUIDs as primary keys (mixing GUIDs and integers as primary keys in the same database in undesirable). 2. Knowing the Primary Key Value Before Adding the RecordEven if the first situation doesn't apply to you, this one certainly will. Here is a common scenario: You use integers as primary keys and the integers are assigned by the DBMS at the time the record is added. Now you want to add a new customer to the
The problem is step 2. Using GUIDs, we can eliminate this read operation because we assign the customer ID value - a GUID - to the customer before we write the customer to the database. Now the process is:
We have eliminated the search-and-read step to find out what ID the DBMS has assigned to the customer. Prefixed GUID StringsUsing GUIDs can help during that period of development of any database product, where the database table structure is continually changing and the code is not always in synch with the database. Consider our link table linking vendors to products in a many to many relationship. If you use integers as primary keys, opening that table will just show two columns of integers - one columns of vendor IDs and one column of product IDs - or so you would think! As I said, I like to use GUID strings rather than actual GUIDs, so my primary key fields are actually of type text. The difference is that a GUID is actually a binary value, whereas the string is actually the string representation of the binary GUID. A GUID string looks something like this:
Recall that in our data table naming convention, I like to prefix field names with a three character value to indicate the table, e.g. "
(By my own convention, I like to use upper case for the prefix and I separate the prefix from the GUID itself with a semi-colon). Why is this important? Because now any time I see a primary key value, I can immediately tell what data table it is from. That is not possible with integers. Now when I open my vendor-product link table, I can easily check to make sure that all the foreign keys in the vendor column are prefixed with Using a GUID string has the advantage that you can tag it with other diagnostic values. Perhaps you would append a Web Database Key Values in the Browser BarIf you ever write a database application for the Web, one thing you probably want to avoid is to use consecutive integers as your primary key values. This is because most database-driven web sites at some point display a key field in the address bar of the web browser. Any enterprising hacker can simply increment or decrement the key value and potentially see data that they shouldn't. By using GUIDs as primary keys, it is much more difficult for the hacker to penetrate the inner workings of the site by this method. It is still not completely secure, but it is an improvement over the integer method. You Can Never Run Out of GUIDsGUIDs have a major advantage over integers as primary keys in that there is an unlimited supply of GUIDs. Certainly, the limitation on the number of available integers isn't likely to cause a problem in your products or vendors tables - few companies have over 4 million products (maybe Amazon.com?). However, if you are using a database table to log web page views, the limitation on the number of integers may well be a problem five years from now. Disadvantages of GUIDSUse of GUIDs as primary keys does have some disadvantages, so they may not be right for every application. GUID Length - Note that a GUID string is always 36 characters in length. Add three more characters for the prefix and 1 for the colon and that means that your primary key field is a character string of length 40. Add any additional space for any other tags you may want, and the disadvantage of GUIDs becomes apparent - Performance! Ideally, primary keys should be small and sort quickly for faster performance. That is what makes integers so attractive. However, the advantages of GUIDs may offset the performance hit. Readability - GUIDs are ugly to look at. While integers are messy in that they don't tell you much about the object, during the development process, it is much easier to tell a colleague that there is a problem with customer number 3452 than to tell him that there is a problem with customer Creating GUIDs in CodeIf you decide to use GUID strings as primary keys, you will want to know how to generate them in your code. Creating GUIDs in .NETCreating System.Guid guid = System.Guid.NewGuid();
string s = guid.ToString();
Creating GUIDs in C++ with MFCCreating CString GetGuidString()
{
GUID guid;
CoCreateGuid(&guid);
#ifdef _UNICODE
WORD * str;
UuidToString((UUID*)&guid, &str);
CString s((LPTSTR)str);
RpcStringFree(&str);
return s;
#else
BYTE * str;
UuidToString((UUID*)&guid, &str);
CString s((LPTSTR)str);
RpcStringFree(&str);
return s;
#endif
}
For C++, you will need to link to RPCRT4.LIB or you will get a linker error on the Sorry VBer's. Those are the only two languages I work with. Naming Queries and Stored ProceduresUsing prefixes for your stored procedures (called queries in Microsoft Access) in your DBMS can also be helpful. I like to use a prefix that indicates the type of stored procedure:
Such a naming scheme has several advantages. First, you can quickly tell what type of stored procedure it is by the name. Second, since most DBMS's list your stored procedures alphabetically, they will be listed in groups based on the type. Any stored procedure that I am not actually using, I like to prefix with an Using some distinct naming scheme for stored procedures isn't just helpful when working in your DBMS, but it also allows you when looking through program code to quickly determine that you are "opening" a stored procedure rather than a table. Again it can be particularly helpful when doing global searches in program code. Suppose in my .NET application, I want to find all program code where I execute a new OleDbCommand("d
Considerations for ORM Technologies and Automatic Code GenerationLook through the program code for a database application and one thing will stand out - wherever there is database access, you see pretty much the same code logic over and over. Database programming is incredibly repetitive and can be very tedious. If you haven't considered either ORM or automatic code generation technologies for your database applications, I think it is well worth the effort. ORM technologies allow you to easily map objects in your program code to database tables. The ORM package handles much of the database activity of reading and writing records and automatically populates your object with data from the database. I have used NHibernate - the .NET version of the popular Hibernate product with mixed results. It simplified a lot of the details of .NET database programming (I find ADO.NET to be very verbose), but when it didn't work right, the problems could be difficult to diagnose. Automatic code generation takes this a step further. An automatic code generator is a software program that actually writes program code - such as an object class - based on your database design. I have been developing automatic code generation tools for years and the programs have gotten pretty sophisticated. You can see an early version of my work in my CodeProject article C++ Object Oriented Database Generator. Although they are a fair amount of work to develop, once you have them, the time savings are significant. They also produce more robust software. Once you get your generator to produce bug-free code, it will always be bug-free. If bugs are found, you can easily fix the code generator and regenerate all of your classes. It is also easy to add features to your objects. The use of code generation tools goes hand-in-hand with a well developed database naming convention. Your code generation tools can use table and field names to determine what types of code to generate. For example, by naming picklist tables with a different prefix than data tables, your code generator can know which tables are which and write code for different class objects.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||