In many (if not in most) cases, development projects have an advanced knowledge of the type of data server they are about to interact with. When building an isolated application for a specific customer, the customer's server is given (or decided upon) for the specific project. When developing a web-based application or service, the backing storage is an integral part of the system design. Services enjoy the boundary-less world of XML and even most off-the-shelf applications are designed for a specific database. Nevertheless, sometimes you need to write an application that produces SQL queries that may be run against an unspecified server. This article briefly describes (some of) what you might want to know when encountering such a situation.
SQL has a standard formal definition described by the American National Standards Institute (ANSI). In more than one area, you might come across a standard by these guys. The standard is usually named ANSI something. In the case of SQL, the ANSI standard is ANSI-92, named after its publishing year (replaced ANSI-89). ANSI has since published quite a few new versions of the standard (check SQL in Wikipedia), but the industry moved forward at a greater pace.
All commercial database vendors need SQL extensions. This is in part a result of the nature of business vs. standards and in part a result of the definition of SQL as a declarative language rather than a programming language. So they came up with their specific versions of SQL (dialects). Microsoft uses T-SQL (Transact SQL), Oracle has PL/SQL (Procedural language SQL), IBM (DB2) has SQL/PL (well...) and My-SQL uses SQL-PSM (Persistent Store Model).
Obviously, you would like your SQL to use the "lowest common denominator" of these popular databases in order to be "server indifferent." As far as I was able to determine, all the above mentioned databases support ANSI-92. Therefore, just get your copy of ANSI-92 and make sure to use only what is in it.
Well, getting a copy of ANSI-92 might be important but looking for everything in it would probably be a bit impractical. The following sections contain some of the things I learned during my experience – use at your own risk...
Piece of Advice
Double check your need for server indifference. The database only gives you so much and finding an alternative implementation for your needs is tricky in most cases and impossible in many. You might want to consider the business logic driving you in that direction and think things over. In my case, I was developing a product to be used with a third party system. The third party system allowed only direct database access by SQL (no stored procedures and no user function) and supported two flavors of database – Microsoft SQL server and IBM DB2. After doing some remarkable SQL acrobatics (and writing most of this article) I went to the bosses and asked them to check our target audience distribution of databases. It turned out less that 0.1% of them were using DB2. They were happy to let me off the hook.
Look for irregular data types. You usually have what you need but I think
nvarchar() is not supported and
varchar is (this is a bit funny: the 'N' in
nvarchar comes from ANSI because these fields support the ANSI standard for characters covering quite a few languages). Do not use column names longer than 16 characters. The function "
Convert" is included in ANSI-92. So are "
MIN" and "
AVG." Search the ANSI-92 document for "reserved word."
Generally, it is not advised to have strings in your code, let alone SQL. The common way of pushing your SQL out is using stored procedures which are predefined SQL queries stored in the DB. These queries also perform better as they can be cached by the server. This might seem as irrelevant as you can use ANSI-92 compliant SQL in or out of a stored procedure. The thing is, ANSI-92 has no notion of stored procedures. It seems to me that since all databases now support stored procedures (including My-SQL since 1995 version) it would be wise to use them, but I have read (cannot say when and where) that stored procedures are not recommended in this case so I am quoting to be safe. Make your own checks.
The date format compatible with all data bases is: YYYYMMDD. Thanks to the reader 'los' and to Tony Rogerson.
Validating ANSI-92 Compliance with Ms-SQL SQL Server
If you happen to be using SQL-Server Management studio, you can make sure your SQL is ANSI-92 compliant. Add the following statement at the head of your query:
SET FIPS_FLAGGER 'full'
You would get a message every time your SQL violates ANSI-92. Note the standard SQL Server validates not ANSI-92 but rather FIPS 127-2. To the best of my knowledge, they are the same in all relevant aspects. On MSDN, they say "This is based on the ISO standard."
Getting to the Top
The last thing I have come across is this: the first problem you are about to encounter once you have adopted the ANSI-92 SQL approach is selecting a specific number of records from a table. How do I know that? Well, it happened in my development team (twice), and while I was searching the net for issues related to specific SQL dialects, I found more answers to this question than I would care to remember (each dialect has a completely different approach here. I find this example very educating when discussing this problem). I came up with a way to select the first record (equivalent to T-SQL - TOP 1) and it is explained here:
Suppose you want the oldest customer in your DB. T-SQL would spell:
SELECT TOP 1 CustID
ORDER BY BirthDate ASC
Now, since we cannot use TOP, we have to define a
WHERE clause that eliminates all but one row. Here is how it is done:
WHERE (BirthDate =(SELECT MIN(BirthDate)
Note that the
WHERE should be defined so that it yields only one record. If you have two elderlies with the same birth date you might find yourself in somewhat of a tight spot. In such a case, please consider which one you really want. When you get that right, polish your inner
select. If you have exhausted your options and still cannot be sure you get only one record use
DISTINCT (included in ANSI-92). This scenario is common when selecting only one child of a parent record (for instance, selecting the last price set for the item). In such cases you have three nested queries and you have to make absolutely sure you get only one result or else your query will fail, and worse, it might pass sometimes and fail at other times.
As to selecting more than one record from the top of a result set, I have a few things to say:
- If you find yourself needing this, you are in violation of the Zero-One-Infinity rule. Check again if you really need it.
- It may be achieved using this technique: use a union between queries, each selecting one row. Each query should be formulated roughly like the one mentioned earlier but the
WHERE clause should remove records already chosen (this is achieved by nastily nesting the previous queries). This is such a bad query that I will not even give an example.
I am not a DBA, nor am I an SQL expert. I just fell into this hole and I am shouting from the deep darkness for the benefit of another innocent passer-by. I would be happy to receive any kind of extensions for this either by e-mail or replies.